
Okay, now I'm going to write an article summarizing the 2 papers by Ludescher *et al*:

Nothing here yet...

## Comments

What I wrote at Climate network may be useful.


Thanks! Right now I'm grabbing lots of stuff from the thread [Paper - Ludescher et al - Improved El Niño forecasting by cooperativity de](http://forum.azimuthproject.org/discussion/1360/) and the wiki page Experiments in El Niño detection and prediction. I'll try to combine all our information about their paper (not yet our attempts to replicate it) in a readable way.

I have a fairly complete draft up now:

* [[Blog - El Nino project (part 3)]]

Please give it a look, folks! I would like to put this on the blog in a few days.

A couple of issues:

1) The blog article currently discusses both $C_{i,j}^{t}(-\tau)$ and $C_{i,j}^{t}(\tau)$. Do Ludescher *et al* actually use both of these? They define both, but when defining their all-important 'link strength' they only mention $C_{i,j}^{t}(\tau)$. Are they being sloppy here, or what?

2) The blog article currently follows David Tanzer and says:

> Next, for nodes $i$ and $j$, and for each time point $t$, the maximum, the mean and the standard deviation around the mean are determined for $C_{i,j}^t(\tau)$, as $\tau$ varies across its range (0 to 200 days).

> They define the **link strength** $S_{i j}(t)$ as the difference between the maximum and the mean value, divided by the standard deviation.

This is more precise than what Ludescher *et al* say... it seems to me. Are we sure this is how they define the link strength: letting $\tau$ vary from 0 to 200 days?

The two issues are related.

On 1), they use both. This seems clear from Fig 4 in the supplementary information. $\tau$ runs from 0 to 200 days when the $C_{i,j}^{t}$ are defined, and from -200 to 200 days when they are used.

> Finally, they let $S(t)$ be the average link strength, averaging $S_{i j}(t)$ over all pairs where $i$ is a node in the El Niño basin and $j$ is a node outside.

This should change, as it's symmetric between inside and outside.

> So, this is about how temperature anomalies outside the El Niño basin are correlated to temperature anomalies inside this basin at earlier times.

And this.

I changed 3 $i$s to $j$s in the definition of the time-delayed cross-correlation.

The definition of the time-delayed cross-correlation seems clearer in the 2014 paper, but note that the notation has changed. $C_{i,j}^{t}$ in 2014 was $c_{i,j}^{t}$ in 2013.


What should it be changed to?


Ha! What you said is already symmetric, and is fine. Though reading it again, perhaps it could be improved:

> Finally, they let $S(t)$ be the average link strength for time $t$, calculated by averaging $S_{i j}(t)$ over all pairs $(i,j)$ where $i$ is a node in the El Niño basin and $j$ is a node outside.
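The averaging step in that wording is simple enough to sketch. The Python below assumes the link strengths for one fixed time $t$ are already available in a dictionary keyed by node pairs; the names `average_link_strength`, `basin` and `outside`, and the node labels, are hypothetical and not taken from the papers or anyone's code.

```python
from statistics import mean


def average_link_strength(link_strength, basin, outside):
    """S(t): the mean of S_ij(t) over all pairs (i, j) with node i in the
    El Niño basin and node j outside it.

    link_strength: dict mapping a node pair (i, j) to S_ij(t) for one fixed t
    basin, outside: lists of node labels (hypothetical identifiers)
    """
    return mean(link_strength[(i, j)] for i in basin for j in outside)


# Toy usage with made-up link strengths, not real data.
S_t = {("b1", "o1"): 2.0, ("b1", "o2"): 3.0,
       ("b2", "o1"): 4.0, ("b2", "o2"): 5.0}
print(average_link_strength(S_t, ["b1", "b2"], ["o1", "o2"]))  # 3.5
```

Only the pairs with the first node inside and the second outside are visited, so the symmetry question above comes down to how $S_{ij}(t)$ itself treats the two nodes.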

On the issue Nad has been talking about [here](http://forum.azimuthproject.org/discussion/1358/experiments-in-el-nino-detection-and-prediction/?Focus=11202#Comment_11202) and in other posts:

In the 2013 paper I don't see any ambiguity about the definition of the cross-covariances $C_{i,j}^{t}$. They don't mention a "usual covariance" or use double angle brackets. They say

> Finally, we divide the cross-covariances by the corresponding standard deviations (SD) of $T_i$ and $T_j$, to obtain the cross-correlations.

but don't spell out exactly what they mean by "corresponding standard deviations".

In the 2014 paper, they do spell this out, like this:

$$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$

and then there is an ambiguity of the kind that Nad has been talking about. I think they mean

$$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$

and that is what my code does.

One reason for thinking this must be what they mean: if double angle brackets mean looking back two years, then with $\tau$ included, you need over two and a half years' worth of previous data. Since the data starts in 1948, they wouldn't be able to find $S$ until mid-1950, but their graph goes back to the start of 1950!

> In the 2014 paper, they do spell this out like this:
>
> $$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$

This is actually one of the reasons why I thought that they might have had a different intention for their definition of correlation, because this is the usual form of the standard deviation.

> and then there is an ambiguity of the kind that Nad has been talking about. I think they mean
>
> $$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$
>
> and that is what my code does.

And, again, the two expressions seem to be equal only if

$$\langle T_i(t) \langle T_i(t) \rangle \rangle = \langle T_i(t) \rangle \langle T_i(t) \rangle$$

But

$$\langle T_i(t) \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \langle T_i(t- d) \rangle = \frac{1}{365} \sum_{d = 0}^{364} \frac{1}{365} \sum_{D = 0}^{364} T_i(t-d) T_i(t- d -D) \;\; (*)$$

which seems generically different from

$$\langle T_i(t) \rangle \langle T_i(t) \rangle = \left(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d)\right)\left(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \right) \;\; (**)$$

if only because in (*) you have the term $T_i(t-364-364) = T_i(t-728)$, which doesn't appear in (**).
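Nad's point is easy to check numerically. The sketch below uses the trailing 365-day average from (*) together with a hypothetical toy series, a steady linear trend $T_i(k) = k$ (not real temperature data), and records which days get read. It only illustrates the bracket arithmetic; it is not anyone's replication code.

```python
WINDOW = 365  # <f(t)> = (1/365) * sum_{d=0}^{364} f(t - d)


def avg(f, t):
    """Trailing running average <f(t)> of a function f of the day index."""
    return sum(f(t - d) for d in range(WINDOW)) / WINDOW


days_read = []  # every day index at which the toy series is evaluated


def T(k):
    """Hypothetical toy series: a steady linear trend T(k) = k."""
    days_read.append(k)
    return float(k)


t = 1000
nested = avg(lambda s: T(s) * avg(T, s), t)  # (*)  <T(t) <T(t)>>
product = avg(T, t) ** 2                      # (**) <T(t)> <T(t)>

print(nested == product)   # False: the two expressions differ here
print(t - min(days_read))  # 728: (*) reaches back to T(t - 728)
```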

Very nice blog article.

I only have comments on some technical points.

In the beginning you say:

> For each pair of dots, compute a number saying how strongly correlated the sea surface temperatures are at those two places.

"Sea surface temperature" is ambiguous; you might as well spell it out by saying ocean surface air temperature, as you do later.

> Note that this [the covariance] is a way of studying the linear correlation between the temperature anomaly ...

I find this wording to be a bit "spongy." True, the covariance is always a way to "get at" the correlation, simply by normalizing it.

Re: the question of whether to use covariances or correlations. It doesn't seem so peculiar to me to use correlations in defining the link strengths, because they remove the element which is contributed to the link strength just by the absolute magnitude of the signal. It boils down to whether we want to use the dot product of the time series (covariance) or the cosine of their angle as a measure of how closely the two time series are "in sync" with each other. Granted, the dot product is mathematically simpler, but it confounds the magnitudes of the time series with the measure of how much they point in the same direction.

> Ludescher *et al* normalize this in a somewhat funny way

I don't have any qualms about their handling of the definitions of correlation, which can be seen as the standard treatment.

Let's leave aside our expectations concerning moving average operators, and what should be expected of expressions like $\langle\langle f(t)\rangle\rangle$, especially since their definitions don't invoke or depend on two applications of the angle brackets. The angle brackets are just a notation for what they are actually doing, so let's take a look at that.

For a given day $t$, they are comparing two time series, one for node $i$ and one for node $j$, each of length 365. Think of this as an "independent experiment." Their application of the angle brackets then just gives us the regular old mean of these 1-year-long time series. And their formula gives the regular covariance and correlation of these time series.

> Anyway, $C^t_{(i,j)}(\tau)$ is defined in a similar way, starting from...

You previously defined $C^t_{(i,j)}(\tau)$, so what is it that you are introducing here? Do you mean to handle the case when $\tau$ is negative? If so, besides the formula change, some more words are needed.

> Next, for nodes $i$ and $j$, and for each time point $t$, the maximum, the mean and the standard deviation around the mean are determined for $C_{i,j}^t(\tau)$, as $\tau$ varies across its range.

This is how it is handled in the earlier paper, but in the later paper they take the maximum, the mean and the SD of the *absolute value* of the cross-correlation function.

> So, when $S(t)$ goes over 2.82, they predict an El Niño.

More specifically, they say that when the Niño index is below 0.5 degrees Centigrade, and $S(t)$ crosses this threshold from below, then they predict that an El Niño episode will start in the following year.

Also, at the first mention of 2.82 in the article, I would state that this number is produced by a learning algorithm that finds this threshold by optimizing the predictive power of their model.

Moreover: this is a really fine blog series!
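The "independent experiment" reading can be sketched directly: for a fixed day $t$, treat the two 365-day windows as plain vectors. The covariance is then the dot product of the mean-removed vectors divided by 365, and the correlation is the cosine of the angle between them. This is the textbook definition, shown as a minimal Python sketch on made-up toy windows; it is not claimed to be the code Ludescher *et al* used.

```python
import math


def centered(xs):
    """Subtract the window mean from every entry."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def covariance(xs, ys):
    """<xy> - <x><y>: dot product of the centered windows over their length."""
    return sum(a * b for a, b in zip(centered(xs), centered(ys))) / len(xs)


def correlation(xs, ys):
    """Cosine of the angle between the centered windows."""
    cx, cy = centered(xs), centered(ys)
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / (math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy)))


# Toy windows of length 365: ys is xs amplified tenfold and shifted.
xs = [math.sin(2 * math.pi * k / 365) for k in range(365)]
ys = [10 * x + 3 for x in xs]

print(covariance(xs, ys) / covariance(xs, xs))  # about 10: grows with amplitude
print(correlation(xs, ys))                      # about 1.0: amplitude removed
```

The covariance picks up the factor of 10 in amplitude, while the correlation only sees that the two windows "point in the same direction", which is the distinction drawn above.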

> Let's leave aside our expectations concerning moving average operators, and what should be expected of expressions like $\langle\langle f(t)\rangle\rangle$, especially since their definitions don't invoke or depend on two applications of the angle brackets.

Yes. I made some additional assumptions about multiplication by an average, and you are right that, strictly speaking, the question is only whether the moving average can be treated as a constant factor ("a number") and "taken out" of the average bracket. That's why I specified [here](http://forum.azimuthproject.org/discussion/1377/blog-el-nino-project-part-3/?Focus=11233#Comment_11233) and [here](http://forum.azimuthproject.org/discussion/1358/experiments-in-el-nino-detection-and-prediction/?Focus=11197#Comment_11197) in more detail where I see the problems.

> This is how it is handled in the earlier paper, but in the later paper they take the maximum, the mean and the SD of the absolute value of the cross-correlation function.

They used the absolute value in 2013 too. They say this in the correction.

> Re: the question of whether to use covariances or correlations. It doesn't seem so peculiar to me to use correlations in defining the link strengths, because they remove the element which is contributed to the link strength just by the absolute magnitude of the signal.

I think "the element which is contributed to the link strength just by the absolute magnitude of the signal" is important. A link exists for some physical reason (the movement of large amounts of air, water, or water vapour), and the absolute magnitude matters.

I have tried using covariances instead of correlations, and it hardly makes any difference. Although the values are considerably different, the second normalisation they do, the (max-mean)/sd over $\tau$, makes the link strengths similar.

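Part of why the second normalization washes out the difference: $(\max - \mathrm{mean})/\mathrm{sd}$ is unchanged when every value is multiplied by the same positive constant. The minimal Python sketch below assumes the population standard deviation (the choice of convention is an assumption here) and uses a made-up profile. It only explains the observation exactly when covariances and correlations differ by a factor that does not depend on $\tau$, which is not quite true here, since the standard deviation of $T_j(t-\tau)$ changes with $\tau$.

```python
from statistics import mean, pstdev


def max_minus_mean_over_sd(values):
    """(max - mean) / sd over a profile of values indexed by tau."""
    return (max(values) - mean(values)) / pstdev(values)


# Made-up profile standing in for |C(tau)| over a few lags.
profile = [0.10, 0.12, 0.35, 0.18, 0.09, 0.11]
scaled = [7.5 * v for v in profile]  # e.g. covariances = constant * correlations

print(max_minus_mean_over_sd(profile))
print(max_minus_mean_over_sd(scaled))  # the same, up to rounding
```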

Graham wrote:

> They used the absolute value in 2013 too. They say this in the correction.

Okay, it's good that their two papers are in agreement with each other.

I just updated my description in the [forum thread on the earlier paper](http://forum.azimuthproject.org/discussion/1360/paper-ludescher-et-al-improved-el-nino-forecasting-by-cooperativity-detection/#Comment_10744) to include the absolute value.

The blog article should also have a phrase added about the absolute value.

Thanks for all your comments, everyone. You're really focusing in on the trickiest aspects of this paper. That's great! I will improve my article now, based on these.

Just a few comments on a couple of David's remarks, the only ones that make me want to argue.

> > Note that this [the covariance] is a way of studying the linear correlation between the temperature anomaly …

> I find this wording to be a bit "spongy."

That's deliberate. I'm not trying to give a thorough discussion of covariances and correlations here; I'm just trying to help the people who look at the formulas and go *"Eek! What's all this junk?"* Just a quick nudge in the right direction.

> > Ludescher *et al* normalize this in a somewhat funny way

> I don't have any qualms about their handling of the definitions of correlation, which can be seen as the standard treatment.

But as Nad pointed out, and Graham agrees, if we take their formula seriously, it involves a running average of (some function of) running averages. This is problematic for various reasons, both formal and practical. Graham's practical argument is the most definitive: since the data starts in 1948, if they actually meant what they wrote, they wouldn't be able to find $S$ until mid-1950, but their graph goes back to the start of 1950!

So Graham decided, in a risky but probably wise way, that what they *wrote* is not actually what they *meant*. They wrote:

$$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$

which involves nested brackets, and thus a running average of (some function of) running averages. Graham decided that they *meant*

$$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$

I'll mention this option in the blog article.

> Let's leave aside our expectations concerning moving average operators, and what should be expected of expressions like $\langle\langle f(t)\rangle\rangle$, especially since their definitions don't invoke or depend on two applications of the angle brackets.

Their formula

$$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$

*does* involve nested brackets, and it would equal

$$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$

if we had, for instance, $\langle f(t)\langle f(t)\rangle\rangle = \langle \langle f(t) \rangle^2 \rangle = \langle f(t) \rangle^2$. But we don't... not according to their definition of the angle brackets.
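A small numerical check that the two normalizations really differ, assuming the trailing average $\langle f(t)\rangle = \frac{1}{365}\sum_{d=0}^{364} f(t-d)$ and a hypothetical toy series with a steady linear trend (not real data). For a trend, the literal nested formula measures how far the series runs ahead of its own lagging running mean, while the single-window formula measures the spread inside one 365-day window. `sd_literal` and `sd_windowed` are illustrative names only.

```python
import math

WINDOW = 365


def avg(f, t):
    """Trailing running average <f(t)> = (1/365) * sum_{d=0}^{364} f(t - d)."""
    return sum(f(t - d) for d in range(WINDOW)) / WINDOW


def sd_literal(f, t):
    """sqrt(<(f(t) - <f(t)>)^2>) with the nested brackets taken at face value."""
    return math.sqrt(avg(lambda s: (f(s) - avg(f, s)) ** 2, t))


def sd_windowed(f, t):
    """sqrt(<f(t)^2> - <f(t)>^2): the SD of the single window ending at t."""
    return math.sqrt(avg(lambda s: f(s) ** 2, t) - avg(f, t) ** 2)


def ramp(k):
    """Hypothetical toy series: a steady linear trend."""
    return float(k)


print(sd_literal(ramp, 1500))   # 182.0: the lag of a trailing mean behind a trend
print(sd_windowed(ramp, 1500))  # about 105.37, i.e. sqrt((365**2 - 1) / 12)
```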

David wrote:

> > Anyway, $C^t_{(i,j)}(\tau)$ is defined in a similar way, starting from...

> You previously defined $C^t_{(i,j)}(\tau)$, so what is it that you are introducing here?

Actually I previously defined $C^t_{(i,j)}(-\tau)$, but I *wrote* it as $C^t_{(i,j)}(\tau)$. So, I have to fix that typo. This time I'm *really* introducing $C^t_{(i,j)}(\tau)$!

By the way: their use of the notation $C^t_{(i,j)}(-\tau)$ and $C^t_{(i,j)}(\tau)$ seems silly to me. The way they define $C^t_{(i,j)}(-\tau)$, it's just the same as $C^t_{(j,i)}(\tau)$. Right? It is *not* the same as taking the formula for $C^t_{(i,j)}(\tau)$ and substituting $-\tau$ for $\tau$. But I'm not going to mess with their notation in this article; I'm just trying to describe their paper.
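A sketch of the distinction, under stated assumptions: $C^t_{i,j}(-\tau)$ is taken to come from $\langle T_i(t) T_j(t-\tau)\rangle - \langle T_i(t)\rangle\langle T_j(t-\tau)\rangle$ and $C^t_{i,j}(\tau)$ from the same expression with the roles of $i$ and $j$ swapped, both normalized with single-bracket standard deviations (Graham's reading) over trailing 365-day windows. `xcorr`, `C_minus` and `C_plus` are hypothetical helper names, and the random series are toy data. The first check is the identity $C^t_{i,j}(-\tau) = C^t_{j,i}(\tau)$; the second shows that plugging $-\tau$ into the formula for $C^t_{i,j}(\tau)$ gives $C^{t+\tau}_{i,j}(-\tau)$, computed from a window shifted $\tau$ days into the future.

```python
import math
import random

WINDOW = 365


def avg(f, t):
    """Trailing running average <f(t)> = (1/365) * sum_{d=0}^{364} f(t - d)."""
    return sum(f(t - d) for d in range(WINDOW)) / WINDOW


def xcorr(x, y, t, tau):
    """Correlation of x(s) with y(s - tau) over the window s = t-364, ..., t,
    normalized with single-bracket standard deviations (Graham's reading)."""
    def y_lag(s):
        return y(s - tau)

    cov = avg(lambda s: x(s) * y_lag(s), t) - avg(x, t) * avg(y_lag, t)
    sd_x = math.sqrt(avg(lambda s: x(s) ** 2, t) - avg(x, t) ** 2)
    sd_y = math.sqrt(avg(lambda s: y_lag(s) ** 2, t) - avg(y_lag, t) ** 2)
    return cov / (sd_x * sd_y)


def C_minus(Ti, Tj, t, tau):
    """Their C_{i,j}^t(-tau): T_i(t) against the earlier T_j(t - tau)."""
    return xcorr(Ti, Tj, t, tau)


def C_plus(Ti, Tj, t, tau):
    """Their C_{i,j}^t(tau): the earlier T_i(t - tau) against T_j(t)."""
    return xcorr(Tj, Ti, t, tau)


rng = random.Random(0)  # toy data, not temperatures
a = [rng.gauss(0, 1) for _ in range(1000)]
b = [rng.gauss(0, 1) for _ in range(1000)]
Ti, Tj = (lambda k: a[k]), (lambda k: b[k])
t, tau = 600, 30

print(math.isclose(C_minus(Ti, Tj, t, tau), C_plus(Tj, Ti, t, tau)))         # True
print(math.isclose(C_plus(Ti, Tj, t, -tau), C_minus(Ti, Tj, t + tau, tau)))  # True
```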

I had written:

> Next, for nodes $i$ and $j$, and for each time point $t$, the maximum, the mean and the standard deviation around the mean are determined for $C_{i,j}^t(\tau)$, as $\tau$ varies across its range.

Besides leaving out the absolute value, I was fuzzy about what this range is. I believe it's from -200 to 200 days, using their silly definition of $C^t_{(i,j)}(-\tau)$. So, here is my corrected version:

> Next, for nodes $i$ and $j$, and for each time point $t$, they determine the maximum, the mean and the standard deviation of $|C_{i,j}^t(\tau)|$, as $\tau$ ranges from -200 to 200 days.

I hope Graham okays this.
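With that range, the link strength for one pair of nodes and one day can be sketched as follows. This assumes the 401 values $C^t_{i,j}(\tau)$, $\tau = -200, \dots, 200$, have already been computed and stored in a dictionary (the negative keys holding what they call $C^t_{i,j}(-\tau)$), and it assumes the population standard deviation; both are illustrative choices. The toy profile is made up.

```python
from statistics import mean, pstdev

MAX_LAG = 200  # tau runs from -200 to 200 days


def link_strength(C):
    """S_ij(t) = (max - mean) / sd of |C_ij^t(tau)| over tau = -200..200.

    C: dict mapping each integer tau in [-MAX_LAG, MAX_LAG] to C_ij^t(tau)
    """
    values = [abs(C[tau]) for tau in range(-MAX_LAG, MAX_LAG + 1)]
    return (max(values) - mean(values)) / pstdev(values)


# Toy profile: a flat background of 0.1 with one strong anticorrelation
# at tau = 40. The absolute value lets the negative peak count as a link.
profile = {tau: 0.1 for tau in range(-MAX_LAG, MAX_LAG + 1)}
profile[40] = -0.6
print(link_strength(profile))  # about 20.0, i.e. sqrt(400) for a single spike
```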

Now for details of El Niño prediction, and the source of the magic number 2.82. I had written:

> The red line is their 'average link strength'. Whenever this exceeds a certain threshold $\Theta = 2.82$, they predict an El Niño will start in the following calendar year.

> The green arrows show their successful predictions. The dashed arrows show their false alarms. A little letter n appears next to each El Niño that they failed to predict.

> Actually, chart A here shows the 'learning phase' of their calculation. In this phase, they adjusted the threshold $\Theta$ so their procedure would do a good job. Chart B shows the 'testing phase'.

I changed this to:

> The red line is their 'average link strength'. Whenever this exceeds a certain threshold $\Theta = 2.82$, and the Niño 3.4 index is not *already* over 0.5°C, they predict an El Niño will start in the following calendar year.

> The green arrows show their successful predictions. The dashed arrows show their false alarms. A little letter n appears next to each El Niño that they failed to predict.

> You're probably wondering where the number $2.82$ came from. They get it from a learning algorithm that finds this threshold by optimizing the predictive power of their model. Chart A here shows the 'learning phase' of their calculation. In this phase, they adjusted the threshold $\Theta$ so their procedure would do a good job. Chart B shows the 'testing phase'...

I won't get into details of this learning algorithm, since that would be too much.
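The revised alarm rule can be written as a small function. This is a sketch under assumptions: $S(t)$ and the Niño 3.4 index are sampled on the same grid of days, an alarm is raised at a sample where $S$ rises above $\Theta$ from below while the index is at most 0.5°C, and the step from an alarm to "an El Niño in the following calendar year" is left out. The name `alarm_indices` and the toy numbers are hypothetical.

```python
THETA = 2.82      # threshold found in their learning phase
NINO_LIMIT = 0.5  # °C: no alarm if the Niño 3.4 index is already above this


def alarm_indices(S, nino34):
    """Indices k where S crosses THETA from below and the Niño 3.4 index
    is not already above NINO_LIMIT. S and nino34 share one time grid."""
    return [
        k for k in range(1, len(S))
        if S[k - 1] <= THETA < S[k] and nino34[k] <= NINO_LIMIT
    ]


# Toy values, not data from the papers.
S = [2.50, 2.90, 3.00, 2.70, 2.90]
nino34 = [0.1, 0.2, 0.3, 0.6, 0.7]
print(alarm_indices(S, nino34))  # [1]: the crossing at k = 4 is suppressed
```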

Okay, I decided we need a separate section on the mathematical nuances of how Ludescher *et al* define standard deviations. It's important, because it's a possible mistake in their work, or at least in how they explain their work. But it's complicated, so it's too distracting when you're first trying to understand this stuff.

So, I start with this:

> Ludescher *et al* then normalize this, defining the **time-delayed cross-correlation** $C_{i,j}^{t}(-\tau)$ to be the time-delayed cross-covariance divided by
>
> $$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$
>
> This is something like the standard deviation of $T_i(t)$ times the standard deviation of $T_j(t - \tau)$. Dividing by standard deviations is what people [usually do](https://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation) to turn covariances into correlations. However, there are some potential problems here, which I'll discuss later.

Then later I write:

> **Mathematical nuances**
>
> Ludescher *et al* normalize the time-delayed cross-covariance in a somewhat odd way. They claim to divide it by
>
> $$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$
>
> This is a strange thing, since it has nested angle brackets. The angle brackets are defined as a running average over the previous 365 days, so this quantity involves data going back twice as long: 730 days. Furthermore, the 'link strength' involves the above expression where $\tau$ goes up to 200 days.
>
> So, taking their definitions at face value, Ludescher *et al* could not actually compute their 'link strength' until 930 days after the surface temperature data first starts at the beginning of 1948. That would be *mid-1950*. But their graph of the link strength starts at the *beginning* of 1950!
>
> So, it's possible that they actually normalized the time-delayed cross-covariance by dividing it by this:
>
> $$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$
>
> This simpler expression avoids nested angle brackets. It makes more sense conceptually. This is the standard deviation of $T_i(t)$ over the last 365 days, times the standard deviation of $T_j(t-\tau)$ over the last 365 days.
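The face-value arithmetic can be checked with a few lines of Python. This sketch assumes the angle brackets average $f(t-d)$ for $d = 0, \dots, 364$, so each level of brackets reaches back 364 days, plus up to 200 days for $\tau$; that gives 928 rather than the rounder 930 days, which does not change the conclusion. It also assumes the data start on 1 January 1948. `earliest_day` is a hypothetical helper.

```python
from datetime import date, timedelta

DATA_START = date(1948, 1, 1)  # assumed first day of the temperature data
REACH = 364                    # one level of <...> reads f(t), ..., f(t - 364)
MAX_LAG = 200                  # largest delay tau, in days


def earliest_day(bracket_depth):
    """First day t whose link strength needs no data before DATA_START."""
    lookback = bracket_depth * REACH + MAX_LAG
    return DATA_START + timedelta(days=lookback)


print(earliest_day(2))  # 1950-07-17: nested brackets taken at face value
print(earliest_day(1))  # 1949-07-18: single brackets, as in Graham's reading
```

Only the single-bracket reading is compatible with a graph that starts at the beginning of 1950.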

I had written:

> For each pair of dots, compute a number saying how strongly correlated the sea surface temperatures are at those two places.

David noted:

> "Sea surface temperature" is ambiguous; you might as well spell it out by saying ocean surface air temperature, as you do later.

I think that sentence was a leftover from when I was confused about this! The Niño 3.4 index involves water temperatures, but their climate network is defined using air temperature. I've changed it to:

> Very roughly, the idea is this. Draw a big network of dots representing different places in the Pacific Ocean. **For each pair of dots, compute a number saying how strongly correlated the temperatures are at those two places.** The paper claims that when an El Niño is getting ready to happen, the average of these numbers is big. In other words, temperatures in the Pacific tend to go up and down in sync!

This is the first super-brief introduction to the idea, so I don't want to distract people with air temperature, ocean temperature, or (worse) "ocean surface air temperature". Note that I also don't explain that we're only interested in pairs of dots where one is in the El Niño basin and the other is not! Those details will come later.

I've fixed another tiny loose end. They only compute the link strength for every 10th day. So, I've changed this:

> So, when $S(t)$ goes over 2.82, they predict an El Niño.

to this:

> They compute $S(t)$ for every 10th day between January 1950 and November 2013. When $S(t)$ goes over 2.82, and the Niño 3.4 index is not <i>already</i> over 0.5°C, they predict an El Niño in the next calendar year.

Note how each increase in accuracy makes things harder to read. This is one reason we need what my uncle called the "spiral approach", where we start with a rough summary and then bring in more detail.
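Here is a minimal Python sketch of the prediction rule in the revised sentence. It rests on a few assumptions. It reads "goes over 2.82" as an upward crossing between consecutive 10-day samples. It checks the Niño 3.4 value at the same sample as the crossing. The function name `el_nino_alarms` and the toy numbers are hypothetical and do not come from the paper.

```python
from datetime import date, timedelta

def el_nino_alarms(days, s_values, nino34, threshold=2.82, nino_cutoff=0.5):
    """Hypothetical helper: return the calendar years for which an El Nino
    would be predicted.

    days     -- sample dates (assumed to be every 10th day)
    s_values -- the statistic S(t) at those dates
    nino34   -- the Nino 3.4 index (deg C) at those dates
    Assumption: "S(t) goes over the threshold" means an upward crossing
    between consecutive samples.
    """
    predicted_years = []
    for k in range(1, len(s_values)):
        crossed_up = s_values[k - 1] <= threshold < s_values[k]
        not_already_el_nino = nino34[k] <= nino_cutoff
        if crossed_up and not_already_el_nino:
            # The prediction is for the *next* calendar year.
            predicted_years.append(days[k].year + 1)
    return predicted_years

# Made-up numbers, for illustration only.
days = [date(1950, 1, 1) + timedelta(days=10 * k) for k in range(5)]
s_values = [2.5, 2.9, 3.0, 2.7, 2.9]
nino34 = [0.1, 0.2, 0.6, 0.3, 0.7]
print(el_nino_alarms(days, s_values, nino34))  # [1951]
```

The toy data has two upward crossings. The second one is suppressed because the index is already above 0.5°C at that sample.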

Graham wrote:

> I have tried using covariances instead of correlations, and it hardly makes any difference. Although the values are considerably different, the second normalisation they do, the (max-mean)/sd over $\tau$, makes the link strengths similar.

That's great to know! After replicating their paper, a next goal might be to find a simpler scheme that does just as well. Using covariances instead of correlations is simpler.

Also: do we need to look at both $C_{i,j}^t(\tau)$ and what they call $C_{i,j}^t(-\tau)$, or is just one of these enough? This is conceptually interesting because the first is about the El Niño basin affecting the rest of the Pacific, while the second is about the rest of the Pacific affecting the El Niño basin. Is one of these more important than the other? (Here I'm conflating temporal order with causation, but you get the idea.)

Also, I find this $(max - mean)/sd$ a little klunky, _especially_ when it's applied to correlations, which have already been normalized once. I can see why we might need it, but when I think about how many nested averages and standard deviations they're using to get their final 'average link strength', it makes me kind of sick.
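A short Python sketch of the $(max - mean)/sd$ step helps explain Graham's observation. Several details are assumptions: the name `normalized_peak` is hypothetical, it uses the population standard deviation over $\tau$, and it takes the plain maximum rather than, say, the maximum absolute value. The key fact is that the statistic is unchanged under $c \mapsto a c + b$ with $a > 0$. So if the normalizing factor were the same for every $\tau$, switching between covariances and correlations would not change it at all. In real data that factor varies with $\tau$, since the standard deviation of $T_j(t-\tau)$ depends on $\tau$. This makes Graham's result plausible rather than proving it.

```python
import math
from statistics import mean, pstdev

def normalized_peak(c_by_lag):
    """(max - mean) / sd of lagged cross-correlations (or covariances),
    taken over the lags tau. Population sd is an assumption."""
    values = list(c_by_lag)
    return (max(values) - mean(values)) / pstdev(values)

# Made-up lagged values, for illustration only.
corrs = [0.1, 0.3, 0.2, 0.5, 0.4]
scaled = [2.0 * c for c in corrs]    # same shape, twice as big
shifted = [c + 0.7 for c in corrs]   # same shape, offset

print(normalized_peak(corrs), math.sqrt(2))  # both about 1.414
print(normalized_peak(scaled))               # same, up to rounding
print(normalized_peak(shifted))              # same, up to rounding
```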

Okay, I think the blog article is done. It should appear on July 1st at 1 am GMT, here:

* [El Niño project (part 3)](http://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/), Azimuth Blog.

If you have more corrections now or at any time, it's not too late for me to make them!

> This is the standard deviation of $T_i(t)$ over the last 365 days, times the standard deviation of $T_i(t-\tau)$ over the last 365 days.

It is not the usual standard definition.

> Their formula
> $$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$
> _does_ involve nested brackets, and it would equal
> $$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$
> if we had $\langle \langle f(t) \rangle^2 \rangle = \langle f(t) \rangle^2$. But we don't... not according to their definition of the angle brackets.

Intrinsically, in some sense, what seems to be behind this is that the average of an average is not the average. Strictly speaking, that would only follow if, in addition, multiplying a function by the average were a homomorphism with respect to averaging, which it is if the average behaves like a number. That's why I said this is what is "behind" the problems you encounter. Strictly speaking, however, you only need to show that you can't take the average out of the average. So I would phrase this as:

$$\langle T_i(t) \langle T_i(t) \rangle \rangle \neq \langle T_i(t) \rangle \langle T_i(t) \rangle$$

since

$$\langle T_i(t) \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \langle T_i(t- d) \rangle = \frac{1}{365} \sum_{d = 0}^{364} \frac{1}{365} \sum_{D = 0}^{364} T_i(t-d) T_i(t- d -D) \;\; (*)$$

which is generically different from

$$\langle T_i(t) \rangle \langle T_i(t) \rangle = (\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d))(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) ) \;\; (**)$$

if only because (*) contains the term $T_i(t-364-364)$, which doesn't appear in (**). And then one has to show that those terms do not cancel out. I have already included this in the wiki.

Another thing is that you erased my name in the wiki article. I guess you wanted to protect me? Do you think there is an error in my argument, or do you think that pointing out the problem would not be well received? In case it is the latter, I say again here: I have already written an email to them, so they know my name. And maybe it is better if this background is more widely known. It would, however, be good if, assuming you agree with my arguments, you stated this explicitly under your name. This is not just some scientific article. It is an article in a renowned journal which seems to have some immediate real-life relevance, and if their (probably unintended) definition of correlation turns out to be less useful, then this might affect the overall result (i.e. the prediction) given in the article.
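A quick numerical check of the inequality between (*) and (**), as a minimal Python sketch. The helper `running_avg` is a hypothetical name that implements the trailing 365-day average $\langle f(t) \rangle = \frac{1}{365}\sum_{d=0}^{364} f(t-d)$ used above. The random series stands in for $T_i$ and is toy data only. The sketch also shows how far back the nested expression reaches: it needs 729 days of data, because it touches $T_i(t-364-364)$.

```python
import random

def running_avg(x, window=365):
    """Trailing running average <x>(t) = (1/window) * sum_{d=0}^{window-1} x(t-d).
    Days without a full window of defined values are left as None."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        chunk = x[t - window + 1 : t + 1]
        if all(v is not None for v in chunk):
            out[t] = sum(chunk) / window
    return out

rng = random.Random(0)
T = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # toy stand-in for T_i

avg_T = running_avg(T)                                            # <T>
prod = [None if a is None else T[s] * a for s, a in enumerate(avg_T)]
nested = running_avg(prod)                                        # <T <T>>

t = 900
print(nested[t], avg_T[t] ** 2)          # (*) vs (**): generically different
print(nested[727], nested[728] is not None)  # None True
```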

If I'm dealing with a lagged cross-covariance, shouldn't it have a coefficient of 1/(N-1) and start at d=1 and D=2, as described [here](http://w3eos.whoi.edu/12.747/notes/lect06/l06s02.html)?

From the blog post:

> Let i stand for any point in this 9 × 27 grid:

It is 9 by 23.

> alone for the reason that in (1) you have

That "alone" sounds like Nad's English, not yours!

I wrote:

> So, it's possible that they actually normalized the time-delayed cross-covariance by dividing it by this:
> $$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$
> This simpler expression avoids nested angle brackets. It makes more sense conceptually. This is the standard deviation of $T_i(t)$ over the last 365 days, times the standard deviation of $T_i(t-\tau)$ over the last 365 days.

Nad wrote:

> it is not the usual standard definition.

I believe that

$$ \sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } $$

is equal to the usual standard deviation of $T_i(s)$ as $s$ ranges over the 365 days from $t - 364$ to $t$. If I'm wrong about this, I need to know!
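A numerical check of this claim, as a minimal Python sketch on toy data. Over the 365-day window, $\sqrt{\langle T^2\rangle - \langle T\rangle^2}$ agrees with the population standard deviation (`statistics.pstdev`, prefactor 1/365). The other common convention, the sample standard deviation (`statistics.stdev`, prefactor 1/364), is larger by exactly $\sqrt{365/364}$.

```python
import math
import random
import statistics

rng = random.Random(1)
T = [15.0 + 3.0 * rng.random() for _ in range(800)]  # toy daily temperatures

t = 700
window = T[t - 364 : t + 1]          # the 365 days s = t-364, ..., t
assert len(window) == 365

mean_of_squares = sum(v * v for v in window) / 365   # <T(t)^2>
square_of_mean = (sum(window) / 365) ** 2            # <T(t)>^2
sd_brackets = math.sqrt(mean_of_squares - square_of_mean)

print(sd_brackets, statistics.pstdev(window))        # agree up to rounding
print(statistics.stdev(window) / statistics.pstdev(window), math.sqrt(365 / 364))
```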

Nad wrote:

> Another thing is that you erased my name in the wiki article, I guess you wanted to protect me?

No, that was just a mistake. I meant to credit you here:

* [[Blog - El Nino project (part 3)]]

and I'll fix that now. Since the blog article is under my name, it means I agree with everything in it.

I don't think you need "protection"; I prefer to give you credit for noticing this problem, and in general I prefer to credit everyone in Azimuth for what they've done - in part because it's fair and people usually enjoy getting credit, and in part because it helps demonstrate that Azimuth is a group of smart people working together, not just me. But if you prefer to have your name omitted from the blog article, just let me know in this thread.

Graham wrote:

> It is 9 by 23.

Whoops, I'll fix that.

> That “alone” sounds like Nad’s English, not yours!

Yes, I see that Nad has been editing the blog article some more on the wiki. Nad: please propose corrections here. I've already copied the wiki article over to the blog, so changes made on the wiki won't get into the blog article anymore.

I have changed your version:

> Ludescher <i>et al</i> normalize the time-delayed cross-covariance in a somewhat odd way. They claim to divide it by
> $$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$
> This is a strange thing, since it has nested angle brackets. The angle brackets are defined as a running average over the 365 days, so this quantity involves data going back twice as long: 730 days. Furthermore, the 'link strength' involves the above expression where $\tau$ goes up to 200 days.
>
> The covariances as well as the standard deviations, which are used do not use the usual definitions. In particular comparing the two definitions one encounters terms:
> $$\langle T_i(t) \langle T_i(t) \rangle \rangle \neq \langle T_i(t) \rangle \langle T_i(t) \rangle$$
> since
> $$\langle T_i(t) \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \langle T_i(t- d) \rangle = \frac{1}{365} \sum_{d = 0}^{364} \frac{1}{365} \sum_{D = 0}^{364} T_i(t-d) T_i(t- d -D) \;\; (1)$$
> which is generically different from
> $$\langle T_i(t) \rangle \langle T_i(t) \rangle = (\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d))(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) ) \;\; (2)$$
> alone for the reason that in (1) you have the term $T_i$ at time $T_i(t-364-364)$ which doesn't appear in (2).
>
> At least for the case of the standard deviation it is also clear that those terms do not cancel out. For the covariances this would still need to be shown.
>
> So, taking their definitions at face value, Ludescher <i>et al</i> could not actually compute their 'link strength' until 930 days after the surface temperature data first starts at the beginning of 1948. That would be <i>late 1950</i>. But their graph of the link strength starts at the <i>beginning</i> of 1950!
>
> So, it's possible that they actually normalized the time-delayed cross-covariance by dividing it by this:
> $$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$
> This simpler expression avoids nested angle brackets. It makes more sense conceptually. This is a version of the standard deviation of $T_i(t)$ over the last 365 days, times a version of the standard deviation of $T_i(t-\tau)$ over the last 365 days.
> $$\langle T_i(t) \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \langle T_i(t- d) \rangle = \frac{1}{365} \sum_{d = 0}^{364} \frac{1}{365} \sum_{D = 0}^{364} T_i(t-d) T_i(t- d -D) $$
> which is generically different from
> $$\langle T_i(t) \rangle \langle T_i(t) \rangle = (\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d))(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) ) $$

to this:

> Ludescher <i>et al</i> normalize the time-delayed cross-covariance in a somewhat odd way. They claim to divide it by
> $$\sqrt{\langle (T_i(t) - \langle T_i(t)\rangle)^2 \rangle} \; \sqrt{\langle (T_j(t-\tau) - \langle T_j(t-\tau)\rangle)^2 \rangle} $$
> But this is a strange thing, since it has nested angle brackets, and the angle brackets are defined as a running average over the 365 days. Thus, this quantity involves data going back twice as long: 730 days.
>
> Furthermore, the 'link strength' involves the above expression where $\tau$ goes up to 200 days. So, taking their definitions at face value, Ludescher <i>et al</i> could not actually compute their 'link strength' until 930 days after the surface temperature data first starts at the beginning of 1948. That would be <i>late 1950</i>. But their graph of the link strength starts at the <i>beginning</i> of 1950!
>
> Perhaps they actually normalized the time-delayed cross-covariance by dividing it by this:
> $$\sqrt{\langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2 } \; \sqrt{\langle T_j(t-\tau)^2 \rangle - \langle T_j(t-\tau)\rangle^2} $$
> This simpler expression avoids nested angle brackets, and it makes more sense conceptually. It is the standard deviation of $T_i(t)$ over the last 365 days, times the standard deviation of $T_j(t-\tau)$ over the last 365 days.
>
> As <a href = "http://www.azimuthproject.org/azimuth/show/Nadja+Kutz">Nadja Kutz</a> noted, the expression written by Ludescher <i>et al</i> does not equal this simpler expression, since
> $$\langle T_i(t) \langle T_i(t) \rangle \rangle \neq \langle T_i(t) \rangle \langle T_i(t) \rangle$$
> The reason is that
> $$\langle T_i(t) \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \langle T_i(t- d) \rangle = \frac{1}{365} \sum_{d = 0}^{364} \frac{1}{365} \sum_{D = 0}^{364} T_i(t-d) T_i(t- d -D) $$
> which is generically different from
> $$\langle T_i(t) \rangle \langle T_i(t) \rangle = (\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d))(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) ) $$
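Python's `datetime` turns the 930-day count into a calendar date. The figures 730 and 200 are the ones used in the text. Counting 930 days from 1 January 1948 (a leap year) lands in mid-July 1950, a bit earlier than "late 1950". Either way it is well after the beginning of 1950, which is all the argument needs.

```python
from datetime import date, timedelta

data_start = date(1948, 1, 1)   # surface temperature data begins here
nested_lookback = 730           # days: nested 365-day running averages (per the text)
max_tau = 200                   # days: largest time delay tau
first_usable = data_start + timedelta(days=nested_lookback + max_tau)
print(first_usable)             # 1950-07-19
```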

John, I have just noticed that I myself had forgotten to take the average over $\langle \langle T_i(t) \rangle^2 \rangle$. Maybe one should write up the argument in more detail, since it is very easy to overlook some points here, so please change the above to:

As <a href = "http://www.azimuthproject.org/azimuth/show/Nadja+Kutz">Nadja Kutz</a> noted, the expression written by Ludescher <i>et al</i> does not equal this simpler expression, since

$$\langle T_i(t) \langle T_i(t) \rangle \rangle \neq \langle \langle T_i(t) \rangle \langle T_i(t) \rangle \rangle$$

The reason is that

$$\langle T_i(t) \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{d = 0}^{364} T_i(t-d) \langle T_i(t- d) \rangle = \frac{1}{365} \sum_{d = 0}^{364} \frac{1}{365} \sum_{D = 0}^{364} T_i(t-d) T_i(t- d -D) \;\;\; (1)$$

is generically different from

$$\langle \langle T_i(t) \rangle \langle T_i(t) \rangle \rangle = \frac{1}{365} \sum_{D = 0}^{364} (\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d-D))(\frac{1}{365} \sum_{d = 0}^{364} T_i(t-d-D) ) \;\;\; (2)$$

since the terms in (2) contain products like $T_i(t-364-364)T_i(t-364-364)$, which can't appear in (1).

Moreover:

$$\langle (T_i(t) - \langle T_i(t) \rangle)^2 \rangle = \langle T_i(t)^2 - 2 T_i(t) \langle T_i(t) \rangle + \langle T_i(t) \rangle^2 \rangle = \langle T_i(t)^2 \rangle - 2 \langle T_i(t) \langle T_i(t) \rangle \rangle + \langle \langle T_i(t) \rangle^2 \rangle \;\;\; (3)$$

But since

$$\langle T_i(t) \langle T_i(t) \rangle \rangle \neq \langle \langle T_i(t) \rangle \langle T_i(t) \rangle \rangle $$

as was just shown, those terms do not cancel out in (3). In particular, this means that

$$ -2 \langle T_i(t) \langle T_i(t) \rangle \rangle + \langle \langle T_i(t) \rangle \langle T_i(t) \rangle \rangle $$

contains terms $T_i(t-364-364)$ which do not appear in $\langle T_i(t)\rangle^2$, hence

$$\langle (T_i(t) - \langle T_i(t) \rangle)^2 \rangle \neq \langle T_i(t)^2\rangle - \langle T_i(t)\rangle^2$$

So at least for the case of the standard deviation it is clear that those two definitions are not the same for a running mean. For the covariances this would still need to be shown.
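A minimal numerical sketch of this argument in Python, on toy data (a noisy series with a slow trend, made up for illustration). It checks that the expansion (3) holds, since it is just linearity of the running average. It then checks that the nested-bracket variance differs from $\langle T^2\rangle - \langle T\rangle^2$. The helper `running_avg` is a hypothetical name for the trailing 365-day average defined above.

```python
import math
import random

def running_avg(x, window=365):
    """Trailing running average; None where a full window of defined values is missing."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        chunk = x[t - window + 1 : t + 1]
        if all(v is not None for v in chunk):
            out[t] = sum(chunk) / window
    return out

rng = random.Random(3)
T = [10.0 + rng.gauss(0.0, 1.0) + 0.002 * s for s in range(1200)]  # toy series

avg_T = running_avg(T)                                                   # <T>
sq = [v * v for v in T]                                                  # T^2
dev_sq = [None if a is None else (T[s] - a) ** 2 for s, a in enumerate(avg_T)]
t_avg_t = [None if a is None else T[s] * a for s, a in enumerate(avg_T)]
avg_sq = [None if a is None else a * a for a in avg_T]

t = 1100
nested_var = running_avg(dev_sq)[t]                  # <(T - <T>)^2>
simple_var = running_avg(sq)[t] - avg_T[t] ** 2      # <T^2> - <T>^2
rhs3 = running_avg(sq)[t] - 2 * running_avg(t_avg_t)[t] + running_avg(avg_sq)[t]

print(nested_var, rhs3)        # equal up to rounding: this is (3)
print(nested_var, simple_var)  # different
```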

John wrote:

> But as Nad pointed out - and Graham agrees - if we take their formula seriously, it involves a running average of (some function of) running averages.

I don't think their notation here is especially good, but I don't read the nested angle brackets as denoting a running average of running averages.

EDIT: Never mind, I was off-base here.

I'm following this very clear [explanation](http://w3eos.whoi.edu/12.747/notes/lect06/l06s02.html) of cross-covariance, which I posted above at #27, as to why people use 1/(N-1) with d=1 and D=2 with the normalisation factor suggested by John.

I guess since David doesn't agree with Nad's calculation, I'll have to do Nad's calculation in even more detail and perhaps include it in the blog article, or in a comment on the blog article. I believe Nad's doing it right, but I'll check again.

Luckily we have good evidence that whatever Ludescher and company _say_ they're doing, they are not taking a running average of running averages. And soon we can stop trying to guess what they did, and start doing something better.

This is a confusing point, and I am flip flopping on it.


Here I will introduce what I think is some better notation for exploring the question.

Let X be a time series.

Define:

$$ Avg_t(X) = \langle X(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} X(t-d) $$

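In code, $Avg_t$ is an operator that takes a whole time series and returns a new one. Here is a minimal Python sketch, assuming daily samples in a list with no missing values. The name `avg_t` is hypothetical, and the ramp $X(s) = s$ is toy data chosen so the answers can be checked by hand: $Avg_t(X) = t - 182$ once a full year of history exists.

```python
def avg_t(x, window=365):
    """Avg_t(X) for every day t: the mean of X(t-364), ..., X(t).
    Returns a new time series; days with less than a full year of
    history are None. Assumes x has no missing values."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        out[t] = sum(x[t - window + 1 : t + 1]) / window
    return out

X = [float(s) for s in range(1000)]   # toy series X(s) = s
A = avg_t(X)
print(A[363], A[364], A[400])          # None 182.0 218.0
```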

Now define:

$$ f(t) \equiv \langle X(t) - \langle X(t) \rangle \rangle = Avg_t(X - Avg_t(X)) $$

The naive intention here is the following:

* Let $t$ be a given day
* Let $YearAvg$ = $Avg_t(X)$
* Form the time series $X'$ consisting of all daily differences from $YearAvg$
* Take the average of the last 365 days of $X'$

We therefore _expect_ $f(t)$ to be zero. But is it, when we do a rigorous calculation?

Okay, here goes.

First step:

$$ f(t) = Avg_t(X - \frac{1}{365} \sum_{d = 0}^{364} X(t-d)) $$

Now let

$$ X' \equiv X - \frac{1}{365} \sum_{d = 0}^{364} X(t-d) $$

So

$$ f(t) = Avg_t(X') $$

Now $X'$ is a time series, but let's not make the mistake of using the same variable $t$ for the dummy argument -- it can be any variable.

So we can write:

$$ X'(u) = X(u) - \frac{1}{365} \sum_{d = 0}^{364} X(t-d) $$

So,

$$ f(t) = Avg_t(X') = \frac{1}{365} \sum_{e = 0}^{364} X'(t-e) = \frac{1}{365} \sum_{e = 0}^{364} (X(t - e) - \frac{1}{365} \sum_{d = 0}^{364} X(t - d)) $$

$$ 365 f(t) = \sum_{e = 0}^{364} X(t-e) - \sum_{e = 0}^{364} \frac{1}{365} \sum_{d = 0}^{364} X(t - d) = \sum_{e = 0}^{364} X(t-e) - \sum_{d = 0}^{364} X(t - d) = 0 $$
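The disagreement comes down to one question. Is the inner average frozen at the fixed day $t$, as in the calculation above? Or is it recomputed for each day $s$ inside the outer average, as a literal reading of the bracket recipe suggests? Here is a minimal Python sketch contrasting the two on a toy series with a steady trend, $X(s) = s$, chosen so the numbers can be checked by hand. The helper name `avg_t` is hypothetical. The frozen version gives exactly 0. The recomputed version gives $\langle X(t) \rangle - \langle\langle X(t) \rangle\rangle = 182$ for this ramp.

```python
def avg_t(x, window=365):
    """Trailing 365-day average as a new series; None where a full
    window of defined values is missing."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        chunk = x[t - window + 1 : t + 1]
        if all(v is not None for v in chunk):
            out[t] = sum(chunk) / window
    return out

X = [float(s) for s in range(1000)]   # toy series with a steady trend
t = 800

# Frozen reading: YearAvg = Avg_t(X) is computed once, at the fixed day t.
year_avg = sum(X[t - 364 : t + 1]) / 365
x_prime = [v - year_avg for v in X]
frozen = sum(x_prime[t - 364 : t + 1]) / 365

# Recomputed reading: <X(s)> is evaluated at every day s, then averaged again.
avg_X = avg_t(X)
deviation = [None if a is None else X[s] - a for s, a in enumerate(avg_X)]
recomputed = avg_t(deviation)[t]

print(frozen, recomputed)   # 0.0 182.0
```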

The nested running sums would be generated by choosing the same dummy argument for the inner time series -- but this is a general problem caused by "free variable clashes," and symbolic calculations routinely choose fresh variables whenever a new dummy argument is needed.


EDIT: removed an off-base argument.


John wrote

> Okay, I think the blog article is done. It should appear on July 1st at 1 am GMT, here:

and

> Yes, I see that Nad has been editing the blog article some more on the wiki. Nad: please propose corrections here. I’ve already copied the wiki article over to the blog, so changes made on the wiki won’t get into the blog article anymore.

I <a href="http://forum.azimuthproject.org/discussion/1377/blog-el-nino-project-part-3/?Focus=11282#Comment_11282">wrote yesterday, June 30th</a>:

> so please change the above to :

which you didn't do, without any further discussion. That is, you posted under my name a rather intermediate step (within a proof), which might not be sufficient for understanding. Worse, its "since ..." formulation makes it look a bit as if I had written this and claimed it was a proof (although it is really only an argument for why it is suggestive that the two expressions are not equal). Meanwhile, I actually <em>had</em> outlined my argument in more detail <a href="http://forum.azimuthproject.org/discussion/1377/blog-el-nino-project-part-3/?Focus=11282#Comment_11282">here</a> (where you told me to do so) and in time before posting. I am not really happy about this.

I'll add the extra stuff now, Nad. I'm sorry.


It's probably worth remembering that nobody cares about this issue yet except us - and probably just three of us, since Graham has already figured out the good way to do things. (Perhaps Ludescher _et al_ will care about this issue someday, but I doubt they will care unless we force them to, and then they'll probably say "Oh, but what we _meant_ was..." what Graham is doing. So, I personally don't plan to spend any time trying to point out this problem to them.)

David doubts that

$$ \langle \langle f(t) \rangle \rangle \ne \langle f(t) \rangle $$

Let me check by simply calculating both sides. I don't feel like switching to the notation $Avg_t (f)$, since I don't think the answer to the question will depend on the notation. I think it depends solely on the definition Ludescher _et al_ made:

$$ \langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d) $$

together with the laws of arithmetic. This definition is a recipe for taking a function of time $f(t)$ and obtaining a new function of time, its running average $\langle f(t) \rangle$. So, we're asking: is the running average of the running average equal to the running average?

Let us calculate. I start with

$$ \langle \langle f(t) \rangle \rangle $$

The outer brackets say to do a running average of the function inside, namely the function $\langle f(t) \rangle$. This function on the inside equals $\frac{1}{365} \sum_{d = 0}^{364} f(t - d) $. So, we get

$$ \langle \langle f(t) \rangle \rangle = \langle \frac{1}{365} \sum_{d = 0}^{364} f(t - d) \rangle $$

Now what? We take the running average of the function of $t$ inside the angle brackets, and get:

$$ \langle \langle f(t) \rangle \rangle = \frac{1}{365} \sum_{D = 0}^{364} \frac{1}{365} \sum_{d = 0}^{364} f(t - d- D) $$

It's different from

$$ \langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d) $$
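This calculation can be checked numerically. Here is a minimal Python sketch: applying the running average twice gives the same number as the explicit double sum above, and a different number from a single running average. The helper names are hypothetical. The toy function $f(s) = s$ is chosen so that, by hand, $\langle f(t)\rangle = t - 182$ and $\langle\langle f(t)\rangle\rangle = t - 364$.

```python
def running_avg(x, window=365):
    """Trailing running average; None where a full window of defined values is missing."""
    out = [None] * len(x)
    for t in range(window - 1, len(x)):
        chunk = x[t - window + 1 : t + 1]
        if all(v is not None for v in chunk):
            out[t] = sum(chunk) / window
    return out

def double_sum_formula(f, t, window=365):
    """(1/365) sum_{D=0}^{364} (1/365) sum_{d=0}^{364} f(t - d - D), written out directly."""
    total = 0.0
    for D in range(window):
        total += sum(f[t - d - D] for d in range(window)) / window
    return total / window

f = [float(s) for s in range(1200)]            # toy function f(s) = s
t = 1000
single = running_avg(f)[t]                     # <f(t)>
nested = running_avg(running_avg(f))[t]        # <<f(t)>>
print(single, nested, double_sum_formula(f, t))  # 818.0 636.0 636.0
```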

I added the extra stuff Nad wanted in the blog article. I polished some of the grammar. I was unable to include equation numbers in a nice way, because the equations are too long. So, I used words to say what was going on.


> I'll add the extra stuff now, Nad. I'm sorry.

OK, I hope there is no stupid mistake in it.

> It's probably worth remembering that nobody cares about this issue yet except us - and probably just three of us, since Graham has already figured out the good way to do things, but I doubt they will care unless we force them to, and then they'll probably say "Oh, but what we meant was..." what Graham is doing. So, I personally don't plan to spend any time trying to point out this problem to them.

As I wrote, I already pointed this out to them in an email. I thought it could be important for their El Niño forecast. After all, El Niño forecasts involve a lot of money (and ultimately also lives). If there might be a mistake in the forecast (and here it is not really clear whether there is one), then it would be good to check it again and either clarify the issue or correct the work. Even if the work can't be fixed, I still think their concept was worthwhile to look at, even if it turns out not to be usable as a forecast. Last but not least, it gave an interesting view into the forecasting business. I am not so sure, though, that people are uninterested in critiques of their forecast; after all, as I said, it is an important issue. I could also imagine that they get a lot of criticism, and that much of it is rather unconstructive.

But of course I shouldn't spend any time on this, even less than you, since you are "somewhat" paid for doing this. By pointing out what seemed to be an unintended step, I have already done way more than I can afford. Any business coach would again tell me that it's my own fault if I don't think like an entrepreneur, as they did when I told them that I had taught photovoltaics on the cheap. I guess there is a point there: you can't always try to safeguard things if too much looks as if it needs safeguarding. In particular, if you keep doing this, at some point it may be you yourself who needs safeguarding.

That is especially true since, as you said, it is not clear that they value our contribution; in particular, they didn't reply to my email. But then, as I said, they could be flooded with critiques. If they don't have to publish their code, then yes, they could probably claim that they "meant" Graham's formulation, since their result will probably stay rather fuzzy, unless we have overlooked something. I had actually also imagined that they could have meant what Graham formulated, i.e. that they take (Graham, I hope I understood you correctly):

$$\frac{\langle x y \rangle - \langle x \rangle \langle y \rangle}{\sqrt{(\langle x^2 \rangle - \langle x \rangle^2)(\langle y^2 \rangle - \langle y \rangle^2)}}$$

as a generalization of the correlation applied to running averages. But as I understood it, Graham could not reproduce their results. That could mean that they used a different approach, or that some other methods were used somewhere else.
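Here is a minimal sketch of one reading of the formula above, with every $\langle \cdot \rangle$ taken as the same trailing running mean evaluated at time $t$. The function names, the test signal and the window length are hypothetical. When all five averages use the same plain window, the expression is just the ordinary Pearson correlation of the values inside that window, so it stays between -1 and 1.

```python
import math


def running_mean(values, t, window):
    """Trailing mean <v>(t) of values[t - window + 1], ..., values[t]."""
    return sum(values[t - d] for d in range(window)) / window


def running_correlation(x, y, t, window):
    """(<xy> - <x><y>) / sqrt((<x^2> - <x>^2)(<y^2> - <y>^2)),
    where every <.> is the same trailing running mean at time t."""
    xy = [a * b for a, b in zip(x, y)]
    xx = [a * a for a in x]
    yy = [b * b for b in y]
    mx = running_mean(x, t, window)
    my = running_mean(y, t, window)
    cov = running_mean(xy, t, window) - mx * my
    var_x = running_mean(xx, t, window) - mx * mx
    var_y = running_mean(yy, t, window) - my * my
    return cov / math.sqrt(var_x * var_y)


x = [math.sin(0.1 * k) for k in range(100)]
y = [2.0 * v + 1.0 for v in x]  # exact increasing linear function of x
z = [-v for v in x]             # exact negative of x
print(running_correlation(x, y, t=99, window=50))
print(running_correlation(x, z, t=99, window=50))
```

One design caveat: the one-pass form $\langle x^2 \rangle - \langle x \rangle^2$ can lose precision through cancellation when the data have a large offset relative to their spread. Subtracting the window mean first (the two-pass form) avoids that.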

But I haven't really thought about this long enough, and I could also imagine other generalizations. For example, the next step for me would be to understand why the correlation is usually defined as

$$\frac{\langle (x - \langle x \rangle)(y - \langle y \rangle) \rangle}{\sqrt{\langle (x - \langle x \rangle)^2 \rangle \langle (y - \langle y \rangle)^2 \rangle}}$$

and not as

$$\left\langle \frac{(x - \langle x \rangle)(y - \langle y \rangle)}{\sqrt{(x - \langle x \rangle)^2 (y - \langle y \rangle)^2}} \right\rangle$$

But I am getting carried away again.
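One way to see the difference between the two definitions is to look at a single term. In the second one, each term is $p/\sqrt{p^2} = p/|p|$ with $p = (x - \langle x \rangle)(y - \langle y \rangle)$, which is just the sign of $p$. So it only counts how often the deviations agree in sign and ignores their size. It is also undefined when a deviation is exactly zero. The usual definition weights each point by the size of its deviations. By the Cauchy-Schwarz inequality, it reaches $\pm 1$ only for an exactly linear relation.

The small sketch below uses made-up data and plain averages over the whole sample rather than running averages. The function names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    """Usual definition: <(x-<x>)(y-<y>)> / sqrt(<(x-<x>)^2> <(y-<y>)^2>)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    cov = mean([a * b for a, b in zip(dx, dy)])
    return cov / math.sqrt(mean([a * a for a in dx]) * mean([b * b for b in dy]))


def averaged_pointwise_ratio(x, y):
    """Alternative: < (x-<x>)(y-<y>) / sqrt((x-<x>)^2 (y-<y>)^2) >.
    Each term is p / |p| with p = (x-<x>)(y-<y>), i.e. the sign of p,
    so this fails (division by zero) if any deviation is exactly zero."""
    mx, my = mean(x), mean(y)
    products = [(a - mx) * (b - my) for a, b in zip(x, y)]
    return mean([p / math.sqrt(p * p) for p in products])


x = [-2.0, -1.0, 1.0, 2.0]
y = [a ** 3 for a in x]  # monotone in x, but not linear
print(pearson(x, y))
print(averaged_pointwise_ratio(x, y))
```

Here every pair of deviations has the same sign. So the second definition gives exactly 1, even though $y = x^3$ is not linear in $x$. The usual correlation comes out at about 0.943.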


Nad wrote:

> I hope there is no stupid mistake in it.

I checked it and it looks right to me.

> But as I understood Graham could not reproduce their results,

Is that true, Graham? How close are you to reproducing their results? I thought you 'almost' reproduced them.

That's what my next blog article should be about, probably. The article about me trying to learn R is less important, and should probably be part 5.

> And if one can’t fix the work, then I still think that their concept was worthwhile to look at, even if it would not useable as a forecast, last but not least it gave an interesting view into the forecast business.

I'm hoping we can not just "fix" but actually improve on their work. Noticing this problem with running averages was good. Now we can do things more clearly... and I think we can gradually pay less and less attention to what Ludescher *et al* did and focus on what *we* should do. That's what I want to do, anyway. You don't need to spend any more time on this if you can't afford it... but whenever you have time to take a look, that's fine too. You don't need to spend any more time on this issue, Nad.

> Is that true, Graham? How close are you to reproducing their results? I thought you ’almost’ reproduced them.

Have a look at my second attempt near the bottom of the [wiki page](http://www.azimuthproject.org/azimuth/show/Experiments+in+El+Ni%C3%B1o+analysis+and+prediction+). I haven't tried to get closer than that. The discrepancy might be because I'm using 1950-1979, whereas they used 1948-1980. It might be a bug. Or it might be a slightly different algorithm for some reason, not necessarily the one being discussed here.

The only thing I care enough to do anything about is the possibility that my code is not doing what I think it is doing. I would want to fix that, either my code or my thinking.

This paper gives me the Heebie-Jeebies... is it just me?????

I did some simple symbolic computations on small inputs to verify the confusing syntax of this paper:

[Ludescher paper: symbolix](http://files.lossofgenerality.com/el_nino_symblix.pdf)

I kept the math typesetting as close to the paper as I could...

If you think it is useful, you could tell me which evaluations are incorrect. IN[i] and OUT[i] match up in the pdf file, so you can easily point to a specific evaluation.

If I can do this symbolic stuff correctly, then I will read the actual data and perform the computations, at least in part. Then perhaps I could pass the output along to Graham's code for further analysis.

Dara

I said back in comment 15 that the normalisation had little effect. Here are some actual results. The graph shows a variant of the Ludescher *et al* algorithm. The original algorithm is in black. A variant with no normalisation is shown in red.

<img width = "800" src = "http://www.azimuthproject.org/azimuth/files/corrsS-vs-covsS.png" alt = ""/>
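One possible reason why dropping the normalisation changes little is offered here as a guess, not as Graham's explanation. Assume the link strength is computed as in Ludescher *et al*, $S = (\max C - \operatorname{mean} C)/\operatorname{sd} C$ over all lags $\tau$. Then $S$ is unchanged when every $C(\tau)$ is multiplied by the same positive number. Covariance differs from correlation only by the factor $\sigma_x \sigma_y$, which varies slowly with $\tau$.

The sketch below uses synthetic data, a hypothetical 200-step window and lags from -20 to 20. `link_strength` and `lagged_cov_and_corr` are illustrative names, not Graham's code.

```python
import math
import random


def link_strength(values):
    """(max - mean) / standard deviation of a statistic taken over all lags."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return (max(values) - m) / sd


def lagged_cov_and_corr(x, y, tau, start, window):
    """Covariance and Pearson correlation between x[s] and y[s - tau]
    for s = start, ..., start + window - 1."""
    xs = x[start:start + window]
    ys = y[start - tau:start - tau + window]
    mx, my = sum(xs) / window, sum(ys) / window
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / window
    var_x = sum((a - mx) ** 2 for a in xs) / window
    var_y = sum((b - my) ** 2 for b in ys) / window
    return cov, cov / math.sqrt(var_x * var_y)


# Synthetic data: y[u] is a noisy copy of x[u + 5], so x[s] matches y[s - 5].
random.seed(1)
x = [random.gauss(0, 1) for _ in range(300)]
y = [x[u + 5] + 0.3 * random.gauss(0, 1) for u in range(295)]

lags = range(-20, 21)
covs, corrs = zip(*(lagged_cov_and_corr(x, y, tau, 50, 200) for tau in lags))

print("S from correlations:", link_strength(corrs))
print("S from covariances: ", link_strength(covs))
```

Both statistics should peak at the built-in lag of 5 steps. Comparing the two printed values of $S$ shows how much the per-lag normalisation matters on this toy example.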

Thanks, Graham. Could you show me where this code resides? And if I denoised the data with something other than a moving average, could we experiment and see how this algorithm performs then?

Dara