Why do we Sometimes get Nonsense-Correlations between Time-Series? (mcgill.ca)
15 points by ahalan on Oct 22, 2012 | 3 comments


The money quote of this article is:

"I propose to term such correlations... the serial correlations for the given series."

The basic idea here is that observations in a time-series are not actually independent because each observation is highly correlated with the previous observation, and so the usual significance tests and standard errors do not apply.
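A minimal simulation sketch of that point, not taken from the paper: it correlates pairs of series that are independent of each other and counts how often a naive 5% significance test fires. The series length, trial count, cutoff approximation, and all function names are illustrative assumptions. Random walks stand in for "each observation is highly correlated with the previous observation".

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def white_noise(n, rng):
    """n independent draws: no serial correlation."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def random_walk(n, rng):
    """Cumulative sum of independent draws: strong serial correlation."""
    total, out = 0.0, []
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        out.append(total)
    return out

def false_positive_rate(make_series, n=100, trials=2000, seed=0):
    """Share of unrelated series pairs that a naive test calls 'significant'."""
    rng = random.Random(seed)
    # Approximate two-sided 5% cutoff for |r|, valid only if the n points
    # within each series are independent (assumption of the naive test).
    cutoff = 1.96 / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        r = pearson(make_series(n, rng), make_series(n, rng))
        if abs(r) > cutoff:
            hits += 1
    return hits / trials

noise_rate = false_positive_rate(white_noise)
walk_rate = false_positive_rate(random_walk)
print(f"white noise pairs flagged:  {noise_rate:.1%}")
print(f"random walk pairs flagged:  {walk_rate:.1%}")
```

For white noise the flagged share should sit near the nominal 5%. For random walks it comes out far higher, even though every pair is unrelated by construction.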

Some links if you're interested in learning more about analyzing serial correlation and time-series data:

http://en.wikipedia.org/wiki/Autoregressive_model

http://en.wikipedia.org/wiki/Newey–West_estimator

http://en.wikipedia.org/wiki/Prais-Winsten_transformation


i think the argument being made is that if you sample a continuous signal at a frequency higher than where most of the power in the signal's spectrum lies, then those samples are not independent. so standard statistical tests that assume independent measurements overestimate significance.

so if you have two smooth, continuous signals observed over a relatively short time (compared to the timescale of the underlying process that is generating them), then you should simply ask whether they both slope in the same general way. by chance alone there's a 50:50 chance that both go up (or both go down) rather than one going one way and one the other, so both sloping the same way is not terribly significant. and that doesn't change even if you sample like crazy and generate lots and lots of points, which appear to show a hugely significant correlation.
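A rough sketch of the oversampling argument, under assumptions that are mine rather than the paper's: each "signal" is a quarter-period stretch of a sine wave, the phases 0.3 and 1.0 are arbitrary, and `naive_t` is a hypothetical helper that computes the textbook t-statistic as if every sample were independent.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def segment(n, phase, window=0.25):
    """n evenly spaced samples of sin(2*pi*t + phase) for t in [0, window]."""
    return [math.sin(2 * math.pi * window * i / (n - 1) + phase)
            for i in range(n)]

def naive_t(r, n):
    """t-statistic for r that wrongly treats all n samples as independent."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# The same two smooth curves, sampled sparsely and then densely.
phase_a, phase_b = 0.3, 1.0  # arbitrary illustrative phases
results = {}
for n in (10, 10_000):
    r = pearson(segment(n, phase_a), segment(n, phase_b))
    results[n] = (r, naive_t(r, n))
    print(f"n={n:>6}: r={r:+.3f}  naive t={results[n][1]:+.1f}")

# Across many unrelated phase pairs, the sign of r is close to a coin flip.
rng = random.Random(0)
trials = 2000
positive = 0
for _ in range(trials):
    a = segment(50, rng.uniform(0.0, 2.0 * math.pi))
    b = segment(50, rng.uniform(0.0, 2.0 * math.pi))
    if pearson(a, b) > 0:
        positive += 1
frac_positive = positive / trials
print(f"share of pairs with r > 0: {frac_positive:.1%}")
```

The correlation itself changes only modestly with the sampling rate, since it depends on the shape of the two curves. The naive t-statistic grows roughly like the square root of the number of samples, which is the overestimated significance described above.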

[edit as i slowly grok this better] more generally, the correlation coefficient isn't a good tool to use for comparing signals. it should be used for comparing random samples from populations (a signal is not a population). and i don't think people use it to compare signals these days. so i guess this paper won out.

but i may have missed something, or be simply wrong, because this was published almost 100 years after fourier died, yet when i scanned it i saw nothing that mentioned fourier analysis, which seems like an obvious way (see above) to phrase this (but i may be biased, since i guess fourier analysis boomed once machines existed to compute ffts).


Note: 63-page PDF of a mathematical paper published in 1926.



