In time series analysis, it is useful to know whether one series influences another. For example, a commodity trader may want to know whether a rise in the price of commodity A leads to a rise in the price of commodity B. Originally, this relationship was measured using linear regression. However, in 1974 Clive Granger and Paul Newbold showed that this approach can yield misleading results, particularly for non-stationary time series. Building on this work, Granger developed the concept of cointegration in the 1980s, which earned him a Nobel Prize. In this post, I want to discuss why cointegration is needed, how it is useful, and why it is an important concept for Data Scientists to understand.
Overview
Before we discuss cointegration, let's discuss why it is needed. Historically, statisticians and economists used linear regression to determine the relationship between different time series. However, Granger and Newbold showed that this approach can be misleading and leads to something called spurious correlation.
A spurious correlation occurs when two time series appear to be correlated but actually have no causal relationship. It is the classic "correlation does not imply causation" warning. It is dangerous because even standard statistical tests may report a significant relationship where none exists.
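To see how easily regression can be fooled, here is a minimal Monte Carlo sketch in the spirit of Granger and Newbold's experiment. The sample sizes, the seed, and the `rejection_rate` helper are illustrative assumptions, not their exact setup. It regresses one series on a second, independently generated series and counts how often the slope's t-statistic exceeds 1.96 in absolute value (the usual 5% significance cutoff). For white noise the rate should sit near 5%; for independent random walks it is far higher.

```python
import numpy as np


def rejection_rate(random_walk, n_sims=500, n_obs=200, seed=0):
    """Share of OLS fits of y on x whose slope |t| exceeds 1.96,
    where y and x are generated independently of each other."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        e_y = rng.standard_normal(n_obs)
        e_x = rng.standard_normal(n_obs)
        # Either plain white noise (stationary) or random walks (non-stationary)
        y = np.cumsum(e_y) if random_walk else e_y
        x = np.cumsum(e_x) if random_walk else e_x

        # Ordinary least squares of y on an intercept and x
        X = np.column_stack([np.ones(n_obs), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n_obs - 2)  # residual variance estimate
        se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        t_slope = beta[1] / se_slope

        if abs(t_slope) > 1.96:
            rejections += 1
    return rejections / n_sims


print(f"White noise:  {rejection_rate(random_walk=False):.2f}")
print(f"Random walks: {rejection_rate(random_walk=True):.2f}")
```

Because the two series are independent by construction, every "significant" slope in the random-walk case is a false positive. The usual t-test assumes stationary residuals, and regressing one random walk on another violates that assumption.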
Example
An example of a spurious relationship is shown in the plots below:
Here we have two time series, A(t) and B(t), plotted as functions of time (left) and plotted against each other (right). Notice from the plot on the right that there is some correlation between the series, as shown by the regression line. However, the left plot shows that this correlation is spurious: B(t) increases steadily, whereas A(t) fluctuates erratically. Furthermore, the average distance between the two time series is also increasing…
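A similar pair of series can be simulated in a few lines. The sketch below assumes both series are random walks driven by independent shocks, with hypothetical drifts of 0.1 per step for A(t) and 0.5 per step for B(t); these values are illustrative and not taken from the plots. It compares the correlation of the levels with the correlation of the step-to-step changes, and checks whether the average gap between the series grows over time.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical series: random walks built from independent shocks.
# A(t) has a small drift, so it wanders erratically; B(t) has a larger
# drift, so it rises fairly steadily. Neither series' shocks affect the other.
a = np.cumsum(0.1 + rng.standard_normal(n))
b = np.cumsum(0.5 + rng.standard_normal(n))

level_corr = np.corrcoef(a, b)[0, 1]                     # correlation of raw levels
change_corr = np.corrcoef(np.diff(a), np.diff(b))[0, 1]  # correlation of changes

# Average absolute gap between the series early vs. late in the sample
gap = b - a
early_gap = np.abs(gap[:100]).mean()
late_gap = np.abs(gap[-100:]).mean()

print(f"corr(A, B), levels:             {level_corr:.2f}")
print(f"corr(dA, dB), changes:          {change_corr:.2f}")
print(f"mean |B - A|, first 100 steps:  {early_gap:.1f}")
print(f"mean |B - A|, last 100 steps:   {late_gap:.1f}")
```

Because the shocks are independent, the correlation of the changes should be close to zero, whatever the correlation of the levels happens to be for a given seed. The widening gap between B(t) and A(t) is the same warning sign described above: the two series do not move together around a stable relationship.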