Soci709 (formerly 209) Module 14 - AUTOCORRELATION IN TIME SERIES DATA
Resources:
ALSM5e pp. 481-498; ALSM4e pp. 497-516
Hamilton 2006 pp. 339-360 (especially the commands tsset date and prais y x1 x2)
1. AUTOCORRELATION OF ERRORS: NATURE OF
THE PROBLEM
Time series data are observed on the same unit
(individual, country, firm, etc.) at n points in time. Examples:
- yearly divorce rate in the U.S. from 1922 to present
- quarterly profits of a company
- income inequality in the U.S. measured each year from 1964 to present
- daily atmospheric pollen count in Chapel Hill since first recorded, etc.
In regression models using time series data the
errors are often correlated over time (they are said to be autocorrelated
or serially correlated).
NKNW illustrate the problems caused by correlated
errors with simulated data generated with the model:
- Y_t = β_0 + β_1 X_t + ε_t
- ε_t = ε_{t-1} + u_t
- X_t represents "time", so that X_1 = 1, X_2 = 2, etc.
- β_0 = 2; β_1 = .5
- ε_0 (the value of ε_t prior to the beginning of the process) = 3
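A minimal Stata sketch of this simulation (the distribution of the disturbances u_t is not reproduced above, so a standard normal is assumed; the seed and all names are illustrative):

    clear
    set obs 10
    set seed 123
    generate t = _n                    // X_t = t ("time")
    generate u = rnormal()             // i.i.d. disturbances (distribution assumed)
    generate e = .
    replace e = 3 + u in 1             // e_1 = e_0 + u_1, with e_0 = 3
    replace e = e[_n-1] + u in 2/l     // e_t = e_{t-1} + u_t
    generate y = 2 + .5*t + e          // Y_t = b0 + b1 X_t + e_t
    tsset t
    regress y t                        // compare the OLS fit with the true line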
The simulated data are shown in the next exhibit. As seen there, the errors ε_t are positively correlated.
Because of serial correlation
- the OLS and true regression lines may differ sharply from sample to sample, depending on the initial disturbance ε_0 (compare (a), (b) and (c) in the next exhibit)
- MSE may underestimate the true variance of ε_t (compare the variability of residuals around the regression line in (a) and (b) in the next exhibit); thus the estimated standard errors of the regression coefficients may also be underestimated
These patterns can be seen in the next exhibit.
In general, serial correlation of the disturbances may have the following effects with OLS estimation:
- estimated regression coefficients are still unbiased but no longer minimum variance (i.e., inefficient)
- MSE (the OLS estimate of σ²) may underestimate the true variance of the errors
- s{b_k} may underestimate the true standard error of the estimated regression coefficients
- thus, statistical inference using t and F is no longer justified
2. AUTOCORRELATION DIAGNOSTICS
1. Plot of Residuals Against Time or Sequential
Order
An informal diagnostic of autocorrelation of errors
is to plot the residuals from the OLS regression against time or against
the sequential order of the observation in the file (after checking that
observations are in fact arranged in chronological order!). Connecting
the points with a dotted line makes any pattern of autocorrelation more
conspicuous. Look for evidence of "tracking", in which residuals
corresponding to adjacent time points have similar values. (Some
people say to look for a pattern like that made by bullets fired from a
machine gun.)
Exhibit: Index plot
(= time plot) of residuals for Blaisdell data
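A sketch of this diagnostic in Stata (the variable names year, y and x are placeholders):

    tsset year                 // declare the time variable
    regress y x
    predict e, residuals       // OLS residuals
    tsline e, yline(0)         // connected time plot of residuals; look for "tracking"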
2. The Wald-Wolfowitz Runs Test
The Wald-Wolfowitz runs test is a non-parametric
test that detects serial patterns in a run of numbers. Applied to
the residuals of the OLS regression, a significant test indicates the presence
of sequences of positive or negative residuals longer than expected by
chance alone. Such long sequences of residuals above or below zero are what one would expect if the errors are "tracking" because of autocorrelation.
For the Blaisdell data the test is significant (p = .006), so one concludes that the errors are correlated.
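In Stata the runs test can be applied to the OLS residuals with the runtest command (a sketch, assuming the residuals e computed above; the threshold() option sets the cut point at zero):

    runtest e, threshold(0)    // counts runs of residuals above/below zero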
3. The Durbin-Watson Test
The Durbin-Watson (D-W) test is the most commonly
used test of autocorrelation of residuals.
The D-W statistic D is calculated from the ordinary OLS residuals e_t = Y_t - Ŷ_t as
D = Σ_{t=2 to n} (e_t - e_{t-1})² / Σ_{t=1 to n} e_t²
where n is the number of cases.
To understand the D-W formula consider that
- when e_t and e_{t-1} are correlated they have similar values
- thus when e_t and e_{t-1} are correlated the terms (e_t - e_{t-1})² are small and the numerator of D is small (while the denominator is the same no matter how much autocorrelation there is)
- thus small values of D (close to zero) indicate serial correlation
The D-W test setup is
H0: ρ = 0
H1: ρ > 0
Table B7 gives critical values d_L and d_U such that
if D > d_U, conclude H0 (ρ = 0)
if d_L <= D <= d_U, the test is inconclusive
if D < d_L, conclude H1 (ρ > 0)
Example: SYSTAT routinely reports the D-W D statistic with every regression (D has no meaning unless observations are sequentially ordered). For the Blaisdell data D = .735. Table B7 for n = 20 and p-1 = 1 gives d_L = .95 and d_U = 1.15. Since .735 < .95 = d_L, one concludes H1 (the errors are autocorrelated).
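In Stata, after tsset and regress as above, the D-W statistic is reported by a post-estimation command (a minimal sketch):

    estat dwatson              // Durbin-Watson d statistic; requires tsset data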
3. REMEDIAL MEASURES FOR AUTOCORRELATION
1. Add Omitted Predictors to Model
Autocorrelation is often caused by unmeasured variables that have similar values from period to period. Identifying these variables and including them in the model may eliminate the serial correlation. Some of these substantive omitted variables may be "simulated" by adding to the model
- a linear or exponential trend
- seasonal indicators
If adding a trend or seasonal indicators gets rid of the autocorrelation, this is by far the best solution to the problem (a sketch follows).
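A sketch in Stata, assuming quarterly data with a (hypothetical) indicator variable quarter coded 1-4:

    generate trend = _n                  // linear trend
    tabulate quarter, generate(qd)       // qd1-qd4: quarterly indicator variables
    regress y x trend qd2 qd3 qd4        // qd1 omitted as the reference category
    estat dwatson                        // re-test for autocorrelation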
2. The First-Order Autoregressive Error
Model With Generalized Least Squares Estimation
1. First-Order Autoregressive Error Model
The model is
Y_t = β_0 + β_1 X_{t1} + β_2 X_{t2} + ... + β_{p-1} X_{t,p-1} + ε_t
ε_t = ρ ε_{t-1} + u_t
where
|ρ| < 1 (ρ is Greek "rho" and denotes the autocorrelation parameter)
u_t is i.i.d. ~ N(0, σ²)
One can show the following consequences of model
assumptions (see ALSM4e pp. 501-502; try to express these relationships in
words):
- E{ε_t} = 0
- σ²{ε_t} = σ²/(1-ρ²)
- σ{ε_t, ε_{t-1}} = ρ σ²/(1-ρ²)
- ρ{ε_t, ε_{t-1}} = σ{ε_t, ε_{t-1}} / (σ{ε_t} σ{ε_{t-1}}) = ρ
- ρ{ε_t, ε_{t-s}} = ρ^s
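For instance, the variance result follows in one line from the error equation, using the independence of u_t and ε_{t-1} together with stationarity (a sketch in LaTeX notation):

    \sigma^2\{\epsilon_t\} = \sigma^2\{\rho \epsilon_{t-1} + u_t\}
                           = \rho^2 \sigma^2\{\epsilon_{t-1}\} + \sigma^2
    % stationarity: \sigma^2\{\epsilon_t\} = \sigma^2\{\epsilon_{t-1}\}, hence
    (1 - \rho^2)\,\sigma^2\{\epsilon_t\} = \sigma^2
    \quad\Rightarrow\quad
    \sigma^2\{\epsilon_t\} = \frac{\sigma^2}{1 - \rho^2}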
Thus the variance-covariance matrix of ε is non-diagonal with a specific structure:

σ²{ε} =
| κ          κρ         κρ²        ...   κρ^{n-1} |
| κρ         κ          κρ         ...   κρ^{n-2} |
| ...        ...        ...        ...   ...      |
| κρ^{n-1}   κρ^{n-2}   κρ^{n-3}   ...   κ        |

where κ = σ²/(1-ρ²) (κ is Greek "kappa").
(This is why the model is called "generalized", as in "generalized least squares"; see Module 12.)
Even though the first-order autoregressive
model is simple, it is often a good approximation of actual situations.
2. Generalized Least Squares Estimation
Using Transformed Variables
Assume (for the sake of argument) that one knows the value of ρ.
Define the transformed variables
Y_t' = Y_t - ρ Y_{t-1}
X_{tk}' = X_{tk} - ρ X_{t-1,k}
Then one can show that the regression
Y_t' = β_0' + ... + β_k' X_{tk}' + ... + u_t
based on the transformed variables has an error term u_t which is no longer serially correlated, and that β_k = β_k', except that β_0' = β_0(1-ρ) (see NKNW pp. 508-509). Thus if one knows ρ one can get rid of the serial correlation by using OLS with the transformed data.
(This transformation can be derived by applying GLS estimation to the non-diagonal variance-covariance matrix of ε generated by the autocorrelation. So the transformation is a special case of GLS estimation.)
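A sketch of the transformation in Stata for a hypothetically known ρ (here .5; the data are assumed tsset; all names are illustrative):

    scalar rho = .5
    generate yprime = y - rho*L.y      // Y'_t = Y_t - rho Y_{t-1} (L. = lag operator)
    generate xprime = x - rho*L.x      // X'_tk = X_tk - rho X_{t-1,k}
    regress yprime xprime              // error term now (in theory) uncorrelated
    display _b[_cons]/(1 - rho)        // recover b0 from b0' = b0(1 - rho)

Note that the first observation is lost (L.y is missing at t = 1).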
In practice the value of ρ is unknown. The 3 classical methods of estimation in the presence of autocorrelation discussed next (Cochrane-Orcutt, Hildreth-Lu, first differences) are all based on transforming the variables, using alternative ways of estimating ρ.
3. Cochrane-Orcutt Procedure
The Cochrane-Orcutt procedure is
1. do an OLS regression of Y_t on the X_{tk} and calculate the residuals e_t
2. estimate the autocorrelation ρ as
r = Σ_{t=2 to n} e_{t-1} e_t / Σ_{t=2 to n} e_{t-1}²
3. use r to transform the variables into Y_t' and X_{tk}' using the formulas above; do an OLS regression of Y_t' on the X_{tk}'
4. if the D-W test still indicates serial correlation, re-estimate ρ using residuals computed from the original variables Y_t and X_{tk} and the regression coefficients estimated from the (last) transformed regression; go back to step 3.
The following exhibits show the Cochrane-Orcutt
procedure with the Blaisdell data.
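In Stata the Cochrane-Orcutt procedure corresponds to the corc option of the prais command (a sketch; comsales and indsales are assumed names for the Blaisdell company and industry sales variables):

    tsset year
    prais comsales indsales, corc twostep   // a single Cochrane-Orcutt step
    prais comsales indsales, corc           // iterated Cochrane-Orcutt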
4. Hildreth-Lu Procedure
The Hildreth-Lu procedure searches for the estimate of ρ that minimizes the sum of squared errors of the transformed regression, i.e.
SSE = Σ (Y_t' - Ŷ_t')²
(Hildreth-Lu is similar to the Box-Cox procedure to estimate the parameter λ of a power transformation of Y.)
One can search for the optimal ρ by calculating the transformed regression for closely spaced values of ρ and choosing the one with the smallest SSE, as shown in NKNW.
One can also estimate ρ and the regression coefficients simultaneously using iterative methods (nonlinear least squares). This can be done using the NONLIN module of SYSTAT, as shown in the exhibit analyzing the Blaisdell data.
Exhibit: (REPEAT) Replication
of Cochrane-Orcutt, Hildreth-Lu, & first differences procedures for
Blaisdell data
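In Stata a close analogue of the Hildreth-Lu search is the ssesearch option of prais, which searches for the value of ρ that minimizes SSE (a sketch; option availability assumed, variable names as above):

    prais comsales indsales, corc ssesearch  // grid search for the SSE-minimizing rho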
5. First Differences Procedure
First differences is the simplest transformation procedure, as it implicitly assumes ρ = 1. This assumption is often approximately justified because
- estimates of ρ are often close to 1
- the relationship of SSE with ρ is often "flat" for values of ρ near the optimum (as seen in the Hildreth-Lu procedure: see ALSM4e Table 12.5 p. 513), so the estimate of ρ does not need to be exact
The first differences transformation is thus
Y_t' = Y_t - Y_{t-1}
X_{tk}' = X_{tk} - X_{t-1,k}
The first differences procedure involves two regressions with the transformed data:
- a first regression without a constant term, to estimate the regression coefficients (since the first differences transformation "wipes out" the constant term)
- a second regression with a constant term, to recalculate the D-W D statistic only (because the D-W formula requires a constant in the model)
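A sketch of these two regressions in Stata using the D. (first difference) operator (requires tsset; variable names as above):

    regress D.comsales D.indsales, noconstant  // estimates b1 (constant wiped out)
    regress D.comsales D.indsales              // refit with a constant ...
    estat dwatson                              // ... only to compute the D-W statistic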
6. Comparison of the 3 Transformation Methods
Results of the 3 transformation methods (compared
with OLS) are shown in the following table.
Regression results for 4 estimation methods (SYSTAT) - Blaisdell data (compare with ALSM5e <>, ALSM4e Table 12.7 p. 516; some figures differ slightly)

Method                      |   b1   | s{b1} | t-ratio | est. ρ |   MSE
----------------------------|--------|-------|---------|--------|---------
Cochrane-Orcutt             | .1738  | .0029 |  59.42  |  .626  | .004515
Hildreth-Lu (nonlinear LS)  | .1605  | .0079 |  20.24  |  .959  | .004479
First differences           | .1685  | .0051 |  33.06  |  1.0   | .004815
OLS                         | .1763  | .0014 |  122.0  |  0.0   | .007406
7. STATA Commands
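A hedged sketch of the full command sequence for this module (variable names are placeholders; prais without the corc option is the Prais-Winsten estimator recommended in the summary below):

    tsset year                  // declare the time variable
    regress y x                 // OLS
    estat dwatson               // Durbin-Watson test
    predict e, residuals
    runtest e, threshold(0)     // Wald-Wolfowitz runs test
    prais y x, twostep          // Prais-Winsten, two-step
    prais y x                   // Prais-Winsten, iterated (the default)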
4. COMPREHENSIVE EXAMPLE: U.S. DIVORCE RATE
1920-1970, 1920-1997
1. SYSTAT Analysis
The following exhibits present examples of the Cochrane-Orcutt, Hildreth-Lu (using nonlinear least squares), and first differences methods applied to an analysis of the divorce rate in the U.S. from 1920 to 1970.
As a substantive epilogue, the next 3 exhibits relate to a model of the divorce rate that is more elaborate than the one previously shown, as it adds a measure of the birth rate (women 15-44) and military personnel per 1,000 population. Only OLS results are shown.
2. STATA Analysis
To be added later.
5. SUMMARY & RECOMMENDATIONS
Analysis of the Blaisdell data and the divorce rate data illustrates the following approach:
- set up your data as a time series (i.e., identify the variable that represents time, or the sequential order of observations) as required by your software (e.g., tsset in STATA)
- do an OLS regression and test for autocorrelation of the residuals (using the Wald-Wolfowitz runs test and the Durbin-Watson test)
- if autocorrelation is significant, consider adding to the model variables that might contribute to the autocorrelation
- do a Prais-Winsten regression, two-step and iterated
- if this solution fails, investigate more complicated error structures (beyond this module)
Last modified 24 Apr 2006