PcGive Volume I: Empirical Econometric Modelling

These reference chapters have been taken from Volume I, and use the same chapter and section numbering as the printed version.

Table of contents
  Part:6 The Statistical Output of PcGive
16 Descriptive Statistics in PcGive
  16.1 Means, standard deviations and correlations
  16.2 Normality test and descriptive statistics
  16.3 Autocorrelations (ACF) and Portmanteau statistic
  16.4 Unit-root tests
  16.5 Principal components analysis
  16.6 Correlogram, ACF
  16.7 Partial autocorrelation function (PACF)
  16.8 Periodogram
  16.9 Spectral density
  16.10 Histogram, estimated density and distribution
  16.11 QQ plot
17 Model Estimation Statistics
  17.1 Recursive estimation: RLS/RIVE/RNLS/RML
  17.2 OLS estimation
  17.2.1 The estimated regression equation
  17.2.2 Standard errors of the regression coefficients
  17.2.3 t-values and t-probability
  17.2.4 Squared partial correlations
  17.2.5 Equation standard error (σ̂)
  17.2.6 Residual sum of squares (RSS)
  17.2.7 R2: squared multiple correlation coefficient
  17.2.8 F-statistic
  17.2.9 R̄2: Adjusted R2
  17.2.10 Log-likelihood
  17.2.11 Mean and standard error of dependent variable
  17.2.12 *Information criteria
  17.2.13 *Heteroscedastic-consistent standard errors (HCSEs)
  17.2.14 *R2 relative to difference and seasonals
  17.2.15 *Correlation matrix of regressors
  17.2.16 *Covariance matrix of estimated parameters
  17.2.17 1-step (ex post) forecast analysis
  17.2.18 Forecast test
  17.2.19 Chow test
  17.2.20 t-test for zero forecast innovation mean (RLS only)
  17.3 IV estimation
  17.3.1 *Reduced form estimates
  17.3.2 Structural estimates
  17.3.3 Specification χ2
  17.3.4 Testing β=0
  17.3.5 Forecast test
  17.4 RALS estimation
  17.4.1 Initial values for RALS
  17.4.2 Final estimates
  17.4.3 Analysis of 1-step forecasts
  17.4.4 Forecast tests
  17.5 Non-linear modelling
  17.5.1 Non-linear least squares (NLS) estimation
  17.5.2 Maximum likelihood (ML) estimation
  17.5.3 Practical details
18 Model Evaluation Statistics
  18.1 Graphic analysis
  18.2 Recursive graphics (RLS/RIVE/RNLS/RML)
  18.3 Dynamic analysis
  18.3.1 Static long-run solution
  18.3.2 Analysis of lag structure
  18.3.2.1 Tests on the significance of each variable
  18.3.2.2 Tests on the significance of each lag
  18.3.3 Tests on the significance of all lags
  18.3.4 COMFAC tests
  18.3.5 Lag weights
  18.4 Dynamic forecasting
  18.4.0.0.1 Forecast type
  18.4.0.0.2 Forecast standard errors
  18.4.0.0.3 Hedgehog plots
  18.4.0.0.4 Start forecast later
  18.4.0.0.5 Robust forecasts
  18.4.0.0.6 Level forecasts
  18.4.0.0.7 Derived function
  18.5 Diagnostic tests
  18.5.1 Introduction
  18.5.2 Residual autocorrelations (ACF), Portmanteau and DW
  18.5.2.1 Durbin--Watson statistic (DW)
  18.5.3 Error autocorrelation test (not for RALS, ML)
  18.5.4 Normality test
  18.5.5 Heteroscedasticity test using squares (not for ML)
  18.5.6 Heteroscedasticity test using squares and cross-products (not for ML)
  18.5.7 ARCH test
  18.5.8 RESET (OLS only)
  18.5.9 Parameter instability tests (OLS only)
  18.5.10 Diagnostic tests for NLS
  18.6 Linear restrictions test
  18.7 General restrictions
  18.8 Test for omitted variables (OLS)
  18.9 Progress: the sequential reduction sequence
  18.10 Encompassing and `non-nested' hypotheses tests
List of tables
  Table:18.1 Empirical size of normality tests

Part:6 The Statistical Output of PcGive

Chapter 16 Descriptive Statistics in PcGive

The Descriptive Statistics entry on the Model menu involves the formal calculation of statistics on database variables. Model-related statistics are considered in Chapters 17 and 18. This chapter provides the formulae underlying the computations. PcGive will use the largest available sample by default, here denoted by t=1,...,T. It is always possible to graph or compute the statistics over a shorter sample period.

16.1 Means, standard deviations and correlations

This reports sample means and standard deviations of the selected variables:

x= 1/T ∑t=1Txt,  s=( 1/(T-1) ∑t=1T( xt-x) 2)½.

The correlation coefficient rxy between x and y is:

rxy= ∑t=1T( xt-x) ( yt-y) / (∑t=1T( xt-x) 2 ∑t=1T( yt-y) 2)½.
(eq:16.1)

The correlation matrix of the selected variables is reported as a symmetric matrix with the diagonal equal to one. Each cell records the simple correlation between the two relevant variables. The same sample is used for each variable; observations with missing values are dropped.

16.2 Normality test and descriptive statistics

This is the test statistic described in §18.5.4, which amounts to testing whether the skewness and kurtosis of the variable correspond to those of a normal distribution. Missing values are dropped from each variable, so the sample size may differ across variables.

16.3 Autocorrelations (ACF) and Portmanteau statistic

This prints the sample autocorrelation function of the selected variables, as described in §18.5.2. The same sample is used for each variable; observations with missing values are dropped.

16.4 Unit-root tests

A crucial property of any economic variable influencing the behaviour of statistics in econometric models is the extent to which that variable is stationary. If the autoregressive description

yt=α+∑i=1nγiyt-i+ut,
(eq:16.2)

has a root on the unit circle, then conventional distributional results are not applicable to coefficient estimates. As the simplest example, consider:

xt=α+βxt-1t where β=1    and εt~IN( 0,σε 2) ,

which generates a random walk (with drift if α≠0). Here, the autoregressive coefficient is unity and stationarity is violated. A process with no unit or explosive roots is said to be I(0); a process is I( d) if it needs to be differenced d times to become I(0), but is still not I(0) when differenced only d-1 times. Many economic time series behave like I(1), though some appear to be I(0) and others I(2).

The Durbin--Watson statistic for the level of a variable offers one simple characterization of this integrated property:

DW( x) = ∑t=2T( xt-xt-1) 2 / ∑t=1T( xt-x) 2.
(eq:16.3)

If xt is a random walk, DW will be very small. If xt is white noise, DW will be around 2. Very low DW values thus indicate that a transformed model may be desirable, perhaps including a mixture of differenced and disequilibrium variables.

An augmented Dickey--Fuller (ADF) test for I(1) against I(0) (see Dickey and Fuller, 1981) is provided by the t-statistic on β̂ in:

Δxt=α+μt+βxt-1+∑i=1nγiΔxt-i+ut.
(eq:16.4)

The constant or trend can optionally be excluded from (eq:16.4); the specification of the lag length n assumes that ut is white noise. The null hypothesis is H0: β=0; rejection of this hypothesis implies that xt is I(0). A failure to reject implies that Δxt is stationary, so that xt is I(1). This is a second useful description of the degree of integratedness of xt. The Dickey--Fuller (DF) test has no lagged first differences on the right-hand side ( n=0) . On this topic, see the Oxford Bulletin of Economics and Statistics (Hendry, 1986, Banerjee and Hendry, 1992), and Banerjee, Dolado, Galbraith, and Hendry (1993). To test whether xt is I(2) rather than I(1), commence with the next higher difference:

Δ2xt=α+μt+βΔxt-1+λxt-1+∑i=1nγiΔxt-i+ut.
(eq:16.5)

Output of the ADF(n) test of (eq:16.4) consists of:

the coefficients α̂ and μ̂ (if included), β̂, γ̂1,...,γ̂n,
standard errors SE(α̂), SE(μ̂), SE(β̂), SE(γ̂i),
t-values tα, tμ, tβ, tγi,
σ̂ as in (eq:17.10),
DW: (eq:16.3) applied to ût,
DW(x): (eq:16.3) applied to xt,
ADF(x): tβ,
critical values,
RSS as in (eq:17.11).

Most of the formulae for the computed statistics are more conveniently presented in the next section on simple dynamic regressions, but the t-statistic is defined (e.g., for α̂) as tα = α̂/SE(α̂), using the formula in (eq:17.5). Critical values are derived from the response surfaces in MacKinnon (1991), and depend on whether a constant, or constant and trend, are included (seasonals are ignored). Under the null (β=0), α≠0 entails a trend in {xt} and μ≠0 implies a quadratic trend. However, under the stationary alternative, α=0 would impose a zero trend. Thus the test ceases to be similar if the polynomial in time (1,t,t2 etc.) in the model is not at least as large as that in the data generating process (see, for example, Kiviet and Phillips, 1992). This problem suggests allowing for a trend in the model unless the data are anticipated to have a zero mean in differences. The so-called Engle--Granger two-step method amounts to applying the ADF test to the residuals from a prior static regression (the first step). The response surfaces need to be adjusted for the number of variables involved in the first step: see MacKinnon (1991).

The default of PcGive is to report a summary test output for the sequence of ADF(n)...ADF(0) tests. The summary table lists, for j=n,...,0:

D-lag j (the number of lagged differences),
t-adf the t-value on the lagged level: tβ ,
beta Y_1 the coefficient on the lagged level: β,
σ̂ as (eq:17.10),
t-DY_lag t-value of the longest lag: tγj,
t-prob significance of the longest lag: 1-P( | τ| ≤| tγj| ) ,
AIC Akaike criterion, see §17.2.12
F-prob significance level of the F-test on the lags dropped up to that point,

Critical values are given, and significance of the ADF test is marked by asterisks: * indicates significance at 5%, ** at 1%.

16.5 Principal components analysis

Principal components analysis (PCA) amounts to an eigenvalue analysis of the correlation matrix. Because the correlation matrix has ones on the diagonal, its trace equals k when k variables are involved. Therefore, the sum of the eigenvalues also equals k. Moreover, all eigenvalues are non-negative.

The eigenvalue decomposition of the k ×k correlation matrix C is:

C=HΛ H',

where Λ is the diagonal matrix with the ordered eigenvalues λ1 ≥...≥λk ≥0 on the diagonal, and H=(h1, ..., hk) the matrix with the corresponding eigenvectors in the columns, H'H=Ik. The matrix of eigenvectors diagonalizes the correlation matrix:

H' C H =Λ.

Let (x1, ..., xk) denote the variables selected for principal components analysis (a T ×k matrix), and Z=(z1, ..., zk) the standardized data (i.e. in deviation from their mean, and scaled by the standard deviation). Then Z'Z/T = C. The jth principal component is defined as:

pj = Zhj = z1 h1j+...+zk hkj,

and accounts for 100 λj/k % of the variation. The largest m principal components together account for 100∑j=1m λj/k % of the variation.

Principal components analysis is used to capture the variability of the data in a small number of factors. Using the correlation matrix enforces a common scale on the data (analysis in terms of the variance matrix is not invariant to scaling). Some examples of the use of PCA in financial applications are given in Alexander (2001, Ch.6).

PCA is sometimes used to reconstruct missing data on y in combination with data condensation. Assume that T observations are available on y, but T+H on the remaining data, then two methods could be considered:

Step  Data   Sample  Method  Output
1     y:X    T       PCA     m, λ1,...,λm
2     X      T+H     PCA     p1,...,pm
3     P      T+H     -       ŷ1 = λ0 + ∑i=1m λi pi

1     X      T+H     PCA     m, p1,...,pm
2     y:1:P  T       OLS     β0, β1,...,βm
3     1:P    T+H     -       ŷ2 = β̂0 + ∑i=1m β̂i pi

More recently, PCA has become a popular tool for forecasting.

16.6 Correlogram, ACF

Define the sample autocovariances {ĉj} of a stationary series xt, t=1,...,T:

ĉj= 1/T ∑t=j+1T( xt-x) ( xt-j-x),  j=0,...,T-1,
(eq:16.6)

using the full sample mean x= 1/T ∑t=1Txt. The variance σ̂2x corresponds to ĉ0.

The autocorrelation function (ACF) plots the series {r̂j}, where r̂j is the sample correlation coefficient between xt and xt-j. The length of the ACF is specified by the user, leading to a figure which shows ( r̂1,r̂2,...,r̂s) plotted against ( 1,2,...,s) , where for any chosen variable x:

r̂j=ĉj / ĉ0,  j=0,...,T-1.
(eq:16.7)

The autocorrelation at lag zero, r̂0, is equal to one by construction, and is omitted from the graphs.

The asymptotic variance of the autocorrelations is 1/T, so approximate 95% error bars are indicated at ±2T-1/2 (see e.g. Harvey, 1993, p.42).

If a series is non-stationary, the usual definition of a correlation between successive lags no longer applies: see Nielsen (2006a). This comment also applies to the partial autocorrelation function described in the next section.

16.7 Partial autocorrelation function (PACF)

Given the sample autocorrelation function {r̂j}, the partial autocorrelations are computed using Durbin's method as described in Golub and Van Loan (1989, §4.7.2). This corresponds to recursively solving the Yule--Walker equations. For example, with autocorrelations, r̂0, r̂1, r̂2, ..., the first partial correlation is α̂0=1 (omitted from the graphs). The second, α̂1, is the solution from

( r̂0 )   ( r̂0  r̂1 ) ( α̂0 )
(    ) = (         ) (    ),
( r̂1 )   ( r̂1  r̂0 ) ( α̂1 )

et cetera.

16.8 Periodogram

The periodogram is defined as:

p( ω) = 1/2π ∑j=-(T-1)T-1 ĉ| j| e-iωj = 1/2π ∑j=-(T-1)T-1 ĉ| j| cos ( jω) = ĉ0/2π ∑j=-(T-1)T-1 r̂| j| cos ( jω) ,
for ω= 0, 2π/T, 4π/T, ..., (int(T/2)2π)/T.
(eq:16.8)

Note that p(0)=0.

When the periodogram is plotted, only frequencies greater than zero and up to π are used. Moreover, the x-axis, with values 0,...,π, is represented as 0,...,1. So, when T=4 the x coordinates are 0.5,1 corresponding to π/2, π. When T=5, the x coordinates are 0.4,0.8 corresponding to 2π/5, 4π/5.

16.9 Spectral density

The estimated spectral density is a smoothed function of the sample autocorrelations {r̂j}, defined as in (eq:16.7). The sample spectral density is then defined as:

ŝ( ω) = 1/2π ∑j=-(T-1)T-1K( j) r̂| j| cos ( jω) ,  0≤ω≤π,
(eq:16.9)

where | .| takes the absolute value, so that, for example, r̂| -1| =r̂1. The K( .) function is called the lag window. OxMetrics uses the Parzen window:

K( j) = 1-6( j/m ) 2+6| j/m | 3, | j/m | ≤0.5,
= 2( 1-| j/m | ) 3, 0.5≤| j/m | ≤1.0,
= 0, | j/m | >1.
(eq:16.10)

We have that K(-j)=K(j), so that the sign of j does not matter ( cos (x)= cos (-x)). The r̂js are based on fewer observations as j increases. The window function attaches decreasing weights to the autocorrelations, with zero weight for j>m. The parameter m is called the lag truncation parameter. In OxMetrics, this is taken to be the same as the chosen length of the correlogram. For example, selecting s=12 (the length setting in the dialog) results in m=12. The larger m, the less smooth the spectrum becomes, but the lower the bias. The spectrum is evaluated at 128 points between 0 and π. For more information see Priestley (1981) and Granger and Newbold (1986).

16.10 Histogram, estimated density and distribution

Consider a data set {xt}=( x1,...,xT) of observations on a random variable X. The range of {xt} is divided into N intervals of length h, with h defined below. Then the proportion of xt in each interval constitutes the histogram; the sum of the proportions is unity on the scaling that is used. The density can be estimated as a smoothed function of the histogram using a normal or Gaussian kernel. This can then be summed (`integrated') to obtain the estimated cumulative distribution function (CDF).

Denote the actual density of X at x by fx( x) . A non-parametric estimate of the density is obtained from the sample by:

f̂x( x) = 1/Th ∑t=1TK( (x-xt)/h) ,
(eq:16.11)

where h is the window width or smoothing parameter, and K( .) is a kernel such that:

-∞ K( z) dz=1.

PcGive sets:

h=1.06σ̂x/T0.2

as a default, and uses the standard normal density for K( .) :

K( (x-xt)/h) = (2π)-½ exp [ -½( (x-xt)/h) 2] .
(eq:16.12)

f̂x( x) is usually calculated for 128 values of x, using a fast Fourier transform. An excellent reference on density function estimation is Silverman (1986).

16.11 QQ plot

The variable in a QQ plot would normally hold observations which are hypothesized to come from a certain distribution. The QQ plot function then draws a cross plot of these observed values (sorted), against the theoretical quantiles. The 45° line is drawn for reference (the closer the cross plot is to this line, the better the match).

The normal QQ plot includes the pointwise asymptotic 95% standard error bands, as derived in Engler and Nielsen (2009) for residuals of regression models (possibly autoregressive) with an intercept.

Chapter 17 Model Estimation Statistics

Single equation estimation is allowed by:

OLS-CS ordinary least squares (cross-section modelling)
IVE-CS instrumental variables estimation (cross-section modelling)
OLS ordinary least squares
IVE instrumental variables estimation
RALS rth order autoregressive least squares
NLS non-linear least squares
ML maximum likelihood estimation

Once a model has been specified, a sample period selected, and an estimation method chosen, the equation can be estimated. OLS-CS/IVE-CS and OLS/IVE only differ in the way the sample period is selected. In the first, cross-section, case, all observations with missing values are omitted, so `holes' in the database are simply skipped. In cross-section mode it is also possible to specify a variable Sel by which to select the sample. In that case, observations where Sel is 0 or missing are omitted from the estimation sample (but, if data is available, included in the prediction set). In dynamic regression, the observations must be consecutive in time, and the maximum available sample is the leading contiguous sample. The following table illustrates the default sample when regressing y on a constant:

        y   Sel   cross section       dynamic
                  default  using Sel
1980    .    0
1981    .    0
1982    4    0       *                   *
1983    7    1       *         *         *
1984    9    1       *         *         *
1985   10    1       *         *         *
1986    .    0
1987   12    0       *

For ease of notation, the sample period is denoted t=1,...,T+H after allowing for any lagged variables created, where H is the forecast horizon. The data used for estimation are X=( x1...xT) . The H retained observations XH=( xT+1...xT+H) are used for static (1-step) forecasting and evaluating parameter constancy.

This chapter discusses the statistics reported by PcGive following model estimation. The next chapter presents the wide range of evaluation tools available following successful estimation. Sections marked with * denote information that can be shown or omitted on request. In the remainder there is no distinction between OLS/IVE and OLS-CS/IVE-CS.

17.1 Recursive estimation: RLS/RIVE/RNLS/RML

In most cases, recursive estimation is available:

RLS recursive OLS
RIVE recursive IVE
RNLS recursive NLS
RML recursive ML

Recursive OLS and IV estimation methods are initialized by a direct estimation over t=1,...,M-1, followed by recursive estimation over t=M,...,T. RLS and RIVE update inverse moment matrices. This updating is inherently somewhat numerically unstable but, because recursive estimation is primarily a graphical tool, this is not so important.

Recursive estimation of non-linear models is achieved by the brute-force method: first estimate for the full sample, then shrink the sample by one observation at a time. At each step the estimated parameters of the previous step are used as starting values, resulting in a considerably faster algorithm.

The final estimation results are always based on direct full-sample estimation, and so are unaffected by whether recursive or non-recursive estimation is used. The recursive output can be plotted from the recursive graphics dialog.

17.2 OLS estimation

The algebra of OLS estimation is well established from previous chapters. The model is:

yt=β'xt+ut,   with ut~IN( 0,σ2)   t=1,...,T,

or more compactly:

y=Xβ+u, with u~NT( 0,σ2I) .
(eq:17.1)

The vectors β and xt are k×1. The OLS estimates of β are:

β̂=( X'X) -1X'y,
(eq:17.2)

with residuals

ût=yt-ŷt=yt-xt'β̂,  t=1,...,T,
(eq:17.3)

and estimated residual variance

σ̂u2= 1/(T-k) ∑t=1Tût2.
(eq:17.4)

Forecast statistics are provided for the H retained observations (only if H≠0). For OLS, these are comprehensive 1-step ahead forecasts and tests, described below.

The estimation output is presented in columnar format, where each row lists information pertaining to each variable (its coefficient, standard error, t-value, etc.). Optionally, the estimation results can be printed in equation format, which is of the form coefficient × variable with standard errors in parentheses underneath.

17.2.1 The estimated regression equation

The first column of these results records the names of the variables and the second, the estimated regression coefficients β̂=( X'X) -1X'y. PcGive does not actually use this expression to estimate β̂. Instead it uses the QR decomposition with partial pivoting, which analytically gives the same result, but in practice is somewhat more reliable (that is, numerically more stable). The QR decomposition of X is X=QR, where Q is T ×T and orthogonal (that is, Q'Q=I), and R is T ×k and upper triangular. Then X'X=R'R.

The following five columns give further information about each of the magnitudes described below in §17.2.2 to §17.2.11.

17.2.2 Standard errors of the regression coefficients

These are obtained from the variance-covariance matrix:

SE[ β̂i] =(V[ β̂i] ̂)½=σ̂u(dii)½
(eq:17.5)

where dii is the ith diagonal element of ( X'X) -1 and σ̂u is the standard error of the regression, defined in (eq:17.4).

17.2.3 t-values and t-probability

These statistics are conventionally calculated to determine whether individual coefficients are significantly different from zero:

t-value= β̂i / SE[ β̂i]
(eq:17.6)

where the null hypothesis H0 is βi=0. The null hypothesis is rejected if the probability of getting a t-value at least as large is less than 5% (or any other chosen significance level). This probability is given as:

t-prob=1-Prob( | τ| ≤| t-value| )
(eq:17.7)

in which τ has a Student t-distribution with T-k degrees of freedom. The t-probabilities do not appear when all other options are switched on.

When H0 is true (and the model is otherwise correctly specified in a stationary process), a Student t-distribution is used since the sample size is often small, and we only have an estimate of the parameter's standard error: however, as the sample size increases, τ tends to a standard normal distribution under H0. Large t-values reject H0; but, in many situations, H0 may be of little interest to test. Also, selecting variables in a model according to their t-values implies that the usual (Neyman--Pearson) justification for testing is not valid (see, for example, Judge, Griffiths, Hill, Lütkepohl, and Lee, 1985).

17.2.4 Squared partial correlations

The final column lists the squared partial correlations under the header Part.R^2. The jth entry in this column records the correlation of the jth explanatory variable with the dependent variable, given the other k-1 variables. Adding further explanatory variables to the model may either increase or lower the squared partial correlation, and the former may occur even if the added variables are correlated with the already included variables. If the squared partial correlations fall on adding a variable, then that is suggestive of collinearity for the given equation parametrization: that is, the new variable is a substitute for, rather than a complement to, those already included.

Beneath the columnar presentation an array of summary statistics is also provided as follows:

17.2.5 Equation standard error (σ̂)

The residual variance is defined as:

σ̂u2= 1/(T-k) ∑t=1Tût2,
(eq:17.8)

where the residuals are defined as:

ût=yt-ŷt=yt-xt'β̂,  t=1,...,T.
(eq:17.9)

The equation standard error (ESE) is the square root of the residual variance (eq:17.8):

σ̂u=[ 1/(T-k) ∑t=1Tût2]1/2.
(eq:17.10)

This is labelled sigma in the regression output.

17.2.6 Residual sum of squares (RSS)

RSS=∑t=1Tt2.
(eq:17.11)

17.2.7 R2: squared multiple correlation coefficient

The variation in the dependent variable, or the total sum of squares (TSS), can be broken up into two parts: the explained sum of squares (ESS) and the residual sum of squares (RSS). In symbols, TSS=ESS+RSS, or:

t=1T( yt-y) 2=∑t=1T( ŷt-y) 2+∑t=1Tt2,

and hence:

R2= ESS/TSS = ∑t=1T( ŷt-y) 2 / ∑t=1T( yt-y) 2 = 1- ∑t=1Tût2 / ∑t=1T( yt-y) 2 = 1- RSS/TSS ,

assuming a constant is included. Thus, R2 is the proportion of the variance of the dependent variable which is explained by the variables in the regression. By adding more variables to a regression, R2 will never decrease, and it may increase even if nonsense variables are added. Hence, R2 may be misleading. Also, R2 is dependent on the choice of transformation of the dependent variable (for example, y versus Δy) -- as is the F-statistic below. The equation standard error, σ̂u, however, provides a better comparative statistic because it is adjusted by the degrees of freedom. Generally, σ̂ can be standardized as a percentage of the mean of the original level of the dependent variable (except if the initial mean is zero) for comparisons across specifications. Since many economic magnitudes are inherently positive, that standardization is often feasible. If y is in logs, 100σ̂ is the percentage standard error.

R2 is not reported if the regression does not have an intercept.

17.2.8 F-statistic

The formula was already given:

ηβ = [R2/( k-1)] / [( 1-R2) /( T-k)] ~ F( k-1,T-k)
(eq:17.12)

Here, the null hypothesis is that the population R2 is zero, or that all the regression coefficients are zero (excluding the intercept). The value for the F-statistic is followed by its probability value between square brackets.

17.2.9 R̄2: Adjusted R2

The adjusted R2 incorporates a penalty for the number of regressors:

R̄2= R2 - (k-1)/(T-k) (1 - R2),

assuming a constant is included. The adjusted R-squared can go down when the number of variables increases. Nonetheless, there is no rationale to use it as a model selection criterion.

An alternative way to express it uses (eq:17.8) and (eq:17.13):

R̄2= 1 - σ̂u2/σ̂y2,

so maximizing R̄2 corresponds to minimizing σ̂u2.

R̄2 is not reported if the regression does not have an intercept.

17.2.10 Log-likelihood

The log-likelihood for model (eq:17.1) is:

l(β,σ2 | y, X) = -T/2 log 2π - T/2 log σ2 - ½ u'u/σ2 .

Next, we can concentrate σ2 out of the log-likelihood to obtain:

lc(β | y, X) = Kc - T/2 log ( û'û/T) ,

where

Kc = -T/2(1 + log 2π).

The reported log-likelihood includes the constant, so corresponds to:

lc(β | y, X) = Kc - T/2 log ( RSS/T) .

17.2.11 Mean and standard error of dependent variable

The final entries list the number of observations used in the regression (so after allowing for lags), and the number of estimated parameters. This is followed by the mean and standard error of the dependent variable:

y= 1/T ∑t=1Tyt,  σ̂y= [ 1/(T-1) ∑t=1T( yt-y) 2]1/2.
(eq:17.13)

Note that we use T-1 in the denominator of σ̂2y, so this is what would be reported as the equation standard error (eq:17.10) when regressing the dependent variable on just a constant.

[Note: The maximum likelihood estimate for a linear model gives 1/T, while regression on a constant using OLS produces an unbiased estimate of the variance using 1/(T-1).]

17.2.12 *Information criteria

The four statistics reported are the Schwarz criterion (SC), the Hannan--Quinn (HQ) criterion, the Final Prediction Error (FPE), and the Akaike criterion (AIC). Here:

SC = log σ̃2+k ( log T) /T,
HQ = log σ̃2+2k ( log ( log T) ) /T,
FPE = ( T+k) σ̃2/( T-k) ,
AIC = log σ̃2+2k /T.
(eq:17.14)

using the maximum likelihood estimate of σ2:

σ̃2= (T-k)/T σ̂2= 1/T ∑t=1Tût2.

For a discussion of the use of these and related scalar measures to choose between alternative models in a class, see Judge, Griffiths, Hill, Lütkepohl, and Lee (1985) and §18.9 below.

17.2.13 *Heteroscedastic-consistent standard errors (HCSEs)

These provide consistent estimates of the regression coefficients' standard errors even if the residuals are heteroscedastic in an unknown way. Large differences between the corresponding values in §17.2.2 and §17.2.13 are indicative of the presence of heteroscedasticity, in which case §17.2.13 provides the more useful measure of the standard errors (see White, 1980). PcGive contains two methods of computing heteroscedastic-consistent standard errors: as described in White (1980) (labelled HCSE), or the Jack-knife estimator from MacKinnon and White (1985) (labelled JHCSE; for which the code was initially provided by James MacKinnon).

The heteroscedasticity and autocorrelation consistent standard errors are reported in the column labelled HACSE. This follows Newey and West (1987), also see Andrews (1991).

17.2.14 *R2 relative to difference and seasonals

The R2 is preceded by the seasonal means s of the first difference of the dependent variable (Δy for annual data, four quarterly means for quarterly data, twelve monthly means for monthly data etc.).

The R2 relative to difference and seasonals is a measure of the goodness of fit relative to ∑(Δyt-s)2 instead of ∑(yt-y)2 in the denominator of R2 (keeping ∑ût2 in the numerator), where s denotes the relevant seasonal mean. Despite its label, such a measure can be negative: if it is, the fitted model does less well than a regression of Δyt on seasonal dummies.

17.2.15 *Correlation matrix of regressors

This reports the sample means and sample standard deviations of the selected variables:

x= 1/T ∑t=1Txt,  s=( 1/(T-1) ∑t=1T( xt-x) 2)½.

The correlation matrix of the selected variables is reported as a lower-triangular matrix with the diagonal equal to one. Each cell records the simple correlation between the two relevant variables. The calculation of the correlation coefficient rxy between x and y is:

rxy= ∑t=1T( xt-x) ( yt-y) / (∑t=1T( xt-x) 2 ∑t=1T( yt-y) 2)½.
(eq:17.15)

17.2.16 *Covariance matrix of estimated parameters

The matrix of the estimated parameters' variances is reported as lower triangular. Along the diagonal, we have the variance of each estimated coefficient, and off the diagonal, the covariances. The k×k variance matrix of β̂ is estimated by:

V[ β̂] ̂=σ̂2( X'X) -1,
(eq:17.16)

where σ̂2 is the full-sample equation error variance. The variance-covariance matrix is only shown when requested, in which case it is reported before the equation output.

The remaining statistics only appear if observations were withheld for forecasting purposes:

17.2.17 1-step (ex post) forecast analysis

Following estimation over t=1,...,T, 1-step forecasts (or static forecasts) are given by:

[Note: Dynamic forecasts are needed when the xs are also predicted for the forecast period. Dynamic forecasts are also implemented, see §18.4, but the econometrics is discussed in Volume II (Doornik and Hendry, 2013c).]

ŷt=xt'β̂,  t=T+1,...,T+H,
(eq:17.17)

which requires the observations XH'=(xT+1,...,xT+H). The 1-step forecast error is the mistake made each period:

et=yt-xt'β̂,  t=T+1,...,T+H,
(eq:17.18)

which can be written as:

et=xt'β+ut-xt'β̂=xt'( β-β̂) +ut.
(eq:17.19)

Assuming that E[β̂]=β, then E[et]=0 and:

V[ et] =E[ et2] =E[ ( xt'( β-β̂) ) 2+ut2] =σu2xt'( X'X) -1xt+σu2.
(eq:17.20)

This corresponds to the results given for the innovations in recursive estimation. The whole vector of forecast errors is e=( eT+1,...,eT+H) '. V[e] is derived in a similar way:

V[ e] =σ2IH+XHV[ β̂] XH'=σu2( IH+XH( X'X) -1XH') .
(eq:17.21)

Estimated variances are obtained after replacing σu2 by σ̂u2.

The columns respectively report the date for which the forecast is made, the realized outcome (yt), the forecast (ŷt), the forecast error (et=yt-ŷt), the standard error of the 1-step forecast (SE( et) =√V[ et] ̂), and a t-value (that is, the standardized forecast error et/SE( et) ).

17.2.18 Forecast test

A χ2 statistic follows the 1-step analysis, comparing within and post-sample residual variances. Neither this statistic nor η3 below measure absolute forecast accuracy. The statistic is calculated as follows:

ξ1=∑t=T+1T+H et2/σ̂u2   app~ χ2( H) on H0.
(eq:17.22)

The null hypothesis is `no structural change in any parameter between the sample and the forecast periods' (denoted 1 and 2 respectively), H0: β1=β2, σ12=σ22. A rejection of the null hypothesis of constancy by ξ3 below implies a rejection of the model used over the sample period -- so that is a model specification test -- whereas the use of ξ1 is more as a measure of numerical parameter constancy, and it should not be used as a model-selection device (see Kiviet, 1986). However, persistently large values for this statistic imply that the equation under study will not provide very accurate ex ante predictions, even one step ahead. An approximate F-equivalent is given by:

η1= 1/H ξ1   app~ F( H,T-k) on H0.
(eq:17.23)

A second statistic takes parameter uncertainty into account, taking the denominator from (eq:17.20):

ξ2=∑t=T+1T+H et2/[ σ̂u2(1 + xt'( X'X) -1xt)]    app~ χ2( H) on H0.
(eq:17.24)

This test is not reported in single-equation modelling, but individual terms of the summation can be plotted in the graphical analysis.

17.2.19 Chow test

This is the main test of parameter constancy and has the form:

η3= [( RSST+H-RSST) /H] / [RSST/(T-k)]   app~ F( H,T-k) on H0
(eq:17.25)

where H0 is as for ξ1. For fixed regressors, the Chow (1960) test is exactly distributed as an F, but is only approximately (or asymptotically) so in dynamic models.

Alternatively expressed, the Chow test is:

η3=H-1ξ3=H-1e'( V[ e] ̂) -1e.
(eq:17.26)

We can now see the relation between ξ3 and ξ1: the latter uses V[ e] ̂=σ̂u2I, obtained by dropping the (asymptotically negligible) term V[β̂] in (eq:17.21). In small samples, the dropped term is often not negligible, so ξ1 should not be taken as a test. The numerical value of ξ1 always exceeds that of ξ3: the difference indicates the relative increase in prediction uncertainty arising from estimating, rather than knowing, the parameters.

PcGive computes the Chow test efficiently, by noting that:

σ̂u2e'( V[ e] ̂) -1e=e'( IH-XH( X'X+XH'XH) -1XH') e.
(eq:17.27)

17.2.20 t-test for zero forecast innovation mean (RLS only)

The recursive formulae are applicable over the sample T+1,...,T+H, and under the null of correct specification and H0 of ξ1 above, then the standardized innovations {νt/(ωt)1/2} are distributed as IN(0,σ2u). Thus:

H½ [ 1/H ∑t=T+1T+Hνt/(ωt)1/2] / σ̂u ~ t( H-1) on H0.
(eq:17.28)

This tests for a different facet of forecast inaccuracy in which the forecast errors have a small but systematic bias. This test is the same as an endpoint CUSUM test of recursive residuals, but using only the forecast sample period (see Harvey and Collier, 1977).

17.3 IV estimation

Here we write the model as:

yt=β0'yt*+β1'wt+εt,
(eq:17.29)

in which we have n-1 endogenous variables yt* and q1 non-modelled variables wt on the right-hand side (the latter may include lagged endogenous variables). We assume that we have q2 additional instruments, labelled wt*. Write yt=(yt:yt*')' for the n×1 vector of endogenous variables. Let zt denote the set of all instrumental variables (non-endogenous included regressors, plus additional instruments): zt=(wt':wt*')', which is a vector of length q=q1+q2.

17.3.1 *Reduced form estimates

The reduced form (RF) estimates are only printed on request. If Z'=( z1...zT) , and yt denotes all the n endogenous variables including yt at t with Y'=(y1,...,yT), then the RF estimates are:

Π̂'=( Z'Z) -1Z'Y,
(eq:17.30)

which is q×n. The elements of Π̂' relevant to each endogenous variable are written:

πi=( Z'Z) -1Z'Yi,  i=1,...,n,
(eq:17.31)

with Yi'=(yi1,...,yiT) the vector of observations on the ith endogenous variable. Standard errors etc. all follow as for OLS above (using Z, Yi for X,y in the relevant equations there).

17.3.2 Structural estimates

Generalized instrumental variables estimates for the k=n-1+q1 coefficients of interest β=(β0':β1')' are:

β̃=( X'Z( Z'Z) -1Z'X) -1X'Z( Z'Z) -1Z'y,
(eq:17.32)

using xt=(yt*':wt')', X'=( x1...xT) , y=(y1...yT)', which is the left-hand side of (eq:17.29), and Z is as in (eq:17.30). This allows for the case of more instruments than explanatory variables (q>k), and requires rank(X'Z)=k and rank(Z'Z)=q. If q=k the equation simplifies to:

β̃=( Z'X) -1Z'y.
(eq:17.33)

As for OLS, PcGive does not use expression (eq:17.32) directly, but instead uses the QR decomposition for numerically more stable computation. The error variance is given by

σ̃ε2 = ε̃'ε̃/(T-k) , where ε̃=y-Xβ̃.
(eq:17.34)

The variance of β̃ is estimated by:

V[ β̃] ̂=σ̃2ε ( X'Z( Z'Z) -1Z'X) -1.
(eq:17.35)

Again the output is closely related to that reported for least squares, except that the columns for HCSE, partial r2 and instability statistics are omitted. However, RSS, σ̃ and DW are recorded, as is the reduced form σ̂ (from regressing yt on zt, already reported with the RF equation for yt). Additional statistics reported are:

17.3.3 Specification χ2

This tests for the validity of the choice of the instrumental variables as discussed by Sargan (1964). It is asymptotically distributed as χ2(q2-n+1) when the q2-n+1 over-identifying instruments are independent of the equation error. It is also interpretable as a test of whether the restricted reduced form (RRF) of the structural model (yt on xt plus xt on zt) parsimoniously encompasses the unrestricted reduced form (URF: yt on zt directly):

[ π̂'( Z'Z) π̂-β̃'( X'Z( Z'Z) -1Z'X) β̃] / ( ε̃'ε̃/T)    app~ χ2(q2-n+1),
(eq:17.36)

with π̂=( Z'Z) -1Z'y being the unrestricted reduced form estimates.

17.3.4 Testing β=0

Reported is the χ2 test of β=0 (other than the intercept) which has a crude correspondence to the earlier F-test. On H0: β=0, the reported statistic behaves asymptotically as a χ2( k-1) . First define

ξβ =β̃'( X'Z( Z'Z) -1Z'X) β̃.
(eq:17.37)

Then ξβ /σ̃ε2    app~ χ2(k) would test whether all k coefficients are zero. To keep the intercept separate, we compute:

( ξβ -Ty2) / σ̃ε2    app~ χ2(k-1).
(eq:17.38)

This amounts to using the formula for β̃ (eq. (eq:17.32)) in ξβ with y-yι instead of y.

17.3.5 Forecast test

A forecast test is provided if H observations are retained for forecasting. For IVE there are endogenous regressor variables: the only interesting issue is that of parameter constancy and correspondingly the output is merely ξ1 of (eq:17.22) using σ̃ε and:

et=yt-xt'β̃,  t=T+1,...,T+H.
(eq:17.39)

Dynamic forecasts (which require forecasts of the successive xT+1,...,xT+H) could be obtained from multiple equation dynamic modelling, where the system as a whole is analyzed.

17.4 RALS estimation

As discussed in the typology, if a dynamic model has common factors in its lag polynomials, then it can be re-expressed as having lower-order systematic dynamics combined with an autoregressive error process (called COMFAC). If the autoregressive error is of rth order, the estimator is called rth-order Autoregressive Least Squares or RALS, and it takes the form:

β0( L) yt=∑i=1mβi( L) zit+ut with α( L) ut=εt,
(eq:17.40)

when:

α( L) =1-∑i=srαiLi.
(eq:17.41)

This can be written as:

yt=xt'β+ut,  ut=∑i=srαiut-i+εt,  t=1,...,T,
(eq:17.42)

with εt~IN( 0,σε 2) .

Maximizing:

f( β,α) =- 1/T ∑t=1Tεt2
(eq:17.43)

as a function of the ( β,α) parameters yields a non-linear least squares problem necessitating iterative solution. However, conditional on values of either set of parameters, f( .) is linear in the other set, so analytical first and second derivatives are easy to obtain. There is an estimator-generating equation for this whole class (see Hendry, 1976, Section 7), but as it has almost no efficient non-iterative solutions, little is gained by its exploitation. Letting θ denote all of the unrestricted parameters in β0( .) , {βi( .) } and α( .) , then the algorithm programmed in PcGive for maximizing f( .) as a function of θ is a variant of the Gauss--Newton class. Let:

q( θ) = ∂f/∂θ and Q=E[ qq'] ,
(eq:17.44)

so that negligible cross-products are eliminated, then at the ith iteration:

θi+1=θi-siQi-1qi,  i=0,...,I,
(eq:17.45)

where si is a scalar chosen by a line search procedure to maximize f(θi+1|θi). The convergence criterion depends on qi'Qi-1qi and on changes in θi between iterations. The bi-linearity of f( .) is exploited in computing Q.

17.4.1 Initial values for RALS

Before estimating by RALS, OLS estimates of {βi} are calculated, as are LM-test values of {αi}, where the prespecified autocorrelation order is `data frequency+1' (for example, 5 for quarterly data). These estimates are then used to initialize θ. However, the {αi} can be reset by users. Specifically, for single-order processes, ut=αrut-r+εt, then αr can be selected by a prior grid search. The user can specify the maximum number of iterations, the convergence tolerance, both the starting and ending orders of the polynomial α( L) in the form:

ut=∑i=srαiut-i+εt,

and whether to maximize f( .) sequentially over s, s+1,...,r or merely over the highest order, r.

17.4.2 Final estimates

On convergence, the variances of the θs are calculated (from Q-1), as are the roots of α( L) =0. The usual statistics for σ̂, RSS (this can be used in likelihood-ratio tests between alternative nested versions of a model), t-values etc. are reported, as is T-1∑( yt-y) 2 in case a pseudo-R2 statistic is desired.

17.4.3 Analysis of 1-step forecasts

Rewrite the RALS model as:

yt=xt'β+∑i=srαiut-i+εt
(eq:17.46)

with:

ŷt=xt'β̂+∑i=srα̂iût-i
(eq:17.47)

where β̂ and {α̂i} are obtained over 1,...,T. The forecast error is:

et=yt-ŷt=εt+xt'( β-β̂) +∑i=sr( αiut-i-α̂iût-i)
(eq:17.48)

or:

et=εt+xt'( β-β̂) +∑i=sr[ ( αi-α̂i) ût-i+αi( ut-i-ût-i) ] .
(eq:17.49)

Now:

ut-i-ût-i=( yt-i-xt-i'β) -( yt-i-xt-i'β̂) =-xt-i'( β-β̂) .
(eq:17.50)

Consequently:

et=εt+( xt'-∑i=srαixt-i') ( β-β̂) +∑i=sr( αi-α̂i) ût-i.
(eq:17.51)

Thus:

et=εt+xt+'( β-β̂) +ûr'( α-α̂) =εt+wt'( θ-θ̂) ,
(eq:17.52)

where we define xt+'=xt'-∑i=srαixt-i', ûr'=( ût-s...ût-r) , wt'=( xt+':ûr') , and θ'=( β':α') when α'=( αs...αr) . E[ et] ≈0 for a correctly-specified model. Finally, therefore (neglecting the second-order dependence of the variance of wt'(θ-θ̂) on θ̂ acting through wt):

V[ et] =σ2+wt'V[ θ̂] wt.
(eq:17.53)

V[θ̂] is the RALS variance-covariance matrix, and from the forecast-error covariance matrix, the 1-step analysis is calculated, as are parameter-constancy tests.

The output is as for OLS: the columns respectively report the date for which the forecast is made, the realized outcome (yt), the forecast (ŷt), the forecast error (et=yt-ŷt), the standard error of the 1-step forecast (SE( et) =√V[ et] ̂), and a t-value (that is, the standardized forecast error et/SE( et) ).

17.4.4 Forecast tests

The RALS analogues of the forecast test ξ1 of (eq:17.22), and of the Chow test η3 in (eq:17.26), are reported. The formulae follow directly from (eq:17.48) and (eq:17.53).

17.5 Non-linear modelling

17.5.1 Non-linear least squares (NLS) estimation

The non-linear regression model is written as

yt=f( xt,θ) +ut,  t=1,...,T, with ut~IN( 0,σu2) .
(eq:17.54)

We take θ to be a k×1 vector. For example:

yt=θ0+θ1xt^θ2+θ3zt^(1-θ2)+ut.

Note that for fixed θ2 this last model becomes linear; for example, for θ2= 1/2 :

yt=θ0+θ1xt*+θ3zt*+ut,  xt*=(xt)½,  zt*=(zt)½,

which is linear in the transformed variables xt*,  zt*. As for OLS, estimation proceeds by minimizing the sum of squared residuals:

θ̂= argmin θt=1Tut2= argmin θt=1T( yt-f( xt,θ) ) 2.
(eq:17.55)

In linear models, this problem has an explicit solution; for non-linear models the minimum has to be found using iterative optimization methods.

Instead of minimizing the sum of squares, PcGive maximizes the sum of squares divided by -T:

θ̂= argmax θg( θ|yt,xt) = argmax θ{- 1/T ∑t=1Tut2}.
(eq:17.56)

As for RALS, an iterative procedure is used to locate the maximum:

θi+1=θi+siQ( θi) -1q( θi) ,
(eq:17.57)

with q( .) the derivatives of g(.) with respect to θj (this is determined numerically), and Q( .) -1 a symmetric, positive definite matrix (determined by the BFGS method after some initial Gauss-Newton steps). Practical details of the algorithm are provided in §17.5.3; Volume II gives a more thorough discussion of the subject of numerical optimization. Before using NLS you are advised to study the examples given in the tutorial Chapter, to learn about the potential problems.

Output is as for OLS, except for the instability tests and HCSEs, which are not computed. The variance of the estimated coefficients is determined numerically; other statistics follow directly, for example:

σ̂u2= 1/(T-k) ∑t=1Tût2, with ût=yt-f( xt,θ̂) .
(eq:17.58)

Forecasts are computed and graphed, but the only statistic reported is the ξ1 test of (eq:17.22), using 1-step forecast errors:

et=yt-f( xt,θ̂) ,  t=T+1,...,T+H.
(eq:17.59)

17.5.2 Maximum likelihood (ML) estimation

We saw that for an independent sample of T observations and k parameters θ:

θ̂= argmax θl( θ|X) = argmax θt=1Tl( θ|xt) .
(eq:17.60)

This type of model can be estimated with PcGive, which solves the problem:

max θt=1Tl( θ|xt) .
(eq:17.61)

Models falling in this class are, for example, binary logit and probit, ARCH, GARCH, Tobit, Poisson regression. As an example, consider the linear regression model. PcGive gives three ways of solving this:

  1. direct estimation (OLS);

  2. numerical minimization of the residual sum of squares (NLS);

  3. numerical maximization of the likelihood function (ML).

Clearly, the first method is to be preferred when available.

Estimation of (eq:17.61) uses the same technique as NLS. The output is more concise, consisting of coefficients, standard errors (based on the numerical second derivative), t-values, t-probabilities, and `loglik' which is ∑t=1Tl(θ̂|xt). Forecasts are computed and graphed, but no statistics are reported.

17.5.3 Practical details

Non-linear models are formulated in algebra code. NLS requires the definition of a variable called actual, and one called fitted. It uses these to maximize minus the residual sum of squares divided by T:

- 1/T ∑t=1T (actualt - fittedt)2.

An example for NLS is:

actual = CONS;
fitted = &0 + &1 * INC + &2 * lag(INC,1);
&0  = 400;
&1  = 0.8;
&2  = 0.2;

This is just a linear model, and much more efficiently done using the normal options.

Models can be estimated by maximum likelihood if the log-likelihood can be written as a sum over the observations (note that the previous concentrated log-likelihood cannot be written that way!). An additional algebra line is required, to define a variable called loglik. PcGive maximizes:

t=1T loglikt .

Consider, for example, a binary logit model:

actual = vaso;
xbeta = &0 + &1 * Lrate + &2 * Lvolume;
fitted = 1 / (1 + exp(-xbeta));
loglik = actual * log(fitted) + (1-actual) * log(1-fitted);
&0  = 0.74;
&1  = 1.3;
&2  = 2.3;

Here actual and fitted are not really that, but these variables define what is being graphed in the graphic analysis.

Note that algebra is a vector language without temporary variables, restricting the class of models that can be estimated. Non-linear models are not stored for recall and progress reports.

After correct model specification, the method is automatically set to Non-linear model (using ML if loglik is defined, NLS/RNLS otherwise); in addition, the following information needs to be specified:

  1. Estimation sample.

  2. The number of forecasts; enter the number of observations you wish to withhold for forecasting.

  3. Whether to use recursive estimation, and if so, the number of observations you wish to use for initialization.

NLS and ML estimation (and their recursive variants RNLS and RML) require numerical optimization to maximize the likelihood log L( φ( θ) ) = l( φ( θ) ) as a non-linear function of θ. PcGive maximization algorithms are based on a Newton scheme:

θi+1 = θi+siQi-1qi
(eq:17.62)

with si a step length determined by a line search, qi the gradient of l evaluated at θi, and Qi-1 an approximation to the inverse of the Hessian; write δi=Qi-1qi for the resulting search direction.

PcGive uses the quasi-Newton method developed by Broyden, Fletcher, Goldfarb and Shanno (BFGS) to update K = Q-1 directly. It uses numerical derivatives to compute ∂l( φ( θ) ) / ∂θi. However, for NLS, PcGive will try Gauss--Newton before starting BFGS. In this hybrid method, Gauss--Newton is used while the relative progress in the function value exceeds 20%, after which the program switches to BFGS.

Starting values must be supplied. The starting value for K consists of 0s off-diagonal. The diagonal is the minimum of one and the inverse of the corresponding diagonal element in the matrix consisting of the sums of the outer-products of the gradient at the parameter starting values (numerically evaluated).

RNLS works as follows: starting values for θ and K for the first estimation (T-1 observations) are the full-sample values (T observations); then the sample size is reduced by one observation at a time, using the values at convergence from the previous step as starting values.

Owing to numerical problems it is possible (especially close to the maximum) that the calculated δi does not yield a higher likelihood. Then an si∈[0,1] yielding a higher function value is determined by a line search. Theoretically, since the direction is upward, such an si should exist; however, numerically it might be impossible to find one. When using BFGS with numerical derivatives, it often pays to scale the data so that the initial gradients are of the same order of magnitude.

The convergence decision is based on two tests. The first uses likelihood elasticities (∂l/∂ log θ):

|qi,jθi,j|≤ε for all j when θi,j≠0,
|qi,j|≤ε for all j with θi,j=0.
(eq:17.63)

The second is based on the one-step-ahead relative change in the parameter values:

|δi+1,j|≤10 ε |θi,j| for all j with θi,j≠0,
|δi+1,j|≤10 ε for all j when θi,j=0.
(eq:17.64)

The status of the iterative process is given by the following messages:

  1. No convergence!

  2. Aborted: no convergence!

  3. Function evaluation failed: no convergence!

  4. Maximum number of iterations reached: no convergence!

  5. Failed to improve in line search: no convergence!

    The step length si has become too small. The convergence test (eq:17.63) was not passed, using tolerance ε=ε2.

  6. Failed to improve in line search: weak convergence

    The step length si has become too small. The convergence test (eq:17.63) was passed, using tolerance ε=ε2.

  7. Strong convergence

    Both convergence tests (eq:17.63) and (eq:17.64) were passed, using tolerance ε=ε1.

The chosen default values for the tolerances are:

ε1=10-4, ε2=5 ×10-3.
(eq:17.65)

You can:

  1. set the initial values of the parameters to zero or the previous values;

  2. set the maximum number of iterations;

  3. write iteration output;

  4. change the convergence tolerances ε1 and ε2. Care must be exercised with this: the defaults are `fine-tuned'; some selections merely show the vital role of sensible choices!

NOTE 1:
non-linear estimation can only continue after convergence.

NOTE 2:
Restarting the optimization process leads to a Hessian reset.

Chapter 18 Model Evaluation Statistics

18.1 Graphic analysis

Graphic analysis focuses on graphical inspection of individual equations. Let yt, ŷt denote respectively the actual (that is, observed) values and the fitted values of the selected equation, with residuals ût=yt-ŷt, t=1,...,T. When H observations are retained for forecasting, then ŷT+1,...,ŷT+H are the 1-step forecasts. NLS/RNLS/ML use the variables labelled `actual' and `fitted' for yt, ŷt.

Fourteen different graphs are available:

  1. Actual and fitted values

    (yt,ŷt) over t. This is a graph showing the fitted (ŷt) and actual values (yt) of the dependent variable over time, including the forecast period.

  2. Cross-plot of actual and fitted

    ŷt against yt, also including the forecast period.

  3. Residuals (scaled)

    ( ût/σ̂) over t, where σ̂2=(T-k)-1RSS is the full-sample equation error variance: the graph shows the scaled residuals ût/σ̂ over time.

  4. Forecasts and outcomes

    The 1-step forecasts can be plotted in a graph over time: yt and ŷt are shown with error bars of ±2SE( et) centered on ŷt (that is, an approximate 95% confidence interval for the 1-step forecast); et are the forecast errors.

  5. Residual density and histogram

    Plots the histogram of the standardized residuals ût/√(T-1RSS), t=1,...,T, the estimated density fu(.)̂ and a normal distribution with the same mean and variance (more details are in §16.10).

  6. Residual autocorrelations (ACF)

    This plots the residual autocorrelations using ût as the xt variable in (eq:18.13).

  7. Residual partial autocorrelations (PACF)

    This plots the partial autocorrelation function (see §16.7)--the same graph is used if the ACF is selected.

  8. Forecasts Chow tests

    If available, the individual Chow χ2(1) tests (see (eq:17.24)) are plotted.

  9. Residuals (unscaled)

    ( ût) over t;

  10. Residual spectrum

    This plots the estimated spectral density (see §16.9) using ût as the xt variable.

  11. Residual QQ plot against N(0,1)

    Shows a QQ plot of the residuals, see §16.11.

  12. Residual density

    The non-parametrically estimated density fu(.)̂ of the standardized residuals ût/√(T-1RSS), t=1,...,T is graphed using the settings described in the OxMetrics book.

  13. Histogram

    This plots the histogram of the standardized residuals ût/√(T-1RSS), t=1,...,T--the same graph is used if the density is selected.

  14. Residual distribution (normal quantiles)

    Plots the distribution based on the non-parametrically estimated density.

The residuals can be saved to the database for further inspection.

18.2 Recursive graphics (RLS/RIVE/RNLS/RML)

Recursive methods estimate the model at each t for t=M-1,...,T. The output generated by the recursive procedures is most easily studied graphically, possibly using the facility to view multiple graphs together on screen. The dialog has a facility to write the output to the editor, instead of graphing it. The recursive estimation aims to throw light on the relative future information aspect (that is, parameter constancy).

Let β̂t denote the k parameters estimated from a sample of size t, and yj-xj'β̂t the residuals at time j evaluated at the parameter estimates based on the sample 1,...,t (for RNLS the residuals are yj-f(xj,β̂t)).

We now consider the generated output:

  1. Beta coefficient ±2 Standard Errors

    The graph shows β̂it±2SE(β̂it) for each selected coefficient i  ( i=1,...,k) over t=M,...,T.

  2. Beta t-value

    β̂it/SE(β̂it) for each selected coefficient i  ( i=1,...,k) over t=M,...,T.

  3. Residual sum of squares

    The residual sum of squares at each t is RSSt=∑j=1t(yj-xj'β̂t)2 for t=M,...,T.

  4. 1-Step residuals ±2σ̂t

    The 1-step residuals yt-xt'β̂t are shown bordered by 0±2σ̂t over M,...,T. Points outside the 2 standard-error region are either outliers or are associated with coefficient changes.

  5. Standardized innovations

    The standardized innovations (or standardized recursive residuals) for RLS are:

    νt=(yt-xt'β̂t-1)/(ωt) 1/2 where ωt=1+xt'( Xt-1'Xt-1) -1xt for t=M,...,T.

    σ2ωt is the 1-step forecast error variance of (eq:17.20), and β̂M-1 are the coefficient estimates from the initializing OLS estimation.

  6. 1-Step Chow tests

    1-step forecast tests are F( 1,t-k-1) under the null of constant parameters, for t=M,...,T. A typical statistic is calculated as:

    ( RSSt-RSSt-1) ( t-k-1) / RSSt-1 = ( νt2/ωt) / σ̂t-12 .
    (eq:18.1)

    Normality of yt is needed for this statistic to be distributed as an F.

  7. Break-point Chow tests

    Break-point F-tests are F( T-t+1,t-k-1) for t=M,...,T. These are, therefore, sequences of Chow tests and are also called N↓ because the number of forecasts goes from N=T-M+1 to 1. When the forecast period exceeds the estimation period, this test is not necessarily optimal relative to the covariance test based on fitting the model separately to the split samples. A typical statistic is calculated as:

    ( RSST-RSSt-1) ( t-k-1) / [ RSSt-1(T-t+1)] = [ 1/(T-t+1) ∑m=tTνm2/ωm] / σ̂t-12 .
    (eq:18.2)

    This test is closely related to the CUSUMSQ statistic in Brown, Durbin, and Evans (1975).

  8. Forecast Chow tests

    Forecast F-tests are F( t-M+1,M-k-1) for t=M,...,T, and are called N↑ as the forecast horizon increases from M to T. This tests the model over 1 to M-1 against an alternative which allows any form of change over M to T. A typical statistic is calculated as:

    ( RSSt-RSSM-1) ( M-k-1) / [ RSSM-1(t-M+1)] .
    (eq:18.3)

The statistics in (eq:18.1)--(eq:18.3) are variants of Chow (1960) tests: they are scaled by one-off critical values from the F-distribution at any selected probability level, as an adjustment for changing degrees of freedom, so that the significant critical values become a straight line at unity. Note that the first and last values of (eq:18.1) respectively equal the first value of (eq:18.3) and the last value of (eq:18.2).

The Chow test statistics are not calculated for RIVE/RML; the recursive RSS is not available for RML.

18.3 Dynamic analysis

The general class of models estimable in PcGive can be written in the form:

b0( L) yt=∑i=1qbi( L) zit+εt
(eq:18.4)

where b0( L) and the bi( L) are polynomials in the lag operator L. Now q+1 is the number of distinct variables (one of which is yt), whereas k remains the number of estimated coefficients. For simplicity we take all polynomials to be of length m:

bi(L) = ∑j=0m bij Lj,  i=0,...,q.

With b00=1 and using a(L)=-∑j=1mb0jLj-1 we can write (eq:18.4) as:

yt = a(L) yt-1 + ∑i=1q bi(L) zit + εt.
(eq:18.5)

Finally, we use a=(b01,...,b0m)' and bi=(bi0,...,bim), i=1,...,q.

In its unrestricted mode of operation, PcGive can be visualized as analyzing the polynomials involved, and it computes such functions as their roots and sums. This option is available if a general model was initially formulated, and provided OLS or IVE was selected.

18.3.1 Static long-run solution

When working with dynamic models, concepts such as equilibrium solutions, steady-state growth paths, mean lags of response etc. are generally of interest. In the simple model:

yt = β0 zt + β1 zt-1 + α1 yt-1 + ut,
(eq:18.6)

where all the variables are stationary, a static equilibrium is defined by:

E[ zt] =z* for all t

in which case, E[ yt] =y* will also be constant if |α1|<1, and yt will converge to:

y* = K z* where K = (β0 + β1)/(1 - α1).
(eq:18.7)

For non-stationary but cointegrated data, reinterpret expression (eq:18.7) as E[ yt-Kzt] =0.

PcGive computes estimates of K and associated standard errors. These are called static long-run parameters. If b0( 1) ≠0, the general long-run solution of (eq:18.4) is given by:

y* = ∑i=1q [bi(1)/b0(1)] zi* = ∑i=1q Ki zi*.
(eq:18.8)

The expression yt - ∑Ki zit is called the equilibrium-correction mechanism (ECM) and can be stored in the data set. If common-factor restrictions of the form bj(L) = α(L)γj(L), j=0,...,q, are imposed, then α(1) will cancel; hence enforced autoregressive error representations have no impact on derived long-run solutions.

The standard errors of K̂ = (K̂1,...,K̂q)' are calculated from:

V̂[K̂] = Ĵ V̂[β̂] Ĵ', where J = ∂K/∂β'.
(eq:18.9)

PcGive calculates J analytically using the algorithm proposed by Bårdsen (1989).

PcGive outputs the solved static long-run equation, with standard errors of the coefficients. This is followed by a Wald test of the null that all of the long-run coefficients are zero (except the constant term). The V̂[K̂] matrix is printed when `covariance matrix of estimated coefficients' is checked under the model options.
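
The delta method of (eq:18.9) can be illustrated as follows. The sketch below (Python with numpy; illustrative only) computes K̂i = b̂i(1)/b̂0(1) and delta-method standard errors, but uses a numerical Jacobian rather than Bårdsen's analytical expression; the stacking of the coefficients in beta and the function names are assumptions:

    import numpy as np

    def long_run(beta, V, q, m):
        """Illustrative static long-run solution. beta stacks the m lagged
        dependent-variable coefficients a = (a_1,...,a_m), followed by q
        regressor polynomials b_i of m+1 coefficients each; V is the
        estimated covariance matrix of beta. Returns K and delta-method
        standard errors using a numerical Jacobian."""
        beta = np.asarray(beta, dtype=float)
        def K(theta):
            denom = 1.0 - np.sum(theta[:m])        # b_0(1)
            out = np.empty(q)
            for i in range(q):
                bi = theta[m + i*(m+1): m + (i+1)*(m+1)]
                out[i] = np.sum(bi) / denom        # K_i = b_i(1)/b_0(1)
            return out
        J = np.empty((q, beta.size))
        h = 1e-6
        for j in range(beta.size):                 # central-difference Jacobian
            e = np.zeros(beta.size); e[j] = h
            J[:, j] = (K(beta + e) - K(beta - e)) / (2 * h)
        VK = J @ V @ J.T                           # (eq:18.9)
        return K(beta), np.sqrt(np.diag(VK))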

18.3.2 Analysis of lag structure

The b̂i(L), i=0,...,q, of (eq:18.4) and their standard errors are reported in tabular form, together with the b̂i(1) (their row sums) and associated standard errors.

18.3.2.1 Tests on the significance of each variable

The first column contains F-tests of each of the q+1 hypotheses:

Hv0:a=0;  Hvi:bi=0  for i=1,...,q.

These test the significance of each basic variable in turn. The final column gives the PcGive unit-root tests:

Hui:bi( 1) =0  for i=0,...,q.

If Hui: bi(1)=0 cannot be rejected, there is no significant long-run level effect from zit; if Hvi: bi=0 cannot be rejected, there is no significant effect from zit at any (included) lag. Significance is marked by * for 5% and ** for 1%. Critical values for the PcGive unit-root test (Hu0: b0(1)=0) are based on Ericsson and MacKinnon (2002). For the unit-root test, only the significance of the dependent variable is reported (not that of the remaining variables!).

Conflicts between the tests' outcomes are possible in small samples.

Note that bi( 1) =0 and bi=0 are not equivalent; testing Ki=0 is different again. Using (eq:18.6) we can show the relevant hypotheses:

significance of each variable: Hv0: α1=0;  Hv1: β0=β1=0,
PcGive unit-root test: Hu0: α1-1=0,
additional unit-root tests: Hu1: β0+β1=0,
t-values from static long run: Hl: (β0+β1)/(1-α1)=0.

18.3.2.2 Tests on the significance of each lag

F-tests of each lag length are shown, beginning at the longest (m) and continuing down to 1. The test of the longest lag is conditional on keeping lags (1,...,m-1), that of (m-1) is conditional on (1,...,m-2,m), etc.

18.3.3 Tests on the significance of all lags

Finally, F-tests of all lags up to m are shown, beginning with the longest set (1,...,m) and continuing from (2,...,m) down to (m,...,m). These tests are conditional on keeping no lags, keeping lag 1, ..., down to keeping (1,...,m-1). Thus, they show the marginal significance of all longer lags.

18.3.4 COMFAC tests

COMFAC tests for the legitimacy of common-factor restrictions of the form:

[Note: Using Sargan's Wald algorithm (see Hendry, Pagan, and Sargan, 1984, and Sargan, 1980b). Note that this non-linear Wald test is susceptible to the formulation of the restrictions, and so depends on the order of the variables.]

α(L) b0*(L) yt = α(L) ∑i=1k bi*(L) xit + ut
(eq:18.10)

where α(L) is of order r and * denotes polynomials of the original order minus r. The degrees of freedom for the Wald tests for COMFAC are equal to the number of restrictions imposed by α(L), and the Wald statistics are asymptotically χ2 with these degrees of freedom if the COMFAC restrictions are valid. It is preferable to use the incremental values obtained by subtracting successive values of the Wald tests. These are χ2 also, with degrees of freedom given by the number of additional restrictions. Failure to reject common-factor restrictions does not entail that such restrictions must be imposed. For a discussion of the theory of COMFAC, see Hendry and Mizon (1978); for some finite-sample Monte Carlo evidence, see Mizon and Hendry (1980). COMFAC is not available for RALS.

When the minimum order of lag length in the bi( L) is unity or larger (m say), the Wald test sequence for 1,2,...,m common factors is calculated. Variables that are redundant when lagged (Constant, Seasonals, Trend) are excluded in conducting the Wald test sequence since they always sustain a common-factor interpretation.

18.3.5 Lag weights

Consider the simple model:

(1 - α1L) yt = (β0 + β1L) zt + ut.
(eq:18.11)

With |α1|<1 this can be written as:

yt=w( L) zt+vt,

where:

w(L) = (β0 + β1L)/(1 - α1L) = (β0 + β1L)(1 + α1L + α1²L² + ...).

Starting from an equilibrium z* at t=0, a one-off increment of δ to z* has an impact on y* at t=0,1,2,... of w0δ, w1δ, w2δ, w3δ,... with the ws defined by equating coefficients of powers of L as:

w0 = β0,   w1 = β1 + β0α1,   w2 = α1w1,   w3 = α1w2, ...

PcGive can graph the normalized lag weights w0/w(1), w1/w(1), ..., ws/w(1) and the cumulative normalized lag weights w0/w(1), (w0+w1)/w(1), ..., (w0+...+ws)/w(1).

Lag weights are available for models estimated by OLS or IVE.
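
The recursion for the lag weights can be sketched as follows (Python with numpy; illustrative, with hypothetical names). It equates coefficients of powers of L in w(L)(1 - a(L)) = b(L), then normalizes by w(1):

    import numpy as np

    def lag_weights(b, a, s):
        """Illustrative lag weights w_0,...,w_s of w(L) = b(L)/(1 - a(L)),
        with b = (b_0,...,b_p) and a = (a_1,...,a_r), cf. (eq:18.11)."""
        w = np.zeros(s + 1)
        for j in range(s + 1):
            w[j] = b[j] if j < len(b) else 0.0
            for i in range(1, min(j, len(a)) + 1):
                w[j] += a[i-1] * w[j - i]      # w_j = b_j + sum_i a_i w_{j-i}
        w1 = np.sum(b) / (1.0 - np.sum(a))     # w(1): the long-run multiplier
        return w / w1, np.cumsum(w) / w1       # normalized and cumulative weights

    # example: (1 - 0.5L) y_t = (1 + 0.2L) z_t + u_t
    norm, cum = lag_weights(b=[1.0, 0.2], a=[0.5], s=10)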

18.4 Dynamic forecasting

Static forecasts, §17.2.17, can only be made ex post: only observed data is used in the construction of the static forecasts. Genuine forecasts can be made ex ante, using past data only. In a dynamic model this means that the future values of the lagged dependent variable are also forecasts. Moreover, other regressors must be known or extrapolated into the forecast period.

Suppose we estimated a simple autoregressive model with just a mean:

ŷt = α̂ yt-1 + μ̂,

with the parameters estimated over the sample 1,...,T. Then the first forecast is the same as the static forecast:

ŷT+1|T = α̂ yT + μ̂.

The second forecast is a dynamic forecast:

ŷT+2|T = α̂ ŷT+1|T + μ̂.

When there are additional regressors in the model:

ŷt = α̂ yt-1 + μ̂ + xt'β̂,

the forecast at T+h needs xT+h. This is readily available for deterministic regressors such as the intercept, seasonals, and trend. Otherwise it has to be constructed, or the model changed into a multivariate model that is entirely closed. The standard errors of the forecast need to take into account that the lagged dependent variables themselves are forecasts. The econometrics of this is discussed in Volume II (Doornik and Hendry, 2013c). Extensive treatments of forecasting can be found in Clements and Hendry (1998) and Clements and Hendry (2011).

If the dynamic forecasts are made ex post, lagged dependent variables remain forecasted values (and not the actual values, even though these are known). However, in that case all other regressors are actual values. Moreover, forecast errors can then be computed, with forecast accuracy expressed in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE):

RMSE = [(1/H) ∑h=1H (yT+h - ŷT+h|T)²]½,

and

MAPE = (100/H) ∑h=1H |(yT+h - ŷT+h|T)/yT+h|.
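
A minimal sketch of the two accuracy measures (Python with numpy; the function name is hypothetical):

    import numpy as np

    def forecast_accuracy(actual, forecast):
        """RMSE and MAPE over an H-period forecast horizon."""
        actual, forecast = np.asarray(actual), np.asarray(forecast)
        e = actual - forecast
        rmse = np.sqrt(np.mean(e ** 2))
        mape = 100.0 * np.mean(np.abs(e / actual))   # requires actual != 0
        return rmse, mape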

18.4.0.0.1 Forecast type

There is a choice between dynamic forecasts (the default) and static forecasts. Static or 1-step forecasts can be obtained by selecting h-step forecasts and setting h=1. Selecting a larger h uses forecasted y's up to lag h-1, but actual y's from lag h onwards.

18.4.0.0.2 Forecast standard errors

The default is to base the standard errors on the error variance only, thus ignoring the contribution from the fact that the parameters are estimated and so uncertain. It is possible to take the parameter uncertainty into account, but this is usually small relative to the error uncertainty.

18.4.0.0.3 Hedgehog plots

Hedgehog plots graph the forecasts starting from every point in the estimation sample. They are called hedgehog plots because the graph often looks like one, with all forecast paths spiking upwards (or downwards for an inverted hedgehog).

If H is the forecast horizon, then one forecast path is:

ŷt+1|t, ŷt+2|t, ..., ŷt+H|t,

starting at observation t+1 and using the estimated parameters from the full sample 1,...,T. The hedgehog plot graphs all paths for t=s,...,T.

After recursive estimation, the hedgehog plot uses recursively estimated parameters. In that case the forecast path ŷt+1|t, ..., ŷt+H|t uses parameters estimated over 1,...,t.

The hedgehog graphs are displayed in the Hedgehog window. If robust forecasts are requested, these will appear in Hedgehog - robust.

18.4.0.0.4 Start forecast later

Optionally, a gap G can be specified to delay forecasting (this does not affect the hedgehog graphs). For the simple AR(1) model:

ŷT+G+1|T = α̂ yT+G + μ̂,
ŷT+G+2|T = α̂ ŷT+G+1|T + μ̂.

When new data become available, we can then compare the existing model, which starts forecasting from the new data, with the re-estimated model, which incorporates the new data.

18.4.0.0.5 Robust forecasts

Robust forecasts take the differenced model, forecast, and then re-integrate. If the estimated model is:

ŷt = α̂(L) yt + μ̂ + xt'β̂,

then after differencing:

Δŷt = α̂(L) Δyt + Δxt'β̂,

we obtain dynamic forecasts of the differences:

ΔŷT+1|T, ..., ΔŷT+H|T.

Re-integration gives:

ŷrT+1|T = yT + ΔŷT+1|T
ŷrT+2|T = ŷrT+1|T + ΔŷT+2|T
...

The estimated intercept disappears in the differencing, and instead we use the most recent level (similarly, a trend becomes an intercept, which is then reintegrated, etc.). If there was a recent break in the mean, forecasts using the full-sample mean will be less accurate than those using the most recent level. Therefore the forecasts from the differenced model are robust to breaks, at least to some extent. The price to pay in the absence of breaks is that the forecasts will be noisier.
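
As an illustration, the following sketch (Python with numpy; names hypothetical) produces robust forecasts for an AR(1) fitted in first differences, assuming the estimated coefficient alpha is given and there are no further regressors: it forecasts the differences dynamically, then re-integrates from the last observed level:

    import numpy as np

    def robust_forecasts(y, alpha, H):
        """Illustrative robust forecasts: difference, forecast, re-integrate.
        Assumes an AR(1) fitted in first differences with coefficient alpha
        and no other regressors."""
        dy = np.diff(y)
        d_fc = np.empty(H)
        d_last = dy[-1]
        for h in range(H):              # dynamic forecasts of the differences
            d_last = alpha * d_last
            d_fc[h] = d_last
        return y[-1] + np.cumsum(d_fc)  # re-integrate from the last level y_T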

Another form of robust forecasting is the double-differenced device (DDD). The DDD is based on the observation that most economic time series do not continuously accelerate. It amounts to setting the second differences (of the logarithms) to zero, so no estimation is involved. This can be achieved in PcGive by creating ΔΔyt in the database, and then formulating an empty model for this. An alternative would be to use ΔΔS yt when there is seasonality and the data frequency is S. More information is in Clements and Hendry (2011).

18.4.0.0.6 Level forecasts

Models for economic variables are often formulated in terms of growth rates: denoting the level by Yt, the dependent variable is then the first difference of the logarithm: yt = Δ log Yt. The objective of the transformation is to model an (approximately) stationary representation of the dependent variable. But when it comes to forecasting, it is often useful to present the results in the original levels.

PcGive automatically recognizes the following dynamic transformations of the dependent variable:

Δyt, ΔS yt, ΔΔyt, ΔΔS yt, yt (undifferenced),

where S is the frequency of the data. Lagged dependent variables are taken into account, as are Δyt, ΔS yt if they appear on the right-hand side in a model for a higher difference.

In addition, the following functional transformations are detected:

log Yt,   logit Yt = log[Yt/(1 - Yt)],   Yt (untransformed),

together with an optional scale factor.

If the model fits in this mould, the level forecasts can be generated automatically by PcGive. First, the dynamic transformations are substituted out to give forecasts

ŷT+1|T, ..., ŷT+H|T,

with corresponding standard deviations

ŝT+1|T, ..., ŝT+H|T.

Because the differenced model assumes normality, these are still normally distributed. Removing one level of differences makes the standard errors grow linearly, etc.

There are two types of level forecasts, median and mean, which are identical if no functional transformation is used. They differ, however, for logarithmic transformations:

Median forecasts
are easily derived from the inverse transformation:

yt = log Yt, then ŶT+h|T = exp(ŷT+h|T),
yt = logit Yt, then ŶT+h|T = [1 + exp(-ŷT+h|T)]-1.

Mean forecasts
when log Yt is normally distributed, Yt is log-normal. Similarly, when logit Yt is normally distributed, Yt has the logit-normal distribution. Both are discussed by Johnson (1949).

For the log-normal, when yT+h|T ~ N[ŷT+h|T, ŝ²T+h|T], then

E[YT+h|T] = exp(ŷT+h|T + ½ŝ²T+h|T).

The equivalent expression for the logit-normal can be found in Johnson (1949, eqn.56) and is not quite so simple.

[Note: PcGive uses this expression for ŝT+h|T > 1/3; otherwise a third-order Taylor expansion is used. So we do not exactly follow Wallis (1987), who advocates a second-order Taylor expansion. The first reason is that the Taylor expansion is highly inaccurate for ŝT+h|T > 1. The second reason is that we also wish to report the standard error of the logit-normally distributed forecasts.]

The quantiles of the log-normal and logit-normal are simply derived from the inverse distribution. This is used in the plots for the 5% and 95% confidence bands:

yt = log Yt, then exp(ŷT+h|T ± 2ŝT+h|T),
yt = logit Yt, then [1 + exp(-ŷT+h|T ∓ 2ŝT+h|T)]-1.

These bands will not be symmetric around the mean (or median) forecasts.

The standard errors of the level forecasts are also reported. In the log-normal case these are:

sd[YT+h|T] = exp(ŷT+h|T + ½ŝ²T+h|T)(exp(ŝ²T+h|T) - 1)½.

For the logit-normal distribution we refer to Johnson (1949, eqn.58).

[Note: The required derivative is computed using a simple finite difference approximation.]
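
For the logarithmic case, the median and mean level forecasts and the quantile-based bands can be sketched as follows (Python with numpy; illustrative only, and without the Taylor-expansion refinement PcGive applies in the logit case):

    import numpy as np

    def level_forecasts_log(yhat, s):
        """Median and mean level forecasts, with 2-standard-deviation bands,
        when the model is for y_t = log Y_t and yhat, s are the forecasts
        and their standard deviations on the log scale (assumed normal)."""
        yhat, s = np.asarray(yhat), np.asarray(s)
        median = np.exp(yhat)                     # inverse transformation
        mean = np.exp(yhat + 0.5 * s**2)          # log-normal mean
        sd = mean * np.sqrt(np.exp(s**2) - 1.0)   # log-normal standard deviation
        lower, upper = np.exp(yhat - 2*s), np.exp(yhat + 2*s)
        return median, mean, sd, lower, upper

Note that the bands are asymmetric around both the median and the mean forecasts, as stated in the text.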

18.4.0.0.7 Derived function

OxMetrics Algebra expressions can be used for derived functions. For example, the cum() function, together with the appropriate initial conditions, maps back from a first difference, and exponentiation maps back from logarithms. In this case, the forecast standard errors are derived numerically.

18.5 Diagnostic tests

18.5.1 Introduction

Irrespective of the estimator selected, a wide range of diagnostic tests is offered. Tests are available for residual autocorrelation, conditional heteroscedasticity, normality, unconditional heteroscedasticity/functional form mis-specification and omitted variables. Recursive residuals can be used if these are available. Tests for common factors and linear restrictions are discussed in §18.3.4 and §18.6 below, encompassing tests in §18.10. Thus, relating this section to the earlier information taxonomy, the diagnostic tests of this section concern the past (checking that the errors are a homoscedastic, normal, innovation process relative to the information available), whereas the forecast statistics discussed in Chapter 17 concern the future, and encompassing tests concern information specific to rival models.

Many test statistics in PcGive have either a χ2 distribution or an F distribution. F-tests are usually reported as:

     F(num,denom)  =  Value  [Probability]  /*/**

for example:

     F(1, 155)     =  5.0088 [0.0266] *

where the test statistic has an F-distribution with one degree of freedom in the numerator, and 155 in the denominator. The observed value is 5.0088, and the probability of getting a value of 5.0088 or larger under this distribution is 0.0266. This is less than 5% but more than 1%, hence the star. Significant outcomes at a 1% level are shown by two stars. χ2 tests are also reported with probabilities, as for example:

    Normality Chi^2(2) = 2.1867 [0.3351]

The 5% χ2 critical value with two degrees of freedom is 5.99, so here normality is not rejected (alternatively, Prob(χ2 ≥ 2.1867) = 0.3351, which is more than 5%). Details on the computation of probability values and quantiles for the F and χ2 tests are given under the probf, probchi, quanf and quanchi functions in the Ox reference manual (Doornik, 2013).

Some tests take the form of a likelihood ratio (LR) test. If l is the unrestricted, and l0 the restricted log-likelihood, then -2(l0-l) has a χ2(s) distribution, with s the number of restrictions imposed (so model l0 is nested in l).

Many diagnostic tests are calculated through an auxiliary regression. For single-equation tests, they take the form of TR2 for the auxiliary regression so that they are asymptotically distributed as χ2( s) under their nulls, and hence have the usual additive property for independent χ2s. In addition, following Harvey (1990) and Kiviet (1986), F-approximations are calculated because they may be better behaved in small samples:

[R²/(1 - R²)] · [(T-k-s)/s] ~ F(s, T-k-s)
(eq:18.12)

When the covariance matrix is block diagonal between regression and heteroscedasticity (or ARCH) function parameters, tests can take the regression parameters as given, see Davidson and MacKinnon (1993, Ch. 11):

[R²/(1 - R²)] · [(T-s)/s] ~ F(s, T-s).

This may be slightly different if not all parameters are included in the test, or when observations are lost in the construction of the test.
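
A generic version of such an auxiliary-regression diagnostic can be sketched as follows (Python with numpy and scipy; illustrative, with hypothetical names). It returns the TR² form and the F-approximation of (eq:18.12):

    import numpy as np
    from scipy import stats

    def aux_regression_test(u, W, k):
        """Generic diagnostic from an auxiliary regression of the residuals
        u on the T x s matrix W (plus an intercept): returns the TR^2 chi^2
        form and the F-approximation of (eq:18.12); k is the number of
        parameters in the original model."""
        T, s = W.shape
        Z = np.column_stack([np.ones(T), W])
        b, *_ = np.linalg.lstsq(Z, u, rcond=None)
        e = u - Z @ b
        d = u - u.mean()
        r2 = 1.0 - (e @ e) / (d @ d)
        chi2 = T * r2                                  # TR^2 ~ chi^2(s)
        F = (r2 / (1.0 - r2)) * (T - k - s) / s        # (eq:18.12)
        return chi2, stats.chi2.sf(chi2, s), F, stats.f.sf(F, s, T - k - s)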

18.5.2 Residual autocorrelations (ACF), Portmanteau and DW

The sample autocorrelation function (ACF) of a variable xt is the series {rj} where rj is the correlation coefficient between xt and xt-j for j = 1,...,s:

[Note: Older versions of PcGive (version 9 and before) used the running mean in the denominator. The difference with the current definition tends to be small, and vanishes asymptotically, provided the series is stationary. Nielsen (2006b) calls this version the correlogram, and the ACF the covariogram. He argues that the correlogram provides a better discrimination between stationary and non-stationary variables: for an autoregressive value of one (or higher), the correlogram declines more slowly than the ACF.]

rj = [∑t=j+1T (xt - x̄)(xt-j - x̄)] / [∑t=1T (xt - x̄)²].
(eq:18.13)

Here x̄ = (1/T) ∑t=1T xt is the sample mean of xt.

The residual correlogram is defined as above, but using the residuals from the econometric regression, rather than the data. Thus, this reports the series {rj} of correlations between the residuals ût and ût-j. In addition, PcGive prints the partial autocorrelation function (PACF) (see the OxMetrics book).

It is possible to calculate a statistic based on `T*(sum of s squared autocorrelations)', with s the length of the correlogram, called the Portmanteau statistic:

LB(s) = T² ∑j=1s rj²/(T-j).
(eq:18.14)

This corresponds to Box and Pierce (1970), but with a degrees-of-freedom correction as suggested by Ljung and Box (1978). It is designed as a goodness-of-fit test in stationary, autoregressive moving-average models. Under the assumptions of the test, LB(s) is asymptotically distributed as χ2(s-n) after fitting an AR(n) model. A value such that LB(s) ≥ 2s is taken as indicative of mis-specification for large s. However, small values of such a statistic should be treated with caution since residual autocorrelations are biased towards zero (like DW) when lagged dependent variables are included in econometric equations. An appropriate test for residual autocorrelation is provided by the LM test in §18.5.3 below.
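
The ACF of (eq:18.13) and the Portmanteau statistic of (eq:18.14) are straightforward to compute; a minimal sketch (Python with numpy; function names hypothetical):

    import numpy as np

    def acf(x, s):
        """Sample autocorrelations r_1,...,r_s as in (eq:18.13)."""
        d = np.asarray(x, dtype=float)
        d = d - d.mean()
        denom = d @ d
        return np.array([(d[j:] @ d[:-j]) / denom for j in range(1, s + 1)])

    def portmanteau(x, s):
        """Portmanteau statistic LB(s) of (eq:18.14)."""
        T = len(x)
        r = acf(x, s)
        return T**2 * np.sum(r**2 / (T - np.arange(1, s + 1)))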

18.5.2.1 Durbin--Watson statistic (DW)

This is a test for autocorrelated residuals and is calculated as:

DW = [∑t=2T (ût - ût-1)²] / [∑t=1T ût²].
(eq:18.15)

DW is most powerful as a test of {ut} being white noise against:

ut = ρ ut-1 + εt where εt ~ IID(0, σε²).

If 0<DW<2, then the null hypothesis is H0: ρ=0, that is, zero autocorrelation (so DW=2) and the alternative is H1: ρ>0, that is, positive first-order autocorrelation.

If 2<DW<4, then H0: ρ=0 and H1: ρ<0, in which case DW*=4-DW should be computed.

The significance values of DW are widely recorded in econometrics textbooks. However, DW is a valid statistic only if all the xt variables are non-stochastic, or at least strongly exogenous. If the model includes a lagged dependent variable, then DW is biased towards 2, that is, towards not detecting autocorrelation, and Durbin's h-test (see Durbin, 1970) or the equivalent LM test for autocorrelation in §18.5.3 should be used instead. For this reason, we largely stopped reporting the DW statistic. Also see §16.4.

18.5.3 Error autocorrelation test (not for RALS, ML)

This is the Lagrange-multiplier test for rth-order residual autocorrelation, distributed as χ2(r) in large samples, under the null hypothesis that there is no autocorrelation (that is, that the errors are white noise). In standard usage, r ≈ ½s for s in §18.5.2 above, so this provides a type of Portmanteau test (see Godfrey, 1978). However, any orders from 1 up to 12 can be selected to test against:

ut = ∑i=pr αi ut-i + εt where 0 ≤ p ≤ r.

As noted above, the F-form suggested by Harvey (1981, see Harvey, 1990) is the recommended diagnostic test. Following the outcome of the F-test (and its p-value), the error autocorrelation coefficients are recorded. For an autoregressive error of order r to be estimated by RALS, these LM coefficients provide good initial values, from which the iterative optimization can be commenced. The LM test is calculated by regressing the residuals on all the regressors of the original model and the lagged residuals for lags p to r (missing residuals are set to zero). The LM test χ2(r-p+1) is TR2 from this regression (or the F-equivalent), and the error autocorrelation coefficients are the coefficients of the lagged residuals. For an excellent exposition, see Pagan (1984).
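
A sketch of this auxiliary regression (Python with numpy; illustrative only, with hypothetical names); it returns TR² and the error autocorrelation coefficients:

    import numpy as np

    def error_autocorr_test(u, X, p, r):
        """Illustrative LM test for residual autocorrelation of orders p..r:
        regress u_t on the original regressors X and the lagged residuals
        (missing lags set to zero); TR^2 is the chi^2(r-p+1) form."""
        T = u.size
        lags = []
        for j in range(p, r + 1):
            uj = np.zeros(T)
            uj[j:] = u[:-j]                 # u_{t-j}, zeros for t <= j
            lags.append(uj)
        Z = np.column_stack([X] + lags)
        b, *_ = np.linalg.lstsq(Z, u, rcond=None)
        e = u - Z @ b
        r2 = 1.0 - (e @ e) / (u @ u)        # u has mean ~0 if X has a constant
        return T * r2, b[-(r - p + 1):]     # statistic and lag coefficients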

18.5.4 Normality test

Let μ, σx² denote the mean and variance of {xt}, and write μi = E[(xt - μ)i], so that σx² = μ2. The skewness and kurtosis are defined as:

√β1 = μ3/μ2^{3/2}  and  β2 = μ4/μ2².
(eq:18.16)

Sample counterparts are defined by

x̄ = (1/T) ∑t=1T xt,   mi = (1/T) ∑t=1T (xt - x̄)i,   √b1 = m3/m2^{3/2}  and  b2 = m4/m2².
(eq:18.17)

A normal variate will have √β1=0 and β2=3. Bowman and Shenton (1975) consider the test:

e1 = T(√b1)²/6 + T(b2 - 3)²/24, asymptotically distributed as χ2(2),
(eq:18.18)

which was subsequently derived as an LM test by Jarque and Bera (1987). Unfortunately e1 has rather poor small-sample properties: √b1 and b2 are not independently distributed, and the sample kurtosis in particular approaches normality very slowly. The test reported by PcGive is based on Doornik and Hansen (1994), who employ a small-sample correction and adapt the test to the multivariate case. It derives from Shenton and Bowman (1977), who give b2 (conditional on b2 > 1 + b1) a gamma distribution, and D'Agostino (1970), who approximates the distribution of √b1 by the Johnson Su system. Let z1 and z2 denote the transformed skewness and kurtosis, where the transformation creates statistics which are much closer to standard normal. The test statistic is:

e2 = z1² + z2², approximately distributed as χ2(2).
(eq:18.19)

Table 18.1 compares (eq:18.19) with its asymptotic form (eq:18.18). It gives the rejection frequencies under the null of normality, using χ2(2) critical values. The experiments are based on 10000 replications and common random numbers.

Table:18.1 Empirical size of normality tests

       nominal probabilities of e2         nominal probabilities of (eq:18.18)
T      20%     10%     5%      1%          20%     10%     5%      1%
50     0.1734  0.0869  0.0450  0.0113      0.0939  0.0547  0.0346  0.0175
100    0.1771  0.0922  0.0484  0.0111      0.1258  0.0637  0.0391  0.0183
150    0.1845  0.0937  0.0495  0.0131      0.1456  0.0703  0.0449  0.0188
250    0.1889  0.0948  0.0498  0.0133      0.1583  0.0788  0.0460  0.0180

PcGive reports the following statistics under the normality test option, replacing xt by the residuals ût:

mean                     x̄
standard deviation       σx = (m2)½
skewness                 √b1
excess kurtosis          b2 - 3
minimum
maximum
asymptotic test          e1
normality test χ2(2)     e2   [P(χ2(2) ≥ e2)]
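
The moments of (eq:18.17) and the asymptotic statistic e1 of (eq:18.18) can be sketched as follows (Python with numpy; the small-sample transformations z1, z2 behind e2 are not reproduced here):

    import numpy as np

    def normality_stats(u):
        """Sample skewness, excess kurtosis and the asymptotic statistic e1
        of (eq:18.18); PcGive's reported e2 adds the small-sample corrections
        of Doornik and Hansen (1994)."""
        u = np.asarray(u, dtype=float)
        T = u.size
        m = u.mean()
        m2, m3, m4 = [np.mean((u - m) ** i) for i in (2, 3, 4)]
        skew = m3 / m2 ** 1.5                    # sqrt(b1)
        kurt = m4 / m2 ** 2                      # b2
        e1 = T * skew**2 / 6.0 + T * (kurt - 3.0)**2 / 24.0
        return skew, kurt - 3.0, e1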

18.5.5 Heteroscedasticity test using squares (not for ML)

This test is based on White (1980), and involves an auxiliary regression of {ût2} on the original regressors ( xit) and all their squares (xit2). The null is unconditional homoscedasticity, and the alternative is that the variance of the {ut} process depends on xt and on the xit2. The output comprises TR2, the F-test equivalent, the coefficients of the auxiliary regression, and their individual t-statistics, to help highlight problem variables. Variables that are redundant when squared are automatically removed, as are observations that have a residual that is (almost) zero. Some additional information can be found in Volume II.

18.5.6 Heteroscedasticity test using squares and cross-products (not for ML)

This test is that of White (1980), and only calculated if there is a large number of observations relative to the number of variables in the regression. It is based on an auxiliary regression of the squared residuals ( ût2) on all squares and cross-products of the original regressors (that is, on r=½k( k+1) variables). That is, if T>>k( k+1) , the test is calculated; redundant variables are automatically removed, as are observations that have a residual that is (almost) zero. The usual χ2 and F-values are reported; coefficients of the auxiliary regression are also shown with their t-statistics to help with model respecification. This is a general test for heteroscedastic errors: H0 is that the errors are homoscedastic or, if heteroscedasticity is present, it is unrelated to the xs.

In previous versions of PcGive this test used to be called a test for functional form mis-specification. That terminology was criticized by Godfrey and Orme (1994), who show that the test does not have power against omitted variables.

18.5.7 ARCH test

This is the ARCH (AutoRegressive Conditional Heteroscedasticity) test (see Engle, 1982), which in the present form tests the hypothesis γ=0 in the model:

E[ut²|ut-1,...,ut-r] = c0 + ∑i=1r γi ut-i²

where γ=( γ1,...,γr) '. Again, we have TR2 as the χ2 test from the regression of ût2 on a constant and ût-12 to ût-r2 (called the ARCH test) which is asymptotically distributed as χ2( r) on H0: γ=0. The F-form is also reported. Both first-order and higher-order lag forms are easily calculated (see Engle, 1982, and Engle, Hendry, and Trumbull, 1985).
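
A minimal sketch of the TR² form of the ARCH test (Python with numpy; the function name is hypothetical):

    import numpy as np

    def arch_test(u, r):
        """ARCH test: TR^2 from regressing u_t^2 on a constant and
        u_{t-1}^2,...,u_{t-r}^2, asymptotically chi^2(r) under gamma = 0."""
        u2 = np.asarray(u) ** 2
        y = u2[r:]
        Z = np.column_stack([np.ones(y.size)] +
                            [u2[r - i:-i] for i in range(1, r + 1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        e = y - Z @ b
        d = y - y.mean()
        r2 = 1.0 - (e @ e) / (d @ d)
        return y.size * r2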

18.5.8 RESET (OLS only)

The RESET test (Regression Specification Test) due to Ramsey (1969) tests the null of correct specification of the original model against the alternative that powers of ŷt, such as ŷt², ŷt³, ..., have been omitted. This tests whether the original functional form is incorrect, by adding powers of linear combinations of the xs, since by construction ŷt = xt'β̂.

We use RESET23 for the test that uses squares and cubes, while RESET refers to the test just using squares.

18.5.9 Parameter instability tests (OLS only)

Parameter instability statistics are reported for σ², followed by the joint statistic for all the parameters in the model, based on the approach in Hansen (1992). Next, the instability statistic is printed for each parameter (β1,...,βk,σ²).

Large values reveal non-constancy (marked by * or **), and indicate a fragile model. Note that this measures within-sample parameter constancy, and is computed if numerically feasible (it may fail owing to dummy variables), so no observations need be reserved. The indicated significance is only valid in the absence of non-stationary regressors.

18.5.10 Diagnostic tests for NLS

The LM tests for autocorrelation, heteroscedasticity and functional form require an auxiliary regression involving the original regressors xit. NLS uses ∂f(xt,θ)/∂θi (evaluated at θ̂) instead. The auxiliary regression for the autocorrelation test is:

ût = ∑i=1k βi (∂f(xt,θ)/∂θi)|θ̂ + ∑i=pr αi ût-i + εt.
(eq:18.20)

These three tests are not computed for models estimated using ML.

18.6 Linear restrictions test

Writing the model in matrix form as y=Xβ+u, the null hypothesis of p linear restrictions can be expressed as H0 : Rβ=r, with R a (p×k) matrix and r a p×1 vector. This test is well explained in most econometrics textbooks, and uses the unrestricted estimates (that is, it is a Wald test).

The subset form of the linear restrictions test is H0: βi=...=βj=0: any choice of coefficients can be made, so a wide range of specification hypotheses can be tested.
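
A sketch of the Wald statistic for H0: Rβ = r from unrestricted estimates (Python with numpy and scipy; illustrative, with hypothetical names):

    import numpy as np
    from scipy import stats

    def linear_restrictions_test(beta, V, R, r):
        """Wald test of H0: R beta = r using the unrestricted estimates beta
        with covariance matrix V; p restrictions give a chi^2(p) statistic."""
        d = R @ beta - r
        w = d @ np.linalg.solve(R @ V @ R.T, d)
        p = R.shape[0]
        return w, stats.chi2.sf(w, p)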

18.7 General restrictions

Writing θ̂ =β̂, with corresponding variance-covariance matrix V[ θ̂ ] , we can test for (non-) linear restrictions of the form:

f( θ) =0.

The null hypothesis H0:f(θ)=0 will be tested against H1:f(θ)≠0 through a Wald test:

w = f(θ̂)'(J̃ V[θ̂] J̃')-1 f(θ̂)

where J is the Jacobian matrix of the transformation, J = ∂f(θ)/∂θ', and J̃ is its value at θ̂. PcGive computes J̃ by numerical differentiation. The statistic w has a χ2(s) distribution, where s is the number of restrictions (that is, equations in f(·)). The null hypothesis is rejected if we observe a significant test statistic.

18.8 Test for omitted variables (OLS)

Lag polynomials of any variable in the database can be tested for omission. Variables that would change the sample or are already in the model are automatically deleted. The model itself remains unchanged. If the model is written in matrix form as y=Xβ+Zγ+u, then H0: γ=0 is being tested. The test exploits the fact that on H0:

T½ γ̂ →D Np(0, σ²(Z'MXZ/T)-1) with MX = IT - X(X'X)-1X',
(eq:18.21)

then:

[γ̂'(Z'MXZ)γ̂ / σ̂²] · [(T-k-p)/p] ~ F(p, T-k-p)
(eq:18.22)

for p added variables.

Since ( X'X) -1 is precalculated, the F-statistic is easily computed by partitioned inversion. Computations for IVE are more involved.
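
The same F-statistic can be obtained by comparing restricted and unrestricted residual sums of squares, which is how the following sketch computes it (Python with numpy and scipy; an illustrative re-derivation rather than PcGive's partitioned-inversion computation):

    import numpy as np
    from scipy import stats

    def omitted_variables_test(y, X, Z):
        """Illustrative F-test for omission of Z from y = X b + u, as in
        (eq:18.22), via restricted and unrestricted residual sums of
        squares for p added variables."""
        T, k = X.shape
        p = Z.shape[1]
        rss0 = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
        W = np.column_stack([X, Z])
        rss1 = np.sum((y - W @ np.linalg.lstsq(W, y, rcond=None)[0]) ** 2)
        F = (rss0 - rss1) / p * (T - k - p) / rss1
        return F, stats.f.sf(F, p, T - k - p)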

18.9 Progress: the sequential reduction sequence

Finally, PcGive has specific procedures programmed to operate when a general-to-specific mode is adopted.

[Note: Note that PcGive does not force you to use a general-to-specific strategy. However, we hope to have given compelling arguments in favour of adopting such a modelling strategy.]

In PcGive, when a model is specified and estimated by least squares or instrumental variables, then the general dynamic analysis is offered: see §18.3.

However, while the tests offered are a comprehensive set of Wald statistics on variables, lags and long-run outcomes, a reduction sequence can involve many linear transformations (differencing, creating differentials etc.) as well as eliminations. Consequently, as the reduction proceeds, PcGive monitors its progress, which can be reviewed at the progress menu. The main statistics reported comprise:

  1. The number of parameters, the log-likelihood and the SC, HQ and AIC information criteria for each model in the sequence.

  2. F-tests of each elimination conditional on the previous stage.

18.10 Encompassing and `non-nested' hypotheses tests

Once appropriate data representations have been selected, it is of interest to see whether the chosen model can explain (that is, account for) results reported by other investigators. Often attention has focused on the ability of chosen models to explain each other's residual variances (variance encompassing), and PcGive provides the facility for doing so using test statistics based on Cox (1961) as suggested by Pesaran (1974). Full details of those computed by PcGive for OLS and IVE are provided in Ericsson (1983). Note that a badly-fitting model should be rejected against well-fitting models on such tests, and that care is required in interpreting any outcome in which a well-fitting model (which satisfies all of the other criteria) is rejected against a badly-fitting, or silly, model (see Mizon, 1984, Mizon and Richard, 1986, and Hendry and Richard, 1989). The Sargan test is for the restricted reduced form parsimoniously encompassing the unrestricted reduced form, which is implicitly defined by projecting yt on all of the non-modelled variables. The F-test is for each model parsimoniously encompassing their union. This is the only one of these tests which is invariant to the choice of common regressors in the two models.

[Note: For example, if either the first or both models have the lagged dependent variable yt-1, the same F-value is produced. However, a different value will result if only the second model has yt-1.]

Thus, the F-test yields the same numerical outcome for the first model parsimoniously encompassing either the union of the two models under consideration, or the orthogonal complement to the first model relative to the union. In PcGive, tests of both models encompassing the other are reported.

References

Ahumada, H. (1985). An encompassing test of two models of the balance of trade for Argentina. Oxford Bulletin of Economics and Statistics 47, 51--70.

Alexander, C. (2001). Market Models: A Guide to Financial Data Analysis. Chichester: John Wiley and Sons.

Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature 19, 1483--1536.

Amemiya, T. (1985). Advanced Econometrics. Oxford: Basil Blackwell.

Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John Wiley & Sons.

Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817--858.

Baba, Y., D. F. Hendry, and R. M. Starr (1992). The demand for M1 in the U.S.A., 1960--1988. Review of Economic Studies 59, 25--61.

Banerjee, A., J. J. Dolado, J. W. Galbraith, and D. F. Hendry (1993). Co-integration, Error Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University Press.

Banerjee, A., J. J. Dolado, D. F. Hendry, and G. W. Smith (1986). Exploring equilibrium relationships in econometrics through static models: Some Monte Carlo evidence. Oxford Bulletin of Economics and Statistics 48, 253--277.

Banerjee, A., J. J. Dolado, and R. Mestre (1998). Error-correction mechanism tests for cointegration in a single equation framework. Journal of Time Series Analysis 19, 267--283.

Banerjee, A. and D. F. Hendry (Eds.) (1992). Testing Integration and Cointegration. Oxford Bulletin of Economics and Statistics: 54.

Banerjee, A. and D. F. Hendry (1992). Testing integration and cointegration: An overview. Oxford Bulletin of Economics and Statistics 54, 225--255.

Bårdsen, G. (1989). The estimation of long run coefficients from error correction models. Oxford Bulletin of Economics and Statistics 50.

Bentzel, R. and B. Hansen (1955). On recursiveness and interdependency in economic models. Review of Economic Studies 22, 153--168.

Bollerslev, T., R. S. Chou, and K. F. Kroner (1992). ARCH modelling in finance -- A review of the theory and empirical evidence. Journal of Econometrics 52, 5--59.

Bontemps, C. and G. E. Mizon (2003). Congruence and encompassing. In B. P. Stigum (Ed.), Econometrics and the Philosophy of Economics, pp. 354--378. Princeton: Princeton University Press.

Bowman, K. O. and L. R. Shenton (1975). Omnibus test contours for departures from normality based on √b1 and b2. Biometrika 62, 243--250.

Box, G. E. P. and G. M. Jenkins (1976). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day. First published, 1970.

Box, G. E. P. and D. A. Pierce (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association 65, 1509--1526.

Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47, 239--253.

Brown, R. L., J. Durbin, and J. M. Evans (1975). Techniques for testing the constancy of regression relationships over time (with discussion). Journal of the Royal Statistical Society B 37, 149--192.

Campos, J., N. R. Ericsson, and D. F. Hendry (1996). Cointegration tests in the presence of structural breaks. Journal of Econometrics 70, 187--220.

Chambers, E. A. and D. R. Cox (1967). Discrimination between alternative binary response models. Biometrika 54, 573--578.

Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591--605.

Clements, M. P. and D. F. Hendry (1998). Forecasting Economic Time Series. Cambridge: Cambridge University Press.

Clements, M. P. and D. F. Hendry (1999). Forecasting Non-stationary Economic Time Series. Cambridge, Mass.: MIT Press.

Clements, M. P. and D. F. Hendry (2011). Forecasting from misspecified models in the presence of unanticipated location shifts. In M. P. Clements and D. F. Hendry (Eds.), The Oxford Handbook of Economic Forecasting, pp. 271--314. Oxford: Oxford University Press.

Clements, M. P. and D. F. Hendry (Eds.) (2011). The Oxford Handbook of Economic Forecasting. Oxford: Oxford University Press.

Cochrane, D. and G. H. Orcutt (1949). Application of least squares regression to relationships containing auto-correlated error terms. Journal of the American Statistical Association 44, 32--61.

Cox, D. R. (1961). Tests of separate families of hypotheses. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, Berkeley, pp. 105--123. University of California Press.

Cramer, J. S. (1986). Econometric Applications of Maximum Likelihood Methods. Cambridge: Cambridge University Press.

D'Agostino, R. B. (1970). Transformation to normality of the null distribution of g1. Biometrika 57, 679--681.

Davidson, J. E. H., D. F. Hendry, F. Srba, and J. S. Yeo (1978). Econometric modelling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom. Economic Journal 88, 661--692. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Davidson, R. and J. G. MacKinnon (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.

Dickey, D. A. and W. A. Fuller (1981). Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49, 1057--1072.

Doan, T., R. Litterman, and C. A. Sims (1984). Forecasting and conditional projection using realistic prior distributions. Econometric Reviews 3, 1--100.

Doornik, J. A. (2007). Autometrics. Mimeo, Department of Economics, University of Oxford.

Doornik, J. A. (2008). Encompassing and automatic model selection. Oxford Bulletin of Economics and Statistics 70, 915--925.

Doornik, J. A. (2009). Autometrics. In J. L. Castle and N. Shephard (Eds.), The Methodology and Practice of Econometrics: Festschrift in Honour of David F. Hendry. Oxford: Oxford University Press.

Doornik, J. A. (2013). Object-Oriented Matrix Programming using Ox (7th ed.). London: Timberlake Consultants Press.

Doornik, J. A. and H. Hansen (1994). A practical test for univariate and multivariate normality. Discussion paper, Nuffield College.

Doornik, J. A. and D. F. Hendry (1992). PCGIVE 7: An Interactive Econometric Modelling System. Oxford: Institute of Economics and Statistics, University of Oxford.

Doornik, J. A. and D. F. Hendry (1994). PcGive 8: An Interactive Econometric Modelling System. London: International Thomson Publishing, and Belmont, CA: Duxbury Press.

Doornik, J. A. and D. F. Hendry (2013a). Econometric Modelling using PcGive: Volume III (4th ed.). London: Timberlake Consultants Press.

Doornik, J. A. and D. F. Hendry (2013b). Interactive Monte Carlo Experimentation in Econometrics Using PcNaive (3rd ed.). London: Timberlake Consultants Press.

Doornik, J. A. and D. F. Hendry (2013c). Modelling Dynamic Systems using PcGive: Volume II (5th ed.). London: Timberlake Consultants Press.

Doornik, J. A. and D. F. Hendry (2013d). OxMetrics: An Interface to Empirical Modelling (7th ed.). London: Timberlake Consultants Press.

Doornik, J. A. and M. Ooms (2006). Introduction to Ox (2nd ed.). London: Timberlake Consultants Press.

Durbin, J. (1970). Testing for serial correlation in least squares regression when some of the regressors are lagged dependent variables. Econometrica 38, 410--421.

Eicker, F. (1967). Limit theorems for regressions with unequal and dependent errors. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, Berkeley, pp. 59--82. University of California.

Eisner, R. and R. H. Strotz (1963). Determinants of Business Investment. Englewood Cliffs, N.J.: Prentice-Hall.

Emerson, R. A. and D. F. Hendry (1996). An evaluation of forecasting using leading indicators. Journal of Forecasting 15, 271--291. Reprinted in T.C. Mills (ed.), Economic Forecasting. Edward Elgar, 1999.

Engle, R. F. (1982). Autoregressive conditional heteroscedasticity, with estimates of the variance of United Kingdom inflation. Econometrica 50, 987--1007.

Engle, R. F. (1984). Wald, likelihood ratio, and Lagrange multiplier tests in econometrics. In Z. Griliches and M. D. Intriligator (Eds.), Handbook of Econometrics, Volume 2, Chapter 13. Amsterdam: North-Holland.

Engle, R. F. and C. W. J. Granger (1987). Cointegration and error correction: Representation, estimation and testing. Econometrica 55, 251--276.

Engle, R. F., D. F. Hendry, and J.-F. Richard (1983). Exogeneity. Econometrica 51, 277--304. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; in Ericsson, N. R. and Irons, J. S. (eds.) Testing Exogeneity, Oxford: Oxford University Press, 1994; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Engle, R. F., D. F. Hendry, and D. Trumbull (1985). Small sample properties of ARCH estimators and tests. Canadian Journal of Economics 43, 66--93.

Engle, R. F. and H. White (Eds.) (1999). Cointegration, Causality and Forecasting. Oxford: Oxford University Press.

Engler, E. and B. Nielsen (2009). The empirical process of autoregressive residuals. Econometrics Journal 12, 367--381.

Ericsson, N. R. (1983). Asymptotic properties of instrumental variables statistics for testing non-nested hypotheses. Review of Economic Studies 50, 287--303.

Ericsson, N. R. (1992). Cointegration, exogeneity and policy analysis: An overview. Journal of Policy Modeling 14, 251--280.

Ericsson, N. R. and D. F. Hendry (1999). Encompassing and rational expectations: How sequential corroboration can imply refutation. Empirical Economics 24, 1--21.

Ericsson, N. R. and J. S. Irons (1995). The Lucas critique in practice: Theory without measurement. In K. D. Hoover (Ed.), Macroeconometrics: Developments, Tensions and Prospects, pp. 263--312. Dordrecht: Kluwer Academic Press.

Ericsson, N. R. and J. G. MacKinnon (2002). Distributions of error correction tests for cointegration. Econometrics Journal 5, 285--318.

Escribano, A. (1985). Non-linear error correction: The case of money demand in the UK (1878--1970). Mimeo, University of California at San Diego.

Favero, C. and D. F. Hendry (1992). Testing the Lucas critique: A review. Econometric Reviews 11, 265--306.

Finney, D. J. (1947). The estimation from individual records of the relationship between dose and quantal response. Biometrika 34, 320--334.

Fletcher, R. (1987). Practical Methods of Optimization, (2nd ed.). New York: John Wiley & Sons.

Friedman, M. and A. J. Schwartz (1982). Monetary Trends in the United States and the United Kingdom: Their Relation to Income, Prices, and Interest Rates, 1867--1975. Chicago: University of Chicago Press.

Frisch, R. (1934). Statistical Confluence Analysis by means of Complete Regression Systems. Oslo: University Institute of Economics.

Frisch, R. (1938). Statistical versus theoretical relations in economic macrodynamics. Mimeograph dated 17 July 1938, League of Nations Memorandum. Reproduced by University of Oslo in 1948 with Tinbergen's comments. Contained in Memorandum `Autonomy of Economic Relations', 6 November 1948, Oslo, Universitets Økonomiske Institutt. Reprinted in Hendry D. F. and Morgan M. S. (1995), The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.

Frisch, R. and F. V. Waugh (1933). Partial time regression as compared with individual trends. Econometrica 1, 221--223.

Gilbert, C. L. (1986). Professor Hendry's econometric methodology. Oxford Bulletin of Economics and Statistics 48, 283--307. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Gilbert, C. L. (1989). LSE and the British approach to time-series econometrics. Oxford Review of Economic Policy 41, 108--128.

Godfrey, L. G. (1978). Testing for higher order serial correlation in regression equations when the regressors include lagged dependent variables. Econometrica 46, 1303--1313.

Godfrey, L. G. (1988). Misspecification Tests in Econometrics. Cambridge: Cambridge University Press.

Godfrey, L. G. and C. D. Orme (1994). The sensitivity of some general checks to omitted variables in the linear model. International Economic Review 35, 489--506.

Golub, G. H. and C. F. Van Loan (1989). Matrix Computations. Baltimore: The Johns Hopkins University Press.

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424--438.

Granger, C. W. J. (1986). Developments in the study of cointegrated economic variables. Oxford Bulletin of Economics and Statistics 48, 213--228.

Granger, C. W. J. and P. Newbold (1974). Spurious regressions in econometrics. Journal of Econometrics 2, 111--120.

Granger, C. W. J. and P. Newbold (1977). The time series approach to econometric model building. In C. A. Sims (Ed.), New Methods in Business Cycle Research, pp. 7--21. Minneapolis: Federal Reserve Bank of Minneapolis.

Granger, C. W. J. and P. Newbold (1986). Forecasting Economic Time Series, (2nd ed.). New York: Academic Press.

Gregory, A. W. and M. R. Veale (1985). Formulating Wald tests of non-linear restrictions. Econometrica 53, 1465--1468.

Griliches, Z. and M. D. Intriligator (Eds.) (1984). Handbook of Econometrics, Volume 2. Amsterdam: North-Holland.

Hansen, B. E. (1992). Testing for parameter instability in linear models. Journal of Policy Modeling 14, 517--533.

Harvey, A. C. (1981). The Econometric Analysis of Time Series. Deddington: Philip Allan.

Harvey, A. C. (1990). The Econometric Analysis of Time Series, (2nd ed.). Hemel Hempstead: Philip Allan.

Harvey, A. C. (1993). Time Series Models, (2nd ed.). Hemel Hempstead: Harvester Wheatsheaf.

Harvey, A. C. and P. Collier (1977). Testing for functional misspecification in regression analysis. Journal of Econometrics 6, 103--119.

Harvey, A. C. and N. G. Shephard (1992). Structural time series models. In G. S. Maddala, C. R. Rao, and H. D. Vinod (Eds.), Handbook of Statistics, Volume 11. Amsterdam: North-Holland.

Hendry, D. F. (1976). The structure of simultaneous equations estimators. Journal of Econometrics 4, 51--88. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. (1979). Predictive failure and econometric modelling in macro-economics: The transactions demand for money. In P. Ormerod (Ed.), Economic Modelling, pp. 217--242. London: Heinemann. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. (1980). Econometrics: Alchemy or science? Economica 47, 387--406. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. (Ed.) (1986). Econometric Modelling with Cointegrated Variables. Oxford Bulletin of Economics and Statistics: 48.

Hendry, D. F. (1986a). Econometric modelling with cointegrated variables: An overview. Oxford Bulletin of Economics and Statistics 48, 201--212. Reprinted in R.F. Engle and C.W.J. Granger (eds), Long-Run Economic Relationships, Oxford: Oxford University Press, 1991, 51--63.

Hendry, D. F. (1986b). Using PC-GIVE in econometrics teaching. Oxford Bulletin of Economics and Statistics 48, 87--98.

Hendry, D. F. (1987). Econometric methodology: A personal perspective. In T. F. Bewley (Ed.), Advances in Econometrics, pp. 29--48. Cambridge: Cambridge University Press. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. (1988). The encompassing implications of feedback versus feedforward mechanisms in econometrics. Oxford Economic Papers 40, 132--149. Reprinted in Ericsson, N. R. and Irons, J. S. (eds.) Testing Exogeneity, Oxford: Oxford University Press, 1994; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. (1989). Comment on intertemporal consumer behaviour under structural changes in income. Econometric Reviews 8, 111--121.

Hendry, D. F. (1993). Econometrics: Alchemy or Science? Oxford: Blackwell Publishers.

Hendry, D. F. (1995a). Dynamic Econometrics. Oxford: Oxford University Press.

Hendry, D. F. (1995b). On the interactions of unit roots and exogeneity. Econometric Reviews 14, 383--419.

Hendry, D. F. (1995c). A theory of co-breaking. Mimeo, Nuffield College, University of Oxford.

Hendry, D. F. (1996). On the constancy of time-series econometric equations. Economic and Social Review 27, 401--422.

Hendry, D. F. (1997). On congruent econometric relations: A comment. Carnegie--Rochester Conference Series on Public Policy 47, 163--190.

Hendry, D. F. (2000a). Econometrics: Alchemy or Science? Oxford: Oxford University Press. New Edition.

Hendry, D. F. (2000b). Epilogue: The success of general-to-specific model selection. In Hendry, D. F., Econometrics: Alchemy or Science? (New Edition), pp. 467--490. Oxford: Oxford University Press.

Hendry, D. F. and G. J. Anderson (1977). Testing dynamic specification in small simultaneous systems: An application to a model of building society behaviour in the United Kingdom. In M. D. Intriligator (Ed.), Frontiers in Quantitative Economics, Volume 3, pp. 361--383. Amsterdam: North Holland Publishing Company. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. and J. A. Doornik (1994). Modelling linear dynamic econometric systems. Scottish Journal of Political Economy 41, 1--33.

Hendry, D. F. and J. A. Doornik (1997). The implications for econometric modelling of forecast failure. Scottish Journal of Political Economy 44, 437--461. Special Issue.

Hendry, D. F. and N. R. Ericsson (1991). Modeling the demand for narrow money in the United Kingdom and the United States. European Economic Review 35, 833--886.

Hendry, D. F., S. Johansen, and C. Santos (2004). Selecting a regression saturated by indicators. Unpublished paper, Economics Department, University of Oxford.

Hendry, D. F. and K. Juselius (2000). Explaining cointegration analysis: Part I. Energy Journal 21, 1--42.

Hendry, D. F. and H.-M. Krolzig (1999). Improving on `Data mining reconsidered' by K.D. Hoover and S.J. Perez. Econometrics Journal 2, 202--219. Reprinted in J. Campos, N.R. Ericsson and D.F. Hendry (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. and H.-M. Krolzig (2001). Automatic Econometric Model Selection. London: Timberlake Consultants Press.

Hendry, D. F. and H.-M. Krolzig (2005). The properties of automatic Gets modelling. Economic Journal 115, C32--C61.

Hendry, D. F. and M. Massmann (2007). Co-breaking: Recent advances and a synopsis of the literature. Journal of Business and Economic Statistics 25, 33--51.

Hendry, D. F. and G. E. Mizon (1978). Serial correlation as a convenient simplification, not a nuisance: A comment on a study of the demand for money by the Bank of England. Economic Journal 88, 549--563. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. and G. E. Mizon (1993). Evaluating dynamic econometric models by encompassing the VAR. In P. C. B. Phillips (Ed.), Models, Methods and Applications of Econometrics, pp. 272--300. Oxford: Basil Blackwell. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. and G. E. Mizon (1999). The pervasiveness of Granger causality in econometrics. In R. F. Engle and H. White (Eds.), Cointegration, Causality and Forecasting, pp. 102--134. Oxford: Oxford University Press.

Hendry, D. F. and M. S. Morgan (1989). A re-analysis of confluence analysis. Oxford Economic Papers 41, 35--52.

Hendry, D. F. and M. S. Morgan (1995). The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.

Hendry, D. F. and A. J. Neale (1987). Monte Carlo experimentation using PC-NAIVE. In T. Fomby and G. F. Rhodes (Eds.), Advances in Econometrics, Volume 6, pp. 91--125. Greenwich, Connecticut: Jai Press Inc.

Hendry, D. F. and A. J. Neale (1988). Interpreting long-run equilibrium solutions in conventional macro models: A comment. Economic Journal 98, 808--817.

Hendry, D. F. and A. J. Neale (1991). A Monte Carlo study of the effects of structural breaks on tests for unit roots. In P. Hackl and A. H. Westlund (Eds.), Economic Structural Change, Analysis and Forecasting, pp. 95--119. Berlin: Springer-Verlag.

Hendry, D. F., A. J. Neale, and N. R. Ericsson (1991). PC-NAIVE, An Interactive Program for Monte Carlo Experimentation in Econometrics. Version 6.0. Oxford: Institute of Economics and Statistics, University of Oxford.

Hendry, D. F., A. J. Neale, and F. Srba (1988). Econometric analysis of small linear systems using Pc-Fiml. Journal of Econometrics 38, 203--226.

Hendry, D. F. and B. Nielsen (2007). Econometric Modeling: A Likelihood Approach. Princeton: Princeton University Press.

Hendry, D. F., A. R. Pagan, and J. D. Sargan (1984). Dynamic specification. In Z. Griliches and M. D. Intriligator (Eds.), Handbook of Econometrics, Volume 2, pp. 1023--1100. Amsterdam: North-Holland. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. and J.-F. Richard (1982). On the formulation of empirical models in dynamic econometrics. Journal of Econometrics 20, 3--33. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press and in Hendry D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. and J.-F. Richard (1983). The econometric analysis of economic time series (with discussion). International Statistical Review 51, 111--163. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. and J.-F. Richard (1989). Recent developments in the theory of encompassing. In B. Cornet and H. Tulkens (Eds.), Contributions to Operations Research and Economics. The XXth Anniversary of CORE, pp. 393--440. Cambridge, MA: MIT Press. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Hendry, D. F. and F. Srba (1980). AUTOREG: A computer program library for dynamic econometric models with autoregressive errors. Journal of Econometrics 12, 85--102. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. and T. von Ungern-Sternberg (1981). Liquidity and inflation effects on consumers' expenditure. In A. S. Deaton (Ed.), Essays in the Theory and Measurement of Consumers' Behaviour, pp. 237--261. Cambridge: Cambridge University Press. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. and K. F. Wallis (Eds.) (1984). Econometrics and Quantitative Economics. Oxford: Basil Blackwell.

Hooker, R. H. (1901). Correlation of the marriage rate with trade. Journal of the Royal Statistical Society 64, 485--492. Reprinted in Hendry, D. F. and Morgan, M. S. (1995), The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.

Hoover, K. D. and S. J. Perez (1999). Data mining reconsidered: Encompassing and the general-to-specific approach to specification search. Econometrics Journal 2, 167--191.

Jarque, C. M. and A. K. Bera (1987). A test for normality of observations and regression residuals. International Statistical Review 55, 163--172.

Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231--254. Reprinted in R.F. Engle and C.W.J. Granger (eds), Long-Run Economic Relationships, Oxford: Oxford University Press, 1991, 131--52.

Johansen, S. (1995). Likelihood-based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.

Johansen, S. and K. Juselius (1990). Maximum likelihood estimation and inference on cointegration -- with applications to the demand for money. Oxford Bulletin of Economics and Statistics 52, 169--210.

Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika 36, 149--176.

Judd, J. and J. Scadding (1982). The search for a stable money demand function: A survey of the post-1973 literature. Journal of Economic Literature 20, 993--1023.

Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee (1985). The Theory and Practice of Econometrics, (2nd ed.). New York: John Wiley.

Kiviet, J. F. (1986). On the rigour of some misspecification tests for modelling dynamic relationships. Review of Economic Studies 53, 241--261.

Kiviet, J. F. (1987). Testing Linear Econometric Models. Amsterdam: University of Amsterdam.

Kiviet, J. F. and G. D. A. Phillips (1992). Exact similar tests for unit roots and cointegration. Oxford Bulletin of Economics and Statistics 54, 349--367.

Kohn, A. (1987). False Prophets. Oxford: Basil Blackwell.

Koopmans, T. C. (Ed.) (1950). Statistical Inference in Dynamic Economic Models. Number 10 in Cowles Commission Monograph. New York: John Wiley & Sons.

Koopmans, T. C., H. Rubin, and R. B. Leipnik (1950). Measuring the equation systems of dynamic economics. See Koopmans (1950), Chapter 2.

Kremers, J. J. M., N. R. Ericsson, and J. J. Dolado (1992). The power of cointegration tests. Oxford Bulletin of Economics and Statistics 54, 325--348.

Kuh, E., D. A. Belsley, and R. E. Welsch (1980). Regression Diagnostics. Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley.

Leamer, E. E. (1978). Specification Searches. Ad Hoc Inference with Nonexperimental Data. New York: John Wiley.

Leamer, E. E. (1983). Let's take the con out of econometrics. American Economic Review 73, 31--43. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Ljung, G. M. and G. E. P. Box (1978). On a measure of lack of fit in time series models. Biometrika 65, 297--303.

Lovell, M. C. (1983). Data mining. Review of Economics and Statistics 65, 1--12.

Lucas, R. E. (1976). Econometric policy evaluation: A critique. In K. Brunner and A. Meltzer (Eds.), The Phillips Curve and Labor Markets, Volume 1 of Carnegie-Rochester Conferences on Public Policy, pp. 19--46. Amsterdam: North-Holland Publishing Company.

MacKinnon, J. G. (1991). Critical values for cointegration tests. In R. F. Engle and C. W. J. Granger (Eds.), Long-Run Economic Relationships, pp. 267--276. Oxford: Oxford University Press.

MacKinnon, J. G. and H. White (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29, 305--325.

Makridakis, S., S. C. Wheelwright, and R. J. Hyndman (1998). Forecasting: Methods and Applications (3rd ed.). New York: John Wiley and Sons.

Marschak, J. (1953). Economic measurements for policy and prediction. In W. C. Hood and T. C. Koopmans (Eds.), Studies in Econometric Method, Number 14 in Cowles Commission Monograph. New York: John Wiley & Sons.

Mizon, G. E. (1977). Model selection procedures. In M. J. Artis and A. R. Nobay (Eds.), Studies in Modern Economic Analysis, pp. 97--120. Oxford: Basil Blackwell.

Mizon, G. E. (1984). The encompassing approach in econometrics. See Hendry and Wallis (1984), pp. 135--172.

Mizon, G. E. (1995). A simple message for autocorrelation correctors: Don't. Journal of Econometrics 69, 267--288.

Mizon, G. E. and D. F. Hendry (1980). An empirical application and Monte Carlo analysis of tests of dynamic specification. Review of Economic Studies 47, 21--45. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Mizon, G. E. and J.-F. Richard (1986). The encompassing principle and its application to non-nested hypothesis tests. Econometrica 54, 657--678.

Nelson, C. R. (1972). The prediction performance of the FRB-MIT-PENN model of the US economy. American Economic Review 62, 902--917. Reprinted in T.C. Mills (ed.), Economic Forecasting. Edward Elgar, 1999.

Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703--708.

Nickell, S. J. (1985). Error correction, partial adjustment and all that: An expository note. Oxford Bulletin of Economics and Statistics 47, 119--130.

Nielsen, B. (2006). Correlograms for non-stationary autoregressions. Journal of the Royal Statistical Society B 68, 707--720.

Pagan, A. R. (1984). Model evaluation by variable addition. See Hendry and Wallis (1984), pp. 103--135.

Perron, P. (1989). The Great Crash, the oil price shock and the unit root hypothesis. Econometrica 57, 1361--1401.

Pesaran, M. H. (1974). On the general problem of model selection. Review of Economic Studies 41, 153--171. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.

Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics 33, 311--340.

Phillips, P. C. B. (1987). Time series regression with a unit root. Econometrica 55, 277--301.

Phillips, P. C. B. (1991). Optimal inference in cointegrated systems. Econometrica 59, 283--306.

Poincaré, H. (1905). Science and Hypothesis. New York: Science Press.

Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic Press.

Ramsey, J. B. (1969). Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society B 31, 350--371.

Richard, J.-F. (1980). Models with several regimes and changes in exogeneity. Review of Economic Studies 47, 1--20.

Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica 26, 393--415.

Sargan, J. D. (1959). The estimation of relationships with autocorrelated residuals by the use of instrumental variables. Journal of the Royal Statistical Society B 21, 91--105. Reprinted as pp. 87--104 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.

Sargan, J. D. (1964). Wages and prices in the United Kingdom: A study in econometric methodology (with discussion). In P. E. Hart, G. Mills, and J. K. Whitaker (Eds.), Econometric Analysis for National Economic Planning, Volume 16 of Colston Papers, pp. 25--63. London: Butterworth Co. Reprinted as pp. 275--314 in Hendry D. F. and Wallis K. F. (eds.) (1984). Econometrics and Quantitative Economics. Oxford: Basil Blackwell, and as pp. 124--169 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.

Sargan, J. D. (1980a). The consumer price equation in the post-war British economy. An exercise in equation specification testing. Review of Economic Studies 47, 113--135.

Sargan, J. D. (1980b). Some tests of dynamic specification for a single equation. Econometrica 48, 879--897. Reprinted as pp. 191--212 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.

Shenton, L. R. and K. O. Bowman (1977). A bivariate model for the distribution of √b1 and b2. Journal of the American Statistical Association 72, 206--211.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.

Sims, C. A. (1980). Macroeconomics and reality. Econometrica 48, 1--48. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Sims, C. A., J. H. Stock, and M. W. Watson (1990). Inference in linear time series models with some unit roots. Econometrica 58, 113--144.

Spanos, A. (1986). Statistical Foundations of Econometric Modelling. Cambridge: Cambridge University Press.

Wallis, K. F. (1987). Time series analysis of bounded economic variables. Journal of Time Series Analysis 8, 115--123.

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817--838.

White, H. (1984). Asymptotic Theory for Econometricians. London: Academic Press.

White, H. (1990). A consistent model selection procedure based on m-testing. In C. W. J. Granger (Ed.), Modelling Economic Series, pp. 369--383. Oxford: Clarendon Press.

Wooldridge, J. M. (1999). Asymptotic properties of some specification tests in linear models with integrated processes. See Engle and White (1999), pp. 366--384.

Working, E. J. (1927). What do statistical demand curves show? Quarterly Journal of Economics 41, 212--235.

Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time series (with discussion). Journal of the Royal Statistical Society 89, 1--64. Reprinted in Hendry, D. F. and Morgan, M. S. (1995), The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.