These reference chapters have been taken from Volume I, and use the same chapter and section numbering as the printed version.
The Descriptive Statistics entry on the Model menu involves the formal calculation of statistics on database variables. Model-related statistics are considered in Chapters 17 and 18. This chapter provides the formulae underlying the computations. PcGive will use the largest available sample by default, here denoted by t=1,...,T. It is always possible to graph or compute the statistics over a shorter sample period.
This reports sample means and standard deviations of the selected variables:
x̄ = 1/T ∑_{t=1}^{T} x_{t},    s = ( 1/(T−1) ∑_{t=1}^{T}( x_{t} − x̄ )^{2} )^{½}.
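As a concrete illustration, the two statistics can be computed directly. This is an illustrative Python sketch, not PcGive output; `mean_sd` is a hypothetical helper name, and the T−1 divisor in s is an assumption here:

```python
import math

def mean_sd(x):
    """Sample mean and standard deviation (T - 1 divisor assumed)."""
    T = len(x)
    xbar = sum(x) / T
    s = math.sqrt(sum((xt - xbar) ** 2 for xt in x) / (T - 1))
    return xbar, s

xbar, s = mean_sd([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # xbar = 5.0
```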
The correlation coefficient r_{xy} between x and y is:

r_{xy} = ∑_{t=1}^{T}( x_{t} − x̄ )( y_{t} − ȳ ) / ( ∑_{t=1}^{T}( x_{t} − x̄ )^{2} ∑_{t=1}^{T}( y_{t} − ȳ )^{2} )^{½}.
The correlation matrix of the selected variables is reported as a symmetric matrix with the diagonal equal to one. Each cell records the simple correlation between the two relevant variables. The same sample is used for each variable; observations with missing values are dropped.
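The correlation computation can be sketched as follows (illustrative Python, not PcGive code; `corr` is a hypothetical helper):

```python
import math

def corr(x, y):
    """Simple correlation coefficient r_xy over a common sample."""
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

An exactly linear relation gives r = ±1, which is why the diagonal of the reported matrix equals one.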
This is the test statistic described in §18.5.4, which amounts to testing whether the skewness and kurtosis of the variable correspond to those of a normal distribution. Missing values are dropped from each variable, so the sample size may differ between variables.
This prints the sample autocorrelation function of the selected variables, as described in §18.5.2. The same sample is used for each variable; observations with missing values are dropped.
A crucial property of any economic variable influencing the behaviour of statistics in econometric models is the extent to which that variable is stationary. If the autoregressive description

x_{t} = ∑_{i=1}^{n} α_{i} x_{t−i} + ε_{t}
has a root on the unit circle, then conventional distributional results are not applicable to coefficient estimates. As the simplest example, consider:
x_{t} = α + βx_{t−1} + ε_{t} where β=1 and ε_{t} ~ IN( 0, σ_{ε}^{2} ),
which generates a random walk (with drift if α≠0). Here, the autoregressive coefficient is unity and stationarity is violated. A process with no unit or explosive roots is said to be I(0); a process is I(d) if it needs to be differenced d times to become I(0), but is not I(0) when differenced only d−1 times. Many economic time series behave like I(1), though some appear to be I(0) and others I(2).
The Durbin-Watson statistic for the level of a variable offers one simple characterization of this integrated property:

DW = ∑_{t=2}^{T}( x_{t} − x_{t−1} )^{2} / ∑_{t=1}^{T}( x_{t} − x̄ )^{2}.
If x_{t} is a random walk, DW will be very small. If x_{t} is white noise, DW will be around 2. Very low DW values thus indicate that a transformed model may be desirable, perhaps including a mixture of differenced and disequilibrium variables.
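This characterization is easy to reproduce. A minimal sketch (illustrative Python; `durbin_watson` is a hypothetical helper name, not a PcGive function):

```python
def durbin_watson(x):
    """DW for the level of a series: sum of squared first differences
    over the total sum of squares around the mean."""
    xbar = sum(x) / len(x)
    num = sum((x[t] - x[t - 1]) ** 2 for t in range(1, len(x)))
    den = sum((xt - xbar) ** 2 for xt in x)
    return num / den
```

A smooth trending series gives a DW far below 2, while a rapidly alternating series pushes DW towards its upper range.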
An augmented Dickey-Fuller (ADF) test for I(1) against I(0) (see Dickey and Fuller, 1981) is provided by the t-statistic on β̂ in:

Δx_{t} = α + μt + βx_{t−1} + ∑_{i=1}^{n} γ_{i}Δx_{t−i} + u_{t}.
The constant or trend can optionally be excluded from (eq:16.4); the specification of the lag length n assumes that u_{t} is white noise. The null hypothesis is H_{0}: β=0; rejection of this hypothesis implies that x_{t} is I(0). A failure to reject is consistent with Δx_{t} being stationary, so that x_{t} is I(1). This is a second useful description of the degree of integratedness of x_{t}. The Dickey-Fuller (DF) test has no lagged first differences on the right-hand side (n=0). On this topic, see the Oxford Bulletin of Economics and Statistics (Hendry, 1986, Banerjee and Hendry, 1992), and Banerjee, Dolado, Galbraith, and Hendry (1993). To test I(2) against I(1), commence with the next higher difference:

Δ^{2}x_{t} = α + μt + βΔx_{t−1} + ∑_{i=1}^{n} γ_{i}Δ^{2}x_{t−i} + u_{t}.
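For the simplest case (n=0, constant included, no trend), the DF statistic is just the t-value on β̂ in a two-variable regression, which has a closed form. The sketch below is illustrative Python, not PcGive code; `df_test` is a hypothetical name, and the MacKinnon critical values needed to judge significance are not reproduced here:

```python
import math

def df_test(x):
    """Dickey-Fuller t-statistic with constant, no trend, no lagged
    differences (n = 0): regress dx_t on (1, x_{t-1}), return t_beta."""
    z = x[:-1]                                        # lagged level x_{t-1}
    d = [x[t] - x[t - 1] for t in range(1, len(x))]   # first difference
    n = len(d)
    zbar, dbar = sum(z) / n, sum(d) / n
    szz = sum((a - zbar) ** 2 for a in z)
    beta = sum((a - zbar) * (b - dbar) for a, b in zip(z, d)) / szz
    alpha = dbar - beta * zbar
    rss = sum((b - alpha - beta * a) ** 2 for a, b in zip(z, d))
    se_beta = math.sqrt(rss / (n - 2) / szz)          # sigma^2 = RSS/(n-2)
    return beta / se_beta
```

Strongly mean-reverting data produce large negative t-values; the decision to reject I(1) then rests on the non-standard critical values discussed below.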
Output of the ADF(n) test of (eq:16.4) consists of:
coefficients  α̂ and μ̂ (if included), β̂, γ̂_{1},...,γ̂_{n}, 
standard errors  SE(α̂), SE(μ̂), SE(β̂), SE(γ̂_{i}), 
tvalues  t_{α} , t_{μ} , t_{β} , t_{γi}, 
σ̂  as (eq:17.10), 
DW  (eq:16.3) applied to û_{t}, 
DW(x)  (eq:16.3) applied to x_{t}, 
ADF(x)  t_{β} , 
Critical values  
RSS  as (eq:17.11). 
Most of the formulae for the computed statistics are more conveniently presented in the next section on simple dynamic regressions, but the t-statistic is defined (e.g., for α̂) as t_{α} = α̂/SE(α̂), using the formula in (eq:17.5). Critical values are derived from the response surfaces in MacKinnon (1991), and depend on whether a constant, or constant and trend, are included (seasonals are ignored). Under the null (β=0), α≠0 entails a trend in {x_{t}} and μ≠0 implies a quadratic trend. However, under the stationary alternative, α=0 would impose a zero trend. Thus the test ceases to be similar if the polynomial in time (1, t, t^{2} etc.) in the model is not at least as large as that in the data generating process (see, for example, Kiviet and Phillips, 1992). This problem suggests allowing for a trend in the model unless the data are anticipated to have a zero mean in differences. The so-called Engle-Granger two-step method amounts to applying the ADF test to residuals from a prior static regression (the first step). The response surfaces need to be adjusted for the number of variables involved in the first step: see MacKinnon (1991).
By default, PcGive reports summary test output for the sequence of ADF(n), ..., ADF(0) tests. The summary table lists, for j=n,...,0:
Dlag  j (the number of lagged differences), 
tadf  the tvalue on the lagged level: t_{β} , 
beta Y_1  the coefficient on the lagged level: β, 
σ̂  as (eq:17.10), 
tDY_lag  tvalue of the longest lag: t_{γj}, 
tprob  significance of the longest lag: 1 − P( |τ| ≤ |t_{γj}| ), 
AIC  Akaike criterion, see §17.2.12, 
Fprob  significance level of the Ftest on the lags dropped up to that point, 
Critical values are given, and significance of the ADF test is marked by asterisks: ^{*} indicates significance at 5%, ^{**} at 1%.
Principal components analysis (PCA) amounts to an eigenvalue analysis of the correlation matrix. Because the correlation matrix has ones on the diagonal, its trace equals k when k variables are involved. Therefore, the sum of the eigenvalues also equals k. Moreover, all eigenvalues are nonnegative.
The eigenvalue decomposition of the k ×k correlation matrix C is:
C=HΛ H', 
where Λ is the diagonal matrix with the ordered eigenvalues λ_{1} ≥...≥ λ_{k} ≥ 0 on the diagonal, and H=(h_{1}, ..., h_{k}) is the matrix with the corresponding eigenvectors in its columns, H'H=I_{k}. The matrix of eigenvectors diagonalizes the correlation matrix:
H' C H =Λ. 
Let (x_{1}, ..., x_{k}) denote the variables selected for principal components analysis (a T ×k matrix), and Z=(z_{1}, ..., z_{k}) the standardized data (i.e. in deviation from their mean, and scaled by the standard deviation). Then Z'Z/T = C. The jth principal component is defined as:
p_{j} = Zh_{j} = z_{1} h_{1j}+...+z_{k} h_{kj}, 
and accounts for 100 λ_{j}/k % of the variation. The largest m principal components together account for 100∑_{j=1}^{m} λ_{j}/k % of the variation.
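For k=2 the eigenvalue analysis can be done by hand: the 2×2 correlation matrix has eigenvalues 1+|r| and 1−|r| with eigenvectors (1, ±1)/√2. The sketch below (illustrative Python; `pca2` is a hypothetical helper, not an OxMetrics function) verifies the variance-accounting identity Var(p_{1}) = λ_{1}:

```python
import math

def pca2(x, y):
    """Principal components for k = 2 standardized series. The 2x2
    correlation matrix has eigenvalues 1+|r| and 1-|r|."""
    T = len(x)
    def standardize(v):
        vbar = sum(v) / T
        s = math.sqrt(sum((a - vbar) ** 2 for a in v) / T)  # /T so Z'Z/T = C
        return [(a - vbar) / s for a in v]
    zx, zy = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(zx, zy)) / T
    lam1, lam2 = 1 + abs(r), 1 - abs(r)
    sgn = 1.0 if r >= 0 else -1.0
    p1 = [(a + sgn * b) / math.sqrt(2) for a, b in zip(zx, zy)]  # first PC
    return lam1, lam2, p1
```

The first principal component then accounts for 100 λ_{1}/2 % of the variation, consistent with the general formula above.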
Principal components analysis is used to capture the variability of the data in a small number of factors. Using the correlation matrix enforces a common scale on the data (analysis in terms of the variance matrix is not invariant to scaling). Some examples of the use of PCA in financial applications are given in Alexander (2001, Ch.6).
PCA is sometimes used to reconstruct missing data on y in combination with data condensation. Assume that T observations are available on y, but T+H on the remaining data; then two methods could be considered:

More recently, PCA has become a popular tool for forecasting.
Define the sample autocovariances {ĉ_{j}} of a stationary series x_{t}, t=1,...,T:

ĉ_{j} = 1/T ∑_{t=j+1}^{T}( x_{t} − x̄ )( x_{t−j} − x̄ ),   j = 0, 1, ..., T−1,
using the full sample mean x= 1/T ∑_{t=1}^{T}x_{t}. The variance σ̂^{2}_{x} corresponds to ĉ_{0}.
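A direct transcription of ĉ_{j} (illustrative Python; `autocov` is a hypothetical name, not an OxMetrics function):

```python
def autocov(x, j):
    """Sample autocovariance c_j, using the full-sample mean and a 1/T
    divisor, as in the definition above."""
    T = len(x)
    xbar = sum(x) / T
    return sum((x[t] - xbar) * (x[t - j] - xbar) for t in range(j, T)) / T
```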
The autocorrelation function (ACF) plots the series {r̂_{j}}, where r̂_{j} is the sample correlation coefficient between x_{t} and x_{t−j}. The length of the ACF is specified by the user, leading to a figure which shows ( r̂_{1}, r̂_{2}, ..., r̂_{s} ) plotted against ( 1, 2, ..., s ), where for any j when x is any chosen variable:

r̂_{j} = ĉ_{j} / ĉ_{0}.
The first autocorrelation, r̂_{0}, is equal to one, and is omitted from the graphs.
The asymptotic variance of the autocorrelations is 1/T, so approximate 95% error bars are indicated at ±2T^{−½} (see e.g. Harvey, 1993, p.42).
If a series is non-stationary, the usual definition of a correlation between successive lags is no longer appropriate: see Nielsen (2006a). This comment also applies to the partial autocorrelation function described in the next section.
Given the sample autocorrelation function {r̂_{j}}, the partial autocorrelations are computed using Durbin's method as described in Golub and Van Loan (1989, §4.7.2). This corresponds to recursively solving the Yule-Walker equations. For example, with autocorrelations r̂_{0}, r̂_{1}, r̂_{2}, ..., the first partial correlation is α̂_{0}=1 (omitted from the graphs). The second, α̂_{1}, is the solution from

( r̂_{1} ) = ( r̂_{0} ) ( α̂_{1} ),

et cetera.
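The Durbin-Levinson recursion can be sketched as follows (illustrative Python, not OxMetrics code; `pacf` is a hypothetical helper and the input must satisfy r[0]=1). For an AR(1) process with r̂_{j}=φ^{j}, all partial autocorrelations beyond the first are zero:

```python
def pacf(r, s):
    """Partial autocorrelations alpha_1..alpha_s from autocorrelations
    r[0..s] (r[0] = 1) via the Durbin-Levinson recursion."""
    phi_prev, out = [], []
    for k in range(1, s + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        pk = num / den                       # k-th partial autocorrelation
        phi = [phi_prev[j] - pk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_prev = phi + [pk]                # AR(k) coefficients
        out.append(pk)
    return out
```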
The periodogram is defined as:

p( ω ) = 1/T | ∑_{t=1}^{T}( x_{t} − x̄ ) e^{−iωt} |^{2},   0 ≤ ω ≤ π.
Note that p(0)=0.
When the periodogram is plotted, only frequencies greater than zero and up to π are used. Moreover, the x-axis, with values 0,...,π, is represented as 0,...,1. So, when T=4 the x coordinates are 0.5, 1, corresponding to π/2, π. When T=5, the x coordinates are 0.4, 0.8, corresponding to 2π/5, 4π/5.
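A sketch of the periodogram at the Fourier frequencies ω_{j}=2πj/T (illustrative Python; `periodogram` is a hypothetical name, and scaling conventions differ across packages, so this mean-deviation |DFT|²/T form need not match OxMetrics's exact scaling):

```python
import cmath, math

def periodogram(x):
    """Periodogram at Fourier frequencies w_j = 2*pi*j/T, j = 1..T//2,
    computed as |sum_t (x_t - xbar) e^{-i w t}|^2 / T."""
    T = len(x)
    xbar = sum(x) / T
    dev = [a - xbar for a in x]
    out = []
    for j in range(1, T // 2 + 1):
        w = 2 * math.pi * j / T
        s = sum(d * cmath.exp(-1j * w * t) for t, d in enumerate(dev))
        out.append(abs(s) ** 2 / T)
    return out
```

A series that alternates every observation puts all its power at the highest frequency π, as in the test case below.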
The estimated spectral density is a smoothed function of the sample autocorrelations {r̂_{j}}, defined as in (eq:16.7). The sample spectral density is then defined as:

ŝ( ω ) = 1/2π ∑_{j=−m}^{m} K( j ) r̂_{|j|} cos( jω ),

where |·| takes the absolute value, so that, for example, r̂_{|−1|} = r̂_{1}. The K(·) function is called the lag window. OxMetrics uses the Parzen window:
K( j ) = 1 − 6( j/m )^{2} + 6( |j|/m )^{3},   |j| ≤ m/2,
K( j ) = 2( 1 − |j|/m )^{3},   m/2 < |j| ≤ m,
K( j ) = 0,   |j| > m.
We have that K(−j)=K(j), so that the sign of j does not matter (cos(−x) = cos(x)). The r̂_{j}s are based on fewer observations as j increases. The window function attaches decreasing weights to the autocorrelations, with zero weight for j>m. The parameter m is called the lag truncation parameter. In OxMetrics, this is taken to be the same as the chosen length of the correlogram. For example, selecting s=12 (the length setting in the dialog) results in m=12. The larger m, the less smooth the spectrum becomes, but the lower the bias. The spectrum is evaluated at 128 points between 0 and π. For more information see Priestley (1981) and Granger and Newbold (1986).
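The Parzen weights and the resulting smoothed estimate can be sketched as follows (illustrative Python, not OxMetrics code; the 1/π normalization is an assumption, chosen so that a white-noise spectrum integrates to one over [0, π], and may differ from OxMetrics's scaling):

```python
import math

def parzen(j, m):
    """Parzen lag-window weight at lag j, truncation parameter m."""
    u = abs(j) / m
    if u <= 0.5:
        return 1 - 6 * u ** 2 + 6 * u ** 3
    if u <= 1.0:
        return 2 * (1 - u) ** 3
    return 0.0

def spectral_density(r, m, omega):
    """Smoothed spectral estimate from autocorrelations r[0..m] at omega.
    The 1/pi normalization is an assumption (see lead-in)."""
    return (1 + 2 * sum(parzen(j, m) * r[j] * math.cos(j * omega)
                        for j in range(1, m + 1))) / math.pi
```

With m=12, the weight falls from 1 at lag 0 to 0.25 at lag 6 and to 0 beyond lag 12, illustrating the decreasing weights described above.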
Consider a data set {x_{t}} = ( x_{1},...,x_{T} ) of observations on a random variable X. The range of {x_{t}} is divided into N intervals of length h, with h defined below. The proportion of x_{t} in each interval constitutes the histogram; the sum of the proportions is unity on the scaling that is used. The density can be estimated as a smoothed function of the histogram using a normal or Gaussian kernel. This can then be summed (`integrated') to obtain the estimated cumulative distribution function (CDF).
Denote the actual density of X at x by f_{x}( x ). A non-parametric estimate of the density is obtained from the sample by:

f̂_{x}( x ) = 1/Th ∑_{t=1}^{T} K( ( x − x_{t} ) / h ),
where h is the window width or smoothing parameter, and K(·) is a kernel such that:

∫_{−∞}^{∞} K( z ) dz = 1.
PcGive sets:
h=1.06σ̂_{x}/T^{0.2} 
as a default, and uses the standard normal density for K(·):

K( z ) = (2π)^{−½} exp( −z^{2}/2 ).
f̂_{x}( x ) is usually calculated for 128 values of x, using a fast Fourier transform. An excellent reference on density function estimation is Silverman (1986).
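A sketch of the estimator with the default bandwidth (illustrative Python, evaluating pointwise rather than via the FFT that PcGive uses; `kde` is a hypothetical name):

```python
import math

def kde(data, x):
    """Gaussian kernel density estimate at point x, with the default
    bandwidth h = 1.06 * sigma_hat / T^0.2 from the text."""
    T = len(data)
    xbar = sum(data) / T
    sd = math.sqrt(sum((a - xbar) ** 2 for a in data) / (T - 1))
    h = 1.06 * sd / T ** 0.2
    K = lambda z: math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return sum(K((x - a) / h) for a in data) / (T * h)
```

Because the kernel integrates to one, the estimated density does too, which is what makes the summed (`integrated') CDF well defined.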
The variable in a QQ plot would normally hold critical values which are hypothesized to come from a certain distribution. The QQ plot function then draws a cross plot of these observed values (sorted), against the theoretical quantiles. The 45° line is drawn for reference (the closer the cross plot is to this line, the better the match).
The normal QQ plot includes the pointwise asymptotic 95% standard error bands, as derived in Engler and Nielsen (2009) for residuals of regression models (possibly autoregressive) with an intercept.
Single-equation estimation is provided by:
OLSCS  ordinary least squares (cross-section modelling) 
IVECS  instrumental variables estimation (cross-section modelling) 
OLS  ordinary least squares 
IVE  instrumental variables estimation 
RALS  r^{th} order autoregressive least squares 
NLS  nonlinear least squares 
ML  maximum likelihood estimation 
Once a model has been specified, a sample period selected, and an estimation method chosen, the equation can be estimated. OLSCS/IVECS and OLS/IVE only differ in the way the sample period is selected. In the first, cross-section, case all observations with missing values are omitted: `holes' in the database are simply skipped. In cross-section mode it is also possible to specify a variable Sel by which to select the sample. In that case, observations where Sel has a zero or missing value are omitted from the estimation sample (but, if data are available, included in the prediction set). In dynamic regression, the observations must be consecutive in time, and the maximum available sample is the leading contiguous sample. The following table illustrates the default sample when regressing y on a constant:

For ease of notation, the sample period is denoted t=1,...,T+H after allowing for any lagged variables created, where H is the forecast horizon. The data used for estimation are X=( x_{1}...x_{T} ). The H retained observations X_{H}=( x_{T+1}...x_{T+H} ) are used for static (1-step) forecasting and evaluating parameter constancy.
This chapter discusses the statistics reported by PcGive following model estimation. The next chapter presents the wide range of evaluation tools available following successful estimation. Sections marked with * denote information that can be shown or omitted on request. In the remainder there is no distinction between OLS/IVE and OLSCS/IVECS.
In most cases, recursive estimation is available:
RLS  recursive OLS 
RIVE  recursive IVE 
RNLS  recursive NLS 
RML  recursive ML 
Recursive OLS and IV estimation methods are initialized by a direct estimation over t=1,...,M−1, followed by recursive estimation over t=M,...,T. RLS and RIVE update inverse moment matrices. This is inherently somewhat numerically unstable but, because recursive estimation is primarily a graphical tool, this is not so important.
Recursive estimation of nonlinear models is achieved by the bruteforce method: first estimate for the full sample, then shrink the sample by one observation at a time. At each step the estimated parameters of the previous step are used as starting values, resulting in a considerably faster algorithm.
The final estimation results are always based on direct full-sample estimation, and so are unaffected by whether recursive or non-recursive estimation is used. The recursive output can be plotted from the recursive graphics dialog.
The algebra of OLS estimation is well established from previous chapters. The model is:
y_{t} = β'x_{t} + u_{t}, with u_{t} ~ IN( 0, σ^{2} ), t=1,...,T,

or, in matrix form, y = Xβ + u.
The vectors β and x_{t} are k×1. The OLS estimates of β are:

β̂ = ( X'X )^{−1}X'y,

with residuals û_{t} = y_{t} − β̂'x_{t}, and estimated residual variance:

σ̂_{u}^{2} = 1/(T−k) ∑_{t=1}^{T} û_{t}^{2}.
Forecast statistics are provided for the H retained observations (only if H≠0). For OLS, these are comprehensive 1-step ahead forecasts and tests, described below.
The estimation output is presented in columnar format, where each row lists information pertaining to each variable (its coefficient, standard error, t-value, etc.). Optionally, the estimation results can be printed in equation format, which is of the form coefficient × variable with standard errors in parentheses underneath.
The first column of these results records the names of the variables and the second, the estimated regression coefficients β̂=( X'X )^{−1}X'y. PcGive does not actually use this expression to estimate β̂. Instead it uses the QR decomposition with partial pivoting, which analytically gives the same result, but in practice is more reliable (i.e. numerically more stable). The QR decomposition of X is X=QR, where Q is T×T and orthogonal (that is, Q'Q=I), and R is T×k and upper triangular. Then X'X=R'R.
The following five columns give further information about each of the magnitudes described below in §17.2.2 to §17.2.11.
These are obtained from the variance-covariance matrix:

V̂[ β̂ ] = σ̂_{u}^{2}( X'X )^{−1},   so that SE( β̂_{i} ) = σ̂_{u} d_{ii}^{½},

where d_{ii} is the i^{th} diagonal element of ( X'X )^{−1} and σ̂_{u} is the standard error of the regression, defined in (eq:17.4).
These statistics are conventionally calculated to determine whether individual coefficients are significantly different from zero:

t_{i} = β̂_{i} / SE( β̂_{i} ),

where the null hypothesis H_{0} is β_{i}=0. The null hypothesis is rejected if the probability of getting a t-value at least as large is less than 5% (or any other chosen significance level). This probability is given as:

1 − P( |τ| ≤ |t_{i}| ),
in which τ has a Student t-distribution with T−k degrees of freedom. The t-probabilities do not appear when all other options are switched on.
When H_{0} is true (and the model is otherwise correctly specified in a stationary process), a Student t-distribution is used since the sample size is often small, and we only have an estimate of the parameter's standard error: however, as the sample size increases, τ tends to a standard normal distribution under H_{0}. Large t-values reject H_{0}; but, in many situations, H_{0} may be of little interest to test. Also, selecting variables in a model according to their t-values implies that the usual (Neyman-Pearson) justification for testing is not valid (see, for example, Judge, Griffiths, Hill, Lütkepohl, and Lee, 1985).
The final column lists the squared partial correlations under the header Part.R^2. The j^{th} entry in this column records the squared partial correlation of the j^{th} explanatory variable with the dependent variable, given the other k−1 variables. Adding further explanatory variables to the model may either increase or lower the squared partial correlation, and the former may occur even if the added variables are correlated with the already included variables. If the squared partial correlations fall on adding a variable, then that is suggestive of collinearity for the given equation parametrization: that is, the new variable is a substitute for, rather than a complement to, those already included.
Beneath the columnar presentation an array of summary statistics is also provided as follows:
The residual variance is defined as:

σ̂_{u}^{2} = 1/(T−k) ∑_{t=1}^{T} û_{t}^{2},

where the residuals are defined as:

û_{t} = y_{t} − β̂'x_{t} = y_{t} − ŷ_{t}.
The equation standard error (ESE) is the square root of (eq:17.10):

σ̂_{u} = ( 1/(T−k) ∑_{t=1}^{T} û_{t}^{2} )^{½}.
This is labelled sigma in the regression output.
The residual sum of squares (RSS) is:

RSS = ∑_{t=1}^{T} û_{t}^{2}.
The variation in the dependent variable, or the total sum of squares (TSS), can be broken up into two parts: the explained sum of squares (ESS) and the residual sum of squares (RSS). In symbols, TSS=ESS+RSS, or:
∑_{t=1}^{T}( y_{t} − ȳ )^{2} = ∑_{t=1}^{T}( ŷ_{t} − ȳ )^{2} + ∑_{t=1}^{T} û_{t}^{2},
and hence:
R^{2} = ESS/TSS = ∑_{t=1}^{T}( ŷ_{t} − ȳ )^{2} / ∑_{t=1}^{T}( y_{t} − ȳ )^{2} = 1 − RSS/TSS,
assuming a constant is included. Thus, R^{2} is the proportion of the variance of the dependent variable which is explained by the variables in the regression. By adding more variables to a regression, R^{2} will never decrease, and it may increase even if nonsense variables are added. Hence, R^{2} may be misleading. Also, R^{2} is dependent on the choice of transformation of the dependent variable (for example, y versus Δy), as is the F-statistic below. The equation standard error, σ̂_{u}, however, provides a better comparative statistic because it is adjusted for degrees of freedom. Generally, σ̂ can be standardized as a percentage of the mean of the original level of the dependent variable (except if the initial mean is zero) for comparisons across specifications. Since many economic magnitudes are inherently positive, that standardization is often feasible. If y is in logs, 100σ̂ is the percentage standard error.
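The decomposition TSS = ESS + RSS can be verified numerically for a simple regression with a constant (illustrative Python; `fit_simple` and `r_squared` are hypothetical helpers, not PcGive code):

```python
def fit_simple(y, x):
    """OLS fit of y on a constant and one regressor; returns fitted values."""
    T = len(y)
    xbar, ybar = sum(x) / T, sum(y) / T
    b = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / \
        sum((a - xbar) ** 2 for a in x)
    a0 = ybar - b * xbar
    return [a0 + b * xi for xi in x]

def r_squared(y, yhat):
    """R^2 = 1 - RSS/TSS (valid when the fit includes a constant)."""
    T = len(y)
    ybar = sum(y) / T
    tss = sum((a - ybar) ** 2 for a in y)
    rss = sum((a - b) ** 2 for a, b in zip(y, yhat))
    return 1 - rss / tss
```

Without an intercept the decomposition fails, which is why R² is not reported for regressions without one.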
R^{2} is not reported if the regression does not have an intercept.
The formula was already given:

F = ( R^{2}/(k−1) ) / ( (1−R^{2})/(T−k) ).
Here, the null hypothesis is that the population R^{2} is zero, or that all the regression coefficients are zero (excluding the intercept). The value for the Fstatistic is followed by its probability value between square brackets.
The adjusted R², denoted R̄², incorporates a penalty for the number of regressors:

R̄^{2} = R^{2} − ( (k−1)/(T−k) )( 1 − R^{2} ),

assuming a constant is included. The adjusted R² can fall when the number of variables increases. Nonetheless, there is no rationale for using it as a model-selection criterion.
An alternative way to express it uses (eq:17.8) and (eq:17.13):

R̄^{2} = 1 − σ̂_{u}^{2} / σ̂_{y}^{2},

so maximizing R̄² corresponds to minimizing σ̂_{u}^{2}.
R̄² is not reported if the regression does not have an intercept.
The log-likelihood for model (eq:17.1) is:

l( β, σ^{2} | y, X ) = −T/2 log 2π − T/2 log σ^{2} − 1/(2σ^{2}) ∑_{t=1}^{T}( y_{t} − β'x_{t} )^{2}.

Next, we can concentrate σ^{2} out of the log-likelihood to obtain:

l_{c}( β | y, X ) = K_{c} − T/2 log( 1/T ∑_{t=1}^{T}( y_{t} − β'x_{t} )^{2} ),

where

K_{c} = −T/2 ( 1 + log 2π ).

The reported log-likelihood includes the constant, so corresponds to:

l_{c}( β̂ | y, X ) = K_{c} − T/2 log( RSS/T ).
The final entries list the number of observations used in the regression (so after allowing for lags), and the number of estimated parameters. This is followed by the mean and standard error of the dependent variable:

ȳ = 1/T ∑_{t=1}^{T} y_{t},   σ̂_{y} = ( 1/(T−1) ∑_{t=1}^{T}( y_{t} − ȳ )^{2} )^{½}.
Note
that we use T1 in the denominator of σ̂^{2}_{y}, so this
is what would be reported as the equation standard error (eq:17.10)
when regressing the dependent variable on just a constant.
The four statistics reported are the Schwarz criterion (SC), the Hannan-Quinn (HQ) criterion, the Final Prediction Error (FPE), and the Akaike criterion (AIC). Here:

SC = log σ̃^{2} + k log T / T,
HQ = log σ̃^{2} + 2k log( log T ) / T,
FPE = σ̃^{2} ( T+k )/( T−k ),
AIC = log σ̃^{2} + 2k/T,

using the maximum likelihood estimate of σ^{2}:

σ̃^{2} = (T−k)/T σ̂^{2} = 1/T ∑_{t=1}^{T} û_{t}^{2}.
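These criteria can be sketched as follows (illustrative Python; the formulas coded here are commonly used log-variance-plus-penalty forms, and the exact scalings PcGive reports may differ, as the comment notes):

```python
import math

def info_criteria(rss, T, k):
    """SC, HQ, FPE and AIC in commonly used forms (assumed here, not
    necessarily PcGive's exact scalings), from RSS, sample size T and
    number of parameters k."""
    s2 = rss / T                              # ML estimate of sigma^2
    aic = math.log(s2) + 2 * k / T
    sc = math.log(s2) + k * math.log(T) / T
    hq = math.log(s2) + 2 * k * math.log(math.log(T)) / T
    fpe = (T + k) / (T - k) * s2
    return aic, sc, hq, fpe
```

For moderate T the per-parameter penalty ordering is SC > HQ > AIC, so SC selects the most parsimonious model of the three.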
For a discussion of the use of these and related scalar measures to choose between alternative models in a class, see Judge, Griffiths, Hill, Lütkepohl, and Lee (1985) and §18.9 below.
These provide consistent estimates of the regression coefficients' standard errors even if the residuals are heteroscedastic in an unknown way. Large differences between the corresponding values in §17.2.2 and §17.2.13 are indicative of the presence of heteroscedasticity, in which case §17.2.13 provides the more useful measure of the standard errors (see White, 1980). PcGive contains two methods of computing heteroscedastic-consistent standard errors: as described in White (1980) (labelled HCSE), or the Jackknife estimator from MacKinnon and White (1985) (labelled JHCSE; for which the code was initially provided by James MacKinnon).
The heteroscedasticity and autocorrelation consistent standard errors are reported in the column labelled HACSE. This follows Newey and West (1987), also see Andrews (1991).
The R^{2} is preceded by the seasonal means s of the first difference of the dependent variable (the mean of Δy for annual data, four quarterly means for quarterly data, twelve monthly means for monthly data, etc.).
The R^{2} relative to difference and seasonals is a measure of the goodness of fit relative to ∑( Δy_{t} − s )^{2} instead of ∑( y_{t} − ȳ )^{2} in the denominator of R^{2} (keeping ∑û_{t}^{2} in the numerator), where s denotes the relevant seasonal mean. Despite its label, such a measure can be negative: if it is, the fitted model does less well than a regression of Δy_{t} on seasonal dummies.
This reports the sample means and sample standard deviations of the selected variables:
x̄ = 1/T ∑_{t=1}^{T} x_{t},    s = ( 1/(T−1) ∑_{t=1}^{T}( x_{t} − x̄ )^{2} )^{½}.
The correlation matrix of the selected variables is reported as a lower-triangular matrix with the diagonal equal to one. Each cell records the simple correlation between the two relevant variables. The calculation of the correlation coefficient r_{xy} between x and y is:

r_{xy} = ∑_{t=1}^{T}( x_{t} − x̄ )( y_{t} − ȳ ) / ( ∑_{t=1}^{T}( x_{t} − x̄ )^{2} ∑_{t=1}^{T}( y_{t} − ȳ )^{2} )^{½}.
The matrix of the estimated parameters' variances is reported as lower triangular. Along the diagonal, we have the variance of each estimated coefficient, and off the diagonal, the covariances. The k×k variance matrix of β̂ is estimated by:

V̂[ β̂ ] = σ̂^{2}( X'X )^{−1},
where σ̂^{2} is the fullsample equation error variance. The variancecovariance matrix is only shown when requested, in which case it is reported before the equation output.
The remaining statistics only appear if observations were withheld for forecasting purposes:
Following estimation over t=1,...,T, 1-step forecasts (or static forecasts) are given by:

ŷ_{t} = β̂'x_{t},   t = T+1,...,T+H,
which requires the observations X_{H}' = ( x_{T+1},...,x_{T+H} ). The 1-step forecast error is the mistake made each period:

e_{t} = y_{t} − ŷ_{t} = β'x_{t} + u_{t} − β̂'x_{t} = ( β − β̂ )'x_{t} + u_{t}.
Assuming that E[ β̂ ]=β, then E[ e_{t} ]=0 and:

V[ e_{t} ] = σ_{u}^{2}( 1 + x_{t}'( X'X )^{−1}x_{t} ).
This corresponds to the results given for the innovations in recursive estimation. The whole vector of forecast errors is e=( e_{T+1},...,e_{T+H} )'. V[ e ] is derived in a similar way:

V[ e ] = σ_{u}^{2}( I_{H} + X_{H}( X'X )^{−1}X_{H}' ).
Estimated variances are obtained after replacing σ_{u}^{2} by σ̂_{u}^{2}.
The columns respectively report the date for which the forecast is made, the realized outcome ( y_{t} ), the forecast ( ŷ_{t} ), the forecast error ( e_{t} = y_{t} − ŷ_{t} ), the standard error of the 1-step forecast ( SE( e_{t} ) = ( V̂[ e_{t} ] )^{½} ), and a t-value (that is, the standardized forecast error e_{t}/SE( e_{t} ) ).
A χ^{2} statistic follows the 1-step analysis, comparing within and post-sample residual variances. Neither this statistic nor η_{3} below measure absolute forecast accuracy. The statistic is calculated as follows:

ξ_{1} = ∑_{t=T+1}^{T+H} e_{t}^{2} / σ̂_{u}^{2},

which is approximately distributed as χ^{2}( H ) under the null.
The null hypothesis is `no structural change in any parameter between the sample and the forecast periods' (denoted 1 and 2 respectively), H_{0}: β_{1}=β_{2}; σ_{1}^{2}=σ_{2}^{2}. A rejection of the null hypothesis of constancy by ξ_{3} below implies a rejection of the model used over the sample period (so that is a model specification test), whereas ξ_{1} serves more as a measure of numerical parameter constancy, and it should not be used as a model-selection device (see Kiviet, 1986). However, persistently large values for this statistic imply that the equation under study will not provide very accurate ex ante predictions, even one step ahead. An approximate F-equivalent is given by:

ξ_{2} = ξ_{1}/H, which is approximately distributed as F( H, T−k ).
A second statistic takes parameter uncertainty into account, taking the denominator from (eq:17.20):

∑_{t=T+1}^{T+H} e_{t}^{2} / V̂[ e_{t} ].
This test is not reported in singleequation modelling, but individual terms of the summation can be plotted in the graphical analysis.
This is the main test of parameter constancy and has the form:

ξ_{3} = ( ( RSS_{T+H} − RSS_{T} )/H ) / ( RSS_{T}/( T−k ) ),
where H_{0} is as for ξ_{1}. For fixed regressors, the Chow (1960) test is exactly distributed as an F, but is only approximately (or asymptotically) so in dynamic models.
Alternatively expressed, the Chow test is:

ξ_{3} = e'( V̂[ e ] )^{−1}e / H,   with V̂[ e ] = σ̂_{u}^{2}( I_{H} + X_{H}( X'X )^{−1}X_{H}' ).
We can now see the relation between ξ_{3} and ξ_{1}: the latter uses V̂[ e ] = σ̂_{u}^{2}I, obtained by dropping the (asymptotically negligible) term V[ β̂ ] in (eq:17.21). In small samples, the dropped term is often not negligible, so ξ_{1} should not be taken as a test. The numerical value of ξ_{1} always exceeds that of ξ_{3}: the difference indicates the relative increase in prediction uncertainty arising from estimating, rather than knowing, the parameters.
PcGive computes the Chow test efficiently, by noting that:

RSS_{T+H} − RSS_{T} = ∑_{t=T+1}^{T+H} ν_{t}^{2}/ω_{t},

where ν_{t} = y_{t} − β̂_{t−1}'x_{t} and ω_{t} = 1 + x_{t}'( X_{t−1}'X_{t−1} )^{−1}x_{t} are the recursive innovations and their scale factors.
The recursive formulae are applicable over the sample T+1,...,T+H, and under the null of correct specification and H_{0} of ξ_{1} above, the standardized innovations {ν_{t}/ω_{t}^{½}} are distributed as IN( 0, σ_{u}^{2} ). Thus:

ξ_{3} = ( 1/H ∑_{t=T+1}^{T+H} ν_{t}^{2}/ω_{t} ) / ( RSS_{T}/( T−k ) ).
This tests for a different facet of forecast inaccuracy in which the forecast errors have a small but systematic bias. This test is the same as an endpoint CUSUM test of recursive residuals, but using only the forecast sample period (see Harvey and Collier, 1977).
y_{t} = β_{0}'y_{t}^{*} + β_{1}'w_{t} + ε_{t},

in which we have n−1 endogenous variables y_{t}^{*} and q_{1} non-modelled variables w_{t} on the right-hand side (the latter may include lagged endogenous variables). We assume that we have q_{2} additional instruments, labelled w_{t}^{*}. Write y_{t}=( y_{t}:y_{t}^{*'} )' for the n×1 vector of endogenous variables. Let z_{t} denote the set of all instrumental variables (non-endogenous included regressors, plus additional instruments): z_{t}=( w_{t}':w_{t}^{*'} )', which is a vector of length q=q_{1}+q_{2}.
The reduced form (RF) estimates are only printed on request. If Z'=( z_{1}...z_{T} ), and y_{t} denotes all the n endogenous variables including y_{t} at t, with Y'=( y_{1},...,y_{T} ), then the RF estimates are:

Π̂' = ( Z'Z )^{−1}Z'Y,

which is q×n. The elements of Π̂' relevant to each endogenous variable are written:

π̂_{i} = ( Z'Z )^{−1}Z'Y_{i},
with Y_{i}'=(y_{i1},...,y_{iT}) the vector of observations on the i^{th} endogenous variable. Standard errors etc. all follow as for OLS above (using Z, Y_{i} for X,y in the relevant equations there).
Generalized instrumental variables estimates for the k=n−1+q_{1} coefficients of interest β=( β_{0}':β_{1}' )' are:

β̃ = ( X'Z( Z'Z )^{−1}Z'X )^{−1}X'Z( Z'Z )^{−1}Z'y,

using x_{t}=( y_{t}^{*'}:w_{t}' )', X'=( x_{1}...x_{T} ), y=( y_{1}...y_{T} )', which is the left-hand side of (eq:17.29), and Z is as in (eq:17.30). This allows for the case of more instruments than explanatory variables (q>k), and requires rank( X'Z )=k and rank( Z'Z )=q. If q=k the equation simplifies to:

β̃ = ( Z'X )^{−1}Z'y.
As for OLS, PcGive does not use expression (eq:17.32) directly, but instead uses the QR decomposition for numerically more stable computation. The error variance is given by:

σ̃_{ε}^{2} = 1/(T−k) ∑_{t=1}^{T}( y_{t} − β̃'x_{t} )^{2}.

The variance of β̃ is estimated by:

V̂[ β̃ ] = σ̃_{ε}^{2}( X'Z( Z'Z )^{−1}Z'X )^{−1}.
Again the output is closely related to that reported for least squares except that the columns for HCSE, partial r^{2} and instability statistics are omitted. However, RSS, σ̃ and DW are recorded, as is the reduced form σ̂ (from regressing y_{t} on z_{t}, already reported with the RF equation for y_{t}). Additional statistics reported are:
This tests for the validity of the choice of the instrumental variables as discussed by Sargan (1964). It is asymptotically distributed as χ^{2}( q_{2}−n+1 ) when the q_{2}−n+1 overidentifying instruments are independent of the equation error. It is also interpretable as a test of whether the restricted reduced form (RRF) of the structural model (y_{t} on x_{t} plus x_{t} on z_{t}) parsimoniously encompasses the unrestricted reduced form (URF: y_{t} on z_{t} directly):

ε̃'Z( Z'Z )^{−1}Z'ε̃ / σ̃_{ε}^{2},   where ε̃ = y − Xβ̃,

with π̂=( Z'Z )^{−1}Z'y being the unrestricted reduced form estimates.
Reported is the χ^{2} test of β=0 (other than the intercept), which has a crude correspondence to the earlier F-test. On H_{0}: β=0, the reported statistic behaves asymptotically as a χ^{2}( k−1 ). First define:

ξ_{β} = β̃'( X'Z( Z'Z )^{−1}Z'X )β̃.

Then ξ_{β}/σ̃_{ε}^{2}, asymptotically a χ^{2}( k ), would test whether all k coefficients are zero. To keep the intercept separate, we compute:

ξ_{β}^{*}/σ̃_{ε}^{2},

where ξ_{β}^{*} is obtained by using the formula for β̃ (eq. (eq:17.32)) in ξ_{β} with y − ȳι instead of y.
A forecast test is provided if H observations are retained for forecasting. For IVE there are endogenous regressor variables: the only interesting issue is that of parameter constancy, and correspondingly the output is merely ξ_{1} of (eq:17.22) using σ̃_{ε} and:

ẽ_{t} = y_{t} − β̃'x_{t},   t = T+1,...,T+H.
Dynamic forecasts (which require forecasts of the successive x_{T+1},...,x_{T+H}) could be obtained from multiple equation dynamic modelling, where the system as a whole is analyzed.
As discussed in the typology, if a dynamic model has common factors in its lag polynomials, then it can be re-expressed as having lower-order systematic dynamics combined with an autoregressive error process (called COMFAC). If the autoregressive error is of r^{th} order, the estimator is called r^{th}-order Autoregressive Least Squares or RALS, and it takes the form:

y_{t} = β'x_{t} + u_{t},   u_{t} = ∑_{i=1}^{r} α_{i}u_{t−i} + ε_{t},

with ε_{t} ~ IN( 0, σ_{ε}^{2} ).
Maximizing

f( β, α ) = −T/2 log( ∑_{t} ε_{t}( β, α )^{2} )

as a function of the ( β, α ) parameters yields a non-linear least squares problem necessitating iterative solution. However, conditional on values of either set of parameters, f(·) is linear in the other set, so analytical first and second derivatives are easy to obtain. There is an estimator-generating equation for this whole class (see Hendry, 1976, Section 7), but as it has almost no efficient non-iterative solutions, little is gained by its exploitation. Letting θ denote all of the unrestricted parameters in β_{0}(·), {β_{i}(·)} and α(·), then the algorithm programmed in PcGive for maximizing f(·) as a function of θ is a variant of the Gauss-Newton class. Let:
q_{i} = ∂f/∂θ evaluated at θ_{i}, and Q_{i} the Gauss-Newton approximation to the Hessian of f at θ_{i},

so that negligible cross-products are eliminated; then at the i^{th} iteration:

θ_{i+1} = θ_{i} + s_{i}Q_{i}^{−1}q_{i},
where s_{i} is a scalar chosen by a line search procedure to maximize f( θ_{i+1} ). The convergence criterion depends on q_{i}'Q_{i}^{−1}q_{i} and on changes in θ_{i} between iterations. The bilinearity of f(·) is exploited in computing Q.
Before estimating by RALS, OLS estimates of {β_{i}} are calculated, as are LM-test values of {α_{i}}, where the pre-specified autocorrelation order is `data frequency+1' (for example, 5 for quarterly data). These estimates are then used to initialize θ. However, the {α_{i}} can be reset by users. Specifically, for single-order processes, u_{t}=α_{r}u_{t−r}+ε_{t}, α_{r} can be selected by a prior grid search. The user can specify the maximum number of iterations, the convergence tolerance, both the starting and ending orders of the polynomial α( L ) in the form:
u_{t}=∑_{i=s}^{r}α_{i}u_{t-i}+ε_{t}, 
and whether to minimize f( ^{.}) sequentially over s, s+1,...,r or merely the highest order, r.
On convergence, the variances of the θs are calculated (from Q^{-1}), as are the roots of α( L) =0. The usual statistics for σ̂, RSS (this can be used in likelihood-ratio tests between alternative nested versions of a model), t-values etc. are reported, as is T^{-1}∑( y_{t}-ȳ) ^{2} in case a pseudo-R^{2} statistic is desired.
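The bilinearity that the algorithm exploits can be sketched with an alternating least-squares scheme: given the autoregressive parameters, quasi-difference the data and regress; given the regression parameters, regress the residuals on their own lags. This is an illustration only, not PcGive's Gauss-Newton implementation; the simulated data and function names are invented:

```python
import numpy as np

def rals(y, X, r=1, iterations=50, tol=1e-9):
    """Estimate y = X b + u, u_t = sum_{i=1}^r a_i u_{t-i} + e_t
    by alternating least squares (a sketch exploiting bilinearity;
    PcGive itself uses a Gauss-Newton variant)."""
    T, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    a = np.zeros(r)
    for _ in range(iterations):
        u = y - X @ b
        # update alpha: regress u_t on u_{t-1},...,u_{t-r}
        U = np.column_stack([u[r - i:T - i] for i in range(1, r + 1)])
        a_new = np.linalg.lstsq(U, u[r:], rcond=None)[0]
        # update beta: quasi-difference the data given alpha
        yp = y[r:] - sum(a_new[i - 1] * y[r - i:T - i] for i in range(1, r + 1))
        Xp = X[r:] - sum(a_new[i - 1] * X[r - i:T - i] for i in range(1, r + 1))
        b_new = np.linalg.lstsq(Xp, yp, rcond=None)[0]
        done = np.max(np.abs(b_new - b)) + np.max(np.abs(a_new - a)) < tol
        b, a = b_new, a_new
        if done:
            break
    return b, a

# simulate a regression with AR(1) errors (invented values)
rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(T), x])
b_hat, a_hat = rals(y, X, r=1)
```

For a first-order error this is essentially iterated Cochrane-Orcutt; the joint Gauss-Newton step of the text converges faster near the optimum.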
ŷ_{t}=x_{t}'β̂+∑_{i=s}^{r}α̂_{i}( y_{t-i}-x_{t-i}'β̂) , 
where β̂ and {α̂_{i}} are obtained over 1,...,T. The forecast error is:
e_{t}=y_{t}-ŷ_{t}≈ε_{t}+w_{t}'( θ-θ̂) , 
where we define x_{t}^{+'}=x_{t}'-∑_{i=s}^{r}α_{i}x_{t-i}', û_{r}'=( û_{t-s}...û_{t-r}) , w_{t}'=( x_{t}^{+'}:û_{r}') , and θ'=( β':α') when α'=( α_{s}...α_{r}) . E[ e_{t}] ≈0 for a correctly-specified model. Finally, therefore (neglecting the second-order dependence of the variance of w_{t}'( θ-θ̂) on θ̂ acting through w_{t}):
V[ e_{t}] ≈σ_{ε}^{2}+w_{t}'V[ θ̂] w_{t}, 
V[ θ̂] is the RALS variance-covariance matrix, and from the forecast-error covariance matrix, the 1-step analysis is calculated, as are parameter-constancy tests.
The output is as for OLS: the columns respectively report the date for which the forecast is made, the realized outcome (y_{t}), the forecast (ŷ_{t}), the forecast error (e_{t}=y_{t}-ŷ_{t}), the standard error of the 1-step forecast (SE( e_{t}) =( V̂[ e_{t}] ) ^{½}), and a t-value (that is, the standardized forecast error e_{t}/SE( e_{t}) ).
The RALS analogues of the forecast test ξ_{1} of (eq:17.22), and of the Chow test η_{3} in (eq:17.26), are reported. The formulae follow directly from (eq:17.48) and (eq:17.53).
The nonlinear regression model is written as
y_{t}=g( x_{t},θ) +u_{t}, t=1,...,T. 
We take θ to be a k×1 vector. For example:
y_{t}=θ_{0}+θ_{1}x_{t}^{θ_{2}}+θ_{3}z_{t}^{1-θ_{2}}+u_{t}. 
Note that for fixed θ_{2} this last model becomes linear; for example, for θ_{2}= 1/2 :
y_{t}=θ_{0}+θ_{1}x_{t}^{*}+θ_{3}z_{t}^{*}+u_{t}, x_{t}^{*}=(x_{t})^{½}, z_{t}^{*}=(z_{t})^{½}, 
which is linear in the transformed variables x_{t}^{*}, z_{t}^{*}. As for OLS, estimation proceeds by minimizing the sum of squared residuals:
RSS( θ) =∑_{t=1}^{T}( y_{t}-g( x_{t},θ) ) ^{2}. 
In linear models, this problem has an explicit solution; for nonlinear models the minimum has to be found using iterative optimization methods.
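A minimal sketch of such an iterative solution is Gauss-Newton with a numerical Jacobian and step-halving, applied to the example model above (the data, starting values and step-halving scheme are invented for illustration; PcGive's actual algorithm is described below):

```python
import numpy as np

def f(theta, x, z):
    # the example model: y = th0 + th1*x^th2 + th3*z^(1-th2)
    t0, t1, t2, t3 = theta
    return t0 + t1 * x**t2 + t3 * z**(1.0 - t2)

def gauss_newton(y, x, z, theta, steps=200, tol=1e-10, h=1e-6):
    """Minimize the residual sum of squares by Gauss-Newton with a
    forward-difference Jacobian and simple step-halving."""
    for _ in range(steps):
        base = f(theta, x, z)
        r = y - base
        J = np.empty((len(y), len(theta)))
        for j in range(len(theta)):
            tp = theta.copy()
            tp[j] += h
            J[:, j] = (f(tp, x, z) - base) / h   # numerical derivative
        step = np.linalg.lstsq(J, r, rcond=None)[0]
        s, ssr = 1.0, r @ r
        while s > 1e-10:                         # halve until SSR falls
            rc = y - f(theta + s * step, x, z)
            if rc @ rc < ssr:
                break
            s /= 2
        theta = theta + s * step
        if np.max(np.abs(s * step)) < tol:
            break
    return theta

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 4.0, 200)
z = rng.uniform(1.0, 4.0, 200)
# true parameters (0.5, 1.5, 0.5, 2.0), i.e. theta2 = 1/2
y = 0.5 + 1.5 * np.sqrt(x) + 2.0 * np.sqrt(z) + rng.normal(scale=0.05, size=200)
theta_hat = gauss_newton(y, x, z, np.array([0.0, 1.0, 0.4, 1.0]))
resid = y - f(theta_hat, x, z)
```

The step-halving guarantees a monotone decrease of the sum of squares, at the cost of slower progress than a proper line search.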
Instead of minimizing the sum of squares, PcGive maximizes minus the sum of squares divided by T:
f( θ) =- 1/T ∑_{t=1}^{T}( y_{t}-g( x_{t},θ) ) ^{2}. 
As for RALS, an iterative procedure is used to locate the maximum:
θ_{i+1}=θ_{i}+s_{i}Q( θ_{i}) ^{-1}q( θ_{i}) , 
with q( ^{.}) the derivatives of g( ^{.}) with respect to θ_{j} (determined numerically), and Q( ^{.}) ^{-1} a symmetric, positive definite matrix (determined by the BFGS method after some initial Gauss-Newton steps). Practical details of the algorithm are provided in §17.5.3; Volume II gives a more thorough discussion of the subject of numerical optimization. Before using NLS you are advised to study the examples given in the tutorial chapter, to learn about the potential problems.
Output is as for OLS, except for the instability tests and HCSEs which are not computed. The variance of the estimated coefficients is determined numerically, other statistics follow directly, for example:
t( θ̂_{j}) =θ̂_{j}/SE( θ̂_{j}) . 
Forecasts are computed and graphed, but the only statistic reported is the ξ_{1} test of (eq:17.22), using 1-step forecast errors:
ξ_{1}=∑_{h=1}^{H}e_{T+h}^{2}/σ̂^{2}. 
We saw that for an independent sample of T observations and k parameters θ:
l( θ) =∑_{t=1}^{T}l( θ|x_{t}) . 
This type of model can be estimated with PcGive, which solves the problem:
max_{θ} ∑_{t=1}^{T}l( θ|x_{t}) . 
Models falling in this class include, for example, binary logit and probit, ARCH, GARCH, Tobit, and Poisson regression. As an example, consider the linear regression model. PcGive gives three ways of solving this:
Clearly, the first method is to be preferred when available.
Estimation of (eq:17.61) uses the same technique as NLS. The output is more concise, consisting of coefficients, standard errors (based on the numerical second derivative), t-values, t-probabilities, and `loglik', which is ∑_{t=1}^{T}l( θ̂|x_{t}) . Forecasts are computed and graphed, but no statistics are reported.
Nonlinear models are formulated in algebra code. NLS requires the definition of a variable called actual, and one called fitted. It uses these to maximize minus the residual sum of squares divided by T:
- 1/T ∑_{t=1}^{T} (actual_{t} - fitted_{t})^{2}. 
An example for NLS is:
actual = CONS;
fitted = &0 + &1 * INC + &2 * lag(INC,1);
&0 = 400; &1 = 0.8; &2 = 0.2;
This is just a linear model, and much more efficiently done using the normal options.
Models can be estimated by maximum likelihood if they can be written as a sum over the observations (note that the previous concentrated log-likelihood cannot be written that way!). An additional algebra line is required, to define a variable called loglik. PcGive maximizes:
∑_{t=1}^{T} loglik_{t} . 
Consider, for example, a binary logit model:
actual = vaso;
xbeta = &0 + &1 * Lrate + &2 * Lvolume;
fitted = 1 / (1 + exp(-xbeta));
loglik = actual * log(fitted) + (1-actual) * log(1-fitted);
&0 = 0.74; &1 = 1.3; &2 = 2.3;
Here actual and fitted are not really actual and fitted values, but these variables determine what is shown in the graphic analysis.
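The same binary-logit likelihood can be maximized outside Algebra; the sketch below uses Newton's method with analytic derivatives on simulated data (the data and coefficient values are invented; the vaso data set is not reproduced here, and PcGive itself uses BFGS with numerical derivatives):

```python
import numpy as np

def loglik(theta, y, X):
    # sum over observations of y*log(p) + (1-y)*log(1-p)
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logit(y, X, steps=30):
    """Maximize the binary-logit log-likelihood by Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        g = X.T @ (y - p)                          # score vector
        H = (X * (p * (1 - p))[:, None]).T @ X     # minus the Hessian
        theta = theta + np.linalg.solve(H, g)
    return theta

# simulated data standing in for the vaso example (coefficients invented)
rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true = np.array([-0.7, 1.3, -2.3])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ true)))).astype(float)
theta_hat = fit_logit(y, X)
```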
Note that algebra is a vector language without temporary variables, which restricts the class of models that can be estimated. Nonlinear models are not stored for recall and progress reports.
After correct model specification, the method is automatically set to Nonlinear model (using ML if loglik is defined, NLS/RNLS otherwise); in addition, the following information needs to be specified:
NLS and ML estimation (and their recursive variants RNLS and RML) require numerical optimization to maximize the likelihood log L( φ( θ) ) = l( φ( θ) ) as a nonlinear function of θ. PcGive maximization algorithms are based on a Newton scheme:
θ_{i+1}=θ_{i}+s_{i}δ_{i}, δ_{i}=K_{i}q_{i}, 
with q_{i} the score ∂l/∂θ evaluated at θ_{i}, and K_{i} an approximation to Q_{i}^{-1}.
PcGive uses the quasi-Newton method developed by Broyden, Fletcher, Goldfarb and Shanno (BFGS) to update K = Q^{-1} directly. It uses numerical derivatives to compute ∂l( φ( θ) ) /∂θ_{i}. However, for NLS, PcGive will try Gauss-Newton before starting BFGS. In this hybrid method, Gauss-Newton is used while the relative progress in the function value is at least 20%, then the program switches to BFGS.
Starting values must be supplied. The starting value for K consists of 0s off-diagonal. The diagonal is the minimum of one and the inverse of the corresponding diagonal element in the matrix consisting of the sums of the outer products of the gradient at the parameter starting values (numerically evaluated).
RNLS works as follows: starting values for θ and K for the first estimation (T-1 observations) are the full sample values (T observations); then the sample size is reduced by one observation, and the previous values at convergence are used as starting values.
Owing to numerical problems it is possible (especially close to the maximum) that the calculated δ_{i} does not yield a higher likelihood. Then an s_{i}∈[0,1] yielding a higher function value is determined by a line search. Theoretically, since the direction is upward, such an s_{i} should exist; however, numerically it might be impossible to find one. When using BFGS with numerical derivatives, it often pays to scale the data so that the initial gradients are of the same order of magnitude.
The convergence decision is based on two tests. The first uses likelihood elasticities (∂l/∂ log θ):
max_{j} |θ_{j}∂l/∂θ_{j}| ≤ ε. 
The second is based on the onestepahead relative change in the parameter values:
max_{j} |θ_{i+1,j}-θ_{i,j}|/|θ_{i,j}| ≤ ε. 
The status of the iterative process is given by the following messages:
The step length s_{i} has become too small. The convergence test (eq:17.63) was not passed, using tolerance ε=ε_{2}.
The step length s_{i} has become too small. The convergence test (eq:17.63) was passed, using tolerance ε=ε_{2}.
Both convergence tests (eq:17.63) and (eq:17.64) were passed, using tolerance ε=ε_{1}.
The chosen default values for the tolerances are:


You can:
Graphic analysis focuses on graphical inspection of individual equations. Let y_{t}, ŷ_{t} denote respectively the actual (that is, observed) values and the fitted values of the selected equation, with residuals û_{t}=y_{t}-ŷ_{t}, t=1,...,T. When H observations are retained for forecasting, then ŷ_{T+1},...,ŷ_{T+H} are the 1-step forecasts. NLS/RNLS/ML use the variables labelled `actual' and `fitted' for y_{t}, ŷ_{t}.
Fourteen different graphs are available:
(y_{t},ŷ_{t}) over t. This is a graph showing the fitted (ŷ_{t}) and actual values (y_{t}) of the dependent variable over time, including the forecast period.
ŷ_{t} against y_{t}, also including the forecast period.
( û_{t}/σ̂) over t, where σ̂^{2}=( T-k) ^{-1}RSS is the full-sample equation error variance. As indicated, this graph shows the scaled residuals given by û_{t}/σ̂ over time.
The 1-step forecasts can be plotted in a graph over time: y_{t} and ŷ_{t} are shown with error bars of ±2SE( e_{t}) centered on ŷ_{t} (that is, an approximate 95% confidence interval for the 1-step forecast); the e_{t} are the forecast errors.
Plots the histogram of the standardized residuals û_{t}/√(T^{-1}RSS), t=1,...,T, the estimated density f̂_{u}( ^{.}) and a normal distribution with the same mean and variance (more details are in §16.10).
This plots the residual autocorrelations using û_{t} as the x_{t} variable in (eq:18.13).
This plots the partial autocorrelation function (see §16.6); the same graph is used if the ACF is selected.
If available, the individual Chow χ^{2}(1) tests (see eq:17.24) are plotted.
( û_{t}) over t;
This plots the estimated spectral density (see §16.9) using û_{t} as the x_{t} variable.
Shows a QQ plot of the residuals, see §16.11.
The nonparametrically estimated density f̂_{u}( ^{.}) of the standardized residuals û_{t}/√(T^{-1}RSS), t=1,...,T is graphed using the settings described in the OxMetrics book.
This plots the histogram of the standardized residuals û_{t}/√(T^{-1}RSS), t=1,...,T; the same graph is used if the density is selected.
Plots the distribution based on the nonparametrically estimated density.
The residuals can be saved to the database for further inspection.
Recursive methods estimate the model at each t for t=M-1,...,T. The output generated by the recursive procedures is most easily studied graphically, possibly using the facility to view multiple graphs together on screen. The dialog has a facility to write the output to the editor, instead of graphing it. The recursive estimation aims to throw light on the relative future information aspect (that is, parameter constancy).
Let β̂_{t} denote the k parameters estimated from a sample of size t, and y_{j}-x_{j}'β̂_{t} the residuals at time j evaluated at the parameter estimates based on the sample 1,...,t (for RNLS the residuals are y_{j}-f( x_{j},β̂_{t}) ).
We now consider the generated output:
The graph shows β̂_{it}±2SE(β̂_{it}) for each selected coefficient i ( i=1,...,k) over t=M,...,T.
β̂_{it}/SE(β̂_{it}) for each selected coefficient i ( i=1,...,k) over t=M,...,T.
The residual sum of squares at each t is RSS_{t}=∑_{j=1}^{t}( y_{j}-x_{j}'β̂_{t}) ^{2} for t=M,...,T.
The 1-step residuals y_{t}-x_{t}'β̂_{t} are shown bordered by 0±2σ̂_{t} over M,...,T. Points outside the 2-standard-error region are either outliers or are associated with coefficient changes.
The standardized innovations (or standardized recursive residuals) for RLS are:
ν_{t}=( y_{t}-x_{t}'β̂_{t-1}) /( ω_{t}) ^{1/2} where ω_{t}=1+x_{t}'( X_{t-1}'X_{t-1}) ^{-1}x_{t} for t=M,...,T. 
σ^{2}ω_{t} is the 1-step forecast error variance of (eq:17.20), and β̂_{M-1} are the coefficient estimates from the initializing OLS estimation.
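The standardized innovations can be sketched directly from their definition; the brute-force version below re-estimates β̂_{t-1} by OLS at each t rather than using the recursive updating formulae (function names and simulated data are invented):

```python
import numpy as np

def recursive_innovations(y, X, M):
    """Standardized recursive residuals
    v_t = (y_t - x_t' b_{t-1}) / omega_t^{1/2},
    omega_t = 1 + x_t' (X_{t-1}' X_{t-1})^{-1} x_t, for t = M,...,T.
    Each b_{t-1} is re-estimated by OLS (a sketch, not true RLS updating)."""
    T = len(y)
    out = []
    for t in range(M - 1, T):               # 0-based index of observation t
        b = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]
        omega = 1.0 + X[t] @ np.linalg.inv(X[:t].T @ X[:t]) @ X[t]
        out.append((y[t] - X[t] @ b) / np.sqrt(omega))
    return np.array(out)

# under correct specification the innovations are approximately IN(0, sigma^2)
rng = np.random.default_rng(3)
T, M = 300, 20
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)
v = recursive_innovations(y, X, M)
```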
1-step forecast tests are F( 1,t-k-1) under the null of constant parameters, for t=M,...,T. A typical statistic is calculated as:
( RSS_{t}-RSS_{t-1}) ( t-k-1) /RSS_{t-1}. 
Normality of y_{t} is needed for this statistic to be distributed as an F.
Breakpoint F-tests are F( T-t+1,t-k-1) for t=M,...,T. These are, therefore, sequences of Chow tests and are also called N↓ because the number of forecasts goes from N=T-M+1 down to 1. When the forecast period exceeds the estimation period, this test is not necessarily optimal relative to the covariance test based on fitting the model separately to the split samples. A typical statistic is calculated as:
[ ( RSS_{T}-RSS_{t-1}) /( T-t+1) ] /[ RSS_{t-1}/( t-k-1) ] . 
This test is closely related to the CUSUMSQ statistic in Brown, Durbin, and Evans (1975).
Forecast F-tests are F( t-M+1,M-k-1) for t=M,...,T, and are called N↑ as the forecast horizon increases from M to T. This tests the model over 1 to M-1 against an alternative which allows any form of change over M to T. A typical statistic is calculated as:
[ ( RSS_{t}-RSS_{M-1}) /( t-M+1) ] /[ RSS_{M-1}/( M-k-1) ] . 
The statistics in (eq:18.1)-(eq:18.3) are variants of Chow (1960) tests: they are scaled by one-off critical values from the F-distribution at any selected probability level, as an adjustment for changing degrees of freedom, so that the significant critical values become a straight line at unity. Note that the first and last values of (eq:18.1) respectively equal the first value of (eq:18.3) and the last value of (eq:18.2).
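The three sequences can be sketched from the recursive residual sums of squares, assuming the standard RSS-based forms of the statistics given above (the simulated data and function names are invented); the self-check reproduces the first/last-value identities noted in the text:

```python
import numpy as np

def chow_sequences(y, X, M):
    """1-step, breakpoint (N-down) and forecast (N-up) F sequences from
    recursive residual sums of squares (a sketch assuming the standard
    RSS-based forms of the statistics)."""
    T, k = len(y), X.shape[1]
    rss = {}
    for t in range(M - 1, T + 1):            # t = number of observations used
        b = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]
        e = y[:t] - X[:t] @ b
        rss[t] = e @ e
    one_step, ndown, nup = {}, {}, {}
    for t in range(M, T + 1):
        one_step[t] = (rss[t] - rss[t - 1]) * (t - k - 1) / rss[t - 1]
        ndown[t] = ((rss[T] - rss[t - 1]) / (T - t + 1)) / (rss[t - 1] / (t - k - 1))
        nup[t] = ((rss[t] - rss[M - 1]) / (t - M + 1)) / (rss[M - 1] / (M - k - 1))
    return one_step, ndown, nup

rng = np.random.default_rng(8)
T, M = 120, 20
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.5, 1.0]) + rng.normal(size=T)
one_step, ndown, nup = chow_sequences(y, X, M)
```

Scaling each value by the appropriate F critical value (for example via scipy.stats.f.ppf) then gives the unit-line presentation described above.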
The Chow test statistics are not calculated for RIVE/RML; the recursive RSS is not available for RML.
The general class of models estimable in PcGive can be written in the form:
b_{0}( L) y_{t}=∑_{i=1}^{q}b_{i}( L) z_{it}+ε_{t}, 
where b_{0}( L) and the b_{i}( L) are polynomials in the lag operator L. Now q+1 is the number of distinct variables (one of which is y_{t}), whereas k remains the number of estimated coefficients. For simplicity we take all polynomials to be of length m:
b_{i}( L) =∑_{j=0}^{m}b_{ij}L^{j}, i=0,...,q. 
With b_{00}=1 and using a( L) =∑_{j=1}^{m}b_{0j}L^{j-1} we can write (eq:18.4) as:
y_{t}=-a( L) y_{t-1}+∑_{i=1}^{q}b_{i}( L) z_{it}+ε_{t}. 
Finally, we use a=(b_{01},...,b_{0m})' and b_{i}=(b_{i0},...,b_{im}), i=1,...,q.
In its unrestricted mode of operation, PcGive can be visualized as analyzing the polynomials involved, and it computes such functions as their roots and sums. This option is available if a general model was initially formulated, and provided OLS or IVE was selected.
When working with dynamic models, concepts such as equilibrium solutions, steadystate growth paths, mean lags of response etc. are generally of interest. In the simple model:
y_{t}=α_{0}+α_{1}y_{t-1}+β_{0}z_{t}+β_{1}z_{t-1}+ε_{t}, 
where all the variables are stationary, a static equilibrium is defined by:
E[ z_{t}] =z^{*} for all t 
in which case, E[ y_{t}] =y^{*} will also be constant if |α_{1}|<1, and y_{t} will converge to:
y^{*}= α_{0}/( 1-α_{1}) +[ ( β_{0}+β_{1}) /( 1-α_{1}) ] z^{*}=K_{0}+Kz^{*}. 
For nonstationary but cointegrated data, reinterpret expression (eq:18.7) as E[ y_{t}Kz_{t}] =0.
PcGive computes estimates of K and associated standard errors. These are called static longrun parameters. If b_{0}( 1) ≠0, the general longrun solution of (eq:18.4) is given by:
y=∑_{i=1}^{q}K_{i}z_{i} with K_{i}=b_{i}( 1) /b_{0}( 1) , i=1,...,q. 
The expression y_{t}-∑K_{i}z_{it} is called the equilibrium-correction mechanism (ECM) and can be stored in the data set. If common-factor restrictions of the form b_{j}( L) =α( L) γ_{j}( L) , j=0,...,q are imposed, then α( 1) will cancel, hence enforced autoregressive error representations have no impact on derived long-run solutions.
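For the ADL(1,1) model (eq:18.6) the long-run solution and the ECM term can be computed directly (the coefficient values below are invented for illustration, not estimates):

```python
import numpy as np

# Long-run solution of y_t = a0 + a1*y_{t-1} + b0*z_t + b1*z_{t-1} + e_t:
#   K0 = a0/(1 - a1),  K = (b0 + b1)/(1 - a1)    (illustrative values)
a0, a1, b0, b1 = 0.3, 0.8, 0.5, 0.1
K0 = a0 / (1 - a1)            # long-run intercept
K = (b0 + b1) / (1 - a1)      # long-run multiplier of z

# with z held at z* = 1 and no shocks, y_t converges to y* = K0 + K*z*
T = 200
z = np.ones(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a0 + a1 * y[t - 1] + b0 * z[t] + b1 * z[t - 1]
ecm = y - K0 - K * z          # equilibrium-correction term, -> 0 in equilibrium
```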
The standard errors of K̂=( K̂_{1}...K̂_{q}) ' are calculated from:
V̂[ K̂] =JV̂[ θ̂] J' where J=∂K/∂θ'. 
PcGive calculates J analytically using the algorithm proposed by Bårdsen (1989).
PcGive outputs the solved static long-run equation, with standard errors of the coefficients. This is followed by a Wald test of the null that all of the long-run coefficients are zero (except the constant term). The V̂[ K̂] matrix is printed when `covariance matrix of estimated coefficients' is checked under the model options.
The b̂_{i}( L) , i=0,...,q of (eq:18.4) and their standard errors are reported in tabular form with the b̂_{i}( 1) (their row sums) and associated standard errors.
The first column contains Ftests of each of the q+1 hypotheses:
H_{v0}:a=0; H_{vi}:b_{i}=0 for i=1,...,q. 
These test the significance of each basic variable in turn. The final column gives the PcGive unit-root tests:
H_{ui}:b_{i}( 1) =0 for i=0,...,q. 
If H_{ui}: b_{i}( 1) =0 cannot be rejected, there is no significant long-run level effect from z_{it}; if H_{vi}: b_{i}=0 cannot be rejected, there is no significant effect from z_{it} at any (included) lag. Significance is marked by ^{*} for 5% and ^{**} for 1%. Critical values for the PcGive unit-root test (H_{u0}: b_{0}( 1) =0) are based on Ericsson and MacKinnon (2002). For the unit-root test, only significance of the dependent variable is reported (not that of the remaining variables).
Conflicts between the tests' outcomes are possible in small samples.
Note that b_{i}( 1) =0 and b_{i}=0 are not equivalent; testing K_{i}=0 is different again. Using (eq:18.6) we can show the relevant hypotheses:

Ftests of each lag length are shown, beginning at the longest ( m) and continuing down to 1. The test of the longest lag is conditional on keeping lags ( 1,...,m1) , that of ( m1) is conditional on ( 1,...,m2,m) etc.
Finally, Ftests of all lags up to m are shown, beginning at the longest ( 1,...,m) and continuing further from ( 2,...,m) down to ( m,...,m) . These tests are conditional on keeping no lags, keeping lag 1, down to keeping ( 1,...,m1) . Thus, they show the marginal significance of all longer lags.
COMFAC tests for the legitimacy of common-factor restrictions of the form:
b_{i}( L) =α( L) b_{i}^{*}( L) , i=0,...,q, 
where α( L) is of order r and ^{*} denotes polynomials of the original order minus r. The degrees of freedom for the Wald tests for COMFAC are equal to the number of restrictions imposed by α( L) , and the Wald statistics are asymptotically χ^{2} with these degrees of freedom if the COMFAC restrictions are valid. It is preferable to use the incremental values obtained by subtracting successive values of the Wald tests. These are also χ^{2}, with degrees of freedom given by the number of additional restrictions. Failure to reject common-factor restrictions does not entail that such restrictions must be imposed. For a discussion of the theory of COMFAC, see Hendry and Mizon (1978); for some finite-sample Monte Carlo evidence see Mizon and Hendry (1980). COMFAC is not available for RALS.
When the minimum order of lag length in the b_{i}( L) is unity or larger (m say), the Wald test sequence for 1,2,...,m common factors is calculated. Variables that are redundant when lagged (Constant, Seasonals, Trend) are excluded in conducting the Wald test sequence since they always sustain a common-factor interpretation.
Consider again the simple dynamic model:
y_{t}=α_{1}y_{t-1}+β_{0}z_{t}+β_{1}z_{t-1}+v_{t}. 
With |α_{1}|<1 this can be written as:
y_{t}=w( L) z_{t}+v_{t}, 
when:
w( L) =( β_{0}+β_{1}L) /( 1-α_{1}L) =( β_{0}+β_{1}L) ( 1+α_{1}L+α_{1}^{2}L^{2}+...) . 
Starting from an equilibrium z^{*} at t=0, a one-off increment of δ to z^{*} has an impact on y^{*} at t=0,1,2,... of w_{0}δ, w_{1}δ, w_{2}δ, w_{3}δ,..., with the w's defined by equating coefficients of powers of L as:
w_{0}=β_{0}, w_{1}=β_{1}+β_{0}α_{1}, w_{2}=α_{1}w_{1}, w_{3}=α_{1}w_{2},... 
PcGive can graph the normalized lag weights w_{0}/w( 1) , w_{1}/w( 1) ,..., w_{s}/w( 1) and the cumulative normalized lag weights w_{0}/w( 1) , ( w_{0}+w_{1}) /w( 1) ,..., ( w_{0}+...+w_{s}) /w( 1) .
Lag weights are available for models estimated by OLS or IVE.
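The recursion above generalizes to any ratio of lag polynomials; a sketch of computing normalized lag weights by polynomial long division (the function name and coefficient values are invented):

```python
import numpy as np

def lag_weights(num, den, s):
    """First s+1 coefficients of w(L) = num(L)/den(L) by polynomial long
    division; coefficients in ascending powers of L, den[0] must be 1."""
    w = np.zeros(s + 1)
    for j in range(s + 1):
        acc = num[j] if j < len(num) else 0.0
        for i in range(1, min(j, len(den) - 1) + 1):
            acc -= den[i] * w[j - i]          # subtract already-known terms
        w[j] = acc / den[0]
    return w

# w(L) = (b0 + b1*L)/(1 - a1*L), matching the recursion in the text
b0, b1, a1 = 0.5, 0.1, 0.8
w = lag_weights([b0, b1], [1.0, -a1], 40)
w_norm = w / w.sum()      # approximately the weights normalized by w(1)
```

Here w(1) = (b0+b1)/(1-a1) = 3, so the truncated sum of weights is close to 3 and the cumulated normalized weights approach one.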
Static forecasts, §17.2.17, can only be made ex post: only observed data is used in the construction of the static forecasts. Genuine forecasts can be made ex ante, using past data only. In a dynamic model this means that the future values of the lagged dependent variable are also forecasts. Moreover, other regressors must be known or extrapolated into the forecast period.
Suppose we estimated a simple autoregressive model with just a mean:
ŷ_{t} = α̂ y_{t-1}+μ̂, 
with the parameters estimated over the sample 1,...,T. Then the first forecast is the same as the static forecast:
ŷ_{T+1T} = α̂ y_{T}+μ̂. 
The second forecast is a dynamic forecast:
ŷ_{T+2T} = α̂ ŷ_{T+1T}+μ̂. 
When there are additional regressors in the model:
ŷ_{t} = α̂ y_{t-1}+μ̂ +x_{t}'β̂, 
the forecast at T+h needs x_{T+h}. This is readily available for deterministic regressors such as the intercept, seasonals, and trend. Otherwise it has to be constructed, or the model changed into a multivariate model that is entirely closed. The standard errors of the forecast need to take into account that the lagged dependent variables themselves are forecasts. The econometrics of this is discussed in Volume II (Doornik and Hendry, 2013c). Extensive treatments of forecasting can be found in Clements and Hendry (1998) and Clements and Hendry (2011).
If the dynamic forecasts are made ex post, lagged dependent variables remain forecasted values (and not the actual values, even though they are known). However, in that case all other regressors are actual values. Moreover, forecast errors can then be computed, with forecast accuracy expressed in terms of mean absolute percentage error (MAPE) and root mean square error (RMSE):
RMSE = [ 1/H ∑_{h=1}^{H} ( y_{T+h}-ŷ_{T+hT}) ^{2}] ^{1/2}, 
and
MAPE = 100/H ∑_{h=1}^{H} |( y_{T+h}-ŷ_{T+hT}) /y_{T+h}| . 
There is a choice between dynamic forecasts (the default) and static forecasts. Static or 1-step forecasts can be obtained by selecting h-step forecasts and setting h=1. Selecting a larger h uses forecasted y's up to lag h-1, but actual y's from lag h onwards.
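The dynamic forecasts from the AR(1)-with-mean example can be sketched directly (the parameter values below are invented, standing in for estimates over 1,...,T):

```python
import numpy as np

# y_t = alpha*y_{t-1} + mu + e_t with illustrative "estimates"
alpha_hat, mu_hat = 0.7, 1.0
rng = np.random.default_rng(5)
T, H = 200, 12
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + 1.0 + rng.normal(scale=0.2)

# dynamic forecasts: feed each forecast back in as the lagged value
dyn = []
prev = y[-1]                      # start from the last observation y_T
for h in range(H):
    prev = alpha_hat * prev + mu_hat
    dyn.append(prev)
# the first dynamic forecast equals the static 1-step forecast, and the
# path reverts towards the unconditional mean mu/(1 - alpha)
```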
The default is to base the standard errors on the error variance only, thus ignoring the contribution from the fact that the parameters are estimated and so uncertain. It is possible to take the parameter uncertainty into account, but this is usually small relative to the error uncertainty.
Hedgehog plots graph the forecasts starting from every point in the estimation sample. They are called hedgehog plots because the graphs often look like one, with all forecast paths spiking upwards (or downwards for an inverted hedgehog).
If H is the forecast horizon, then one forecast path is:
ŷ_{t+1t}, ŷ_{t+2t}, ..., ŷ_{t+Ht}, 
starting at observation t+1 and using the estimated parameters from the full sample 1,...,T. The hedgehog plot graphs all paths for t=s,...,T.
After recursive estimation, the hedgehog plot uses recursively estimated parameters. In that case the forecast path ŷ_{t+1t}, ..., ŷ_{t+Ht} uses parameters estimated over 1,...,t.
The hedgehog graphs are displayed in the Hedgehog window. If robust forecasts are requested, these will appear in Hedgehog  robust.
Optionally, a gap G can be specified to delay forecasting (this does not affect the hedgehog graphs). For the simple AR(1) model:

When new data is available, we can now compare the existing model that starts forecasting from the new data, to the reestimated model that incorporates the new data.
Robust forecasts take the differenced model, forecast, and then reintegrate. If the estimated model is:
ŷ_{t} = α̂(L) y_{t}+μ̂ +x_{t}'β̂, 
then after differencing:
Δŷ_{t} = α̂(L) Δy_{t}+Δx_{t}'β̂, 
we obtain dynamic forecasts of the differences:
Δŷ_{T+1T}, ..., Δŷ_{T+HT}. 
Reintegration gives:
ŷ_{T+hT} = y_{T}+∑_{j=1}^{h}Δŷ_{T+jT}. 
The estimated intercept disappears in the differencing, and instead we use the most recent level (similarly, a trend becomes an intercept, which is then reintegrated, etc.). If there was a recent break in the mean, the forecasts using the full-sample mean will be less accurate than those using the most recent level. Therefore the forecasts from the differenced model are robust to breaks, at least to some extent. The price to pay in the absence of breaks is that the forecasts will be noisier.
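The difference-forecast-reintegrate idea can be sketched for an AR(1) with a break in the mean (all values invented for illustration):

```python
import numpy as np

alpha_hat = 0.7
rng = np.random.default_rng(6)
T, H = 150, 6
y = np.zeros(T)
for t in range(1, T):
    mu = 1.0 if t < 100 else 3.0          # break in the mean at t = 100
    y[t] = 0.7 * y[t - 1] + mu + rng.normal(scale=0.1)

dy = np.diff(y)
# dynamic forecasts of the differences (intercept drops out):
ddyn = []
prev = dy[-1]
for h in range(H):
    prev = alpha_hat * prev
    ddyn.append(prev)
# reintegrate from the last observed level:
robust = y[-1] + np.cumsum(ddyn)
```

Because the forecasts start from the post-break level y_T rather than the full-sample mean, they remain close to the new equilibrium.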
Another form of robust forecasting is the doubledifferenced device (DDD). The DDD is based on the observation that most economic time series do not continuously accelerate. It amounts to setting the second differences (of the logarithms) to zero, so no estimation is involved. This can be achieved in PcGive by creating ΔΔy_{t} in the database, and then formulating an empty model for this. An alternative would be to use ΔΔ_{S} y_{t} when there is seasonality and the data frequency is S. More information is in Clements and Hendry (2011).
Models for economic variables are often formulated in terms of growth rates: denoting the level by Y_{t}, the dependent variable is then the first difference of the logarithm: y_{t} = Δ log Y_{t}. The objective of the transformation is to model an (approximately) stationary representation of the dependent variable. But, when it comes to forecasting, it is often useful to be able to present the results in the original levels.
PcGive automatically recognizes the following dynamic transformations of the dependent variable:
Δy_{t}, Δ_{S} y_{t}, ΔΔy_{t}, ΔΔ_{S} y_{t}, y_{t} (undifferenced), 
where S is the frequency of the data. Lagged dependent variables are taken into account, as are Δy_{t}, Δ_{S} y_{t} if they appear on the righthand side in a model for a higher difference.
In addition, the following functional transformations are detected:
log Y_{t}, logit Y_{t} = log [ Y_{t}/( 1-Y_{t}) ] , Y_{t} (untransformed), 
together with an optional scale factor.
If the model fits in this mould, the levels forecasts can be automatically generated by PcGive. First, the dynamic transformations are substituted out to give forecasts
ŷ_{T+1T}, ..., ŷ_{T+HT}, 
with corresponding standard deviations
ŝ_{T+1T}, ..., ŝ_{T+HT}. 
Because the differenced model assumes normality, these are still normally distributed. Removing one level of differences makes the standard errors grow linearly, etc.
There are two types of level forecasts, median and mean, which are identical if no functional transformation is used. They differ, however, for logarithmic transformations:
Median[ Y_{T+hT}] = exp ( ŷ_{T+hT}) . 
For the lognormal, when y_{T+hT}~N[ŷ_{T+hT},ŝ^{2}_{T+hT}] then
E[Y_{T+hT}] = exp (ŷ_{T+hT} + ½ŝ^{2}_{T+hT}). 
The equivalent expression for the logit-normal can be found in Johnson (1949, eqn. 56) and is not quite so simple.
The quantiles of the log and logisticnormal are simply derived from the inverse distribution. This is used in the plots for the 5% and 95% confidence bands:
exp ( ŷ_{T+hT}-1.645 ŝ_{T+hT}) and exp ( ŷ_{T+hT}+1.645 ŝ_{T+hT}) . 
These bands will not be symmetric around the mean (or median) forecasts.
The standard errors of the level forecasts are also reported. In the lognormal case these are:
sd[ Y_{T+hT}] = exp ( ŷ_{T+hT} + ½ŝ^{2}_{T+hT}) ( exp ( ŝ^{2}_{T+hT}) -1) ^{1/2}. 
For the logitnormal distribution we refer to Johnson (1949, eqn.58).
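For the lognormal case the median, mean, standard deviation and 90% band follow directly from the formulae above; the sketch below checks them against simulated draws (the forecast and its standard error on the log scale are invented values):

```python
import numpy as np

# y_hat, s_hat: forecast and standard error on the log scale (illustrative)
y_hat, s_hat = 2.0, 0.3
median = np.exp(y_hat)
mean = np.exp(y_hat + 0.5 * s_hat**2)               # lognormal mean
sd = mean * np.sqrt(np.exp(s_hat**2) - 1.0)         # lognormal sd
band = (np.exp(y_hat - 1.645 * s_hat),              # 5% and 95% quantiles
        np.exp(y_hat + 1.645 * s_hat))

# verify against simulated lognormal draws
rng = np.random.default_rng(9)
draws = np.exp(rng.normal(y_hat, s_hat, 200000))
```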
OxMetrics Algebra expressions can be used for derived functions. For example, the cum() function, together with the appropriate initial conditions, maps back from a first difference, and the exponential function maps back from logarithms. In this case, the forecast standard errors are derived numerically.
Irrespective of the estimator selected, a wide range of diagnostic tests is offered. Tests are available for residual autocorrelation, conditional heteroscedasticity, normality, unconditional heteroscedasticity/functional form misspecification and omitted variables. Recursive residuals can be used if these are available. Tests for common factors and linear restrictions are discussed in §18.3.4 and §18.6 below, encompassing tests in §18.10. Thus, relating this section to the earlier information taxonomy , the diagnostic tests of this section concern the past (checking that the errors are a homoscedastic, normal, innovation process relative to the information available), whereas the forecast statistics discussed in Chapter 17 concern the future and encompassing tests concern information specific to rival models.
Many test statistics in PcGive have either a χ^{2} distribution or an F distribution. Ftests are usually reported as:
F(num,denom) = Value [Probability] /*/**
for example:
F(1, 155) = 5.0088 [0.0266] *
where the test statistic has an Fdistribution with one degree of freedom in the numerator, and 155 in the denominator. The observed value is 5.0088, and the probability of getting a value of 5.0088 or larger under this distribution is 0.0266. This is less than 5% but more than 1%, hence the star. Significant outcomes at a 1% level are shown by two stars. χ^{2} tests are also reported with probabilities, as for example:
Normality Chi^2(2) = 2.1867 [0.3351]
The 5% χ^{2} critical value with two degrees of freedom is 5.99, so here normality is not rejected (alternatively, Prob(χ^{2}≥ 2.1867) = 0.3351, which is more than 5%). Details on the computation of probability values and quantiles for the F and χ^{2} tests are given under the probf, probchi, quanf and quanchi functions in the Ox reference manual (Doornik, 2013).
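The tail probabilities and critical values can be reproduced with any statistical library; the sketch below uses scipy as a stand-in for Ox's probf/probchi and quanchi functions, and recovers the numbers quoted above:

```python
from scipy import stats

# P(F(1,155) >= 5.0088): the starred 5% outcome in the example
p_f = stats.f.sf(5.0088, 1, 155)
# P(chi^2(2) >= 2.1867): the normality-test probability in the example
p_chi = stats.chi2.sf(2.1867, 2)
# 5% critical value of chi^2(2), quoted as 5.99 in the text
crit = stats.chi2.ppf(0.95, 2)
```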
Some tests take the form of a likelihood ratio (LR) test. If l is the unrestricted, and l_{0} the restricted log-likelihood, then 2( l-l_{0}) has a χ^{2}( s) distribution, with s the number of restrictions imposed (so model l_{0} is nested in l).
Many diagnostic tests are calculated through an auxiliary regression. For singleequation tests, they take the form of TR^{2} for the auxiliary regression so that they are asymptotically distributed as χ^{2}( s) under their nulls, and hence have the usual additive property for independent χ^{2}s. In addition, following Harvey (1990) and Kiviet (1986), Fapproximations are calculated because they may be better behaved in small samples:
( R^{2}/s) /[ ( 1-R^{2}) /( T-k-s) ] ~F( s,T-k-s) . 
When the covariance matrix is block diagonal between regression and heteroscedasticity (or ARCH) function parameters, tests can take the regression parameters as given, see Davidson and MacKinnon (1993, Ch. 11):
( R^{2}/s) /[ ( 1-R^{2}) /( T-s) ] ~F( s,T-s) . 
This may be slightly different if not all parameters are included in the test, or when observations are lost in the construction of the test.
The sample autocorrelation function (ACF) of a variable x_{t} is the series {r_{j}}, where r_{j} is the correlation coefficient between x_{t} and x_{t-j} for j = 1,...,s:
r_{j}= ∑_{t=j+1}^{T}( x_{t}-x̄) ( x_{t-j}-x̄) / ∑_{t=1}^{T}( x_{t}-x̄) ^{2}. 
Here x̄= 1/T ∑_{t=1}^{T}x_{t} is the sample mean of x_{t}.
The residual correlogram is defined as above, but using the residuals from the econometric regression, rather than the data. Thus, this reports the series {r_{j}} of correlations between the residuals û_{t} and û_{t-j}. In addition, PcGive prints the partial autocorrelation function (PACF) (see the OxMetrics book).
It is possible to calculate a statistic based on `T*(sum of s squared autocorrelations)', with s the length of the correlogram, called the Portmanteau statistic:
LB( s) =T( T+2) ∑_{j=1}^{s} r_{j}^{2}/( T-j) . 
This corresponds to Box and Pierce (1970), but with a degrees-of-freedom correction as suggested by Ljung and Box (1978). It is designed as a goodness-of-fit test in stationary, autoregressive moving-average models. Under the assumptions of the test, LB(s) is asymptotically distributed as χ^{2}(s-n) after fitting an AR(n) model. A value such that LB( s) ≥2s is taken as indicative of misspecification for large s. However, small values of such a statistic should be treated with caution, since residual autocorrelations are biased towards zero (like DW) when lagged dependent variables are included in econometric equations. An appropriate test for residual autocorrelation is provided by the LM test in §18.5.3 below.
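A sketch of the ACF and the Ljung-Box form of the Portmanteau statistic (function names invented; simulated white noise used as the test series):

```python
import numpy as np

def acf(x, s):
    """Sample autocorrelations r_1,...,r_s around the sample mean."""
    xd = np.asarray(x, dtype=float) - np.mean(x)
    denom = xd @ xd
    return np.array([xd[j:] @ xd[:-j] / denom for j in range(1, s + 1)])

def ljung_box(x, s):
    """LB(s) = T(T+2) * sum_{j=1}^s r_j^2 / (T-j)."""
    T = len(x)
    r = acf(x, s)
    return T * (T + 2) * np.sum(r**2 / (T - np.arange(1, s + 1)))

rng = np.random.default_rng(7)
lb = ljung_box(rng.normal(size=500), 10)   # approx chi^2(10) under the null
```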
This is a test for autocorrelated residuals and is calculated as:
DW=∑_{t=2}^{T}( û_{t}-û_{t-1}) ^{2}/ ∑_{t=1}^{T}û_{t}^{2}. 
DW is most powerful as a test of {u_{t}} being white noise against:
u_{t}=ρu_{t-1}+ε_{t} where ε_{t}~IID( 0,σ_{ε} ^{2}) . 
If 0<DW<2, then the null hypothesis is H_{0}: ρ=0, that is, zero autocorrelation (so DW=2) and the alternative is H_{1}: ρ>0, that is, positive firstorder autocorrelation.
If 2<DW<4, then H_{0}: ρ=0 and H_{1}: ρ<0, in which case DW^{*}=4-DW should be computed.
The significance values of DW are widely recorded in econometrics textbooks. However, DW is a valid statistic only if all the x_{t} variables are non-stochastic, or at least strongly exogenous. If the model includes a lagged dependent variable, then DW is biased towards 2, that is, towards not detecting autocorrelation, and Durbin's h-test (see Durbin, 1970) or the equivalent LM-test for autocorrelation in §18.5.3 should be used instead. For this reason, we largely stopped reporting the DW statistic. Also see §16.4.
This is the Lagrange-multiplier test for r^{th}-order residual autocorrelation, distributed as χ^{2}( r) in large samples under the null hypothesis that there is no autocorrelation (that is, that the errors are white noise). In standard usage, r ≈ s/2 for s in §18.5.2 above, so this provides a type of Portmanteau test (see Godfrey, 1978). However, any orders from 1 up to 12 can be selected to test against:
u_{t}=∑_{i=p}^{r}α_{i}u_{t-i}+ε_{t} where 0≤p≤r. 
As noted above, the F-form suggested by Harvey (1981, see Harvey, 1990) is the recommended diagnostic test. Following the outcome of the F-test (and its p-value), the error autocorrelation coefficients are recorded. For an autoregressive error of order r to be estimated by RALS, these LM coefficients provide good initial values, from which the iterative optimization can be commenced. The LM test is calculated by regressing the residuals on all the regressors of the original model and the lagged residuals for lags p to r (missing residuals are set to zero). The LM test χ^{2}( r-p+1) is TR^{2} from this regression (or the F-equivalent), and the error autocorrelation coefficients are the coefficients of the lagged residuals. For an excellent exposition, see Pagan (1984).
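The auxiliary-regression construction can be sketched directly (function names and simulated data invented; only the χ^{2} form TR^{2} is computed, not the F-equivalent):

```python
import numpy as np

def lm_autocorr(y, X, r):
    """LM test for residual autocorrelation of orders 1..r: regress the
    OLS residuals on the original regressors plus lagged residuals
    (missing lags set to zero); the statistic is T*R^2 ~ chi^2(r)."""
    T = len(y)
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    lags = np.zeros((T, r))
    for i in range(1, r + 1):
        lags[i:, i - 1] = u[:-i]
    Z = np.hstack([X, lags])
    e = u - Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
    return T * (1.0 - (e @ e) / (u @ u))

rng = np.random.default_rng(10)
T = 400
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y_white = 1.0 + 2.0 * x + rng.normal(size=T)      # white-noise errors

u_ar = np.zeros(T)
for t in range(1, T):
    u_ar[t] = 0.6 * u_ar[t - 1] + rng.normal(scale=0.5)
y_ar = 1.0 + 2.0 * x + u_ar                        # AR(1) errors

lm_white = lm_autocorr(y_white, X, 4)              # small under the null
lm_ar = lm_autocorr(y_ar, X, 4)                    # large under autocorrelation
```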
Let μ, σ_{x}^{2} denote the mean and variance of {x_{t}}, and write μ_{i}=E[ ( x_{t}−μ) ^{i}] , so that σ_{x}^{2}=μ_{2}. The skewness and kurtosis are defined as:
√β_{1}=μ_{3}/μ_{2}^{3/2},  β_{2}=μ_{4}/μ_{2}^{2}. 
Sample counterparts are defined by replacing the μ_{i} with the sample moments m_{i}:
m_{i}= 1/T ∑_{t=1}^{T}( x_{t}−x̄) ^{i},  √b_{1}=m_{3}/m_{2}^{3/2},  b_{2}=m_{4}/m_{2}^{2}. 
A normal variate will have √β_{1}=0 and β_{2}=3. Bowman and Shenton (1975) consider the test:
e_{1}=T( b_{1}/6+( b_{2}−3) ^{2}/24) , asymptotically distributed as χ^{2}( 2) under normality,
which subsequently was derived as an LM test by Jarque and Bera (1987). Unfortunately, e_{1} has rather poor small-sample properties: √b_{1} and b_{2} are not independently distributed, and the sample kurtosis especially approaches normality very slowly. The test reported by PcGive is based on Doornik and Hansen (1994), who employ a small-sample correction and adapt the test to the multivariate case. It derives from Shenton and Bowman (1977), who give b_{2} (conditional on b_{2}>1+b_{1}) a gamma distribution, and D'Agostino (1970), who approximates the distribution of √b_{1} by the Johnson S_{u} system. Let z_{1} and z_{2} denote the transformed skewness and kurtosis, where the transformation creates statistics that are much closer to standard normal. The test statistic is:
e_{2}=z_{1}^{2}+z_{2}^{2}, approximately distributed as χ^{2}( 2) . 
Table 18.1 compares e_{2} with its asymptotic form e_{1}. It gives the rejection frequencies under the null of normality, using χ^{2}( 2) critical values. The experiments are based on 10000 replications and common random numbers.
        nominal probabilities of e_{2}        nominal probabilities of e_{1}
T       20%     10%     5%      1%            20%     10%     5%      1%
50      0.1734  0.0869  0.0450  0.0113        0.0939  0.0547  0.0346  0.0175
100     0.1771  0.0922  0.0484  0.0111        0.1258  0.0637  0.0391  0.0183
150     0.1845  0.0937  0.0495  0.0131        0.1456  0.0703  0.0449  0.0188
250     0.1889  0.0948  0.0498  0.0133        0.1583  0.0788  0.0460  0.0180
PcGive reports the following statistics under the normality test option, replacing x_{t} by the residuals û_{t}:
mean  x̄ 
standard deviation  σ_{x}=(m_{2})^{½} 
skewness  √b_{1} 
excess kurtosis  b_{2}−3 
minimum  
maximum  
asymptotic test  e_{1} 
normality test χ^{2}( 2)  e_{2} [ P( χ^{2}( 2) ≥e_{2}) ] 
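The asymptotic statistic e₁ is easy to reproduce from the sample moments above (a sketch with invented names; the reported e₂ additionally applies the Doornik–Hansen transformations z₁ and z₂, which are not shown here):

```python
import numpy as np

def normality_e1(x):
    """Bowman-Shenton / Jarque-Bera statistic e1 = T*(b1/6 + (b2-3)^2/24)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d ** i) for i in (2, 3, 4))
    b1 = (m3 / m2 ** 1.5) ** 2        # squared sample skewness
    b2 = m4 / m2 ** 2                 # sample kurtosis
    return len(x) * (b1 / 6.0 + (b2 - 3.0) ** 2 / 24.0)

rng = np.random.default_rng(2)
e1_normal = normality_e1(rng.standard_normal(1000))   # small: normality holds
e1_skewed = normality_e1(rng.exponential(size=1000))  # large: skewed, fat-tailed
```

An exponential sample has skewness 2 and kurtosis 9, so its statistic should be far into the χ²(2) tail.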
This test is based on White (1980), and involves an auxiliary regression of {û_{t}^{2}} on the original regressors ( x_{it}) and all their squares (x_{it}^{2}). The null is unconditional homoscedasticity, and the alternative is that the variance of the {u_{t}} process depends on x_{t} and on the x_{it}^{2}. The output comprises TR^{2}, the Ftest equivalent, the coefficients of the auxiliary regression, and their individual tstatistics, to help highlight problem variables. Variables that are redundant when squared are automatically removed, as are observations that have a residual that is (almost) zero. Some additional information can be found in Volume II.
This test is that of White (1980), and is only calculated if there is a large number of observations relative to the number of variables in the regression. It is based on an auxiliary regression of the squared residuals ( û_{t}^{2}) on all squares and cross-products of the original regressors (that is, on r=½k( k+1) variables). That is, if T>>k( k+1) , the test is calculated; redundant variables are automatically removed, as are observations with a residual that is (almost) zero. The usual χ^{2} and F-values are reported; coefficients of the auxiliary regression are also shown with their t-statistics to help with model respecification. This is a general test for heteroscedastic errors: H_{0} is that the errors are homoscedastic or, if heteroscedasticity is present, that it is unrelated to the xs.
In previous versions of PcGive this test used to be called a test for functional form misspecification. That terminology was criticized by Godfrey and Orme (1994), who show that the test does not have power against omitted variables.
This is the ARCH (AutoRegressive Conditional Heteroscedasticity) test (see Engle, 1982), which in the present form tests the hypothesis γ=0 in the model:
E[ u_{t}^{2}|u_{t−1},...,u_{t−r}] =c_{0}+∑_{i=1}^{r}γ_{i}u_{t−i}^{2} 
where γ=( γ_{1},...,γ_{r}) '. Again, TR^{2} is the χ^{2} test from the regression of û_{t}^{2} on a constant and û_{t−1}^{2} to û_{t−r}^{2} (called the ARCH test), which is asymptotically distributed as χ^{2}( r) on H_{0}: γ=0. The F-form is also reported. Both first-order and higher-order lag forms are easily calculated (see Engle, 1982, and Engle, Hendry, and Trumbull, 1985).
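A sketch of the ARCH test on simulated data (function and variable names hypothetical; the simulated process is an ARCH(1) chosen so the test should reject):

```python
import numpy as np

def arch_test(u, r):
    """T * R^2 from regressing u_t^2 on a constant and u_{t-1}^2 .. u_{t-r}^2;
    approximately chi^2(r) under H0: no ARCH effects."""
    u2 = np.asarray(u, dtype=float) ** 2
    y = u2[r:]
    Z = np.column_stack([np.ones(len(y))] +
                        [u2[r - i:len(u2) - i] for i in range(1, r + 1)])
    e = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    tss = np.sum((y - y.mean()) ** 2)
    return len(y) * (1.0 - np.sum(e ** 2) / tss)

rng = np.random.default_rng(4)
T = 1000
z = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):               # ARCH(1): var_t = 0.7 + 0.3 * u_{t-1}^2
    u[t] = np.sqrt(0.7 + 0.3 * u[t - 1] ** 2) * z[t]
stat_arch = arch_test(u, r=2)       # large: ARCH effects present
stat_iid = arch_test(rng.standard_normal(T), r=2)   # small: homoscedastic
```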
The RESET test (Regression Specification Test) due to Ramsey (1969) tests the null of correct specification of the original model against the alternative that powers of ŷ_{t}, such as ŷ_{t}^{2}, ŷ_{t}^{3},..., have been omitted. This tests whether the original functional form is incorrect, by adding powers of linear combinations of xs, since by construction ŷ_{t}=x_{t}^{'}β̂.
We use RESET23 for the test that uses squares and cubes, while RESET refers to the test just using squares.
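RESET can be sketched as an added-variable F-test (names and data hypothetical; powers=(2,) gives RESET, powers=(2, 3) would give RESET23):

```python
import numpy as np

def reset_test(y, X, powers=(2,)):
    """F-test for adding powers of the fitted values to y = X b + u."""
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    Z = np.column_stack([X] + [yhat ** p for p in powers])
    e0 = y - yhat
    e1 = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    p, df = len(powers), len(y) - Z.shape[1]
    return ((e0 @ e0 - e1 @ e1) / p) / ((e1 @ e1) / df)  # ~ F(p, df) under H0

rng = np.random.default_rng(5)
T = 200
x = rng.standard_normal(T)
X = np.column_stack([np.ones(T), x])
f_lin = reset_test(X @ np.array([1.0, 0.5]) + rng.standard_normal(T), X)
f_sq = reset_test(1.0 + 0.5 * x + 0.5 * x ** 2 + rng.standard_normal(T), X)
```

A large statistic for the quadratic DGP (f_sq) flags the neglected nonlinearity, while the correctly specified linear model (f_lin) should not reject.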
Parameter instability statistics are reported for σ^{2}, followed by the joint statistic for all the parameters in the model (also see §18.5.9), based on the approach in Hansen (1992). Next, the instability statistic is printed for each parameter ( β_{1},...,β_{k},σ^{2}).
Large values reveal non-constancy (marked by ^{*} or ^{**}) and indicate a fragile model. Note that this measures within-sample parameter constancy, and is computed if numerically feasible (it may fail owing to dummy variables), so no observations need be reserved. The indicated significance is only valid in the absence of non-stationary regressors.
The LM tests for autocorrelation, heteroscedasticity and functional form require an auxiliary regression involving the original regressors x_{it}. NLS uses ∂f(x_{t},θ)/∂θ_{i} (evaluated at θ̂) instead. The auxiliary regression for the autocorrelation test is:
û_{t}=∑_{i=1}^{k}c_{i}z_{it}+∑_{j=p}^{r}α_{j}û_{t−j}+e_{t}, where z_{it}=x_{it} for OLS and z_{it}=∂f( x_{t},θ) /∂θ_{i} evaluated at θ̂ for NLS. 
These three tests are not computed for models estimated using ML.
Writing the model in matrix form as y=Xβ+u, the null hypothesis of p linear restrictions can be expressed as H_{0} : Rβ=r, with R a (p×k) matrix and r a p×1 vector. This test is well explained in most econometrics textbooks, and uses the unrestricted estimates (that is, it is a Wald test).
The subset form of the linear restrictions test is H_{0}: β_{i}=...=β_{j}=0; any choice of coefficients can be made, so a wide range of specification hypotheses can be tested.
Writing θ̂ =β̂, with corresponding variance-covariance matrix V[ θ̂ ] , we can test for (non)linear restrictions of the form:
f( θ) =0. 
The null hypothesis H_{0}:f(θ)=0 will be tested against H_{1}:f(θ)≠0 through a Wald test:
w=f( θ̂ ) '( ĴV[ θ̂ ] Ĵ^{'}) ^{−1}f( θ̂ ) 
where J is the Jacobian matrix of the transformation: J=∂f(θ)/∂θ'. PcGive computes Ĵ by numerical differentiation. The statistic w has a χ^{2}(s) distribution, where s is the number of restrictions (that is, equations in f(·)). The null hypothesis is rejected if we observe a significant test statistic.
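The statistic w and its numerically differentiated Jacobian can be sketched directly (all numbers below are invented estimates for illustration, not PcGive output):

```python
import numpy as np

def wald_test(f, theta, V, eps=1e-6):
    """w = f(theta)' (J V J')^{-1} f(theta), with J = df/dtheta'
    computed by forward differences."""
    f0 = np.atleast_1d(f(theta))
    J = np.empty((len(f0), len(theta)))
    for i in range(len(theta)):
        step = np.zeros(len(theta))
        step[i] = eps
        J[:, i] = (np.atleast_1d(f(theta + step)) - f0) / eps
    return float(f0 @ np.linalg.solve(J @ V @ J.T, f0))  # ~ chi^2(len(f0)) under H0

theta_hat = np.array([2.0, 0.5])     # hypothetical estimates
V = np.diag([0.01, 0.0025])          # hypothetical covariance matrix
# H0: theta_1 * theta_2 = 1 holds exactly at these estimates, so w = 0
w_ok = wald_test(lambda th: np.array([th[0] * th[1] - 1.0]), theta_hat, V)
# H0: theta_1 * theta_2 = 2 is violated, giving a large statistic
w_bad = wald_test(lambda th: np.array([th[0] * th[1] - 2.0]), theta_hat, V)
```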
Lag polynomials of any variable in the database can be tested for omission. Variables that would change the sample or are already in the model are automatically deleted. The model itself remains unchanged. If the model is written in matrix form as y=Xβ+Zγ+u, then H_{0}: γ=0 is being tested. The test exploits the fact that on H_{0}:
F=( ( RSS_{0}−RSS_{1}) /p) /( RSS_{1}/( T−k−p) ) ∼ F( p,T−k−p) under H_{0},
where RSS_{0} and RSS_{1} are the residual sums of squares of the models excluding and including Z,
for p added variables.
Since ( X'X) ^{−1} is precalculated, the F-statistic is easily computed by partitioned inversion. Computations for IVE are more involved.
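The same F-statistic can be sketched via the two residual sums of squares (names and data hypothetical; PcGive itself exploits partitioned inversion rather than refitting):

```python
import numpy as np

def omitted_vars_ftest(y, X, Z):
    """F = ((RSS0 - RSS1)/p) / (RSS1/(T - k - p)) for adding Z to y = X b + u."""
    (T, k), p = X.shape, Z.shape[1]
    e0 = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    W = np.column_stack([X, Z])
    e1 = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    return ((e0 @ e0 - e1 @ e1) / p) / ((e1 @ e1) / (T - k - p))

rng = np.random.default_rng(7)
T = 150
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
Z = rng.standard_normal((T, 2))
y = X @ np.array([1.0, 0.5]) + Z[:, 0] + rng.standard_normal(T)  # first Z matters
f_add = omitted_vars_ftest(y, X, Z)   # large: the omission is detected
```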
Finally, PcGive has specific procedures programmed to operate when a general-to-specific mode is adopted.
In PcGive, when a model is specified and estimated by least squares or instrumental variables, then the general dynamic analysis is offered: see §18.3.
However, while the tests offered are a comprehensive set of Wald statistics on variables, lags and longrun outcomes, a reduction sequence can involve many linear transformations (differencing, creating differentials etc.) as well as eliminations. Consequently, as the reduction proceeds, PcGive monitors its progress, which can be reviewed at the progress menu. The main statistics reported comprise:
Once appropriate data representations have been selected, it is of interest
to see whether the chosen model can explain (that is, account for) results
reported by other investigators. Often attention has focused on the ability
of chosen models to explain each other's residual variances (variance
encompassing), and PcGive provides the facility for doing so using test
statistics based on Cox (1961) as suggested by Pesaran (1974). Full
details of those computed by PcGive for OLS and IVE are provided in Ericsson (1983). Note that a badly-fitting model should be rejected against well-fitting models on such tests, and that care is required in interpreting any outcome in which a well-fitting model (which satisfies all of the other criteria) is rejected against a badly-fitting, or silly, model (see Mizon, 1984, Mizon and Richard, 1986, and Hendry and Richard, 1989). The Sargan test is for the restricted reduced form parsimoniously encompassing the unrestricted reduced form, which is implicitly defined by projecting y_{t} on all of the non-modelled variables. The F-test is for each model parsimoniously encompassing their union; this is the only one of these tests which is invariant to the choice of common regressors in the two models.
Thus, the F-test yields the same numerical outcome for the first model parsimoniously encompassing either the union of the two models under consideration, or the orthogonal complement to the first model relative to the union. In PcGive, tests of both models encompassing the other are reported.
Ahumada, H. (1985). An encompassing test of two models of the balance of trade for Argentina. Oxford Bulletin of Economics and Statistics 47, 51–70.
Alexander, C. (2001). Market Models: A Guide to Financial Data Analysis. Chichester: John Wiley and Sons.
Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature 19, 1483–1536.
Amemiya, T. (1985). Advanced Econometrics. Oxford: Basil Blackwell.
Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John Wiley & Sons.
Andrews, D. W. K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.
Baba, Y., D. F. Hendry, and R. M. Starr (1992). The demand for M1 in the U.S.A., 1960–1988. Review of Economic Studies 59, 25–61.
Banerjee, A., J. J. Dolado, J. W. Galbraith, and D. F. Hendry (1993). Cointegration, Error Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University Press.
Banerjee, A., J. J. Dolado, D. F. Hendry, and G. W. Smith (1986). Exploring equilibrium relationships in econometrics through static models: Some Monte Carlo evidence. Oxford Bulletin of Economics and Statistics 48, 253–277.
Banerjee, A., J. J. Dolado, and R. Mestre (1998). Error-correction mechanism tests for cointegration in a single equation framework. Journal of Time Series Analysis 19, 267–283.
Banerjee, A. and D. F. Hendry (Eds.) (1992). Testing Integration and Cointegration. Oxford Bulletin of Economics and Statistics: 54.
Banerjee, A. and D. F. Hendry (1992). Testing integration and cointegration: An overview. Oxford Bulletin of Economics and Statistics 54, 225–255.
Bårdsen, G. (1989). The estimation of long run coefficients from error correction models. Oxford Bulletin of Economics and Statistics 50.
Bentzel, R. and B. Hansen (1955). On recursiveness and interdependency in economic models. Review of Economic Studies 22, 153–168.
Bollerslev, T., R. S. Chou, and K. F. Kroner (1992). ARCH modelling in finance: A review of the theory and empirical evidence. Journal of Econometrics 52, 5–59.
Bontemps, C. and G. E. Mizon (2003). Congruence and encompassing. In B. P. Stigum (Ed.), Econometrics and the Philosophy of Economics, pp. 354–378. Princeton: Princeton University Press.
Bowman, K. O. and L. R. Shenton (1975). Omnibus test contours for departures from normality based on √b_{1} and b_{2}. Biometrika 62, 243–250.
Box, G. E. P. and G. M. Jenkins (1976). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day. First published, 1970.
Box, G. E. P. and D. A. Pierce (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association 65, 1509–1526.
Breusch, T. S. and A. R. Pagan (1980). The Lagrange multiplier test and its applications to model specification in econometrics. Review of Economic Studies 47, 239–253.
Brown, R. L., J. Durbin, and J. M. Evans (1975). Techniques for testing the constancy of regression relationships over time (with discussion). Journal of the Royal Statistical Society B 37, 149–192.
Campos, J., N. R. Ericsson, and D. F. Hendry (1996). Cointegration tests in the presence of structural breaks. Journal of Econometrics 70, 187–220.
Chambers, E. A. and D. R. Cox (1967). Discrimination between alternative binary response models. Biometrika 54, 573–578.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591–605.
Clements, M. P. and D. F. Hendry (1998). Forecasting Economic Time Series. Cambridge: Cambridge University Press.
Clements, M. P. and D. F. Hendry (1999). Forecasting Nonstationary Economic Time Series. Cambridge, Mass.: MIT Press.
Clements, M. P. and D. F. Hendry (2011). Forecasting from misspecified models in the presence of unanticipated location shifts. See Clements and Hendry (2011), pp. 271–314.
Clements, M. P. and D. F. Hendry (Eds.) (2011). The Oxford Handbook of Economic Forecasting. Oxford: Oxford University Press.
Cochrane, D. and G. H. Orcutt (1949). Application of least squares regression to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32–61.
Cox, D. R. (1961). Tests of separate families of hypotheses. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, Berkeley, pp. 105–123. University of California Press.
Cramer, J. S. (1986). Econometric Applications of Maximum Likelihood Methods. Cambridge: Cambridge University Press.
D'Agostino, R. B. (1970). Transformation to normality of the null distribution of g_{1}. Biometrika 57, 679–681.
Davidson, J. E. H., D. F. Hendry, F. Srba, and J. S. Yeo (1978). Econometric modelling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom. Economic Journal 88, 661–692. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Davidson, R. and J. G. MacKinnon (1993). Estimation and Inference in Econometrics. New York: Oxford University Press.
Dickey, D. A. and W. A. Fuller (1981). Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica 49, 1057–1072.
Doan, T., R. Litterman, and C. A. Sims (1984). Forecasting and conditional projection using realistic prior distributions. Econometric Reviews 3, 1–100.
Doornik, J. A. (2007). Autometrics. Mimeo, Department of Economics, University of Oxford.
Doornik, J. A. (2008). Encompassing and automatic model selection. Oxford Bulletin of Economics and Statistics 70, 915–925.
Doornik, J. A. (2009). Autometrics. In J. L. Castle and N. Shephard (Eds.), The Methodology and Practice of Econometrics: Festschrift in Honour of David F. Hendry. Oxford: Oxford University Press.
Doornik, J. A. (2013). Object-Oriented Matrix Programming using Ox (7th ed.). London: Timberlake Consultants Press.
Doornik, J. A. and H. Hansen (1994). A practical test for univariate and multivariate normality. Discussion paper, Nuffield College.
Doornik, J. A. and D. F. Hendry (1992). PCGIVE 7: An Interactive Econometric Modelling System. Oxford: Institute of Economics and Statistics, University of Oxford.
Doornik, J. A. and D. F. Hendry (1994). PcGive 8: An Interactive Econometric Modelling System. London: International Thomson Publishing, and Belmont, CA: Duxbury Press.
Doornik, J. A. and D. F. Hendry (2013a). Econometric Modelling using PcGive: Volume III (4th ed.). London: Timberlake Consultants Press.
Doornik, J. A. and D. F. Hendry (2013b). Interactive Monte Carlo Experimentation in Econometrics Using PcNaive (3rd ed.). London: Timberlake Consultants Press.
Doornik, J. A. and D. F. Hendry (2013c). Modelling Dynamic Systems using PcGive: Volume II (5th ed.). London: Timberlake Consultants Press.
Doornik, J. A. and D. F. Hendry (2013d). OxMetrics: An Interface to Empirical Modelling (7th ed.). London: Timberlake Consultants Press.
Doornik, J. A. and M. Ooms (2006). Introduction to Ox (2nd ed.). London: Timberlake Consultants Press.
Durbin, J. (1970). Testing for serial correlation in least squares regression when some of the regressors are lagged dependent variables. Econometrica 38, 410–421.
Eicker, F. (1967). Limit theorems for regressions with unequal and dependent errors. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, Berkeley, pp. 59–82. University of California.
Eisner, R. and R. H. Strotz (1963). Determinants of Business Investment. Englewood Cliffs, N.J.: Prentice-Hall.
Emerson, R. A. and D. F. Hendry (1996). An evaluation of forecasting using leading indicators. Journal of Forecasting 15, 271–291. Reprinted in T.C. Mills (ed.), Economic Forecasting. Edward Elgar, 1999.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity, with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007.
Engle, R. F. (1984). Wald, likelihood ratio, and Lagrange multiplier tests in econometrics. See Griliches and Intriligator (1984), Chapter 13.
Engle, R. F. and C. W. J. Granger (1987). Cointegration and error correction: Representation, estimation and testing. Econometrica 55, 251–276.
Engle, R. F., D. F. Hendry, and J.-F. Richard (1983). Exogeneity. Econometrica 51, 277–304. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; in Ericsson, N. R. and Irons, J. S. (eds.) Testing Exogeneity, Oxford: Oxford University Press, 1994; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Engle, R. F., D. F. Hendry, and D. Trumbull (1985). Small sample properties of ARCH estimators and tests. Canadian Journal of Economics 43, 66–93.
Engle, R. F. and H. White (Eds.) (1999). Cointegration, Causality and Forecasting. Oxford: Oxford University Press.
Engler, E. and B. Nielsen (2009). The empirical process of autoregressive residuals. Econometrics Journal 12, 367–381.
Ericsson, N. R. (1983). Asymptotic properties of instrumental variables statistics for testing non-nested hypotheses. Review of Economic Studies 50, 287–303.
Ericsson, N. R. (1992). Cointegration, exogeneity and policy analysis: An overview. Journal of Policy Modeling 14, 251–280.
Ericsson, N. R. and D. F. Hendry (1999). Encompassing and rational expectations: How sequential corroboration can imply refutation. Empirical Economics 24, 1–21.
Ericsson, N. R. and J. S. Irons (1995). The Lucas critique in practice: Theory without measurement. In K. D. Hoover (Ed.), Macroeconometrics: Developments, Tensions and Prospects, pp. 263–312. Dordrecht: Kluwer Academic Press.
Ericsson, N. R. and J. G. MacKinnon (2002). Distributions of error correction tests for cointegration. Econometrics Journal 5, 285–318.
Escribano, A. (1985). Nonlinear error correction: The case of money demand in the UK (1878–1970). Mimeo, University of California at San Diego.
Favero, C. and D. F. Hendry (1992). Testing the Lucas critique: A review. Econometric Reviews 11, 265–306.
Finney, D. J. (1947). The estimation from individual records of the relationship between dose and quantal response. Biometrika 34, 320–334.
Fletcher, R. (1987). Practical Methods of Optimization, (2nd ed.). New York: John Wiley & Sons.
Friedman, M. and A. J. Schwartz (1982). Monetary Trends in the United States and the United Kingdom: Their Relation to Income, Prices, and Interest Rates, 1867–1975. Chicago: University of Chicago Press.
Frisch, R. (1934). Statistical Confluence Analysis by means of Complete Regression Systems. Oslo: University Institute of Economics.
Frisch, R. (1938). Statistical versus theoretical relations in economic macrodynamics. Mimeograph dated 17 July 1938, League of Nations Memorandum. Reproduced by University of Oslo in 1948 with Tinbergen's comments. Contained in Memorandum `Autonomy of Economic Relations', 6 November 1948, Oslo, Universitets Økonomiske Institutt. Reprinted in Hendry D. F. and Morgan M. S. (1995), The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.
Frisch, R. and F. V. Waugh (1933). Partial time regression as compared with individual trends. Econometrica 1, 221–223.
Gilbert, C. L. (1986). Professor Hendry's econometric methodology. Oxford Bulletin of Economics and Statistics 48, 283–307. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.
Gilbert, C. L. (1989). LSE and the British approach to time-series econometrics. Oxford Review of Economic Policy 41, 108–128.
Godfrey, L. G. (1978). Testing for higher order serial correlation in regression equations when the regressors include lagged dependent variables. Econometrica 46, 1303–1313.
Godfrey, L. G. (1988). Misspecification Tests in Econometrics. Cambridge: Cambridge University Press.
Godfrey, L. G. and C. D. Orme (1994). The sensitivity of some general checks to omitted variables in the linear model. International Economic Review 35, 489–506.
Golub, G. H. and C. F. Van Loan (1989). Matrix Computations. Baltimore: The Johns Hopkins University Press.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37, 424–438.
Granger, C. W. J. (1986). Developments in the study of cointegrated economic variables. Oxford Bulletin of Economics and Statistics 48, 213–228.
Granger, C. W. J. and P. Newbold (1974). Spurious regressions in econometrics. Journal of Econometrics 2, 111–120.
Granger, C. W. J. and P. Newbold (1977). The time series approach to econometric model building. In C. A. Sims (Ed.), New Methods in Business Cycle Research, pp. 7–21. Minneapolis: Federal Reserve Bank of Minneapolis.
Granger, C. W. J. and P. Newbold (1986). Forecasting Economic Time Series, (2nd ed.). New York: Academic Press.
Gregory, A. W. and M. R. Veale (1985). Formulating Wald tests of nonlinear restrictions. Econometrica 53, 1465–1468.
Griliches, Z. and M. D. Intriligator (Eds.) (1984). Handbook of Econometrics, Volume 2. Amsterdam: North-Holland.
Hansen, B. E. (1992). Testing for parameter instability in linear models. Journal of Policy Modeling 14, 517–533.
Harvey, A. C. (1981). The Econometric Analysis of Time Series. Deddington: Philip Allan.
Harvey, A. C. (1990). The Econometric Analysis of Time Series, (2nd ed.). Hemel Hempstead: Philip Allan.
Harvey, A. C. (1993). Time Series Models, (2nd ed.). Hemel Hempstead: Harvester Wheatsheaf.
Harvey, A. C. and P. Collier (1977). Testing for functional misspecification in regression analysis. Journal of Econometrics 6, 103–119.
Harvey, A. C. and N. G. Shephard (1992). Structural time series models. In G. S. Maddala, C. R. Rao, and H. D. Vinod (Eds.), Handbook of Statistics, Volume 11. Amsterdam: North-Holland.
Hendry, D. F. (1976). The structure of simultaneous equations estimators. Journal of Econometrics 4, 51–88. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Hendry, D. F. (1979). Predictive failure and econometric modelling in macroeconomics: The transactions demand for money. In P. Ormerod (Ed.), Economic Modelling, pp. 217–242. London: Heinemann. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. (1980). Econometrics: Alchemy or science? Economica 47, 387–406. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Hendry, D. F. (Ed.) (1986). Econometric Modelling with Cointegrated Variables. Oxford Bulletin of Economics and Statistics: 48.
Hendry, D. F. (1986a). Econometric modelling with cointegrated variables: An overview. Oxford Bulletin of Economics and Statistics 48, 201–212. Reprinted in R.F. Engle and C.W.J. Granger (eds), Long-Run Economic Relationships, Oxford: Oxford University Press, 1991, 51–63.
Hendry, D. F. (1986b). Using PCGIVE in econometrics teaching. Oxford Bulletin of Economics and Statistics 48, 87–98.
Hendry, D. F. (1987). Econometric methodology: A personal perspective. In T. F. Bewley (Ed.), Advances in Econometrics, pp. 29–48. Cambridge: Cambridge University Press. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. (1988). The encompassing implications of feedback versus feedforward mechanisms in econometrics. Oxford Economic Papers 40, 132–149. Reprinted in Ericsson, N. R. and Irons, J. S. (eds.) Testing Exogeneity, Oxford: Oxford University Press, 1994; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. (1989). Comment on intertemporal consumer behaviour under structural changes in income. Econometric Reviews 8, 111–121.
Hendry, D. F. (1993). Econometrics: Alchemy or Science? Oxford: Blackwell Publishers.
Hendry, D. F. (1995a). Dynamic Econometrics. Oxford: Oxford University Press.
Hendry, D. F. (1995b). On the interactions of unit roots and exogeneity. Econometric Reviews 14, 383–419.
Hendry, D. F. (1995c). A theory of cobreaking. Mimeo, Nuffield College, University of Oxford.
Hendry, D. F. (1996). On the constancy of time-series econometric equations. Economic and Social Review 27, 401–422.
Hendry, D. F. (1997). On congruent econometric relations: A comment. Carnegie-Rochester Conference Series on Public Policy 47, 163–190.
Hendry, D. F. (2000a). Econometrics: Alchemy or Science? Oxford: Oxford University Press. New Edition.
Hendry, D. F. (2000b). Epilogue: The success of general-to-specific model selection. See Hendry (2000a), pp. 467–490. New Edition.
Hendry, D. F. and G. J. Anderson (1977). Testing dynamic specification in small simultaneous systems: An application to a model of building society behaviour in the United Kingdom. In M. D. Intriligator (Ed.), Frontiers in Quantitative Economics, Volume 3, pp. 361–383. Amsterdam: North Holland Publishing Company. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Hendry, D. F. and J. A. Doornik (1994). Modelling linear dynamic econometric systems. Scottish Journal of Political Economy 41, 1–33.
Hendry, D. F. and J. A. Doornik (1997). The implications for econometric modelling of forecast failure. Scottish Journal of Political Economy 44, 437–461. Special Issue.
Hendry, D. F. and N. R. Ericsson (1991). Modeling the demand for narrow money in the United Kingdom and the United States. European Economic Review 35, 833–886.
Hendry, D. F., S. Johansen, and C. Santos (2004). Selecting a regression saturated by indicators. Unpublished paper, Economics Department, University of Oxford.
Hendry, D. F. and K. Juselius (2000). Explaining cointegration analysis: Part I. Energy Journal 21, 1–42.
Hendry, D. F. and H.-M. Krolzig (1999). Improving on `Data mining reconsidered' by K.D. Hoover and S.J. Perez. Econometrics Journal 2, 202–219. Reprinted in J. Campos, N.R. Ericsson and D.F. Hendry (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. and H.-M. Krolzig (2001). Automatic Econometric Model Selection. London: Timberlake Consultants Press.
Hendry, D. F. and H.-M. Krolzig (2005). The properties of automatic Gets modelling. Economic Journal 115, C32–C61.
Hendry, D. F. and M. Massmann (2007). Co-breaking: Recent advances and a synopsis of the literature. Journal of Business and Economic Statistics 25, 33–51.
Hendry, D. F. and G. E. Mizon (1978). Serial correlation as a convenient simplification, not a nuisance: A comment on a study of the demand for money by the Bank of England. Economic Journal 88, 549–563. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. and G. E. Mizon (1993). Evaluating dynamic econometric models by encompassing the VAR. In P. C. B. Phillips (Ed.), Models, Methods and Applications of Econometrics, pp. 272–300. Oxford: Basil Blackwell. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. and G. E. Mizon (1999). The pervasiveness of Granger causality in econometrics. See Engle and White (1999), pp. 102–134.
Hendry, D. F. and M. S. Morgan (1989). A re-analysis of confluence analysis. Oxford Economic Papers 41, 35–52.
Hendry, D. F. and M. S. Morgan (1995). The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.
Hendry, D. F. and A. J. Neale (1987). Monte Carlo experimentation using PCNAIVE. In T. Fomby and G. F. Rhodes (Eds.), Advances in Econometrics, Volume 6, pp. 91–125. Greenwich, Connecticut: Jai Press Inc.
Hendry, D. F. and A. J. Neale (1988). Interpreting long-run equilibrium solutions in conventional macro models: A comment. Economic Journal 98, 808–817.
Hendry, D. F. and A. J. Neale (1991). A Monte Carlo study of the effects of structural breaks on tests for unit roots. In P. Hackl and A. H. Westlund (Eds.), Economic Structural Change, Analysis and Forecasting, pp. 95–119. Berlin: Springer-Verlag.
Hendry, D. F., A. J. Neale, and N. R. Ericsson (1991). PCNAIVE, An Interactive Program for Monte Carlo Experimentation in Econometrics. Version 6.0. Oxford: Institute of Economics and Statistics, University of Oxford.
Hendry, D. F., A. J. Neale, and F. Srba (1988). Econometric analysis of small linear systems using PcFiml. Journal of Econometrics 38, 203–226.
Hendry, D. F. and B. Nielsen (2007). Econometric Modeling: A Likelihood Approach. Princeton: Princeton University Press.
Hendry, D. F., A. R. Pagan, and J. D. Sargan (1984). Dynamic specification. See Griliches and Intriligator (1984), pp. 1023–1100. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. and J.-F. Richard (1982). On the formulation of empirical models in dynamic econometrics. Journal of Econometrics 20, 3–33. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press and in Hendry D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers 1993, and Oxford University Press, 2000; and in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. and J.-F. Richard (1983). The econometric analysis of economic time series (with discussion). International Statistical Review 51, 111–163. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Hendry, D. F. and J.-F. Richard (1989). Recent developments in the theory of encompassing. In B. Cornet and H. Tulkens (Eds.), Contributions to Operations Research and Economics. The XXth Anniversary of CORE, pp. 393–440. Cambridge, MA: MIT Press. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Hendry, D. F. and F. Srba (1980). AUTOREG: A computer program library for dynamic econometric models with autoregressive errors. Journal of Econometrics 12, 85–102. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Hendry, D. F. and T. von Ungern-Sternberg (1981). Liquidity and inflation effects on consumers' expenditure. In A. S. Deaton (Ed.), Essays in the Theory and Measurement of Consumers' Behaviour, pp. 237–261. Cambridge: Cambridge University Press. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Hendry, D. F. and K. F. Wallis (Eds.) (1984). Econometrics and Quantitative Economics. Oxford: Basil Blackwell.
Hooker, R. H. (1901). Correlation of the marriage rate with trade. Journal of the Royal Statistical Society 64, 485–492. Reprinted in Hendry, D. F. and Morgan, M. S. (1995), The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.
Hoover, K. D. and S. J. Perez (1999). Data mining reconsidered: Encompassing and the general-to-specific approach to specification search. Econometrics Journal 2, 167–191.
Jarque, C. M. and A. K. Bera (1987). A test for normality of observations and regression residuals. International Statistical Review 55, 163–172.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12, 231–254. Reprinted in R.F. Engle and C.W.J. Granger (eds), Long-Run Economic Relationships. Oxford: Oxford University Press, 1991, 131–152.
Johansen, S. (1995). Likelihood-based Inference in Cointegrated Vector Autoregressive Models. Oxford: Oxford University Press.
Johansen, S. and K. Juselius (1990). Maximum likelihood estimation and inference on cointegration — with application to the demand for money. Oxford Bulletin of Economics and Statistics 52, 169–210.
Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika 36, 149–176.
Judd, J. and J. Scadding (1982). The search for a stable money demand function: A survey of the post-1973 literature. Journal of Economic Literature 20, 993–1023.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee (1985). The Theory and Practice of Econometrics (2nd ed.). New York: John Wiley.
Kiviet, J. F. (1986). On the rigor of some misspecification tests for modelling dynamic relationships. Review of Economic Studies 53, 241–261.
Kiviet, J. F. (1987). Testing Linear Econometric Models. Amsterdam: University of Amsterdam.
Kiviet, J. F. and G. D. A. Phillips (1992). Exact similar tests for unit roots and cointegration. Oxford Bulletin of Economics and Statistics 54, 349–367.
Kohn, A. (1987). False Prophets. Oxford: Basil Blackwell.
Koopmans, T. C. (Ed.) (1950). Statistical Inference in Dynamic Economic Models. Number 10 in Cowles Commission Monograph. New York: John Wiley & Sons.
Koopmans, T. C., H. Rubin, and R. B. Leipnik (1950). Measuring the equation systems of dynamic economics. See Koopmans (1950), Chapter 2.
Kremers, J. J. M., N. R. Ericsson, and J. J. Dolado (1992). The power of cointegration tests. Oxford Bulletin of Economics and Statistics 54, 325–348.
Kuh, E., D. A. Belsley, and R. E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley.
Leamer, E. E. (1978). Specification Searches: Ad Hoc Inference with Non-Experimental Data. New York: John Wiley.
Leamer, E. E. (1983). Let's take the con out of econometrics. American Economic Review 73, 31–43. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.
Ljung, G. M. and G. E. P. Box (1978). On a measure of lack of fit in time series models. Biometrika 65, 297–303.
Lovell, M. C. (1983). Data mining. Review of Economics and Statistics 65, 1–12.
Lucas, R. E. (1976). Econometric policy evaluation: A critique. In K. Brunner and A. Meltzer (Eds.), The Phillips Curve and Labor Markets, Volume 1 of Carnegie-Rochester Conferences on Public Policy, pp. 19–46. Amsterdam: North-Holland Publishing Company.
MacKinnon, J. G. (1991). Critical values for cointegration tests. In R. F. Engle and C. W. J. Granger (Eds.), Long-Run Economic Relationships, pp. 267–276. Oxford: Oxford University Press.
MacKinnon, J. G. and H. White (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29, 305–325.
Makridakis, S., S. C. Wheelwright, and R. J. Hyndman (1998). Forecasting: Methods and Applications (3rd ed.). New York: John Wiley and Sons.
Marschak, J. (1953). Economic measurements for policy and prediction. In W. C. Hood and T. C. Koopmans (Eds.), Studies in Econometric Method, Number 14 in Cowles Commission Monograph. New York: John Wiley & Sons.
Mizon, G. E. (1977). Model selection procedures. In M. J. Artis and A. R. Nobay (Eds.), Studies in Modern Economic Analysis, pp. 97–120. Oxford: Basil Blackwell.
Mizon, G. E. (1984). The encompassing approach in econometrics. See Hendry and Wallis (1984), pp. 135–172.
Mizon, G. E. (1995). A simple message for autocorrelation correctors: Don't. Journal of Econometrics 69, 267–288.
Mizon, G. E. and D. F. Hendry (1980). An empirical application and Monte Carlo analysis of tests of dynamic specification. Review of Economic Studies 49, 21–45. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.
Mizon, G. E. and J.-F. Richard (1986). The encompassing principle and its application to non-nested hypothesis tests. Econometrica 54, 657–678.
Nelson, C. R. (1972). The prediction performance of the FRB-MIT-PENN model of the US economy. American Economic Review 62, 902–917. Reprinted in T.C. Mills (ed.), Economic Forecasting. Edward Elgar, 1999.
Newey, W. K. and K. D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Nickell, S. J. (1985). Error correction, partial adjustment and all that: An expository note. Oxford Bulletin of Economics and Statistics 47, 119–130.
Nielsen, B. (2006). Correlograms for nonstationary autoregressions. Journal of the Royal Statistical Society B 68, 707–720.
Pagan, A. R. (1984). Model evaluation by variable addition. See Hendry and Wallis (1984), pp. 103–135.
Perron, P. (1989). The Great Crash, the oil price shock and the unit root hypothesis. Econometrica 57, 1361–1401.
Pesaran, M. H. (1974). On the general problem of model selection. Review of Economic Studies 41, 153–171. Reprinted in Campos, J., Ericsson, N.R. and Hendry, D.F. (eds.), General to Specific Modelling. Edward Elgar, 2005.
Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics 33, 311–340.
Phillips, P. C. B. (1987). Time series regression with a unit root. Econometrica 55, 277–301.
Phillips, P. C. B. (1991). Optimal inference in cointegrated systems. Econometrica 59, 283–306.
Poincaré, H. (1905). Science and Hypothesis. New York: Science Press.
Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic Press.
Ramsey, J. B. (1969). Tests for specification errors in classical linear least squares regression analysis. Journal of the Royal Statistical Society B 31, 350–371.
Richard, J.-F. (1980). Models with several regimes and changes in exogeneity. Review of Economic Studies 47, 1–20.
Sargan, J. D. (1958). The estimation of economic relationships using instrumental variables. Econometrica 26, 393–415.
Sargan, J. D. (1959). The estimation of relationships with autocorrelated residuals by the use of instrumental variables. Journal of the Royal Statistical Society B 21, 91–105. Reprinted as pp. 87–104 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.
Sargan, J. D. (1964). Wages and prices in the United Kingdom: A study in econometric methodology (with discussion). In P. E. Hart, G. Mills, and J. K. Whitaker (Eds.), Econometric Analysis for National Economic Planning, Volume 16 of Colston Papers, pp. 25–63. London: Butterworth Co. Reprinted as pp. 275–314 in Hendry D. F. and Wallis K. F. (eds.) (1984), Econometrics and Quantitative Economics. Oxford: Basil Blackwell, and as pp. 124–169 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.
Sargan, J. D. (1980a). The consumer price equation in the post-war British economy: An exercise in equation specification testing. Review of Economic Studies 47, 113–135.
Sargan, J. D. (1980b). Some tests of dynamic specification for a single equation. Econometrica 48, 879–897. Reprinted as pp. 191–212 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.
Shenton, L. R. and K. O. Bowman (1977). A bivariate model for the distribution of √b_{1} and b_{2}. Journal of the American Statistical Association 72, 206–211.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Sims, C. A. (1980). Macroeconomics and reality. Econometrica 48, 1–48. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.
Sims, C. A., J. H. Stock, and M. W. Watson (1990). Inference in linear time series models with some unit roots. Econometrica 58, 113–144.
Spanos, A. (1986). Statistical Foundations of Econometric Modelling. Cambridge: Cambridge University Press.
Wallis, K. F. (1987). Time series analysis of bounded economic variables. Journal of Time Series Analysis 8, 115–123.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838.
White, H. (1984). Asymptotic Theory for Econometricians. London: Academic Press.
White, H. (1990). A consistent model selection procedure based on m-testing. In C. W. J. Granger (Ed.), Modelling Economic Series, pp. 369–383. Oxford: Clarendon Press.
Wooldridge, J. M. (1999). Asymptotic properties of some specification tests in linear models with integrated processes. In R. F. Engle and H. White (Eds.), Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W. J. Granger, pp. 366–384. Oxford: Oxford University Press.
Working, E. J. (1927). What do statistical demand curves show? Quarterly Journal of Economics 41, 212–235.
Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time series (with discussion). Journal of the Royal Statistical Society 89, 1–64. Reprinted in Hendry, D. F. and Morgan, M. S. (1995), The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.