OxMetrics provides many convenient graphical options for prior data analysis. For simplicity, we denote the selected variable by xt, t=1,...,T. This chapter summarizes the underlying formulae, discussing each available graph type in turn.
Actual series gives a graph for each selected variable, showing the variable over time. The All scatter plots option gives scatter plots between all selected variables (omitting redundant scatter plots).
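These are relevant for the statistics described below. For a series xt, t=1,...,T:
\[
\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t, \qquad
\hat\sigma_x^2 = \frac{1}{T}\sum_{t=1}^{T}\left(x_t-\bar{x}\right)^2. \tag{9.1}
\]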
Note that an alternative definition of the variance would divide by T-1 instead of T.
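Define the sample autocovariances {ĉj} of a series xt, t=1,...,T:
\[
\hat c_j = \frac{1}{T}\sum_{t=j+1}^{T}\left(x_t-\bar{x}\right)\left(x_{t-j}-\bar{x}\right), \quad j=0,\ldots,T-1, \tag{9.2}
\]
using the full sample mean \(\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t\). The variance \(\hat\sigma_x^2\) corresponds to ĉ0, see (eq:9.1).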
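The sample autocorrelation function (ACF) is the series {r̂j} where r̂j is the sample correlation coefficient between xt and xt-j. The length of the correlogram is specified by the user, leading to a figure which shows (r̂1,r̂2,...,r̂s) plotted against (1,2,...,s) where for any j when x is any chosen variable:
\[
\hat r_j = \frac{\hat c_j}{\hat c_0}
= \frac{\sum_{t=j+1}^{T}\left(x_t-\bar{x}\right)\left(x_{t-j}-\bar{x}\right)}{\sum_{t=1}^{T}\left(x_t-\bar{x}\right)^2}. \tag{9.3}
\]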
The first autocorrelation, {r̂0}, is equal to one, and omitted from the graphs.
The asymptotic variance of the autocorrelations is 1/T, so approximate 95% error bars are indicated at ±2T-1/2 (see e.g. Harvey, 1993, p.42).
If there are missing values in the data series, the correlogram uses all valid observations as follows: the mean is computed for all valid observations; terms where xt or xt-j are missing are omitted from the summation in (eq:9.2). Consequently, the same procedure is used in (eq:9.3) and any other derived functions such as the spectrum and periodogram.
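The ACF definition and the missing-value rule can be sketched in a few lines of Python. This is an illustration only, not the OxMetrics implementation: the function name `sample_acf` is hypothetical, missing values are represented by `None`, and taking T in the error band as the total number of observations is an assumption.

```python
import math

def sample_acf(x, s):
    """Return ([r_1, ..., r_s], band) for a series x; None marks a missing value."""
    valid = [v for v in x if v is not None]
    mean = sum(valid) / len(valid)              # mean over all valid observations
    dev = [None if v is None else v - mean for v in x]
    c0 = sum(d * d for d in dev if d is not None)
    acf = []
    for j in range(1, s + 1):
        # terms where x_t or x_{t-j} is missing are dropped from the sum
        cj = sum(dev[t] * dev[t - j]
                 for t in range(j, len(x))
                 if dev[t] is not None and dev[t - j] is not None)
        acf.append(cj / c0)                     # the common 1/T factor cancels
    band = 2.0 / math.sqrt(len(x))              # assumption: T counts all observations
    return acf, band
```

For the short trending series 1, 2, 3, 4, 5 this gives r̂1 = 0.4 and r̂2 = -0.1, with an approximate 95% band of ±0.894.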
Given the sample autocorrelation function {r̂j} the partial autocorrelations are computed using Durbin's method as described in Golub and Van Loan (1989, §4.7.2). This corresponds to recursively solving the Yule--Walker equations. For example, with autocorrelations, r̂0, r̂1, r̂2, ..., the first partial correlation is α̂0=1 (omitted from the graphs). The second, α̂1, is the solution from
| ( |
| ) = ( |
| ) ( |
| ), |
et cetera.
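As an illustration of recursively solving the Yule--Walker equations, the following Python sketch uses the textbook Durbin--Levinson recursion. It is not taken from OxMetrics: the function name `pacf_from_acf` is hypothetical, and the returned list starts at lag 1, so the leading value of one is omitted.

```python
def pacf_from_acf(r):
    """Partial autocorrelations for lags 1..s from r = [1, r_1, ..., r_s]."""
    s = len(r) - 1
    pacf = []
    phi_prev = []                      # coefficients of the order k-1 solution
    for k in range(1, s + 1):
        if k == 1:
            phi_kk = r[1]
            phi = [phi_kk]
        else:
            num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # update the first k-1 coefficients, then append the new one
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf
```

For the autocorrelations of a first-order autoregression with coefficient 0.5 (1, 0.5, 0.25, 0.125), the sketch returns 0.5 at lag 1 and zero at higher lags.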
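The sample correlogram plots the series {r̂j*} where r̂j* is the correlation coefficient between xt and xt-j. The length of the correlogram is specified by the user, leading to a figure which shows (r̂1*,r̂2*,...,r̂s*) plotted against (1,2,...,s) where for any j when x is any chosen variable:
\[
\hat r_j^{*} = \frac{\sum_{t=j+1}^{T}\left(x_t-\bar{x}_0\right)\left(x_{t-j}-\bar{x}_j\right)}
{\sqrt{\sum_{t=j+1}^{T}\left(x_t-\bar{x}_0\right)^2\,\sum_{t=j+1}^{T}\left(x_{t-j}-\bar{x}_j\right)^2}}.
\]
Here \(\bar{x}_0 = \frac{1}{T-j}\sum_{t=j+1}^{T} x_t\) is the sample mean of xt, t=j+1,...,T, and \(\bar{x}_j = \frac{1}{T-j}\sum_{t=j+1}^{T} x_{t-j}\) is the sample mean of xt-j, so that r̂j* corresponds to a correlation coefficient proper.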
Note the difference with the definition of the sample autocorrelations {r̂j} in (eq:9.3) above. This difference tends to be small, and vanishes asymptotically, provided the series are stationary. But, as argued in Nielsen (2006), the correlogram provides a better discrimination between stationary and non-stationary variables: for an autoregressive value of one (or higher), the correlogram declines more slowly than the ACF.
When the correlogram is selected for plotting, the PACF is based on the correlogram values, rather than the ACF. Handling of missing values is the same as for the ACF.
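The difference between the correlogram and the ACF is easiest to see on a trending series. The sketch below (hypothetical function name `correlogram`, no missing-value handling) computes r̂j* with the two subsample means described above.

```python
import math

def correlogram(x, s):
    """Return [r*_1, ..., r*_s] using separate means for x_t and x_{t-j}."""
    T = len(x)
    out = []
    for j in range(1, s + 1):
        lead = x[j:]            # x_t,     t = j+1, ..., T
        lagged = x[:T - j]      # x_{t-j}, t = j+1, ..., T
        m0 = sum(lead) / (T - j)
        mj = sum(lagged) / (T - j)
        num = sum((a - m0) * (b - mj) for a, b in zip(lead, lagged))
        den = math.sqrt(sum((a - m0) ** 2 for a in lead)
                        * sum((b - mj) ** 2 for b in lagged))
        out.append(num / den)
    return out
```

For the series 1, 2, 3, 4, 5 the correlogram equals one at lags 1 and 2, whereas the ACF of the same series is 0.4 and -0.1: the correlogram declines more slowly for a non-stationary series.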
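The sample cross-correlation function graphs the correlation between a series xt and the lags of another series yt:
\[
\hat r^{xy}_j = \frac{\sum_{t=j+1}^{T}\left(x_t-\bar{x}\right)\left(y_{t-j}-\bar{y}\right)}
{\sqrt{\sum_{t=1}^{T}\left(x_t-\bar{x}\right)^2\,\sum_{t=1}^{T}\left(y_t-\bar{y}\right)^2}},
\]
using the full sample means \(\bar{x}\) and \(\bar{y}\).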
The periodogram is defined as:
|
|
Note that p(0)=0.
When the periodogram is plotted, only frequencies greater than zero and up to π are used. Moreover, the x-axis, with values 0,...,π, is represented as 0,...,1. So, when T=4 the x coordinates are 0.5,1 corresponding to π/2, π. When T=5, the x coordinates are 0.4,0.8 corresponding to 2π/5, 4π/5.
However, when the periodogram is evaluated using the Algebra function periodogram, it is evaluated at all frequencies up to 2π, resulting in T periodogram values. For T=4 these are evaluated at 0, π/2, π, 3π/2. For T=5: 0, 2π/5, 4π/5, 6π/5, 8π/5. The frequencies are returned in the last argument of periodogram.
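The two frequency grids can be reproduced with a short Python sketch. The function names are hypothetical; only the grids stated in the text are implemented, not the periodogram itself.

```python
import math

def plot_frequencies(T):
    """x-axis coordinates of the plotted periodogram: frequencies in (0, pi] divided by pi."""
    return [2.0 * k / T for k in range(1, T // 2 + 1)]

def algebra_frequencies(T):
    """The T frequencies 0, 2*pi/T, ..., below 2*pi used by the Algebra function."""
    return [2.0 * math.pi * k / T for k in range(T)]
```

With T=4 the plotted coordinates are 0.5 and 1; with T=5 they are 0.4 and 0.8, matching the examples above.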
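The estimated spectral density is a smoothed function of the sample autocorrelations {r̂j}, defined as in (eq:9.3). The sample spectral density is then defined as:
\[
\hat s(\omega) = \frac{1}{2\pi}\sum_{j=-(T-1)}^{T-1} K(j)\,\hat r_{|j|}\cos(j\omega), \quad 0\le\omega\le\pi,
\]
where |.| takes the absolute value, so that, for example, r̂|-1| = r̂1. The K(.) function is called the lag window. OxMetrics uses the Parzen window:
\[
K(j) = \begin{cases}
1-6\left(\frac{j}{m}\right)^2+6\left|\frac{j}{m}\right|^3, & \left|\frac{j}{m}\right|\le 0.5,\\[4pt]
2\left(1-\left|\frac{j}{m}\right|\right)^3, & 0.5\le\left|\frac{j}{m}\right|\le 1,\\[4pt]
0, & \left|\frac{j}{m}\right|>1.
\end{cases}
\]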
We have that K(-j)=K(j), so that the sign of j does not matter (cos(x)=cos(-x)). The r̂j are based on fewer observations as j increases. The window function attaches decreasing weights to the autocorrelations, with zero weight for j>m. The parameter m is called the lag truncation parameter. In OxMetrics, this is taken to be the same as the chosen length of the correlogram. For example, selecting s=12 (the length setting in the dialog) results in m=12. The larger m, the less smooth the spectrum becomes, but the lower the bias. The spectrum is evaluated at 128 points between 0 and π. For more information see Priestley (1981) and Granger and Newbold (1986).
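A minimal Python sketch of the Parzen lag window and the resulting smoothed spectrum follows. The function names are hypothetical, and the 1/(2π) scaling is the conventional normalisation, assumed here rather than confirmed against OxMetrics output.

```python
import math

def parzen(j, m):
    """Parzen lag window with truncation parameter m (standard textbook form)."""
    u = abs(j) / m
    if u <= 0.5:
        return 1.0 - 6.0 * u ** 2 + 6.0 * u ** 3
    if u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0

def spectral_density(r, m, omega):
    """Smoothed spectrum at frequency omega; r[j] is the lag-j autocorrelation, r[0] = 1."""
    total = r[0]
    # K(-j) = K(j) and cos is even, so each lag j >= 1 counts twice;
    # K(m) = 0, so the sum can stop at m - 1
    for j in range(1, m):
        total += 2.0 * parzen(j, m) * r[j] * math.cos(j * omega)
    return total / (2.0 * math.pi)      # assumption: 1/(2*pi) scaling
```

The window equals 1 at j=0, 0.25 at j=m/2 and 0 from j=m onwards. For white noise (all r̂j zero for j≥1) the sketch returns a flat spectrum of 1/(2π) at every frequency.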
Consider a data set {xt}=(x1...xT) of observations on a random variable X. The range of {xt} is divided into N intervals of length h, with h defined below. The proportion of xt in each interval then constitutes the histogram; the sum of the proportions is unity on the scaling in OxMetrics. The density can be estimated as a smoothed function of the histogram using a normal or Gaussian kernel. This can then be summed (`integrated') to obtain the estimated cumulative distribution function (CDF).
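Denote the actual density of X at x by fx(x). A non-parametric estimate of the density is obtained from the sample by:
\[
\hat f_x(x) = \frac{1}{Th}\sum_{t=1}^{T} K\!\left(\frac{x-x_t}{h}\right), \tag{9.9}
\]
where h is the window width or smoothing parameter, and K(.) is a kernel such that:
\[
\int_{-\infty}^{\infty} K(z)\,dz = 1.
\]
OxMetrics sets:
\[
h = 1.06\,\hat\sigma_x / T^{0.2}
\]
as a default, and uses the standard normal density for K(.):
\[
K(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}z^2\right).
\]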
The default window width is used when the dialog option is set to Default bars. Alternatively, it is possible to directly choose the number of bars, in which case, the more bars, the less smooth the density estimate will be. The histogram uses the whole sample, skipping over missing values.
The estimate f̂x(x) is usually calculated for 128 values of x, but since direct evaluation can be somewhat expensive in computer time, a fast Fourier transform is used (we are grateful to Dr. Silverman for permission to use his algorithm). The estimated CDF of X can be derived from f̂x(x); this is shown with a standard normal CDF for comparison. An excellent reference on density function estimation is Silverman (1986).
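The kernel density estimate with the default window width can be sketched by direct evaluation. Note that this is deliberately the slow direct sum, not the fast Fourier transform used by OxMetrics; the function names are hypothetical, and the standard deviation divides by T as in (eq:9.1).

```python
import math

def default_bandwidth(x):
    """Default window width h = 1.06 * sigma / T^0.2, with sigma dividing by T."""
    T = len(x)
    mean = sum(x) / T
    sigma = math.sqrt(sum((v - mean) ** 2 for v in x) / T)
    return 1.06 * sigma / T ** 0.2

def kernel_density(x, at, h=None):
    """Gaussian-kernel density estimate at the point `at` (direct evaluation)."""
    if h is None:
        h = default_bandwidth(x)
    T = len(x)
    total = 0.0
    for v in x:
        z = (at - v) / h
        total += math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return total / (T * h)
```

Because the kernel integrates to one, so does the estimate: summing `kernel_density` over a fine grid that covers the data, multiplied by the grid step, gives a value very close to 1.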
The scatter plots option allows for various types of regression lines to be drawn. A single regression line draws the fitted values from the OLS estimates α̂ and β̂ in
| ŷt=α̂+β̂xt, t=1,...T, |
assuming that yt and xt are the selected variables. The distinction between sequential and recursive regression lines is easily explained using three lines. Divide the sample in three parts: 1,...,T/3, T/3+1,...,2T/3 and 2T/3+1,...,T, then:
| sequential | one line fitted to each part separately: 1,...,T/3; T/3+1,...,2T/3; 2T/3+1,...,T |
| recursive | lines fitted to expanding samples: 1,...,T/3; 1,...,2T/3; 1,...,T |
The regression line is fitted over the whole sample, skipping over missing values inside sample.
As a first step towards non-parametric regression, we start with a scatter plot of yt and xt, with xt along the horizontal axis. Divide the x-axis in N intervals, and compute the average y-value in each interval. The resulting step function provides a non-parametric regression line: to compute ŷ(x0) we locate the interval on which x0 falls, and use the mean of y in that interval as ŷ. An obvious drawback is that, if x0 is towards the edge of an interval, most of the observations are on the other side. Solve this by centring the interval in x0. This still gives as much weight to points far away as to points nearby. In line with the smoothing functions (kernels) in the spectrum and density estimates, we use a density function to take a weighted average.
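The non-parametric estimate of y is obtained from
\[
\hat y_h(x) = \frac{\sum_{t=1}^{T} K\!\left(\frac{x-x_t}{h}\right) y_t}{\sum_{t=1}^{T} K\!\left(\frac{x-x_t}{h}\right)},
\]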
where, as for (eq:9.9), h is the window width, bandwidth or smoothing parameter, and K( .) is a kernel which integrates to 1. OxMetrics uses the Epanechnikov kernel:
|
|
which is optimal in a certain sense. The optimal bandwidth, equivalent to the default chosen for the non-parametric density estimate is:
\[
h = 0.75\,\hat\sigma_x / T^{0.2}.
\]
The function ŷh(.) is computed at 128 points. As h→∞, ŷh(.)→ȳ: the sample mean is the fitted value for any x. On the other hand, if h→0 the interval goes to zero, and we fit the nearest corresponding y-value so that each data point is picked up exactly. Note, however, that the ŷ(.) function is evaluated at the T data points xt (which is time in the absence of x). Härdle (1990) is a general reference on the subject of non-parametric regression.
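The kernel-weighted average can be sketched as follows. The function names are hypothetical, and the kernel is written in the common compact-support form 0.75(1-u²) on |u|<1, which is an assumption: the scaling of the Epanechnikov kernel used by OxMetrics may differ, which changes the meaning of a given h but not the structure of the estimate.

```python
def epanechnikov(u):
    # Assumption: compact-support form 0.75 * (1 - u^2) on |u| < 1
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_smooth(x, y, at, h):
    """Kernel-weighted average of y at the point `at`; None if no data fall in the window."""
    weights = [epanechnikov((at - xi) / h) for xi in x]
    total = sum(weights)
    if total == 0.0:
        return None
    return sum(w * yi for w, yi in zip(weights, y)) / total
```

With x = 0, 1, 2, 3 and y = 1, 3, 2, 6, a small window (h=0.5) evaluated at x=1 picks up only the observation at that point and returns 3, while a very large window returns a value arbitrarily close to the sample mean of y, which is 3 here as well but is reached by averaging all four points.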
There are three ways of specifying the bandwidth:
This specifies the equivalent number of parameters (approximately comparable to the number of regressors used in a linear regression). The default is
| 3/4 [ |
| ]1/2 T-0.2. |
Sets the bandwidth directly.
Chooses the bandwidth by generalized cross validation (GCV). We find that choosing bandwidth using GCV or cross validation (CV) tends to undersmooth.
Equation (eq:9.13) below indicates how GCV is computed.
The kernel smooth is not computed using a Fourier transform, but directly, so can be slow for large T. The smooth is fitted over the whole sample skipping over missing values inside sample, which are estimated by the fit from the smooth.
One drawback of this smooth is that, for trending lines, it behaves counter intuitively at the edges. Consider, for example, the left edge of the kernel smooth of a variable which is upwardly trending. Since the smooth starts as a moving average of points which are only to the right, and hence mainly higher, the left edge will have a J shape, starting above what would be fitted by eye.
A spline is another method for smoothing a scatter plot. Consider a plot of yt, against xt, and sort the data according to x: a<x[1]<...<x[T]<b. In a spline model, the sum of squared deviations from a function g is minimized, subject to a roughness penalty:
\[
\min_{g}\ \sum_{t=1}^{T}\left[y_t-g(x_{[t]})\right]^2+\alpha\int_a^b\left[g''(x)\right]^2dx.
\]
OxMetrics uses a natural cubic spline, which is cubic because the function g is chosen as a third degree polynomial, and natural because the smooth is a straight line between a and x[1] and between x[T] and b. This avoids the end-point problem of the kernel smooth of the previous section. Two good references on splines and nonparametric regression are Green and Silverman (1994) and Hastie and Tibshirani (1994).
The α parameter is the bandwidth: the smaller α, the lower the roughness penalty, and hence the closer the smooth will track the actual data.
The spline is evaluated at all the data points, where missing y values are estimated by the fit from the smooth. The spline procedure handles ties in the x variable. The algorithm used to compute the spline is of order T, and discussed extensively in Green and Silverman (1994, Chs.2,3).
For evenly-spaced data (e.g. scatter plot against time), the algorithm involves a Toeplitz matrix. This illustrates the closeness of a natural cubic spline to the Hodrick--Prescott filter which is popular in macro-economics:
\[
\min_{\{\mu_t\}}\ \sum_{t=1}^{T}\left(y_t-\mu_t\right)^2+\alpha\sum_{t=2}^{T-1}\left(\Delta^2\mu_{t+1}\right)^2,
\qquad \Delta^2\mu_{t+1}=\mu_{t+1}-2\mu_t+\mu_{t-1}.
\]
The only difference is in this Toeplitz matrix. Since the Hodrick--Prescott filter uses α=1600 (for quarterly data), the smoothers from both methods are virtually identical.
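To make the Hodrick--Prescott criterion concrete, the Python sketch below minimises the sum of squared deviations plus α times the sum of squared second differences of the trend, by solving the first-order conditions (I + αD'D)μ = y with D the second-difference matrix. This is the standard textbook formulation with a dense solver, assumed for illustration; it is not the order-T algorithm used by OxMetrics, and the function name `hp_filter` is hypothetical.

```python
def hp_filter(y, alpha=1600.0):
    """Return the trend mu solving (I + alpha * D'D) mu = y."""
    T = len(y)
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for r in range(T - 2):
        d = {r: 1.0, r + 1: -2.0, r + 2: 1.0}   # one row of the second-difference matrix
        for i, di in d.items():
            for j, dj in d.items():
                A[i][j] += alpha * di * dj
    # Gaussian elimination with partial pivoting on the augmented matrix [A | y]
    M = [row[:] + [float(v)] for row, v in zip(A, y)]
    for c in range(T):
        p = max(range(c, T), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, T):
            f = M[i][c] / M[c][c]
            for j in range(c, T + 1):
                M[i][j] -= f * M[c][j]
    mu = [0.0] * T
    for i in range(T - 1, -1, -1):              # back substitution
        s = sum(M[i][j] * mu[j] for j in range(i + 1, T))
        mu[i] = (M[i][T] - s) / M[i][i]
    return mu
```

Two checks follow directly from the criterion: a series that is exactly linear has zero second differences and is returned unchanged for any α, and with α=0 there is no roughness penalty, so the fit reproduces the data.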
There are three ways of specifying the bandwidth:
This specifies the equivalent number of parameters, ke, (approximately comparable to the number of regressors used in a linear regression). The default is
| 3/4 [ |
| ]1/2 T-0.2. |
Sets the bandwidth directly, with the default of 1600 corresponding to the Hodrick--Prescott filter for quarterly data (see §10.4.4.1 for other frequencies).
Chooses the bandwidth by generalized cross validation (GCV). We find that choosing bandwidth using GCV or cross validation (CV) tends to undersmooth. The GCV criterion is computed as:
|
|
We have adopted GCV instead of CV because a very good fit at one point could dominate the CV criterion.
Draws a QQ plot. The variable would normally hold critical values which are hypothesized to come from a certain distribution. This function then draws a cross plot of these observed values (sorted), against the theoretical quantiles. The 45° line is drawn for reference (the closer the cross plot to this line, the better the match). The QQ plot line is drawn over the whole sample, skipping over missing values.
The following distributions are supported:
A box plot shows the distribution of a variable in terms of its quartiles, labelled Q1, Q2, Q3 (the 25%, 50% and 75% quartiles). Define the interquartile range as IQR = 1.5 (Q3 - Q1). The box plot consists of the following elements:
The box plot uses the whole sample, skipping over missing values.
A simple exponentially-weighted moving average of a series yt, t=1,...,T, is defined as:
\[
m_t = \alpha y_t + (1-\alpha)\,m_{t-1}, \quad t=2,\ldots,T, \tag{9.14}
\]
using m1=y1. The resulting series zt is zt=mt, except when any yt+h is missing for h≥1, in which case it is replaced by forecasts zt+h=mt.
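An extended version introduces a slope coefficient β, and is also called Holt's method:
\[
\begin{aligned}
m_t &= \alpha y_t + (1-\alpha)\left(m_{t-1}+b_{t-1}\right),\\
b_t &= \beta\left(m_t-m_{t-1}\right) + (1-\beta)\,b_{t-1},
\end{aligned}
\qquad t=3,\ldots,T.
\]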
In this case m1=y1, m2=y2, b2=y2-y1; when β=0 the EWMA (eq:9.14) is used. The resulting series zt is zt=mt, except when any yt+h is missing for h≥1, in which case it is replaced by forecasts zt+h=mt+hbt.
Back-substitution in the EWMA without slope coefficient shows that:
| mt = (1-α)t-1m1 + α∑s=0t-1(1-α)s yt-s. |
See e.g. Harvey (1993, Ch. 5), or Makridakis, Wheelwright and Hyndman (1998, Ch. 4).
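Both smoothers are short recursions, sketched below in Python. The function names are hypothetical, missing values are not handled, and the Holt recursion is written in its usual textbook form, which is assumed here to match the starting values m1=y1, m2=y2, b2=y2-y1 given above.

```python
def ewma(y, alpha):
    """Simple EWMA: m_1 = y_1, m_t = alpha * y_t + (1 - alpha) * m_{t-1}."""
    m = [y[0]]
    for t in range(1, len(y)):
        m.append(alpha * y[t] + (1.0 - alpha) * m[-1])
    return m

def holt(y, alpha, beta):
    """Holt's method (textbook form, assumed): returns the level m and slope b."""
    m = [y[0], y[1]]
    b = [None, y[1] - y[0]]            # the slope is undefined for the first observation
    for t in range(2, len(y)):
        level = alpha * y[t] + (1.0 - alpha) * (m[-1] + b[-1])
        b.append(beta * (level - m[-1]) + (1.0 - beta) * b[-1])
        m.append(level)
    return m, b
```

For y = 1, 2, 3 and α=0.5 the EWMA is 1, 1.5, 2.25: it lags behind a trending series. Holt's method with any α and β tracks the linear series 1, 2, 3, 4 exactly, because the slope term absorbs the trend.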
The exponentially weighted moving sample correlation of two series yt and xt, both having expectation zero, is defined as:
\[
r_t = \frac{p_t(x,y)}{\sqrt{p_t(x,x)\,p_t(y,y)}}, \quad t=3,\ldots,T,
\]
where
| pt(x,y) = ∑s=0t-1λs xt-s yt-s, t=1,...,T, |
and r1,r2 are set to missing values. When any yt+h or xt+h is missing rt+h is set to rt.
Premultiplying the numerator and denominator of rt by 1-λ, shows that mt*=(1-λ)pt(x,y) is the EWMA of the cross-product with α=1-λ and m1=0. This does not work for λ=1.
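The weighted cross-products satisfy the recursion pt = λ pt-1 + xt yt, which gives a compact way to compute rt. The Python sketch below uses that recursion; the function name `ewm_corr` is hypothetical, `None` stands for the missing value assigned to r1 and r2, and missing observations in the data are not handled.

```python
import math

def ewm_corr(x, y, lam):
    """Exponentially weighted moving correlation of two zero-mean series."""
    pxy = pxx = pyy = 0.0
    r = []
    for t, (xt, yt) in enumerate(zip(x, y), start=1):
        # p_t = lam * p_{t-1} + x_t * y_t equals the weighted sum in the definition
        pxy = lam * pxy + xt * yt
        pxx = lam * pxx + xt * xt
        pyy = lam * pyy + yt * yt
        r.append(None if t < 3 else pxy / math.sqrt(pxx * pyy))
    return r
```

If y is an exact positive multiple of x, the correlation is 1 from the third observation onwards, whatever the value of λ; with a negative multiple it is -1.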
OxMetrics is mostly menu-driven for ease of use. To add flexibility, certain functions can be accessed through entering commands. The syntax of these commands, which can be seen as small computer languages, is described in this and the next chapter. In addition, there is the Ox language, which is a more serious computer language, and described in a separate book.
Algebra is a simple vector language, operating on only one type of object: the variables in the database. This object is manipulated as a whole, although it is possible to limit access to a subsample. The only valid statements are assignments and conditional assignments. Statements have to be terminated with a semicolon.
If an error occurs while executing the algebra code, the processing will be aborted, and control returned to OxMetrics. All successful statements up to that point will have been executed, with corresponding changes to the database. Take this into account when rerunning the corrected code.
There are three ways of running Algebra code, as discussed in the next three sections.
The calculator enables easy manipulation of the variables in the database, and is a convenient way to write algebra expressions.
The aim is to build a valid algebra expression in the expression window (without the assignment part and the terminating semi-colon). All successful transformations are logged in the Results window as Algebra code.
The Tools/Algebra Editor command enables you to transform the database variables by typing mathematical formulae into an editor. The algebra code can be saved and reloaded. The Algebra editor also allows for choosing the database to which the code is applied.
A text selection containing Algebra code can be run directly from that window using the Edit/Run as Algebra command (or Ctrl+a as short cut). The code is applied to the currently active database. The current database is changed by setting focus to that database, or using any of the drop down boxes (calculator, algebra editor, graphics), or using the usedata() batch command.
Algebra code can be specified in a Batch program, see Ch. 11, and specifically §11.5.1. An Algebra file can also be loaded into a batch file (§11.5.14).
Names are made up of letters and digits. The first character must be a letter. Underscores (_) count as a letter. Valid names are CONS, cons, cons_1, _a_1_b, etc. Examples of invalid names are ΔCONS, 1-CONS, log(X), etc. Invalid names may be used in Algebra when enclosed in double quotes, so
"log(X)" = log(X);
is valid as long as the variable called X exists. Algebra is case-sensitive: CONS and cons refer to different variables. When you create a new variable through an assignment operation, it is immediately added to the database, and initialized to missing values. If necessary the database name will be truncated (longer names are allowed in algebra than in the database).
Anything between /* and */ is considered comment. Note that this comment cannot be nested. So
aap = CONS + 1; /* comment /* nested comment */ */
leads to a syntax error. Everything following // up to the end of the line is also comment.
A constant is a number which is used in an expression. Examples are 1, 1.2, .5, -77000, -.5e-10 and 2.1E-12.
The binary arithmetic operators are ^, *, /, +, -. The ^ is the power operator; CONS ^ 2 raises the variable CONS to the power two. All computations are double precision floating point arithmetic. The precedence order is: ^, unary + and unary -, then *, /, and finally +, -. An example of unary minus is: x = -1. Any arithmetic operation involving missing values returns a missing value.
The relational operators are <, <=, >, >=, standing for `less', `less or equal', `greater', `greater or equal'. The equality operators are == and !=, `is equal to' and `is not equal to'. These are ranked below the relational operators in precedence. Ranked yet lower are logical AND (&&) and logical OR (||). The unary negation operator ! ranks above all of these. An example of unary negation is:
new = !season();
will have a 0 where the season function returns a 1, and vice versa. Boolean arithmetic is also done in floating point. The numeric result is 1. for an expression that evaluates to true, and 0. for one that is false. An expression with value 0 is false, whereas a value that is not 0 is true.
Missing values are treated as follows:
| a op b | result when a and/or b is a missing value |
| <, <=, >, >= | FALSE if a and/or b is missing |
| == | TRUE if both a and b are missing |
| != | TRUE if either a or b is missing (but not both) |
| &&, || | no special treatment of missing values |
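The comparison rules for missing values can be emulated in Python as a sanity check. This is an emulation under stated assumptions, not Algebra code: NaN stands in for the missing value, the function names are hypothetical, and == with exactly one missing operand is assumed to be FALSE (the table only states the case where both are missing).

```python
import math

MISSING = math.nan      # stand-in for the Algebra missing value

def is_missing(v):
    return isinstance(v, float) and math.isnan(v)

def less(a, b):
    """a < b: FALSE if either operand is missing."""
    if is_missing(a) or is_missing(b):
        return 0.0
    return 1.0 if a < b else 0.0

def equal(a, b):
    """a == b: TRUE if both are missing; assumed FALSE if exactly one is missing."""
    if is_missing(a) or is_missing(b):
        return 1.0 if is_missing(a) and is_missing(b) else 0.0
    return 1.0 if a == b else 0.0

def not_equal(a, b):
    """a != b: TRUE if exactly one operand is missing, FALSE if both are."""
    return 1.0 - equal(a, b)
```

As in Algebra, the results are the floating point values 1. and 0. rather than a separate Boolean type.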
Table 10.1 lists the operator precedence, with the highest precedence at the top of the table.
| Symbol | Description | Associativity |
| ^ | Power | Left to right |
| ! | Logical NOT | Right to left |
| - | Unary minus | |
| + | Unary plus | |
| * | Multiply | Left to right |
| / | Divide | |
| + | Add | Left to right |
| - | Subtract | |
| < | Less than | Left to right |
| <= | Less than or equal to | |
| > | Greater than | |
| >= | Greater than or equal to | |
| == | Equal | Left to right |
| != | Not equal | |
| && | Logical AND | Left to right |
| || | Logical OR | Left to right |
The assignment operator is the = symbol. Assignment statements have to be terminated by a semicolon (;). If the variable to which a result is assigned does not yet exist, it is created and added to the database. Otherwise the existing variable is overwritten. Note that assignment expressions are vector expressions: all observations will be overwritten (it is possible to restrict assignment to a subsample using the insample function). Some valid assignment statements are:
cons2 = 2 * CONS - 1;
cons3 = CONS / 3 - (cons2 + 5) * 1.55;
Seasonal = season();
The last statement constructs a variable called Seasonal, which is 1 in period 1, and 0 otherwise. This variable will be used in further examples in this chapter.
Take care not to confuse the = and == symbols: = assigns, while == compares. Parentheses may be used in expressions in the usual way; in the example above (cons2 + 5) * 1.55 evaluates to cons2 * 1.55 + 5 * 1.55.
The conditional assignment expression has the following form:
TestExpression ? TrueExpression : FalseExpression
First the TestExpression is evaluated; if it is true (not 0.), then TrueExpression will be evaluated, else FalseExpression. Let us consider an example involving the Seasonal variable created above, which is 1 in the first quarter, and 0 in the other quarters. The statement
new = Seasonal ? CONS : 0;
can be read as follows: the variable new will get the value of CONS when Seasonal is true (that is, when Seasonal is not 0., so in the first quarter of each year), else it will get the value 0. In this case, the same result could have been reached with:
new = Seasonal * CONS;
However, if we had used a seasonal with the value 2 in the first quarter, and the rest zeros, then only the conditional assignment would have given the desired result (since 2 is not false, and hence true).
Note that the `: FalseExpression' part is optional. For example:
new = Seasonal ? CONS;
is a valid conditional assignment statement. Now the variable will have the value of CONS for the first quarter, but the other observations of new are not touched (so if new is created by this expression, it will contain missing values for quarters 2, 3 and 4, making the variable unusable for modelling purposes). Another example:
new = (CONS == MISSING) ? 0 : CONS;
This new variable takes the values of CONS, replacing missing values by zeros.
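The observation-by-observation behaviour of the conditional assignment can be emulated in Python. This is only an emulation of the semantics described above: the function name `cond_assign` is hypothetical, `None` stands for the missing value, and a missing test value is not treated specially.

```python
MISSING = None          # stand-in for the Algebra missing value

def cond_assign(test, if_true, if_false=None, existing=None):
    """Emulate `new = test ? if_true : if_false;` over all observations."""
    n = len(test)
    # a newly created variable starts out as missing values
    out = list(existing) if existing is not None else [MISSING] * n
    for t in range(n):
        if test[t] != 0:                 # any non-zero value counts as true
            out[t] = if_true[t]
        elif if_false is not None:       # without the ':' part, leave the value untouched
            out[t] = if_false[t]
    return out
```

With a seasonal that is 2 in the first quarter, the conditional form still returns CONS in that quarter, whereas the product `Seasonal * CONS` would return twice CONS. Omitting the false part leaves quarters 2, 3 and 4 missing for a newly created variable.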
It is possible to index a particular observation in a variable using rectangular brackets. For example:
CONS_1 = CONS[-1];
There are three methods of indexing:
Each Algebra statement can be seen as running within a loop over all observations. The relative index is offset to the current observation in the loop. So
CONS_1 = CONS[-1];
can be interpreted as:
for (t = first observation) to (t = last observation) do
CONS_1[t] = CONS[t - 1];
Consequently, this statement has the same effect as
CONS_1[0] = CONS[-1];
In this case the index has the form year(period), for example
X = INC + CONS[1980(1)];
y[1955(1)] = 12;
In this case the index has the form index(0), which references observation index (with the first observation having index 0!). For example
X = INC + CONS[0(0)]; // add first CONS value
The following keywords are reserved by algebra:
| keyword | value |
| FALSE | 0. |
| TRUE | !FALSE |
| FREQUENCY | data frequency |
| MISSING | the missing value |
| PI | Pi (3.1415..) |
| NOBS | number of observations |
A large set of functions is provided by algebra. Most of these take both variables and constants as arguments. A function name must be followed by parentheses, even if it does not take any arguments. See the Algebra function list of Table 10.3 for function definitions. Some examples of statements involving functions are:
lcons = log (CONS); // takes the natural logarithm of CONS
cons_1 = lag(CONS, 1); // lag CONS one period
ccons = exp(log(CONS)); // gives back original CONS
dummy = insample(1979, 1, 1983, 4) ? log(CONS) : 0;
// Dummy will be log(CONS) for the period 1979(1)
// to 1983(4), and 0 outside it.
trough = (lag(trough,1) == MISSING || CONS < lag(trough,1))
? CONS : lag(trough,1);
// a trough can also be created with the built in
// algebra function trough()
dummy = (year() == 1979) ? 1 : 0; // creates a dummy variable
A variable is lagged using the lag function, and differenced using the diff function. For example, for first lag and difference respectively, as well as second lag and difference:
Y_1 = lag(Y, 1);
DY = diff(Y, 1);
Y_2 = lag(Y, 2);
D2Y = diff(Y, 2);
This code adopts the naming convention which is used by OxMetrics: append an underscore followed by lag length for a lagged variable, and a prefix `D' for differenced variables.
When taking a first difference or lag, the first observation becomes missing; for a second difference or lag, the first two observations are missing, etc.
Neither function can have an expression as the argument for differencing. So you cannot write diff(diff(Y,1),1) but must write:
DY = diff(Y, 1);
DDY = diff(DY, 1);
Note that this DDY is not the same as D2Y above.
Finally, the dlog(.) function takes the first difference of the (natural) logarithm of the variable.
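The difference between DDY and D2Y can be checked with a small Python emulation. The function names mirror the Algebra ones but are hypothetical stand-ins, `None` represents the missing value, and the sketch assumes that diff(Y, 2) means yt - yt-2 (which is what makes it differ from differencing twice).

```python
MISSING = None          # stand-in for the Algebra missing value

def lag(y, k):
    """k-th lag: the first k observations become missing."""
    return [MISSING] * k + list(y[:len(y) - k])

def diff(y, k):
    """Assumed meaning of diff(Y, k): y_t - y_{t-k}, missing where either term is missing."""
    return [MISSING if a is MISSING or b is MISSING else a - b
            for a, b in zip(y, lag(y, k))]
```

For Y = 1, 4, 9, 16 the first difference is (missing, 3, 5, 7), differencing that again gives (missing, missing, 2, 2), while the two-period difference gives (missing, missing, 8, 12). In both cases the first two observations are missing.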
The sample autocorrelation function can easily be plotted. To compute the exact values of the ACF, use the following algebra code:
acf_lag = MISSING;
acf_dcons = MISSING;
acf(DCONS, acf_dcons, acf_lag);
The first two lines are required to create the output variables for the acf function. The output from acf is the actual ACF values (here stored in acf_dcons), and the lag length (in acf_lag).
The periodogram function has a similar syntax, returning the sample periodogram in the second argument, and the frequencies at which it was computed in the third.
The sort functions change the order of observations in the database and must be handled with care. Note that the trend() function can be used to create a variable which corresponds to the original observation index of the data. So if that variable is sorted together with the variable we wish to sort, it can be used to `unsort' the variable.
Sorts the variable arg1 in increasing order (arg1 must be a variable, and cannot be an expression). Returns the value of arg1 after sorting. Suppose the residuals of a regression have been saved in the variable called Residual, then a sorted copy is created as follows:
res = Residual;
sort(res);
Sorts the variable arg1 in increasing order, and sorts arg2 accordingly. Returns the value of arg1 after sorting. Both arg1 and arg2 must be variables. Suppose the residuals of a regression have been saved in the variable called Residual, then a sorted residual is created as follows:
index = trend();
tmp = _sortby(Residual, index);
/* Residual is sorted, and index accordingly */
/* tmp is a dummy variable, */
/* at this stage identical to Residual. */
/* Now it is easy to locate outliers in Residual */
The following statement will reset the old ordering (do not rerun the index = trend(); statement, because that will overwrite the index and lose the information on the original ordering):
tmp = _sortby(index, Residual);
/* Restores index and Residual, */
/* at this stage index is equal to Trend. */
Sorts the variable arg1 in increasing order, and sorts the whole database accordingly. Returns the value of arg1 after sorting. This function will be most useful with cross-section data; for example, to push missing values to the end:
exclude =
(Var1 == MISSING || Var2 == MISSING || Var3 == MISSING);
/* exclude is 1 for each observation where any of the
three variables has a missing value, 0 elsewhere */
tmp = _sortallby(exclude);
/* the observations without any missing values will
precede those with missing values, making it easy
to exclude them from the regression.
An index variable can be used to restore the order */
The smoothing functions which are available in scatter plots can also be accessed from algebra.
The Hodrick--Prescott filter is popular in macro-economics, and virtually identical to a natural cubic spline (§9.10.2). The syntax is:
smooth_hp(var, alpha, var_dest)
where var is the variable to be smoothed, alpha is the bandwidth (use 0 for the default of 1600), and var_dest is the destination variable (must be different from var). For example:
hpCONS = smooth_hp(CONS, 0, hpCONS);
which creates hpCONS through the assignment statement, and uses it as the destination for the fitted values from the filter.
If there are missing values in the data series, smooth_hp uses data starting from the first valid observation, and stopping at the first missing value thereafter (this is unlike the smoothing functions of the next section, which will skip over missing values, using the maximum sample). The sample used can be restricted using the insample function.
The bandwidth parameter associated with the Hodrick--Prescott filter is 100 × freq², where freq is the data frequency:
| annual data | 100 |
| quarterly data | 1600 |
| monthly data | 14400 |
The kernel based smoother (using the Epanechnikov kernel, §9.10.1), and the natural cubic spline (§9.10.2) are computed respectively by:
where var is the variable to be smoothed, var_dest is the destination variable (must be different from var), and alpha is the bandwidth, used as follows:
These functions will use all available observations (unless restricted by the insample function), and will fill in missing values using the fit from the smoother.
A simple exponentially-weighted moving average (see §9.13) of a series y can be created using:
ysmo = ewma(y, 0.5, 0, ysmo);
Here α=0.5. An extended version introduces a slope coefficient β, and is also called Holt's method, e.g. with β=0.1:
ysmo = ewma(y, 0.5, 0.1, ysmo);
In both cases, in-sample missing values are replaced by forecasts.
The exponentially weighted moving sample correlation (see §9.14) of two series y and x (both assumed to have expectation zero) can be created by:
xysmo = ewmc0(x, y, 0.5, xysmo);
Here, λ is set to 0.5.
When a database is dated, there are functions to return components of the current date and time for each observation: day, month, week, hours, minutes, seconds.
The makedate and maketime functions translate year, month, day and hours, minutes, seconds to date and time respectively. These can be added together to create a date with a time.
The dayofweek, isdayofmonth and iseaster functions are needed when determining holidays, for example to delete all holidays and weekends from a database. The following code, supplied as holiday_us.alg in the algebra folder, determines all the official US holidays:
hol_us_fix = /* official US holidays: fixed dates */
(month()==1 && day()==1) /* New Year's Day */
|| (month()==7 && day()==4) /* Independence Day */
|| (month()==11 && day()==11) /* Veterans Day */
|| (month()==12 && day()==25); /* Christmas Day */
hol_us_fix = hol_us_fix /* fixed dates moved to Fri */
|| (month()==12 && day()==31 && dayofweek()==6)
|| (month()==7 && day()==3 && dayofweek()==6)
|| (month()==11 && day()==10 && dayofweek()==6)
|| (month()==12 && day()==24 && dayofweek()==6);
hol_us_fix = hol_us_fix /* fixed dates moved to Mon */
|| (month()==1 && day()==2 && dayofweek()==2)
|| (month()==7 && day()==5 && dayofweek()==2)
|| (month()==11 && day()==12 && dayofweek()==2)
|| (month()==12 && day()==26 && dayofweek()==2);
hol_us_flt = /* official US holidays: floating dates */
isdayofmonth(1,2,3) /* 3rd Mon in Jan: Martin Luther King day */
|| isdayofmonth(2,2,3) /* 3rd Mon in Feb: Washington's Birthday */
|| isdayofmonth(5,2,-1) /*last Mon in May: Memorial Day */
|| isdayofmonth(9,2,1) /* 1st Mon in Sep: Labor Day */
|| isdayofmonth(10,2,2) /* 2nd Mon in Oct: Columbus Day */
|| isdayofmonth(11,5,4); /* 4th Thu in Nov: Thanksgiving */
UK holidays also require the iseaster function:
hol_uk = /* Bank holidays for England and Wales */
(month()==1 && day()==1) /* Jan 1 */
|| (month()==1 && day()==2 && dayofweek() == 2) /* Jan 1 on Sun */
|| (month()==1 && day()==3 && dayofweek() == 2) /* Jan 1 on Sat */
|| iseaster(-2) /* Good Friday */
|| iseaster(1) /* Easter Monday */
|| isdayofmonth(5,2,1) /* Early May Bank Holiday */
|| isdayofmonth(5,2,-1) /* Spring Bank Holiday */
|| isdayofmonth(8,2,-1) /* Summer Bank Holiday */
|| (month()==12 && day()==25) /* Christmas day */
|| (month()==12 && day()==26) /* Boxing day */
|| (month()==12 && day()==27 && dayofweek() == 3) /* Xmas on Sun */
|| (month()==12 && day()==27 && dayofweek() == 2) /* Xmas on Sat */
|| (month()==12 && day()==28 && dayofweek() == 3);/* Xmas on Sat */
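The day-of-week numbering (1=Sunday,...,7=Saturday) and the nth-weekday logic used in the holiday code can be emulated with the Python standard library. The functions below are hypothetical stand-ins that take an explicit date argument, whereas the Algebra functions operate on the current observation of a dated database; the treatment of nth<0 as counting from the end of the month follows the function list.

```python
import calendar
from datetime import date

def dayofweek(d):
    """1=Sunday, ..., 7=Saturday, following the Algebra convention."""
    return d.isoweekday() % 7 + 1

def isdayofmonth(d, month, dow, nth):
    """1 if d is the nth occurrence of weekday dow in month; nth < 0 counts from the end."""
    if d.month != month or dayofweek(d) != dow:
        return 0
    if nth > 0:
        return int((d.day - 1) // 7 + 1 == nth)
    last = calendar.monthrange(d.year, d.month)[1]   # number of days in the month
    return int((last - d.day) // 7 + 1 == -nth)
```

For example, 23 November 2023 is the fourth Thursday in November, so `isdayofmonth(date(2023, 11, 23), 11, 5, 4)` returns 1, and 29 May 2023 is the last Monday in May, so `isdayofmonth(date(2023, 5, 29), 5, 2, -1)` returns 1.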
Available functions are listed in Table 10.3. Any function operating on missing values returns a missing value. Any function which fails also returns a missing value. Where the argument is var, it must be just a variable name, e.g. one = 1; cum(one); is allowed, but cum(1) is not. Similarly: lag(CONS+1,1) is not allowed, while lag(CONS,1)+1 is.
Several examples of Algebra code were given above; for convenience these are summarized here:
LX = log(X);
DX = diff(X, 1);
"log(X)" = log(X);
aap = CONS + 1; /* comment /* nested comment */ */
new1 = !season();
cons2 = 2 * CONS - 1;
cons3 = CONS / 3 - (cons2 + 5) * 1.55;
Seasonal = season();
new2 = Seasonal ? CONS : 0;
new3 = Seasonal * CONS;
new4 = Seasonal ? CONS;
new5 = (CONS == MISSING) ? 0 : CONS;
CONSlag1 = CONS[-1];
X = INC + CONS[1980(1)];
y[1955(1)] = 12;
X = INC + CONS[0(0)]; // add first CONS value
| Function | Returns |
| abs(arg) | absolute value of arg (same as fabs) |
| acf(var,dest,lag) | autocorrelation function (each argument must be a variable) |
| acos(arg) | arccosine of arg, or 0 if the absolute value of arg exceeds 1 |
| almon(var,arg,power) | Almon lag of var: almon(x,c1,c2) = $\sum_{i=0}^{c_1} (c_1+1-i)^{c_2} x_{t-i} \,/\, \sum_{i=0}^{c_1} (c_1+1-i)^{c_2}$ |
| asin(arg) | arcsine of arg, or 0 if the absolute value of arg exceeds 1 |
| atan(arg) | arctangent of arg |
| ceil(arg) | ceiling of arg (the smallest integer ≥ arg) |
| cos(arg) | cosine of arg |
| cum(var) | cumulative sum of var |
| date() | date value (dated database) or the observation index |
| day() | day (dated database) |
| dayofweek() | day of the week (dated database, 1=Sunday,...,7=Saturday) |
| delete(var,...) | delete the list of variables when this Algebra run completes |
| denschi(arg) | χ²(arg) density |
| densf(arg1,arg2) | F(arg1, arg2) density |
| densn() | N(0,1) density |
| denst(arg1) | student-t(arg1) density |
| diff(var,arg) | argth difference of var, or the missing value if var is not a variable or the difference cannot be taken |
| div(arg1,arg2) | result of the (32-bit) integer division |
| dlog(var) | first difference of the natural logarithm of var |
| dummy(yr1,per1,yr2,per2) | 1 inside the sample yr1 (per1) -- yr2 (per2), 0 outside it |
| dummydates(date1,date2) | as dummy for a dated database |
| ewma(var,alpha,beta,dest) | see §10.4.4.3, §9.13 |
| ewmc0(varx,vary,lambda,dest) | see §10.4.4.3, §9.14 |
| exp(arg) | exponential function of arg |
| fabs(arg) | absolute value of arg |
| floor(arg) | floor of arg (the largest integer ≤ arg) |
| fmod(arg1,arg2) | floating point remainder of arg1/arg2 |
| hours() | hours (dated database) |
| indates(date1,date2) | as insample for a dated database |
| insample(yr1,per1,yr2,per2) | true if inside the sample yr1(per1) to yr2(per2), false outside |
| isdayofmonth(month,dayofweek,nth) | returns 1 for nth day of week in month; dayofweek: 1=Sunday, etc.; nth<0: from end |
| iseaster(daysafter) | returns 1 on the day that is daysafter days after Easter (negative: before) |
| lag(var,arg) | argth lag of var, or the missing value if var is not a variable or the lag cannot be taken; use arg<0 for leads |
| log(var) | natural logarithm of var |
| log10(var) | base 10 logarithm of var |
| loggamma(var) | logarithm of the gamma function at var |
| makedate(year,month,day) | make date values |
| maketime(hour,min,sec) | make time values |
| max(arg1, arg2) | maximum of arg1 and arg2 |
| mean(var) | mean of var |
| min(arg1,arg2) | minimum of arg1 and arg2 |
| minutes() | minutes (dated database) |
| month() | month (dated database) |
| movingavg(var,lag,lead) | the moving average of var: MAV(x,c1,c2) = $\sum_{k=t-c_1}^{t+c_2} x_k / (c_2+c_1+1)$ |
| movingSD(var,lag,lead,mvar) | the moving standard deviation of var around mvar: MSD(x,c1,c2,z) = $\left[ \sum_{k=t-c_1}^{t+c_2} (x_k-z_k)^2 / (c_2+c_1+1) \right]^{1/2}$ |
| pacf(var,dest,lag) | partial autocorrelation function (each argument must be a variable) |
| peak(var) | maximum value of var up to current observation |
| period() | current period |
| periodogram(var,dest,freq) | periodogram (each argument must be a variable) |
| probn(arg1) | P(X ≤ arg1) for X ~ N(0,1) |
| quanchi(p,arg) | χ²(arg) quantiles at p |
| quanf(p,arg1,arg2) | F(arg1, arg2) quantiles at p |
| quann(p) | N(0,1) quantiles at p |
| quant(p,arg1) | student-t(arg1) quantiles at p |
| ranu() | uniform random numbers |
| ranchi(arg) | χ²(arg) random numbers |
| ranf(arg1,arg2) | F(arg1, arg2) random numbers |
| rann() | N(0,1) random numbers |
| rant(arg1) | student-t(arg1) random numbers |
| ranseed(arg) | sets the random number seed to arg, returns arg |
| round(arg) | rounded value of arg |
| season() | 1 in period 1, 0 otherwise |
| seconds() | seconds (dated database) |
| sin(arg) | sine of arg |
| smooth_hp(var,alpha,dest) | see §10.4.4 |
| smooth_np(var,alpha,dest) | see §10.4.4 |
| smooth_sp(var,alpha,dest) | see §10.4.4 |
| sort(var) | see §10.4.3 |
| _sortallby(var1) | see §10.4.3 |
| _sortby(var1, var2) | see §10.4.3 |
| sqrt(arg) | square root of arg |
| stock(var,arvalue,init) | integrates var: stock(x,c1,c2) gives $y_t = (1-c_1) y_{t-1} + x_t$, given $y_0 = c_2$ |
| stockv(var,arvar,init) | integrates var: stockv(x,z,c2) gives $y_t = (1-z_t) y_{t-1} + x_t$, given $y_0 = c_2$ |
| tailchi(arg1,arg2) | P(X ≥ arg1) for X ~ χ²(arg2) |
| tailf(arg1,arg2,arg3) | P(X ≥ arg1) for X ~ F(arg2, arg3) |
| tailn(arg1) | P(X ≥ arg1) for X ~ N(0,1) |
| tailt(arg1,arg2) | P(X ≥ arg1) for X ~ t(arg2) |
| tan(arg) | tangent of arg |
| time() | time value (dated database) |
| trend() | 1 for the first observation, 2 for the second, etc. |
| trough(var) | minimum value of var up to current observation |
| variance(var) | variance of var |
| year() | current year |
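As an illustration of two of the recursive definitions in the table, the Python sketch below mirrors the formulas given for movingavg and stock. It is not OxMetrics code: the function names are hypothetical, and returning None where the averaging window runs past the sample is an assumption standing in for the missing value.

```python
from typing import List, Optional


def moving_avg(x: List[float], lag: int, lead: int) -> List[Optional[float]]:
    """MAV(x,c1,c2): mean of x[t-c1], ..., x[t+c2] for each observation t."""
    n = len(x)
    out: List[Optional[float]] = []
    for t in range(n):
        lo, hi = t - lag, t + lead
        if lo < 0 or hi >= n:
            # Assumption: an incomplete window yields a missing value.
            out.append(None)
        else:
            out.append(sum(x[lo:hi + 1]) / (lag + lead + 1))
    return out


def stock(x: List[float], c1: float, c2: float) -> List[float]:
    """stock(x,c1,c2): y_t = (1 - c1) * y_{t-1} + x_t, starting from y_0 = c2."""
    y_prev = c2
    out = []
    for x_t in x:
        y_prev = (1.0 - c1) * y_prev + x_t
        out.append(y_prev)
    return out
```

Here moving_avg(x, 1, 1) is a centred three-term average, and stock(x, 0.5, 0) accumulates x while depreciating the previous level by one half each period.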
The Batch language gives some control over OxMetrics through a command language. This allows for automating repetitive tasks, or as a quick way to get to a certain point in your analysis. Other programs (such as PcGive, STAMP and X12arima) extend this batch language with their own module-specific commands.
OxMetrics allows you to save the most recent model from the current module as a batch file (if supported by the module). If a model has been created interactively, it can be saved as a batch file for further editing or easy recall in a later session. This is also the most convenient way to create a batch file.
If an error occurs while executing the Batch commands, the processing will be aborted, and control returned to OxMetrics. All successful statements up to that point will have been executed.
There are five ways of running Batch commands, discussed in turn below.
The Tools/Batch Editor command activates the edit window in which you can edit/load/save a set of batch commands. The file extension used for batch files is .FL.
A text selection containing Batch commands can be run directly from that window using the Edit/Run as Batch command (or Ctrl+b as short cut).
If you use File/Open, and set the file type to Run Batch file (*.fl), then the batch file is run immediately.
You can double click on a .FL file in the Windows Explorer to run the file directly. If OxMetrics is not active yet, it will be started automatically.
A batch file can be called from another batch file, see §11.5.15.
When a batch file is run, the working folder is set to that of the batch file. This allows data and algebra files which reside in the folder of the batch file to be found, even if the batch file is run from somewhere else. In addition the chdir command allows specifying a new default directory. Finally, paths to data files can be hard coded, but that is to be avoided if possible.
When a batch file is loaded in the batch editor and then run, the run itself has no associated default folder. However, the process of opening the batch file will have set the default folder to that of the batch file.
Table 11.1 gives an alphabetical list of the OxMetrics batch language statements.
| algebra {...} |
| appenddata("filename", "group"=""); |
| appresults("filename"); |
| break; |
| chdir("path"); |
| closedata("databasename"); |
| command("command_line"); |
| database("name", year1, period1, year2, period2, frequency); |
| draw(area, "y", "mode"=""); |
| drawf(area, "y", "function", d1=0, d2=0); |
| drawx(area, "y", "x", "mode"=""); |
| drawz(area, "y", "x", "z", "mode"=""); |
| exit; |
| loadalgebra("filename"); |
| loadbatch("filename"); |
| loadcommand("filename"); |
| loaddata("filename"); |
| loadgraph("filename"); |
| module("name"); |
| package("packagename", "modeltype"=""); |
| print("text"); |
| println("text"); |
| savedata("filename"); |
| savedrawwindow("filename", "window"=""); |
| saveresults("filename"); |
| setdraw("option", i1=0, i2=0, i3=0, i4=0, i5=0); |
| setdrawwindow("name"); |
| show; |
| usedata("databasename", i1=0); |
Anything between /* and */ is treated as a comment; such comments cannot be nested. Everything following // up to the end of the line is also a comment.
There are two types of batch commands:
Table 11.1 shows several default arguments, for instance in the appenddata command and in the setdraw command. For example, when omitting the last argument as in
appenddata("data.in7");
the command is actually interpreted as:
appenddata("data.in7", "");
In the following list, function arguments are indicated by words, whereas the areas where statement blocks are expected are indicated by an ellipsis (...). Examples follow the list of descriptions. For terms in double quotes, the desired term must be substituted and provided together with the quotes. A command summary is given in Table 11.1. Module-specific commands are documented with each module (so, for PcGive batch commands consult Volume I of the PcGive books).
Contains the algebra code to execute. The block of algebra should be enclosed in curly braces, for example:
algebra
{ // Create SAVINGSL in database test
SAVINGSL = lag(INC,1) - lag(CONS, 1);
y = log(Y);
}
Appends the data from the named data file to the existing in-memory database. If no database exists yet, the named data file will be opened as in the loaddata() command. If the second argument is omitted, or equal to "" (that is, no group name is specified) the whole file will be appended, otherwise only the named group.
Append the contents of the Results window to the named file and clear the Results window.
Stop the batch file, return to OxMetrics menus. While a batch file is running, there is no way of stopping it other than this batch command.
Sets the current working folder to the specified path. There are three special forms of this command:
Closes the database, which is currently loaded in OxMetrics. The database is closed without saving, so any changes which have been made since the last save are lost.
Sends the line of text directly to the active module. The module must be able to accept commands (either via batch, or as an input console module).
Creates a database with the specified name and frequency, and sample period year1 (period1) to year2 (period2).
Graphs variable y from the current database in the requested area
(the first area has number 0). If the mode argument is omitted,
the graph is a standard time-series plot. For other types of graphs,
mode can be one of:
| "bar" | draw bars |
| "diff" | take first differences of y |
| "dlog" | growth rates (first difference of natural logarithm) |
| "index" | draw index line |
| "log" | take natural logarithms of y |
| "logscale" | take natural logarithms and show on (natural) log scale |
| "match" | match in variables area by means and ranges to first variable |
| "startat100" | scale the variable to start at 100. |
| "symbol" | draw symbols as well as the line |
| "twoscale" | as "match", but showing second variable in area on right-hand axis |
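Graphs a function of the variable y from the current database in the
requested area (the first area has number 0). The function argument can
be one of the following; d1 and d2 supply the lag length, bar count or
degrees of freedom where these are required: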
| "acf" | autocorrelation function with specified lag length |
| "boxplot" | boxplot |
| "cdf" | estimated distribution (QQ plot against normal) |
| "chisq" | QQ plot against Chi2(d1) |
| "cumfreq" | cumulative frequency count (for non-standard bars, specify the bar count) |
| "density" | estimated density (for non-standard bars, specify the bar count) |
| "f" | QQ plot against F(d1,d2) |
| "frequency" | frequency count (for non-standard bars, specify the bar count) |
| "histogram" | histogram (for non-standard bars, specify the bar count) |
| "normal" | QQ plot against normal with same mean and variance |
| "pacf" | partial autocorrelation function with specified lag length |
| "periodogram" | periodogram |
| "spectrum" | spectral density using specified lag length |
| "t" | QQ plot against student-t(d1) |
| "uniform" | Quantile plot (QQ against uniform) |
Graphs variable y against x from the current database in the requested
area (the first area has number 0). This is a standard scatter plot, but
if the x argument equals "", the graph is a standard time-series plot.
For other types of graphs, mode can be one of:
| "alt" | use alternate style: x,y labels along axes |
| "project" | add regression line with projections |
| "regression" | add a regression line |
| "smooth" | add a cubic spline (automatic bandwidth) |
Graphs variable y against x by z from the current database in the requested
area (the first area has number 0). This is a standard scatter plot, but the
x argument may equal "" for a time-series plot. The mode argument can
be one of:
| "bar" | z is error bar |
| "band" | z is error band |
| "fan" | z is error fan |
| "hilo" | high-low |
| "symbol" | z is symbol size |
| "value" | print values of z |
| "contour" | contour from scattered data; x,y,z are vectors |
| "contourx" | contour from tabular data; x,y are vectors, z=z(x,y) |
| "contoury" | contour from tabular data; x,y are vectors, z=z(y,x) |
| "points" | sequence of 3D points |
| "surface" | surface from scattered data; x,y,z are vectors |
| "surfacex" | surface from tabular data; x,y are vectors, z=z(x,y) |
| "surfacey" | surface from tabular data; x,y are vectors, z=z(y,x) |
| "trisurface" | triangulated surface from scattered data; x,y,z are vectors |
Stops the batch file and exits OxMetrics.
Loads and executes the algebra code from the named .ALG file. The extension must be provided. See §11.3 on the default folder.
Loads and executes the batch code from the named .FL file. The extension must be provided. See §11.3 on the default folder.
Sends the contents of the named file to the active module. The module must be able to accept commands (either via batch, or as an input console module).
Load the data from the named .IN7 file. To load, a full pathname would sometimes have to be specified, e.g. "c:\mydata\data.in7". The extension indicates the data type and must be provided. Inside OxMetrics, the database will then be known as data.in7, and will be the current default database. See §11.3 on the default folder.
If the database is already opened, the command works as usedata().
Load the graph from the named file. To load, a full pathname would sometimes have to be specified, e.g. "c:\mydata\fig1.gwg". The extension indicates the file type and must be provided (but OxMetrics will always require the .gwg file for loading, so when using loadgraph("fig1.eps"), OxMetrics will look for fig1.gwg).
Starts the specified module. When the module is already active, it becomes the focus for subsequent module-specific batch commands. Otherwise the module is started first.
Used to specify the package and model type for modelling:
package("PcGive", "Single-equation");
Prints the specified text in the Results window. This can be useful to introduce explanatory notes. The println command adds a new line after printing the text.
Save the current database to the named .IN7 file. The .BN7 file will get the same base name. If files with these names already exist, they will be overwritten!
If no window is specified, the command saves the current Batch Graphics window to the named file. Otherwise it saves the named graphics window.
Save the contents of the results to the named file and clear the Results window. If a file with that name already exists, it will be overwritten!
This command changes the default settings used in graphics. The easiest way to use this command is to design the layout using the interactive dialogs (Graphics Setup and Postscript Setup). Then use the Write as batch command on both pages of the Graphics Setup dialog to record the batch code in the Results window. This code can then be pasted into the batch editor and saved to a file, or edited and run directly from the Results window.
| option | changes | option | changes |
| "axis" | axis fonts/ticks | "legendfontsize" | legend font size |
| "axisline" | axis options | "legendhide" | legend hiding |
| "axisformat" | format of labels | "legendresize" | legend resizing |
| "box" | box and grid | "line" | colour line settings |
| "bw" | black&white settings | "linebw" | b&w line settings |
| "color" | colour settings | "margin" | paper margins |
| "colormodel" | PostScript model | "palette_max" | palette light colour |
| "default" | reset default | "palette_min" | palette dark colour |
| "font" | font | "papercolor" | Paper colour |
| "grid" | grid style | "printpage" | PostScript paper |
| "histogram" | bar colours | "symbol" | symbol settings |
| "legend" | legend style | "xystyle" | cross-plot style |
Use of the default option is as follows:
| setdraw("default") | reset all graphics defaults, |
| setdraw("default", 0) | reset all graphics defaults, |
| setdraw("default", 1) | reset layout, leave line types and colours, |
| setdraw("default", 2) | reset line types and colours, leave layout. |
The following table lists the integer arguments for each option, with the range of possible values. If no range is given, the argument is a size in pixel coordinates (see Ch. 12). The default width and precision for axisformat is 8 and 6 respectively.
| option | i1 | i2 | i3 | i4 | i5 |
| "axis" | fontsize | step | tick | | |
| "axisline" | no X-line:0--1 | no Y-line:0--1 | :0--1 | no small Y:0--1 | |
| "axisformat" | width | precision | :0--1 | :0--1 | |
| "box" | box:0--1 | X-grid:0--1 | Y-grid:0--1 | | |
| "bw" | lineno:0--15 | red:0--255 | green:0--255 | blue:0--255 | |
| "color" | lineno:0--15 | red:0--255 | green:0--255 | blue:0--255 | |
| "colormodel" | model:0--3 | | | | |
| "default" | 0,1,2 | | | | |
| "font" | fontno:0--3 | fontsize | | | |
| "grid" | color:0--15 | type:0--15 | | | |
| "histogram" | inside:0--15 | outside:0--15 | | | |
| "legend" | boxed:0--1 | columns | | | |
| "legendfontsize" | | | | | |
| "legendhide" | hide:0--1 | | | | |
| "legendresize" | resize:0--1 | | | | |
| "line" | lineno:0--15 | linetype:0--4 | width | on | off |
| "linebw" | lineno:0--15 | linetype:0--4 | width | on | off |
| "margin" | left | top | | | |
| "palette_max" | index:0--7 | red:0--255 | green:0--255 | blue:0--255 | |
| "palette_min" | index:0--7 | red:0--255 | green:0--255 | blue:0--255 | |
| "papercolor" | red:0--255 | green:0--255 | blue:0--255 | | |
| "printpage" | :0--1 | papertype:0--2 | X-size | Y-size | |
| "symbol" | lineno:0--15 | symtype:0--4 | size | | |
| "xystyle" | :0--1 | 3d along axes:0--1 | | | |
| value | linetype | papertype | colormodel |
| 0 | solid | A4 | black & white |
| 1 | dotted | Letter | black, white, gray |
| 2 | dashed | user defined | gray |
| 3 | long dashes | | color |
| 4 | user defined | | |
| symboltype | | | | | |
| 0 | solid box | 5 | none | 10 | diamond |
| 1 | open box | 6 | line | 11 | solid diamond |
| 2 | plus | 7 | solid circle | 12 | cross |
| 3 | dash | 8 | triangle | | |
| 4 | circle | 9 | solid triangle | | |
Sets the name of the graphics window in which the graphs appear. The default is "Batch Graphics".
Shows the graph created in batch. Note that graphs from draw/drawf/drawx/drawz commands are not displayed until the show command is issued.
Sets the default database to databasename. The database must already be loaded into OxMetrics. Normally, only the one-argument form would be used, e.g. usedata("ukm1.in7"). If the second argument is 1, the database is marked clean: when it is closed there is no prompt for saving it. This can be useful after a section of algebra which implements basic transformations for a dataset, as follows:
loaddata("ukm1.in7");
loadalgebra("ukm1.alg");
usedata("ukm1.in7", 1);
We finish with two example batch files. The first uses most non-graphics Batch commands, as well as some commands from the PcGive module.
database("test", 1950, 1, 2000, 4, 4); // Create the database
// Append the tutorial data set to test
// note that test has a longer sample than data.in7
chdir("C:\Program Files\OxMetrics5"); // default install folder
// it is better to use chdir("#home");
appenddata("data.in7");
loaddata("data.in7"); // Load the tutorial data. Now there are
// two databases, with data.in7 the default database
usedata("test"); // make test the default database
algebra
{ // Create SAVINGSL in database test
SAVINGSL = lag(INC,1) - lag(CONS, 1);
}
package("PcGive", "Single-equation");
system
{
Y = CONS; // The endogenous variable
Z = Constant, CONS_1, CONS_2; // the regressors
}
estimate("OLS", 1953, 3, 1992, 3, 8);
// Estimate the system by OLS over 1953(3)-1992(3)
// withhold 8 forecasts
println("Saving results to test.out");
saveresults("test.out"); // Save the contents of the Results
// window to test.out and clear the window
testsummary; // Do the test summary
println("Appending results to test.out");
appresults("test.out"); // Append the contents of the Results
// window to test.out, and clear the window
break; // stop the batch run, remaining
// commands will not be executed
savedata("newtest.in7");// save test to newtest.in7/newtest.bn7
exit; // Exit OxMetrics and PcGive
The final example uses many graphics batch commands, and is supplied as plots.fl with OxMetrics:
chdir("#home");
loaddata("data.in7");
setdrawwindow("Time plots");
draw(0, "CONS");
draw(0, "INC","match");
draw(1, "CONS");
draw(1, "INC","twoscale");
draw(2, "CONS","startat100");
draw(2, "INC","startat100");
draw(3, "CONS","log");
draw(4, "CONS","logscale");
draw(5, "CONS","dlog");
show;
setdrawwindow("Scatter plots");
drawx(0, "CONS","INC","alt");
drawx(1, "CONS","INC","regression");
drawx(2, "CONS","INC","project");
drawx(3, "CONS","INC","smooth");
show;
setdrawwindow("Function plots");
drawf(0, "CONS","acf", 5);
drawf(0, "CONS","pacf", 5);
drawf(1, "CONS","periodogram");
drawf(2, "CONS","spectrum", 5);
drawf(3, "CONS","density");
drawf(3, "CONS","histogram");
drawf(4, "CONS","cdf");
drawf(5, "CONS","frequency");
drawf(6, "CONS","cumfreq", 20);
drawf(7, "CONS","boxplot");
drawf(8, "CONS","normal");
show;
algebra
{
z = fabs(INC - 890) * 10;
}
usedata("data.in7", 1); // mark as clean
setdrawwindow("Z plots");
drawz(0, "CONS","","z", "bar");
drawz(1, "CONS","","z", "band");
drawz(2, "CONS","","INC", "hilo");
drawz(3, "CONS","INC","z", "value");
drawz(4, "CONS","INC","z", "symbol");
show;
database("3d_scatter", 1, 1, 625, 1, 1);
algebra
{ x = (ranu() - 0.5) * 5;
y = (ranu() - 0.5) * 5;
z = exp(-x^2-y^2);
}
setdrawwindow("3D contour plots");
drawz(0, "x","y","z", "contour");
show;
setdrawwindow("3D surface plots");
drawz(0, "x","y","z", "surface");
drawz(1, "x","y","z", "trisurface");
show;
Graphs in OxMetrics are drawn on a graphics worksheet, consisting of 10000 by 15000 pixels, with (0,0) in the bottom left corner:
These pixels are virtual and different from screen pixels: the paper is always 10000 ×15000, regardless of the screen resolution or the size on screen.
Positions can be specified in pixel coordinates, as for example (px,py) =(700,3200). More often it is convenient to use real-world coordinates to map the pixel coordinates into real data values. This is done by specifying an area on the graphics worksheet, and attaching real-world coordinates to it. These areas are allowed to overlap, but need not:
The areas are numbered from left to right and top to bottom, counting starts at zero. Suppose we have set up all areas as being from (x,y)=(0.0,0.0) to (x,y)=(1.0,1.0) (again within each area the origin is the lower left corner). Then we can draw a line through area 2 in two ways:
where we assume that (600,600) to (3600,3600) are the pixel coordinates chosen for area 2. Drawing in real-world coordinates has the advantage that it corresponds more closely to our data.
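The relation between the two coordinate systems is a linear map from the real-world range of an area onto its pixel rectangle. The Python sketch below illustrates this for the numbers quoted above; the function world_to_pixel and its argument layout are hypothetical and only serve to show the arithmetic.

```python
from typing import Tuple

Point = Tuple[float, float]


def world_to_pixel(
    x: float,
    y: float,
    area_px: Tuple[Point, Point],
    world: Tuple[Point, Point] = ((0.0, 0.0), (1.0, 1.0)),
) -> Point:
    """Map real-world (x, y) inside an area to worksheet pixel coordinates.

    area_px and world each hold the (lower-left, upper-right) corners.
    """
    (px0, py0), (px1, py1) = area_px
    (wx0, wy0), (wx1, wy1) = world
    px = px0 + (x - wx0) / (wx1 - wx0) * (px1 - px0)
    py = py0 + (y - wy0) / (wy1 - wy0) * (py1 - py0)
    return px, py


# Area 2 as assumed in the text: pixels (600,600) to (3600,3600).
AREA2 = ((600.0, 600.0), (3600.0, 3600.0))
```

A line from (0.0,0.0) to (1.0,1.0) in the real-world coordinates of area 2 therefore runs from world_to_pixel(0.0, 0.0, AREA2) to world_to_pixel(1.0, 1.0, AREA2), which are the two pixel corners of the area.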
A graph is created from data in a database using the Graphics toolbar button, or Model/Graphics. These graphs appear in the window labelled OxMetrics Graphics. Subsequent graphs are added to this window. To start from scratch, close the window, or select and delete each area. To keep the existing OxMetrics graphics window and open a new one for subsequent graphs, use File/New Data Plot Window (note that the menu contents change according to the type of window which is active).
Modules such as PcGive will put their graphs in a window with a different name.
Actual series plots the actual values of all selected variables against time (or the observation index for undated series), together or each in a separate graph. If there are missing values, these show up as gaps in the line.
This creates as many graphs as there are series.
The available scatter-plot types are:
The Style, Regression and Smoothing sections allow additional options and combinations.
If there are missing values, these are omitted from the graphs.
The histogram is a simple graph: the range of xt is divided into intervals, and the number of observations in each interval is counted. The height of each bar records the number of entries in that interval. In OxMetrics, these are divided by the total number of observations to show the relative frequency (use bar to keep the count). OxMetrics sets a default number of intervals dependent on the sample size, but this can be changed by the user by clicking on Smoothing and Use custom bar count to set a different value.
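The counting step can be sketched as follows in Python. This is an illustration only: it assumes equal-width intervals spanning the range of the data, which the text does not specify, and the name relative_frequencies is hypothetical.

```python
from typing import List


def relative_frequencies(x: List[float], bars: int) -> List[float]:
    """Share of observations falling in each of `bars` equal-width intervals."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("the data must not be constant")
    width = (hi - lo) / bars
    counts = [0] * bars
    for value in x:
        # The maximum is placed in the last interval rather than beyond it.
        k = min(int((value - lo) / width), bars - 1)
        counts[k] += 1
    # Dividing by the number of observations turns counts into frequencies.
    return [c / len(x) for c in counts]
```

The returned heights sum to one; omitting the final division would give the raw counts instead.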
The sample autocorrelation function (ACF), or correlogram, plots the correlations r̂j between xt and successive xt-j; also see §9.3. The length of the ACF can be set by the user. Since the correlation between xt and xt is always unity, it is not drawn in the graphs.
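A minimal Python sketch of the correlogram computation is given below. It uses the full-sample mean and divides the variance by T, as stated earlier in this chapter; dividing each autocovariance ĉj by T as well is an assumption, and missing values are not handled. The function names are hypothetical.

```python
import math
from typing import List


def sample_acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1, ..., r_s using the full-sample mean."""
    T = len(x)
    mean = sum(x) / T
    dev = [value - mean for value in x]
    c0 = sum(d * d for d in dev) / T  # variance with divisor T
    acf = []
    for j in range(1, max_lag + 1):
        # Autocovariance at lag j (divisor T assumed).
        cj = sum(dev[t] * dev[t - j] for t in range(j, T)) / T
        acf.append(cj / c0)
    return acf


def error_band(T: int) -> float:
    """Approximate 95% band: plus or minus 2 / sqrt(T)."""
    return 2.0 / math.sqrt(T)
```

The lag-0 value is omitted from the result because it always equals one, in line with the graphs.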
The partial autocorrelation coefficients correct the autocorrelation for the effects of previous lags. So the first partial autocorrelation coefficient equals the first normal autocorrelation coefficient.
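The recursion behind the partial autocorrelations can be illustrated with a standard Durbin-Levinson scheme, which solves the Yule-Walker equations of increasing order. The Python sketch below is an illustration under that assumption and does not claim to reproduce the exact organisation of the OxMetrics computation; the name pacf_from_acf is hypothetical.

```python
from typing import List


def pacf_from_acf(r: List[float]) -> List[float]:
    """Partial autocorrelations at lags 1..s from autocorrelations r_1..r_s.

    r[0] holds r_1, r[1] holds r_2, and so on (r_0 = 1 is implicit).
    """
    pacf: List[float] = []
    phi_prev: List[float] = []  # coefficients of the order k-1 solution
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            # Update the first k-1 coefficients, then append the new one.
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf
```

The first element always equals r̂1. For autocorrelations that decay geometrically, such as 0.5, 0.25, 0.125, all partial autocorrelations beyond the first are zero.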
The sample cross-correlation function (CCF) graphs the correlation between a series and the lags of another series.
The periodogram and the spectrum (or more accurately here: spectral density) graph the series in the frequency domain. The sample periodogram is based on the coefficients of the Fourier decomposition of the sample autocorrelations (that is, the correlations {r̂j} between xt and xt-j).
The sample spectral density is a smoothed (and scaled) function of the periodogram. It is symmetric between -π and π, and so is only graphed for [0,π] ; 1 on the horizontal axis stands for π, 0.5 for 0.5π, etc. Peaks at certain frequencies can indicate regular (cyclical or seasonal) behaviour in the series.
In a seasonal sub-plot, the data are displayed by season. For example, for quarterly data, first the quarter 1 data are graphed, then quarter 2, etc. This helps to detect changing seasonal patterns.
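The regrouping that underlies a seasonal sub-plot can be sketched in Python as follows. The function by_season and its arguments are hypothetical; the sketch assumes a regular series with a known frequency and a known period for the first observation.

```python
from typing import Dict, List


def by_season(x: List[float], frequency: int, first_period: int = 1) -> Dict[int, List[float]]:
    """Group a regular series by period (e.g. quarters 1..4 for frequency 4)."""
    groups: Dict[int, List[float]] = {s: [] for s in range(1, frequency + 1)}
    for i, value in enumerate(x):
        # Period of observation i when the sample starts in first_period.
        period = (first_period - 1 + i) % frequency + 1
        groups[period].append(value)
    return groups
```

Plotting groups[1], then groups[2], and so on, each in time order, gives the layout described above.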
QQ (quantile-quantile), or cross-probability, plots:
As discussed in the companion book on PcGive, statisticians view variables as being described by probability distributions. If X is the variable, and x a value it could take, then
| Px( X>x) |
is the probability that the value is in fact greater than x. For example, if the variable is the height of a child, when x is one metre, Px( X>1) is the probability that the child is taller than a metre. The values of x can be any number, but Px( .) cannot be negative or exceed unity. Plotting Px( X>x) against x generates an ∫-shaped curve (but more stretched out horizontally), which is hard to interpret.
When X has a uniform distribution over ( 0,1) , if Px( .) is plotted against x in a unit square, the result is a straight line. A similar idea applies to all distributions and QQ plots can be selected so some reference distribution is a straight line with the empirical distribution compared to it.
The QQ plot options page on the Graphics dialog allows for a choice of reference distributions. For the t, F and χ2 distribution it is necessary to also supply the degrees of freedom arguments. Drawing a variable against a uniform results in a so-called quantile plot.
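As an illustration of how such a plot is built, the Python sketch below pairs the sorted observations with quantiles of a normal reference distribution that has the same mean and variance. The plotting positions (i + 0.5)/n are an assumption (the text does not state which positions OxMetrics uses), the variance uses divisor T, and the name qq_normal is hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from typing import List, Tuple


def qq_normal(x: List[float]) -> List[Tuple[float, float]]:
    """Pairs (reference quantile, ordered observation) for a normal QQ plot."""
    n = len(x)
    # Reference distribution with the sample mean and (divisor T) variance.
    ref = NormalDist(mean(x), pstdev(x))
    # Assumed plotting positions: (i + 0.5) / n for i = 0, ..., n-1.
    probs = [(i + 0.5) / n for i in range(n)]
    return [(ref.inv_cdf(p), value) for p, value in zip(probs, sorted(x))]
```

If the data are close to normal, the resulting points lie near the 45-degree line; systematic departures indicate skewness or heavy tails.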
A bubble chart consists of a scatter plot, where the symbols are circles, and the size of the circle indicates a third dimension (e.g. market share).
With 3-dimensional plots there are two ways of presenting the data: as a scatter of points in three-dimensional space, or as a table with the X and Y values along the side, and the Z values in the cells. The tabular format specifies the 3D surface directly, whereas for the scatter it is left to OxMetrics to work out the surface. Although the tabular format leads to better results, the scatter is easier to use, as it requires only three variables.
To create a surface from a scatter, OxMetrics derives a triangulation, and from the triangulation a smooth surface. This method works best when the surface is smooth.
There are a few additional edit actions available for 3-dimensional graphs:
The Edit menu has an entry for rotation: left/right (azimuth, key short-cut is < and >), up/down (elevation, key short-cut is + and -) and sideways (twist). This can be done for the currently selected 3D area, or for all graphs.
The Style entry for the 3D graph on Edit Graphs allows for a choice of palette.
The same page allows for the addition of contour lines to a surface (but not a triangulation).
Select File/Print to print to an attached printer and File/Print Preview to first view the results. The following control of headers and footers is available for printing from File/Print Page Setup: