/usr/share/gretl/gretlgui.hlp

# add Tests "Add variables to model"

The selected variables are added to the previous model and the new model estimated. A test statistic for the joint significance of the added variables is printed, along with its p-value. 
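
As a rough script equivalent, the following sketch assumes that series named <@lit="y">, <@lit="x1">, <@lit="x2"> and <@lit="x3"> (hypothetical names) are present in the current dataset:

<code>
   # baseline model (y, x1, x2, x3 are assumed series names)
   ols y 0 x1
   # add x2 and x3, re-estimate, and test their joint significance
   add x2 x3
</code>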

Menu path: Model window, /Tests/Add variables

Script command: <@ref="add">

# addline Graphs "Add line to graph"

This dialog box allows you to add a line, defined via a formula, to a graph. The formula must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as ".". Examples: 

<code>          
   10+0.35*x
   100+5.3*x-0.12*x**2
   sin(x)
   exp(sqrt(pi*x))
</code>

# adf Tests "Augmented Dickey-Fuller test"

This command needs an integer lag order; if the order is zero a standard (not augmented) Dickey–Fuller test is run. Computes a set of Dickey–Fuller tests on the selected variable, the null hypothesis being that the variable has a unit root. (But if the differencing option is selected, the first difference of the variable is taken prior to testing, and the discussion below must be taken as referring to the transformed variable.) 

In all cases the dependent variable is the first difference of the specified variable, <@itl="y">, and the key independent variable is the first lag of <@itl="y">. The model is constructed so that the coefficient on lagged <@itl="y"> equals the root in question minus 1. For example, the model with a constant may be written as 

  <@fig="adf1">

Under the null hypothesis of a unit root the coefficient on lagged <@itl="y"> equals zero; under the alternative that <@itl="y"> is stationary this coefficient is negative. 

If the lag order, <@itl="k">, is greater than 0, then <@itl="k"> lags of the dependent variable are included on the right-hand side of the test regressions, subject to the following qualification. If the box labeled "test down from maximum lag" is checked, the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down, using the criterion chosen via the accompanying drop-down list. The modified AIC and modified BIC methods are as described in <@bib="Ng and Perron (2001);ng-perron01">; the lag order is chosen so as to optimize an appropriately modified version of the Akaike Information Criterion (AIC) or the Schwarz Bayesian Criterion (BIC). The <@itl="t">-statistic method is as follows: 

<indent>
1. Estimate the Dickey–Fuller regression with <@itl="k"> lags of the dependent variable. 
</indent>

<indent>
2. Is the last lag significant? If so, execute the test with lag order <@itl="k">. Otherwise, let <@itl="k"> = <@itl="k"> – 1; if <@itl="k"> equals 0, execute the test with lag order 0, else go to step 1. 
</indent>

In the context of step 2 above, "significant" means that the <@itl="t">-statistic for the last lag has an asymptotic two-sided <@itl="p">-value, against the normal distribution, of 0.10 or less. 

<@itl="P">-values for the Dickey–Fuller tests are based on <@bib="MacKinnon (1996);mackinnon96">. The relevant code is included by kind permission of the author. In the case of the test with linear trend using GLS these <@itl="P">-values are not applicable; critical values from Table 1 in <@bib="Elliott, Rothenberg and Stock (1996);ERS96"> are shown instead. 
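
In script form, the choices described above might be sketched as follows. The series name <@lit="y"> and the maximum lag order of 4 are assumptions made for illustration; a time-series dataset is presumed to be in place.

<code>
   # y is an assumed time series; 4 is an arbitrary maximum lag order
   # test with a constant, testing down via the t-statistic method
   adf 4 y --c --test-down=tstat
   # constant plus linear trend, fixed lag order of 4
   adf 4 y --ct
   # GLS variant with linear trend (critical values shown, not P-values)
   adf 4 y --ct --gls
</code>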

Menu path: /Variable/Unit root tests/Augmented Dickey-Fuller test

Script command: <@ref="adf">

# anova Statistics "ANOVA"

Analysis of Variance: <@var="response"> is a series measuring some effect of interest and <@var="treatment"> must be a discrete variable that codes for two or more types of treatment (or non-treatment). For two-way ANOVA, the <@var="block"> variable (which should also be discrete) codes for the values of some control variable. 

The null hypothesis for the <@itl="F">-test is that the mean response is invariant with respect to the treatment type, or in other words that the treatment has no effect. Strictly speaking, the test is valid only if the variance of the response is the same for all treatment types. 

Note that the results shown by this command are in fact a subset of the information given by the following procedure, which is easily implemented in gretl. Create a set of dummy variables coding for all but one of the treatment types. For two-way ANOVA, in addition create a set of dummies coding for all but one of the "blocks". Then regress <@var="response"> on a constant and the dummies using <@ref="ols">. For a one-way design the ANOVA table is printed via the <@opt="--⁠anova"> option to <@lit="ols">. In the two-way case the relevant <@itl="F">-test is found by using the <@ref="omit"> command. For example (assuming <@lit="y"> is the response, <@lit="xt"> codes for the treatment, and <@lit="xb"> codes for blocks): 

<code>          
   # one-way
   list dxt = dummify(xt)
   ols y 0 dxt --anova
   # two-way
   list dxb = dummify(xb)
   ols y 0 dxt dxb
   # test joint significance of dxt
   omit dxt --quiet
</code>
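
For comparison, a sketch of the direct route via the <@lit="anova"> command, using the same assumed series names (<@lit="y">, <@lit="xt">, <@lit="xb">) as in the example above:

<code>
   # one-way: response followed by treatment
   anova y xt
   # two-way: add the block variable as a third argument
   anova y xt xb
</code>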

Menu path: /Model/Other linear models/ANOVA

Script command: <@ref="anova">

# ar Estimation "Autoregressive estimation"

Computes parameter estimates using the generalized Cochrane–Orcutt iterative procedure; see Section 9.5 of <@bib="Ramanathan (2002);ramanathan02">. Iteration is terminated when successive error sums of squares do not differ by more than 0.005 percent or after 20 iterations. 

The "list of AR lags" specifies the structure of the error process. For example, the entry "1 3 4" corresponds to the process: 

  <@fig="arlags">
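
A minimal script sketch for this lag structure, assuming series <@lit="y">, <@lit="x1"> and <@lit="x2"> (hypothetical names) in a time-series dataset:

<code>
   # the AR lags come first, then a semicolon, then the regression list
   ar 1 3 4 ; y 0 x1 x2
</code>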

Menu path: /Model/Time series/Autoregressive estimation

Script command: <@ref="ar">

# ar1 Estimation "AR(1) estimation"

Computes feasible GLS estimates for a model in which the error term is assumed to follow a first-order autoregressive process. 

The default method is the Cochrane–Orcutt iterative procedure; see for example section 9.4 of <@bib="Ramanathan (2002);ramanathan02">. Iteration is terminated when successive estimates of the autocorrelation coefficient do not differ by more than 0.001 or after 20 iterations. 

If the <@opt="--⁠pwe"> option is given, the Prais–Winsten estimator is used. This involves an iteration similar to Cochrane–Orcutt; the difference is that while Cochrane–Orcutt discards the first observation, Prais–Winsten makes use of it. See, for example, Chapter 13 of <@bib="Greene's <@itl="Econometric Analysis"> (2000);greene00"> for details. 

If the <@opt="--⁠hilu"> option is given, the Hildreth–Lu search procedure is used. The results are then fine-tuned using the Cochrane–Orcutt method, unless the <@opt="--⁠no-corc"> flag is specified. The <@opt="--⁠no-corc"> option is ignored for estimators other than Hildreth–Lu. 
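
The variants described above can be sketched in script form as follows; <@lit="y"> and <@lit="x1"> are assumed series names.

<code>
   # y and x1 are assumed series in a time-series dataset
   ar1 y 0 x1                    # Cochrane-Orcutt (the default)
   ar1 y 0 x1 --pwe              # Prais-Winsten
   ar1 y 0 x1 --hilu             # Hildreth-Lu, fine-tuned via Cochrane-Orcutt
   ar1 y 0 x1 --hilu --no-corc   # Hildreth-Lu without the fine-tuning step
</code>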

Menu path: /Model/Time series/AR(1)

Script command: <@ref="ar1">

# arch Estimation "ARCH model"

This command is retained at present for backward compatibility, but you are better off using the maximum likelihood estimator offered by the <@ref="garch"> command; for a plain ARCH model, set the first GARCH parameter to 0. 

Estimates the given model specification allowing for ARCH (Autoregressive Conditional Heteroskedasticity). The model is first estimated via OLS, then an auxiliary regression is run, in which the squared residual from the first stage is regressed on its own lagged values. The final step is weighted least squares estimation, using as weights the reciprocals of the fitted error variances from the auxiliary regression. (If the predicted variance of any observation in the auxiliary regression is not positive, then the corresponding squared residual is used instead.) 

The <@lit="alpha"> values displayed below the coefficients are the estimated parameters of the ARCH process from the auxiliary regression. 

See also <@ref="garch"> and <@ref="modtest"> (the <@opt="--⁠arch"> option). 
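
As an illustration of the two routes, the sketch below assumes series <@lit="y"> and <@lit="x1"> (hypothetical names) and an ARCH order of 4 chosen arbitrarily:

<code>
   # older three-step estimator, ARCH order 4
   arch 4 y 0 x1
   # ML alternative: plain ARCH(4), with the first GARCH parameter set to 0
   garch 0 4 ; y 0 x1
</code>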

Menu path: /Model/Time series/ARCH

Script command: <@ref="arch">

# arima Estimation "ARMA model"

Estimates an ARMA model, with or without exogenous regressors. If the order of differencing is greater than zero the model becomes ARIMA. If the data have a frequency greater than 1 the option of including a seasonal component is presented. 

If you wish to include only specified AR or MA lags in the model (as opposed to all lags up to a given order) check the box to the right of the spinner and type a list of lags, separated by spaces, into the entry field. Alternatively, if you have defined a matrix containing the desired set of lags you can type its name into the entry field. 

The default is to use the "native" gretl ARMA functionality, with estimation by exact ML using the Kalman filter; estimation via conditional ML is available as an option. (If X-12-ARIMA is installed you have the option of using it instead of native code.) For details regarding these options, please see <@pdf="the Gretl User's Guide">. 
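
A few script-level counterparts of these dialog choices are sketched below. The series name <@lit="y"> and the matrix name <@lit="arlags"> are assumptions made for illustration.

<code>
   # y is an assumed time series
   # ARMA(1,1), exact ML (the default)
   arima 1 0 1 ; y
   # ARIMA(2,1,1), estimated by conditional ML
   arima 2 1 1 ; y --conditional
   # AR lags 1 and 4 only, given via a named matrix
   matrix arlags = {1, 4}
   arima arlags 0 1 ; y
</code>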

The AIC value given in connection with ARIMA models is calculated according to the definition used in X-12-ARIMA, namely 

  <@fig="aic">

where <@fig="ell"> is the log-likelihood and <@itl="k"> is the total number of parameters estimated. Note that X-12-ARIMA does not produce information criteria such as AIC when estimation is by conditional ML. 

The AR and MA roots shown in connection with ARMA estimation are based on the following representation of an ARMA(p, q) process: 

<mono>          
   (1 - a_1*L - a_2*L^2 - ... - a_p*L^p)Y_t =
          c + (1 + b_1*L + b_2*L^2 + ... + b_q*L^q) e_t
</mono>

The AR roots are therefore the solutions to 

<mono>          
         1 - a_1*z - a_2*z^2 - ... - a_p*z^p = 0
</mono>

and stability requires that these roots lie outside the unit circle. 

The "frequency" figure printed in connection with AR and MA roots is the λ value that solves <@itl="z"> = <@itl="r"> * exp(i*2*π*λ) where <@itl="z"> is the root in question and <@itl="r"> is its modulus. 

Menu path: /Model/Time series/ARIMA
Other access: Main window pop-up menu (single selection)

Script command: <@ref="arima">

# bfgs-config Estimation "BFGS options"

This dialog allows you to control some aspects of the operation of the BFGS maximizer. In case the maximizer fails to converge it may help matters, in some cases, to increase the number of iterations allowed and/or to increase (make more permissive) the convergence tolerance. However, you should be suspicious of results obtained using a high tolerance and should consider the possibility that the model you are estimating is misspecified. 

For most applications we recommend use of the regular BFGS maximizer but for some problems the "limited memory" variant of the algorithm, L-BFGS-B, may produce more rapid convergence. When L-BFGS-B is selected, you have the option of setting the number of corrections used in the limited memory matrix (between 3 and 20, with a default of 8). 

# bootstrap Tests "Bootstrap options"

In this dialog you get to choose: 

<indent>
• The variable/coefficient to examine. (You can test only one coefficient at a time using this method.) 
</indent>

<indent>
• The sort of analysis to perform. The default (95 percent) confidence interval is based directly on the quantiles of the bootstrap coefficient estimates. The "studentized" version is as per Davidson and MacKinnon's <@itl="Econometric Theory and Methods"> (ETM), chapter 5: at each bootstrap replication a <@itl="t">-ratio is formed as (a) the difference between the current and the baseline coefficient estimate, divided by (b) the baseline estimated standard error. Then the confidence interval is formed based on the quantiles of this <@itl="t">-ratio, as explained in ETM. The P-value option is based on the distribution of the bootstrap <@itl="t">-ratio: it is the proportion of the replications where the absolute value of this statistic exceeds the absolute value of the baseline <@itl="t">-ratio. 
</indent>

<indent>
• Resampled residuals versus simulated normal errors. In the first case the original residuals (rescaled as suggested in ETM) are resampled with replacement. In the second case pseudo-random normal values are generated with the original residual variance. 
</indent>

<indent>
• The number of replications to perform. Note that when you're constructing a 95 percent confidence interval it is desirable that 0.05(<@itl="B"> + 1)/2 is an integer (where <@itl="B"> is the number of replications). So gretl may adjust the chosen number of replications to ensure this is the case. 
</indent>

<indent>
• Whether or not to produce a graph of the bootstrap distribution. This option employs gretl's kernel density estimation facility. 
</indent>

# boxplot Graphs "Boxplots"

These plots display the distribution of a variable. The central box encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The "whiskers" extend to the minimum and maximum values. A line is drawn across the box at the median. A "+" sign is used to indicate the mean. If the option of showing a confidence interval for the median is selected, this is computed via the bootstrap method and shown in the form of dashed horizontal lines above and/or below the median. 

The "factorized" option allows you to examine the distribution of a chosen variable conditional on the value of some discrete factor. For example, if a data set contains wages and a gender dummy variable you can select the wage variable as the target and gender as the factor, to see side-by-side boxplots of male and female wages. 
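
In a script, the wage example might look like the sketch below, where <@lit="wage"> and <@lit="gender"> are assumed series names, the latter being a discrete (dummy) variable:

<code>
   # plain boxplot of the wage series
   boxplot wage
   # side-by-side boxplots of wage by gender
   boxplot wage gender --factorized
</code>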

Menu path: /View/Graph specified vars/Boxplots

Script command: <@ref="boxplot">

# bwfilter Transformations "The Butterworth filter"

The Butterworth filter is an approximation to an ideal square-wave filter which allows frequencies over a certain range to pass at full strength while stopping all others. 

Higher values of the order parameter, <@itl="n">, produce a closer approximation to the ideal filter, in principle, but at the possible cost of numerical instability. The "cutoff" value sets the boundary between the pass band and the stop band. It is expressed in degrees, and must be greater than 0 and less than 180° (or π radians, corresponding to the highest frequency in the data). Smaller values of the cutoff produce a smoother trend. 

Inspecting the periodogram of the target series is a useful preliminary when you wish to apply this filter. See <@pdf="the Gretl User's Guide"> for details. 

Menu path: /Variable/Filter/Butterworth

# chow Tests "Chow test"

This command needs either an observation number (or date, with dated data), or the name of a dummy variable. 

Must follow an OLS regression. If an observation number or date is given, provides a test for the null hypothesis of no structural break at the given split point. The procedure is to create a dummy variable which equals 1 from the split point specified by <@var="obs"> to the end of the sample, 0 otherwise, and also interaction terms between this dummy and the original regressors. If a dummy variable is given, tests the null hypothesis of structural homogeneity with respect to that dummy. Again, interaction terms are added. In either case an augmented regression is run including the additional terms. 

By default an <@itl="F"> statistic is calculated, taking the augmented regression as the unrestricted model and the original as the restricted. But if the original model used a robust estimator for the covariance matrix, the test statistic is a Wald chi-square value based on a robust estimator of the covariance matrix for the augmented regression. 
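
Both forms of the test can be sketched in a script as follows. The series names, the split date <@lit="1988:1"> and the dummy name <@lit="d"> are assumptions made for illustration; the date form presumes quarterly data that include that observation.

<code>
   # y, x1, x2 and the dummy d are assumed series names
   ols y 0 x1 x2
   # structural break at a given split point
   chow 1988:1
   # structural homogeneity with respect to the dummy d
   chow d --dummy
</code>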

Menu path: Model window, /Tests/Chow test

Script command: <@ref="chow">

# cluster Estimation "Robust variance estimation"

If you select the second option you must supply the name of a clustering variable. This variable should have at least two distinct values but generally should have substantially fewer distinct values than there are observations in the sample range. 

The "cluster-robust" variance estimator divides the sample into a number of subsets or clusters according to the value taken on by the selected variable. In place of the classical assumption that the error term is independently and identically distributed, this estimator allows for the error variance to differ by cluster and also allows for a degree of dependence of the error within each cluster. 
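
In script form this corresponds to passing a clustering variable to the estimation command. A minimal sketch, with <@lit="firm"> as an assumed name for the clustering series:

<code>
   # y, x1, x2 and firm are assumed series names
   ols y 0 x1 x2 --cluster=firm
</code>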

# coeffsum Tests "Sum of coefficients"

This command needs a list of variables, selected from the set of independent variables in a given model. 

Calculates the sum of the coefficients on the variables in the specified list. Prints this sum along with its standard error and the p-value for the null hypothesis that the sum is zero. 

Note the difference between this and <@ref="omit">, which tests the null hypothesis that the coefficients on a specified subset of independent variables are <@itl="all"> equal to zero. 
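
The contrast between the two tests can be sketched as follows, assuming series <@lit="y">, <@lit="x1">, <@lit="x2"> and <@lit="x3"> (hypothetical names):

<code>
   ols y 0 x1 x2 x3
   # H0: the coefficients on x1 and x2 sum to zero
   coeffsum x1 x2
   # H0: the coefficients on x1 and x2 are both zero
   omit x1 x2 --quiet
</code>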

Menu path: Model window, /Tests/Sum of coefficients

Script command: <@ref="coeffsum">

# coint Tests "Engle-Granger cointegration test"

The Engle–Granger cointegration test. The default procedure is: (1) carry out Dickey–Fuller tests on the null hypothesis that each of the variables listed has a unit root; (2) estimate the cointegrating regression; and (3) run a DF test on the residuals from the cointegrating regression. If the box labeled "skip initial DF tests" is checked, however, the first of these steps is omitted. 

If the lag order, <@itl="k">, is greater than 0, then <@itl="k"> lags of the dependent variable are included on the right-hand side of each test regression, unless the box labeled "test down from maximum lag" is checked: in that case the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down. See the <@ref="adf"> command for details of this procedure. 

By default, the cointegrating regression contains a constant. If you wish to suppress the constant, or to add a linear or quadratic trend, select the appropriate option from the set of radio buttons in the Cointegration dialog box. 

<@itl="P">-values for this test are based on <@bib="MacKinnon (1996);mackinnon96">. The relevant code is included by kind permission of the author. 
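
A script-level sketch of the options discussed above, with <@lit="y">, <@lit="x1"> and <@lit="x2"> as assumed series names and 4 as an arbitrary lag order:

<code>
   # default procedure: constant in the cointegrating regression
   coint 4 y x1 x2
   # skip the initial DF tests, add a linear trend, test down from lag 4
   coint 4 y x1 x2 --ct --skip-df --test-down
</code>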

Menu path: /Model/Time series/Cointegration test/Engle-Granger

Script command: <@ref="coint">

# coint2 Tests "Johansen cointegration test"

Carries out the Johansen test for cointegration among the listed variables for the selected lag order. For details of this test see, for example, Hamilton, <@itl="Time Series Analysis"> (1994), Chapter 20. P-values are computed via Doornik's (1998) gamma approximation. Two sets of p-values are shown for the trace test, straight asymptotic values and values adjusted for the sample size. 

The inclusion of deterministic terms in the model is controlled by the drop-down option list. The default is to include an "unrestricted constant", which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as "case 3". The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in <@pdf="the Gretl User's Guide">. 

You may control for exogenous variables by adding them to the lower list box. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select "Restricted" from the pop-up menu. The symbol next to the variable will change to <@lit="R">. 

If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box ("Show details") allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure. 
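
Some of these dialog choices can be sketched in script form as below. The series names <@lit="y1">, <@lit="y2"> and <@lit="y3"> and the lag order of 2 are assumptions made for illustration.

<code>
   # default: unrestricted constant (case 3)
   coint2 2 y1 y2 y3
   # restricted constant (case 2)
   coint2 2 y1 y2 y3 --rc
   # seasonal dummies plus details of the auxiliary regressions
   coint2 2 y1 y2 y3 --seasonals --verbose
</code>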

The following table is offered as a guide to the interpretation of the results shown for the test, for the 3-variable case. <@lit="H0"> denotes the null hypothesis, <@lit="H1"> the alternative hypothesis, and <@lit="c"> the number of cointegrating relations. 

<mono>          
         Rank     Trace test         Lmax test
                  H0     H1          H0     H1
         ---------------------------------------
          0      c = 0  c = 3       c = 0  c = 1
          1      c = 1  c = 3       c = 1  c = 2
          2      c = 2  c = 3       c = 2  c = 3
         ---------------------------------------
</mono>

See also the <@ref="vecm"> command. 

Menu path: /Model/Time series/Cointegration test/Johansen

Script command: <@ref="coint2">

# compact Dataset "Compact data"

When you add to a dataset a series that is of higher frequency, it is necessary to "compact" the new series. For instance, a monthly series will have to be compacted to fit into a quarterly dataset. 

In addition, you may sometimes want to compact an entire dataset to a lower frequency (perhaps prior to adding a lower-frequency variable to the dataset). 

Gretl offers four options for compacting: 

<indent>
• Averaging: The value written to the dataset will be the arithmetic mean of the relevant series values. For instance the value written for the first quarter of 1990 will be the average of the values for January, February and March of 1990. 
</indent>

<indent>
• Summing: The value written to the dataset will be the sum of the relevant higher-frequency values. For example, the first-quarter value will be the sum of the January, February and March values. 
</indent>

<indent>
• End-of-period values: The value written to the dataset is the last relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the March 1990 value. 
</indent>

<indent>
• Start-of-period values: The value written to the dataset is the first relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the January 1990 value. 
</indent>

In the case of compacting an entire dataset, the choice you make in this dialog box sets the default method. But if you have set a compaction method for an individual variable (menu item "Variable/Edit attributes") that method is used rather than the default. If the compaction method is already set for all variables, the choice of a default compaction method is not presented. 
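
For whole-dataset compaction in a script, a sketch along the following lines may serve. It assumes a monthly dataset is open; only one of the alternatives should be run, since after the first the data are already quarterly.

<code>
   # monthly data assumed; 4 is the target (quarterly) frequency
   dataset compact 4            # averaging (the default)
   # dataset compact 4 sum      # summing
   # dataset compact 4 last     # end-of-period values
   # dataset compact 4 first    # start-of-period values
</code>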

# controlled Graphs "Scatterplot with control"

This command requires the selection of three variables, one for the X axis, one for the Y axis, and one for which you wish to control (call it Z). The plot shows adjusted Y against adjusted X, where the adjusted version of each variable is the residual from an OLS regression of that variable on Z. 

Example: You have data on wages, experience and education level for a sample of people. You wish to plot wages against education, controlling for experience. In that case you select wages for the Y axis, education for the X axis, and experience as the control. The plot shows wages against education, with both variables "purged" of the effect of experience. 
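
The construction can be reproduced by hand in a script, roughly as follows. The series names are assumptions, as is the inclusion of a constant in each auxiliary regression.

<code>
   # wage, educ and exper are assumed series names
   ols wage 0 exper --quiet
   series wage_adj = $uhat
   ols educ 0 exper --quiet
   series educ_adj = $uhat
   # adjusted Y against adjusted X
   gnuplot wage_adj educ_adj
</code>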

# corr Statistics "Correlation coefficients"

Prints the pairwise correlation coefficients (Pearson's product-moment correlation) for the selected variables. The default behavior is to use all available observations for computing each pairwise coefficient, but if the option box is checked the sample is limited (if necessary) so that the same set of observations is used for all the coefficients. This option has an effect only if there are differing numbers of missing values for the variables used. 
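
In script terms the two behaviors might be sketched as follows, with <@lit="x1">, <@lit="x2"> and <@lit="x3"> as assumed series names:

<code>
   # each coefficient uses all observations available for that pair
   corr x1 x2 x3
   # same set of observations for every coefficient
   corr x1 x2 x3 --uniform
</code>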

Menu path: /View/Correlation matrix
Other access: Main window pop-up menu (multiple selection)

Script command: <@ref="corr">

# corrgm Statistics "Correlogram"

Prints the values of the autocorrelation function for <@var="series">, which may be specified by name or number. The values are defined as ρ(<@itl="u"><@sub="t">, <@itl="u"><@sub="t-s">) where <@itl="u"><@sub="t"> is the <@itl="t"><@sup="th"> observation of the variable <@itl="u"> and <@itl="s"> denotes the number of lags. 

The partial autocorrelations (calculated using the Durbin–Levinson algorithm) are also shown: these are net of the effects of intervening lags. In addition the Ljung–Box <@itl="Q"> statistic is printed. This may be used to test the null hypothesis that the series is "white noise"; it is asymptotically distributed as chi-square with degrees of freedom equal to the number of lags used. 

If an <@var="order"> value is specified the length of the correlogram is limited to at most that number of lags, otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations. 

By default, a plot of the correlogram is produced: a gnuplot graph in interactive mode or an ASCII graphic in batch mode. This can be adjusted via the <@opt="--⁠plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="ascii"> (to produce a text graphic even when in interactive mode); <@lit="display"> (to produce a gnuplot graph even when in batch mode); or a file name. The effect of providing a file name is as described for the <@opt="--⁠output"> option of the <@ref="gnuplot"> command. 

Upon successful completion, the accessors <@lit="$test"> and <@lit="$pvalue"> contain the corresponding figures of the Ljung–Box test for the maximum order displayed. Note that if you just want to compute the <@itl="Q"> statistic, you'll probably want to use the <@xrf="ljungbox"> function instead. 
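
A brief sketch of these points, assuming a series named <@lit="y"> and a maximum of 12 lags (both chosen for illustration):

<code>
   # correlogram up to lag 12, with the plot suppressed
   corrgm y 12 --plot=none
   # Ljung-Box results for the maximum order displayed
   printf "Q = %g, p-value = %g\n", $test, $pvalue
   # the Q statistic alone, via the function
   scalar Q = ljungbox(y, 12)
</code>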

Menu path: /Variable/Correlogram
Other access: Main window pop-up menu (single selection)

Script command: <@ref="corrgm">

# count-model Estimation "Models for count data"

The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the Poisson distribution is used, but the drop-down selector gives the option of using the Negative Binomial distribution. (The variant NegBin 2 is commonly used in econometrics, but the less commonly used NegBin 1 is also available.) 

Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a model of the accident rate. The offset variable must be strictly positive. 

By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the "Robust standard errors" box is checked then QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the outer product of the gradient. 
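
The corresponding script commands can be sketched as below. The series names, including <@lit="volume"> for the offset, are assumptions made for illustration.

<code>
   # accidents, x1 and volume are assumed series names
   # Poisson with an offset variable (given after the semicolon)
   poisson accidents 0 x1 ; volume
   # Negative Binomial, NegBin 2 variant
   negbin accidents 0 x1
   # NegBin 1 variant with QML (robust) standard errors
   negbin accidents 0 x1 --model1 --robust
</code>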

# curve Graphs "Plot a curve"

This dialog box allows you to create a gnuplot graph by specifying a formula. This must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as ".". Examples: 

<code>          
   10+0.35*x
   100+5.3*x-0.12*x**2
   sin(x)
   exp(sqrt(pi*x))
</code>

To put an additional line onto a graph created in this way, click on the graph and select "Edit", select the "Lines" tab in the graph editing dialog, and use the "Add line" button. 

# cusum Tests "CUSUM test"

Must follow the estimation of a model via OLS. Performs the CUSUM test—or if the <@opt="--⁠squares"> option is given, the CUSUMSQ test—for parameter stability. A series of one-step ahead forecast errors is obtained by running a series of regressions: the first regression uses the first <@itl="k"> observations and is used to generate a prediction of the dependent variable at observation <@itl="k"> + 1; the second uses the first <@itl="k"> + 1 observations and generates a prediction for observation <@itl="k"> + 2, and so on (where <@itl="k"> is the number of parameters in the original model). 

The cumulated sum of the scaled forecast errors, or the squares of these errors, is printed and graphed. The null hypothesis of parameter stability is rejected at the 5 percent significance level if the cumulated sum strays outside of the 95 percent confidence band. 

In the case of the CUSUM test, the Harvey–Collier <@itl="t">-statistic for testing the null hypothesis of parameter stability is also printed. See Greene's <@itl="Econometric Analysis"> for details. For the CUSUMSQ test, the 95 percent confidence band is calculated using the algorithm given in <@bib="Edgerton and Wells (1994);edgerton94">. 
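
A minimal script sketch, assuming series <@lit="y">, <@lit="x1"> and <@lit="x2"> (hypothetical names):

<code>
   ols y 0 x1 x2
   # CUSUM test, with the Harvey-Collier t-statistic
   cusum
   # CUSUMSQ test
   cusum --squares
</code>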

Menu path: Model window, /Tests/CUSUM(SQ)

Script command: <@ref="cusum">

# datasort Dataset "Sorting data"

The selected variable is used as a sort key for the entire data set. The observations on all variables are re-ordered by increasing value of the key variable, or by decreasing value if you select the "Descending" option. 
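
In a script the same operation might be written as follows, with <@lit="x"> as an assumed name for the key series; use one form or the other.

<code>
   # sort all series by increasing value of x
   dataset sortby x
   # or by decreasing value of x
   dataset dsortby x
</code>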

# density Statistics "Kernel density estimation"

Kernel density estimation proceeds by defining a set of evenly spaced reference points, over a suitable range in relation to the range of the data, and attributing a density to each reference point based on the actual observations in the vicinity. 

The formula used to compute the estimated density at each reference point, <@itl="x">, is 

  <@fig="kernel1">

where <@itl="n"> denotes the number of data points, <@itl="h"> is a "bandwidth" parameter, and <@itl="k">() is the kernel function. The larger the value of the bandwidth parameter, the smoother the estimated density. 

You are given the choice of using a Gaussian kernel (the standard normal density) or the Epanechnikov kernel. By default, the bandwidth is that suggested as a rule of thumb by <@bib="Silverman (1986);silverman96">, namely 

  <@fig="kernel2">

where <@itl="s"> denotes the standard deviation of the data and IQR denotes the inter-quartile range. You can widen or shrink the bandwidth via the "bandwidth adjustment factor": the actual bandwidth used is obtained by multiplying the Silverman value by the adjustment factor. 

For a good introductory discussion of kernel density estimation see Chapter 15 of Davidson and MacKinnon's <@itl="Econometric Theory and Methods">. 
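
For scripting, the <@lit="kdensity"> function offers similar functionality. The sketch below rests on the assumption that its arguments are, in order, the series, the bandwidth adjustment factor, and a flag selecting the Epanechnikov kernel; <@lit="y"> is an assumed series name.

<code>
   # Gaussian kernel, Silverman bandwidth (adjustment factor 1.0)
   matrix d1 = kdensity(y, 1.0)
   # Epanechnikov kernel, bandwidth widened by 50 percent
   matrix d2 = kdensity(y, 1.5, 1)
</code>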

# dfgls Tests "The ADF-GLS test"

The ADF-GLS test is a variant of the Dickey–Fuller test for a unit root, for the case where the variable to be tested is assumed to have a non-zero mean or to exhibit a linear trend. The difference is that the de-meaning or de-trending of the variable is done using the GLS procedure suggested by Elliott, Rothenberg and Stock (1996). This gives a test of greater power than the standard Dickey–Fuller approach. 

See also the <@ref="adf"> command and the <@opt="--⁠gls"> option. 

Menu path: /Variable/Unit root tests/ADF-GLS test

# dialog Estimation "Model dialog box"

To select the dependent variable, highlight a variable in the list on the left and press the "Choose" button pointing to the Dependent variable slot. If you check the "Set as default" box, the selected variable will be pre-selected as dependent when the model dialog is next opened. Short-cut: double-click on a variable on the left to select it as the dependent variable and also set it as the default. 

To select independent variables, highlight them on the left and press the "Add" button (or click the right mouse button). You can highlight several contiguous variables by dragging with the mouse. You can highlight a group of non-contiguous variables by clicking on them with the <@lit="Ctrl"> key pressed. 

# dpanel Estimation "Dynamic panel models"

Carries out estimation of dynamic panel data models (that is, panel models including one or more lags of the dependent variable) using either the GMM-DIF or GMM-SYS method. 

The dependent variable and regressors should be given in levels form; they will be differenced automatically (since this estimator uses differencing to cancel out the individual effects). 

As regards the handling of instruments, please see the documentation for the script version of this command. Currently you cannot specify instruments explicitly in the GUI: all the independent variables are taken to be strictly exogenous. 

By default the results of 1-step estimation are reported (with robust standard errors). You may select 2-step estimation as an option. In both cases tests for autocorrelation of orders 1 and 2 are provided, as well as the Sargan overidentification test and a Wald test for the joint significance of the regressors. Note that in this differenced model first-order autocorrelation is not a threat to the validity of the model, but second-order autocorrelation violates the maintained statistical assumptions. 
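
The choices described above might be written in a script roughly as follows. This sketch assumes the Arellano–Bond sample file <@lit="abdata.gdt"> supplied with gretl, with series <@lit="n">, <@lit="w"> and <@lit="k">; the specification is chosen only for illustration. 

<code>
   open abdata.gdt
   # GMM-DIF, 1-step (the default), one lag of the dependent variable
   dpanel 1 ; n w k const
   # GMM-SYS with 2-step estimation
   dpanel 1 ; n w k const --system --two-step
</code>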

For further details and examples, please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Panel/Dynamic panel model

Script command: <@ref="dpanel">

# expand Dataset "Expand data"

If you wish to add to a dataset a series that is of lower frequency, it is necessary to "expand" the new series. For instance, a quarterly series will have to be expanded to fit into a monthly dataset. In addition, you may sometimes want to expand an entire dataset to a higher frequency (perhaps prior to adding a higher-frequency variable to the dataset). 

Expansion of data should be considered an "expert" option; you need to know what you are doing. When combining series of differing original frequencies within one dataset, you should probably consider compacting the higher-frequency data rather than expanding the lower-frequency series. 

That said, gretl offers two options: higher-frequency values can be interpolated using the method of <@bib="Chow and Lin (1971);chowlin71">, or the values of the lower-frequency series can be repeated as many times as required. 

The Chow–Lin method is regression-based, using a constant and quadratic trend and assuming a first-order autoregressive process for the disturbances. Four degrees of freedom are used up by this procedure. 

As for the repetition of values, suppose we have a quarterly series with the value 35.5 in 1990:1, the first quarter of 1990. On expansion to monthly, the value 35.5 will be assigned to the observations for January, February and March of 1990. The expanded variable is therefore useless for fine-grained time-series analysis, outside of the special case where you know that the variable in question does in fact remain constant over the sub-periods. 

# export Dataset "Export data"

You may export data in Comma-Separated Values (CSV) format: such data may be opened in spreadsheets and many other application programs. If you select this option you will get some further options regarding the specific format of the CSV file. 

You also have the option of exporting data in the form of a "native" gretl datafile, or (if the data are suitable) exporting to a gretl database. See <@url="gretl.sourceforge.net/gretl_data.html"> for an account of gretl databases. 

You may also export data in a format suitable for use with the following programs: 

<indent>
• GNU R (<@url="www.r-project.org">) 
</indent>

<indent>
• GNU Octave (<@url="www.gnu.org/software/octave">) 
</indent>

<indent>
• JMulTi (<@url="www.jmulti.de">) 
</indent>

<indent>
• PcGive (<@url="www.pcgive.com">) 
</indent>

If you wish to export data by copying to the clipboard rather than writing to a file on disk, select the series you want to copy in the main window, right-click, and select "Copy to clipboard". (Only CSV format is supported in this context.) 
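
For scripted export, the <@lit="store"> command plays a similar role. A minimal sketch, assuming a recent gretl version in which the output format is inferred from the filename suffix; the filenames and series are hypothetical, and the files are written to the gretl working directory: 

<code>
   nulldata 20
   set seed 3
   series x = normal()
   series y = uniform()
   # CSV output, chosen via the ".csv" suffix
   store "example.csv" x y
   # native gretl datafile
   store "example.gdt" x y
</code>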

# factorized Graphs "Factorized plot"

This command requires the selection of three variables, the last of which must be a dummy variable (values 1 or 0). The Y variable is plotted against the X variable, with the data points colored differently depending on the value of the third. 

Example: You have data on wages and educational attainment for a sample of people; you also have a dummy variable with value 1 for men and 0 for women (as in Ramanathan's <@lit="data7-2">). A "factorized plot" of <@lit="WAGE"> against <@lit="EDUC"> using the <@lit="GENDER"> dummy as factor will show the data points for men in one color and those for women in another (with a legend to identify them). 
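
A script sketch of the same plot, assuming the Ramanathan data files are installed so that <@lit="data7-2"> can be opened by name: 

<code>
   open data7-2
   # the third series is the 0/1 factor; --dummy requests the factorized plot
   gnuplot WAGE EDUC GENDER --dummy --output=display
</code>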

# fcast Prediction "Generate forecasts"

Must follow an estimation command. Forecasts are generated for the specified range of observations. Depending on the nature of the model, standard errors may also be generated (see below). 

The choice between a static and a dynamic forecast applies only in the case of dynamic models, with an autoregressive error process or including one or more lagged values of the dependent variable as regressors. Static forecasts are one step ahead, based on realized values from the previous period, while dynamic forecasts employ the chain rule of forecasting. For example, if a forecast for <@itl="y"> in 2008 requires as input a value of <@itl="y"> for 2007, a static forecast is impossible without actual data for 2007. A dynamic forecast for 2008 is possible if a prior forecast can be substituted for <@itl="y"> in 2007. 

The default is to give a static forecast for any portion of the forecast range that lies within the sample range over which the model was estimated, and a dynamic forecast (if relevant) out of sample. The <@opt="--⁠dynamic"> option requests a dynamic forecast from the earliest possible date, and the <@opt="--⁠static"> option requests a static forecast even out of sample. 

In a script, the <@opt="--⁠plot"> option sends a plot of the forecast to the named file. For example, 

<code>          
   fcast --plot=fc.pdf
</code>

will generate a graphic in PDF format. Absolute pathnames are respected, otherwise files are written to the gretl working directory. 

The nature of the forecast standard errors (if available) depends on the nature of the model and the forecast. For static linear models standard errors are computed using the method outlined by <@bib="Davidson and MacKinnon (2004);davidson-mackinnon04">; they incorporate both uncertainty due to the error process and parameter uncertainty (summarized in the covariance matrix of the parameter estimates). For dynamic models, forecast standard errors are computed only in the case of a dynamic forecast, and they do not incorporate parameter uncertainty. For nonlinear models, forecast standard errors are not presently available. 

Menu path: Model window, /Analysis/Forecasts

Script command: <@ref="fcast">

# fractint Statistics "Fractional integration"

Tests the specified series for fractional integration ("long memory"). The null hypothesis is that the integration order of the series is zero. By default the local Whittle estimator <@bib="(Robinson, 1995);robinson95"> is used but if the <@opt="--⁠gph"> option is given the GPH test <@bib="(Geweke and Porter-Hudak, 1983);GPH83"> is performed instead. If the <@opt="--⁠all"> flag is given then the results of both tests are printed. 

For details on this sort of test, see <@bib="Phillips and Shimotsu (2004);phillips04">. 

If the optional <@var="order"> argument is not given the order for the test(s) is set automatically as the lesser of <@itl="T">/2 and <@itl="T"><@sup="0.6">. 

The results can be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue">. These values are based on the local Whittle estimator unless the <@opt="--⁠gph"> option is given. 
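
A short script sketch of these options, using simulated white noise (the sample size and the explicit order of 40 are arbitrary choices for illustration): 

<code>
   nulldata 512
   setobs 1 1 --time-series
   set seed 777
   series x = normal()
   # local Whittle estimator (the default), order chosen automatically
   fractint x
   printf "test = %g, p-value = %g\n", $test, $pvalue
   # both tests, with the order given explicitly
   fractint x 40 --all
</code>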

Menu path: /Variable/Unit root tests/Fractional integration

Script command: <@ref="fractint">

# freq Statistics "Frequency distribution"

In the frequency plot dialog box you can control the characteristics of the plot in either of two ways. 

First, you may choose the number of bins. In this case the width and placement of the bins are calculated automatically. 

Alternatively, you may specify the lower limit of the left-most bin, and the width of the bins. In this case the number of bins is calculated automatically. 

If you wish to align the bins on round numbers, here is one way to proceed: start by specifying the number of bins you want, and take a look at the plot that is produced. If it's not to your liking, take note of the modification that is required (for example, make the left-most bin start at 100 and impose a bin width of 200). Then make a second pass where you specify the left-hand limit and bin width. 

This dialog also allows you to select a theoretical distribution to be plotted against the data: either the normal or the gamma. If the normal option is selected the Doornik–Hansen test for normality is computed. If the gamma option is selected, gretl computes Locke's nonparametric test for the null hypothesis that the variable follows the gamma distribution. Note that the parameterization of the gamma distribution used in gretl is (shape, scale). 
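
The two-pass approach to bin alignment can be sketched in script form as follows; the simulated uniform data and the particular limits are assumptions made for the sake of the example. 

<code>
   nulldata 500
   set seed 5
   series x = randgen(u, 100, 2100)
   # first pass: choose the number of bins, inspect the result
   freq x --nbins=11
   # second pass: fix the left-hand limit and the bin width
   freq x --min=100 --binwidth=200
   # compare with the normal distribution (Doornik-Hansen test)
   freq x --normal
</code>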

Menu path: /Variable/Frequency distribution

Script command: <@ref="freq">

# garch Estimation "GARCH model"

Estimates a GARCH model (GARCH = Generalized Autoregressive Conditional Heteroskedasticity), either a univariate model or, if independent variables are selected, including the given exogenous variables. The conditional variance equation is shown below. 

  <@fig="garch_h">

The parameter <@var="p"> therefore represents the Generalized (or "AR") order, while <@var="q"> represents the regular ARCH (or "MA") order. If <@var="p"> is non-zero, <@var="q"> must also be non-zero, otherwise the model is unidentified. However, you can estimate a regular ARCH model by setting <@var="q"> to a positive value and <@var="p"> to zero. The sum of <@var="p"> and <@var="q"> must be no greater than 5. 

By default native gretl code is used in estimation of GARCH models, but you also have the option of using the algorithm of <@bib="Fiorentini, Calzolari and Panattoni (1996);fiorentini96">. The former uses the BFGS maximizer while the latter uses the information matrix to maximize the likelihood, with fine-tuning via the Hessian. 

Several variant estimates of the coefficient covariance matrix are available with this command. By default, the Hessian is used unless the "Robust standard errors" box is checked, in which case the QML (White) covariance matrix is used. Other possibilities (e.g. the information matrix, or the Bollerslev–Wooldridge estimator) can be specified using the <@ref="set"> command. 

The estimated conditional variance, along with the residuals and various other model statistics, can be accessed and added to the dataset using the "Save" menu in the window where the model is displayed. If the box marked "Standardize the residuals" is checked, the residuals are divided by the square root of the conditional variance. 
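
In script form the main choices above might look like this. The sketch assumes the Bollerslev–Ghysels sample file <@lit="b-g.gdt"> supplied with gretl, and assumes that its single series is named <@lit="Y">; substitute your own series otherwise. 

<code>
   open b-g.gdt
   # GARCH(1,1) with a constant, robust (QML) standard errors
   garch 1 1 ; Y const --robust
   # regular ARCH(1): p = 0, q = 1
   garch 0 1 ; Y const
   # Fiorentini, Calzolari and Panattoni algorithm
   garch 1 1 ; Y const --fcp
</code>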

Menu path: /Model/Time series/GARCH

Script command: <@ref="garch">

# genr Dataset "Generate a new variable"

NOTE: this command has undergone numerous changes and enhancements since the following help text was written, so for comprehensive and updated info on this command you'll want to refer to <@pdf="the Gretl User's Guide">. On the other hand, this help does not contain anything actually erroneous, so take the following as "you have this, plus more". 

Use this box to define a new variable, on the pattern <@var="name"> = <@var="formula">. The formula should be a well-formed combination of variable names, constants, operators and functions (details below). To ensure you get the type of variable you want, you can prefix the formula with a type-name, e.g. <@lit="scalar">, <@lit="series"> or <@lit="matrix">. For example, to create a series that has a constant value of 10, you can type 

<code>          
   series c = 10
</code>

(otherwise <@lit="c = 10"> would create a scalar variable). 

Supported <@itl="arithmetical operators"> are, in order of precedence: <@lit="^"> (exponentiation); <@lit="*">, <@lit="/"> and <@lit="%"> (modulus or remainder); <@lit="+"> and <@lit="-">. 

The available <@itl="Boolean operators"> are (again, in order of precedence): <@lit="!"> (negation), <@lit="&&"> (logical AND), <@lit="||"> (logical OR), <@lit=">">, <@lit="<">, <@lit="=">, <@lit=">="> (greater than or equal), <@lit="<="> (less than or equal) and <@lit="!="> (not equal). The Boolean operators can be used in constructing dummy variables: for instance <@lit="(x > 10)"> returns 1 if <@lit="x"> > 10, 0 otherwise. 

Built-in constants are <@lit="pi"> and <@lit="NA">. The latter is the missing value code: you can initialize a variable to the missing value with <@lit="scalar x = NA">. 

The <@lit="genr"> command supports a wide range of mathematical and statistical functions, including all the common ones plus several that are special to econometrics. In addition it offers access to numerous internal variables that are defined in the course of running regressions, doing hypothesis tests, and so on. For a listing of functions and accessors, see <@gfr="the Gretl function reference">. 

Besides the operators and functions noted above there are some special uses of <@lit="genr">: 

<indent>
• <@lit="genr time"> creates a time trend variable (1,2,3,…) called <@lit="time">. <@lit="genr index"> does the same thing except that the variable is called <@lit="index">. 
</indent>

<indent>
• <@lit="genr dummy"> creates dummy variables up to the periodicity of the data. In the case of quarterly data (periodicity 4), the program creates <@lit="dq1"> = 1 for first quarter and 0 in other quarters, <@lit="dq2"> = 1 for the second quarter and 0 in other quarters, and so on. With monthly data the dummies are named <@lit="dm1">, <@lit="dm2">, and so on. With other frequencies the names are <@lit="dummy_1">, <@lit="dummy_2">, etc. 
</indent>

<indent>
• <@lit="genr unitdum"> and <@lit="genr timedum"> create sets of special dummy variables for use with panel data. The first codes for the cross-sectional units and the second for the time period of the observations. 
</indent>

<@itl="Note">: In the command-line program, <@lit="genr"> commands that retrieve model-related data always reference the model that was estimated most recently. This is also true in the GUI program, if one uses <@lit="genr"> in the "gretl console" or enters a formula using the "Define new variable" option under the Add menu in the main window. With the GUI, however, you have the option of retrieving data from any model currently displayed in a window (whether or not it's the most recent model). You do this under the "Save" menu in the model's window. 

The special variable <@lit="obs"> serves as an index of the observations. For instance <@lit="genr dum = (obs=15)"> will generate a dummy variable that has value 1 for observation 15, 0 otherwise. You can also use this variable to pick out particular observations by date or name. For example, <@lit="genr d = (obs>1986:4)">, <@lit="genr d = (obs>"2008-04-01")">, or <@lit="genr d = (obs="CA")">. If daily dates or observation labels are used in this context, they should be enclosed in double quotes. Quarterly and monthly dates (with a colon) may be used unquoted. Note that in the case of annual time series data, the year is not distinguishable syntactically from a plain integer; therefore if you wish to compare observations against <@lit="obs"> by year you must use the function <@lit="obsnum"> to convert the year to a 1-based index value, as in <@lit="genr d = (obs>obsnum(1986))">. 

Scalar values can be pulled from a series in the context of a <@lit="genr"> formula, using the syntax <@var="varname"><@lit="["><@var="obs"><@lit="]">. The <@var="obs"> value can be given by number or date. Examples: <@lit="x[5]">, <@lit="CPI[1996:01]">. For daily data, the form <@var="YYYY-MM-DD"> should be used, e.g. <@lit="ibm[1970-01-23]">. 

An individual observation in a series can be modified via <@lit="genr">. To do this, a valid observation number or date, in square brackets, must be appended to the name of the variable on the left-hand side of the formula. For example, <@lit="genr x[3] = 30"> or <@lit="genr x[1950:04] = 303.7">. 
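
Several of the points above are gathered in the following script sketch, which uses an artificial quarterly dataset; the series names and values are arbitrary. 

<code>
   nulldata 24
   setobs 4 1990:1 --time-series
   genr time
   # quarterly dummies dq1 to dq4
   genr dummy
   # a series with the constant value 10
   series c = 10
   series x = 2 * time
   # Boolean expression: 1 where x > 10, 0 otherwise
   series big = (x > 10)
   # pick out observations by date
   series late = (obs > 1992:4)
   # pull a scalar from a series, by date
   scalar v = x[1991:2]
   # modify a single observation, by number
   genr x[3] = 30
   print x big late --byobs
   printf "v = %g\n", v
</code>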

Menu path: /Add/Define new variable
Other access: Main window pop-up menu

Script command: <@ref="genr">

# genrand Programming "Generating random variables"

In this dialog you must give a name for the variable to be created, plus some additional information depending on the distribution. 

<indent>
• Uniform: the lower and upper bounds for the distribution. 
</indent>

<indent>
• Normal: the mean and (positive) standard deviation. 
</indent>

<indent>
• Chi-square and Student's t: the degrees of freedom, which must be positive. 
</indent>

<indent>
• F: both numerator and denominator degrees of freedom. 
</indent>

<indent>
• Gamma: shape and scale parameters (both positive). 
</indent>

<indent>
• Binomial: the "success" probability and the integer number of trials. 
</indent>

<indent>
• Poisson: the positive mean (which also equals the variance). 
</indent>

If you want to generate repeatable sequences of pseudo-random numbers, you can set the seed, under the Tools menu. 
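
For reference, the distributions listed above can also be drawn from in a script via the <@lit="randgen"> function. A sketch, with series names and parameter values chosen arbitrarily: 

<code>
   nulldata 100
   set seed 1234
   series xu = randgen(u, 0, 1)     # uniform: lower and upper bounds
   series xn = randgen(N, 0, 1)     # normal: mean, standard deviation
   series xc = randgen(X, 5)        # chi-square: degrees of freedom
   series xt = randgen(t, 5)        # Student's t: degrees of freedom
   series xf = randgen(F, 3, 20)    # F: numerator and denominator df
   series xg = randgen(G, 2, 1.5)   # gamma: shape, scale
   series xb = randgen(B, 0.3, 10)  # binomial: probability, trials
   series xp = randgen(P, 4)        # Poisson: mean
   summary xu xn xc xt xf xg xb xp --simple
</code>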

# genseed Programming "Setting the seed for random numbers"

The "seed" controls the starting point for the sequence of pseudo-random numbers generated in a given gretl session. By default the seed is set when the program is started, using the system time. This ensures that you get a different sequence of random numbers each time you run the program. If you want to obtain repeatable sequences, you need to set the seed manually (and take note of the value you used). 

Note that whenever you click "OK" in this dialog box, the generator is re-started, using the given seed. So, for example, if you (a) set the seed to (say) 147; (b) generate a series from the standard normal distribution; (c) revisit this dialog and click "OK" again with the seed still at 147; then (d) generate a second series from the standard normal distribution, the two generated series will be identical. 
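
The script counterpart of steps (a) to (d) uses <@lit="set seed">. In this sketch the two series come out identical, so the reported maximum absolute difference should be zero. 

<code>
   nulldata 50
   set seed 147
   series a = normal()
   # re-starting the generator with the same seed repeats the sequence
   set seed 147
   series b = normal()
   printf "max abs difference: %g\n", max(abs(a - b))
</code>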

# gmm Estimation "GMM estimation"

Performs Generalized Method of Moments (GMM) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify one or more commands for updating the relevant quantities (typically GMM residuals), one or more sets of orthogonality conditions, an initial matrix of weights, and a listing of the parameters to be estimated, all enclosed between the tags <@lit="gmm"> and <@lit="end gmm">. Any options should be appended to the <@lit="end gmm"> line. 

Please see <@pdf="the Gretl User's Guide"> for details on this command. Here we just illustrate with a simple example. 

<code>          
   gmm e = y - X*b
     orthog e ; W
     weights V
     params b
   end gmm
</code>

In the example above we assume that <@lit="y"> and <@lit="X"> are data matrices, <@lit="b"> is an appropriately sized vector of parameter values, <@lit="W"> is a matrix of instruments, and <@lit="V"> is a suitable matrix of weights. The statement 

<code>          
   orthog e ; W
</code>

indicates that the residual vector <@lit="e"> is in principle orthogonal to each of the instruments composing the columns of <@lit="W">. 

Menu path: /Model/GMM

Script command: <@ref="gmm">

# graphing Graphs "Graphing"

Gretl calls a separate program, namely gnuplot, to generate graphs. Gnuplot is a very full-featured graphing program with myriad options. Gretl gives you direct access, via a graphical interface, to a subset of these options and it tries to choose sensible values for you; it also allows you to take complete control over graph details if you wish. 

With a graph displayed, you can click on the graph window for a pop-up menu with the following options: 

<indent>
• Save as postscript: save the graph in Encapsulated PostScript (EPS) format 
</indent>

<indent>
• Save as PNG: save in Portable Network Graphics format 
</indent>

<indent>
• Save to session as icon: the graph will appear in iconic form when you select "Icon view" from the Session menu 
</indent>

<indent>
• Zoom: lets you select an area within the graph for closer inspection 
</indent>

<indent>
• Print: (on the Gnome desktop and MS Windows only) lets you print the graph directly 
</indent>

<indent>
• Copy to clipboard: (MS Windows only) lets you paste the graph into Windows applications such as MS Word 
</indent>

<indent>
• Edit: opens a controller for the plot which lets you adjust various aspects of its appearance 
</indent>

<indent>
• Close: closes the graph window 
</indent>

If you know something about gnuplot and wish to get finer control over the appearance of a graph than is available via the graphical controller ("Edit" option), you have two further options: 

<indent>
• Once the graph is saved as a session icon, you can right-click on its icon for a further pop-up menu. One of the options here is "Edit plot commands", which opens an editing window with the actual gnuplot commands displayed. You can edit these commands and either save them for future processing or send them to gnuplot (with the execute toolbar icon in the plot commands editing window). 
</indent>

<indent>
• Another way to save the plot commands (or to save the displayed plot in formats other than EPS or PNG) is to use the "Edit" item on a graph's pop-up menu to invoke the graphical controller, then click on the "Output to file" tab in the controller. You are then presented with a drop-down menu of formats in which to save the graph. 
</indent>

To find out more about gnuplot, see <@url="www.gnuplot.info">. 

# graphpg Graphs "Gretl graph page"

The session "graph page" will work only if you have the LaTeX typesetting system installed, and are able to generate and view PDF or PostScript output. 

In the session icon window, you can drag up to eight graphs onto the graph page icon. When you double-click on the graph page (or right-click and select "Display"), a page containing the selected graphs will be composed and opened in a suitable viewer. From there you should be able to print the page. 

To clear the graph page, right-click on its icon and select "Clear". 

Note that on systems other than MS Windows, you may have to adjust the setting for the program used to view PDF or PostScript files. Find that under the "Programs" tab in the gretl Preferences dialog box (under the Tools menu in the main window). 

It's also possible to operate on the graph page via script, or using the console (in the GUI program). The following commands and options are supported: 

To add a graph to the graph page, issue the command <@lit="graphpg add"> after saving a named graph, as in 

<code>          
   grf1 <- gnuplot Y X
   graphpg add
</code>

To display the graph page: <@lit="graphpg show">. 

To clear the graph page: <@lit="graphpg free">. 

To adjust the scale of the font used in the graph page, use <@lit="graphpg fontscale"> <@var="scale">, where <@var="scale"> is a multiplier (with a default of 1.0). Thus to make the font size 50 percent bigger than the default you can do 

<code>          
   graphpg fontscale 1.5
</code>

To call for printing of the graph page to file, use the flag <@opt="--⁠output="> plus a filename; the filename should have the suffix "<@lit=".pdf">", "<@lit=".ps">" or "<@lit=".eps">". For example: 

<code>          
   graphpg --output="myfile.pdf"
</code>

In this context the output uses colored lines by default; to use dot/dash patterns instead of colors you can append the <@opt="--⁠monochrome"> flag. 

Script command: <@ref="graphpg">

# 3-D Graphs "3-dimensional plots"

You can manipulate the 3-D plot with the mouse (rotate it, and expand or shrink the axes). 

In composing a 3-D plot, note that the Z-axis will be shown as the vertical axis. Thus if you have some dependent variable that you think may be influenced by two independent variables, you should put the dependent variable on the Z-axis, and the independent variables on the X and Y axes. 

Unlike most other gretl graphs, 3-D plots are controlled by gnuplot rather than gretl itself. The gretl graph-editing menu is not available. 

# gui-funcs Programming "Special functions"

This dialog enables you to specify which functions within a package, if any, should be assigned to certain special roles. Note that a given function can be assigned to at most one of the following roles, and to qualify as a candidate for one of these roles a function has to satisfy certain criteria. 

<indent>
• <@lit="bundle-print">: prints output based on the content of a bundle produced by your package. Criteria: this function must have as its first parameter a bundle-pointer. If a second parameter is present it must take the form of an integer switch that has a default value. 
</indent>

<indent>
• <@lit="bundle-plot">: produces one or more plots using a bundle produced by your package. Criteria: as for <@lit="bundle-print">. 
</indent>

<indent>
• <@lit="bundle-test">: carries out some sort of statistical test using a bundle produced by your package. Criteria: as for <@lit="bundle-print">. 
</indent>

<indent>
• <@lit="gui-main">: the public interface that should be presented to users by default in GUI use. This is useful only if the package has more than one public interface. 
</indent>

<indent>
• <@lit="gui-precheck">: gate-keeper function which returns 0 if the functionality of your package is applicable in the current context, non-zero otherwise. This is intended for use with packages that operate on a model in some way, to screen out types of model that are not handled by the package. 
</indent>

# gui-htest Tests "Test statistic calculator"

Gretl's test calculator computes test statistics and p-values for various common hypothesis tests concerning one or two populations. The required input takes the form of sample statistics derived from one or two samples, depending on the test chosen. These statistics can be typed in as numerical values. Alternatively, if you have a data file open, you can get gretl to calculate sample statistics for a selected variable or variables (in the case of means and variances, but not in the case of proportions). 

If you want to base your test on a variable in the data set, first activate this option by checking the box titled "Use variable from dataset". Then the drop-down list of variables will become active and you can select a variable. When you select a variable from the list, the relevant statistics are automatically entered in the boxes below. 

In addition to the simple selection of a variable, you have the option of specifying a restriction on the selected variable (that is, defining a sub-sample). For example, suppose you have wage data in a variable called "wage" and you also have a dummy variable called "gender" that equals 1 for males and 0 for females (or vice versa). Then, in the test for the difference of two means, you could select "wage" in both slots, but add to the top slot "(gender=0)" and to the bottom "(gender=1)". This would then give you a test for the difference between mean male income and mean female income. Note that when you type a restriction in this way, you must then press the Enter key to have the sample statistics calculated. 

The sub-sampling restriction must be placed in parentheses following the selected variable, and in general the restriction takes the form "var2 op val", where var2 is the name of a variable in the current data set, val is a numerical value, and op is a comparison operator chosen from =, !=, <, >, <= or >= (respectively equality, inequality, less than, greater than, less than or equal, and greater than or equal). The spaces around the operator are optional. 

# gui-htest-np Tests "Nonparametric tests"

Under the "Difference test" tab you can carry out a nonparametric test for a difference between two populations or groups, the specific test depending on the option selected. 

Sign test: This test is based on the fact that if two samples, <@itl="x"> and <@itl="y">, are drawn randomly from the same distribution, the probability that <@itl="x"><@sub="i"> > <@itl="y"><@sub="i">, for each observation <@itl="i">, should equal 0.5. The test statistic is <@itl="w">, the number of observations for which <@itl="x"><@sub="i"> > <@itl="y"><@sub="i">. Under the null hypothesis this follows the Binomial distribution with parameters (<@itl="n">, 0.5), where <@itl="n"> is the number of observations. 

Rank sum test: The Wilcoxon rank-sum test is performed. This test proceeds by ranking the observations from both samples jointly, from smallest to largest, then finding the sum of the ranks of the observations from one of the samples. The two samples do not have to be of the same size, and if they differ the smaller sample is used in calculating the rank-sum. Under the null hypothesis that the samples are drawn from populations with the same median, the probability distribution of the rank-sum can be computed for any given sample sizes; and for reasonably large samples a close Normal approximation exists. 

Signed rank test: The Wilcoxon signed-rank test is performed. This is designed for matched data pairs such as, for example, the values of a variable for a sample of individuals before and after some treatment. The test proceeds by finding the differences between the paired observations, <@itl="x"><@sub="i"> – <@itl="y"><@sub="i">, ranking these differences by absolute value, then assigning to each pair a signed rank, the sign agreeing with the sign of the difference. One then calculates <@itl="W"><@sub="+">, the sum of the positive signed ranks. As with the rank-sum test, this statistic has a well-defined distribution under the null that the median difference is zero, which converges to the Normal for samples of reasonable size. 

Under the "Runs test" tab you can carry out a test for the randomness of a given variable, based on the number of runs of consecutive positive or negative values. If you select the option "Use first difference", the variable is differenced prior to the analysis and hence the runs are interpreted as runs of increasing or decreasing values of the original variable. The test statistic is based on a normal approximation to the distribution of the number of runs under the null of randomness. 
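
The tests described above have script counterparts in the <@lit="difftest"> and <@lit="runs"> commands. A sketch using simulated data (the series and the shift of 0.5 are arbitrary): 

<code>
   nulldata 40
   set seed 99
   series x = normal()
   series y = normal() + 0.5
   difftest x y --sign
   difftest x y --rank-sum
   difftest x y --signed-rank
   # runs test on the series itself, then on its first difference
   runs x
   runs x --difference
</code>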

# hausman Tests "Panel diagnostics"

This test is available only after estimating an OLS model using panel data (see also <@lit="setobs">). It tests the simple pooled model against the principal alternatives, the fixed effects and random effects models. 

The fixed effects model allows the intercept of the regression to vary across the cross-sectional units. An <@itl="F">-test is reported for the null hypothesis that the intercepts do not differ. The random effects model decomposes the residual variance into two parts, one part specific to the cross-sectional unit and the other specific to the particular observation. (This estimator can be computed only if the number of cross-sectional units in the data set exceeds the number of parameters to be estimated.) The Breusch–Pagan LM statistic tests the null hypothesis that the pooled OLS estimator is adequate against the random effects alternative. 

The pooled OLS model may be rejected against both of the alternatives, fixed effects and random effects. Provided the unit- or group-specific error is uncorrelated with the independent variables, the random effects estimator is more efficient than the fixed effects estimator; otherwise the random effects estimator is inconsistent and the fixed effects estimator is to be preferred. The null hypothesis for the Hausman test is that the group-specific error is not so correlated (and therefore the random effects model is preferable). A low p-value for this test counts against the random effects model and in favor of fixed effects. 

Menu path: Model window, /Tests/Panel diagnostics

Script command: <@ref="hausman">

# hccme Estimation "Robust standard errors"

You are offered several variant calculations for standard errors that are robust in the presence of heteroskedasticity (and, in the case of the HAC estimator, autocorrelation). 

HC0 produces the original "White's standard errors"; HC1, HC2, HC3 and HC3a are subsequent variations that are generally reckoned to produce superior (more reliable) results. For details of the estimators, see <@bib="MacKinnon and White (Journal of Econometrics, 1985);mackinnon-white85"> or <@bib="Davidson and MacKinnon, Econometric Theory and Methods (Oxford, 2004);davidson-mackinnon04">. The labels given here are those used by Davidson and MacKinnon. Variant "HC3a" is the jackknife, as described in MacKinnon and White; HC3 is a close approximation to the jackknife. 
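As a rough illustration of how these variants differ, here is a Python sketch (not gretl's code) for the simplest possible case, a regression through the origin with a single regressor. The function name hc_variances and the one-regressor setup are assumptions made for brevity; the rescaling of the squared residuals follows the Davidson and MacKinnon labels. The jackknife variant HC3a is not shown. 

```python
def hc_variances(x, y):
    """Robust variance estimates for the slope in y = b*x + u (no constant)."""
    n, k = len(x), 1
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    u = [yi - b * xi for xi, yi in zip(x, y)]      # OLS residuals
    h = [xi * xi / sxx for xi in x]                # leverage of each observation

    def sandwich(omega):
        # (X'X)^-1 X' diag(omega) X (X'X)^-1, written out for one regressor
        return sum(xi * xi * w for xi, w in zip(x, omega)) / sxx ** 2

    hc0 = sandwich([ui ** 2 for ui in u])
    return {
        "HC0": hc0,                                             # White's original
        "HC1": hc0 * n / (n - k),                               # degrees-of-freedom correction
        "HC2": sandwich([ui ** 2 / (1 - hi) for ui, hi in zip(u, h)]),
        "HC3": sandwich([ui ** 2 / (1 - hi) ** 2 for ui, hi in zip(u, h)]),
    }

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.7, 5.6]
v = hc_variances(x, y)
```

Since each leverage value lies between 0 and 1, dividing by (1 – h) or its square can only inflate the squared residuals, so in this sketch HC0 is never larger than HC2, which is never larger than HC3. 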

If you use the HAC estimator for OLS on time-series data, you are able to fine-tune the lag-length using the <@lit="set"> command. Please see the gretl manual or the script commands help file for details. 

When estimating a model via OLS using panel data, the default robust estimator of the covariance matrix is that given by Arellano. The alternative is Beck and Katz's Panel Corrected Standard Errors (PCSE). The latter take into account heteroskedasticity but not autocorrelation. 

Two robust estimators of the covariance matrix are offered for GARCH models: QML is the Quasi-Maximum Likelihood Estimator, and BW is the Bollerslev-Wooldridge estimator. 

# hsk Estimation "Heteroskedasticity-corrected estimates"

This command is applicable where heteroskedasticity is present in the form of an unknown function of the regressors which can be approximated by a quadratic relationship. In that context it offers the possibility of consistent standard errors and more efficient parameter estimates as compared with OLS. 

The procedure involves (a) OLS estimation of the model of interest, followed by (b) an auxiliary regression to generate an estimate of the error variance, then finally (c) weighted least squares, using as weight the reciprocal of the estimated variance. 

In the auxiliary regression (b) we regress the log of the squared residuals from the first OLS on the original regressors and their squares. The log transformation is performed to ensure that the estimated variances are non-negative. Call the fitted values from this regression <@itl="u"><@sup="*">. The weight series for the final WLS is then formed as 1/exp(<@itl="u"><@sup="*">). 
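The three steps can be sketched in Python for a model with a constant and one regressor. This is an illustration under stated assumptions, not gretl's implementation: the helper names solve, wls and hsk are hypothetical, the data are simulated, and the regressions are solved via the normal equations for compactness. 

```python
import math
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z

def wls(X, y, w=None):
    """Weighted least squares via the normal equations (w=None gives OLS)."""
    w = w or [1.0] * len(y)
    k = len(X[0])
    xtx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, X)) for j in range(k)]
           for i in range(k)]
    xty = [sum(wi * r[i] * yi for wi, r, yi in zip(w, X, y)) for i in range(k)]
    return solve(xtx, xty)

def hsk(x, y):
    X = [[1.0, xi] for xi in x]
    b = wls(X, y)                                            # (a) initial OLS
    e2 = [(yi - b[0] - b[1] * xi) ** 2 for xi, yi in zip(x, y)]
    Z = [[1.0, xi, xi * xi] for xi in x]                     # regressor and its square
    g = wls(Z, [math.log(v) for v in e2])                    # (b) auxiliary regression
    ustar = [g[0] + g[1] * xi + g[2] * xi * xi for xi in x]  # fitted log variances
    weights = [1.0 / math.exp(u) for u in ustar]
    return wls(X, y, weights), weights                       # (c) weighted least squares

random.seed(1)
x = [i / 10 for i in range(1, 101)]
y = [1 + 2 * xi + random.gauss(0, 0.5 * xi) for xi in x]    # error s.d. grows with x
coef, weights = hsk(x, y)
```

Because the weights are formed by exponentiating fitted values, they are strictly positive whatever the auxiliary regression returns. 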

Menu path: /Model/Other linear models/Heteroskedasticity corrected

Script command: <@ref="hsk">

# hurst Statistics "Hurst exponent"

Calculates the Hurst exponent (a measure of persistence or long memory) for a time-series variable having at least 128 observations. 

The Hurst exponent is discussed by Mandelbrot. In theoretical terms it is the exponent, <@itl="H">, in the relationship 

  <@fig="hurst">

where RS is the "rescaled range" of the variable <@itl="x"> in samples of size <@itl="n"> and <@itl="a"> is a constant. The rescaled range is the range (maximum minus minimum) of the cumulated value or partial sum of <@itl="x"> over the sample period (after subtraction of the sample mean), divided by the sample standard deviation. 

As a reference point, if <@itl="x"> is white noise (zero mean, zero persistence) then the range of its cumulated "wandering" (which forms a random walk), scaled by the standard deviation, grows as the square root of the sample size, giving an expected Hurst exponent of 0.5. Values of the exponent significantly in excess of 0.5 indicate persistence, and values less than 0.5 indicate anti-persistence (negative autocorrelation). In principle the exponent is bounded by 0 and 1, although in finite samples it is possible to get an estimated exponent greater than 1. 

In gretl, the exponent is estimated using binary sub-sampling: we start with the entire data range, then the two halves of the range, then the four quarters, and so on. For sample sizes smaller than the data range, the RS value is the mean across the available samples. The exponent is then estimated as the slope coefficient in a regression of the log of RS on the log of sample size. 
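The following Python sketch mirrors the procedure just described. It is not gretl's code: the minimum sub-sample size of 8 and the use of the sample standard deviation (divisor n – 1) are assumptions, and the function names are hypothetical. 

```python
import math
import random
import statistics

def rescaled_range(x):
    """Range of the cumulated deviations from the mean, over the standard deviation."""
    mean = statistics.fmean(x)
    total, partial = 0.0, []
    for xi in x:
        total += xi - mean
        partial.append(total)
    return (max(partial) - min(partial)) / statistics.stdev(x)

def hurst(x, min_size=8):
    """Slope of log(RS) on log(sample size), using binary sub-sampling."""
    logn, logrs = [], []
    size, pieces = len(x), 1
    while size >= min_size:
        chunks = [x[i * size:(i + 1) * size] for i in range(pieces)]
        rs = statistics.fmean([rescaled_range(c) for c in chunks])  # mean RS at this size
        logn.append(math.log(size))
        logrs.append(math.log(rs))
        size //= 2          # halve the sample size ...
        pieces *= 2         # ... and double the number of sub-samples
    mn, mr = statistics.fmean(logn), statistics.fmean(logrs)
    sxy = sum((a - mn) * (b - mr) for a, b in zip(logn, logrs))
    sxx = sum((a - mn) ** 2 for a in logn)
    return sxy / sxx

random.seed(42)
noise = [random.gauss(0, 1) for _ in range(512)]
H = hurst(noise)   # white noise: expect a value in the general vicinity of 0.5
```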

Menu path: /Variable/Hurst exponent

Script command: <@ref="hurst">

# intreg Estimation "Interval regression model"

Estimates an interval regression model. This model arises when the dependent variable is imperfectly observed for some (possibly all) observations. In other words, the data generating process is assumed to be 

  <@itl="y* = x b + u">

but we only observe <@itl="m <= y* <= M"> (the interval may be left- or right-unbounded). Note that for some observations <@itl="m"> may equal <@itl="M">. The variables <@var="minvar"> and <@var="maxvar"> must contain <@lit="NA">s for left- and right-unbounded observations, respectively. 

In the model specification dialog, <@var="minvar"> and <@var="maxvar"> are identified as the Lower bound variable and the Upper bound variable respectively. 

The model is estimated by maximum likelihood, assuming normality of the disturbance term. 
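To make the likelihood concrete, the Python sketch below gives the contribution of a single observation under the normality assumption. It uses the standard textbook form rather than gretl's source; the function name loglik_obs and the use of None to stand in for NA are assumptions made for illustration. 

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def loglik_obs(m, M, xb, sigma):
    """Log-likelihood of one observation; None marks an unbounded side (NA)."""
    if m is not None and M is not None and m == M:
        # point observation: ordinary normal density
        return math.log(norm_pdf((m - xb) / sigma) / sigma)
    upper = 1.0 if M is None else norm_cdf((M - xb) / sigma)
    lower = 0.0 if m is None else norm_cdf((m - xb) / sigma)
    # probability that y* falls in the observed interval
    return math.log(upper - lower)

left_unbounded = loglik_obs(None, 0.0, 0.0, 1.0)   # P(y* <= 0) when x b = 0
point = loglik_obs(0.0, 0.0, 0.0, 1.0)             # m equals M
interval = loglik_obs(-1.0, 1.0, 0.0, 1.0)         # P(-1 <= y* <= 1)
```

The sample log-likelihood is the sum of these terms over all observations, maximized with respect to the coefficients and the standard deviation of the disturbance. 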

By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. 

Menu path: /Model/Limited dependent variable/Interval regression

Script command: <@ref="intreg">

# irfboot Graphs "Impulse response plots"

If you select the bootstrap option when plotting impulse responses, gretl computes a confidence interval for the responses using the bootstrap method. The residuals from the original VAR (or VECM) are resampled with replacement; an artificial dataset is constructed based on the original parameter estimates and the resampled residuals; the system is re-estimated and the impulse responses are re-evaluated. This is repeated 999 times and the α/2 and 1 – α/2 quantiles for the responses are found and plotted along with the point estimates. This option is not currently available for restricted VECMs. 

This dialog also supports reordering of the variables for the Cholesky decomposition of the cross-equation covariance matrix. The default is given by the order in which the variables are entered into the model specification, but the up and down arrows can be used to promote or demote a selected variable. 

# kalman Estimation "Kalman filter"

Opens a block of statements to set up a Kalman filter. This block should end with the line <@lit="end kalman">, to which options may be appended. The intervening lines specify the matrices that compose the filter. For example, 

<code>          
   kalman 
     obsy y
     obsymat H
     statemat F
     statevar Q
   end kalman
</code>

Please see <@pdf="the Gretl User's Guide"> for details. 

See also <@xrf="kfilter">, <@xrf="ksimul">, <@xrf="ksmooth">. 

Script command: <@ref="kalman">

# kpss Tests "KPSS stationarity test"

Computes the KPSS test (Kwiatkowski, Phillips, Schmidt and Shin, Journal of Econometrics, 1992) for stationarity of the given variable (or its first difference, if the differencing option is selected). The null hypothesis is that the variable in question is stationary, either around a level or, if the "include a trend" box is checked, around a deterministic linear trend. 

The selected lag order determines the size of the window used for Bartlett smoothing. If the "show regression results" box is checked the results of the auxiliary regression are printed, along with the estimated variance of the random walk component of the variable. 
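The role of the lag order can be seen in the following Python sketch of the level-stationarity statistic, written from the textbook definition rather than from gretl's source. The function name kpss_stat is hypothetical, and the trend case (which would detrend the series first) is not shown. 

```python
def kpss_stat(y, lags):
    """KPSS statistic for stationarity around a level."""
    T = len(y)
    mean = sum(y) / T
    e = [yi - mean for yi in y]                # residuals from regression on a constant
    s, total = 0.0, 0.0
    for ei in e:
        s += ei                                # partial sum of residuals
        total += s * s
    lrv = sum(ei * ei for ei in e) / T         # lag-0 term of the long-run variance
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1)               # Bartlett weight, declining in the lag
        gamma = sum(e[t] * e[t - j] for t in range(j, T)) / T
        lrv += 2.0 * w * gamma
    return total / (T * T * lrv)

stat0 = kpss_stat([1, 2, 3, 4], 0)
stat1 = kpss_stat([1, 2, 3, 4], 1)
```

A larger lag order widens the Bartlett window, so more autocovariances enter the long-run variance estimate in the denominator. 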

The critical values shown for the test statistic are based on the response surfaces estimated by <@bib="Sephton (Economics Letters, 1995);sephton95">, which are more accurate for small samples than the values given in the original KPSS article. When the test statistic lies between the 10 percent and 1 percent critical values a p-value is shown; this is obtained by linear interpolation and should not be taken too literally. 

Menu path: /Variable/Unit root tests/KPSS test

Script command: <@ref="kpss">

# lad Estimation "Least Absolute Deviation estimation"

Calculates a regression that minimizes the sum of the absolute deviations of the observed from the fitted values of the dependent variable. Coefficient estimates are derived using the Barrodale–Roberts simplex algorithm; a warning is printed if the solution is not unique. 

Standard errors are derived using the bootstrap procedure with 500 drawings. The covariance matrix for the parameter estimates, printed when the <@opt="--⁠vcv"> flag is given, is based on the same bootstrap. 

Menu path: /Model/Robust estimation/Least Absolute Deviation

Script command: <@ref="lad">

# lags-dialog Estimation "Lag selection box"

In this dialog you can select the lag order for the independent variables in a time-series model, and in some cases for the dependent variable also. (But note that the common lag order for vector models such as VARs and VECMs is handled separately, via a selection spinner in the main model dialog box.) 

The spinners on the left let you select a range of consecutive lags for any given variable. To specify non-consecutive lags, click the check box next to the entry field titled "specific lags". This activates the entry box, into which you can type a list of lags, separated by spaces. 

The row marked "default" offers a quick way to set a common lag specification for all the independent variables: values set in that row are copied to all the others (apart from the dependent variable, if present). 

The dependent variable is treated specially: the minimum lag must be zero, which places the current value of the variable on the left-hand side of the model. Any higher lags appear with the independent variables on the right-hand side of the model. 

Values selected in this dialog are remembered for the duration of your session with a given dataset. 

# leverage Tests "Influential observations"

Must follow an <@lit="ols"> command. Calculates the leverage (<@itl="h">, which must lie in the range 0 to 1) for each data point in the sample on which the previous model was estimated. Displays the residual (<@itl="u">) for each observation along with its leverage and a measure of its influence on the estimates, <@itl="uh">/(1 – <@itl="h">). "Leverage points" for which the value of <@itl="h"> exceeds 2<@itl="k">/<@itl="n"> (where <@itl="k"> is the number of parameters being estimated and <@itl="n"> is the sample size) are flagged with an asterisk. For details on the concepts of leverage and influence see <@bib="Davidson and MacKinnon (1993);davidson-mackinnon93">, Chapter 2. 
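For a regression with a constant and a single regressor the leverage has a simple closed form, which the following Python sketch uses to reproduce the quantities listed above. It is an illustration only; the function name leverage_table and the data are made up. 

```python
def leverage_table(x, y):
    """Residual, leverage, influence and flag for y = a + b*x estimated by OLS."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rows = []
    for xi, yi in zip(x, y):
        u = yi - a - b * xi                       # residual
        h = 1.0 / n + (xi - xbar) ** 2 / sxx      # leverage
        rows.append({"u": u, "h": h,
                     "influence": u * h / (1.0 - h),
                     "flag": h > 2.0 * k / n})    # marked with an asterisk in the output
    return rows

rows = leverage_table([1.0, 2.0, 3.0, 4.0, 10.0], [1.1, 2.3, 2.9, 4.2, 9.5])
```

The leverage values sum to the number of parameters (here 2), and only the last observation, whose x value lies far from the others, exceeds the 2k/n threshold. 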

DFFITS values are also computed: these are "studentized residuals" (predicted residuals divided by their standard errors) multiplied by <@fig="dffit">. For discussions of studentized residuals and DFFITS see chapter 12 of <@bib="Maddala's Introduction to Econometrics;maddala92"> or <@bib="Belsley, Kuh and Welsch (1980);belsley-etal80">. 

Briefly, a "predicted residual" is the difference between the observed value of the dependent variable at observation <@itl="t">, and the fitted value for observation <@itl="t"> obtained from a regression in which that observation is omitted (or a dummy variable with value 1 for observation <@itl="t"> alone has been added); the studentized residual is obtained by dividing the predicted residual by its standard error. 

The "+" icon at the top of the leverage test window brings up a dialog box that allows you to save one or more of the test variables to the current data set. 

After execution, the <@lit="$test"> accessor returns the cross-validation criterion, which is defined as the sum of squared deviations of the dependent variable from its forecast value, the forecast for each observation being based on a sample from which that observation is excluded. (This is known as the <@itl="leave-one-out"> estimator.) For a broader discussion of the cross-validation criterion, see Davidson and MacKinnon's <@itl="Econometric Theory and Methods">, pages 685–686, and the references therein. 

Menu path: Model window, /Tests/Influential observations

Script command: <@ref="leverage">

# levinlin Tests "Levin-Lin-Chu test"

Carries out the panel unit-root test described by <@bib="Levin, Lin and Chu (2002);LLC2002">. The null hypothesis is that all of the individual time series exhibit a unit root, and the alternative is that none of the series has a unit root. (That is, a common AR(1) coefficient is assumed, although in other respects the statistical properties of the series are allowed to vary across individuals.) 

Menu path: /Variable/Unit root tests/Levin-Lin-Chu test

Script command: <@ref="levinlin">

# loess Estimation "Loess"

Performs locally-weighted polynomial regression and produces a series containing predicted values of the dependent variable for each non-missing value of the independent variable. The method is as described by <@bib="William Cleveland (1979);cleveland79">. 

The controls allow you to specify the order of the polynomial in the independent variable and the proportion of the data points to be used in each local regression (the bandwidth). Higher values of the bandwidth produce a smoother outcome. 

If the robust weights box is checked the local regression procedure is iterated twice, with the weights being modified based on the residuals from the previous iteration so as to give less influence to outliers. 

# logistic Estimation "Logistic regression"

Logistic regression: carries out an OLS regression using the logistic transformation of the dependent variable, 

  <@fig="logistic1">

The dependent variable must be strictly positive. If all its values lie between 0 and 1, the default is to use a <@itl="y"><@sup="*"> value (the asymptotic maximum of the dependent variable) of 1; if its values lie between 0 and 100, the default <@itl="y"><@sup="*"> is 100. 

You may specify a different maximum <@itl="y"> value. Note that the supplied value must be greater than all of the observed values of the dependent variable. 

The fitted values and residuals from the regression are automatically transformed using 

  <@fig="logistic2">

where <@itl="x"> represents either a fitted value or a residual from the OLS regression using the transformed dependent variable. The reported values are therefore comparable with the original dependent variable. 
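The two transformations are inverses of each other, as the following Python sketch shows. It assumes that the forward transformation is log(y/(y* – y)) and that the back-transformation is y*/(1 + exp(–x)); the function names are hypothetical. 

```python
import math

def to_logistic(y, ymax=1.0):
    """Transform applied to the dependent variable before the OLS regression."""
    return math.log(y / (ymax - y))

def from_logistic(x, ymax=1.0):
    """Map a value on the transformed scale back to the original scale."""
    return ymax / (1.0 + math.exp(-x))

z = to_logistic(37.5, ymax=100.0)      # a percentage, so y* = 100
back = from_logistic(z, ymax=100.0)    # recovers 37.5 up to rounding
```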

Note that if the dependent variable is binary, you should use the <@ref="logit"> command instead. 

Menu path: /Model/Limited dependent variable/Logistic

Script command: <@ref="logistic">

# logit Estimation "Logit regression"

If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the Newton–Raphson method. As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the <@opt="--⁠p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant. 
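The slope calculation can be sketched in Python as follows. This uses the standard derivative of the logistic probability, evaluated at the regressor means; it is an illustration, not gretl's code, and the function name logit_slopes is hypothetical. The constant is handled by including a 1 in the vector of means. 

```python
import math

def logit_slopes(beta, xmeans):
    """Derivative of P(y = 1) with respect to each regressor, at the means."""
    xb = sum(b * x for b, x in zip(beta, xmeans))   # index function at the means
    p = 1.0 / (1.0 + math.exp(-xb))                 # fitted probability at the means
    return [b * p * (1.0 - p) for b in beta]

# constant of 0 and one regressor with coefficient 2 and mean 0
slopes = logit_slopes([0.0, 2.0], [1.0, 0.0])
```

The factor p(1 – p) is at most 0.25, so a slope at the means is never more than a quarter of the corresponding coefficient in absolute value. 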

By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details. 

If the dependent variable is not binary but is discrete, then by default it is interpreted as an ordinal response, and Ordered Logit estimates are obtained. However, if the <@opt="--⁠multinomial"> option is given, the dependent variable is interpreted as an unordered response, and Multinomial Logit estimates are produced. (In either case, if the variable selected as dependent is not discrete an error is flagged.) In the multinomial case, the accessor <@lit="$mnlprobs"> is available after estimation, to get a matrix containing the estimated probabilities of the outcomes at each observation (observations in rows, outcomes in columns). 

If you want to use logit for analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic, at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present or not) you should not use the <@lit="logit"> command, but rather construct the logit variable, as in 

<code>          
   series lgt_p = log(p/(1 - p))
</code>

and use this as the dependent variable in an OLS regression. See chapter 12 of <@bib="Ramanathan (2002);ramanathan02">. 

Menu path: /Model/Limited dependent variable/Logit

Script command: <@ref="logit">

# mahal Statistics "Mahalanobis distances"

Computes the Mahalanobis distances between the series in <@var="varlist">. The Mahalanobis distance is the distance between two points in a <@itl="k">-dimensional space, scaled by the statistical variation in each dimension of the space. For example, if <@itl="p"> and <@itl="q"> are two observations on a set of <@itl="k"> variables with covariance matrix <@itl="C">, then the Mahalanobis distance between the observations is given by 

  <@fig="mahal">

where (<@itl="p"> – <@itl="q">) is a <@itl="k">-vector. This reduces to Euclidean distance if the covariance matrix is the identity matrix. 

The space for which distances are computed is defined by the selected variables. For each observation in the current sample range, the distance is computed between the observation and the centroid of the selected variables. This distance is the multidimensional counterpart of a standard <@itl="z">-score, and can be used to judge whether a given observation "belongs" with a group of other observations. 
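For two variables the calculation can be written out in a few lines of Python. This is a sketch with hypothetical function names and made-up data; it assumes the sample covariance matrix (divisor n – 1). 

```python
import math

def mahalanobis_2d(p, q, C):
    """Distance between points p and q given a 2x2 covariance matrix C."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    inv = [[C[1][1] / det, -C[0][1] / det],
           [-C[1][0] / det, C[0][0] / det]]
    d = [p[0] - q[0], p[1] - q[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(quad)

def distances_from_centroid(xs, ys):
    """Distance of each observation from the centroid of the two series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    C = [[sxx, sxy], [sxy, syy]]
    return [mahalanobis_2d((x, y), (mx, my), C) for x, y in zip(xs, ys)]

euclid = mahalanobis_2d((0.0, 0.0), (3.0, 4.0), [[1.0, 0.0], [0.0, 1.0]])
dist = distances_from_centroid([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0])
```

With the identity matrix as covariance the first call returns the Euclidean distance, 5. 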

If the number of variables selected is 4 or fewer, the covariance matrix and its inverse are printed. Clicking the "+" button at the top of the window displaying the distances gives you the option of adding the distances to the dataset as a new variable. 

Menu path: /View/Mahalanobis distances

Script command: <@ref="mahal">

# meantest Tests "Difference of means"

Calculates the t statistic for the null hypothesis that the population means are equal for two selected series, and shows its p-value. 

By default the test statistic is calculated on the assumption that the variances are equal for the two variables; with the <@opt="--⁠unequal-vars"> option the variances are assumed to be different. This will make a difference to the test statistic only if there are different numbers of non-missing observations for the two series. 
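The effect of the variance assumption on the statistic can be checked with a short Python sketch. It uses the usual pooled-variance and unequal-variance formulas; the function name mean_test and the data are invented for illustration. 

```python
import math
import statistics

def mean_test(a, b, equal_vars=True):
    """t statistic for equality of the means of two samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    if equal_vars:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    else:
        se = math.sqrt(v1 / n1 + v2 / n2)
    return (statistics.fmean(a) - statistics.fmean(b)) / se

a = [1.0, 2.0, 3.0, 4.0]
b_same_n = [2.0, 4.0, 6.0, 8.0]
b_more = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

same = (mean_test(a, b_same_n, True), mean_test(a, b_same_n, False))
diff = (mean_test(a, b_more, True), mean_test(a, b_more, False))
```

With equal sample sizes the two standard errors coincide, so the statistics agree; with unequal sample sizes they differ. 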

Menu path: /Tools/Test statistic calculator

Script command: <@ref="meantest">

# missing Dataset "Missing data values"

Set a numerical value that will be interpreted as "missing" or "not available", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Sample menu). 

Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations. 

# menu-attach Programming "Menu attachment"

This dialog enables you to specify a menu attachment for a function package. To do this you must complete the three fields in the dialog box. 

<@itl="Label"> 

This requires a short label string, which will appear as the menu entry for the package. 

<@itl="Window"> 

Select "model window" for a function package that does something with a gretl model, and should appear in the menu bar in a gretl model window. Otherwise, select "main window". 

<@itl="Menu tree"> 

Select the position within the menu tree (for either the main window or the model window, as chosen above) where the entry for the package should appear. 

# mle Estimation "Maximum likelihood estimation"

Performs Maximum Likelihood (ML) estimation using either the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm or Newton's method. You must specify the log-likelihood function; it is recommended that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible. 

Simple example: Suppose we have a series <@lit="X"> with values 0 or 1 and we wish to obtain the maximum likelihood estimate of the probability, <@lit="p">, that <@lit="X"> = 1. (In this simple case we can guess in advance that the ML estimate of <@lit="p"> will simply equal the proportion of Xs equal to 1 in the sample.) 

The parameter <@lit="p"> must first be added to the dataset and given an initial value. This can be done using the genr command or via menu choices. Appropriate "genr" lines may be typed into the MLE specification window prior to the specification of the log-likelihood function. 

In the MLE window we type the following lines: 

<code>          
   loglik = X*log(p) + (1-X)*log(1-p)
   deriv p = X/p - (1-X)/(1-p)
</code>

The first line specifies the log-likelihood function, and the next line supplies the derivative of that function with respect to the parameter p. If no "deriv" lines are given, a numerical approximation to the derivatives is computed. 

If the parameter p was not previously declared we could preface the above lines with something like the following: 

<code>          
   scalar p = 0.5
</code>
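As a check on the simple example, the following Python sketch evaluates the same log-likelihood and derivative on a small made-up sample. It confirms that the derivative vanishes at the sample proportion, so that is where the log-likelihood is maximized. The function names and the data are for illustration only. 

```python
import math

def loglik(p, xs):
    """Bernoulli log-likelihood, as in the loglik line of the example."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in xs)

def score(p, xs):
    """Derivative with respect to p, as in the deriv line of the example."""
    return sum(x / p - (1 - x) / (1 - p) for x in xs)

X = [1, 0, 0, 1, 1, 0, 1, 1]
p_hat = sum(X) / len(X)      # proportion of ones in the sample
```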

By default, standard errors are based on the Outer Product of the Gradient. If the robust standard errors box is checked, a QML estimator is used (namely, a sandwich of the negative inverse of the Hessian and the covariance matrix of the gradient). The Hessian is approximated numerically. 

For a much more in-depth description of <@lit="mle">, please refer to <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Maximum likelihood

Script command: <@ref="mle">

# modeltab Utilities "The model table"

In econometric research it is common to estimate several models with a common dependent variable—the models differing in respect of which independent variables are included, or perhaps in respect of the estimator used. In this situation it is convenient to present the regression results in the form of a table, where each column contains the results (coefficient estimates and standard errors) for a given model, and each row contains the estimates for a given variable across the models. 

Gretl provides a means of constructing such a table (and copying it in plain text, LaTeX or Rich Text Format). Here is how to do it: 

<indent>
1. Estimate a model which you wish to include in the table, and in the model display window, under the File menu, select "Save to session as icon" or "Save as icon and close". 
</indent>

<indent>
2. Repeat step 1 for the other models to be included in the table (up to a total of six models). 
</indent>

<indent>
3. When you are done estimating the models, open the icon view of your gretl session (by selecting "icon view" under the Session menu in the main gretl window, or by clicking the "session icon view" icon on the gretl toolbar). 
</indent>

<indent>
4. In session icon view, there is an icon labeled "Model table". Decide which model you wish to appear in the left-most column of the model table and add it to the table, either by dragging its icon onto the Model table icon, or by right-clicking on the model icon and selecting "Add to model table" from the pop-up menu. 
</indent>

<indent>
5. Repeat step 4 for the other models you wish to include in the table. The second model selected will appear in the second column from the left, and so on. 
</indent>

<indent>
6. When you are finished composing the model table, display it by double-clicking on its icon. Under the Edit menu in the window which appears, you have the option of copying the table to the clipboard in various formats. 
</indent>

<indent>
7. If the ordering of the models in the table is not what you wanted, right-click on the model table icon and select "Clear table". Then go back to step 4 above and try again. 
</indent>

Menu path: Session icon window, Model table icon

Script command: <@ref="modeltab">

# mpols Estimation "Multiple-precision OLS"

Computes OLS estimates for the specified model using multiple precision floating-point arithmetic, with the help of the GNU Multiple Precision (GMP) library. By default 256 bits of precision are used for the calculations, but this can be increased via the environment variable <@lit="GRETL_MP_BITS">. For example, when using the bash shell one could issue the following command, before starting gretl, to set a precision of 1024 bits: 

<code>          
   export GRETL_MP_BITS=1024
</code>

Menu path: /Model/Other linear models/High precision OLS

Script command: <@ref="mpols">

# nadarwat Estimation "Nadaraya-Watson"

Computes the Nadaraya–Watson nonparametric estimator of the conditional mean of the dependent variable, <@itl="m(x)">, for each non-missing value of the independent variable. 

The kernel function <@itl="K"> is given by <@itl="K = exp(-x"><@sup="2"><@itl=" / 2h)"> for <@itl="|x| < T"> and zero otherwise. 

The bandwidth, usually a small number, controls the smoothness of <@itl="m(x)"> (higher values producing a smoother series); the default value is <@itl="n"><@sup="-0.2">. 

If the "leave-one-out" box is checked, a variant of the estimator is employed in which the <@itl="i">-th observation is not used in evaluating <@itl="m(x"><@sub="i"><@itl=")">. This makes the Nadaraya–Watson estimator more robust numerically and its usage is normally advised when the estimator is computed for inference purposes. 
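A minimal Python sketch of the estimator follows. It is not gretl's implementation: it assumes a Gaussian kernel applied to the distance between observations scaled by the bandwidth, applies no trimming, and uses a hypothetical function name. The default bandwidth is taken from the text above. 

```python
import math

def nadaraya_watson(y, x, h=None, leave_one_out=False):
    """Kernel-weighted average of y at each value of x."""
    n = len(x)
    if h is None:
        h = n ** -0.2                      # default bandwidth
    m = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if leave_one_out and i == j:
                continue                   # skip the i-th observation itself
            u = (x[i] - x[j]) / h
            k = math.exp(-0.5 * u * u)     # Gaussian kernel weight
            num += k * y[j]
            den += k
        m.append(num / den)
    return m

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
fit = nadaraya_watson(y, x)
fit_loo = nadaraya_watson(y, x, leave_one_out=True)
```

Each fitted value is a weighted average of the observed values of the dependent variable, so it always lies between their minimum and maximum. 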

# negbin Estimation "Negative Binomial regression"

Estimates a Negative Binomial model. The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the model NegBin 2 is used, in which the conditional variance of the count is given by μ(1 + αμ), where μ denotes the conditional mean. But if the <@opt="--⁠model1"> option is given the conditional variance is μ(1 + α). 

The optional <@lit="offset"> series works in the same way as for the <@ref="poisson"> command. The Poisson model is a restricted form of the Negative Binomial in which α = 0 by construction. 

By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the <@opt="--⁠opg"> option is given the covariance matrix is based on the Outer Product of the Gradient (OPG), or if the <@opt="--⁠robust"> option is given QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the OPG. 

Menu path: /Model/Limited dependent variable/Count data...

Script command: <@ref="negbin">

# nls Estimation "Nonlinear Least Squares"

Performs Nonlinear Least Squares (NLS) estimation using a modified version of the Levenberg–Marquardt algorithm. You must supply a function specification; it is recommended but not required that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible. If you do not supply derivatives you should instead give a list of the parameters to be estimated (separated by spaces or commas), preceded by the keyword <@lit="params">; these can be either scalars, or vectors, or any combination of the two. 

Example: Suppose we have a data set with variables <@itl="C"> and <@itl="Y"> (e.g. <@lit="greene11_3.gdt">) and we wish to estimate a nonlinear consumption function of the form 

  <@fig="greene_Cfunc">

The parameters alpha, beta and gamma must first be added to the dataset and given initial values. Appropriate lines may be typed into the NLS specification window prior to the function specification. 

In the NLS window we type the following lines: 

<code>          
   C = alpha + beta * Y^gamma
   deriv alpha = 1
   deriv beta = Y^gamma
   deriv gamma = beta * Y^gamma * log(Y)
</code>

The first line specifies the regression function, and the next three lines supply the derivatives of that function with respect to each of the parameters in turn. If the "deriv" lines are not given, a numerical approximation to the Jacobian is computed. 

If the parameters alpha, beta and gamma were not previously declared we could preface the above lines with something like the following: 

<code>          
   scalar alpha = 1
   scalar beta = 1
   scalar gamma = 1
</code>
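When supplying analytical derivatives it is worth checking them against a numerical approximation before estimation. The Python sketch below does this for the three "deriv" lines of the consumption function example, using central differences. The function names and the evaluation point are arbitrary choices made for illustration. 

```python
import math

def f(Y, alpha, beta, gamma):
    """The regression function C = alpha + beta * Y^gamma."""
    return alpha + beta * Y ** gamma

def analytic(Y, alpha, beta, gamma):
    """Derivatives with respect to alpha, beta and gamma, as in the deriv lines."""
    return [1.0, Y ** gamma, beta * Y ** gamma * math.log(Y)]

def numeric(Y, alpha, beta, gamma, eps=1e-6):
    """Central-difference approximation to the same derivatives."""
    theta = [alpha, beta, gamma]
    grads = []
    for i in range(3):
        up, dn = theta[:], theta[:]
        up[i] += eps
        dn[i] -= eps
        grads.append((f(Y, *up) - f(Y, *dn)) / (2 * eps))
    return grads

exact = analytic(3.0, 1.0, 2.0, 1.5)
approx = numeric(3.0, 1.0, 2.0, 1.5)
```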

For further details on NLS estimation please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Nonlinear Least Squares

Script command: <@ref="nls">

# normtest Tests "Normality test"

Carries out a test for normality for the given <@var="series">. The specific test is controlled by the option flags (but if no flag is given, the Doornik–Hansen test is performed). Note: the Doornik–Hansen and Shapiro–Wilk tests are recommended over the others, on account of their superior small-sample properties. 

The test statistic and its p-value may be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue">. Please note that if the <@opt="--⁠all"> option is given, the result recorded is that from the Doornik–Hansen test. 

Menu path: /Variable/Normality test

Script command: <@ref="normtest">

# nulldata Dataset "Creating a blank dataset"

Establishes a "blank" data set, containing only a constant and an index variable, with periodicity 1 and the specified number of observations. This may be used for simulation purposes: some of the <@lit="genr"> commands (e.g. <@lit="genr uniform()">, <@lit="genr normal()">) will generate dummy data from scratch to fill out the data set. This command may be useful in conjunction with <@lit="loop">. See also the "seed" option to the <@ref="set"> command. 

By default, this command cleans out all data in gretl's current workspace. If you give the <@opt="--⁠preserve"> option, however, any currently defined matrices are retained. 

Menu path: /File/New data set

Script command: <@ref="nulldata">

# ols Estimation "Ordinary Least Squares"

Computes ordinary least squares (OLS) estimates for the specified model. 

Besides coefficient estimates and standard errors, the program also prints p-values for <@itl="t"> (two-tailed) and <@itl="F">-statistics. A p-value below 0.01 indicates statistical significance at the 1 percent level and is marked with <@lit="***">. <@lit="**"> indicates significance between 1 and 5 percent and <@lit="*"> indicates significance between the 5 and 10 percent levels. Model selection statistics (the Akaike Information Criterion or AIC and Schwarz's Bayesian Information Criterion) are also printed. The formula used for the AIC is that given by <@bib="Akaike (1974);akaike74">, namely minus two times the maximized log-likelihood plus two times the number of parameters estimated. 
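The AIC formula and the significance markers described above amount to the following, shown here as a small Python sketch with hypothetical function names. 

```python
def aic(loglik, k):
    """Akaike (1974): minus twice the log-likelihood plus twice the parameter count."""
    return -2.0 * loglik + 2.0 * k

def stars(pvalue):
    """Significance marker attached to a p-value in the regression output."""
    if pvalue < 0.01:
        return "***"
    if pvalue < 0.05:
        return "**"
    if pvalue < 0.10:
        return "*"
    return ""

example_aic = aic(-100.0, 3)
```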

Menu path: /Model/Ordinary Least Squares
Other access: Beta-hat button on toolbar

Script command: <@ref="ols">

# omit Tests "Omit variables"

This command re-estimates the given model after omitting the specified variables, or after sequentially omitting insignificant variables if the relevant box is available and is checked. Besides the usual model output, it prints a test for the joint significance of the omitted variables. The null hypothesis is that the true coefficients on all the omitted variables equal zero. 

Sequential elimination works as follows: at each step the variable with the highest p-value is omitted, until all remaining variables have a p-value no greater than some cutoff. The default cutoff is 10 percent (two-sided); this can be adjusted via the spin button. By default this process operates on all variables in the model (apart from the constant). If you want to confine it to a subset of the variables, check the box labeled "Test only selected variables" and make a selection. 
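The elimination loop can be sketched in Python as follows. The callback pvalues_for is a hypothetical stand-in for re-estimating the model on the current set of variables; in the stub used here the p-values are held fixed for simplicity, whereas in a real model they change at each step. 

```python
def sequential_omit(variables, pvalues_for, cutoff=0.10):
    """Drop the least significant variable until all p-values are within the cutoff."""
    kept = list(variables)
    dropped = []
    while kept:
        pvals = pvalues_for(kept)                    # re-estimate on the current set
        worst = max(kept, key=lambda v: pvals[v])    # variable with the highest p-value
        if pvals[worst] <= cutoff:
            break
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped

def fake_pvalues(names):
    # stub: fixed p-values in place of a real re-estimation
    table = {"x1": 0.001, "x2": 0.40, "x3": 0.08, "x4": 0.25}
    return {v: table[v] for v in names}

kept, dropped = sequential_omit(["x1", "x2", "x3", "x4"], fake_pvalues)
```

Here x2 is dropped first, then x4; x3 survives because its p-value does not exceed the 10 percent cutoff. 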

Menu path: Model window, /Tests/Omit variables

Script command: <@ref="omit">

# online Dataset "Access online databases"

Gretl is able to access databases at Wake Forest University (your computer must be connected to the internet for this to work). 

Under the "File, Browse databases" menu, select the item "on database server". A window should appear, showing a listing of the gretl databases available at Wake Forest. (Depending on your location and the speed of your internet connection, this may take a few seconds.) Along with the name of the database and a short description, there will appear a "Local status" entry: this shows whether you have the database installed locally (on the hard drive of your computer) and if so, whether or not it is up to date with the version on the server. 

If you have a given database installed locally, and it is up to date, there is no advantage in accessing it via the server. But for a database that is not already installed and up to date, you may wish to get a listing of the data series: click on "Get series listing". This brings up a further window, from which you can display the values of a chosen data series, graph those values, or import them into gretl's workspace. These tasks can be accomplished using the "Series" menu, or via the popup menu that appears when you click the right mouse button on a given series. You can also search the listing for a variable of interest (the "Find" menu item). 

If you want faster access to the data, or wish to access the database offline, then select the line showing the database you want, in the initial database window, and press the "Install" button. This will download the database in compressed format, then uncompress it and install it on your hard drive. Thereafter you should be able to find it under the "File, Browse databases, gretl native" menu. 

# panel Estimation "Panel models"

Estimates a panel model. By default the fixed effects estimator is used; this is implemented by subtracting the group or unit means from the original data. 

If the "Random effects" button is checked, random effects (GLS) estimates are computed. By default the method of Swamy and Arora is used for the GLS transformation, but the Nerlove method is available as an option. 

For more details on panel estimation, please see <@pdf="the Gretl User's Guide">. 
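
A minimal script sketch of the three variants described above, assuming the Grunfeld sample file supplied with gretl (series invest, value and kstock): 

<code>          
   open grunfeld
   # fixed effects (the default)
   panel invest const value kstock
   # random effects, Swamy and Arora transformation
   panel invest const value kstock --random-effects
   # random effects, Nerlove transformation
   panel invest const value kstock --random-effects --nerlove
</code>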

Menu path: /Model/Panel

Script command: <@ref="panel">

# panel-between Estimation "Between groups model"

This dialog allows you to enter a specification for the "between model" in the context of panel data. This regression uses the group-means of the data, thereby ignoring the variation within the groups. This model is rarely of great interest in its own right, but may be useful for purposes of comparison (for example, with the fixed effects model). 

# panel-mode Dataset "Panel data organization"

This dialog offers up to three options with regard to defining a data set as a panel. The first two options require that the data set is already organized in a panel format (although this may not yet be recognized by gretl). The third option requires that the data set contains variables that represent the panel structure. 

<@itl="Stacked time series">: Let there be <@var="N"> cross-sectional units in the data set, and let <@var="T"> = the number of time-series observations per unit. By selecting this option you are telling gretl that the data set is currently composed of <@var="N"> consecutive blocks of <@var="T"> time-series observations, one for each cross-sectional unit. The next step will be to specify the value of <@var="N">. 

<@itl="Stacked cross sections">: You are telling gretl that the data set is currently composed of <@var="T"> consecutive blocks of <@var="N"> cross-sectional observations, one for each time period. The next step, again, will be to specify the value of <@var="N">. 

If the total number of observations in the current dataset is prime, the above options are not available. 

<@itl="Use index variables">: You are saying that the data set is currently organized any old way (it doesn't matter how), but that it contains two variables that index the cross-sectional units and the time periods respectively. The next step will be to select those two variables. Panel index variables must have nothing but non-negative integer values, with no missing values. If there are no such variables in the dataset this option is not available. 
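
The index-variables option has a script counterpart in the <@lit="setobs"> command. In the sketch below the series names unit and year are hypothetical; substitute the index variables in your own dataset. 

<code>          
   # unit indexes the cross-sectional units, year the time periods
   setobs unit year --panel-vars
</code>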

# panel-wls Estimation "Groupwise weighted least squares"

Groupwise weighted least squares for panel data. Computes weighted least squares (WLS) estimates, with the weights based on the estimated error variances for the respective cross-sectional units in the sample. 

If the iteration option is selected, the procedure is iterated: at each round the residuals are re-computed using the current WLS parameter estimates, which gives rise to a new set of estimates of the error variances, and hence a new set of weights. Iteration stops when the maximum difference in the parameter estimates from one round to the next falls below 0.0001 or the number of iterations reaches 20. If the iteration converges, the resulting estimates are Maximum Likelihood. 
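
In a script, this estimator is reached through the <@lit="panel"> command. The sketch below assumes a panel dataset is already open; the series names y, x1 and x2 are hypothetical. 

<code>          
   # one-step groupwise WLS
   panel y const x1 x2 --unit-weights
   # iterated version
   panel y const x1 x2 --unit-weights --iterate
</code>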

# pca Statistics "Principal Components Analysis"

Principal Components Analysis. Prints the eigenvalues of the correlation matrix (or the covariance matrix if the option box is checked) for the variables in <@var="varlist">, along with the proportion of the joint variance accounted for by each component. Also prints the corresponding eigenvectors (or "component loadings"). 

In the window displaying the results, you have the option of saving the principal components to the dataset as series. 
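
A script sketch of the same analysis, with hypothetical series names x1, x2 and x3: 

<code>          
   list X = x1 x2 x3
   # eigen-analysis of the correlation matrix (the default)
   pca X
   # use the covariance matrix instead
   pca X --covariance
   # save all the components to the dataset as series
   pca X --save-all
</code>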

Menu path: /View/Principal components
Other access: Main window pop-up (multiple selection)

Script command: <@ref="pca">

# pergm Statistics "Periodogram"

Computes and displays the spectrum of the specified series. By default the sample periodogram is given, but optionally a Bartlett lag window is used in estimating the spectrum (see, for example, Greene's <@itl="Econometric Analysis"> for a discussion of this). The default width of the Bartlett window is twice the square root of the sample size but this can be set manually using the <@var="bandwidth"> parameter, up to a maximum of half the sample size. 

If the <@opt="--⁠log"> option is given the spectrum is represented on a logarithmic scale. 

The (mutually exclusive) options <@opt="--⁠radians"> and <@opt="--⁠degrees"> influence the appearance of the frequency axis when the periodogram is graphed. By default the frequency is scaled by the number of periods in the sample, but these options cause the axis to be labeled from 0 to π radians or from 0 to 180°, respectively. 

By default, if the program is not in batch mode a plot of the periodogram is shown. This can be adjusted via the <@opt="--⁠plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="display"> (to display a plot even when in batch mode); or a file name. The effect of providing a file name is as described for the <@opt="--⁠output"> option of the <@ref="gnuplot"> command. 
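
The options above can be combined as in the following sketch. The series name y is hypothetical, and the bandwidth of 8 is an arbitrary value chosen for the example. 

<code>          
   # sample periodogram
   pergm y
   # Bartlett window with default width, log scale
   pergm y --bartlett --log
   # Bartlett window of width 8, frequency axis in radians
   pergm y 8 --bartlett --radians
   # compute the spectrum but suppress the plot
   pergm y --plot=none
</code>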

Menu path: /Variable/Periodogram
Other access: Main window pop-up menu (single selection)

Script command: <@ref="pergm">

# polyweights Transformations "Polynomial trend fitting"

In fitting a polynomial trend to a time series it may be desirable to give extra weight to the observations at the start and end of the sample. (Points in the middle of the sample range have neighbours on both sides that are likely to be pulling the fit in the same general direction.) 

The weighting schemes offered here (quadratic, cosine-bell and steps) can be used to this effect. If you select one of these schemes two additional settings must be chosen: first, what maximum weight should be used (the minimum, baseline weight is 1.0)? Second, what central fraction of the sample should be given a uniform (minimal) weighting? 

Suppose, for example, you select a maximum weight of 3.0 and a central fraction of 0.4. This means that the middle 40 percent of the data get a weight of 1.0. If the steps shape is selected the first and last 30 percent of the observations get a weight of 3.0; otherwise, for the first 30 percent of observations the weights decline gradually from 3.0 to 1.0; and for the last 30 percent the weights increase from 1.0 to 3.0. 
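
The "steps" case of this example can be sketched in script form. The sketch assumes an artificial sample of 10 observations and one particular way of assigning observations to the three segments (by the midpoint of each observation's position in the sample); the rule used by the dialog itself may differ at the boundaries. 

<code>          
   nulldata 10
   scalar wmax = 3.0
   scalar frac = 0.4
   # share of the sample in each tail: 0.3 here
   scalar tail = (1 - frac) / 2
   # relative position of each observation, in (0,1)
   series pos = (index - 0.5) / $nobs
   # tails get the maximum weight, the central fraction gets 1.0
   series w = (pos < tail || pos > 1 - tail) ? wmax : 1.0
   print w --byobs
</code>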

# poisson Estimation "Poisson estimation"

Estimates a Poisson regression. The dependent variable is taken to represent the occurrence of events of some sort, and must take on only non-negative integer values. 

If a discrete random variable <@itl="Y"> follows the Poisson distribution, then 

  <@fig="poisson1">

for <@itl="y"> = 0, 1, 2,…. The mean and variance of the distribution are both equal to <@itl="v">. In the Poisson regression model, the parameter <@itl="v"> is represented as a function of one or more independent variables. The most common version (and the only one supported by gretl) has 

  <@fig="poisson2">

or in other words the log of <@itl="v"> is a linear function of the independent variables. 

Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a Poisson model of the accident rate. The offset variable must be strictly positive. 

By default, standard errors are computed using the negative inverse of the Hessian. If the <@opt="--⁠robust"> flag is given, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. 
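
In a script the offset variable is given after the list of regressors, separated by a semicolon. The following sketch follows the traffic example; the series names accidents, speedlimit and volume are hypothetical. 

<code>          
   # volume is the offset: its log enters with a coefficient of 1.0
   poisson accidents const speedlimit ; volume
   # the same model with QML standard errors
   poisson accidents const speedlimit ; volume --robust
</code>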

See also <@ref="negbin">. 

Menu path: /Model/Limited dependent variable/Count data...

Script command: <@ref="poisson">

# probit Estimation "Probit model"

If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the Newton–Raphson method. As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the <@opt="--⁠p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant. 

By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details. 

If the dependent variable is not binary but is discrete, then Ordered Probit estimates are obtained. (If the variable selected as dependent is not discrete, an error is flagged.) 

With the <@opt="--⁠random-effects"> option, the error term is assumed to be composed of two normally distributed components: one time-invariant term that is specific to the cross-sectional unit or "individual" (and is known as the individual effect); and one term that is specific to the particular observation. 

Evaluation of the likelihood for this model involves the use of Gauss-Hermite quadrature for approximating the value of expectations of functions of normal variates. The number of quadrature points used can be chosen through the <@opt="--⁠quadpoints"> option (the default is 32). Using more points will increase the accuracy of the results, but at the cost of longer compute time; with many quadrature points and a large dataset estimation may be quite time consuming. 

Besides the usual parameter estimates (and associated statistics) relating to the included regressors, certain additional information is presented on estimation of this sort of model: 

<indent>
• <@lit="lnsigma2">: the maximum likelihood estimate of the log of the variance of the individual effect; 
</indent>

<indent>
• <@lit="sigma_u">: the estimated standard deviation of the individual effect; and 
</indent>

<indent>
• <@lit="rho">: the estimated share of the individual effect in the composite error variance (also known as the intra-class correlation). 
</indent>

The Likelihood Ratio test of the null hypothesis that <@lit="rho"> equals zero provides a means of assessing whether the random effects specification is needed. If the null is not rejected that suggests that a simple pooled probit specification is adequate. 
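
The variants discussed above might be invoked from a script as follows. The series names y, x1 and x2 are hypothetical, the random effects case assumes a panel dataset, and the choice of 16 quadrature points is arbitrary. 

<code>          
   # default output, with slopes at the means
   probit y const x1 x2
   # show p-values in place of the slopes
   probit y const x1 x2 --p-values
   # QML standard errors
   probit y const x1 x2 --robust
   # random effects probit with 16 quadrature points
   probit y const x1 x2 --random-effects --quadpoints=16
</code>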

Probit for analysis of proportions is not implemented in gretl at this point. 

Menu path: /Model/Limited dependent variable/Probit

Script command: <@ref="probit">

# qlrtest Tests "Quandt likelihood ratio test"

For a model estimated on time-series data via OLS, performs the Quandt likelihood ratio (QLR) test for a structural break at an unknown point in time, with 15 percent trimming at the beginning and end of the sample period. 

For each potential break point within the central 70 percent of the observations, a Chow test is performed. See <@ref="chow"> for details; as with the regular Chow test, this is a robust Wald test if the original model was estimated with the <@opt="--⁠robust"> option, an F-test otherwise. The QLR statistic is then the maximum of the individual test statistics. 

An asymptotic p-value is obtained using the method of <@bib="Bruce Hansen (1997);hansen97">. 

Menu path: Model window, /Tests/QLR test

Script command: <@ref="qlrtest">

# qqplot Graphs "Q-Q plot"

With just one series selected, displays a plot of the empirical quantiles of the given series against the quantiles of the normal distribution. The series must include at least 20 valid observations in the current sample range. By default the empirical quantiles are plotted against quantiles of the normal distribution having the same mean and variance as the sample data, but two alternatives are available: the data may be standardized (converted to z-scores) before plotting, or the "raw" empirical quantiles may be plotted against the quantiles of the standard normal distribution. 

Given two series arguments, <@var="y"> and <@var="x">, displays a plot of the empirical quantiles of <@var="y"> against those of <@var="x">. The data values are not standardized. 
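
A script sketch of the alternatives just described, with hypothetical series names y and x: 

<code>          
   # against a normal with the sample mean and variance
   qqplot y
   # standardize the data first
   qqplot y --z-scores
   # raw quantiles against the standard normal
   qqplot y --raw
   # quantiles of y against those of x
   qqplot y x
</code>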

Menu path: /Variable/Normal Q-Q plot
Menu path: /View/Graph specified vars/Q-Q plot

Script command: <@ref="qqplot">

# quantreg Estimation "Quantile regression"

Quantile regression. By default standard errors are computed according to the asymptotic formula given by <@bib="Koenker and Bassett (<@itl="Econometrica">, 1978);koenker-bassett78">, but if the "robust" box is checked we use the heteroskedasticity-robust variant from <@bib="Koenker and Zhao (<@itl="Journal of Nonparametric Statistics">, 1994);koenker-zhao94">. 

If the "Compute confidence intervals" option is checked gretl will calculate confidence intervals for the coefficients, in place of standard errors. The "robust" check-box still has an effect: if it is not checked, the intervals are computed on the assumption of IID errors; with it, gretl uses the robust estimator developed by <@bib="Koenker and Machado (<@itl="Journal of the American Statistical Association">, 1999);koenker-machado99">. Note that these intervals are not just "plus or minus so many standard errors"; in general, they are asymmetrical about the point estimates of the coefficients. 

You may give a list of quantiles (see the drop-down list for some pre-defined possibilities). In that case gretl will calculate quantile estimates and either standard errors or confidence intervals for each of the specified values. 
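
In script form the quantile is given as the first argument, either as a single value or as a named matrix of values. The sketch below uses hypothetical series names y and x, and the quantiles are arbitrary choices for the example. 

<code>          
   # median regression, asymptotic standard errors
   quantreg 0.5 y const x
   # heteroskedasticity-robust standard errors
   quantreg 0.5 y const x --robust
   # confidence intervals in place of standard errors
   quantreg 0.5 y const x --intervals
   # several quantiles at once
   matrix tau = {0.25, 0.5, 0.75}
   quantreg tau y const x
</code>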

To follow up on the references given above, please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Robust estimation/Quantile regression

Script command: <@ref="quantreg">

# reprobit Estimation "Random effects probit"

The random effects probit estimator provides a means of estimating a (binary) probit model for panel data. The error term is assumed to be composed of two normally distributed components: one time-invariant term that is specific to the cross-sectional unit or "individual" (and is known as the individual effect); and one term that is specific to the particular observation. 

Evaluation of the likelihood for this model involves the use of Gauss-Hermite quadrature for approximating the value of expectations of functions of normal variates. In this dialog you can select the number of quadrature points used. Using more points will increase the accuracy of the results, but at the cost of longer compute time; with many quadrature points and a large dataset estimation may be quite time consuming. 

Besides the usual parameter estimates (and associated statistics) relating to the included regressors, certain additional information is presented on estimation of this sort of model: 

<indent>
• <@lit="lnsigma2">: the maximum likelihood estimate of the log of the variance of the individual effect; 
</indent>

<indent>
• <@lit="sigma_u">: the estimated standard deviation of the individual effect; and 
</indent>

<indent>
• <@lit="rho">: the estimated share of the individual effect in the composite error variance (also known as the intra-class correlation). 
</indent>

The Likelihood Ratio test of the null hypothesis that <@lit="rho"> equals zero provides a means of assessing whether the random effects specification is needed. If the null is not rejected that suggests that a simple pooled probit specification is adequate. 

In scripting mode, the random effects probit model is estimated using the <@lit="probit"> command with the <@opt="--⁠random-effects"> option. 

# reset Tests "Ramsey's RESET"

Must follow the estimation of a model via OLS. Carries out Ramsey's RESET test for model specification (non-linearity) by adding the square and/or the cube of the fitted values to the regression and calculating the <@itl="F"> statistic for the null hypothesis that the parameters on the added terms are zero. 
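
A minimal script sketch, with hypothetical series names y, x1 and x2: 

<code>          
   ols y const x1 x2
   # add both the squares and the cubes of the fitted values
   reset
   # add the squares only
   reset --squares-only
   # add the cubes only
   reset --cubes-only
</code>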

Menu path: Model window, /Tests/Ramsey's RESET

Script command: <@ref="reset">

# restrict-model Tests "Restrictions on a model"

Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters may be referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents the position in the list of regressors (starting at 1), or <@lit="b["><@var="varname"><@lit="]">, where <@var="varname"> is the name of the regressor in question. 

The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">. 

Here is an example of a set of restrictions: 

<code>          
   b[1] = 0
   b[2] - b[3] = 0
   b[4] + 2*b[5] = 1
</code>
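
In a script, restrictions of this kind are placed between <@lit="restrict"> and <@lit="end restrict">, following estimation of the model. The sketch below assumes the sample file data4-1 supplied with gretl and shows both ways of referencing a parameter; the restrictions themselves are chosen only for illustration. 

<code>          
   open data4-1
   ols price const sqft bedrms baths
   restrict
      # reference by name
      b[bedrms] = 0
      # reference by position (baths is the fourth regressor)
      b[4] = 0
   end restrict
</code>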

# restrict-system Tests "Restrictions on a system of equations"

Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters are referenced using <@lit="b"> plus two numbers in square brackets. The leading number represents the position of the equation within the system and the second number indicates position in the list of regressors, starting at 1 in both cases. For example <@lit="b[2,1]"> denotes the first parameter in the second equation, and <@lit="b[3,2]"> the second parameter in the third equation. 

The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[1,4]">. 

Here is an example of a set of restrictions: 

<code>          
   b[1,1] = 0
   b[1,2] - b[2,2] = 0
   b[3,4] + 2*b[3,5] = 1
</code>
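
For a system, the script equivalent names the system when it is defined and again in the <@lit="restrict"> block. The following is a sketch only: the system name sys1 and the series names are hypothetical, and SUR is an arbitrary choice of estimator. 

<code>          
   sys1 <- system
      equation y1 const x1 x2
      equation y2 const x1 x3
   end system
   restrict sys1
      # equal coefficients on x1 across the two equations
      b[1,2] - b[2,2] = 0
   end restrict
   estimate sys1 method=sur
</code>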

# restrict-vecm Tests "Restrictions on a VECM"

Use this command to place linear restrictions on the cointegrating relations (beta) and/or adjustment coefficients (alpha) in a vector error-correction model (VECM). 

Each restriction should be expressed as an equation, with a linear combination of parameters to the left of the equals sign and a numerical value on the right. Restrictions on beta may be non-homogeneous (non-zero on the right), but alpha restrictions must be homogeneous (zero on the right). 

If the VECM is of rank 1, the elements of beta are referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents position in the cointegrating vector, starting at 1. For example, <@lit="b[2]"> denotes the second element in beta. If the rank is greater than 1, use <@lit="b"> plus two numbers in square brackets. For example, <@lit="b[2,1]"> denotes the first element in the second cointegrating vector. 

To reference elements of alpha, use <@lit="a"> instead of <@lit="b">. 

The parameter identifiers in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">. 

Here is an example of a set of restrictions on a VECM of rank 1. 

<code>          
   b[1] + b[2] = 0
   b[1] + b[3] = 0
</code>

See also <@pdf="the Gretl User's Guide">. 
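
A script sketch of restrictions on beta for a VECM of rank 1. It assumes the sample file denmark supplied with gretl (series LRM, LRY, IBO and IDE); the lag order, deterministic terms and restrictions are chosen only for illustration. 

<code>          
   open denmark
   vecm 2 1 LRM LRY IBO IDE --rc --seasonals
   restrict
      b[1] = 1
      b[1] + b[2] = 0
      b[3] + b[4] = 0
   end restrict
</code>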

# rmplot Graphs "Range-mean plot"

Range–mean plot: this command creates a simple graph to help in deciding whether a time series, <@itl="y">(t), has constant variance or not. We take the full sample t=1,...,T and divide it into small subsamples of arbitrary size <@itl="k">. The first subsample is formed by <@itl="y">(1),...,<@itl="y">(k), the second is <@itl="y">(k+1), ..., <@itl="y">(2k), and so on. For each subsample we calculate the sample mean and range (= maximum minus minimum), and we construct a graph with the means on the horizontal axis and the ranges on the vertical. So each subsample is represented by a point in this plane. If the variance of the series is constant we would expect the subsample range to be independent of the subsample mean; if we see the points approximate an upward-sloping line this suggests the variance of the series is increasing in its mean; and if the points approximate a downward-sloping line this suggests the variance is decreasing in the mean. 

Besides the graph, gretl displays the means and ranges for each subsample, along with the slope coefficient for an OLS regression of the range on the mean and the p-value for the null hypothesis that this slope is zero. If the slope coefficient is significant at the 10 percent significance level then the fitted line from the regression of range on mean is shown on the graph. The <@itl="t">-statistic for the null, and the corresponding p-value, are recorded and may be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue"> respectively. 
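
The accessors can be used as in this short sketch, where the series name y is hypothetical: 

<code>          
   rmplot y
   # t-statistic and p-value for the slope of range on mean
   printf "t = %g, p-value = %g\n", $test, $pvalue
</code>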

Menu path: /Variable/Range-mean graph

Script command: <@ref="rmplot">

# runs Tests "Runs test"

Carries out the nonparametric "runs" test for randomness of the specified <@var="series">, where runs are defined as sequences of consecutive positive or negative values. If you want to test for randomness of deviations from the median, for a variable named <@lit="x1"> with a non-zero median, you can do the following: 

<code>          
   series signx1 = x1 - median(x1)
   runs signx1
</code>

If the <@opt="--⁠difference"> option is given, the variable is differenced prior to the analysis, hence the runs are interpreted as sequences of consecutive increases or decreases in the value of the variable. 

If the <@opt="--⁠equal"> option is given, the null hypothesis incorporates the assumption that positive and negative values are equiprobable, otherwise the test statistic is invariant with respect to the "fairness" of the process generating the sequence, and the test focuses on independence alone. 

Menu path: /Tools/Nonparametric tests

Script command: <@ref="runs">

# sampling Dataset "Setting the sample"

The Sample menu offers several ways of selecting a sub-sample from the current dataset. 

If you choose "Sample/Restrict based on criterion..." you need to supply a Boolean (logical) expression, of the same sort that you would use to define a dummy variable. For example the expression "sqft > 1400" will select only cases for which the variable sqft has a value greater than 1400. Conditions may be concatenated using the logical operators "&&" (AND) and "||" (OR), and may be negated using "!" (NOT). If the dataset already contains dummy variables, you are also given the option of selecting one of these to define the sample (observations with a value of 1 for the selected dummy will be included, and others excluded). 

The menu item "Sample/Drop all obs with missing values" redefines the sample to exclude all observations for which values of one or more variables are missing (leaving only complete cases). 

To select observations for which a particular variable has no missing values, use "Restrict based on criterion..." and supply the Boolean condition "!missing(varname)" (replace "varname" with the name of the variable you want to use). 

If the observations are labeled, you can exclude particular observations using, for example, <@lit="obs!="France""> as the Boolean criterion. The observation name must be enclosed in double quotes. 

One point should be noted about defining a sample based on a dummy variable, a Boolean expression, or on the missing values criterion: Any "structural" information in the data header file (regarding the time series or panel nature of the data) is lost. You may reimpose structure with "Sample/Set frequency, startobs...". 
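
The corresponding script commands might look as follows. This is a sketch that assumes the sample file data4-1 supplied with gretl, which contains the series sqft and price. 

<code>          
   open data4-1
   # restrict based on a Boolean criterion
   smpl sqft > 1400 --restrict
   # restore the full sample
   smpl full
   # drop all observations with missing values
   smpl --no-missing
   smpl full
   # keep observations where one particular series is not missing
   smpl !missing(price) --restrict
</code>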

Please see <@pdf="the Gretl User's Guide"> for further details. 

# save-labels Utilities "Save or remove series labels"

If you choose Export here, gretl will write a file containing the descriptive labels of any series in the current dataset that have such labels. This is a plain text file with one line per variable. The line will be empty for variables that have no descriptive label. 

If you choose Remove, the descriptive labels will be removed for all series that have such labels. This would be appropriate only if the current labels have somehow been added in error. 

# add-labels Utilities "Add series labels"

If you choose Yes here, you are offered a file-open dialog box to select a plain text file containing descriptive labels for the series in the current dataset. The file should contain one label per line; a blank line means no label. Gretl will attempt to read as many labels as there are series in the dataset, excluding the constant. 

# save-script Utilities "Save commands?"

If you choose Yes here, gretl will write a file containing a record of the commands you executed in the current session. Most commands that you execute via "point and click" have a "script" counterpart, and it is these script commands that will be saved. You could take the file as the basis for writing a gretl command script. 

If you don't care to be prompted to save a record of commands on exit, uncheck the tick box in the save commands dialog. 

# save-session Utilities "Save this gretl session?"

If you choose Yes here, gretl will write a file containing a "snapshot" of the current session, including a copy of the working dataset along with any models, graphs or other objects that you have saved "as icons". You can re-open this file later to recreate the state of gretl as of the time you quit the session (see the "File/Session files" menu). 

If you mostly work with gretl using command scripts (which we recommend for "serious" econometric work) you probably don't need to save the session, but you should be sure to save any changes to your script that you wish to keep. You may also want to save any changes to your dataset, unless these are of a sort that can easily be recreated by running a script. 

If you work with scripts and don't care to be prompted to save your session on exit, uncheck the tick box in the save session dialog. 

# scatters Graphs "Multiple pairwise graphs"

Generates pairwise graphs of the selected "Y-axis variable" against each of the selected "X-axis variables" in turn. (Or you can select several variables for the Y-axis and one for the X-axis.) Scanning a set of such plots can be a useful step in exploratory data analysis. The maximum number of plots is 16; any extra variables will be ignored. 

If the dataset is time-series, then the second sub-list can be omitted, in which case it will implicitly be taken as "time", so you can plot multiple time series in separated sub-graphs. 

Menu path: /View/Multiple graphs

Script command: <@ref="scatters">

# setinfo Dataset "Edit attributes of variable"

In this dialog box you can: 

* Rename a (series) variable. 

* Add or edit a description of the variable: this appears next to the variable name in the gretl main window. 

* Add or edit the "display name" for the variable (if the variable is a series, not a scalar). This string (maximum 19 characters) is shown in place of the variable name when the variable is displayed in a graph. Thus for instance you can associate a more comprehensible string such as "T-bill rate" with a cryptically named variable such as "tb3". 

* (For time-series data) set the compaction method for the variable. This method will be used if you decide to reduce the frequency of the dataset, or if you update the variable by importing from a database where the variable is at a higher frequency than in the working dataset. 

* Mark a variable as discrete (for series with integer values only). This affects the way the variable is handled when you ask for a frequency plot. 
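
Several of these attributes can also be set from a script, as in the following sketch. The series names tb3 and x are hypothetical, as is the description string. 

<code>          
   # description and display name
   setinfo tb3 --description="3-month Treasury bill rate" --graph-name="T-bill rate"
   # mark an integer-valued series as discrete
   setinfo x --discrete
</code>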

Menu path: /Variable/Edit attributes
Other access: Main window pop-up menu

Script command: <@ref="setinfo">

# setmiss Dataset "Missing value code"

Set a numerical value that will be interpreted as "missing" or "not applicable", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Sample menu). 

Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations. 
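
The script equivalent of this example, with a hypothetical series name x: 

<code>          
   # treat -1 as missing for the series x only
   setmiss -1 x
   # treat -1 as missing throughout the dataset
   setmiss -1
</code>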

Menu path: /Data/Set missing value code

Script command: <@ref="setmiss">

# spearman Statistics "Spearman's rank correlation"

Prints Spearman's rank correlation coefficient for a specified pair of series. The series do not have to be ranked manually in advance; the command takes care of this. 

The automatic ranking is from largest to smallest (i.e. the largest data value gets rank 1). If you need to invert this ranking, create a new variable which is the negative of the original. For example: 

<code>          
   series altx = -x
   spearman altx y
</code>

Menu path: /Model/Robust estimation/Rank correlation

Script command: <@ref="spearman">

# store Dataset "Save data"

Save data to <@var="filename">. By default all currently defined series are saved but the optional <@var="varlist"> argument can be used to select a subset of series. If the dataset is sub-sampled, only the observations in the current sample range are saved. 

By default the data are saved in "native" gretl format, but the option flags permit saving in several alternative formats. CSV (Comma-Separated Values) data may be read into spreadsheet programs, and can also be manipulated using a text editor. The formats of Octave, R and PcGive are designed for use with the respective programs. Gzip compression may be useful for large datasets. See <@pdf="the Gretl User's Guide"> for details on the various formats. 

The option flags <@opt="--⁠omit-obs"> and <@opt="--⁠no-header"> are applicable only when saving data in CSV format. By default, if the data are time series or panel, or if the dataset includes specific observation markers, the CSV file includes a first column identifying the observations (e.g. by date). If the <@opt="--⁠omit-obs"> flag is given this column is omitted. The <@opt="--⁠no-header"> flag suppresses the usual printing of the names of the variables at the top of the columns. 

The option flag <@opt="--⁠decimal-comma"> is also confined to the case of saving data in CSV format. The effect of this option is to replace the decimal point with the decimal comma; in addition the column separator is forced to be a semicolon. 

The option of saving in gretl database format is intended to help with the construction of large sets of series, possibly having mixed frequencies and ranges of observations. At present this option is available only for annual, quarterly or monthly time-series data. If you save to a file that already exists, the default action is to append the newly saved series to the existing content of the database. In this context it is an error if one or more of the variables to be saved has the same name as a variable that is already present in the database. The <@opt="--⁠overwrite"> flag has the effect that, if there are variable names in common, the newly saved variable replaces the variable of the same name in the existing database. 

The <@opt="--⁠comment"> option is available when saving data as a database or in CSV format. The required parameter is a double-quoted one-line string, attached to the option flag with an equals sign. The string is inserted as a comment into the database index file or at the top of the CSV output. 

The <@lit="store"> command behaves in a special manner in the context of a "progressive loop". See <@pdf="the Gretl User's Guide"> for details. 

Menu path: /File/Save data; /File/Export data

Script command: <@ref="store">

# system Estimation "Systems of equations"

In this window you can define a system of equations and choose an estimator for the system. Four sorts of statement may be given here, as follows: 

<indent>
• <@ref="equation">: specify an equation within the system. At least two such statements must be provided. 
</indent>

<indent>
• <@lit="instr">: for a system to be estimated via Three-Stage Least Squares, a list of instruments (by variable name or number). Alternatively, you can put this information into the <@lit="equation"> line using the same syntax as in the <@ref="tsls"> command. 
</indent>

<indent>
• <@lit="endog">: for a system of simultaneous equations, a list of endogenous variables. This is primarily intended for use with FIML estimation, but with Three-Stage Least Squares this approach may be used instead of giving an <@lit="instr"> list; then all the variables not identified as endogenous will be used as instruments. 
</indent>

<indent>
• <@lit="identity">: for use with FIML, an identity linking two or more of the variables in the system. This sort of statement is ignored when an estimator other than FIML is used. 
</indent>

Menu path: /Model/Simultaneous equations

Script command: <@ref="system">

# tobit Estimation "Tobit model"

Estimates a Tobit model, which may be appropriate when the dependent variable is "censored". For example, positive and zero values of purchases of durable goods on the part of individual households are observed, and no negative values, yet decisions on such purchases may be thought of as outcomes of an underlying, unobserved disposition to purchase that may be negative in some cases. 

By default it is assumed that the dependent variable is censored at zero on the left and is uncensored on the right. However, you can use the entry boxes marked "left bound" and "right bound" to specify a different pattern of censoring. Enter either a numerical value or <@lit="NA"> for no censoring. 

The Tobit model is a special case of interval regression, which is supported via the <@ref="intreg"> command. 

Menu path: /Model/Limited dependent variable/Tobit

Script command: <@ref="tobit">

# transpos Dataset "Transpose data"

Transposes the current data set. That is, each observation (row) in the current data set will be treated as a variable (column), and each variable as an observation. This command may be useful if data have been read from some external source in which the rows of the data table represent variables. 

See also <@ref="dataset">. 

Menu path: /Data/Transpose data

# tsls Estimation "Instrumental variables regression"

This command requires the selection of two lists of variables: the independent variables to appear in the given model and a set of instruments. Note that any exogenous regressors should appear in both lists. 

Output for two-stage least squares estimates includes the Hausman test and, if the model is over-identified, the Sargan over-identification test. In the Hausman test, the null hypothesis is that OLS estimates are consistent, or in other words estimation by means of instrumental variables is not really required. A model of this sort is over-identified if there are more instruments than are strictly required. The Sargan test is based on an auxiliary regression of the residuals from the two-stage least squares model on the full list of instruments. The null hypothesis is that all the instruments are valid, and suspicion is thrown on this hypothesis if the auxiliary regression has a significant degree of explanatory power. For a good explanation of both tests see chapter 8 of <@bib="Davidson and MacKinnon (2004);davidson-mackinnon04">. 

For both TSLS and LIML estimation, an additional test result is shown provided that the model is estimated under the assumption of i.i.d. errors (that is, the <@opt="--⁠robust"> option is not selected). This is a test for weakness of the instruments. Weak instruments can lead to serious problems in IV regression: biased estimates and/or incorrect size of hypothesis tests based on the covariance matrix, with rejection rates well in excess of the nominal significance level <@bib="(Stock, Wright and Yogo, 2002);stock-wright-yogo02">. The test statistic is the first-stage <@itl="F">-test if the model contains just one endogenous regressor, otherwise it is the smallest eigenvalue of the matrix counterpart of the first-stage <@itl="F">. Critical values based on the Monte Carlo analysis of <@bib="Stock and Yogo (2003);stock-yogo03"> are shown when available. 

The R-squared value printed for models estimated via two-stage least squares is the square of the correlation between the dependent variable and the fitted values. 

Menu path: /Model/Instrumental variables

Script command: <@ref="tsls">

# var Estimation "Vector Autoregression"

This command requires specification of: 

<indent>
• the lag order, that is, the number of lags of each variable that should be included in the system; 
</indent>

<indent>
• any exogenous variables (but note that a constant is included automatically unless you specify otherwise, a trend can be added using the trend checkbox, and seasonal dummy variables can be added using the seasonals checkbox); and 
</indent>

<indent>
• a list of endogenous variables, lags of which will be included on the right-hand side of each equation (note: do not include lagged variables in this list; they will be added automatically). 
</indent>

A separate regression will be run for each variable in the system. Output for each equation includes F-tests for zero restrictions on all lags of each of the variables and an F-test for the maximum lag, along with (optionally) forecast variance decompositions and impulse response functions. 

Forecast variance decompositions and impulse responses are based on the Cholesky decomposition of the contemporaneous covariance matrix, and in this context the order in which the (stochastic) variables are given matters. The first variable in the list is assumed to be "most exogenous" within-period. The horizon for variance decompositions and impulse responses can be set using the <@ref="set"> command. For retrieval of a specified impulse response function in matrix form, see the <@xrf="irf"> function. 

Menu path: /Model/Time series/Vector autoregression

Script command: <@ref="var">

# VAR-lagselect Tests "VAR lag-length selection"

In this dialog box you specify a VAR as usual, but use the lag order spin button to set the maximum number of lags to test. 

Output will consist of a table showing the values of the Akaike (AIC), Schwarz (BIC) and Hannan–Quinn (HQC) information criteria computed from VARs of order 1 to the chosen maximum. This is intended to help with the selection of the optimal lag order. 

# VAR-omit Tests "Test exogenous variables in VAR"

Use this dialog box to specify a subset of exogenous variables in a VAR. These variables will be omitted from the original VAR, and the system re-estimated. 

A Likelihood Ratio test is reported, where the null hypothesis is that the true parameter values are zero, in all equations of the VAR, for the omitted variables. The test is based on the difference between the log-determinant of the variance matrix for the unrestricted system, and that for the restricted system with the selected variables omitted. 
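
As an illustration of a test built on the difference of log-determinants, the following Python sketch (not gretl code) computes a statistic of the form T times the difference between the restricted and unrestricted log-determinants. The scaling by the number of observations T, the absence of any small-sample correction, and the degrees of freedom (equations times omitted variables) are assumptions made for this sketch; the function names and the example matrices are hypothetical.

```python
import math


def log_det(m):
    """Log-determinant of a positive definite matrix via Gaussian elimination."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for c in range(n):
        pivot = a[c][c]
        if pivot <= 0:
            raise ValueError("matrix is not positive definite")
        total += math.log(pivot)
        for r in range(c + 1, n):
            f = a[r][c] / pivot
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return total


def var_omit_lr(sigma_u, sigma_r, T, n_omitted):
    """LR statistic and degrees of freedom (assumed form, see lead-in).

    sigma_u: residual variance matrix of the unrestricted system
    sigma_r: residual variance matrix with the selected variables omitted
    """
    lr = T * (log_det(sigma_r) - log_det(sigma_u))
    df = len(sigma_u) * n_omitted   # one restriction per equation per variable
    return lr, df


# Made-up variance matrices for a two-equation system
sigma_u = [[1.0, 0.2], [0.2, 0.5]]
sigma_r = [[1.1, 0.2], [0.2, 0.55]]
lr, df = var_omit_lr(sigma_u, sigma_r, T=100, n_omitted=1)
print(lr, df)
```

Under the null hypothesis a statistic of this kind is conventionally compared with a chi-square distribution with the stated degrees of freedom.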

# vartest Tests "Difference of variances"

Calculates the <@itl="F"> statistic for the null hypothesis that the population variances are equal for the two selected series, and shows its p-value. 
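
A minimal Python sketch (not gretl code) of a variance-ratio statistic follows. It assumes the larger sample variance is placed in the numerator, so that the statistic is at least 1; the function names are hypothetical, and the p-value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def sample_variance(x):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(x)
    m = sum(x) / n
    return sum((v - m) ** 2 for v in x) / (n - 1)


def variance_ratio(x, y):
    """Return (F, numerator df, denominator df), larger variance on top."""
    vx, vy = sample_variance(x), sample_variance(y)
    if vx >= vy:
        return vx / vy, len(x) - 1, len(y) - 1
    return vy / vx, len(y) - 1, len(x) - 1


a = [2.0, 4.0, 6.0, 8.0]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
F, df_num, df_den = variance_ratio(a, b)
print(F, df_num, df_den)
```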

Menu path: /Tools/Test statistic calculator

Script command: <@ref="vartest">

# vecm Estimation "Vector Error Correction Model"

A VECM is a form of vector autoregression or VAR (see <@ref="var">), applicable where the variables in the model are individually integrated of order 1 (that is, are random walks, with or without drift), but exhibit cointegration. This command is closely related to the Johansen test for cointegration (see <@ref="coint2">). 

The lag order selected in the VECM dialog box is that of the VAR system. The number of lags in the VECM itself (where the dependent variable is given as a first difference) is one less than this number. 

The "rank" represents the number of cointegrating vectors. This must be greater than zero and less than or equal to (generally, less than) the number of endogenous variables selected. 

In the "Endogenous variables" box you select the vector of endogenous variables, in levels. The inclusion of deterministic terms in the model is controlled by the option buttons. The default is to include an "unrestricted constant", which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as "case 3". The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in <@pdf="the Gretl User's Guide">. 

In the "Exogenous variables" box you may add specific exogenous variables. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select "Restricted" from the pop-up menu. The symbol next to the variable will change to <@lit="R">. 

If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box ("Show details") allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure. 

Menu path: /Model/Time series/VECM

Script command: <@ref="vecm">

# wls Estimation "Weighted Least Squares"

Let "wtvar" denote the variable selected in the "Weight variable" box. An OLS regression is run, where the dependent variable is the product of the positive square root of wtvar and the selected dependent variable, and the independent variables are also multiplied by the square root of wtvar. Statistics such as <@itl="R">-squared are based on the weighted data. If wtvar is a dummy variable, weighted least squares estimation is equivalent to eliminating all observations with value zero for wtvar. 
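
The transformation described above can be sketched in Python (not gretl code) for a model with a constant and one regressor. Every variable, including the constant, is multiplied by the square root of the weight before OLS is applied. The function names and data are hypothetical.

```python
import math


def ols_two_regressors(z1, z2, y):
    """OLS of y on z1 and z2 (no extra intercept), via the 2x2 normal equations."""
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    s1y = sum(a * c for a, c in zip(z1, y))
    s2y = sum(b * c for b, c in zip(z2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det)


def wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x with weights w.

    Each variable is scaled by sqrt(w); the scaled constant takes the
    place of the intercept in the transformed regression.
    """
    r = [math.sqrt(v) for v in w]
    const = r[:]                                  # constant times sqrt(w)
    xs = [ri * xi for ri, xi in zip(r, x)]
    ys = [ri * yi for ri, yi in zip(r, y)]
    return ols_two_regressors(const, xs, ys)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 30.0]
dummy = [1, 1, 1, 1, 0]                           # zero weight on the last observation

b_dummy = wls(x, y, dummy)                        # dummy-weighted estimates
b_drop = wls(x[:4], y[:4], [1, 1, 1, 1])          # last observation dropped
print(b_dummy, b_drop)
```

The two printed coefficient pairs coincide, which illustrates the point about dummy weights: an observation with weight zero contributes nothing to the transformed regression.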

Menu path: /Model/Other linear models/Weighted Least Squares

Script command: <@ref="wls">

# working-dir Utilities "Working directory"

The "working directory" is where gretl looks by default when reading or writing data files or scripts via the file Open and Save dialogs. 

In addition the working directory is the default location for 

<indent>
• reading files via the script commands <@lit="append">, <@lit="open">, <@lit="run"> and <@lit="include">; and 
</indent>

<indent>
• writing files via the commands <@lit="eqnprint">, <@lit="tabprint">, <@lit="gnuplot">, <@lit="outfile"> and <@lit="store">. 
</indent>

The option of having gretl use the current directory (as determined via the shell) at start-up may be useful to people who are in the habit of launching gretl from a command prompt rather than a menu or icon. 

This dialog also allows you to set the behavior of the GUI file selector: when you open or save a file in a given folder, should the selector remember and return to the same folder on the next invocation? Or should the selector always visit the chosen working directory? 

Menu path: /File/Working directory

# x12a Utilities "X-12-ARIMA"

There are two procedural options here, controlled by the lower set of radio-buttons. 

If you select "Execute X-12-ARIMA directly" then gretl writes a command file for X-12-ARIMA and calls the x12a program to execute the commands. In this case you have the option of producing a graph and/or saving selected output series to the gretl dataset. 

If you select "Make X-12-ARIMA command file" gretl writes a command file for X-12-ARIMA, as above, but then opens this file in an editor window. In that window you are able to make changes and to save the file under a chosen name. You are also able to send the file for execution by x12a (by clicking the "Run" button on the editor window toolbar) and view the output. But in this case you do not have the option of saving data as gretl series or producing a gretl graph. 

# xcorrgm Statistics "Cross-correlogram"

Prints and graphs the cross-correlogram for <@var="series1"> and <@var="series2">, which may be specified by name or number. The values are the sample correlation coefficients between the current value of <@var="series1"> and successive leads and lags of <@var="series2">. 

If an <@var="order"> value is specified the length of the cross-correlogram is limited to at most that number of leads and lags, otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations. 
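
For illustration, here is a Python sketch (not gretl code) of one common definition of the sample cross-correlation at lag k, using full-sample means and a full-sample denominator. The normalisation is an assumption and may differ in detail from what gretl computes; the function name is hypothetical. Positive k pairs the current value of the first series with a lag of the second, negative k with a lead.

```python
def cross_correlogram(x, y, order):
    """Sample cross-correlations between x[t] and y[t - k], k = -order..order."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    denom = (sum((v - mx) ** 2 for v in x) * sum((v - my) ** 2 for v in y)) ** 0.5
    out = {}
    for k in range(-order, order + 1):
        s = 0.0
        for t in range(n):
            if 0 <= t - k < n:                    # skip pairs outside the sample
                s += (x[t] - mx) * (y[t - k] - my)
        out[k] = s / denom
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
cc = cross_correlogram(x, x, 2)
for k in sorted(cc):
    print(k, cc[k])
```

When a series is paired with itself, as here, the value at k = 0 is 1 and the leads and lags are symmetric.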

By default, a plot of the cross-correlogram is produced: a gnuplot graph in interactive mode or an ASCII graphic in batch mode. This can be adjusted via the <@opt="--⁠plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="ascii"> (to produce a text graphic even when in interactive mode); <@lit="display"> (to produce a gnuplot graph even when in batch mode); or a file name. The effect of providing a file name is as described for the <@opt="--⁠output"> option of the <@ref="gnuplot"> command. 

Menu path: /View/Cross-correlogram
Other access: Main window pop-up menu (multiple selection)

Script command: <@ref="xcorrgm">

# xtab Statistics "Cross-tabulate variables"

Displays a contingency table or cross-tabulation for each combination of the selected variables. Note that all the variables must be discrete. 

By default, frequency count values are shown in the cells and on the margins of the table. However, you can choose to display either row or column percentages instead. 

By default, cells with a zero count are shown as empty, but you can choose to show zero values explicitly. 

Pearson's chi-square test for independence is displayed if the expected frequency under independence is at least 1.0e-7 for all cells. A common rule of thumb for the validity of this statistic is that at least 80 percent of cells should have expected frequencies of 5 or greater; if this criterion is not met a warning is printed. 
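
The calculation and the rule of thumb can be sketched in Python (not gretl code) as follows. The function name and return convention are hypothetical; the 1.0e-7 floor and the 80 percent criterion are taken from the text above.

```python
def pearson_chi_square(table):
    """Return (chi2, df, rule_ok), or None if an expected count is below 1e-7.

    rule_ok is True when at least 80 percent of cells have an expected
    frequency of 5 or more.
    """
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    # expected frequency under independence: row total * column total / n
    expected = [[rt * ct / n for ct in col_tot] for rt in row_tot]
    cells = [e for row in expected for e in row]
    if min(cells) < 1.0e-7:
        return None
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    rule_ok = sum(e >= 5 for e in cells) / len(cells) >= 0.8
    return chi2, df, rule_ok


print(pearson_chi_square([[10, 20], [30, 40]]))
print(pearson_chi_square([[1, 2], [3, 40]]))      # small expected counts
```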

If the contingency table is 2 by 2, Fisher's Exact Test for independence is computed. Note that this test is based on the assumption that the row and column totals are fixed, which may or may not be appropriate depending on how the data were generated. The left p-value should be used when the alternative to independence is negative association (values tend to cluster in the lower left and upper right cells); the right p-value should be used if the alternative is positive association. The two-tailed p-value for this test is calculated by method (b) in section 2.1 of <@bib="Agresti (1992);agresti92">: it is the sum of the probabilities of all possible tables having the given row and column totals and having a probability less than or equal to that of the observed table. 
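
The three p-values can be sketched in Python (not gretl code) using the hypergeometric distribution of the upper-left cell given fixed margins. The function name is hypothetical, and the small relative tolerance in the two-tailed comparison is an assumption added to guard against floating-point ties.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Left, right and two-tailed p-values for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c              # row totals, first column total
    n = r1 + r2
    lo, hi = max(0, c1 - r2), min(r1, c1)         # feasible values of the upper-left cell
    total = comb(n, c1)
    # probability of each possible table with the given margins
    prob = {k: comb(r1, k) * comb(r2, c1 - k) / total for k in range(lo, hi + 1)}
    p_obs = prob[a]
    left = sum(p for k, p in prob.items() if k <= a)
    right = sum(p for k, p in prob.items() if k >= a)
    # sum over all tables no more probable than the observed one
    two = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-9))
    return left, right, two


left, right, two = fisher_exact_2x2(3, 1, 1, 3)
print(left, right, two)
```

For this table the margins are all 4, the possible upper-left counts are 0 to 4 with probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, and the two-tailed value sums every probability not exceeding 16/70.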

Script command: <@ref="xtab">