/usr/share/gretl/gretl_gui_help.en is in gretl-common 2017d-3build1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
# add Tests "Add variables to model"
The selected variables are added to the previous model and the new model estimated. A test statistic for the joint significance of the added variables is printed, along with its p-value.
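As a minimal script sketch of the same operation (the series names <@lit="y">, <@lit="x1">, <@lit="x2"> and <@lit="x3"> are hypothetical, and a dataset is assumed to be open already):
<code>
# baseline model
ols y 0 x1
# add x2 and x3, re-estimate, and test their joint significance
add x2 x3
</code>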
Menu path: Model window, /Tests/Add variables
Script command: <@ref="add">
# addline Graphs "Add line to graph"
This dialog box allows you to add a line, defined via a formula, to a graph. The formula must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as “.”. Examples:
<code>
10+0.35*x
100+5.3*x-0.12*x**2
sin(x)
exp(sqrt(pi*x))
</code>
# adf Tests "Augmented Dickey-Fuller test"
This command needs an integer lag order; if the order is zero a standard (not augmented) Dickey–Fuller test is run. Computes a set of Dickey–Fuller tests on the selected variable, the null hypothesis being that the variable has a unit root. (But if the differencing option is selected, the first difference of the variable is taken prior to testing, and the discussion below must be taken as referring to the transformed variable.)
In all cases the dependent variable is the first difference of the specified variable, <@mth="y">, and the key independent variable is the first lag of <@mth="y">. The model is constructed so that the coefficient on lagged <@mth="y"> equals the root in question minus 1. For example, the model with a constant may be written as
<@fig="adf1">
Under the null hypothesis of a unit root the coefficient on lagged <@mth="y"> equals zero; under the alternative that <@mth="y"> is stationary this coefficient is negative.
If the lag order, <@mth="k">, is greater than 0, then <@mth="k"> lags of the dependent variable are included on the right-hand side of the test regressions, subject to the following qualification. If the box labeled “test down from maximum lag” is checked, the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down, using the criterion chosen via the accompanying drop-down list.
When testing down via AIC or BIC is called for, the final lag order for the ADF equation is that which optimizes the chosen information criterion (Akaike or Schwarz Bayesian).
When testing down via the <@mth="t">-statistic method is called for, the procedure is as follows:
<indent>
1. Estimate the Dickey–Fuller regression with <@mth="k"> lags of the dependent variable.
</indent>
<indent>
2. Is the last lag significant? If so, execute the test with lag order <@mth="k">. Otherwise, let <@mth="k"> = <@mth="k"> – 1; if <@mth="k"> equals 0, execute the test with lag order 0, else go to step 1.
</indent>
In the context of step 2 above, “significant” means that the <@mth="t">-statistic for the last lag has an asymptotic two-sided <@itl="p">-value, against the normal distribution, of 0.10 or less.
<@itl="P">-values for the Dickey–Fuller tests are based on <@bib="MacKinnon (1996);mackinnon96">. The relevant code is included by kind permission of the author. In the case of the test with linear trend using GLS these <@itl="P">-values are not applicable; critical values from Table 1 in <@bib="Elliott, Rothenberg and Stock (1996);ERS96"> are shown instead.
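A scripted equivalent might look like the following sketch, in which <@lit="y"> is a hypothetical series name. The option spellings are those of the <@lit="adf"> command as we understand them and should be checked against its help entry for your gretl version.
<code>
# test with a constant; lag order 4 is treated as a maximum and the
# actual order is found by testing down via the t-statistic method
adf 4 y --c --test-down=tstat
# the same test applied to the first difference of y
adf 4 y --c --difference
</code>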
Menu path: /Variable/Unit root tests/Augmented Dickey-Fuller test
Script command: <@ref="adf">
# anova Statistics "ANOVA"
Analysis of Variance: <@var="response"> is a series measuring some effect of interest and <@var="treatment"> must be a discrete variable that codes for two or more types of treatment (or non-treatment). For two-way ANOVA, the <@var="block"> variable (which should also be discrete) codes for the values of some control variable.
The null hypothesis for the <@mth="F">-test is that the mean response is invariant with respect to the treatment type or, in other words, that the treatment has no effect. Strictly speaking, the test is valid only if the variance of the response is the same for all treatment types.
Note that the results shown by this command are in fact a subset of the information given by the following procedure, which is easily implemented in gretl. Create a set of dummy variables coding for all but one of the treatment types. For two-way ANOVA, in addition create a set of dummies coding for all but one of the “blocks”. Then regress <@var="response"> on a constant and the dummies using <@ref="ols">. For a one-way design the ANOVA table is printed via the <@opt="--anova"> option to <@lit="ols">. In the two-way case the relevant <@mth="F">-test is found by using the <@ref="omit"> command. For example (assuming <@lit="y"> is the response, <@lit="xt"> codes for the treatment, and <@lit="xb"> codes for blocks):
<code>
# one-way
list dxt = dummify(xt)
ols y 0 dxt --anova
# two-way
list dxb = dummify(xb)
ols y 0 dxt dxb
# test joint significance of dxt
omit dxt --quiet
</code>
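For comparison, the dedicated command can be called directly on the same assumed series (<@lit="y">, <@lit="xt"> and <@lit="xb"> as above); this is a sketch rather than a complete session:
<code>
# one-way
anova y xt
# two-way, with xb as the block variable
anova y xt xb
</code>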
Menu path: /Model/Other linear models/ANOVA
Script command: <@ref="anova">
# ar Estimation "Autoregressive estimation"
Computes parameter estimates using the generalized Cochrane–Orcutt iterative procedure; see Section 9.5 of <@bib="Ramanathan (2002);ramanathan02">. Iteration is terminated when successive error sums of squares do not differ by more than 0.005 percent or after 20 iterations.
The “list of AR lags” specifies the structure of the error process. For example, the entry “1 3 4” corresponds to the process:
<@fig="arlags">
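In script form, the “1 3 4” lag specification could be given as in this sketch (<@lit="y"> and <@lit="x"> are placeholder series names, not part of any particular dataset):
<code>
# lag list, then a semicolon, then the regression specification
ar 1 3 4 ; y 0 x
</code>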
Menu path: /Model/Time series/AR Errors (GLS)
Script command: <@ref="ar">
# ar1 Estimation "AR(1) estimation"
Computes feasible GLS estimates for a model in which the error term is assumed to follow a first-order autoregressive process.
The default method is the Cochrane–Orcutt iterative procedure; see for example section 9.4 of <@bib="Ramanathan (2002);ramanathan02">. The criterion for convergence is that successive estimates of the autocorrelation coefficient do not differ by more than 1e-6, or if the <@opt="--loose"> option is given, by more than 0.001. If this is not achieved within 100 iterations an error is flagged.
If the <@opt="--pwe"> option is given, the Prais–Winsten estimator is used. This involves an iteration similar to Cochrane–Orcutt; the difference is that while Cochrane–Orcutt discards the first observation, Prais–Winsten makes use of it. See, for example, Chapter 13 of <@bib="Greene (2000);greene00"> for details.
If the <@opt="--hilu"> option is given, the Hildreth–Lu search procedure is used. The results are then fine-tuned using the Cochrane–Orcutt method, unless the <@opt="--no-corc"> flag is specified. The <@opt="--no-corc"> option is ignored for estimators other than Hildreth–Lu.
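The three estimators described above might be invoked as follows; <@lit="y"> and <@lit="x"> are hypothetical series names and the lines are alternatives rather than a sequence that must be run together:
<code>
# default: iterated Cochrane-Orcutt
ar1 y 0 x
# Prais-Winsten, which retains the first observation
ar1 y 0 x --pwe
# Hildreth-Lu search without the Cochrane-Orcutt fine-tuning step
ar1 y 0 x --hilu --no-corc
</code>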
Menu path: /Model/Time series/AR Errors (GLS)
Script command: <@ref="ar1">
# arch Estimation "ARCH model"
This command is retained at present for backward compatibility, but you are better off using the maximum likelihood estimator offered by the <@ref="garch"> command; for a plain ARCH model, set the first GARCH parameter to 0.
Estimates the given model specification allowing for ARCH (Autoregressive Conditional Heteroskedasticity). The model is first estimated via OLS, then an auxiliary regression is run, in which the squared residual from the first stage is regressed on its own lagged values. The final step is weighted least squares estimation, using as weights the reciprocals of the fitted error variances from the auxiliary regression. (If the predicted variance of any observation in the auxiliary regression is not positive, then the corresponding squared residual is used instead.)
The <@lit="alpha"> values displayed below the coefficients are the estimated parameters of the ARCH process from the auxiliary regression.
See also <@ref="garch"> and <@ref="modtest"> (the <@opt="--arch"> option).
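As a rough illustration of the two routes mentioned above, assuming series named <@lit="y"> and <@lit="x"> and an ARCH order of 2 chosen only for the example:
<code>
# legacy estimator with ARCH order 2
arch 2 y 0 x
# maximum likelihood alternative: first (GARCH) parameter 0, ARCH order 2
garch 0 2 ; y 0 x
</code>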
Script command: <@ref="arch">
# arima Estimation "ARIMA model"
Note: <@lit="arma"> is an acceptable alias for this command.
Estimates an ARMA model, with or without exogenous regressors. If the order of differencing is greater than zero the model becomes ARIMA. If the data have a frequency greater than 1 the option of including a seasonal component is presented.
If you wish to include only specified AR or MA lags in the model (as opposed to all lags up to a given order) check the box to the right of the spinner and type a list of lags, separated by spaces, into the entry field. Alternatively, if you have defined a matrix containing the desired set of lags you can type its name into the entry field.
The default is to use the “native” gretl ARMA functionality, with estimation by exact ML using the Kalman filter; estimation via conditional ML is available as an option. (If X-12-ARIMA is installed you have the option of using it instead of native code.) For details regarding these options, please see chapter 27 of the <@pdf="Gretl User's Guide#chap:timeseries">.
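The dialog choices described above correspond roughly to the following script forms. This is a sketch with hypothetical series <@lit="y"> and <@lit="x">; the orders shown are arbitrary.
<code>
# ARMA(1,1) by exact ML (the default)
arima 1 0 1 ; y
# ARIMA(1,1,1) with a constant and an exogenous regressor
arima 1 1 1 ; y 0 x
# AR lags 1 and 4 only, plus MA(1)
arima {1 4} 0 1 ; y
# conditional ML instead of exact ML
arima 1 0 1 ; y --conditional
</code>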
The AIC value given in connection with ARIMA models is calculated according to the definition used in X-12-ARIMA, namely
<@fig="aic">
where <@fig="ell"> is the log-likelihood and <@mth="k"> is the total number of parameters estimated. Note that X-12-ARIMA does not produce information criteria such as AIC when estimation is by conditional ML.
The AR and MA roots shown in connection with ARMA estimation are based on the following representation of an ARMA(p, q) process:
<mono>
(1 - a_1*L - a_2*L^2 - ... - a_p*L^p)Y_t =
c + (1 + b_1*L + b_2*L^2 + ... + b_q*L^q) e_t
</mono>
The AR roots are therefore the solutions to
<mono>
1 - a_1*z - a_2*z^2 - ... - a_p*z^p = 0
</mono>
and stability requires that these roots lie outside the unit circle.
The “frequency” figure printed in connection with AR and MA roots is the λ value that solves <@mth="z"> = <@mth="r"> * exp(i*2*π*λ) where <@mth="z"> is the root in question and <@mth="r"> is its modulus.
Menu path: /Model/Time series/ARIMA
Script command: <@ref="arima">
# arma Estimation "ARMA model"
See <@ref="arima">; <@lit="arma"> is an alias.
Script command: <@ref="arma">
# bfgs-config Estimation "BFGS options"
This dialog allows you to control some aspects of the operation of the BFGS maximizer. In case the maximizer fails to converge it may help matters, in some cases, to increase the number of iterations allowed and/or to increase (make more permissive) the convergence tolerance. However, you should be suspicious of results obtained using a high tolerance and should consider the possibility that the model you are estimating is misspecified.
For most applications we recommend use of the regular BFGS maximizer but for some problems the “limited memory” variant of the algorithm, L-BFGS-B, may produce more rapid convergence. When L-BFGS-B is selected, you have the option of setting the number of corrections used in the limited memory matrix (between 3 and 20, with a default of 8).
# bootstrap Tests "Bootstrap options"
In this dialog you get to choose:
<indent>
• The variable/coefficient to examine. (You can test only one coefficient at a time using this method.)
</indent>
<indent>
• The sort of analysis to perform. The default (95 percent) confidence interval is based directly on the quantiles of the bootstrap coefficient estimates. The “studentized” version is as per Davidson and MacKinnon's <@bib="Econometric Theory and Methods;davidson-mackinnon04"> (ETM), chapter 5: at each bootstrap replication a <@mth="t">-ratio is formed as the difference between the bootstrap and the baseline coefficient estimates, divided by the bootstrap standard error. The confidence interval is then based on the quantiles of this bootstrap <@mth="t">-ratio, as explained in ETM. The P-value option is based on the distribution of the bootstrap <@mth="t">-ratio: it is the proportion of the replications where the absolute value of this statistic exceeds the absolute value of the baseline <@mth="t">-ratio.
</indent>
<indent>
• The bootstrap method. Under the first option the residuals from the original estimation are resampled with replacement (after rescaling as suggested in ETM). Under the second, resampling with replacement is performed on “pairs” or “cases”; that is, the <@mth="y">, <@mth="X"> data rows. In the third option the original residuals are first transformed as per <@bib="Davidson and Flachaire (2001);davidson-flachaire01">, then on each bootstrap replication each transformed residual is multiplied by either 1 or –1, with probability 0.5 in either case. In the final option pseudo-random normal values are generated with the original residual variance.
</indent>
<indent>
• The number of replications to perform. Note that when you're constructing a 95 percent confidence interval it is desirable that 0.05(<@mth="B"> + 1)/2 is an integer (where <@mth="B"> is the number of replications). So gretl may adjust the chosen number of replications to ensure this is the case.
</indent>
<indent>
• Whether or not to produce a graph of the bootstrap distribution. This option employs gretl's kernel density estimation facility.
</indent>
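To illustrate the integer condition on the number of replications, here is a small arithmetic check in script form. The value 999 is chosen purely for illustration: with it the expression equals 25, whereas 1000 replications would give 25.025.
<code>
scalar B = 999
# 0.05 * 1000 / 2 = 25, an integer
scalar q = 0.05 * (B + 1) / 2
printf "B = %d, 0.05(B+1)/2 = %g\n", B, q
</code>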
# boxplot Graphs "Boxplots"
These plots display the distribution of a variable. The central box encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The “whiskers” extend from each end of the box for a range equal to 1.5 times the interquartile range. Observations outside that range are considered outliers and represented via dots. A line is drawn across the box at the median. A “+” sign is used to indicate the mean. If the option of showing a confidence interval for the median is selected, this is computed via the bootstrap method and shown in the form of dashed horizontal lines above and/or below the median.
The “factorized” option allows you to examine the distribution of a chosen variable conditional on the value of some discrete factor. For example, if a data set contains wages and a gender dummy variable you can select the wage variable as the target and gender as the factor, to see side-by-side boxplots of male and female wages.
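In a script, the wage example might be written as below; <@lit="wage"> and <@lit="gender"> are assumed series names taken from the example, not from any particular data file.
<code>
# plain boxplot of a single series
boxplot wage
# side-by-side boxplots of wage for each value of gender
boxplot wage gender --factorized
</code>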
Menu path: /View/Graph specified vars/Boxplots
Script command: <@ref="boxplot">
# bwfilter Transformations "The Butterworth filter"
The Butterworth filter is an approximation to an ideal square-wave filter which allows frequencies over a certain range to pass at full strength while stopping all others.
Higher values of the order parameter, <@mth="n">, produce a closer approximation to the ideal filter, in principle, but at the possible cost of numerical instability. The “cutoff” value sets the boundary between the pass band and the stop band. It is expressed in degrees, and must be greater than 0 and less than 180° (or π radians, corresponding to the highest frequency in the data). Smaller values of the cutoff produce a smoother trend.
Inspecting the periodogram of the target series is a useful preliminary when you wish to apply this filter. See chapter 9 of the <@pdf="Gretl User's Guide#chap:genr"> for details.
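In scripts the filter is available as the <@lit="bwfilt"> function. The sketch below assumes a series named <@lit="y">; the order of 8 and the cutoff of 67 degrees are arbitrary illustrative values, not recommendations.
<code>
# low-pass Butterworth filter: order n = 8, cutoff 67 degrees
series y_trend = bwfilt(y, 8, 67)
# the remainder can be treated as the cyclical component
series y_cycle = y - y_trend
</code>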
Menu path: /Variable/Filter/Butterworth
# chow Tests "Chow test"
This command needs either an observation number (or date, with dated data), or the name of a dummy variable.
Must follow an OLS regression. If an observation number or date is given, provides a test for the null hypothesis of no structural break at the given split point. The procedure is to create a dummy variable which equals 1 from the split point specified by <@var="obs"> to the end of the sample, 0 otherwise, and also interaction terms between this dummy and the original regressors. If a dummy variable is given, tests the null hypothesis of structural homogeneity with respect to that dummy. Again, interaction terms are added. In either case an augmented regression is run including the additional terms.
By default an <@mth="F"> statistic is calculated, taking the augmented regression as the unrestricted model and the original as the restricted. But if the original model used a robust estimator for the covariance matrix, the test statistic is a Wald chi-square value based on a robust estimator of the covariance matrix for the augmented regression.
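A short sketch of both forms of the test follows. It assumes quarterly data containing series <@lit="y"> and <@lit="x"> and a dummy variable <@lit="d">, all hypothetical names; the split date is likewise only an example.
<code>
ols y 0 x
# test for a structural break at the first quarter of 1984
chow 1984:1
# alternatively, test homogeneity with respect to the dummy d
chow d --dummy
</code>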
Menu path: Model window, /Tests/Chow test
Script command: <@ref="chow">
# cluster Estimation "Robust variance estimation"
If you select the second option you must supply the name of a clustering variable. This variable should have at least two distinct values but generally should have substantially fewer distinct values than there are observations in the sample range.
The “cluster-robust” variance estimator divides the sample into a number of subsets or clusters according to the value taken on by the selected variable. In place of the classical assumption that the error term is independently and identically distributed, this estimator allows for the error variance to differ by cluster and also allows for a degree of dependence of the error within each cluster.
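In script form the clustering variable is passed as an option parameter. The following is a minimal sketch in which <@lit="y">, <@lit="x"> and the clustering variable <@lit="id"> are hypothetical names:
<code>
# cluster-robust standard errors, clustering on the values of id
ols y 0 x --cluster=id
</code>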
# coeffsum Tests "Sum of coefficients"
This command needs a list of variables, selected from the set of independent variables in a given model.
Calculates the sum of the coefficients on the variables in the specified list. Prints this sum along with its standard error and the p-value for the null hypothesis that the sum is zero.
Note the difference between this and <@ref="omit">, which tests the null hypothesis that the coefficients on a specified subset of independent variables are <@itl="all"> equal to zero.
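The contrast between the two tests can be seen in a sketch such as the following, with hypothetical series names throughout:
<code>
ols y 0 x1 x2 x3
# sum of the coefficients on x1 and x2, with standard error and p-value
coeffsum x1 x2
# by contrast: joint test that both coefficients are zero,
# leaving the current model in place
omit x1 x2 --test-only
</code>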
Menu path: Model window, /Tests/Sum of coefficients
Script command: <@ref="coeffsum">
# coint Tests "Engle-Granger cointegration test"
The Engle–Granger cointegration test. The default procedure is: (1) carry out Dickey–Fuller tests on the null hypothesis that each of the variables listed has a unit root; (2) estimate the cointegrating regression; and (3) run a DF test on the residuals from the cointegrating regression. If the box labeled “skip initial DF tests” is checked, however, the first of these steps is omitted.
If the lag order, <@mth="k">, is greater than 0, then <@mth="k"> lags of the dependent variable are included on the right-hand side of each test regression, unless the box labeled “test down from maximum lag” is checked: in that case the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down. See the <@ref="adf"> command for details of this procedure.
By default, the cointegrating regression contains a constant. If you wish to suppress the constant, or to add a linear or quadratic trend, select the appropriate option from the set of radio buttons in the Cointegration dialog box.
<@itl="P">-values for this test are based on <@bib="MacKinnon (1996);mackinnon96">. The relevant code is included by kind permission of the author.
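The dialog settings described above map onto script options roughly as follows. This is a sketch: <@lit="y"> and <@lit="x"> are placeholder series names and the lag order of 4 is arbitrary.
<code>
# all three steps; lag order 4 is taken as a maximum and tested down
coint 4 y x --test-down
# skip the initial DF tests and add a linear trend to the
# cointegrating regression
coint 4 y x --skip-df --ct
</code>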
Menu path: /Model/Time series/Multivariate
Script command: <@ref="coint">
# coint2 Tests "Johansen cointegration test"
Carries out the Johansen test for cointegration among the listed variables for the selected lag order. For details of this test see, for example, Hamilton, <@itl="Time Series Analysis"> (1994), Chapter 20. P-values are computed via Doornik's (1998) gamma approximation. Two sets of p-values are shown for the trace test: straight asymptotic values and values adjusted for the sample size.
The inclusion of deterministic terms in the model is controlled by the drop-down option list. The default is to include an “unrestricted constant”, which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as “case 3”. The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in chapter 29 of the <@pdf="Gretl User's Guide#chap:vecm">.
You may control for exogenous variables by adding them to the lower list box. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select “Restricted” from the pop-up menu. The symbol next to the variable will change to R.
If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box (“Show details”) allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure.
The following table is offered as a guide to the interpretation of the results shown for the test, for the 3-variable case. <@lit="H0"> denotes the null hypothesis, <@lit="H1"> the alternative hypothesis, and <@lit="c"> the number of cointegrating relations.
<mono>
Rank Trace test Lmax test
H0 H1 H0 H1
---------------------------------------
0 c = 0 c = 3 c = 0 c = 1
1 c = 1 c = 3 c = 1 c = 2
2 c = 2 c = 3 c = 2 c = 3
---------------------------------------
</mono>
See also the <@ref="vecm"> command.
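For the 3-variable case, a scripted call might look like this sketch; the series names <@lit="y1">, <@lit="y2"> and <@lit="y3"> and the lag order of 2 are assumptions made for illustration.
<code>
# lag order 2, default unrestricted constant (case 3)
coint2 2 y1 y2 y3
# restricted constant (case 2), with centered seasonal dummies
coint2 2 y1 y2 y3 --rc --seasonals
</code>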
Menu path: /Model/Time series/Multivariate
Script command: <@ref="coint2">
# compact Dataset "Compact data"
When you add to a dataset a series that is of higher frequency, it is necessary to “compact” the new series. For instance, a monthly series will have to be compacted to fit into a quarterly dataset.
In addition, you may sometimes want to compact an entire dataset to a lower frequency (perhaps, prior to adding a lower-frequency variable to the dataset).
Gretl offers five options for compacting:
<indent>
• Averaging: The value written to the dataset will be the arithmetic mean of the relevant series values. For instance the value written for the first quarter of 1990 will be the average of the values for January, February and March of 1990.
</indent>
<indent>
• Summing: The value written to the dataset will be the sum of the relevant higher-frequency values. For example, the first-quarter value will be the sum of the January, February and March values.
</indent>
<indent>
• End-of-period values: The value written to the dataset is the last relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the March 1990 value.
</indent>
<indent>
• Start-of-period values: The value written to the dataset is the first relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the January 1990 value.
</indent>
<indent>
• “Spreading”: In this case no higher-frequency information is lost; rather, it is spread over a set of <@mth="m"> series, where <@mth="m"> is the ratio of the higher frequency to the lower. Each series in the set holds the values from a given sub-period, which is identified by a specific suffix. For example, when compacting a series <@lit="x"> from monthly to quarterly the names of the generated series will be <@lit="x_m01">, <@lit="x_m02"> and <@lit="x_m03">, for the three months in each quarter.
</indent>
In the case of compacting an entire dataset, the choice you make in this dialog box sets the default method. But if you have set a compaction method for an individual variable (menu item “Variable/Edit attributes”) that method is used rather than the default. If the compaction method is already set for all variables, the choice of a default compaction method is not presented.
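In a script, compaction of an entire dataset can be requested via the <@ref="dataset"> command. The following is a minimal sketch using artificial monthly data; the series name and sample range are purely illustrative.
<code>
# artificial monthly data, for illustration only
nulldata 120
setobs 12 2000:01 --time-series
series x = normal()
# compact to quarterly using end-of-period values;
# omitting "last" gives the default, averaging
dataset compact 4 last
</code>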
# controlled Graphs "Scatterplot with control"
This command requires the selection of three variables, one for the X axis, one for the Y axis, and one for which you wish to control (call it Z). The plot shows adjusted Y against adjusted X, where the adjusted version of each variable is the residual from an OLS regression of that variable on Z.
Example: You have data on wages, experience and education level for a sample of people. You wish to plot wages against education, controlling for experience. In that case you select wages for the Y axis, education for the X axis, and experience as the control. The plot shows wages against education, with both variables “purged” of the effect of experience.
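The adjustment can be sketched by hand in a script, as below. This assumes Ramanathan's <@lit="data7-2"> with series named <@lit="WAGE">, <@lit="EDUC"> and <@lit="EXPER">, and it assumes that a constant is included in each auxiliary regression; the GUI command may differ in detail.
<code>
open data7-2
# residuals from regressing Y (WAGE) on the control (EXPER)
ols WAGE const EXPER --quiet
series wage_adj = $uhat
# residuals from regressing X (EDUC) on the control
ols EDUC const EXPER --quiet
series educ_adj = $uhat
# plot adjusted Y against adjusted X
gnuplot wage_adj educ_adj --output=display
</code>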
# corr Statistics "Correlation coefficients"
Prints the pairwise correlation coefficients (Pearson's product-moment correlation) for the selected variables. The default behavior is to use all available observations for computing each pairwise coefficient, but if the option box is checked the sample is limited (if necessary) so that the same set of observations is used for all the coefficients. This option has an effect only if there are differing numbers of missing values for the variables used.
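A minimal script sketch, assuming Ramanathan's <@lit="data4-1"> with series <@lit="price">, <@lit="sqft"> and <@lit="bedrms">. The <@opt="--uniform"> option corresponds to the option box described above; with a dataset that has no missing values it makes no difference to the results.
<code>
open data4-1
# pairwise correlations, using all available observations
corr price sqft bedrms
# the same, restricted to a common set of observations
corr price sqft bedrms --uniform
</code>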
Menu path: /View/Correlation matrix
Other access: Main window pop-up menu (multiple selection)
Script command: <@ref="corr">
# corrgm Statistics "Correlogram"
Prints the values of the autocorrelation function (ACF) for <@var="series">, which may be specified by name or number. The values are defined as ρ(<@mth="u"><@sub="t">, <@mth="u"><@sub="t-s">) where <@mth="u"><@sub="t"> is the <@mth="t"><@sup="th"> observation of the variable <@mth="u"> and <@mth="s"> denotes the number of lags.
The partial autocorrelations (PACF, calculated using the Durbin–Levinson algorithm) are also shown: these are net of the effects of intervening lags. In addition the Ljung–Box <@mth="Q"> statistic is printed. This may be used to test the null hypothesis that the series is “white noise”; it is asymptotically distributed as chi-square with degrees of freedom equal to the number of lags used.
Asterisks are used to indicate statistical significance of the individual autocorrelations. By default this is assessed using a standard error of one over the square root of the sample size, but you have the choice of using Bartlett standard errors for the ACF. This option also governs the confidence band drawn in the ACF plot.
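In script form this might look as follows. The sketch assumes the quarterly dataset <@lit="data9-7"> with a series named <@lit="QNC">; the lag order of 12 is an arbitrary choice.
<code>
open data9-7
# ACF and PACF up to lag 12, with the default standard errors
corrgm QNC 12
# as above, but using Bartlett standard errors for the ACF
corrgm QNC 12 --bartlett
</code>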
Menu path: /Variable/Correlogram
Other access: Main window pop-up menu (single selection)
Script command: <@ref="corrgm">
# count-model Estimation "Models for count data"
The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the Poisson distribution is used, but the drop-down selector gives the option of using the Negative Binomial distribution. (The variant NegBin 2 is commonly used in econometrics, but the less commonly used NegBin 1 is also available.)
Optionally, you may add an “offset” variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an “offset” in a model of the accident rate. The offset variable must be strictly positive.
By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the “Robust standard errors” box is checked then QML standard errors are calculated, using a “sandwich” of the inverse of the Hessian and the outer product of the gradient.
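The following sketch shows a scripted Poisson regression with an offset, using simulated data. All series names and parameter values are hypothetical, chosen only to mimic the traffic example above.
<code>
nulldata 200
set seed 147
# hypothetical exposure variable (strictly positive)
series volume = 1 + 10 * uniform()
series x = normal()
# counts whose mean is proportional to volume
series accidents = randgen(P, volume * exp(0.2 + 0.5*x))
# the offset follows the semicolon; --robust requests QML standard errors
poisson accidents const x ; volume --robust
</code>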
# curve Graphs "Plot a curve"
This dialog box allows you to create a gnuplot graph by specifying a formula. This must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as “.”. Examples:
<code>
10+0.35*x
100+5.3*x-0.12*x**2
sin(x)
exp(sqrt(pi*x))
</code>
To put an additional line onto a graph created in this way, click on the graph and select “Edit”, select the “Lines” tab in the graph editing dialog, and use the “Add line” button.
# cusum Tests "CUSUM test"
Must follow the estimation of a model via OLS. Performs the CUSUM test—or if the <@opt="--squares"> option is given, the CUSUMSQ test—for parameter stability. A series of one-step ahead forecast errors is obtained by running a series of regressions: the first regression uses the first <@mth="k"> observations and is used to generate a prediction of the dependent variable at observation <@mth="k"> + 1; the second uses the first <@mth="k"> + 1 observations and generates a prediction for observation <@mth="k"> + 2, and so on (where <@mth="k"> is the number of parameters in the original model).
The cumulated sum of the scaled forecast errors, or the squares of these errors, is printed and graphed. The null hypothesis of parameter stability is rejected at the 5 percent significance level if the cumulated sum strays outside of the 95 percent confidence band.
In the case of the CUSUM test, the Harvey–Collier <@mth="t">-statistic for testing the null hypothesis of parameter stability is also printed. See Greene's <@itl="Econometric Analysis"> for details. For the CUSUMSQ test, the 95 percent confidence band is calculated using the algorithm given in <@bib="Edgerton and Wells (1994);edgerton94">.
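A minimal script sketch, assuming the dataset <@lit="data9-7"> and its series <@lit="QNC">, <@lit="PRICE"> and <@lit="INCOME">; the model specification is illustrative only.
<code>
open data9-7
ols QNC const PRICE INCOME --quiet
# CUSUM test, with the Harvey-Collier t-statistic
cusum
# CUSUMSQ variant
cusum --squares
</code>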
Menu path: Model window, /Tests/CUSUM(SQ)
Script command: <@ref="cusum">
# daily-purge Dataset "Purge daily data"
If a daily dataset is nominally on a 7-day calendar but in fact only includes business-day data, it is recommended that you delete the blank weekend rows, thereby switching to a 5-day calendar.
If a business-daily dataset contains a relatively small number of rows with no data entries (presumably due to trading holidays) you may wish to delete these rows. In effect, this means treating the missing values for holidays as non-existent rather than truly “missing”, and treating the trading days as forming a continuous time-series.
Note that if you take either of these options gretl will nonetheless preserve the date information, and it will be possible to reconstruct the full calendar dataset later if that is required. This can be done using the <@ref="dataset"> command with the <@lit="pad-daily"> option.
# data-files Programming "Data files"
This dialog enables you to specify additional files to be included with a function package. Including such material implies that the package takes the form of a zip file. If gretl is to build the zip file for you, all files referenced here must be present in the same directory as the gfn file. Sub-directories can be listed as well as regular files; in that case it is implied that all of their contents should be included in the zip package.
There are two main intended uses for this facility. First, you can include a data file for use with the package's sample script, if none of the data files supplied with the gretl distribution are suitable. In this case the data should be in gretl native format (<@lit="gdt"> or binary <@lit="gdtb">). Second, if your package requires a big matrix (for example, holding critical values for a specialized test statistic) it may be more convenient to include this as a gretl matrix file (<@lit="mat">) than to assemble the matrix via multiple hansl statements.
To access a packaged <@lit="gdt"> or <@lit="gdtb"> file from a sample script, use the <@opt="--frompkg"> option with the <@lit="open"> command, supplying the name of the package as a parameter, as in
<code>
open almon.gdt --frompkg=almonreg
</code>
To read a packaged matrix file from within your package code, use the built-in string variable <@lit="$pkgdir">, as in
<code>
string mname = sprintf("%s/A.mat", $pkgdir)
matrix A = mread(mname)
</code>
(Note that “<@lit="/">” also works as a path separator on MS Windows.)
# datasort Dataset "Sorting data"
The selected variable is used as a sort key for the entire data set. The observations on all variables are re-ordered by increasing value of the key variable, or by decreasing value if you select the “Descending” option.
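The script counterpart is the <@ref="dataset"> command. A short sketch, assuming <@lit="data4-1"> and its series <@lit="sqft"> and <@lit="price">:
<code>
open data4-1
# sort all observations by increasing sqft
dataset sortby sqft
# sort all observations by decreasing price
dataset dsortby price
</code>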
# density Statistics "Kernel density estimation"
Kernel density estimation proceeds by defining a set of evenly spaced reference points, over a suitable range in relation to the range of the data, and attributing a density to each reference point based on the actual observations in the vicinity.
The formula used to compute the estimated density at each reference point, <@mth="x">, is
<@fig="kernel1">
where <@mth="n"> denotes the number of data points, <@mth="h"> is a “bandwidth” parameter, and <@mth="k">() is the kernel function. The larger the value of the bandwidth parameter, the smoother the estimated density.
You are given the choice of using a Gaussian kernel (the standard normal density) or the Epanechnikov kernel. By default, the bandwidth is that suggested as a rule of thumb by <@bib="Silverman (1986);silverman96">, namely
<@fig="kernel2">
where <@mth="s"> denotes the standard deviation of the data and IQR denotes the inter-quartile range. You can widen or shrink the bandwidth via the “bandwidth adjustment factor”: the actual bandwidth used is obtained by multiplying the Silverman value by the adjustment factor.
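In a script, a comparable estimate may be obtained via the <@xrf="kdensity"> function. The sketch below assumes <@lit="data4-1"> and its <@lit="price"> series; it also assumes that the second argument is the bandwidth adjustment factor and that the third selects the kernel (0 for Gaussian, 1 for Epanechnikov), so please check the function reference for your gretl version.
<code>
open data4-1
# Gaussian kernel, Silverman bandwidth (adjustment factor 1)
matrix d0 = kdensity(price, 1.0, 0)
# Epanechnikov kernel, bandwidth widened by 50 percent
matrix d1 = kdensity(price, 1.5, 1)
# each result holds reference points and estimated densities
print d0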
For a good introductory discussion of kernel density estimation see Chapter 15 of Davidson and MacKinnon's <@itl="Econometric Theory and Methods">.
# dfgls Tests "The ADF-GLS test"
The ADF-GLS test is a variant of the Dickey–Fuller test for a unit root, for the case where the variable to be tested is assumed to have a non-zero mean or to exhibit a linear trend. The difference is that the de-meaning or de-trending of the variable is done using the GLS procedure suggested by <@bib="Elliott, Rothenberg and Stock (1996);ERS96">. This gives a test of greater power than the standard Dickey–Fuller approach.
When testing down from a given maximum lag order, the “modified” AIC and BIC criteria are as described in <@bib="Ng and Perron (2001);ng-perron01">, with or without the amendment proposed by <@bib="Perron and Qu (2007);perron-qu07">. The amendment involves using OLS-detrended data in the phase of determining the optimal lag order, then GLS-detrending in the final unit root test.
See also the <@ref="adf"> command and the <@opt="--gls"> option.
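A script sketch of the test, assuming the dataset <@lit="data9-7"> and its series <@lit="PRIME">; the maximum lag order of 4 is arbitrary.
<code>
open data9-7
# ADF-GLS test with constant and trend, testing down from 4 lags
adf 4 PRIME --gls --ct --test-down
</code>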
Menu path: /Variable/Unit root tests/ADF-GLS test
# dialog Estimation "Model dialog box"
To select the dependent variable, highlight a variable in the list on the left and press the “Choose” button pointing to the Dependent variable slot. If you check the “Set as default” box, the selected variable will be pre-selected as dependent when the model dialog is next opened. Short-cut: double-click on a variable on the left to select it as the dependent variable and also set it as the default.
To select independent variables, highlight them on the left and press the “Add” button (or click the right mouse button). You can highlight several contiguous variables by dragging with the mouse. You can highlight a group of non-contiguous variables by clicking on them with the <@lit="Ctrl"> key pressed.
# dpanel Estimation "Dynamic panel models"
Carries out estimation of dynamic panel data models (that is, panel models including one or more lags of the dependent variable) using either the GMM-DIF or GMM-SYS method.
The dependent variable and regressors should be given in levels form; they will be differenced automatically (since this estimator uses differencing to cancel out the individual effects).
As regards the handling of instruments, please see the documentation for the script version of this command. Currently you cannot specify instruments explicitly in the GUI: all the independent variables are taken to be strictly exogenous.
By default the results of 1-step estimation are reported (with robust standard errors). You may select 2-step estimation as an option. In both cases tests for autocorrelation of orders 1 and 2 are provided, as well as the Sargan overidentification test and a Wald test for the joint significance of the regressors. Note that in this differenced model first-order autocorrelation is not a threat to the validity of the model, but second-order autocorrelation violates the maintained statistical assumptions.
For further details and examples, please see chapter 20 of the <@pdf="Gretl User's Guide#chap:dpanel">.
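As a rough illustration of the script form, the following assumes the Arellano–Bond dataset <@lit="abdata"> supplied with gretl, with series <@lit="n">, <@lit="w">, <@lit="k"> and <@lit="ys">. The specification is not meant to replicate any published result.
<code>
open abdata
# GMM-DIF, 1-step, with two lags of the dependent variable
dpanel 2 ; n w k ys
# GMM-SYS with 2-step estimation
dpanel 2 ; n w k ys --system --two-step
</code>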
Menu path: /Model/Panel/Dynamic panel model
Script command: <@ref="dpanel">
# dummify Transformations "Create sets of dummies"
The “dummify” operation is available only for discrete-valued series. Its effect is to create a set of dummy variables coding for the distinct values present in the series.
For example suppose one has a series named <@lit="race">, with values 1 for “white”, 2 for “black”, 3 for “hispanic” and 4 for “other”. To dummify this series means to create 4 dummy variables: the first has value 1 for all observations at which race = 1, zero otherwise; the second has value 1 for all observations at which race = 2, zero otherwise; and so on.
In practice it's likely that for a discrete series with <@mth="k"> categories you will want to create only <@mth="k"> – 1 dummies, to avoid falling into the so-called “dummy variable trap”. Hence you have the option of dropping either the lowest or the highest value from the coding.
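The operation can be sketched in a script as follows. The <@lit="race"> series here is artificial, constructed only so that it takes the values 1 to 4.
<code>
nulldata 8
# artificial series cycling through the values 1, 2, 3, 4
series race = 1 + (index - 1) % 4
# mark the series as discrete-valued
discrete race
# create k - 1 = 3 dummies, omitting the lowest value
dummify race --drop-first
</code>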
# ema-filter Transformations "Exponential Moving Average"
The formula for the exponential moving average (EMA) employed by gretl is that of <@bib="Roberts (1959);roberts59">, namely
<@mth="s"><@sub="t"> = α<@mth="y"><@sub="t"> + (1–α)<@mth="s"><@sub="t–1">
where <@mth="s"> is the EMA, <@mth="y"> is the original series, and α is a constant between 0 and 1. Larger values of α place more weight on the current observation; smaller values produce greater smoothing.
The “initial EMA value”, however specified, is taken to be the last pre-sample value, meaning that calculation of the filter starts with the first observation in the current sample range.
For a command-line equivalent, see the <@xrf="movavg"> function.
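A minimal sketch using <@xrf="movavg"> on artificial data follows. It assumes that a second argument strictly between 0 and 1 is interpreted as α; the function's default initialization may differ from the “initial EMA value” set in the dialog.
<code>
nulldata 50
setobs 1 1 --time-series
set seed 147
series y = normal()
# EMA with alpha = 0.2: s(t) = 0.2*y(t) + 0.8*s(t-1)
series s = movavg(y, 0.2)
</code>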
# expand Dataset "Expand data"
If you wish to add to a dataset a series that is of lower frequency, it is necessary to “expand” the new series. For instance, a quarterly series will have to be expanded to fit into a monthly dataset. In addition, you may sometimes want to expand an entire dataset to a higher frequency (perhaps prior to adding a higher-frequency variable to the dataset).
Expansion of data should be considered an “expert” option; you need to know what you are doing. When combining series of differing original frequencies within one dataset, you should probably consider compacting the higher-frequency data rather than expanding the lower-frequency series.
That said, gretl offers two options: higher-frequency values can be interpolated using the method of <@bib="Chow and Lin (1971);chowlin71">, or the values of the lower-frequency series can be repeated as many times as required.
The Chow–Lin method is regression-based, using a constant and quadratic trend and assuming a first-order autoregressive process for the disturbances. Four degrees of freedom are used up by this procedure.
As for the repetition of values, suppose we have a quarterly series with the value 35.5 in 1990:1, the first quarter of 1990. On expansion to monthly, the value 35.5 will be assigned to the observations for January, February and March of 1990. The expanded variable is therefore useless for fine-grained time-series analysis, outside of the special case where you know that the variable in question does in fact remain constant over the sub-periods.
# export Dataset "Export data"
You may export data in Comma-Separated Values (CSV) format: such data may be opened in spreadsheets and many other application programs. If you select this option you will get some further options regarding the specific format of the CSV file.
You also have the option of exporting data in the form of a “native” gretl datafile, or (if the data are suitable) exporting to a gretl database. See <@url="gretl.sourceforge.net/gretl_data.html"> for an account of gretl databases.
You may also export data in a plain text format suitable for use with the following programs:
<indent>
• GNU R (<@url="www.r-project.org">): we write the data in a space-separated format that is easily digested by R's <@lit="read.table"> function. Default filename suffix: <@lit=".txt">
</indent>
<indent>
• GNU Octave (<@url="www.gnu.org/software/octave">): the data are written as a matrix in Octave's preferred format. Default filename suffix: <@lit=".m">
</indent>
<indent>
• JMulTi (<@url="www.jmulti.de">). Default filename suffix: <@lit=".dat">
</indent>
<indent>
• PcGive (<@url="www.pcgive.com">). Default filename suffix: <@lit=".dat">
</indent>
If you wish to export data by copying to the clipboard rather than writing to a file on disk, select the series you want to copy in the main window, right-click, and select “Copy to clipboard”. (Only CSV format is supported in this context.)
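In a script, export is handled by the <@lit="store"> command. The sketch below assumes <@lit="data4-1"> and its series <@lit="price"> and <@lit="sqft">; the output filenames are hypothetical, and relative names are written to the current working directory.
<code>
open data4-1
# CSV export, with the format inferred from the filename suffix
store houses.csv price sqft
# space-separated text suitable for R's read.table
store houses.txt price sqft --gnu-R
</code>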
# factorized Graphs "Factorized plot"
This command requires the selection of three variables, the last of which must be a dummy variable (values 1 or 0). The Y variable is plotted against the X variable, with the data points colored differently depending on the value of the third.
Example: You have data on wages and educational attainment for a sample of people; you also have a dummy variable with value 1 for men and 0 for women (as in Ramanathan's <@lit="data7-2">). A “factorized plot” of <@lit="WAGE"> against <@lit="EDUC"> using the <@lit="GENDER"> dummy as factor will show the data points for men in one color and those for women in another (with a legend to identify them).
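The example above can be sketched in a script using the <@opt="--dummy"> option of the <@lit="gnuplot"> command, assuming the series names given for <@lit="data7-2">:
<code>
open data7-2
# Y variable, X variable, then the dummy used as factor
gnuplot WAGE EDUC GENDER --dummy --output=display
</code>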
# fcast Prediction "Generate forecasts"
Must follow an estimation command. Forecasts are generated for the specified range of observations. Depending on the nature of the model, standard errors may also be generated (see below).
The choice between a static and a dynamic forecast applies only in the case of dynamic models, with an autoregressive error process or including one or more lagged values of the dependent variable as regressors. Static forecasts are one step ahead, based on realized values from the previous period, while dynamic forecasts employ the chain rule of forecasting. For example, if a forecast for <@mth="y"> in 2008 requires as input a value of <@mth="y"> for 2007, a static forecast is impossible without actual data for 2007. A dynamic forecast for 2008 is possible if a prior forecast can be substituted for <@mth="y"> in 2007.
The default is to give a static forecast for any portion of the forecast range that lies within the sample range over which the model was estimated, and a dynamic forecast (if relevant) out of sample. The <@opt="--dynamic"> option requests a dynamic forecast from the earliest possible date, and the <@opt="--static"> option requests a static forecast even out of sample.
The nature of the forecast standard errors (if available) depends on the nature of the model and the forecast. For static linear models standard errors are computed using the method outlined by <@bib="Davidson and MacKinnon (2004);davidson-mackinnon04">; they incorporate both uncertainty due to the error process and parameter uncertainty (summarized in the covariance matrix of the parameter estimates). For dynamic models, forecast standard errors are computed only in the case of a dynamic forecast, and they do not incorporate parameter uncertainty. For nonlinear models, forecast standard errors are not presently available.
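A sketch of an out-of-sample dynamic forecast follows. It assumes the quarterly dataset <@lit="data9-7">, taken to run from 1975:1 to 1990:4, with series <@lit="QNC"> and <@lit="PRICE">; the model is illustrative only.
<code>
open data9-7
# hold back the last two years for forecasting
smpl 1975:1 1988:4
# dynamic model: lagged dependent variable as regressor
ols QNC const QNC(-1) PRICE
smpl --full
# forecasts beyond one step use prior forecasts of QNC
fcast 1989:1 1990:4 --dynamic
</code>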
Menu path: Model window, /Analysis/Forecasts
Script command: <@ref="fcast">
# fractint Statistics "Fractional integration"
Tests the specified series for fractional integration (“long memory”). The null hypothesis is that the integration order of the series is zero. By default the local Whittle estimator <@bib="(Robinson, 1995);robinson95"> is used, but if the <@opt="--gph"> option is given the GPH test <@bib="(Geweke and Porter-Hudak, 1983);GPH83"> is performed instead. If the <@opt="--all"> flag is given then the results of both tests are printed.
For details on this sort of test, see <@bib="Phillips and Shimotsu (2004);phillips04">.
If the optional <@var="order"> argument is not given the order for the test(s) is set automatically as the lesser of <@mth="T">/2 and <@mth="T"><@sup="0.6">.
The results can be retrieved using the accessors <@xrf="$test"> and <@xrf="$pvalue">. These values are based on the local Whittle estimator unless the <@opt="--gph"> option is given.
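A minimal sketch on artificial white noise, for which the null of zero integration order holds by construction (the series name is hypothetical):
<code>
nulldata 500
setobs 1 1 --time-series
set seed 147
series x = normal()
# local Whittle estimator, automatic order
fractint x
scalar lw_test = $test
scalar lw_pval = $pvalue
# print the results of both tests
fractint x --all
</code>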
Menu path: /Variable/Unit root tests/Fractional integration
Script command: <@ref="fractint">
# freq Statistics "Frequency distribution"
In the frequency plot dialog box you can control the characteristics of the plot in either of two ways.
First, you may choose the number of bins. In this case the width and placement of the bins are calculated automatically.
Alternatively, you may specify the lower limit of the left-most bin, and the width of the bins. In this case the number of bins is calculated automatically.
If you wish to align the bins on round numbers, here is one way to proceed: start by specifying the number of bins you want, and take a look at the plot that is produced. If it's not to your liking, take note of the modification that is required (for example, make the left-most bin start at 100 and impose a bin width of 200). Then make a second pass where you specify the left-hand limit and bin width.
This dialog also allows you to select a theoretical distribution to be plotted against the data: either the normal or the gamma. If the normal option is selected the Doornik–Hansen test for normality is computed. If the gamma option is selected, gretl computes Locke's nonparametric test for the null hypothesis that the variable follows the gamma distribution. Note that the parameterization of the gamma distribution used in gretl is (shape, scale).
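The two ways of controlling the bins have script counterparts, sketched below for <@lit="data4-1"> and its <@lit="price"> series (assumed names). The specific values 7, 150 and 50 are arbitrary.
<code>
open data4-1
# first pass: fix the number of bins, with a normality test
freq price --nbins=7 --normal
# second pass: fix the starting point and the bin width
freq price --min=150 --binwidth=50
</code>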
Menu path: /Variable/Frequency distribution
Script command: <@ref="freq">
# garch Estimation "GARCH model"
Estimates a GARCH model (GARCH = Generalized Autoregressive Conditional Heteroskedasticity), either a univariate model or, if independent variables are selected, including the given exogenous variables. The conditional variance equation is shown below.
<@fig="garch_h">
The parameter <@var="p"> therefore represents the Generalized (or “AR”) order, while <@var="q"> represents the regular ARCH (or “MA”) order. If <@var="p"> is non-zero, <@var="q"> must also be non-zero, otherwise the model is unidentified. However, you can estimate a regular ARCH model by setting <@var="q"> to a positive value and <@var="p"> to zero. The sum of <@var="p"> and <@var="q"> must be no greater than 5.
By default native gretl code is used in estimation of GARCH models, but you also have the option of using the algorithm of <@bib="Fiorentini, Calzolari and Panattoni (1996);fiorentini96">. The former uses the BFGS maximizer while the latter uses the information matrix to maximize the likelihood, with fine-tuning via the Hessian.
Several variant estimates of the coefficient covariance matrix are available with this command. By default, the Hessian is used unless the “Robust standard errors” box is checked, in which case the QML (White) covariance matrix is used. Other possibilities (e.g. the information matrix, or the Bollerslev–Wooldridge estimator) can be specified using the <@ref="set"> command.
The estimated conditional variance, along with the residuals and various other model statistics, can be accessed and added to the dataset using the “Save” menu in the window where the model is displayed. If the box marked “Standardize the residuals” is checked, the residuals are divided by the square root of the conditional variance.
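A script sketch of a GARCH(1,1) model for daily returns follows. It assumes the dataset <@lit="djclose"> supplied with gretl, containing a series of the same name; the scaling of returns by 100 is a common convention rather than a requirement.
<code>
open djclose
# percentage log returns
series r = 100 * ldiff(djclose)
# GARCH(1,1) with a constant in the mean equation, QML standard errors
garch 1 1 ; r const --robust
</code>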
Menu path: /Model/Time series/GARCH
Script command: <@ref="garch">
# genr Dataset "Generate a new variable"
NOTE: this command has undergone numerous changes and enhancements since the following help text was written; for comprehensive and up-to-date information, please refer to chapter 9 of the <@pdf="Gretl User's Guide#chap:genr">. That said, nothing in this help text is actually erroneous, so take the following as “you have this, plus more”.
Use this box to define a new variable, on the pattern <@var="name"> = <@var="formula">. The formula should be a well-formed combination of variable names, constants, operators and functions (details below). To ensure you get the type of variable you want, you can prefix the formula with a type-name, e.g. <@lit="scalar">, <@lit="series"> or <@lit="matrix">. For example, to create a series that has a constant value of 10, you can type
<code>
series c = 10
</code>
(otherwise <@lit="c = 10"> would create a scalar variable).
Supported <@itl="arithmetical operators"> are, in order of precedence: <@lit="^"> (exponentiation); <@lit="*">, <@lit="/"> and <@lit="%"> (modulus or remainder); <@lit="+"> and <@lit="-">.
The available <@itl="Boolean operators"> are (again, in order of precedence): <@lit="!"> (negation), <@lit="&&"> (logical AND), <@lit="||"> (logical OR), <@lit=">">, <@lit="<">, <@lit="=="> (is equal to), <@lit=">="> (greater than or equal), <@lit="<="> (less than or equal) and <@lit="!="> (not equal). The Boolean operators can be used in constructing dummy variables: for instance <@lit="(x > 10)"> returns 1 if <@lit="x"> > 10, 0 otherwise.
Built-in constants are <@lit="pi"> and <@lit="NA">. The latter is the missing value code: you can initialize a variable to the missing value with <@lit="scalar x = NA">.
The <@lit="genr"> command supports a wide range of mathematical and statistical functions, including all the common ones plus several that are special to econometrics. In addition it offers access to numerous internal variables that are defined in the course of running regressions, doing hypothesis tests, and so on. For a listing of functions and accessors, see <@gfr="the Gretl function reference">.
Besides the operators and functions noted above there are some special uses of <@lit="genr">:
<indent>
• <@lit="genr time"> creates a time trend variable (1,2,3,…) called <@lit="time">. <@lit="genr index"> does the same thing except that the variable is called <@lit="index">.
</indent>
<indent>
• <@lit="genr dummy"> creates dummy variables up to the periodicity of the data. In the case of quarterly data (periodicity 4), the program creates <@lit="dq1"> = 1 for first quarter and 0 in other quarters, <@lit="dq2"> = 1 for the second quarter and 0 in other quarters, and so on. With monthly data the dummies are named <@lit="dm1">, <@lit="dm2">, and so on. With other frequencies the names are <@lit="dummy_1">, <@lit="dummy_2">, etc.
</indent>
<indent>
• <@lit="genr unitdum"> and <@lit="genr timedum"> create sets of special dummy variables for use with panel data. The first codes for the cross-sectional units and the second for the time period of the observations.
</indent>
<@itl="Note">: In the command-line program, <@lit="genr"> commands that retrieve model-related data always reference the model that was estimated most recently. This is also true in the GUI program, if one uses <@lit="genr"> in the “gretl console” or enters a formula using the “Define new variable” option under the Add menu in the main window. With the GUI, however, you have the option of retrieving data from any model currently displayed in a window (whether or not it's the most recent model). You do this under the “Save” menu in the model's window.
The special variable <@lit="obs"> serves as an index of the observations. For instance <@lit="series dum = (obs==15)"> will generate a dummy variable that has value 1 for observation 15, 0 otherwise. You can also use this variable to pick out particular observations by date or name. For example, <@lit="series d = (obs>1986:4)">, <@lit="series d = (obs>"2008-04-01")">, or <@lit="series d = (obs=="CA")">. If daily dates or observation labels are used in this context, they should be enclosed in double quotes. Quarterly and monthly dates (with a colon) may be used unquoted. Note that in the case of annual time series data, the year is not distinguishable syntactically from a plain integer; therefore if you wish to compare observations against <@lit="obs"> by year you must use the function <@lit="obsnum"> to convert the year to a 1-based index value, as in <@lit="series d = (obs>obsnum(1986))">.
Scalar values can be pulled from a series in the context of a <@lit="genr"> formula, using the syntax <@var="varname"><@lit="["><@var="obs"><@lit="]">. The <@var="obs"> value can be given by number or date. Examples: <@lit="x[5]">, <@lit="CPI[1996:01]">. For daily data, the form <@var="YYYY-MM-DD"> should be used, e.g. <@lit="ibm[1970-01-23]">.
An individual observation in a series can be modified via <@lit="genr">. To do this, a valid observation number or date, in square brackets, must be appended to the name of the variable on the left-hand side of the formula. For example, <@lit="genr x[3] = 30"> or <@lit="genr x[1950:04] = 303.7">.
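Several of the forms described above are collected in the following sketch, which uses an artificial quarterly dataset; all series names are illustrative.
<code>
nulldata 20
setobs 4 1985:1 --time-series
# a series built from the observation index
series x = index^2
# dummy equal to 1 for observations after 1986:4
series d = (obs > 1986:4)
# pull a scalar out of a series, by number and by date
scalar x5 = x[5]
scalar xq = x[1986:1]
# modify a single observation
genr x[3] = 30
# add a time trend named "time"
genr time
</code>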
Menu path: /Add/Define new variable
Other access: Main window pop-up menu
Script command: <@ref="genr">
# genrand Programming "Generating random variables"
In this dialog you must give a name for the variable to be created, plus some additional information depending on the distribution.
<indent>
• Uniform: the lower and upper bounds for the distribution.
</indent>
<indent>
• Normal: the mean and (positive) standard deviation.
</indent>
<indent>
• Chi-square and Student's t: the degrees of freedom, which must be positive.
</indent>
<indent>
• F: both numerator and denominator degrees of freedom.
</indent>
<indent>
• Gamma: shape and scale parameters (both positive).
</indent>
<indent>
• Binomial: the “success” probability and the integer number of trials.
</indent>
<indent>
• Poisson: the positive mean (which also equals the variance).
</indent>
If you want to generate repeatable sequences of pseudo-random numbers, you can set the seed, under the Tools menu.
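In a script the same distributions are available via the <@xrf="randgen"> function. The sketch below assumes the distribution codes and parameter order shown in the comments; the series names and parameter values are arbitrary.
<code>
nulldata 100
series s_unif = randgen(u, 0, 10)    # uniform: lower, upper bound
series s_norm = randgen(N, 5, 2)     # normal: mean, standard deviation
series s_chi = randgen(X, 4)         # chi-square: degrees of freedom
series s_gam = randgen(G, 2, 3)      # gamma: shape, scale
series s_bin = randgen(B, 0.3, 10)   # binomial: probability, trials
series s_pois = randgen(P, 4)        # Poisson: mean
</code>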
# genseed Programming "Setting the seed for random numbers"
The “seed” controls the starting point for the sequence of pseudo-random numbers generated in a given gretl session. By default the seed is set when the program is started, using the system time. This ensures that you get a different sequence of random numbers each time you run the program. If you want to obtain repeatable sequences, you need to set the seed manually (and take note of the value you used).
Note that whenever you click “OK” in this dialog box, the generator is re-started, using the given seed. So, for example, if you (a) set the seed to (say) 147; (b) generate a series from the standard normal distribution; (c) revisit this dialog and click “OK” again with the seed still at 147; then (d) generate a second series from the standard normal distribution, the two generated series will be identical.
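The same experiment can be run in a script via <@lit="set seed">. In this sketch the series names are arbitrary, and the largest absolute difference between the two series should come out as zero.
<code>
nulldata 100
set seed 147
series z1 = normal()
# re-start the generator with the same seed
set seed 147
series z2 = normal()
# largest absolute difference between the two series
scalar maxdiff = max(abs(z1 - z2))
print maxdiff
</code>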
# gmm Estimation "GMM estimation"
Performs Generalized Method of Moments (GMM) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify one or more commands for updating the relevant quantities (typically GMM residuals), one or more sets of orthogonality conditions, an initial matrix of weights, and a listing of the parameters to be estimated, all enclosed between the tags <@lit="gmm"> and <@lit="end gmm">. Any options should be appended to the <@lit="end gmm"> line.
Please see chapter 23 of the <@pdf="Gretl User's Guide#chap:gmm"> for details on this command. Here we just illustrate with a simple example.
<code>
gmm e = y - X*b
orthog e ; W
weights V
params b
end gmm
</code>
In the example above we assume that <@lit="y"> and <@lit="X"> are data matrices, <@lit="b"> is an appropriately sized vector of parameter values, <@lit="W"> is a matrix of instruments, and <@lit="V"> is a suitable matrix of weights. The statement
<code>
orthog e ; W
</code>
indicates that the residual vector <@lit="e"> is in principle orthogonal to each of the instruments composing the columns of <@lit="W">.
<@itl="Parameter names">
In estimating a nonlinear model it is often convenient to name the parameters tersely. In printing the results, however, it may be desirable to use more informative labels. This can be achieved via the additional keyword <@lit="param_names"> within the command block. For a model with <@mth="k"> parameters the argument following this keyword should be either a double-quoted string literal holding <@mth="k"> space-separated names or the name of a string variable that holds <@mth="k"> such names.
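As a self-contained illustration, the sketch below estimates a simple linear model by GMM on simulated data, using the regressors as their own instruments (so the estimates should agree with OLS). The data-generating process, the names <@lit="b0">, <@lit="b1"> and <@lit="Z">, and the labels passed to <@lit="param_names"> are assumptions made for the example.
<code>
   nulldata 200
   set seed 1234
   series x = normal()
   series y = 1 + 2*x + normal()
   # initialize the residual, parameters, instruments and weights
   series e = 0
   scalar b0 = 0
   scalar b1 = 0
   list Z = const x
   matrix V = I(2)
   gmm
      series e = y - b0 - b1*x
      orthog e ; Z
      weights V
      params b0 b1
      param_names "intercept slope"
   end gmm
</code>
With two parameters and two orthogonality conditions the model is exactly identified, so the choice of initial weights matrix should not affect the point estimates.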
Menu path: /Model/GMM
Script command: <@ref="gmm">
# graphing Graphs "Graphing"
Gretl calls a separate program, namely gnuplot, to generate graphs. Gnuplot is a very full-featured graphing program with myriad options. Gretl gives you direct access, via a graphical interface, to a subset of these options and it tries to choose sensible values for you; it also allows you to take complete control over graph details if you wish.
With a graph displayed, you can click on the graph window for a pop-up menu with several options, including these:
<indent>
• Save as PNG: save in Portable Network Graphics format
</indent>
<indent>
• Save as postscript: save the graph in encapsulated postscript (EPS) format
</indent>
<indent>
• Save to session as icon: the graph will appear in iconic form when you select “Icon view” from the Session menu
</indent>
<indent>
• Zoom: lets you select an area within the graph for closer inspection
</indent>
<indent>
• Copy to clipboard: lets you paste the graph into applications such as word processors
</indent>
<indent>
• Edit: opens a controller for the plot which lets you adjust various aspects of its appearance
</indent>
<indent>
• Close: closes the graph window
</indent>
If you know something about gnuplot and wish to get finer control over the appearance of a graph than is available via the graphical controller (“Edit” option), you have a further option:
<indent>
• Once the graph is saved as a session icon, right-click on its icon for a further pop-up menu. One of the options here is “Edit plot commands”: this opens an editing window with the actual gnuplot commands displayed. You can edit these commands and either save them for future processing or send them to gnuplot (with the execute toolbar icon in the plot commands window).
</indent>
To find out more about gnuplot, see <@url="www.gnuplot.info">.
# graphpg Graphs "Gretl graph page"
The session “graph page” will work only if you have the LaTeX typesetting system installed, and are able to generate and view PDF or PostScript output.
In the session icon window, you can drag up to eight graphs onto the graph page icon. When you double-click on the graph page (or right-click and select “Display”), a page containing the selected graphs will be composed and opened in a suitable viewer. From there you should be able to print the page.
To clear the graph page, right-click on its icon and select “Clear”.
Note that on systems other than MS Windows, you may have to adjust the setting for the program used to view PDF or PostScript files. Find that under the “Programs” tab in the gretl Preferences dialog box (under the Tools menu in the main window).
It's also possible to operate on the graph page via script, or using the console (in the GUI program). The following commands and options are supported:
To add a graph to the graph page, issue the command <@lit="graphpg add"> after saving a named graph, as in
<code>
grf1 <- gnuplot Y X
graphpg add
</code>
To display the graph page: <@lit="graphpg show">.
To clear the graph page: <@lit="graphpg free">.
To adjust the scale of the font used in the graph page, use <@lit="graphpg fontscale"> <@var="scale">, where <@var="scale"> is a multiplier (with a default of 1.0). Thus to make the font size 50 percent bigger than the default you can do
<code>
graphpg fontscale 1.5
</code>
To call for printing of the graph page to file, use the flag <@opt="--output="> plus a filename; the filename should have the suffix “<@lit=".pdf">”, “<@lit=".ps">” or “<@lit=".eps">”. For example:
<code>
graphpg --output="myfile.pdf"
</code>
The output file will be written in the currently set <@ref="workdir">, unless the <@var="filename"> string contains a full path specification.
In this context the output uses colored lines by default; to use dot/dash patterns instead of colors you can append the <@opt="--monochrome"> flag.
Script command: <@ref="graphpg">
# 3-D Graphs "3-dimensional plots"
If the “Make plot interactive” button is available and checked, you can manipulate the 3-D plot with the mouse (rotate it, and expand or shrink the axes).
In composing a 3-D plot, note that the Z-axis will be shown as the vertical axis. Thus if you have some dependent variable that you think may be influenced by two independent variables, you should put the dependent variable on the Z-axis, and the independent variables on the X and Y axes.
Unlike most other gretl graphs, an interactive 3-D plot is controlled by gnuplot rather than gretl itself. The gretl graph-editing menu is not available.
# gui-funcs Programming "Special functions"
This dialog enables you to specify which functions within a package, if any, should be assigned to certain special roles. Note that a given function can be assigned to at most one of the following roles, and to qualify as a candidate for one of these roles a function has to satisfy certain criteria.
<indent>
• <@lit="bundle-print">: prints output based on the content of a bundle produced by your package. Criteria: this function must have as its first parameter a bundle-pointer. If a second parameter is present it must take the form of an integer switch that has a default value.
</indent>
<indent>
• <@lit="bundle-plot">: produces one or more plots using a bundle produced by your package. Criteria: as for <@lit="bundle-print">.
</indent>
<indent>
• <@lit="bundle-test">: carries out some sort of statistical test using a bundle produced by your package. Criteria: as for <@lit="bundle-print">.
</indent>
<indent>
• <@lit="gui-main">: the public interface that should be presented to users by default in GUI use. This is useful only if the package has more than one public interface.
</indent>
<indent>
• <@lit="gui-precheck">: gate-keeper function which returns 0 if the functionality of your package is applicable in the current context, non-zero otherwise. This is intended for use with packages that operate on a model in some way, to screen out types of model that are not handled by the package.
</indent>
In addition certain functions may be marked as “no-print”. Usually, when a function is invoked via the GUI program, gretl opens a window to display its text output. By checking this box you are telling gretl not to do this, since no text output should be expected.
Finally, the <@lit="gui-main"> function (if any) can be marked as “menu-only”. This tells gretl that the function in question is specifically designed to be called from the GUI menu to which it is attached, and should not be presented to users otherwise.
# gui-htest Tests "Test statistic calculator"
Gretl's test calculator computes test statistics and <@mth="p">-values for various common hypothesis tests concerning one or two populations. The required input takes the form of sample statistics derived from one or two samples, depending on the test chosen. These statistics can be typed in as numerical values. Alternatively, if you have a data file open, you can get gretl to calculate sample statistics for a selected variable or variables (in the case of means and variances, but not in the case of proportions).
If you want to base your test on a variable in the data set, first activate this option by checking the box titled “Use variable from dataset”. Then the drop-down list of variables will become active and you can select a variable. When you select a variable from the list, the relevant statistics are automatically entered in the boxes below.
In addition to the simple selection of a variable, you have the option of specifying a restriction on the selected variable (that is, defining a sub-sample). For example, suppose you have wage data in a variable called <@lit="wage"> and you also have a dummy variable <@lit="gender"> that equals 1 for males and 0 for females (or vice versa). Then, in the test for the difference of two means, you could select <@lit="wage"> in both slots, but add to the top slot <@lit="(gender=0)"> and to the bottom <@lit="(gender=1)">. This would then give you a test for the difference between the mean wages of the two groups. Note that when you type a restriction in this way, you must then press the Enter key to have the sample statistics calculated.
The sub-sampling restriction must be placed in parentheses following the selected variable, and in general the restriction takes the form
<code>
var2 op val
</code>
where <@lit="var2"> is the name of a variable in the current data set, <@lit="val"> is a numerical value, and <@lit="op"> is one of the following comparison operators:
<code>
== != < > <= >=
</code>
(respectively equality, inequality, less than, greater than, less than or equal, and greater than or equal). The spaces around the operator are optional.
# gui-htest-np Tests "Nonparametric tests"
Three sorts of nonparametric test are available via this dialog: for a difference between groups, for randomness, and for (rank) correlation.
<@itl="Difference tests">
Under the “Difference test” tab you can carry out a nonparametric test for a difference between two populations or groups, the specific test depending on the option selected.
<indent>
• <@itl="Sign test">: This test is based on the fact that if two samples, <@mth="x"> and <@mth="y">, are drawn randomly from the same distribution, the probability that <@mth="x"><@sub="i"> > <@mth="y"><@sub="i">, for each observation <@mth="i">, should equal 0.5. The test statistic is <@mth="w">, the number of observations for which <@mth="x"><@sub="i"> > <@mth="y"><@sub="i">. Under the null hypothesis this follows the Binomial distribution with parameters (<@mth="n">, 0.5), where <@mth="n"> is the number of observations.
</indent>
<indent>
• <@itl="Rank sum test">: The Wilcoxon rank-sum test is performed. This test proceeds by ranking the observations from both samples jointly, from smallest to largest, then finding the sum of the ranks of the observations from one of the samples. The two samples do not have to be of the same size, and if they differ the smaller sample is used in calculating the rank-sum. Under the null hypothesis that the samples are drawn from populations with the same median, the probability distribution of the rank-sum can be computed for any given sample sizes; and for reasonably large samples a close Normal approximation exists.
</indent>
<indent>
• <@itl="Signed rank test">: The Wilcoxon signed-rank test is performed. This is designed for matched data pairs such as, for example, the values of a variable for a sample of individuals before and after some treatment. The test proceeds by finding the differences between the paired observations, <@mth="x"><@sub="i"> – <@mth="y"><@sub="i">, ranking these differences by absolute value, then assigning to each pair a signed rank, the sign agreeing with the sign of the difference. One then calculates <@mth="W"><@sub="+">, the sum of the positive signed ranks. As with the rank-sum test, this statistic has a well-defined distribution under the null that the median difference is zero, which converges to the Normal for samples of reasonable size.
</indent>
<@itl="Randomness">
Under the “Runs test” tab you can carry out a test for the randomness of a given variable, based on the number of runs of consecutive positive or negative values. If you select the option “Use first difference”, the variable is differenced prior to the analysis and hence the runs are interpreted as runs of increasing or decreasing values of the original variable. The test statistic is based on a normal approximation to the distribution of the number of runs under the null of randomness.
<@itl="Correlation">
Under the “Correlation” tab you have Spearman's rank correlation rho and Kendall's rank correlation tau.
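In script mode the difference tests are provided by the <@lit="difftest"> command and the runs test by the <@lit="runs"> command. The following is a minimal sketch on simulated data; the series <@lit="x"> and <@lit="y"> and the sample size are assumptions made for illustration.
<code>
   nulldata 40
   set seed 2017
   series x = normal()
   series y = x + 0.5 + normal()
   # the three difference tests described above
   difftest x y --sign
   difftest x y --rank-sum
   difftest x y --signed-rank
   # runs test on the first difference of x
   runs x --difference
</code>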
# hausman Tests "Panel diagnostics"
This test is available only after estimating an OLS model using panel data (see also <@lit="setobs">). It tests the simple pooled model against the principal alternatives, the fixed effects and random effects models.
The fixed effects model allows the intercept of the regression to vary across the cross-sectional units. An <@mth="F">-test is reported for the null hypothesis that the intercepts do not differ. The random effects model decomposes the residual variance into two parts, one part specific to the cross-sectional unit and the other specific to the particular observation. (This estimator can be computed only if the number of cross-sectional units in the data set exceeds the number of parameters to be estimated.) The Breusch–Pagan LM statistic tests the null hypothesis that the pooled OLS estimator is adequate against the random effects alternative.
The pooled OLS model may be rejected against both of the alternatives, fixed effects and random effects. Provided the unit- or group-specific error is uncorrelated with the independent variables, the random effects estimator is more efficient than the fixed effects estimator; otherwise the random effects estimator is inconsistent and the fixed effects estimator is to be preferred. The null hypothesis for the Hausman test is that the group-specific error is not so correlated (and therefore the random effects model is preferable). A low p-value for this test counts against the random effects model and in favor of fixed effects.
The two options for this command pertain to the random effects model. By default the method of Swamy and Arora is used, and the Hausman test statistic is calculated using the regression method. The options enable the use of Nerlove's alternative variance estimator and/or the matrix-difference approach to the Hausman statistic.
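A script version of the procedure might look like the following sketch, which simulates a small panel with a unit-specific effect. The panel dimensions, series names and data-generating process are assumptions made for illustration; the two flags on the last line select the alternatives mentioned above.
<code>
   # 20 units observed over 10 periods each
   nulldata 200
   setobs 10 1:1 --stacked-time-series
   set seed 321
   series x = normal()
   # unit-specific effect: per-unit mean of a random draw
   series a = pmean(normal())
   series y = 1 + 0.5*x + a + normal()
   ols y const x
   # default: Swamy-Arora, regression-based Hausman test
   hausman
   # Nerlove estimator, matrix-difference Hausman statistic
   hausman --nerlove --matrix-diff
</code>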
Menu path: Model window, /Tests/Panel diagnostics
Script command: <@ref="hausman">
# hccme Estimation "Robust standard errors"
You are offered several variant calculations for standard errors that are robust in the presence of heteroskedasticity (and, in the case of the HAC estimator, autocorrelation).
HC0 produces the original “White's standard errors”; HC1, HC2, HC3 and HC3a are subsequent variations that are generally reckoned to produce superior (more reliable) results. For details of the estimators, see <@bib="MacKinnon and White (Journal of Econometrics, 1985);mackinnon-white85"> or <@bib="Davidson and MacKinnon, Econometric Theory and Methods (Oxford, 2004);davidson-mackinnon04">. The labels given here are those used by Davidson and MacKinnon. Variant “HC3a” is the jackknife, as described in MacKinnon and White; HC3 is a close approximation to the jackknife.
If you use the HAC estimator for OLS on time-series data, you are able to fine-tune the lag-length using the <@lit="set"> command. Please see the gretl manual or the script commands help file for details.
When estimating a model via OLS using panel data, the default robust estimator of the covariance matrix is that given by Arellano. The alternative is Beck and Katz's Panel Corrected Standard Errors (PCSE). The latter take into account heteroskedasticity but not autocorrelation.
Two robust estimators of the covariance matrix are offered for GARCH models: QML is the Quasi-Maximum Likelihood Estimator, and BW is the Bollerslev-Wooldridge estimator.
By default gretl uses the Student <@mth="t"> distribution when calculating p-values based on robust standard errors in the context of least squares estimators. The option labeled “Use the normal distribution for robust p-values” can be used to change this behavior.
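In a script, the HC variant is chosen via the <@lit="set"> command and robust standard errors are requested with the <@opt="--robust"> flag. A minimal sketch, assuming simulated cross-sectional data in which the error variance depends on the regressor (all names are illustrative):
<code>
   nulldata 100
   set seed 99
   series x = normal()
   # error standard deviation increases with x
   series y = 1 + x + exp(0.5*x)*normal()
   # select the HC3 variant, then estimate with robust errors
   set hc_version 3
   ols y const x --robust
</code>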
# hsk Estimation "Heteroskedasticity-corrected estimates"
This command is applicable where heteroskedasticity is present in the form of an unknown function of the regressors which can be approximated by a quadratic relationship. In that context it offers the possibility of consistent standard errors and more efficient parameter estimates as compared with OLS.
The procedure involves (a) OLS estimation of the model of interest, followed by (b) an auxiliary regression to generate an estimate of the error variance, then finally (c) weighted least squares, using as weight the reciprocal of the estimated variance.
In the auxiliary regression (b) we regress the log of the squared residuals from the first OLS on the original regressors and their squares (by default), or just on the original regressors (if the “include squares” box is cleared). The log transformation is performed to ensure that the estimated variances are all non-negative. Call the fitted values from this regression <@mth="u"><@sup="*">. The weight series for the final WLS is then formed as 1/exp(<@mth="u"><@sup="*">).
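Steps (a) to (c) can be reproduced by hand, as in the following sketch, which should give results comparable to those of the built-in command. The simulated data and the names <@lit="lnu2">, <@lit="x2"> and <@lit="w"> are assumptions made for illustration.
<code>
   nulldata 200
   set seed 42
   series x = uniform()
   series y = 1 + 2*x + exp(x)*normal()
   # (a) OLS on the model of interest
   ols y const x --quiet
   series lnu2 = log($uhat^2)
   # (b) auxiliary regression on the regressor and its square
   series x2 = x^2
   ols lnu2 const x x2 --quiet
   series w = 1/exp($yhat)
   # (c) weighted least squares using the estimated weights
   wls w y const x
   # the built-in command, for comparison
   hsk y const x
</code>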
Menu path: /Model/Other linear models/Heteroskedasticity corrected
Script command: <@ref="hsk">
# hurst Statistics "Hurst exponent"
Calculates the Hurst exponent (a measure of persistence or long memory) for a time-series variable having at least 128 observations.
The Hurst exponent is discussed by <@bib="Mandelbrot (1983);mandelbrot83">. In theoretical terms it is the exponent, <@mth="H">, in the relationship
<@fig="hurst">
where RS is the “rescaled range” of the variable <@mth="x"> in samples of size <@mth="n"> and <@mth="a"> is a constant. The rescaled range is the range (maximum minus minimum) of the cumulated value or partial sum of <@mth="x"> over the sample period (after subtraction of the sample mean), divided by the sample standard deviation.
As a reference point, if <@mth="x"> is white noise (zero mean, zero persistence) then the range of its cumulated “wandering” (which forms a random walk), scaled by the standard deviation, grows as the square root of the sample size, giving an expected Hurst exponent of 0.5. Values of the exponent significantly in excess of 0.5 indicate persistence, and values less than 0.5 indicate anti-persistence (negative autocorrelation). In principle the exponent is bounded by 0 and 1, although in finite samples it is possible to get an estimated exponent greater than 1.
In gretl, the exponent is estimated using binary sub-sampling: we start with the entire data range, then the two halves of the range, then the four quarters, and so on. For sample sizes smaller than the data range, the RS value is the mean across the available samples. The exponent is then estimated as the slope coefficient in a regression of the log of RS on the log of sample size.
By default, if the program is not in batch mode a plot of the rescaled range is shown. This can be adjusted via the <@opt="--plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="display"> (to display a plot even when in batch mode); or a file name. The effect of providing a file name is as described for the <@opt="--output"> option of the <@ref="gnuplot"> command.
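A minimal usage sketch, assuming a simulated white-noise series of 512 observations (for which the theoretical exponent is 0.5) and suppressing the plot:
<code>
   nulldata 512
   setobs 1 1 --time-series
   set seed 7
   # white noise: zero persistence
   series x = normal()
   hurst x --plot=none
</code>
In a finite sample the estimate will generally differ somewhat from the theoretical value.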
Menu path: /Variable/Hurst exponent
Script command: <@ref="hurst">
# intreg Estimation "Interval regression model"
Estimates an interval regression model. This model arises when the dependent variable is imperfectly observed for some (possibly all) observations. In other words, the data generating process is assumed to be
<@mth="y* = x b + u">
but we only observe <@mth="m <= y* <= M"> (the interval may be left- or right-unbounded). Note that for some observations <@mth="m"> may equal <@mth="M">. The variables <@var="minvar"> and <@var="maxvar"> must contain <@lit="NA">s for left- and right-unbounded observations, respectively.
In the model specification dialog, <@var="minvar"> and <@var="maxvar"> are identified as the Lower bound variable and the Upper bound variable respectively.
The model is estimated by maximum likelihood, assuming normality of the disturbance term.
By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a “sandwich” of the inverse of the estimated Hessian and the outer product of the gradient.
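The following sketch illustrates the script equivalent on simulated data, under the assumption that only the unit interval containing the latent variable is observed. The names <@lit="ystar">, <@lit="ylo"> and <@lit="yhi"> are hypothetical.
<code>
   nulldata 300
   set seed 11
   series x = normal()
   # latent dependent variable
   series ystar = 1 + 2*x + normal()
   # observed bounds: ylo <= ystar < ylo + 1
   series ylo = floor(ystar)
   series yhi = ylo + 1
   intreg ylo yhi const x
   # same model with QML (sandwich) standard errors
   intreg ylo yhi const x --robust
</code>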
Menu path: /Model/Limited dependent variable/Interval regression
Script command: <@ref="intreg">
# irfboot Graphs "Impulse response plots"
If you select the bootstrap option when plotting impulse responses, gretl computes a confidence interval for the responses using the bootstrap method. The residuals from the original VAR (or VECM) are resampled with replacement; an artificial dataset is constructed based on the original parameter estimates and the resampled residuals; the system is re-estimated and the impulse responses are re-evaluated. This is repeated 999 times and the α/2 and 1 – α/2 quantiles for the responses are found and plotted along with the point estimates. This option is not currently available for restricted VECMs.
This dialog also supports reordering of the variables for the Cholesky decomposition of the cross-equation covariance matrix. The default is given by the order in which the variables are entered into the model specification, but the up and down arrows can be used to promote or demote a selected variable.
Regarding the scale of the impulse responses: the “shock” is sized at one standard deviation of the estimated innovations in the source variable, and the responses are given in whatever is the “natural” unit of the target variable.
# join Dataset "Append data with controls"
This dialog gives you access to some, but not all, of the functionality of the <@lit="join"> command. For full details see chapter 7 of the <@pdf="Gretl User's Guide#chap:join">.
On the left you should see a listing of series in the current dataset. You can select a series here and use the arrow buttons to specify it as one or other of the (optional) “inner keys”. Keys work to match rows between the current dataset and the file from which you are importing data.
On the right should be listed the series in the data file you selected. The arrow buttons can be used to select from that list the name of the series to import, and (if required) the names of series that correspond to the “inner” keys. (By default the inner and outer keys are presumed to have the same name.)
In the middle panel of the dialog you can specify additional parameters for the “join” operation:
<indent>
• A name under which the imported series should be known. (By default this is the same as the “import” name.)
</indent>
<indent>
• A filter expression. This will be evaluated for each row in the outer dataset, and only rows for which the expression yields a non-zero value will be imported.
</indent>
<indent>
• An aggregation method. This is required only if matching by keys selects more than one outer value per inner observation.
</indent>
# kpss Tests "KPSS stationarity test"
Computes the KPSS test (Kwiatkowski, Phillips, Schmidt and Shin, Journal of Econometrics, 1992) for stationarity of the given variable (or its first difference, if the differencing option is selected). The null hypothesis is that the variable in question is stationary, either around a level or, if the “include a trend” box is checked, around a deterministic linear trend.
The selected lag order determines the size of the window used for Bartlett smoothing. If the “show regression results” box is checked the results of the auxiliary regression are printed, along with the estimated variance of the random walk component of the variable.
The critical values shown for the test statistic are based on response surfaces estimated in the manner set out by <@bib="Sephton (Economics Letters, 1995);sephton95">, which are more accurate for small samples than the values given in the original KPSS article. When the test statistic lies between the 10 percent and 1 percent critical values a p-value is shown; this is obtained by linear interpolation and should not be taken too literally. See the <@xrf="kpsscrit"> function for a means of obtaining these critical values programmatically.
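A minimal script sketch, assuming a simulated white-noise series and a lag order of 4 (both chosen arbitrarily for illustration):
<code>
   nulldata 200
   setobs 1 1 --time-series
   set seed 5
   series x = normal()
   # null: x is stationary around a level
   kpss 4 x
   # null: x is stationary around a linear trend
   kpss 4 x --trend
</code>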
Menu path: /Variable/Unit root tests/KPSS test
Script command: <@ref="kpss">
# lad Estimation "Least Absolute Deviation estimation"
Calculates a regression that minimizes the sum of the absolute deviations of the observed from the fitted values of the dependent variable. Coefficient estimates are derived using the Barrodale–Roberts simplex algorithm; a warning is printed if the solution is not unique.
Standard errors are derived using the bootstrap procedure with 500 drawings. The covariance matrix for the parameter estimates, printed when the <@opt="--vcv"> flag is given, is based on the same bootstrap.
Menu path: /Model/Robust estimation/Least Absolute Deviation
Script command: <@ref="lad">
# lags-dialog Estimation "Lag selection box"
In this dialog you can select the lag order for the independent variables in a time-series model, and in some cases for the dependent variable also. (But note that the common lag order for vector models such as VARs and VECMs is handled separately, via a selection spinner in the main model dialog box.)
The spinners on the left let you select a range of consecutive lags for any given variable. To specify non-consecutive lags, click the check box next to the entry field titled “specific lags”. This activates the entry box, into which you can type a list of lags, separated by spaces.
The row marked “default” offers a quick way to set a common lag specification for all the independent variables: values set in that row are copied to all the others (apart from the dependent variable, if present).
The dependent variable is treated specially: the minimum lag must be zero, which places the current value of the variable on the left-hand side of the model. Any higher lags appear with the independent variables on the right-hand side of the model.
Values selected in this dialog are remembered for the duration of your session with a given dataset.
# leverage Tests "Influential observations"
Must follow an <@lit="ols"> command. Calculates the leverage (<@mth="h">, which must lie in the range 0 to 1) for each data point in the sample on which the previous model was estimated. Displays the residual (<@mth="u">) for each observation along with its leverage and a measure of its influence on the estimates, <@mth="uh">/(1 – <@mth="h">). “Leverage points” for which the value of <@mth="h"> exceeds 2<@mth="k">/<@mth="n"> (where <@mth="k"> is the number of parameters being estimated and <@mth="n"> is the sample size) are flagged with an asterisk. For details on the concepts of leverage and influence see <@bib="Davidson and MacKinnon (1993);davidson-mackinnon93">, Chapter 2.
DFFITS values are also computed: these are “studentized residuals” (predicted residuals divided by their standard errors) multiplied by <@fig="dffit">. For discussions of studentized residuals and DFFITS see chapter 12 of Maddala's <@bib="Introduction to Econometrics;maddala92"> or <@bib="Belsley, Kuh and Welsch (1980);belsley-etal80">.
Briefly, a “predicted residual” is the difference between the observed value of the dependent variable at observation <@mth="t">, and the fitted value for observation <@mth="t"> obtained from a regression in which that observation is omitted (or a dummy variable with value 1 for observation <@mth="t"> alone has been added); the studentized residual is obtained by dividing the predicted residual by its standard error.
The "+" icon at the top of the leverage test window brings up a dialog box that allows you to save one or more of the test variables to the current data set.
After execution, the <@xrf="$test"> accessor returns the cross-validation criterion, which is defined as the sum of squared deviations of the dependent variable from its forecast value, the forecast for each observation being based on a sample from which that observation is excluded. (This is known as the <@itl="leave-one-out"> estimator.) For a broader discussion of the cross-validation criterion, see Davidson and MacKinnon's <@itl="Econometric Theory and Methods">, pages 685–686, and the references therein.
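In a script the sequence might look like the following sketch; the simulated data and the scalar name <@lit="cv"> are assumptions made for illustration.
<code>
   nulldata 50
   set seed 3
   series x = normal()
   series y = 1 + x + normal()
   ols y const x --quiet
   leverage
   # retrieve the cross-validation criterion
   scalar cv = $test
   print cv
</code>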
Menu path: Model window, /Analysis/Influential observations
Script command: <@ref="leverage">
# levinlin Tests "Levin-Lin-Chu test"
Carries out the panel unit-root test described by <@bib="Levin, Lin and Chu (2002);LLC2002">. The null hypothesis is that all of the individual time series exhibit a unit root, and the alternative is that none of the series has a unit root. (That is, a common AR(1) coefficient is assumed, although in other respects the statistical properties of the series are allowed to vary across individuals.)
Menu path: /Variable/Unit root tests/Levin-Lin-Chu test
Script command: <@ref="levinlin">
# loess Estimation "Loess"
Performs locally-weighted polynomial regression and produces a series containing predicted values of the dependent variable for each non-missing value of the independent variable. The method is as described by <@bib="William Cleveland (1979);cleveland79">.
The controls allow you to specify the order of the polynomial in the independent variable and the proportion of the data points to be used in each local regression (the bandwidth). Higher values of the bandwidth produce a smoother outcome.
If the robust weights box is checked the local regression procedure is iterated twice, with the weights being modified based on the residuals from the previous iteration so as to give less influence to outliers.
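The same computation is available in scripts via the <@lit="loess"> function. The sketch below assumes simulated data and illustrative settings: a local linear fit (order 1), a bandwidth of 0.5 and robust weights switched on.
<code>
   nulldata 100
   set seed 8
   series x = 4*uniform()
   series y = sin(x) + 0.3*normal()
   # arguments: y, x, polynomial order, bandwidth, robust flag
   series yh = loess(y, x, 1, 0.5, 1)
</code>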
# logistic Estimation "Logistic regression"
Logistic regression: carries out an OLS regression using the logistic transformation of the dependent variable,
<@fig="logistic1">
The dependent variable must be strictly positive. If all its values lie between 0 and 1, the default is to use a <@mth="y"><@sup="*"> value (the asymptotic maximum of the dependent variable) of 1; if its values lie between 0 and 100, the default <@mth="y"><@sup="*"> is 100.
You may specify a different maximum <@mth="y"> value. Note that the supplied value must be greater than all of the observed values of the dependent variable.
The fitted values and residuals from the regression are automatically transformed using
<@fig="logistic2">
where <@mth="x"> represents either a fitted value or a residual from the OLS regression using the transformed dependent variable. The reported values are therefore comparable with the original dependent variable.
Note that if the dependent variable is binary, you should use the <@ref="logit"> command instead.
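The following sketch applies the command to a simulated proportion lying strictly between 0 and 1, then runs OLS on the hand-transformed variable for comparison, taking the asymptotic maximum to be 1. The data-generating process and the names <@lit="p"> and <@lit="lp"> are assumptions made for illustration.
<code>
   nulldata 100
   set seed 21
   series x = normal()
   # dependent variable strictly between 0 and 1
   series p = 1/(1 + exp(-(0.5 + x + 0.3*normal())))
   logistic p const x
   # manual transformation with a maximum of 1
   series lp = log(p/(1 - p))
   ols lp const x
</code>
The coefficient estimates from the two regressions should coincide; the fitted values and residuals differ because <@lit="logistic"> reports them on the scale of the original variable.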
Menu path: /Model/Limited dependent variable/Logistic
Script command: <@ref="logistic">
# logit Estimation "Logit regression"
If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the Newton–Raphson method. As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the <@opt="--p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant.
By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a “sandwich” of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details.
If the dependent variable is not binary but is discrete, then by default it is interpreted as an ordinal response, and Ordered Logit estimates are obtained. However, if the <@opt="--multinomial"> option is given, the dependent variable is interpreted as an unordered response, and Multinomial Logit estimates are produced. (In either case, if the variable selected as dependent is not discrete an error is flagged.) In the multinomial case, the accessor <@lit="$mnlprobs"> is available after estimation, to get a matrix containing the estimated probabilities of the outcomes at each observation (observations in rows, outcomes in columns).
If you want to use logit for analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic, at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present or not) you should not use the <@lit="logit"> command, but rather construct the logit variable, as in
<code>
series lgt_p = log(p/(1 - p))
</code>
and use this as the dependent variable in an OLS regression. See chapter 12 of <@bib="Ramanathan (2002);ramanathan02">.
Menu path: /Model/Limited dependent variable/Logit
Script command: <@ref="logit">
# mahal Statistics "Mahalanobis distances"
Computes the Mahalanobis distances between the series in <@var="varlist">. The Mahalanobis distance is the distance between two points in a <@mth="k">-dimensional space, scaled by the statistical variation in each dimension of the space. For example, if <@mth="p"> and <@mth="q"> are two observations on a set of <@mth="k"> variables with covariance matrix <@mth="C">, then the Mahalanobis distance between the observations is given by
<@fig="mahal">
where (<@mth="p"> – <@mth="q">) is a <@mth="k">-vector. This reduces to Euclidean distance if the covariance matrix is the identity matrix.
The space for which distances are computed is defined by the selected variables. For each observation in the current sample range, the distance is computed between the observation and the centroid of the selected variables. This distance is the multidimensional counterpart of a standard <@mth="z">-score, and can be used to judge whether a given observation “belongs” with a group of other observations.
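A minimal Python sketch of the distance for the two-variable case follows (not gretl script). It assumes the formula in the figure is the square root of (p - q)' C^-1 (p - q); the function name and the restriction to k = 2 are for illustration only.

```python
import math

def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between 2-vectors p and q given a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Quadratic form with the inverse [[d, -b], [-c, a]] / det
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)

identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))
```

With the identity matrix the result is the ordinary Euclidean distance; a larger variance in one dimension shrinks the contribution of differences in that dimension.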
If the number of variables selected is 4 or less, the covariance matrix and its inverse are printed. Clicking the "+" button at the top of the window displaying the distances gives you the option of adding the distances to the dataset as a new variable.
Menu path: /View/Mahalanobis distances
Script command: <@ref="mahal">
# meantest Tests "Difference of means"
Calculates the <@mth="t">-statistic for the null hypothesis that the population means are equal for two selected series, and shows its p-value.
By default the test statistic is calculated on the assumption that the variances are equal for the two variables. If the variances are instead assumed to differ (the <@opt="--unequal-vars"> option of the script command), the degrees of freedom for the test statistic are approximated as per <@bib="Satterthwaite (1946);satter46">.
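The two variants of the test can be sketched as follows in Python (not gretl script). The pooled-variance and Satterthwaite formulas used are the standard textbook ones and are assumed to match what gretl computes; the p-value step is omitted because it needs the t distribution.

```python
import math
from statistics import mean, variance

def mean_test(x, y, equal_vars=True):
    """Return (t statistic, degrees of freedom) for H0: equal population means."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)      # sample variances (n - 1 divisor)
    diff = mean(x) - mean(y)
    if equal_vars:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
        df = n1 + n2 - 2
    else:
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        # Satterthwaite approximation to the degrees of freedom
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return diff / se, df

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(mean_test(x, y), mean_test(x, y, equal_vars=False))
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ, as in this example.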
Menu path: /Tools/Test statistic calculator
Script command: <@ref="meantest">
# MIDAS_list Dataset "MIDAS list"
A MIDAS list (MIDAS = Mixed Data Sampling) is a named list (of series) whose members jointly represent a time-series variable which is observed at a higher frequency than that of the “host” dataset. For example, such a list might represent a monthly series in the context of a quarterly or annual dataset, or a daily series in the context of a monthly dataset.
Such a list must have <@mth="m"> members, where <@mth="m"> is the number of high-frequency periods per dataset period: each series holds the values for a given sub-period. In the monthly/quarterly case, this means the list has three members: one holds values for the first month of the quarter; another holds values for the second month; and another, values for the third month.
Moreover, these list members must be arranged in a particular order, namely <@itl="most recent first">. Continuing the quarterly/monthly example, the order must be month 3, month 2, month 1. This may seem “backwards”, but it's the order that is required for creating lists of lags, which are the stock in trade of MIDAS modeling.
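The required layout can be illustrated with a short Python sketch (not gretl script). It splits a high-frequency sequence into m sub-period series and returns them most recent first; the function name is hypothetical.

```python
def midas_members(high_freq, m):
    """Split a high-frequency sequence into m sub-period series, most recent first."""
    if len(high_freq) % m != 0:
        raise ValueError("length must be a multiple of m")
    # Sub-period m comes first, sub-period 1 comes last
    return [high_freq[k::m] for k in range(m - 1, -1, -1)]

# Six monthly values covering two quarters
print(midas_members([1, 2, 3, 4, 5, 6], 3))
```

Each inner list has one value per low-frequency period: the first holds the third month of each quarter, the last holds the first month.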
For guidance on how to create a dataset that supports MIDAS lists, please see
<@url="http://gretl.sourceforge.net/midas/midas_gretl.pdf">
# MIDAS_parm Estimation "MIDAS hyper-parameters"
In this dialog you are asked to select the type of parameterization for a set of high-frequency terms, as well as the range of lags of these terms. The supported types of parameterization are:
<indent>
• U-MIDAS or “unrestricted MIDAS”: each lag has its own coefficient.
</indent>
<indent>
• Normalized exponential Almon: this requires at least one parameter and commonly uses two.
</indent>
<indent>
• Normalized beta with a zero last lag; requires exactly two parameters.
</indent>
<indent>
• Normalized beta with non-zero last lag; requires exactly three parameters.
</indent>
<indent>
• Almon polynomial; requires at least one parameter.
</indent>
<indent>
• Normalized beta, one parameter: this is a variant of the normalized beta with a zero last lag, in which the first parameter is fixed at 1.0. The second parameter is estimated subject to the restriction that it be at least 1.0.
</indent>
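To give a feel for how a small number of hyper-parameters determines a whole set of lag coefficients, here is a Python sketch (not gretl script) of normalized exponential Almon weights. It assumes the form commonly given in the MIDAS literature, with weight i proportional to exp(theta1*i + theta2*i^2 + ...), and it assumes lags are indexed from 1; gretl's own indexing may differ.

```python
import math

def exp_almon_weights(theta, p):
    """Normalized exponential Almon weights for lags 1..p."""
    raw = [
        math.exp(sum(t * i ** (k + 1) for k, t in enumerate(theta)))
        for i in range(1, p + 1)
    ]
    total = sum(raw)
    return [r / total for r in raw]

# Two hyper-parameters, four lags: the weights decline with the lag
print(exp_almon_weights([0.0, -0.1], 4))
```

The weights always sum to one, and setting all hyper-parameters to zero gives equal weights on every lag.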
# midasreg Estimation "MIDAS regression"
Carries out least-squares estimation (either NLS or OLS, depending on the specification) of a MIDAS (Mixed Data Sampling) model. Such models include one or more independent variables that are observed at a higher frequency than the dependent variable; for a good brief introduction see <@bib="Armesto, Engemann and Owyang (2010);armesto10">.
The variables under <@var="Regressors"> are of the same frequency as the dependent variable, and are selected from the upper list on the left. MIDAS models typically include one or more lags of the dependent variable; that is controlled by the “AR order” spin button, which defaults to 1 lag. To add MIDAS (high frequency) terms, select from the lower left-hand list and use the lower green arrow (or right-click).
On adding a MIDAS term, a dialog pops up to let you select the range of lags, the parameterization type, and (for types that do not have a fixed number of parameters) the number of hyper-parameters. You can bring this dialog up again to revise a specification by right-clicking on a MIDAS term on the right.
The estimation method used by this command depends on the specification of the high-frequency terms. In the case of U-MIDAS the method is OLS, otherwise it is nonlinear least squares (NLS). When the normalized exponential Almon or normalized beta parameterization is specified, the NLS method is a combination of constrained BFGS and OLS, unless you check the box labeled “Prefer NLS via Levenberg-Marquardt”.
Menu path: /Model/Time series/MIDAS
Script command: <@ref="midasreg">
# missing Dataset "Missing data values"
Set a numerical value that will be interpreted as “missing” or “not available”, either for a particular data series (under the Variable menu) or globally for the entire data set (under the Data menu).
Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates “not applicable”, you can select “Set missing value code” under the Variable menu and type in the value “-1” (without the quotes). Gretl will then read the -1s as missing observations.
# menu-attach Programming "Menu attachment"
This dialog enables you to specify a menu attachment for a function package. To do this you must complete the following three fields in the dialog box.
<@itl="1. Label">
This requires a short label string, which will appear as the menu entry for the package.
<@itl="2. Window">
Select “model window” for a function package that does something with a gretl model, and should appear in the menu bar in a gretl model window. Otherwise, select “main window”.
<@itl="3. Menu tree">
Select the position within the menu tree (for either the main window or the model window, as chosen above) where the entry for the package should appear.
<@itl="Optional elements">
In addition you can use the “GUI help text” button to add or edit GUI-specific help text, to be shown when the package is called from a menu. And if the package is intended to be called from a model window you can specify a certain type of model (identified by its gretl command-word) as a requirement.
# mle Estimation "Maximum likelihood estimation"
Performs Maximum Likelihood (ML) estimation using either the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm or Newton's method. You must specify the log-likelihood function; it is recommended that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible.
Simple example: Suppose we have a series <@lit="X"> with values 0 or 1 and we wish to obtain the maximum likelihood estimate of the probability, <@lit="p">, that <@lit="X"> = 1. (In this simple case we can guess in advance that the ML estimate of <@lit="p"> will simply equal the proportion of Xs equal to 1 in the sample.)
The parameter <@lit="p"> must first be added to the dataset and given an initial value. This can be done using the genr command or via menu choices. Appropriate “genr” lines may be typed into the MLE specification window prior to the specification of the log-likelihood function.
In the MLE window we type the following lines:
<code>
loglik = X*log(p) + (1-X)*log(1-p)
deriv p = X/p - (1-X)/(1-p)
</code>
The first line specifies the log-likelihood function, and the next line supplies the derivative of that function with respect to the parameter p. If no "deriv" lines are given, a numerical approximation to the derivatives is computed.
If the parameter p was not previously declared we could preface the above lines with something like the following:
<code>
scalar p = 0.5
</code>
By default, standard errors are based on the Outer Product of the Gradient. If the robust standard errors box is checked, a QML estimator is used (namely, a sandwich of the negative inverse of the Hessian and the covariance matrix of the gradient). The Hessian is approximated numerically.
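The simple example above can be checked outside gretl with a Python sketch of Newton's method, using the same log-likelihood derivative. This is only an illustration of the idea, not gretl's implementation; it assumes the sample contains both zeros and ones.

```python
def bernoulli_mle(X, p=0.5, tol=1e-12, max_iter=50):
    """Maximize sum of X*log(p) + (1-X)*log(1-p) over p by Newton's method."""
    for _ in range(max_iter):
        # First derivative of the log-likelihood (the "deriv p" line)
        score = sum(x / p - (1 - x) / (1 - p) for x in X)
        # Second derivative, always negative for 0 < p < 1
        hessian = -sum(x / p ** 2 + (1 - x) / (1 - p) ** 2 for x in X)
        step = -score / hessian
        p += step
        if abs(step) < tol:
            break
    return p

X = [1, 0, 1, 1, 0, 1, 0, 1]
print(bernoulli_mle(X), sum(X) / len(X))
```

The estimate agrees with the sample proportion of ones, as anticipated in the description of the example.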
For a much more in-depth description of <@lit="mle">, please refer to chapter 22 of the <@pdf="Gretl User's Guide#chap:mle">.
Menu path: /Model/Maximum likelihood
Script command: <@ref="mle">
# modeltab Utilities "The model table"
In econometric research it is common to estimate several models with a common dependent variable—the models differing in respect of which independent variables are included, or perhaps in respect of the estimator used. In this situation it is convenient to present the regression results in the form of a table, where each column contains the results (coefficient estimates and standard errors) for a given model, and each row contains the estimates for a given variable across the models.
Gretl provides a means of constructing such a table (and copying it in plain text, LaTeX or Rich Text Format). Here is how to do it:
<indent>
1. Estimate a model which you wish to include in the table, and in the model display window, under the File menu, select “Save to session as icon” or “Save as icon and close”.
</indent>
<indent>
2. Repeat step 1 for the other models to be included in the table (up to a total of six models).
</indent>
<indent>
3. When you are done estimating the models, open the icon view of your gretl session (by selecting “icon view” under the View menu in the main gretl window, or by clicking the “session icon view” icon on the gretl toolbar).
</indent>
<indent>
4. In session icon view, there is an icon labeled “Model table”. Decide which model you wish to appear in the left-most column of the model table and add it to the table, either by dragging its icon onto the Model table icon, or by right-clicking on the model icon and selecting “Add to model table” from the pop-up menu.
</indent>
<indent>
5. Repeat step 4 for the other models you wish to include in the table. The second model selected will appear in the second column from the left, and so on.
</indent>
<indent>
6. When you are finished composing the model table, display it by double-clicking on its icon. Via the Copy button in the window which appears, you have the option of copying the table to the clipboard in various formats.
</indent>
<indent>
7. If the ordering of the models in the table is not what you wanted, right-click on the model table icon and select “Clear table”. Then go back to step 4 above and try again.
</indent>
Menu path: Session icon window, Model table icon
Script command: <@ref="modeltab">
# mpols Estimation "Multiple-precision OLS"
Computes OLS estimates for the specified model using multiple precision floating-point arithmetic, with the help of the GNU Multiple Precision (GMP) library. By default 256 bits of precision are used for the calculations, but this can be increased via the environment variable <@lit="GRETL_MP_BITS">. For example, when using the bash shell one could issue the following command, before starting gretl, to set a precision of 1024 bits.
<code>
export GRETL_MP_BITS=1024
</code>
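To see why extra precision can matter, consider the following Python sketch. It uses Python's decimal module rather than GMP, and a simple one-regressor model, so it only illustrates the idea. The regressor values are large relative to their spread, a case in which the sums of squares suffer severe cancellation in ordinary double precision.

```python
from decimal import Decimal, getcontext

# 78 decimal digits is roughly 256 bits of precision
getcontext().prec = 78

def slope_intercept(x, y):
    """Simple regression of y on a constant and x, in high-precision arithmetic."""
    xs = [Decimal(v) for v in x]
    ys = [Decimal(v) for v in y]
    n = Decimal(len(xs))
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs) - sx * sx / n
    sxy = sum(a * b for a, b in zip(xs, ys)) - sx * sy / n
    b = sxy / sxx
    a = (sy - b * sx) / n
    return a, b

# y = 1 + 2*(x - 10^10) exactly
x = [10000000000, 10000000001, 10000000002]
y = [1, 3, 5]
print(slope_intercept(x, y))
```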
Menu path: /Model/Other linear models/High precision OLS
Script command: <@ref="mpols">
# nadarwat Estimation "Nadaraya-Watson"
Computes the Nadaraya–Watson nonparametric estimator of the conditional mean of the dependent variable, <@mth="m(x)">, for each non-missing value of the independent variable.
The kernel function <@mth="K"> is given by <@mth="K = exp(-x"><@sup="2"><@mth=" / 2h)"> for <@mth="|x| < T"> and zero otherwise, where <@mth="h"> is the bandwidth and <@mth="T"> is a trimming parameter.
The bandwidth, usually a small number, controls the smoothness of <@mth="m(x)"> (higher values producing a smoother series); the default value is <@mth="n"><@sup="-0.2">, where <@mth="n"> is the number of observations.
If the “leave-one-out” box is checked, a variant of the estimator is employed in which the <@mth="i">-th observation is not used in evaluating <@mth="m(x"><@sub="i"><@mth=")">. This makes the Nadaraya–Watson estimator more robust numerically and its usage is normally advised when the estimator is computed for inference purposes.
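The estimator can be sketched as follows in Python (not gretl script). The kernel is written exactly as stated above; the trim argument stands in for the cutoff on |x| and is left as an explicit, hypothetical parameter because its default value is not given here.

```python
import math

def nadaraya_watson(x, y, h=None, trim=None, leave_one_out=False):
    """Kernel-weighted average of y at each observed x."""
    n = len(x)
    if h is None:
        h = n ** -0.2                      # default bandwidth
    fitted = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if leave_one_out and i == j:
                continue                   # skip the observation itself
            d = x[i] - x[j]
            if trim is not None and abs(d) >= trim:
                continue                   # kernel is zero beyond the cutoff
            k = math.exp(-d * d / (2.0 * h))
            num += k * y[j]
            den += k
        fitted.append(num / den if den > 0 else float("nan"))
    return fitted

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
print(nadaraya_watson(x, y))
print(nadaraya_watson(x, y, leave_one_out=True))
```

Each fitted value is a weighted average of the observed y values, so it always lies within their range.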
# negbin Estimation "Negative Binomial regression"
Estimates a Negative Binomial model. The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the model NegBin 2 is used, in which the conditional variance of the count is given by μ(1 + αμ), where μ denotes the conditional mean. But if the <@opt="--model1"> option is given the conditional variance is μ(1 + α).
The optional <@lit="offset"> series works in the same way as for the <@ref="poisson"> command. The Poisson model is a restricted form of the Negative Binomial in which α = 0 by construction.
By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the <@opt="--opg"> option is given the covariance matrix is based on the Outer Product of the Gradient (OPG), or if the <@opt="--robust"> option is given QML standard errors are calculated, using a “sandwich” of the inverse of the Hessian and the OPG.
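The two variance functions mentioned above can be compared directly with a small Python sketch (not gretl script); the function name is illustrative.

```python
def negbin_variance(mu, alpha, model=2):
    """Conditional variance of the count under NegBin 2 (default) or NegBin 1."""
    if model == 2:
        return mu * (1.0 + alpha * mu)
    if model == 1:
        return mu * (1.0 + alpha)
    raise ValueError("model must be 1 or 2")

# With alpha = 0 both variants reduce to the Poisson case, variance = mean
print(negbin_variance(2.0, 0.5), negbin_variance(2.0, 0.5, model=1), negbin_variance(2.0, 0.0))
```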
Menu path: /Model/Limited dependent variable/Count data
Script command: <@ref="negbin">
# nls Estimation "Nonlinear Least Squares"
Performs Nonlinear Least Squares (NLS) estimation using a modified version of the Levenberg–Marquardt algorithm. You must supply a function specification; it is recommended but not required that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible. If you do not supply derivatives you should instead give a list of the parameters to be estimated (separated by spaces or commas), preceded by the keyword <@lit="params">; these can be either scalars, or vectors, or any combination of the two.
Example: Suppose we have a data set with variables <@mth="C"> and <@mth="Y"> (e.g. <@lit="greene11_3.gdt">) and we wish to estimate a nonlinear consumption function of the form
<@fig="greene_Cfunc">
The parameters alpha, beta and gamma must first be added to the dataset and given initial values. Appropriate lines may be typed into the NLS specification window prior to the function specification.
In the NLS window we type the following lines:
<code>
C = alpha + beta * Y^gamma
deriv alpha = 1
deriv beta = Y^gamma
deriv gamma = beta * Y^gamma * log(Y)
</code>
The first line specifies the regression function, and the next three lines supply the derivatives of that function with respect to each of the parameters in turn. If the "deriv" lines are not given, a numerical approximation to the Jacobian is computed.
If the parameters alpha, beta and gamma were not previously declared we could preface the above lines with something like the following:
<code>
scalar alpha = 1
scalar beta = 1
scalar gamma = 1
</code>
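When supplying analytical derivatives it is worth checking them against a numerical approximation. The following Python sketch (not gretl script) does this for the consumption function in the example, using central differences; the parameter values are arbitrary.

```python
import math

def f(Y, alpha, beta, gamma):
    """Regression function: alpha + beta * Y^gamma."""
    return alpha + beta * Y ** gamma

def analytic_derivs(Y, alpha, beta, gamma):
    """Derivatives with respect to alpha, beta and gamma, as in the deriv lines."""
    return (1.0, Y ** gamma, beta * Y ** gamma * math.log(Y))

def numeric_derivs(Y, alpha, beta, gamma, eps=1e-6):
    """Central-difference approximation to the same derivatives."""
    params = [alpha, beta, gamma]
    out = []
    for i in range(3):
        up, dn = params[:], params[:]
        up[i] += eps
        dn[i] -= eps
        out.append((f(Y, *up) - f(Y, *dn)) / (2 * eps))
    return tuple(out)

print(analytic_derivs(100.0, 1.0, 0.5, 1.2))
print(numeric_derivs(100.0, 1.0, 0.5, 1.2))
```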
For further details on NLS estimation please see chapter 21 of the <@pdf="Gretl User's Guide#chap:nls">.
Menu path: /Model/Nonlinear Least Squares
Script command: <@ref="nls">
# normtest Tests "Normality test"
Carries out a test for normality for the given <@var="series">. The specific test is controlled by the option flags (but if no flag is given, the Doornik–Hansen test is performed). Note: the Doornik–Hansen and Shapiro–Wilk tests are recommended over the others, on account of their superior small-sample properties.
The test statistic and its p-value may be retrieved using the accessors <@xrf="$test"> and <@xrf="$pvalue">. Please note that if the <@opt="--all"> option is given, the result recorded is that from the Doornik–Hansen test.
Menu path: /Variable/Normality test
Script command: <@ref="normtest">
# nulldata Dataset "Creating a blank dataset"
Establishes a “blank” data set, containing only a constant and an index variable, with periodicity 1 and the specified number of observations. This may be used for simulation purposes: functions such as <@lit="uniform()"> and <@lit="normal()"> will generate artificial series from scratch to fill out the data set. This command may be useful in conjunction with <@lit="loop">. See also the “seed” option to the <@ref="set"> command.
By default, this command cleans out all data in gretl's current workspace: not only series but also matrices, scalars, strings, etc. If you give the <@opt="--preserve"> option, however, any currently defined variables other than series are retained.
Menu path: /File/New data set
Script command: <@ref="nulldata">
# ols Estimation "Ordinary Least Squares"
Computes ordinary least squares (OLS) estimates for the specified model.
Besides coefficient estimates and standard errors, the program also prints p-values for <@mth="t"> (two-tailed) and <@mth="F">-statistics. A p-value below 0.01 indicates statistical significance at the 1 percent level and is marked with <@lit="***">. <@lit="**"> indicates significance between 1 and 5 percent and <@lit="*"> indicates significance between the 5 and 10 percent levels. Model selection statistics (the Akaike Information Criterion or AIC and Schwarz's Bayesian Information Criterion) are also printed. The formula used for the AIC is that given by <@bib="Akaike (1974);akaike74">, namely minus two times the maximized log-likelihood plus two times the number of parameters estimated.
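The significance markers and the AIC formula described above amount to the following, shown as a Python sketch (not gretl script); the function names are illustrative.

```python
def stars(pvalue):
    """Significance marker for a p-value, following the thresholds above."""
    if pvalue < 0.01:
        return "***"
    if pvalue < 0.05:
        return "**"
    if pvalue < 0.10:
        return "*"
    return ""

def aic(loglik, k):
    """Akaike (1974): minus twice the log-likelihood plus twice the parameter count."""
    return -2.0 * loglik + 2.0 * k

print(stars(0.03), aic(-100.0, 3))
```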
Menu path: /Model/Ordinary Least Squares
Other access: Beta-hat button on toolbar
Script command: <@ref="ols">
# omit Tests "Omit variables"
This command re-estimates the given model after omitting the specified variables, or after sequentially omitting insignificant variables if the relevant box is available and is checked. Besides the usual model output, it prints a test for the joint significance of the omitted variables. The null hypothesis is that the true coefficients on all the omitted variables equal zero.
Sequential elimination works as follows: at each step the variable with the highest p-value is omitted, until all remaining variables have a p-value no greater than some cutoff. The default cutoff is 10 percent (two-sided); this can be adjusted via the spin button. By default this process operates on all variables in the model (apart from the constant). If you want to confine it to a subset of the variables, check the box labeled “Test only selected variables” and make a selection.
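The elimination loop can be sketched as follows in Python (not gretl script). The re-estimation step is represented by a caller-supplied function, pvalues_for, which is hypothetical; the p-values in the demonstration are made up.

```python
def sequential_omit(variables, pvalues_for, cutoff=0.10):
    """Drop the variable with the highest p-value until all are at or below cutoff.

    pvalues_for(kept) must re-estimate the model on the variables in kept and
    return a dict mapping each name to its p-value.
    """
    kept = list(variables)
    dropped = []
    while kept:
        pvals = pvalues_for(kept)
        worst = max(kept, key=lambda v: pvals[v])
        if pvals[worst] <= cutoff:
            break
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped

def fake_pvalues(kept):
    """Stand-in for re-estimation: p-values change once x2 is removed."""
    table = {"x1": 0.01, "x2": 0.50, "x3": 0.20}
    pvals = {v: table[v] for v in kept}
    if "x2" not in kept and "x3" in kept:
        pvals["x3"] = 0.04
    return pvals

print(sequential_omit(["x1", "x2", "x3"], fake_pvalues))
```

Note that the p-values are recomputed after each omission, so a variable that looks insignificant at first may be retained in the end.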
Menu path: Model window, /Tests/Omit variables
Script command: <@ref="omit">
# online Dataset "Access online databases"
Gretl is able to access databases at Wake Forest University (your computer must be connected to the internet for this to work).
Under the “File, Browse databases” menu, select the item “on database server”. A window should appear, showing a listing of the gretl databases available at Wake Forest. (Depending on your location and the speed of your internet connection, this may take a few seconds.) Along with the name of the database and a short description, there will appear a “Local status” entry: this shows whether you have the database installed locally (on the hard drive of your computer) and if so, whether or not it is up to date with the version on the server.
If you have a given database installed locally, and it is up to date, there is no advantage in accessing it via the server. But for a database that is not already installed and up to date, you may wish to get a listing of the data series: click on “Get series listing”. This brings up a further window, from which you can display the values of a chosen data series, graph those values, or import them into gretl's workspace. These tasks can be accomplished using the “Series” menu, or via the popup menu that appears when you click the right mouse button on a given series. You can also search the listing for a variable of interest (the “Find” menu item).
If you want faster access to the data, or wish to access the database offline, then select the line showing the database you want, in the initial database window, and press the “Install” button. This will download the database in compressed format, then uncompress it and install it on your hard drive. Thereafter you should be able to find it under the “File, Browse databases, gretl native” menu.
# packages Utilities "Function packages"
Gretl's functionality can be extended by the use of function packages. These come in two sorts: official “Addons” and contributed packages. Jointly, they cover many estimators and utilities not available as built-in commands or functions.
The official Addons are included in the gretl installers for Windows and Mac. On Linux, if they are not preinstalled, they are downloaded on demand (for example, if you select the menu item /Model/Time series/GARCH variants, gretl will download the <@lit="gig"> (GARCH in gretl) Addon). You can check that your Addons are up to date via <@mnu="PkgHelp:SFAddons"> in the Help menu.
You can browse the contributed packages installed on your computer via the menu item <@mnu="PkgHelp:LocalGfn">, and if you are online you can access a listing of available packages via the item <@mnu="PkgHelp:RemoteGfn">. Both items are found under /File/Function packages.
Many packages offer to attach themselves to GUI menus. You can inspect these attachments via the <@mnu="PkgHelp:Registry"> (access via the Preferences button in the browser for installed packages).
For full details on installing and working with function packages, see the <@mnu="PkgHelp:Pkgbook"> (under the Help menu). This guide also contains details on writing function packages.
# panel Estimation "Panel models"
Estimates a panel model. By default the fixed effects estimator is used; this is implemented by subtracting the group or unit means from the original data.
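The subtraction of unit means (the "within" transformation) can be illustrated with a short Python sketch (not gretl script); the function name is illustrative.

```python
from collections import defaultdict

def within_transform(units, values):
    """Subtract from each value the mean of its cross-sectional unit."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for u, v in zip(units, values):
        totals[u] += v
        counts[u] += 1
    return [v - totals[u] / counts[u] for u, v in zip(units, values)]

# Two units with two observations each
print(within_transform([1, 1, 2, 2], [10.0, 20.0, 5.0, 7.0]))
```

After the transformation each unit's values sum to zero, so any effect that is constant within a unit drops out of the regression.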
If the "Random effects" button is checked, random effects (GLS) estimates are computed. By default the method of Swamy and Arora is used for the GLS transformation, but the Nerlove method is available as an option (via the drop-down selector). A further option is “Swamy-Arora / Baltagi-Chang”: in the case of an unbalanced panel this invokes a modification of the Swamy-Arora method devised by <@bib="Baltagi and Chang (1994);baltagi-chang94">, otherwise it's just equivalent to regular Swamy-Arora.
For more details on panel estimation, please see chapter 19 of the <@pdf="Gretl User's Guide#chap:panel">.
Menu path: /Model/Panel
Script command: <@ref="panel">
# panel-between Estimation "Between groups model"
This dialog allows you to enter a specification for the “between model” in the context of panel data. This regression uses the group-means of the data, thereby ignoring the variation within the groups. This model is rarely of great interest in its own right, but may be useful for purposes of comparison (for example, with the fixed effects model).
# panel-mode Dataset "Panel data organization"
This dialog offers up to three options with regard to defining a data set as a panel. The first two options require that the data set is already organized in a panel format (although this may not yet be recognized by gretl). The third option requires that the data set contains variables that represent the panel structure.
<@itl="Stacked time series">: Let there be <@var="N"> cross-sectional units in the data set, and let <@var="T"> = the number of time-series observations per unit. By selecting this option you are telling gretl that the data set is currently composed of <@var="N"> consecutive blocks of <@var="T"> time-series observations, one for each cross-sectional unit. The next step will be to specify the value of <@var="N">.
<@itl="Stacked cross sections">: You are telling gretl that the data set is currently composed of <@var="T"> consecutive blocks of <@var="N"> cross-sectional observations, one for each time period. The next step, again, will be to specify the value of <@var="N">.
If the total number of observations in the current dataset is prime, the above options are not available.
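The two stacked layouts differ only in how a row of the dataset maps to a (unit, period) pair, as this Python sketch shows (not gretl script; the function name and layout labels are illustrative). A prime number of rows cannot be written as N times T with both greater than 1, which is why these options are then unavailable.

```python
def panel_position(row, N, T, layout):
    """Map a 0-based row number to a 1-based (unit, period) pair."""
    if not 0 <= row < N * T:
        raise IndexError("row out of range")
    if layout == "stacked time series":
        # N consecutive blocks of T observations, one block per unit
        return row // T + 1, row % T + 1
    if layout == "stacked cross sections":
        # T consecutive blocks of N observations, one block per period
        return row % N + 1, row // N + 1
    raise ValueError("unknown layout")

print(panel_position(4, 2, 3, "stacked time series"))
print(panel_position(4, 2, 3, "stacked cross sections"))
```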
<@itl="Use index variables">: You are saying that the data set is currently organized any old way (it doesn't matter how), but that it contains two variables that index the cross-sectional units and the time periods respectively. The next step will be to select those two variables. Panel index variables must have nothing but non-negative integer values, with no missing values. If there are no such variables in the dataset this option is not available.
# panel-wls Estimation "Groupwise weighted least squares"
Groupwise weighted least squares for panel data. Computes weighted least squares (WLS) estimates, with the weights based on the estimated error variances for the respective cross-sectional units in the sample.
If the iteration option is selected, the procedure is iterated: at each round the residuals are re-computed using the current WLS parameter estimates, which gives rise to a new set of estimates of the error variances, and hence a new set of weights. Iteration stops when the maximum difference in the parameter estimates from one round to the next falls below 0.0001 or the number of iterations reaches 20. If the iteration converges, the resulting estimates are Maximum Likelihood.
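The iteration can be sketched in Python for a deliberately simplified case: a single regressor and no constant (not gretl script, and not gretl's implementation). Dividing the sum of squared residuals by the number of observations per unit is an assumption made for this sketch.

```python
def groupwise_wls(data, tol=1e-4, max_iter=20):
    """Iterated groupwise WLS for y = b*x + u.

    data maps each unit to a list of (x, y) pairs.
    Returns the estimate of b and the number of rounds used.
    """
    # Starting values: pooled OLS
    sxy = sum(x * y for obs in data.values() for x, y in obs)
    sxx = sum(x * x for obs in data.values() for x, y in obs)
    b = sxy / sxx
    for it in range(1, max_iter + 1):
        num = den = 0.0
        for obs in data.values():
            # Error variance for this unit, from the current residuals
            var = sum((y - b * x) ** 2 for x, y in obs) / len(obs)
            w = 1.0 / var
            num += w * sum(x * y for x, y in obs)
            den += w * sum(x * x for x, y in obs)
        b_new = num / den
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    return b, it

data = {
    "A": [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)],   # low-noise unit
    "B": [(1.0, 1.0), (2.0, 5.0), (3.0, 5.0)],   # high-noise unit
}
print(groupwise_wls(data))
```

The noisier unit receives a smaller weight, so the final estimate lies closer to the slope implied by the low-noise unit than the pooled OLS estimate does.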
# pca Statistics "Principal Components Analysis"
Principal Components Analysis. Prints the eigenvalues of the correlation matrix (or the covariance matrix if the option box is checked) for the variables in <@var="varlist">, along with the proportion of the joint variance accounted for by each component. Also prints the corresponding eigenvectors (or “component loadings”).
In the window displaying the results, you have the option of saving the principal components to the dataset as series.
Menu path: /View/Principal components
Other access: Main window pop-up (multiple selection)
Script command: <@ref="pca">
# pergm Statistics "Periodogram"
Computes and displays the spectrum of the specified series. By default the sample periodogram is given, but optionally a Bartlett lag window is used in estimating the spectrum (see, for example, Greene's <@itl="Econometric Analysis"> for a discussion of this). The default width of the Bartlett window is twice the square root of the sample size but this can be set manually using the <@var="bandwidth"> parameter, up to a maximum of half the sample size.
If the <@opt="--log"> option is given the spectrum is represented on a logarithmic scale.
The (mutually exclusive) options <@opt="--radians"> and <@opt="--degrees"> influence the appearance of the frequency axis when the periodogram is graphed. By default the frequency is scaled by the number of periods in the sample, but these options cause the axis to be labeled from 0 to π radians or from 0 to 180°, respectively.
By default, if the program is not in batch mode a plot of the periodogram is shown. This can be adjusted via the <@opt="--plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="display"> (to display a plot even when in batch mode); or a file name. The effect of providing a file name is as described for the <@opt="--output"> option of the <@ref="gnuplot"> command.
Menu path: /Variable/Periodogram
Other access: Main window pop-up menu (single selection)
Script command: <@ref="pergm">
# polyweights Transformations "Polynomial trend fitting"
In fitting a polynomial trend to a time series it may be desirable to give extra weight to the observations at the start and end of the sample. (Points in the middle of the sample range have neighbours on both sides that are likely to be pulling the fit in the same general direction.)
The weighting schemes offered here (quadratic, cosine-bell and steps) can be used to this effect. If you select one of these schemes two additional settings must be chosen: first, what maximum weight should be used (the minimum, baseline weight is 1.0)? Second, what central fraction of the sample should be given a uniform (minimal) weighting?
Suppose, for example, you select a maximum weight of 3.0 and a central fraction of 0.4. This means that the middle 40 percent of the data get a weight of 1.0. If the “steps” shape is selected the first and last 30 percent of the observations get a weight of 3.0; otherwise, for the first 30 percent of observations the weights decline gradually from 3.0 to 1.0; and for the last 30 percent the weights increase from 1.0 to 3.0.
# poisson Estimation "Poisson estimation"
Estimates a Poisson regression. The dependent variable is taken to represent the occurrence of events of some sort, and must take on only non-negative integer values.
If a discrete random variable <@mth="Y"> follows the Poisson distribution, then
<@fig="poisson1">
for <@mth="y"> = 0, 1, 2,…. The mean and variance of the distribution are both equal to <@mth="v">. In the Poisson regression model, the parameter <@mth="v"> is represented as a function of one or more independent variables. The most common version (and the only one supported by gretl) has
<@fig="poisson2">
or in other words the log of <@mth="v"> is a linear function of the independent variables.
Optionally, you may add an “offset” variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an “offset” in a Poisson model of the accident rate. The offset variable must be strictly positive.
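The pieces of the model can be put together in a short Python sketch (not gretl script). The figures are not reproduced here, so the sketch assumes the standard forms: P(Y = y) = exp(-v) v^y / y! for the distribution, and v = exp(x'b) for the regression function, with the log of the offset added to the linear index. Function names are illustrative.

```python
import math

def poisson_pmf(y, v):
    """Probability of observing the count y when the mean is v."""
    return math.exp(-v) * v ** y / math.factorial(y)

def poisson_mean(beta, x, offset=1.0):
    """Conditional mean: exp(log(offset) + x'beta); the offset must be positive."""
    if offset <= 0:
        raise ValueError("offset must be strictly positive")
    index = sum(b * xi for b, xi in zip(beta, x))
    return math.exp(math.log(offset) + index)

# Hypothetical coefficients; doubling the offset doubles the expected count
v1 = poisson_mean([0.2, 0.1], [1.0, 3.0], offset=1000.0)
v2 = poisson_mean([0.2, 0.1], [1.0, 3.0], offset=2000.0)
print(v1, v2, poisson_pmf(0, 2.0))
```

Because the offset enters with a coefficient of 1.0 on its log, the expected count is exactly proportional to it, other things equal.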
By default, standard errors are computed using the negative inverse of the Hessian. If the <@opt="--robust"> flag is given, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a “sandwich” of the inverse of the estimated Hessian and the outer product of the gradient.
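As a rough sketch of the script counterpart, the following estimates a Poisson model with an offset and robust standard errors. The series names (<@lit="accidents">, <@lit="speedlimit">, <@lit="rain">, <@lit="traffic">) are hypothetical and stand in for your own data; the offset is given after a semicolon.
<code>
  # hypothetical series: accidents (counts), speedlimit, rain, traffic (> 0)
  poisson accidents const speedlimit rain ; traffic --robust
</code>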
See also <@ref="negbin">.
Menu path: /Model/Limited dependent variable/Count data
Script command: <@ref="poisson">
# probit Estimation "Probit model"
If the dependent variable is a binary variable (all values are 0 or 1), maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the Newton–Raphson method. As the model is nonlinear, the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the <@opt="--p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant.
By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a “sandwich” of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details.
If the dependent variable is not binary but is discrete, then Ordered Probit estimates are obtained. (If the variable selected as dependent is not discrete, an error is flagged.)
<@itl="Probit for panel data">
With the <@opt="--random-effects"> option, the error term is assumed to be composed of two normally distributed components: one time-invariant term that is specific to the cross-sectional unit or “individual” (and is known as the individual effect); and one term that is specific to the particular observation.
Evaluation of the likelihood for this model involves the use of Gauss-Hermite quadrature for approximating the value of expectations of functions of normal variates. The number of quadrature points used can be chosen through the <@opt="--quadpoints"> option (the default is 32). Using more points will increase the accuracy of the results, but at the cost of longer compute time; with many quadrature points and a large dataset estimation may be quite time consuming.
Besides the usual parameter estimates (and associated statistics) relating to the included regressors, certain additional information is presented on estimation of this sort of model:
<indent>
• <@lit="lnsigma2">: the maximum likelihood estimate of the log of the variance of the individual effect;
</indent>
<indent>
• <@lit="sigma_u">: the estimated standard deviation of the individual effect; and
</indent>
<indent>
• <@lit="rho">: the estimated share of the individual effect in the composite error variance (also known as the intra-class correlation).
</indent>
The Likelihood Ratio test of the null hypothesis that <@lit="rho"> equals zero provides a means of assessing whether the random effects specification is needed. If the null is not rejected that suggests that a simple pooled probit specification is adequate.
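For illustration only, here is how the two variants might look in a script. The series <@lit="y">, <@lit="x1"> and <@lit="x2"> are hypothetical, and the choice of 24 quadrature points is arbitrary; the second command assumes that a panel dataset is in place.
<code>
  # hypothetical series: binary y, regressors x1 and x2
  probit y const x1 x2 --p-values
  # random effects variant: requires panel data; 24 points is an arbitrary choice
  probit y const x1 x2 --random-effects --quadpoints=24
</code>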
Menu path: /Model/Limited dependent variable/Probit
Script command: <@ref="probit">
# qlrtest Tests "Quandt likelihood ratio test"
For a model estimated on time-series data via OLS, performs the Quandt likelihood ratio (QLR) test for a structural break at an unknown point in time, with 15 percent trimming at the beginning and end of the sample period.
For each potential break point within the central 70 percent of the observations, a Chow test is performed. See <@ref="chow"> for details; as with the regular Chow test, this is a robust Wald test if the original model was estimated with the <@opt="--robust"> option, an F-test otherwise. The QLR statistic is then the maximum of the individual test statistics.
An asymptotic p-value is obtained using the method of <@bib="Bruce Hansen (1997);hansen97">.
Besides the standard hypothesis test accessors <@xrf="$test"> and <@xrf="$pvalue">, <@xrf="$qlrbreak"> can be used to retrieve the index of the observation at which the test statistic is maximized.
When this command is run interactively (only), a plot of the Chow test statistic is displayed by default. This can be adjusted via the <@opt="--plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="display"> (to display a plot even when not in interactive mode); or a file name. The effect of providing a file name is as described for the <@opt="--output"> option of the <@ref="gnuplot"> command.
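A minimal sketch of the test in script form, assuming a time-series dataset with hypothetical series <@lit="y"> and <@lit="x">. The plot is suppressed and the results are retrieved via the accessors mentioned above.
<code>
  # hypothetical time series: y, x
  ols y const x
  qlrtest --plot=none
  printf "QLR = %g, p-value = %g, break at observation %d\n", $test, $pvalue, $qlrbreak
</code>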
Menu path: Model window, /Tests/QLR test
Script command: <@ref="qlrtest">
# qqplot Graphs "Q-Q plot"
With just one series selected, displays a plot of the empirical quantiles of the given series against the quantiles of the normal distribution. The series must include at least 20 valid observations in the current sample range. By default the empirical quantiles are plotted against quantiles of the normal distribution having the same mean and variance as the sample data, but two alternatives are available: the data may be standardized (converted to z-scores) before plotting, or the “raw” empirical quantiles may be plotted against the quantiles of the standard normal distribution.
The option <@opt="--output"> has the effect of sending the output to the specified file; use “display” to force output to the screen. See the <@ref="gnuplot"> command for more detail on this option.
Given two series arguments, <@var="y"> and <@var="x">, displays a plot of the empirical quantiles of <@var="y"> against those of <@var="x">. The data values are not standardized.
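The variants described above might be invoked from a script roughly as follows; <@lit="y"> and <@lit="x"> are hypothetical series names.
<code>
  # hypothetical series: y, x
  # standardize y before plotting
  qqplot y --z-scores
  # raw quantiles of y against the standard normal, forced to the screen
  qqplot y --raw --output=display
  # empirical quantiles of y against those of x
  qqplot y x
</code>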
Menu path: /Variable/Normal Q-Q plot
Menu path: /View/Graph specified vars/Q-Q plot
Script command: <@ref="qqplot">
# quantreg Estimation "Quantile regression"
Quantile regression. By default standard errors are computed according to the asymptotic formula given by <@bib="Koenker and Bassett (<@itl="Econometrica">, 1978);koenker-bassett78">, but if the “robust” box is checked we use the heteroskedasticity-robust variant from <@bib="Koenker and Zhao (<@itl="Journal of Nonparametric Statistics">, 1994);koenker-zhao94">.
If the “Compute confidence intervals” option is checked gretl will calculate confidence intervals for the coefficients, in place of standard errors. The “robust” check-box still has an effect: if it is not checked, the intervals are computed on the assumption of IID errors; with it, gretl uses the robust estimator developed by <@bib="Koenker and Machado (<@itl="Journal of the American Statistical Association">, 1999);koenker-machado99">. Note that these intervals are not just “plus or minus so many standard errors”; in general, they are asymmetrical about the point estimates of the coefficients.
You may give a list of quantiles (see the drop-down list for some pre-defined possibilities). In that case gretl will calculate quantile estimates and either standard errors or confidence intervals for each of the specified values.
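As a sketch of the corresponding script commands, assuming hypothetical series <@lit="y"> and <@lit="x">: the first argument is the quantile, and a set of quantiles can be supplied as a named matrix (here called <@lit="tau">, an arbitrary name).
<code>
  # hypothetical series: y, x
  # median regression with the default standard errors
  quantreg 0.5 y const x
  # first quartile, heteroskedasticity-robust standard errors
  quantreg 0.25 y const x --robust
  # several quantiles at once, with confidence intervals
  matrix tau = {0.25, 0.5, 0.75}
  quantreg tau y const x --intervals
</code>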
To follow up on the references given above, please see chapter 35 of the <@pdf="Gretl User's Guide#chap:quantreg">.
Menu path: /Model/Robust estimation/Quantile regression
Script command: <@ref="quantreg">
# reprobit Estimation "Random effects probit"
The random effects probit estimator provides a means of estimating a (binary) probit model for panel data. The error term is assumed to be composed of two normally distributed components: one time-invariant term that is specific to the cross-sectional unit or “individual” (and is known as the individual effect); and one term that is specific to the particular observation.
Evaluation of the likelihood for this model involves the use of Gauss-Hermite quadrature for approximating the value of expectations of functions of normal variates. In this dialog you can select the number of quadrature points used. Using more points will increase the accuracy of the results, but at the cost of longer compute time; with many quadrature points and a large dataset estimation may be quite time consuming.
Besides the usual parameter estimates (and associated statistics) relating to the included regressors, certain additional information is presented on estimation of this sort of model:
<indent>
• <@lit="lnsigma2">: the maximum likelihood estimate of the log of the variance of the individual effect;
</indent>
<indent>
• <@lit="sigma_u">: the estimated standard deviation of the individual effect; and
</indent>
<indent>
• <@lit="rho">: the estimated share of the individual effect in the composite error variance (also known as the intra-class correlation).
</indent>
The Likelihood Ratio test of the null hypothesis that <@lit="rho"> equals zero provides a means of assessing whether the random effects specification is needed. If the null is not rejected that suggests that a simple pooled probit specification is adequate.
In scripting mode, the random effects probit model is estimated using the <@lit="probit"> command with the <@opt="--random-effects"> option.
# reset Tests "Ramsey's RESET"
Must follow the estimation of a model via OLS. Carries out Ramsey's RESET test for model specification (non-linearity) by adding the square and/or the cube of the fitted values to the regression and calculating the <@mth="F"> statistic for the null hypothesis that the parameters on the added terms are zero.
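In a script the test might be run as shown below; the series names are hypothetical. The first call adds both squares and cubes of the fitted values, the second only the squares.
<code>
  # hypothetical series: y, x1, x2
  ols y const x1 x2
  reset
  reset --squares-only
</code>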
Menu path: Model window, /Tests/Ramsey's RESET
Script command: <@ref="reset">
# restrict-model Tests "Restrictions on a model"
Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters may be referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents the position in the list of regressors (starting at 1), or <@lit="b["><@var="varname"><@lit="]">, where <@var="varname"> is the name of the regressor in question.
The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">.
Here is an example of a set of restrictions:
<code>
b[1] = 0
b[2] - b[3] = 0
b[4] + 2*b[5] = 1
</code>
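For comparison, a sketch of how a single restriction of this kind might be tested in a script, where the equations are placed between <@lit="restrict"> and <@lit="end restrict">. The model and series names are hypothetical; with the constant in first position, <@lit="b[2]"> and <@lit="b[3]"> are the coefficients on <@lit="x1"> and <@lit="x2">.
<code>
  # hypothetical series: y, x1, x2
  ols y const x1 x2
  restrict
    b[2] - b[3] = 0
  end restrict
</code>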
# restrict-system Tests "Restrictions on a system of equations"
Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters are referenced using <@lit="b"> plus two numbers in square brackets. The leading number represents the position of the equation within the system and the second number indicates position in the list of regressors, starting at 1 in both cases. For example <@lit="b[2,1]"> denotes the first parameter in the second equation, and <@lit="b[3,2]"> the second parameter in the third equation.
The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[1,4]">.
Here is an example of a set of restrictions:
<code>
b[1,1] = 0
b[1,2] - b[2,2] = 0
b[3,4] + 2*b[3,5] = 1
</code>
# restrict-vecm Tests "Restrictions on a VECM"
Use this command to place linear restrictions on the cointegrating relations (beta) and/or adjustment coefficients (alpha) in a vector error-correction model (VECM).
Each restriction should be expressed as an equation, with a linear combination of parameters to the left of the equals sign and a numerical value on the right. Restrictions on beta may be non-homogeneous (non-zero on the right), but alpha restrictions must be homogeneous (zero on the right).
If the VECM is of rank 1, the elements of beta are referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents position in the cointegrating vector, starting at 1. For example, <@lit="b[2]"> denotes the second element in beta. If the rank is greater than 1, use <@lit="b"> plus two numbers in square brackets. For example, <@lit="b[2,1]"> denotes the first element in the second cointegrating vector.
To reference elements of alpha, use <@lit="a"> instead of <@lit="b">.
The parameter identifiers in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">.
Here is an example of a set of restrictions on a VECM of rank 1.
<code>
b[1] + b[2] = 0
b[1] + b[3] = 0
</code>
See also chapter 29 of the <@pdf="Gretl User's Guide#chap:vecm">.
# rmplot Graphs "Range-mean plot"
Range–mean plot: this command creates a simple graph to help in deciding whether a time series, <@mth="y">(t), has constant variance or not. We take the full sample t=1,...,T and divide it into small subsamples of arbitrary size <@mth="k">. The first subsample is formed by <@mth="y">(1),...,<@mth="y">(k), the second is <@mth="y">(k+1), ..., <@mth="y">(2k), and so on. For each subsample we calculate the sample mean and range (= maximum minus minimum), and we construct a graph with the means on the horizontal axis and the ranges on the vertical. So each subsample is represented by a point in this plane. If the variance of the series is constant we would expect the subsample range to be independent of the subsample mean; if we see the points approximate an upward-sloping line this suggests the variance of the series is increasing in its mean; and if the points approximate a downward-sloping line this suggests the variance is decreasing in the mean.
Besides the graph, gretl displays the means and ranges for each subsample, along with the slope coefficient for an OLS regression of the range on the mean and the p-value for the null hypothesis that this slope is zero. If the slope coefficient is significant at the 10 percent significance level then the fitted line from the regression of range on mean is shown on the graph. The <@mth="t">-statistic for the null, and the corresponding p-value, are recorded and may be retrieved using the accessors <@xrf="$test"> and <@xrf="$pvalue"> respectively.
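A brief sketch of script usage, assuming a hypothetical time series named <@lit="y">; the accessors pick up the slope test described above.
<code>
  # hypothetical time series: y
  rmplot y
  printf "t = %g, p-value = %g\n", $test, $pvalue
</code>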
Menu path: /Variable/Range-mean graph
Script command: <@ref="rmplot">
# runs Tests "Runs test"
Carries out the nonparametric “runs” test for randomness of the specified <@var="series">, where runs are defined as sequences of consecutive positive or negative values. If you want to test for randomness of deviations from the median, for a variable named <@lit="x1"> with a non-zero median, you can do the following:
<code>
series signx1 = x1 - median(x1)
runs signx1
</code>
If the <@opt="--difference"> option is given, the variable is differenced prior to the analysis, hence the runs are interpreted as sequences of consecutive increases or decreases in the value of the variable.
If the <@opt="--equal"> option is given, the null hypothesis incorporates the assumption that positive and negative values are equiprobable, otherwise the test statistic is invariant with respect to the “fairness” of the process generating the sequence, and the test focuses on independence alone.
Menu path: /Tools/Nonparametric tests
Script command: <@ref="runs">
# sampling Dataset "Setting the sample"
The Sample menu offers several ways of selecting a sub-sample from the current dataset.
If you choose “Sample/Restrict based on criterion...” you need to supply a Boolean (logical) expression, of the same sort that you would use to define a dummy variable. For example the expression “sqft > 1400” will select only cases for which the variable sqft has a value greater than 1400. Conditions may be concatenated using the logical operators “&&” (AND) and “||” (OR), and may be negated using “!” (NOT). If the dataset already contains dummy variables, you are also given the option of selecting one of these to define the sample (observations with a value of 1 for the selected dummy will be included, and others excluded).
The menu item “Sample/Drop all obs with missing values” redefines the sample to exclude all observations for which values of one or more variables are missing (leaving only complete cases).
To select observations for which a particular variable has no missing values, use “Restrict based on criterion...” and supply the Boolean condition “!missing(varname)” (replace “varname” with the name of the variable you want to use).
If the observations are labeled, you can exclude particular observations using, for example, <@lit="obs!="France""> as the Boolean criterion. The observation name must be enclosed in double quotes.
One point should be noted about defining a sample based on a dummy variable, a Boolean expression, or on the missing values criterion: Any “structural” information in the data header file (regarding the time series or panel nature of the data) is lost. You may reimpose structure via the “Dataset structure” item under the Data menu.
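The menu items above have script counterparts in the <@lit="smpl"> command. The following is a sketch only, using <@lit="sqft"> from the example above plus a hypothetical series <@lit="price">; <@lit="smpl full"> restores the full data range between steps.
<code>
  # restrict on a Boolean criterion
  smpl sqft > 1400 --restrict
  smpl full
  # drop all observations with missing values
  smpl --no-missing
  smpl full
  # keep observations for which price (hypothetical) is not missing
  smpl !missing(price) --restrict
</code>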
Please see chapter 5 of the <@pdf="Gretl User's Guide#chap:sampling"> for further details.
# save-labels Utilities "Save or remove series labels"
If you choose Export here, gretl will write a file containing the descriptive labels of any series in the current dataset that have such labels. This is a plain text file with one line per variable. The line will be empty for variables that have no descriptive label.
If you choose Remove, the descriptive labels will be removed for all series that have such labels. This would be appropriate only if the current labels have somehow been added in error.
# add-labels Utilities "Add series labels"
If you choose Yes here, you are offered a file-open dialog box to select a plain text file containing descriptive labels for the series in the current dataset. The file should contain one label per line; a blank line means no label. Gretl will attempt to read as many labels as there are series in the dataset, excluding the constant.
# save-script Utilities "Save commands?"
If you choose Yes here, gretl will write a file containing a record of the commands you executed in the current session. Most commands that you execute via “point and click” have a “script” counterpart, and it is these script commands that will be saved. You could take the file as the basis for writing a gretl command script.
If you don't care to be prompted to save a record of commands on exit, uncheck the tick box in the save commands dialog.
# save-session Utilities "Save this gretl session?"
If you choose Yes here, gretl will write a file containing a “snapshot” of the current session, including a copy of the working dataset along with any models, graphs or other objects that you have saved “as icons”. You can re-open this file later to recreate the state of gretl as of the time you quit the session (see the “File/Session files” menu).
If you mostly work with gretl using command scripts (which we recommend for “serious” econometric work) you probably don't need to save the session, but you should be sure to save any changes to your script that you wish to keep. You may also want to save any changes to your dataset, unless these are of a sort that can easily be recreated by running a script.
If you work with scripts and don't care to be prompted to save your session on exit, uncheck the tick box in the save session dialog.
# scatters Graphs "Multiple pairwise graphs"
Generates pairwise graphs of the selected “Y-axis variable” against each of the selected “X-axis variables” in turn. (Or you can select several variables for the Y-axis and one for the X-axis.) Scanning a set of such plots can be a useful step in exploratory data analysis. The maximum number of plots is 16; any extra variables will be ignored.
If the dataset is time-series, then the second sub-list can be omitted, in which case it will implicitly be taken as "time", so you can plot multiple time series in separated sub-graphs.
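In script form the two sub-lists are separated by a semicolon. A sketch with hypothetical series names; the second command assumes a time-series dataset.
<code>
  # hypothetical series: price, sqft, bedrms, baths
  scatters price ; sqft bedrms baths
  # time-series data: plot y1, y2, y3 (hypothetical) against time
  scatters y1 y2 y3
</code>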
Menu path: /View/Multiple graphs
Script command: <@ref="scatters">
# script-editor Utilities "Script editor preferences"
Note that some of these preferences apply only when editing native gretl scripts (Smart Tab and Enter, Script editor uses tabs); some also apply when editing or viewing any script (Show line numbers, Highlighting style).
<@itl="Smart Tab and Enter">: If this is turned on, then when you press the <@lit="Tab"> key at the start of a line in a hansl script, instead of just entering a tab stop the program will try to adjust the indentation level of the line consistently with other lines that have been entered. Similarly, when you press the <@lit="Enter"> key the program will try to ensure that the indentation of the completed line is correct.
<@itl="Show line numbers">: Display line numbers in the left margin of the script editor or viewer.
<@itl="Script editor uses tabs">: Affects program behavior when you are editing more than one script at a time. If this is checked then each script is shown in a “tab” in a notebook-style window; otherwise each script has its own window.
<@itl="Enable auto-completion">: If this is checked, you will be offered possible completions for the word you are typing, based on the words already entered in the editor window or tab. To select a completion, use the up/down arrow keys and the <@lit="Enter"> key; or just keep on typing to dismiss the suggested completions.
<@itl="Enable auto-brackets">: If this is checked, then when you type a left parenthesis, bracket or brace at the end of a line the matching right-hand delimiter will be added automatically, and the cursor moved between the two delimiters.
<@itl="Number of spaces per Tab">: How wide do you want a tab-stop or indentation level to be? An integer value from 2 to 8.
<@itl="Highlighting style">: Provides a drop-down list of syntax highlighting styles. Some of these are dark-on-light and some light-on-dark: experiment and find what you like.
# setinfo Dataset "Edit attributes of variable"
In this dialog box you can:
* Rename a (series) variable.
* Add or edit a description of the variable: this appears next to the variable name in the gretl main window.
* Add or edit the "display name" for the variable (if the variable is a series, not a scalar). This string (maximum 19 characters) is shown in place of the variable name when the variable is displayed in a graph. Thus for instance you can associate a more comprehensible string such as "T-bill rate" with a cryptically named variable such as "tb3".
* (For time-series data) set the compaction method for the variable. This method will be used if you decide to reduce the frequency of the dataset, or if you update the variable by importing from a database where the variable is at a higher frequency than in the working dataset.
* Mark a variable as discrete (for series with integer values only). This affects the way the variable is handled when you ask for a frequency plot.
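Some of these attributes can also be set from a script. A sketch, reusing the "tb3" example above; the description string and the series <@lit="nkids"> are hypothetical.
<code>
  # description and display name for tb3
  setinfo tb3 --description="3-month Treasury bill rate" --graph-name="T-bill rate"
  # mark a hypothetical integer-valued series as discrete
  setinfo nkids --discrete
</code>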
Menu path: /Variable/Edit attributes
Other access: Main window pop-up menu
Script command: <@ref="setinfo">
# setmiss Dataset "Missing value code"
Set a numerical value that will be interpreted as "missing" or "not applicable", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Data menu).
Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations.
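The same can be done in a script. In this sketch <@lit="x"> is a hypothetical series and -999 an arbitrary code; giving no series applies the code to the whole dataset.
<code>
  # treat -1 as missing for the hypothetical series x only
  setmiss -1 x
  # treat -999 as missing throughout the dataset
  setmiss -999
</code>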
Menu path: /Data/Set missing value code
Script command: <@ref="setmiss">
# spearman Statistics "Spearman's rank correlation"
Prints Spearman's rank correlation coefficient for a specified pair of series. The series do not have to be ranked manually in advance; the command takes care of this.
The automatic ranking is from largest to smallest (i.e. the largest data value gets rank 1). If you need to invert this ranking, create a new variable which is the negative of the original. For example:
<code>
series altx = -x
spearman altx y
</code>
Menu path: /Tools/Nonparametric tests/Correlation
Script command: <@ref="spearman">
# store Dataset "Save data"
Save data to <@var="filename">. By default all currently defined series are saved but the optional <@var="varlist"> argument can be used to select a subset of series. If the dataset is sub-sampled, only the observations in the current sample range are saved.
The output file will be written in the currently set <@ref="workdir">, unless the <@var="filename"> string contains a full path specification.
The format in which the data are written may be controlled to a degree by the extension or suffix of <@var="filename">, as follows:
<indent>
• <@lit=".gdt">, or no extension: gretl's native XML data format. (If no extension is provided, “<@lit=".gdt">” is added automatically.)
</indent>
<indent>
• <@lit=".gdtb">: gretl's native binary data format.
</indent>
<indent>
• <@lit=".csv">: comma-separated values (CSV).
</indent>
<indent>
• <@lit=".txt"> or <@lit=".asc">: space-separated values.
</indent>
<indent>
• <@lit=".m">: GNU Octave matrix format.
</indent>
<indent>
• <@lit=".dta">: Stata dta format (version 113).
</indent>
Format-related option flags can be used to force the save format independently of the filename (or to get gretl to write in the formats of PcGive or JMulTi). However, if <@var="filename"> has extension <@lit=".gdt"> or <@lit=".gdtb"> this necessarily implies use of native format and the addition of a conflicting option flag will generate an error.
When data are saved in native format (only), the <@opt="--gzipped"> option may be used for data compression, which can be useful for large datasets. The optional parameter for this flag controls the level of compression (from 0 to 9): higher levels produce a smaller file, but compression takes longer. The default level is 1; a level of 0 means that no compression is applied.
The option flags <@opt="--omit-obs"> and <@opt="--no-header"> are applicable only when saving data in CSV format. By default, if the data are time series or panel, or if the dataset includes specific observation markers, the CSV file includes a first column identifying the observations (e.g. by date). If the <@opt="--omit-obs"> flag is given this column is omitted. The <@opt="--no-header"> flag suppresses the usual printing of the names of the variables at the top of the columns.
The option flag <@opt="--decimal-comma"> is also confined to the case of saving data in CSV format. The effect of this option is to replace the decimal point with the decimal comma; in addition the column separator is forced to be a semicolon.
The option of saving in gretl database format is intended to help with the construction of large sets of series, possibly having mixed frequencies and ranges of observations. At present this option is available only for annual, quarterly or monthly time-series data. If you save to a file that already exists, the default action is to append the newly saved series to the existing content of the database. In this context it is an error if one or more of the variables to be saved has the same name as a variable that is already present in the database. The <@opt="--overwrite"> flag has the effect that, if there are variable names in common, the newly saved variable replaces the variable of the same name in the existing database.
The <@opt="--comment"> option is available when saving data as a database or in CSV format. The required parameter is a double-quoted one-line string, attached to the option flag with an equals sign. The string is inserted as a comment into the database index file or at the top of the CSV output.
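A few illustrative invocations, with hypothetical file and series names; the compression level of 6 is an arbitrary choice. Files are written to the current <@ref="workdir"> since no full path is given.
<code>
  # all series, native XML format
  store mydata.gdt
  # two hypothetical series as CSV, without the observations column
  store subset.csv x y --omit-obs
  # CSV with decimal comma (the column separator becomes a semicolon)
  store mydata_euro.csv --decimal-comma
  # native format with gzip compression
  store big.gdt --gzipped=6
</code>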
The <@lit="store"> command behaves in a special manner in the context of a “progressive loop”. See chapter 12 of the <@pdf="Gretl User's Guide#chap:looping"> for details.
Menu path: /File/Save data; /File/Export data
Script command: <@ref="store">
# system Estimation "Systems of equations"
In this window you can define a system of equations and choose an estimator for the system. Four sorts of statement may be given here, as follows:
<indent>
• <@ref="equation">: specify an equation within the system. At least two such statements must be provided.
</indent>
<indent>
• <@lit="instr">: for a system to be estimated via Three-Stage Least Squares, a list of instruments (by variable name or number). Alternatively, you can put this information into the <@lit="equation"> line using the same syntax as in the <@ref="tsls"> command.
</indent>
<indent>
• <@lit="endog">: for a system of simultaneous equations, a list of endogenous variables. This is primarily intended for use with FIML estimation, but with Three-Stage Least Squares this approach may be used instead of giving an <@lit="instr"> list; then all the variables not identified as endogenous will be used as instruments.
</indent>
<indent>
• <@lit="identity">: for use with FIML, an identity linking two or more of the variables in the system. This sort of statement is ignored when an estimator other than FIML is used.
</indent>
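As a sketch of how such statements fit together in a script, consider a hypothetical two-equation supply and demand system estimated via Three-Stage Least Squares. All series names are invented for illustration; since an <@lit="endog"> list is given, the remaining variables serve as instruments.
<code>
  # hypothetical series: q, p (endogenous); income, cost (exogenous)
  system method=3sls
    equation q const p income
    equation p const q cost
    endog q p
  end system
</code>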
Menu path: /Model/Simultaneous equations
Script command: <@ref="system">
# tobit Estimation "Tobit model"
Estimates a Tobit model, which may be appropriate when the dependent variable is “censored”. For example, positive and zero values of purchases of durable goods on the part of individual households are observed, and no negative values, yet decisions on such purchases may be thought of as outcomes of an underlying, unobserved disposition to purchase that may be negative in some cases.
By default it is assumed that the dependent variable is censored at zero on the left and is uncensored on the right. However you can use the entry boxes marked “left bound” and “right bound” to specify a different pattern of censoring. Enter either a numerical value or <@lit="NA"> for no censoring.
The Tobit model is a special case of interval regression, which is supported via the <@ref="intreg"> command.
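In a script the censoring bounds are given via option flags. A sketch with hypothetical series; the bounds of 0 and 100 are chosen only for illustration.
<code>
  # default: left-censored at zero, uncensored on the right
  tobit hours const educ age
  # hypothetical score censored at 0 on the left and 100 on the right
  tobit score const x1 x2 --llimit=0 --rlimit=100
</code>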
Menu path: /Model/Limited dependent variable/Tobit
Script command: <@ref="tobit">
# transpos Dataset "Transpose data"
Transposes the current data set. That is, each observation (row) in the current data set will be treated as a variable (column), and each variable as an observation. This command may be useful if data have been read from some external source in which the rows of the data table represent variables.
See also <@ref="dataset">.
Menu path: /Data/Transpose data
# tsls Estimation "Instrumental variables regression"
This command requires the selection of two lists of variables: the independent variables to appear in the given model and a set of instruments. Note that any exogenous regressors should appear in both lists.
Output for two-stage least squares estimates includes the Hausman test and, if the model is over-identified, the Sargan over-identification test. In the Hausman test, the null hypothesis is that OLS estimates are consistent, or in other words estimation by means of instrumental variables is not really required. A model of this sort is over-identified if there are more instruments than are strictly required. The Sargan test is based on an auxiliary regression of the residuals from the two-stage least squares model on the full list of instruments. The null hypothesis is that all the instruments are valid, and suspicion is thrown on this hypothesis if the auxiliary regression has a significant degree of explanatory power. For a good explanation of both tests see chapter 8 of <@bib="Davidson and MacKinnon (2004);davidson-mackinnon04">.
For both TSLS and LIML estimation, an additional test result is shown provided that the model is estimated under the assumption of i.i.d. errors (that is, the <@opt="--robust"> option is not selected). This is a test for weakness of the instruments. Weak instruments can lead to serious problems in IV regression: biased estimates and/or incorrect size of hypothesis tests based on the covariance matrix, with rejection rates well in excess of the nominal significance level <@bib="(Stock, Wright and Yogo, 2002);stock-wright-yogo02">. The test statistic is the first-stage <@mth="F">-test if the model contains just one endogenous regressor, otherwise it is the smallest eigenvalue of the matrix counterpart of the first stage <@mth="F">. Critical values based on the Monte Carlo analysis of <@bib="Stock and Yogo (2003);stock-yogo03"> are shown when available.
The R-squared value printed for models estimated via two-stage least squares is the square of the correlation between the dependent variable and the fitted values.
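The two lists described above can be sketched in script form as follows. This is only an illustration: the series names <@lit="y">, <@lit="x1">, <@lit="x2">, <@lit="z1"> and <@lit="z2"> are hypothetical, with <@lit="x2"> taken to be the endogenous regressor. The lists are separated by a semicolon, and the exogenous terms (the constant, written as <@lit="0">, and <@lit="x1">) appear in both.
<code>
# all series names are illustrative, not taken from any supplied dataset
# regressors: const, x1 (exogenous), x2 (endogenous)
# instruments: const, x1, z1, z2
tsls y 0 x1 x2 ; 0 x1 z1 z2
</code>
With two additional instruments for a single endogenous regressor, a model like this one would be over-identified, so the Sargan test should appear in the output.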
Menu path: /Model/Instrumental variables
Script command: <@ref="tsls">
# var Estimation "Vector Autoregression"
This command requires specification of:
<indent>
• the lag order, that is, the number of lags of each variable that should be included in the system;
</indent>
<indent>
• any exogenous variables (but note that a constant is included automatically unless you specify otherwise, a trend can be added using the trend checkbox, and seasonal dummy variables can be added using the seasonals checkbox); and
</indent>
<indent>
• a list of endogenous variables, lags of which will be included on the right-hand side of each equation (note: do not include lagged variables in this list; they will be added automatically).
</indent>
A separate regression will be run for each variable in the system. Output for each equation includes F-tests for zero restrictions on all lags of each of the variables and an F-test for the maximum lag, along with (optionally) forecast variance decompositions and impulse response functions.
Forecast variance decompositions and impulse responses are based on the Cholesky decomposition of the contemporaneous covariance matrix, and in this context the order in which the (stochastic) variables are given matters. The first variable in the list is assumed to be “most exogenous” within-period. The horizon for variance decompositions and impulse responses can be set using the <@ref="set"> command. For retrieval of a specified impulse response function in matrix form, see the <@xrf="irf"> function.
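A minimal script sketch of such a specification, assuming hypothetical endogenous series <@lit="y1"> and <@lit="y2"> and a hypothetical exogenous series <@lit="x1">, might look like this:
<code>
# series names are illustrative only
# set the horizon for impulse responses and variance decompositions
set horizon 12
# VAR of order 4; exogenous variables follow the semicolon
var 4 y1 y2 ; x1 --impulse-responses --variance-decomp
</code>
Because <@lit="y1"> is listed first, it is treated as the “most exogenous” variable in the Cholesky ordering; reversing the order of <@lit="y1"> and <@lit="y2"> would generally change the decompositions and impulse responses.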
Menu path: /Model/Time series/Multivariate
Script command: <@ref="var">
# VAR-lagselect Tests "VAR lag-length selection"
In this dialog box you specify a VAR as usual, but use the lag order spin button to set the maximum number of lags to test.
Output will consist of a table showing the values of the Akaike (AIC), Schwarz (BIC) and Hannan–Quinn (HQC) information criteria computed from VARs of order 1 to the chosen maximum. This is intended to help with the selection of the optimal lag order.
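In a script, a comparable table can be requested with the <@opt="--lagselect"> option of the <@ref="var"> command. The sketch below assumes hypothetical series <@lit="y1"> and <@lit="y2">; the order given is read as the maximum lag to test.
<code>
# compare information criteria for VARs of order 1 to 8
# (y1 and y2 are illustrative series names)
var 8 y1 y2 --lagselect
</code>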
# VAR-omit Tests "Test exogenous variables in VAR"
Use this dialog box to specify a subset of exogenous variables in a VAR. These variables will be omitted from the original VAR, and the system re-estimated.
A likelihood ratio test is reported, where the null hypothesis is that the true parameter values are zero, in all equations of the VAR, for the omitted variables. The test is based on the difference between the log-determinant of the variance matrix for the unrestricted system and that for the restricted system with the selected variables omitted.
# vartest Tests "Difference of variances"
Calculates the <@mth="F"> statistic for the null hypothesis that the population variances are equal for the two selected series, and shows its p-value.
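As a brief script illustration, with <@lit="x1"> and <@lit="x2"> standing for two hypothetical series in the current dataset:
<code>
# F test of equal population variances for two series (names are illustrative)
vartest x1 x2
</code>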
Menu path: /Tools/Test statistic calculator
Script command: <@ref="vartest">
# vecm Estimation "Vector Error Correction Model"
A VECM is a form of vector autoregression or VAR (see <@ref="var">), applicable where the variables in the model are individually integrated of order 1 (that is, are random walks, with or without drift), but exhibit cointegration. This command is closely related to the Johansen test for cointegration (see <@ref="coint2">).
The lag order selected in the VECM dialog box is that of the VAR system. The number of lags in the VECM itself (where the dependent variable is given as a first difference) is one less than this number.
The “rank” represents the number of cointegrating vectors. This must be greater than zero and less than or equal to (generally, less than) the number of endogenous variables selected.
In the “Endogenous variables” box you select the vector of endogenous variables, in levels. The inclusion of deterministic terms in the model is controlled by the option buttons. The default is to include an “unrestricted constant”, which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as “case 3”. The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in chapter 29 of the <@pdf="Gretl User's Guide#chap:vecm">.
In the “Exogenous variables” box you may add specific exogenous variables. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select “Restricted” from the pop-up menu. The symbol next to the variable will change to <@lit="R">.
If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box (“Show details”) allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure.
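The dialog settings map onto the script command roughly as sketched below. The series names <@lit="y1">, <@lit="y2"> and <@lit="y3"> are hypothetical and are assumed to be in levels; the first number is the lag order of the VAR and the second is the cointegration rank.
<code>
# VAR lag order 4 (so 3 lagged differences in the VECM), rank 1,
# default deterministic terms (unrestricted constant)
vecm 4 1 y1 y2 y3
# same system with a restricted constant and seasonal dummies
vecm 4 1 y1 y2 y3 --rc --seasonals
</code>
The second call only makes sense for data with a seasonal frequency such as quarterly or monthly.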
Menu path: /Model/Time series/Multivariate
Script command: <@ref="vecm">
# wls Estimation "Weighted Least Squares"
Let <@var="wtvar"> denote the variable selected in the “Weight variable” box. An OLS regression is run, where the dependent variable is the product of the positive square root of <@var="wtvar"> and the selected dependent variable, and the independent variables are also multiplied by the square root of <@var="wtvar">. Statistics such as <@itl="R">-squared are based on the weighted data. If <@var="wtvar"> is a dummy variable, weighted least squares estimation is equivalent to eliminating all observations with value zero for <@var="wtvar">.
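The transformation described above can be sketched in script form. The series names <@lit="w"> (the weight variable), <@lit="y">, <@lit="x1"> and <@lit="x2"> are hypothetical. The second part reproduces the weighting by hand, on the assumption that the constant is scaled along with the other regressors.
<code>
# weighted least squares with weight variable w (names are illustrative)
wls w y 0 x1 x2

# manual equivalent: multiply every term, including the constant, by sqrt(w)
series sw = sqrt(w)
series ys = sw * y
series x1s = sw * x1
series x2s = sw * x2
ols ys sw x1s x2s
</code>
The coefficient estimates from the two regressions should agree, although the summary statistics printed by the plain <@lit="ols"> call may be computed differently.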
Menu path: /Model/Other linear models/Weighted Least Squares
Script command: <@ref="wls">
# workdir Utilities "Working directory"
The working directory (or “workdir”) is where gretl looks by default when reading or writing data files or scripts via the file Open and Save dialogs. In addition it is the default location for
<indent>
• reading files via commands such as <@lit="append">, <@lit="open">, <@lit="run"> and <@lit="include">; and
</indent>
<indent>
• writing files via commands such as <@lit="eqnprint">, <@lit="tabprint">, <@lit="gnuplot">, <@lit="outfile"> and <@lit="store">.
</indent>
The working directory can be set in either of two ways: using the dialog accessed by the “Working directory” item under the File menu, or using the <@ref="set"> command, as in
<code>
set workdir /path/to/somewhere
</code>
The current value of the <@lit="workdir"> variable can be inspected in the dialog just mentioned or via the command
<code>
eval $workdir
</code>
By default the value of <@lit="workdir"> is preserved across gretl sessions. However, users who like to work from the command prompt (launching gretl from a terminal window) may prefer to have the working directory set automatically as the current directory (according to the shell) at start-up. This option can be selected in the dialog or by the command
<code>
set use_cwd on
</code>
(“cwd” = current working directory).
The working directory dialog also allows you to set the behavior of the GUI file selector: when you open or save a file in a given folder, should the selector remember and return to the same folder on the next invocation? Or should the selector always visit the chosen working directory?
Menu path: /File/Working directory
# x12a Utilities "X-12-ARIMA"
There are two procedural options here, controlled by the lower set of radio buttons.
If you select “Execute X-12-ARIMA directly” then gretl writes a command file for X-12-ARIMA and calls the x12a program to execute the commands. In this case you have the option of producing a graph and/or saving selected output series to the gretl dataset.
If you select “Make X-12-ARIMA command file”, gretl writes a command file for X-12-ARIMA, as above, but then opens this file in an editor window. In that window you can make changes and save the file under a chosen name. You can also send the file for execution by x12a (by clicking the “Run” button on the editor window toolbar) and view the output. In this case, however, you do not have the option of saving data as gretl series or producing a gretl graph.
# xcorrgm Statistics "Cross-correlogram"
Prints and graphs the cross-correlogram for <@var="series1"> and <@var="series2">, which may be specified by name or number. The values are the sample correlation coefficients between the current value of <@var="series1"> and successive leads and lags of <@var="series2">.
If an <@var="order"> value is specified, the length of the cross-correlogram is limited to at most that number of leads and lags; otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations.
By default, a plot of the cross-correlogram is produced: a gnuplot graph in interactive mode or an ASCII graphic in batch mode. This can be adjusted via the <@opt="--plot"> option. The acceptable parameters to this option are <@lit="none"> (to suppress the plot); <@lit="ascii"> (to produce a text graphic even when in interactive mode); <@lit="display"> (to produce a gnuplot graph even when in batch mode); or a file name. The effect of providing a file name is as described for the <@opt="--output"> option of the <@ref="gnuplot"> command.
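For instance, assuming two hypothetical series <@lit="x1"> and <@lit="x2">, the following sketch limits the cross-correlogram to 8 leads and lags and asks for a text graphic:
<code>
# series names are illustrative; 8 is the order (maximum lead and lag)
xcorrgm x1 x2 8 --plot=ascii
</code>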
Menu path: /View/Cross-correlogram
Other access: Main window pop-up menu (multiple selection)
Script command: <@ref="xcorrgm">
# xtab Statistics "Cross-tabulate variables"
Displays a contingency table or cross-tabulation for each combination of the selected variables. Note that all the variables must be discrete.
By default, frequency count values are shown in the cells and on the margins of the table. However, you can choose to display either row or column percentages instead.
By default, cells with a zero count are shown as empty, but you can choose to show zero values explicitly.
Pearson's chi-square test for independence is displayed if the expected frequency under independence is at least 1.0e-7 for all cells. A common rule of thumb for the validity of this statistic is that at least 80 percent of cells should have expected frequencies of 5 or greater; if this criterion is not met a warning is printed.
If the contingency table is 2 by 2, Fisher's Exact Test for independence is computed. Note that this test is based on the assumption that the row and column totals are fixed, which may or may not be appropriate depending on how the data were generated. The left p-value should be used when the alternative to independence is negative association (values tend to cluster in the lower left and upper right cells); the right p-value should be used if the alternative is positive association. The two-tailed p-value for this test is calculated by method (b) in section 2.1 of <@bib="Agresti (1992);agresti92">: it is the sum of the probabilities of all possible tables having the given row and column totals and having a probability less than or equal to that of the observed table.
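A script sketch of a cross-tabulation, assuming two hypothetical discrete series <@lit="x1"> and <@lit="x2">, and assuming that the <@opt="--row"> and <@opt="--zeros"> options of the script command correspond to the row-percentage and show-zeros choices in the dialog:
<code>
# cross-tabulate x1 against x2, showing row percentages
# and printing zero counts explicitly (series names are illustrative)
xtab x1 x2 --row --zeros
</code>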
Script command: <@ref="xtab">