% Source: /usr/share/EMBOSS/doc/manuals/admin.tex (emboss-doc 6.6.0-1)
\documentclass{report}
\usepackage{verbatim}
\usepackage{emboss}
\begin{document}
\title{The \EMBOSS\ Administrator's Guide}
\author{David Martin, EMBnet Norway \\
Peter Rice, LION Bioscience \\
Alan Bleasby, HGMP (EMBnet UK)}
\date{This guide relates to \EMBOSS\ 2.5.0}
\maketitle
Copyright (c) 2000, 2002 David Martin, Peter Rice, Alan Bleasby.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation
License\URL{http://www.gnu.org/copyleft/fdl.html}, Version 1.1 or any
later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the chapter entitled "GNU
Free Documentation License".
\tableofcontents
\chapter{Introduction}
\section{About this document}
This guide has been written to assist system administrators and
developers with the installation and configuration of \EMBOSS. If you
are reading this to find out how to do bioinformatics then you are
wasting your time. You are referred instead to the Resources chapter
below where there is a list of more relevant literature and web sites.
Experienced users may find this document useful for configuring their
own databases and customising their \EMBOSS\ experience.
\subsection{Credits}
The original author of this guide was David
Martin\URL{damartin\@@hgmp.mrc.ac.uk} at the Norwegian EMBnet
node.\URL{http://www.no.embnet.org} It is however the result of a team
effort. Thanks are due in particular to Johann Visagie for the FreeBSD
information. Other contributors are acknowledged in the text.
\subsection{Reproduction}
The obligatory bit of legalese. The first version of this guide was
not in the public domain but has been released under the GNU Free
Documentation License by the original author.
Although 'Free' in this license is usually explained as 'free as in
freedom, not as in beer' the authors are likely to appreciate offers
of free drinks should you ever meet them.
\section{What is \EMBOSS?}
\EMBOSS\ is a freely available suite of bioinformatics applications
and libraries. It can be downloaded via the internet, copied,
customised, and passed on under the terms of the various General
Public Licenses. \EMBOSS\ has been developed in response to the need
for a powerful, adaptable suite of software that can be used readily
in many different situations and meets the needs of professional
bioinformaticists, particularly those needing high throughput and/or
scriptable capabilities.
\EMBOSS\ has primarily been developed by those responsible for the
public extensions to the GCG package. \EMBOSS\ supersedes much of EGCG
and includes far better database interaction. \EMBOSS\ also has the
benefit of freely accessible source code so novel applications can be
developed rapidly and at minimal cost.
\EMBOSS\ is currently only available for Unix/Linux systems but it has
been known to compile and run on Windows NT. This document will only
consider the UNIX version and will assume the reader has some
familiarity with UNIX system administration.
\subsection{Where to get it?}
\EMBOSS\ is available for download from the primary site at Open-Bio
by anonymous ftp.\URL{ftp://emboss.open-bio.org/pub/EMBOSS/} This
directory contains the \EMBOSS\ package and several associated
packages (collectively known as EMBASSY) that are distributed with
\EMBOSS. Download these to a suitable location. Documentation is
available on the WWW at the \EMBOSS\ web
site.\URL{http://emboss.sf.net/}
FreeBSD distributions from 4.2 onwards now include \EMBOSS\ as an
optional package maintained by Johann
Visagie.\URL{johann\@@egenetics.com} Please see section
\ref{sec:FreeBSD} for more information on installation on FreeBSD.
\chapter{Installation}
\section{Retrieving \EMBOSS\ by anonymous ftp}
\subsection{Interactive FTP}
Change directory to the location in which you wish to download the
\EMBOSS\ source code. In this example we will download the source to
\filename{/packages/EMBOSS}. Then start your ftp client and point it
to emboss.open-bio.org.
\begin{verbatim}
% ftp emboss.open-bio.org
Connected to emboss.open-bio.org.
220 (vsFTPd 2.0.1)
530 Please login with USER and PASS.
530 Please login with USER and PASS.
KERBEROS_V4 rejected as an authentication type
Name (emboss.open-bio.org:someuser):
\end{verbatim}
We are using anonymous FTP so type the username \ilcomm{anonymous}.
\begin{verbatim}
Name (emboss.open-bio.org:someuser): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
\end{verbatim}
Enter your email address here as the password for user \filename{anonymous}.
\begin{verbatim}
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>
\end{verbatim}
Move to the \EMBOSS\ directory and list the files. The output has been
truncated a little to save space.
\begin{verbatim}
ftp> cd /pub/EMBOSS
ftp> ls
200 PORT command successful.
150 Opening BINARY mode data connection for /bin/ls.
total 22334
... 1024 May 26 20:17 .gnu
... 9079913 May 14 21:37 EMBOSS-2.5.0.tar.gz
... 19 May 14 21:37 EMBOSS-latest.tar.gz -> EMBOSS-2.5.0.tar.gz
... 196872 May 12 18:49 EMNU-1.0.5.tar.gz
... 231485 May 15 13:55 ESIM4-1.0.0.tar.gz
... 405620 May 12 18:49 HMMER-2.1.1.tar.gz
... 1024 Jul 25 08:54 Jemboss
... 264189 May 12 18:49 MEME-2.3.1.tar.gz
... 251061 Jul 9 19:01 MSE-0.0.4.tar.gz
... 694450 May 12 18:49 PHYLIP-3.573c.tar.gz
... 200490 May 12 18:49 TOPO-0.1.tar.gz
... 1536 Jul 9 19:01 old
... 512 Jun 27 14:40 patchfiles
... 512 Feb 22 15:19 tutorials
226 Transfer complete.
ftp>
\end{verbatim}
Now download the source files:
\begin{verbatim}
ftp> get EMBOSS-latest.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for EMBOSS-latest.tar.gz
(9079913 bytes).
...
ftp>
\end{verbatim}
Repeat for each file, or use \ilcomm{mget *gz} to download all the
files at once. Exit your ftp session with the command \ilcomm{bye}.
\subsection{FTP using \progname{wget}}
The program \progname{wget} can be used to download a remote directory
noninteractively. More details on \progname{wget} can be obtained from
the Free Software Foundation.\URL{http://www.gnu.org} Assuming you
have \progname{wget} installed, use the following command which
generates a lot of output on the screen:
\begin{verbatim}
% wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS'
--15:04:41-- ftp://emboss.open-bio.org:21/pub/EMBOSS
=> `emboss.open-bio.org/pub/.listing'
Connecting to emboss.open-bio.org:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done. ==> CWD pub ... done.
==> PORT ... done. ==> LIST ... done.
...
many pages truncated
...
FINISHED --15:04:55--
Downloaded: 2,657,366 bytes in 4 files
\end{verbatim}
A new directory \filename{emboss.open-bio.org} has been created and
EMBOSS can be found at \filename{emboss.open-bio.org/pub/EMBOSS}. You
may wish to create a symbolic link to this from your
\filename{/packages} directory for convenience.
\section{Unpacking}
You will have downloaded the \EMBOSS\ and EMBASSY packages to a
suitable directory. For this example we will assume you have
downloaded them to \filename{/packages} so you should now have the
following files (or similar) and maybe more packages in EMBASSY.
\begin{verbatim}
% ls
EMBOSS-latest.tar.gz
EMNU-1.0.5.tar.gz
ESIM4-1.0.0.tar.gz
HMMER-2.1.1.tar.gz
MEME-2.3.1.tar.gz
MSE-0.0.4.tar.gz
PHYLIP-3.573c.tar.gz
TOPO-0.1.tar.gz
\end{verbatim}
First unpack the \EMBOSS\ distribution:
\begin{verbatim}
% gunzip EMBOSS-latest.tar.gz
% tar xf EMBOSS-latest.tar
\end{verbatim}
This will create a new directory, \filename{EMBOSS-2.5.0} or
similar. You may wish to use \ilcomm{tar xpf} for unpacking \EMBOSS.
Enter the \EMBOSS\ directory:
\begin{verbatim}
% cd EMBOSS-2.5.0
\end{verbatim}
Create a directory for the EMBASSY packages:
\begin{verbatim}
% mkdir embassy
\end{verbatim}
Now move the EMBASSY packages to the \filename{embassy} directory:
\begin{verbatim}
% mv ../MSE-0.0.4.tar.gz ../PHYLIP-3.573c.tar.gz \
../TOPO-0.1.tar.gz embassy
\end{verbatim}
Go into the EMBASSY directory and unpack those packages.
\begin{verbatim}
% cd embassy
% gunzip MSE-0.0.4.tar.gz
% tar xf MSE-0.0.4.tar
\end{verbatim}
and so on for each EMBASSY package.
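Unpacking the packages one at a time becomes tedious when there are
many of them. As a sketch only, assuming a Bourne-compatible shell
(\progname{sh}) rather than the C shell prompt shown in the examples
above, a small helper script along the following lines could unpack
every \filename{.tar.gz} archive in a directory. The function name is
hypothetical and is not part of \EMBOSS.

\begin{verbatim}
#!/bin/sh
# unpack_all DIR: unpack every .tar.gz archive found in DIR, in place.
# The function name is illustrative; it is not part of EMBOSS.
unpack_all() {
    for archive in "$1"/*.tar.gz; do
        # Skip the unexpanded pattern when DIR holds no archives.
        [ -f "$archive" ] || continue
        gunzip -c "$archive" | (cd "$1" && tar xf -)
    done
}

# Default to the current directory, e.g. the embassy directory.
unpack_all "${1:-.}"
\end{verbatim}

Because \ilcomm{gunzip -c} writes to standard output, the original
\filename{.tar.gz} files are left in place after unpacking.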
Go back up one directory to the main \EMBOSS\ package directory and
prepare to start compilation.
\section{Graphics Requirements}
Depending on your system you may need to explicitly configure the
graphics. EMBOSS includes the plplot graphics library and will link to
X11 and the recent (non-GIF) releases of the gd graphics library which
also require libz and libpng (and possibly libjpeg). Please see the
section 'Configuring \EMBOSS\ graphics' below.
To get PLPLOT to produce PNG images you will need to have the
\filename{z}\URL{http://www.info-zip.org/pub/infozip/zlib/},
\filename{png}\URL{http://libpng.sourceforge.net/} and
\filename{gd}\URL{http://www.boutell.com/gd/} libraries
installed. \filename{gd} version 1.8.4 or later is recommended. A
recent release must be used because older versions support GIF, which
is NOT supported in later versions because of software patent
problems. If you do not have the required libraries and your system
support group will not update them, install the latest versions of all
three (\filename{z}, \filename{gd}, \filename{png}) to a new directory
and pass this directory to the \EMBOSS\ configure script:
\verb+./configure --with-pngdriver=my_dir+ where \verb+my_dir+ is the
directory in which the \filename{z}, \filename{png} and \filename{gd}
libraries were installed (for example with
\verb+./configure --prefix=my_dir+ where a configure script is
provided).
It may also be helpful to ensure that the \ilcomm{LD\_LIBRARY\_PATH}
environment variable is set appropriately to include the libraries in
the path.
The libraries are available from the following web sites:
\begin{itemize}
\item GD: \filename{http://www.boutell.com/gd/}
\item Z: \filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\item PNG: \filename{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\end{itemize}
These also list the various mirror sites for non UK people.
Alternatively, using ftp:
\begin{itemize}
\item GD: boutell.com no longer allows FTP and there are no known
mirror sites, so use HTTP
\item Z: \filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
\item PNG: \filename{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
\end{itemize}
You can unpack the tar.gz files in any directory, and install them in
a common area.
By default everything (including EMBOSS) installs
in \filename{/usr/local} but in the examples below we use
\filename{/home/joe/local}.
Note: gd does not use a ./configure script, and will fail at the
\ilcomm{make install} stage if the installation directory does not
have a \filename{bin} subdirectory. You can create this directory
(e.g. \filename{/home/joe/local/bin}) if it does not already exist.
\subsection{zlib}
Zlib is available from these sites:
\filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\URL{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\filename{http://www.info-zip.org/pub/infozip/zlib/}
\URL{http://www.info-zip.org/pub/infozip/zlib/}
\filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
\URL{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
To install, pick up the sources and then:
\begin{verbatim}
% gunzip -c zlib-1.1.3.tar.gz | tar xf -
% ln -s zlib-1.1.3 zlib
% cd zlib
% ./configure --prefix=/home/joe/local
% make
% make install
% cd ..
\end{verbatim}
\subsection{libpng}
Libpng is available from these sites:
\URL{http://libpng.sourceforge.net/}
\URL{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\URL{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
To install, pick up the sources and then:
\begin{verbatim}
% gunzip -c libpng-1.2.1.tar.gz | tar xf -
% ln -s libpng-1.2.1 libpng
% cd libpng
% cp scripts/makefile.linux makefile
\end{verbatim}
Libpng has no configure script so you have to do some work by
hand. Edit \filename{makefile} and change prefix to be
\filename{/home/joe/local}. Check the other paths as well: some
entries point to \filename{../zlib} while others use
\filename{/usr/local/lib} and \filename{/usr/local/include}. On HP-UX
this is trickier. CFLAGS has to match the definition for zlib.
Now build using the edited makefile:
\begin{verbatim}
% make
% make install
% cd ..
\end{verbatim}
\subsection{gd}
Gd is available from these sites:
\URL{http://www.boutell.com/gd/}
There is no FTP server at this site.
To install, pick up the sources, build zlib and libpng first, and then:
\begin{verbatim}
% gunzip -c gd-1.8.4.tar.gz | tar xf -
% ln -s gd-1.8.4 gd
% cd gd
\end{verbatim}
Now edit \filename{Makefile}: change the definitions for INCLUDEDIRS,
LIBDIRS, INSTALL\_LIB, INSTALL\_INCLUDE and INSTALL\_BIN, and change
all \filename{/usr/local} to \filename{/home/joe/local}.
\begin{verbatim}
% make
% make install
% cd ..
\end{verbatim}
If the gd "make install" fails with a warning about the "bin"
directory, you need to create it by hand (see above).
To compile with the local version your EMBOSS configure line should
now read:
\begin{verbatim}
./configure --with-pngdriver=/home/joe/local
\end{verbatim}
This will look for the graphics libraries in your local installation
under \filename{/home/joe/local} instead of a system-wide location.
Note that configure keeps a copy of the previous settings. With
earlier releases of EMBOSS, or as a developer with an earlier release
of autoconf, you may need to delete the files \filename{config.cache}
and \filename{config.status} if configure has been run before.
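The \ilcomm{LD\_LIBRARY\_PATH} setting mentioned earlier can be
sketched as follows. This is an illustration only, assuming a
Bourne-compatible shell and assuming the libraries were installed in
the \filename{lib} subdirectory of \filename{/home/joe/local}; adjust
the path to match your own installation.

\begin{verbatim}
# Prepend the local library directory, keeping any existing value.
# /home/joe/local/lib is the example location assumed in this guide.
LOCAL_LIB=/home/joe/local/lib
LD_LIBRARY_PATH="$LOCAL_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
\end{verbatim}

C shell users would use
\ilcomm{setenv LD\_LIBRARY\_PATH /home/joe/local/lib} instead.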
\section{Compilation}
Building \EMBOSS\ is easy. It follows the usual GNU style of
\ilcomm{./configure}, \ilcomm{make}, \ilcomm{make install}. We'll take
these steps one at a time.
\subsection{Configure}
To accept the default configuration, just type \ilcomm{./configure}
and let \EMBOSS\ get on with it. You may however want to make some
changes to the configuration parameters according to your local
policy. This section will not cover all the possibilities, just some
of the more common. The configuration script will attempt to find the
necessary components in your system to determine how to successfully
build \EMBOSS. It typically expects the GNU C compiler (gcc) and
several standard libraries that should already be part of your
Unix/Linux system. \EMBOSS\ should configure, compile and run on most
modern Linux distributions straight out of the box.
\subsubsection{Installation directory}
You need to have write permission on the directory in which you
eventually wish to install \EMBOSS. You may also wish to put it
somewhere other than the standard location of
\filename{/usr/local/emboss}.
The installation directory is controlled by the \ilcomm{--prefix}
argument. For example, you can have all third party applications owned
by a non-privileged user and installed in a package specific directory
under \filename{/site/prog}. The command
\begin{verbatim}
% ./configure --prefix=/site/prog/emboss
\end{verbatim}
will install \EMBOSS\ under \filename{/site/prog/emboss}. The binaries
will be installed in \filename{/site/prog/emboss/bin} with shared
libraries installed in \filename{/site/prog/emboss/lib}. System wide
data are installed in \filename{/site/prog/emboss/share/EMBOSS/data},
and the configuration files (ACD files) for the applications will be
installed in \filename{/site/prog/emboss/share/EMBOSS/acd} (or for
EMBASSY in directories corresponding to the package name.)
Documentation is installed in
\filename{/site/prog/emboss/share/EMBOSS/doc}. The installation
directory should be specified using a full path, otherwise interesting
failures may occur.
The individual directories for installation can be modified with other
configuration commands but this is usually not necessary. Run
\ilcomm{./configure --help} to get more information on the directories
that can be changed and other configuration options.
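Options can be combined on one command line. As an illustration only,
using the example directories from this guide, the following would
install under \filename{/site/prog/emboss} and look for the graphics
libraries under \filename{/home/joe/local}:

\begin{verbatim}
% ./configure --prefix=/site/prog/emboss \
    --with-pngdriver=/home/joe/local
\end{verbatim}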
Run \ilcomm{./configure} with the options you wish to use. This may
take a short time as various messages scroll up the screen.
All should be well with this and configure should exit with a message
like this:
\begin{verbatim}
... much output skipped
creating ./config.status
creating plplot/Makefile
creating plplot/lib/Makefile
creating nucleus/Makefile
creating ajax/Makefile
creating emboss/Makefile
creating emboss/acd/Makefile
creating test/Makefile
creating test/data/Makefile
creating test/embl/Makefile
creating test/pir/Makefile
creating test/swiss/Makefile
creating test/swnew/Makefile
creating test/wormpep/Makefile
creating emboss/data/Makefile
creating emboss/data/AAINDEX/Makefile
creating emboss/data/CODONS/Makefile
creating emboss/data/REBASE/Makefile
creating emboss/data/PRINTS/Makefile
creating emboss/data/PROSITE/Makefile
creating Makefile
\end{verbatim}
Configuration is now complete.
\subsubsection{Reconfiguration}
If at first you don't succeed, try, try and try again. It is not
uncommon to make typos or other mistakes when running
\ilcomm{./configure}. If you want to run configure again, first run
\ilcomm{make clean} and then rerun \ilcomm{./configure} with
(hopefully) the correct options. With an earlier EMBOSS release, or as
a developer with an earlier release of autoconf, you must first delete
the file \filename{config.cache}; current releases no longer produce
this file.
\subsubsection{Configuring \EMBOSS\ graphics}
The PLPLOT library can produce output to many devices but requires
certain libraries that are NOT distributed with \EMBOSS.
To get X-windows based output you must have X installed, or else PLplot
will not build the required driver. You may need to specify the
location of your X-windows library with the configuration options:
\begin{itemize}
\item \ilcomm{--x-includes=DIR} (X include files are in DIR)
\item \ilcomm{--x-libraries=DIR} (X library files are in DIR)
\end{itemize}
To explicitly configure PLPLOT without X-windows, use \ilcomm{--without-x}.
You can explicitly tell \EMBOSS\ to not include PNG support with
\ilcomm{--without-pngdriver}.
You can tell if \ilcomm{./configure} has
found a suitable PNG library by watching for something like the
following when running \ilcomm{./configure}:
\begin{verbatim}
checking if png driver is wanted... yes
checking for inflateEnd in -lz... (cached) yes
checking for png_destroy_read_struct in -lpng... (cached) yes
checking for gdImageCreateFromPng in -lgd... (cached) yes
\end{verbatim}
This means that the configuration script has located the PNG libraries
on your system. If you see a message indicating that
\ilcomm{./configure} could not find the libraries or that the version
of \filename{gd} was too old then you should install the latest
versions of the libraries yourself and rerun configure with the
correct \ilcomm{--with-pngdriver} value.
When you run an EMBOSS graphical application you can see the list of
installed graph devices by giving '?' as the response to the 'Graph
type' prompt.
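Because the configure output scrolls past quickly, it can help to save
it to a file and search it afterwards. The sketch below assumes a
Bourne-compatible shell and a log file called
\filename{configure.log}, a name chosen here for illustration and
created with, for example, \verb+./configure > configure.log 2>&1+. It
reports whether the three library checks shown above all ended in
\verb+yes+. The function name is hypothetical and not part of \EMBOSS.

\begin{verbatim}
#!/bin/sh
# png_checks_ok LOGFILE: succeed only if the zlib, libpng and gd
# checks in a saved configure log all ended in "yes".
# The function and log file names are illustrative, not part of EMBOSS.
png_checks_ok() {
    for lib in -lz -lpng -lgd; do
        grep -e "in $lib\.\.\..*yes\$" "$1" > /dev/null || return 1
    done
    return 0
}

if [ -f configure.log ]; then
    if png_checks_ok configure.log; then
        echo "PNG library checks passed"
    else
        echo "one or more PNG library checks failed"
    fi
fi
\end{verbatim}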
\subsection{Configuring for 64 bit systems}
\EMBOSS\ configure looks for \progname{gcc} and uses it in preference
to other compilers when compiling \EMBOSS. This is not ideal for those
who wish to have a compiled and linked 64 bit version of \EMBOSS. The
current version is NOT 64 bit clean (i.e. it does not necessarily use
64 bit representation internally) but will compile and run quite
happily on 64 bit systems.
Additional notes are appended below for the various operating systems
we have information on.
\subsubsection{IRIX 6.5.10}
In order to compile for 64 bit on IRIX you have to specify the native
compiler in 64 bit mode (\ilcomm{cc -64}) and the linker in 64 bit
mode (\ilcomm{/bin/ld -64}). The following notes were provided by Jose
Ramon Valverde\footnote{jrvalverde\@@cnb.uam.es}.
{\it We have succeeded in compiling EMBOSS for IRIX using 64 bit
compilation.
It required some tweaking, but works. The recipe for those willing to
give it a try is: }
\begin{itemize}
\item remove '\filename{gcc}' from your path
\item define \filename{COMPILER\_DEFAULTS\_PATH} appropriately
(see \filename{pe\_environ}) to look for a
\filename{compiler.defaults} file containing
e.g. \ilcomm{:abi=64:isa=4:proc=r10k}
\item \ilcomm{./configure} in \EMBOSS\ and all EMBASSY subdirs
\item search in all files for '\ilcomm{CC = cc}' and
replace it with '\ilcomm{CC = cc -64}'
\item same for '\ilcomm{LD = /bin/ld}' to '\ilcomm{LD = /bin/ld -64}'
\item \ilcomm{make}
\end{itemize}
{\it The reason is that compiling depends on the Makefile and on libtool,
as well as linking. We didn't spend much time looking at configure since
the above steps were so straightforward. We know we should look into
the configure script and add an option for 64-bit-irix-compile or some
such, but that'll have to wait till we have time for it.
Yes, we know, the search and substitute thing looks tedious, but it
isn't, honest: create a 'chfile.sh' outside the EMBOSS source hierarchy
containing: }
\begin{verbatim}
#!/bin/sh
cp \$1 \$1.orig
mv \$1 tmpfile
sed -e 's/CC="cc"/CC="cc -64"/g' tmpfile | \
sed -e 's/CC = cc/CC = cc -64/g' | \
sed -e 's/\/bin\/ld/\/bin\/ld -64/g' > \$1
rm tmpfile
## if you are sure, uncomment this
#rm \$1.orig
\end{verbatim}
{\it
'\ilcomm{cd}' to the \filename{emboss} directory and run}
\begin{verbatim}
find . -type f -exec /path/to/chfile.sh {} \; -print
\end{verbatim}
{\it and you are done with the \progname{CC}
changes. \progname{Libtool} requires special treatment since it uses
quotes. }
\subsection{Building \EMBOSS}
Building \EMBOSS\ is a matter of typing '\ilcomm{make}' and going to
find something else to do for the next ten minutes to half an hour
depending on the speed of your system. \EMBOSS\ will first build the
shared libraries (\filename{PL\_PLOT}, \filename{AJAX}, and
\filename{NUCLEUS}) and then build the applications.
You may see plenty of warnings (especially on SGI systems) complaining
about libraries not being used to resolve any symbols. These can be
safely ignored.
If all goes according to plan you should have built \EMBOSS\ successfully.
If not, you will have to try to work out why the build
failed. If you can't work it out yourself, send an email describing
the problem to emboss-bug@emboss.open-bio.org, preferably with a copy of the
output from the installation.
Assuming that compilation was successful, you can\footnote{You don't
have to do this. You can leave \EMBOSS\ where it is and just add the
path to the \filename{emboss} directory to your \ilcomm{PATH}.} now
type '\ilcomm{make install}'. After a few minutes and many pagefuls of
messages, \EMBOSS\ should be installed where you specified in the
\ilcomm{--prefix} option (or in the default location of
\filename{/usr/local/emboss} if \ilcomm{--prefix} was not specified).
\subsection{Post compilation setup}
You will now need to make a few adjustments to your environment to
ensure that \EMBOSS\ runs smoothly. \EMBOSS\ looks for certain
environment variables to determine where the libraries and data are
found. These instructions assume you installed \EMBOSS\ in
\filename{/site/prog/emboss}. Adjust them to suit your
installation. Insert the following lines at the end of
\filename{/etc/cshrc} (or \filename{~/.cshrc} for a personal
installation):
\begin{verbatim}
setenv PLPLOT_LIB /site/prog/emboss/lib
set path=( /site/prog/emboss/bin \${path} )
\end{verbatim}
Or, for bash/ksh/sh users, insert the following at the end of
\filename{/etc/profile} or \filename{~/.bashrc}:
\begin{verbatim}
PLPLOT_LIB=/site/prog/emboss/lib
PATH=/site/prog/emboss/bin:\$PATH
export PLPLOT_LIB PATH
\end{verbatim}
\EMBOSS\ should now be ready for use.
\subsection{\EMBOSS\ data files}
\EMBOSS\ will by default install the data files (including those
installed with \progname{Rebaseextract}, \progname{Prosextract},
\progname{Printsextract}, \progname{Aaindexextract} or
\progname{Cutgsextract}) in the default directory
\filename{share/EMBOSS/data} in the install prefix directory. If
\EMBOSS\ is not installed (for example, your own personal
installation) the data files are written to \filename{emboss/data} in
the directory where emboss was built.
If you want to place your data files elsewhere, or have a separate set
of datafiles you wish to use, you can set the \ilcomm{EMBOSS\_DATA}
variable in \filename{emboss.default} or, for personal use, in your \filename{.embossrc} file.
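
A minimal sketch of such a setting is shown below. It uses the
\ilcomm{set} declaration described in the Configuration chapter, assumes
that the variable is written in the same lower-case style as the other
settings in this manual, and uses a hypothetical directory
\filename{/site/data/emboss}:
\begin{verbatim}
set emboss_data /site/data/emboss
\end{verbatim}
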
\subsection{Testing your \EMBOSS\ installation}
You can test your \EMBOSS\ installation by trying the program
'\ilcomm{wossname}':
\begin{verbatim}
% wossname -auto |more
\end{verbatim}
This should give a long list of the programs that are available. Press
space to page down through the list. At this stage the list contains
only the \EMBOSS\ programs, because the EMBASSY programs are not yet
installed. (Note: although \progname{wossname} does have
a \ilcomm{-noembassy} option, this does not work with installed programs because
\progname{wossname} can no longer find any difference between EMBOSS and EMBASSY.)
\section{Installing EMBASSY}
As well as the base libraries and standard EMBOSS distribution,
various extra packages (EMBASSY) are distributed with EMBOSS.
To install an EMBASSY package, go to the relevant directory and run
\ilcomm{./configure}, \ilcomm{make} and \ilcomm{make install}. For
example, to install PHYLIP (which was unpacked into
\filename{/packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c} earlier):
\begin{verbatim}
% cd /packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c
% ./configure --prefix=/site/prog/emboss
... output not shown
% make
... output not shown
% make install
... output not shown
\end{verbatim}
Note. You {\bf MUST} use the same arguments for \ilcomm{./configure}
that you used for the installation of the main \EMBOSS\ package. It
may be necessary to add other options as required by individual
packages (see below).
Repeat as necessary for the other EMBASSY packages. It should also be
noted that certain EMBASSY packages may require additional libraries.
You should now find that running \progname{wossname} as before lists
the EMBASSY programs.
\subsection{EMBASSY package specific notes}
In most cases, EMBASSY packages should build with no problems. Known
problems are described below.
\subsubsection{Packages with no known problems}
So far \progname{ESIM4}, \progname{HMMER}, \progname{MEME},
\progname{MSE}, \progname{PHYLIP} and \progname{TOPO} appear to
install without a problem using the same arguments to
\ilcomm{configure}.
\subsubsection{\progname{EMNU}}
\progname{EMNU} requires the \filename{curses} or \filename{ncurses} libraries
that come as standard on most Unix-like systems. In particular \progname{EMNU}
requires two header files \filename{form.h} and \filename{menu.h} that are not
distributed with all implementations.
If your \filename{curses/ncurses}
library is installed in a non-standard location then you may need to instruct
\ilcomm{configure} with the option:
\begin{verbatim}
--with-curses=/path/to/curses
\end{verbatim}
\section{Installing \EMBOSS\ in package format}
\label{sec:FreeBSD}
\EMBOSS\ can be installed on almost all Unix/Linux operating systems
using the instructions above, but the package format can be far more
convenient. A package is a precompiled set of binaries with
installation instructions that can be set up on your system with a
minimum of work. In some cases the package will check for the correct
libraries and install those as necessary.
Brief instructions are given here for the packages of which we are
aware. These are maintained separately from the main source tree and
may also install some files in operating system standard locations
instead of the locations used by the `raw' \EMBOSS\ distribution.
Please read the more detailed instructions that
accompany each package.
\subsection{Installing \EMBOSS\ on FreeBSD}
A FreeBSD \EMBOSS\ package has been created by Johann
Visagie\URL{johann\@@egenetics.com} of Electric Genetics. This will be
distributed on the installation CDs and through the normal
distribution channels from FreeBSD version 4.2 onwards.
For the FreeBSD user with an up-to-date ports tree\footnote{FreeBSD
users can update their ports tree through a variety of
mechanisms. Please see the FreeBSD specific guide produced by Johann
for more information}, installing \EMBOSS\ reduces to two simple
commands (as root):
\begin{verbatim}
# cd /usr/ports/biology/emboss
# make install
\end{verbatim}
The FreeBSD specific parts of the port are that
\filename{emboss.default} is included with the other configuration
files under \filename{/usr/local/etc} as
\filename{emboss.default.sample}, and the \EMBOSS\ documentation is
installed in \filename{/usr/local/share/doc/EMBOSS} instead of the
default location. For further information on installation under
FreeBSD you are referred to the Resources chapter.
\chapter{Configuration}
\EMBOSS\ can be readily configured to match your requirements. In a
standard installation of \EMBOSS\ the configuration directives are
looked for in the following locations and in the following search
order:
\begin{enumerate}
\item A file \filename{emboss.default} in the \filename{share/EMBOSS}
subdirectory of your \EMBOSS\ installation.\footnote{This location may
have been redefined in installations of \EMBOSS\ that have been
packaged for specific operating systems. See section \ref{sec:FreeBSD}
for further information on OS specific package
installations.}\footnote{\EMBOSS\ will also look in the
\filename{emboss} directory under the \EMBOSS\ source distribution for
\filename{emboss.default.template} and install this as
\filename{emboss.default} if no existing file is found under the
installation directory}
\item A file \filename{.embossrc} in the directory specified by the
\ilcomm{EMBOSSRC} environment variable.
\item A file \filename{.embossrc} in the user's home directory.
\end{enumerate}
\filename{emboss.default} and \filename{.embossrc} are plain text
files that can readily be edited to suit.\footnote{A sample
\filename{emboss.default} is located in \filename{emboss/acd} under
the source distribution.} Redefinitions of configuration parameters
will override those previously defined. In the descriptions that
follow only \filename{.embossrc} will be mentioned but all directives
can be placed in \filename{emboss.default} for site wide
configuration.
Several aspects of \EMBOSS\ can be defined. These are:
\begin{itemize}
\item\EMBOSS\ environment variables
\item\EMBOSS\ databases
\item Default behaviour of \EMBOSS\ programs
\end{itemize}
Databases are by far the most complex of these.
\EMBOSS\ will ignore blank lines in the \filename{emboss.default} and
\filename{.embossrc} files. It will also ignore any lines beginning
with \ilcomm{\#} or \ilcomm{!} allowing comments to illuminate the
declarations in the file.
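
As an illustration, a personal \filename{.embossrc} that overrides a
site-wide setting might look like the following sketch. It assumes that
\filename{emboss.default} already defines \ilcomm{emboss\_db\_dir}; the
directory \filename{/home/me/testdb} is hypothetical. Because
\filename{.embossrc} is read after \filename{emboss.default}, the value
given here is the one that takes effect.
\begin{verbatim}
# Personal settings: lines starting with # or ! are comments
! Redefine the site-wide database root to point at a test copy

set emboss_db_dir /home/me/testdb
\end{verbatim}
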
\section{\EMBOSS\ environment variables}
\EMBOSS\ environment variables are set with an '\ilcomm{env}' or a
'\ilcomm{set}' declaration. '\ilcomm{env}' and '\ilcomm{set}' are
interchangeable. The most important environment variable is the
location of the \filename{.acd} files that describe each program.
\begin{verbatim}
set emboss_acdroot /site/prog/emboss/share/EMBOSS/acd
\end{verbatim}
Environment variables are useful for simplifying maintenance of your
\filename{.embossrc}. For example you may want to specify the location
of your databases as an environment variable. Then if you move the
databases you only have to update one line in the configuration file.
\begin{verbatim}
set emboss_database_dir /data/databases/flatfiles
\end{verbatim}
This would then be referred to later in \filename{.embossrc} as
\begin{verbatim}
\$emboss_database_dir/embl
\end{verbatim}
for the directory \filename{/data/databases/flatfiles/embl}.
\subsection{Configuring \EMBOSS\ differently for different groups of users}
It may be the case that you have users who need to share a specific
setup, for example to access a different set of databases or to
use a different data directory.
It can be time consuming and error prone to maintain a series of
individual \filename{.embossrc} files or to cause users to have to
work in the same directory or to copy an \filename{.embossrc} to each
directory they wish to work in. The environment variable
\ilcomm{EMBOSSRC} can be set to point to an arbitrary directory
containing an \filename{.embossrc} which can then be used to give
workgroup specific configuration. Each user then only needs to set
\ilcomm{EMBOSSRC} in their \filename{.cshrc} (\progname{csh}) or
\filename{.profile} (\progname{bash}) to get the workgroup specific
setup.
In our case we have several groups of researchers for whom we maintain
biological sequence databases. These databases have been made
available under restrictive licenses so that we cannot allow
researchers outside the groups to access the databases. Using
\ilcomm{\$EMBOSSRC} we can set up a common configuration for the
members of each group by defining the databases in the
\filename{\$EMBOSSRC/.embossrc} file.
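
For instance, assuming a hypothetical group directory
\filename{/site/groups/genomics} that holds the group's
\filename{.embossrc}, a \progname{csh} user would add the following to
\filename{.cshrc}:
\begin{verbatim}
setenv EMBOSSRC /site/groups/genomics
\end{verbatim}
and a bash/ksh/sh user would add the following to \filename{.profile}:
\begin{verbatim}
EMBOSSRC=/site/groups/genomics
export EMBOSSRC
\end{verbatim}
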
\section{Databases}
\subsection{Database access modes}
\EMBOSS\ offers three modes for accessing databases:
\begin{description}
\item[Single:]\EMBOSS\ retrieves a single sequence indexed by
ID.
\item[Query:]\EMBOSS\ retrieves a set of sequences
corresponding to a query that can return more than one entry,
including accession numbers or wildcard IDs.
\item[All:]\EMBOSS\ returns all the sequences in the database
in no particular order.
\end{description}
Each database definition can configure one or many of these modes for
database access.
Typically \EMBOSS\ uses variations on the \progname{emblcd} system of
database indexing to provide rapid access in single and query modes to
flat file databases. The \progname{emblcd} method is implemented in a
variety of ways depending on the original format of your database.
The \progname{emblcd} method assumes that you have one or both of ID
and accession number in each record and that they are unique for the
whole database index. \EMBOSS\ also provides methods for retrieving
sequences via the WWW and three specific methods for interaction with
SRS\URL{http://www.lionbioscience.com/solutions/srs} installed locally
or through a remote public server. For other non-flatfile databases
or flat file databases in formats not currently supported by \EMBOSS,
you will have to configure an external application to retrieve
sequences.
\subsection{General database configuration}
Each database is configured using a DB declaration.
The generalised form is
\begin{verbatim}
DB databasename [
Configuration options
]
\end{verbatim}
The configuration options are tag/value pairs and must contain at
least a description of the access method (using \ilcomm{method:} or
one or more of \ilcomm{methodsingle:}, \ilcomm{methodquery:} and
\ilcomm{methodall:}) and a description of the original format of the
sequences (using \ilcomm{format:}). In addition to these tags there
will be other tags that are needed for particular methods and other
tags that are optional.
\subsubsection{Database access methods}
The scope of each method is:
\begin{description}
\item[Single mode - \ilcomm{s}] Supports retrieval of a single
sequence.
\item[Query mode - \ilcomm{q}] Supports retrieval of a subset of the
sequences in the database specified using a wild card query in the
USA.\footnote{Please see the \EMBOSS\ documentation for a description of
the Uniform Sequence Address format.}
\item[All mode - \ilcomm{a}] Supports retrieval of all sequences in
the database as a stream of data.
\end{description}
An example entry for each access method is shown.
\paragraph{APP}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
APP is the same as EXTERNAL.
\paragraph{BLAST}\par\noindent
Modes: \ilcomm{a q s} \par\noindent BLAST uses EMBLCD indices created
with \progname{dbiblast} to access databases in BLAST format, created
with NCBI's \ilcomm{formatdb} program.
Note that the latest 'format version 4' is not yet documented by
NCBI. \EMBOSS\ will only work with 'format version 3' databases, indexed
with:
\begin{verbatim}
formatdb -A F
\end{verbatim}
We hope to support 'format version 4' databases in future. If you pick
up a blast database from NCBI (or elsewhere) check the format. If it
is in the new format, you will need to pick up the original FASTA
format file, and either index it yourself with formatdb, or run
\ilcomm{dbifasta} and use the FASTA file in \EMBOSS\ (see EMBLCD
access method).
The definition should use format: ncbi because this is what the blast
formatdb databases store internally.
\begin{verbatim}
DB mydb [
#required parameters
method: "blast"
format: "ncbi"
type: "N"
dir: "\$emboss_db_dir/blast"
#optional parameters
fields: "sv des"
release: "63.0"
comment: "my comment"
indexdir: "\$emboss_db_dir/blastindices"
]
\end{verbatim}
The index files can be kept in the same directory as the database, but
as each EMBLCD index needs its own directory (the filenames are fixed)
the indexdir is usually defined.
The EMBLCD index files include the filenames indexed by
\ilcomm{dbiblast}. You can use the file: and exclude: attributes to
create file-specific subsets from a single \ilcomm{dbiblast} generated
index, but as blast index files are split only by the number of
entries this is not generally useful.
If the database was indexed with additional fields, they can be
included in the definition as fields: to allow their use in USAs.
\paragraph{DIRECT}\par\noindent
Modes: \ilcomm{a}\par\noindent Direct accesses the flatfile
directly. It returns all the database entries, one after the other. It
assumes no indexing. Queries are still possible as \EMBOSS\ will read
each entry and match it against the query, but are slow as the entire
database must be read.
\begin{verbatim}
DB mydb [
#required parameters
method: "direct"
format: "embl"
type: "N"
dir: "\$emboss_db_dir/mydb"
file: "*.dat"
#optional parameters
fields: "sv des key org"
release: "63.0"
comment: "My own database with no indices"
exclude: "est*.dat"
]
\end{verbatim}
For most cases, it is simpler to use \ilcomm{dbiflat} for EMBL,
Genbank or SwissProt format, or \ilcomm{dbifasta} to index FASTA or NCBI
format files, and to use the EMBLCD access method.
If the file format supports additional fields, they can be
included in the definition as fields: to allow their use in USAs.
\paragraph{EMBLCD}\par\noindent
Modes: \ilcomm{a q s}\par\noindent EMBLCD uses EMBLCD indices created
with \progname{dbiflat} or \progname{dbifasta} to access flatfile
databases in the original format.
\begin{verbatim}
DB mydb [
#required parameters
method: "emblcd"
format: "embl"
type: "N"
dir: "\$emboss_db_dir/embl"
#optional parameters
fields: "sv des key org"
file: "*.dat"
release: "63.0"
comment: "my comment"
exclude: "est*.dat"
indexdir: "\$emboss_db_dir/indices"
]
\end{verbatim}
The EMBLCD index files include the filenames indexed by
\ilcomm{dbiflat} or \ilcomm{dbifasta}. You can use the file: and
exclude: attributes to create file-specific subsets from a single
index.
This method can require careful setup. Please read the more specific
descriptions below.
If the database was indexed with additional fields, they can be
included in the definition as fields: to allow their use in USAs.
\paragraph{EXTERNAL}\par\noindent
Modes: \ilcomm{a q s}\par\noindent EXTERNAL uses an external
application to retrieve sequences. The ID is passed as an argument to
the application, either replacing \%s in the command string (if
present) or as an additional argument (if there is no \%s).
EXTERNAL requires the application to return the sequence on STDOUT. If
the application writes to somewhere else, simply wrap it in a script
that copies the output to STDOUT.
\begin{verbatim}
DB mydb [
#required parameters
method: "app"
format: "fasta"
type: "P"
app: "getfromdb"
#optional parameters
comment: "my own protein database with a custom retrieval program"
app: "getfromdb mydatabase \%s"
]
\end{verbatim}
The first app: definition will use the default call 'getfromdb mydb:id'.
The alternative app: definition will use the \%s format and call
'getfromdb mydatabase id'.

Both will pass either the ID or accession from the query, so that USAs
mydb-id:x13776 and mydb-acc:x13776 are equivalent.
\paragraph{GCG}\par\noindent
Modes: \ilcomm{a q s}\par\noindent GCG uses EMBLCD indices created
with \progname{dbigcg} to access databases in GCG format. This method
uses the \filename{.ref} and \filename{.seq} files created by the
\progname{GCG} suite of programs.
\begin{verbatim}
DB mygcgdb [
#required parameters
method: "gcg"
format: "embl"
type: "N"
dir: "\$emboss_db_dir/gcgembl"
#optional parameters
fields: "sv des key org"
file: "*.seq"
release: "63.0"
comment: "my comment"
exclude: "est*"
indexdir: "\$emboss_db_dir/indices"
]
\end{verbatim}
The EMBLCD index files include the filenames indexed by
\ilcomm{dbigcg}. You can use the file: and exclude: attributes to
create file-specific subsets from a single \ilcomm{dbigcg} generated
index.
\paragraph{SRS}\par\noindent
Modes: \ilcomm{a q s}\par\noindent SRS returns entries from a local
installation of SRS using the -e switch to getz to return entries in
the original format.
\begin{verbatim}
DB mydb [
#required parameters
method: "srs"
format: "embl"
type: "N"
#optional parameters
dbalias: "embl"
fields: "sv des key org"
app: "getz"
comment: "My srs indexed database"
release: "63.0"
]
\end{verbatim}
This access method builds an SRS commandline query to getz. If you
have getz installed under another name, define that name with the
\ilcomm{app:} attribute.

The SRS query by default uses the EMBOSS database name. If the
database has a different name in SRS, define \ilcomm{dbalias:} as the database
name to pass to SRS.

SRS will return the results using 'getz -e' so the format should match
the format of the original data. For some formats this can be tricky
(PIR for example), so consider using SRSFASTA, although this will lose
information that is not included in the FASTA format SRS output.

To query using the additional fields SRS supports, list them in the
\ilcomm{fields:} attribute.
\paragraph{SRSFASTA}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
As SRS but returns the sequences in FASTA format. The definition must
include format: fasta so that EMBOSS will read the results in FASTA
format.
\begin{verbatim}
DB mydb [
#required parameters
method: "srsfasta"
format: "fasta"
type: "N"
#optional parameters
dbalias: "embl"
fields: "sv des key org"
app: "getz"
comment: "My srs indexed database"
release: "63.0"
]
\end{verbatim}
This access method builds an SRS commandline query to getz. If you
have getz installed under another name, define that name with the
\ilcomm{app:} attribute.

The SRS query by default uses the EMBOSS database name. If the
database has a different name in SRS, define \ilcomm{dbalias:} as the database
name to pass to SRS.

SRS will return the results using 'getz -f -sf fasta' so the format
must be 'fasta'.

To query using the additional fields SRS supports, list them in the
\ilcomm{fields:} attribute.
\paragraph{SRSWWW}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
As URL, but specific to an SRS web server. This method takes a base
URL (up to wgetz) for an SRS server, and builds the rest of the URL as
a valid SRS query.
By building the URL, SRSWWW access can query both ID and accession
number, and can query additional fields 'sv', 'des', 'key' and 'org'
if they are allowed with a fields definition.
\begin{verbatim}
DB mydb [
# required parameters
method: "srswww"
format: "genbank"
type: "N"
url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?"
#optional parameters
dbalias: "genbank"
fields: "sv des key org"
comment: "Genbank by SRS from InfoBiogen"
proxy: ":"
httpversion: "1.0"
]
\end{verbatim}
Because queries for such fields to a remote server can find a very
large number of hits, and EMBOSS will load the entire output into
memory to process the HTML, many EMBOSS administrators choose not to
define these fields for an SRSWWW server.
If there is sufficient demand, it should be possible to rewrite the
HTML preprocessing to avoid buffering in memory.
SRSWWW supports the \ilcomm{proxy} and \ilcomm{httpversion} settings
described under access method URL.
\paragraph{URL}\par\noindent
Modes: \ilcomm{s}\par\noindent URL uses a defined web server to
retrieve a specific entry. EMBOSS may fail if the HTML causes
complications with parsing of the entry.
\begin{verbatim}
DB mydb [
# required parameters
method: "url"
format: "genbank"
type: "N"
url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?-e+[genbank-id:%s]"
#optional parameters
comment: "Genbank by ID from InfoBiogen"
]
\end{verbatim}
The \%s in the URL string indicates where \EMBOSS\ will insert the
identifier portion of the USA.
At many sites, remote HTTP access is controlled by a proxy
server. EMBOSS uses a proxy server defined as EMBOSS\_PROXY with a
value in the format \ilcomm{domain.address:port}, for example:
\begin{verbatim}
set emboss_proxy 'proxy.mydomain.org:8080'
\end{verbatim}
This is a global definition. For selected databases (local web-based
services, for example) you can turn off the proxy inside the database
definition with:
\begin{verbatim}
DB [ ...
proxy: ":"
]
\end{verbatim}
HTTP access by default uses HTTP protocol version 1.0. EMBOSS can also
support version 1.1, which provides chunked HTML results to
improve network performance. The HTTP version is controlled by a
variable EMBOSS\_HTTPVERSION and by a DB attribute, for example:
\begin{verbatim}
set emboss_httpversion "1.1"
\end{verbatim}
or
\begin{verbatim}
DB [ ...
httpversion: '1.1'
]
\end{verbatim}
\subsection{Mixed access methods}
For any given \ilcomm{method:} declaration, \EMBOSS\ will use that
method for those access modes supported by the method.
If you wish to specify which access mode (all, query or single) should
be handled by which database retrieval method then the
\ilcomm{methodsingle:}, \ilcomm{methodquery:} and \ilcomm{methodall:}
declarations should be used instead of \ilcomm{method:}, for example:
\begin{verbatim}
DB mydb [
methodsingle: app
format: fasta
app: "customapp myproteindb"
methodall: direct
dir: \$emboss_db_dir/myproteindb
file: myproteindb.dat
type: P
comment: "single and all access for myproteindb"
]
\end{verbatim}
You can mix these, for example, to use a script to query a file and
direct access to read all entries:
\begin{verbatim}
methodall: 'direct'
methodquery: 'external'
\end{verbatim}
\subsection{Indexing and configuring flatfile databases}
Flatfile databases are plain text files in a defined format such as
those released by EMBL, Swissprot and so on. The \EMBOSS\ program
\progname{dbiflat} is used to generate EMBLCD indices that can be used
for all types of database access. \progname{dbiflat} can process
databases in EMBL, SWISSPROT and GENBANK format. Pseudo EMBL format
databases which do not have unique ID and AC entries may cause
\progname{dbiflat} to do mysterious things and should be avoided.
\progname{dbiflat} (and the EMBLCD access method) requires the
databases to be uncompressed. The examples given here will not probe
the deeper secrets of \progname{dbiflat} (for which the reader is
referred to the documentation, or failing that the source code) but
will show a typical installation for a common database.
We assume that \EMBOSS\ has been installed and works. This can be
tested with the command \ilcomm{wossname -auto} which should list all
the programs available.
In this example we will index and configure the EMBL database for use
with \EMBOSS.
First download and unpack the EMBL database. This will require a
considerable amount of disk space. If you do not have sufficient space
available then just download a subset of the database.
Use \ilcomm{cd} to move to the directory in which you have unpacked
EMBL. Its contents should look something like this when you run \ilcomm{ls}:
\begin{verbatim}
% ls
est_fun.dat
est_hum1.dat
est_hum10.dat
.
Output truncated
.
syn.dat
unc.dat
vrl.dat
vrt.dat
\end{verbatim}
Run \progname{dbiflat} to create the EMBLCD indices.
\begin{verbatim}
% dbiflat
Index a flat file database
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
Entry format [SWISS]: EMBL
Database name: embl
Database directory [.]:
Wildcard database filename [*.dat]:
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
\end{verbatim}
\progname{dbiflat} should happily chug away for some considerable time
(up to a few hours depending on the speed of your machine) and will
generate (eventually) the following index files:
\begin{verbatim}
% ls
acnum.hit
acnum.trg
division.lkp
entrynam.idx
\end{verbatim}
Now we create an entry in the \EMBOSS\ configuration files to access
the database. It is probably a good idea to try new database
definitions in your local configuration file first.

Put the following entry in your \filename{.embossrc}:
\begin{verbatim}
DB embl [
type: N
method: emblcd
format: embl
dir: \$emboss_db_dir/embl
file: "*.dat"
release: "63.0"
comment: "EMBL release 63.0"
]
\end{verbatim}
You will need to have defined \ilcomm{\$emboss\_db\_dir} using a
directive such as
\begin{verbatim}
set emboss_db_dir /path_to_databases
\end{verbatim}
somewhere in your \filename{emboss.default} or \filename{.embossrc}.
Save \filename{.embossrc} and try \progname{showdb}. You should see a
line that looks like:
\begin{verbatim}
% showdb
.. output deleted
embl N OK OK OK EMBL release 63.0
.. output deleted
\end{verbatim}
\subsection{Fine tuning the installation}
\label{sec:finetune}
It is probably a good idea to set up subsections of the database so
that end users can search just the regions they wish to search. This
section applies to all access methods that use EMBLCD style indexes
and probably to others as well.
Files can be included with the declaration \ilcomm{file:} or excluded
with the declaration \ilcomm{exclude:}. It is a good idea to put the
wild card directory specifier (\filename{*/}) in front of the filename
to ensure that any path that may be included in
\filename{division.lkp} will be matched. Please note especially the
notes for \progname{GCG} formatted databases indexed with
\progname{dbigcg}.
In order to just take the EST files in our EMBL database try the following:
\begin{verbatim}
DB emblest [
type: N
method: emblcd
format: embl
dir: \$emboss_db_dir/embl
file: "est*.dat"
release: "63.0"
comment: "EMBL release 63.0"
]
\end{verbatim}
Files can also be given as a space separated list enclosed in
quotes. For example, to set up a database of all mammalian sequences
(except genomes) try the following:
\begin{verbatim}
DB emblallmam [
type: N
method: emblcd
format: embl
dir: \$emboss_db_dir/embl
file: "rod*.dat hum*.dat mam*.dat"
release: "63.0"
comment: "EMBL release 63.0"
]
\end{verbatim}
As you can see from these two examples, the \ilcomm{file:} tag takes a
space delimited list of filenames enclosed in quotes that can contain
normal wildcard (\ilcomm{?*}) characters.
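
As a further sketch, and assuming that \ilcomm{?} matches exactly one
character as it does in shell wildcards, the following hypothetical
definition would select \filename{est\_hum1.dat} but not
\filename{est\_hum10.dat} from the directory listing shown earlier. The
database name \filename{emblesthum} is invented for this illustration.
\begin{verbatim}
DB emblesthum [
type: N
method: emblcd
format: embl
dir: \$emboss_db_dir/embl
file: "est_hum?.dat"
release: "63.0"
comment: "EMBL release 63.0, single digit human EST files"
]
\end{verbatim}
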
It can be quite tedious to set up a long list of sequences to
search. In many cases you can use the \ilcomm{exclude:} tag to make
things easier.
\begin{verbatim}
DB emblnoest [
type: N
method: emblcd
format: embl
dir: \$emboss_db_dir/embl
file: "*.dat"
exclude: "est*.dat"
release: "63.0"
comment: "EMBL release 63.0"
]
\end{verbatim}
This configures the \filename{emblnoest} database to contain all of
EMBL except the ESTs.
\subsection{Indexing and configuring GCG format databases}
\EMBOSS\ can access GCG formatted databases, so sites that still use
GCG alongside the flatfiles need not keep multiple copies of the same
databases in different formats. \EMBOSS\ creates EMBLCD-like
indices for the GCG format databases using the program
\progname{dbigcg}. This runs in much the same way as
\progname{dbiflat}. You will need the GCG format \filename{.seq} and
\filename{.header} files in order to create an EMBLCD indexed
database.
Move to the GCG database directory containing your data and run
\progname{dbigcg}:
\begin{verbatim}
Index a GCG formatted database
EMBL : EMBL
SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
GB : Genbank, DDBJ
PIR : NBRF
Entry format [EMBL]:
Database name: embl
Database directory [.]:
Wildcard database filename [*.seq]:
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
\end{verbatim}
The program will chug along for a while and will then generate the
EMBLCD index files for the GCG format database.
When \progname{dbigcg} prompts for the entry format (\ilcomm{Entry
format [EMBL]:}) you should enter the format the database was in before
you ran \progname{embltogcg} or a similar program to generate the
\progname{GCG} databases.
The following entry should be put in your \filename{.embossrc}:
\begin{verbatim}
DB gcgembl [
type: N
method: gcg
format: embl
dir: $emboss_db_dir/embl
file: "*.dat"
release: "63.0"
comment: "EMBL release 63.0"
]
\end{verbatim}
\progname{showdb} should show your newly configured database.
You can configure subsets of the databases in the same way as for the
original format databases, described in section \ref{sec:finetune}
above. One difference from \progname{dbiflat} indexing is that both the
\filename{.seq} and \filename{.header} files are listed in the
\filename{division.lkp} file. \ilcomm{file:} and \ilcomm{exclude:}
directives should therefore be of the form \ilcomm{exclude:
*/em\_est*} rather than \ilcomm{exclude: */em\_est*.seq}, so that
both files of each division are matched.
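As an illustration, an EST-only subset of the GCG format database might
be configured as sketched below. The database name
\filename{gcgemblest} is invented for this example, the remaining tags
are copied from the \filename{gcgembl} entry above, and the
\filename{em\_est} file name stem is the one used in the text; adjust
all of them to match the files actually listed in your
\filename{division.lkp}.
\begin{verbatim}
DB gcgemblest [
type: N
method: gcg
format: embl
dir: $emboss_db_dir/embl
file: "*/em_est*"
release: "63.0"
comment: "EST divisions of GCG format EMBL 63.0"
]
\end{verbatim}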
\subsection{Indexing and configuring BLAST databases}
BLAST format databases are generated for efficient homology searching
using the BLAST programs. It can be convenient to avoid redundant
copies of databases so \EMBOSS\ provides a mechanism for accessing
these databases.
BLAST format databases are those generated using the tools distributed
with NCBI-BLAST or with WU-BLAST.
\begin{comment}At present \EMBOSS
will only index BLAST databases created from FASTA format input files
with one of the recognised header formats. More information on the
relevant formats can be found in subsection \ref{subsec:fasta}
below.
\end{comment}
To index a single BLAST database, move to the
directory containing your BLAST format databases and run
\progname{dbiblast}:
\begin{verbatim}
Index a BLAST database
Database name: blastsw
Database directory [.]:
database base filename [blastsw]:
Release number [0.0]:
Index date [00/00/00]:
N : nucleic
P : protein
? : unknown
Sequence type [unknown]: p
1 : wublast and setdb/pressdb
2 : formatdb
0 : unknown
Blast index version [unknown]: 2
\end{verbatim}
The program will chug along for a while and will then generate the
EMBLCD index files for the BLAST format database.
The following entry (or one like it that is more appropriate to your
particular installation) should be put in your \filename{.embossrc}
\begin{verbatim}
DB blastsw [
type: P
method: blast
format: ncbi
dir: $emboss_db_dir/blastsw
file: "blastsw"
release: "38.9"
comment: "BLAST format Swissprot"
]
\end{verbatim}
\progname{showdb} should show your newly configured database.
Because of the way BLAST works, many sites group their BLAST
databases in the same directory. You can index these {\it in situ}
with \progname{dbiblast}, but this may require some extra steps if your
databases are not all of the same type, because generating a new set
of index files will overwrite any that already exist in that
directory. To avoid overwriting index files you can either index many
databases with one set of index files, or use the \ilcomm{indexdir}
options to place the indices in a different directory.
There are two requirements for indexing several databases together in
one index. The first is that the databases are the same type
(protein/nucleic acid) and generated with the same tool (pressdb or
formatdb); the second is that all the IDs and accession numbers in the
combined databases are unique.
Run \progname{dbiblast} as before but specify all the databases you
wish to be included when prompted for the database filename.
\begin{verbatim}
Index a BLAST database
Database name: alldbs
Database directory [.]:
database base filename [alldbs]: dbone dbtwo dbthree dbfour
Release number [0.0]:
Index date [00/00/00]:
N : nucleic
P : protein
? : unknown
Sequence type [unknown]: p
1 : wublast and setdb/pressdb
2 : formatdb
0 : unknown
Blast index version [unknown]: 2
\end{verbatim}
These can then be configured as described in section
\ref{sec:finetune} above by using the '\ilcomm{file:}' and
'\ilcomm{exclude:}' tags as appropriate.\footnote{There is one
difference to the standard EMBLCD access method in that the database
indexes will not allow the generation of exclusive subsections of the
combined database. If an ID or accession number is specified that is
present in the index then the sequence will be returned irrespective
of which database it is in.}
When you have databases of different types, generated with different
programs or where the ID/accession numbers are duplicated between
databases the preferred strategy is probably to keep the source data
for the individual databases in separate directories and index them
there.\footnote{Keeping one directory with symbolic links for your
BLAST installation will ensure that BLAST continues to function
correctly if you set BLASTDB to point to the directory containing the
symbolic links. The EMBOSS indices can be placed wherever you wish as
long as you remember to run \progname{dbiblast} with the appropriate
options and put an appropriate \ilcomm{indexdir} tag in the DB
configuration in your \filename{\textasciitilde/.embossrc}.}
Alternatively you can place the index files in a separate
directory. This requires that you run \progname{dbiblast} with the
\ilcomm{-indexdirectory} option and set the \ilcomm{indexdir:} tag in
the database configuration to point to the directory holding the indices. The
example below illustrates database configuration using the
\ilcomm{indexdir} options.
\begin{verbatim}
% dbiblast -indexdir=/databases/indices/mydb
Index a BLAST database
Database name: mydb
Database directory [.]:
database base filename [mydb]:
Release number [0.0]:
Index date [00/00/00]:
N : nucleic
P : protein
? : unknown
Sequence type [unknown]: p
1 : wublast and setdb/pressdb
2 : formatdb
0 : unknown
Blast index version [unknown]: 2
\end{verbatim}
The corresponding entry in \filename{\textasciitilde/.embossrc} (or
\filename{emboss.default}) would look like:
\begin{verbatim}
DB mydb [
type: P
method: blast
format: ncbi
dir: $emboss_db_dir/blastsw
indexdir: /databases/indices/mydb
file: mydb
release: "1.0"
comment: "My BLAST DB with an index in a different directory"
]
\end{verbatim}
Again, multiple indices cannot coexist in the same directory so care
should be taken when using the \ilcomm{indexdir} options that an
existing database index is not overwritten.
\begin{comment}
\subsubsection{FASTA formats used with \progname{dbiblast}}
\label{subsec:fasta}
The following FASTA formats are recognised by \progname{dbiblast}:
\begin{tabular}[t]{|l|l|}\hline \setlength{\baselineskip}{1.2\baselineskip}
GENBANK/NCBI & \ilcomm{> \ldots |accno|id \ldots }\\
\hline
GCG & \ilcomm{>{\sl dbname}:accno id \ldots }\\
\hline
SIMPLE &\ilcomm{ >accno id \ldots} \\
\hline
ID & \ilcomm{>id}\\
\hline
\end{tabular}
\ilcomm{...} refers to any text. Note that the ID must be the only
item in the header for the ID format.
\end{comment}
\subsection{Indexing and configuring FASTA databases}
The FASTA specifications just define the sequence file as a header
line that begins with \ilcomm{>} and subsequent lines containing the
sequence. The header line can be present in an almost infinite number
of formats, several of which can be processed by \EMBOSS. \EMBOSS\
attempts to determine the accession number and/or ID for each
sequence. For indexing purposes there is no semantic difference
between an accession number and an ID. In the real world, accession
numbers are immutable, i.e. they do not change with subsequent releases
of the database, but IDs may change. In any case IDs and accession
numbers are unique, and that is all that matters for database indexing
in \EMBOSS.
The program used to process FASTA format databases is
\progname{dbifasta}. It can recognise the following header line
formats, specified on the command line:
\begin{tabular}[t]{|l|l|}\hline\setlength{\baselineskip}{1.5\baselineskip}
simple &%
\ilcomm{>id ...}\\
\hline
idacc &%
\ilcomm{>id accno ...}\\
\hline
gcgid &%
\ilcomm{>db:id ...}\footnotemark[\value{footnote}]\\
\hline
gcgidacc &%
\ilcomm{>db:id acc ...}\footnotemark[\value{footnote}]\\
\hline
dbid &%
\ilcomm{>db id ...}\footnotemark\\
\hline
ncbi &%
\ilcomm{>...[|accno]|id ...}\footnotemark\\
\hline
\end{tabular}
\addtocounter{footnote}{-1} \footnotetext{{\em db} is one word}
\addtocounter{footnote}{1} \footnotetext{The ID is always taken to be
the characters after the last bar (\ilcomm{|}). The previous field is
also indexed but ONLY if it looks like an accession number
(e.g. AC00001).}
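To make the table concrete, the following Python sketch extracts the
ID and accession number from a header line for each of the formats
listed. It is an illustration only, not the parsing code of
\progname{dbifasta}; the function name \ilcomm{parse\_header} is
invented, and the test for what ``looks like an accession number''
(capital letters followed by digits) is an assumption made for this
example.
\begin{verbatim}
import re

# Assumption: an accession is capital letters followed by digits
ACC_LIKE = re.compile(r"^[A-Z]+[0-9]+$")

def parse_header(line, fmt):
    """Return (id, accession or None) for one FASTA header line."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    words = line[1:].split()
    if not words:
        raise ValueError("empty header")
    first = words[0]
    second = words[1] if len(words) > 1 else None
    if fmt == "simple":                 # >id ...
        return first, None
    if fmt == "idacc":                  # >id accno ...
        return first, second
    if fmt in ("gcgid", "gcgidacc"):    # >db:id [acc] ...
        db, colon, ident = first.partition(":")
        if not colon or not ident:
            raise ValueError("expected db:id")
        return ident, (second if fmt == "gcgidacc" else None)
    if fmt == "dbid":                   # >db id ...
        if second is None:
            raise ValueError("expected db and id")
        return second, None
    if fmt == "ncbi":                   # >...[|accno]|id ...
        fields = first.split("|")
        acc = None
        if len(fields) > 1 and ACC_LIKE.match(fields[-2]):
            acc = fields[-2]
        return fields[-1], acc
    raise ValueError("unknown format: " + fmt)

print(parse_header(">HSXYZ AC00001 some description", "idacc"))
print(parse_header(">embl:HSXYZ AC00001 some description", "gcgidacc"))
print(parse_header(">gi|12345|gb|AC00001|HSXYZ some description", "ncbi"))
\end{verbatim}
The header lines are made up; running the sketch over a few real
headers from your own file is a quick way to decide which format name
to give \progname{dbifasta}.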
Other header formats will not be recognised by \progname{dbifasta} and
will cause indexing and/or database lookup to fail. If you have a
different header format that \progname{dbifasta} cannot yet handle you
have two options:
\begin{enumerate}
\item (The preferred option) Get a C programmer to modify the source
code for \progname{dbifasta} and recompile. If you are a community
spirited person you will also contribute these changes to the main
\EMBOSS\ source tree. (email emboss-dev\@@emboss.open-bio.org for more
information on contributing changes to the \EMBOSS\ source code and/or
read the \EMBOSS\ developers documentation)
\item (The quick hack) Write a custom script (using
e.g. BioPerl\URL{http://www.bioperl.org}) to access your database and
use \ilcomm{method: external} to configure it. This is less desirable
as you may be limited in the access modes you can use.
\end{enumerate}
To index a FASTA format database, run \progname{dbifasta}.
\begin{verbatim}
% dbifasta
Index a fasta database
simple : >ID
idacc : >ID ACC
gcgid : >db:ID
gcgidacc : >db:ID ACC
ncbi : >blah|...[|ACC]|ID
ID line format [idacc]:
Database name: mydb
Database directory [.]:
Wildcard database filename [*.dat]: mydb.fasta
Release number [0.0]:
Index date [00/00/00]:
\end{verbatim}
\progname{dbifasta} will chug along for a little while and will
produce the index files. You can use the same \ilcomm{indexdir}
options as for \progname{dbiflat}, \progname{dbigcg} and
\progname{dbiblast} to place the indices in a different directory.
Place the following entry in your \filename{.embossrc}:
\begin{verbatim}
DB mydb [
type: P
method: emblcd
format: fasta
dir: $emboss_db_dir/mydb
file: mydb.fasta
comment: "My database"
]
\end{verbatim}
\ilcomm{format:} should be \ilcomm{dbid}, \ilcomm{ncbi} or
\ilcomm{fasta} (for every format except \ilcomm{dbid} or
\ilcomm{ncbi}). The same \ilcomm{file:} and \ilcomm{exclude:} tags can
be used as for the other database indexing programs.
\subsection{Configuring \EMBOSS\ to use SRS for database lookup.}
\ilcomm{method: srs} is really a special case of \ilcomm{method:
external} with some additional features.
SRS is a powerful database querying system that can cross reference
between different databases, launch applications and so on. SRS can be
run either through a web interface (see the description of the URL
method above for an example) or via the command line program
\progname{getz}. Indexing and configuring databases for SRS is
outside the scope of this document which will describe how to connect
to preconfigured and indexed SRS databases.\footnote{For information
on configuring and indexing SRS databases please look at the SRS
administrators guide \filename{www/doc/srsadmin.pdf} in your SRS 6
installation} If \progname{getz} is already in your \ilcomm{PATH}
environment variable then insert the following (or similar) in your
\filename{.embossrc}:
\begin{verbatim}
DB emblgetz [
type: N
method: srs
release: "63"
format: embl
comment: 'EMBL using getz'
dbalias: embl
app: getz
]
\end{verbatim}
This will provide access to the SRS database 'embl' as
\ilcomm{emblgetz:acc}. If the SRS database has a different name to the
\EMBOSS\ database (as is the case here) then the \ilcomm{dbalias:} tag
should be used to access the correct SRS database.
This configuration can be extremely slow for the \ilcomm{all} access
mode. It is probably a better idea to set up the database as follows:
\begin{verbatim}
DB emblgetz [
type: N
methodquery: srs
release: "63"
format: embl
comment: 'EMBL using getz'
dbalias: embl
app: getz
methodall: direct
file: "*.dat"
dir: $emboss_db_dir/embl
]
\end{verbatim}
which will use \ilcomm{method: srs} for the \ilcomm{query} access mode
but will use \ilcomm{method: direct} for the \ilcomm{all} access mode,
thus speeding up reading of the whole database.
The SRSFASTA access method is identical to the normal SRS method
except that it returns the sequence in FASTA format and so does not
need a \ilcomm{format:} tag.
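A corresponding entry might look like the sketch below. It is the
\filename{emblgetz} entry from above with the \ilcomm{format:} tag
removed; the database name \filename{emblgetzfasta} is invented and the
spelling \ilcomm{srsfasta} for the method name is an assumption, so
check both against your own installation.
\begin{verbatim}
DB emblgetzfasta [
type: N
method: srsfasta
release: "63"
comment: 'EMBL using getz, returned as FASTA'
dbalias: embl
app: getz
]
\end{verbatim}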
\subsection{Indexing and configuring other databases}
Many institutions may have local databases set up in their own
Laboratory Information Management System. \EMBOSS\ provides a simple
mechanism for interfacing with such systems.
As long as a program is available that can be called noninteractively
and returns the specified sequence on standard output, \EMBOSS\ can
interface with it. Use \ilcomm{method: app} or \ilcomm{method:
external} (the two are equivalent) and \ilcomm{app: "program
command"}. The ID given in the USA will
be appended to the command used to run the program. It is probably
best to specify the methods available using the method subsets,
\ilcomm{methodall:}, \ilcomm{methodquery:} and \ilcomm{methodsingle:}
rather than using the generic \ilcomm{method:} tag.
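A program of this kind can be very small. The Python sketch below is a
hypothetical example, not something shipped with \EMBOSS: it takes the
name of a FASTA file and an ID on the command line and writes the
matching entry to standard output. The script name
\filename{getseq.py}, the case-insensitive comparison of IDs and the
exit codes are all assumptions made for the illustration.
\begin{verbatim}
import sys

def fetch_entry(fasta_path, wanted_id):
    """Return the FASTA entry whose ID matches wanted_id, or None."""
    entry = []
    keep = False
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if keep:
                    break               # reached the next entry
                words = line[1:].split()
                # Assumption: IDs are compared without regard to case
                keep = (bool(words)
                        and words[0].lower() == wanted_id.lower())
            if keep:
                entry.append(line)
    return "".join(entry) if entry else None

def main(argv):
    """Expect: getseq.py FASTA_FILE ID (the ID is appended by EMBOSS)."""
    if len(argv) != 3:
        sys.stderr.write("usage: getseq.py FASTA_FILE ID\n")
        return 2
    text = fetch_entry(argv[1], argv[2])
    if text is None:
        sys.stderr.write("ID not found: %s\n" % argv[2])
        return 1
    sys.stdout.write(text)
    return 0

# Only act as a command when arguments were actually supplied
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
\end{verbatim}
With such a script installed, the database entry would use something
like \ilcomm{app: "getseq.py /data/lims/seqs.fasta"}, where the path is
again only a placeholder.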
\section{Other data}
\EMBOSS\ can be integrated with some common biological
databases. These are described in this section.
\subsection{REBASE}
Rebase is the restriction enzyme database maintained by New
England Biolabs. It is needed for programs such as \progname{remap} and
\progname{restrict}.
The latest version of Rebase can be obtained by anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/rebase} \EMBOSS\ needs
the \filename{withrefm} file. The data is extracted for \EMBOSS\ with
the program \progname{rebaseextract}.
\begin{verbatim}
% mkdir /site/prog/emboss/data/REBASE
% rebaseextract
Extract data from REBASE
Full pathname of WITHREFM: /data/rebase/withrefm.208
\end{verbatim}
Rebase is now installed and ready to use.
\subsection{TRANSFAC}
Transfac is the transcription factor binding site database. It is
available by anonymous
FTP.\footnote{ftp://transfac.gbf.de/pub/transfac/ascii/} Unpacking the
distribution reveals a file called \filename{site.dat}. This is the
one \EMBOSS\ needs.
Run \progname{tfextract} to extract the data from TRANSFAC.
\begin{verbatim}
% tfextract
Extract data from TRANSFAC
Full pathname of transfac SITE.DAT: /databases/transfac/site.dat
\end{verbatim}
\progname{tfscan} can now access the TRANSFAC database.
\subsection{PROSITE}
Prosite is a database of regular expressions that match potentially
diagnostic regions for structural/functional classification of
proteins. \EMBOSS\ needs this database for the \progname{patmatmotifs} program.
PROSITE can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prosite}
You may need to create a PROSITE subdirectory under data in the
\EMBOSS\ installation directory.
Then run \progname{prosextract} to build the \EMBOSS\ Prosite database.
\begin{verbatim}
% prosextract
Builds the PROSITE motif database for patmatmotifs to search
Enter name of prosite directory: /data/prosite
\end{verbatim}
PROSITE is now integrated into your EMBOSS installation.
\subsection{PRINTS}
Prints is a database of diagnostic patterns of blocks of sequence
homology in protein families. The PRINTS database can be searched
using the \EMBOSS\ program \progname{pscan}.
PRINTS can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prints} The database
is made available as compressed files which should be uncompressed
using \progname{gzip} before integrating them into \EMBOSS.
PRINTS is integrated with \EMBOSS\ using the program \progname{printsextract}:
\begin{verbatim}
% printsextract
Extract data from PRINTS
Input file: /data/prints/prints27_0.dat
\end{verbatim}
The PRINTS database is now integrated with \EMBOSS.
\subsection{AAINDEX}
An amino acid index is a set of 20 numerical values representing any
of the different physicochemical and biological properties of amino
acids. The AAindex1 section of the Amino Acid Index Database is a
collection of published indices together with the result of cluster
analysis using the correlation coefficient as the distance between two
indices. This section currently contains 437 indices in release
\filename{4.0} of the database.
The \EMBOSS\ programs \progname{pepwindow} and \progname{pepwindowall} plot
hydrophobicity using the data from an Aaindex entry. If Aaindex is
installed these programs can plot the other amino acid properties.
Aaindex can be obtained via anonymous
FTP.\footnote{ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1}
Aaindex is integrated with \EMBOSS\ using the program \progname{aaindexextract}
\begin{verbatim}
% aaindexextract
Extract data from AAINDEX
Full pathname of file aaindex1: /data/aaindex/aaindex1
\end{verbatim}
The AAINDEX database is now integrated with \EMBOSS.
\subsection{CUTG}
The CUTG database contains a series of codon usage tables calculated
from GenBank.
CUTG can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/cutg/ or
ftp://ftp.kazusa.or.jp/pub/codon/current/}
CUTG is integrated with \EMBOSS\ using the program
\progname{cutgextract} which writes files to the CODONS data
directory.
\begin{verbatim}
% cutgextract
Extract data from CUTG
CUTG directory [.]: /data/cutg/
\end{verbatim}
The CUTG database is now integrated with \EMBOSS.
\subsection{Miscellaneous data files}
Other data files should be kept in the data directory under the main
\EMBOSS\ installation. Individual users' personal data files can be
kept in the current working directory, a subdirectory
\filename{.embossdata} of the current directory, their home directory
or a subdirectory \filename{.embossdata} of their home
directory. \EMBOSS\ will search these locations in this order and will
stop as soon as it finds a matching file. If the personal directories
do not contain the desired file, \EMBOSS\ will search the system wide
data directory, \filename{/site/prog/emboss/data} in this example.
Apparently inexplicable errors when running \EMBOSS\ programs may be
caused by the system not using the data files one expects. The search
path can be displayed in search order using the command
\progname{embossdata}.
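The search order described above can be written down as a short Python
sketch. It only mirrors the description in this section and is not how
\EMBOSS\ is implemented; the function names are invented, and the
directories are passed in as arguments so that nothing about your
installation is assumed.
\begin{verbatim}
import os

def data_search_path(cwd, home, system_data):
    """Directories in the order described in the text."""
    return [
        cwd,
        os.path.join(cwd, ".embossdata"),
        home,
        os.path.join(home, ".embossdata"),
        system_data,
    ]

def find_data_file(name, cwd, home, system_data):
    """Return the first matching file, or None if there is none."""
    for directory in data_search_path(cwd, home, system_data):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

for directory in data_search_path(os.getcwd(),
                                  os.path.expanduser("~"),
                                  "/site/prog/emboss/data"):
    print(directory)
\end{verbatim}
A stray copy of a data file in the current directory or in the home
directory therefore silently takes precedence over the system wide
copy, which is the usual cause of the errors mentioned above.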
\section{Default program settings}
As with many other areas, the default behaviour of programs can be
controlled by setting appropriate values in \filename{.embossrc}.
All general qualifiers\footnote{See the \EMBOSS\ Quick Guide or the
web documentation (or use \ilcomm{wossname -help -verbose}) for an
overview of general qualifiers.} can be specified as
\begin{verbatim}
set emboss_QUALIFIER 1
\end{verbatim}
where \ilcomm{QUALIFIER} is one of the general qualifiers and the
value can be \ilcomm{1} or \ilcomm{Y} for true, or \ilcomm{0} or
\ilcomm{N} for false.
Setting the qualifier value to true has the effect of running every
program with that qualifier set.\footnote{You can specifically unset
it by using the \ilcomm{-noQUALIFIER} command line option} Qualifiers
can be set and will work in the same way as if you set them when
running the program. For example you can \ilcomm{set emboss\_verbose
Y} and the program will run normally, but when the program is run with
the \ilcomm{-help} qualifier, the output will be in verbose form.
There is no point in globally setting options that are there for
producing help output.
Qualifiers that can be set:
\begin{description}
\item[VERBOSE] Causes \ilcomm{-help} to print verbose text.
\item[STDOUT] Causes all output to go to \filename{STDOUT} as
default. Programs will usually build a default output file name from
the input sequence and the program name.
\item[DEBUG] Writes debugging output to a file. Useful for finding
bugs as a command line option.
\item[OPTIONS] Enable prompting for optional parameters.
\item[FILTER] Take input from \filename{STDIN} and send it to
\filename{STDOUT}, and turn on \ilcomm{-auto}
\item[AUTO] Do not prompt for any options but accept the defaults if
no values are given.
\item[WARNING] Print warning messages to \filename{STDERR} (default is true)
\item[ERROR] Print error messages to \filename{STDERR} (default is true)
\item[FATAL] Print fatal messages to \filename{STDERR} (default is true)
\item[DIE] Print crash messages to \filename{STDERR}
\end{description}
These general qualifiers are typically used by advanced users
(\ilcomm{-options}, \ilcomm{-verbose}) or by developers
(\ilcomm{-debug -acdlog}).
Other program options that can be set are \ilcomm{emboss\_format},
\ilcomm{emboss\_acdroot}, and \ilcomm{emboss\_data}. The value of
\ilcomm{emboss\_format} determines which default sequence format to
use for output. For example, if you are running \EMBOSS\ alongside
\progname{GCG} you may wish to have the following entry in your
\filename{.embossrc}:
\begin{verbatim}
set emboss_FORMAT gcg
set emboss_OUTFORMAT gcg
\end{verbatim}
which has the effect of using \progname{GCG} format by
default.\footnote{This can of course be overridden using the
\ilcomm{-sformat} and \ilcomm{-osformat} associated qualifiers. See
the \EMBOSS\ ACD Syntax documentation or the \EMBOSS\ Quick Guide for
more information.}
\ilcomm{emboss\_acdroot} \filename{/path/to/acd} can be set if you
wish to use a different directory for the ACD files, and
\ilcomm{emboss\_data} \filename{/path/to/data} if you wish to use a
separate data directory.
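Written out as \filename{.embossrc} entries, and assuming the same
\ilcomm{set} syntax as in the examples above, these two settings would
look something like the following. Both paths are placeholders.
\begin{verbatim}
set emboss_acdroot /path/to/acd
set emboss_data /path/to/data
\end{verbatim}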
\section{Logging}
Many system administrators may wish to make use of the logging
facilities of \EMBOSS. Setting the variable \ilcomm{emboss\_logfile}
in \filename{emboss.default} or \filename{.embossrc} allows the system
to keep a log of which programs are used when and by whom.
\begin{verbatim}
set emboss_logfile /site/log/emboss.log
\end{verbatim}
The log file structure is very simple. Three tab separated fields are
stored, program name, user name, and the date and time.
\begin{verbatim}
prettyplot joeuser Wed Aug 02 14:29:13 2000
\end{verbatim}
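Because the format is so simple, usage summaries are easy to produce.
The Python sketch below counts how often each program appears in the
log. It assumes exactly the three tab separated fields described above
and skips any line that does not fit; the function name
\ilcomm{count\_program\_usage} and the sample lines are invented for
the example.
\begin{verbatim}
from collections import Counter

def count_program_usage(lines):
    """Count log entries per program name."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue                    # blank or malformed line
        program, user, timestamp = fields
        counts[program] += 1
    return counts

# Hypothetical log lines; in practice read them from the log file, e.g.
#   with open("/site/log/emboss.log") as handle:
#       counts = count_program_usage(handle)
sample = [
    "prettyplot\tjoeuser\tWed Aug 02 14:29:13 2000\n",
    "seqret\tjoeuser\tWed Aug 02 14:31:02 2000\n",
    "prettyplot\tanneuser\tWed Aug 02 15:02:47 2000\n",
]
for program, count in count_program_usage(sample).most_common():
    print(program, count)
\end{verbatim}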
The file defined in \ilcomm{emboss\_logfile} should be world writable.
The following command ensures logging can occur.
\begin{verbatim}
chmod a+w /site/log/emboss.log
\end{verbatim}
All settings can be overridden in a user's \filename{.embossrc} file
by redefining the relevant variables. So to prevent our system usage
being logged we can redefine \ilcomm{emboss\_logfile} by putting the
following entry in our \filename{.embossrc} file.
\begin{verbatim}
set emboss_logfile /dev/null
\end{verbatim}
This behaviour may change in the future to prevent users redefining
some system settings.
\chapter{Graphical interfaces to EMBOSS}
This chapter needs to be written. It will be written when the
available GUIs are stable enough to document.
\chapter{Resources}
\section{Web sites}
\subsection{Programs}
\begin{description}
\item[\EMBOSS\ source code]ftp://emboss.open-bio.org/pub/EMBOSS
\item[\EMBOSS\ Documentation]http://emboss.sf.net/
\item[BLAST tools]Tools for generating BLAST format databases are
contained in the NCBI toolkit which can be obtained from NCBI at:
\begin{quote}
http://www.ncbi.nlm.nih.gov/
\end{quote}
\item[SRS software]The SRS software can be obtained from Lion
Bioscience.\URL{http://www.lionbioscience.com/solutions/srs} This is a
commercial package but at the time of writing is available free of
charge to academic institutions.
\item[\progname{wget}]Various useful utilities including the
\progname{wget} program are available from the Free Software
Foundation.\URL{http://www.gnu.org}
\end{description}
\subsection{Databases}
Most of the databases mentioned in the text along with many others can
be obtained via anonymous ftp from the European Bioinformatics
Institute (EBI) at:
\begin{quote}
ftp://ftp.ebi.ac.uk/pub/databases
\end{quote}
Please use a mirror site where possible to avoid overloading of the
EBI's resources.
Other databases can be obtained from NCBI (GenBank, UniGene etc.).
\subsection{Other Documentation}
Please review the \EMBOSS\ documentation available on the WWW at the
URL above.
\begin{description}
\item[The \EMBOSS\ Quick guide]A pocket reference guide to using
\EMBOSS\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/emboss-qg.ps}.
\item[The \EMBOSS\ Tutorial]A tutorial to give an introduction to
using \EMBOSS\ for bioinformatics
users.\URL{http://www.hgmp.mrc.ac.uk/Registered/Option/emboss.html}
\item[The updated ABC guide]This is a series of bioinformatics
practicals based predominantly on
\EMBOSS.\URL{ftp://ftp.no.embnet.org/pub/ABC}
\item[EMBOSS-FreeBSD-HOWTO]Detailed documentation on installation of
\EMBOSS\ on
FreeBSD.\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/EMBOSS-FreeBSD-HOWTO}
\end{description}
\section{Maintenance of your \EMBOSS\ installation}
\EMBOSS\ is a rapidly evolving software package. It is constantly
being improved, new features added and `issues' resolved. In addition
there are new applications added and you probably want to make use of
these.
\subsection{Automated installation of \EMBOSS\ and EMBASSY}
Once you have installed \EMBOSS\ and got it to work you have solved
the hardest part of the struggle. Updating \EMBOSS\ as new releases
appear\footnote{\EMBOSS\ is rebuilt nightly from CVS, tested, and,
assuming it passes the compilation tests, the latest version is posted
to the \EMBOSS\ FTP server. } can be quite tedious. UNIX is designed
for the lazy, so here is our lazy man's guide to always having an up to
the minute \EMBOSS\ installation.
The following script can be run manually (it should probably be
`\ilcomm{source}d' rather than executed directly) or can be fired off
with cron (in the early hours of the morning is a good time). It
assumes you are installing \EMBOSS\ outside the source directory and
have write permissions to do so.
The update will replace \EMBOSS\ distributed files but will not alter or
overwrite your own datafiles\footnote{Assuming of course that you
haven't overwritten \EMBOSS\ datafiles with your own to begin with.}
or your \filename{emboss.default}.
\begin{verbatim}
# This script should be sourced, not run.
# EMBOSS UPDATE.
# it assumes $packages_dir/EMBOSS is a symbolic link to
# $mirror_dir/emboss.open-bio.org/pub/EMBOSS
#
#site specific variables: season according to taste..
set mirror_dir=('/ftp/mirrors')
set packages_dir=('/site/newprog')
set emboss_config_options=\
('--prefix=/site/prog/emboss --with-pngdriver=/site/lib')
# Now the script proper
set oldpwd=`pwd`
cd $mirror_dir
echo 'updating EMBOSS'
if ( `wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS' |& \
tail -1 | awk '/^Downloaded:/{print $5}'` != "0" ) then
cd ${packages_dir}/EMBOSS
echo 'new EMBOSS programs found .. installing'
set latest_emboss=`ls -t EMBOSS*|head -1`
cd $packages_dir
rm -Rf EMBOSS-*
tar zxf EMBOSS/$latest_emboss
set emboss_dir=`ls -dt EMBOSS-*[^z]|head -1`
#the next line is necessary on our system but may not be for yours.
setenv LD_LIBRARYN32_PATH /site/lib
cd $emboss_dir
# If you have any site specific changes to the source code
# that you want to include, copy them in here
./configure $emboss_config_options &&\
make && \
make install
#Now unpack and build EMBASSY
mkdir embassy
cd embassy
#Unpack and build each package one at a time
foreach embassadir ( `ls ../../EMBOSS/*gz |grep -v EMBOSS-` )
tar zxf $embassadir
set embassadir_arch=$embassadir:t
set embassadir_root=$embassadir_arch:r
cd $embassadir_root:r
./configure $emboss_config_options &&\
make && \
make install
cd ..
end
else
echo 'No new version of EMBOSS available'
endif
cd $oldpwd
\end{verbatim}
\subsection{Automated database updating}
In the same way, scripts can be written to automatically update the
biological databases. An example is given here for REBASE. As all the
parameters for \EMBOSS\ programs can be specified on the command line
it is a trivial matter to include index generation in your nightly
update scripts. The management of a bioinformatic resource is beyond
the scope of this document, though \EMBOSS\ goes a long way towards
easing the burden of management.
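As a sketch of what this can look like, the Python fragment below
assembles a non-interactive \progname{dbiflat} command line from the
values that would otherwise be typed at the prompts. The qualifier
names used here (\ilcomm{-dbname}, \ilcomm{-idformat},
\ilcomm{-directory}, \ilcomm{-filenames}, \ilcomm{-release},
\ilcomm{-date} and \ilcomm{-auto}) are assumptions that should be
checked against the output of \ilcomm{dbiflat -help} for your release,
and the function name and paths are invented for the example.
\begin{verbatim}
import shlex

def dbiflat_command(dbname, idformat, directory, filenames,
                    release, date):
    """Build an argument list for a non-interactive dbiflat run.

    The qualifier names are assumptions; verify with 'dbiflat -help'.
    """
    return [
        "dbiflat",
        "-dbname", dbname,
        "-idformat", idformat,
        "-directory", directory,
        "-filenames", filenames,
        "-release", release,
        "-date", date,
        "-auto",                        # do not prompt, take defaults
    ]

cmd = dbiflat_command("embl", "EMBL", "/data/embl", "*.dat",
                      "63.0", "31/07/00")
# Print the command with shell quoting so it can be pasted into a script
print(" ".join(shlex.quote(arg) for arg in cmd))
\end{verbatim}
Keeping the arguments as a list until the last moment avoids quoting
problems with the wildcard when the command is eventually passed to a
shell or to \ilcomm{subprocess.run}.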
\subsubsection{Automated update of REBASE}
This script will look for a new version of REBASE and install it in
\EMBOSS\ using \progname{rebaseextract}.
\begin{verbatim}
# This script should be sourced, not run.
# REBASE UPDATE. Should be run just after the beginning of the month.
set mirrors_dir=('/ftp/mirrors')
set oldpwd=`pwd`
cd $mirrors_dir
if ( ` wget -m 'ftp://ftp.ebi.ac.uk/pub/databases/rebase/*' |& \
tail -1 | awk '/^Downloaded:/{print $5}'` != "0" ) then
cd ftp.ebi.ac.uk/pub/databases/rebase
cp `ls -t withrefm.*.Z|head -1` withrefm.Z
uncompress withrefm.Z
rebaseextract \
${mirrors_dir}/ftp.ebi.ac.uk/pub/databases/rebase/withrefm
rm withrefm
endif
cd $oldpwd
\end{verbatim}
We make no guarantees that these scripts will work correctly on your
system. If they delete all your files, spam your associates, scratch
your CDs and initiate a nuclear strike on a small unpopulated
Pacific island it is NOT OUR FAULT. They just happen to work for us.
\chapter{GNU Free Documentation License}
\begin{verbatim}
GNU Free Documentation License
Version 1.1, March 2000
Copyright (C) 2000 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other
written document "free" in the sense of freedom: to assure everyone
the effective freedom to copy and redistribute it, with or without
modifying it, either commercially or noncommercially. Secondarily,
this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for
modifications made by others.
This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense. It
complements the GNU General Public License, which is a copyleft
license designed for free software.
We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work that contains a
notice placed by the copyright holder saying it can be distributed
under the terms of this License. The "Document", below, refers to any
such manual or work. Any member of the public is a licensee, and is
addressed as "you".
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject. (For example, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.) The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.
The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, whose contents can be viewed and edited directly and
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters. A copy made in an otherwise Transparent file
format whose markup has been designed to thwart or discourage
subsequent modification by readers is not Transparent. A copy that is
not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML designed for human modification. Opaque formats include
PostScript, PDF, proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML produced by some word processors for output
purposes only.
The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License. You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute. However, you may accept
compensation in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and
you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies of the Document numbering more than 100,
and the Document's license notice requires Cover Texts, you must enclose
the copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover. Both covers must also clearly and legibly identify
you as the publisher of these copies. The front cover must present
the full title with all words of the title equally prominent and
visible. You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.
If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the
general network-using public has access to download anonymously at no
charge using public-standard network protocols. If you use the latter
option, you must take reasonably prudent steps, when you begin
distribution of Opaque copies in quantity, to ensure that this
Transparent copy will remain thus accessible at the stated location
until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to
the public.
It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct
from that of the Document, and from those of previous versions
(which should, if there were any, be listed in the History section
of the Document). You may use the same title as a previous version
if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
responsible for authorship of the modifications in the Modified
Version, together with at least five of the principal authors of the
Document (all of its principal authors, if it has less than five).
C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
giving the public permission to use the Modified Version under the
terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to
it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page. If
there is no section entitled "History" in the Document, create one
stating the title, year, authors, and publisher of the Document as
given on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and likewise
the network locations given in the Document for previous versions
it was based on. These may be placed in the "History" section.
You may omit a network location for a work that was published at
least four years before the Document itself, or if the original
publisher of the version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications",
preserve the section's title, and preserve in the section all the
substance and tone of each of the contributor acknowledgements
and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements". Such a section
may not be included in the Modified Version.
N. Do not retitle any existing section as "Endorsements"
or to conflict in title with any Invariant Section.
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant. To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version. Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History"
in the various original documents, forming one section entitled
"History"; likewise combine any sections entitled "Acknowledgements",
and any sections entitled "Dedications". You must delete all sections
entitled "Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, does not as a whole count as a Modified Version
of the Document, provided no compilation copyright is claimed for the
compilation. Such a compilation is called an "aggregate", and this
License does not apply to the other self-contained works thus compiled
with the Document, on account of their being thus compiled, if they
are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one quarter
of the entire aggregate, the Document's Cover Texts may be placed on
covers that surround only the Document within the aggregate.
Otherwise they must appear on covers around the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License provided that you also include the
original English version of this License. In case of a disagreement
between the translation and the original English version of this
License, the original English version will prevail.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation. If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.
ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:
Copyright (c) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
A copy of the license is included in the section entitled "GNU
Free Documentation License".
If you have no Invariant Sections, write "with no Invariant Sections"
instead of saying which ones are invariant. If you have no
Front-Cover Texts, write "no Front-Cover Texts" instead of
"Front-Cover Texts being LIST"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.
\end{verbatim}
\chapter{Acknowledgements}
The acknowledgements and credits are found at the front of this guide
because no one ever reads them if they are at the back.
\end{document}