% /usr/share/EMBOSS/doc/manuals/admin.tex (Debian package emboss-doc 6.6.0-1)
\documentclass{report}
\usepackage{verbatim}
\usepackage{emboss}

\begin{document}
\title{The \EMBOSS\ Administrator's Guide}
\author{David Martin, EMBnet Norway \\
Peter Rice, LION Bioscience \\
Alan Bleasby, HGMP (EMBnet UK)}
\date{This guide relates to \EMBOSS\ 2.5.0}

\maketitle

Copyright (c) 2000, 2002 David Martin, Peter Rice, Alan Bleasby.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation
License\URL{http://www.gnu.org/copyleft/fdl.html}, Version 1.1 or any
later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts.  A copy of the license is included in the chapter entitled ``GNU
Free Documentation License''.

\tableofcontents

\chapter{Introduction}
\section{About this document}
This guide has been written to assist system administrators and
developers with the installation and configuration of \EMBOSS. If you
are reading this to find out how to do bioinformatics then you are
wasting your time. You are referred instead to the Resources chapter
below where there is a list of more relevant literature and web sites.
Experienced users may find this document useful for configuring their
own databases and customising their \EMBOSS\ experience.


\subsection{Credits}
The original author of this guide was David
Martin\URL{damartin\@@hgmp.mrc.ac.uk} at the Norwegian EMBnet
node.\URL{http://www.no.embnet.org} It is however the result of a team
effort. Thanks are due in particular to Johann Visagie for the FreeBSD
information. Other contributors are acknowledged in the text.

\subsection{Reproduction}
The obligatory bit of legalese. The first version of this guide was
not in the public domain but has been released under the GNU Free
Documentation License by the original author.

Although `Free' in this license is usually explained as `free as in
freedom, not as in beer', the authors are likely to appreciate offers
of free drinks should you ever meet them.
\section{What is \EMBOSS?}

\EMBOSS\ is a freely available suite of bioinformatics applications
and libraries. It can be downloaded via the internet, copied,
customised, and passed on under the terms of the various General
Public Licenses.  \EMBOSS\ has been developed in response to the need
for a powerful, adaptable suite of software that can interface readily
with many different situations and meet the needs of professional
bioinformaticists, particularly those needing high throughput and/or
scriptable capabilities.

\EMBOSS\ has primarily been developed by those responsible for the
public extensions to the GCG package. \EMBOSS\ supersedes much of EGCG
and includes far better database interaction. \EMBOSS\ also has the
benefit of freely accessible source code so novel applications can be
developed rapidly and at minimal cost.

\EMBOSS\ is currently only available for Unix/Linux systems but it has
been known to compile and run on Windows NT. This document will only
consider the UNIX version and will assume the reader has some
familiarity with UNIX system administration.

\subsection{Where to get it?}

\EMBOSS\ is available for download from the primary site at Open-Bio
by anonymous ftp.\URL{ftp://emboss.open-bio.org/pub/EMBOSS/} This
directory contains the \EMBOSS\ package and several associated
packages (collectively known as EMBASSY) that are distributed with
\EMBOSS. Download these to a suitable location. Documentation is
available on the WWW at the \EMBOSS\ web
site.\URL{http://emboss.sf.net/}

FreeBSD distributions from 4.2 onwards now include \EMBOSS\ as an
optional package maintained by Johann
Visagie.\URL{johann\@@egenetics.com} Please see section
\ref{sec:FreeBSD} for more information on installation on FreeBSD.

\chapter{Installation}
\section{Retrieving \EMBOSS\ by anonymous ftp}
\subsection{Interactive FTP}

Change directory to the location in which you wish to download the
\EMBOSS\ source code. In this example we will download the source to
\filename{/packages/EMBOSS}. Then start your ftp client and point it
to emboss.open-bio.org.

\begin{verbatim}
% ftp emboss.open-bio.org
Connected to emboss.open-bio.org.
220 (vsFTPd 2.0.1)
530 Please login with USER and PASS.
530 Please login with USER and PASS.
KERBEROS_V4 rejected as an authentication type
Name (emboss.open-bio.org:someuser):
\end{verbatim}

We are using anonymous FTP so type the username \ilcomm{anonymous}.

\begin{verbatim}
Name (emboss.open-bio.org:someuser): anonymous
331 Guest login ok, send your complete e-mail address as password.
Password:
\end{verbatim}

Enter your email address here as the password for user \ilcomm{anonymous}.

\begin{verbatim}
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>
\end{verbatim}

Move to the \EMBOSS\ directory and list the files. The output has been
truncated a little to save space.

\begin{verbatim}
ftp> cd /pub/EMBOSS
ftp> ls
200 PORT command successful.
150 Opening BINARY mode data connection for /bin/ls.
total 22334
...     1024 May 26 20:17 .gnu
...  9079913 May 14 21:37 EMBOSS-2.5.0.tar.gz
...       19 May 14 21:37 EMBOSS-latest.tar.gz -> EMBOSS-2.5.0.tar.gz
...   196872 May 12 18:49 EMNU-1.0.5.tar.gz
...   231485 May 15 13:55 ESIM4-1.0.0.tar.gz
...   405620 May 12 18:49 HMMER-2.1.1.tar.gz
...     1024 Jul 25 08:54 Jemboss
...   264189 May 12 18:49 MEME-2.3.1.tar.gz
...   251061 Jul  9 19:01 MSE-0.0.4.tar.gz
...   694450 May 12 18:49 PHYLIP-3.573c.tar.gz
...   200490 May 12 18:49 TOPO-0.1.tar.gz
...     1536 Jul  9 19:01 old
...      512 Jun 27 14:40 patchfiles
...      512 Feb 22 15:19 tutorials
226 Transfer complete.
ftp>
\end{verbatim}

Now download the source files:

\begin{verbatim}
ftp> get EMBOSS-latest.tar.gz
200 PORT command successful.
150 Opening BINARY mode data connection for EMBOSS-latest.tar.gz 
(9079913 bytes).
...
ftp>
\end{verbatim}

Repeat this for each file, or use \ilcomm{mget *gz} to download all the
files at once. Exit your ftp session with the command \ilcomm{bye}.

\subsection{FTP using \progname{wget}}
The program \progname{wget} can be used to download a remote directory
noninteractively. More details on \progname{wget} can be obtained from
the Free Software Foundation.\URL{http://www.gnu.org} Assuming you
have \progname{wget} installed, use the following command which
generates a lot of output on the screen:

\begin{verbatim}
% wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS'
--15:04:41--  ftp://emboss.open-bio.org:21/pub/EMBOSS
           => `emboss.open-bio.org/pub/.listing'
Connecting to emboss.open-bio.org:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done.  ==> CWD pub ... done.
==> PORT ... done.    ==> LIST ... done.

...
many pages truncated
...

FINISHED --15:04:55--
Downloaded: 2,657,366 bytes in 4 files
\end{verbatim}

A new directory \filename{emboss.open-bio.org} has been created and
EMBOSS can be found at \filename{emboss.open-bio.org/pub/EMBOSS}. You
may wish to create a symbolic link to this from your
\filename{/packages} directory for convenience.


\section{Unpacking}

You will have downloaded the \EMBOSS\ and EMBASSY packages to a
suitable directory. For this example we will assume you have
downloaded them to \filename{/packages} so you should now have the
following files (or similar) and maybe more packages in EMBASSY.

\begin{verbatim}
% ls
EMBOSS-latest.tar.gz
EMNU-1.0.5.tar.gz
ESIM4-1.0.0.tar.gz
HMMER-2.1.1.tar.gz
MEME-2.3.1.tar.gz
MSE-0.0.4.tar.gz
PHYLIP-3.573c.tar.gz
TOPO-0.1.tar.gz
\end{verbatim}

First unpack the \EMBOSS\ distribution

\begin{verbatim}
% gunzip EMBOSS-latest.tar.gz
% tar xf EMBOSS-latest.tar
\end{verbatim}

This will create a new directory, \filename{EMBOSS-2.5.0} or
similar. You may wish to use \ilcomm{tar xpf} instead, which also
preserves the file permissions stored in the archive.

Enter the \EMBOSS\ directory

\begin{verbatim}
% cd EMBOSS-2.5.0
\end{verbatim}

Create a directory for the EMBASSY packages:

\begin{verbatim}
% mkdir embassy
\end{verbatim}

Now move the EMBASSY packages to the EMBASSY directory

\begin{verbatim}
% mv ../MSE-0.0.4.tar.gz ../PHYLIP-3.573c.tar.gz \
   ../TOPO-0.1.tar.gz embassy
\end{verbatim}

Go into the EMBASSY directory and unpack those packages.

\begin{verbatim}
% cd embassy

% gunzip MSE-0.0.4.tar.gz
% tar xf MSE-0.0.4.tar
\end{verbatim}

and so on for each EMBASSY package.
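
Unpacking several EMBASSY packages one at a time is repetitive. The
following is a minimal sketch for a Bourne-compatible shell; the
function name \ilcomm{unpack\_embassy} is hypothetical and is not part
of \EMBOSS. It streams each archive through \progname{gunzip} into
\progname{tar}, so the original \filename{.tar.gz} files are kept.

\begin{verbatim}
# unpack_embassy (hypothetical name): unpack each EMBASSY tarball
# named on the command line into the current directory.
unpack_embassy() {
    for pkg in "$@"; do
        # gunzip -c writes to stdout, so the .tar.gz file is kept
        gunzip -c "$pkg" | tar xf - || return 1
    done
}

# Example use, from inside the embassy directory:
#   unpack_embassy *.tar.gz
\end{verbatim}

The function stops at the first archive that fails to unpack, so a
damaged download is not silently skipped.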

Go back up one directory to the main \EMBOSS\ package directory and
prepare to start compilation.

\section{Graphics Requirements}

Depending on your system you may need to configure the graphics
explicitly. \EMBOSS\ includes the plplot graphics library and will link
to X11 and to recent (non-GIF) releases of the gd graphics library,
which in turn require libz and libpng (and possibly libjpeg). Please see
the section `Configuring \EMBOSS\ graphics' below.

To get PLPLOT to produce PNG images you will need to have the
\filename{z}\URL{http://www.info-zip.org/pub/infozip/zlib/},
\filename{png}\URL{http://libpng.sourceforge.net/} and
\filename{gd}\URL{http://www.boutell.com/gd/} libraries
installed. \filename{gd} version $\geq$ 1.8.4 is recommended. A recent
release must be used: older releases of \filename{gd} produce GIF
rather than PNG images, and GIF support was removed from later releases
because of software patent problems. If you do not have the required
libraries and your system support group will not install them for the
whole system, install the latest versions of all three
(\filename{z}, \filename{png}, \filename{gd}) in a new directory of
your own, using that directory as the installation prefix for each
library. Then give the same directory on your configure line for
\EMBOSS: \verb+./configure --with-pngdriver=my_dir+

It may also help to make sure that the \ilcomm{LD\_LIBRARY\_PATH}
environment variable includes the directory where these libraries are
installed.
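
As an illustration, assuming the libraries were installed under the
example prefix \filename{/home/joe/local} used later in this chapter, a
\progname{sh} or \progname{bash} user could prepend its \filename{lib}
directory to \ilcomm{LD\_LIBRARY\_PATH}, keeping any existing value.
This is a sketch only; \progname{csh} users would use \ilcomm{setenv}
instead.

\begin{verbatim}
# Prepend the private library directory (the example prefix from
# this guide) to LD_LIBRARY_PATH. ${VAR:+...} adds the separating
# colon only if LD_LIBRARY_PATH was already set and non-empty.
LIBDIR=/home/joe/local/lib
LD_LIBRARY_PATH="$LIBDIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
\end{verbatim}

Putting these lines in your shell startup file makes the setting
persist between sessions.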

The libraries are available by HTTP from:

\begin{itemize}
\item gd: \filename{http://www.boutell.com/gd/}
\item zlib: \filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\item libpng: \filename{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\end{itemize}

These pages also list mirror sites for users outside the UK.

Alternatively, using FTP:

\begin{itemize}
\item gd: boutell.com no longer allows FTP and there are no known
  mirror sites, so use HTTP.
\item zlib: \filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
\item libpng: \filename{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
\end{itemize}

You can unpack the \filename{tar.gz} files in any directory and install
them in a common area. By default everything (including \EMBOSS)
installs in \filename{/usr/local}, but the examples below use
\filename{/home/joe/local}.

Note: \filename{gd} does not use a \ilcomm{./configure} script, and its
\ilcomm{make install} step will fail if the installation directory does
not have a \filename{bin} subdirectory. You can create this directory
(e.g.\ \filename{/home/joe/local/bin}) if it does not already exist.

\subsection{zlib}

Zlib is available from these sites:

\begin{itemize}
\item \filename{http://www.mirror.ac.uk/sites/ftp.cdrom.com/pub/infozip/zlib/}
\item \filename{http://www.info-zip.org/pub/infozip/zlib/}
\item \filename{ftp://ftp.info-zip.org/pub/infozip/zlib/zlib-1.1.3.tar.gz}
\end{itemize}

To install, pick up the sources and then:

\begin{verbatim}
% gunzip -c zlib-1.1.3.tar.gz | tar xf -
% ln -s zlib-1.1.3 zlib
% cd zlib
% ./configure --prefix=/home/joe/local
% make
% make install
% cd ..
\end{verbatim}

\subsection{libpng}

Libpng is available from these sites:

\begin{itemize}
\item \filename{http://libpng.sourceforge.net/}
\item \filename{http://www.mirror.ac.uk/sites/ftp.libpng.org/pub/png/libpng.html}
\item \filename{ftp://swrinde.nde.swri.edu/pub/png/src/libpng.1.2.1.tar.gz}
\end{itemize}

To install, pick up the sources and then:

\begin{verbatim}
% gunzip -c libpng-1.2.1.tar.gz | tar xf -
% ln -s libpng-1.2.1 libpng
% cd libpng
% cp scripts/makefile.linux makefile
\end{verbatim}

Libpng has no configure script, so some of the work has to be done by
hand. Edit \filename{makefile}: change \filename{prefix} to
\filename{/home/joe/local}, and check the other paths too, because some
entries point to \filename{../zlib} while others use
\filename{/usr/local/lib} and \filename{/usr/local/include}. On HP-UX
this is trickier. \filename{CFLAGS} must also match the definition used
for zlib.
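
The \filename{prefix} change can be scripted with \progname{sed}. This
is a minimal sketch that assumes the copied makefile sets the prefix on
a line beginning \filename{prefix=} (check your copy of
\filename{scripts/makefile.linux} first); the function name
\ilcomm{set\_makefile\_prefix} is hypothetical. The remaining paths and
\filename{CFLAGS} still have to be checked by hand.

\begin{verbatim}
# set_makefile_prefix (hypothetical): rewrite the "prefix=" line of a
# makefile, keeping the original as <makefile>.orig.
# $1 = makefile to edit, $2 = new installation prefix
set_makefile_prefix() {
    cp "$1" "$1.orig" &&
    sed "s|^prefix=.*|prefix=$2|" "$1.orig" > "$1"
}

# Example use, inside the libpng directory:
#   set_makefile_prefix makefile /home/joe/local
#   grep -n /usr/local makefile    # list remaining paths to review
\end{verbatim}

Writing through a copy avoids relying on \ilcomm{sed -i}, which not
every Unix \progname{sed} supports.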

Now build using the edited makefile:

\begin{verbatim}
% make
% make install
% cd ..
\end{verbatim}


\subsection{gd}

Gd is available from these sites:

\filename{http://www.boutell.com/gd/}

There is no FTP server at this site.

To install, pick up the sources, build zlib and libpng first, and then:

\begin{verbatim}
% gunzip -c gd-1.8.4.tar.gz     | tar xf -
% ln -s gd-1.8.4     gd
% cd gd
\end{verbatim}

Now edit \filename{Makefile}: change the definitions of INCLUDEDIRS,
LIBDIRS, INSTALL\_LIB, INSTALL\_INCLUDE and INSTALL\_BIN, and change
every \filename{/usr/local} to \filename{/home/joe/local}.
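
Since every \filename{/usr/local} is to be replaced, a global
\progname{sed} substitution covers most of the edit. The following is a
sketch only: the function name \ilcomm{relocate\_makefile} is
hypothetical, and the resulting INCLUDEDIRS and LIBDIRS definitions
should still be reviewed by hand.

\begin{verbatim}
# relocate_makefile (hypothetical): replace every /usr/local in a
# makefile with a new prefix, keeping the original as <makefile>.orig.
# $1 = makefile to edit, $2 = new installation prefix
relocate_makefile() {
    cp "$1" "$1.orig" &&
    sed "s|/usr/local|$2|g" "$1.orig" > "$1"
}

# Example use, inside the gd directory:
#   relocate_makefile Makefile /home/joe/local
\end{verbatim}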

\begin{verbatim}
% make
% make install
% cd ..
\end{verbatim}

If the gd \ilcomm{make install} fails with a warning about the
\filename{bin} directory, you need to create it by hand (see above).

To compile with the local version your EMBOSS configure line should
now read:

\begin{verbatim}
./configure --with-pngdriver=/home/joe/local
\end{verbatim}

This will look for the graphics libraries in your local installation
under \filename{/home/joe/local} instead of a system-wide location.
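
Before running configure it can be worth confirming that all three
libraries really are under the prefix. A minimal sketch, assuming they
were installed into \filename{lib} beneath the prefix as static
(\filename{.a}) or shared (\filename{.so}) files; the function name
\ilcomm{have\_png\_libs} is hypothetical, and library file names may
differ on some systems.

\begin{verbatim}
# have_png_libs (hypothetical): succeed only if libz, libpng and
# libgd are all present under <prefix>/lib, as static (.a) or
# shared (.so*) libraries.
# $1 = the prefix later given to --with-pngdriver
have_png_libs() {
    for lib in z png gd; do
        found=no
        for f in "$1/lib/lib$lib".a "$1/lib/lib$lib".so*; do
            # an unmatched glob stays literal and fails this test
            if [ -e "$f" ]; then found=yes; fi
        done
        if [ "$found" != yes ]; then return 1; fi
    done
    return 0
}

# Example use:
#   have_png_libs /home/joe/local && echo "all three libraries found"
\end{verbatim}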

\ilcomm{configure} keeps a copy of the previous settings. With earlier
releases of \EMBOSS, or as a developer with an earlier release of
autoconf, you may need to delete the files \filename{config.cache} and
\filename{config.status} if configure has been run before.

\section{Compilation}

Building \EMBOSS\ is easy. It follows the usual GNU style of
\ilcomm{./configure}, \ilcomm{make}, \ilcomm{make install}. We'll take
these steps one at a time.
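
The three steps can be chained so that each runs only if the previous
one succeeded. This is a sketch, not part of the \EMBOSS\ distribution;
the function name \ilcomm{build\_emboss} is hypothetical, and any
arguments are passed straight to \ilcomm{./configure}.

\begin{verbatim}
# build_emboss (hypothetical): the three standard build steps.
# Run from the top of the unpacked EMBOSS source tree; any arguments
# are passed to ./configure. Each step runs only if the previous
# one succeeded.
build_emboss() {
    ./configure "$@" &&
    make &&
    make install
}

# Example use:
#   build_emboss --prefix=/site/prog/emboss
\end{verbatim}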

\subsection{Configure}

To accept the default configuration, just type \ilcomm{./configure}
and let \EMBOSS\ get on with it. You may however want to make some
changes to the configuration parameters according to your local
policy. This section will not cover all the possibilities, just some
of the more common. The configuration script will attempt to find the
necessary components in your system to determine how to successfully
build \EMBOSS. It typically expects the GNU C compiler (gcc) and
several standard libraries that should already be part of your
Unix/Linux system. \EMBOSS\ should configure, compile and run on most
modern Linux distributions straight out of the box.


\subsubsection{Installation directory}

You need to have write permission on the directory in which you
eventually wish to install \EMBOSS. You may also wish to install it
somewhere other than the default location, \filename{/usr/local}.

The installation directory is controlled by the \ilcomm{--prefix}
argument. For example, you can have all third-party applications owned
by a non-privileged user and installed in a package-specific directory
under \filename{/site/prog}:

\begin{verbatim}
% ./configure --prefix=/site/prog/emboss
\end{verbatim}

will install \EMBOSS\ under \filename{/site/prog/emboss}. The binaries
will be installed in \filename{/site/prog/emboss/bin} with shared
libraries installed in \filename{/site/prog/emboss/lib}. System-wide
data are installed in \filename{/site/prog/emboss/share/EMBOSS/data},
and the configuration files (ACD files) for the applications will be
installed in \filename{/site/prog/emboss/share/EMBOSS/acd} (or for
EMBASSY in directories corresponding to the package name.)
Documentation is installed in
\filename{/site/prog/emboss/share/EMBOSS/doc}.  The installation
directory should be specified using a full path otherwise interesting
failures may occur.
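
After \ilcomm{make install} you may want to confirm that the
directories listed above were created. A minimal sketch, assuming the
default layout beneath the prefix (EMBASSY packages use their own ACD
directories, which it does not check); the function name
\ilcomm{check\_emboss\_layout} is hypothetical.

\begin{verbatim}
# check_emboss_layout (hypothetical): report whether each directory
# that "make install" is described as creating exists under the
# prefix. $1 = installation prefix; returns non-zero if any is missing.
check_emboss_layout() {
    missing=0
    for d in bin lib share/EMBOSS/data share/EMBOSS/acd share/EMBOSS/doc; do
        if [ -d "$1/$d" ]; then
            echo "ok       $1/$d"
        else
            echo "missing  $1/$d"
            missing=1
        fi
    done
    return "$missing"
}

# Example use:
#   check_emboss_layout /site/prog/emboss
\end{verbatim}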

The individual directories for installation can be modified with other
configuration commands but this is usually not necessary. Run
\ilcomm{./configure --help} to get more information on the directories
that can be changed and other configuration options.

Run \ilcomm{./configure} with the options you wish to use. This may
take a short time as various messages scroll up the screen.

If all goes well, configure should finish with messages like these:

\begin{verbatim}
... much output skipped

creating ./config.status
creating plplot/Makefile
creating plplot/lib/Makefile
creating nucleus/Makefile
creating ajax/Makefile
creating emboss/Makefile
creating emboss/acd/Makefile
creating test/Makefile
creating test/data/Makefile
creating test/embl/Makefile
creating test/pir/Makefile
creating test/swiss/Makefile
creating test/swnew/Makefile
creating test/wormpep/Makefile
creating emboss/data/Makefile
creating emboss/data/AAINDEX/Makefile
creating emboss/data/CODONS/Makefile
creating emboss/data/REBASE/Makefile
creating emboss/data/PRINTS/Makefile
creating emboss/data/PROSITE/Makefile
creating Makefile
\end{verbatim}

Configuration is now complete.

\subsubsection{Reconfiguration}

If at first you don't succeed, try, try and try again. It is not
uncommon to make typos or other mistakes when running
\ilcomm{./configure}. If you want to run configure again you should
run \ilcomm{make clean} before running \ilcomm{./configure} with
(hopefully) the correct options. With an earlier \EMBOSS\ release, or
as a developer using an earlier release of autoconf, you must first
delete the file \filename{config.cache}; current releases no longer
produce it.
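
The clean-up before reconfiguring can be wrapped in a small function.
This is a sketch only; the name \ilcomm{reconfigure\_emboss} is
hypothetical. It runs \ilcomm{make clean} if a \filename{Makefile}
exists, removes any stale \filename{config.cache}, and reruns
\ilcomm{./configure} with the arguments given.

\begin{verbatim}
# reconfigure_emboss (hypothetical): tidy up before rerunning
# ./configure with corrected options. config.cache is only produced
# by older releases, so rm -f silently ignores it when absent.
reconfigure_emboss() {
    if [ -f Makefile ]; then
        make clean || return 1
    fi
    rm -f config.cache
    ./configure "$@"
}

# Example use:
#   reconfigure_emboss --prefix=/site/prog/emboss --without-x
\end{verbatim}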

\subsubsection{Configuring \EMBOSS\ graphics}

The PLPLOT library can produce output to many devices but requires
certain libraries that are NOT distributed with \EMBOSS.

To get X-windows based output you must have X installed, or else PLplot
will not build the required driver. You may need to specify the
location of your X-windows libraries with these configuration options:

\begin{itemize}
\item \ilcomm{--x-includes=DIR}: X include files are in DIR
\item \ilcomm{--x-libraries=DIR}: X library files are in DIR
\end{itemize}

To explicitly configure PLPLOT without X-windows, use \ilcomm{--without-x}.

You can explicitly tell \EMBOSS\ not to include PNG support with
\ilcomm{--without-pngdriver}.

You can tell whether \ilcomm{./configure} has
found a suitable PNG library by watching for output like the
following when it runs:

\begin{verbatim}
checking if png driver is wanted... yes
checking for inflateEnd in -lz... (cached) yes
checking for png_destroy_read_struct in -lpng... (cached) yes
checking for gdImageCreateFromPng in -lgd... (cached) yes
\end{verbatim}

This means that the configuration script has located the PNG libraries
on your system. If you see a message indicating that
\ilcomm{./configure} could not find the libraries or that the version
of \filename{gd} was too old then you should install the latest
versions of the libraries yourself and rerun configure with the
correct \ilcomm{--with-pngdriver} value.
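
As a sketch, assuming the PNG, gd and zlib libraries were installed
under \filename{/usr/local} (so that their headers and libraries are
in its \filename{include} and \filename{lib} subdirectories), you
might rerun:

\begin{verbatim}
% ./configure --prefix=/site/prog/emboss --with-pngdriver=/usr/local
\end{verbatim}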

When you run an \EMBOSS\ graphical application you can see the list of
installed graphics devices by giving `?' as the response to the `Graph
type' prompt.

\subsection{Configuring for 64 bit systems}

\EMBOSS\ configure looks for \progname{gcc} and uses it in
preference to other compilers when compiling \EMBOSS. This is not
ideal for those who wish to compile and link a 64-bit version of
\EMBOSS. The current version is NOT 64-bit clean (i.e. it does not
necessarily use 64-bit representations internally) but will compile
and run quite happily on 64-bit systems.

Additional notes are appended below for the various operating systems
we have information on.

\subsubsection{IRIX 6.5.10}

In order to compile for 64 bit on IRIX you have to specify the native
compiler in 64 bit mode (\ilcomm{cc -64}) and the linker in 64 bit
mode (\ilcomm{/bin/ld -64}). The following notes were provided by Jose
Ramon Valverde\footnote{jrvalverde\@@cnb.uam.es}.


{\it We have succeeded in compiling EMBOSS for IRIX using 64 bit
compilation.

It required some tweaking, but works. The recipe for those willing to
give it a try is: }

\begin{itemize}
	\item remove '\filename{gcc}' from your path
	\item define \filename{COMPILER\_DEFAULTS\_PATH} appropriately
	(see \filename{pe\_environ}) to look for a
	\filename{compiler.defaults} file containing
	e.g. \ilcomm{:abi=64:isa=4:proc=r10k}
	\item \ilcomm{./configure} in \EMBOSS\ and all EMBASSY subdirs
	\item search in all files for '\ilcomm{CC = cc}' and
	replace it with '\ilcomm{CC = cc -64}'
	\item same for '\ilcomm{LD = /bin/ld}' to '\ilcomm{LD = /bin/ld -64}'
	\item \ilcomm{make}
\end{itemize}

{\it The reason is that compiling depends on the Makefile and on libtool,
as well as linking. We didn't spend much time looking at configure since
the above steps were so straightforward. We know we should look into
the configure script and add an option for 64-bit-irix-compile or some
such, but that'll have to wait till we have time for it.

Yes, we know, the search and substitute thing looks tedious, but it
isn't, honest: create a 'chfile.sh' outside the EMBOSS source hierarchy
containing: }

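\begin{verbatim}
#!/bin/sh
# Usage: chfile.sh FILE
# Keep a backup copy, then rewrite FILE from it with the 64-bit flags.
# Redirecting into the existing FILE keeps its permissions, so
# executable scripts such as libtool stay executable.
cp "$1" "$1.orig" || exit 1
sed -e 's/CC="cc"/CC="cc -64"/g' \
    -e 's/CC = cc/CC = cc -64/g' \
    -e 's|/bin/ld|/bin/ld -64|g' "$1.orig" > "$1"
## if you are sure, uncomment this
#rm "$1.orig"
\end{verbatim}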

{\it
'\ilcomm{cd}' to the \filename{emboss} directory and run}

\begin{verbatim}
	find . -type f ! -name '*.orig' -exec /path/to/chfile.sh {} \; -print
\end{verbatim}

{\it and you are done with the \progname{CC}
changes. \progname{Libtool} requires special treatment since it uses
quotes.  }

\subsection{Building \EMBOSS}

Building \EMBOSS\ is a matter of typing '\ilcomm{make}' and going to
find something else to do for the next ten minutes to half an hour
depending on the speed of your system. \EMBOSS\ will first build the
shared libraries (\filename{PLPLOT}, \filename{AJAX}, and
\filename{NUCLEUS}) and then build the applications.

You may see plenty of warnings (especially on SGI systems) complaining
about libraries not being used to resolve any symbols. These can be
safely ignored.

If all goes according to plan you should have built \EMBOSS
successfully. If not, you will need to work out why the build
failed. If you can't work it out yourself, send an email describing
the problem to emboss-bug@emboss.open-bio.org, preferably with a copy
of the output from the installation.

Assuming that compilation was successful, you can\footnote{You don't
have to do this. You can leave \EMBOSS\ where it is and just add the
path to the \filename{emboss} directory to your \ilcomm{PATH}} now
type '\ilcomm{make install}'. After a few minutes and many pagefuls of
messages, \EMBOSS\ should be installed where you specified in the
\ilcomm{--prefix} option (or in the default location of
\filename{/usr/local/emboss} if \ilcomm{--prefix} was not specified).

\subsection{Post compilation setup}

You will now need to make a few adjustments to your environment to
ensure that \EMBOSS\ runs smoothly. \EMBOSS\ looks for certain
environment variables to determine where the libraries and data are
found. These instructions assume you installed \EMBOSS\ in
\filename{/site/prog/emboss}; adjust them to suit your
installation. Insert the following lines at the end of
\filename{/etc/cshrc} (or \filename{~/.cshrc} for a personal
installation):

\begin{verbatim}
setenv PLPLOT_LIB /site/prog/emboss/lib
set path=( /site/prog/emboss/bin ${path} )
\end{verbatim}

Or for bash/ksh/sh users, insert the following at the end of
\filename{/etc/profile} or \filename{~/.bashrc}

\begin{verbatim}
PLPLOT_LIB=/site/prog/emboss/lib
PATH=/site/prog/emboss/bin:$PATH
export PLPLOT_LIB PATH
\end{verbatim}

\EMBOSS\ should now be ready for use.

\subsection{\EMBOSS\ data files}

\EMBOSS\ will by default install the data files (including those
installed with \progname{Rebaseextract}, \progname{Prosextract},
\progname{Printsextract}, \progname{Aaindexextract} or
\progname{Cutgsextract}) in the default directory
\filename{share/EMBOSS/data} in the install prefix directory.  If
\EMBOSS\ is not installed (for example, your own personal
installation) the data files are written to \filename{emboss/data} in
the directory where emboss was built.

If you want to place your data files elsewhere, or have a separate set
of datafiles you wish to use, you can set the \ilcomm{EMBOSS\_DATA}
variable in \filename{emboss.default} or, for personal use, in your \filename{.embossrc} file.
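
For example, a site-wide \filename{emboss.default} could point
\EMBOSS\ at a separate data directory with a line like the following,
written in the same lower-case \ilcomm{set} form used for the other
\EMBOSS\ variables in the Configuration chapter (the path is only an
illustration):

\begin{verbatim}
set emboss_data /data/emboss/mydata
\end{verbatim}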

\subsection{Testing your \EMBOSS\ installation}

You can test your \EMBOSS\ installation by trying the program
'\ilcomm{wossname}':

\begin{verbatim}
% wossname -auto |more
\end{verbatim}

This should give a long list of the programs that are available. Press
space to page down through the list. Only the \EMBOSS\ programs are
listed; the EMBASSY programs are missing only because they are not
yet installed. (Note: although \progname{wossname} has a
\ilcomm{-noembassy} option, this does not work with installed programs
because \progname{wossname} can then no longer tell \EMBOSS\ and
EMBASSY programs apart.)

\section{Installing EMBASSY}

As well as the base libraries and standard EMBOSS distribution,
various extra packages (EMBASSY) are distributed with EMBOSS.

To install an EMBASSY package, go to its directory, then configure,
build and install it. For example, to install PHYLIP (which was
unpacked into \filename{/packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c}
earlier):

\begin{verbatim}
% cd  /packages/EMBOSS-2.5.0/embassy/PHYLIP-3.573c
% ./configure --prefix=/site/prog/emboss
... output not shown
% make
... output not shown
% make install
... output not shown
\end{verbatim}

Note: you {\bf MUST} use the same arguments for \ilcomm{./configure}
that you used for the installation of the main \EMBOSS\ package. It
may be necessary to add other options as required by individual
packages (see below).

Repeat as necessary for the other EMBASSY packages. It should also be
noted that certain EMBASSY packages may require additional libraries.
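
If you have several packages to build, a small Bourne shell loop such
as the sketch below can help keep the \ilcomm{./configure} arguments
identical for each one. The package path and the
\ilcomm{CONFIGURE\_ARGS} variable name are illustrative; set the
arguments to exactly those you gave the main \EMBOSS\ build.

\begin{verbatim}
#!/bin/sh
# Build every unpacked EMBASSY package with the same configure options
# as the main EMBOSS build (paths here are examples only).
CONFIGURE_ARGS="--prefix=/site/prog/emboss"
for pkg in /packages/EMBOSS-2.5.0/embassy/*/
do
    echo "Building $pkg"
    # $CONFIGURE_ARGS is left unquoted so that several options
    # split into separate arguments
    ( cd "$pkg" && ./configure $CONFIGURE_ARGS && make && make install ) \
        || echo "Build failed in $pkg"
done
\end{verbatim}

Packages that need extra options (see below) can still be built by
hand afterwards.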
 
You should now find that running \progname{wossname} as before lists
the EMBASSY programs.

\subsection{EMBASSY package specific notes}

In most cases, EMBASSY packages should build with no problems. Known
problems are described below.

\subsubsection{Packages with no known problems}
So far \progname{ESIM4}, \progname{HMMER}, \progname{MEME},
\progname{MSE}, \progname{PHYLIP} and \progname{TOPO} appear to
install without a problem using the same arguments to
\ilcomm{configure}.

\subsubsection{\progname{EMNU}}

\progname{EMNU} requires the \filename{curses} or \filename{ncurses} libraries 
that come as standard on most Unix-like systems. In particular, \progname{EMNU} 
requires the two header files \filename{form.h} and \filename{menu.h}, which are not 
distributed with all implementations.

If your \filename{curses/ncurses} 
library is installed in a non-standard location, you may need to give
\ilcomm{configure} the option 

\begin{verbatim}
--with-curses=/path/to/curses
\end{verbatim}
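
Combined with the options used for the main \EMBOSS\ build, and
assuming (for illustration only) that \filename{ncurses} was installed
under \filename{/opt/ncurses}, this might read:

\begin{verbatim}
% ./configure --prefix=/site/prog/emboss --with-curses=/opt/ncurses
\end{verbatim}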


\section{Installing \EMBOSS\ in package format}
\label{sec:FreeBSD}
\EMBOSS\ can be installed on almost all Unix/Linux operating systems
using the instructions above, but the package format can be far more
convenient.  A package is a precompiled set of binaries with
installation instructions that can be set up on your system with a
minimum of work. In some cases the package will check for the correct
libraries and install those as necessary.

Brief instructions are given here for the packages of which we are
aware. These are maintained separately from the main source tree and
may also install some files in operating system standard locations
instead of the locations used by the `raw' \EMBOSS
distribution. Please read the more detailed instructions that
accompany each package.

\subsection{Installing \EMBOSS\ on FreeBSD}

A FreeBSD \EMBOSS\ package has been created by Johann
Visagie\URL{johann\@@egenetics.com} of Electric Genetics. This will be
distributed on the installation CDs and through the normal
distribution channels from FreeBSD version 4.2 onwards.

For the FreeBSD user with an up-to-date ports tree\footnote{FreeBSD
users can update their ports tree through a variety of
mechanisms. Please see the FreeBSD specific guide produced by Johann
for more information}, installing \EMBOSS\ reduces to two simple
commands (as root):

\begin{verbatim}
# cd /usr/ports/biology/emboss
# make install
\end{verbatim}

The FreeBSD-specific parts of the port are that
\filename{emboss.default} is included with the other configuration
files under \filename{/usr/local/etc} as
\filename{emboss.default.sample}, and the \EMBOSS\ documentation is
installed in \filename{/usr/local/share/doc/EMBOSS} instead of the
default location.  For further information on installation under
FreeBSD you are referred to the Resources chapter.


\chapter{Configuration}
 
\EMBOSS\ can be readily configured to match your requirements. In a
standard installation of \EMBOSS\ the configuration directives are
looked for in the following locations and in the following search
order:
\begin{enumerate}
\item A file \filename{emboss.default} in the \filename{share/EMBOSS}
subdirectory of your \EMBOSS\ installation.\footnote{This location may
have been redefined in installations of \EMBOSS\ that have been
packaged for specific operating systems. See section \ref{sec:FreeBSD}
for further information on OS specific package
installations.}\footnote{\EMBOSS\ will also look in the
\filename{emboss} directory under the \EMBOSS\ source distribution for
\filename{emboss.default.template} and install this as
\filename{emboss.default} if no existing file is found under the
installation directory}
\item A file \filename{.embossrc} in the directory specified by the
\ilcomm{EMBOSSRC} environment variable.
\item A file \filename{.embossrc} in the user's home directory.
\end{enumerate}
\filename{emboss.default} and \filename{.embossrc} are plain text
files that can readily be edited to suit.\footnote{A sample
\filename{emboss.default} is located in \filename{emboss/acd} under
the source distribution.} Redefinitions of configuration parameters
will override those previously defined. In the descriptions that
follow only \filename{.embossrc} will be mentioned but all directives
can be placed in \filename{emboss.default} for site wide
configuration.

Several aspects of \EMBOSS\ can be defined. These are:
\begin{itemize}
\item\EMBOSS\ environment variables
\item\EMBOSS\ databases
\item Default behaviour of \EMBOSS\ programs
\end{itemize}
Databases are by far the most complex of these. 

\EMBOSS\ will ignore blank lines in the \filename{emboss.default} and
\filename{.embossrc} files. It will also ignore any lines beginning
with \ilcomm{\#} or \ilcomm{!} allowing comments to illuminate the
declarations in the file.
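
As an illustration, a personal \filename{.embossrc} that uses both
comment styles and overrides a value assumed (hypothetically) to have
been set in the site-wide \filename{emboss.default} might contain:

\begin{verbatim}
# Personal EMBOSS configuration
! Use my own copy of the flatfile databases instead of the site copy
set emboss_db_dir /home/myname/databases
\end{verbatim}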


\section{\EMBOSS\ environment variables}

\EMBOSS\ environment variables are set with an '\ilcomm{env}' or a
'\ilcomm{set}' declaration. '\ilcomm{env}' and '\ilcomm{set}' are
interchangeable. The most important environment variable gives the
location of the \filename{.acd} files that describe each program:

\begin{verbatim}
set emboss_acdroot /site/prog/emboss/share/EMBOSS/acd
\end{verbatim}

Environment variables are useful for simplifying maintenance of your
\filename{.embossrc}. For example you may want to specify the location
of your databases as an environment variable. Then if you move the
databases you only have to update one line in the configuration file.

\begin{verbatim}
set emboss_database_dir /data/databases/flatfiles
\end{verbatim}

This would then be referred to later in \filename{.embossrc} as

\begin{verbatim}
$emboss_database_dir/embl
\end{verbatim}

for the directory  \filename{/data/databases/flatfiles/embl}

\subsection{Configuring \EMBOSS\ differently for different groups of users}
You may have groups of users who need to share a specific setup, for
example to give them access to a different set of databases or a
different data directory.

It can be time consuming and error prone to maintain a series of
individual \filename{.embossrc} files or to cause users to have to
work in the same directory or to copy an \filename{.embossrc} to each
directory they wish to work in.  The environment variable
\ilcomm{EMBOSSRC} can be set to point to an arbitrary directory
containing an \filename{.embossrc} which can then be used to give
workgroup specific configuration. Each user then only needs to set
\ilcomm{EMBOSSRC} in their \filename{.cshrc} (\progname{csh}) or
\filename{.profile} (\progname{bash}) to get the workgroup specific
setup.
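
For example, assuming the shared configuration lives in the
(hypothetical) directory \filename{/site/groups/mygroup}, a
\progname{csh} user would add

\begin{verbatim}
setenv EMBOSSRC /site/groups/mygroup
\end{verbatim}

to \filename{.cshrc}, while a \progname{bash} user would add

\begin{verbatim}
EMBOSSRC=/site/groups/mygroup
export EMBOSSRC
\end{verbatim}

to \filename{.profile}.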

In our case we have several groups of researchers for whom we maintain
biological sequence databases. These databases have been made
available under restrictive licenses so that we cannot allow
researchers outside the groups to access the databases. Using
\ilcomm{\$EMBOSSRC} we can set up a common configuration for the
members of each group by defining the databases in the
\filename{\$EMBOSSRC/.embossrc} file.


\section{Databases}

\subsection{Database access modes}

\EMBOSS\ offers three modes for accessing databases:
\begin{description}

       \item[Single:]\EMBOSS\ retrieves a single sequence indexed by
       ID.

       \item[Query:]\EMBOSS\ retrieves a set of sequences
       corresponding to a query that can return more than one entry,
       including accession numbers or wildcard IDs.

       \item[All:]\EMBOSS\ returns all the sequences in the database
       in no particular order.

\end{description}

Each database definition can configure one or many of these modes for
database access.

Typically \EMBOSS\ uses variations on the \progname{emblcd} system of
database indexing to provide rapid access in single and query modes to
flat file databases. The \progname{emblcd} method is implemented in a
variety of ways depending on the original format of your database.
The \progname{emblcd} method assumes that you have one or both of ID
and accession number in each record and that they are unique for the
whole database index.  \EMBOSS\ also provides methods for retrieving
sequences via the WWW and three specific methods for interaction with
SRS\URL{http://www.lionbioscience.com/solutions/srs} installed locally
or through a remote public server. For other non-flatfile databases
or flat file databases in formats not currently supported by \EMBOSS
you will have to configure an external application to retrieve
sequences.

\subsection{General database configuration.}

Each database is configured using a DB declaration.

The generalised form is 

\begin{verbatim}
DB databasename [

Configuration options

]
\end{verbatim}

The configuration options are tag/value pairs and must contain at
least a description of the access method (using \ilcomm{method:} or
one or more of \ilcomm{methodsingle:}, \ilcomm{methodquery:} and
\ilcomm{methodall:}) and a description of the original format of the
sequences (using \ilcomm{format:}).  In addition to these tags there
will be other tags that are needed for particular methods and other
tags that are optional.

\subsubsection{Database access methods}

The scope of each method is:

\begin{description}

\item[Single mode - \ilcomm{s}] Supports retrieval of a single
sequence.

\item[Query mode - \ilcomm{q}] Supports retrieval of a subset of the
sequences in the database specified using a wild card query in the
USA\footnote{Please see the \EMBOSS\ documentation for description of
Uniform Sequence Address format}

\item[All mode - \ilcomm{a}] Supports retrieval of all sequences in
the database as a stream of data.

\end{description}

An example entry for each access method is shown.

\paragraph{APP}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
APP is the same as EXTERNAL.

\paragraph{BLAST}\par\noindent
Modes: \ilcomm{a q s} \par\noindent BLAST uses EMBLCD indices created
with \progname{dbiblast} to access databases in BLAST format, created
with NCBI's \ilcomm{formatdb} program.

Note that the latest 'format version 4' is not yet documented by
NCBI. \EMBOSS\ will only work with 'format version 3' databases, indexed
with:

\begin{verbatim}
formatdb -A F
\end{verbatim}

We hope to support 'format version 4' databases in future. If you pick
up a blast database from NCBI (or elsewhere) check the format. If it
is in the new format, you will need to pick up the original FASTA
format file, and either index it yourself with formatdb, or run
\ilcomm{dbifasta} and use the FASTA file in \EMBOSS\ (see EMBLCD
access method)

The definition should use format: ncbi because this is what the blast
formatdb databases store internally.

\begin{verbatim}
DB mydb [
#required parameters
   method: "blast"
   format: "ncbi"
   type: "N"
   dir: "$emboss_db_dir/blast"
#optional parameters
   fields: "sv des"
   release: "63.0"
   comment: "my comment"
   indexdir: "$emboss_db_dir/blastindices"
]
\end{verbatim}

The index files can be kept in the same directory as the database, but
as each EMBLCD index needs its own directory (the filenames are fixed)
the indexdir is usually defined.

The EMBLCD index files include the filenames indexed by
\ilcomm{dbiblast}. You can use the file: and exclude: attributes to
create file-specific subsets from a single \ilcomm{dbiblast} generated
index, but as blast index files are split only by the number of
entries this is not generally useful.

If the database was indexed with additional fields, they can be
included in the definition as fields: to allow their use in USAs.

\paragraph{DIRECT}\par\noindent
Modes: \ilcomm{a}\par\noindent Direct accesses the flatfile
directly. It returns all the database entries, one after the other. It
assumes no indexing. Queries are still possible as \EMBOSS\ will read
each entry and match it against the query, but are slow as the entire
database must be read.
 
\begin{verbatim}
DB mydb [ 
#required parameters
   method: "direct"
   format: "embl"
   type: "N"
   dir: "$emboss_db_dir/mydb"
   file: "*.dat"   
#optional parameters
   fields: "sv des key org"
   release: "63.0"
   comment: "My own database with no indices"
   exclude: "est*.dat"
]
\end{verbatim}

For most cases, it is simpler to use \ilcomm{dbiflat} for EMBL,
Genbank or SwissProt format, or \ilcomm{dbifasta} to index FASTA or NCBI
format files, and to use the EMBLCD access method.

If the file format supports additional fields, they can be
included in the definition as fields: to allow their use in USAs.

\paragraph{EMBLCD}\par\noindent
Modes: \ilcomm{a q s}\par\noindent EMBLCD uses EMBLCD indices created
with \progname{dbiflat} or \progname{dbifasta} to access flatfile
databases in the original format.

\begin{verbatim}
DB mydb [
#required parameters
   method: "emblcd"
   format: "embl"
   type: "N"
   dir: "$emboss_db_dir/embl"
#optional parameters
   fields: "sv des key org"
   file: "*.dat"
   release: "63.0"
   comment: "my comment"
   exclude: "est*.dat"
   indexdir: "$emboss_db_dir/indices"
]
\end{verbatim}

The EMBLCD index files include the filenames indexed by
\ilcomm{dbiflat} or \ilcomm{dbifasta}. You can use the file: and
exclude: attributes to create file-specific subsets from a single
index.

This method can require careful setup. Please read the more specific
descriptions below.

If the database was indexed with additional fields, they can be
included in the definition as fields: to allow their use in USAs.

\paragraph{EXTERNAL}\par\noindent
Modes: \ilcomm{a q s}\par\noindent EXTERNAL uses an external
application to retrieve sequences.  The ID is passed as an argument to
the application, either replacing \%s in the command string (if
present) or as an additional argument (if there is no \%s).

EXTERNAL requires the application to return the sequence on STDOUT. If
the application writes to somewhere else, simply wrap it in a script
that copies the output to STDOUT.
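
A minimal sketch of such a wrapper, as a Bourne shell script, is shown
here. It assumes a hypothetical retrieval program
\ilcomm{getfromdb\_tofile} that takes the query followed by the name
of an output file; replace it with your own program and its real
arguments.

\begin{verbatim}
#!/bin/sh
# Wrapper for a (hypothetical) retrieval program that writes the
# entry to a file rather than to STDOUT. EMBOSS passes the query
# as the argument(s) to this script.
tmpfile=$(mktemp /tmp/getentry.XXXXXX) || exit 1
getfromdb_tofile "$@" "$tmpfile" && cat "$tmpfile"
status=$?
rm -f "$tmpfile"
exit $status
\end{verbatim}

The name of the wrapper script is then given as the app: value in the
database definition.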

\begin{verbatim}
DB mydb [
#required parameters
    method: "app"
    format: "fasta"
    type: "P"
    app: "getfromdb"
#optional parameters
    comment: "my own protein database with a custom retrieval program"
    app: "getfromdb mydatabase %s"
]
\end{verbatim}

The first app: definition will use the default call 'getfromdb mydb:id'

The alternative app: definition will use the \%s format and call
'getfromdb mydatabase id'

Both will pass either the ID or accession from the query, so that USAs
mydb-id:x13776 and mydb-acc:x13776 are equivalent.

\paragraph{GCG}\par\noindent
Modes: \ilcomm{a q s}\par\noindent GCG uses EMBLCD indices created
with \progname{dbigcg} to access databases in GCG format. This method
uses the \filename{.ref} and \filename{.seq} files created by the
\progname{GCG} suite of programs.

\begin{verbatim}
DB mygcgdb [
#required parameters
   method: "gcg"
   format: "embl"
   type: "N"
   dir: "$emboss_db_dir/gcgembl"
#optional parameters
   fields: "sv des key org"
   file: "*.seq"
   release: "63.0"
   comment: "my comment"
   exclude: "est*"
   indexdir: "$emboss_db_dir/indices"
]
\end{verbatim}

The EMBLCD index files include the filenames indexed by
\ilcomm{dbigcg}. You can use the file: and exclude: attributes to
create file-specific subsets from a single \ilcomm{dbigcg} generated
index.

\paragraph{SRS}\par\noindent
Modes: \ilcomm{a q s}\par\noindent SRS returns entries from a local
installation of SRS using the -e switch to getz to return entries in
the original format.

\begin{verbatim}
DB mydb [
#required parameters
   method: "srs"
   format: "embl"
   type: "N"
#optional parameters
   dbalias: "embl"
   fields: "sv des key org"
   app: "getz"
   comment: "My srs indexed database"
   release: "63.0"
]
\end{verbatim}

This access method builds an SRS command-line query to getz. If you
have getz installed under another name, set app: to that name.

The SRS query by default uses the EMBOSS database name. If the
database has a different name in SRS, define dbalias: as the database
name to pass to SRS.

SRS will return the results using 'getz -e' so the format should match
the format of the original data. For some formats this can be tricky
(PIR for example), so consider using SRSFASTA although this will lose
information that is not included in the FASTA format SRS output.

To allow queries on the additional fields that SRS supports, list them
in the fields: attribute.

\paragraph{SRSFASTA}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
As SRS but returns the sequences in FASTA format. The definition must
include format: fasta so that EMBOSS will read the results in FASTA
format.

\begin{verbatim}
DB mydb [
#required parameters
   method: "srsfasta"
   format: "fasta"
   type: "N"
#optional parameters
   dbalias: "embl"
   fields: "sv des key org"
   app: "getz"
   comment: "My srs indexed database"
   release: "63.0"
]
\end{verbatim}

This access method builds an SRS command-line query to getz. If you
have getz installed under another name, set app: to that name.

The SRS query by default uses the EMBOSS database name. If the
database has a different name in SRS, define dbalias: as the database
name to pass to SRS.

SRS will return the results using 'getz -f -sf fasta' so the format
must be 'fasta'.

To allow queries on the additional fields that SRS supports, list them
in the fields: attribute.

\paragraph{SRSWWW}\par\noindent
Modes: \ilcomm{a q s}\par\noindent
As URL, but specific to an SRS web server. This method takes a base
URL (up to wgetz) for an SRS server, and builds the rest of the URL as
a valid SRS query.

By building the URL, SRSWWW access can query both ID and accession
number, and can query additional fields 'sv', 'des', 'key' and 'org'
if they are allowed with a fields definition.

\begin{verbatim}
DB mydb [
# required parameters
    method: "srswww"
    format: "genbank"
    type: "N"
    url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?"
#optional parameters
    dbalias: "genbank"
    fields: "sv des key org"
    comment: "Genbank by SRS from InfoBiogen"
    proxy: ":"
    httpversion: "1.0"
]
\end{verbatim}

Because queries for such fields to a remote server can find a very
large number of hits, and EMBOSS will load the entire output into
memory to process the HTML, many EMBOSS administrators choose not to
define these fields for an SRSWWW server.

If there is sufficient demand, it should be possible to rewrite the
HTML preprocessing to avoid buffering in memory.

SRSWWW supports the \ilcomm{proxy} and \ilcomm{httpversion} settings
described under access method URL.

\paragraph{URL}\par\noindent
Modes: \ilcomm{s}\par\noindent URL uses a defined web server to
retrieve a specific entry. EMBOSS may fail if the HTML causes
complications with parsing of the entry.

\begin{verbatim}
DB mydb [
# required parameters
   method: "url"
   format: "genbank"
   type: "N"
   url: "http://www.infobiogen.fr/srs5bin/cgi-bin/wgetz?-e+[genbank-id:%s]"
#optional parameters
   comment: "Genbank by ID from InfoBiogen"
]
\end{verbatim}

The \%s in the URL string indicates where \EMBOSS\ will insert the
identifier portion of the USA.

At many sites, remote HTTP access is controlled by a proxy
server. EMBOSS uses a proxy server defined as EMBOSS\_PROXY with a
value in the format \ilcomm{domain.address:port}, for example:

\begin{verbatim}
set emboss_proxy "proxy.mydomain.org:8080"
\end{verbatim}

This is a global definition. For selected databases (local web-based
services, for example) you can turn off the proxy inside the database
definition with:

\begin{verbatim}
DB [ ...
  proxy: ":"
]
\end{verbatim}

HTTP access by default uses HTTP protocol version 1.0. EMBOSS can also
support version 1.1, which provides chunked HTML results to improve
network performance. The HTTP version is controlled by a
variable EMBOSS\_HTTPVERSION and by a DB attribute, for example:

\begin{verbatim}
set emboss_httpversion "1.1"
\end{verbatim}

or

\begin{verbatim}
DB [ ...
  httpversion: '1.1'
]
\end{verbatim}

\subsection{Mixed access methods}

For any given \ilcomm{method:} declaration, \EMBOSS\ will use that
method for those access modes supported by the method.

If you wish to specify which access mode (all, query or single) should
be handled by which database retrieval method then the
\ilcomm{methodsingle:}, \ilcomm{methodquery:} and \ilcomm{methodall:}
declarations should be used instead of \ilcomm{method:}.

\begin{verbatim}
DB mydb [
methodsingle: app
format: fasta
app: "customapp myproteindb"
methodall: direct
dir: $emboss_db_dir/myproteindb
file: myproteindb.dat
type: P
comment: "single and all access for myproteindb"
]
\end{verbatim}

You can mix these, for example, to use a script to query a file and
direct access to read all entries:

\begin{verbatim}
  methodall: 'direct'
  methodquery: 'external'
\end{verbatim}
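
A complete definition along these lines might look like the following
sketch, where \ilcomm{queryscript} is a hypothetical program that
prints the matching entries in FASTA format on STDOUT and the file
name is only an example:

\begin{verbatim}
DB mydb [
   methodall: "direct"
   methodquery: "external"
   app: "queryscript"
   format: "fasta"
   type: "P"
   dir: "$emboss_db_dir/mydb"
   file: "mydb.fasta"
   comment: "direct access for all entries, external script for queries"
]
\end{verbatim}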

\subsection{Indexing and configuring flatfile databases}

Flatfile databases are plain text files in a defined format such as
those released by EMBL, Swissprot and so on. The \EMBOSS\ program
\progname{dbiflat} is used to generate EMBLCD indices that can be used
for all types of database access. \progname{dbiflat} can process
databases in EMBL, SWISSPROT and GENBANK format. Pseudo EMBL format
databases which do not have unique ID and AC entries may cause
\progname{dbiflat} to do mysterious things and should be avoided.

\progname{dbiflat} (and the EMBLCD access method) requires the
databases to be uncompressed. The examples given here will not probe
the deeper secrets of \progname{dbiflat} (for which the reader is
referred to the documentation, or failing that the source code) but
will show a typical installation for a common database.

We assume that \EMBOSS\ has been installed and works. This can be
tested with the command \ilcomm{wossname -auto} which should list all
the programs available.

In this example we will index and configure the EMBL database for use
with \EMBOSS.

First download and unpack the EMBL database. This will require a
considerable amount of disk space. If you do not have sufficient space
available then just download a subset of the database.
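
For example, if the release files were downloaded gzip-compressed
with names ending in \filename{.dat.gz} (an assumption; check what you
actually fetched), uncompress them in place before indexing:

\begin{verbatim}
% gunzip *.dat.gz
\end{verbatim}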

Use \ilcomm{cd} to move to the directory in which you have unpacked
EMBL. This should look something like this when you run \ilcomm{ls}:

\begin{verbatim}
% ls
est_fun.dat
est_hum1.dat
est_hum10.dat
.
Output truncated
.
syn.dat
unc.dat
vrl.dat
vrt.dat
\end{verbatim}

Run \progname{dbiflat} to create the EMBLCD indices.

\begin{verbatim}
% dbiflat

Index a flat file database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
Entry format [SWISS]: EMBL   
Database name: embl
Database directory [.]: 
Wildcard database filename [*.dat]: 
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
\end{verbatim}
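
The same answers can usually be supplied on the command line instead,
which is convenient for scripted updates. The qualifier names below
are assumed to match the prompts above; check them against
\ilcomm{dbiflat -help} for your release before relying on them:

\begin{verbatim}
% dbiflat -idformat EMBL -dbname embl -directory . \
    -filenames '*.dat' -release 63.0 -date 31/07/00 -auto
\end{verbatim}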

\progname{dbiflat} should happily chug away for some considerable time
(up to a few hours depending on the speed of your machine) and will
generate (eventually) the following index files:

\begin{verbatim}
% ls
acnum.hit
acnum.trg
division.lkp
entrynam.idx
\end{verbatim}

Now we create an entry in the \EMBOSS\ configuration files to access
the database. It is probably a good idea to try new database
definitions in your local configuration file first.

Put the following entry in your \filename{.embossrc}:

\begin{verbatim}
DB embl [
   type: N
   method: emblcd
   format: embl
   dir: $emboss_db_dir/embl
   file: "*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
\end{verbatim}

You will need to have defined \ilcomm{\$emboss\_db\_dir} beforehand
using a directive such as

\begin{verbatim}
set emboss_db_dir /path_to_databases
\end{verbatim} 

somewhere in your \filename{emboss.default} or \filename{.embossrc}.

Save \filename{.embossrc} and try \progname{showdb}. You should see a
line that looks like:

\begin{verbatim}
% showdb
.. output deleted
embl          N    OK  OK  OK  EMBL release 63.0
.. output deleted
\end{verbatim}

\subsection{Fine tuning the installation}
\label{sec:finetune}
It is probably a good idea to set up subsections of the database so
that end users can search just the regions they wish to search. This
section applies to all access methods that use EMBLCD style indexes
and probably to others as well.

Files can be included with the declaration \ilcomm{file:} or excluded
with the declaration \ilcomm{exclude:}. It is a good idea to put the
wildcard directory specifier (\filename{*/}) in front of the filename
to ensure that any path that may be included in
\filename{division.lkp} will be matched. Please pay particular
attention to the notes on \progname{GCG} formatted databases indexed
with \progname{dbigcg}.

To include only the EST files in our EMBL database, try the following:

\begin{verbatim}
DB emblest [
   type: N
   method: emblcd
   format: embl
   dir: \$emboss_db_dir/embl
   file: "est*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
\end{verbatim}

Files can also be given as a space-separated list enclosed in
quotes. For example, to set up a database of all mammalian sequences
(except genomes) try the following:

\begin{verbatim}
DB emblallmam [
   type: N
   method: emblcd
   format: embl
   dir: \$emboss_db_dir/embl
   file: "rod*.dat hum*.dat mam*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
\end{verbatim}

As you can see from these two examples, the \ilcomm{file:} tag takes a
space delimited list of filenames enclosed in quotes that can contain
normal wildcard (\ilcomm{?*}) characters.

It can be quite tedious to set up a long list of files to
search. In many cases you can use the \ilcomm{exclude:} tag to make
things easier.

\begin{verbatim}
DB emblnoest [
   type: N
   method: emblcd
   format: embl
   dir: \$emboss_db_dir/embl
   file: "*.dat"
   exclude: "est*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
\end{verbatim}

This configures the \filename{emblnoest} database to contain all of
EMBL except the ESTs.
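The effect of the \ilcomm{file:} and \ilcomm{exclude:} tags can be
approximated with shell-style wildcard matching. The Python sketch
below is an illustration only: it uses Python's \ilcomm{fnmatch}
module rather than the matching code in \EMBOSS, the helper
\ilcomm{select\_files} is hypothetical, and the file names
\filename{hum1.dat} and \filename{rod1.dat} are assumed for the
example. It also shows why a leading \filename{*/} helps when
\filename{division.lkp} records a path.

\begin{verbatim}
from fnmatch import fnmatchcase


def select_files(names, file_spec, exclude_spec=""):
    """Hypothetical helper approximating the file:/exclude: tags.

    Each spec is a space-delimited list of wildcards (? and *).
    A name is kept if it matches any file: pattern and no
    exclude: pattern.
    """
    includes = file_spec.split()
    excludes = exclude_spec.split()
    return [
        name
        for name in names
        if any(fnmatchcase(name, pat) for pat in includes)
        and not any(fnmatchcase(name, pat) for pat in excludes)
    ]


# Illustrative names only; hum1.dat and rod1.dat are assumptions.
names = ["est_hum1.dat", "est_hum10.dat", "hum1.dat",
         "rod1.dat", "syn.dat", "vrl.dat"]

print(select_files(names, "est*.dat"))                    # emblest
print(select_files(names, "rod*.dat hum*.dat mam*.dat"))  # emblallmam
print(select_files(names, "*.dat", "est*.dat"))           # emblnoest

# A path recorded in division.lkp defeats a bare pattern,
# which is why a leading */ wildcard is recommended.
print(fnmatchcase("./est_hum1.dat", "est*.dat"))    # False
print(fnmatchcase("./est_hum1.dat", "*/est*.dat"))  # True
\end{verbatim}

Note that \filename{est\_hum1.dat} is not picked up by
\ilcomm{hum*.dat}, because each pattern must match the whole name.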

\subsection{Indexing and configuring GCG format databases}

\EMBOSS\ can access GCG formatted databases, thus avoiding having
multiple copies of the same databases in different formats for those
who still use GCG alongside the flatfiles.  \EMBOSS\ creates
EMBLCD-like indices for the GCG format databases using the program
\progname{dbigcg}.  This runs in much the same way as
\progname{dbiflat}. You will need the GCG format \filename{.seq} and
\filename{.header} files in order to create an EMBLCD indexed
database.

Move to the GCG database directory containing your data and run
\progname{dbigcg}

\begin{verbatim}
Index a GCG formatted database
      EMBL : EMBL
     SWISS : Swiss-Prot, SpTrEMBL, TrEMBLnew
        GB : Genbank, DDBJ
       PIR : NBRF
Entry format [EMBL]: 
Database name: embl
Database directory [.]: 
Wildcard database filename [*.seq]: 
Release number [0.0]: 63.0
Index date [00/00/00]: 31/07/00
\end{verbatim}

The program will chug along for a while and will then generate the
EMBLCD index files for the GCG format database.

When \progname{dbigcg} prompts for the entry format (\ilcomm{Entry
format [EMBL]:}) you should enter the format of the original
database, that is, the format it was in before you ran
\progname{embltogcg} or a similar program to generate the
\progname{GCG} database.

The following entry should be put in your \filename{.embossrc}

\begin{verbatim}
DB gcgembl [
   type: N
   method: gcg
   format: embl
   dir: \$emboss_db_dir/embl
   file: "*.dat"
   release: "63.0"
   comment: "EMBL release 63.0"
]
\end{verbatim}

\progname{showdb} should show your newly configured database.

You can configure subsets of the databases in the same way as for the
original format databases, described in section \ref{sec:finetune}
above. One difference from \progname{dbiflat} indexing is that both the
\filename{.seq} and \filename{.header} files are listed in the
\filename{division.lkp} file. \ilcomm{file:} and \ilcomm{exclude:}
directives should therefore be of the form \ilcomm{exclude:
*/em\_est*} instead of just \ilcomm{*/em\_est*.seq}.
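The reason for dropping the extension can be illustrated with
shell-style wildcard matching. In this Python sketch the
\filename{division.lkp} entries are hypothetical, and Python's
\ilcomm{fnmatch} only approximates the matching done by \EMBOSS.

\begin{verbatim}
from fnmatch import fnmatchcase

# Hypothetical division.lkp entries: dbigcg lists both the .seq
# and the .header file for each division of a GCG database.
entries = [
    "./em_est1.seq",
    "./em_est1.header",
    "./em_hum1.seq",
    "./em_hum1.header",
]


def matching(entries, pattern):
    """Return the entries matched by one file:/exclude: wildcard."""
    return [e for e in entries if fnmatchcase(e, pattern)]


print(matching(entries, "*/em_est*"))      # both EST files
print(matching(entries, "*/em_est*.seq"))  # misses em_est1.header
\end{verbatim}

With the \filename{.seq} pattern, the matching header file would
still be selected, so the division is only partly excluded.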

\subsection{Indexing and configuring BLAST databases}
BLAST format databases are generated for efficient homology searching
using the BLAST programs. To avoid keeping redundant copies of the
same data, \EMBOSS\ provides a mechanism for accessing these databases
directly.

BLAST format databases are those generated using the tools distributed
with NCBI-BLAST or with WU-BLAST.

\begin{comment}At present \EMBOSS
will only index BLAST databases created from FASTA format input files
with one of the recognised header formats.  More information on the
relevant formats can be found in subsection \ref{subsec:fasta}
below.
\end{comment}

To index a single BLAST database, move to the directory containing
your BLAST format databases and run \progname{dbiblast}:

\begin{verbatim}
Index a BLAST database
Database name: blastsw
Database directory [.]: 
database base filename [blastsw]: 
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

\end{verbatim}

The program will chug along for a while and will then generate the
EMBLCD index files for the BLAST format database.

The following entry (or one like it that is more appropriate to your
particular installation) should be put in your \filename{.embossrc}

\begin{verbatim}
DB blastsw [
   type: P
   method: blast
   format: ncbi
   dir: \$emboss_db_dir/blastsw
   file: "blastsw"
   release: "38.9"
   comment: "BLAST format Swissprot"
]
\end{verbatim}

\progname{showdb} should show your newly configured database.

Because of the way BLAST works, many sites group their BLAST
databases in the same directory. You can index these {\it in situ}
with \progname{dbiblast}, but this may require some extra steps if
your databases are not all of the same type, because indexing a
further database in that directory will overwrite the index files
that already exist. To avoid overwriting index files you can index
several databases with one set of index files, or you can use the
\ilcomm{indexdir} option to place the indices in a different
directory.

There are two requirements for indexing several databases together in
one index. The first is that the databases are of the same type
(protein/nucleic acid) and were generated with the same tool (pressdb
or formatdb); the second is that all the IDs and accession numbers in
the combined databases are unique.
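Before combining databases it may be worth checking the second
requirement. A minimal Python sketch follows, assuming you have
already extracted the IDs and accession numbers of each database into
lists (how you do that depends on your source data and is not shown).
The function name \ilcomm{find\_clashes} and the sample identifiers are
hypothetical.

\begin{verbatim}
from collections import defaultdict


def find_clashes(databases):
    """Return identifiers that occur more than once across databases.

    databases maps a database name to the IDs and accession numbers
    it contains; each clash maps to every database it was seen in.
    """
    owners = defaultdict(list)
    for db_name, identifiers in databases.items():
        for ident in identifiers:
            owners[ident].append(db_name)
    return {ident: dbs for ident, dbs in owners.items() if len(dbs) > 1}


databases = {
    "dbone": ["SEQ1_HUMAN", "AC00001"],
    "dbtwo": ["SEQ2_HUMAN", "AC00001"],
    "dbthree": ["SEQ3_MOUSE", "AC00003"],
}
print(find_clashes(databases))  # {'AC00001': ['dbone', 'dbtwo']}
\end{verbatim}

An empty result suggests the databases can share one index; any
clash is a reason to index them separately instead.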

Run \progname{dbiblast} as before but specify all the databases you
wish to be included when prompted for the database filename.

\begin{verbatim}
Index a BLAST database
Database name: alldbs
Database directory [.]: 
database base filename [alldbs]: dbone dbtwo dbthree dbfour 
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

\end{verbatim}

These can then be configured as described in section
\ref{sec:finetune} above by using the \ilcomm{file:} and
\ilcomm{exclude:} tags as appropriate.\footnote{There is one
difference from the standard EMBLCD access method: the combined
index does not allow exclusive subsections of the combined database
to be defined. If an ID or accession number is specified that is
present in the index then the sequence will be returned irrespective
of which database it is in.}

When you have databases of different types, generated with different
programs, or where ID/accession numbers are duplicated between
databases, the preferred strategy is probably to keep the source data
for the individual databases in separate directories and index them
there.\footnote{Keeping one directory with symbolic links for your
BLAST installation will ensure that BLAST continues to function
correctly if you set \ilcomm{BLASTDB} to point to the directory
containing the symbolic links. The \EMBOSS\ indices can be placed
wherever you wish as long as you remember to run \progname{dbiblast}
with the appropriate options and put an appropriate \ilcomm{indexdir}
tag in the DB configuration in your
\filename{\textasciitilde/.embossrc}.}

Alternatively you can place the index files in a separate
directory. This requires that you run \progname{dbiblast} with the
\ilcomm{-indexdirectory} option and set the \ilcomm{indexdir:} tag in
the database configuration to point to that directory. The
example below illustrates database configuration using the
\ilcomm{indexdir} options.

\begin{verbatim}
% dbiblast -indexdir=/databases/indices/mydb
Index a BLAST database
Database name: mydb
Database directory [.]: 
database base filename [mydb]: 
Release number [0.0]: 
Index date [00/00/00]: 
         N : nucleic
         P : protein
         ? : unknown
Sequence type [unknown]: p
         1 : wublast and setdb/pressdb
         2 : formatdb
         0 : unknown
Blast index version [unknown]: 2

\end{verbatim}

The corresponding entry in \filename{\textasciitilde/.embossrc} (or
\filename{emboss.default}) would look like:


\begin{verbatim}
DB mydb [
   type: P
   method: blast
   format: ncbi
   dir: \$emboss_db_dir/blastsw
   indexdir: /databases/indices/mydb
   file: mydb
   release: "1.0"
   comment: "My BLAST DB with an index in a different directory"
]
\end{verbatim}

Again, the index files of different databases cannot coexist in the
same directory, so when using the \ilcomm{indexdir} options take care
not to overwrite an existing database index.

\begin{comment}
\subsubsection{FASTA formats used with \progname{dbiblast}}
\label{subsec:fasta}
The following FASTA formats are recognised by \progname{dbiblast}:

\begin{tabular}[t]{|l|l|}\hline \setlength{\baselineskip}{1.2\baselineskip}
GENBANK/NCBI & \ilcomm{> \ldots |accno|id \ldots }\\
\hline
GCG & \ilcomm{>{\sl dbname}:accno id \ldots }\\
\hline
SIMPLE &\ilcomm{ >accno id \ldots} \\
\hline
ID & \ilcomm{>id}\\
\hline
\end{tabular}
\ilcomm{...} refers to any text. Note that the ID must be the only
item in the header for the ID format.

\end{comment}
\subsection{Indexing and configuring FASTA databases}

The FASTA specification simply defines the sequence file as a header
line that begins with \ilcomm{>}, followed by lines containing the
sequence.  The header line can be present in an almost infinite number
of formats, several of which can be processed by \EMBOSS.
\EMBOSS\ attempts to determine the accession number and/or ID for each
sequence.  For indexing purposes there is no semantic difference
between an accession number and an ID. In the real world, accession
numbers are immutable, i.e.\ they do not change with subsequent
releases of the database, but IDs may change. In either case, IDs and
accession numbers are unique, and that is all that matters for
database indexing in \EMBOSS.

The program used to process FASTA format databases is
\progname{dbifasta}. It can recognise the following header line
formats, specified on the command line:

\begin{tabular}[t]{|l|l|}\hline\setlength{\baselineskip}{1.5\baselineskip}
simple &%
\ilcomm{>id ...}\\
\hline
idacc &%
\ilcomm{>id accno ...}\\
\hline
gcgid &%
\ilcomm{>db:id ...}\footnotemark\\
\hline
gcgidacc &%
\ilcomm{>db:id acc ...}\footnotemark[\value{footnote}]\\
\hline
dbid &%
\ilcomm{>db id ...}\footnotemark[\value{footnote}]\\
\hline
ncbi &%
\ilcomm{>...[|accno]|id ...}\footnotemark\\
\hline
\end{tabular}
\addtocounter{footnote}{-1} \footnotetext{{\em db} is one word}
\addtocounter{footnote}{1} \footnotetext{The ID is always taken to be
the characters after the last bar (\ilcomm{|}). The previous field is
also indexed but ONLY if it looks like an accession number
(e.g. AC00001).}
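To make the table concrete, the following Python sketch extracts an
ID and, where the format provides one, an accession number from a
header line. It is a hypothetical re-implementation written from the
table and footnotes above, not the code in \progname{dbifasta}; in
particular, the regular expression deciding whether an \ilcomm{ncbi}
field ``looks like an accession number'' is an assumption.

\begin{verbatim}
import re

# Assumption: one or two capital letters followed by at least five
# digits (e.g. AC00001). dbifasta's own test may differ.
ACC_LIKE = re.compile(r"^[A-Z]{1,2}[0-9]{5,}$")


def parse_header(line, fmt):
    """Return (id, accession or None) for a FASTA header line."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    fields = line[1:].split()
    if not fields:
        raise ValueError("empty header line")
    first = fields[0]
    second = fields[1] if len(fields) > 1 else None
    if fmt == "simple":                  # >id ...
        return first, None
    if fmt == "idacc":                   # >id accno ...
        return first, second
    if fmt == "gcgid":                   # >db:id ...
        return first.split(":", 1)[-1], None
    if fmt == "gcgidacc":                # >db:id acc ...
        return first.split(":", 1)[-1], second
    if fmt == "dbid":                    # >db id ...
        return second, None
    if fmt == "ncbi":                    # >...[|accno]|id ...
        parts = first.split("|")
        ident = parts[-1]                # text after the last bar
        prev = parts[-2] if len(parts) > 1 else ""
        return ident, (prev if ACC_LIKE.match(prev) else None)
    raise ValueError(f"unknown header format: {fmt}")


print(parse_header(">myseq1 AC00001 test sequence", "idacc"))
print(parse_header(">mydb:myseq1 AC00001", "gcgidacc"))
print(parse_header(">gnl|AC00001|myseq1 test", "ncbi"))
print(parse_header(">gnl|xyz|myseq1", "ncbi"))
\end{verbatim}

The first three calls should each yield \ilcomm{myseq1} with accession
\ilcomm{AC00001}; in the last one the field before the final bar does
not look like an accession number, so only the ID is returned.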


Other header formats will not be recognised by \progname{dbifasta} and
will cause indexing and/or database lookup to fail. If you have a
different header format that \progname{dbifasta} cannot yet handle you
have two options:
\begin{enumerate}
\item (The preferred option) Get a C programmer to modify the source
code for \progname{dbifasta} and recompile. If you are
community-spirited you will also contribute these changes to the main
\EMBOSS\ source tree. (Email emboss-dev\@@emboss.open-bio.org for more
information on contributing changes to the \EMBOSS\ source code and/or
read the \EMBOSS\ developers' documentation.)
\item (The quick hack) Write a custom script (using
e.g. BioPerl\URL{http://www.bioperl.org}) to access your database and
use \ilcomm{method: external} to configure it. This is less desirable
as you may be limited in the access modes you can use.
\end{enumerate}

To index a FASTA format database, run \progname{dbifasta}.

\begin{verbatim}
% dbifasta
Index a fasta database
    simple : >ID
     idacc : >ID ACC
     gcgid : >db:ID
  gcgidacc : >db:ID ACC
      ncbi : >blah|...[|ACC]|ID
ID line format [idacc]: 
Database name: mydb
Database directory [.]: 
Wildcard database filename [*.dat]: mydb.fasta
Release number [0.0]: 
Index date [00/00/00]: 
\end{verbatim}

\progname{dbifasta} will chug along for a little while and will
produce the index files. You can use the same \ilcomm{indexdir}
options as for \progname{dbiflat}, \progname{dbigcg} and
\progname{dbiblast} to place the indices in a different directory.

Place the following entry in your \filename{.embossrc}

\begin{verbatim}
DB mydb [
   type: P
   method: emblcd
   format: fasta
   dir: \$emboss_db_dir/mydb
   file: mydb.fasta
   comment: "My database"
]
\end{verbatim}

The \ilcomm{format:} tag should be \ilcomm{dbid} or \ilcomm{ncbi} for
databases indexed with those header formats, and \ilcomm{fasta} for
every other header format. The same \ilcomm{file:} and
\ilcomm{exclude:} tags can be used as for the other database indexing
programs.


\subsection{Configuring \EMBOSS\ to use SRS for database lookup}

\ilcomm{method: srs} is really a special case of \ilcomm{method:
external} with some additional features.

SRS is a powerful database querying system that can cross-reference
between different databases, launch applications and so on. SRS can be
run either through a web interface (see the description of the URL
method above for an example) or via the command line program
\progname{getz}.  Indexing and configuring databases for SRS is
outside the scope of this document, which describes only how to
connect to preconfigured and indexed SRS databases.\footnote{For
information on configuring and indexing SRS databases please look at
the SRS administrators' guide \filename{www/doc/srsadmin.pdf} in your
SRS 6 installation.} If \progname{getz} is already in your
\ilcomm{PATH} environment variable then insert the following (or
similar) in your \filename{.embossrc}:

\begin{verbatim}
 DB emblgetz [ 
    type: N 
    method: srs 
    release: "63" 
    format: embl
    comment: 'EMBL using getz' 
    dbalias: embl 
    app: getz 
]
\end{verbatim}

This will provide access to the SRS database `embl' as
\ilcomm{emblgetz:acc}. If the SRS database has a different name from
the \EMBOSS\ database (as is the case here) then the \ilcomm{dbalias:}
tag should be used to access the correct SRS database.

This configuration can be extremely slow for the \ilcomm{all} access
mode. It is probably a better idea to set up the database as follows:

\begin{verbatim}
 DB emblgetz [ 
    type: N 
    methodquery: srs 
    release: "63" 
    format: embl
    comment: 'EMBL using getz' 
    dbalias: embl 
    app: getz 
    methodall: direct
    file: "*.dat"
    dir: \$emboss_db_dir/embl
]
\end{verbatim}

which will use \ilcomm{method: srs} for the \ilcomm{query} access mode
but will use \ilcomm{method: direct} for the \ilcomm{all} access mode,
thus speeding up reading of the whole database.

The SRSFASTA access method is identical to the normal SRS method
except that it returns the sequence in FASTA format and so does not
need a \ilcomm{format:} tag.


\subsection{Indexing and configuring other databases}

Many institutions may have local databases set up in their own
Laboratory Information Management System. \EMBOSS\ provides a simple
mechanism for interfacing with such systems.

As long as a program is available that can be called non-interactively
and returns the specified sequence on standard output, \EMBOSS\ can
interface with it.  Use \ilcomm{method: app} or \ilcomm{method:
external} (the two are equivalent) together with \ilcomm{app:
"program command"}.  The ID given in the USA will be appended to the
command used to run the program. It is probably best to specify the
methods available using the method subsets \ilcomm{methodall:},
\ilcomm{methodquery:} and \ilcomm{methodsingle:} rather than using the
generic \ilcomm{method:} tag.
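As an illustration, a retrieval program for such an interface might
look like the Python sketch below. Everything in it is hypothetical:
the in-memory \ilcomm{SEQUENCES} table stands in for a real LIMS
query, and the script name \filename{lims\_fetch.py} is invented. It
relies only on the behaviour described above: the entry ID arrives as
the last command-line argument, and the sequence is written to
standard output, here in FASTA format.

\begin{verbatim}
import sys

# Hypothetical stand-in for a query against a local LIMS.
SEQUENCES = {
    "SEQ001": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
}


def fetch(entry_id, width=60):
    """Return the entry as FASTA text, or None if it is unknown."""
    seq = SEQUENCES.get(entry_id)
    if seq is None:
        return None
    lines = [">" + entry_id]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def main(argv):
    """Print the entry named by the last argument; return exit status."""
    if len(argv) < 2:
        print("usage: lims_fetch.py ID", file=sys.stderr)
        return 2
    text = fetch(argv[-1])
    if text is None:
        print("no such entry: " + argv[-1], file=sys.stderr)
        return 1
    sys.stdout.write(text)
    return 0


# Only act as a command when an ID has been supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
\end{verbatim}

Such a script could then be configured with tags along the lines of
\ilcomm{methodquery: external}, \ilcomm{app: "python3
/path/to/lims\_fetch.py"} and \ilcomm{format: fasta}, where the path
is a placeholder for your own installation.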


\section{Other data}

\EMBOSS\ can be integrated with some common biological
databases. These are described in this section.

\subsection{REBASE}

Rebase is the restriction enzyme database maintained by New
England Biolabs. It is needed for programs such as \progname{remap}
and \progname{restrict}.

The latest version of Rebase can be obtained by anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/rebase} \EMBOSS\ needs
the \filename{withrefm} file. The data is extracted for \EMBOSS\ with
the program \progname{rebaseextract}.

\begin{verbatim}
% mkdir /site/prog/emboss/data/REBASE
% rebaseextract
Extract data from REBASE
Full pathname of WITHREFM: /data/rebase/withrefm.208
\end{verbatim}

Rebase is now installed and ready to use.

\subsection{TRANSFAC}

Transfac is the transcription factor binding site database. It is
available by anonymous
FTP.\footnote{ftp://transfac.gbf.de/pub/transfac/ascii/} Unpacking the
distribution reveals a file called \filename{site.dat}. This is the
one \EMBOSS\ needs.

Run \progname{tfextract} to extract the data from TRANSFAC.

\begin{verbatim}
% tfextract
Extract data from TRANSFAC
Full pathname of transfac SITE.DAT: /databases/transfac/site.dat
\end{verbatim}

\progname{tfscan} can now access the TRANSFAC database.

\subsection{PROSITE}

Prosite is a database of regular expressions that match potentially
diagnostic regions for structural/functional classification of
proteins. \EMBOSS\ needs this database for the \progname{patmatmotifs}
program.

PROSITE can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prosite}

You may need to create a PROSITE subdirectory under data in the
\EMBOSS\ installation directory.

Then run \progname{prosextract} to build the \EMBOSS\ Prosite database.

\begin{verbatim}
% prosextract
Builds the PROSITE motif database for patmatmotifs to search
Enter name of prosite directory: /data/prosite
\end{verbatim}

PROSITE is now integrated into your \EMBOSS\ installation.

\subsection{PRINTS}

Prints is a database of diagnostic patterns of blocks of sequence
homology in protein families. The PRINTS database can be searched
using the \EMBOSS\ program \progname{pscan}.

PRINTS can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/prints} The database
is made available as compressed files which should be uncompressed
using \progname{gzip} before integrating them into \EMBOSS.

PRINTS is integrated with \EMBOSS\ using the program
\progname{printsextract}:

\begin{verbatim}
% printsextract
Extract data from PRINTS
Input file: /data/prints/prints27_0.dat
\end{verbatim}

The PRINTS database is now integrated with \EMBOSS.

\subsection{AAINDEX}

An amino acid index is a set of 20 numerical values representing any
of the different physicochemical and biological properties of amino
acids.  The AAindex1 section of the Amino Acid Index Database is a
collection of published indices together with the result of cluster
analysis using the correlation coefficient as the distance between two
indices.  This section currently contains 437 indices in release
\filename{4.0} of the database.

The \EMBOSS\ programs \progname{pepwindow} and \progname{pepwindowall}
plot hydrophobicity using the data from an Aaindex entry. If Aaindex is
installed these programs can plot the other amino acid properties.

Aaindex can be obtained via anonymous
FTP.\footnote{ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex/aaindex1}

Aaindex is integrated with \EMBOSS\ using the program \progname{aaindexextract}

\begin{verbatim}
% aaindexextract
Extract data from AAINDEX
Full pathname of file aaindex1: /data/aaindex/aaindex1
\end{verbatim}

The AAINDEX database is now integrated with \EMBOSS.

\subsection{CUTG}

The CUTG database contains a series of codon usage tables calculated
from GenBank.

CUTG can be obtained via anonymous
FTP.\footnote{ftp://ftp.ebi.ac.uk/pub/databases/cutg/ or
ftp://ftp.kazusa.or.jp/pub/codon/current/}

CUTG is integrated with \EMBOSS\ using the program
\progname{cutgextract} which writes files to the CODONS data
directory.

\begin{verbatim}
% cutgextract
Extract data from CUTG
CUTG directory [.]: /data/cutg/
\end{verbatim}

The CUTG database is now integrated with \EMBOSS.

\subsection{Miscellaneous data files}

Other data files should be kept in the data directory under the main
\EMBOSS\ installation. Individual users' personal data files can be
kept in the current working directory, a subdirectory
\filename{.embossdata} of the current directory, their home directory
or a subdirectory \filename{.embossdata} of their home
directory. \EMBOSS\ will search these locations in this order and will
stop as soon as it finds a matching file. If the personal directories
do not contain the desired file, \EMBOSS\ will search the system-wide
data directory, \filename{/site/prog/emboss/data} in this example.
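The search order can be summarised in a short Python sketch. It
mirrors the order described above but is not taken from the \EMBOSS\
source; the function name \ilcomm{find\_data\_file}, the data file
name \filename{Emydata.dat} and the temporary directory layout are all
hypothetical.

\begin{verbatim}
import tempfile
from pathlib import Path


def find_data_file(name, system_data_dir, cwd=None, home=None):
    """Return the first existing copy of a data file, or None.

    Order: current directory, ./.embossdata, home directory,
    ~/.embossdata, then the system-wide data directory.
    """
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)
    candidates = [
        cwd / name,
        cwd / ".embossdata" / name,
        home / name,
        home / ".embossdata" / name,
        Path(system_data_dir) / name,
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None


# Demonstration in a throw-away directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cwd, home, system = root / "work", root / "home", root / "data"
    for d in (cwd / ".embossdata", home, system):
        d.mkdir(parents=True)
    (system / "Emydata.dat").write_text("system copy\n")
    (home / "Emydata.dat").write_text("personal copy\n")
    found = find_data_file("Emydata.dat", system, cwd=cwd, home=home)
    result = found.read_text()

print(result, end="")  # the copy in the home directory wins
\end{verbatim}

The personal copy shadows the system-wide one, which is exactly the
situation that can produce the puzzling behaviour described next.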

Apparently inexplicable errors when running \EMBOSS\ programs may be
caused by the system not using the data files one expects. The search
path can be displayed in search order using the command
\progname{embossdata}.

\section{Default program settings}

As with many other areas, the default behaviour of programs can be
controlled by setting appropriate values in \filename{.embossrc}.

All general qualifiers\footnote{See the \EMBOSS\ Quick Guide or the
web documentation (or use \ilcomm{wossname -help -verbose}) for an
overview of general qualifiers.} can be specified as

\begin{verbatim}
set emboss_QUALIFIER 1
\end{verbatim}

where \ilcomm{QUALIFIER} is one of the general qualifiers and the
value can be \ilcomm{1} or \ilcomm{Y} for true, or \ilcomm{0} or
\ilcomm{N} for false.

Setting the qualifier value to true has the effect of running every
program with that qualifier set.\footnote{You can specifically unset
it by using the \ilcomm{-noQUALIFIER} command line option.} Qualifiers
can be set and will work in the same way as if you set them when
running the program. For example, you can \ilcomm{set emboss\_verbose
Y} and the program will run normally, but when the program is run with
the \ilcomm{-help} qualifier, the output will be in verbose form.

There is no point in globally setting options that are there for
producing help output.

Qualifiers that can be set:

\begin{description}

\item[VERBOSE] Causes \ilcomm{-help} to print verbose text.

\item[STDOUT] Causes all output to go to \filename{STDOUT} by
default, instead of to the output file that programs usually name
after the input sequence and the program name.

\item[DEBUG] Writes debugging output to a file. Most useful as a
command line option when tracking down bugs.

\item[OPTIONS] Enable prompting for optional parameters.

\item[FILTER] Take input from \filename{STDIN} and send it to
\filename{STDOUT}, and turn on \ilcomm{-auto}.

\item[AUTO] Do not prompt for any options but accept the defaults if
no values are given.

\item[WARNING] Print warning messages to \filename{STDERR} (default is true)

\item[ERROR] Print error messages to \filename{STDERR} (default is true)

\item[FATAL] Print fatal messages to \filename{STDERR} (default is true)

\item[DIE] Print crash messages to \filename{STDERR}

\end{description}

These general qualifiers are typically used by advanced users
(\ilcomm{-options}, \ilcomm{-verbose}) or by developers
(\ilcomm{-debug -acdlog}).
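As a rough illustration of how \ilcomm{set} lines map onto these
qualifiers, the Python sketch below reads \ilcomm{set
emboss\_QUALIFIER VALUE} lines for the qualifiers listed above and
treats \ilcomm{1}/\ilcomm{Y} as true and \ilcomm{0}/\ilcomm{N} as
false. It is hypothetical: the case-insensitive comparison is an
assumption, other variables are ignored, and any other value is
reported as an error rather than guessed at.

\begin{verbatim}
# General qualifiers listed in this section.
GENERAL = {"verbose", "stdout", "debug", "options", "filter",
           "auto", "warning", "error", "fatal", "die"}
TRUE_VALUES = {"1", "Y"}
FALSE_VALUES = {"0", "N"}


def general_qualifiers(text):
    """Map 'set emboss_NAME VALUE' lines for general qualifiers to booleans.

    Hypothetical sketch: names and values are compared case-insensitively
    (an assumption), other variables are ignored, and a value that is
    neither true nor false raises ValueError.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "set":
            continue
        name, value = parts[1].lower(), parts[2].upper()
        if not name.startswith("emboss_"):
            continue
        qualifier = name[len("emboss_"):]
        if qualifier not in GENERAL:
            continue
        if value in TRUE_VALUES:
            settings[qualifier] = True
        elif value in FALSE_VALUES:
            settings[qualifier] = False
        else:
            raise ValueError(f"not a true/false value: {line!r}")
    return settings


print(general_qualifiers("set emboss_verbose Y\nset emboss_warning 0\n"))
\end{verbatim}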


Other program options that can be set are \ilcomm{emboss\_format},
\ilcomm{emboss\_outformat}, \ilcomm{emboss\_acdroot}, and
\ilcomm{emboss\_data}. The value of \ilcomm{emboss\_format} sets the
default input sequence format, and \ilcomm{emboss\_outformat} the
default output sequence format. For example, if you are running
\EMBOSS\ alongside \progname{GCG} you may wish to have the following
entry in your \filename{.embossrc}

\begin{verbatim}
set emboss_FORMAT gcg
set emboss_OUTFORMAT gcg
\end{verbatim}

which has the effect of using \progname{GCG} format by
default.\footnote{This can of course be overridden using the
\ilcomm{-sformat} and \ilcomm{-osformat} associated qualifiers. See
the \EMBOSS\ ACD Syntax documentation or the \EMBOSS\ Quick Guide for
more information.}

\ilcomm{emboss\_acdroot} \filename{/path/to/acd} can be set if you
wish to use a different directory for the ACD files, and
\ilcomm{emboss\_data} \filename{/path/to/data} if you wish to use a
separate data directory.


\section{Logging}

Many system administrators may wish to make use of the logging
facilities of \EMBOSS. Setting the variable \ilcomm{emboss\_logfile}
in \filename{emboss.default} or \filename{.embossrc} allows the system
to keep a log of which programs are used when and by whom.

\begin{verbatim}
set emboss_logfile /site/log/emboss.log
\end{verbatim}

The log file structure is very simple. Each entry holds three
tab-separated fields: the program name, the user name, and the date
and time.

\begin{verbatim}
prettyplot      joeuser        Wed Aug 02 14:29:13 2000
\end{verbatim}
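Because each entry is a single tab-separated line, the log is easy to
summarise. A minimal Python sketch follows, assuming the three-field
layout described above; the function name \ilcomm{usage\_counts} and
the second and third sample entries are hypothetical.

\begin{verbatim}
from collections import Counter


def usage_counts(lines):
    """Count runs per program and per user from EMBOSS log lines.

    Each line is expected to hold three tab-separated fields:
    program name, user name, and date/time. Other lines are skipped.
    """
    programs, users = Counter(), Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue
        program, user, _when = fields
        programs[program] += 1
        users[user] += 1
    return programs, users


log = [
    "prettyplot\tjoeuser\tWed Aug 02 14:29:13 2000\n",
    "restrict\tjoeuser\tWed Aug 02 14:31:40 2000\n",
    "prettyplot\tjanedoe\tWed Aug 02 15:02:05 2000\n",
]
programs, users = usage_counts(log)
print(programs.most_common())  # [('prettyplot', 2), ('restrict', 1)]
print(users.most_common())     # [('joeuser', 2), ('janedoe', 1)]
\end{verbatim}

For a real log, pass an open file object instead of the list, for
example one opened on \filename{/site/log/emboss.log}.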

The file defined in \ilcomm{emboss\_logfile} should exist and be world
writable. The following commands ensure logging can occur.

\begin{verbatim}
touch /site/log/emboss.log
chmod a+w /site/log/emboss.log
\end{verbatim}

All settings can be overridden in a user's \filename{.embossrc} file
by redefining the relevant variables. So to prevent our system usage
being logged we can redefine \ilcomm{emboss\_logfile} by putting the
following entry in our \filename{.embossrc} file.

\begin{verbatim}
set emboss_logfile /dev/null
\end{verbatim}

This behaviour may change in the future to prevent users redefining
some system settings.

\chapter{Graphical interfaces to EMBOSS}

This chapter needs to be written. It will be written when the
available GUIs are stable enough to document.

\chapter{Resources}
\section{Web sites}
\subsection{Programs}
\begin{description}
\item[\EMBOSS\ source code]ftp://emboss.open-bio.org/pub/EMBOSS
\item[\EMBOSS\ Documentation]http://emboss.sf.net/
\item[BLAST tools]Tools for generating BLAST format databases are
contained in the NCBI toolkit which can be obtained from NCBI at:
\begin{quote}
http://www.ncbi.nlm.nih.gov/
\end{quote}
\item[SRS software]The SRS software can be obtained from Lion
Bioscience.\URL{http://www.lionbioscience.com/solutions/srs} This is a
commercial package but at the time of writing is available free of
charge to academic institutions.
\item[\progname{wget}]Various useful utilities including the
\progname{wget} program are available from the Free Software
Foundation.\URL{http://www.gnu.org}
\end{description}
\subsection{Databases}

Most of the databases mentioned in the text along with many others can
be obtained via anonymous ftp from the European Bioinformatics
Institute (EBI) at:
\begin{quote}
ftp://ftp.ebi.ac.uk/pub/databases
\end{quote}
Please use a mirror site where possible to avoid overloading of the
EBI's resources.

Other databases can be obtained from NCBI (GenBank, UniGene, etc.).

\subsection{Other Documentation}
Please review the \EMBOSS\ documentation available on the WWW at the
URL above.

\begin{description}
\item[The \EMBOSS\ Quick guide]A pocket reference guide to using
\EMBOSS\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/emboss-qg.ps}.
\item[The \EMBOSS\ Tutorial]A tutorial to give an introduction to
using \EMBOSS\ for bioinformatics
users.\URL{http://www.hgmp.mrc.ac.uk/Registered/Option/emboss.html}
\item[The updated ABC guide]This is a series of bioinformatics
practicals based predominantly on
\EMBOSS.\URL{ftp://ftp.no.embnet.org/pub/ABC}
\item[EMBOSS-FreeBSD-HOWTO]Detailed documentation on installation of
\EMBOSS\ on
FreeBSD.\URL{ftp://ftp.no.embnet.org/pub/EMBOSS-extra/EMBOSS-FreeBSD-HOWTO}
\end{description}

\section{Maintenance of your \EMBOSS\ installation}

\EMBOSS\ is a rapidly evolving software package. It is constantly
being improved, new features added and `issues' resolved. In addition,
new applications are added regularly and you will probably want to
make use of these.

\subsection{Automated installation of \EMBOSS\ and EMBASSY}

Once you have installed \EMBOSS\ and got it to work you have solved
the hardest part of the struggle. Updating \EMBOSS\ as new releases
appear\footnote{\EMBOSS\ is rebuilt nightly from CVS, tested, and,
assuming it passes the compilation tests, the latest version is posted
to the \EMBOSS\ FTP server. } can be quite tedious. UNIX is designed
for the lazy, so here is our lazy man's guide to always having an
up-to-the-minute \EMBOSS\ installation.

The following script can be run manually (it should probably be
`\ilcomm{source}d' rather than executed directly) or can be fired off
with cron (in the early hours of the morning is a good time). It
assumes you are installing \EMBOSS\ outside the source directory and
have write permissions to do so.

Reinstalling in this way updates the files distributed with \EMBOSS\
but will not alter or overwrite your own data files\footnote{Assuming
of course that you haven't overwritten \EMBOSS\ data files with your
own to begin with.} or your \filename{emboss.default}.

\begin{verbatim}

# This script should be sourced, not run.
# EMBOSS UPDATE.
# it assumes \$packages_dir/EMBOSS is a symbolic link to 
# \$mirror_dir/emboss.open-bio.org/pub/EMBOSS
#

#site specific variables: season according to taste..

set mirror_dir=('/ftp/mirrors')
set packages_dir=('/site/newprog')
set emboss_config_options=\
('--prefix=/site/prog/emboss --with-pngdriver=/site/lib')

# Now the script proper

set oldpwd=`pwd`

cd \$mirror_dir
echo 'updating EMBOSS'
if ( `wget -m 'ftp://emboss.open-bio.org/pub/EMBOSS' |& \
  tail -1 | awk '/^Downloaded:/{print \$5}'` != "0" ) then 

    cd \${packages_dir}/EMBOSS
    echo 'new EMBOSS programs found .. installing'
    set latest_emboss=`ls -t EMBOSS*|head -1`

    cd \$packages_dir
    rm -Rf EMBOSS-*
    tar zxf EMBOSS/\$latest_emboss
    set emboss_dir=`ls -dt EMBOSS-*[^z]|head -1`

#the next line is necessary on our system but may not be for yours.
    setenv LD_LIBRARYN32_PATH /site/lib

    cd \$emboss_dir

# If you have any site specific changes to the source code 
# that you want to include, copy them in here

    ./configure \$emboss_config_options &&\
    make && \
    make install

#Now unpack and build EMBASSY

    mkdir embassy
    cd embassy

#Unpack and build each package one at a time

    foreach embassadir ( `ls ../../EMBOSS/*gz | grep -v EMBOSS-` )

	tar zxf \$embassadir
	set embassadir_arch=\$embassadir:t
	set embassadir_root=\$embassadir_arch:r

	cd \$embassadir_root:r
	./configure  \$emboss_config_options &&\
	make && \
	make install

	cd ..
    end
else
    echo 'No new version of EMBOSS available'
endif

cd \$oldpwd
\end{verbatim} 

\subsection{Automated database updating}

In the same way, scripts can be written to automatically update the
biological databases. An example is given here for REBASE. As all the
parameters for \EMBOSS\ programs can be specified on the command line
it is a trivial matter to include index generation in your nightly
update scripts. The management of a bioinformatic resource is beyond
the scope of this document, though \EMBOSS\ goes a long way towards
easing the burden of management.

\subsubsection{Automated update of REBASE}

This script will look for a new version of REBASE and install it in
\EMBOSS\ using \progname{rebaseextract}.

\begin{verbatim}
# This script should be sourced, not run.
# REBASE UPDATE. Should be run just after the beginning of the month.
set mirrors_dir=('/ftp/mirrors')
set oldpwd=`pwd`

cd \$mirrors_dir

if ( ` wget -m 'ftp://ftp.ebi.ac.uk/pub/databases/rebase/*' |& \
  tail -1 | awk '/^Downloaded:/{print \$5}'` != "0" ) then 
	cd ftp.ebi.ac.uk/pub/databases/rebase
	cp `ls -t withrefm.*.Z|head -1` withrefm.Z
	uncompress withrefm.Z
	rebaseextract \
  \${mirrors_dir}/ftp.ebi.ac.uk/pub/databases/rebase/withrefm 
	rm withrefm
endif 

cd \$oldpwd
\end{verbatim}

We make no guarantees that these scripts will work correctly on your
system. If they delete all your files, spam your associates, scratch
your CDs and initiate a nuclear strike on a small unpopulated
Pacific island, it is NOT OUR FAULT.  They just happen to work for us.

\chapter{GNU Free Documentation License}

\begin{verbatim}
		GNU Free Documentation License
		   Version 1.1, March 2000

 Copyright (C) 2000  Free Software Foundation, Inc.
     59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.


0. PREAMBLE

The purpose of this License is to make a manual, textbook, or other
written document "free" in the sense of freedom: to assure everyone
the effective freedom to copy and redistribute it, with or without
modifying it, either commercially or noncommercially.  Secondarily,
this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for
modifications made by others.

This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.  It
complements the GNU General Public License, which is a copyleft
license designed for free software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does.  But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book.  We recommend this License
principally for works whose purpose is instruction or reference.


1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work that contains a
notice placed by the copyright holder saying it can be distributed
under the terms of this License.  The "Document", below, refers to any
such manual or work.  Any member of the public is a licensee, and is
addressed as "you".

A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of
the Document that deals exclusively with the relationship of the
publishers or authors of the Document to the Document's overall subject
(or to related matters) and contains nothing that could fall directly
within that overall subject.  (For example, if the Document is in part a
textbook of mathematics, a Secondary Section may not explain any
mathematics.)  The relationship could be a matter of historical
connection with the subject or with related matters, or of legal,
commercial, philosophical, ethical or political position regarding
them.

The "Invariant Sections" are certain Secondary Sections whose titles
are designated, as being those of Invariant Sections, in the notice
that says that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed,
as Front-Cover Texts or Back-Cover Texts, in the notice that says that
the Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, whose contents can be viewed and edited directly and
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or
for automatic translation to a variety of formats suitable for input
to text formatters.  A copy made in an otherwise Transparent file
format whose markup has been designed to thwart or discourage
subsequent modification by readers is not Transparent.  A copy that is
not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML
or XML using a publicly available DTD, and standard-conforming simple
HTML designed for human modification.  Opaque formats include
PostScript, PDF, proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML produced by some word processors for output
purposes only.

The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page.  For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.


2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies
to the Document are reproduced in all copies, and that you add no other
conditions whatsoever to those of this License.  You may not use
technical measures to obstruct or control the reading or further
copying of the copies you make or distribute.  However, you may accept
compensation in exchange for copies.  If you distribute a large enough
number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and
you may publicly display copies.


3. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100,
and the Document's license notice requires Cover Texts, you must enclose
the copies in covers that carry, clearly and legibly, all these Cover
Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on
the back cover.  Both covers must also clearly and legibly identify
you as the publisher of these copies.  The front cover must present
the full title with all words of the title equally prominent and
visible.  You may add other material on the covers in addition.
Copying with changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions, can be treated
as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent
pages.

If you publish or distribute Opaque copies of the Document numbering
more than 100, you must either include a machine-readable Transparent
copy along with each Opaque copy, or state in or with each Opaque copy
a publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the
general network-using public has access to download anonymously at no
charge using public-standard network protocols.  If you use the latter
option, you must take reasonably prudent steps, when you begin
distribution of Opaque copies in quantity, to ensure that this
Transparent copy will remain thus accessible at the stated location
until at least one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that edition to
the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.


4. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under
the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing distribution
and modification of the Modified Version to whoever possesses a copy
of it.  In addition, you must do these things in the Modified Version:

A. Use in the Title Page (and on the covers, if any) a title distinct
   from that of the Document, and from those of previous versions
   (which should, if there were any, be listed in the History section
   of the Document).  You may use the same title as a previous version
   if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
   responsible for authorship of the modifications in the Modified
   Version, together with at least five of the principal authors of the
   Document (all of its principal authors, if it has less than five).
C. State on the Title page the name of the publisher of the
   Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
   adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice
   giving the public permission to use the Modified Version under the
   terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections
   and required Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add to
   it an item stating at least the title, year, new authors, and
   publisher of the Modified Version as given on the Title Page.  If
   there is no section entitled "History" in the Document, create one
   stating the title, year, authors, and publisher of the Document as
   given on its Title Page, then add an item describing the Modified
   Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for
   public access to a Transparent copy of the Document, and likewise
   the network locations given in the Document for previous versions
   it was based on.  These may be placed in the "History" section.
   You may omit a network location for a work that was published at
   least four years before the Document itself, or if the original
   publisher of the version it refers to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications",
   preserve the section's title, and preserve in the section all the
   substance and tone of each of the contributor acknowledgements
   and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
   unaltered in their text and in their titles.  Section numbers
   or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements".  Such a section
   may not be included in the Modified Version.
N. Do not retitle any existing section as "Endorsements"
   or to conflict in title with any Invariant Section.

If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no material
copied from the Document, you may at your option designate some or all
of these sections as invariant.  To do this, add their titles to the
list of Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section titles.

You may add a section entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties--for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition of a
standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list
of Cover Texts in the Modified Version.  Only one passage of
Front-Cover Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity.  If the Document already
includes a cover text for the same cover, previously added by you or
by arrangement made by the same entity you are acting on behalf of,
you may not add another; but you may replace the old one, on explicit
permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.


5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified
versions, provided that you include in the combination all of the
Invariant Sections of all of the original documents, unmodified, and
list them all as Invariant Sections of your combined work in its
license notice.

The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy.  If there are multiple Invariant Sections with the same name but
different contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the original
author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of
Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections entitled "History"
in the various original documents, forming one section entitled
"History"; likewise combine any sections entitled "Acknowledgements",
and any sections entitled "Dedications".  You must delete all sections
entitled "Endorsements."


6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in
the collection, provided that you follow the rules of this License for
verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.


7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate
and independent documents or works, in or on a volume of a storage or
distribution medium, does not as a whole count as a Modified Version
of the Document, provided no compilation copyright is claimed for the
compilation.  Such a compilation is called an "aggregate", and this
License does not apply to the other self-contained works thus compiled
with the Document, on account of their being thus compiled, if they
are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one quarter
of the entire aggregate, the Document's Cover Texts may be placed on
covers that surround only the Document within the aggregate.
Otherwise they must appear on covers around the whole aggregate.


8. TRANSLATION

Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections.  You may include a
translation of this License provided that you also include the
original English version of this License.  In case of a disagreement
between the translation and the original English version of this
License, the original English version will prevail.


9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License.  Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License.  However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.


10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions
of the GNU Free Documentation License from time to time.  Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.  See
http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number.
If the Document specifies that a particular numbered version of this
License "or any later version" applies to it, you have the option of
following the terms and conditions either of that specified version or
of any later version that has been published (not as a draft) by the
Free Software Foundation.  If the Document does not specify a version
number of this License, you may choose any version ever published (not
as a draft) by the Free Software Foundation.


ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:

      Copyright (c)  YEAR  YOUR NAME.
      Permission is granted to copy, distribute and/or modify this document
      under the terms of the GNU Free Documentation License, Version 1.1
      or any later version published by the Free Software Foundation;
      with the Invariant Sections being LIST THEIR TITLES, with the
      Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
      A copy of the license is included in the section entitled "GNU
      Free Documentation License".

If you have no Invariant Sections, write "with no Invariant Sections"
instead of saying which ones are invariant.  If you have no
Front-Cover Texts, write "no Front-Cover Texts" instead of
"Front-Cover Texts being LIST"; likewise for Back-Cover Texts.

If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License,
to permit their use in free software.
\end{verbatim}

\chapter{Acknowledgements}

The acknowledgements and credits are found at the front of this guide
because no one ever reads them if they are at the back.

\end{document}