/usr/share/EMBOSS/doc/html/embassy/domalign/domainalign.html is in embassy-domalign 0.1.650-1.
<!-- START OF HEADER -->
<HTML><HEAD>
<TITLE> EMBASSY: DOMAINALIGN documentation. </TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" text="#000000">
<table align=center border=0 cellspacing=0 cellpadding=0>
<tr><td valign=top>
<A HREF="/" ONMOUSEOVER="self.status='Go to the EMBOSS home page';return true"><img border=0 src="emboss_icon.jpg" alt="" width=150 height=48></a>
</td>
<td align=left valign=middle>
<b><font size="+6">
<H2> DOMAINALIGN documentation </H2>
</font></b>
</td></tr>
</table>
<br>
<p>
<!-- END OF HEADER -->
<!-- CONTENTS
This always includes the sections below.
Other subsections can be added for individual applications.
-->
<br><H2>CONTENTS </H2>
<b> <a href="#1.0">1.0 SUMMARY </a></b><br>
<b> <a href="#2.0">2.0 INPUTS & OUTPUTS </a></b><br>
<b> <a href="#3.0">3.0 INPUT FILE FORMAT </a></b><br>
<b> <a href="#4.0">4.0 OUTPUT FILE FORMAT </a></b><br>
<b> <a href="#5.0">5.0 DATA FILES </a></b><br>
<b> <a href="#6.0">6.0 USAGE </a></b><br>
<b> <a href="#7.0">7.0 KNOWN BUGS & WARNINGS </a></b><br>
<b> <a href="#8.0">8.0 NOTES </a></b><br>
<b> <a href="#9.0">9.0 DESCRIPTION </a></b><br>
<b> <a href="#10.0">10.0 ALGORITHM </a></b><br>
<b> <a href="#11.0">11.0 RELATED APPLICATIONS </a></b><br>
<b> <a href="#12.0">12.0 DIAGNOSTIC ERROR MESSAGES </a></b><br>
<b> <a href="#13.0">13.0 AUTHORS </a></b><br>
<b> <a href="#14.0">14.0 REFERENCES </a></b><br>
<!-- SUMMARY
Succinct description of the application, particularly its inputs, outputs
and what it does. The same text is given at the top of the source (.c)
file and in the <documentation> attribute of the <application definition>
of the ACD file.
-->
<a name="1.0"></a>
<br><br><br><H2> 1.0 SUMMARY </H2>
Generate alignments (DAF file) for nodes in a DCF file
<!-- INPUTS & OUTPUTS
Short summary of the application inputs and outputs in its different
modes of usage (if appropriate). More detail than the summary.
-->
<a name="2.0"></a>
<br><br><br><H2> 2.0 INPUTS & OUTPUTS </H2>
DOMAINALIGN reads a DCF file (domain classification file) and generates a DAF file (domain alignment file) for each user-defined node (e.g. family or superfamily) in it. Each DAF file contains a structure-based sequence alignment annotated with domain classification data. If the STAMP algorithm is used, structural superimpositions are also generated and saved to file (PDB format). The alignments are calculated with STAMP or TCOFFEE, and these applications must be installed on the system that is running DOMAINALIGN (see 'Notes' below).
<br>No alignment can be generated for a node with only a single entry (domain); sequences for such domains are (optionally) written to file (FASTA format).
<br>DOMAINALIGN requires a directory of <a href ="domainalign.html#ref1"> domain PDB files</a>; the path and extension of these must be set by the user (via the ACD file) and also specified in the STAMP "pdb.directories" file (see 'Notes' below).
<br>A log file of diagnostic messages is written. The identifiers (e.g. SCOP Sunid) of the nodes from the DCF file are used to name the output files. The user also specifies the input file, the paths for the two types of alignment files (output), the path of the singlet sequence files (if output) and the name of the log file.
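<br>The split between nodes that can be aligned and singlet nodes can be sketched in Python as follows. This is only an illustration, not DOMAINALIGN's own code: the function names are invented here, the last domain in the list is hypothetical, and the "daf" extension is simply the one seen in the usage example.
<table><td bgcolor="#FFCCFF">
<pre>
```python
from collections import defaultdict


def split_nodes(domains):
    """Group (domain_id, node_id) pairs by node.

    Returns (aligned, singlets): nodes with two or more domains, which can
    be aligned, and nodes with exactly one domain, which cannot.
    """
    by_node = defaultdict(list)
    for domain_id, node_id in domains:
        by_node[node_id].append(domain_id)
    aligned = {n: ids for n, ids in by_node.items() if len(ids) >= 2}
    singlets = {n: ids[0] for n, ids in by_node.items() if len(ids) == 1}
    return aligned, singlets


def output_name(node_id, extension):
    """Name an output file after its node identifier (extension is assumed)."""
    return "{}.{}".format(node_id, extension)


# Family Sunids taken from the usage example; the last entry is made up
# purely to illustrate a singlet node.
domains = [
    ("d1cs4a_", 55074),
    ("d1fx2a_", 55074),
    ("d4at1b1", 54894),
    ("d4at1d1", 54894),
    ("dhypo01", 99999),  # hypothetical domain and node
]
aligned, singlets = split_nodes(domains)
for node_id in sorted(aligned):
    print(output_name(node_id, "daf"), aligned[node_id])
```
</pre>
</td></table>
Only the nodes in <b>aligned</b> would be passed to STAMP or TCOFFEE; the domains in <b>singlets</b> correspond to the optional singlet sequence files.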
<!-- INPUT FILE FORMAT
Description and example(s) of input file formats. Should provide enough
information to write and parse the file. Should describe the format in
unusual cases - null input, etc.
Cannot use the test data files because they might be empty or need
hand-editing
Use "<b>DOMAINALIGN</b> reads any normal sequence USAs." if
appropriate.
-->
<a name="3.0"></a>
<br><br><br><H2> 3.0 INPUT FILE FORMAT </H2>
The format of the DCF (domain classification file) is described in the <a href="scopparse.html">SCOPPARSE documentation</a>.
<a name="input.1"></a>
<h3>Input files for usage example </h3>
<p><h3>File: all.scop2</h3>
<table width="90%"><tr><td bgcolor="#FFCCFF">
<pre>
ID D1CS4A_
XX
EN 1CS4
XX
TY SCOP
XX
SI 53931 CL; 54861 FO; 55073 SF; 55074 FA; 55077 DO; 55078 SO; 39418 DD;
XX
CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Adenylyl and guanylyl cyclase catalytic domain
XX
FA Adenylyl and guanylyl cyclase catalytic domain
XX
DO Adenylyl cyclase VC1, domain C1a
XX
OS Dog (Canis familiaris)
XX
NC 1
XX
CN [1]
XX
CH A CHAIN; . START; . END;
//
ID D1FX2A_
XX
EN 1FX2
XX
TY SCOP
XX
SI 53931 CL; 54861 FO; 55073 SF; 55074 FA; 55081 DO; 55082 SO; 39430 DD;
XX
CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Adenylyl and guanylyl cyclase catalytic domain
XX
FA Adenylyl and guanylyl cyclase catalytic domain
XX
DO Receptor-type monomeric adenylyl cyclase
XX
OS Trypanosome (Trypanosoma brucei), different isoform
XX
NC 1
XX
CN [1]
XX
<font color=red> [Part of this file has been deleted for brevity]</font>
XX
EN 4AT1
XX
TY SCOP
XX
SI 53931 CL; 54861 FO; 54893 SF; 54894 FA; 54895 DO; 54896 SO; 39019 DD;
XX
CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
FA Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
DO Aspartate carbamoyltransferase
XX
OS Escherichia coli
XX
NC 1
XX
CN [1]
XX
CH B CHAIN; 8 START; 100 END;
//
ID D4AT1D1
XX
EN 4AT1
XX
TY SCOP
XX
SI 53931 CL; 54861 FO; 54893 SF; 54894 FA; 54895 DO; 54896 SO; 39020 DD;
XX
CL Alpha and beta proteins (a+b)
XX
FO Ferredoxin-like
XX
SF Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
FA Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
DO Aspartate carbamoyltransferase
XX
OS Escherichia coli
XX
NC 1
XX
CN [1]
XX
CH D CHAIN; 8 START; 100 END;
//
</pre>
</td></tr></table><p>
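<p>A minimal Python sketch of reading entries laid out like the example above. It assumes only what the example shows: a two-letter record code at the start of each line, "XX" spacer lines, and "//" closing each entry. The function name is invented here, and the SCOPPARSE documentation remains the authoritative description of the format.
<table><td bgcolor="#FFCCFF">
<pre>
```python
def parse_dcf(text):
    """Return a list of dicts, one per DCF entry (entries end with '//')."""
    entries = []
    entry = {}
    for line in text.splitlines():
        fields = line.split(None, 1)
        if not fields or fields[0] == "XX":
            continue  # blank line or spacer record
        if fields[0] == "//":
            if entry:
                entries.append(entry)
            entry = {}
            continue
        code = fields[0]
        value = fields[1].strip() if len(fields) == 2 else ""
        if code == "SI":
            # e.g. '53931 CL; 54861 FO; ...' maps level code to identifier
            pairs = [p.split() for p in value.split(";") if p.strip()]
            entry[code] = {level: int(ident) for ident, level in pairs}
        else:
            # Simplification: a repeated code (e.g. several CH lines)
            # keeps only the last value.
            entry[code] = value
    return entries


# First entry of the example file, shortened to a few records.
sample = """ID D1CS4A_
XX
EN 1CS4
XX
SI 53931 CL; 54861 FO; 55073 SF; 55074 FA; 55077 DO; 55078 SO; 39418 DD;
XX
FA Adenylyl and guanylyl cyclase catalytic domain
XX
CH A CHAIN; . START; . END;
//
"""
entries = parse_dcf(sample)
print(entries[0]["ID"], entries[0]["SI"]["FA"])
```
</pre>
</td></table>
The identifier stored under <b>SI</b> for the chosen level (here FA, the family) is the node identifier that DOMAINALIGN uses to name its output files.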
<!-- OUTPUT FILE FORMAT
Description and example(s) of output file formats. Should provide enough
information to write and parse the file. Should describe the format in
unusual cases - null input, etc.
If the standard description of the available report formats is required,
use: #include file="inc/reportformats.ihtml"
Use "Outputs a graph to the specified graphics device."
or "outputs a report format file. The default format is ..."
if appropriate.
-->
<a name="4.0"></a>
<br><br><br><H2> 4.0 OUTPUT FILE FORMAT </h2>
<b> Structure-based sequence alignment </b><br>
The DAF (domain alignment file) format (Figure 1) consists of an alignment in EMBOSS "simple" multiple sequence alignment format with domain classification records.
All lines other than sequence lines begin with '#' to denote a comment.
The domain classification records for the appropriate node from the DCF file are given at the top of the file above the alignment. The records shown are TY (domain type, either SCOP or CATH), CL (class), FO (fold), SF (superfamily) and FA (family). For CATH domains, AR (architecture) and TP (topology) may also be given. Regardless of which node (family, superfamily etc.) is represented, a <b>unique identifier</b> for the node is given after <b>SI</b>.
Below the classification records, there are blocks that contain the sequence names, positions and aligned sequences. The names are the 7-character domain identifier codes taken from the DCF file. The positions are the start and end residue positions of the appropriate section of sequence. The sequence uses '-' as a gap character. The STAMP 'Post similar' line is given as a markup line underneath the sequence, but no DSSP assignments are written.
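<br>The layout described above can be read with a few lines of Python. This is a hedged sketch, not an EMBOSS parser: the function name is invented, sequence lines are assumed to hold exactly four whitespace-separated fields (name, start, sequence, end), and the sample sequences are shortened.
<table><td bgcolor="#FFCCFF">
<pre>
```python
def parse_daf(text):
    """Return (records, sequences) from DAF text.

    records maps two-letter classification codes (TY, CL, ..., SI) to values;
    sequences maps each domain name to its aligned sequence, joined over
    all alignment blocks.
    """
    records = {}
    sequences = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line[1:].split(None, 1)
            # Keep two-letter record codes; skip 'XX' spacers and the
            # 'Number' and 'Post_similar' markup lines.
            if fields and len(fields[0]) == 2 and fields[0] != "XX":
                records[fields[0]] = fields[1] if len(fields) == 2 else ""
            continue
        parts = line.split()
        if len(parts) != 4:
            continue  # not a 'name start sequence end' line
        name, _start, seq, _end = parts
        sequences[name] = sequences.get(name, "") + seq
    return records, sequences


# Shortened sample in the layout of the usage example output.
sample = """# TY SCOP
# XX
# SI 54894
# XX
# Number 10
d4at1b1 0 GVEAIKRGTV 0
d4at1d1 0 GVEAIKRGTV 0
# Post_similar 1111111111
# Number 20
d4at1b1 0 IDHIPAQIGF 0
d4at1d1 0 IDHIPAQIGF 0
# Post_similar 1111111111
"""
records, sequences = parse_daf(sample)
print(records["SI"], sequences["d4at1b1"])
```
</pre>
</td></table>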
<a name="output.1"></a>
<h3>Output files for usage example </h3>
<p><h3>File: 54894.ent</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
REMARK Output from transform
REMARK STAMP Package (Russell and Barton Proteins, 14, 309-323, 1992)
REMARK Domains were read from the file ./domainalign-1234567890.1234.sort
REMARK Chains are labelled sequentially starting with 'A' and
REMARK after the order given in the file ./domainalign-1234567890.1234.sort
REMARK The domains in this file are:
REMARK d4at1b1 chain A
REMARK d4at1d1_1 chain B
REMARK Does not include heteroatoms
REMARK Does not include DNA/RNA
REMARK Does not include waters
ATOM 1 N GLY A 8 13.557 82.383 35.829 1.00 92.06 N
ATOM 2 CA GLY A 8 14.363 82.355 34.620 1.00 92.63 C
ATOM 3 C GLY A 8 14.480 83.797 34.150 1.00 91.80 C
ATOM 4 O GLY A 8 13.625 84.596 34.527 1.00 93.34 O
ATOM 5 N VAL A 9 15.499 84.132 33.352 1.00 90.04 N
ATOM 6 CA VAL A 9 15.799 85.525 33.008 1.00 89.55 C
ATOM 7 C VAL A 9 15.294 86.094 31.646 1.00 90.94 C
ATOM 8 O VAL A 9 14.395 85.528 31.008 1.00 91.46 O
ATOM 9 CB VAL A 9 17.381 85.731 33.167 1.00 88.13 C
ATOM 10 CG1 VAL A 9 17.571 86.421 34.506 1.00 86.27 C
ATOM 11 CG2 VAL A 9 18.232 84.456 33.191 1.00 86.95 C
ATOM 12 N GLU A 10 15.892 87.223 31.201 1.00 89.98 N
ATOM 13 CA GLU A 10 15.543 87.988 30.004 1.00 87.19 C
ATOM 14 C GLU A 10 16.748 88.251 29.055 1.00 83.25 C
ATOM 15 O GLU A 10 17.879 87.817 29.318 1.00 81.89 O
ATOM 16 CB GLU A 10 14.876 89.324 30.474 1.00 89.77 C
ATOM 17 CG GLU A 10 15.390 90.058 31.748 1.00 93.53 C
ATOM 18 CD GLU A 10 16.862 90.513 31.791 1.00 96.84 C
ATOM 19 OE1 GLU A 10 17.570 90.129 32.731 1.00 98.75 O
ATOM 20 OE2 GLU A 10 17.306 91.258 30.905 1.00 97.00 O
ATOM 21 N ALA A 11 16.471 89.023 27.987 1.00 77.76 N
ATOM 22 CA ALA A 11 17.315 89.379 26.829 1.00 71.32 C
ATOM 23 C ALA A 11 18.829 89.525 26.622 1.00 65.96 C
ATOM 24 O ALA A 11 19.605 90.124 27.370 1.00 64.35 O
ATOM 25 CB ALA A 11 16.692 90.608 26.193 1.00 70.48 C
ATOM 26 N ILE A 12 19.133 88.956 25.457 1.00 62.08 N
ATOM 27 CA ILE A 12 20.419 88.929 24.778 1.00 60.99 C
ATOM 28 C ILE A 12 19.984 89.116 23.325 1.00 61.87 C
ATOM 29 O ILE A 12 18.841 88.755 22.999 1.00 62.16 O
ATOM 30 CB ILE A 12 21.198 87.545 24.850 1.00 59.43 C
ATOM 31 CG1 ILE A 12 20.354 86.363 24.347 1.00 55.38 C
ATOM 32 CG2 ILE A 12 21.619 87.298 26.290 1.00 59.29 C
ATOM 33 CD1 ILE A 12 21.107 85.072 24.029 1.00 50.25 C
ATOM 34 N LYS A 13 20.745 89.683 22.389 1.00 62.88 N
ATOM 35 CA LYS A 13 20.307 89.668 20.991 1.00 63.76 C
ATOM 36 C LYS A 13 21.229 88.611 20.439 1.00 63.26 C
ATOM 37 O LYS A 13 22.425 88.631 20.755 1.00 63.54 O
ATOM 38 CB LYS A 13 20.570 90.941 20.133 1.00 65.83 C
ATOM 39 CG LYS A 13 21.100 92.232 20.765 1.00 68.73 C
<font color=red> [Part of this file has been deleted for brevity]</font>
ATOM 675 O LYS B 94 12.694 74.659 17.436 1.00 60.00 O
ATOM 676 CB LYS B 94 11.414 76.090 20.144 1.00 54.56 C
ATOM 677 CG LYS B 94 10.901 77.252 20.962 1.00 56.83 C
ATOM 678 CD LYS B 94 9.923 76.671 21.990 1.00 59.11 C
ATOM 679 CE LYS B 94 9.043 77.717 22.641 1.00 60.10 C
ATOM 680 NZ LYS B 94 9.890 78.702 23.259 1.00 64.17 N
ATOM 681 N SER B 95 13.444 74.034 19.484 1.00 54.16 N
ATOM 682 CA SER B 95 13.785 72.651 19.195 1.00 49.80 C
ATOM 683 C SER B 95 13.767 71.904 20.531 1.00 49.50 C
ATOM 684 O SER B 95 13.715 72.494 21.617 1.00 50.66 O
ATOM 685 CB SER B 95 15.186 72.559 18.595 1.00 49.06 C
ATOM 686 OG SER B 95 15.488 73.550 17.615 1.00 48.30 O
ATOM 687 N ARG B 96 13.817 70.590 20.497 1.00 50.27 N
ATOM 688 CA ARG B 96 13.868 69.771 21.694 1.00 49.58 C
ATOM 689 C ARG B 96 14.839 68.647 21.295 1.00 47.31 C
ATOM 690 O ARG B 96 15.006 68.414 20.088 1.00 44.69 O
ATOM 691 CB ARG B 96 12.426 69.305 21.983 1.00 53.09 C
ATOM 692 CG ARG B 96 12.197 68.404 23.200 1.00 58.05 C
ATOM 693 CD ARG B 96 10.794 68.557 23.787 1.00 61.70 C
ATOM 694 NE ARG B 96 10.618 69.895 24.354 1.00 66.11 N
ATOM 695 CZ ARG B 96 9.772 70.814 23.850 1.00 69.37 C
ATOM 696 NH1 ARG B 96 9.702 72.024 24.422 1.00 71.16 N
ATOM 697 NH2 ARG B 96 8.988 70.558 22.790 1.00 70.99 N
ATOM 698 N PRO B 97 15.591 68.002 22.198 1.00 47.31 N
ATOM 699 CA PRO B 97 16.438 66.853 21.917 1.00 47.03 C
ATOM 700 C PRO B 97 15.755 65.610 21.357 1.00 46.69 C
ATOM 701 O PRO B 97 14.800 65.056 21.929 1.00 47.44 O
ATOM 702 CB PRO B 97 17.138 66.581 23.238 1.00 48.99 C
ATOM 703 CG PRO B 97 17.256 67.940 23.887 1.00 49.17 C
ATOM 704 CD PRO B 97 15.868 68.473 23.559 1.00 49.45 C
ATOM 705 N SER B 98 16.313 65.239 20.191 1.00 45.02 N
ATOM 706 CA SER B 98 16.016 64.022 19.436 1.00 41.65 C
ATOM 707 C SER B 98 17.197 63.101 19.711 1.00 39.46 C
ATOM 708 O SER B 98 18.327 63.608 19.718 1.00 38.78 O
ATOM 709 CB SER B 98 15.952 64.291 17.940 1.00 42.24 C
ATOM 710 OG SER B 98 14.657 64.744 17.586 1.00 46.01 O
ATOM 711 N LEU B 99 16.992 61.793 19.950 1.00 36.31 N
ATOM 712 CA LEU B 99 18.078 60.869 20.273 1.00 33.04 C
ATOM 713 C LEU B 99 19.004 60.692 19.051 1.00 33.65 C
ATOM 714 O LEU B 99 18.467 60.339 17.994 1.00 35.86 O
ATOM 715 CB LEU B 99 17.420 59.573 20.697 1.00 29.39 C
ATOM 716 CG LEU B 99 18.126 58.649 21.690 1.00 28.47 C
ATOM 717 CD1 LEU B 99 18.261 59.352 23.048 1.00 27.24 C
ATOM 718 CD2 LEU B 99 17.332 57.342 21.817 1.00 25.66 C
ATOM 719 N PRO B 100 20.325 60.971 19.041 1.00 33.12 N
ATOM 720 CA PRO B 100 21.146 60.885 17.830 1.00 35.17 C
ATOM 721 C PRO B 100 21.517 59.413 17.592 1.00 38.22 C
ATOM 722 O PRO B 100 21.547 58.620 18.542 1.00 36.98 O
ATOM 723 CB PRO B 100 22.317 61.776 18.143 1.00 34.28 C
ATOM 724 CG PRO B 100 22.530 61.464 19.626 1.00 33.62 C
ATOM 725 CD PRO B 100 21.124 61.360 20.204 1.00 31.93 C
</pre>
</td></tr></table><p>
<p><h3>Directory: daf</h3>
<p>This directory contains output files, for example 54894.daf and 55074.daf.
<p><h3>File: daf/54894.daf</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
# TY SCOP
# XX
# CL Alpha and beta proteins (a+b)
# XX
# FO Ferredoxin-like
# XX
# SF Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# FA Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# SI 54894
# XX
# Number 10 20 30 40 50
d4at1b1 0 GVEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPXSGEMGRKDLIKIEN 0
d4at1d1 0 GVEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGXEMGRKDLIKIEN 0
# Post_similar 111111111111111111111111111111111111111111-0-111111111111
# Number 60 70 80 90
d4at1b1 0 TFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP 0
d4at1d1 0 TFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP 0
# Post_similar 1111111111111111111111111111111111111
</pre>
</td></tr></table><p>
<p><h3>File: daf/55074.daf</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
# TY SCOP
# XX
# CL Alpha and beta proteins (a+b)
# XX
# FO Ferredoxin-like
# XX
# SF Adenylyl and guanylyl cyclase catalytic domain
# XX
# FA Adenylyl and guanylyl cyclase catalytic domain
# XX
# SI 55074
# XX
# Number 10 20 30 40 50
d1cs4a_ 0 MMFHKIYIQKXHXDNVSILFADIEGFTSLASQCTAQELVMTLNELFARFDKLAAENH 0
d1fx2a_ 0 XXNNNRAPXKEPTDPVTLIFTDIESSTALWAAXHPDLMPDAVAAHHRMVRSLIGRYK 0
# Post_similar --000000-0-0-1111111111111111111-000111111111111111111111
# Number 60 70 80 90 100 110
d1cs4a_ 0 CLRIKILGDCYYCVSGLPEARADHAHCCVEMGMDMIEAISLVREMXTXGXXXXXXXX 0
d1fx2a_ 0 CYEVKTVGDSFMIAXXXXSKXXXSPFAAVQLAQELQLCFLHXHDWGTNALDDSYREF 0
# Post_similar 11111111111111----00---111111111111111111-000-0-0--------
# Number 120 130 140 150 160 170
d1cs4a_ 0 XXXXXXXXXXXXXXXXXXXXXXXXXXXVNVNMRVGIHSGRVHXCGVLGLRKWQFDVW 0
d1fx2a_ 0 EEQRAEGECEYTPPTAHMDPEVYSRLWNGLRVRVGIHTGLCDIRHDXEXVTKGYDYY 0
# Post_similar ---------------------------011111111111111-111-0-00001111
# Number 180 190 200 210 220
d1cs4a_ 0 SNDVTLANHMEAGGKAGRIHITKATLSYLNXXXGXDYEVEPGCGGERNXAYLKEHSI 0
d1fx2a_ 0 GRTPNMAARTESVANGGQVLMTHAAYMSLSAEDRKQIDVTALXGDXVALRGXVSDPV 0
# Post_similar 111111111111111111111111111111---0-1111111-00-00-00-00111
# Number 230 240 250
d1cs4a_ 0 ETFLILXXXXXXXXXXXXXXXXXX 0
d1fx2a_ 0 KMYQLNTVPSRNFAALRLDREYFD 0
# Post_similar 111111------------------
</pre>
</td></tr></table><p>
<p><h3>File: 55074.ent</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
REMARK Output from transform
REMARK STAMP Package (Russell and Barton Proteins, 14, 309-323, 1992)
REMARK Domains were read from the file ./domainalign-1234567890.1234.sort
REMARK Chains are labelled sequentially starting with 'A' and
REMARK after the order given in the file ./domainalign-1234567890.1234.sort
REMARK The domains in this file are:
REMARK d1cs4a_ chain A
REMARK d1fx2a__1 chain B
REMARK Does not include heteroatoms
REMARK Does not include DNA/RNA
REMARK Does not include waters
ATOM 1 N MET A 377 28.568 -27.770 32.255 1.00 73.77 N
ATOM 2 CA MET A 377 28.292 -26.443 32.794 1.00 72.28 C
ATOM 3 C MET A 377 29.325 -25.377 32.396 1.00 69.48 C
ATOM 4 O MET A 377 30.485 -25.687 32.098 1.00 67.04 O
ATOM 5 CB MET A 377 28.075 -26.504 34.312 1.00 74.79 C
ATOM 6 CG MET A 377 29.171 -27.205 35.092 1.00 78.73 C
ATOM 7 SD MET A 377 28.708 -27.446 36.824 1.00 83.74 S
ATOM 8 CE MET A 377 28.745 -25.745 37.440 1.00 81.94 C
ATOM 9 N MET A 378 28.883 -24.120 32.395 1.00 66.44 N
ATOM 10 CA MET A 378 29.698 -22.969 32.011 1.00 62.94 C
ATOM 11 C MET A 378 30.928 -22.727 32.886 1.00 59.70 C
ATOM 12 O MET A 378 32.059 -22.739 32.400 1.00 57.00 O
ATOM 13 CB MET A 378 28.824 -21.715 31.966 1.00 64.01 C
ATOM 14 CG MET A 378 27.551 -21.872 31.137 1.00 64.35 C
ATOM 15 SD MET A 378 27.897 -22.219 29.403 1.00 67.83 S
ATOM 16 CE MET A 378 27.844 -24.014 29.357 1.00 65.41 C
ATOM 17 N PHE A 379 30.697 -22.474 34.167 1.00 56.03 N
ATOM 18 CA PHE A 379 31.763 -22.225 35.123 1.00 53.16 C
ATOM 19 C PHE A 379 32.157 -23.501 35.837 1.00 51.53 C
ATOM 20 O PHE A 379 31.372 -24.440 35.914 1.00 53.09 O
ATOM 21 CB PHE A 379 31.296 -21.227 36.186 1.00 54.14 C
ATOM 22 CG PHE A 379 31.091 -19.832 35.671 1.00 51.13 C
ATOM 23 CD1 PHE A 379 29.927 -19.493 34.986 1.00 50.12 C
ATOM 24 CD2 PHE A 379 32.049 -18.846 35.907 1.00 49.63 C
ATOM 25 CE1 PHE A 379 29.716 -18.196 34.546 1.00 51.63 C
ATOM 26 CE2 PHE A 379 31.852 -17.545 35.474 1.00 50.16 C
ATOM 27 CZ PHE A 379 30.682 -17.216 34.792 1.00 51.72 C
ATOM 28 N HIS A 380 33.371 -23.529 36.372 1.00 50.09 N
ATOM 29 CA HIS A 380 33.830 -24.687 37.120 1.00 49.62 C
ATOM 30 C HIS A 380 33.046 -24.672 38.431 1.00 49.26 C
ATOM 31 O HIS A 380 32.579 -23.623 38.866 1.00 50.30 O
ATOM 32 CB HIS A 380 35.327 -24.574 37.439 1.00 52.29 C
ATOM 33 CG HIS A 380 36.235 -24.929 36.299 1.00 53.66 C
ATOM 34 ND1 HIS A 380 36.782 -23.983 35.457 1.00 53.12 N
ATOM 35 CD2 HIS A 380 36.737 -26.122 35.898 1.00 52.37 C
ATOM 36 CE1 HIS A 380 37.581 -24.576 34.588 1.00 50.60 C
ATOM 37 NE2 HIS A 380 37.572 -25.873 34.833 1.00 52.60 N
ATOM 38 N LYS A 381 32.879 -25.832 39.051 1.00 50.19 N
ATOM 39 CA LYS A 381 32.167 -25.897 40.318 1.00 51.93 C
<font color=red> [Part of this file has been deleted for brevity]</font>
ATOM 1806 CG ASP B1117 55.881 2.260 56.337 1.00 70.08 C
ATOM 1807 OD1 ASP B1117 55.414 2.017 55.197 1.00 73.38 O
ATOM 1808 OD2 ASP B1117 57.084 2.059 56.637 1.00 70.53 O
ATOM 1809 N ARG B1118 52.764 0.077 56.474 1.00 66.61 N
ATOM 1810 CA ARG B1118 52.481 -1.377 56.573 1.00 68.20 C
ATOM 1811 C ARG B1118 53.212 -2.511 55.828 1.00 68.77 C
ATOM 1812 O ARG B1118 54.421 -2.488 55.587 1.00 70.57 O
ATOM 1813 CB ARG B1118 51.033 -1.596 56.196 1.00 71.02 C
ATOM 1814 CG ARG B1118 49.988 -0.730 56.780 1.00 66.89 C
ATOM 1815 CD ARG B1118 48.805 -0.985 55.855 1.00 69.79 C
ATOM 1816 NE ARG B1118 47.548 -0.457 56.346 1.00 65.16 N
ATOM 1817 CZ ARG B1118 47.350 0.805 56.714 1.00 67.30 C
ATOM 1818 NH1 ARG B1118 48.318 1.717 56.649 1.00 62.67 N
ATOM 1819 NH2 ARG B1118 46.198 1.137 57.253 1.00 60.83 N
ATOM 1820 N GLU B1119 52.370 -3.490 55.468 1.00 74.62 N
ATOM 1821 CA GLU B1119 52.576 -4.775 54.756 1.00 71.44 C
ATOM 1822 C GLU B1119 51.216 -5.345 55.152 1.00 68.30 C
ATOM 1823 O GLU B1119 50.787 -5.071 56.255 1.00 70.28 O
ATOM 1824 CB GLU B1119 53.628 -5.683 55.451 1.00 72.63 C
ATOM 1825 CG GLU B1119 53.056 -6.991 56.217 1.00 68.78 C
ATOM 1826 CD GLU B1119 52.428 -6.728 57.622 1.00 67.37 C
ATOM 1827 OE1 GLU B1119 52.888 -5.743 58.270 1.00 63.79 O
ATOM 1828 OE2 GLU B1119 51.452 -7.444 58.067 1.00 24.30 O
ATOM 1829 N TYR B1120 50.504 -6.074 54.309 1.00 69.56 N
ATOM 1830 CA TYR B1120 49.237 -6.679 54.769 1.00 66.82 C
ATOM 1831 C TYR B1120 49.096 -7.992 54.006 1.00 69.41 C
ATOM 1832 O TYR B1120 49.547 -8.082 52.863 1.00 60.86 O
ATOM 1833 CB TYR B1120 48.062 -5.721 54.629 1.00 64.84 C
ATOM 1834 CG TYR B1120 46.960 -6.107 53.681 1.00 72.65 C
ATOM 1835 CD1 TYR B1120 47.056 -5.809 52.324 1.00 69.04 C
ATOM 1836 CD2 TYR B1120 45.764 -6.658 54.154 1.00 72.45 C
ATOM 1837 CE1 TYR B1120 45.996 -6.038 51.461 1.00 74.31 C
ATOM 1838 CE2 TYR B1120 44.688 -6.892 53.292 1.00 74.14 C
ATOM 1839 CZ TYR B1120 44.815 -6.576 51.947 1.00 77.01 C
ATOM 1840 OH TYR B1120 43.765 -6.791 51.087 1.00 73.74 O
ATOM 1841 N PHE B1121 48.537 -9.016 54.663 1.00 67.32 N
ATOM 1842 CA PHE B1121 48.401 -10.387 54.120 1.00 68.14 C
ATOM 1843 C PHE B1121 48.561 -10.745 52.626 1.00 68.72 C
ATOM 1844 O PHE B1121 49.618 -11.228 52.233 1.00 71.84 O
ATOM 1845 CB PHE B1121 47.210 -11.123 54.717 1.00 64.05 C
ATOM 1846 CG PHE B1121 47.408 -12.619 54.794 1.00 61.04 C
ATOM 1847 CD1 PHE B1121 48.271 -13.164 55.734 1.00 58.84 C
ATOM 1848 CD2 PHE B1121 46.737 -13.479 53.925 1.00 63.12 C
ATOM 1849 CE1 PHE B1121 48.466 -14.540 55.815 1.00 60.74 C
ATOM 1850 CE2 PHE B1121 46.925 -14.859 53.999 1.00 59.31 C
ATOM 1851 CZ PHE B1121 47.788 -15.392 54.941 1.00 64.18 C
ATOM 1852 N ASP B1122 47.527 -10.585 51.805 1.00 72.82 N
ATOM 1853 CA ASP B1122 47.656 -10.920 50.374 1.00 73.64 C
ATOM 1854 C ASP B1122 46.520 -10.365 49.515 1.00 74.17 C
ATOM 1855 O ASP B1122 46.464 -10.761 48.332 1.00 75.01 O
ATOM 1856 CB ASP B1122 47.790 -12.446 50.171 1.00 73.22 C
</pre>
</td></tr></table><p>
<p><h3>File: domainalign.log</h3>
<table width="90%"><tr><td bgcolor="#CCFFCC">
<pre>
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
Replaced ' ' in STAMP alignment with 'X'
</pre>
</td></tr></table><p>
<!--
<br><br><b>Figure 1 Example of DOMAINALIGN output file (structure-based sequence alignment) </b>
<table><td bgcolor="#CFCCFF">
<pre>
# TY SCOP
# XX
# CL Alpha and beta proteins (a+b)
# XX
# FO Ferredoxin-like
# XX
# SF Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# FA Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# SI 54894
# XX
# Number 10 20 30 40 50
d4at1b1 0 GVEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLP-SGEMGRKDLIKIEN 0
d4at1d1 0 GVEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSG-EMGRKDLIKIEN 0
# Post_similar 111111111111111111111111111111111111111111-0-111111111111
# Number 60 70 80 90
d4at1b1 0 TFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP 0
d4at1d1 0 TFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP 0
# Post_similar 1111111111111111111111111111111111111
</pre>
</td></table>
<br><br>The EMBOSS simple format is similar to the output file generated by STAMP when issued with the following three types of command:
<table><td bgcolor="#FFCCFF">
<pre>
(1) stamp -l ./stamps_file.dom -s -n 2 -slide 5 -prefix ./stamps_file -d
./stamps_file.set;sorttrans -f ./stamps_file.scan -s Sc 2.5 >
./stamps_file.sort;stamp -l ./stamps_file.sort -prefix ./stamps_file >
./stamps_file.log
(2) poststamp -f ./stamps_file.3 -min 0.5
(3) ver2hor -f ./stamps_file.3.post > ./stamps_file.out
</pre>
</td></table>
<br><br>
<b> Structural superimposition </b><br>
A structural superimposition is generated if the STAMP algorithm is selected.
PDB format is used for the DOMAINALIGN structure alignment (superimposition) output file
(Figure 2). This is unmodified STAMP output. A detailed
explanation of the PDB file format is available on the PDB web site:
<a href="http://www.rcsb.org/pdb/info.html#File_Formats_and_Standards">http://www.rcsb.org/pdb/info.html#File_Formats_and_Standards</a>
<br><br><b>Figure 2 Excerpt of DOMAINALIGN output file (structure alignment) </b>
<table><td bgcolor="#CFCCFF">
<pre>
REMARK Output from transform
REMARK STAMP Package (Russell and Barton Proteins, 14, 309-323, 1992)
REMARK Domains were read from the file ./domainalign-1031313039.24319.sort
REMARK Chains are labelled sequentially starting with 'A' and
REMARK after the order given in the file ./domainalign-1031313039.24319.sort
REMARK The domains in this file are:
REMARK d1cs4a_ chain A
REMARK d1fx2a__1 chain B
REMARK Does not include heteroatoms
REMARK Does not include DNA/RNA
REMARK Does not include waters
ATOM 1 N MET A 22 28.568 -27.770 32.255 1.00 73.77 N
ATOM 2 CA MET A 22 28.292 -26.443 32.794 1.00 72.28 C
ATOM 3 C MET A 22 29.325 -25.377 32.396 1.00 69.48 C
ATOM 4 O MET A 22 30.485 -25.687 32.098 1.00 67.04 O
ATOM 5 CB MET A 22 28.075 -26.504 34.312 1.00 74.79 C
ATOM 6 CG MET A 22 29.171 -27.205 35.092 1.00 78.73 C
ATOM 7 SD MET A 22 28.708 -27.446 36.824 1.00 83.74 S
ATOM 8 CE MET A 22 28.745 -25.745 37.440 1.00 81.94 C
ATOM 9 N MET A 23 28.883 -24.120 32.395 1.00 66.44 N
ATOM 10 CA MET A 23 29.698 -22.969 32.011 1.00 62.94 C
ATOM 11 C MET A 23 30.928 -22.727 32.886 1.00 59.70 C
ATOM 12 O MET A 23 32.059 -22.739 32.400 1.00 57.00 O
ATOM 13 CB MET A 23 28.824 -21.715 31.966 1.00 64.01 C
ATOM 14 CG MET A 23 27.551 -21.872 31.137 1.00 64.35 C
<font color=red> < data omitted for clarity > </font>
ATOM 1853 CA ASP B 235 47.656 -10.920 50.374 1.00 73.64 C
ATOM 1854 C ASP B 235 46.520 -10.365 49.515 1.00 74.17 C
ATOM 1855 O ASP B 235 46.464 -10.761 48.332 1.00 75.01 O
ATOM 1856 CB ASP B 235 47.790 -12.446 50.171 1.00 73.22 C
</pre>
</td></table>
-->
<!-- DATA FILES
Any data files used (e.g. translation table file, substitution matrix
etc. This includes example data file formats if they are not obvious.
For a standard description of what data files are and how embossdata can
be used to inspect and retrieve them, use:
#include file="inc/localfiles.ihtml"
-->
<a name="5.0"></a>
<br><br><br><H2> 5.0 DATA FILES </H2>
DOMAINALIGN does not use any data files of its own, but STAMP uses the
"pdb.directories" file, which specifies the permissible path, prefix and extension of
the PDB files that STAMP reads. This file should look like this:
<table><td bgcolor="#FFCCFF">
<pre>
test_data/ - .dent
/data/pdb - -
/data/pdb _ .ent
/data/pdb _ .pdb
/data/pdb pdb .ent
/data/pdbscop _ _
/data/pdbscop _ .ent
/data/pdbscop _ .pdb
/data/pdbscop pdb .ent
./ _ _
./ _ .ent
./ _ .ent.Z
./ _ .ent.gz
./ _ .pdb
./ _ .pdb.Z
./ _ .pdb.gz
./ pdb .ent
./ pdb .ent.Z
./ pdb .ent.gz
/data/CASS1/pdb/coords/ _ .pdb
/data/CASS1/pdb/coords/ _ .pdb.Z
/data/CASS1/pdb/coords/ _ .pdb.gz
</pre>
</table>
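<br><br>
As a rough illustration of how such a file can be read, the Python sketch below parses "pdb.directories"-style lines into (path, prefix, extension) triples and builds candidate file names for a domain code. It is not part of DOMAINALIGN or STAMP: the function names are hypothetical, and treating "_" and "-" as meaning "no prefix" or "no extension" is an assumption made for this example.
<table><td bgcolor="#FFCCFF">
<pre>
```python
import posixpath


def parse_pdb_directories(text):
    """Return a list of (path, prefix, extension) triples, one per line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        path, prefix, extension = fields
        # Assumption: "_" and "-" are placeholders meaning "empty".
        if prefix in ("_", "-"):
            prefix = ""
        if extension in ("_", "-"):
            extension = ""
        entries.append((path, prefix, extension))
    return entries


def candidate_files(domain_id, entries):
    """Build every file name that the entries allow for one domain code."""
    return [posixpath.join(path, prefix + domain_id + extension)
            for path, prefix, extension in entries]


# Three lines taken from the listing above.
example = """\
/data/pdbscop _ .ent
/data/pdbscop pdb .ent
./ _ .pdb.gz
"""

for name in candidate_files("d1cs4a_", parse_pdb_directories(example)):
    print(name)
```
</pre>
</td></table>
A search routine could then test each candidate in order and use the first file that exists.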
<!-- USAGE
Example usage, as run from the command-line.
Many examples illustrating different behaviours is good.
-->
<a name="6.0"></a>
<br><br><br><H2> 6.0 USAGE </H2>
<H3> 6.1 COMMAND LINE ARGUMENTS </H3>
<pre>
Generate alignments (DAF file) for nodes in a DCF file.
Version: EMBOSS:6.6.0.0
Standard (Mandatory) qualifiers (* if not always prompted):
[-dcfinfile] infile This option specifies the name of DCF file
(domain classification file) (input). A
'domain classification file' contains
classification and other data for domains
from SCOP or CATH, in DCF format
(EMBL-like). The files are generated by
using SCOPPARSE and CATHPARSE. Domain
sequence information can be added to the
file by using DOMAINSEQS.
[-pdbdir] directory [./] This option specifies the location of
domain PDB files (input). A 'domain PDB
file' contains coordinate data for a single
domain from SCOP or CATH, in PDB format. The
files are generated by using DOMAINER.
-node menu [1] This option specifies the node for
redundancy removal. Redundancy can be
removed at any specified node in the SCOP or
CATH hierarchies. For example by selecting
'Class' entries belonging to the same Class
will be non-redundant. (Values: 1 (Class
(SCOP)); 2 (Fold (SCOP)); 3 (Superfamily
(SCOP)); 4 (Family (SCOP)); 5 (Class
(CATH)); 6 (Architecture (CATH)); 7
(Topology (CATH)); 8 (Homologous Superfamily
(CATH)); 9 (Family (CATH)))
-mode menu [1] This option specifies the alignment
algorithm to use. (Values: 1 (STAMP); 2
(TCOFFEE))
-[no]keepsinglets toggle [Y] This option specifies whether to write
sequences of singlet families to file. If
you specify this option, the sequence for
each singlet family are written to file
(output).
[-dafoutdir] outdir [./] This option specifies the location of
DAF files (domain alignment files) (output).
A 'domain alignment file' contains a
sequence alignment of domains belonging to
the same SCOP or CATH family. The files are
in clustal format and are annotated with
domain family classification information.
The files generated by using SCOPALIGN will
contain a structure-based sequence alignment
of domains of known structure only. Such
alignments can be extended with sequence
relatives (of unknown structure) by using
SEQALIGN.
* -singletsoutdir outdir [./] This option specifies the location of
DHF files (domain hits files) for singlet
sequences (output). The singlets are written
out as a 'domain hits file' - which
contains database hits (sequences) with
domain classification information, in FASTA
format.
* -superoutdir outdir [./] This option specifies the location of
structural superimposition files (output). A
file in PDB format of the structural
superimposition is generated for each family
if the STAMP algorithm is used.
-logfile outfile [domainalign.log] This option specifies the
name of log file (output). The log file
contains messages about any errors arising
while domainalign ran.
Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers: (none)
Associated qualifiers:
"-pdbdir" associated qualifiers
-extension2 string Default file extension
"-dafoutdir" associated qualifiers
-extension3 string Default file extension
"-singletsoutdir" associated qualifiers
-extension string Default file extension
"-superoutdir" associated qualifiers
-extension string Default file extension
"-logfile" associated qualifiers
-odirectory string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write first file to standard output
-filter boolean Read first file from standard input, write
first file to standard output
-options boolean Prompt for standard and additional values
-debug boolean Write debug output to program.dbg
-verbose boolean Report some/full command line options
-help boolean Report command line options and exit. More
information on associated and general
qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report dying program messages
-version boolean Report version number and exit
</pre>
</td></tr></table>
<P>
<table border cellspacing=0 cellpadding=3 bgcolor="#ccccff">
<tr bgcolor="#FFFFCC">
<th align="left">Qualifier</th>
<th align="left">Type</th>
<th align="left">Description</th>
<th align="left">Allowed values</th>
<th align="left">Default</th>
</tr>
<tr bgcolor="#FFFFCC">
<th align="left" colspan=5>Standard (Mandatory) qualifiers</th>
</tr>
<tr bgcolor="#FFFFCC">
<td>[-dcfinfile]<br>(Parameter 1)</td>
<td>infile</td>
<td>This option specifies the name of DCF file (domain classification file) (input). A 'domain classification file' contains classification and other data for domains from SCOP or CATH, in DCF format (EMBL-like). The files are generated by using SCOPPARSE and CATHPARSE. Domain sequence information can be added to the file by using DOMAINSEQS.</td>
<td>Input file</td>
<td><b>Required</b></td>
</tr>
<tr bgcolor="#FFFFCC">
<td>[-pdbdir]<br>(Parameter 2)</td>
<td>directory</td>
<td>This option specifies the location of domain PDB files (input). A 'domain PDB file' contains coordinate data for a single domain from SCOP or CATH, in PDB format. The files are generated by using DOMAINER.</td>
<td>Directory</td>
<td>./</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>-node</td>
<td>list</td>
<td>This option specifies the node for redundancy removal. Redundancy can be removed at any specified node in the SCOP or CATH hierarchies. For example, if 'Class' is selected, entries belonging to the same Class will be non-redundant.</td>
<td><table><tr><td>1</td> <td><i>(Class (SCOP))</i></td></tr><tr><td>2</td> <td><i>(Fold (SCOP))</i></td></tr><tr><td>3</td> <td><i>(Superfamily (SCOP))</i></td></tr><tr><td>4</td> <td><i>(Family (SCOP))</i></td></tr><tr><td>5</td> <td><i>(Class (CATH))</i></td></tr><tr><td>6</td> <td><i>(Architecture (CATH))</i></td></tr><tr><td>7</td> <td><i>(Topology (CATH))</i></td></tr><tr><td>8</td> <td><i>(Homologous Superfamily (CATH))</i></td></tr><tr><td>9</td> <td><i>(Family (CATH))</i></td></tr></table></td>
<td>1</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>-mode</td>
<td>list</td>
<td>This option specifies the alignment algorithm to use.</td>
<td><table><tr><td>1</td> <td><i>(STAMP)</i></td></tr><tr><td>2</td> <td><i>(TCOFFEE)</i></td></tr></table></td>
<td>1</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>-[no]keepsinglets</td>
<td>toggle</td>
<td>This option specifies whether to write sequences of singlet families to file. If you specify this option, the sequence for each singlet family is written to file (output).</td>
<td>Toggle value Yes/No</td>
<td>Yes</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>[-dafoutdir]<br>(Parameter 3)</td>
<td>outdir</td>
<td>This option specifies the location of DAF files (domain alignment files) (output). A 'domain alignment file' contains a sequence alignment of domains belonging to the same SCOP or CATH family. The files are in clustal format and are annotated with domain family classification information. The files generated by using DOMAINALIGN will contain a structure-based sequence alignment of domains of known structure only. Such alignments can be extended with sequence relatives (of unknown structure) by using SEQALIGN.</td>
<td>Output directory</td>
<td>./</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>-singletsoutdir</td>
<td>outdir</td>
<td>This option specifies the location of DHF files (domain hits files) for singlet sequences (output). The singlets are written out as a 'domain hits file' - which contains database hits (sequences) with domain classification information, in FASTA format.</td>
<td>Output directory</td>
<td>./</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>-superoutdir</td>
<td>outdir</td>
<td>This option specifies the location of structural superimposition files (output). A file in PDB format of the structural superimposition is generated for each family if the STAMP algorithm is used.</td>
<td>Output directory</td>
<td>./</td>
</tr>
<tr bgcolor="#FFFFCC">
<td>-logfile</td>
<td>outfile</td>
<td>This option specifies the name of log file (output). The log file contains messages about any errors arising while domainalign ran.</td>
<td>Output file</td>
<td>domainalign.log</td>
</tr>
<tr bgcolor="#FFFFCC">
<th align="left" colspan=5>Additional (Optional) qualifiers</th>
</tr>
<tr>
<td colspan=5>(none)</td>
</tr>
<tr bgcolor="#FFFFCC">
<th align="left" colspan=5>Advanced (Unprompted) qualifiers</th>
</tr>
<tr>
<td colspan=5>(none)</td>
</tr>
<tr bgcolor="#FFFFCC">
<th align="left" colspan=5>Associated qualifiers</th>
</tr>
<tr bgcolor="#FFFFCC">
<td align="left" colspan=5>"-pdbdir" associated directory qualifiers
</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -extension2<br>-extension_pdbdir</td>
<td>string</td>
<td>Default file extension</td>
<td>Any string</td>
<td>ent</td>
</tr>
<tr bgcolor="#FFFFCC">
<td align="left" colspan=5>"-dafoutdir" associated outdir qualifiers
</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -extension3<br>-extension_dafoutdir</td>
<td>string</td>
<td>Default file extension</td>
<td>Any string</td>
<td>daf</td>
</tr>
<tr bgcolor="#FFFFCC">
<td align="left" colspan=5>"-singletsoutdir" associated outdir qualifiers
</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -extension</td>
<td>string</td>
<td>Default file extension</td>
<td>Any string</td>
<td>dhf</td>
</tr>
<tr bgcolor="#FFFFCC">
<td align="left" colspan=5>"-superoutdir" associated outdir qualifiers
</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -extension</td>
<td>string</td>
<td>Default file extension</td>
<td>Any string</td>
<td>ent</td>
</tr>
<tr bgcolor="#FFFFCC">
<td align="left" colspan=5>"-logfile" associated outfile qualifiers
</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -odirectory</td>
<td>string</td>
<td>Output directory</td>
<td>Any string</td>
<td> </td>
</tr>
<tr bgcolor="#FFFFCC">
<th align="left" colspan=5>General qualifiers</th>
</tr>
<tr bgcolor="#FFFFCC">
<td> -auto</td>
<td>boolean</td>
<td>Turn off prompts</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -stdout</td>
<td>boolean</td>
<td>Write first file to standard output</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -filter</td>
<td>boolean</td>
<td>Read first file from standard input, write first file to standard output</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -options</td>
<td>boolean</td>
<td>Prompt for standard and additional values</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -debug</td>
<td>boolean</td>
<td>Write debug output to program.dbg</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -verbose</td>
<td>boolean</td>
<td>Report some/full command line options</td>
<td>Boolean value Yes/No</td>
<td>Y</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -help</td>
<td>boolean</td>
<td>Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -warning</td>
<td>boolean</td>
<td>Report warnings</td>
<td>Boolean value Yes/No</td>
<td>Y</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -error</td>
<td>boolean</td>
<td>Report errors</td>
<td>Boolean value Yes/No</td>
<td>Y</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -fatal</td>
<td>boolean</td>
<td>Report fatal errors</td>
<td>Boolean value Yes/No</td>
<td>Y</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -die</td>
<td>boolean</td>
<td>Report dying program messages</td>
<td>Boolean value Yes/No</td>
<td>Y</td>
</tr>
<tr bgcolor="#FFFFCC">
<td> -version</td>
<td>boolean</td>
<td>Report version number and exit</td>
<td>Boolean value Yes/No</td>
<td>N</td>
</tr>
</table>
<H3> 6.2 EXAMPLE SESSION </H3>
An example of interactive use of <b>domainalign</b> is shown below.
<p>
<p>
<table width="90%"><tr><td bgcolor="#CCFFFF"><pre>
% <b>domainalign </b>
Generate alignments (DAF file) for nodes in a DCF file.
Domain classification file: <b>all.scop2</b>
Domain pdb directory [./]: <b></b>
Node at which to remove redundancy
1 : Class (SCOP)
2 : Fold (SCOP)
3 : Superfamily (SCOP)
4 : Family (SCOP)
5 : Class (CATH)
6 : Architecture (CATH)
7 : Topology (CATH)
8 : Homologous Superfamily (CATH)
9 : Family (CATH)
Select number. [1]: <b>4</b>
Alignment algorithm option
1 : STAMP
2 : TCOFFEE
Select number. [1]: <b>1</b>
Write sequences of singlet families to file. [Y]: <b>N</b>
Domain alignment file output directory [./]: <b>daf</b>
Pdb entry file output directory [./]: <b></b>
Domainatrix log output file [domainalign.log]: <b></b>
STAMP Structural Alignment of Multiple Proteins
by Robert B. Russell & Geoffrey J. Barton
Please cite PROTEINS, v14, 309-323, 1992
Results of scan will be written to file ./domainalign-1234567890.1234.scan
Fits = no. of fits performed, Sc = STAMP score, RMS = RMS deviation
Align = alignment length, Nfit = residues fitted, Eq. = equivalent residues
Secs = no. equiv. secondary structures, %I = seq. identity, %S = sec. str. identity
P(m) = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
Domain1 Domain2 Fits Sc RMS Len1 Len2 Align Fit Eq. Secs %I %S P(m)
Scan d1cs4a_ d1cs4a_ 1 9.799 0.001 189 189 189 189 188 0 100.00 100.00 0.00e+00
Scan d1cs4a_ d1fx2a_ 1 6.522 1.343 189 235 225 135 133 0 20.30 100.00 0.00017
See the file ./domainalign-1234567890.1234.scan
/shared/software/bin/stamp -l ./domainalign-1234567890.1234.dom -s -n 2 -slide 5 -prefix ./domainalign-1234567890.1234 -d ./domainalign-1234567890.1234.set
/shared/software/bin/sorttrans -f ./domainalign-1234567890.1234.scan -s Sc 2.5 > ./domainalign-1234567890.1234.sort
/shared/software/bin/stamp -l ./domainalign-1234567890.1234.sort -prefix ./domainalign-1234567890.1234 > ./domainalign-1234567890.1234.log
TRANSFORM R.B. Russell, 1995
Using PDB files
Files will not include heteroatoms
Files will not include DNA/RNA
Files will not include waters
All coordinates will be in file ./55074.ent
Domain 1, d1cs4a_ => to ./55074.ent (chain A)
Domain 2, d1fx2a__1 => to ./55074.ent (chain B)
POSTSTAMP, R.B. Russell 1995
New output will be in file ./domainalign-1234567890.1234.1
E1 = 3.800, E2 = 3.800
Minimum Pij set to 0.500
Reading domain descriptors/transformations from the file ./domainalign-1234567890.1234.1
Reading alignment...
Reading coordinates...
Domain 1 /homes/user/test/qa/domainer-keep2//d1cs4a_.ent d1cs4a_
all residues 189 CAs => 189 CAs in total
Transforming coordinates...
Domain 2 /homes/user/test/qa/domainer-keep2//d1fx2a_.ent d1fx2a__1
all residues 235 CAs => 235 CAs in total
Transforming coordinates...
...done.
/shared/software/bin/transform -f ./domainalign-1234567890.1234.sort -g -o ./55074.ent
/shared/software/bin/poststamp -f ./domainalign-1234567890.1234.1 -min 0.5
/shared/software/bin/ver2hor -f ./domainalign-1234567890.1234.1.post > ./domainalign-1234567890.1234.out
STAMP Structural Alignment of Multiple Proteins
by Robert B. Russell & Geoffrey J. Barton
Please cite PROTEINS, v14, 309-323, 1992
Results of scan will be written to file ./domainalign-1234567890.1234.scan
Fits = no. of fits performed, Sc = STAMP score, RMS = RMS deviation
Align = alignment length, Nfit = residues fitted, Eq. = equivalent residues
Secs = no. equiv. secondary structures, %I = seq. identity, %S = sec. str. identity
P(m) = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
Domain1 Domain2 Fits Sc RMS Len1 Len2 Align Fit Eq. Secs %I %S P(m)
Scan d4at1b1 d4at1b1 1 9.799 0.001 93 93 93 93 92 0 100.00 100.00 1.00e-92
Scan d4at1b1 d4at1d1 1 9.251 0.588 93 93 94 89 88 0 100.00 100.00 1.00e-88
See the file ./domainalign-1234567890.1234.scan
Processing node 54894
/shared/software/bin/stamp -l ./domainalign-1234567890.1234.dom -s -n 2 -slide 5 -prefix ./domainalign-1234567890.1234 -d ./domainalign-1234567890.1234.set
/shared/software/bin/sorttrans -f ./domainalign-1234567890.1234.scan -s Sc 2.5 > ./domainalign-1234567890.1234.sort
/shared/software/bin/stamp -l ./domainalign-1234567890.1234.sort -prefix ./domainalign-1234567890.1234 > ./domainalign-1234567890.1234.log
TRANSFORM R.B. Russell, 1995
Using PDB files
Files will not include heteroatoms
Files will not include DNA/RNA
Files will not include waters
All coordinates will be in file ./54894.ent
Domain 1, d4at1b1 => to ./54894.ent (chain A)
Domain 2, d4at1d1_1 => to ./54894.ent (chain B)
POSTSTAMP, R.B. Russell 1995
New output will be in file ./domainalign-1234567890.1234.1
E1 = 3.800, E2 = 3.800
Minimum Pij set to 0.500
Reading domain descriptors/transformations from the file ./domainalign-1234567890.1234.1
Reading alignment...
Reading coordinates...
Domain 1 /homes/user/test/qa/domainer-keep2//d4at1b1.ent d4at1b1
all residues 93 CAs => 93 CAs in total
Transforming coordinates...
Domain 2 /homes/user/test/qa/domainer-keep2//d4at1d1.ent d4at1d1_1
all residues 93 CAs => 93 CAs in total
Transforming coordinates...
...done.
/shared/software/bin/transform -f ./domainalign-1234567890.1234.sort -g -o ./54894.ent
/shared/software/bin/poststamp -f ./domainalign-1234567890.1234.1 -min 0.5
/shared/software/bin/ver2hor -f ./domainalign-1234567890.1234.1.post > ./domainalign-1234567890.1234.out
</pre></td></tr></table><p>
<p>
<a href="#input.1">Go to the input files for this example</a><br><a href="#output.1">Go to the output files for this example</a><p><p>
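The answers given at the prompts in the session above can also be supplied as qualifiers on the command line. The Python sketch below only assembles such a command from the qualifiers listed in section 6.1. The helper name is hypothetical, and the command is printed rather than run, because running it requires DOMAINALIGN, the modified STAMP and the input files to be installed.
<table><td bgcolor="#FFCCFF">
<pre>
```python
import shlex


def build_domainalign_command(dcf_file, pdb_dir="./", node=1, mode=1,
                              keep_singlets=True, daf_dir="./",
                              singlets_dir="./", super_dir="./",
                              log_file="domainalign.log"):
    """Assemble a non-interactive domainalign command as an argument list."""
    command = ["domainalign",
               "-dcfinfile", dcf_file,
               "-pdbdir", pdb_dir,
               "-node", str(node),
               "-mode", str(mode),
               "-dafoutdir", daf_dir,
               "-logfile", log_file]
    if keep_singlets:
        command += ["-keepsinglets", "-singletsoutdir", singlets_dir]
    else:
        command.append("-nokeepsinglets")
    if mode == 1:
        # -superoutdir only applies when STAMP (mode 1) is selected.
        command += ["-superoutdir", super_dir]
    command.append("-auto")  # turn off prompts
    return command


# Same answers as the interactive session: family node, STAMP, no singlets.
command = build_domainalign_command("all.scop2", node=4, mode=1,
                                    keep_singlets=False, daf_dir="daf")
print(shlex.join(command))
```
</pre>
</td></table>
The returned list can be passed to a process launcher such as subprocess.run once the programs are available.
<p>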
<!--
<table><td bgcolor="#FFCCFF">
<pre>
Unix % domainalign
Generates structure-based sequence alignments for nodes in a DCF file
(domain classification file).
Name of DCF file (domain classification file) for input (DCF format): /test_data/all.scop2
Location of domain PDB files (PDB format input) [./]: /test_data/
Node at which to remove redundancy
1 : Class (SCOP)
2 : Fold (SCOP)
3 : Superfamily (SCOP)
4 : Family (SCOP)
5 : Class (CATH)
6 : Architecture (CATH)
7 : Topology (CATH)
8 : Homologous Superfamily (CATH)
9 : Family (CATH)
Select number [1]: 4
Alignment algorithm option
1 : STAMP
2 : TCOFFEE
Select number [1]: 1
Write sequences of singlet families to output file (FASTA-format) [Y]:
Location of files for singlet sequences (FASTA output) [.]: /test_data/
Location of domain alignment files (output) [./]: /test_data/domainalign
Location of structure alignment files for output (PDB format) [./]: /test_data/
Name of log file (output) [domainalign.logf]: /test_data/domainalign
Processing node 55074
stamp -l ./domainalign-1093353541.12819.dom -s -n 2 -slide 5 -prefix ./domainalign-1093353541.12819 -d ./domainalign-1093353541.12819.set
STAMP Structural Alignment of Multiple Proteins
by Robert B. Russell & Geoffrey J. Barton
Please cite PROTEINS, v14, 309-323, 1992
Results of scan will be written to file ./domainalign-1093353541.12819.scan
Fits = no. of fits performed, Sc = STAMP score, RMS = RMS deviation
Align = alignment length, Nfit = residues fitted, Eq. = equivalent residues
Secs = no. equiv. secondary structures, %I = seq. identity, %S = sec. str. identity
P(m) = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
Domain1 Domain2 Fits Sc RMS Len1 Len2 Align Fit Eq. Secs %I %S P(m)
Scan d1cs4a_ d1cs4a_ 1 9.799 0.001 189 189 189 189 188 0 100.00 100.00 0.00e+00
Scan d1cs4a_ d1fx2a_ 1 6.522 1.343 189 235 225 135 133 0 20.30 100.00 0.00017
See the file ./domainalign-1093353541.12819.scan
sorttrans -f ./domainalign-1093353541.12819.scan -s Sc 2.5 > ./domainalign-1093353541.12819.sort
stamp -l ./domainalign-1093353541.12819.sort -prefix ./domainalign-1093353541.12819 > ./domainalign-1093353541.12819.log
transform -f ./domainalign-1093353541.12819.sort -g -o test_data/55074.palign
TRANSFORM R.B. Russell, 1995
Using PDB files
Files will not include heteroatoms
Files will not include DNA/RNA
Files will not include waters
All coordinates will be in file test_data/55074.palign
Domain 1, d1cs4a_ => to test_data/55074.palign (chain A)
Domain 2, d1fx2a__1 => to test_data/55074.palign (chain B)
poststamp -f ./domainalign-1093353541.12819.1 -min 0.5
POSTSTAMP, R.B. Russell 1995
New output will be in file ./domainalign-1093353541.12819.1
E1 = 3.800, E2 = 3.800
Minimum Pij set to 0.500
Reading domain descriptors/transformations from the file ./domainalign-1093353541.12819.1
Reading alignment...
Reading coordinates...
Domain 1 /data/pdbscop/d1cs4a_.ent d1cs4a_
all residues 189 CAs => 189 CAs in total
Transforming coordinates...
Domain 2 /data/pdbscop/d1fx2a_.ent d1fx2a__1
all residues 235 CAs => 235 CAs in total
Transforming coordinates...
...done.
ver2hor -f ./domainalign-1093353541.12819.1.post > ./domainalign-1093353541.12819.out
Processing node 54894
stamp -l ./domainalign-1093353541.12819.dom -s -n 2 -slide 5 -prefix ./domainalign-1093353541.12819 -d ./domainalign-1093353541.12819.set
STAMP Structural Alignment of Multiple Proteins
by Robert B. Russell & Geoffrey J. Barton
Please cite PROTEINS, v14, 309-323, 1992
Results of scan will be written to file ./domainalign-1093353541.12819.scan
Fits = no. of fits performed, Sc = STAMP score, RMS = RMS deviation
Align = alignment length, Nfit = residues fitted, Eq. = equivalent residues
Secs = no. equiv. secondary structures, %I = seq. identity, %S = sec. str. identity
P(m) = P value (p=1/10) calculated after Murzin (1993), JMB, 230, 689-694
Domain1 Domain2 Fits Sc RMS Len1 Len2 Align Fit Eq. Secs %I %S P(m)
Scan d4at1b1 d4at1b1 1 9.799 0.001 93 93 93 93 92 0 100.00 100.00 1.00e-92
Scan d4at1b1 d4at1d1 1 9.251 0.588 93 93 94 89 88 0 100.00 100.00 1.00e-88
See the file ./domainalign-1093353541.12819.scan
sorttrans -f ./domainalign-1093353541.12819.scan -s Sc 2.5 > ./domainalign-1093353541.12819.sort
stamp -l ./domainalign-1093353541.12819.sort -prefix ./domainalign-1093353541.12819 > ./domainalign-1093353541.12819.log
transform -f ./domainalign-1093353541.12819.sort -g -o test_data/54894.palign
TRANSFORM R.B. Russell, 1995
Using PDB files
Files will not include heteroatoms
Files will not include DNA/RNA
Files will not include waters
All coordinates will be in file test_data/54894.palign
Domain 1, d4at1b1 => to test_data/54894.palign (chain A)
Domain 2, d4at1d1_1 => to test_data/54894.palign (chain B)
poststamp -f ./domainalign-1093353541.12819.1 -min 0.5
POSTSTAMP, R.B. Russell 1995
New output will be in file ./domainalign-1093353541.12819.1
E1 = 3.800, E2 = 3.800
Minimum Pij set to 0.500
Reading domain descriptors/transformations from the file ./domainalign-1093353541.12819.1
Reading alignment...
Reading coordinates...
Domain 1 /data/pdbscop/d4at1b1.ent d4at1b1
all residues 93 CAs => 93 CAs in total
Transforming coordinates...
Domain 2 /data/pdbscop/d4at1d1.ent d4at1d1_1
all residues 93 CAs => 93 CAs in total
Transforming coordinates...
...done.
ver2hor -f ./domainalign-1093353541.12819.1.post > ./domainalign-1093353541.12819.out
Unix %
</pre>
</table>
<!-- Two alignments each of two domains were performed (by using STAMP). Structure-based sequence alignments (/test_data/55074.daf and /test_data/54894.daf) and structure alignments (55074.ent and 54894.ent) were written. The file extensions were specified by the user in the ACD file. The base name of these files (55074 and 54894) is the same as the Sunid for the node (family in this case) taken from the domain classification file /test_data/all.scop2. Any sequences of singlet families were written to /test_data/domainalign. A log file called domainalign.logf was written to /test_data/domainalign. -->
<br>
<br>The following command line would achieve the same result.
<br>
<table><td bgcolor="#FFCCFF">
<pre>
domainalign /test_data/all.scop2 /test_data/ /test_data/ /test_data/domainalign -keepsinglets Y
-singlets /test_data/domainalign -node 4 -mode 1
</pre>
</table>
-->
<!-- KNOWN BUGS & WARNINGS
Bugs that have not yet been fixed, easily misused features, problems
and caveats etc. Potentially stupid things the program will let you do.
-->
<a name="7.0"></a>
<br><br><br><H2> 7.0 KNOWN BUGS & WARNINGS </H2>
<b> 1. Use of STAMP</b><br>
DOMAINALIGN requires a modified version of STAMP (see <a href="#8.0">Notes</a> below).
The modified STAMP application must be installed on the system that is running DOMAINALIGN.
<br><br><b> 2. Strange STAMP behaviour</b><br>
STAMP will ignore (omit from the alignment and *not* replace with '-' or
any other symbol) ANY residues or groups in a PDB file that
<br>
<br> (i) are not structured (i.e. do not appear in the ATOM records) or
<br> (ii) lack a CA atom, regardless of whether it is a known amino acid or not.
<br>
<br>
This means that the position (column) in the alignment cannot reliably be
used as the basis for an index into arrays representing the full length
sequences.
STAMP will however include in the alignment residues with a single atom
only, so long as it is the CA atom.
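<br><br>
To illustrate the consequence, the Python sketch below maps each alignment column to an index in the ungapped aligned sequence and checks whether that sequence still matches a full-length sequence. The function name and the toy sequences are made up for this example; only the '-' gap character is taken from the alignment format.
<table><td bgcolor="#FFCCFF">
<pre>
```python
def column_to_residue(aligned_row, gap="-"):
    """Map each alignment column to an index into the ungapped row.

    Gap columns map to None.
    """
    mapping = []
    position = 0
    for symbol in aligned_row:
        if symbol == gap:
            mapping.append(None)
        else:
            mapping.append(position)
            position += 1
    return mapping


full_sequence = "MKTAYIAKQR"   # toy full-length sequence
aligned_row = "MKT-YIAKQR"     # toy row: first "A" omitted, one gap present

mapping = column_to_residue(aligned_row)
ungapped = aligned_row.replace("-", "")

# The indices refer to the residues that were kept, not to the full sequence.
print(mapping)
print(ungapped == full_sequence)
```
</pre>
</td></table>
In this toy case the column holding "Y" maps to index 3, which is "A" in the full-length sequence, so a separate mapping between the aligned and full-length sequences would be needed.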
<br><br><b> 3. Handling of singlet nodes</b>
<br>
No sequence alignment or structural superimposition files are generated for nodes that contain a single domain only. Sequences for such domains can be saved to file (see <a href="#2.0">2.0 INPUTS & OUTPUTS</a>).
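<br><br>
The handling of singlet nodes can be pictured with the following Python sketch, which groups domains by node identifier and separates nodes with a single domain from those that can be aligned. The data structure and function name are hypothetical. The node identifiers 55074 and 54894 and their domains come from the example session, while node 99999 and domain d9zzza_ are invented to show a singlet.
<table><td bgcolor="#FFCCFF">
<pre>
```python
from collections import defaultdict


def split_nodes(domain_nodes):
    """Split nodes into alignable ones (two or more domains) and singlets."""
    nodes = defaultdict(list)
    for domain_id, node_id in domain_nodes:
        nodes[node_id].append(domain_id)
    alignable = {}
    singlets = {}
    for node_id, domains in nodes.items():
        if len(domains) == 1:
            singlets[node_id] = domains   # no alignment is generated
        else:
            alignable[node_id] = domains  # passed to the alignment program
    return alignable, singlets


domain_nodes = [
    ("d1cs4a_", 55074),
    ("d1fx2a_", 55074),
    ("d4at1b1", 54894),
    ("d4at1d1", 54894),
    ("d9zzza_", 99999),  # invented singlet, for illustration only
]

alignable, singlets = split_nodes(domain_nodes)
print(alignable)
print(singlets)
```
</pre>
</td></table>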
<br><br><b> 4. Alignment numbering</b>
<br>Residue numbering in the alignment is not implemented (zeros are given in place of residue numbers).
<!-- NOTES
Important general remarks, including:
Restrictions.
Interesting behaviour.
Useful things you can do with this program.
Future plans.
etc.
-->
<a name="8.0"></a>
<br><br><br><H2> 8.0 NOTES </H2>
<b> 1. Adaptation of STAMP for domain codes</b>
<br> DOMAINALIGN will only run with a version of STAMP that has been modified
so that PDB id codes longer than 4 characters are accepted.
This involves a trivial change to the STAMP module getdomain.c (around line
155): a 4 must be changed to a 7, as follows (original line first, modified line second):
<table><td bgcolor="#FFCCFF">
<pre>
temp=getfile(domain[0].id,dirfile,4,OUTPUT);
temp=getfile(domain[0].id,dirfile,7,OUTPUT);
</pre>
</td></table>
<br><b> 2. Adaptation of STAMP for larger datasets</b>
<br> STAMP fails to align a large dataset of all the available V set Ig
domains. The ver2hor module generates the following error:
<table><td bgcolor="#FFCCFF">
<pre>
Transforming coordinates...
...done.
ver2hor -f ./domainalign-1022069396.11280.76.post > ./domainalign-1022069396.11280.out
error: something wrong with STAMP file
STAMP length is 370, Alignment length is 422
STAMP nseq is 155, Alignment nseq is 155
</pre>
</td></table>
<br>
This is fixed by the following change in alignfit.h (original definition first, modified definition second).
<table><td bgcolor="#FFCCFF">
<pre>
#define MAXtlen 200
#define MAXtlen 2000
</pre>
</td></table>
<br>
At the same time the following may be changed as a safety measure:
<table><td bgcolor="#FFCCFF">
<pre>
gstamp.c : #define MAX_SEQ_LEN 10000 (was 2000)
pdbseq.c : #define MAX_SEQ_LEN 10000 (was 3000)
defaults.h: #define MAX_SEQ_LEN 10000 (was 8000)
defaults.h: #define MAX_NSEQ 10000 (was 1000)
defaults.h: #define MAX_BLOC_SEQ 5000 (was 500)
dstamp.h : #define MAX_N_SEQ 10000 (was 1000)
ver2hor.h : #define MAX_N_SEQ 10000 (was 1000)
</pre>
</td></table>
<br><br><b> 3. pdb.directories file</b><br>
STAMP (and therefore DOMAINALIGN) uses a "pdb.directories" file: see <a href="#5.0">5.0 DATA FILES </a>
<br><br><b> 4. Choice of alignment algorithm</b><br>
Future versions of DOMAINALIGN will implement a larger choice of alignment algorithms.
<br><br><b> 5. Getting the best alignment</b><br>
DOMAINALIGN will produce better alignments if the DCF file is reordered so that the representative structure of each node (e.g. family) is given first. This is achieved by using DOMAINREP.
<br><br><b> 6. Whitespace in alignment</b><br>
STAMP can insert nonsensical whitespace into its alignments, e.g. in place of a residue character where that residue was missing electron density in the PDB file. DOMAINALIGN replaces each whitespace character within a STAMP alignment with an "X".
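<br><br>
The replacement described above amounts to a simple substitution. A minimal Python sketch follows; DOMAINALIGN itself is not written in Python, and the function name and toy sequence are hypothetical.
<table><td bgcolor="#FFCCFF">
<pre>
```python
def mask_whitespace(aligned_sequence, replacement="X"):
    """Replace every whitespace character in an aligned sequence."""
    return "".join(replacement if symbol.isspace() else symbol
                   for symbol in aligned_sequence)


# Toy aligned sequence with one stray space inside it.
print(mask_whitespace("GVEAI RGT-IDH"))
```
</pre>
</td></table>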
<br><H3> 8.1 GLOSSARY OF FILE TYPES </H3>
<a name="ref1"></a>
<table BORDER CELLSPACING=5 CELLPADDING=5 BGCOLOR="#f5f5ff" >
<tr>
<td><b>FILE TYPE</b></td>
<td><b>FORMAT</b></td>
<td><b>DESCRIPTION</b></td>
<td><b>CREATED BY</b></td>
<td><b>SEE ALSO</b></td>
</tr>
<tr>
<td><b> Domain classification file (for SCOP)</b></td>
<td> DCF format (EMBL-like format for domain classification data). </td>
<td> Classification and other data for domains from SCOP. </td>
<td> <a href="scopparse.html">SCOPPARSE</a> </td>
<td> Domain sequence information can be added to the file by using DOMAINSEQS. </td>
</tr>
<tr>
<td><b> Domain classification file (for CATH)</b></td>
<td> DCF format (EMBL-like format for domain classification data). </td>
<td> Classification and other data for domains from CATH. </td>
<td> <a href="cathparse.html">CATHPARSE</a> </td>
<td> Domain sequence information can be added to the file by using DOMAINSEQS. </td>
</tr>
<tr>
<td><b>Domain PDB file </b></td>
<td> PDB format for domain coordinate data. </td>
<td> Coordinate data for a single domain from SCOP or CATH. </td>
<td> <a href="domainer.html">DOMAINER</a> </td>
<td> N.A. </td>
</tr>
<tr>
<td><b>Domain alignment file </b></td>
<td> DAF format (clustal format with domain classification information). </td>
<td> Contains a sequence alignment of domains belonging to the same SCOP or CATH family. The file is annotated with domain family classification information.</td>
<td> <a href="domainalign.html">DOMAINALIGN</a> (structure-based sequence alignment of domains of known structure). </td>
<td> DOMAINALIGN alignments can be extended with sequence relatives (of unknown structure) to the family in question by using SEQALIGN. </td>
</tr>
</table>
<!-- DESCRIPTION
A complete, non-technical, user-level description of the application.
-->
<a name="9.0"></a>
<br><br><br><H2> 9.0 DESCRIPTION </H2>
The generation of alignments for large datasets such as SCOP and CATH potentially requires a lot of time for preparation of datasets, writing of scripts, running individual jobs and so on, in addition to the compute time required for the alignments themselves. DOMAINALIGN automates this process: it reads a domain classification file and generates alignments for each user-specified node in turn.
<!-- ALGORITHM
A technical description of algorithmic aspects, describing exactly how
the key aspects of the application work.
-->
<a name="10.0"></a>
<br><br><br><H2> 10.0 ALGORITHM </H2>
More information on STAMP can be found at
<a href="http://www.compbio.dundee.ac.uk/manuals/stamp.4.2/">http://www.compbio.dundee.ac.uk/manuals/stamp.4.2/</a>
<br> More information on TCOFFEE can be found at <a href="http://www.ch.embnet.org/software/TCoffee.html">http://www.ch.embnet.org/software/TCoffee.html</a>
<!-- RELATED APPLICATIONS
Other applications that either generate the input, use the output or
are in some other way related to the application are described here.
(Take this from "Sister applications" in the old documentation)
-->
<a name="11.0"></a>
<br><br><br><H2> 11.0 RELATED APPLICATIONS </H2>
<h2><a name="See also">See also</a></h2>
<table border cellpadding=4 bgcolor="#FFFFF0">
<tr><th>Program name</th>
<th>Description</th></tr>
<tr>
<td><a href="../domainatrix/cathparse.html">cathparse</a></td>
<td>Generate DCF file from raw CATH files</td>
</tr>
<tr>
<td><a href="../domainatrix/domainnr.html">domainnr</a></td>
<td>Remove redundant domains from a DCF file</td>
</tr>
<tr>
<td><a href="domainrep.html">domainrep</a></td>
<td>Reorder DCF file to identify representative structures</td>
</tr>
<tr>
<td><a href="../domainatrix/domainseqs.html">domainseqs</a></td>
<td>Add sequence records to a DCF file</td>
</tr>
<tr>
<td><a href="../domainatrix/domainsse.html">domainsse</a></td>
<td>Add secondary structure records to a DCF file</td>
</tr>
<tr>
<td><a href="../../emboss/apps/helixturnhelix.html">helixturnhelix</a></td>
<td>Identify nucleic acid-binding motifs in protein sequences</td>
</tr>
<tr>
<td><a href="../signature/libgen.html">libgen</a></td>
<td>Generate discriminating elements from alignments</td>
</tr>
<tr>
<td><a href="../../emboss/apps/matcher.html">matcher</a></td>
<td>Waterman-Eggert local alignment of two sequences</td>
</tr>
<tr>
<td><a href="../signature/matgen3d.html">matgen3d</a></td>
<td>Generate a 3D-1D scoring matrix from CCF files</td>
</tr>
<tr>
<td><a href="../hmmer/oalistat.html">oalistat</a></td>
<td>Statistics for multiple alignment files</td>
</tr>
<tr>
<td><a href="../../emboss/apps/pepcoil.html">pepcoil</a></td>
<td>Predict coiled coil regions in protein sequences</td>
</tr>
<tr>
<td><a href="../signature/rocon.html">rocon</a></td>
<td>Generate a hits file from comparing two DHF files</td>
</tr>
<tr>
<td><a href="../signature/rocplot.html">rocplot</a></td>
<td>Perform ROC analysis on hits files</td>
</tr>
<tr>
<td><a href="../domainatrix/scopparse.html">scopparse</a></td>
<td>Generate DCF file from raw SCOP files</td>
</tr>
<tr>
<td><a href="seqalign.html">seqalign</a></td>
<td>Extend alignments (DAF file) with sequences (DHF file)</td>
</tr>
<tr>
<td><a href="../domsearch/seqfraggle.html">seqfraggle</a></td>
<td>Remove fragment sequences from DHF files</td>
</tr>
<tr>
<td><a href="../../emboss/apps/seqmatchall.html">seqmatchall</a></td>
<td>All-against-all word comparison of a sequence set</td>
</tr>
<tr>
<td><a href="../domsearch/seqsort.html">seqsort</a></td>
<td>Remove ambiguously classified sequences from DHF files</td>
</tr>
<tr>
<td><a href="../domsearch/seqwords.html">seqwords</a></td>
<td>Generate DHF files from keyword search of UniProt</td>
</tr>
<tr>
<td><a href="../domainatrix/ssematch.html">ssematch</a></td>
<td>Search a DCF file for secondary structure matches</td>
</tr>
<tr>
<td><a href="../../emboss/apps/supermatcher.html">supermatcher</a></td>
<td>Calculate approximate local pair-wise alignments of larger sequences</td>
</tr>
<tr>
<td><a href="../../emboss/apps/water.html">water</a></td>
<td>Smith-Waterman local alignment of sequences</td>
</tr>
<tr>
<td><a href="../../emboss/apps/wordfinder.html">wordfinder</a></td>
<td>Match large sequences against one or more other sequences</td>
</tr>
<tr>
<td><a href="../../emboss/apps/wordmatch.html">wordmatch</a></td>
<td>Find regions of identity (exact matches) of two sequences</td>
</tr>
</table>
<!-- DIAGNOSTIC ERROR MESSAGES
Description of error messages or log file, if one is written.
-->
<a name="12.0"></a>
<br><br><br><H2> 12.0 DIAGNOSTIC ERROR MESSAGES </H2>
The following message may appear in the log file.
<br><br><i>Replaced ' ' in STAMP alignment with 'X'</i> (STAMP can insert nonsensical whitespace into its alignments, e.g. in place of a residue character where that residue was missing electron density in the PDB file. DOMAINALIGN replaces each whitespace character within a STAMP alignment with an "X".)
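<br><br>
The replacement reported by this message can be illustrated with a short Python sketch. It assumes an alignment held as a dictionary of name to aligned sequence string and treats only the space character as whitespace; the function names and example sequences are hypothetical, and this is not the DOMAINALIGN source code.

```python
def replace_alignment_spaces(aligned_seq, placeholder="X"):
    """Return the aligned sequence with every space replaced by the placeholder."""
    return aligned_seq.replace(" ", placeholder)


def clean_alignment(alignment, placeholder="X"):
    """Clean every sequence and count how many spaces were replaced in total."""
    cleaned = {}
    replaced = 0
    for name, seq in alignment.items():
        replaced += seq.count(" ")
        cleaned[name] = replace_alignment_spaces(seq, placeholder)
    return cleaned, replaced


# Hypothetical aligned sequences containing stray spaces.
alignment = {"dom1": "AC DE", "dom2": "A  GH"}
cleaned, replaced = clean_alignment(alignment)
print(cleaned)   # {'dom1': 'ACXDE', 'dom2': 'AXXGH'}
print(replaced)  # 3
```

Because each space is swapped for a single character, every sequence keeps its length and the alignment columns stay in register.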
<!-- AUTHORS -->
<a name="13.0"></a>
<br><br><br><H2> 13.0 AUTHORS </H2>
Ranjeeva Ranasinghe
<br><br>
Jon Ison <a href="mailto:jison@ebi.ac.uk">(jison@ebi.ac.uk)</a>
<br>
The European Bioinformatics Institute
<br>
Wellcome Trust Genome Campus
<br>
Cambridge CB10 1SD
<br>
UK
<!-- REFERENCES
Quote the paper where the application was first published, described, used etc.
-->
<a name="14.0"></a>
<br><br><br><H2> 14.0 REFERENCES </H2>
Please cite the authors and EMBOSS.
<br><br>
<i>Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European
Molecular Biology Open Software Suite" Trends in Genetics,
16:276-277.
<p>
See also <a href="http://emboss.sourceforge.net/">http://emboss.sourceforge.net/</a></i>
<H3>14.1 Other useful references </H3>
<br>Russell, R. B. &amp; Barton, G. J. (1992), Multiple Protein Sequence Alignment from Tertiary Structure Comparison: Assignment of Global and Residue Confidence Levels, PROTEINS: Struct. Funct. Genet., 14, 309-323.
<br>Notredame, C., Higgins, D. &amp; Heringa, J. (2000), T-Coffee: A Novel Method for Fast and Accurate Multiple Sequence Alignment, Journal of Molecular Biology, 302, 205-217.
<br><br> More information on STAMP can be found at <a href="http://www.compbio.dundee.ac.uk/manuals/stamp.4.2/">http://www.compbio.dundee.ac.uk/manuals/stamp.4.2/</a>
<br> More information on TCOFFEE can be found at <a href="http://www.ch.embnet.org/software/TCoffee.html">http://www.ch.embnet.org/software/TCoffee.html</a>
<br>
</BODY>
</HTML>