/var/lib/mobyle/programs/mafft.xml is in mobyle-programs 5.1.2-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
<?xml version='1.0' encoding='UTF-8'?>
<!-- XML Authors: Corinne Maufrais, Nicolas Joly and Bertrand Neron, -->
<!-- 'Biological Software and Databases' Group, Institut Pasteur, Paris. -->
<!-- Distributed under LGPLv2 License. Please refer to the COPYING.LIB document. -->
<program>
<head>
<name>mafft</name>
<version>6.849</version>
<doc>
<title>mafft</title>
<description>
<text lang="en">Multiple alignment program for amino acid or nucleotide sequences.</text>
</description>
<sourcelink>http://mafft.cbrc.jp/alignment/software/source.html</sourcelink>
<homepagelink>http://mafft.cbrc.jp/alignment/software/</homepagelink>
<authors>Kazutaka Katoh</authors>
<reference doi="10.1093/bioinformatics/btq224">
Katoh, Toh 2010 (Bioinformatics 26:1899-1900)
Parallelization of the MAFFT multiple sequence alignment program.
(describes the multithread version; Linux only)
</reference>
<reference doi="10.1007/978-1-59745-251-9_3">
Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64)
Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for DNA Sequence Analysis edited by D. Posada
(outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences)
</reference>
<reference doi="10.1186/1471-2105-9-212">
Katoh, Toh 2008 (BMC Bioinformatics 9:212)
Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework.
(describes RNA structural alignment methods)
</reference>
<reference doi="10.1093/bib/bbn013">
Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298)
Recent developments in the MAFFT multiple sequence alignment program.
(outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch)
</reference>
<reference doi="10.1093/bioinformatics/btl592">
Katoh, Toh 2007 (Bioinformatics 23:372-374) Errata
PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences.
(describes the PartTree algorithm)
</reference>
<reference doi="10.1093/nar/gki198">
Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518)
MAFFT version 5: improvement in accuracy of multiple sequence alignment.
(describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies)
</reference>
<reference doi="10.1093/nar/gkf436">
Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066)
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
(describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies)
</reference>
<doclink>http://mafft.cbrc.jp/alignment/software/about.html</doclink>
</doc>
<category>alignment:multiple</category>
</head>
<parameters>
<paragraph>
<name>input_opt</name>
<prompt lang="en">Input Options</prompt>
<parameters>
<parameter ismandatory="1" issimple="1">
<name>sequences</name>
<prompt lang="en">Sequences File ( a file containing several sequences ).</prompt>
<type>
<datatype>
<class>Sequence</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">" $sequences"</code>
<code proglang="python">" " + str( sequences )</code>
</format>
<argpos>1000</argpos>
</parameter>
<parameter>
<name>seq_type</name>
<prompt lang="en">Sequence type</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>null</value>
</vdef>
<flist>
<felem undef="1">
<value>null</value>
<label>Automatic</label>
<code proglang="perl">''</code>
<code proglang="python">''</code>
</felem>
<felem>
<value>nuc</value>
<label>Assume the sequences are nucleotide.</label>
<code proglang="perl">" --nuc "</code>
<code proglang="python">" --nuc "</code>
</felem>
<felem>
<value>amino</value>
<label>Assume the sequences are amino acid.</label>
<code proglang="perl">" --amino "</code>
<code proglang="python">" --amino "</code>
</felem>
</flist>
</parameter>
<paragraph>
<name>seed</name>
<prompt>Use structural alignment(s)</prompt>
<parameters>
<parameter>
<name>seed_1</name>
<prompt>Structural alignment 1</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">(defined $value)? " --seed $value ": ""</code>
<code proglang="python">( "" , " --seed "+str(value))[value is not None]</code>
</format>
<comment>
<text lang="en">These sequences will be aligned with the 'input' sequences above, being used as a constraint. </text>
</comment>
</parameter>
<parameter>
<name>seed_2</name>
<prompt>Structural alignment 2</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">(defined $value)? " --seed $value ": ""</code>
<code proglang="python">( "" , " --seed "+str(value))[value is not None]</code>
</format>
<comment>
<text lang="en">These sequences will be aligned with the 'input' sequences above, being used as a constraint. </text>
</comment>
</parameter>
<parameter>
<name>seed_3</name>
<prompt>Structural alignment 3</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">(defined $value)? " --seed $value ": ""</code>
<code proglang="python">( "" , " --seed "+str(value))[value is not None]</code>
</format>
<comment>
<text lang="en">These sequences will be aligned with the 'input' sequences above, being used as a constraint. </text>
</comment>
</parameter>
</parameters>
<comment>
<text lang="en">Seed alignments, given in FASTA format, are aligned with the input sequences. The alignment within each seed is preserved.</text>
</comment>
</paragraph>
<parameter>
<name>anysymbol</name>
<prompt lang="en">Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.)</prompt>
<type>
<datatype>
<class>Boolean</class>
</datatype>
</type>
<vdef>
<value>0</value>
</vdef>
<format>
<code proglang="perl">( $value )? " --anysymbol " : ""</code>
<code proglang="python">( "" , " --anysymbol ")[ value ]</code>
</format>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>If there are unusual characters (e.g., U as selenocysteine in protein sequence), use the --anysymbol option.</p>
<p>It accepts any printable characters (U, O, #, $, %, etc.; 0x21-0x7e in the ASCII code), except for > (0x3e).
They are scored equivalently to X. Gap is - (0x2d), as in the default mode.</p>
</div>
</comment>
</parameter>
</parameters>
</paragraph>
<paragraph>
<name>output_opt</name>
<prompt>Output Options</prompt>
<parameters>
<parameter>
<name>output_format</name>
<prompt lang="en">Output format: </prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>FASTA</value>
</vdef>
<flist>
<felem>
<value>FASTA</value>
<label>fasta</label>
<code proglang="perl">''</code>
<code proglang="python">''</code>
</felem>
<felem>
<value>CLUSTAL</value>
<label>clustal</label>
<code proglang="perl">' --clustalout '</code>
<code proglang="python">' --clustalout '</code>
</felem>
<felem>
<value>PHYLIP</value>
<label>phylip interleaved</label>
<code proglang="perl">' --phylipout '</code>
<code proglang="python">' --phylipout '</code>
</felem>
</flist>
</parameter>
<parameter>
<name>out_order</name>
<prompt lang="en">Output order: </prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>reorder</value>
</vdef>
<vlist>
<velem>
<value>inputorder</value>
<label>Same as input</label>
</velem>
<velem>
<value>reorder</value>
<label>Aligned</label>
</velem>
</vlist>
<format>
<code proglang="perl">( $value eq 'reorder')? " --reorder " : ""</code>
<code proglang="python">( '' , ' --reorder ' )[ value == 'reorder' ]</code>
</format>
</parameter>
</parameters>
</paragraph>
<paragraph>
<name>advanced_settings </name>
<prompt>Advanced settings </prompt>
<parameters>
<parameter ismandatory="1" iscommand="1">
<name>strategy</name>
<prompt lang="en">Strategy:</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>auto</value>
</vdef>
<flist>
<felem>
<value>auto</value>
<label>Auto (FFT-NS-2, FFT-NS-i or L-INS-i; depends on data size)</label>
<code proglang="perl">"mafft --auto"</code>
<code proglang="python">"mafft --auto"</code>
</felem>
<felem>
<value>fftns1</value>
<label>FFT-NS-1 (Very fast; recommended for >2,000 sequences; progressive method)</label>
<code proglang="perl">"mafft-fftns --retree 1 "</code>
<code proglang="python">"mafft-fftns --retree 1 "</code>
</felem>
<felem>
<value>fftns2</value>
<label>FFT-NS-2 (Fast; progressive method)</label>
<code proglang="perl">"mafft-fftns "</code>
<code proglang="python">"mafft-fftns "</code>
</felem>
<felem>
<value>fftnsi2</value>
<label>FFT-NS-i2 (Medium; iterative refinement method, two cycles only)</label>
<code proglang="perl">"mafft-fftnsi "</code>
<code proglang="python">"mafft-fftnsi "</code>
</felem>
<felem>
<value>fftnsi1000</value>
<label>FFT-NS-i (Slow; iterative refinement method)</label>
<code proglang="perl">"mafft-fftnsi --maxiterate 1000 "</code>
<code proglang="python">"mafft-fftnsi --maxiterate 1000 "</code>
</felem>
<felem>
<value>einsi</value>
<label>E-INS-i (Very slow; recommended for &lt;200 sequences with multiple conserved domains and long gaps)</label>
<code proglang="perl">"mafft-einsi "</code>
<code proglang="python">"mafft-einsi "</code>
</felem>
<felem>
<value>linsi</value>
<label>L-INS-i (Very slow; recommended for &lt;200 sequences with one conserved domain and long gaps)</label>
<code proglang="perl">"mafft-linsi "</code>
<code proglang="python">"mafft-linsi "</code>
</felem>
<felem>
<value>ginsi</value>
<label>G-INS-i (Very slow; recommended for &lt;200 sequences with global homology)</label>
<code proglang="perl">"mafft-ginsi "</code>
<code proglang="python">"mafft-ginsi "</code>
</felem>
<felem>
<value>qinsi</value>
<label>Q-INS-i (Extremely slow; recommended
for a global alignment of highly diverged ncRNAs with &lt;200 seq × &lt;1,000 nt)</label>
<code proglang="perl">"mafft-qinsi "</code>
<code proglang="python">"mafft-qinsi "</code>
</felem>
</flist>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<h2>Algorithms and parameters (unfinished)</h2>
MAFFT offers various multiple alignment strategies.
They are classified into three types,
(<b>a</b>) the progressive method,
(<b>b</b>) the iterative refinement method with the WSP score, and
(<b>c</b>) the iterative refinement method using both the WSP and consistency scores.
In general,
there is a tradeoff between speed and accuracy.
The order of speed is
<b>a</b> > <b>b</b> > <b>c</b>, whereas
the order of accuracy is <b>a</b>
&lt; <b>b</b> &lt; <b>c</b>.
The results of benchmarks can be seen
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">here</a>.
The following are the detailed procedures for the major options of MAFFT.
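<p>As a rough illustration of the classification above, the sketch below maps each strategy value of this descriptor to the command prefix listed in its flist and to its type (a, b or c). The helper name build_command is hypothetical and is not part of Mobyle or MAFFT; "auto" and "qinsi" are left unclassified because the text does not assign them to a type.</p>
<pre><![CDATA[
```python
# Command prefixes copied from the <flist> of the "strategy" parameter in this file.
STRATEGY_COMMANDS = {
    "auto": "mafft --auto",
    "fftns1": "mafft-fftns --retree 1",
    "fftns2": "mafft-fftns",
    "fftnsi2": "mafft-fftnsi",
    "fftnsi1000": "mafft-fftnsi --maxiterate 1000",
    "einsi": "mafft-einsi",
    "linsi": "mafft-linsi",
    "ginsi": "mafft-ginsi",
    "qinsi": "mafft-qinsi",
}

# Type (a), (b) or (c) as classified in the text above.
STRATEGY_TYPE = {
    "fftns1": "a", "fftns2": "a",
    "fftnsi2": "b", "fftnsi1000": "b",
    "einsi": "c", "linsi": "c", "ginsi": "c",
}


def build_command(strategy, input_file):
    """Return an argument list for the chosen strategy (hypothetical helper)."""
    return STRATEGY_COMMANDS[strategy].split() + [input_file]
```
]]></pre>
<p>For example, build_command("fftns1", "in.fasta") yields the argument list for the fastest progressive option, which could then be passed to a process launcher with standard output redirected to the result file.</p>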
<h3 id="fftnsx">(a) FFT-NS-1, FFT-NS-2 — Progressive methods</h3>
<img src="http://mafft.cbrc.jp/alignment/software/algorithms/prog.png" alt="prog.png" height="163" width="382" />
<br />
These are simple progressive methods like
<a href="http://www.ebi.ac.uk/clustalw/">ClustalW</a>.
By using the several new techniques described below,
these options can align a large number of sequences
(up to ∼5,000) on a standard desktop computer.
The qualities of the resulting alignments are shown
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">here</a>.
The detailed algorithms are described in Katoh et al. (2002).
<ul>
<li>
<b>FFT-NS-1</b><br />
<b><tt>
mafft --retree 1
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftns --retree 1
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is the simplest progressive option in MAFFT
and one of the fastest methods currently available.
The procedure is:
(1) make a rough distance matrix by counting the number of
shared 6-tuples (see below) between every sequence pair,
(2) build a guide tree
and (3) align the sequences according to the branching order.
<p>
</p>
</li>
<li>
<b>FFT-NS-2</b>
<br />
<b>
<tt>
mafft --retree 2
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftns
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
The distance matrix used in FFT-NS-1 is very approximate
and unreliable.
In FFT-NS-2,
(4) the guide tree is re-computed from
the FFT-NS-1 alignment,
and (5) the second progressive alignment
is carried out.
</li>
</ul>
The following techniques are used to improve the performance.
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;"> FFT approximation.</span>
(Not yet written) See Katoh et al. (2002).
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">
<i>k</i>
-mer counting.
</span>
To accelerate the initial calculation of the distance matrix,
which requires a CPU time of
<i>O</i>
(
<i>N</i>
<sup>2</sup>
) steps,
a rough method similar to the 'quicktree' option of ClustalW
is adopted,
in which the number of
<i>k</i>
-mers shared by
a pair of sequences
is counted and regarded as an approximation
of the degree of similarity.
MAFFT uses the very rapid method proposed by Jones et al. (1992)
with a minor modification
(Katoh et al. 2002): (1) The 20 amino acids are compressed to 6
alphabets, according to Dayhoff et al. (1978),
and
(2) MAFFT performs the second progressive alignment (FFT-NS-2) in order to
improve the accuracy.
</p>
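<p>The k-mer counting idea can be sketched as follows. This is a minimal illustration, not MAFFT's implementation: it assumes the commonly cited six-class Dayhoff grouping (MAFFT's exact table may differ), and the function names are hypothetical. Shared k-mers are counted with multiplicity, which is one plausible reading of the description.</p>
<pre><![CDATA[
```python
from collections import Counter

# Assumption: the commonly cited six-class Dayhoff grouping.
DAYHOFF_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
COMPRESS = {aa: str(i) for i, group in enumerate(DAYHOFF_GROUPS) for aa in group}


def compress(seq):
    """Map each residue to its class digit; unknown residues become 'x'."""
    return "".join(COMPRESS.get(aa, "x") for aa in seq.upper())


def kmer_counts(seq, k=6):
    """Count the k-mers of the compressed sequence."""
    s = compress(seq)
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def shared_kmers(a, b, k=6):
    """Number of k-mers shared by two sequences, counting multiplicity."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter intersection keeps the minimum count of each k-mer.
    return sum((ca & cb).values())
```
]]></pre>
<p>A larger value of shared_kmers suggests a more similar pair; turning the count into a distance requires a normalization that is not shown here.</p>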
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;"> Modified UPGMA.</span>
<a href="upg.html">A modified version of UPGMA</a>
is used to construct a guide tree,
which works well for handling fragment sequences.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">The second progressive alignment.</span>
The accuracy of the second progressive alignment (FFT-NS-2)
is slightly higher than that of the first progressive alignment (FFT-NS-1)
according to the
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">BAliBASE test</a>
,
but the amount of CPU time required by FFT-NS-2 is
approximately twice that required by FFT-NS-1.
</p>
<h3>(b) FFT-NS-i, NW-NS-i — Iterative refinement method</h3>
<img src="http://mafft.cbrc.jp/alignment/software/algorithms/iter.png" alt="iter.png" height="200" width="379" />
<br />
The accuracy of progressive alignment
can be improved
by the iterative refinement method (Berger and Munson 1991, Gotoh 1993).
A simplified version of
<a href="">PRRN</a>
is implemented as the
FFT-NS-i option of MAFFT.
In FFT-NS-i,
an initial alignment by FFT-NS-2 is subjected to
an iterative refinement process.
<ul>
<li>
<b>FFT-NS-i (max. 1,000 cycles)</b>
<br />
<b>
<tt>
mafft --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftnsi --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
The iterative refinement is repeated until
no more improvement in the WSP score is made or the number of cycles reaches 1,000.
<p>
</p>
</li>
<li>
<b>FFT-NS-i (max. 2 cycles)</b>
<br />
<b>
<tt>
mafft --maxiterate 2
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftnsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
As most of the improvement in quality is obtained in the early
stage of the iteration, this option is also useful
(default of the fftnsi script).
</li>
</ul>
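<p>The stopping rule described above (stop when the score no longer improves or when the cycle limit is reached) can be sketched as a generic loop. The callables refine_once and score are hypothetical placeholders supplied by the caller; this is not MAFFT's refinement code.</p>
<pre><![CDATA[
```python
def iterate_refinement(alignment, refine_once, score, max_cycles=1000):
    """Repeat refinement until the score stops improving or max_cycles is reached.

    refine_once(alignment) returns a candidate alignment and score(alignment)
    returns a number; both are hypothetical callables supplied by the caller.
    Returns the best alignment found and the number of cycles performed.
    """
    best = score(alignment)
    cycles = 0
    while cycles < max_cycles:
        candidate = refine_once(alignment)
        cycles += 1
        candidate_score = score(candidate)
        if candidate_score <= best:
            break  # no further improvement: keep the current alignment
        alignment, best = candidate, candidate_score
    return alignment, cycles
```
]]></pre>
<p>Setting max_cycles to 2 corresponds to the two-cycle variant, and 1000 to the long-running variant.</p>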
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Objective function.</span>
The weighted sum-of-pairs (WSP) score proposed by Gotoh is used.
</p>
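<p>To make the idea of a weighted sum-of-pairs score concrete, here is a simplified sketch. It assumes a per-column match, mismatch and gap score with placeholder values, and pair weights passed in by the caller; the actual WSP score uses a substitution matrix, affine gap costs and tree-derived weights, none of which are reproduced here.</p>
<pre><![CDATA[
```python
from itertools import combinations


def wsp_score(alignment, weights, match=1.0, mismatch=-1.0, gap=-2.0):
    """Simplified weighted sum-of-pairs score of equal-length aligned strings.

    weights[(i, j)] is the weight of the row pair (i, j) with i < j.
    The match, mismatch and gap values are placeholders, not MAFFT parameters.
    """
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        pair = 0.0
        for a, b in zip(alignment[i], alignment[j]):
            if a == "-" and b == "-":
                continue  # a gap in both rows contributes nothing
            if a == "-" or b == "-":
                pair += gap
            elif a == b:
                pair += match
            else:
                pair += mismatch
        total += weights[(i, j)] * pair
    return total
```
]]></pre>
<p>With all weights equal to 1 this reduces to the plain sum-of-pairs score; unequal weights let closely related sequences count less.</p>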
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Tree-dependent partitioning.</span>
(Not yet written)
See Hirosawa et al.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Effect of FFT.</span>
To test the effect of the FFT approximation,
we also implemented the NW-NS-x options,
in which the FFT approximation is disabled, but the other procedures are the same
as those in the corresponding FFT-NS-x.
There was no significant reduction in the accuracy
by introducing the FFT approximation
(Katoh et al. 2002).
</p>
<h3>(c) L-INS-i, E-INS-i, G-INS-i — Iterative refinement methods using WSP and consistency scores</h3>
<img src="http://mafft.cbrc.jp/alignment/software/algorithms/cons.png" alt="cons.png" height="134" width="366" />
<br />
In order to obtain more accurate alignments in extremely difficult cases,
three new options, L-INS-i, G-INS-i and E-INS-i, have been added to
recent versions (v.≥5) of MAFFT.
These options use
a new objective function combining the WSP score (Gotoh) explained above
and the COFFEE-like score (Notredame et al.),
which evaluates the consistency between
a multiple alignment and pairwise alignments (Katoh et al. 2005).
<p id="GLE">
For pairwise alignment,
three different types of algorithms are implemented,
global alignment (Needleman-Wunsch), local alignment (Smith-Waterman)
with affine gap costs (Gotoh) and
local alignment with generalized affine gap costs (Altschul).
The differences in the accuracy values among these methods are small
for the currently available benchmarks, as shown
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">here</a>
.
However,
each of them has different characteristics, according to the algorithm
in the pairwise alignment stage:
</p>
<ul>
<li id="einsi">
<b>E-INS-i</b>
<br />
<b>
<tt>
mafft --genafpair --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
einsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is suitable for alignments like this:
<pre style="background-color: #F0F0F0; border: 0 solid #AAAAAA; font-size: 90%; font-weight: bold;">
oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo
---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------
-----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo
---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------
---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------
</pre>
where '
<tt>X</tt>
's indicate alignable residues,
'
<tt>o</tt>
's indicate unalignable residues and
'
<tt>-</tt>
's indicate gaps.
Unalignable residues are left
unaligned
at the pairwise alignment stage,
because of the use of the generalized affine gap cost.
Therefore E-INS-i is applicable to a difficult problem such as RNA polymerase, which
has several conserved motifs embedded in long unalignable regions.
As E-INS-i makes the fewest assumptions of the three methods,
this is recommended if the nature of sequences to be aligned is not clear.
Note that E-INS-i assumes that the arrangement of the conserved motifs is shared by
all sequences.
</li>
<li id="linsi">
<b>L-INS-i</b>
<br />
<b>
<tt>
mafft --localpair --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
linsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is suitable for alignments like this:
<pre style="background-color: #F0F0F0; border: 0 solid #AAAAAA; font-size: 90%; font-weight: bold;">
ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------
--------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------
------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------
--------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo
--------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------
</pre>
L-INS-i can align
a set of sequences in which one alignable domain
is surrounded by flanking sequences.
Flanking sequences are ignored in the pairwise alignment
by the Smith-Waterman algorithm.
Note that the input sequences are assumed to have
only one alignable domain.
In benchmark tests, the ref4 of BAliBASE corresponds to this.
The other categories of BAliBASE also correspond to similar situations,
because they have flanking sequences.
L-INS-i also shows higher accuracy values for a part of SABmark and HOMSTRAD
than G-INS-i, but we have not identified the reason for this.
</li>
<li id="ginsi">
<b>G-INS-i</b>
<br />
<b>
<tt>
mafft --globalpair --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
ginsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is suitable for alignments like this:
<pre style="background-color: #F0F0F0; border: 0 solid #AAAAAA; font-size: 90%; font-weight: bold;">
XXXXXXXXXXX-XXXXXXXXXXXXXXX
XX-XXXXXXXXXXXXXXX-XXXXXXXX
XXXXX----XXXXXXXX---XXXXXXX
XXXXX-XXXXXXXXXX----XXXXXXX
XXXXXXXXXXXXXXXX----XXXXXXX
</pre>
G-INS-i assumes that the entire region can be aligned
and tries to align them globally using
the Needleman-Wunsch algorithm;
that is,
a set of sequences of one domain
must be extracted by truncating flanking
sequences.
In benchmark tests, SABmark and HOMSTRAD correspond to this.
</li>
</ul>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Consistency score.</span>
The COFFEE objective function was originally proposed
by Notredame et al. (1998), and
the extended versions are used in TCoffee and ProbCons.
MAFFT also adopts a similar objective function, as described
in Katoh et al. (2005).
However,
the consistency among three sequences
(called 'library extension' in TCoffee)
is currently not calculated in MAFFT,
because the improvement in accuracy by library extension was limited to
alignments consisting of a small number (&lt;10) of sequences
in our preliminary tests.
If library extension is needed, then please use
<a href="http://igs-server.cnrs-mrs.fr/%7Ecnotred/Projects_home_page/t_coffee_home_page.html">TCoffee</a>
or
<a href="http://probcons.stanford.edu/">ProbCons</a>.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Consistency + WSP.</span>
Instead,
the WSP score is summed with the consistency score in the
objective function of MAFFT.
The use of the WSP score
has the merit that a pattern of gaps can be incorporated
into the objective function.
This is probably the reason why
MAFFT achieves higher accuracy than
ProbCons and TCoffee for alignments consisting of
many (∼10 - ∼100) sequences.
This suggests that
the pattern of gaps within a group
to be aligned
is important information
when aligning two groups of proteins (and evaluating
homology between distantly related protein families).
</p>
</div>
</comment>
<argpos>0</argpos>
</parameter>
<parameter>
<name>amino_scm</name>
<prompt lang="en">Scoring matrix for amino acid sequences:</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>BLOSUM62</value>
</vdef>
<flist>
<felem>
<value>BLOSUM30</value>
<label>BLOSUM30</label>
<code proglang="perl">" --bl 30 "</code>
<code proglang="python">" --bl 30 "</code>
</felem>
<felem>
<value>BLOSUM45</value>
<label>BLOSUM45</label>
<code proglang="perl">" --bl 45 "</code>
<code proglang="python">" --bl 45 "</code>
</felem>
<felem>
<value>BLOSUM62</value>
<label>BLOSUM62</label>
<code proglang="perl">""</code>
<code proglang="python">""</code>
</felem>
<felem>
<value>BLOSUM80</value>
<label>BLOSUM80</label>
<code proglang="perl">" --bl 80 "</code>
<code proglang="python">" --bl 80 "</code>
</felem>
<felem>
<value>JT100</value>
<label>JT100</label>
<code proglang="perl">" --jtt 100 "</code>
<code proglang="python">" --jtt 100 "</code>
</felem>
<felem>
<value>JT200</value>
<label>JT200</label>
<code proglang="perl">" --jtt 200 "</code>
<code proglang="python">" --jtt 200 "</code>
</felem>
</flist>
<comment>
<text lang="en">The BLOSUM62 matrix is adopted as a default scoring matrix,
because this showed slightly higher accuracy values than the
BLOSUM80, 45, JTT200PAM, 100PAM and Gonnet matrices in SABmark tests. </text>
</comment>
</parameter>
<parameter>
<name>nuc_scm</name>
<prompt lang="en">Scoring matrix for nucleotide sequences:</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>200</value>
</vdef>
<vlist>
<velem undef="1">
<value>200</value>
<label>200PAM/ k=2</label>
</velem>
<velem>
<value>20</value>
<label>20PAM/ k=2</label>
</velem>
<velem>
<value>1</value>
<label>1PAM/ k=2</label>
</velem>
</vlist>
<format>
<code proglang="perl">( defined $value and $value ne $vdef )? " --kimura $value " : ""</code>
<code proglang="python">( "" , " --kimura "+str( value ) )[ value is not None and value!= vdef ]</code>
</format>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p style="color: red">Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.</p>
<p>The default scoring matrix is derived from Kimura's two-parameter model.
The ratio of transitions to transversions is set at 2 by default.
Other parameters can be used, but have not yet been tested. </p>
</div>
</comment>
</parameter>
<parameter>
<name>gap_open_penalty</name>
<prompt lang="en">Gap opening penalty (1.0 - 3.0): </prompt>
<type>
<datatype>
<class>Float</class>
</datatype>
</type>
<vdef>
<value>1.53</value>
</vdef>
<format>
<code proglang="perl">(defined $value and $value != $vdef)? " --op $value " : ""</code>
<code proglang="python">( "" , " --op "+str( value ) )[ value is not None and value != vdef ]</code>
</format>
<ctrl>
<message>
<text lang="en">The value must satisfy 1.0 &lt; value &lt; 3.0</text>
</message>
<code proglang="perl">1.0 &lt; $value and $value &lt; 3.0</code>
<code proglang="python">1.0 &lt; value &lt; 3.0</code>
</ctrl>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">
Gap penalties for proteins.</span>
The default gap penalties for amino acid alignments
have been changed in
<span style="color: red">v.4.0</span>.
Note that the current version of MAFFT returns
an entirely different alignment from v.&lt;4.0.
In v.4.0, two major gap penalties
(--op [gap open penalty]
and --ep [offset value, which functions like a gap extension penalty,
see the
<a href="http://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html">mafft3 paper</a>
for definition])
were tuned by applying the FFT-NS-2 option to a part of
the SABmark benchmark.
We adopted the parameter set (--op 1.53 --ep 0.123) optimized for
SABmark,
because this works better for other benchmark
(HOMSTRAD, PREFAB and BAliBASE)
tests than
the previous one (--op 2.4 --ep 0.06).
Other parameters might work better in other situations.
Consistency-based options have more parameters
(L-INS-i has four more parameters and E-INS-i has six more parameters).
We determined these additional parameters so that the Smith-Waterman alignment function
used in L-INS-i
returns a local alignment similar to that generated by FASTA,
but we have not closely tuned them yet.
In our tests using SABmark,
the accuracy values can be improved by 2-3% by
tuning these parameters,
but this improvement may result from overfitting.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Gap penalties for RNAs.</span>
The default gap penalties for nucleotide alignment
have changed in
<span class="red">
v.5.6</span>.
Note that the current version of MAFFT returns
an entirely different alignment from v.&lt;5.6.
In the former versions (v.&lt;5.6),
the default gap penalties for nucleotide alignments were set at the same values
as those for amino acid alignments.
According to
<a href="http://projects.binf.ku.dk/pgardner/bralibase/">BRAliBASE</a>
,
these penalties result in
very bad alignments for RNAs.
The newer versions (v.≥5.6) use different penalties for nucleotide alignment;
the penalty values are set to three times larger than those for amino acids.
This is not yet the optimal value for BRAliBASE.
The BRAliBASE score can be improved by
closely tuning the penalty values, but we have not adopted the
optimized penalties, because we are not sure whether they are
applicable to a wide range of problems.
</p>
</div>
</comment>
</parameter>
<parameter>
<name>offset</name>
<prompt lang="en">Offset value (0.0 - 1.0):</prompt>
<type>
<datatype>
<class>Float</class>
</datatype>
</type>
<vdef>
<value>0.123</value>
</vdef>
<format>
<code proglang="perl">(defined $value and $value != $vdef)? " --ep $value " : ""</code>
<code proglang="python">( "" , " --ep "+str( value ) )[ value is not None and value != vdef ]</code>
</format>
<ctrl>
<message>
<text lang="en">The value must satisfy 0.0 &lt; value &lt; 1.0 (default 0.123)</text>
</message>
<code proglang="perl">0.0 &lt; $value and $value &lt; 1.0</code>
<code proglang="python">0.0 &lt; value &lt; 1.0</code>
</ctrl>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p style="color: red">If long gaps are not expected, set it to 0.1 or larger.</p>
</div>
</comment>
</parameter>
</parameters>
</paragraph>
<parameter isstdout="1">
<name>result</name>
<prompt lang="en">Alignment file</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>
<ref param="output_format"/>
</dataFormat>
</type>
<filenames>
<code proglang="perl">"mafft.out"</code>
<code proglang="python">"mafft.out"</code>
</filenames>
</parameter>
</parameters>
</program>