/var/lib/mobyle/programs/mafft.xml is in mobyle-programs 5.1.2-2.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
<?xml version='1.0' encoding='UTF-8'?>
<!-- XML Authors: Corinne Maufrais, Nicolas Joly and Bertrand Neron, -->
<!-- 'Biological Software and Databases' Group, Institut Pasteur, Paris. -->
<!-- Distributed under LGPLv2 License. Please refer to the COPYING.LIB document. -->
<program>
<head>
<name>mafft</name>
<version>6.849</version>
<doc>
<title>mafft</title>
<description>
<text lang="en">Multiple alignment program for amino acid or nucleotide sequences.</text>
</description>
<sourcelink>http://mafft.cbrc.jp/alignment/software/source.html</sourcelink>
<homepagelink>http://mafft.cbrc.jp/alignment/software/</homepagelink>
<authors>Kazutaka Katoh</authors>
<reference doi="10.1093/bioinformatics/btq224">
Katoh, Toh 2010 (Bioinformatics 26:1899-1900)
Parallelization of the MAFFT multiple sequence alignment program.
(describes the multithread version; Linux only)
</reference>
<reference doi="10.1007/978-1-59745-251-9_3">
Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64)
Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for DNA Sequence Analysis edited by D. Posada
(outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences)
</reference>
<reference doi="10.1186/1471-2105-9-212">
Katoh, Toh 2008 (BMC Bioinformatics 9:212)
Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework.
(describes RNA structural alignment methods)
</reference>
<reference doi="10.1093/bib/bbn013">
Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298)
Recent developments in the MAFFT multiple sequence alignment program.
(outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch)
</reference>
<reference doi="10.1093/bioinformatics/btl592">
Katoh, Toh 2007 (Bioinformatics 23:372-374) Errata
PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences.
(describes the PartTree algorithm)
</reference>
<reference doi="10.1093/nar/gki198">
Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518)
MAFFT version 5: improvement in accuracy of multiple sequence alignment.
(describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies)
</reference>
<reference doi="10.1093/nar/gkf436">
Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066)
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
(describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies)
</reference>
<doclink>http://mafft.cbrc.jp/alignment/software/about.html</doclink>
</doc>
<category>alignment:multiple</category>
</head>
<parameters>
<paragraph>
<name>input_opt</name>
<prompt lang="en">Input Options</prompt>
<parameters>
<parameter ismandatory="1" issimple="1">
<name>sequences</name>
<prompt lang="en">Sequences File ( a file containing several sequences ).</prompt>
<type>
<datatype>
<class>Sequence</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">" $sequences"</code>
<code proglang="python">" " + str( sequences )</code>
</format>
<argpos>1000</argpos>
</parameter>
<parameter>
<name>seq_type</name>
<prompt lang="en">Sequences type</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>null</value>
</vdef>
<flist>
<felem undef="1">
<value>null</value>
<label>Automatic</label>
<code proglang="perl">''</code>
<code proglang="python">''</code>
</felem>
<felem>
<value>nuc</value>
<label>Assume the sequences are nucleotide.</label>
<code proglang="perl">" --nuc "</code>
<code proglang="python">" --nuc "</code>
</felem>
<felem>
<value>amino</value>
<label>Assume the sequences are amino acid.</label>
<code proglang="perl">" --amino "</code>
<code proglang="python">" --amino "</code>
</felem>
</flist>
</parameter>
<paragraph>
<name>seed</name>
<prompt>Use structural alignment(s)</prompt>
<parameters>
<parameter>
<name>seed_1</name>
<prompt>Structural alignment 1</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">(defined $value)? " --seed $value ": ""</code>
<code proglang="python">( "" , " --seed "+str(value))[value is not None]</code>
</format>
<comment>
<text lang="en">These sequences will be aligned with the 'input' sequences above, being used as a constraint. </text>
</comment>
</parameter>
<parameter>
<name>seed_2</name>
<prompt>Structural alignment 2</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">(defined $value)? " --seed $value ": ""</code>
<code proglang="python">( "" , " --seed "+str(value))[value is not None]</code>
</format>
<comment>
<text lang="en">These sequences will be aligned with the 'input' sequences above, being used as a constraint. </text>
</comment>
</parameter>
<parameter>
<name>seed_3</name>
<prompt>Structural alignment 3</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>FASTA</dataFormat>
</type>
<format>
<code proglang="perl">(defined $value)? " --seed $value ": ""</code>
<code proglang="python">( "" , " --seed "+str(value))[value is not None]</code>
</format>
<comment>
<text lang="en">These sequences will be aligned with the 'input' sequences above, being used as a constraint. </text>
</comment>
</parameter>
</parameters>
<comment>
<text lang="en">Seed alignments, given in FASTA format, are aligned with the input sequences. The alignment within every seed is preserved.</text>
</comment>
</paragraph>
<parameter>
<name>anysymbol</name>
<prompt lang="en">Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.)</prompt>
<type>
<datatype>
<class>Boolean</class>
</datatype>
</type>
<vdef>
<value>0</value>
</vdef>
<format>
<code proglang="perl">( $value )? " --anysymbol " : ""</code>
<code proglang="python">( "" , " --anysymbol ")[ value ]</code>
</format>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>If there are unusual characters (e.g., U as selenocysteine in protein sequence), use the --anysymbol option.</p>
<p>It accepts any printable characters (U, O, #, $, %, etc.; 0x21-0x7e in the ASCII code), except for > (0x3e).
They are scored equivalently to X. Gap is - (0x2d), as in the default mode.</p>
</div>
</comment>
</parameter>
</parameters>
</paragraph>
<paragraph>
<name>output_opt</name>
<prompt>Output Options</prompt>
<parameters>
<parameter>
<name>output_format</name>
<prompt lang="en">Output format: </prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>FASTA</value>
</vdef>
<flist>
<felem>
<value>FASTA</value>
<label>fasta</label>
<code proglang="perl">''</code>
<code proglang="python">''</code>
</felem>
<felem>
<value>CLUSTAL</value>
<label>clustal</label>
<code proglang="perl">' --clustalout '</code>
<code proglang="python">' --clustalout '</code>
</felem>
<felem>
<value>PHYLIP</value>
<label>phylip interleaved</label>
<code proglang="perl">' --phylipout '</code>
<code proglang="python">' --phylipout '</code>
</felem>
</flist>
</parameter>
<parameter>
<name>out_order</name>
<prompt lang="en">Output order: </prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>reorder</value>
</vdef>
<vlist>
<velem>
<value>inputorder</value>
<label>Same as input</label>
</velem>
<velem>
<value>reorder</value>
<label>Aligned</label>
</velem>
</vlist>
<format>
<code proglang="perl">( $value eq 'reorder')? " --reorder " : ""</code>
<code proglang="python">( '' , ' --reorder ' )[ value == 'reorder' ]</code>
</format>
</parameter>
</parameters>
</paragraph>
<paragraph>
<name>advanced_settings </name>
<prompt>Advanced settings </prompt>
<parameters>
<parameter ismandatory="1" iscommand="1">
<name>strategy</name>
<prompt lang="en">Strategy:</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>auto</value>
</vdef>
<flist>
<felem>
<value>auto</value>
<label>Auto (FFT-NS-2, FFT-NS-i or L-INS-i; depends on data size)</label>
<code proglang="perl">"mafft --auto"</code>
<code proglang="python">"mafft --auto"</code>
</felem>
<felem>
<value>fftns1</value>
<label>FFT-NS-1 (Very fast; recommended for >2,000 sequences; progressive method)</label>
<code proglang="perl">"mafft-fftns --retree 1 "</code>
<code proglang="python">"mafft-fftns --retree 1 "</code>
</felem>
<felem>
<value>fftns2</value>
<label>FFT-NS-2 (Fast; progressive method)</label>
<code proglang="perl">"mafft-fftns "</code>
<code proglang="python">"mafft-fftns "</code>
</felem>
<felem>
<value>fftnsi2</value>
<label>FFT-NS-i2 (Medium; iterative refinement method, two cycles only)</label>
<code proglang="perl">"mafft-fftnsi "</code>
<code proglang="python">"mafft-fftnsi "</code>
</felem>
<felem>
<value>fftnsi1000</value>
<label>FFT-NS-i (Slow; iterative refinement method)</label>
<code proglang="perl">"mafft-fftnsi --maxiterate 1000 "</code>
<code proglang="python">"mafft-fftnsi --maxiterate 1000 "</code>
</felem>
<felem>
<value>einsi</value>
<label>E-INS-i (Very slow; recommended for <200 sequences with multiple conserved domains and long gaps)</label>
<code proglang="perl">"mafft-einsi "</code>
<code proglang="python">"mafft-einsi "</code>
</felem>
<felem>
<value>linsi</value>
<label>L-INS-i (Very slow; recommended for <200 sequences with one conserved domain and long gaps)</label>
<code proglang="perl">"mafft-linsi "</code>
<code proglang="python">"mafft-linsi "</code>
</felem>
<felem>
<value>ginsi</value>
<label>G-INS-i (Very slow; recommended for <200 sequences with global homology)</label>
<code proglang="perl">"mafft-ginsi "</code>
<code proglang="python">"mafft-ginsi "</code>
</felem>
<felem>
<value>qinsi</value>
<label>Q-INS-i (Extremely slow; recommended for a global alignment of highly diverged ncRNAs with <200 seq × <1,000 nt)</label>
<code proglang="perl">"mafft-qinsi "</code>
<code proglang="python">"mafft-qinsi "</code>
</felem>
</flist>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<h2>Algorithms and parameters (unfinished)</h2>
MAFFT offers various multiple alignment strategies.
They are classified into three types,
(<b>a</b>) the progressive method,
(<b>b</b>) the iterative refinement method with the WSP score, and
(<b>c</b>) the iterative refinement method using both the WSP and consistency scores.
In general,
there is a tradeoff between speed and accuracy.
The order of speed is
<b>a</b> > <b>b</b> > <b>c</b>, whereas
the order of accuracy is <b>a</b>
< <b>b</b> < <b>c</b>.
The results of benchmarks can be seen
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">here</a>.
The following are the detailed procedures for the major options of MAFFT.
<h3 id="fftnsx">(a) FFT-NS-1, FFT-NS-2 — Progressive methods</h3>
<img src="http://mafft.cbrc.jp/alignment/software/algorithms/prog.png" alt="prog.png" height="163" width="382" />
<br />
These are simple progressive methods like
<a href="http://www.ebi.ac.uk/clustalw/">ClustalW</a>.
By using several new techniques described below,
these options can align a large number of sequences
(up to ∼5,000) on a standard desktop computer.
The qualities of the resulting alignments are shown
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">here</a>.
The detailed algorithms are described in Katoh et al. (2002).
<ul>
<li>
<b>FFT-NS-1</b><br />
<b><tt>
mafft --retree 1
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftns --retree 1
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is the simplest progressive option in MAFFT
and one of the fastest methods currently available.
The procedure is:
(1) make a rough distance matrix by counting the number of
shared 6-tuples (see below) between every sequence pair,
(2) build a guide tree
and (3) align the sequences according to the branching order.
<p>
</p>
</li>
<li>
<b>FFT-NS-2</b>
<br />
<b>
<tt>
mafft --retree 2
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftns
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
The distance matrix used in FFT-NS-1 is very approximate
and unreliable.
In FFT-NS-2,
(4) the guide tree is re-computed from
the FFT-NS-1 alignment,
and (5) the second progressive alignment
is carried out.
</li>
</ul>
The following techniques are used to improve the performance.
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;"> FFT approximation.</span>
(Not yet written) See Katoh et al. (2002).
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">
<i>k</i>
-mer counting.
</span>
To accelerate the initial calculation of the distance matrix,
which requires a CPU time of
<i>O</i>
(
<i>N</i>
<sup>2</sup>
) steps,
a rough method similar to the 'quicktree' option of ClustalW
is adopted,
in which the number of
<i>k</i>
-mers shared by
a pair of sequences
is counted and regarded as an approximation
of the degree of similarity.
MAFFT uses the very rapid method proposed by Jones et al. (1992)
with two minor modifications
(Katoh et al. 2002): (1) the 20 amino acids are compressed into a 6-letter
alphabet, according to Dayhoff et al. (1978),
and
(2) MAFFT performs the second progressive alignment (FFT-NS-2) in order to
improve the accuracy.
</p>
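<p>
As a rough illustration of the k-mer counting described above, the following Python sketch
compresses amino acids into six classes and counts the 6-mers shared by two sequences.
It is not MAFFT's code: the class grouping is the commonly cited Dayhoff grouping and is an
assumption here, and the names DAYHOFF_CLASSES, compress, kmer_counts and shared_kmers are hypothetical.
</p>

```python
from collections import Counter

# Commonly cited six Dayhoff classes (an assumption; MAFFT's exact
# grouping is not given in this text).
DAYHOFF_CLASSES = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
COMPRESS = {aa: str(i) for i, group in enumerate(DAYHOFF_CLASSES) for aa in group}


def compress(seq):
    """Map each residue to its class digit; unknown residues become 'x'."""
    return "".join(COMPRESS.get(aa, "x") for aa in seq.upper())


def kmer_counts(seq, k=6):
    """Count every overlapping k-mer of the compressed sequence."""
    reduced = compress(seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def shared_kmers(seq_a, seq_b, k=6):
    """Number of k-mers shared by two sequences (multiset intersection)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for a missing key, so absent k-mers add nothing.
    return sum(min(n, counts_b[kmer]) for kmer, n in counts_a.items())
```

<p>
A larger count suggests a more similar pair; how such a count is converted into a distance
is not covered by this sketch.
</p>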
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;"> Modified UPGMA.</span>
<a href="upg.html">A modified version of UPGMA</a>
is used to construct a guide tree,
which works well for handling fragment sequences.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">The second progressive alignment.</span>
The accuracy of the second progressive alignment (FFT-NS-2)
is slightly higher than that of the first progressive alignment (FFT-NS-1)
according to the
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">BAliBASE test</a>
,
but the CPU time required by FFT-NS-2 is
approximately twice that required by FFT-NS-1.
</p>
<h3>(b) FFT-NS-i, NW-NS-i — Iterative refinement method</h3>
<img src="http://mafft.cbrc.jp/alignment/software/algorithms/iter.png" alt="iter.png" height="200" width="379" />
<br />
The accuracy of progressive alignment
can be improved
by the iterative refinement method (Berger and Munson 1991, Gotoh 1993).
A simplified version of
<a href="">PRRN</a>
is implemented as the
FFT-NS-i option of MAFFT.
In FFT-NS-i,
an initial alignment by FFT-NS-2 is subjected to
an iterative refinement process.
<ul>
<li>
<b>FFT-NS-i (max. 1,000 cycles)</b>
<br />
<b>
<tt>
mafft --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftnsi --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
The iterative refinement is repeated until
no more improvement in the WSP score is made or the number of cycles reaches 1,000.
<p>
</p>
</li>
<li>
<b>FFT-NS-i (max. 2 cycles)</b>
<br />
<b>
<tt>
mafft --maxiterate 2
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
fftnsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
As most of the improvement in quality is obtained in the early
stages of the iteration, this option is also useful
(default of the fftnsi script).
</li>
</ul>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Objective function.</span>
The weighted sum-of-pairs (WSP) score proposed by Gotoh is used.
</p>
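<p>
The idea of a weighted sum-of-pairs score can be sketched as follows, assuming a toy
match/mismatch/gap scheme and per-sequence weights multiplied into pair weights.
These are illustrative assumptions, not MAFFT's scoring matrix, gap model or weighting scheme;
pair_score and wsp_score are hypothetical names.
</p>

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score two aligned rows column by column (toy scoring scheme)."""
    total = 0.0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both rows contributes nothing
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def wsp_score(rows, weights):
    """Weighted sum of pairwise scores over all pairs of aligned rows."""
    return sum(
        weights[i] * weights[j] * pair_score(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    )
```

<p>
An iterative refinement step would keep a candidate alignment only if a score of this kind improves.
</p>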
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Tree-dependent partitioning.</span>
(Not yet written)
See Hirosawa et al.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Effect of FFT.</span>
To test the effect of the FFT approximation,
we also implemented the NW-NS-x options,
in which the FFT approximation is disabled, but the other procedures are the same
as those in the corresponding FFT-NS-x.
There was no significant reduction in the accuracy
by introducing the FFT approximation
(Katoh et al. 2002).
</p>
<h3>(c) L-INS-i, E-INS-i, G-INS-i — Iterative refinement methods using WSP and consistency scores</h3>
<img src="http://mafft.cbrc.jp/alignment/software/algorithms/cons.png" alt="cons.png" height="134" width="366" />
<br />
In order to obtain more accurate alignments in extremely difficult cases,
three new options, L-INS-i, G-INS-i and E-INS-i, have been added to
recent versions (v.≥5) of MAFFT.
These options use
a new objective function combining the WSP score (Gotoh) explained above
and the COFFEE-like score (Notredame et al.),
which evaluates the consistency between
a multiple alignment and pairwise alignments (Katoh et al. 2005).
<p id="GLE">
For pairwise alignment,
three different types of algorithms are implemented,
global alignment (Needleman-Wunsch), local alignment (Smith-Waterman)
with affine gap costs (Gotoh) and
local alignment with generalized affine gap costs (Altschul).
The differences in the accuracy values among these methods are small
for the currently available benchmarks, as shown
<a href="http://mafft.cbrc.jp/alignment/software/eval/accuracy.html">here</a>
.
However,
each of them has different characteristics, according to the algorithm
in the pairwise alignment stage:
</p>
<ul>
<li id="einsi">
<b>E-INS-i</b>
<br />
<b>
<tt>
mafft --genafpair --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
einsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is suitable for alignments like this:
<pre style="background-color: #F0F0F0; border: 0 solid #AAAAAA; font-size: 90%; font-weight: bold;">
oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo
---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------
-----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo
---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------
---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------
</pre>
where '
<tt>X</tt>
's indicate alignable residues,
'
<tt>o</tt>
's indicate unalignable residues and
'
<tt>-</tt>
's indicate gaps.
Unalignable residues are left
unaligned
at the pairwise alignment stage,
because of the use of the generalized affine gap cost.
Therefore E-INS-i is applicable to a difficult problem such as RNA polymerase, which
has several conserved motifs embedded in long unalignable regions.
As E-INS-i makes the fewest assumptions of the three methods,
this is recommended if the nature of sequences to be aligned is not clear.
Note that E-INS-i assumes that the arrangement of the conserved motifs is shared by
all sequences.
</li>
<li id="linsi">
<b>L-INS-i</b>
<br />
<b>
<tt>
mafft --localpair --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
linsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is suitable for alignments like this:
<pre style="background-color: #F0F0F0; border: 0 solid #AAAAAA; font-size: 90%; font-weight: bold;">
ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------
--------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------
------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------
--------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo
--------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------
</pre>
L-INS-i can align
a set of sequences in which flanking sequences surround
one alignable domain.
Flanking sequences are ignored in the pairwise alignment
by the Smith-Waterman algorithm.
Note that the input sequences are assumed to have
only one alignable domain.
In benchmark tests, the ref4 of BAliBASE corresponds to this.
The other categories of BAliBASE also correspond to similar situations,
because they have flanking sequences.
L-INS-i also shows higher accuracy values for a part of SABmark and HOMSTRAD
than G-INS-i, but we have not identified the reason for this.
</li>
<li id="ginsi">
<b>G-INS-i</b>
<br />
<b>
<tt>
mafft --globalpair --maxiterate 1000
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
or
<br />
<b>
<tt>
ginsi
<i>input_file</i>
>
<i>output_file</i>
</tt>
</b>
<br />
is suitable for alignments like this:
<pre style="background-color: #F0F0F0; border: 0 solid #AAAAAA; font-size: 90%; font-weight: bold;">
XXXXXXXXXXX-XXXXXXXXXXXXXXX
XX-XXXXXXXXXXXXXXX-XXXXXXXX
XXXXX----XXXXXXXX---XXXXXXX
XXXXX-XXXXXXXXXX----XXXXXXX
XXXXXXXXXXXXXXXX----XXXXXXX
</pre>
G-INS-i assumes that the entire region can be aligned
and tries to align it globally using
the Needleman-Wunsch algorithm;
that is,
a set of sequences of one domain
must be extracted by truncating flanking
sequences.
In benchmark tests, SABmark and HOMSTRAD correspond to this.
</li>
</ul>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Consistency score.</span>
The COFFEE objective function was originally proposed
by Notredame et al. (1998), and
the extended versions are used in TCoffee and ProbCons.
MAFFT also adopts a similar objective function, as described
in Katoh et al. (2005).
However,
the consistency among three sequences
(called 'library extension' in TCoffee)
is currently not calculated in MAFFT,
because the improvement in accuracy by library extension was limited to
alignments consisting of a small number (<10) of sequences
in our preliminary tests.
If library extension is needed, then please use
<a href="http://igs-server.cnrs-mrs.fr/%7Ecnotred/Projects_home_page/t_coffee_home_page.html">TCoffee</a>
or
<a href="http://probcons.stanford.edu/">ProbCons</a>.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Consistency + WSP.</span>
Instead,
the WSP score is summed with the consistency score in the
objective function of MAFFT.
The use of the WSP score
has the merit that a pattern of gaps can be incorporated
into the objective function.
This is probably the reason why
MAFFT achieves higher accuracy than
ProbCons and TCoffee for alignments consisting of
many (∼10 - ∼100) sequences.
This suggests that
the pattern of gaps within a group
to be aligned
is important information
when aligning two groups of proteins (and evaluating
homology between distantly related protein families).
</p>
</div>
</comment>
<argpos>0</argpos>
</parameter>
<parameter>
<name>amino_scm</name>
<prompt lang="en">Scoring matrix for amino acid sequences:</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>BLOSUM62</value>
</vdef>
<flist>
<felem>
<value>BLOSUM30</value>
<label>BLOSUM30</label>
<code proglang="perl">" --bl 30 "</code>
<code proglang="python">" --bl 30 "</code>
</felem>
<felem>
<value>BLOSUM45</value>
<label>BLOSUM45</label>
<code proglang="perl">" --bl 45 "</code>
<code proglang="python">" --bl 45 "</code>
</felem>
<felem>
<value>BLOSUM62</value>
<label>BLOSUM62</label>
<code proglang="perl">""</code>
<code proglang="python">""</code>
</felem>
<felem>
<value>BLOSUM80</value>
<label>BLOSUM80</label>
<code proglang="perl">" --bl 80 "</code>
<code proglang="python">" --bl 80 "</code>
</felem>
<felem>
<value>JT100</value>
<label>JTT100</label>
<code proglang="perl">" --jtt 100 "</code>
<code proglang="python">" --jtt 100 "</code>
</felem>
<felem>
<value>JT200</value>
<label>JTT200</label>
<code proglang="perl">" --jtt 200 "</code>
<code proglang="python">" --jtt 200 "</code>
</felem>
</flist>
<comment>
<text lang="en">The BLOSUM62 matrix is adopted as a default scoring matrix,
because this showed slightly higher accuracy values than the
BLOSUM80, 45, JTT200PAM, 100PAM and Gonnet matrices in SABmark tests. </text>
</comment>
</parameter>
<parameter>
<name>nuc_scm</name>
<prompt lang="en">Scoring matrix for nucleotide sequences:</prompt>
<type>
<datatype>
<class>Choice</class>
</datatype>
</type>
<vdef>
<value>200</value>
</vdef>
<vlist>
<velem undef="1">
<value>200</value>
<label>200PAM/ k=2</label>
</velem>
<velem>
<value>20</value>
<label>20PAM/ k=2</label>
</velem>
<velem>
<value>1</value>
<label>1PAM/ k=2</label>
</velem>
</vlist>
<format>
<code proglang="perl">( defined $value and $value ne $vdef )? " --kimura $value " : ""</code>
<code proglang="python">( "" , " --kimura "+str( value ) )[ value is not None and value!= vdef ]</code>
</format>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p style="color: red">Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.</p>
<p>The default scoring matrix is derived from Kimura's two-parameter model.
The ratio of transitions to transversions is set at 2 by default.
Other parameters can be used, but have not yet been tested. </p>
</div>
</comment>
</parameter>
<parameter>
<name>gap_open_penalty</name>
<prompt lang="en">Gap opening penalty (1.0 - 3.0): </prompt>
<type>
<datatype>
<class>Float</class>
</datatype>
</type>
<vdef>
<value>1.53</value>
</vdef>
<format>
<code proglang="perl">(defined $value and $value != $vdef)? " --op $value " : ""</code>
<code proglang="python">( "" , " --op "+str( value ) )[ value is not None and value != vdef ]</code>
</format>
<ctrl>
<message>
<text lang="en">You must provide a value between 1.0 and 3.0 (exclusive)</text>
</message>
<code proglang="perl">1.0 < $value and $value < 3.0</code>
<code proglang="python">1.0 < value < 3.0</code>
</ctrl>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">
Gap penalties for proteins.</span>
The default gap penalties for amino acid alignments
have been changed in
<span style="color: red">v.4.0</span>.
Note that the current version of MAFFT returns
an entirely different alignment from v.<4.0.
In v.4.0, two major gap penalties
(--op [gap open penalty]
and --ep [offset value, which functions like a gap extension penalty,
see the
<a href="http://mafft.cbrc.jp/alignment/software/algorithms/algorithms.html">mafft3 paper</a>
for definition])
were tuned by applying the FFT-NS-2 option to a part of
the SABmark benchmark.
We adopted the parameter set (--op 1.53 --ep 0.123) optimized for
SABmark,
because this works better for other benchmark
(HOMSTRAD, PREFAB and BAliBASE)
tests than
the previous one (--op 2.4 --ep 0.06).
Other parameters might work better in other situations.
Consistency-based options have more parameters
(L-INS-i has four more parameters and E-INS-i has six more parameters).
We determined these additional parameters so that the Smith-Waterman alignment function
used in L-INS-i
returns a local alignment similar to that generated by FASTA,
but we have not closely tuned them yet.
In our tests using SABmark,
the accuracy values can be improved by 2-3% by
tuning these parameters,
but this improvement may result from overfitting.
</p>
<p>
<span style="color: #003366; font-size: 100%; font-style: italic; font-weight: bold;">Gap penalties for RNAs.</span>
The default gap penalties for nucleotide alignment
have changed in
<span class="red">
v.5.6</span>.
Note that the current version of MAFFT returns
an entirely different alignment from v.<5.6.
In the former versions (v.<5.6),
the default gap penalties for nucleotide alignments were set at the same values
as those for amino acid alignments.
According to
<a href="http://projects.binf.ku.dk/pgardner/bralibase/">BRAliBASE</a>
,
these penalties result in
very bad alignments for RNAs.
The newer versions (v.≥5.6) use different penalties for nucleotide alignment;
the penalty values are set to three times those for amino acids.
This is not yet the optimal value for BRAliBASE.
The BRAliBASE score can be improved by
closely tuning the penalty values, but we have not adopted the
optimized penalties, because we are not sure whether they are
applicable to a wide range of problems.
</p>
</div>
</comment>
</parameter>
<parameter>
<name>offset</name>
<prompt lang="en">Offset value (0.0 - 1.0):</prompt>
<type>
<datatype>
<class>Float</class>
</datatype>
</type>
<vdef>
<value>0.123</value>
</vdef>
<format>
<code proglang="perl">(defined $value and $value != $vdef)? " --ep $value " : ""</code>
<code proglang="python">( "" , " --ep "+str( value ) )[ value is not None and value != vdef ]</code>
</format>
<ctrl>
<message>
<text lang="en">You must provide a value between 0.0 and 1.0, exclusive (default 0.123)</text>
</message>
<code proglang="perl">0.0 < $value and $value < 1.0</code>
<code proglang="python">0.0 < value < 1.0</code>
</ctrl>
<comment>
<div xmlns="http://www.w3.org/1999/xhtml">
<p style="color: red">If long gaps are not expected, set it to 0.1 or a larger value.</p>
</div>
</comment>
</parameter>
</parameters>
</paragraph>
<parameter isstdout="1">
<name>result</name>
<prompt lang="en">Alignment file</prompt>
<type>
<datatype>
<class>Alignment</class>
</datatype>
<dataFormat>
<ref param="output_format"/>
</dataFormat>
</type>
<filenames>
<code proglang="perl">"mafft.out"</code>
<code proglang="python">"mafft.out"</code>
</filenames>
</parameter>
</parameters>
</program>