/usr/bin/rsem-run-ebseq is in rsem 1.2.31+dfsg-1.

This file is owned by root:root, with mode 0o755.

The actual contents of the file can be viewed below.

#!/usr/bin/env perl

use Getopt::Long;
use Pod::Usage;

use FindBin;
use lib $FindBin::RealBin;
use rsem_perl_utils;

use Env qw(@PATH);
@PATH = ("$FindBin::RealBin/EBSeq", @PATH);

use strict;

my $ngvF = "";
my $help = 0;

GetOptions("ngvector=s" => \$ngvF,
	   "h|help" => \$help) or pod2usage(-exitval => 2, -verbose => 2);

pod2usage(-verbose => 2) if ($help == 1);
pod2usage(-msg => "Invalid number of arguments!", -exitval => 2, -verbose => 2) if (scalar(@ARGV) != 3);
pod2usage(-msg => "The ngvector file cannot be named '#'; '#' is reserved for another purpose!", -exitval => 2, -verbose => 2) if ($ngvF eq "#");

my $command = "";

my @conditions = split(/,/, $ARGV[1]);

pod2usage(-msg => "At least 2 conditions are required for differential expression analysis!", -exitval => 2, -verbose => 2) if (scalar(@conditions) < 2);

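# "#" is the placeholder passed downstream when no ngvector file is given.
if ($ngvF eq "") { $ngvF = "#"; }

# Interpolate @conditions as a space-separated list.
$" = " ";
$command = "rsem-for-ebseq-find-DE $FindBin::RealBin/EBSeq $ngvF $ARGV[0] $ARGV[2] @conditions";
&runCommand($command);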

__END__

=head1 NAME

rsem-run-ebseq

=head1 PURPOSE

Wrapper for EBSeq to perform differential expression analysis.

=head1 SYNOPSIS

rsem-run-ebseq [options] data_matrix_file conditions output_file

=head1 ARGUMENTS

=over

=item B<data_matrix_file>

This file is an m-by-n matrix, where m is the number of genes/transcripts and n is the total number of samples. Each element in the matrix represents the expected count for a particular gene/transcript in a particular sample. Users can use 'rsem-generate-data-matrix' to generate this file from expression result files.

=item B<conditions>

Comma-separated list of values representing the number of replicates for each condition. For example, "3,3" means the data set contains 2 conditions and each condition has 3 replicates. "2,3,3" means the data set contains 3 conditions, with 2, 3, and 3 replicates for each condition respectively.

=item B<output_file>

Output file name.

=back
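
The relationship between B<conditions> and B<data_matrix_file> can be checked before running the analysis. The following is a minimal sketch, not part of RSEM: the function name 'check_conditions' is hypothetical, and it assumes the matrix is tab-separated with each data row holding a gene/transcript name followed by one count per sample.

```shell
# check_conditions MATRIX CONDITIONS   (hypothetical helper, not part of RSEM)
# Succeeds when the replicate counts in CONDITIONS (e.g. "3,4,4") sum to the
# number of sample columns in MATRIX. Assumes a tab-separated matrix whose
# data rows start with a name column followed by one count per sample.
check_conditions() {
    matrix=$1
    conditions=$2

    # Sum the comma-separated replicate counts.
    expected=$(printf '%s\n' "$conditions" |
        awk -F',' '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')

    # Count the fields on the first data row (line 2), excluding the name column.
    actual=$(awk -F'\t' 'NR == 2 { print NF - 1; exit }' "$matrix")

    if [ "$expected" -eq "$actual" ]; then
        echo "OK: $actual samples"
    else
        echo "Mismatch: conditions sum to $expected, matrix has $actual sample columns" >&2
        return 1
    fi
}
```

For example, 'check_conditions GeneMat.txt 3,4,4' should succeed only if 'GeneMat.txt' has 11 sample columns.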

=head1 OPTIONS

=over

=item B<--ngvector> <file>

This option provides the grouping information required by EBSeq for isoform-level differential expression analysis. The file can be generated by 'rsem-generate-ngvector'. Turning this option on is highly recommended for isoform-level differential expression analysis. (Default: off)

=item B<-h/--help>

Show help information.

=back

=head1 DESCRIPTION

This program is a wrapper around EBSeq. It performs differential expression analysis and can work on two or more conditions. All genes/transcripts and their associated statistics are reported in one output file. This program does not control the false discovery rate or call differentially expressed genes/transcripts. Please use 'rsem-control-fdr' to control the false discovery rate after this program has finished.

=head1 OUTPUT

=over

=item B<output_file>

This file reports the calculated statistics for all genes/transcripts. It is written as a matrix with row and column names. The row names are the genes'/transcripts' names. The column names are for the reported statistics.

If there are only 2 different conditions among the samples, four statistics (columns) will be reported for each gene/transcript. They are "PPEE", "PPDE", "PostFC" and "RealFC". "PPEE" is the posterior probability (estimated by EBSeq) that a gene/transcript is equally expressed. "PPDE" is the posterior probability that a gene/transcript is differentially expressed. "PostFC" is the posterior fold change (condition 1 over condition 2) for a gene/transcript. It is defined as the ratio between the posterior mean expression estimates of the gene/transcript for each condition. "RealFC" is the real fold change (condition 1 over condition 2) for a gene/transcript. It is the ratio of the normalized mean count within condition 1 over the normalized mean count within condition 2 for the gene/transcript. Fold changes are calculated using EBSeq's 'PostFC' function. The genes/transcripts are reported in descending order of their "PPDE" values.

If there are more than 2 different conditions among the samples, the output format is different. For differential expression analysis with more than 2 conditions, EBSeq will enumerate all possible expression patterns (that is, which conditions are equally expressed and which are not). Suppose there are k different patterns. The first k columns of the output file give the posterior probability that each expression pattern is true. Patterns are defined in a separate file, 'output_file.pattern'. Column k+1 gives the maximum a posteriori (MAP) expression pattern for each gene/transcript. Column k+2 gives the posterior probability that not all conditions are equally expressed (column name "PPDE"). The genes/transcripts are reported in descending order of their "PPDE" column values. For details on how EBSeq works for more than 2 conditions, please refer to EBSeq's manual.

=item B<output_file.normalized_data_matrix>

This file contains the median-normalized version of the input data matrix.

=item B<output_file.pattern>

This file is only generated when there are more than 2 conditions. It defines all possible expression patterns over the conditions using a matrix with names. Each row of the matrix refers to a different expression pattern and each column gives the expression status of a different condition. Two conditions are equally expressed if and only if their statuses are the same.

=item B<output_file.condmeans>

This file is only generated when there are more than 2 conditions. It gives the normalized mean count value for each gene/transcript at each condition. It is formatted as a matrix with names. Each row represents a gene/transcript and each column represents a condition. The order of genes/transcripts is the same as in 'output_file'. This file can be used to calculate fold changes between the conditions of interest.

=back
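
As an illustration of using 'output_file.condmeans' to compute fold changes, here is a minimal sketch. It is not part of RSEM or EBSeq: the function name 'condmeans_fc' is hypothetical, and the file layout it assumes (tab-separated, one header line, each data row starting with a possibly quoted name) should be checked against the actual file. The value it prints is a plain ratio of the normalized means, not EBSeq's posterior fold change.

```shell
# condmeans_fc FILE I J   (hypothetical helper, not part of RSEM)
# Prints "name<TAB>fold change" of condition I over condition J (1-based).
# Assumes FILE is tab-separated, has one header line, and each data row holds
# a (possibly quoted) gene/transcript name followed by one mean per condition.
condmeans_fc() {
    awk -F'\t' -v i="$2" -v j="$3" '
        NR == 1 { next }                    # skip the header of condition names
        {
            name = $1
            gsub(/"/, "", name)             # drop quotes around the name, if any
            num = $(i + 1)                  # column 1 is the name, so shift by one
            den = $(j + 1)
            if (den + 0 == 0) {
                print name "\tNA"           # ratio is undefined for a zero mean
            } else {
                printf "%s\t%.4f\n", name, num / den
            }
        }
    ' "$1"
}
```

For example, 'condmeans_fc GeneMat.results.condmeans 1 3' would list the ratio of condition 1 to condition 3 for every gene, in the same order as 'GeneMat.results'.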

=head1 EXAMPLES

1) We're interested in isoform-level differential expression analysis and there are two conditions. Each condition has 5 replicates. We have already collected the data matrix as 'IsoMat.txt' and generated ngvector as 'ngvector.ngvec':

 rsem-run-ebseq --ngvector ngvector.ngvec IsoMat.txt 5,5 IsoMat.results

The results will be in 'IsoMat.results', and 'IsoMat.results.normalized_data_matrix' will contain the normalized data matrix.

2) We're interested in gene-level analysis and there are 3 conditions. The first condition has 3 replicates and the other two have 4 replicates each. The data matrix is named 'GeneMat.txt':

 rsem-run-ebseq GeneMat.txt 3,4,4 GeneMat.results

Four files, 'GeneMat.results', 'GeneMat.results.normalized_data_matrix', 'GeneMat.results.pattern', and 'GeneMat.results.condmeans', will be generated. 

=cut