/usr/share/perl5/Lingua/StopWords.pm is in liblingua-stopwords-perl 0.09-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

package Lingua::StopWords;
use strict;
use warnings;

require Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = ( 'all' => [ qw( getStopWords ) ] );
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
our $VERSION = 0.09;

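sub getStopWords {
    my ( $language, $encoding ) = @_;

    # A language code is required.
    return undef unless $language;

    # Language modules are named after the upper-cased code, e.g. EN.pm.
    $language = uc($language);
    eval { require "Lingua/StopWords/$language.pm"; };
    return undef if $@;    # the language module could not be loaded

    # Forward the encoding only if one was given, then dispatch to the
    # language module's own getStopWords() via a symbolic reference.
    my @args = $encoding ? ($encoding) : ();
    no strict 'refs';
    return &{ "Lingua::StopWords::$language\::getStopWords" }(@args);
}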

1;

__END__

=head1 NAME

Lingua::StopWords - Stop words for several languages.

=head1 SYNOPSIS

    use Lingua::StopWords qw( getStopWords );
    my $stopwords = getStopWords('en');
    
    my @words = qw( i am the walrus goo goo g'joob );
    
    # prints "walrus goo goo g'joob"
    print join ' ', grep { !$stopwords->{$_} } @words;

=head1 DESCRIPTION

In keyword search, it is common practice to suppress a collection of
"stopwords": words such as "the", "and", "maybe", etc., which occur in a
large number of documents and tell you nothing important about any
document that contains them.  This module provides such "stoplists" in
several languages.

=head2 Supported Languages

    |-----------------------------------------------------------|
    | Language   | ISO code | default encoding | also available |
    |-----------------------------------------------------------|
    | Danish     | da       | ISO-8859-1       | UTF-8          | 
    | Dutch      | nl       | ISO-8859-1       | UTF-8          | 
    | English    | en       | ISO-8859-1       | UTF-8          |
    | Finnish    | fi       | ISO-8859-1       | UTF-8          |
    | French     | fr       | ISO-8859-1       | UTF-8          |
    | German     | de       | ISO-8859-1       | UTF-8          | 
    | Hungarian  | hu       | ISO-8859-1       | UTF-8          | 
    | Italian    | it       | ISO-8859-1       | UTF-8          | 
    | Norwegian  | no       | ISO-8859-1       | UTF-8          | 
    | Portuguese | pt       | ISO-8859-1       | UTF-8          | 
    | Spanish    | es       | ISO-8859-1       | UTF-8          | 
    | Swedish    | sv       | ISO-8859-1       | UTF-8          | 
    | Russian    | ru       | KOI8-R           | UTF-8          | 
    |-----------------------------------------------------------|

=head1 FUNCTIONS

=head2 getStopWords

    my $stoplist      = getStopWords('en');
    my $utf8_stoplist = getStopWords('en', 'UTF-8');

Retrieve a stoplist in the form of a hashref where the keys are all
stopwords and the values are all 1.

    $stoplist = {
        and => 1,
        if  => 1,
        # ...
    };

getStopWords() expects one or two arguments.  The first, which is required,
is an ISO code representing a supported language; it is matched
case-insensitively.  If the code is missing, or no stoplist can be loaded
for it, getStopWords() returns undef.
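
Because an unsupported code yields undef rather than an exception, callers
may want to check the return value before dereferencing it.  A minimal
sketch, in which the code 'xx' is assumed not to correspond to any installed
stoplist, and falling back to an empty hashref is just one possible policy:

    use strict;
    use warnings;
    use Lingua::StopWords qw( getStopWords );

    # 'xx' is a made-up code, assumed to have no stoplist module.
    my $stoplist = getStopWords('xx');
    if ( !defined $stoplist ) {
        warn "no stoplist for 'xx'; nothing will be filtered\n";
        $stoplist = {};    # fall back to an empty stoplist
    }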

The second argument should be 'UTF-8' if you want the stopwords encoded in
UTF-8.  The UTF-8 flag will be turned on, so make sure you understand all the
implications of that.
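
One such implication is that words looked up in a UTF-8 stoplist should
themselves be character strings.  The following is a sketch only, assuming
the input arrives as raw UTF-8 bytes (the sample text is invented for
illustration) and using the core Encode module to decode it first:

    use strict;
    use warnings;
    use Encode qw( decode );
    use Lingua::StopWords qw( getStopWords );

    my $stoplist = getStopWords( 'fr', 'UTF-8' );

    # Raw UTF-8 bytes, e.g. as read from a file without an :encoding layer.
    my $bytes = "cet \xC3\xA9t\xC3\xA9 dernier";

    # Decode to a character string before comparing words to the keys.
    my $text = decode( 'UTF-8', $bytes );
    my @kept = grep { !$stoplist->{$_} } split ' ', $text;

Without the decode step, a byte string and a flagged key would not compare
equal for words that contain non-ASCII characters.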
    
=head1 SEE ALSO

The stoplists supplied by this module were created as part of the Snowball
project (see L<http://snowball.tartarus.org>,
L<Lingua::Stem::Snowball|Lingua::Stem::Snowball>).

L<Lingua::EN::StopWords|Lingua::EN::StopWords> provides a different stoplist
for English.

=head1 AUTHOR

Maintained by Marvin Humphrey E<lt>marvin at rectangular dot comE<gt>.
Original author Fabien Potencier, E<lt>fabpot at cpan dot orgE<gt>.

=head1 COPYRIGHT AND LICENSE

Copyright 2004-2008 Fabien Potencier, Marvin Humphrey

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.3 or,
at your option, any later version of Perl 5 you may have available.

=cut