/usr/share/doc/courier-doc/htmldoc/maildroptips.html is in courier-doc 0.68.2-1ubuntu7.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content=
"text/html; charset=us-ascii" />
<title>Maildrop tips and tricks.</title>
<meta name="MSSmartTagsPreventParsing" content="TRUE" />
<!-- Copyright 2000-2002 Double Precision, Inc. See COPYING for -->
<!-- distribution information. -->
<link rel="icon" href="icon.gif" type="image/gif" />
</head>
<body>
<h1>Maildrop tips and tricks</h1>Last update: 10/11/1998.<br />
<hr width="100%" />
<table>
<tbody>
<tr>
<td align="left" rowspan="6" width="10%"></td>
<th align="left"><a href="#include">The include
statement</a></th>
<td></td>
</tr>
<tr>
<th align="left"><a href="#linessize">The
<code>$LINES</code> and <code>$SIZE</code>
environment</a></th>
<td></td>
</tr>
<tr>
<th align="left"><a href="#exclamation">Partial matches
using the ! operator</a></th>
<td></td>
</tr>
<tr>
<th align="left"><a href="#shortcircuit">Short-circuit
evaluation using logical operators</a></th>
<td></td>
</tr>
<tr>
<th align="left"><a href="#foreach">The
<code>foreach</code> statement</a></th>
<td></td>
</tr>
<tr>
<th align="left"><a href="#hometmp">The
<code>$HOME/.tmp</code> directory</a></th>
<td></td>
</tr>
</tbody>
</table>
<hr width="100%" />
<br />
Here's a list of tips and tricks to write efficient mail filters
using maildrop. Maildrop - just like any programming language -
can do certain things better than others.<br />
<h2><a name="include" id="include">The include statement</a></h2>
<p>When maildrop is started, it reads
<code>$HOME/.mailfilter</code>, or some other file containing
mail filtering instructions. Before doing anything else, this
entire file is read and semi-compiled. Any grammatical or syntax
errors will result in maildrop terminating with a temporary error
code, without doing anything with the message.</p>
<p>The entire file must be 'semi-compiled". The
'semi-compilation' process consists of building an logical
representation of the filtering statements in memory.</p>
<p>If you have a long and complicated set of instructions, it
will be semi-compiled whether or not it will actually be
executed. For example, consider the following code:<br />
</p>
<pre>
if ( /^Subject: rosebud/ )
{
...
some
really
long
set
of
instructions
...
}
</pre>If the contents of the "<code>if</code>" construct are large,
maildrop may spend considerable amount of time and memory
semi-compiling filtering instructions that may never be used.
Consider putting all these instructions into a separate file, and
using the <code>include</code> statement to execute them.
<p>The <code>include</code> statement is processed only when it
is actually executed by maildrop. That carries certain side
effects. One feature of maildrop is the semi-compilation which
weeds out errors in the filtering recipe file, holding all mail
and not doing anything until the errors are fixed. The
<code>include</code> statement is not processed until it is
actually executed. If there are errors in the file referenced by
the <code>include</code> statement, maildrop will not know about
them until it executes the <code>include</code> statement. When
maildrop detects the error, it will terminate with a temporary
error code, holding all mail. However, any filtering instructions
before the <code>include</code> statement would've already been
executed.</p>
<h2><a name="linessize" id="linessize">The <code>$LINES</code>
and <code>$SIZE</code> environment variables</a></h2>
<p>These variables are initialized to the number of lines in the
message, and the size of the message in bytes. They may not be
exact, due to the presence of the From_ line, or -A arguments on
the command line, but they will always be a reasonable
approximation.</p>
<p>Scanning headers for patterns will take the same amount of
time whether the body of the message has 100 or 1,000 lines.
However, the "b" option causes the message body to be searched.
If someone sends you a 5 megabyte binary file, each pattern
matched against the message body will take a significant time to
complete.</p>
<p>Whenever possible, the first thing you should do is check both
variables to see if the message is excessively large. If
so, divert the message to a separate mailbox, or reject it.</p>
<p>Weighted scoring is especially sensitive to the size of the
message being delivered. Normally, as soon as a pattern match is
found, maildrop terminates the pattern scan. If weighted scoring
is used then the pattern scan is immediately restarted, until the
end of the message is reached.</p>
<p>A fixed overhead is required to start a pattern scan going, so
that weighted scoring that matches a very short pattern, like
<code>/[:lower:]/:Dbw,1,1</code> will take a significant amount
of CPU time to complete. Maildrop is simply not optimized for
these kinds of situations.</p>
<p>Nevertheless, on a Pentium 200, /[:lower:]/:Dbw,1,1 perform
reasonably well for messages less than 60K long.</p>
<p>Here's an example of using <code>$LINES</code> and
<code>$SIZE</code> to divert large messages to a separate
mailbox:</p>
<pre>
if ($LINES > 1000 || $SIZE > 60000)
{
to "mail/IN.large"
}
.
.
.
</pre>
<h2><a name="exclamation" id="exclamation">Partial matches using
the ! operator</a></h2>
<p>Maildrop implements the ! operator in patterns by breaking up
the pattern into two separate patterns, which are compiled and
executed separately.</p>
<p>Whenever the first pattern is successfully matched, maildrop
runs the second pattern. If the second pattern does not match,
maildrop resumes matching the first pattern until it matches
again.</p>
<p>Since starting a pattern match involves some fixed overhead,
minimizing the number of times the first pattern will match will
reduce the amount of time it takes to match the entire
pattern.</p>
<p>For example: you're trying to extract an IP address from the
first Received header. Here's your first attempt at doing so:</p>
<pre>
/^Received:.*![0-9\.]+/
IP=$MATCH2
</pre>This will not work at all. From the manual page for
maildropfilter:
<pre>
When there is more than one way to match a string,
maildrop favors matching as much as possible initially.
</pre>
<p><br />
So this fragment will set IP to the last digit, or period found
in the first Received: header, because the first half of the
pattern will match as much as possible.</p>
<p>Here's your second attempt, then:</p>
<pre>
/^Received:.*[^0-9\.]![0-9\.]+/
IP=$MATCH2
</pre>This will work, except the first half of the pattern will end
up matching every character in the Received: header that's not a
digit or a period. Maildrop will then stop and try to run the
second half of the pattern.
<p>Your mail server will probably put a left bracket before the
IP address of the relay. Noting that, the following code picks up
the IP address as efficiently as possible:</p>
<pre>
/^Received:.*\[![0-9\.]+/
IP=$MATCH2
</pre>
<h2><a name="shortcircuit" id="shortcircuit">Short-circuit
evaluation using logical operators</a></h2>
<p>The logical operators <code>||</code> and
<code>&&</code> work in maildrop just like they work in
C. Consider the following expression:</p>
<pre>
if (/ ... some pattern ... / || / ... some other pattern ... /)
{
.
.
.
}
</pre>If the first pattern is found, maildrop will not execute the
second pattern, since the logical expression will be true no matter
what. You can use that to your advantage. If you have two patterns,
and one of them takes more time to process, put the other first.
Even if both patterns are relatively quick, if you expect one of
them to be found almost every time, put that one first so that it
will not be necessary to run the second pattern most of the time.
<p>The same concept works for the <code>&&</code>
operator:</p>
<pre>
if (/ ... some pattern ... / && / ... some other pattern ... /)
{
.
.
.
}
</pre>If the first pattern is not found, maildrop will not execute
the second pattern, since the logical expression will be false no
matter what.
<h2><a name="foreach" id="foreach">The <code>foreach</code>
statement</a></h2>
<p>The <code>foreach</code> statement is implemented as follows.
Maildrop will execute the regular expression, and compile a list
of all the patterns in the message matched by the regular
expression. Afterwards, maildrop will execute the
<code>foreach</code> statement, once for each matched
pattern.</p>
<p>It is important to note that maildrop will create a list in
memory containing every pattern matched by the regular
expression. if the regular expression is found many times in the
message, this will require a lot of memory. For example, the
following is a Very Bad Thing[tm]:</p>
<pre>
foreach /[:alpha:]/
{
.
.
.
}
</pre>Maildrop does not include any built-in limits on resource
usage, except for the watchdog timer. Therefore, as a system
administrator, it is your responsibility to limit resources used by
the maildrop process.
<p>This implementation of the <code>foreach</code> statement is
the one that yields the least number of surprises. Observe that
the contents of the <code>foreach</code> construct can modify the
message using the <code>xfilter</code> statement. Furthermore,
the regular expression may include variable substitution, and
those variables could be modified by the <code>foreach</code>
statement. If the foreach statement were to be executed "on the
fly", these factors would yield unpredictable - and rather messy
- results.</p>
<h2><a name="hometmp" id="hometmp">The <code>$HOME/.tmp</code>
directory</a></h2>
<p>Under certain conditions, maildrop will need to use a
temporary file. Temporary files are created in the
<code>$HOME/.tmp</code> directory. Temporary files are used to
store large messages (instead of storing them in memory).
Temporary files may also be used by the xfilter command, or in
other situations.</p>
<p>Maildrop will automatically delete all temporary files before
terminating. However, if the maildrop process is killed, or
terminates abnormally, a temporary file may remain and take up
disk space. To have those leftover files automatically cleaned
up, put the following code in <code>/etc/profile</code>, or some
other file that all users run after logging in:</p>
<pre>
find $HOME/.tmp -depth -mtime +7 \( \( \
-type d -exec rmdir {} \; \) -o -exec rm -f {} \; \) 2>/dev/null
</pre>
</body>
</html>
|