Suspicion is a technique used by greylite to neutralize the workarounds that spammers use against greylisting. It is a list of rules that determine whether a client must make more delivery attempts than the usual two: the more suspicious a client is, the more times it may be temporarily rejected. In addition, clients found suspicious are not whitelisted even if they pass the greylisting challenge; this preserves the quality of the whitelist.
This is a soft method: it does not introduce false positives, and it trades more incisive filtering for possibly longer message delays. That is, if a client is incorrectly found suspicious, its messages are still received, possibly with a longer delay. Used sensibly, this technique can visibly lower the amount of spam passing greylisting, make spamming more costly, and dramatically reduce whitelist pollution, with invisible or negligible impact on legitimate traffic.
Greylite activates suspicion when it finds the SUSPICION variable set in its environment. This variable holds the full path of the file containing the ruleset. See controls for details.
Suspicion's behaviour is driven by a suspicion file.
A suspicion file is a text file containing a list of rules, one per line. If a line begins with a # (hash) character, it is recognized as a comment and skipped.
Rules are checked in order, top to bottom. The first matching rule defines the number of attempts required; any rules after it are not evaluated.
A rule has three (possibly four) fields separated by two spaces, with the following format:
integer letter rulespecification
or
integer letter ! rulespecification
where the letter and the rule specification are one of the following:
| Name | Letter | Description | Specification |
|---|---|---|---|
| Reverse lookup | r | the client address hostname (PTR name) is compared with a pattern of suspicious names. If it matches, the rule is applied. Intended use: the PTR names of spammers on dynamic Internet connections usually contain dynamic, ppp or similar patterns. | one extended regular expression (see re_format(7)) compared directly with the PTR hostname |
| Environment variable | v | if an environment variable is set, the rule is applied. Intended use: tcpserver, or another module before greylite, performs custom checks and sets variables if they result in suspecting the client. | a space- or comma-separated list of environment variable names. The rule matches if any one of these variables is found. |
| Client behaviour | b | if the client exhibits certain behaviours, the rule is applied. Behaviours are identified by keywords, such as greetdelay and retryinterval. | a space- or comma-separated list of one or more behaviour keywords. If any behaviour is detected, the rule matches. greetdelay must be enabled by setting the GREETDELAY environment variable. |
| GeoIP map | g | if the client resides in a given set of countries (according to the GeoIP GeoLite Country database) the rule is applied. Intended use: clients from certain countries are much more likely to be spammers: China, Russia etc. Also, in some scenarios e-mail is exchanged with a fixed set of countries, and all the others are suspect. | a space- or comma-separated list of ISO country codes. If the client comes from any item in this list, the rule matches. Controls: GEOIPDB_FILE contains the path of the GeoIP database (default /usr/local/share/GeoIP/GeoIP.dat) |
| Envelope information | e | if parts of the envelope information match one or more patterns, the rule is applied. The envelope information in greylite includes the envelope sender, the envelope recipient and the client host label as self-declared with the HELO/EHLO command. Intended use: examples include disabling greylisting for specific domains, or whitelisting specific senders (e.g. because they are protected by SPF). | a list of one or more space- or comma-separated patterns, each prefixed by the part to match (the example ruleset uses r: and s:). |
# unprotecteddomain.com is not protected with greylisting, and GMX is
# trusted because of SPF's "-all"
0 e r:@unprotecteddomain.com$ s:@gmx.(de|net)$
# clients that fail the greetdelay trap or retry blindly are rejected indefinitely
100 b greetdelay retryinterval
# dnsblenv sets the BLACKLISTED variable when the client is on a RBL
6 v BLACKLISTED
# clients outside this zone are suspicious (this is very case-specific)
3 g ! AT BE CH DE ES EU FI FR GB IT MC NO SM VA
# clients whose PTR name contains "dynamic" stuff are suspicious
3 r (^|[^a-z])(a?dsl|dyn(amic)?(ip)?|dial(in|up)?|ppp|customer|user|host|home)([^a-z]|\.?$)
2 r (([0-9]{1,3}[-.]){3})[0-9]{1,3}
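The first-match evaluation described above can be illustrated with a minimal Python sketch. It is not greylite's implementation: the function names `parse_rules` and `required_attempts` are hypothetical, only `r` and `v` rules are modelled, blank lines are assumed to be skipped, `!` is assumed to invert the match (as the `g !` example suggests), and Python's `re` module stands in for POSIX extended regular expressions. The default of 2 attempts reflects the usual double attempt of plain greylisting.

```python
import re

# A shortened copy of the example ruleset (raw string so "\." stays intact).
RULESET = r"""# dnsblenv sets the BLACKLISTED variable when the client is on a RBL
6 v BLACKLISTED
# clients whose PTR name contains "dynamic" stuff are suspicious
3 r (^|[^a-z])(a?dsl|dyn(amic)?(ip)?|dial(in|up)?|ppp|customer|user|host|home)([^a-z]|\.?$)
2 r (([0-9]{1,3}[-.]){3})[0-9]{1,3}
"""

def parse_rules(text):
    """Return a list of (line_no, attempts, letter, negated, spec) tuples."""
    rules = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments (and, by assumption, blank lines) are skipped
        fields = line.split(None, 2)  # integer, letter, everything else
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least three fields")
        attempts, letter, rest = int(fields[0]), fields[1], fields[2]
        negated = rest.startswith("! ")  # assumption: "!" inverts the match
        spec = rest[2:].strip() if negated else rest
        rules.append((line_no, attempts, letter, negated, spec))
    return rules

def rule_matches(letter, spec, ptr_name, env):
    """Evaluate one rule; only 'r' and 'v' are modelled in this sketch."""
    if letter == "r":
        return re.search(spec, ptr_name) is not None
    if letter == "v":
        names = [n for n in re.split(r"[,\s]+", spec) if n]
        return any(name in env for name in names)
    return False

def required_attempts(rules, ptr_name, env, default=2):
    """Return (attempts, line_no) of the first matching rule, top to bottom."""
    for line_no, attempts, letter, negated, spec in rules:
        if letter not in ("r", "v"):
            continue  # 'b', 'g' and 'e' rules are outside this sketch
        if rule_matches(letter, spec, ptr_name, env) != negated:
            return attempts, line_no
    return default, None  # no rule matched: plain greylisting

rules = parse_rules(RULESET)
print(required_attempts(rules, "ppp-12.dynamic.example.net", {}))
print(required_attempts(rules, "mail.example.org", {"BLACKLISTED": "1"}))
```

Because evaluation stops at the first match, a blacklisted client with a dynamic-looking PTR name gets 6 attempts here, not 3: the `v` rule sits above the `r` rules.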
If LOGTHRESHOLD is set to 6 or higher, greylite logs messages about the result of suspicion lookups, e.g.:
greylite: Rule 'v' in line 4 matched client '97.102.58.123', returning 6 attempts.
Writing a script based on grep and awk to extract activity statistics from these logs is trivial. Examples: what is the ratio of suspicious to clean addresses? How many suspicious clients are flagged because of rule X? What is the overall cut rate of rule Y?
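As an alternative to grep and awk, the same kind of statistics can be sketched in Python. The regular expression below is modelled only on the single log line shown above, so any other message format greylite may emit is ignored; the function name `rule_hit_counts` is hypothetical, and all sample lines except the first are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one log line shown in this document.
MATCH_RE = re.compile(
    r"Rule '(?P<letter>\w)' in line (?P<line>\d+) matched client "
    r"'(?P<client>[^']+)', returning (?P<attempts>\d+) attempts"
)

def rule_hit_counts(log_lines):
    """Count matches per (rule letter, ruleset line) and collect client addresses."""
    hits = Counter()
    clients = set()
    for entry in log_lines:
        m = MATCH_RE.search(entry)
        if m is None:
            continue  # not a suspicion match message
        hits[(m["letter"], int(m["line"]))] += 1
        clients.add(m["client"])
    return hits, clients

# The first line is the example from this document; the others are invented.
SAMPLE = [
    "greylite: Rule 'v' in line 4 matched client '97.102.58.123', returning 6 attempts.",
    "greylite: Rule 'r' in line 8 matched client '192.0.2.10', returning 3 attempts.",
    "greylite: Rule 'v' in line 4 matched client '192.0.2.77', returning 6 attempts.",
    "an unrelated log line",
]

hits, clients = rule_hit_counts(SAMPLE)
for (letter, line_no), count in hits.most_common():
    print(f"rule '{letter}' (line {line_no}): {count} suspicious lookups")
print(f"{len(clients)} distinct suspicious clients")
```

Comparing the number of distinct suspicious clients with the total number of clients seen in the same period would give the suspicious-to-clean ratio; the total has to come from other log messages not covered by this sketch.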
How to use suspicion? Be bold. Spammers work around greylisting by attempting delivery two or three times, so require at least 2 or 3 re-attempts from any client you suspect. The only risk for legitimate messages is a longer delay.
How many attempts can I require and stay safe? Legitimate servers usually make a high total number of attempts (several tens) before discarding a message. Stay under 10.
How does the order of the rules matter? The first rule that matches wins, the rest are not evaluated.
How do I optimize the ruleset for performance? Put the rules that match most frequently higher. Also, environment variable rules are the cheapest to check, followed by regex rules, followed by GeoIP rules.
How long do I delay if I require N attempts? This depends on the retry interval policy used by the sending server. The table gives some examples for the most popular MTAs (they probably cover 98%+ of the mail traffic). These are default values; to be safe, assume half of them.
| Server | Lifetime | Policy | Retry interval | Number of attempts | Refs |
|---|---|---|---|---|---|
| qmail | 7 days | exponential | 400 secs (first) to 7 days, 1 hour (last) | 40 | lifewithqmail.org |
| postfix | 5 days | saturating exponential (doubling) | 300 secs to 4000 secs | > 108 | postfix.org |
| exchange | 5 days | constant | 20 mins | 360 | microsoft.com |
| sendmail | 5 days | constant | 30 mins | 240 | sendmail.org |
| Lotus Domino | 24 hours | linear | 15, 30, 45, 45, 45, ... mins | 25 | lotus.com |
| exim | 4 days | mixed | 15 mins 8 times, then growing at 150% for 16 hours, then 6 hours | > 28 | exim.org |
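To get a rough feel for the delay implied by requiring N attempts, the retry policies in the table can be turned into a small calculation. This is only a sketch under stated assumptions: the first attempt happens at time zero, every later attempt follows the table's default interval exactly, and any minimum greylisting delay configured in greylite is ignored. The helper names (`constant`, `saturating_doubling`, `delay_for_attempts`) are hypothetical.

```python
from itertools import islice, repeat

def constant(interval):
    """Retry intervals for a constant policy (e.g. sendmail: 30 mins)."""
    return repeat(interval)

def saturating_doubling(first, cap):
    """Retry intervals that double from `first` until they reach `cap`."""
    interval = first
    while True:
        yield interval
        interval = min(interval * 2, cap)

def delay_for_attempts(n_attempts, intervals):
    """Seconds between the first attempt and the n-th attempt."""
    if n_attempts < 1:
        raise ValueError("n_attempts must be at least 1")
    # Attempt n is reached after n - 1 retry intervals.
    return sum(islice(intervals, n_attempts - 1))

# sendmail, constant 30 mins: 3 attempts need two intervals.
print(delay_for_attempts(3, constant(30 * 60)))              # 3600
# postfix, doubling from 300 secs up to 4000 secs: 300 + 600 + 1200.
print(delay_for_attempts(4, saturating_doubling(300, 4000)))  # 2100
```

Under these assumptions, requiring 3 attempts from a sendmail host with default settings delays the message by about an hour, while a postfix host reaches its fourth attempt after about 35 minutes.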