Greylite's suspicion

Suspicion is a technique greylite uses to neutralize the workarounds that spammers employ against greylisting. It is a list of rules that determines whether a client must make more delivery attempts than the usual two: the more suspicious a client is, the more times it may be temporarily rejected. In addition, clients found suspicious are not whitelisted even if they pass the greylisting challenge; this preserves the quality of the whitelist.

This is a soft method: it does not introduce false positives, and it trades possibly longer message delays for more incisive filtering. That is, if a client is incorrectly found suspicious, its messages are still received, possibly with a longer delay. Used sensibly, this technique can visibly lower the amount of spam passing greylisting, make spamming more costly for spammers and dramatically reduce whitelist pollution, with invisible or negligible impact on legitimate traffic.

Greylite activates suspicion when it finds the SUSPICION variable set in its environment. This variable holds the full path of the file containing the ruleset. See controls for details.

Suspicion ruleset specification

Suspicion's behaviour is driven by a suspicion file.

A suspicion file is a text file containing a list of rules, one per line. A line that begins with a # (hash) character is treated as a comment and skipped.

Rules are checked in order, top to bottom. The first matching rule defines the number of attempts to require; any rules after it are not evaluated.
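The first-match evaluation described above can be sketched as follows. This is only an illustration, not greylite's implementation: the (attempts, predicate) tuple, the client dictionary and its keys are hypothetical, and returning None when no rule matches is an assumption, since the behaviour in that case is not specified here.

```python
from typing import Callable, Optional, Sequence, Tuple

# A rule is modelled as (attempts, predicate). Both the tuple shape and the
# client dictionary are hypothetical, chosen only for this illustration.
Client = dict
Rule = Tuple[int, Callable[[Client], bool]]


def required_attempts(rules: Sequence[Rule], client: Client) -> Optional[int]:
    """Return the attempts of the first matching rule, or None if none match."""
    for attempts, matches in rules:
        if matches(client):
            return attempts  # first match wins; later rules are never evaluated
    return None  # assumption: no match means "no extra suspicion"


rules = [
    (6, lambda c: c.get("blacklisted", False)),
    (3, lambda c: "dynamic" in c.get("ptr", "")),
]

# This client satisfies both rules, but only the first one is applied.
client = {"blacklisted": True, "ptr": "dynamic-1-2-3-4.example.net"}
print(required_attempts(rules, client))
```

Because evaluation stops at the first match, moving a rule up or down the list can change the result for clients that satisfy several rules.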

A rule has three fields (four when the optional modifier is present), each separated from the next by a single space, in the following format:

integer letter rulespecification
or
integer letter ! rulespecification
where:
  1. integer is an integer ≥ 0. It dictates the number of attempts to require if the rule matches.
  2. letter is a single lowercase alphabetic character. It declares the kind of the rule specification that follows.
  3. ! is an optional modifier that inverts (negates) the result of the rule. When present, it must be followed by one space. When omitted, one space must be present between letter and rulespecification.
  4. rulespecification is a string whose format depends on the kind of the rule (see below).
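To make the field layout concrete, here is a minimal parsing sketch. It is not greylite's parser: the Rule tuple and parse_rule function are hypothetical names, and skipping blank lines is an assumption (the specification above only mentions comment lines).

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    attempts: int   # number of attempts to require
    kind: str       # single lowercase letter
    negated: bool   # True when the "!" modifier is present
    spec: str       # rule specification, format depends on kind


def parse_rule(line: str) -> Optional[Rule]:
    """Parse one suspicion-file line; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None  # blank-line handling is an assumption of this sketch
    # Split off the first two fields; the rest may itself contain spaces.
    attempts_text, kind, rest = line.split(" ", 2)
    attempts = int(attempts_text)
    if attempts < 0:
        raise ValueError("attempts must be >= 0")
    if len(kind) != 1 or not ("a" <= kind <= "z"):
        raise ValueError("kind must be one lowercase letter")
    negated = rest.startswith("! ")
    spec = rest[2:] if negated else rest
    return Rule(attempts, kind, negated, spec)


print(parse_rule("3 g ! AT BE CH"))
print(parse_rule("6 v BLACKLISTED"))
```

A malformed line (too few fields, a non-numeric first field) raises ValueError in this sketch; how greylite itself reports such lines is not covered here.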

Recognized rules

Name: Reverse lookup
Letter: r
Description: the client's hostname (PTR name) is compared with a pattern of suspicious names. If it matches, the rule is applied. Intended use: the PTR names of spammers on dynamic Internet connections usually contain dynamic, ppp or similar patterns.
Specification: one extended regular expression (see re_format(7)) compared directly with the PTR hostname.

Name: Environment variable
Letter: v
Description: if an environment variable is set, the rule is applied. Intended use: tcpserver, or another module running before greylite, performs custom checks and sets variables when they lead it to suspect the client.
Specification: a space- or comma-separated list of environment variable names. The rule matches if any one of these variables is found.

Name: Client behaviour
Letter: b
Description: if the client exhibits certain behaviours, the rule is applied. Behaviours are identified by the following keywords:
  • greetdelay: a delay is inserted before passing data when the connection opens. The client may give up and disconnect, or it may send data blindly without waiting for the server's greeting. In the second case it is trapped.
  • retryinterval: the client retries delivery of the message at an excessive frequency that is not typical of legitimate servers.
  • commanderrors: the client issued some commands that the server rejected. These usually include wrong envelope commands (MAIL FROM violating SPF rules or RCPT TO for non-local domains). It is recommended not to associate this keyword with values higher than 3.
Intended use: spot behaviours that are in no way typical of normal, legitimate MTAs.
Specification: a space- or comma-separated list of one or more behaviour keywords. If any behaviour is detected, the rule matches. greetdelay is enabled only when the GREETDELAY environment variable is set.

Name: GeoIP map
Letter: g
Description: if the client resides in a given set of countries (according to the GeoIP GeoLite Country database), the rule is applied. Intended use: clients from certain countries are much more likely to be spammers: China, Russia etc. Also, in some scenarios e-mail is exchanged only with a fixed set of countries, so all the others are suspect.
Specification: a space- or comma-separated list of ISO country codes. If the client comes from any country in this list, the rule matches. Controls: GEOIPDB_FILE contains the path of the GeoIP database (default /usr/local/share/GeoIP/GeoIP.dat).

Name: Envelope information
Letter: e
Description: if parts of the envelope information match one or more patterns, the rule is applied. The envelope information in greylite includes the envelope sender, the envelope recipient and the client host label as self-declared with the HELO/EHLO command. Intended use: examples include disabling greylisting for specific domains, or whitelisting specific senders (e.g. because they are protected by SPF).
Specification: a list of one or more space- or comma-separated patterns, each prefixed by the part to match:
  • s:expr matches the envelope sender
  • r:expr matches the envelope recipient
  • h:expr matches the host label declared by the client
If any of the patterns in the list matches, the rule is applied.
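As an illustration of the envelope rule, the sketch below checks a list of s:/r:/h: patterns against the three envelope parts. It rests on several assumptions: the function name is hypothetical, Python's re module stands in for the POSIX extended regular expressions greylite uses, matching is case-sensitive, and the naive split on spaces and commas would break a pattern that itself contains a comma.

```python
import re


def envelope_rule_matches(spec: str, sender: str, recipient: str, helo: str) -> bool:
    """Return True if any s:/r:/h: pattern in spec matches its envelope part."""
    parts = {"s": sender, "r": recipient, "h": helo}
    # Naive tokenisation: patterns containing spaces or commas are not supported.
    for item in re.split(r"[ ,]+", spec.strip()):
        prefix, sep, pattern = item.partition(":")
        if not sep or prefix not in parts:
            raise ValueError(f"bad envelope pattern: {item!r}")
        if re.search(pattern, parts[prefix]):
            return True  # any single matching pattern is enough
    return False


# Patterns modelled on the kind of rule used for trusted domains.
spec = r"r:@unprotecteddomain\.com$ s:@gmx\.(de|net)$"
print(envelope_rule_matches(spec, "alice@gmx.de", "bob@example.org", "mail.gmx.net"))
print(envelope_rule_matches(spec, "alice@example.com", "bob@example.org", "mx.example.com"))
```

The first call matches on the sender pattern; the second matches nothing, so the rule would not apply and evaluation would continue with the next rule.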

Example suspicion file

# unprotecteddomain.com is not protected by greylisting, and GMX is
# trusted because of its SPF "-all" policy
0 e r:@unprotecteddomain\.com$ s:@gmx\.(de|net)$
# clients that fail the greetdelay trap or retry blindly are rejected practically forever
100 b greetdelay retryinterval
# dnsblenv sets the BLACKLISTED variable when the client is on an RBL
6 v BLACKLISTED
# clients outside this zone are suspicious (this is very case-specific)
3 g ! AT BE CH DE ES EU FI FR GB IT MC NO SM VA
# clients whose PTR name contains "dynamic" stuff are suspicious
3 r (^|[^a-z])(a?dsl|dyn(amic)?(ip)?|dial(in|up)?|ppp|customer|user|host|home)([^a-z]|\.?$)
# clients whose PTR name embeds something that looks like an IP address are suspicious
2 r (([0-9]{1,3}[-.]){3})[0-9]{1,3}
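To see how the two reverse-lookup patterns in this example behave, they can be tried against a few PTR names. This is a sketch under assumptions: the hostnames are invented, the helper name is hypothetical, and Python's re module is used in place of POSIX extended regular expressions (for these two patterns the syntax is the same).

```python
import re

# The two 'r' rules from the example file, as (attempts, pattern) pairs.
PTR_RULES = [
    (3, r"(^|[^a-z])(a?dsl|dyn(amic)?(ip)?|dial(in|up)?|ppp|customer|user|host|home)([^a-z]|\.?$)"),
    (2, r"(([0-9]{1,3}[-.]){3})[0-9]{1,3}"),
]


def attempts_for_ptr(ptr_name):
    """Return the attempts of the first matching PTR rule, or None."""
    for attempts, pattern in PTR_RULES:
        if re.search(pattern, ptr_name):
            return attempts
    return None


# Invented hostnames, for illustration only.
for name in ("ppp-12-34.example.net",
             "1-2-3-4.static.example.net",
             "smtp.ghost.example"):
    print(name, attempts_for_ptr(name))
```

The first name hits the keyword pattern, the second only the embedded-address pattern, and the third matches neither: "host" inside "ghost" is preceded by a letter, which the leading (^|[^a-z]) group rules out.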

Effectiveness analysis and stats

If LOGTHRESHOLD is set to 6 or higher, greylite logs a message about the result of each suspicion lookup, e.g.:

    greylite: Rule 'v' in line 4 matched client '97.102.58.123', returning 6 attempts.
    
Writing a script based on grep and awk to extract activity stats from these logs is trivial.

Examples: what's the rate of suspicious vs clean addresses? How many suspicious clients are such because of rule X? What's the general cut rate of rule Y?
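A minimal sketch of such a stats script is shown below, written in Python rather than grep and awk. The regular expression is modelled only on the single sample log line above, so other greylite log formats (for instance, for clients that match no rule) are not handled; the function name and the second sample line are invented for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one sample log line in this section; it is an
# assumption that all suspicion matches are logged in exactly this shape.
MATCH_RE = re.compile(
    r"greylite: Rule '(?P<kind>[a-z])' in line (?P<line>\d+) "
    r"matched client '(?P<client>[^']+)', returning (?P<attempts>\d+) attempts"
)


def rule_stats(log_lines):
    """Count suspicion matches per (rule letter, ruleset line) and collect clients."""
    per_rule = Counter()
    clients = set()
    for text in log_lines:
        m = MATCH_RE.search(text)
        if m:
            per_rule[(m["kind"], int(m["line"]))] += 1
            clients.add(m["client"])
    return per_rule, clients


sample = [
    "greylite: Rule 'v' in line 4 matched client '97.102.58.123', returning 6 attempts.",
    # Invented line, for illustration only.
    "greylite: Rule 'r' in line 7 matched client '192.0.2.10', returning 3 attempts.",
    "greylite: Rule 'v' in line 4 matched client '97.102.58.123', returning 6 attempts.",
]
per_rule, clients = rule_stats(sample)
print(per_rule.most_common(), len(clients))
```

Dividing the count for a rule by the total number of matches gives that rule's share of suspicious clients; comparing against the total number of connections would require additional log lines not shown here.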

Practical recommendations for suspicion

How to use suspicion? Be bold. Spammers work around greylisting by attempting delivery two or three times, so require at least 2 or 3 re-attempts from any client you suspect. The only risk for legitimate messages is a longer delay.

How many attempts can I require and stay safe? The total number of attempts legitimate servers make before discarding a message is usually high (several tens). Stay under 10.

How does the order of the rules matter? The first rule that matches wins; the rest are not evaluated.

How do I optimize the ruleset for performance? Put the rules that match most frequently higher up. Also, environment variable rules are the lightest to check, followed by regex rules, followed by GeoIP rules.

How long is the delay if I require N attempts? This depends on the retry interval policy of the sending server. The table below gives some examples for the most popular MTAs (which probably cover 98%+ of mail traffic). These are default values; to be safe, assume half of them.

| Server | Lifetime | Policy | Retry interval | Number of attempts | Refs |
|---|---|---|---|---|---|
| qmail | 7 days | exponential | 400 secs (first) to 7 days, 1 hour (last) | 40 | lifewithqmail.org |
| postfix | 5 days | saturating exponential (doubling) | 300 secs to 4000 secs | > 108 | postfix.org |
| exchange | 5 days | constant | 20 mins | 360 | microsoft.com |
| sendmail | 5 days | constant | 30 mins | 240 | sendmail.org |
| Lotus Domino | 24 hours | linear | 15, 30, 45, 45, 45, ... mins | 25 | lotus.com |
| exim | 4 days | mixed | 15 mins 8 times, then growing at 150% for 16 hours, then 6 hours | > 28 | exim.org |
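The delay for a given N can be estimated by summing the sender's retry intervals. The sketch below does this for a constant schedule and for a doubling schedule that saturates at a cap, using the 20-minute and 300 to 4000 second figures from the table. It assumes that "requiring N attempts" means the message is accepted on the N-th attempt, so N - 1 retry waits elapse; the function names are hypothetical, and real servers may deviate from these idealised schedules.

```python
from itertools import islice


def constant(interval):
    """Yield the same retry interval (seconds) forever."""
    while True:
        yield interval


def doubling(first, cap):
    """Yield intervals that double from `first` until they saturate at `cap`."""
    interval = first
    while True:
        yield min(interval, cap)
        interval = min(interval * 2, cap)


def delay_until_attempt(n, intervals):
    """Seconds between the first attempt and the n-th, i.e. n - 1 retry waits."""
    return sum(islice(intervals, max(n - 1, 0)))


# Constant 20-minute retries, message accepted on the 3rd attempt.
print(delay_until_attempt(3, constant(20 * 60)))
# Doubling from 300 s, capped at 4000 s, message accepted on the 6th attempt.
print(delay_until_attempt(6, doubling(300, 4000)))
```

With constant 20-minute retries, a client required to make 3 attempts waits two intervals, or 40 minutes; halving the intervals, as suggested above for safety, halves the estimate.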