Annoyed by our Taiwanese and Nigerian "friends", I decided to check out what this cool new toy on the university mail server does. I had indeed heard of it before, and now I'm not going to part with it...

...

SpamAssassin (http://www.spamassassin.org/) is an assassin that kills spam, in other words, it's a spam filter.

SpamAssassin is basically a spam filter built on rule-based heuristics, although these days it also incorporates Bayesian filtering and can cooperate with other spam filtering methods (such as Vipul's Razor).

The usual assassination method relies on a score calculated from a large set of rules. A message that doesn't trigger any filters gets a score of 0; certain "well-behaved" traits lower the score a bit, while non-adherence to standards and known spammer tactics raise it (for example, the phrase "this is not spam" increases the score by 0.405... =). By default, all e-mails with a score greater than +5.0 are considered spam. (For what it's worth, one pr0n spam that I got today easily flew into that category with a score of +20.7...) The user can change the score for each test and also change the threshold (some say 4.0 is better than 5.0, for example). There are also whitelists for exempting your friends and frequent mailers.
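To make the scoring idea concrete, here is a minimal sketch in Python (SpamAssassin itself is written in Perl, and this is not its code). The rule names, patterns and scores are invented for illustration. Only the 0.405 for "this is not spam" and the 5.0 default threshold come from the description above. Whitelists are left out.

```python
import re

# Hypothetical rule table of (name, pattern, score). Only the 0.405 score for
# "this is not spam" comes from the text; every other rule and number is invented.
RULES = [
    ("NOT_SPAM_CLAIM", re.compile(r"this is not spam", re.I), 0.405),
    ("URGENT_BUSINESS", re.compile(r"urgent (business|transfer)", re.I), 2.5),
    ("SHOUTING_SUBJECT", re.compile(r"^Subject: [A-Z !]{10,}$", re.M), 1.8),
    ("MILLIONS_OF_DOLLARS", re.compile(r"million (us )?dollars", re.I), 1.2),
    # A "well-behaved" trait lowers the score a bit.
    ("IS_A_REPLY", re.compile(r"^In-Reply-To: ", re.M), -0.5),
]

DEFAULT_THRESHOLD = 5.0  # the default threshold quoted in the text


def score_message(raw):
    """Return (total score, names of the rules that fired) for a raw message."""
    hits = [(name, points) for name, pattern, points in RULES if pattern.search(raw)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def is_spam(raw, threshold=DEFAULT_THRESHOLD):
    """A message counts as spam when its score is greater than the threshold."""
    score, _ = score_message(raw)
    return score > threshold
```

Calling is_spam(message, threshold=4.0) shows the effect of the stricter threshold some people prefer: a message that slips through at 5.0 can be caught at 4.0.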

SpamAssassin is typically a *NIX utility, usually run from procmail. If only a couple of users are using it, it can be run as a stand-alone filter program. Larger sites can call the client (spamc) from each user's .procmailrc or from the MTA itself, and run the daemon (spamd) to do the actual analysis. There's also a filtering SMTP proxy (spamproxy).

If using procmail, all you need to do is feed the message to SpamAssassin in your .procmailrc and, if the message comes back with an "X-Spam-Flag: YES" header, junk it into an appropriate folder. All messages get identifying headers that tell what score the message got and so on; messages that have been flagged as spam also contain a detailed listing of which filters were triggered.
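The filing decision itself is tiny. Here it is sketched in Python rather than as an actual procmail recipe, assuming the message has already been through SpamAssassin. The folder names "spam" and "inbox" are placeholders, not anything SpamAssassin or procmail prescribes.

```python
from email import message_from_string


def choose_folder(raw_message, spam_folder="spam", inbox="inbox"):
    """Pick a destination folder based on the X-Spam-Flag header.

    The folder names are hypothetical placeholders.
    """
    msg = message_from_string(raw_message)
    flag = msg.get("X-Spam-Flag", "")
    # Flagged messages carry "X-Spam-Flag: YES"; anything else is treated as ham.
    return spam_folder if flag.strip().upper() == "YES" else inbox
```

A message without the header at all simply falls through to the inbox, which is the safe default if the filter was skipped for some reason.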

The program comes with a lot of filters, and it's possible to make custom filters. The filtering is based on the message itself (header and body) and, optionally, on external sources such as the RBLs and Vipul's Razor.

SpamAssassin is written in Perl (spamc appears to be written in C, though), and can be got from CPAN too. It is distributed as open source.

For the non-*NIX folks, there's also a commercial product called SpamAssassin Pro (from Deersoft Inc.) that works with Microsoft Outlook and Microsoft Exchange. There's also Bloomba's SAproxy.

(The only problem for me was that the first damn Nigerian scam mail I got that passed through it got a measly +3.4...)

(Thanks to Zerotime for reminding me about Bayesian filtering.)

Finally tiring of all the garbage, I found this open source package and recognized it as the best of breed, smiling at the fact that the US spends billions on spam blocking while this package, which works excellently, is free.

One thing you should know is that the task of recognizing and separating spam from ham is quite complex and never-ending. Since the spammers are constantly at work, so must the spam blockers be. This leads to a major concern you should be aware of: quite rightly, SpamAssassin concentrates only on the task of spam recognition. This means that integration with your email package requires another package, and this can be a significant task in its own right.

Like most Unix users, I use Sendmail as the MTA, and it is notoriously complex and difficult to administer. On my first try at the integration I discovered what would ultimately be the package of choice for SpamAssassin-Sendmail integration, "milter-spamc" (snert.org). Unfortunately, in that first attempt I stabbed myself in the foot and crashed my glibc 2.2 based system by trying to force a series of rpm upgrades required for the ensemble of packages (milter-spamc requires the latest Sendmail). To make a long story short, the solution was to abandon RPM and rebuild the current Sendmail from the distributed tarball, though some rpms were used for ancillary packages.

milter-spamc provides a fine level of control over the disposition of junk mail. In practice, as it turns out, SpamAssassin generally gives ham a score near 0, though some special kinds of messages score as high as the upper 2.n range (for example, I have a convention of using bodyless emails as instant messages). I've found that setting the SpamAssassin level to 2, then setting milter-spamc reject-excess to 1 and discard excess to 3, works best. The excess values appear to be counted from the SpamAssassin level (2 + 1 = 3 and 2 + 3 = 5), so with these settings anything between 3 and 5 is rejected by Sendmail (SMTP 550), anything over 5 hits the bit bucket, and anything between 2 and 3 shows up in your mailer with the subject prefixed with "[SPAM]".
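The resulting bands can be sketched in a few lines of Python. This is only an illustration of the arithmetic as I read it, not milter-spamc's code: the function and parameter names are made up, and what happens to a score that lands exactly on a boundary is an assumption.

```python
def disposition(score, level=2.0, reject_excess=1.0, discard_excess=3.0):
    """Map a SpamAssassin score to what happens to the message.

    Assumes the excess values are added to the spam level, and that a score
    exactly on a boundary stays in the lower band (not confirmed by the docs).
    """
    if score > level + discard_excess:
        return "discard"   # silently dropped (the bit bucket)
    if score > level + reject_excess:
        return "reject"    # refused by Sendmail with SMTP 550
    if score > level:
        return "tag"       # delivered with "[SPAM]" prefixed to the subject
    return "deliver"       # treated as ham
```

With the defaults above, a score of 3.4 is rejected and 20.7 is discarded, while a bodyless message scoring 2.5 is merely tagged.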

Update:

After almost 2 months of use and monitoring, I've found this setup eliminates most spam, on some days 100% of it. I had to back off to reject-excess 4 and drop-excess 7, though, since some of my own mails were being rejected (mostly from using a sticky IP), like the E user id mails. I should also note that I will probably put it on the new web head I'm building with the settings above, since 1) it has a fully static IP and can tolerate the tighter setting and 2) I've realized that this setup prevents mail relaying exploits as well as skank shit showing up in one's inbox.
