Why Bayesian filtering won't kill the spam industry

Montréal, 19 Feb 2003

Apple's Mail.app has got it. You can plug it into Eudora, Outlook Express, PowerMail and half a dozen other mailers. Apparently a simple version can be written by a UNIX power-user while his or her right hand is busy with the morning floss. Word has it that a 50MB extension to MSN 8 will use it (and likely send information on your e-mail back to Redmond for ‘quality control’ reasons). Bayesian or statistical spam filtering threatens to make it impossible for spammers, those subhuman products of Coles Notes capitalism and a little Internet knowledge, to deliver their tiresome payloads to the mailboxes of savvy and semi-savvy Internet users.

Users of these technologies can expect to have a good 80 to 95 percent of their spam intercepted and trashed, at the cost of an estimated 0.03% or so of their good e-mail being routed there as well (a shaky estimate, given the small sample size). Of course, most users (myself included) can easily outdo the damage from a Bayesian filter (whatever the final ratio is) through misfiling, forgetting, misunderstanding and generally neglecting incoming mail from time to time. The Bayesian filter promises to give everyone their own personalized spam shield, with ongoing training both in what mail is desirable and in what's eating away at users' bandwidth allowances.
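For the curious, here is roughly what that "ongoing training" amounts to. This is a minimal sketch of a naive Bayes word-counting filter, not the algorithm any of the mailers above actually ship, and not Paul Graham's exact method either. The class name, the tokenizer and the add-one smoothing are all my own assumptions, chosen to keep the example short.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words, digits and a few symbols.
TOKEN = re.compile(r"[a-z0-9$'!-]+")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(TOKEN.findall(text.lower()))


class TinyBayesFilter:
    """Illustrative naive Bayes filter; names and smoothing are assumptions."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user has filed as 'spam' or 'ham'."""
        self.messages[label] += 1
        self.counts[label].update(tokenize(text))

    def spam_probability(self, text):
        """Estimate P(spam | tokens) by summing per-token log-odds."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # Prior log-odds, with add-one smoothing so an empty filter works.
        log_odds = math.log(n_spam + 1) - math.log(n_ham + 1)
        for tok in tokenize(text):
            p_tok_spam = (self.counts["spam"][tok] + 1) / (n_spam + 2)
            p_tok_ham = (self.counts["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_tok_spam) - math.log(p_tok_ham)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


mail_filter = TinyBayesFilter()
mail_filter.train("buy cheap viagra now", "spam")
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("lunch meeting at noon tomorrow", "ham")
mail_filter.train("notes from the meeting attached", "ham")

print(mail_filter.spam_probability("cheap viagra now"))
print(mail_filter.spam_probability("meeting notes for tomorrow"))
```

Every message you file teaches the filter a little more about your own mail, which is why each user ends up with a personal shield rather than a shared blacklist.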

Paul Graham has written a couple of excellent articles on his experiences with Bayesian filtering and posits that wide use of Bayesian filtering will crush spammers economically by making it impossible to send any useful spam to Bayesian filter users. Although at first the flood will come in and be routed to the trash (after having been downloaded, for those of us using POP mail), Graham speculates that after a while the expected benefits of sending out a million spam e-mails will drop until the costs exceed them.

I suspect this will never happen.

Bayesian filter users will be saved from reading their spam. It will pile up in e-mail trash folders and be quietly expunged without troubling them except for the occasional curiosity-inspired dumpster dive or cursory check for misdirected mail. The facts that the spam travels across the network, that it eats up their bandwidth, and that it is indirectly responsible for those three messages in ten thousand or so being sent off to limbo... these facts will not change. Why?

If you're sufficiently annoyed by spam to set up or pay Microsoft a premium to set up a Bayesian filter, you're probably in at least two groups:

  1. You've got sufficient nerd, geek or Mac tendencies to find or be given a spam filter: you're probably educated or self-taught in computer technology, you're more likely to care about CSS and de-GIF-ing the Internet, you're probably in the minority for now. This will change as Bayesian filtering becomes a standard feature on more and more mail readers... but don't hold your breath.
  2. You probably haven't bought chemical copies of Viagra, mail-ordered yourself a bride from Eastern Europe, helped channel billions out of Nigeria or considered purchasing stealth bulk e-mailer software. If you're making money hand over fist, it's likely not by your industriousness in selling reports in a pyramid-scheme fashion. You're probably not making money hand over fist, in fact, because you're reading all your e-mail rather than hiring someone to do the initial triage for you. But that's beside the point.

You, my dear Bayes fan or Bayes-fan-to-be, are not in the spammers’ target market. You are collateral damage (in a mild sense of the phrase: spammers haven't mastered setting up the application/x-shrapnel or audio/x-concussion MIME types yet and only Microsoft would ever allow auto-opening of attachments of that type). Not to be excessively disrespectful, but the spammers are looking for people who are not inclined to pass up billions over worries about sending their bank account number to strangers they've never met. They're looking for sad, gullible people and they're hoping they can find a hundred of them in that million e-mail blitz. They probably will. They don’t give a damn about the other 999 900 recipients so long as those recipients can't track them down. If we want to deny the spammers access to that other hundred people, we need filtering above the personal level... and the best filtering I can think of is the fraud squad of wherever the spammers cash their cheques or have their post office boxes. We indignant masses, without mandates to enforce the law or send lawyers after throwaway companies in Florida, have little recourse... except possibly to recruit people with one or the other or both of the above.
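The arithmetic behind that hundred-in-a-million blitz can be sketched in a few lines. The million recipients, the hundred gullible buyers and the 80 to 95 percent catch rate come from the text above; the revenue per sale and the cost per message are numbers I have made up purely for illustration, and the function name is hypothetical.

```python
def expected_profit(recipients, gullible, gullible_filter_share,
                    catch_rate, revenue_per_sale, cost_per_message):
    """Rough expected profit of one spam run.

    Only filters installed by the gullible recipients reduce revenue.
    Filter adoption among everyone else never enters the calculation,
    because those recipients were never going to buy anyway.
    """
    reached = gullible * (1.0 - gullible_filter_share * catch_rate)
    return reached * revenue_per_sale - recipients * cost_per_message


# From the text: one million e-mails, about a hundred gullible recipients.
RECIPIENTS = 1_000_000
GULLIBLE = 100
# Made-up figures, for illustration only.
REVENUE_PER_SALE = 50.0
COST_PER_MESSAGE = 0.0005

# My scenario: savvy users filter, the gullible hundred do not.
savvy_only = expected_profit(RECIPIENTS, GULLIBLE, 0.0, 0.95,
                             REVENUE_PER_SALE, COST_PER_MESSAGE)

# Graham's hoped-for scenario: the gullible hundred all filter too.
everyone = expected_profit(RECIPIENTS, GULLIBLE, 1.0, 0.95,
                           REVENUE_PER_SALE, COST_PER_MESSAGE)

print("only savvy users filter:", round(savvy_only, 2))
print("everyone filters:", round(everyone, 2))
```

Under these invented numbers the run only turns unprofitable once the buyers themselves are filtering; however many of the other 999 900 install a filter, the spammer's balance sheet does not move.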

Maybe the solution is for us to take a look at who's being advertised in those e-mails. Major software occasionally goes in... probably pirated or gotten after it fell off a truck somewhere, if it exists at all. If you're a major manufacturer of software that people in the two categories of filter users above might be interested in, say the owner of the Peter Norton name, you might be thinking that spam is not a context in which your trademarks should be appearing. You might even be thinking it could be worth your while to stop people from selling your product that way. Consider that you've probably never seen a Big Mac being sold out of the freezer in a corner store. To use that name, it's likely specified that you sell it in a certain type of restaurant which can be promoted in certain ways. People who contravene such an agreement get their supplies of 100% Pure Beef™ pseudopatties cut off and may even find themselves talking to a pack of lawyers. Now suppose a company whose products are advertised in spam receives complaints about that. If they receive enough complaints, and sales start to dip, then they may trot out the trademark and franchise lawyers. They may even suggest to a legislator or two that something must be done. Then, and only then, will the cost of spam be higher than its expected rewards.

The problem with that is that, to know which giant multinationals to bug, you'll have to start reading your spam again.

Happy filtering.