Tuesday, September 03, 2002

A plan for spam

Essay by Paul Graham on how he uses a Bayesian statistical approach to decide whether an email is spam. The method filters each user's email based on the correspondence (both spam and non-spam) that the user actually receives, so words that some would regard as spammy, but that others would not, are judged in the context of that user's usual correspondence. Graham reports that his filter misses fewer than 5 in every 1,000 spams, with zero false positives. A good lesson here for commercial spam filters?
(From Tidbits)
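
To make the per-user idea concrete, here is a rough Python sketch of a filter along these lines. It is a simplified illustration, not Graham's own code: the class and method names are hypothetical, and the 0.4 score for unseen words, the 0.01/0.99 clamps, and the 15-word cutoff are assumptions chosen for the example. Each user trains a separate filter, so the same word can count for or against a message depending on whose mail it is.

```python
import re
from collections import Counter


class UserSpamFilter:
    """Toy per-user Bayesian filter (hypothetical names, illustrative only)."""

    def __init__(self):
        self.spam_counts = Counter()  # word -> number of spam messages containing it
        self.ham_counts = Counter()   # word -> number of non-spam messages containing it
        self.spam_total = 0
        self.ham_total = 0

    @staticmethod
    def tokenize(text):
        # Distinct lowercase tokens; each word counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_total += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_total += 1

    def word_probability(self, word):
        spam = self.spam_counts[word]
        ham = self.ham_counts[word]
        if spam + ham == 0:
            return 0.4  # assumption: unseen words lean slightly innocent
        spam_rate = spam / max(self.spam_total, 1)
        ham_rate = ham / max(self.ham_total, 1)
        p = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, p))  # clamp so no single word is decisive

    def spam_probability(self, text, top_n=15):
        probs = [self.word_probability(w) for w in self.tokenize(text)]
        # Keep the words whose scores are furthest from neutral (0.5).
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        spam_product = 1.0
        ham_product = 1.0
        for p in probs[:top_n]:
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)


# Two users with different mail histories judge the same message.
broker = UserSpamFilter()
broker.train("mortgage rates for the Smith application", is_spam=False)
broker.train("please review the mortgage paperwork", is_spam=False)
broker.train("cheap pills click here now", is_spam=True)

student = UserSpamFilter()
student.train("lecture notes for tuesday", is_spam=False)
student.train("low mortgage rates click here now", is_spam=True)

message = "new mortgage rates"
print(broker.spam_probability(message))
print(student.spam_probability(message))
```

For the broker, who gets legitimate mail about mortgages, the message scores close to 0; for the student, who has only seen "mortgage" in spam, it scores close to 1.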
