Windows IT Pro


February 18, 2003

Bayesian Spam Filters


One of the most promising antidotes to spam is so-called Bayesian filtering, which calculates the probability that a given message is spam based on an analysis of messages previously classified as spam or as legitimate mail. The Bayesian approach demands less maintenance than keyword-based spam filters, which require constant updating of their word and phrase lists.
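A minimal Python sketch of the training side may help. It counts how many spam and legitimate ("ham") messages contain each word, then estimates the probability that a message containing that word is spam. The function names, the simple whitespace tokenizer, and the assumption that spam and ham are equally likely a priori are illustrative choices, not the design of any filter named in this article.

```python
from collections import Counter

def train(spam_msgs, ham_msgs):
    """Count, for each word, how many spam and ham messages contain it.

    Assumes both lists are non-empty (hypothetical helper for illustration).
    """
    spam_counts, ham_counts = Counter(), Counter()
    for msg in spam_msgs:
        spam_counts.update(set(msg.lower().split()))  # set(): count each word once per message
    for msg in ham_msgs:
        ham_counts.update(set(msg.lower().split()))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)

def word_spam_prob(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | word), assuming spam and ham are equally likely a priori."""
    in_spam = spam_counts[word] / n_spam  # share of spam messages containing the word
    in_ham = ham_counts[word] / n_ham     # share of ham messages containing the word
    if in_spam + in_ham == 0:
        return 0.5  # never seen in training: no evidence either way
    return in_spam / (in_spam + in_ham)
```

For example, with two spam messages that both contain "free" and three ham messages of which one contains it, the estimate for "free" is 1 / (1 + 1/3) = 0.75. Graham's formula in "A Plan for Spam" differs in several details; for instance, it doubles the counts from legitimate mail to bias the filter against false positives and clamps each probability between 0.01 and 0.99.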

Much of the buzz around this technique started with Paul Graham's August 2002 article "A Plan for Spam" (see the first URL below). Although there is some debate about whether Graham's approach is precisely Bayesian, organizations have been exploring Bayesian methods and applying them to spam for several years. Microsoft Research's antispam effort, spearheaded by a group of Bayesian researchers, began in 1997 and has resulted in a patent. If you want to keep up with spam-fighting techniques, some understanding of the Bayesian technique is in order.

The Bayes in Bayesian was an 18th-century British clergyman and amateur mathematician, Thomas Bayes, who suggested in a posthumously published paper that the probability of some event occurring in the future is related to the proportion of times that event occurred in the past under the same circumstances. Later, mathematicians refined Bayes's ideas and, in the 20th century, built a formal system of classification and decision-making and began applying it to many tasks in science and engineering. (I first encountered Bayesian inference in the context of economics.) A key element of the Bayesian approach is that it depends on having some prior information about the problem at hand.
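In modern notation, the rule that bears Bayes's name says how prior information is updated by new evidence. Writing S for "the message is spam," \bar{S} for "the message is legitimate," and w for "the message contains word w," the standard textbook form is:

```latex
P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid \bar{S})\,P(\bar{S})}
```

Here P(S) is the prior, the share of incoming mail believed to be spam before the message is examined, while P(w | S) and P(w | \bar{S}) are estimated from previously classified mail. If spam and legitimate mail are assumed equally likely, the priors cancel and the rule reduces to P(w | S) / (P(w | S) + P(w | \bar{S})).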

To some extent, the Bayesian approach models our everyday experience of using probability to estimate the possible outcome of an action and to make decisions. The Bayesian interpretation of probability is different from the coin-flipping experiments that most of us did in school (and which, I'm convinced, are largely an effort to teach students the futility of gambling). Life isn't a series of random experiments from which we calculate frequency distributions. We must make decisions that take into account the likelihood of different consequences arising from those decisions and whether those consequences are good or bad.

In the case of spam, Bayesian inference suggests that if a new message contains text that appeared often in spam in the past but rarely in legitimate messages, then the new message is likely to be spam. The formal methods of calculating such a probability can also take into account the fact that a single false positive (a legitimate message quarantined as spam) is far more costly than many false negatives (spam messages left untouched in your Inbox).
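The two ideas in the paragraph above can be sketched in a few lines of Python. The first function combines per-word spam probabilities into one score for the whole message, naive-Bayes style (it assumes words are independent and spam and ham are equally likely a priori); the second turns the cost asymmetry into a decision threshold by comparing expected costs. The names, the clamping range, and the cost figures are illustrative assumptions, not any specific filter's settings.

```python
import math

def combine(probs, lo=0.01, hi=0.99):
    """Combine per-word spam probabilities into one message probability.

    Assumes words are independent and spam/ham are equally likely a priori.
    Each probability is clamped to [lo, hi] so a single word can't force 0 or 1.
    """
    clamped = [min(max(p, lo), hi) for p in probs]
    log_spam = sum(math.log(p) for p in clamped)       # log of prod(p)
    log_ham = sum(math.log(1.0 - p) for p in clamped)  # log of prod(1 - p)
    # prod(p) / (prod(p) + prod(1 - p)), computed in log space to avoid underflow/overflow
    d = log_ham - log_spam
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))

def spam_threshold(cost_false_positive, cost_false_negative):
    """Minimum spam probability at which quarantining has the lower expected cost."""
    # Quarantine when P * C_fn > (1 - P) * C_fp, i.e. when P > C_fp / (C_fp + C_fn).
    return cost_false_positive / (cost_false_positive + cost_false_negative)
```

For example, if a false positive is judged 100 times as costly as a false negative (an illustrative ratio, not one from the article), a message should be quarantined only when its combined probability exceeds 100/101, or about 0.99.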

Graham's method analyzes not just the message body but also the message header, which might contain information about the sender's mail server, foreign character sets, and attachments. He claims that his filter catches 99.5 percent of spam with less than one false positive for every 1000 messages received.
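To show what analyzing the header as well as the body can look like, here is a sketch that uses Python's standard `email` package to pull tokens from every header value and every text part of a raw message. The token rule (letters, digits, dashes, apostrophes, and dollar signs) loosely follows the one Graham describes in "A Plan for Spam"; the function name and the fallback charset are assumptions for illustration.

```python
import email
import re

# Token characters: letters, digits, dollar signs, apostrophes, dashes; all else separates.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")

def tokenize_message(raw):
    """Return tokens drawn from both the headers and the body of a raw message."""
    msg = email.message_from_string(raw)
    tokens = []
    # Header values (From, Received, Content-Type, ...) can reveal servers and charsets.
    for _name, value in msg.items():
        tokens.extend(TOKEN_RE.findall(str(value)))
    # Body: visit every non-multipart part and decode its payload to text.
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)  # bytes, with transfer encoding undone
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name: fall back to latin-1
            text = payload.decode("latin-1", errors="replace")
        tokens.extend(TOKEN_RE.findall(text))
    return tokens
```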

Graham presented an update at an antispam conference at MIT last month. He has expanded his list of "tokens" (telltale words and phrases to look for in incoming mail) to about 187,000 items, and his method can now treat a word differently depending on whether it appears in the subject, in a URL, or in an address field.
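One simple way to let a filter treat the same word differently by location, sketched here as an assumption rather than Graham's exact scheme, is to prefix each word with the place it was found, so that "free" in a Subject line, in a URL, and in the body are counted as three different tokens. The field names, the `*` separator, and the regular expressions are illustrative choices.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")
URL_RE = re.compile(r"https?://\S+")

def contextual_tokens(fields):
    """Tag each token with where it appeared, e.g. 'Subject*free' vs. plain 'free'.

    `fields` maps a hypothetical location name ('Subject', 'From', 'Body') to its text.
    Body tokens are left untagged; URL parts are tagged 'Url*' wherever they occur.
    """
    tokens = []
    for location, text in fields.items():
        # Pull URLs out first so their pieces are tagged as URL tokens.
        for url in URL_RE.findall(text):
            tokens.extend("Url*" + t for t in TOKEN_RE.findall(url))
        remainder = URL_RE.sub(" ", text)
        prefix = "" if location == "Body" else location + "*"
        tokens.extend(prefix + t for t in TOKEN_RE.findall(remainder))
    return tokens
```

A filter trained on these tags learns separate probabilities for each variant, at the cost of a larger token list.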

Others following Graham's lead are experimenting with variations that calculate the "spamminess" of messages differently. The open-source SpamBayes effort has produced an Outlook add-in (see second URL below). Another free Outlook spam filter using a Bayesian technique is Spammunition, currently in beta. Spam Bully provides a commercial solution.
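Variations of this kind mostly change how the per-word probabilities are combined. SpamBayes, for example, uses what its developers call chi-squared combining, which builds on Gary Robinson's ideas and Fisher's method for combining probabilities. The sketch below is a simplified illustration of that idea, not SpamBayes's code; the function names are hypothetical, and it assumes every probability lies strictly between 0 and 1.

```python
import math

def chi2q(x2, v):
    """Survival function P(chi^2 >= x2) for an even number of degrees of freedom v."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_combine(probs):
    """Score in [0, 1]: near 1 means spam, near 0 means ham, near 0.5 means unsure."""
    n = len(probs)
    # Fisher's method: -2 * sum(ln p) follows chi^2 with 2n d.f. when the p's are uniform.
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (1.0 + spam_evidence - ham_evidence) / 2.0
```

Unlike a single product-ratio score, this method tends to produce scores near 0.5 when the evidence conflicts, which lets a filter flag a message as "unsure" instead of forcing a decision.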

John Graham-Cumming, the author of POPFile, another open-source project (this one a mail proxy server that uses a Bayesian filter), reported to the MIT conference that however well statistical filters might work, parsing email messages so that such filters can analyze them will remain a hard job. Technically savvy spammers constantly devise new ways to make their messages easy for a person to read but difficult for a program to analyze.
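As a hypothetical illustration of the parsing problem, a spammer might split a word with an HTML comment so that it renders normally in a mail client but never appears as a whole token. A filter can undo that particular trick by normalizing the text before tokenizing, as in this sketch; both the trick and the function are examples chosen for illustration, not techniques reported at the conference.

```python
import re

HTML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

def normalize_html(text):
    """Delete HTML comments (which can split words invisibly), then turn tags into spaces."""
    text = HTML_COMMENT_RE.sub("", text)
    return TAG_RE.sub(" ", text)
```

Note that the sketch still breaks a word split by an empty tag pair such as `che<b></b>ap`, because tags become spaces; each fix invites a new evasion, which is why the parsing job stays hard.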

"A Plan for Spam" http://www.paulgraham.com/spam.html

SpamBayes http://spambayes.sourceforge.net

Spammunition http://www.upserve.com/spammunition/default.asp

Spam Bully http://spambully.com

POPFile http://popfile.sourceforge.net

Windows IT Pro is a Division of Penton Media Inc.
Copyright © 2008 Penton Media, Inc., All rights reserved.