spam classification breaker
Tim Peters
tim.one at comcast.net
Thu Feb 5 18:43:30 EST 2004
More information about the Python-list mailing list
[Robin Becker]
> .... are you asserting that spammers don't have access to the pdf that
> users are filtering?

Sorry, I couldn't make sense of that question.

> Each filter may be unique, but they can be biassed.

It doesn't matter, because these classifiers learn. In the early days of the Spambayes project, we experimented with throwing "the best" N clues (both hammy and spammy) out of the database, where "the best" was a measure of how often and how strongly a feature contributed to a correct classification. Through several iterations of that, overall performance remained just as good -- the classifier learned to look for other things. If even the strongest features can be thrown away without harm, there's not much use in trying to exploit small statistical bias.

It's not even clear that any particular individual bias is widespread. For example, "Nancy" is a hammy word in my training data, but "Cecil" is spammy. Is that universal? Seems unlikely. "Python" is very hammy for me, but is probably at best neutral for most people; it may even be strongly spammy for most people, thanks to <http://www.python.com>'s advertising. Etc. The details of your personal email life may be as unique as a fingerprint.
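
For readers who want to see the shape of the "throw away the best N clues" experiment, here is a minimal sketch. It is not Spambayes code: the toy log-odds scorer, the smoothing, the strength measure (distance of a token's spam probability from 0.5), the tiny data set, and every function name are assumptions made up for illustration. The original experiment measured how often and how strongly a clue contributed to correct classifications; this sketch approximates that with probability extremeness alone, and it ignores banned tokens at scoring time instead of deleting them from a real database.

```python
import math
from collections import Counter

def train(messages):
    """Return {token: smoothed spam probability} from (tokens, is_spam) pairs."""
    spam_counts, ham_counts = Counter(), Counter()
    for tokens, is_spam in messages:
        # Count each token at most once per message.
        (spam_counts if is_spam else ham_counts).update(set(tokens))
    vocab = set(spam_counts) | set(ham_counts)
    # Smoothing keeps every probability strictly between 0 and 1.
    return {t: (spam_counts[t] + 0.5) / (spam_counts[t] + ham_counts[t] + 1.0)
            for t in vocab}

def spam_score(tokens, probs, banned=frozenset()):
    """Sum of log-odds over known, non-banned tokens; > 0 means 'spam'."""
    score = 0.0
    for t in set(tokens):
        if t in probs and t not in banned:
            p = probs[t]
            score += math.log(p / (1.0 - p))
    return score

def strongest_clues(probs, n):
    """The n tokens farthest from the neutral 0.5 (ties broken by name)."""
    return set(sorted(probs, key=lambda t: (-abs(probs[t] - 0.5), t))[:n])

def accuracy(messages, probs, banned=frozenset()):
    correct = sum((spam_score(tokens, probs, banned) > 0) == is_spam
                  for tokens, is_spam in messages)
    return correct / len(messages)

def ablation_rounds(train_set, test_set, n, rounds):
    """Ban the n strongest remaining clues each round; return accuracies."""
    probs = train(train_set)
    banned, results = set(), []
    for _ in range(rounds):
        remaining = {t: p for t, p in probs.items() if t not in banned}
        banned |= strongest_clues(remaining, n)
        results.append(accuracy(test_set, probs, banned))
    return results

# Hypothetical toy corpus: three spam and three ham training messages.
TRAIN = [
    ("viagra free offer click now".split(), True),
    ("free money offer winner click".split(), True),
    ("viagra winner prize click now".split(), True),
    ("python list patch review meeting".split(), False),
    ("meeting notes python review lunch".split(), False),
    ("patch lunch list notes python".split(), False),
]
TEST = [
    ("free prize offer now".split(), True),
    ("winner money click".split(), True),
    ("review notes lunch".split(), False),
    ("list meeting patch".split(), False),
]

if __name__ == "__main__":
    print(ablation_rounds(TRAIN, TEST, n=2, rounds=3))
```

On this toy data the redundancy among the remaining tokens keeps the test messages correctly classified after each round, which is the effect described above in miniature; it says nothing about how a real filter behaves on real mail.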