Graham's spam filter
Sean 'Shaleh' Perry
shalehperry at attbi.com
Thu Aug 22 03:06:44 EDT 2002
More information about the Python-list mailing list
On Wednesday 21 August 2002 10:48 pm, Erik Max Francis wrote:
>
> As I said earlier, one blocking issue for me in actually putting the
> filter into practice is the lack of good corpora (one for spam, one for
> non-spam); I keep all mail I receive, but the "backups" that I have
> usually consist of all the email I've ever received. (I certainly have
> kept a lot of good mail, but of course I've deleted a lot more, so it's
> hard to know whether or not it would be useful.) Note that if, from now
> on, I did manage to keep a corpus of all good email I've received
> alongside all email (both good and bad), it would be easy to apply
> simple subtraction to determine the good and bad figures (which are
> needed by Graham's algorithm), but what I have now consists of only some
> good messages going back through time and all email I've ever received
> (good and bad) since I switched over to my new rule-based Python filter.
>

Since I read that article I created a spam folder and moved all spam there
rather than delete it. I now have 400 or so messages in that folder. Should
be a sufficient corpus and it grows daily.

An interesting issue for me is the contents of the spam. Some 70% of my spam
is Asian so there is a strong chance that any mail with CJK words will appear
to be spam, especially Korean.
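
The "simple subtraction" mentioned in the quoted text could look roughly like the sketch below: count tokens in the all-mail corpus and in the good-mail corpus, then subtract to get the spam-side counts. This is only an illustration under assumptions. The tokenizer, the function names (`token_counts`, `spam_counts_by_subtraction`), and the sample messages are all hypothetical and do not come from either poster's filter or from Graham's article.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of word characters, '$', apostrophes, dashes.
# In Python 3, \w also matches CJK characters, so such text is tokenized too.
TOKEN_RE = re.compile(r"[\w$'-]+")


def token_counts(messages):
    """Count how often each lowercased token occurs across the messages."""
    counts = Counter()
    for text in messages:
        counts.update(tok.lower() for tok in TOKEN_RE.findall(text))
    return counts


def spam_counts_by_subtraction(all_messages, good_messages):
    """Derive spam token counts as (all mail) minus (good mail).

    Counter subtraction drops any token whose count falls to zero or below,
    so tokens seen only in good mail vanish from the result.
    """
    return token_counts(all_messages) - token_counts(good_messages)


# Made-up sample data, purely for illustration.
good = ["lunch at noon tomorrow", "python list digest"]
spam = ["free money now", "free offer"]
everything = good + spam

bad_counts = spam_counts_by_subtraction(everything, good)
```

The subtraction only works if the good corpus is a true subset of the all-mail corpus. With a dedicated spam folder, as described in the reply, `token_counts` can be run on that folder directly and no subtraction is needed.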