SpamProbe Howto Guide
For Mandrake 9.1 and SpamProbe 0.9e
Herman Oosthuysen
15 November 2003
Licensed under GNU GPL, 2003, http://www.gnu.org

General

SpamProbe is arguably the best Bayesian mail filter available. Where most filters count only single words, SpamProbe counts word pairs as well. It also handles the mail headers and HTML tags in an intelligent fashion. The result is a very good filter with about 99% effectiveness, and I have never seen any false positives. SpamProbe works well on a Fetchmail/Procmail system, which is what I describe here.

With a 99% effective filter, only about one spam in a hundred gets through. A spammer would therefore have to increase his spam transmissions by 100 times, with every message different, to get a significant number of spams past the filter. Hopefully, spam filters will improve even more over time, making spamming completely impractical.

Where to get it

You can get SpamProbe here: http://spamprobe.sourceforge.net

You will also need BerkeleyDB, available here: http://www.sleepycat.com

Installation

First install BerkeleyDB, then SpamProbe.

To install BerkeleyDB, download it to your home directory:
Go and get the tar file from the sleepycat web site: http://www.sleepycat.com
Now start up a browser and read docs/index.html. Click "Building for UNIX/POSIX systems".

To do a standard UNIX build of Berkeley DB, change to the build_unix directory and then enter the following two commands:
This will build the Berkeley DB library. To install the Berkeley DB library, enter the following commands:
To rebuild Berkeley DB, enter:
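The build, install and rebuild steps above can be sketched as one small shell helper. This is only a sketch under assumptions: the function name bdb and the source directory argument are placeholders for illustration, while ../dist/configure, make, make install and make clean are the commands the Berkeley DB build notes give for a standard UNIX build. Check docs/index.html for your version before relying on it.

```shell
# Hypothetical wrapper around the standard Berkeley DB make targets.
# $1: top of the unpacked source tree (placeholder, e.g. "$HOME/db-X.Y.Z")
# $2: build | install | rebuild
bdb() {
    src="$1"
    step="$2"
    # Refuse to run outside an unpacked Berkeley DB tree
    [ -d "$src/build_unix" ] || { echo "no build_unix directory in $src" >&2; return 1; }
    case "$step" in
        build)   ( cd "$src/build_unix" && ../dist/configure && make ) ;;
        install) ( cd "$src/build_unix" && make install ) ;;   # usually run as root
        rebuild) ( cd "$src/build_unix" && make clean && make ) ;;
        *) echo "usage: bdb SRC build|install|rebuild" >&2; return 2 ;;
    esac
}

# Usage (the directory name is a placeholder):
#   bdb "$HOME/db-X.Y.Z" build
#   bdb "$HOME/db-X.Y.Z" install
```

Each step runs in a subshell, so your own working directory does not change.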
Now, here is the trick which caused me to write this howto. Make a symbolic link from /usr/lib to the Berkeley library:
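A minimal sketch of such a link, assuming you know where make install put the shared library. The function name link_bdb and both path arguments are placeholders; Berkeley DB typically installs under a versioned prefix such as /usr/local/BerkeleyDB.X.Y, but verify the actual file name on your system.

```shell
# Hypothetical helper: link an installed Berkeley DB library into a
# directory that the linker searches.
# $1: full path of the installed library (assumption, adjust to your install)
# $2: directory to link into (defaults to /usr/lib, as in this howto)
link_bdb() {
    lib="$1"
    dest="${2:-/usr/lib}"
    [ -f "$lib" ] || { echo "library not found: $lib" >&2; return 1; }
    # Create DEST/<library file name> pointing at the real library
    ln -s "$lib" "$dest/$(basename "$lib")"
}

# Usage, as root (X.Y stands for your Berkeley DB version):
#   link_bdb /usr/local/BerkeleyDB.X.Y/lib/libdb-X.Y.so
```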
Otherwise, SpamProbe can't find the schtoopidttt library.

To install SpamProbe, download it to your home directory:
Go and get the tar file from the SpamProbe web site: http://spamprobe.sourceforge.net
Configure and build spamprobe:
Install it:
Database Setup

This howto describes using SpamProbe with a common database. That makes it easy to make corrections to the database, since there is only one to worry about. Generally, for a given business, the e-mail will look pretty much the same for each user, since they all work on the same stuff, therefore using a common database should be good enough.

If you want to use multiple databases, then you have to create a .spamprobe directory for each user, including root:
Now for the users:
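A sketch of the per-user step, assuming each database lives in a .spamprobe directory inside the user's home directory. The function name, the 700 permissions and the example paths are assumptions for illustration, not part of the original setup.

```shell
# Hypothetical helper: create the per-user SpamProbe directory.
# $1: the user's home directory (e.g. /home/alice, a made-up example)
# $2: optional owner to hand the directory to (chown needs root)
make_spamprobe_dir() {
    home="$1"
    owner="${2:-}"
    [ -d "$home" ] || { echo "no such home directory: $home" >&2; return 1; }
    # Private directory: only the owning user can read or write it
    mkdir -p "$home/.spamprobe" && chmod 700 "$home/.spamprobe" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$home/.spamprobe"
    fi
}

# Usage, as root (assumes each home directory is named after its user):
#   make_spamprobe_dir /root root
#   for h in /home/*; do make_spamprobe_dir "$h" "$(basename "$h")"; done
```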
Repeat for each and every user. You need this, since procmail runs with the permissions of the user the mail is addressed to. The system can therefore keep a different database for each user. Note that the procmail setup below will have to change if you want to use multiple databases.

Procmail Setup

Procmail has to run SpamProbe on each and every incoming message. Each message is also fed back into SpamProbe, to allow it to evolve its database. Errors must be manually corrected. We handle errors by creating two new mail users: spam and ham.

TIP: Note that if you define user names and domain names in lower case, they become case insensitive in Unix/Linux. Therefore, NEVER define user/host/domain names with uppercase letters in them.

If a user receives a good message classified as spam, the user should forward it to user Spam, which will then cause SpamProbe to correct its behaviour. Similarly, if a spam message is received in the user's inbox, the user should forward it to user Ham, which will cause SpamProbe to correct its database accordingly.

Here are the relevant parts from my /etc/procmail/procmailrc file. Place this definition at the top of the file:

# Spamprobe configuration
# The quotes keep the command and its -d option together in one variable
SPAMPROBE="/usr/local/bin/spamprobe -d /var/spool/mail"
Place this code before you sort the mail for each user:
### Spamprobe - Naive Bayesian Word Probability Filter
## Avoid running spamprobe again on spam corrections
:0
* ! (^TO_spam@YOURDOMAIN\.com)
{
    # Score the message
    :0
    SCORE=| $SPAMPROBE receive

    # Add the score to X-Spamprobe header
    :0 wf
    | formail -I "X-SpamProbe: $SCORE"

    # Put a copy of spams in the spamprobe box
    :0 ac:
    * (^X-Spamprobe: SPAM)
    /var/spool/mail/spamprobe
}
### Spam Corrections
### To correct a misclassification, forward it to the spam user address
:0
* (^TO_spam@YOURDOMAIN\.com)
{
    :0
    * (^X-SpamProbe: SPAM)
    * ! (^X-Loop: SpamProbe)
    {
        # Was seen as spam, should be ham and reverse header
        :0 wf
        | $FORMAIL -I "X-SpamProbe: GOOD" -rk

        :0 wf
        | $FORMAIL -I "X-Loop: SpamProbe"

        # After the To/From reversal, fix the From line again
        :0 wf
        | $FORMAIL -I "From " -a "From "

        # Put it in Hambox and copy for redelivery and user verification
        :0 c:
        /var/spool/mail/ham

        # Rescan the hambox
        :0 wc
        | $SPAMPROBE good /var/spool/mail/ham
    }

    :0
    * (^X-SpamProbe: GOOD)
    * ! (^X-Loop: SpamProbe)
    {
        # Was seen as ham, should be spam and reverse header
        :0 wf
        | $FORMAIL -I "X-SpamProbe: SPAM" -rk

        :0 wf
        | $FORMAIL -I "X-Loop: SpamProbe"

        # After the To/From reversal, fix the From line again
        :0 wf
        | $FORMAIL -I "From " -a "From "

        # Put it in Spambox and copy for redelivery and user verification
        :0 c:
        /var/spool/mail/spam

        # Rescan the spambox
        :0 wc
        | $SPAMPROBE spam /var/spool/mail/spam
    }
}
In addition, at the very end of my procmailrc file, I have the following code, to handle the leftovers:
### Unknowns - Whatever is left over is spam by definition
# Avoid handling the spam twice though
:0
* ! (^X-SpamProbe:.*)
{
    # Add a spam header
    :0 wf
    | $FORMAIL -I "X-SpamProbe: SPAM"

    # Put it in Spambox and copy it
    :0 c:
    /var/spool/mail/spam

    # Rescan the spambox
    :0 Wc
    | $SPAMPROBE spam /var/spool/mail/spam
}
SpamProbe Education

In order to use SpamProbe, you have to teach it right from wrong. To do this, you need a Bible of Good messages and an Apocrypha of Spam messages. If you were careful to delete all crud from your inbox, then that will do for the good messages. Hopefully, you also have a junkbox full of spam. If not, well, it is easy enough to get spam to train SpamProbe on...

Before doing the commands below, first compact your mailboxes using your e-mail client, so that deleted/moved mail is really deleted/moved. This is very important: otherwise SpamProbe will, for instance, read 'moved' spam that is still in the inbox and corrupt its database, reducing its effectiveness.

To teach SpamProbe about Ham:
I create a new inbox each year, so I have to run the above multiple times, once for each inbox.

To teach SpamProbe about Spam:
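Both training steps can be sketched with one helper. The good and spam commands and the -d option are the ones already used in the procmail recipes of this howto; the function name train, the DBDIR variable and the mailbox paths in the usage lines are placeholders, so treat this as an illustration rather than the exact commands used here.

```shell
SPAMPROBE="${SPAMPROBE:-/usr/local/bin/spamprobe}"   # path used in the procmailrc above
DBDIR="${DBDIR:-/var/spool/mail}"                    # common database directory

# Hypothetical helper: train good|spam MBOX...
train() {
    [ $# -ge 1 ] || { echo "usage: train good|spam MBOX..." >&2; return 2; }
    kind="$1"
    shift
    case "$kind" in
        good|spam) ;;
        *) echo "usage: train good|spam MBOX..." >&2; return 2 ;;
    esac
    for mbox in "$@"; do
        if [ ! -f "$mbox" ]; then
            echo "skipping missing mailbox: $mbox" >&2
            continue
        fi
        # Feed the whole mailbox to SpamProbe as good mail or as spam
        "$SPAMPROBE" -d "$DBDIR" "$kind" "$mbox" || return 1
    done
}

# Usage (mailbox names are made-up examples):
#   train good "$HOME/Mail/inbox-2002" "$HOME/Mail/inbox-2003"
#   train spam "$HOME/Mail/junk"
```

Passing several mailboxes in one call covers the case of one inbox per year.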
Repeat the above for each user in the system. This process will create the SpamProbe database /var/spool/mail/sp_words.

Finally, ensure that SpamProbe can always access the database:
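One way to do this, sketched under assumptions: hand the database file to a shared group and open it to that group. The function name share_db, the default group mail and mode 660 are my assumptions; write access is included because the recipes above feed every message back into SpamProbe, which presumably needs to update the database. Adjust group and mode for your system.

```shell
# Hypothetical helper: give a group read/write access to the database file.
# $1: database file (this howto's database is /var/spool/mail/sp_words)
# $2: group name (defaults to "mail")
share_db() {
    db="$1"
    group="${2:-mail}"
    [ -e "$db" ] || { echo "no such database: $db" >&2; return 1; }
    # Owner and group may read and write; everybody else gets nothing
    chgrp "$group" "$db" && chmod 660 "$db"
}

# Usage, as root:
#   share_db /var/spool/mail/sp_words mail
```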
This is required, since procmail runs with the permissions of the user to whom the mail is addressed, so the database must be readable by everybody in the mail group. Change as required for your system.

E-Mail Client Configuration

With this setup, all mail will be delivered to the user, but the mail will contain a new header, which can be used by the client to sort the mail into the inbox and junkbox. Configure your e-mail client to look for the header X-SpamProbe: SPAM and dump it into the junkbox.

La Voila!

Have fun,
Herman.

Copyright © 2005-2008, Aerospace Software Ltd., GPL.