Here's how I integrate dspam into dovecot.
Edited version of Johannes' page.
I have written a dovecot plugin that watches a special folder which I'll call `SPAM` from now on. When the MTA (postfix/sendmail here) delivers a message to the user, it first runs it through the spam classifier, in my case dspam. If the message is classified as spam, it is delivered to the `SPAM` folder instead of being passed through whatever filtering file the user may have (my system uses dovecot-lda with sieve filtering).
At this point, mail classified as spam ends up in the `SPAM` folder, and everything else is delivered normally.
Obviously this isn't enough because our spam scanner needs training. We'll occasionally have false positives and false negatives.
With something like DMT, you have to move false positives into a `HamTrain` folder and false negatives into a `SpamTrain` folder. I, however, want those mails in whatever folders I choose: my mail filters were never applied to the misclassified message, and a false positive may well have come in through a mailing list that has its own folder.
Now this is the point where my dovecot plugin comes into play. Instead of moving mail into special folders, the user has two actions available:
1. moving it '''out''' of the `SPAM` folder, and
2. moving it '''into''' the `SPAM` folder.
The dovecot plugin watches for these actions (it also refuses APPENDs to the `SPAM` folder, mostly for technical reasons) and, depending on which of the two actions the user performed, tells dspam that it made an error and needs to re-classify the message. The user can now move the message directly into whatever folder she chooses, and it all works. Almost magic.
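IMAP as implemented in dovecot 1.0 has no MOVE command; a client "moves" a message by COPYing it and then deleting and expunging the original. The decision the plugin has to make therefore depends only on the source and destination mailbox of a COPY, or on an APPEND having no source mailbox at all. The C sketch below shows that decision table; `spam_action_for`, `enum spam_action` and the hard-coded `SPAM_FOLDER` are hypothetical names for illustration, and the real plugin's hooks into dovecot's mail storage API are not shown.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: name of the watched folder (the real plugin may make this configurable). */
#define SPAM_FOLDER "SPAM"

enum spam_action {
    ACTION_NONE,         /* nothing for dspam to learn */
    ACTION_RELEARN_HAM,  /* moved out of SPAM: dspam produced a false positive */
    ACTION_RELEARN_SPAM, /* moved into SPAM: dspam produced a false negative */
    ACTION_DENY          /* APPEND into SPAM is refused */
};

/* Decide what to do for a COPY from `src` to `dst`.
 * `src` is NULL for an APPEND, i.e. a message uploaded by the client
 * rather than copied from another mailbox. */
enum spam_action spam_action_for(const char *src, const char *dst)
{
    assert(dst != NULL);
    int dst_spam = strcmp(dst, SPAM_FOLDER) == 0;

    if (src == NULL)
        return dst_spam ? ACTION_DENY : ACTION_NONE;

    int src_spam = strcmp(src, SPAM_FOLDER) == 0;
    if (src_spam && !dst_spam)
        return ACTION_RELEARN_HAM;
    if (!src_spam && dst_spam)
        return ACTION_RELEARN_SPAM;
    return ACTION_NONE; /* SPAM to SPAM, or between two ordinary folders */
}
```

Copies between two ordinary folders, or within `SPAM`, deliberately produce no action, so normal filing never touches dspam.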
When I first suggested this, I was told that it wouldn't scale. I believe it can be made to scale just as well as DMT or any other home-grown system that runs the re-learning as a cron job.
If you trigger training based on a mail copy, what happens when someone dumps 400 emails into a folder all at once? What happens when 30 people do this all at the same time? It might not suit a smaller system at peak hours to have this done.
– Tom Allison (on the dovecot mailing list)
Well, here's my answer: instead of re-learning the messages instantly, we implement the DMT approach behind the scenes. When the user moves a mail out of the `SPAM` folder, we hardlink it into a special `RelearnHam` folder; when it is moved into the `SPAM` folder, we hardlink it into a `RelearnSpam` folder. At night, when the system is supposed to be less busy, a cron job processes all the mails in those Relearn* folders (for all users) according to the folder they are in. With dspam, this gets even better: it suffices to store only the signatures of the misclassified mails, for example in a special database. After re-learning, the messages are purged.
Actually, this is not quite correct. As Tom Allison pointed out, if the user revises her decision about the classification before the cron job runs, the two folders can become inconsistent. Therefore, when a mail is linked into the `RelearnHam` folder it must be unlinked from `RelearnSpam` (and vice versa).
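A minimal C sketch of the hardlink bookkeeping, assuming Maildir storage (one file per message) and that the `RelearnHam`/`RelearnSpam` folders sit on the same filesystem as the mail, since `link(2)` cannot cross filesystems. `relearn_init`, `relearn_mark` and the `name` argument (a per-message file name, for example the Maildir base name) are hypothetical. The point it shows is the consistency rule above: the new link is made first and the opposite one removed second, so only the user's latest decision survives until the nightly run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: create <root>/RelearnHam and <root>/RelearnSpam if missing. */
int relearn_init(const char *root)
{
    static const char *const dirs[] = { "RelearnHam", "RelearnSpam" };
    char path[4096];

    for (int i = 0; i < 2; i++) {
        int n = snprintf(path, sizeof path, "%s/%s", root, dirs[i]);
        if (n < 0 || (size_t)n >= sizeof path) { errno = ENAMETOOLONG; return -1; }
        if (mkdir(path, 0700) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/* Hypothetical: record the user's latest decision for one message by
 * hard-linking its file into mark_dir and removing any link with the same
 * name from other_dir. Returns 0 on success, -1 with errno set. */
int relearn_mark(const char *msg_path, const char *mark_dir,
                 const char *other_dir, const char *name)
{
    char dst[4096], stale[4096];

    assert(msg_path && mark_dir && other_dir && name);
    int n1 = snprintf(dst, sizeof dst, "%s/%s", mark_dir, name);
    int n2 = snprintf(stale, sizeof stale, "%s/%s", other_dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof dst || (size_t)n2 >= sizeof stale) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Link first; an existing link means the decision is unchanged. */
    if (link(msg_path, dst) != 0 && errno != EEXIST)
        return -1;

    /* Then undo the opposite decision; a missing link is not an error. */
    if (unlink(stale) != 0 && errno != ENOENT)
        return -1;

    return 0;
}
```

Moving a mail out of `SPAM` would call `relearn_mark(msg, ham_dir, spam_dir, name)`; moving it in swaps the two directories.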
My solution (and what this plugin does) came about through discussions on the dovecot list, and when I chose to implement it, Johannes and I discussed some of its finer details.
When a user moves a message into the `SPAM` folder, a database table is updated (you can change how this happens via stored procedures in MySQL 5), and the table is updated again when a message is moved out of the folder. A runner program then processes the table, telling dspam what to relearn (and, in my case, clearing the table contents as it goes). The real trick is in how you update the table, which I'll show later.
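The author's actual table-update procedure isn't reproduced here. As a hedged, hypothetical illustration of the rule any such update has to satisfy (only the user's latest decision per message gets relearned), here is an in-memory C sketch keyed on user and dspam signature; `pending_record`, `struct pending_table` and the size limits are invented for this example.

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 64 /* hypothetical table capacity */
#define SIG_LEN 64     /* hypothetical maximum signature length */

enum relearn_class { RELEARN_HAM, RELEARN_SPAM };

struct pending {
    char user[64];
    char signature[SIG_LEN];
    enum relearn_class cls;
};

struct pending_table {
    struct pending rows[MAX_PENDING];
    size_t count;
};

/* Upsert keyed on (user, signature): an existing row is overwritten, so a
 * later move back out of (or into) SPAM replaces the earlier decision
 * instead of queueing a contradictory one. Returns 0, or -1 if the table
 * is full or a string is too long. */
int pending_record(struct pending_table *t, const char *user,
                   const char *sig, enum relearn_class cls)
{
    assert(t && user && sig);
    if (strlen(user) >= sizeof t->rows[0].user || strlen(sig) >= SIG_LEN)
        return -1;

    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->rows[i].user, user) == 0 &&
            strcmp(t->rows[i].signature, sig) == 0) {
            t->rows[i].cls = cls; /* latest decision wins */
            return 0;
        }
    }

    if (t->count == MAX_PENDING)
        return -1;
    struct pending *p = &t->rows[t->count++];
    strcpy(p->user, user);
    strcpy(p->signature, sig);
    p->cls = cls;
    return 0;
}
```

In MySQL the same last-decision-wins behaviour could come from a unique key on (user, signature) combined with `INSERT ... ON DUPLICATE KEY UPDATE`, though that is not necessarily what the author's stored procedures do.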
– Tim White
Here is the code for dovecot 1.0 beta9 and 1.0 RC15. I haven't tested it with any other versions, but it should work with the versions between beta9 and RC15.
To use this plugin, you need to configure dspam to add its signature to each message as a header line; other signature locations 'do not work', as the plugin is programmed to extract the signature from that header. If you want (or already have) a different header line name, just change the `#define` at the top of the file. You also need to set up the database: I use dspam_plugin.sql to create my table and my stored procedures.
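Because the plugin reads dspam's signature from a header line, the lookup itself is simple. A hedged C sketch follows; it assumes the header is named `X-DSPAM-Signature` (the name dspam uses when it writes signatures into the headers), and `SIGNATURE_HEADER` and `extract_signature` are hypothetical stand-ins for the plugin's actual `#define` and code. Only the header block is searched, so a matching line in the body is ignored; folded (multi-line) headers are not handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Assumed header name; change it if your dspam setup uses another one. */
#define SIGNATURE_HEADER "X-DSPAM-Signature"

/* Copy the value of SIGNATURE_HEADER from a raw message into `out`.
 * Returns 0 on success, -1 if the header is missing or does not fit. */
int extract_signature(const char *msg, char *out, size_t outlen)
{
    const size_t hlen = strlen(SIGNATURE_HEADER);
    const char *line = msg;

    assert(msg != NULL && out != NULL && outlen > 0);
    /* Stop at the empty line (LF or CRLF) that ends the header block. */
    while (*line != '\0' && *line != '\n' && !(line[0] == '\r' && line[1] == '\n')) {
        const char *eol = strchr(line, '\n');
        if (eol == NULL)
            eol = line + strlen(line);

        /* Header names are case-insensitive. */
        if (strncasecmp(line, SIGNATURE_HEADER, hlen) == 0 && line[hlen] == ':') {
            const char *v = line + hlen + 1;
            const char *end = eol;
            while (v < end && isspace((unsigned char)*v))
                v++;                      /* skip leading whitespace */
            while (end > v && isspace((unsigned char)end[-1]))
                end--;                    /* drop trailing spaces and CR */
            if ((size_t)(end - v) >= outlen)
                return -1;
            memcpy(out, v, (size_t)(end - v));
            out[end - v] = '\0';
            return 0;
        }
        line = (*eol == '\n') ? eol + 1 : eol;
    }
    return -1;
}
```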
A proper client runner for processing the database still needs to be written. For now I have a PHP version, clientrunner.php, which appears to work very well for small setups; feel free to write a C version if you wish.
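The runner's core job is to turn each pending row into a dspam retraining call. A minimal C sketch of that single step follows, assuming dspam's `--user`, `--class=spam|innocent`, `--source=error` and `--signature=` command-line options (check the dspam man page for your version); `build_dspam_argv` and `enum relearn_class` are hypothetical names, not part of the plugin or clientrunner.php.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum relearn_class { RELEARN_HAM, RELEARN_SPAM };

/* Hypothetical: fill argv[0..6] with a dspam command that retrains one
 * message by its signature. `sigopt` is caller-owned storage for the
 * "--signature=..." argument. Returns 0, or -1 if sigopt is too small. */
int build_dspam_argv(const char *user, const char *signature,
                     enum relearn_class cls,
                     char *sigopt, size_t sigoptlen, const char *argv[7])
{
    assert(user && signature && sigopt && argv);
    int n = snprintf(sigopt, sigoptlen, "--signature=%s", signature);
    if (n < 0 || (size_t)n >= sigoptlen)
        return -1;

    argv[0] = "dspam";
    argv[1] = "--user";
    argv[2] = user;
    /* Moved out of SPAM: relearn as innocent; moved in: relearn as spam. */
    argv[3] = (cls == RELEARN_SPAM) ? "--class=spam" : "--class=innocent";
    argv[4] = "--source=error"; /* tell dspam it misclassified this message */
    argv[5] = sigopt;
    argv[6] = NULL;             /* execvp() needs a NULL-terminated vector */
    return 0;
}
```

Building an argument vector instead of a shell string means a user name or signature is never interpreted by a shell. A runner could fork, pass the vector to `execvp()` (cast to `char *const *`), and delete the table row only after dspam exits successfully.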
If you have any questions, please send them to the dovecot mailing list: http://dovecot.org/cgi-bin/mailman/listinfo/dovecot. I monitor the list and will try to answer your questions as soon as possible.