Revision as of 17:08, 6 October 2007
Right out of the README
2.1 CONFIGURING GROUPS

Groups enable a group of users to share information. The following group types are supported:

SHARED

Enables users with similar email behavior to share the same dictionary while still maintaining a private quarantine box. The benefits of this type of group are faster learning, and sharing a single spam alias.

Shared groups can have both positive and negative effects on accuracy. If a shared group consists of users with similar, predictable email behavior, the users in the group can benefit from a larger dictionary of spam and faster learning (especially for newcomers in the group). If a group consists of users with different email behavior, however, the users in the group will experience poor spam filtering and a higher number of false positives.

NOTE The SQL-based storage drivers support shared groups, but has one caveat: If you are NOT enabling "virtual users" support, you will need to create an actual user on your system named after each group you create.

On top of shared group support, a shared group can also be made to be 'managed'. Using the group type 'SHARED,MANAGED' will cause the group to share a single quarantine mailbox which could be managed by the group's administrator. This would enable one individual to monitor quarantine for the entire group, however personal emails marked as false positives could potentially be viewed as well. For this reason, managed groups should only be used when this is not an issue.

INOCULATION

An inoculation group allows users to maintain their own private dictionaries with their own spam alias, but all members of the group will inoculate other members with spams they manually forward into their alias. This allows users to report spams to one another and maintain their own private dictionary. Another advantage to this is that users do not necessarily have to share the same email behavior.

NOTE: Users should only be added to an inoculation group after their initial learning period, to avoid potential false positives due to lack of data.

To create groups, you'll want to create a file with the filename 'group' located in the DSPAM user directory. The default is /usr/local/var/dspam/group. The format of the file should look like this:

  group1:shared:user1,user2,user3
  group2:inoculation:user4,user5,user6

A user can be a member of multiple inoculation groups, but a user cannot be a member of both an inoculation group and a shared group. DSPAM will read this file upon startup and determine if the user fits into any particular group.

Use the dspam_stats tool to keep an eye on the effectiveness of shared groups. If a shared group experiences poor performance, find the users whose email behavior is inconsistent with that of the group and remove them from the group.

CLASSIFICATION

Classification groups allow a group of users to network their results together. If DSPAM is uncertain of whether a message is spam or nonspam for a group member, all other members of the group are queried. If another member believes the message to be spam, it will be marked as spam.

A user can simultaneously be a member of a classification and inoculation group, but a user cannot be a member of both a classification group and a shared group.

VERSATILE LANGUAGE INOCULATION MESSAGES

A new Internet-Draft has been released to the public:

  http://www.ietf.org/internet-drafts/draft-spamfilt-inoculation-00.txt

To create a message format standard for sending inoculation data via email. This will allow users on different servers, and even using different anti-spam tools to share inoculation information with one-another. DSPAM presently implements support for this message standard with the following limitations:

- Only inbound inoculation messages are supported. DSPAM does not yet send out inoculations using this message format. This should not be confused with local inoculation, which *is* supported.
- The message/inoculation format is the only inoculation type presently supported. text/inoculation and multipart/inoculation coming soon.
- The only supported authentication mechanism is presently md5 verification codes/checksums.

Any unsupported inoculations will simply be dropped.

A list of identifies and authentication information can be set up in the file [username].inoc or in the user's home directory in a .inoc file if homedir-dotfiles is enabled. The format of this file is:

  sender1:shared secret
  sender2:shared secret

Each sender should specify the correct sender id when sending an inoculation, and should generate their checksum based on the shared secret established between both parties.

GLOBAL GROUPS

Global groups allows DSPAM to provide a "SpamAssassin type out-of-the-box filtering" for all new users until they have built their own useful dictionaries. to create a global classification group, add something like this to $HOME/group:

  groupname:classification:*globaluser

This will automatically add globaluser as a classification peer to all users. Any user who has less than 1000 innocent messages or 250 spam messages in their corpus, or whose filter is uncertain about a particular message will consult the global dictionary for an answer.

Global groups will need to be trained using corpus or other means, or by using the dspam_merge tool. the global user (in this case 'globaluser') is treated just as any other user on the system.

NOTE: Be sure and set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from purging them empty in 90 days.

MERGED GROUPS

Merged groups are similar to global groups in that the entire system uses a single global user as a parent. What's different is that the global group is merged with the individual user's training data at run-time, instead of switching between the two. This allows the global group to be treated like a base dataset for all users, and provides for quicker learning and correction than the previous approach.

It is recommended merged groups are only used with TOE-mode training so that only corrective data is stored, but systems with ample amounts of disk may wish to run in TUM mode to learn the user's behavior dynamically.

The group's data is merged with the user's data in real-time, so if you have:

  Group: Viagra = 10 Spam Hits, 0 Innocent Hits
  User:  Viagra = 5 Spam Hits, 15 Innocent Hits

Then the token is loaded as:

  15 Spam Hits, 15 Innocent Hits = 0.50 (50%)

No data is written to the group by DSPAM; only the user's data. This then offsets the group's data without affecting other users. Because of the way this data is merged, it's not recommended that you update the merged group with more than a handful of messages periodically, as it affects how all stats are defined for each user.

To set up a merged group, use something like this in your group file:

  groupname:merged:*
  groupname:merged:user1,user2,userN

groupname represents the name of the global user to merge with all members of the group.

NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering, but allowing users to build their own data from scratch will still result in the best possible accuracy in the longrun.

NOTE: Be sure and set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from purging them empty in 90 days.

IMPORTANT! If you are running dspam_clean, be sure to set a preference for your merged group users where trainingMode = TOE. This will cause dspam_clean to skip the purging of unused tokens from the global databases (which could wipe out your entire merged group user's dataset, since it's old).
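The merge arithmetic in the README's Viagra example can be reproduced with a few lines of shell. This is only a sketch: the function name merged_probability is made up for illustration, and the plain spam / (spam + innocent) ratio is just the one implied by the README's 0.50 figure. DSPAM's real per-token calculation may weigh the counts differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSPAM): add the group's hit counts to the
# user's hit counts for one token and print the resulting plain ratio.
# Usage: merged_probability GROUP_SPAM GROUP_INNOCENT USER_SPAM USER_INNOCENT
merged_probability() {
    awk -v gs="$1" -v gi="$2" -v us="$3" -v ui="$4" 'BEGIN {
        spam = gs + us          # group spam hits + user spam hits
        innocent = gi + ui      # group innocent hits + user innocent hits
        if (spam + innocent == 0) { print "no data"; exit }
        printf "%d spam, %d innocent = %.2f\n", spam, innocent, spam / (spam + innocent)
    }'
}

# The README example: group has 10/0, user has 5/15.
merged_probability 10 0 5 15
```

With the README's numbers this prints 15 spam, 15 innocent = 0.50, which shows how a user's own innocent hits offset the group's spam hits without anything being written back to the group.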
Creating a simple Merged Group
To create a Merged Group do this:
vi /usr/local/var/dspam/group
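It should contain the following line, where YourUsername is the name of the global user whose data is merged with every user: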
YourUsername:merged:*
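If you would rather not edit the file by hand, the same entry can be added from a script. The helper below is a sketch only: the name add_merged_group is made up, and it assumes the group file lives wherever you tell it (the README default is /usr/local/var/dspam/group). It appends the line once and does nothing if it is already there.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DSPAM): append a "merged" entry for
# all users to a group file, unless that exact line is already present.
# Usage: add_merged_group GROUP_FILE GLOBAL_USER
add_merged_group() {
    group_file=$1
    global_user=$2
    entry="${global_user}:merged:*"

    # -x matches the whole line, -F treats '*' literally instead of as a regex.
    if [ -f "$group_file" ] && grep -qxF -e "$entry" "$group_file"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$group_file"
}

# Example (needs write access to the DSPAM user directory):
#   add_merged_group /usr/local/var/dspam/group YourUsername
```

Since the README says DSPAM reads the group file on startup, restart the daemon after changing it.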
Merged Group Bug
It has been my experience (and that of some others) that there is a bug with 'Merged Groups'. This bug causes the DSpam Daemon to segfault. It appears to happen mostly when Merged Groups are used in conjunction with MySQL. Some have claimed that it has to do with DSpam not understanding foreign characters in message headers (I can neither confirm nor deny this). Unfortunately, the best way to see if you are affected by this bug is to create a merged group and wait for the daemon to crash. The last entry in the dspam.debug log is usually something like 'sedation level set to:'.
To check whether your merged group is actually doing what it's supposed to do, do the following:
1. Make sure you compiled with the --enable-debug option. You can check this by running:
dspam --version
2. Enable debugging in dspam.conf
 Debug *
 DebugOpt process spam fp
3. Restart the DSpam Daemon
service dspam restart
4. Wait for the daemon to segfault (this can take minutes to days depending on your level of traffic)
5. Search through dspam.debug for 'adding user to merged group'
grep -i merged /var/log/dspam/dspam.debug
If any results are returned then your merged group is working (at least until it crashes the dspam daemon!)
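Steps 4 and 5 can be wrapped in a small function if you end up checking the log often. This is a sketch, not a DSPAM tool: the name check_merged_log is made up, and it assumes the debug log is a plain text file such as the /var/log/dspam/dspam.debug used above. It counts the lines mentioning "merged" and shows the last line of the log, which after a crash is where the 'sedation level set to:' entry usually shows up.

```shell
#!/bin/sh
# Hypothetical helper: report whether a DSPAM debug log mentions merged groups
# and show the final log entry (useful after a suspected segfault).
# Usage: check_merged_log LOG_FILE
check_merged_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi

    # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
    hits=$(grep -ci merged "$log" || true)
    echo "merged lines: $hits"
    echo "last entry: $(tail -n 1 "$log")"

    # Succeed only if at least one merged-group line was logged.
    [ "$hits" -gt 0 ]
}

# Example:
#   check_merged_log /var/log/dspam/dspam.debug
```

The return status is 0 when merged-group lines were found, 1 when none were found, and 2 when the log could not be read.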