From LedHed's Wiki

Quoted directly from the DSPAM README:

 2.1 CONFIGURING GROUPS
 
  Groups enable a group of users to share information.  The following
  group types are supported:
 
  SHARED
  Enables users with similar email behavior to share the same dictionary 
  while still maintaining a private quarantine box.  The benefits of this
  type of group are faster learning, and sharing a single spam alias.  Shared
  groups can have both positive and negative effects on accuracy.  If a shared
  group consists of users with similar, predictable email behavior, the users 
  in the group can benefit from a larger dictionary of spam and faster 
  learning (especially for newcomers in the group).  If a group consists of 
  users with different email behavior, however, the users in the group will 
  experience poor spam filtering and a higher number of false positives.
 
  NOTE
    The SQL-based storage drivers support shared groups, with one caveat:
    If you are NOT enabling "virtual users" support, you will need to create
    an actual user on your system named after each group you create.
 
  In addition to shared group support, a shared group can also be made
  'managed'.  Using the group type 'SHARED,MANAGED' will cause the group to
  share a single quarantine mailbox, which can be managed by the group's
  administrator.  This enables one individual to monitor quarantine for
  the entire group; however, personal emails marked as false positives could
  potentially be viewed as well.  For this reason, managed groups should only
  be used when this is not an issue.
 
  INOCULATION
  An inoculation group allows users to maintain their own private dictionaries
  with their own spam alias, but all members of the group will inoculate other
  members with spams they manually forward into their alias.  This allows 
  users to report spams to one another and maintain their own private
  dictionary.  Another advantage to this is that users do not necessarily have
  to share the same email behavior.  
 
  NOTE: Users should only be added to an inoculation group after their initial
        learning period, to avoid potential false positives due to lack of data.
 
  To create groups, you'll want to create a file with the filename 'group' 
  located in the DSPAM user directory.  The default is
  /usr/local/var/dspam/group. The format of the file should look like this:
 
  group1:shared:user1,user2,user3
  group2:inoculation:user4,user5,user6
 
  A user can be a member of multiple inoculation groups, but a user cannot be
  a member of both an inoculation group and a shared group.
 
  DSPAM will read this file upon startup and determine if the user fits into
  any particular group.  
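
The file format above can be illustrated with a small parser.  This is only a
sketch in Python, not DSPAM's own code: the names Group, parse_group_line and
parse_group_file are hypothetical, and splitting the type field on commas is
an assumption based on the 'SHARED,MANAGED' type mentioned earlier.

```python
from typing import NamedTuple


class Group(NamedTuple):
    name: str
    types: frozenset  # e.g. {"shared"} or {"shared", "managed"}
    members: tuple


def parse_group_line(line):
    """Parse 'name:type:member1,member2' into a Group; None for blank lines."""
    line = line.strip()
    if not line:
        return None
    # Split into exactly three fields: group name, group type, member list.
    name, gtype, members = line.split(":", 2)
    # Assumption: a compound type such as 'SHARED,MANAGED' is comma-separated.
    types = frozenset(t.strip().lower() for t in gtype.split(","))
    users = tuple(m.strip() for m in members.split(",") if m.strip())
    return Group(name.strip(), types, users)


def parse_group_file(text):
    """Parse the whole contents of a group file into a list of Groups."""
    parsed = (parse_group_line(line) for line in text.splitlines())
    return [g for g in parsed if g is not None]


example = """group1:shared:user1,user2,user3
group2:inoculation:user4,user5,user6
"""
groups = parse_group_file(example)
for g in groups:
    print(g.name, sorted(g.types), g.members)
```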
   
  Use the dspam_stats tool to keep an eye on the effectiveness of shared groups.
  If a shared group experiences poor performance, find the users whose email 
  behavior is inconsistent with that of the group and remove them from the 
  group.
 
  CLASSIFICATION
  Classification groups allow a group of users to network their results
  together.  If DSPAM is uncertain of whether a message is spam or nonspam for
  a group member, all other members of the group are queried.  If another
  member believes the message to be spam, it will be marked as spam.
 
  A user can simultaneously be a member of a classification and inoculation
  group, but a user cannot be a member of both a classification group and a
  shared group.
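
The membership rules stated so far (a user may not be in a shared group and
also in an inoculation or classification group) can be checked before a group
file is deployed.  A minimal sketch in Python; find_conflicts and the
(name, type, members) tuple layout are illustrative and not part of DSPAM.

```python
from collections import defaultdict


def find_conflicts(groups):
    """Return users who are in a shared group and also in an inoculation
    or classification group.

    groups: iterable of (group_name, group_type, members) tuples, with
    group_type given in lowercase.
    """
    kinds = defaultdict(set)
    for _name, gtype, members in groups:
        for user in members:
            kinds[user].add(gtype)
    # Group types that may not be combined with 'shared'.
    exclusive = {"inoculation", "classification"}
    return sorted(u for u, k in kinds.items() if "shared" in k and k & exclusive)


sample = [
    ("g1", "shared", ["alice", "bob"]),
    ("g2", "inoculation", ["bob", "carol"]),
    ("g3", "classification", ["carol", "dave"]),
]
print(find_conflicts(sample))  # ['bob']
```

Here carol is not reported, since a user may belong to both an inoculation
and a classification group.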
 
  VERSATILE LANGUAGE INOCULATION MESSAGES
 
  A new Internet-Draft has been released to the public:
 
    http://www.ietf.org/internet-drafts/draft-spamfilt-inoculation-00.txt
 
  The draft proposes a message format standard for sending inoculation data
  via email.  This will allow users on different servers, and even users of
  different anti-spam tools, to share inoculation information with one another.
 
  DSPAM presently implements support for this message standard with the 
  following limitations:
 
  - Only inbound inoculation messages are supported.  DSPAM does not yet send
    out inoculations using this message format.  This should not be confused
    with local inoculation, which *is* supported.
  
  - The message/inoculation format is the only inoculation type presently
    supported.  Support for text/inoculation and multipart/inoculation is
    planned.
 
  - The only supported authentication mechanism is presently md5 verification
    codes/checksums.
 
  Any unsupported inoculations will simply be dropped.
 
  A list of identities and authentication information can be set up in the file
  [username].inoc, or in the user's home directory in a .inoc file if
  homedir-dotfiles is enabled.  The format of this file is:
 
  sender1:shared secret
  sender2:shared secret
 
  Each sender should specify the correct sender id when sending an 
  inoculation, and should generate their checksum based on the shared secret
  established between both parties.
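
As an illustration of the .inoc file format, the sketch below reads the
sender/secret pairs into a dictionary.  It is written in Python and is not
DSPAM code; parse_inoc is a hypothetical name, and splitting on the first
colon only is an assumption, since the README does not say whether a shared
secret may itself contain a colon.

```python
def parse_inoc(text):
    """Map each sender id to its shared secret from .inoc file contents."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: everything after the first colon is the shared secret.
        sender, _, secret = line.partition(":")
        secrets[sender] = secret
    return secrets


inoc = parse_inoc("sender1:shared secret\nsender2:shared secret\n")
print(inoc)
```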
 
  GLOBAL GROUPS
 
  Global groups allow DSPAM to provide "SpamAssassin-type out-of-the-box
  filtering" for all new users until they have built their own useful
  dictionaries.  To create a global classification group, add something like
  this to $HOME/group:
 
  groupname:classification:*globaluser
 
  This will automatically add globaluser as a classification peer to all users.
  Any user who has less than 1000 innocent messages or 250 spam messages in
  their corpus, or whose filter is uncertain about a particular message will
  consult the global dictionary for an answer.
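
The rule in the preceding paragraph can be written out as a small predicate.
This Python sketch reads the wording literally (either threshold unmet, or an
uncertain result, triggers the lookup); the function name and parameters are
hypothetical, and DSPAM's actual logic may combine the conditions differently.

```python
def consults_global(innocent_count, spam_count, uncertain,
                    min_innocent=1000, min_spam=250):
    """Return True if a user would fall back to the global dictionary.

    Literal reading of the README: fewer than 1000 innocent messages, or
    fewer than 250 spam messages, or an uncertain classification.
    """
    return (innocent_count < min_innocent
            or spam_count < min_spam
            or uncertain)


print(consults_global(500, 300, False))   # new user, little innocent mail
print(consults_global(1500, 300, False))  # established user, confident result
print(consults_global(1500, 300, True))   # established user, uncertain result
```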
 
  Global groups will need to be trained using a corpus or other means, or by
  using the dspam_merge tool.  The global user (in this case 'globaluser') is
  treated just like any other user on the system.
 
  NOTE: Be sure to set your global user's preferences so that trainingMode
        is set to TOE. This will prevent the purge tools you use from
        purging the global user's data after 90 days.
 
  MERGED GROUPS
 
  Merged groups are similar to global groups in that the entire system uses
  a single global user as a parent.  What's different is that the global
  group is merged with the individual user's training data at run-time,
  instead of switching between the two.  This allows the global group to be
  treated like a base dataset for all users, and provides for quicker
  learning and correction than the previous approach.  It is recommended that
  merged groups be used only with TOE-mode training so that only corrective
  data is stored, but systems with ample disk space may wish to run in
  TUM mode to learn the user's behavior dynamically.
 
  The group's data is merged with the user's data in real-time, so if you have:
 
  Group: Viagra = 10 Spam Hits, 0 Innocent Hits
  User: Viagra = 5 Spam Hits, 15 Innocent Hits
 
  Then the token is loaded as: 15 Spam Hits, 15 Innocent Hits = 0.50 (50%)
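
The arithmetic in this example can be reproduced with a few lines of Python.
This is a sketch only: merged_counts and spam_ratio are hypothetical names,
and the simple spam/(spam + innocent) ratio is used because it matches the
0.50 figure above; DSPAM's real token probability calculation may involve
more than this.

```python
def merged_counts(group, user):
    """Add the group's and user's (spam_hits, innocent_hits) for one token."""
    return (group[0] + user[0], group[1] + user[1])


def spam_ratio(spam_hits, innocent_hits):
    """Fraction of hits that were spam, or None if the token was never seen."""
    total = spam_hits + innocent_hits
    if total == 0:
        return None
    return spam_hits / total


group_viagra = (10, 0)   # group: 10 spam hits, 0 innocent hits
user_viagra = (5, 15)    # user:  5 spam hits, 15 innocent hits

merged = merged_counts(group_viagra, user_viagra)
print(merged)               # (15, 15)
print(spam_ratio(*merged))  # 0.5
```

On its own the user's data would give 5 / 20 = 0.25 and the group's data 1.0,
which shows how the user's own training offsets the group's data.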
 
  No data is written to the group by DSPAM; only the user's data. This then
  offsets the group's data without affecting other users. Because of the way
  this data is merged, it's not recommended that you update the merged group
  with more than a handful of messages periodically, as it affects how all
  stats are defined for each user.
 
  To set up a merged group, use something like this in your group file:
 
  groupname:merged:*
  groupname:merged:user1,user2,userN
 
  groupname represents the name of the global user to merge with all members of
  the group.
 
  NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering,
        but allowing users to build their own data from scratch will still 
        result in the best possible accuracy in the long run.
 
  NOTE: Be sure to set your global user's preferences so that trainingMode
        is set to TOE. This will prevent the purge tools you use from
        purging the global user's data after 90 days.
 
 
  IMPORTANT!
 
  If you are running dspam_clean, be sure to set a preference for your merged
  group users where trainingMode = TOE. This will cause dspam_clean to skip
  the purging of unused tokens from the global databases (which could wipe
  out your entire merged group user's dataset, since it's old).
 


Creating a simple Merged Group

To create a Merged Group do this:

vi /usr/local/var/dspam/group

It should contain the following line, where YourUsername is the name of the global user to merge with:

YourUsername:merged:*


Merged Group Bug

In my experience (and that of some others), there is a bug with merged groups that causes the DSPAM daemon to segfault. It appears to happen mostly when merged groups are used in conjunction with MySQL. Some have claimed that it is caused by DSPAM not understanding foreign characters in message headers (I can neither confirm nor deny this). Unfortunately, the best way to see whether you are affected by this bug is to create a merged group and wait for the daemon to crash. The last entry in the dspam.debug log is usually something like 'sedation level set to:'.

To check whether your merged group is actually doing what it's supposed to be doing, do the following:
1. Make sure you compiled with the --enable-debug option. You can check this by running:

dspam --version

2. Enable debugging in dspam.conf:

Debug *
DebugOpt process spam fp

3. Restart the DSPAM daemon:

service dspam restart

4. Wait a while for log entries to populate the log file.
5. Search through dspam.debug for 'adding user to merged group'

grep -i merged /var/log/dspam/dspam.debug

If any results are returned, your merged group is working (at least until it crashes the DSPAM daemon!).