From LedHed's Wiki
'''CREDITS'''
  
Original Work By:
*Lead development: Jonathan A. Zdziarski <[email protected]>
*Postgres driver: Rustam Aliyev <[email protected]>

Various:
*Feb/2006 Cove Schneider <[email protected]>
*Jan/2006 Norman Maurer <[email protected]>

Your name is missing? Let us know with a reference to your commit, and we'll add you to the list.

'''COPYRIGHT'''

Original work was done by Jonathan A. Zdziarski.

In 2006 the copyright was handed over to Sensory Networks.

In 2009 Sensory Networks handed over the full copyright to the DSPAM Project.
As of 12 January 2009 the copyright is owned by the DSPAM Project, represented by a team of people, including:
* Alexander Prinsier
* Ion-Mihai Tetcu
* Paul Cockings
* Dov Zamir
* Stevan Bajic

<br>
== OVERVIEW ==
----
 
DSPAM is an open-source, freely available anti-spam solution designed to combat unsolicited commercial email using advanced statistical analysis. In short, DSPAM filters spam by learning what spam is and isn't, based on each user's individual mail behavior. This allows DSPAM to provide highly accurate, personalized filtering for each user, even on a large system, and makes for an administratively maintenance-free solution with very few false positives.
  
  
''PLEASE NOTE:''<br>
DSPAM and libdspam are distributed under the GPL license, not the LGPL. Commercial licensing is available for those who seek to redistribute DSPAM or some of DSPAM's components/libraries in their non-GPL products. Please contact us for more information about commercial licensing.
 
  
 
The DSPAM package is split up into the following pieces:
 
 
Some basic tools which have been provided to manage dictionaries, automate corpus feeding, and perform other diagnostic operations related to DSPAM. Some of these include dspam_train, dspam_stats, and dspam_dump.
 
<br>
  
 
== IMPLEMENTATION OPTIONS ==
 
  [MTA] ---> [LDA] ---> (User's Mailbox)

AFTER:
 
                       \--> [Quarantine]
         [End User] ------> [Web UI]

<br>
  
 
=== As a POP3 Proxy ===
 
                   \
                   \--> [POP3 Server]

<br>
  
 
=== As an SMTP Relay ===
 
             [End User] ------> [Web UI]

<br>

== INSTALLATION ==
----
<br>
 
=== UPGRADING DSPAM ===
 
Follow the steps sequentially from the base version you are running up to the top.

<br>
==== Upgrading From 3.8 ====

1. Ensure MySQL is using the new database schema. Execute the following statements to upgrade a pre-3.9.0 DSPAM MySQL schema to the 3.9.0 schema:
  ALTER TABLE `dspam_signature_data`
    CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
    CHANGE `data` `data` LONGBLOB NOT NULL,
    CHANGE `length` `length` INT UNSIGNED NOT NULL;
  ALTER TABLE `dspam_stats`
    CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
    CHANGE `spam_learned` `spam_learned` BIGINT UNSIGNED NOT NULL,
    CHANGE `innocent_learned` `innocent_learned` BIGINT UNSIGNED NOT NULL,
    CHANGE `spam_misclassified` `spam_misclassified` BIGINT UNSIGNED NOT NULL,
    CHANGE `innocent_misclassified` `innocent_misclassified` BIGINT UNSIGNED NOT NULL,
    CHANGE `spam_corpusfed` `spam_corpusfed` BIGINT UNSIGNED NOT NULL,
    CHANGE `innocent_corpusfed` `innocent_corpusfed` BIGINT UNSIGNED NOT NULL,
    CHANGE `spam_classified` `spam_classified` BIGINT UNSIGNED NOT NULL,
    CHANGE `innocent_classified` `innocent_classified` BIGINT UNSIGNED NOT NULL;
  ALTER TABLE `dspam_token_data`
    CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
    CHANGE `spam_hits` `spam_hits` BIGINT UNSIGNED NOT NULL,
    CHANGE `innocent_hits` `innocent_hits` BIGINT UNSIGNED NOT NULL;

If you are using the preference extension with DSPAM, execute the following statement to upgrade the pre-3.9.0 DSPAM preference MySQL schema to the 3.9.0 schema:
  ALTER TABLE `dspam_preferences`
    CHANGE `uid` `uid` INT UNSIGNED NOT NULL;

If you are using virtual users (with AUTO_INCREMENT) in DSPAM, execute the following statement to upgrade the pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:
  ALTER TABLE `dspam_virtual_uids`
    CHANGE `uid` `uid` INT UNSIGNED NOT NULL AUTO_INCREMENT;

If you are using virtual user aliases (aka: DSPAM in relay mode) in DSPAM, execute the following statement to upgrade the pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:
  ALTER TABLE `dspam_virtual_uids`
    CHANGE `uid` `uid` INT UNSIGNED NOT NULL;

If you need to speed up the MySQL purging script and can afford to use more disk space for the DSPAM MySQL data, consider executing the following statement to add three additional indices:
  ALTER TABLE `dspam_token_data`
    ADD INDEX(`spam_hits`),
    ADD INDEX(`innocent_hits`),
    ADD INDEX(`last_hit`);

2. Ensure PostgreSQL is using the new database schema. Execute the following statements to upgrade a pre-3.9.0 DSPAM PostgreSQL schema to the 3.9.0 schema:
  ALTER TABLE dspam_preferences ALTER COLUMN uid TYPE integer;
  ALTER TABLE dspam_signature_data ALTER COLUMN uid TYPE integer;
  ALTER TABLE dspam_stats ALTER COLUMN uid TYPE integer;
  ALTER TABLE dspam_token_data ALTER COLUMN uid TYPE integer;
  DROP INDEX IF EXISTS id_token_data_sumhits;

If you are using virtual users in DSPAM, execute the following statement to upgrade the pre-3.9.0 DSPAM virtual uids to the 3.9.0 schema:
  ALTER TABLE dspam_virtual_uids ALTER COLUMN uid TYPE integer;

<br>
  
 
==== Upgrading From 3.6 ====
 
5. Add the "ProcessorURLContext" setting to dspam.conf. ProcessorURLContext has been added to toggle whether URL-specific tokens are created in the tokenizer process. The "on" value matches the default behavior of previous versions of DSPAM.

<br>
  
 
==== Upgrading From 3.4 ====
 
''NOTE:''<br>
Berkeley DB drivers (libdb3_drv, libdb4_drv) are deprecated and have been removed from the build. You will need to select an alternative storage driver in order to upgrade.

<br>

=== FRESH INSTALLATION ===
----
<br>
  
 
'''PREREQUISITES'''
 
You can download MySQL from http://www.mysql.com.

You can download PostgreSQL from http://www.postgresql.org.

You can download SQLite from http://www.sqlite.org.

<br>
 
==== CONFIGURATION ====
 
DSPAM supports the configuration options below. Generally, the default configuration is more than acceptable, so it's a good idea not to tweak too many settings unless you know what you are doing.

<br>
 
===== PATH SWITCHES =====
 
Specify an alternative log directory. The default is $dspam_home/log. Do not set this to /var/log unless DSPAM will have permissions to write to the directory.

<br>
 
===== FILESYSTEM SCALE =====
 
 
Switch for domain-scale implementation. When used, DSPAM expects username@domain to be passed in as the user id, and user data will be stored as $HOME/data/domain.com/user and $HOME/opt-in/domain/user.dspam instead of $HOME/data/user.
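
The path mapping described above can be sketched as a small shell function. This is only an illustration of the layout, not DSPAM's own code: the function name domain_scale_path, the example address, and the fallback for ids without a domain part are assumptions made for the example.

```shell
# Hypothetical helper (not part of DSPAM): prints the data path that the
# domain-scale layout described above would use for a given user id.
# $1: DSPAM home directory, $2: user id in username@domain form
domain_scale_path() {
    dspam_home=$1
    user_id=$2
    case $user_id in
        *@*)
            user=${user_id%@*}      # text before the last '@'
            domain=${user_id##*@}   # text after the last '@'
            printf '%s/data/%s/%s\n' "$dspam_home" "$domain" "$user"
            ;;
        *)
            # Assumption: ids without a domain fall back to the default layout.
            printf '%s/data/%s\n' "$dspam_home" "$user_id"
            ;;
    esac
}

domain_scale_path /usr/local/var/dspam bob@example.org
```

With the example arguments, the function prints /usr/local/var/dspam/data/example.org/bob.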
 
<br>
 
===== INTEGRATION SWITCHES =====
 
  --with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
Specify your storage driver selection(s). A storage driver is a driver written specifically for DSPAM to store tokens and signature data and to perform other proprietary operations. The default driver is hash_drv. The following drivers have been provided:
  
 
  mysql_drv:  MySQL Drivers  
 
''NOTE:''<br>
This function is incompatible with most implementations of the Web UI, since it requires access to read each user's home directory. Therefore, only use this option if you will not be using the Web UI or plan on doing something asinine like running it as root.
 
Builds DSPAM with support for daemon mode, and builds associated dspamc thin client. Pthreads is required to build for daemon mode and the storage driver used must be thread-safe.
 
<br>
  
 
===== DRIVER SPECIFIC CONFIGURE SWITCHES =====
 
''NOTE:''<br>
Please see the file doc/mysql_drv.txt for more information about configuring the mysql_drv storage driver.
  
  
''NOTE:''<br>
Please see the file doc/pgsql_drv.txt for more information about configuring the pgsql_drv storage driver.
 
Specify a path to the SQLite libraries
 
<br>
 
===== DEBUGGING SWITCHES =====
 
  --enable-debug

''NOTE:''<br>
 
When verbose debug is compiled in, DSPAM performs many additional mathematical calculations regardless of whether or not it's been activated. You shouldn't use --enable-verbose for production builds unless you have serious issues you can't resolve.
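
As a rough illustration of how the configure switches combine, the sketch below prints (rather than runs) a configure command line for a development build. The helper name dspam_configure_cmd is hypothetical; the only switches used are --with-storage-driver and --enable-debug, and the driver names hash_drv and mysql_drv come from the driver list in this document.

```shell
# Hypothetical helper: prints (does not run) a configure command line.
# $1: comma-separated storage drivers, for example "hash_drv,mysql_drv"
dspam_configure_cmd() {
    printf './configure --with-storage-driver=%s --enable-debug\n' "$1"
}

dspam_configure_cmd hash_drv,mysql_drv
```

Printing the command first makes it easy to review the selected switches before running it from the source directory.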
 
<br>
 
===== FEATURE ACTIVATION =====
 
Enables support for Clam Antivirus. DSPAM can interface directly with clamd to perform virus scanning and can be configured to react in different ways to viruses. See dspam.conf for more information.

<br>
 
===== ADDITIONAL CONFIGURATION OPTIONS =====
 
The remainder of configuration options are located in dspam.conf, which is installed in sysconfdir (default: /usr/local/etc) upon a make install. It is generally a good idea to review dspam.conf and make any changes necessary prior to using DSPAM.

<br>
  
 
==== BUILDING AND INSTALLING ====
 
''NOTE:''<br>
If you are a developer wanting to link to the core engine of dspam, libdspam will be built during this process.  Please see the example.c file for examples of how to link to and use libdspam. Static and dynamic libraries are built in the .libs directory. Needed headers will be installed in $prefix$/include/dspam.

<br>
  
 
==== PERMISSIONS ====
 
The CGI User: This is the user your web server (most likely Apache) is running as. This is commonly 'nobody' or 'web'. You can find this in Apache's httpd.conf by searching for 'User'. The CGI user will need the ability to access the following components of DSPAM:
* Ability to execute the dspam binary
* Ability to read and write to dspam_home/data/
* Trusted user permissions in dspam.conf ("Trust [username]")
* The execution 'Group' used must match the group dspam is running as (this is typically 'mail', 'dspam', or similar).

The MTA User: This is the user your mail server software is running as when it executes DSPAM. This is usually daemon, mail, exim, etc. This is typically different from the user the MTA runs and polices itself as, to avoid security problems. Consult your MTA's documentation for more info. The MTA user will require:
* The ability to execute the dspam binary
* Trusted user permissions in dspam.conf ("Trust [username]")

Systems Administrators: In order to perform administrative functions, systems administrators will require:
* The ability to execute dspam-related binaries
* Trusted user permissions in dspam.conf ("Trust [username]")
  
  
''NOTE:''<br>
If the MTA is communicating with DSPAM via LMTP (explained later), then execution permissions are not necessary.

''NOTE about FreeBSD:''<br>
FreeBSD's default MTA user is 'mailnull'. FreeBSD's default delivery agent also changes its uid, so in order to call it, dspam must be installed setuid root to work properly on the command line. This is done automatically on install.
 
A list of trusted users is maintained in dspam.conf. This file should include a list of trusted users who should be allowed to set the dspam user, passthru parameters, and other information that would be potentially dangerous for a malicious user to be able to set.  You'll need to ensure that your CGI user, MTA user, and system administrators are on the list.
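
One way to double-check the trusted user list is a small shell sketch like the one below. It assumes that each trusted user appears on its own line in the "Trust [username]" form quoted above; the function names trusted_users and all_trusted are hypothetical and not part of DSPAM.

```shell
# Print the user named on each "Trust" line of a dspam.conf-style file.
# $1: path to the configuration file
trusted_users() {
    awk '$1 == "Trust" { print $2 }' "$1"
}

# Succeed only if every user listed after the file name is trusted.
# $1: path to the configuration file, remaining arguments: user names
all_trusted() {
    conf=$1
    shift
    for user in "$@"; do
        if ! trusted_users "$conf" | grep -qxF -- "$user"; then
            echo "not trusted: $user" >&2
            return 1
        fi
    done
}
```

For example, running all_trusted with the path to dspam.conf followed by your CGI user, MTA user, and administrator account names returns a non-zero status and names the first account that is missing from the list.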
 
<br>
  
 
==== MAIL SERVER INTEGRATION ====
 
As previously mentioned, there are three popular ways to implement DSPAM:

  /bin/mail -d bob

<br>
 
===== ALIASES =====
 
There are essentially two different ways a user might train DSPAM. The first is by using the Web UI, which allows them to retrain via the "History" tab. This works quite well, as users must visit the Web UI occasionally to review their quarantine anyway (and reverse any false positives). We'll discuss this shortly in section 1.1.8.

''NOTE:''<br>
If you are using an IMAP based system, Web-based email, or other form of email management where the original messages are stored on the server in pristine format, you can turn this signature feature off by setting "TrainPristine on" in dspam.conf. DSPAM will then use the message itself that you provide it to train, which MUST be identical to the original message in order to retrain properly.
  
  
''' The Simple Way '''

If you are using the MySQL or PgSQL storage drivers, the original numeric user id can be embedded in the signature, requiring only one central spam alias to be necessary for the entire system. To configure this, uncomment the appropriate UIDInSignature option in dspam.conf:

''NOTE:''<br>
The 'root' user represents any active dspam user. It is necessary to supply a username on the commandline or DSPAM will bail on an error, however the user will be changed internally once the signature is read.

''' The Kind-of-Simple Way '''
  
 
If you're not using one of the above storage drivers, the next easiest way to configure aliases is to have DSPAM parse the 'To:' header of the message and use a catch-all subdomain to direct all mail into DSPAM for retraining. You can then instruct your users to email addresses like '[email protected]'. The ParseToHeaders option (available in dspam.conf) will parse the To: header of forwarded messages and set the username to either 'bob' or '[email protected]', depending on how it is configured. DSPAM can also set the training mode to either "learn spam" or "learn notspam" depending on whether the user specified a spam- or notspam- address in the To: header.
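
The address convention described above can be illustrated with a short shell sketch. It only mimics the idea of splitting a retraining address into a training mode and a short username; it is not DSPAM's ParseToHeaders implementation, and the function name parse_retrain_address and the example.org addresses are assumptions made for the example.

```shell
# Hypothetical helper: split a retraining address into "<mode> <username>".
# $1: recipient address, for example spam-bob@relay.example.org
parse_retrain_address() {
    local_part=${1%@*}    # drop the domain part
    case $local_part in
        notspam-*) printf 'notspam %s\n' "${local_part#notspam-}" ;;
        spam-*)    printf 'spam %s\n' "${local_part#spam-}" ;;
        *)         return 1 ;;   # not a retraining address
    esac
}

parse_retrain_address spam-bob@relay.example.org
parse_retrain_address notspam-bob@relay.example.org
```

The two calls print "spam bob" and "notspam bob". The sketch always produces the short username; the full-address form mentioned above would keep the domain part instead.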
 
''' The Old Way (A.K.A. The Hard Way) '''

If neither of the easy ways is possible, you're stuck with doing it the hard way. This means you'll need a separate spam alias (and notspam alias, if users are tagging mail) for each user. To do this, you will need to create an email address for each user, so that DSPAM can analyze and learn for that specific user.  For example:

''NOTE about Security:''

You might be wondering if a user can forward a spam to another user's address, or whether a spammer can forward a spam to another user's notspam address. The answer is "no". The key to all mail-based retraining is the signature embedded in each email. The signature is stored with each user's own user id, and so not only does the incoming message have to bear a valid signature, but it also has to be stored on the system with the correct user id. This prevents any kind of alias abuse.

<br>
  
 
==== NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS ====
 
<br>

===== Non-SQL Based Nightly Purge =====

If you are NOT running a SQL-based solution, then you should configure dspam_clean to run under cron nightly. This clean tool will read all signature databases and purge signatures that are older than 14 days (configurable), purge abandoned tokens, and remove unimportant tokens. Without this tool, old signatures will continue to pile up. Be sure the user running cleanup has full read/write permissions on the DSPAM data files.

  0 0 * * * /usr/local/bin/dspam_clean [options]

''See the dspam_clean description for more information''

<br>
 
===== SQL-Based Nightly Purge =====
 
SQL-Based solutions include a nightly SQL script to perform the same basic tasks as dspam_clean, and it does it much faster and with more finesse.

  0 0 * * * mysql --user=[user] --pass=[pass] [db] < /path/to/purge-4.1.sql

<br>
 
===== Log Rotation =====
 
The system log and user logs can fill up fairly quickly, when all that's really needed to generate graphs are the last two to three weeks of data. You can configure a nightly log cleanup using dspam_logrotate:

  0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data

<br>
  
 
==== NOTIFICATIONS ====
 
DSPAM is capable of sending three different notifications to users:
* A "First Run" message sent to each user when they receive their first message through DSPAM.
* A "First Spam" message sent to each user when they receive their first spam.
* A "Quarantine Full" message sent to each user when their quarantine box is > 2MB in size.
 
These notifications can be activated by copying the txt/ directory from the distribution into DSPAM's home (by default /usr/local/var/dspam).  You will want to modify these templates prior to installing them to reflect the correct email addresses and URLs (look for 'configureme' and 'yourdomain').
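
Before copying the templates, a quick search can show which files still contain the placeholder strings. The sketch below assumes a grep that supports recursive search (-r), as GNU and BSD grep do; the function name unconfigured_templates is hypothetical.

```shell
# Hypothetical helper: list template files that still contain the
# 'configureme' or 'yourdomain' placeholders.
# $1: directory holding the notification templates (for example txt/)
unconfigured_templates() {
    grep -rlE 'configureme|yourdomain' "$1"
}
```

Calling the function on the txt/ directory prints one file name per line; when it prints nothing, no placeholders were found.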
 
''NOTE:''<br>
The quarantine warning is reset when the user clicks 'Delete All', but is not reset if they use "Delete Selected".  If the user doesn't wish to receive reminders, they should use the "Delete Selected" function instead of "Delete All".

You'll need to also set "Notifications" to "on" in dspam.conf.

<br>
 
==== THE WEB UI ====
 
The Web UI (CGI client) can be run from any executable location on a web server, and detects its user's identity from the REMOTE_USER environment variable. This means you'll need to use HTTP password authentication to access the CGI (any type of authentication will work, so long as Apache supports the module). This is also convenient in that you can set up authentication using almost any existing system you have. The only catch is that you'll need the usernames to match the actual DSPAM usernames used on the system. A copy of the shadow password file will suffice for most common installs.
  
  
''NOTE:''<br>
Some authentication mechanisms are case insensitive and will authenticate the user regardless of the case they type it in.  DSPAM, on the other hand, is case sensitive and the case of the username used will need to match the case on the system.  If you suffer from this authentication problem, and are certain all of your users' usernames are in lowercase, you can add the following line of code to the CGI right after the call to &ReadParse...
  
  
''NOTE:''<br>
Apache users do NOT take on the identity of the groups specified in /etc/group so you will need to specifically assign the group in httpd.conf.
  
  
''NOTE about Procmail:''<br>
Because the DSPAM Web UI is a CGI script, DSPAM will not retain its setuid privileges when called. If you are running procmail, this will become a problem as procmail requires root privileges to deliver. The easiest hack around this is to create a procmail.dspam binary and make it setuid root, then make it executable only by the mail group (or whatever group DSPAM and the CGI run in).

The DSPAM Web UI has a minimal configuration inside the configure.pl script. You'll want to check and make sure all of the settings are correct. In most cases, the only settings that will be necessary to change are the large-scale or domain-scale flags.
  
  
Line 664: Line 724:
  
 
The following Perl modules (http://www.perl.com/CPAN/modules/by-module/GD/):
* GD
* GD-Graph3d
* GDGraph
* GDTextUtil
* CGI

Typically this can be accomplished on the commandline:
Line 683: Line 743:
 
'''Opt-In/Out'''

If you would like your users to be able to opt in/out of DSPAM filtering, add the correct option to the nav_preferences.html template, depending on your configuration (for example, if you have an opt-in system, you'll want to add the opt-in option).

''NOTE:''<br>
This currently only works with the preferences extension, and not drop files.

  <INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
  Opt into DSPAM filtering
Line 690: Line 754:
 
  Opt out of DSPAM filtering
  
 
<br>

=== TESTING ===
-----
If you've installed from an RPM, there's a good chance that the packager went to the trouble of testing already. If you're building from sources, however, you'll need to find a way to ensure your configuration isn't broken.
  
Line 702: Line 766:
 
Before running the test, you should have completed section 1.1's instructions for compiling and installing dspam as well as configured your mail server to support dspam.

<br>

==== 1. Create a new user account on your system ====

It is important that this be a new account to prevent any unrelated email from being delivered during testing.  Be sure to configure a spam alias for the test account.

<br>
==== 2. Send a short email ====

Send a short email (10 words or fewer) to the account, and pick it up using your favorite mail client.

<br>
==== 3. Run dspam_stats ====
  dspam_stats [username]
Line 721: Line 784:
 
If you receive an error such as "unable to open /usr/local/var/dspam... for reading", then the dspam agent is not configured correctly. The problem could lie in either your mail server configuration or the permissions on the directory or agent.  Check your configuration and permissions, and repeat this step until you get the correct results.

<br>
==== 4. Run dspam_dump ====
  dspam_dump [username]
Line 738: Line 801:
 
  7717766825815048192  S: 00265  I: 00068  P: 0.7358

<br>
==== 5. Forward the test message ====
Forward the test message to the spam alias you've created for the test account. Allow enough time for the message to be processed.

<br>
==== 6. Run dspam_stats again ====
  dspam_stats [username]
Line 749: Line 812:
 
If this is not the case, check the group permissions of the dspam agent as well as the permissions your MTA uses when piping to aliases.

<br>
==== 7. Run dspam_dump [username] again ====
  dspam_dump [username]
Line 759: Line 822:
 
 
If you have some tokens that do not have an S: of 1 or an I: of 0, the dspam signature was not found on the email, and this could be due to a lot of things.
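
The check described in this step can be scripted. What follows is only a minimal sketch, assuming each line of dspam_dump output has the token / S: / I: / P: layout shown in step 4. The function name check_dump is hypothetical and is not part of DSPAM.

```shell
# check_dump (hypothetical helper, not part of DSPAM):
# reads dspam_dump-style lines on stdin, for example
#   7717766825815048192  S: 00265  I: 00068  P: 0.7358
# and prints every token whose S: count is not 1 or whose I: count is not 0.
# Exit status is 0 when every token passes, 1 otherwise.
check_dump() {
  awk '
    $2 == "S:" && $4 == "I:" {
      if ($3 + 0 != 1 || $5 + 0 != 0) { print $1; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

It would typically be fed from the real tool, for example dspam_dump [username] | check_dump, and any token it prints is one that was not retrained as expected.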
  
 
<br>

=== TROUBLESHOOTING ===
-----
''Problem:''
No files are being created in the user directory
Line 769: Line 832:
 
Check the permissions of the user directory.  It must be writable by both the user the dspam agent is running as and the CGI user.

<br>
''Problem:''
False positives are never being delivered
Line 777: Line 840:
 
Your CGI most likely doesn't have the privileges required by the LDA to deliver the messages.  Make sure the CGI user is in the correct group. Also consider setting the dspam agent to setuid or setgid with the correct permissions.

<br>
''Problem:''
My database is getting huge!
  
 
''Solution:''
DSPAM's default training mode is TEFT. On top of this, the purging defaults are very lax. You might consider switching to TOE (Train-on-Error) mode training if you require a minimal database. If you are willing to sacrifice accuracy for disk space, disabling the 'chain' tokenizer in dspam.conf will prevent the use of multi-word (chained) tokens, which will also cut your database size considerably. You may also consider more frequent calls to dspam_clean -p to purge neutral data, which makes up the majority of most databases.

For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.
  
<br>

=== DSPAM TOOLS ===
-----
A few useful tools have been provided to make DSPAM management a bit easier. These tools include:
  dspam_admin
Line 798: Line 864:
 
Syntax: dspam_train [username] [spam_dir] [nonspam_dir], where username is the user to apply the training to, and the two dirs are directories containing messages in individual files (e.g. maildir/corpus format). dspam_train can be used on an existing user's database to further improve accuracy, or to train from scratch. It also provides a solid test jig for testing the efficiency and accuracy of a test corpus against the filter.

''NOTE:''<br>
dspam_train will automatically balance training of the corpus to ensure both spam and nonspam are trained based on the ratio of spam/nonspam. This means if you have twice as much spam as nonspam, two spam will be trained for every nonspam.
 
      
 
      
Line 811: Line 876:
 
  dspam_clean
Performs nightly housecleaning by deleting old or useless data from user data.  dspam_clean performs the following operations:

1. Using the -s flag, dspam_clean will continue to perform stale signature purging.  If an age is specified, for example -s14, the age defined as the default will be overridden. Specifying an age of 0 will delete all signatures for the users processed.
 
                                                                                  
 
                                                                                  
Line 820: Line 886:
 
  - Tokens which have only one spam hit
  - Tokens which have only one innocent hit

Ages may be overridden by specifying a format such as -u30,15,10,10 where each number represents the respective age.  Specifying an age of zero will delete all unused tokens in the category. Defaults are set in dspam.conf.
 
                                                                                  
 
                                                                                  
Line 843: Line 909:
 
  dspam_clean -s -p -u

''NOTE:''<br>
You may wish to only run certain cleaning modes depending on the type of storage driver you are using.  For example, the MySQL storage driver includes a script which performs signature and unused token operations, leaving only probability operations as useful.  If you are using a SQL-based storage driver, it is strongly recommended that you use the maintenance scripts wherever possible for optimum efficiency.
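
For a nightly job, the flags described above can be assembled by a small wrapper. This is a sketch only: clean_args is a hypothetical helper name, and the ages passed in are simply the example values used in the text (-s14 and -u30,15,10,10), not recommended settings.

```shell
# clean_args (hypothetical helper, not part of DSPAM):
# builds a dspam_clean argument string from optional age overrides.
#   $1 = signature age for -s (empty keeps the configured default)
#   $2 = comma-separated token ages for -u (empty keeps the configured defaults)
clean_args() {
  sig_age=$1
  token_ages=$2
  printf '%s\n' "-s${sig_age} -p -u${token_ages}"
}
```

A cron script could then call dspam_clean $(clean_args 14 30,15,10,10), or dspam_clean $(clean_args) to fall back on the defaults from dspam.conf.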
  
  dspam_stats
Displays the spam statistics for one or all users on the system.

Syntax: dspam_stats [username]
Line 854: Line 919:
  
  
  dspam_genaliases
Reads the /etc/passwd file and outputs a dspam aliases table which can be included in the master aliases table.  You may try Art Sackett's generate_dspam_aliases tool at http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need some better functionality.  This will eventually be merged in as a replacement for the existing tool.

  dspam_merge
Merges multiple users' dictionaries together into one user's dictionary (does not affect the merged users).  This can be used to create a seeded dictionary for a new user, or to copy a single user's dictionary to a new file.  This is great for building global dictionaries, but crunches a lot of time and disk.
  
 
<br>

=== AGENT COMMANDLINE ARGUMENTS ===
-----
<br>
==== Specifying a User ====
The DSPAM agent (dspam) recognizes the following commandline arguments:
  --user [user1 user2 ... userN]
Specifies the destination user(s) of the incoming message.  DSPAM then processes the message once for each user individually.  If the message is to be delivered, the $u (or %u) parameters of the arguments string will be interpolated for the current user being processed.

<br>
 
 
 
==== Classification ====
  --class=[spam|innocent]
Tells DSPAM that the message being presented has already been classified by the user.  This flag should be used when a misclassification has occurred, when the user is corpus-feeding a message, or when an inoculation is being presented.  This flag must be used in conjunction with the --source flag. Providing no classification invokes DSPAM's standard operating procedure (SOP), which is to determine the message's nature on its own.

<br>
 
 
==== Source ====
  --source=[error|corpus|inoculation]
Wherever --class is used, the source of the user-provided classification must also be provided.  The source is very important and dramatically affects DSPAM's training behavior:

'''error:'''<br>
The message being presented was a message previously misclassified by DSPAM.  When 'error' is provided as a source, DSPAM requires that the DSPAM signature be present in the message, and will use the signature to recall the original training metadata.  If the signature is not present, the message will be rejected.  In this source mode, DSPAM will also decrement each token's previous classification's count as well as the user totals.
<br>
''You should use error only when DSPAM has made an error in classifying the message, and should present the modified version of the message with the DSPAM signature when doing so.''

'''corpus:'''<br>
The message being presented is from a mail corpus, and should be trained as a new message, rather than re-trained based on a signature.  The message's full headers and body will be analyzed and the correct classification will be incremented, without its opposite being decremented.
<br>
''You should use corpus only when feeding messages in from a corpus, not for correcting errors.''<br>

'''inoculation:'''<br>
The message being presented is in pristine form, and should be trained as an inoculation.  Inoculations are a more intense mode of training designed to cause DSPAM to train the user's metadata repeatedly on previously unknown tokens, in an attempt to vaccinate the user from future messages similar to the one being presented.
<br>
''You should use inoculation only on honeypots and the like.''
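
Because --class must always be paired with --source, a front-end script may want to validate both before calling the agent. The following is a minimal sketch under that assumption. The name retrain_args is hypothetical, and it only assembles the argument string from the values listed above; it does not call dspam itself.

```shell
# retrain_args (hypothetical helper, not part of DSPAM):
# prints the agent arguments for a user-supplied classification and
# refuses values outside the sets documented for --class and --source.
#   $1 = user, $2 = class (spam|innocent), $3 = source (error|corpus|inoculation)
retrain_args() {
  case "$2" in
    spam|innocent) ;;
    *) echo "invalid class: $2" >&2; return 1 ;;
  esac
  case "$3" in
    error|corpus|inoculation) ;;
    *) echo "invalid or missing source: $3" >&2; return 1 ;;
  esac
  printf '%s\n' "--user $1 --class=$2 --source=$3"
}
```

For example, retrain_args alice spam error prints the arguments for reporting a missed spam, where alice is a placeholder username; the message passed to the agent would still need to carry its DSPAM signature.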
 +
 
 
<br>
 
<br>
  
Line 904: Line 969:
 
  --deliver=[innocent,spam]
Tells DSPAM to deliver the message if its result falls within the criteria specified.  For example, --deliver=innocent will cause DSPAM to only deliver the message if it classifies as innocent.  Providing --deliver=innocent,spam will cause DSPAM to deliver the message regardless of its classification.  This flag provides a significant amount of flexibility for nonstandard implementations, such as those where false positives may not be delivered but spam is.
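
The delivery rule amounts to a membership test on a comma-separated list. The sketch below models that rule for illustration only. is_deliverable is a hypothetical name, and this is not how the agent implements it internally.

```shell
# is_deliverable (hypothetical helper, not part of DSPAM):
# succeeds when the classification result appears in the --deliver list.
#   $1 = deliver list, e.g. "innocent" or "innocent,spam"
#   $2 = classification result, "innocent" or "spam"
is_deliverable() {
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

With a list of innocent, only innocent results pass; with innocent,spam, both do.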
<br>

  --stdout
If the message is deemed "deliverable" by the --deliver flag, this flag will cause DSPAM to deliver the message to stdout, rather than the configured delivery agent.

  --process
Tells DSPAM to process the message.  This is the default behavior, and the flag is implied unless --classify is used, but specifying it explicitly is a good idea to avoid ambiguity.

  --classify
Tells DSPAM only to classify the message, and not make any writes to the user's metadata or attempt to deliver/quarantine the message.

''NOTE:''<br>
The output of the classification is specific to the user, not including the output of any groups they might be affiliated with, so it is entirely possible that the message would be caught as spam by the group, even if it didn't appear in the classification.  If you want to get the classification for the GROUP, use the group name as the user instead of an individual.

<br>
 
 
 
==== Signatures ====
  --signature=[signature]
For some implementations, the admin may wish to pass the signature in via commandline instead of allowing DSPAM to find it on its own. This is especially useful when front-ending the agent with other tools. Using this option will set the active signature and will also forgo reading of stdin.

<br>
 
 
 
==== Training Modes ====
  --mode=[toe|tum|teft|notrain|unlearn]
Configures the training mode to be used for this process:

<br>
===== TEFT =====
Train-Everything.  Trains on all messages processed.  This is a very thorough training approach and should be considered the standard training approach for most users.  TEFT may, however, prove too volatile on installations with extremely high per-user traffic, or prove not very scalable on systems with extremely large user-bases.  In the event that TEFT is proving ineffective, one of the other modes is recommended.
Line 942: Line 1,005:
 
Until a user reaches 100 innocent messages in their metadata, train-on-error will also be teft-based, even if otherwise specified on the commandline.

<br>
===== TOE =====
Train-on-Error.  Trains only on a classification error, once the user's metadata has matured to 2500 innocent messages.  This training mode is much less resource intensive, as only occasional metadata writes are necessary.  It is also far less volatile than the TEFT mode of training.  One drawback, however, is that TOE only learns when DSPAM has made a mistake, which means the data is sometimes too static, and unable to "ease into" a different type of behavior.

<br>
===== TUM =====
Train-until-Mature.  This training mode is a hybrid between the other two training modes and provides a great balance between volatility and static metadata.  TuM trains, on a per-token basis, only those tokens which have had fewer than 50 "hits" on them, unless an error is being retrained, in which case all tokens are trained.  This training mode provides a solid core of stable tokens to keep accuracy consistent, but also allows for dynamic adaptation to any new types of email behavior a user might be experiencing. It is a balance of resources as well, as only less-than-mature tokens are written to the database. NOTE: You should corpus train before using tum.

<br>
===== NOTRAIN =====
No training.  Do not train the user's data, and do not keep totals. This should only be used in cases where you want to process mail for a particular user (based on a group, for example), but don't want the user to accumulate any learning data.

<br>
===== UNLEARN =====
Unlearn original training. Use this if you wish to unlearn a previously learned message. Be sure to specify --source=error and set --class to whatever classification the message was originally learned under. If not using TrainPristine, this will require the original signature from training.
 
  
 
'''RECOMMENDATIONS'''

In general, it is recommended that users begin with TEFT.  If a user's spam ratio is between 75% and 85%, they may benefit from Train-until-Mature mode.  If a user is experiencing over 90% spam, then Train-on-Error mode should make a noticeable improvement in accuracy. It eventually boils down to what works best for your users.  There is no reason a system could not be configured (with a script) to analyze a user's *.stats file and determine the best training mode for that user.
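
Such a script might look roughly like the sketch below. It assumes the spam and innocent totals have already been extracted from the user's statistics (the *.stats file format is not covered here), and recommend_mode is a hypothetical name. Ratios the text does not mention, including the gap between 85% and 90%, fall back to TEFT as an arbitrary choice.

```shell
# recommend_mode (hypothetical helper, not part of DSPAM):
# maps a user's message totals to one of the --mode values.
#   $1 = number of spam messages, $2 = number of innocent messages
# Percentages use integer division, so 90.9% counts as 90.
recommend_mode() {
  spam=$1
  innocent=$2
  total=$((spam + innocent))
  if [ "$total" -eq 0 ]; then
    echo teft            # no history yet: start with TEFT
    return 0
  fi
  pct=$((spam * 100 / total))
  if [ "$pct" -gt 90 ]; then
    echo toe             # over 90% spam
  elif [ "$pct" -ge 75 ] && [ "$pct" -le 85 ]; then
    echo tum             # 75-85% spam
  else
    echo teft            # everything else (assumption)
  fi
}
```

The result could be passed straight to the agent, for example as --mode=$(recommend_mode 800 200).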
<br>
 
 
  
 
==== Features ====
 
==== Features ====
  --feature=[noise,whitelist,tb=N]
+
  --feature=[no,wh,tb=N]
 
Specifies the features that should be activated for this filter instance. The following features may be used individually or combined using a comma as a delimiter:
 
Specifies the features that should be activated for this filter instance. The following features may be used individually or combined using a comma as a delimiter:
<br>
 
  
''no:''

Bayesian Noise Reduction (BNR). BNR kicks in at 2500 innocent messages and applies progressive noise-reduction logic to reduce Bayesian noise (wordlist attacks) in spam. See http://bnr.nuclearelephant.com for more information. BNR is not for everyone, so users should try it out after they've trained to see if it helps improve accuracy.

''tb=N:''

Sets the training loop buffering level. Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop. The training buffer sets the buffer sensitivity, and should be a number from 0 (no buffering whatsoever) to 10 (heavy buffering). The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0. This feature should be disabled if you're not paranoid about false positives, as it does increase the number of spam misses significantly during training.

''wh:''

Automatic whitelisting. DSPAM will keep track of the entire "From:" line for each message received per user, and automatically whitelist messages from senders with more than 10 innocent messages and zero spams. Once the user reports a spam from the sender, automatic whitelisting is deactivated for that sender. Since DSPAM uses the entire "From:" line, and not just the sender's email address, automatic whitelisting is a very safe approach to improving accuracy during initial training.
<br>
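The automatic whitelisting rule described above can be modeled in a few lines. This is a toy sketch, not DSPAM's implementation: the class and method names are hypothetical, and only the rule stated in the text (more than 10 innocent messages, zero spams, keyed on the entire "From:" line) is taken from the source.

```python
WHITELIST_THRESHOLD = 10  # "more than 10 innocent messages", per the text


class AutoWhitelist:
    """Toy per-user model of automatic whitelisting (hypothetical names)."""

    def __init__(self, threshold=WHITELIST_THRESHOLD):
        self.threshold = threshold
        self.innocent = {}  # full From: line -> innocent message count
        self.spam = {}      # full From: line -> spam message count

    def record(self, from_line, is_spam):
        """Count one message for the exact From: line it arrived with."""
        counts = self.spam if is_spam else self.innocent
        counts[from_line] = counts.get(from_line, 0) + 1

    def is_whitelisted(self, from_line):
        """Whitelisted only with enough innocent mail and no spam at all."""
        return (self.innocent.get(from_line, 0) > self.threshold
                and self.spam.get(from_line, 0) == 0)
```

Because the key is the whole "From:" line, a message with the same address but a different display name is tracked as a separate sender in this model.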
 
  
''NOTE:''<br>
None of the present features are necessary when the source is "error", because the original training data is used from the signature to retrain, instantiating whatever features (such as whitelisting) were active at the time of the initial classification.  Since BNR is only necessary when a message is being classified, the --feature flag can be safely omitted from error source calls.
 
<br>
 
<br>
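As an illustration of the note above, a wrapper script might assemble the agent's arguments and leave out --feature for error-source calls. This is a sketch under assumptions: the helper name is hypothetical, and only the --user, --source, --class and --feature flags that appear in this document are used.

```python
def build_dspam_args(user, source, classification, features=()):
    """Build a dspam argument list, omitting --feature for error retraining.

    When the source is "error", the features recorded in the original
    signature are used, so passing --feature adds nothing.
    """
    args = ["dspam", "--user", user,
            f"--source={source}", f"--class={classification}"]
    if features and source != "error":
        args.append("--feature=" + ",".join(features))
    return args
```

The resulting list could be handed to a process launcher with the message supplied on standard input.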
  
Line 987: Line 1,054:
 
  --daemon

Puts DSPAM in daemon mode; i.e., DSPAM acts like a server when started with this parameter. See section 2.3 for more information about daemon mode.
 
<br>
 
<br>
  
 
== LINKING WITH LIBDSPAM ==
----
Developers are able to link to the DSPAM core engine (libdspam) to provide "drop-in" spam-filtering for their applications.  Examples of the libdspam API can be found in the example.c file included with this distribution.
  
 
   <COMMERCIAL LICENSING>

   IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE
   IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE
Line 1,005: Line 1,068:
   NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL
   LICENSE.

   COMMERCIAL LICENSING BENEFITS:
   - PRIORITY DEVELOPER SUPPORT
Line 1,011: Line 1,074:
   - NON-GPL REDISTRIBUTION PRIVILEGES
   - BUG AND FEATURE REQUEST PRIORITY

   Please contact the author at 'to be determined' for information
   about commercial licensing.

   </COMMERCIAL LICENSING>

   To link to libdspam, follow the instructions for compiling and installing
   DSPAM. When compiled, the libdspam static and shared libraries are also
   built. This library contains all the functions necessary to use dspam's
   filtering in your application.

   Your application will also need to link to the correct storage driver
   libraries. If you are using libdspam in a multithreaded application, you
   will need to either use a thread-safe storage driver or control access to
   libdspam using a mutex lock.

   If you are using libdspam in a multithreaded environment, each thread will
   require its own DSPAM context. Fortunately, you can attach the same
   database handle to each context using dspam_attach(). See the man page for
   more information.

   To build with the dspam API, you will also need the header files from
   the distribution.  You can copy these to /usr/include/dspam for ease of
   use, and then use -I/usr/include/dspam

   Please see example.c for API examples.

   If you are interested in linking libdspam with your project and have
   questions or concerns, please contact the dspam-dev mailing list.
<br>
 
=== CONFIGURING GROUPS ===
-----
Groups enable a group of users to share information.  The following group types are supported:

<br>
==== SHARED GROUPS ====
Enables users with similar email behavior to share the same dictionary while still maintaining a private quarantine box.  The benefits of this type of group are faster learning, and sharing a single spam alias.  Shared groups can have both positive and negative effects on accuracy.  If a shared group consists of users with similar, predictable email behavior, the users in the group can benefit from a larger dictionary of spam and faster learning (especially for newcomers in the group).  If a group consists of users with different email behavior, however, the users in the group will experience poor spam filtering and a higher number of false positives.

''NOTE:''<br>
The SQL-based storage drivers support shared groups, but have one caveat: if you are NOT enabling "virtual users" support, you will need to create an actual user on your system named after each group you create.

On top of shared group support, a shared group can also be made 'managed'.  Using the group type 'SHARED,MANAGED' will cause the group to share a single quarantine mailbox which could be managed by the group's administrator.  This would enable one individual to monitor quarantine for the entire group; however, personal emails marked as false positives could potentially be viewed as well.  For this reason, managed groups should only be used when this is not an issue.

<br>
==== INOCULATION GROUPS ====
An inoculation group allows users to maintain their own private dictionaries with their own spam alias, but all members of the group will inoculate other members with spams they manually forward into their alias.  This allows users to report spams to one another and maintain their own private dictionary.  Another advantage to this is that users do not necessarily have to share the same email behavior.

''NOTE:''<br>
Users should only be added to an inoculation group after their initial learning period, to avoid potential false positives due to lack of data.
  
 
   To create groups, you'll want to create a file with the filename 'group'
Line 1,100: Line 1,144:
   group.
  
<br>
==== CLASSIFICATION GROUPS ====
   Classification groups allow a group of users to network their results
   together.  If DSPAM is uncertain of whether a message is spam or nonspam for
Line 1,146: Line 1,191:
   established between both parties.
  
<BR>
==== GLOBAL GROUPS ====
   Global groups allow DSPAM to provide a "SpamAssassin type out-of-the-box
   filtering" for all new users until they have built their own useful
Line 1,164: Line 1,209:
   treated just as any other user on the system.

''NOTE:''<BR>
Be sure to set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from purging them empty in 90 days.
<BR>
==== MERGED GROUPS ====
   Merged groups are similar to global groups in that the entire system uses
   a single global user as a parent.  What's different is that the global
Line 1,201: Line 1,245:
   the group.

''NOTE:''<br>
Merged Groups are great for providing out-of-the-box adaptive filtering, but allowing users to build their own data from scratch will still result in the best possible accuracy in the long run.

''NOTE:''<br>
Be sure to set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from purging them empty in 90 days.

'''IMPORTANT!'''

   If you are running dspam_clean, be sure to set a preference for your merged
Line 1,217: Line 1,258:
   out your entire merged group user's dataset, since it's old).
  
<br>
=== EXTERNAL INOCULATION THEORY ===
-----
Bill Yerazunis recently expressed his theory of inoculation on an anti-spam development list, using the term "vaccination":

   "Part of the problem is that spam isn't stationary, it evolves. That
Line 1,300: Line 1,341:
   harvester bots, making them obsolete as counter-productive tools.
  
<br>
=== CLIENT/SERVER MODE ===
-----
DSPAM supports two different modes of operation. In standard operating mode, the DSPAM agent is called by the MTA (or proxy) and each agent process operates independently, establishing its own connection to a database and performing delivery on its own. The second operating mode, client/server mode, allows the DSPAM agent to act more like a thin client, connecting to the DSPAM server process which then does all the work of analyzing and delivering or quarantining the message. The advantages to using DSPAM in client/server mode are:
* Maintaining a set of stateful database connections (within the server), which should enhance performance on some systems by eliminating the need to establish a new database connection for every message processed.
* Providing a central point of processing. Having one server perform all processing and delivery, while having multiple thin clients on your mail servers, may be more desirable than having multiple agents performing processing and delivery on all your servers.
* The DSPAM server speaks LMTP, which some implementations may be able to take advantage of, eliminating the need for the DSPAM client altogether.
* Having a single multithreaded daemon should use less memory and other resources than having independently operating clients.

If you've already got DSPAM set up, client/server mode won't require any changes to your mail server's configuration; it's completely transparent.

The DSPAM agent can be compiled with client/server support by configuring with --enable-daemon. You will need to use a multithread-safe storage driver (presently mysql_drv, pgsql_drv, and hash_drv are supported). Once you have compiled with daemon support, you'll need to modify your dspam.conf to provide the settings necessary for client/server mode:

  ServerHost            127.0.0.1
The host to listen on. By default this setting is commented out, which forces DSPAM to listen on all available interfaces.

  ServerPort            24
The port to listen on. The default is 24, the LMTP port.

  ServerQueueSize        32
The maximum number of connections which may remain backlogged before they are accepted.

  ServerPass.Relay1      "secret"
  ServerPass.Relay2      "password"
Each client server allowed to connect should have its own password. They can be defined here.

The DSPAM server can listen on either a network socket or a local unix domain socket. If you're running the client and server on the same machine, a domain socket should be used as it eliminates additional overhead. To use a domain socket, you'll also need to add the following option:

  ServerDomainSocketPath  "/tmp/dspam.sock"

Once you've configured the server, you'll want to set the client configuration on all client machines. If you are using network sockets, set the following to appropriate values:

  ClientHost    127.0.0.1
  ClientPort    24

Or if using a domain socket:

  ClientHost    /tmp/dspam.sock

In both cases, you'll need to set the client's authentication ident:

  ClientIdent    "secret@Relay1"

Now you're ready to go. To start the DSPAM server, run:

  dspam --daemon &

Or alternatively, if you have debugging enabled:

  dspam --debug --daemon &

The DSPAM agent can then be called just as in standard (non-client/server) mode, with --client added to the set of parameters. With --client, the client settings are loaded from dspam.conf and the agent acts as a thin client. Running dspam without --client causes DSPAM to revert to its normal non-daemon behavior and establish database connections on its own. For example:

  dspam --client --user dick jane --deliver=innocent -d %u

Alternatively, if you'd like to use a thinner client, dspamc is identical to the dspam binary in behavior, but has been stripped down to only include the lightweight client.

  dspamc --client --user dick jane --deliver=innocent -d %u

The conversation that takes place between the client/server is LMTP-based, and will look like this:

  SERVER> 220 DSPAM DLMTP 3.4.0 Authentication Required
  CLIENT> LHLO Relay1
  SERVER> 250-PIPELINING
  SERVER> 250-ENHANCEDSTATUSCODES
  SERVER> 250-DSPAMPROCESSMODE
  SERVER> 250 SIZE
  CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
  SERVER> 250 2.1.0 OK
  CLIENT> RCPT TO: dick
  SERVER> 250 2.1.5 OK
  CLIENT> RCPT TO: jane
  SERVER> 250 2.1.5 OK
  CLIENT> DATA
  SERVER> 354 Enter mail, end with "." on a line by itself
  CLIENT> Subject: Cheap Viagra!
  CLIENT>
  CLIENT> Click Here: <nowiki>http://www.cheapviagra.com</nowiki>
  CLIENT> .
  SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
  SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM

Optionally, if you'd like the clients to perform delivery, you can use DSPAM's --stdout or --classify functionality to obtain a dump of the message or results, respectively. From there, it's up to you and your MTA to deliver the message. The DSPAM client will output the results to stdout in this case, just as it would in standard operating mode.

Once the server is running, its configuration can be reloaded with a SIGHUP. When the daemon is reloaded, the following occurs:
* The daemon stops listening for new requests
* All threads are allowed to finish processing and exit
* All connections to the database are closed
* The dspam.conf configuration is reloaded
* All connections to the database are re-opened
* The daemon starts listening for new requests

This allows database and listener configurations to also be reloaded from dspam.conf without the need to interrupt the process.

''NOTE:''<br>
During the period of time the daemon is reloading, client connections will fail. Depending on how the MTA reacts, this may cause messages to fall back to queue or to bounce.
<br>
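To make the client side of the client/server LMTP conversation concrete, the following sketch assembles the lines a client would send. It is only an illustration: the function name is hypothetical, deriving the LHLO name from the part of the ident after "@" is an assumption based on the sample transcript, and the dot-stuffing step is standard SMTP/LMTP practice rather than something stated in this document.

```python
def build_lmtp_commands(ident, process_mode, recipients, message_lines):
    """Return the client-side lines of a DSPAM-style LMTP exchange.

    ident         -- value of ClientIdent, e.g. "secret@Relay1"
    process_mode  -- agent flags passed through DSPAMPROCESSMODE
    recipients    -- user names, one RCPT TO per user
    message_lines -- raw message lines (headers, blank line, body)
    """
    if "@" not in ident:
        raise ValueError("ident must look like password@relayname")
    # Assumption: the relay name announced in LHLO is the ident's suffix.
    relay = ident.rsplit("@", 1)[1]

    commands = [f"LHLO {relay}"]
    commands.append(f'MAIL FROM: <{ident}> DSPAMPROCESSMODE="{process_mode}"')
    commands.extend(f"RCPT TO: {user}" for user in recipients)
    commands.append("DATA")
    for line in message_lines:
        # Dot-stuffing: double a leading "." so the line is not mistaken
        # for the end-of-data marker.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")  # end-of-data marker
    return commands
```

The server replies once per recipient after the final ".", which is why the sample transcript ends with separate results for dick and jane.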
  
 
=== LMTP ===
-----
DSPAM supports LMTP both on the front-end and back-end (delivery). This section will briefly provide instructions for configuring either or both of these advanced options.

   LMTP (AND SMTP) DELIVERY
Line 1,544: Line 1,547:
   In both cases, the content provided between < > is what is actually used.
  
 +
<br>
  
 
=== DSPAM USER PREFERENCES ===
 
=== DSPAM USER PREFERENCES ===
 +
-----
 +
Preferences are settings that can be configured globally in dspam.conf or for individual users via the dspam_admin command.
 +
 +
trainingMode { TOE | TUM | TEFT | NOTRAIN }
 +
How DSPAM should train messages it analyzes. See section 1.5 --mode (default:teft, see dspam.conf)
 +
 +
 +
spamAction { quarantine | tag | deliver }
 +
What to do with spam. The tag and deliver options both deliver, but tag adds a special prefix to the subject, whereas deliver merely sets X-DSPAM-Result. (default:quarantine)
 +
  
  Preferences are settings that can be configured globally in dspam.conf or
+
spamSubject
  for individual users via the dspam_admin command.
+
A customized subject to prefix when spamAction=tag. (default:[SPAM])
  
  trainingMode { TOE | TUM | TEFT | NOTRAIN }
 
    How DSPAM should train messages it analyzes. See section 1.5 --mode
 
    (default:teft, see dspam.conf)
 
  
  spamAction { quarantine | tag | deliver }  
+
statisticalSedation { 0 - 10 }
    What to do with spam. The tag and deliver options both deliver, but tag
+
The level of dampening during training (0-10, 0 = no dampening, default:0)
    adds a special prefix to the subject, whereas deliver merely sets
+
    X-DSPAM-Result. (default:quarantine)
+
  
  spamSubject
 
    A customized subject to prefix when spamAction=tag. (default:[SPAM])
 
  
  statisticalSedation { 0 - 10 }
+
enableBNR { on | off }
    The level of dampening during training (0-10, 0 = no dampening, default:0)
+
Enables or disables bayesian noise reduction (default:off)
  
  enableBNR { on | off }
 
    Enables or disables bayesian noise reduction (default:off)
 
 
   
 
   
  enableWhitelist { on | off }
+
enableWhitelist { on | off }
    Enables or disables automatic whitelisting (default:on)
+
Enables or disables automatic whitelisting (default:on)
  
  signatureLocation { message | headers }
 
    Where to place the DSPAM signature. Placement affects forwarding approach.
 
    (default:message)
 
  
  tagSpam / tagNonspam { on | off }
+
signatureLocation { message | headers }
    Adds a tagline to the end of a message based on its classification; useful
+
Where to place the DSPAM signature. Placement affects forwarding approach. (default:message)
    for things such as "Scanned by Your ISP.com". If set to on, the file
+
    msgtag.spam and/or msgtag.nonspam will be looked for in dspam_home/txt/
+
    and appended to appropriate messages.  
+
  
    NOTE: Signed messages will not be tagged in this fashion
 
  
  showFactors { on | off }
+
tagSpam / tagNonspam { on | off }
    Whether to include an X-DSPAM-Factors header including decision-making
+
Adds a tagline to the end of a message based on its classification; useful for things such as "Scanned by Your ISP.com". If set to on, the file msgtag.spam and/or msgtag.nonspam will be looked for in dspam_home/txt/ and appended to appropriate messages.  
    factors (clues). NOTE: This can break RFC in some cases, and should only
+
    be used for debugging. (default:off)
+
  
  optIn / optOut { on | off }
+
''NOTE:''<br>
    Depending on whether the system is opt-in or opt-out, sets the user's
+
Signed messages will not be tagged in this fashion.
    membership. If the user is opted out (or not opted in), mail will be delivered
+
    by DSPAM without being processed.
+
  
  whitelistThreshold { Integer }
 
    Overrides the default number of times a From: header has been seen before
 
    it is automatically whitelisted. (default:10)
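
The threshold idea can be sketched in a few lines. This is a hedged illustration, not DSPAM's implementation: the class name `FromCounter` is hypothetical, the sketch counts every sighting of a From: header, and the use of `>=` for the comparison is an assumption. DSPAM itself may apply further conditions before whitelisting a sender.

```python
from collections import Counter

DEFAULT_WHITELIST_THRESHOLD = 10  # mirrors "(default:10)" above


class FromCounter:
    """Hypothetical per-user tally of how often each From: header was seen."""

    def __init__(self, threshold=DEFAULT_WHITELIST_THRESHOLD):
        self.threshold = threshold
        self.seen = Counter()

    @staticmethod
    def _key(from_header):
        # Normalise so that case and stray whitespace do not split the count.
        return from_header.strip().lower()

    def record(self, from_header):
        """Count one more message carrying this From: header."""
        self.seen[self._key(from_header)] += 1

    def is_whitelisted(self, from_header):
        """True once the header has been seen at least `threshold` times."""
        return self.seen[self._key(from_header)] >= self.threshold
```

Lowering whitelistThreshold makes senders trusted sooner; raising it demands more history before a sender is trusted.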
 
  
  makeCorpus { on | off }
+
showFactors { on | off }
    When activated, a maildir-style corpus is maintained in the user's data
+
Whether to include an X-DSPAM-Factors header including decision-making factors (clues). NOTE: This can break RFC in some cases, and should only be used for debugging. (default:off)
    directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or
+
    other analysis. (default:off)
+
  
  storeFragments { on | off }
 
    When activated, the first 1k of each message is temporarily stored on
 
    the server for reference via the webui's history function. (default:off)
 
  
  localStore { String }
+
optIn / optOut { on | off }
    Overrides the directory name used for the user's dspam data directory. This
+
Depending on whether the system is opt-in or opt-out, sets the user's membership. If the user is opted out (or not opted in), mail will be delivered by DSPAM without being processed.
    is useful when using recipient addresses as usernames, as it will allow
+
    all addresses belonging to a specific user to be written to a single
+
    webui directory. (default:username)
+
   
+
  processorBias { on | off }
+
    Overrides the "bias" setting in dspam.conf, which biases mail as
+
    innocent. (default:on, see dspam.conf)
+
  
  fallbackDomain { on | off }
 
    Allows a dspam user ("@domain.com") to be marked as a fallback user for
 
    the entire domain, so if the destination dspam user does not exist in
 
    the database, the fallback user's database will be used. The
 
    dspam.conf "FallbackDomains" setting must also be "on". (default:off)
 
    NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.
 
  
  trainPristine { on | off }
+
whitelistThreshold { Integer }
    Overrides the default signature mode and treats messages as if they were
+
Overrides the default number of times a From: header has been seen before it is automatically whitelisted. (default:10)
    in pristine format when retraining. This requires all retraining to use
+
    the original message that was processed as no dspam signature is stored
+
    for pristine training. (default:off)
+
  
  optOutClamAV { on | off }
 
    Opts out of ClamAV virus scanning (if ClamAV is directly integrated with
 
    dspam via dspam.conf). (default:off)
 
  
 +
makeCorpus { on | off }
 +
When activated, a maildir-style corpus is maintained in the user's data directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or other analysis. (default:off)
  
=== FALLBACK DOMAINS ===
 
  
  Fallback domains allow you to default some or all users for a particular
+
storeFragments { on | off }
  domain to a single domain user; this allows you to set preferences (including
+
When activated, the first 1k of each message is temporarily stored on the server for reference via the webui's history function. (default:off)
  opting out of filtering entirely) for users based on domain name. Any user
+
  who does not exist as a known user to DSPAM will be defaulted to the  
+
  domain it belongs to if it is designated as a fallback domain. This
+
  means that you can create [email protected] and [email protected] with their own
+
  databases and preferences, but also default all other users to @domain.com.
+
  Alternatively, you could create just the domain without any other users and
+
  default all users to @domain.com.
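
The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not DSPAM's code: the function `resolve_dspam_user`, its arguments, and the example.com addresses are hypothetical, and returning None when no fallback applies is a choice made only for this sketch.

```python
def resolve_dspam_user(recipient, known_users, fallback_domains):
    """Return the DSPAM user whose database and preferences apply.

    known_users:      set of users DSPAM already knows, e.g. {"bob@example.com"}
    fallback_domains: set of domain users marked as fallback, e.g. {"@example.com"}
    """
    # A known user keeps its own database and preferences.
    if recipient in known_users:
        return recipient

    # Otherwise try the domain user, written as "@domain".
    _local, sep, domain = recipient.rpartition("@")
    domain_user = "@" + domain.lower()
    if sep and domain_user in fallback_domains:
        return domain_user

    # No fallback applies; this sketch leaves the decision to the caller.
    return None
```

For example, with "bob@example.com" as the only known user and "@example.com" designated as a fallback domain, mail for "alice@example.com" resolves to "@example.com", while mail for "bob@example.com" still resolves to Bob's own user.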
+
  
  To use fallback domains, you'll first need to activate this feature in
 
  dspam.conf:
 
  
  FallbackDomains on
+
localStore { String }
 +
Overrides the directory name used for the user's dspam data directory. This is useful when using recipient addresses as usernames, as it will allow all addresses belonging to a specific user to be written to a single webui directory. (default:username)
  
  Next, you'll need to create a dspam user for each domain you wish to use
+
   
  as a fallback domain. For example, @domain.com. Depending on your
+
processorBias { on | off }
  implementation, this may be a simple insert into dspam_virtual_uids or may
+
Overrides the "bias" setting in dspam.conf, which biases mail as innocent. (default:on, see dspam.conf)
  be created automatically when setting a user's preferences.
+
  
  Finally, designate that special user as a fallback domain by setting a
 
  prefer