Difference between revisions of "DSpam README"

Revision as of 18:23, 27 July 2009

Ledhed (Talk | contribs)

(→‎TROUBLESHOOTING)

Latest revision as of 23:27, 27 July 2009

Ledhed (Talk | contribs)

(→‎ALIASES)

(78 intermediate revisions by the same user not shown)

Line 16:

+

'''CREDITS'''

−

~~== OVERVIEW ==~~

+

Original Work By:

+

*Lead development: Jonathan A. Zdziarski <[email protected]>

+

*Postgres driver: Rustam Aliyev <[email protected]>

+

Various:

+

*Feb/2006 Cove Schneider <[email protected]>

+

*Jan/2006 Norman Maurer <[email protected]>

−

~~----~~

+

Your name is missing? Let us know with a reference to your commit, and we'll

+

add you to the list.

+

'''COPYRIGHT'''

+

Original work was done by Jonathan A. Zdziarski.

+

In 2006 the copyright was handed over to Sensory Networks.

+

In 2009 Sensory Networks handed over the full copyright to the DSPAM Project.

+

As of 12 January 2009 the copyright is owned by the DSPAM Project, represented by a team of people, including:

+

* Alexander Prinsier

+

* Ion-Mihai Tetcu

+

* Paul Cockings

+

* Dov Zamir

+

* Stevan Bajic

+

+

== OVERVIEW ==

+

----

DSPAM is an open-source, freely available anti-spam solution designed to combat unsolicited commercial email using advanced statistical analysis. In short, DSPAM filters spam by learning what spam is and isn't. It does this by learning each user's individual mail behavior. This allows DSPAM to provide highly accurate, personalized filtering for each user, even on a large system, and provides an administratively maintenance-free solution capable of learning each user's email behaviors with very few false positives.

Line 36:

Line 61:

−

''PLEASE NOTE:'' DSPAM and libdspam are distributed under the GPL license, not the LGPL. Commercial licensing is available for those who seek to redistribute DSPAM or some of DSPAM's components/libraries in their non-GPL products. Please contact ~~[email protected]~~ for more information about commercial licensing.~~ ~~

+

''PLEASE NOTE:''

−

~~ ~~

+

DSPAM and libdspam are distributed under the GPL license, not the LGPL. Commercial licensing is available for those who seek to redistribute DSPAM or some of DSPAM's components/libraries in their non-GPL products. Please contact us for more information about commercial licensing.

+

The DSPAM package is split up into the following pieces:

Line 80:

Line 106:

[MTA] ---> [LDA] ---> (User's Mailbox)

+

AFTER:

Line 132:

Line 159:

Follow the steps sequentially from the base version you are running up to the top.

+

+

==== Upgrading from 3.8 ====

+

1. Ensure MySQL is using the new database schema. The following clauses should be executed for upgrading pre-3.9.0 DSPAM MySQL schema to the 3.9.0 schema:

+

ALTER TABLE `dspam_signature_data`

+

CHANGE `uid` `uid` INT UNSIGNED NOT NULL,

+

CHANGE `data` `data` LONGBLOB NOT NULL,

+

CHANGE `length` `length` INT UNSIGNED NOT NULL;

+

ALTER TABLE `dspam_stats`

+

CHANGE `uid` `uid` INT UNSIGNED NOT NULL,

+

CHANGE `spam_learned` `spam_learned` BIGINT UNSIGNED NOT NULL,

+

CHANGE `innocent_learned` `innocent_learned` BIGINT UNSIGNED NOT NULL,

+

CHANGE `spam_misclassified` `spam_misclassified` BIGINT UNSIGNED NOT NULL,

+

CHANGE `innocent_misclassified` `innocent_misclassified` BIGINT UNSIGNED NOT NULL,

+

CHANGE `spam_corpusfed` `spam_corpusfed` BIGINT UNSIGNED NOT NULL,

+

CHANGE `innocent_corpusfed` `innocent_corpusfed` BIGINT UNSIGNED NOT NULL,

+

CHANGE `spam_classified` `spam_classified` BIGINT UNSIGNED NOT NULL,

+

CHANGE `innocent_classified` `innocent_classified` BIGINT UNSIGNED NOT NULL;

+

ALTER TABLE `dspam_token_data`

+

CHANGE `uid` `uid` INT UNSIGNED NOT NULL,

+

CHANGE `spam_hits` `spam_hits` BIGINT UNSIGNED NOT NULL,

+

CHANGE `innocent_hits` `innocent_hits` BIGINT UNSIGNED NOT NULL;

+

If you are using preference extension with DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM preference MySQL schema to the 3.9.0 schema:

+

ALTER TABLE `dspam_preferences`

+

CHANGE `uid` `uid` INT UNSIGNED NOT NULL;

+

If you are using virtual users (with AUTO_INCREMENT) in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:

+

ALTER TABLE `dspam_virtual_uids`

+

CHANGE `uid` `uid` INT UNSIGNED NOT NULL AUTO_INCREMENT;

+

If you are using virtual user aliases (aka: DSPAM in relay mode) in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:

+

ALTER TABLE `dspam_virtual_uids`

+

CHANGE `uid` `uid` INT UNSIGNED NOT NULL;

+

If you need to speed up the MySQL purging script and can afford to use more disk space for the DSPAM MySQL data, then consider executing the following clause for adding three additional indices:

+

ALTER TABLE `dspam_token_data`

+

ADD INDEX(`spam_hits`),

+

ADD INDEX(`innocent_hits`),

+

ADD INDEX(`last_hit`);

+

2. Ensure PostgreSQL is using the new database schema. The following clauses should be executed for upgrading pre-3.9.0 DSPAM PostgreSQL schema to the 3.9.0 schema:

+

ALTER TABLE dspam_preferences ALTER COLUMN uid TYPE integer;

+

ALTER TABLE dspam_signature_data ALTER COLUMN uid TYPE integer;

+

ALTER TABLE dspam_stats ALTER COLUMN uid TYPE integer;

+

ALTER TABLE dspam_token_data ALTER COLUMN uid TYPE integer;

+

DROP INDEX IF EXISTS id_token_data_sumhits;

+

If you are using virtual users in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids to the 3.9.0 schema:

+

ALTER TABLE dspam_virtual_uids ALTER COLUMN uid TYPE integer;

+

==== Upgrading From 3.6 ====

Line 171:

Line 255:

−

''NOTE:''

+

''NOTE:''

Berkeley DB drivers (libdb3_drv, libdb4_drv) are deprecated and have been removed from the build. You will need to select an alternative storage driver in order to upgrade.

Line 179:

Line 263:

----

+

'''PREREQUISITES'''

Line 206:

Line 291:

You can download MySQL from http://www.mysql.com.

+

You can download PostgreSQL from http://www.postgresql.com.

+

You can download SQLite from http://www.sqlite.org.

Line 252:

Line 339:

--with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]

−

Specify your storage driver selection(s). A storage driver is a driver written specifically for DSPAM to store tokens, signature data, ~~andperform~~ other proprietary operations. The default driver is hash_drv. The following drivers have been provided:

+

Specify your storage driver selection(s). A storage driver is a driver written specifically for DSPAM to store tokens, signature data, and perform other proprietary operations. The default driver is hash_drv. The following drivers have been provided:

mysql_drv: MySQL Drivers

Line 277:

Line 364:

−

''~~Note~~:''

+

''NOTE:''

−

+

This function is incompatible with most implementations of the Web UI, since it requires access to read each user's home directory. Therefore, only use this option if you will not be using the Web UI or plan on doing something asinine like running it as root.

Line 286:

Line 372:

+

===== DRIVER SPECIFIC CONFIGURE SWITCHES =====

Line 312:

Line 399:

−

''~~Note~~:''

+

''NOTE:''

−

+

Please see the file doc/mysql_drv.txt for more information about configuring the mysql_drv storage driver.

Line 334:

Line 420:

−

''~~Note~~:''

+

''NOTE:''

−

+

Please see the file doc/pgsql_drv.txt for more information about configuring the pgsql_drv storage driver.

Line 352:

Line 437:

===== DEBUGGING SWITCHES =====

−

--enable-debug

Line 362:

Line 446:

−

''~~Note~~:''

+

''NOTE:''

−

+

When verbose debug is compiled in, DSPAM performs many additional mathematical calculations regardless of whether or not it's been activated. You shouldn't use --enable-verbose for production builds unless you have serious issues you can't resolve.

Line 378:

Line 461:

+

==== BUILDING AND INSTALLING ====

Line 386:

Line 470:

−

''~~Note~~:''

+

''NOTE:''

If you are a developer wanting to link to the core engine of dspam, libdspam will be built during this process. Please see the example.c file for examples of how to link to and use libdspam. Static and dynamic libraries are built in the .libs directory. Needed headers will be installed in $prefix$/include/dspam.

+

==== PERMISSIONS ====

Line 397:

Line 482:

The CGI User: This is the user your web server (most likely Apache) is running as. This is commonly 'nobody' or 'web'. You can find this in Apache's httpd.conf by searching for 'User'. The CGI user will need the ability to access the following components of DSPAM:

−

- Ability to execute the dspam binary

+

* Ability to execute the dspam binary

−

- Ability to read and write to dspam_home/data/

+

* Ability to read and write to dspam_home/data/

−

- Trusted user permissions in dspam.conf ("Trust [username]")

+

* Trusted user permissions in dspam.conf ("Trust [username]")

−

- The execution 'Group' used must match the group dspam is running as

+

* The execution 'Group' used must match the group dspam is running as (this is typically 'mail', 'dspam', or similar).

−

(this is typically 'mail', 'dspam', or similar)

+

−

+

The MTA User: This is the user your mail server software is running as when it executes DSPAM. This is usually daemon, mail, exim, etc. This is typically different from the user the MTA runs and polices itself as, to avoid security problems. Consult your MTA's documentation for more info. The MTA user will require:

−

- The ability to execute the dspam binary

+

* The ability to execute the dspam binary

−

- Trusted user permissions in dspam.conf ("Trust [username]")

+

* Trusted user permissions in dspam.conf ("Trust [username]")

Systems Administrators: In order to perform administrative functions, systems administrators will require:

−

- The ability to execute dspam-related binaries

+

* The ability to execute dspam-related binaries

−

- Trusted user permissions in dspam.conf ("Trust [username]")

+

* Trusted user permissions in dspam.conf ("Trust [username]")

−

+

−

~~''Note:''~~

+

''NOTE:''

If the MTA is communicating with DSPAM via LMTP (explained later), then execution permissions are not necessary.

−

''~~Note~~ about FreeBSD:''

+

''NOTE about FreeBSD:''

−

+

FreeBSD's default MTA user is 'mailnull'. FreeBSD's default delivery agent also changes its uid, so in order to call it, dspam must be installed setuid root to work properly on the commandline. This is done automatically on install.

Line 436:

Line 518:

−

~~==== MAIL SERVER INTEGRATION ====~~

+

==== MAIL SERVER INTEGRATION ====

As previously mentioned, there are three popular ways to implement DSPAM:

Line 471:

Line 553:

===== ALIASES =====

−

There are essentially two different ways a user might train DSPAM. The first is by using the Web UI, which allows them to retrain via the "History" tab. This works quite well, as users must visit the Web UI occasionally to review their quarantine anyway (and reverse any false positives). We'll discuss this shortly in section 1.1.8.

Line 478:

Line 559:

−

''~~Note~~:''

+

''NOTE:''

−

+

If you are using an IMAP based system, Web-based email, or other form of email management where the original messages are stored on the server in pristine format, you can turn this signature feature off by setting "TrainPristine on" in dspam.conf. DSPAM will then use the message itself that you provide it to train, which MUST be identical to the original message in order to retrain properly.

Line 485:

Line 565:

Because DSPAM learns each user's specific email behavior, it's necessary to identify the user in order to program their specific filtering database. This can be done in one of three ways:

−

~~ ~~

+

−

~~======~~ The Simple Way ~~======~~

+

''' The Simple Way '''

If you are using the MySQL or PgSQL storage drivers, the original numeric user id can be embedded in the signature, requiring only one central spam alias to be necessary for the entire system. To configure this, uncomment the appropriate UIDInSignature option in dspam.conf:

Line 501:

Line 582:

−

''~~Note~~:''

+

''NOTE:''

−

+

The 'root' user represents any active dspam user. It is necessary to supply a username on the commandline or DSPAM will bail on an error; however, the user will be changed internally once the signature is read.

−

~~ ~~

+

−

~~======~~ The Kind-of-Simple Way ~~======~~

+

''' The Kind-of-Simple Way '''

If you're not using one of the above storage drivers, the next easiest way to configure aliases is to have DSPAM parse the 'To:' header of the message and use a catch-all subdomain to direct all mail into DSPAM for retraining. You can then instruct your users to email addresses like '[email protected]'. The ParseToHeaders option (available in dspam.conf) will parse the To: header of forwarded messages and set the username to either 'bob' or '[email protected]', depending on how it is configured. DSPAM can also set the training mode to either "learn spam" or "learn notspam" depending on whether the user specified a spam- or notspam- address in the To: header.
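As a rough illustration of the parsing described above (not DSPAM's actual implementation), the following Python sketch splits a retraining address into a training mode and a username. The address form `spam-bob@relay.example.org`, the function name, and the `strip_domain` switch are assumptions made for this example; the real behavior is controlled by ParseToHeaders and the related options in dspam.conf.

```python
def parse_retrain_address(address, strip_domain=True):
    """Return (training_mode, username) for a spam-/notspam- address.

    Hypothetical helper: the address layout is assumed for illustration.
    """
    local, _, domain = address.partition("@")
    for prefix, mode in (("notspam-", "learn notspam"), ("spam-", "learn spam")):
        if local.startswith(prefix):
            user = local[len(prefix):]
            break
    else:
        raise ValueError("not a retraining address: %r" % address)
    if not user:
        raise ValueError("no username in address: %r" % address)
    # Keep the domain when usernames are full addresses ('bob@domain').
    if not strip_domain and domain:
        user = user + "@" + domain
    return mode, user
```

With `strip_domain=True` the username is reduced to the bare local part ('bob'); with `strip_domain=False` the domain is kept, mirroring the two outcomes mentioned in the text.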

Line 520:

Line 601:

ChangeModeOnParse on

−

~~ ~~

+

−

~~======~~ The Old Way (A.K.A. The Hard Way) ~~======~~

+

''' The Old Way (A.K.A. The Hard Way) '''

If neither of the easy ways is possible, you're stuck with doing it the hard way. This means you'll need a separate spam alias (and notspam alias, if users are tagging mail) for each user. To do this, you will need to create an email address for each user, so that DSPAM can analyze and learn for that specific user. For example:

Line 535:

Line 617:

−

''~~Note About~~ Security:''

+

''NOTE about Security:''

You might be wondering if a user can forward a spam to another user's address, or whether a spammer can forward a spam to another user's notspam address. The answer is "no". The key to all mail-based retraining is the signature embedded in each email. The signature is stored with each user's own user id, and so not only does the incoming message have to bear a valid signature, but it also has to be stored on the system with the correct user id. This prevents any kind of alias abuse.

+

==== NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS ====

−

===== Non-SQL Based Nightly Purge =====

−

If you are NOT running a SQL-based solution, then you should configure dspam_clean to run under cron nightly. This clean tool will read all signature databases and purge signatures that are older than 14 days (configurable), purge abandoned tokens, and remove unimportant tokens. Without this tool, old signatures will continue to pile up. Be sure the user running cleanup has full read/write permissions on the DSPAM data files.

0 0 * * * /usr/local/bin/dspam_clean [options]

Line 551:

Line 632:

===== SQL-Based Nightly Purge =====

−

SQL-Based solutions include a nightly SQL script to perform the same basic

tasks as dspam_clean, and it does it much faster and with more finesse.

Line 562:

Line 642:

===== Log Rotation =====

−

The system log and user logs can fill up fairly quickly, when all that's really needed to generate graphs are the last two to three weeks of data. You can configure a nightly log cleanup using dspam_logrotate:

Line 568:

Line 647:

+

==== NOTIFICATIONS ====

−

DSPAM is capable of sending three different notifications to users:

−

+

* A "First Run" message sent to each user when they receive their first message through DSPAM.

−

- A "First Run" message sent to each user when they receive their first message through DSPAM.

+

* A "First Spam" message sent to each user when they receive their first spam

−

- A "First Spam" message sent to each user when they receive their first spam

+

* A "Quarantine Full" message sent to each user when their quarantine box is > 2MB in size.

−

- A "Quarantine Full" message sent to each user when their quarantine box is > 2MB in size.

+

These notifications can be activated by copying the txt/ directory from the distribution into DSPAM's home (by default /usr/local/var/dspam). You will want to modify these templates prior to installing them to reflect the correct email addresses and URLs (look for 'configureme' and 'yourdomain').

−

''NOTE:''

+

''NOTE:''

The quarantine warning is reset when the user clicks 'Delete All', but is not reset if they use "Delete Selected". If the user doesn't wish to receive reminders, they should use the "Delete Selected" function instead of "Delete All".

Line 585:

Line 663:

−

~~==== THE WEB UI ====~~

+

==== THE WEB UI ====

The Web UI (CGI client) can be run from any executable location on a web server, and detects its user's identity from the REMOTE_USER

environment variable. This means you'll need to use HTTP password authentication to access the CGI (any type of authentication will work, so long as Apache supports the module). This is also convenient in that you can set up authentication using almost any existing system you have. The only catch is that you'll need the usernames to match the actual DSPAM usernames used on the system. A copy of the shadow password file will suffice for most common installs.

Line 594:

Line 672:

−

''~~Note~~:''

+

''NOTE:''

−

+

Some authentication mechanisms are case insensitive and will authenticate the user regardless of the case they type it in. DSPAM, on the other hand, is case sensitive and the case of the username used will need to match the case on the system. If you suffer from this authentication problem, and are certain all of your users' usernames are in lowercase, you can add the following line of code to the CGI right after the call to &ReadParse...

Line 603:

Line 680:

−

''~~Note~~:''

+

''NOTE:''

−

+

Apache users do NOT take on the identity of the groups specified in /etc/group so you will need to specifically assign the group in httpd.conf.

−

''~~Note~~ about Procmail:''

+

''NOTE about Procmail:''

−

+

Because the DSPAM Web UI is a CGI script, DSPAM will not retain its setuid privileges when called. If you are running procmail, this will become a problem as procmail requires root privileges to deliver. The easiest hack around this is to create a procmail.dspam binary and make it setuid root, then make it executable only by the mail group (or whatever group DSPAM and the CGI run in).

−

The DSPAM Web UI has a minimal configuration inside the configure.pl script. You'll want to check and make sure all of the settings are correct. In most cases, the only that will be necessary to change are the large-scale or domain-scale flags.

+

The DSPAM Web UI has a minimal configuration inside the configure.pl script. You'll want to check and make sure all of the settings are correct. In most cases, the only settings that will be necessary to change are the large-scale or domain-scale flags.

Line 649:

Line 724:

The following PERL modules (http://www.perl.com/CPAN/modules/by-module/GD/):

−

. GD

+

* GD

−

. GD-Graph3d

+

* GD-Graph3d

−

. GDGraph

+

* GDGraph

−

. GDTextUtil

+

* GDTextUtil

−

. CGI

+

* CGI

Typically this can be accomplished on the commandline:

Line 668:

Line 743:

'''Opt-In/Out'''

−

If you would like your users to be able to opt in/out of DSPAM filtering, add the correct option to the nav_preferences.html template, depending on your configuration (for example, if you have an opt-in system, you'll want to add the opt-in option). ~~Note~~: This currently only works with the preferences extension, and not drop files.

+

If you would like your users to be able to opt in/out of DSPAM filtering, add the correct option to the nav_preferences.html template, depending on your configuration (for example, if you have an opt-in system, you'll want to add the opt-in option).

+

''NOTE:''

+

This currently only works with the preferences extension, and not drop files.

+

Opt into DSPAM filtering

Line 674:

Line 753:

Opt out of DSPAM filtering

+

=== TESTING ===

-----

−

~~ ~~

If you've installed from an RPM, there's a good chance that the packager went to the trouble of testing already. If you're building from sources, however, you'll need to find a way to ensure your configuration isn't broken.

Line 686:

Line 766:

Before running the test, you should have completed section 1.1's instructions for compiling and installing dspam as well as configured your mail server to support dspam.

−

+

−

+

==== 1. Create a new user account on your system ====

It is important that this be a new account to prevent any unrelated email from being delivered during testing. Be sure to configure a spam alias for the test account.

−

+

==== 2. Send a short email ====

Send a short email (10 words or less) to the account, and pick it up using your favorite mail client.

−

+

==== 3. Run dspam_stats ====

dspam_stats [username]

Line 705:

Line 784:

If you receive an error such as "unable to open /usr/local/var/dspam... for reading", then the dspam agent is not configured correctly. The problem could exist in either your mail server configuration or one or more of the permissions on the directory or agent. Check your configuration and permissions, and repeat this step until the correct results are experienced.

−

+

==== 4. Run dspam_dump ====

dspam_dump [username]

Line 722:

Line 801:

7717766825815048192 S: 00265 I: 00068 P: 0.7358

−

+

==== 5. Forward the test message ====

Forward the test message to the spam alias you've created for the test account. Allow enough time for the message to be processed.

−

+

==== 6. Run dspam_stats again ====

dspam_stats [username]

Line 733:

Line 812:

If this is not the case, check the group permissions of the dspam agent as well as the permissions your MTA uses when piping to aliases.

−

+

==== 7. Run dspam_dump [username] again ====

dspam_dump [username]

Line 742:

Line 821:

8851970219880318167 S: 1 I: 0 LH: Mon Aug 4 11:44:29 2003

If you have some tokens that do not have an S: of 1 or an I: of 0, the dspam signature was not found on the email, and this could be due to a lot of things.
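The check in step 7 can be automated. The following Python sketch assumes the dspam_dump output has been captured as text and that each token line follows the `token S: n I: n ...` layout shown in the samples above; the function names are made up for this example and are not part of DSPAM.

```python
import re

# Matches the leading "<token> S: <n> I: <n>" part of a dump line.
DUMP_LINE = re.compile(r"^\s*(\d+)\s+S:\s*(\d+)\s+I:\s*(\d+)")

def parse_dump_line(line):
    """Return (token, spam_hits, innocent_hits), or None if the line does not match."""
    match = DUMP_LINE.match(line)
    if match is None:
        return None
    token, spam_hits, innocent_hits = match.groups()
    return token, int(spam_hits), int(innocent_hits)

def unexpected_tokens(dump_text, spam_hits=1, innocent_hits=0):
    """List the tokens whose counters differ from the expected S:/I: values."""
    unexpected = []
    for line in dump_text.splitlines():
        parsed = parse_dump_line(line)
        if parsed is not None and parsed[1:] != (spam_hits, innocent_hits):
            unexpected.append(parsed)
    return unexpected
```

An empty result from `unexpected_tokens` means every token in the dump has an S: of 1 and an I: of 0, which is the outcome this step expects.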

+

=== TROUBLESHOOTING ===

-----

−

~~ ~~

''Problem:''

No files are being created in the user directory

Line 752:

Line 832:

Check the permissions of the directory. The user directory must be writable by the user the dspam agent is running as, as well as by the CGI user.

−

~~----~~

+

''Problem:''

False positives are never being delivered

Line 760:

Line 840:

Your CGI most likely doesn't have the privileges required by the LDA to deliver the messages. Make sure the CGI user is in the correct group. Also consider setting the dspam agent to setuid or setgid with the correct permissions.

−

~~----~~

+

''Problem:''

My database is getting huge!

''Solution:''

−

DSPAM's default training mode is TEFT. On top of this, the purging defaults are very lax. You might consider switching to TOE (Train-on-Error) mode training if you require a minimal database. If you are willing to sacrifice accuracy for disk space, disabling the 'chain' tokenizer from dspam.conf will prevent the use of multi-word (chained) tokens, which will also cut your database size considerably. You may also consider more frequent calls to dspam_clean -p to purge neutral data, which comprises a majorrity of most databases. For more help, please see the DSPAM FAQ at http://dspam.~~nuclearelephant~~.~~com~~.

+

DSPAM's default training mode is TEFT. On top of this, the purging defaults are very lax. You might consider switching to TOE (Train-on-Error) mode training if you require a minimal database. If you are willing to sacrifice accuracy for disk space, disabling the 'chain' tokenizer from dspam.conf will prevent the use of multi-word (chained) tokens, which will also cut your database size considerably. You may also consider more frequent calls to dspam_clean -p to purge neutral data, which comprises a majority of most databases.

+

For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.

Line 772:

Line 855:

=== DSPAM TOOLS ===

-----

−

~~ ~~

A few useful tools have been provided to make DSPAM management a bit easier. These tools include:

dspam_admin

Line 782:

Line 864:

Syntax: dspam_train [username] [spam_dir] [nonspam_dir] where username is the username of the user to apply the training to, and the two dirs represent directories containing messages in individual files (e.g. maildir/corpus format). dspam_train can be used on an existing user's database, to further improve accuracy, or to train from scratch. It also provides a solid test jig for testing the efficiency and accuracy of a test corpus against the filter.

−

''NOTE:''

+

''NOTE:''

−

+

dspam_train will automatically balance training of the corpus to ensure both spam and nonspam are trained based on the ratio of spam/nonspam. This means if you have twice as much spam as nonspam, two spam messages will be trained for every nonspam message.
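To make the balancing idea concrete, here is a small Python sketch that interleaves two corpora in proportion to their sizes. It only illustrates the ratio described above; it is not dspam_train's actual algorithm, and the function name is hypothetical.

```python
def balanced_order(spam, nonspam):
    """Interleave two message lists in proportion to their sizes."""
    order = []
    si = ni = 0
    total_s, total_n = len(spam), len(nonspam)
    while si < total_s or ni < total_n:
        # Pick spam when nonspam is exhausted, or when spam is not ahead
        # of its proportional share (si / total_s <= ni / total_n).
        if ni >= total_n or (si < total_s and si * total_n <= ni * total_s):
            order.append(("spam", spam[si]))
            si += 1
        else:
            order.append(("nonspam", nonspam[ni]))
            ni += 1
    return order
```

With four spam messages and two nonspam messages, the resulting order contains two spam entries for every nonspam entry, spread across the run rather than grouped at either end.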

Line 795:

Line 876:

dspam_clean

Performs nightly housecleaning by deleting old or useless data from user data. dspam_clean performs the following operations:

+

1. Using the -s flag, dspam_clean will continue to perform stale signature purging. If an age is specified, for example -s14, the age defined as the default will be overridden. Specifying an age of 0 will delete all signatures for the users processed.

Line 804:

Line 886:

- Tokens which have only one spam hit

- Tokens which have only one innocent hit

−

+

Ages may be overridden by specifying a format such as -u30,15,10,10 where each number represents the respective age. Specifying an age of zero will delete all unused tokens in the category. Defaults are set in dspam.conf.

Line 827:

Line 909:

dspam_clean -s -p -u

+

''NOTE:''

+

You may wish to only run certain cleaning modes depending on the type of storage driver you are using. For example, the MySQL storage driver includes a script which performs signature and unused token operations, leaving only probability operations as useful. If you are using a SQL-based storage driver, it is strongly recommended that you use the maintenance scripts wherever possible for optimum efficiency.

−

~~''NOTE:''~~

−

You may wish to only run certain cleaning modes depending on the type of storage driver you are using. For example, the MySQL storage driver includes a script which performs signature and unused token operations, leaving only probability operations as useful. If you are using a SQL-based storage driver, it is strongly recommended that you use the maintenance scripts wherever possible for optimum efficiency.

−

dspam_stats

+

dspam_stats

Displays the spam statistics for one or all users on the system.

Syntax: dspam_stats [username]

Line 838:

Line 919:

−

dspam_genaliases

+

dspam_genaliases

Reads the /etc/passwd file and outputs a dspam aliases table which can be included in the master aliases table. You may try Art Sackett's generate_dspam_aliases tool at http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need some better functionality. This will eventually be merged in as a replacement for the existing tool.

−

+

−

dspam_merge

+

dspam_merge

Merges multiple users' dictionaries together into one user's dictionary (does not affect the merge users). This can be used to create a seeded dictionary for a new user, or to copy a single user's dictionary to a new file. This is great for building global dictionaries, but crunches a lot of time and disk.

−

~~=== AGENT COMMANDLINE ARGUMENTS ===~~

+

+

=== AGENT COMMANDLINE ARGUMENTS ===

+

-----

+

==== Specifying a User ====

The DSPAM agent (dspam) recognizes the following commandline arguments:

--user [user1 user2 ... userN]

Specifies the destination user(s) of the incoming message. DSPAM then processes the message once for each user individually. If the message is to be delivered, the $u (or %u) parameters of the arguments string will be interpolated for the current user being processed.
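The per-user interpolation can be pictured with the short Python sketch below. It is only an illustration of the substitution described above, not DSPAM's own code; the function name and the delivery agent arguments in the usage note are hypothetical.

```python
def delivery_commands(args_template, users):
    """Expand $u and %u in a delivery-agent argument string, once per user."""
    return [
        args_template.replace("$u", user).replace("%u", user)
        for user in users
    ]
```

For example, `delivery_commands("/usr/bin/procmail -d $u", ["bob", "alice"])` yields one argument string per destination user, matching the way the message is processed once for each user.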

−

~~ ~~

−

+

==== Classification ====

--class=[spam|innocent]

Tells DSPAM that the message being presented has already been classified by the user. This flag should be used when a misclassification has occurred, when the user is corpus-feeding a message, or an inoculation is being presented. This flag must be used in conjunction with the --source flag. Providing no classification invokes the SOP of DSPAM, which is to determine the message's nature on its own.

−

~~ ~~

+

==== Source ====

--source=[error|corpus|inoculation]

Wherever --class is used, the source of the user-provided classification must also be provided. The source is very important and dramatically affects DSPAM's training behavior:

−

~~ ~~

+

'''error:'''

The message being presented was a message previously misclassified by DSPAM. When 'error' is provided as a source, DSPAM requires that the DSPAM signature be present in the message, and will use the signature to recall the original training metadata. If the signature is not present, the message will be rejected. In this source mode, DSPAM will also decrement each token's previous classification's count as well as the user totals.

−

~~ ~~

−

You should use error only when DSPAM has made an error in classifying the message, and should present the modified version of the message with the DSPAM signature when doing so.

+

''You should use error only when DSPAM has made an error in classifying the message, and should present the modified version of the message with the DSPAM signature when doing so.''

−

~~ ~~

+

'''corpus:'''

The message being presented is from a mail corpus, and should be trained as a new message, rather than re-trained based on a signature. The message's full headers and body will be analyzed and the correct classification will be incremented, without its opposite being decremented.

−

~~ ~~

−

You should use corpus only when feeding messages in from corpus, not for correcting errors.

+

''You should use corpus only when feeding messages in from corpus, not for correcting errors. ''

+

'''inoculation:'''

The message being presented is in pristine form, and should be trained as an inoculation. Inoculations are a more intense mode of training designed to cause DSPAM to train the user's metadata repeatedly on previously unknown tokens, in an attempt to vaccinate the user against future messages similar to the one being presented.

−

~~ ~~

−

You should use inoculation only on honeypots and the like.

+

''You should use inoculation only on honeypots and the like.''
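Because --class must always be accompanied by --source, a wrapper script can validate the pair before calling the agent. The Python sketch below builds an argument list from the flags documented above; the helper name is hypothetical, and the sketch only assembles the arguments without running dspam.

```python
VALID_CLASSES = {"spam", "innocent"}
VALID_SOURCES = {"error", "corpus", "inoculation"}

def build_retrain_args(user, msg_class, source):
    """Return a dspam argument list for a user-classified message."""
    if msg_class not in VALID_CLASSES:
        raise ValueError("unknown class: %r" % msg_class)
    if source not in VALID_SOURCES:
        raise ValueError("unknown source: %r" % source)
    return ["dspam", "--user", user,
            "--class=" + msg_class, "--source=" + source]
```

The returned list could be handed to a process launcher with the message supplied on stdin; rejecting unknown values early avoids sending a message to the agent with a mistyped source.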

+

Line 886:

Line 969:

--deliver=[innocent,spam]

Tells DSPAM to deliver the message if its result falls within the criteria specified. For example, --deliver=innocent will cause DSPAM to only deliver the message if it classifies as innocent. Providing --deliver=innocent,spam will cause DSPAM to deliver the message regardless of its classification. This flag provides a significant amount of flexibility for nonstandard implementations, where false positives may not be delivered but spam is, and so on.
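The delivery rule amounts to a membership test, as the following Python sketch shows. It is a simplified model of the behavior described above, with a hypothetical function name, and does not reflect how the agent implements it internally.

```python
def should_deliver(result, deliver_option):
    """Return True if a classification result matches the --deliver criteria.

    deliver_option is the comma-separated value given to --deliver,
    for example "innocent" or "innocent,spam".
    """
    criteria = {item.strip() for item in deliver_option.split(",") if item.strip()}
    return result in criteria
```

With `"innocent"` as the option only innocent results are delivered, while `"innocent,spam"` delivers every message regardless of classification.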

−

~~ ~~

+

−

+

--stdout

If the message is indeed deemed "deliverable" by the --deliver flag, this flag will cause DSPAM to deliver the message to stdout, rather than the configured delivery agent.

−

~~ ~~

+

--process

Tells DSPAM to process the message. This is the default behavior, and the flag is implied unless --classify is used, but it is a good idea to specify it anyway to avoid ambiguity.

−

~~ ~~

+

--classify

Tells DSPAM only to classify the message, and not make any writes to the user's metadata or attempt to deliver/quarantine the message.

−

~~ ~~

+

''NOTE:''

The output of the classification is specific to the user, not including the output of any groups they might be affiliated with, so it is entirely possible that the message would be caught as spam by the group, even if it didn't appear in the classification. If you want to get the classification for the GROUP, use the group name as the user instead of an individual.

+

−

==== Signatures ====

--signature=[signature]

For some implementations, the admin may wish to pass the signature in via the command line instead of allowing DSPAM to find it on its own. This is especially useful when front-ending the agent with other tools. Using this option will set the active signature and will also forgo reading of stdin.

−

~~ ~~

−

+

==== Training Modes ====

--mode=[toe|tum|teft|notrain|unlearn]

Configures the training mode to be used for this process:

−

~~ ~~

+

===== TEFT =====

Train-Everything. Trains on all messages processed. This is a very thorough training approach and should be considered the standard training approach for most users. TEFT may, however, prove too volatile on installations with extremely high per-user traffic, or prove not very scalable on systems with extremely large user-bases. In the event that TEFT is proving ineffective, one of the other modes is recommended.

Line 924:

Line 1,005:

Until a user reaches 100 innocent messages in their metadata, train-on-error will also be teft-based, even if otherwise specified on the commandline.

+

===== TOE =====

Train-on-Error. Trains only on a classification error, once the user's metadata has matured to 2500 innocent messages. This training mode is much less resource intensive, as only occasional metadata writes are necessary. It is also far less volatile than the TEFT mode of training. One drawback, however, is that TOE only learns when DSPAM has made a mistake - which means the data is sometimes too static, and unable to "ease into" a different type of behavior.

−

~~ ~~

+

===== TUM =====

Train-until-Mature. This training mode is a hybrid between the other two training modes and provides a great balance between volatility and static metadata. TUM trains, on a per-token basis, only those tokens which have had fewer than 50 "hits", unless an error is being retrained, in which case all tokens are trained. This training mode provides a solid core of stable tokens to keep accuracy consistent, but also allows for dynamic adaptation to any new types of email behavior a user might be experiencing. It is a balance of resources as well, as only less-than-mature tokens are written to the database. NOTE: You should corpus train before using TUM.
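The difference between the three training modes described so far can be summarized as a small Python sketch. It is an assumption-laden illustration, not DSPAM's implementation: the function name tokens_to_train and the dictionary of per-token hit counts are hypothetical, and the sketch ignores the maturity thresholds mentioned above (for example, TOE behaving like TEFT until the user's metadata has matured).

```python
MATURE_HITS = 50  # per the text: TUM only trains tokens with fewer than 50 hits


def tokens_to_train(mode, token_hits, is_error):
    """Return the set of tokens that would be trained for one message.

    mode: "teft", "toe" or "tum"
    token_hits: dict mapping each token in the message to its current hit count
    is_error: True when the message is being retrained after a misclassification

    Hypothetical helper for illustration only; not part of DSPAM.
    """
    mode = mode.lower()
    if mode == "teft":
        # Train-Everything: every processed message trains all of its tokens.
        return set(token_hits)
    if mode == "toe":
        # Train-on-Error: nothing is written unless a mistake is being corrected.
        return set(token_hits) if is_error else set()
    if mode == "tum":
        # Train-until-Mature: errors retrain everything, otherwise only
        # tokens that have not yet reached maturity are trained.
        if is_error:
            return set(token_hits)
        return {tok for tok, hits in token_hits.items() if hits < MATURE_HITS}
    raise ValueError("unsupported mode: %r" % mode)


if __name__ == "__main__":
    hits = {"viagra": 120, "meeting": 3, "invoice": 49}
    for m in ("teft", "toe", "tum"):
        print(m, sorted(tokens_to_train(m, hits, is_error=False)))
```

The sketch makes the resource trade-off visible: TEFT writes every token, TOE writes nothing on a correct classification, and TUM writes only the tokens that are still immature.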

−

~~ ~~

+

===== NOTRAIN =====

No training. Do not train the user's data, and do not keep totals. This should only be used in cases where you want to process mail for a particular user (based on a group, for example), but don't want the user to accumulate any learning data.

−

~~ ~~

+

===== UNLEARN =====

Unlearn original training. Use this if you wish to unlearn a previously learned message. Be sure to specify --source=error and set --class to the classification under which the message was originally learned. If not using TrainPristine, this will require the original signature from training.

−

~~ ~~

+

'''RECOMMENDATIONS'''

In general, it is recommended that users begin with TEFT. If a user is experiencing a spam ratio between 75% and 85%, they may benefit from Train-until-Mature mode. If a user is experiencing over 90% spam, then Train-on-Error mode should make a noticeable improvement in accuracy. It eventually boils down to what works best for your users. There is no reason a system could not be configured (with a script) to analyze a user's *.stats file and determine the best training mode for that user.
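A script of the kind suggested above might apply the thresholds like this. The sketch is hypothetical: it takes message counts as plain numbers rather than reading a *.stats file (whose layout is not described here), and because the text gives no advice for ratios between 85% and 90%, it assumes TEFT for anything outside the two named ranges.

```python
def recommend_mode(spam_count, innocent_count):
    """Suggest a training mode from a user's spam ratio.

    Thresholds follow the recommendations in the text. Falling back to
    "teft" for every ratio not explicitly covered is an assumption of
    this sketch, not documented DSPAM behavior.
    """
    total = spam_count + innocent_count
    if total == 0:
        return "teft"  # no data yet: start with the standard mode
    ratio = spam_count / total
    if ratio > 0.90:
        return "toe"
    if 0.75 <= ratio <= 0.85:
        return "tum"
    return "teft"


if __name__ == "__main__":
    for spam, innocent in ((80, 20), (95, 5), (50, 50)):
        print(spam, innocent, recommend_mode(spam, innocent))
```

The result could then be stored as the user's trainingMode preference or passed on the command line with --mode.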

−

~~ ~~

−

+

==== Features ====

−

--feature=[~~noise~~,~~whitelist~~,tb=N]

+

--feature=[no,wh,tb=N]

Specifies the features that should be activated for this filter instance. The following features may be used individually or combined using a comma as a delimiter:

−

~~ ~~

−

~~'''noise:''' ~~

−

Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in at 2500 innocent messages and provides an advanced progressive noise logic to reduce Bayesian Noise (wordlist attacks) in spams. See http://bnr.nuclearelephant.com for more information. BNR is not for everyone, and so users should try it out after they've trained to see if it helps improve accuracy.

−

~~ ~~

−

'''tb=N:''~~' ~~

+

''no:''

−

Sets the training loop buffering level. Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop. The training buffer sets the buffer sensitivity, and should be a number between 0 (no buffering whatsoever) to 10 (heavy buffering). The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0. This feature should be disabled if you're not paranoid about false positives, as it does increase the number of spam misses significantly during training.

+

−

~~ ~~

+

Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in at 2500 innocent messages and provides an advanced progressive noise logic to reduce Bayesian Noise (wordlist attacks) in spams. See http://bnr.nuclearelephant.com for more information. BNR is not for everyone, and so users should try it out after they've trained to see if it helps improve accuracy.

+

''tb=N:''

+

Sets the training loop buffering level. Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop. The training buffer sets the buffer sensitivity, and should be a number from 0 (no buffering whatsoever) to 10 (heavy buffering). The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0. This feature should be disabled if you're not paranoid about false positives, as it does increase the number of spam misses significantly during training.

+

''wh:''

−

~~'''whitelist:''' ~~

Automatic whitelisting. DSPAM will keep track of the entire "From:" line for each message received per user, and automatically whitelist messages from senders with more than 10 innocent messages and zero spams. Once the user reports a spam from the sender, automatic whitelisting will automatically be deactivated for that sender. Since DSPAM uses the entire "From:" line, and not just the sender's email address, automatic whitelisting is a very safe approach to improving accuracy during initial training.
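The whitelisting rule can be illustrated with a short Python sketch. It models only what the paragraph above states (more than 10 innocent messages, zero spams, keyed on the entire "From:" line); the class AutoWhitelist and its methods are hypothetical and do not reflect how DSPAM stores this data.

```python
from collections import defaultdict

WHITELIST_THRESHOLD = 10  # per the text: more than 10 innocent messages


class AutoWhitelist:
    """Toy per-user whitelist keyed on the entire From: line (illustration only)."""

    def __init__(self, threshold=WHITELIST_THRESHOLD):
        self.threshold = threshold
        self.innocent = defaultdict(int)
        self.spam = defaultdict(int)

    def record(self, from_line, is_spam):
        """Count one message from this exact From: line."""
        if is_spam:
            self.spam[from_line] += 1
        else:
            self.innocent[from_line] += 1

    def is_whitelisted(self, from_line):
        """Whitelisted only with more than `threshold` innocents and zero spams."""
        return (self.innocent.get(from_line, 0) > self.threshold
                and self.spam.get(from_line, 0) == 0)


if __name__ == "__main__":
    wl = AutoWhitelist()
    sender = '"Jane Doe" <jane@example.org>'
    for _ in range(11):
        wl.record(sender, is_spam=False)
    print(wl.is_whitelisted(sender))
    wl.record(sender, is_spam=True)
    print(wl.is_whitelisted(sender))
```

Because the key is the whole "From:" line, a message using the same address with a different display name is counted separately and is not covered by the existing whitelist entry.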

−

~~ ~~

−

'''NOTE:'''

+

''NOTE:''

None of the present features are necessary when the source is "error", because the original training data is used from the signature to retrain, instantiating whatever features (such as whitelisting) were active at the time of the initial classification. Since BNR is only necessary when a message is being classified, the --feature flag can be safely omitted from error source calls.

+

Line 969:

Line 1,054:

--daemon

Puts DSPAM in daemon mode; i.e., DSPAM acts as a server when started with this parameter. See section 2.3 for more information about daemon mode.

+

== LINKING WITH LIBDSPAM ==

−

----

−

+

Developers are able to link to the DSPAM core engine (libdspam) to provide "drop-in" spam-filtering for their applications. Examples of the libdspam API can be found in the example.c file included with this distribution.

−

+

−

Developers are able to link to the DSPAM core engine (libdspam) to provide

+

−

"drop-in" spam-filtering for their applications. Examples of the libdspam

+

−

API can be found in the example.c file included with this distribution.

+

−

+

IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE

IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE

Line 987:

Line 1,068:

NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL

LICENSE.

−

+

COMMERCIAL LICENSING BENEFITS:

- PRIORITY DEVELOPER SUPPORT

Line 993:

Line 1,074:

- NON-GPL REDISTRIBUTION PRIVILEGES

- BUG AND FEATURE REQUEST PRIORITY

−

+

−

Please contact the author at ~~[email protected]~~ for information

+

Please contact the author at 'to be determined' for information

about commercial licensing.

−

+

</COMMERCIAL LICENSING>

−

+

To link to libdspam, follow the instructions for compiling and installing

DSPAM. When compiled, the libdspam static and shared libraries are also

built. This library contains all the functions necessary to use dspam's

filtering in your application.

−

+

Your application will also need to link to the correct storage driver

libraries. If you are using libdspam in a multithreaded application, you

will need to either use a thread-safe storage driver or control access to

libdspam using a mutex lock.

−

+

If you are using libdspam in a multithreaded environment, each thread will

require its own DSPAM context. Fortunately, you can attach the same

database handle to each context using dspam_attach(). See the man page for

more information.

−

+

To build with the dspam API, you will also need the header files from

the distribution. You can copy these to /usr/include/dspam for ease of

use, and then use -I/usr/include/dspam

−

+

Please see example.c for API examples.

−

+

If you are interested in linking libdspam with your project and have

questions or concerns, please contact the dspam-dev mailing list.

−

+

=== CONFIGURING GROUPS ===

+

-----

+

Groups enable a group of users to share information. The following group types are supported:

−

~~Groups enable a group of~~ users to share ~~information~~. The ~~following~~

+

−

group ~~types~~ are ~~supported:~~

+

==== SHARED GROUPS ====

+

Enables users with similar email behavior to share the same dictionary while still maintaining a private quarantine box. The benefits of this type of group are faster learning, and sharing a single spam alias. Shared groups can have both positive and negative effects on accuracy. If a shared group consists of users with similar, predictable email behavior, the users in the group can benefit from a larger dictionary of spam and faster learning (especially for newcomers in the group). If a group consists of users with different email behavior, however, the users in the group will experience poor spam filtering and a higher number of false positives.

−

~~SHARED~~

+

''NOTE:''

−

~~Enables users with similar email behavior to share the same dictionary~~

+

The SQL-based storage drivers support shared groups, but have one caveat:

−

~~while still maintaining a private quarantine box.~~ The ~~benefits of this~~

+

If you are NOT enabling "virtual users" support, you will need to create an actual user on your system named after each group you create.

−

~~type of group are faster learning, and sharing a single spam alias. Shared~~

+

−

groups ~~can have both positive and negative effects on accuracy. If a shared~~

+

−

~~group consists of users with similar~~, ~~predictable email behavior, the users~~

+

−

~~in the group can benefit from a larger dictionary of spam and faster~~

+

−

~~learning (especially for newcomers in the group).~~ If ~~a group consists of~~

+

−

users ~~with different email behavior~~, ~~however, the users in the~~ group ~~will~~

+

−

~~experience poor spam filtering and a higher number of false positives~~.

+

−

~~NOTE~~

+

On top of shared group support, a shared group can also be made 'managed'. Using the group type 'SHARED,MANAGED' will cause the group to share a single quarantine mailbox which could be managed by the group's administrator. This would enable one individual to monitor quarantine for the entire group; however, personal emails marked as false positives could potentially be viewed as well. For this reason, managed groups should only be used when this is not an issue.

−

~~The SQL-based storage drivers support~~ shared ~~groups, but has one caveat:~~

+

−

~~If you are NOT enabling "virtual users"~~ support, ~~you~~ will ~~need~~ to ~~create~~

+

−

~~an actual user on your system named after each~~ group ~~you create~~.

+

−

~~On top of shared~~ group ~~support~~, ~~a shared group can also be made to be~~

+

−

~~'managed'. Using~~ the group ~~type 'SHARED,MANAGED'~~ will ~~cause the group to~~

+

==== INOCULATION GROUPS ====

−

~~share a single quarantine mailbox which could be managed by the group's~~

+

An inoculation group allows users to maintain their own private dictionaries with their own spam alias, but all members of the group will inoculate other members with spams they manually forward into their alias. This allows users to report spams to one another and maintain their own private dictionary. Another advantage to this is that users do not necessarily have to share the same email behavior.

−

~~administrator~~. This ~~would enable~~ one ~~individual to monitor quarantine for~~

+

−

~~the entire group, however personal emails marked as false positives could~~

+

−

~~potentially be viewed as well~~. ~~For this reason, managed groups should only~~

+

−

~~be used when~~ this is not ~~an issue~~.

+

−

~~INOCULATION~~

+

''NOTE:''

−

~~An inoculation group allows users to maintain their own private dictionaries~~

+

Users should only be added to an inoculation group after their initial learning period, to avoid potential false positives due to lack of data.

−

~~with their own spam alias, but all members of the group will inoculate other~~

+

−

~~members with spams they manually forward into their alias. This allows~~

+

−

~~users to report spams to one another and maintain their own private~~

+

−

~~dictionary. Another advantage to this is that users do not necessarily have~~

+

−

~~to share the same email behavior.~~

+

−

+

−

NOTE: Users should only be added to an inoculation group after their initial

+

−

learning period, to avoid potential false positives due to lack of data.

+

To create groups, you'll want to create a file with the filename 'group'

Line 1,082:

Line 1,144:

group.

−

CLASSIFICATION

+

+

==== CLASSIFICATION GROUPS ====

Classification groups allow a group of users to network their results

together. If DSPAM is uncertain of whether a message is spam or nonspam for

Line 1,128:

Line 1,191:

established between both parties.

−

GLOBAL GROUPS

+

−

+

==== GLOBAL GROUPS ====

Global groups allow DSPAM to provide a "SpamAssassin type out-of-the-box

filtering" for all new users until they have built their own useful

Line 1,146:

Line 1,209:

treated just as any other user on the system.

−

NOTE: Be sure and set your global user's preferences so that trainingMode

+

''NOTE:''

−

is set to TOE. This will prevent the purge tools you use from

+

Be sure to set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from emptying the global user's data in 90 days.

−

purging them empty in 90 days.

+

−

+

−

~~MERGED GROUPS~~

+

+

==== MERGED GROUPS ====

Merged groups are similar to global groups in that the entire system uses

a single global user as a parent. What's different is that the global

Line 1,183:

Line 1,245:

the group.

−

NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering,

+

''NOTE:''

−

but allowing users to build their own data from scratch will still

+

Merged Groups are great for providing out-of-the-box adaptive filtering, but allowing users to build their own data from scratch will still result in the best possible accuracy in the long run.

−

result in the best possible accuracy in the longrun.

+

−

+

−

~~NOTE: Be sure and set your global user's preferences so that trainingMode~~

+

−

~~is set to TOE. This will prevent the purge tools you use from~~

+

−

~~purging them empty in 90 days~~.

+

''NOTE:''

+

Be sure to set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from emptying the global user's data in 90 days.

−

IMPORTANT!

+

''' IMPORTANT! '''

If you are running dspam_clean, be sure to set a preference for your merged

Line 1,199:

Line 1,258:

out your entire merged group user's dataset, since it's old).

+

=== EXTERNAL INOCULATION THEORY ===

−

+

-----

−

Bill Yerazunis recently expressed his theory of inoculation on an anti-spam

+

Bill Yerazunis recently expressed his theory of inoculation on an anti-spam development list, using the term "vaccination":

−

development list, using the term "vaccination":

+

"Part of the problem is that spam isn't stationary, it evolves. That

Line 1,282:

Line 1,341:

harvester bots, making them obsolete as counter-productive tools.

+

=== CLIENT/SERVER MODE ===

+

-----

+

DSPAM supports two different modes of operation. In standard operating mode, the DSPAM agent is called by the MTA (or proxy) and each agent process operates independently, establishing its own connection to a database and performing delivery on its own. The second operating mode, client/server mode, allows the DSPAM agent to act more like a thin client, connecting to the DSPAM server process which then does all the work of analyzing and delivering or quarantining the message. The advantages to using DSPAM in client/server mode are:

+

* Maintaining a set of stateful database connections (within the server), which should enhance performance on some systems by eliminating the need to establish a new database connection for every message processed.

−

~~DSPAM supports two different modes~~ of ~~operation~~. ~~In standard operating~~

+

* Providing a central point of processing. Having one server perform all processing and delivery, while having multiple thin clients on your mail servers may be more desirable than having multiple agents performing processing and delivery on all your servers.

−

~~mode, the DSPAM agent is called by the MTA (or proxy)~~ and ~~each agent process~~

+

* The DSPAM server speaks LMTP, which some implementations may be able to take advantage of, eliminating the need for the DSPAM client altogether.

−

~~performs independently~~, ~~establishing its own connection to a database~~ and

+

* Having a single multithreaded daemon should use less memory and other resources than having independently operating clients.

−

~~performs~~ delivery on ~~its own~~. The ~~second operating mode, client/~~server ~~mode~~,

+

−

~~allows the DSPAM agent~~ to ~~act more like a thin client~~, ~~connecting to~~ the

+

−

DSPAM ~~server process which then does~~ all ~~the work of analyzing~~ and ~~delivering~~

+

−

~~or quarantining the message~~. ~~The advantages to using DSPAM in client/server~~

+

−

~~mode are:~~

+

−

~~- Maintaining a set of stateful database connections (within the server),~~

−

~~which should enhance performance on some systems by eliminating the need~~

−

~~to establish a new database connection for every message processed.~~

−

~~- Providing a central point of processing. Having one server perform all~~

+

If you've already got DSPAM set up, client/server mode won't require any changes to your mail server's configuration - it's completely transparent.

−

~~processing and delivery~~, ~~while having multiple thin clients on~~ your mail

+

−

~~servers may be more desirable than having multiple agents performing~~

+

−

~~processing and delivery on all your servers~~.

+

−

~~- The DSPAM server speaks LMTP, which some implementations may be able to~~

−

~~take advantage of, eliminating the need for the DSPAM client all together.~~

−

- ~~Having a single multithreaded~~ daemon ~~should~~ use ~~less memory~~ and ~~other~~

+

The DSPAM agent can be compiled with client/server support by configuring with --enable-daemon. You will need to use a multithread-safe storage driver (presently mysql_drv, pgsql_drv, and hash_drv are supported). Once you have compiled with daemon support, you'll need to modify your dspam.conf to provide the settings necessary for client/server mode:

−

~~resources than having independently operating clients~~.

+

−

~~If you've already got DSPAM set up, client/server mode won't require any~~

+

ServerHost 127.0.0.1

−

~~changes~~ to ~~your mail server's configuration - it's completely transparent~~.

+

The host to listen on. By default this setting is commented out, which forces DSPAM to listen on all available interfaces.

−

~~The DSPAM agent can be compiled with client/server support by configuring~~

−

~~with --enable-daemon. You will need to use a multithread-safe storage driver~~

−

~~(presently mysql_drv and pgsql_drv are supported). Once you have compiled~~

−

~~with daemon support, you'll need to modify your dspam.conf to provide the~~

−

~~settings necessary for client/server mode:~~

−

ServerPort 24

+

ServerPort 24

+

The port to listen on. The default is 24, the LMTP port.

−

~~The port to listen on. The default is 24, the LMTP port.~~

−

ServerQueueSize 32

+

ServerQueueSize 32

+

The maximum number of connections which may remain backlogged before they are accepted.

−

~~The maximum number of connections which may remain backlogged before they~~

−

~~are accepted.~~

−

ServerPass.Relay1 "secret"

+

ServerPass.Relay1 "secret"

−

ServerPass.Relay2 "password"

+

ServerPass.Relay2 "password"

+

Each client server allowed to connect should have its own password. They can be defined here.

−

~~Each client server allowed to connect should have its own password. They~~

−

~~can be defined here.~~

−

The DSPAM server can listen on either a network socket or a local unix

+

The DSPAM server can listen on either a network socket or a local unix domain socket. If you're running the client and server on the same machine, a domain socket should be used as it eliminates additional overhead. To use a domain socket, you'll also need to add the following option:

−

domain socket. If you're running the client and server on the same machine,

+

−

a domain socket should be used as it eliminates additional overhead. To use

+

−

a domain socket, you'll also need to add the following option:

+

−

ServerDomainSocketPath "/tmp/dspam.sock"

+

ServerDomainSocketPath "/tmp/dspam.sock"

−

~~Once you've configured the server config, you'll want to set the client~~

−

~~configuration on all client machines. If you are using network sockets,~~

−

~~set the following to appropriate values:~~

−

ClientHost 127.0.0.1

+

Once you've configured the server, you'll want to set the client configuration on all client machines. If you are using network sockets, set the following to appropriate values:

−

ClientPort 24

+

ClientHost 127.0.0.1

+

ClientPort 24

−

Or if using a domain socket:

+

Or if using a domain socket:

+

ClientHost /tmp/dspam.sock

−

~~ClientHost /tmp/dspam.sock~~

+

In both cases, you'll need to set the client's authentication ident:

+

ClientIdent "secret@Relay1"
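Judging from the sample settings above, the ident pairs a password with a relay name: ClientIdent "secret@Relay1" corresponds to ServerPass.Relay1 "secret". The Python sketch below illustrates that relationship under the assumption that the ident always has the form password@name, split at the last "@". The helper names are hypothetical and this is not the server's actual authentication code.

```python
def parse_ident(ident):
    """Split an ident of the assumed form 'password@relayname'."""
    password, sep, relay = ident.rpartition("@")
    if not sep or not password or not relay:
        raise ValueError("ident must look like password@relayname")
    return password, relay


def ident_is_valid(ident, server_pass):
    """Check an ident against a dict modelling the ServerPass.<name> settings.

    server_pass maps relay names to passwords, e.g. {"Relay1": "secret"}.
    Hypothetical helper for illustration only; not part of DSPAM.
    """
    try:
        password, relay = parse_ident(ident)
    except ValueError:
        return False
    expected = server_pass.get(relay)
    return expected is not None and password == expected


if __name__ == "__main__":
    passwords = {"Relay1": "secret", "Relay2": "password"}
    print(ident_is_valid("secret@Relay1", passwords))
    print(ident_is_valid("password@Relay1", passwords))
```

A mismatch on either side, such as the wrong password for a known relay or an unknown relay name, would be rejected in this model.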

−

~~In both cases, you'll need to set the client's authentication ident:~~

−

~~ClientIdent "secret@Relay1"~~

+

Now you're ready to go. To start the DSPAM server, run:

+

dspam --daemon &

−

~~Now you're ready to go. To start the DSPAM server~~, ~~run~~:

+

Or alternatively, if you have debugging enabled:

+

dspam --debug --daemon &

−

~~dspam --daemon &~~

−

~~Or alternatively,~~ if you ~~have debugging enabled~~:

+

The DSPAM agent can then be called just as in standard (non-client/server) mode, with --client added to the set of parameters. With --client, the client settings are loaded from dspam.conf and the agent acts as a thin client. Running dspam without --client specified will cause DSPAM to revert to its normal non-daemon behavior and establish database connections on its own. For example:

+

dspam --client --user dick jane --deliver=innocent -d %u

−

dspam --~~debug~~ --~~daemon &~~

+

Alternatively, if you'd like to use a thinner client, dspamc is identical to the dspam binary in behavior, but has been stripped down to only include the lightweight client.

+

dspamc --client --user dick jane --deliver=innocent -d %u

−

~~The DSPAM agent can then be called the same as if you were running in~~

−

~~standard (non-client/server) mode and adding --client to the set of~~

−

~~parameters. Running dspam without --client specified will cause DSPAM to~~

−

~~revert to its normal non-daemon behavior and establish database connections~~

−

~~on its own. The client settings will be loaded from dspam.conf, and the~~

−

~~agent will act as a thin client instead. For example:~~

−

~~dspam~~ --~~client~~ --~~user dick jane~~ --deliver=innocent -d %u

+

The conversation that takes place between the client/server is LMTP-based, and will look like this:

+

SERVER> 220 DSPAM DLMTP 3.4.0 Authentication Required

+

CLIENT> LHLO Relay1

+

SERVER> 250-PIPELINING

+

SERVER> 250-ENHANCEDSTATUSCODES

+

SERVER> 250-DSPAMPROCESSMODE

+

SERVER> 250 SIZE

+

CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"

+

SERVER> 250 2.1.0 OK

+

CLIENT> RCPT TO: dick

+

SERVER> 250 2.1.5 OK

+

CLIENT> RCPT TO: jane

+

SERVER> 250 2.1.5 OK

+

CLIENT> DATA

+

SERVER> 354 Enter mail, end with "." on a line by itself

+

CLIENT> Subject: Cheap Viagra!

+

CLIENT>

+

CLIENT> Click Here: <nowiki>http://www.cheapviagra.com</nowiki>

+

CLIENT> .

+

SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT

+

SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
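A front-end that reads the server's final replies could pull out the per-recipient result roughly as follows. This Python sketch is based solely on the two sample reply lines in the conversation above; the exact wording of real server replies may differ, and the function name parse_recipient_reply is hypothetical.

```python
import re

# Matches lines shaped like the sample replies, for example:
#   250 2.0.0 <dick> Message accepted for delivery: INNOCENT
REPLY_RE = re.compile(r"^(\d{3}) (\d\.\d\.\d) <([^>]+)> .*: (\w+)$")


def parse_recipient_reply(line):
    """Return code, status, recipient and result for one reply line.

    Returns None for lines that do not have the per-recipient shape,
    such as "250 2.1.5 OK". Illustration only; not part of DSPAM.
    """
    match = REPLY_RE.match(line.strip())
    if match is None:
        return None
    code, status, recipient, result = match.groups()
    return {
        "code": int(code),
        "status": status,
        "recipient": recipient,
        "result": result,
    }


if __name__ == "__main__":
    for reply in (
        "250 2.0.0 <dick> Message accepted for delivery: INNOCENT",
        "250 2.0.0 <jane> Message accepted for delivery: SPAM",
        "250 2.1.5 OK",
    ):
        print(parse_recipient_reply(reply))
```

Note that one reply is issued per recipient, so the same message can be reported as INNOCENT for one user and SPAM for another.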

−

~~Alternatively, if you'd like to use a thinner client, dspamc is identical~~

−

~~to the dspam binary in behavior, but has been stripped down to only include~~

−

~~the lightweight client.~~

−

~~dspamc~~ --~~client --user dick jane~~ --deliver~~=innocent -d %u~~

+

Optionally, if you'd like the clients to perform delivery, you can use DSPAM's --stdout or --classify functionality to obtain a dump of the message or results, respectively. From there, it's up to you and your MTA to deliver the message. The DSPAM client will output the results to stdout in this case, just as it would in standard operating mode.

−

~~The conversation that takes place between the client/server is LMTP-based,~~

−

~~and will look like this:~~

−

~~SERVER> 220 DSPAM DLMTP 3.4.0 Authentication Required~~

+

Once the server is running, its configuration can be reloaded with a SIGHUP.

−

~~CLIENT> LHLO Relay1~~

+

When the daemon is reloaded, the following occurs:

−

~~SERVER> 250-PIPELINING~~

+

* The daemon stops listening for new requests

−

~~SERVER> 250-ENHANCEDSTATUSCODES~~

+

* All threads are allowed to finish processing and exit

−

~~SERVER> 250-DSPAMPROCESSMODE~~

+

* All connections to the database are closed

−

~~SERVER> 250 SIZE~~

+

* The dspam.conf configuration is reloaded

−

~~CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"~~

+

* All connections to the database are re-opened

−

~~SERVER> 250 2.1.0 OK~~

+

* The daemon starts listening for new requests

−

~~CLIENT> RCPT TO: dick~~

+

−

~~SERVER> 250 2.1.5 OK~~

+

−

~~CLIENT> RCPT TO: jane~~

+

−

~~SERVER: 250 2.1.5 OK~~

+

−

~~CLIENT> DATA~~

+

−

~~SERVER> 354 Enter mail~~, ~~end~~ with ~~"." on~~ a ~~line by itself~~

+

−

~~CLIENT> Subject: Cheap Viagra!~~

+

−

~~CLIENT>~~

+

−

~~CLIENT> Click Here: http://www~~.~~cheapviagra.com~~

+

−

~~CLIENT> .~~

+

−

~~SERVER> 250 2.0.0 <dick> Message accepted for delivery~~: ~~INNOCENT~~

+

−

~~SERVER> 250 2.0.0 <jane> Message accepted~~ for ~~delivery: SPAM~~

+

−

+

−

~~Optionally, if you'd like the clients~~ to ~~perform delivery, you can use~~

+

−

~~DSPAM's --stdout or --classify functionality~~ to ~~obtain a dump of~~ the ~~message~~

+

−

~~or results, respectively~~. ~~From there, it's up to you and your MTA~~ to

+

−

~~deliver the message.~~ The ~~DSPAM client will output the results to stdout in~~

+

−

~~this case, just as it would in standard operating mode.~~

+

−

~~Once the server is running, its configuration can~~ be reloaded ~~with a SIGHUP~~.

+

This allows database and listener configurations to also be reloaded from dspam.conf without the need to interrupt the process.

−

~~When~~ the ~~daemon is reloaded,~~ the ~~following occurs:~~

+

−

~~- The daemon stops listening for new requests~~

−

~~- All threads are allowed to finish processing and exit~~

−

~~- All connections to the database are closed~~

−

~~- The dspam.conf configuration is reloaded~~

−

~~- All connections to the database are re-opened~~

−

~~- The daemon starts listening for new requests~~

−

~~This allows database and listener configurations to also be reloaded from~~

+

''NOTE:''

−

~~dspam.conf without the need to interrupt the process.~~

+

During the period of time the daemon is reloading, client connections will fail. Depending on how the MTA reacts, this may cause messages to fall back to queue or to bounce.

−

+

−

NOTE: During the period of time the daemon is reloading, client connections

+

−

will fail. Depending on how the MTA reacts, this may cause messages to

+

−

fall back to queue or to bounce.

+

=== LMTP ===

−

+

-----

−

DSPAM supports LMTP both on the front-end and back-end (delivery). This

+

DSPAM supports LMTP both on the front-end and back-end (delivery). This section will briefly provide instructions for configuring either or both of these advanced options.

−

section will briefly provide instructions for configuring either or both of

+

−

these advanced options.

+

LMTP (AND SMTP) DELIVERY

Line 1,526:

Line 1,547:

In both cases, the content provided between < > is what is actually used.

+

=== DSPAM USER PREFERENCES ===

+

-----

+

Preferences are settings that can be configured globally in dspam.conf or for individual users via the dspam_admin command.

+

trainingMode { TOE | TUM | TEFT | NOTRAIN }

+

How DSPAM should train messages it analyzes. See section 1.5 --mode (default:teft, see dspam.conf)

+

spamAction { quarantine | tag | deliver }

+

What to do with spam. The tag and deliver options both deliver, but tag adds a special prefix to the subject, whereas deliver merely sets X-DSPAM-Result. (default:quarantine)

+

−

~~Preferences are settings that can be configured globally in dspam.conf or~~

+

spamSubject

−

~~for individual users via the dspam_admin command~~.

+

A customized subject to prefix when spamAction=tag. (default:[SPAM])

−

~~trainingMode { TOE | TUM | TEFT | NOTRAIN }~~

−

~~How DSPAM should train messages it analyzes. See section 1.5 --mode~~

−

~~(default:teft, see dspam.conf)~~

−

~~spamAction~~ { ~~quarantine | tag | deliver~~ }

+

statisticalSedation { 0 - 10 }

−

~~What to do with spam.~~ The ~~tag and deliver options both deliver~~, ~~but tag~~

+

The level of dampening during training (0-10, 0 = no dampening, default:0)

−

~~adds a special prefix to the subject~~, ~~whereas deliver merely sets~~

+

−

~~X-DSPAM-Result. (~~default:~~quarantine~~)

+

−

~~spamSubject~~

−

~~A customized subject to prefix when spamAction=tag. (default:[SPAM])~~

−

~~statisticalSedation~~ { ~~0 - 10~~ }

+

enableBNR { on | off }

−

~~The level of dampening during training~~ (~~0-10, 0 = no dampening,~~ default:0)

+

Enables or disables bayesian noise reduction (default:off)

−

~~enableBNR { on | off }~~

−

~~Enables or disables bayesian noise reduction (default:off)~~

−

enableWhitelist { on | off }

+

enableWhitelist { on | off }

−

Enables or disables automatic whitelisting (default:on)

+

Enables or disables automatic whitelisting (default:on)

−

~~signatureLocation { message | headers }~~

−

~~Where to place the DSPAM signature. Placement affects forwarding approach.~~

−

~~(default:message)~~

−

~~tagSpam / tagNonspam~~ { on | ~~off~~ }

+

signatureLocation { message | headers }

−

~~Adds a tagline~~ to the ~~end of a message based on its classification; useful~~

+

Where to place the DSPAM signature. Placement affects forwarding approach. (default:message)

−

~~for things such as "Scanned by Your ISP~~.~~com". If set to on, the file~~

+

−

~~msgtag.spam and/or msgtag.nonspam will be looked for in dspam_home/txt/~~

+

−

~~and appended to appropriate messages~~.

+

−

~~NOTE: Signed messages will not be tagged in this fashion~~

−

~~showFactors~~ { on | off }

+

tagSpam / tagNonspam { on | off }

−

~~Whether~~ to ~~include an X-DSPAM-Factors header including decision-making~~

+

Adds a tagline to the end of a message based on its classification; useful for things such as "Scanned by Your ISP.com". If set to on, the file msgtag.spam and/or msgtag.nonspam will be looked for in dspam_home/txt/ and appended to appropriate messages.

−

~~factors (clues)~~. ~~NOTE: This can break RFC in some cases~~, and ~~should only~~

+

−

be ~~used~~ for ~~debugging~~. ~~(default:off)~~

+

−

~~optIn / optOut { on | off }~~

+

''NOTE:''

−

~~Depending on whether the system is opt-in or opt-out, sets the user~~'s

+

Signed messages will not be tagged in this fashion

−

~~membership. If user is opted out (or not opted in), mail~~ will be ~~delivered~~

+

−

~~by DSPAM without being processed.~~

+

−

~~whitelistThreshold { Integer }~~

−

~~Overrides the default number of times a From: header has been seen before~~

−

~~it is automatically whitelisted. (default:10)~~

−

~~makeCorpus~~ { on | off }

+

showFactors { on | off }

−

~~When activated, a maildir~~-~~style corpus is maintained in the user's data~~

+

Whether to include an X-DSPAM-Factors header including decision-making factors (clues). NOTE: This can break RFC in some cases, and should only be used for debugging. (default:off)

−

~~directory~~ (~~DSPAM_HOME/DATA/USERNAME~~), ~~suitable~~ for ~~future retraining or~~

+

−

~~other analysis~~. (default:off)

+

−

~~storeFragments { on | off }~~

−

~~When activated, the first 1k of each message are temporarily stored on~~

−

~~the server for reference via the webui's history function. (default:off)~~

−

~~localStore~~ { on | off }

+

optIn / optOut { on | off }

−

~~Overrides~~ the ~~directory name used for~~ the user's ~~dspam data directory~~. ~~This~~

+

Depending on whether the system is opt-in or opt-out, sets the user's membership. If user is opted out (or not opted in), mail will be delivered by DSPAM without being processed.

−

is ~~useful when using recipient addresses as usernames, as it will allow~~

+

−

~~all addresses belonging to a specific user to be written to a single~~

+

−

~~webui directory.~~ (~~default:username~~)

+

−

+

−

~~processorBias { on | off }~~

+

−

~~Overrides the "bias" setting in dspam.conf~~, ~~which biases~~ mail as

+

−

~~innocent~~. ~~(default:on, see dspam.conf)~~

+

−

~~fallbackDomain { on | off }~~

−

~~Allows a dspam user ("@domain.com") to be marked as a fallback user for~~

−

~~the entire domain, so if the destination dspam user does not exist in~~

−

~~the database, the fallback user's database will be used. The~~

−

~~dspam.conf "FallbackDomains" setting must also be "on". (default:off)~~

−

~~NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.~~

−

~~trainPristine~~ { ~~on | off~~ }

+

whitelistThreshold { Integer }

−

~~Override's~~ the default ~~signature mode and treats messages as if they were~~

+

Overrides the default number of times a From: header has been seen before it is automatically whitelisted. (default:10)
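As a rough illustration of what the threshold controls, the sketch below counts sightings of a From: header and reports it as whitelisted once the count reaches the threshold. This is a simplified model under stated assumptions: the class name is hypothetical, the boundary (whitelisted at exactly the threshold) is a guess, and any further conditions DSPAM applies before whitelisting are not modeled.

```python
from collections import Counter


class FromWhitelist:
    """Toy model of automatic whitelisting driven by whitelistThreshold."""

    def __init__(self, threshold=10):
        self.threshold = threshold  # documented default is 10
        self.seen = Counter()       # From: header -> number of sightings

    def record(self, from_header):
        # Count one more message carrying this From: header.
        self.seen[from_header] += 1

    def is_whitelisted(self, from_header):
        # Assumption: the sender qualifies once the count reaches the threshold.
        return self.seen[from_header] >= self.threshold
```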

−

~~in pristine format when retraining. This requires all retraining to use~~

+

−

~~the original message that was processed as no dspam signature~~ is ~~stored~~

+

−

~~for pristine training~~. (default:~~off~~)

+

−

~~optOutClamAV { on | off }~~

−

~~Opts out of ClamAV virus scanning (if ClamAV is directly integrated with~~

−

~~dspam via dspam.conf). (default:off)~~

+

makeCorpus { on | off }

+

When activated, a maildir-style corpus is maintained in the user's data directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or other analysis. (default:off)

−

~~=== FALLBACK DOMAINS ===~~

−

~~Fallback domains allow you to default some or all users for a particular~~

+

storeFragments { on | off }

−

~~domain to a single domain user; this allows you to set preferences (including~~

+

When activated, the first 1k of each message is temporarily stored on the server for reference via the webui's history function. (default:off)

−

~~opting out~~ of ~~filtering entirely) for users based~~ on ~~domain name. Any user~~

+

−

~~who does not exist as a known user to DSPAM will be defaulted to~~ the

+

−

~~domain it belongs to if it is designated as a fallback domain~~. ~~This~~

+

−

~~means that you can create [email protected] and [email protected] with their own~~

+

−

~~databases and preferences, but also~~ default ~~all other users to @domain.com.~~

+

−

~~Alternatively, you could create just the domain without any other users and~~

+

−

~~default all users to @domain.com~~

+

−

~~To use fallback domains, you'll first need to activate this feature in~~

−

~~dspam.conf:~~

−

~~FallbackDomains~~ on

+

localStore { on | off }

+

Overrides the directory name used for the user's dspam data directory. This is useful when using recipient addresses as usernames, as it will allow all addresses belonging to a specific user to be written to a single webui directory. (default:username)

−

~~Next, you'll need to create a~~ dspam ~~user for each domain you wish to use~~

+

−

~~as a fallback domain~~. ~~For example~~, ~~@domain~~.~~com. Depending~~ on ~~your~~

+

processorBias { on | off }

−

~~implementation~~, ~~this may be a simple insert into dspam_virtual_uids or may~~

+

Overrides the "bias" setting in dspam.conf, which biases mail as innocent. (default:on, see dspam.conf)

−

~~be created automatically when setting a user's preferences~~.

+

−

~~Finally, designate that special user as a fallback domain by setting a~~

−

~~preference:~~

−

~~dspam_admin ch pref~~ @domain.com ~~fallbackDomain~~ on

+

fallbackDomain { on | off }

+

Allows a dspam user ("@domain.com") to be marked as a fallback user for the entire domain, so if the destination dspam user does not exist in the database, the fallback user's database will be used. The dspam.conf "FallbackDomains" setting must also be "on". (default:off)

−

~~Any mail coming in for that domain that does _not_ match a known user in~~

+

''NOTE:''

−

~~dspam~~ will ~~now fall back~~ to ~~this user; you can then~~ set ~~specific preferences~~

+

You will need to set "FallbackDomains on" in dspam.conf to use this.
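The fallback lookup described above can be sketched as follows. This is an illustrative model only: the function and parameter names are hypothetical, the sets stand in for the DSPAM user database, and returning the recipient unchanged when no fallback applies is an assumption of the sketch.

```python
def resolve_dspam_user(recipient, known_users, fallback_users,
                       fallback_domains_enabled=True):
    """Pick the DSPAM user whose database handles a recipient (illustrative)."""
    if recipient in known_users:
        # A known user always keeps their own database and preferences.
        return recipient
    # Build the domain user name, e.g. "@domain.com".
    domain_user = "@" + recipient.rpartition("@")[2]
    # Both switches must be on: "FallbackDomains on" in dspam.conf and the
    # fallbackDomain preference on the domain user.
    if fallback_domains_enabled and domain_user in fallback_users:
        return domain_user
    # Assumption: with no fallback, the recipient is used as-is.
    return recipient
```

Here `fallback_users` represents the domain users that have `fallbackDomain` set to on, and `fallback_domains_enabled` represents the dspam.conf setting.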

−

~~or even opt out the entire user~~. ~~Alternatively, you can create a domain-based~~

+

−

~~database for filtering mail specific~~ to ~~that domain, just as you would a~~

+

−

~~normal user~~.

+

−

~~== BUGS, PORTS, AND THE LIKE ==~~

−

~~----~~

+

trainPristine { on | off }

+

Overrides the default signature mode and treats messages as if they were in pristine format when retraining. This requires all retraining to use the original message that was processed, as no dspam signature is stored for pristine training. (default:off)

−

~~Please see http://~~dspam.~~nuclearelephant.com/bugs.shtml for the current known~~

+

optOutClamAV { on | off }

−

~~bugs list and proper reporting procedure~~.

+

Opts out of ClamAV virus scanning (if ClamAV is directly integrated with dspam via dspam.conf). (default:off)

−

~~If you port DSPAM to another platform, or would like to submit changes to~~

−

~~the distribution, please email a diff along with any other pertinent~~

−

~~information to the dspam-dev mailing list.~~

−

~~Note~~:

+

ignoreRBLLookups { on | off }

+

Overrides the "Lookup" setting in dspam.conf, which looks up senders' IP addresses in a Realtime Blackhole List (RBL). (default:off)

−

~~In order to keep DSPAM unencumbered by intellectual property abuses, all~~

−

~~external contributors to the project are asked to release any rights to the~~

−

~~submission. This keeps the DSPAM project a healthy, unencumbered GPL project.~~

−

~~Please accompany your patch, code, or other submission with the following~~

−

~~statement. By submitting a patch to the project, you agree to be bound by~~

−

~~the terms of this statement whether it is specifically included in the~~

−

~~submission or not, however we still require that it be attached to the~~

−

~~submission:~~

−

~~The author or authors of this submission hereby release any and all~~

+

RBLInoculate { on | off }

−

~~copyright interest~~ in ~~this code~~, ~~documentation, or other materials~~

@@ Line 16: / Line 16: @@
+'''CREDITS'''
-== OVERVIEW ==
+Original Work By:
+*Lead development: Jonathan A. Zdziarski <[email protected]>
+*Postgres driver: Rustam Aliyev <[email protected]>
+Various:
+*Feb/2006 Cove Schneider <[email protected]>
+*Jan/2006 Norman Maurer <[email protected]>
-----
+Your name is missing? Let us know with a reference to your commit, and we'll
+add you to the list.
+'''COPYRIGHT'''
+Original work was done by Jonathan A. Zdziarski.
+In 2006 the copyright was handed over to Sensory Networks.
+In 2009 Sensory Networks handed over the full copyright to the DSPAM Project.
+As of 12 January 2009 the copyright is owned by the DSPAM Project, represented by a team of people, including:
+* Alexander Prinsier
+* Ion-Mihai Tetcu
+* Paul Cockings
+* Dov Zamir
+* Stevan Bajic
+<br>
+== OVERVIEW ==
+----
 DSPAM is an open-source, freely available anti-spam solution designed to combat unsolicited commercial email using advanced statistical analysis. In short, DSPAM filters spam by learning what spam is and isn't. It does this by learning each user's individual mail behavior. This allows DSPAM to provide highly-accurate, personalized filtering for each user on even a large system and provides an administratively maintenance free solution capable of learning each user's email behaviors with very few false positives.
@@ Line 36: / Line 61: @@
-''PLEASE NOTE:'' DSPAM and libdspam are distributed under the GPL license, not the LGPL. Commercial licensing is available for those who seek to redistribute DSPAM or some of DSPAM's components/libraries in their non-GPL products. Please contact [email protected] for more information about commercial licensing.<br>
+''PLEASE NOTE:''<br>
-<br>
+DSPAM and libdspam are distributed under the GPL license, not the LGPL. Commercial licensing is available for those who seek to redistribute DSPAM or some of DSPAM's components/libraries in their non-GPL products. Please contact us for more information about commercial licensing.
 The DSPAM package is split up into the following pieces:
@@ Line 80: / Line 106: @@
   [MTA] ---> [LDA] ---> (User's Mailbox)
 AFTER:
@@ Line 132: / Line 159: @@
 Follow the steps sequentially from the base version you are running up to the top.
+<br>
+==== Upgrading from 3.8 ====
+. Ensure MySQL is using the new database schema. The following clauses should be executed for upgrading pre-3.9.0 DSPAM MySQL schema to the 3.9.0 schema:
+ ALTER TABLE `dspam_signature_data`
+  CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
+  CHANGE `data` `data` LONGBLOB NOT NULL,
+  CHANGE `length` `length` INT UNSIGNED NOT NULL;
+ ALTER TABLE `dspam_stats`
+  CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
+  CHANGE `spam_learned` `spam_learned` BIGINT UNSIGNED NOT NULL,
+  CHANGE `innocent_learned` `innocent_learned` BIGINT UNSIGNED NOT NULL,
+  CHANGE `spam_misclassified` `spam_misclassified` BIGINT UNSIGNED NOT NULL,
+  CHANGE `innocent_misclassified` `innocent_misclassified` BIGINT UNSIGNED NOT NULL,
+  CHANGE `spam_corpusfed` `spam_corpusfed` BIGINT UNSIGNED NOT NULL,
+  CHANGE `innocent_corpusfed` `innocent_corpusfed` BIGINT UNSIGNED NOT NULL,
+  CHANGE `spam_classified` `spam_classified` BIGINT UNSIGNED NOT NULL,
+  CHANGE `innocent_classified` `innocent_classified` BIGINT UNSIGNED NOT NULL;
+ ALTER TABLE `dspam_token_data`
+  CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
+  CHANGE `spam_hits` `spam_hits` BIGINT UNSIGNED NOT NULL,
+  CHANGE `innocent_hits` `innocent_hits` BIGINT UNSIGNED NOT NULL;
+If you are using the preference extension with DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM preference MySQL schema to the 3.9.0 schema:
+ ALTER TABLE `dspam_preferences`
+  CHANGE `uid` `uid` INT UNSIGNED NOT NULL;
+If you are using virtual users (with AUTO_INCREMENT) in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:
+ ALTER TABLE `dspam_virtual_uids`
+  CHANGE `uid` `uid` INT UNSIGNED NOT NULL AUTO_INCREMENT;
+If you are using virtual user aliases (aka: DSPAM in relay mode) in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:
+ ALTER TABLE `dspam_virtual_uids`
+  CHANGE `uid` `uid` INT UNSIGNED NOT NULL;
+If you need to speed up the MySQL purging script and can afford to use more disk space for the DSPAM MySQL data, then consider executing the following clause for adding three additional indices:
+ ALTER TABLE `dspam_token_data`
+  ADD INDEX(`spam_hits`),
+  ADD INDEX(`innocent_hits`),
+  ADD INDEX(`last_hit`);
+. Ensure PostgreSQL is using the new database schema. The following clauses should be executed for upgrading pre-3.9.0 DSPAM PostgreSQL schema to the 3.9.0 schema:
+ ALTER TABLE dspam_preferences ALTER COLUMN uid TYPE integer;
+ ALTER TABLE dspam_signature_data ALTER COLUMN uid TYPE integer;
+ ALTER TABLE dspam_stats ALTER COLUMN uid TYPE integer;
+ ALTER TABLE dspam_token_data ALTER COLUMN uid TYPE integer;
+ DROP INDEX IF EXISTS id_token_data_sumhits;
+If you are using virtual users in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids to the 3.9.0 schema:
+ ALTER TABLE dspam_virtual_uids ALTER COLUMN uid TYPE integer;
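If the PostgreSQL upgrade has to be scripted across several installations, the statements above can be generated rather than typed. The following is a minimal sketch that only builds the SQL text listed in this section; the function name and the `virtual_users` flag are hypothetical, and running the statements against a database is left to whatever client you already use.

```python
# Tables whose uid column is converted in the 3.9.0 PostgreSQL schema.
CORE_TABLES = [
    "dspam_preferences",
    "dspam_signature_data",
    "dspam_stats",
    "dspam_token_data",
]


def pgsql_39_upgrade_sql(virtual_users=False):
    """Return the pre-3.9.0 -> 3.9.0 PostgreSQL upgrade statements as strings."""
    statements = [
        f"ALTER TABLE {table} ALTER COLUMN uid TYPE integer;"
        for table in CORE_TABLES
    ]
    statements.append("DROP INDEX IF EXISTS id_token_data_sumhits;")
    if virtual_users:
        # Only needed when virtual users are in use.
        statements.append(
            "ALTER TABLE dspam_virtual_uids ALTER COLUMN uid TYPE integer;"
        )
    return statements
```

Printing the returned list, one statement per line, gives a script that can be reviewed before it is applied.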
+<br>
 ==== Upgrading From 3.6 ====
@@ Line 171: / Line 255: @@
-''NOTE:''
+''NOTE:''<br>
 Berkeley DB drivers (libdb3_drv, libdb4_drv) are deprecated and have been removed from the build. You will need to select an alternative storage driver in order to upgrade.
@@ Line 179: / Line 263: @@
 ----
 <br>
 '''PREREQUISITES'''
@@ Line 206: / Line 291: @@
 You can download MySQL from http://www.mysql.com.
 You can download PostgreSQL from http://www.postgresql.com.
 You can download SQLite from http://www.sqlite.org.
@@ Line 252: / Line 339: @@
   --with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
-Specify your storage driver selection(s).  A storage driver is a driver written specifically for DSPAM to store tokens, signature data, andperform other proprietary operations.  The default driver is hash_drv. The following drivers have been provided:
+Specify your storage driver selection(s).  A storage driver is a driver written specifically for DSPAM to store tokens, signature data, and perform other proprietary operations.  The default driver is hash_drv. The following drivers have been provided:
   mysql_drv:   MySQL Drivers
@@ Line 277: / Line 364: @@
-''Note:''
+''NOTE:''<br>
 This function is incompatible with most implementations of the Web UI, since it requires access to read each user's home directory. Therefore, only use this option if you will not be using the Web UI or plan on doing something asinine like running it as root.
@@ Line 286: / Line 372: @@
 <br>
 ===== DRIVER SPECIFIC CONFIGURE SWITCHES =====
@@ Line 312: / Line 399: @@
-''Note:''
+''NOTE:''<br>
 Please see the file doc/mysql_drv.txt for more information about configuring the mysql_drv storage driver.
@@ Line 334: / Line 420: @@
-''Note:''
+''NOTE:''<br>
 Please see the file doc/pgsql_drv.txt for more information about configuring the pgsql_drv storage driver.
@@ Line 352: / Line 437: @@
 <br>
 ===== DEBUGGING SWITCHES =====
   --enable-debug
@@ Line 362: / Line 446: @@
-''Note:''
+''NOTE:''<br>
 When verbose debug is compiled in, DSPAM performs many additional mathematical calculations regardless of whether or not it's been activated. You shouldn't use --enable-verbose for production builds unless you have serious issues you can't resolve.
@@ Line 378: / Line 461: @@
 <br>
 ==== BUILDING AND INSTALLING ====
@@ Line 386: / Line 470: @@
-''Note:''
+''NOTE:''<br>
 If you are a developer wanting to link to the core engine of dspam, libdspam will be built during this process.  Please see the example.c file for examples of how to link to and use libdspam. Static and dynamic libraries are built in the .libs directory. Needed headers will be installed in $prefix$/include/dspam.
 <br>
 ==== PERMISSIONS ====
@@ Line 397: / Line 482: @@
 The CGI User: This is the user your web server (most likely Apache) is running as. This is commonly 'nobody' or 'web'. You can find this in Apache's httpd.conf by searching for 'User'. The CGI user will need the ability to access the following components of DSPAM:
- - Ability to execute the dspam binary
+* Ability to execute the dspam binary
- - Ability to read and write to dspam_home/data/
+* Ability to read and write to dspam_home/data/
- - Trusted user permissions in dspam.conf ("Trust [username]")
+* Trusted user permissions in dspam.conf ("Trust [username]")
- - The execution 'Group' used must match the group dspam is running as
+* The execution 'Group' used must match the group dspam is running as (this is typically 'mail', 'dspam', or similar).
-   (this is typically 'mail', 'dspam', or similar)
 The MTA User: This is the user your mail server software is running as when it executes DSPAM. This is usually daemon, mail, exim, etc. This is typically different from the user the MTA runs and polices itself as, to avoid security problems. Consult your MTA's documentation for more info. The MTA user will require:
- - The ability to execute the dspam binary
+* The ability to execute the dspam binary
- - Trusted user permissions in dspam.conf ("Trust [username]")
+* Trusted user permissions in dspam.conf ("Trust [username]")
 Systems Administrators: In order to perform administrative functions, systems administrators will require:
- - The ability to execute dspam-related binaries
+* The ability to execute dspam-related binaries
- - Trusted user permissions in dspam.conf ("Trust [username]")
+* Trusted user permissions in dspam.conf ("Trust [username]")
-''Note:''
+''NOTE:''<br>
 If the MTA is communicating with DSPAM via LMTP (explained later), then execution permissions are not necessary.
-''Note about FreeBSD:''
+''NOTE about FreeBSD:''<br>
 FreeBSD's default MTA user is 'mailnull'. FreeBSD's default delivery agent also changes its uid, so dspam must be installed setuid root in order to call it properly from the commandline. This is done automatically on install.
@@ Line 436: / Line 518: @@
 <br>
-==== MAIL SERVER INTEGRATION ====
+==== MAIL SERVER INTEGRATION ====
 As previously mentioned, there are three popular ways to implement DSPAM:
@@ Line 471: / Line 553: @@
 <br>
 ===== ALIASES =====
 There are essentially two different ways a user might train DSPAM. The first is by using the Web UI, which allows them to retrain via the "History" tab. This works quite well, as users must visit the Web UI occasionally to review their quarantine anyway (and reverse any false positives). We'll discuss this shortly in section 1.1.8.
@@ Line 478: / Line 559: @@
-''Note:''
+''NOTE:''<br>
 If you are using an IMAP based system, Web-based email, or other form of email management where the original messages are stored on the server in pristine format, you can turn this signature feature off by setting "TrainPristine on" in dspam.conf. DSPAM will then use the message itself that you provide it to train, which MUST be identical to the original message in order to retrain properly.
@@ Line 485: / Line 565: @@
 Because DSPAM learns each user's specific email behavior, it's necessary to identify the user in order to program their specific filtering database. This can be done in one of three ways:
-<br>
-====== The Simple Way ======
+''' The Simple Way '''
 If you are using the MySQL or PgSQL storage drivers, the original numeric user id can be embedded in the signature, requiring only one central spam alias to be necessary for the entire system. To configure this, uncomment the appropriate UIDInSignature option in dspam.conf:
@@ Line 501: / Line 582: @@
-''Note:''
+''NOTE:''<br>
 The 'root' user represents any active dspam user. It is necessary to supply a username on the commandline or DSPAM will bail on an error, however the user will be changed internally once the signature is read.
-<br>
-====== The Kind-of-Simple Way ======
+''' The Kind-of-Simple Way '''
 If you're not using one of the above storage drivers, the next easiest way to configure aliases is to have DSPAM parse the 'To:' header of the message and use a catch-all subdomain to direct all mail into DSPAM for retraining. You can then instruct your users to email addresses like '[email protected]'. The ParseToHeaders option (available in dspam.conf) will parse the To: header of forwarded messages and set the username to either 'bob' or '[email protected]', depending on how it is configured. DSPAM can also set the training mode to either "learn spam" or "learn notspam" depending on whether the user specified a spam- or notspam- address in the To: header.
@@ Line 520: / Line 601: @@
   ChangeModeOnParse on
-<br>
-====== The Old Way (A.K.A. The Hard Way) ======
+''' The Old Way (A.K.A. The Hard Way) '''
 If neither of the easy ways is possible, you're stuck with doing it the hard way. This means you'll need a separate spam alias (and notspam alias, if users are tagging mail) for each user. To do this, you will need to create an email address for each user, so that DSPAM can analyze and learn for that specific user.  For example:
@@ Line 535: / Line 617: @@
-''Note About Security:''
+''NOTE about Security:''
 You might be wondering if a user can forward a spam to another user's address, or whether a spammer can forward a spam to another user's notspam address. The answer is "no". The key to all mail-based retraining is the signature embedded in each email. The signature is stored with each user's own user id, and so not only does the incoming message have to bear a valid signature, but it also has to be stored on the system with the correct user id. This prevents any kind of alias abuse.
 <br>
 ==== NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS ====
 <br>
 ===== Non-SQL Based Nightly Purge =====
 If you are NOT running a SQL-based solution, then you should configure dspam_clean to run under cron nightly. This clean tool will read all signature databases and purge signatures that are older than 14 days (configurable), purge abandoned tokens, and remove unimportant tokens. Without this tool, old signatures will continue to pile up. Be sure the user running cleanup has full read/write permissions on the DSPAM data files.
 0 0 * * * /usr/local/bin/dspam_clean [options]
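A crontab entry takes five time fields (minute, hour, day of month, month, day of week) before the command, so a nightly run at midnight reads `0 0 * * *`. The sketch below builds such a line; the helper name is hypothetical and the chosen hour is only an example.

```python
def nightly_cron_line(command, hour=0, minute=0):
    """Build a crontab line that runs `command` once per day."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # Field order: minute hour day-of-month month day-of-week command
    return f"{minute} {hour} * * * {command}"
```

For example, `nightly_cron_line("/usr/local/bin/dspam_clean")` yields an entry that runs dspam_clean every night at midnight; any options would be appended to the command string.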
@@ Line 551: / Line 632: @@
 <br>
 ===== SQL-Based Nightly Purge =====
 SQL-Based solutions include a nightly SQL script to perform the same basic
 tasks as dspam_clean, and it does it much faster and with more finesse.
@@ Line 562: / Line 642: @@
 <br>
 ===== Log Rotation =====
 The system log and user logs can fill up fairly quickly, when all that's really needed to generate graphs are the last two to three weeks of data. You can configure a nightly log cleanup using dspam_logrotate:
@@ Line 568: / Line 647: @@
 <br>
 ==== NOTIFICATIONS ====
 DSPAM is capable of sending three different notifications to users:
+* A "First Run" message sent to each user when they receive their first message through DSPAM.
- - A "First Run" message sent to each user when they receive their first message through DSPAM.
+* A "First Spam" message sent to each user when they receive their first spam.
- - A "First Spam" message sent to each user when they receive their first spam
+* A "Quarantine Full" message sent to each user when their quarantine box is > 2MB in size.
- - A "Quarantine Full" message sent to each user when their quarantine box is > 2MB in size.
 These notifications can be activated by copying the txt/ directory from the distribution into DSPAM's home (by default /usr/local/var/dspam).  You will want to modify these templates prior to installing them to reflect the correct email addresses and URLs (look for 'configureme' and 'yourdomain').
-''NOTE:''
+''NOTE:''<br>
 The quarantine warning is reset when the user clicks 'Delete All', but is not reset if they use "Delete Selected".  If the user doesn't wish to receive reminders, they should use the "Delete Selected" function instead of "Delete All".
@@ Line 585: / Line 663: @@
 <br>
-==== THE WEB UI ====
+==== THE WEB UI ====
 The Web UI (CGI client) can be run from any executable location on a web server, and detects its user's identity from the REMOTE_USER
 environment variable. This means you'll need to use HTTP password authentication to access the CGI (Any type of authentication will work, so long as Apache supports the module). This is also convenient in that you can set up authentication using almost any existing system you have. The only catch is that you'll need the usernames to match the actual DSPAM usernames used on the system. A copy of the shadow password file will suffice for most common installs.
@@ Line 594: / Line 672: @@
-''Note:''
+''NOTE:''<br>
 Some authentication mechanisms are case insensitive and will authenticate the user regardless of the case they type it in.  DSPAM, on the other hand, is case sensitive and the case of the username used will need to match the case on the system.  If you suffer from this authentication problem, and are certain all of your users' usernames are in lowercase, you can add the following line of code to the CGI right after the call to &ReadParse...
@@ Line 603: / Line 680: @@
-''Note:''
+''NOTE:''<br>
 Apache users do NOT take on the identity of the groups specified in /etc/group so you will need to specifically assign the group in httpd.conf.
-''Note about Procmail:''
+''NOTE about Procmail:''<br>
 Because the DSPAM Web UI is a CGI script, DSPAM will not retain its setuid privileges when called. If you are running procmail, this will become a problem as procmail requires root privileges to deliver. The easiest hack around this is to create a procmail.dspam binary and make it setuid root, then make it executable only by the mail group (or whatever group DSPAM and the CGI run in).
-The DSPAM Web UI has a minimal configuration inside the configure.pl script. You'll want to check and make sure all of the settings are correct. In most cases, the only that will be necessary to change are the large-scale or domain-scale flags.
+The DSPAM Web UI has a minimal configuration inside the configure.pl script. You'll want to check and make sure all of the settings are correct. In most cases, the only settings that will be necessary to change are the large-scale or domain-scale flags.
@@ Line 649: / Line 724: @@
 The following PERL modules (http://www.perl.com/CPAN/modules/by-module/GD/):
- . GD
+* GD
- . GD-Graph3d
+* GD-Graph3d
- . GDGraph
+* GDGraph
- . GDTextUtil
+* GDTextUtil
- . CGI
+* CGI
 Typically this can be accomplished on the commandline:
@@ Line 668: / Line 743: @@
 '''Opt-In/Out'''
-If you would like your users to be able to opt in/out of DSPAM filtering, add the correct option to the nav_preferences.html template, depending on your configuration (for example, if you have an opt-in system, you'll want to add the opt-in option). Note: This currently only works with the preferences extension, and not drop files.
+If you would like your users to be able to opt in/out of DSPAM filtering, add the correct option to the nav_preferences.html template, depending on your configuration (for example, if you have an opt-in system, you'll want to add the opt-in option).
+''NOTE:''<br>
+This currently only works with the preferences extension, and not drop files.
   <INPUT TYPE=CHECKBOX NAME=optIn $C_OPTIN$>
   Opt into DSPAM filtering
@@ Line 674: / Line 753: @@
   <INPUT TYPE=CHECKBOX NAME=optOut $C_OPTOUT$>
   Opt out of DSPAM filtering
+<br>
 === TESTING ===
 -----
-<br>
 If you've installed from an RPM, there's a good chance that the packager went to the trouble of testing already. If you're building from sources, however, you'll need to find a way to ensure your configuration isn't broken.
@@ Line 686: / Line 766: @@
 Before running the test, you should have completed section 1.1's instructions for compiling and installing dspam as well as configured your mail server to support dspam.
+<br>
 ==== 1. Create a new user account on your system ====
 It is important that this be a new account to prevent any unrelated email from being delivered during testing.  Be sure to configure a spam alias for the test account.
+<br>
 ==== 2. Send a short email ====
 Send a short email (10 words or less) to the account, and pick it up using your favorite mail client.
+<br>
 ==== 3. Run dspam_stats ====
  dspam_stats [username]
@@ Line 705: / Line 784: @@
 If you receive an error such as "unable to open /usr/local/var/dspam... for reading", then the dspam agent is not configured correctly. The problem could exist in either your mail server configuration or one or more of the permissions on the directory or agent.  Check your configuration and permissions, and repeat this step until the correct results are experienced.
+<br>
 ==== 4. Run dspam_dump ====
   dspam_dump [username]
@@ Line 722: / Line 801: @@
   7717766825815048192  S: 00265  I: 00068  P: 0.7358
+<br>
 ==== 5. Forward the test message ====
 Forward the test message to the spam alias you've created for the test account. Provide enough time for the message to have processed.
+<br>
 ==== 6. Run dspam_stats again ====
  dspam_stats [username]
@@ Line 733: / Line 812: @@
 If this is not the case, check the group permissions of the dspam agent as well as the permissions your MTA uses when piping to aliases.
+<br>
 ==== 7. Run dspam_dump [username] again ====
 dspam_dump [username]
@@ Line 742: / Line 821: @@
   8851970219880318167              S:    1  I:    0  LH: Mon Aug  4 11:44:29 2003
 If you have some tokens that do not have an S: of 1 or an I: of 0, the dspam signature was not found on the email, and this could be due to a lot of things.
+<br>
 === TROUBLESHOOTING ===
 -----
-<br>
 ''Problem:''
 No files are being created in the user directory
@@ Line 752: / Line 832: @@
 Check the permissions of the user directory.  It must be writable by the user the dspam agent is running as, as well as by the CGI user.
-----
+<br>
 ''Problem:''
 False positives are never being delivered
@@ Line 760: / Line 840: @@
 Your CGI most likely doesn't have the privileges required by the LDA to deliver the messages.  Make sure the CGI user is in the correct group. Also consider setting the dspam agent to setuid or setgid with the correct permissions.
-----
+<br>
 ''Problem:''
 My database is getting huge!
 ''Solution:''
-DSPAM's default training mode is TEFT. On top of this, the purging defaults are very lax. You might consider switching to TOE (Train-on-Error) mode training if you require a minimal database. If you are willing to sacrifice accuracy for disk space, disabling the 'chain' tokenizer from dspam.conf will prevent the use of multi-word (chained) tokens, which will also cut your database size considerably. You may also consider more frequent calls to dspam_clean -p to purge neutral data, which comprises a majorrity of most databases. For more help, please see the DSPAM FAQ at http://dspam.nuclearelephant.com.
+DSPAM's default training mode is TEFT. On top of this, the purging defaults are very lax. You might consider switching to TOE (Train-on-Error) mode training if you require a minimal database. If you are willing to sacrifice accuracy for disk space, disabling the 'chain' tokenizer from dspam.conf will prevent the use of multi-word (chained) tokens, which will also cut your database size considerably. You may also consider more frequent calls to dspam_clean -p to purge neutral data, which comprises a majority of most databases.
+For more help, please see the DSPAM FAQ at http://dspam.sourceforge.net.
 <br>
@@ Line 772: / Line 855: @@
 === DSPAM TOOLS ===
 -----
-<br>
 A few useful tools have been provided to make DSPAM management a bit easier. These tools include:
   dspam_admin
@@ Line 782: / Line 864: @@
 Syntax: dspam_train [username] [spam_dir] [nonspam_dir] where username is the username of the user to apply the training to, and the two dirs represent directories containing messages in individual files (e.g. maildir/corpus format). dspam_train can be used on an existing user's database, to further improve accuracy, or to train from scratch. It also provides a solid test jig for testing the efficiency and accuracy of a test corpus against the filter.
-''NOTE:''
+''NOTE:''<br>
 dspam_train will automatically balance training of the corpus to ensure both spam and nonspam are trained based on the ratio of spam/nonspam. This means that if you have twice as much spam as nonspam, two spam will be trained for every nonspam.
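The balancing described above can be illustrated with a small sketch. This is not dspam_train's actual code; the function name balanced_order and the interleaving rule are assumptions chosen only to show what "two spam for every nonspam" looks like when one corpus is twice the size of the other.

```python
def balanced_order(spam, nonspam):
    """Interleave two corpora so each is consumed in proportion to its size.

    Hypothetical helper: it only illustrates ratio-based balancing and is
    not taken from the dspam_train source.
    """
    order = []
    i = j = 0
    total_s, total_n = len(spam), len(nonspam)
    while i < total_s or j < total_n:
        # Take spam when nonspam is exhausted, or when spam is not ahead
        # of nonspam relative to the sizes of the two corpora.
        if j >= total_n or (i < total_s and i * total_n <= j * total_s):
            order.append(("spam", spam[i]))
            i += 1
        else:
            order.append(("nonspam", nonspam[j]))
            j += 1
    return order


if __name__ == "__main__":
    for label, name in balanced_order(["s1", "s2", "s3", "s4"], ["n1", "n2"]):
        print(label, name)
```

With four spam files and two nonspam files, every nonspam message is accompanied by two spam messages over the course of the run.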
@@ Line 795: / Line 876: @@
   dspam_clean
 Performs nightly housecleaning by deleting old or useless data from user data.  dspam_clean performs the following operations:
 . Using the -s flag, dspam_clean will continue to perform stale signature purging.  If an age is specified, for example -s14, the age defined as the default will be overridden. Specifying an age of 0 will delete all signatures for the users processed.
@@ Line 804: / Line 886: @@
   - Tokens which have only one spam hit
   - Tokens which have only one innocent hit
 Ages may be overridden by specifying a format such as -u30,15,10,10 where each number represents the respective age.  Specifying an age of zero will delete all unused tokens in the category. Defaults are set in dspam.conf.
@@ Line 827: / Line 909: @@
   dspam_clean -s -p -u
+''NOTE:''<br>
+You may wish to only run certain cleaning modes depending on the type of storage driver you are using.  For example, the MySQL storage driver includes a script which performs signature and unused token operations, leaving only probability operations as useful.  If you are using a SQL-based storage driver, it is strongly recommended that you use the maintenance scripts wherever possible for optimum efficiency.
-''NOTE:''
-You may wish to only run certain cleaning modes depending on the type of storage driver you are using.  For example, the MySQL storage driver includes a script which performs signature and unused token operations, leaving only probability operations as useful.  If you are using a SQL-based storage driver, it is strongly recommended that you use the maintenance scripts wherever possible for optimum efficiency.
-dspam_stats
+ dspam_stats
 Displays the spam statistics for one or all users on the system.
 Syntax: dspam_stats [username]
@@ Line 838: / Line 919: @@
-dspam_genaliases
+ dspam_genaliases
 Reads the /etc/passwd file and outputs a dspam aliases table which can be included in the master aliases table.  You may try Art Sackett's generate_dspam_aliases tool at http://www.artsackett.com/freebies/generate_dspam_aliases/ if you need some better functionality.  This will eventually be merged in as a replacement for the existing tool.
-dspam_merge
+ dspam_merge
 Merges multiple users' dictionaries together into one user's dictionary (does not affect the users being merged).  This can be used to create a seeded dictionary for a new user, or to copy a single user's dictionary to a new file.  This is great for building global dictionaries, but crunches a lot of time and disk.
-=== AGENT COMMANDLINE ARGUMENTS ===
+<br>
+=== AGENT COMMANDLINE ARGUMENTS ===
+-----
+<br>
 ==== Specifying a User ====
 The DSPAM agent (dspam) recognizes the following commandline arguments:
   --user [user1 user2 ... userN]
 Specifies the destination user(s) of the incoming message.  DSPAM then processes the message once for each user individually.  If the message is to be delivered, the $u (or %u) parameters of the arguments string will be interpolated for the current user being processed.
-<br>
+<br>
 ==== Classification ====
   --class=[spam|innocent]
 Tells DSPAM that the message being presented has already been classified by the user.  This flag should be used when a misclassification has occurred, when the user is corpus-feeding a message, or when an inoculation is being presented.  This flag must be used in conjunction with the --source flag. Providing no classification invokes DSPAM's standard operating procedure, which is to determine the message's nature on its own.
-<br>
+<br>
 ==== Source ====
   --source=[error|corpus|inoculation]
 Wherever --class is used, the source of the user-provided classification must also be provided.  The source is very important and dramatically affects DSPAM's training behavior:
-<br>
 '''error:'''<br>
 The message being presented was a message previously misclassified by DSPAM.  When 'error' is provided as a source, DSPAM requires that the DSPAM signature be present in the message, and will use the signature to recall the original training metadata.  If the signature is not present, the message will be rejected.  In this source mode, DSPAM will also decrement each token's previous classification's count as well as the user totals.
-<br>
-You should use error only when DSPAM has made an error in classifying the message, and should present the modified version of the message with the DSPAM signature when doing so.
+''You should use error only when DSPAM has made an error in classifying the message, and should present the modified version of the message with the DSPAM signature when doing so.''
-<br>
 '''corpus:'''<br>
 The message being presented is from a mail corpus, and should be trained as a new message, rather than re-trained based on a signature.  The message's full headers and body will be analyzed and the correct classification will be incremented, without its opposite being decremented.
-<br>
-You should use corpus only when feeding messages in from corpus, not for correcting errors.<br>
+''You should use corpus only when feeding messages in from a corpus, not for correcting errors.''<br>
 '''inoculation:'''<br>
 The message being presented is in pristine form, and should be trained as an inoculation.  Inoculations are a more intense mode of training designed to cause DSPAM to train the user's metadata repeatedly on previously unknown tokens, in an attempt to vaccinate the user from future messages similar to the one being presented.
-<br>
-You should use inoculation only on honeypots and the like.
+''You should use inoculation only on honeypots and the like.''
 <br>
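The difference between the 'error' and 'corpus' sources can be sketched on the user totals alone. This is a simplified illustration, not DSPAM's implementation: the function name apply_training and the dictionary layout are assumptions, and per-token counters, signatures and inoculation are left out.

```python
def apply_training(totals, new_class, source):
    """Return updated user totals after a user-supplied classification.

    Hypothetical model of the behavior described above:
      error  -> increment the corrected class, decrement the previous one
      corpus -> increment the given class only
    """
    if new_class not in ("spam", "innocent"):
        raise ValueError("new_class must be 'spam' or 'innocent'")
    if source not in ("error", "corpus"):
        raise ValueError("this sketch only models 'error' and 'corpus'")

    opposite = "innocent" if new_class == "spam" else "spam"
    updated = dict(totals)
    updated[new_class] += 1
    if source == "error":
        # The message was previously counted under the wrong class.
        updated[opposite] = max(0, updated[opposite] - 1)
    return updated


if __name__ == "__main__":
    totals = {"spam": 5, "innocent": 10}
    print(apply_training(totals, "spam", "error"))
    print(apply_training(totals, "spam", "corpus"))
```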
@@ Line 886: / Line 969: @@
   --deliver=[innocent,spam]
 Tells DSPAM to deliver the message if its result falls within the criteria specified.  For example, --deliver=innocent will cause DSPAM to only deliver the message if it classifies as innocent.  Providing --deliver=innocent,spam will cause DSPAM to deliver the message regardless of its classification.  This flag provides a significant amount of flexibility for nonstandard implementations, for example where false positives may not be delivered but spam is.
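As a rough illustration of the rule above, the delivery decision amounts to a membership test on the comma-separated list. The helper name should_deliver is hypothetical and is not part of DSPAM.

```python
def should_deliver(deliver_option, result):
    """Return True if a message classified as `result` should be delivered.

    deliver_option is the value given to --deliver, e.g. "innocent" or
    "innocent,spam". Hypothetical helper for illustration only.
    """
    allowed = {part.strip() for part in deliver_option.split(",") if part.strip()}
    return result in allowed


if __name__ == "__main__":
    print(should_deliver("innocent", "spam"))
    print(should_deliver("innocent,spam", "spam"))
```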
-<br>
   --stdout
 If the message is indeed deemed "deliverable" by the --deliver flag, this flag will cause DSPAM to deliver the message to stdout, rather than the configured delivery agent.
-<br>
   --process
 Tells DSPAM to process the message.  This is the default behavior, and the flag is implied unless --classify is used, but it is a good idea to specify it explicitly to avoid ambiguity.
-<br>
   --classify
 Tells DSPAM only to classify the message, and not make any writes to the user's metadata or attempt to deliver/quarantine the message.
-<br>
 ''NOTE:''<br>
 The output of the classification is specific to the user, not including the output of any groups they might be affiliated with, so it is entirely possible that the message would be caught as spam by the group, even if it didn't appear in the classification.  If you want to get the classification for the GROUP, use the group name as the user instead of an individual.
 <br>
 ==== Signatures ====
   --signature=[signature]
 For some implementations, the admin may wish to pass the signature in via commandline instead of allowing DSPAM to find it on its own. This is especially useful when front-ending the agent with other tools. Using this option will set the active signature and will also forego reading of stdin.
-<br>
+<br>
 ==== Training Modes ====
   --mode=[toe|tum|teft|notrain|unlearn]
 Configures the training mode to be used for this process:
-<br>
+<br>
 ===== TEFT =====
 Train-Everything.  Trains on all messages processed.  This is a very thorough training approach and should be considered the standard training approach for most users.  TEFT may, however, prove too volatile on installations with extremely high per-user traffic, or prove not very scalable on systems with extremely large user-bases.  In the event that TEFT is proving ineffective, one of the other modes is recommended.
@@ Line 924: / Line 1,005: @@
 Until a user reaches 100 innocent messages in their metadata, train-on-error will also be teft-based, even if otherwise specified on the commandline.
+<br>
 ===== TOE =====
 Train-on-Error.  Trains only on a classification error, once the user's metadata has matured to 2500 innocent messages.  This training mode is much less resource intensive, as only occasional metadata writes are necessary.  It is also far less volatile than the TEFT mode of training.  One drawback, however, is that TOE only learns when DSPAM has made a mistake - which means the data is sometimes too static, and unable to "ease into" a different type of behavior.
-<br>
+<br>
 ===== TUM =====
 Train-until-Mature.  This training mode is a hybrid between the other two training modes and provides a great balance between volatility and static metadata.  TuM will train on a per-token basis only tokens which have had fewer than 50 "hits" on them, unless an error is being retrained in which case all tokens are trained.  This training mode provides a solid core of stable tokens to keep accuracy consistent, but also allows for dynamic adaptation to any new types of email behavior a user might be experiencing. It is a balance of resources as well, as only less-than-mature tokens are written to the database. NOTE: You should corpus train before using tum.
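The per-token rule can be sketched as follows. This is an assumption-laden illustration rather than DSPAM's code: the name tokens_to_train, the MATURITY_HITS constant and the way hits are stored in a plain dictionary are all hypothetical.

```python
MATURITY_HITS = 50  # threshold quoted in the TUM description above


def tokens_to_train(token_hits, retraining_error=False):
    """Return the tokens that a TUM-style pass would write.

    token_hits maps each token to the number of hits recorded so far.
    When an error is being retrained, every token is trained.
    """
    if retraining_error:
        return sorted(token_hits)
    return sorted(tok for tok, hits in token_hits.items() if hits < MATURITY_HITS)


if __name__ == "__main__":
    hits = {"viagra": 120, "meeting": 49, "invoice": 0}
    print(tokens_to_train(hits))
    print(tokens_to_train(hits, retraining_error=True))
```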
-<br>
+<br>
 ===== NOTRAIN =====
 No training.  Do not train the user's data, and do not keep totals. This should only be used in cases where you want to process mail for a particular user (based on a group, for example), but don't want the user to accumulate any learning data.
-<br>
+<br>
 ===== UNLEARN =====
 Unlearn original training. Use this if you wish to unlearn a previously learned message. Be sure to specify --source=error and set --class to the classification the message was originally learned under. If not using TrainPristine, this will require the original signature from training.
-<br>
 '''RECOMMENDATIONS'''
 In general, it is recommended that users begin with TEFT.  If a user is experiencing a spam ratio between 75% and 85%, they may benefit from Train-until-Mature (TUM) mode.  If a user is experiencing over 90% spam, then Train-on-Error (TOE) mode should make a noticeable improvement in accuracy. It eventually boils down to what works best for your users.  There is no reason a system could not be configured (with a script) to analyze a user's *.stats file and determine the best training mode for that user.
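A script of the kind suggested above might start from a function like this one. It is only a sketch: the name recommend_mode is hypothetical, the counts are passed in directly because the *.stats file layout is not described here, and falling back to TEFT for ratios the text does not cover (for example between 85% and 90%) is an assumption.

```python
def recommend_mode(spam_count, innocent_count):
    """Suggest a --mode value from a user's spam ratio.

    Thresholds follow the recommendations above; anything outside the
    stated ranges falls back to "teft" (an assumption of this sketch).
    """
    total = spam_count + innocent_count
    if total == 0:
        return "teft"  # no data yet: start with the default
    ratio = spam_count / total
    if ratio > 0.90:
        return "toe"
    if 0.75 <= ratio <= 0.85:
        return "tum"
    return "teft"


if __name__ == "__main__":
    print(recommend_mode(80, 20))
    print(recommend_mode(95, 5))
    print(recommend_mode(50, 50))
```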
-<br>
+<br>
 ==== Features ====
-  --feature=[noise,whitelist,tb=N]
+  --feature=[no,wh,tb=N]
 Specifies the features that should be activated for this filter instance. The following features may be used individually or combined using a comma as a delimiter:
-<br>
-'''noise:'''<br>
-Bayesian Noise Reduction (BNR).  Bayesian Noise Reduction kicks in at 2500 innocent messages and provides an advanced progressive noise logic to reduce Bayesian Noise (wordlist attacks) in spams.  See http://bnr.nuclearelephant.com for more information. BNR is not for everyone, and so users should try it out after they've trained to see if it helps improve accuracy.
-<br>
-'''tb=N:'''<br>
+''no:''
-Sets the training loop buffering level. Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop.  The training buffer sets the buffer sensitivity, and should be a number between 0 (no buffering whatsoever) to 10 (heavy buffering).  The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0. This feature should be disabled if you're not paranoid about false positives, as it does increase the number of spam misses significantly during training.
-<br>
+Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks in at 2500 innocent messages and provides an advanced progressive noise logic to reduce Bayesian Noise (wordlist attacks) in spams.  See http://bnr.nuclearelephant.com for more information. BNR is not for everyone, and so users should try it out after they've trained to see if it helps improve accuracy.
+''tb=N:''
+Sets the training loop buffering level. Training loop buffering is the amount of statistical sedation performed to water down statistics and avoid false positives during the user's training loop. The training buffer sets the buffer sensitivity, and should be a number between 0 (no buffering whatsoever) and 10 (heavy buffering). The default is 5, half of what previous versions of DSPAM used. To avoid dulling down statistics at all during the training loop, set this to 0. This feature should be disabled if you're not paranoid about false positives, as it does increase the number of spam misses significantly during training.
+''wh:''
-'''whitelist:'''<br>
 Automatic whitelisting.  DSPAM will keep track of the entire "From:" line for each message received per user, and automatically whitelist messages from senders with more than 10 innocent messages and zero spams.  Once the user reports a spam from the sender, automatic whitelisting will automatically be deactivated for that sender.  Since DSPAM uses the entire "From:" line, and not just the sender's email address, automatic whitelisting is a very safe approach to improving accuracy during initial training.
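The whitelisting rule above (more than 10 innocent messages and zero spams, keyed on the entire "From:" line) can be sketched as below. The class AutoWhitelist and its methods are hypothetical and keep their counters in memory; DSPAM's own bookkeeping is not shown in this document.

```python
from collections import defaultdict


class AutoWhitelist:
    """Toy model of per-user automatic whitelisting (illustration only)."""

    THRESHOLD = 10  # a sender needs more than this many innocent messages

    def __init__(self):
        self.innocent = defaultdict(int)
        self.spam = defaultdict(int)

    def record(self, from_line, classification):
        """Count one message for the exact From: line given."""
        if classification == "spam":
            self.spam[from_line] += 1
        else:
            self.innocent[from_line] += 1

    def is_whitelisted(self, from_line):
        # .get() avoids creating empty entries for unknown senders.
        return (self.innocent.get(from_line, 0) > self.THRESHOLD
                and self.spam.get(from_line, 0) == 0)


if __name__ == "__main__":
    wl = AutoWhitelist()
    sender = '"Jane Doe" <jane@example.com>'
    for _ in range(11):
        wl.record(sender, "innocent")
    print(wl.is_whitelisted(sender))
    wl.record(sender, "spam")
    print(wl.is_whitelisted(sender))
```

Because the key is the whole "From:" line, a message carrying the same address with a different display name is not covered by the existing whitelist entry.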
-<br>
-'''NOTE:'''<br>
+''NOTE:''<br>
 None of the present features are necessary when the source is "error", because the original training data is used from the signature to retrain, instantiating whatever features (such as whitelisting) were active at the time of the initial classification.  Since BNR is only necessary when a message is being classified, the --feature flag can be safely omitted from error source calls.
 <br>
@@ Line 969: / Line 1,054: @@
   --daemon
 Puts DSPAM in daemon mode; i.e. DSPAM acts like a server when started with this parameter. See section 2.3 for more information about daemon mode.
 <br>
 == LINKING WITH LIBDSPAM ==
 ----
+Developers are able to link to the DSPAM core engine (libdspam) to provide "drop-in" spam-filtering for their applications.  Examples of the libdspam API can be found in the example.c file included with this distribution.
-  Developers are able to link to the DSPAM core engine (libdspam) to provide
-  "drop-in" spam-filtering for their applications.  Examples of the libdspam
-   API can be found in the example.c file included with this distribution.
    <COMMERCIAL LICENSING>
    IF YOUR PROJECT USES THE LIBDSPAM API, A GPL-COMPATIBLE OPEN SOURCE LICENSE
    IS REQUIRED IN ORDER TO REDISTRIBUTE. IF YOU ARE DEVELOPING A CLOSED-SOURCE
@@ Line 987: / Line 1,068: @@
    NOT REDISTRIBUTE ANY APPLICATIONS USING LIBDSPAM WITHOUT A COMMERCIAL
    LICENSE.
    COMMERCIAL LICENSING BENEFITS:
    - PRIORITY DEVELOPER SUPPORT
@@ Line 993: / Line 1,074: @@
    - NON-GPL REDISTRIBUTION PRIVILEGES
    - BUG AND FEATURE REQUEST PRIORITY
-   Please contact the author at [email protected] for information
+   Please contact the author at 'to be determined' for information
    about commercial licensing.
    </COMMERCIAL LICENSING>
    To link to libdspam, follow the instructions for compiling and installing
    DSPAM. When compiled, the libdspam static and shared libraries are also
    built. This library contains all the functions necessary to use dspam's
    filtering in your application.
    Your application will also need to link to the correct storage driver
    libraries. If you are using libdspam in a multithreaded application, you
    will need to either use a thread-safe storage driver or control access to
    libdspam using a mutex lock.
    If you are using libdspam in a multithreaded environment, each thread will
    require its own DSPAM context. Fortunately, you can attach the same
    database handle to each context using dspam_attach(). See the man page for
    more information.
    To build with the dspam API, you will also need the header files from
    the distribution.  You can copy these to /usr/include/dspam for ease of
    use, and then use -I/usr/include/dspam
    Please see example.c for API examples.
    If you are interested in linking libdspam with your project and have
    questions or concerns, please contact the dspam-dev mailing list.
+<br>
 === CONFIGURING GROUPS ===
+-----
+Groups enable a group of users to share information.  The following group types are supported:
-  Groups enable a group of users to share information.  The following
+<br>
-  group types are supported:
+==== SHARED GROUPS ====
+Enables users with similar email behavior to share the same dictionary while still maintaining a private quarantine box.  The benefits of this type of group are faster learning, and sharing a single spam alias.  Shared groups can have both positive and negative effects on accuracy.  If a shared group consists of users with similar, predictable email behavior, the users in the group can benefit from a larger dictionary of spam and faster learning (especially for newcomers in the group).  If a group consists of users with different email behavior, however, the users in the group will experience poor spam filtering and a higher number of false positives.
-  SHARED
+''NOTE:''<br>
-  Enables users with similar email behavior to share the same dictionary
+The SQL-based storage drivers support shared groups, but have one caveat:
-  while still maintaining a private quarantine box.  The benefits of this
+If you are NOT enabling "virtual users" support, you will need to create an actual user on your system named after each group you create.
-  type of group are faster learning, and sharing a single spam alias.  Shared
-  groups can have both positive and negative effects on accuracy.  If a shared
-  group consists of users with similar, predictable email behavior, the users
-  in the group can benefit from a larger dictionary of spam and faster
-  learning (especially for newcomers in the group).  If a group consists of
-  users with different email behavior, however, the users in the group will
-  experience poor spam filtering and a higher number of false positives.
-  NOTE
+On top of shared group support, a shared group can also be made to be 'managed'.  Using the group type 'SHARED,MANAGED' will cause the group to share a single quarantine mailbox which could be managed by the group's administrator.  This would enable one individual to monitor quarantine for the entire group, however personal emails marked as false positives could potentially be viewed as well.  For this reason, managed groups should only be used when this is not an issue.
-    The SQL-based storage drivers support shared groups, but has one caveat:
-    If you are NOT enabling "virtual users" support, you will need to create
-    an actual user on your system named after each group you create.
-  On top of shared group support, a shared group can also be made to be
+<br>
-  'managed'.  Using the group type 'SHARED,MANAGED' will cause the group to
+==== INOCULATION GROUPS ====
-  share a single quarantine mailbox which could be managed by the group's
+An inoculation group allows users to maintain their own private dictionaries with their own spam alias, but all members of the group will inoculate other members with spams they manually forward into their alias.  This allows users to report spams to one another and maintain their own private dictionary.  Another advantage to this is that users do not necessarily have to share the same email behavior.
-  administrator.  This would enable one individual to monitor quarantine for
-  the entire group, however personal emails marked as false positives could
-  potentially be viewed as well.  For this reason, managed groups should only
-  be used when this is not an issue.
-  INOCULATION
+''NOTE:''<br>
-  An inoculation group allows users to maintain their own private dictionaries
+Users should only be added to an inoculation group after their initial learning period, to avoid potential false positives due to lack of data.
-  with their own spam alias, but all members of the group will inoculate other
-  members with spams they manually forward into their alias.  This allows
-  users to report spams to one another and maintain their own private
-  dictionary.  Another advantage to this is that users do not necessarily have
-  to share the same email behavior.
-  NOTE: Users should only be added to an inoculation group after their initial
-        learning period, to avoid potential false positives due to lack of data.
    To create groups, you'll want to create a file with the filename 'group'
@@ Line 1,082: / Line 1,144: @@
    group.
-  CLASSIFICATION
+<br>
+==== CLASSIFICATION GROUPS ====
    Classification groups allow a group of users to network their results
    together.  If DSPAM is uncertain of whether a message is spam or nonspam for
@@ Line 1,128: / Line 1,191: @@
    established between both parties.
-  GLOBAL GROUPS
+<BR>
+==== GLOBAL GROUPS ====
   Global groups allow DSPAM to provide a "SpamAssassin type out-of-the-box
    filtering" for all new users until they have built their own useful
@@ Line 1,146: / Line 1,209: @@
    treated just as any other user on the system.
-  NOTE: Be sure and set your global user's preferences so that trainingMode
+''NOTE:''<BR>
-        is set to TOE. This will prevent the purge tools you use from
+Be sure to set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from purging them empty in 90 days.
-        purging them empty in 90 days.
-  MERGED GROUPS
+<BR>
+==== MERGED GROUPS ====
    Merged groups are similar to global groups in that the entire system uses
    a single global user as a parent.  What's different is that the global
@@ Line 1,183: / Line 1,245: @@
    the group.
-  NOTE: Merged Groups are great for providing out-of-the-box adaptive filtering,
+''NOTE:''<br>
-        but allowing users to build their own data from scratch will still
+Merged Groups are great for providing out-of-the-box adaptive filtering, but allowing users to build their own data from scratch will still result in the best possible accuracy in the long run.
-        result in the best possible accuracy in the longrun.
-  NOTE: Be sure and set your global user's preferences so that trainingMode
-        is set to TOE. This will prevent the purge tools you use from
-        purging them empty in 90 days.
+''NOTE:''<br>
+Be sure to set your global user's preferences so that trainingMode is set to TOE. This will prevent the purge tools you use from purging them empty in 90 days.
-  IMPORTANT!
+'''IMPORTANT!'''
    If you are running dspam_clean, be sure to set a preference for your merged
@@ Line 1,199: / Line 1,258: @@
    out your entire merged group user's dataset, since it's old).
+<br>
 === EXTERNAL INOCULATION THEORY ===
+-----
-  Bill Yerazunis recently expressed his theory of inoculation on an anti-spam
+Bill Yerazunis recently expressed his theory of inoculation on an anti-spam development list, using the term "vaccination":
-  development list, using the term "vaccination":
    "Part of the problem is that spam isn't stationary, it evolves. That
@@ Line 1,282: / Line 1,341: @@
     harvester bots, making them obsolete as counter-productive tools.
+<br>
 === CLIENT/SERVER MODE ===
+-----
+DSPAM supports two different modes of operation.  In standard operating mode, the DSPAM agent is called by the MTA (or proxy) and each agent process operates independently, establishing its own connection to a database and performing delivery on its own. The second operating mode, client/server mode, allows the DSPAM agent to act more like a thin client, connecting to the DSPAM server process which then does all the work of analyzing and delivering or quarantining the message. The advantages to using DSPAM in client/server mode are:
+* Maintaining a set of stateful database connections (within the server), which should enhance performance on some systems by eliminating the need to establish a new database connection for every message processed.
-  DSPAM supports two different modes of operation.  In standard operating
+* Providing a central point of processing. Having one server perform all processing and delivery, while having multiple thin clients on your mail servers may be more desirable than having multiple agents performing processing and delivery on all your servers.
-  mode, the DSPAM agent is called by the MTA (or proxy) and each agent process
+* The DSPAM server speaks LMTP, which some implementations may be able to take advantage of, eliminating the need for the DSPAM client altogether.
-  performs independently, establishing its own connection to a database and
+* Having a single multithreaded daemon should use less memory and other resources than having independently operating clients.
-  performs delivery on its own. The second operating mode, client/server mode,
-  allows the DSPAM agent to act more like a thin client, connecting to the
-  DSPAM server process which then does all the work of analyzing and delivering
-  or quarantining the message. The advantages to using DSPAM in client/server
-  mode are:
-  - Maintaining a set of stateful database connections (within the server),
-    which should enhance performance on some systems by eliminating the need
-    to establish a new database connection for every message processed.
-  - Providing a central point of processing. Having one server perform all
+If you've already got DSPAM set up, client/server mode won't require any changes to your mail server's configuration - it's completely transparent.
-    processing and delivery, while having multiple thin clients on your mail
-    servers may be more desirable than having multiple agents performing
-    processing and delivery on all your servers.
-  - The DSPAM server speaks LMTP, which some implementations may be able to
-    take advantage of, eliminating the need for the DSPAM client all together.
-  - Having a single multithreaded daemon should use less memory and other
+The DSPAM agent can be compiled with client/server support by configuring with --enable-daemon. You will need to use a multithread-safe storage driver (presently mysql_drv, pgsql_drv, and hash_drv are supported). Once you have compiled with daemon support, you'll need to modify your dspam.conf to provide the settings necessary for client/server mode:
-    resources than having independently operating clients.
-  If you've already got DSPAM set up, client/server mode won't require any
+ ServerHost             127.0.0.1
-  changes to your mail server's configuration - it's completely transparent.
+The host to listen on. By default this setting is commented out, which forces DSPAM to listen on all available interfaces.
-  The DSPAM agent can be compiled with client/server support by configuring
-  with --enable-daemon. You will need to use a multithread-safe storage driver
-  (presently mysql_drv and pgsql_drv are supported). Once you have compiled
-  with daemon support, you'll need to modify your dspam.conf to provide the
-  settings necessary for client/server mode:
-	ServerPort             24
+ ServerPort             24
+The port to listen on. The default is 24, the LMTP port.
-  The port to listen on. The default is 24, the LMTP port.
-	ServerQueueSize        32
+ ServerQueueSize        32
+The maximum number of connections which may remain backlogged before they are accepted.
-  The maximum number of connections which may remain backlogged before they
-  are accepted.
-	ServerPass.Relay1      "secret"
+ ServerPass.Relay1      "secret"
-	ServerPass.Relay2      "password"
+ ServerPass.Relay2      "password"
+Each client server allowed to connect should have its own password. They can be defined here.
-  Each client server allowed to connect should have its own password. They
-  can be defined here.
-  The DSPAM server can listen on either a network socket or a local unix
+The DSPAM server can listen on either a network socket or a local unix domain socket. If you're running the client and server on the same machine, a domain socket should be used as it eliminates additional overhead. To use a domain socket, you'll also need to add the following option:
-  domain socket. If you're running the client and server on the same machine,
-  a domain socket should be used as it eliminates additional overhead. To use
-  a domain socket, you'll also need to add the following option:
-	ServerDomainSocketPath  "/tmp/dspam.sock"
+ ServerDomainSocketPath  "/tmp/dspam.sock"
-  Once you've configured the server config, you'll want to set the client
-  configuration on all client machines. If you are using network sockets,
-  set the following to appropriate values:
-	ClientHost     127.0.0.1
+Once you've configured the server config, you'll want to set the client configuration on all client machines. If you are using network sockets, set the following to appropriate values:
-	ClientPort     24
+ ClientHost     127.0.0.1
+ ClientPort     24
-  Or if using a domain socket:
+Or if using a domain socket:
+ ClientHost     /tmp/dspam.sock
-        ClientHost     /tmp/dspam.sock
+In both cases, you'll need to set the client's authentication ident:
+ ClientIdent    "secret@Relay1"
-  In both cases, you'll need to set the client's authentication ident:
-	ClientIdent    "secret@Relay1"
+Now you're ready to go. To start the DSPAM server, run:
+ dspam --daemon &
-  Now you're ready to go. To start the DSPAM server, run:
+Or alternatively, if you have debugging enabled:
+ dspam --debug --daemon &
-	dspam --daemon &
-  Or alternatively, if you have debugging enabled:
+The DSPAM agent can then be called exactly as in standard (non-client/server) mode, with --client added to the set of parameters. With --client, the client settings are loaded from dspam.conf and the agent acts as a thin client. Running dspam without --client will cause DSPAM to revert to its normal non-daemon behavior and establish database connections on its own. For example:
+ dspam --client --user dick jane --deliver=innocent -d %u
-	dspam --debug --daemon &
+Alternatively, if you'd like to use a thinner client, dspamc is identical to the dspam binary in behavior, but has been stripped down to only include the lightweight client.
+ dspamc --client --user dick jane --deliver=innocent -d %u
-  The DSPAM agent can then be called the same as if you were running in
-  standard (non-client/server) mode and adding --client to the set of
-  parameters. Running dspam without --client specified will cause DSPAM to
-  revert to its normal non-daemon behavior and establish database connections
-  on its own. The client settings will be loaded from dspam.conf, and the
-  agent will act as a thin client instead. For example:
-	dspam --client --user dick jane --deliver=innocent -d %u
+The conversation that takes place between the client/server is LMTP-based, and will look like this:
+ SERVER> 220 DSPAM DLMTP 3.4.0 Authentication Required
+ CLIENT> LHLO Relay1
+ SERVER> 250-PIPELINING
+ SERVER> 250-ENHANCEDSTATUSCODES
+ SERVER> 250-DSPAMPROCESSMODE
+ SERVER> 250 SIZE
+ CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
+ SERVER> 250 2.1.0 OK
+ CLIENT> RCPT TO: dick
+ SERVER> 250 2.1.5 OK
+ CLIENT> RCPT TO: jane
+ SERVER> 250 2.1.5 OK
+ CLIENT> DATA
+ SERVER> 354 Enter mail, end with "." on a line by itself
+ CLIENT> Subject: Cheap Viagra!
+ CLIENT>
+ CLIENT> Click Here: <nowiki>http://www.cheapviagra.com</nowiki>
+ CLIENT> .
+ SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
+ SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
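
The final replies in the conversation above carry one classification per recipient. As a minimal sketch of how a custom client might collect them, the following Python helper parses those reply lines. The function name and the regular expression are assumptions inferred only from the sample conversation, not a documented DSPAM reply grammar.

```python
import re

# Matches a final per-recipient reply such as:
#   250 2.0.0 <dick> Message accepted for delivery: INNOCENT
# The pattern is inferred from the sample conversation only (assumption).
REPLY_RE = re.compile(
    r"^(?P<code>\d{3}) (?P<status>\d\.\d\.\d) <(?P<user>[^>]+)> "
    r".*: (?P<result>[A-Za-z]+)$"
)


def parse_delivery_replies(lines):
    """Return a dict mapping each recipient to its reported result."""
    results = {}
    for line in lines:
        match = REPLY_RE.match(line.strip())
        # Only successful (250) replies that name a recipient are kept;
        # other lines, such as "250 2.1.5 OK", are skipped.
        if match and match.group("code") == "250":
            results[match.group("user")] = match.group("result").upper()
    return results
```

Replies that do not name a recipient in angle brackets are ignored, so the whole server side of a session can be passed in unfiltered.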
-  Alternatively, if you'd like to use a thinner client, dspamc is identical
-  to the dspam binary in behavior, but has been stripped down to only include
-  the lightweight client.
-	dspamc --client --user dick jane --deliver=innocent -d %u
+Optionally, if you'd like the clients to perform delivery, you can use DSPAM's --stdout or --classify functionality to obtain a dump of the message or results, respectively. From there, it's up to you and your MTA to deliver the message. The DSPAM client will output the results to stdout in this case, just as it would in standard operating mode.
-  The conversation that takes place between the client/server is LMTP-based,
-  and will look like this:
-SERVER> 220 DSPAM DLMTP 3.4.0 Authentication Required
+Once the server is running, its configuration can be reloaded with a SIGHUP.
-CLIENT> LHLO Relay1
+When the daemon is reloaded, the following occurs:
-SERVER> 250-PIPELINING
+* The daemon stops listening for new requests
-SERVER> 250-ENHANCEDSTATUSCODES
+* All threads are allowed to finish processing and exit
-SERVER> 250-DSPAMPROCESSMODE
+* All connections to the database are closed
-SERVER> 250 SIZE
+* The dspam.conf configuration is reloaded
-CLIENT> MAIL FROM: <secret@Relay1> DSPAMPROCESSMODE="--deliver=innocent -d %u"
+* All connections to the database are re-opened
-SERVER> 250 2.1.0 OK
+* The daemon starts listening for new requests
-CLIENT> RCPT TO: dick
-SERVER> 250 2.1.5 OK
-CLIENT> RCPT TO: jane
-SERVER: 250 2.1.5 OK
-CLIENT> DATA
-SERVER> 354 Enter mail, end with "." on a line by itself
-CLIENT> Subject: Cheap Viagra!
-CLIENT>
-CLIENT> Click Here: http://www.cheapviagra.com
-CLIENT> .
-SERVER> 250 2.0.0 <dick> Message accepted for delivery: INNOCENT
-SERVER> 250 2.0.0 <jane> Message accepted for delivery: SPAM
-  Optionally, if you'd like the clients to perform delivery, you can use
-  DSPAM's --stdout or --classify functionality to obtain a dump of the message
-  or results, respectively. From there, it's up to you and your MTA to
-  deliver the message. The DSPAM client will output the results to stdout in
-  this case, just as it would in standard operating mode.
-  Once the server is running, its configuration can be reloaded with a SIGHUP.
+This allows database and listener configurations to also be reloaded from dspam.conf without the need to interrupt the process.
-  When the daemon is reloaded, the following occurs:
-  - The daemon stops listening for new requests
-  - All threads are allowed to finish processing and exit
-  - All connections to the database are closed
-  - The dspam.conf configuration is reloaded
-  - All connections to the database are re-opened
-  - The daemon starts listening for new requests
-  This allows database and listener configurations to also be reloaded from
+''NOTE:''<br>
-  dspam.conf without the need to interrupt the process.
+During the period of time the daemon is reloading, client connections will fail. Depending on how the MTA reacts, this may cause messages to fall back to queue or to bounce.
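
A reload can be triggered from a script by sending SIGHUP to the daemon process. The sketch below assumes the daemon's PID has been written to a file; the helper name and the PID file path are hypothetical, since this README does not say where (or whether) the daemon records its PID.

```python
import os
import signal


def reload_daemon(pid_file):
    """Send SIGHUP to the process whose PID is stored in pid_file.

    pid_file is a hypothetical path supplied by the caller; adjust it to
    wherever your installation records the daemon's PID.
    """
    with open(pid_file) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGHUP)
    return pid
```

Because client connections fail while the daemon reloads, it may be worth scheduling such a call for a quiet period.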
-  NOTE: During the period of time the daemon is reloading, client connections
-  will fail. Depending on how the MTA reacts, this may cause messages to
-  fall back to queue or to bounce.
+<br>
 === LMTP ===
+-----
-  DSPAM supports LMTP both on the front-end and back-end (delivery). This
+DSPAM supports LMTP both on the front-end and back-end (delivery). This section will briefly provide instructions for configuring either or both of these advanced options.
-  section will briefly provide instructions for configuring either or both of
-  these advanced options.
    LMTP (AND SMTP) DELIVERY
@@ Line 1,526: / Line 1,547: @@
    In both cases, the content provided between < > is what is actually used.
+<br>
 === DSPAM USER PREFERENCES ===
+-----
+Preferences are settings that can be configured globally in dspam.conf or for individual users via the dspam_admin command.
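
For the per-user case, the command has the shape `dspam_admin ch pref <user> <preference> <value>`, as in the fallbackDomain example elsewhere in this README. The following Python sketch builds that argument list and checks the value first. The helper name and the value table are assumptions; the table is deliberately partial and copies only a few preferences documented in this section.

```python
# Allowed values for a few preferences documented in this section.
# This table is partial and illustrative (assumption), not a full list.
ALLOWED_VALUES = {
    "spamAction": {"quarantine", "tag", "deliver"},
    "enableBNR": {"on", "off"},
    "enableWhitelist": {"on", "off"},
    "signatureLocation": {"message", "headers"},
    "showFactors": {"on", "off"},
}


def build_pref_command(user, preference, value):
    """Return the argument list for changing one user preference."""
    allowed = ALLOWED_VALUES.get(preference)
    if allowed is None:
        raise ValueError("preference not in the sketch's table: " + preference)
    if value not in allowed:
        raise ValueError("invalid value for " + preference + ": " + value)
    return ["dspam_admin", "ch", "pref", user, preference, value]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a host where dspam_admin is installed.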
+ trainingMode { TOE | TUM | TEFT | NOTRAIN }
+How DSPAM should train on the messages it analyzes. See section 1.5, --mode. (default:teft, see dspam.conf)
+ spamAction { quarantine | tag | deliver }
+What to do with spam. The tag and deliver options both deliver, but tag adds a special prefix to the subject, whereas deliver merely sets X-DSPAM-Result. (default:quarantine)
-  Preferences are settings that can be configured globally in dspam.conf or
+ spamSubject
-  for individual users via the dspam_admin command.
+A customized subject to prefix when spamAction=tag. (default:[SPAM])
-  trainingMode { TOE | TUM | TEFT | NOTRAIN }
-    How DSPAM should train messages it analyzes. See section 1.5 --mode
-    (default:teft, see dspam.conf)
-  spamAction { quarantine | tag | deliver }
+ statisticalSedation { 0 - 10 }
-    What to do with spam. The tag and deliver options both deliver, but tag
+The level of dampening during training (0-10, 0 = no dampening, default:0)
-    adds a special prefix to the subject, whereas deliver merely sets
-    X-DSPAM-Result. (default:quarantine)
-  spamSubject
-    A customized subject to prefix when spamAction=tag. (default:[SPAM])
-  statisticalSedation { 0 - 10 }
+ enableBNR { on | off }
-    The level of dampening during training (0-10, 0 = no dampening, default:0)
+Enables or disables Bayesian noise reduction (default:off)
-  enableBNR { on | off }
-    Enables or disables bayesian noise reduction (default:off)
-  enableWhitelist { on | off }
+ enableWhitelist { on | off }
-    Enables or disables automatic whitelisting (default:on)
+Enables or disables automatic whitelisting (default:on)
-  signatureLocation { message | headers }
-    Where to place the DSPAM signature. Placement affects forwarding approach.
-    (default:message)
-  tagSpam / tagNonspam { on | off }
+ signatureLocation { message | headers }
-    Adds a tagline to the end of a message based on its classification; useful
+Where to place the DSPAM signature. Placement affects forwarding approach. (default:message)
-    for things such as "Scanned by Your ISP.com". If set to on, the file
-    msgtag.spam and/or msgtag.nonspam will be looked for in dspam_home/txt/
-    and appended to appropriate messages.
-    NOTE: Signed messages will not be tagged in this fashion
-  showFactors { on | off }
+ tagSpam / tagNonspam { on | off }
-    Whether to include an X-DSPAM-Factors header including decision-making
+Adds a tagline to the end of a message based on its classification; useful for things such as "Scanned by Your ISP.com". If set to on, the file msgtag.spam and/or msgtag.nonspam will be looked for in dspam_home/txt/ and appended to appropriate messages.
-    factors (clues). NOTE: This can break RFC in some cases, and should only
-    be used for debugging. (default:off)
-  optIn / optOut { on | off }
+''NOTE:''<br>
-    Depending on whether the system is opt-in or opt-out, sets the user's
+Signed messages will not be tagged in this fashion.
-    membership. If user is opted out (or not opted in), mail will be delivered
-    by DSPAM without being processed.
-  whitelistThreshold { Integer }
-    Overrides the default number of times a From: header has been seen before
-    it is automatically whitelisted. (default:10)
-  makeCorpus { on | off }
+ showFactors { on | off }
-    When activated, a maildir-style corpus is maintained in the user's data
+Whether to include an X-DSPAM-Factors header containing decision-making factors (clues). NOTE: This can break RFC compliance in some cases, and should only be used for debugging. (default:off)
-    directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or
-    other analysis. (default:off)
-  storeFragments { on | off }
-    When activated, the first 1k of each message are temporarily stored on
-    the server for reference via the webui's history function. (default:off)
-  localStore { on | off }
+ optIn / optOut { on | off }
-    Overrides the directory name used for the user's dspam data directory. This
+Depending on whether the system is opt-in or opt-out, sets the user's membership. If the user is opted out (or not opted in), mail will be delivered by DSPAM without being processed.
-    is useful when using recipient addresses as usernames, as it will allow
-    all addresses belonging to a specific user to be written to a single
-    webui directory. (default:username)
-  processorBias { on | off }
-    Overrides the "bias" setting in dspam.conf, which biases mail as
-    innocent. (default:on, see dspam.conf)
-  fallbackDomain { on | off }
-    Allows a dspam user ("@domain.com") to be marked as a fallback user for
-    the entire domain, so if the destination dspam user does not exist in
-    the database, the fallback user's database will be used. The
-    dspam.conf "FallbackDomains" setting must also be "on". (default:off)
-    NOTE: You will need to set "FallbackDomains on" in dspam.conf to use this.
-  trainPristine { on | off }
+ whitelistThreshold { Integer }
-    Override's the default signature mode and treats messages as if they were
+Overrides the default number of times a From: header has been seen before it is automatically whitelisted. (default:10)
-    in pristine format when retraining. This requires all retraining to use
-    the original message that was processed as no dspam signature is stored
-    for pristine training. (default:off)
-  optOutClamAV { on | off }
-    Opts out of ClamAV virus scanning (if ClamAV is directly integrated with
-    dspam via dspam.conf). (default:off)
+ makeCorpus { on | off }
+When activated, a maildir-style corpus is maintained in the user's data directory (DSPAM_HOME/DATA/USERNAME), suitable for future retraining or other analysis. (default:off)
-=== FALLBACK DOMAINS ===
-  Fallback domains allow you to default some or all users for a particular
+ storeFragments { on | off }
-  domain to a single domain user; this allows you to set preferences (including
+When activated, the first 1k of each message is temporarily stored on the server for reference via the webui's history function. (default:off)
-  opting out of filtering entirely) for users based on domain name. Any user
-  who does not exist as a known user to DSPAM will be defaulted to the
-  domain it belongs to if it is designated as a fallback domain. This
-  means that you can create [email protected] and [email protected] with their own
-  databases and preferences, but also default all other users to @domain.com.
-  Alternatively, you could create just the domain without any other users and
-  default all users to @domain.com
-  To use fallback domains, you'll first need to activate this feature in
-  dspam.conf:
-  FallbackDomains on
+ localStore { on | off }
+Overrides the directory name used for the user's dspam data directory. This is useful when using recipient addresses as usernames, as it will allow all addresses belonging to a specific user to be written to a single webui directory. (default:username)
-  Next, you'll need to create a dspam user for each domain you wish to use
-  as a fallback domain. For example, @domain.com. Depending on your
+ processorBias { on | off }
-  implementation, this may be a simple insert into dspam_virtual_uids or may
+Overrides the "bias" setting in dspam.conf, which biases mail as innocent. (default:on, see dspam.conf)
-  be created automatically when setting a user's preferences.
-  Finally, designate that special user as a fallback domain by setting a
-  preference:
-  dspam_admin ch pref @domain.com fallbackDomain on
+ fallbackDomain { on | off }
+Allows a dspam user ("@domain.com") to be marked as a fallback user for the entire domain, so if the destination dspam user does not exist in the database, the fallback user's database will be used. The dspam.conf "FallbackDomains" setting must also be "on". (default:off)
-  Any mail coming in for that domain that does _not_ match a known user in
+''NOTE:''<br>
-  dspam will now fall back to this user; you can then set specific preferences
+You will need to set "FallbackDomains on" in dspam.conf to use this.
-  or even opt out the entire user. Alternatively, you can create a domain-based
-  database for filtering mail specific to that domain, just as you would a
-  normal user.
-== BUGS, PORTS, AND THE LIKE ==
-----
+ trainPristine { on | off }
+Overrides the default signature mode and treats messages as if they were in pristine format when retraining. This requires all retraining to use the original message that was processed, as no dspam signature is stored for pristine training. (default:off)
-  Please see http://dspam.nuclearelephant.com/bugs.shtml for the current known
+ optOutClamAV { on | off }
-  bugs list and proper reporting procedure.
+Opts out of ClamAV virus scanning (if ClamAV is directly integrated with dspam via dspam.conf). (default:off)
-  If you port DSPAM to another platform, or would like to submit changes to
-  the distribution, please email a diff along with any other pertinent
-  information to the dspam-dev mailing list.
-  Note:
+ ignoreRBLLookups { on | off }
+Overrides the "Lookup" setting in dspam.conf, which looks up senders' IP addresses in a Realtime Blackhole List (RBL). (default:off)
-  In order to keep DSPAM unencumbered by intellectual property abuses, all
-  external contributors to the project are asked to release any rights to the
-  submission. This keeps the DSPAM project a healthy, unencumbered GPL project.
-  Please accompany your patch, code, or other submission with the following
-  statement. By submitting a patch to the project, you agree to be bound by
-  the terms of this statement whether it is specifically included in the
-  submission or not, however we still require that it be attached to the
-  submission:
-    The author or authors of this submission hereby release any and all
+ RBLInoculate { on | off }
-    copyright interest in this code, documentation, or other materials