From LedHed's Wiki
(ALIASES)
 
  
  
== 1.0 ABOUT DSPAM ==
+
'''CREDITS'''
  
DSPAM is an open-source, freely available anti-spam solution designed to combat
+
Original Work By:
unsolicited commercial email using advanced statistical analysis. In short,
+
*Lead development: Jonathan A. Zdziarski <jonathan@nuclearelephant.com>
DSPAM filters spam by learning what spam is and isn't. It does this by learning
+
*Postgres driver: Rustam Aliyev <rustam@azernews.com>
each user's individual mail behavior. This allows DSPAM to provide
+
Various:
highly-accurate, personalized filtering for each user on even a large system
+
*Feb/2006 Cove Schneider <[email protected]>
and provides an administratively maintenance free solution capable of learning
+
*Jan/2006 Norman Maurer <nm@byteaction.de>
each user's email behaviors with very few false positives.
+
  
While DSPAM is focused around spam filtering, many have found alternative
+
Your name is missing? Let us know with a reference to your commit, and we'll
uses for all types of two-concept document classification.  
+
add you to the list.
  
DSPAM is rapidly gaining a large support forum and being used in many large-
 
scale implementations. Contributions to the project are welcome via the
 
dspam-dev mailing list or in the form of financial contributions.
 
  
Many of the foundational principles incorporated into this software were
+
'''COPYRIGHT'''
contributed by Paul Graham's white paper on combatting spam, which can be
+
found at http://paulgraham.com/spam.html.  Much research and development has
+
resulted in many new approaches being added onto the DSPAM project as well,
+
some of which are explained in white papers on the DSPAM home page.
+
  
DSPAM can be implemented as a total solution, or as a library which developers  
+
Original work was done by Jonathan A. Zdziarski.
may link their projects to the dspam core engine (libdspam) in accordance with  
+
 
the GPL license agreement.  This enables developers to incorporate libdspam as  
+
In 2006 the copyright was handed over to Sensory Networks.
a "drop-in" for instant spam filtering within their applications - such as mail  
+
 
clients, other anti-spam tools, and so on.
+
In 2009 Sensory Networks handed over the full copyright to the DSPAM Project.
 +
As of 12 January 2009 the copyright is owned by the DSPAM Project, represented by a team of people, including:
 +
* Alexander Prinsier
 +
* Ion-Mihai Tetcu
 +
* Paul Cockings
 +
* Dov Zamir
 +
* Stevan Bajic
 +
 
 +
<br>
 +
== OVERVIEW ==
 +
----
 +
DSPAM is an open-source, freely available anti-spam solution designed to combat unsolicited commercial email using advanced statistical analysis. In short, DSPAM filters spam by learning what spam is and isn't, based on each user's individual mail behavior. This allows DSPAM to provide highly accurate, personalized filtering for each user, even on a large system, and makes it an administratively maintenance-free solution with very few false positives.
 +
 
 +
 
 +
While DSPAM is focused around spam filtering, many have found alternative uses for all types of two-concept document classification.
 +
 
 +
 
 +
DSPAM is rapidly gaining a large support forum and being used in many large-scale implementations. Contributions to the project are welcome via the dspam-dev mailing list or in the form of financial contributions.
 +
 
 +
 
 +
Many of the foundational principles incorporated into this software were drawn from Paul Graham's white paper on combating spam, which can be found at http://paulgraham.com/spam.html.  Much research and development has resulted in many new approaches being added onto the DSPAM project as well, some of which are explained in white papers on the DSPAM home page.
 +
 
 +
 
 +
DSPAM can be implemented as a total solution, or used as a library: developers may link their projects to the DSPAM core engine (libdspam) in accordance with the GPL license agreement.  This enables developers to incorporate libdspam as a "drop-in" for instant spam filtering within their applications, such as mail clients, other anti-spam tools, and so on.
 +
 
 +
 
 +
''PLEASE NOTE:''<br>
 +
DSPAM and libdspam are distributed under the GPL license, not the LGPL. Commercial licensing is available for those who seek to redistribute DSPAM or some of DSPAM's components/libraries in their non-GPL products. Please contact us for more information about commercial licensing.
  
PLEASE NOTE: DSPAM and libdspam are distributed under the GPL license, not the
 
LGPL. Commercial licensing is available for those who seek to redistribute
 
DSPAM or some of DSPAM's components/libraries in their non-GPL products.
 
Please contact [email protected] for more information about
 
commercial licensing.
 
  
 
The DSPAM package is split up into the following pieces:
 
The DSPAM package is split up into the following pieces:
  
DSPAM AGENT
 
  
The DSPAM agent is the command center for all shell and daemon operations.
+
'''DSPAM AGENT'''
If you're using DSPAM as a filtering solution, this is the 'dspam' (or dspamc)
+
 
binary you're likely going to be talking to via commandline.  
+
The DSPAM agent is the command center for all shell and daemon operations. If you're using DSPAM as a filtering solution, this is the 'dspam' (or dspamc) binary you're most likely to be talking to via the command line.
 +
 
 +
 
 +
'''LIBDSPAM: CORE ENGINE'''
  
LIBDSPAM: CORE ENGINE
+
The DSPAM core processing engine, also known as libdspam, provides all critical spam filtering functions.  The engine is embedded into other dspam components (such as the agent) and is responsible for the actual filtering logic. If you're not a developer, you don't need to be concerned with this component as it is automatically compiled in with the build.
  
The DSPAM core processing engine, also known as libdspam, provides all critical
 
spam filtering functions.  The engine is embedded into other dspam components
 
(such as the agent) and is responsible for the actual filtering logic.
 
If you're not a developer, you don't need to be concerned with this component
 
as it is automatically compiled in with the build.
 
  
WEB UI
+
'''WEB UI'''
  
The Web UI (User Interface) is designed to allow end-users to review their
+
The Web UI (User Interface) is designed to allow end-users to review their spam quarantine, history, and graphs, and to delete their spam permanently. They can also optionally use the quarantine to perform all of their training. The UI also includes some basic administrative tools to change settings and manage user quarantines.
spam quarantine and history, graphs, and to delete their spam permanently.
+
They can also optionally use the quarantine to perform all of their training.
+
The UI also includes some basic administrative tools to change settings and
+
manage user quarantines.
+
 
   
 
   
TOOLS
 
  
Some basic tools have been provided to manage dictionaries, automate
+
'''TOOLS'''
corpus feeding, and perform other diagnostic operations related to DSPAM.
+
Some of these include dspam_train, dspam_stats, and dspam_dump.
+
  
 +
Some basic tools have been provided to manage dictionaries, automate corpus feeding, and perform other diagnostic operations related to DSPAM. Some of these include dspam_train, dspam_stats, and dspam_dump.
  
=== 1.1 INSTALLATION ===
+
<br>
  
 +
== IMPLEMENTATION OPTIONS ==
  
IMPLEMENTATION OPTIONS
+
----
  
There are many different ways to deploy DSPAM onto an existing network. The
+
There are many different ways to deploy DSPAM onto an existing network. The most popular approaches are:
most popular approaches are:
+
  
==== 1. As a delivery agent proxy ====
 
  
When your mail server gets ready to deliver mail to a user's mailbox it calls
 
a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop,
 
mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent
 
is called in place of your existing agent - or better put, it can masquerade
 
as the local delivery agent. DSPAM then processes the message and will call
 
the /real/ delivery agent to pass the good mail into the user's mailbox,
 
quarantining the bad mail. DSPAM can optionally tag and deliver both spam
 
and legitimate mail.
 
  
In the diagram below, MTA refers to Mail Transfer Agent, or your mail server
+
=== As a delivery agent proxy ===
software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery
+
----
Agent: Procmail, Maildrop, etc.
+
When your mail server gets ready to deliver mail to a user's mailbox it calls a delivery agent of some sort. On most UNIX systems, this is procmail, maildrop, mail.local, or a similar tool. When used as a delivery proxy, the DSPAM agent is called in place of your existing agent - or better put, it can masquerade as the local delivery agent. DSPAM then processes the message and will call the /real/ delivery agent to pass the good mail into the user's mailbox, quarantining the bad mail. DSPAM can optionally tag and deliver both spam and legitimate mail.
 +
 
 +
In the diagram below, MTA refers to Mail Transfer Agent, or your mail server software: Postfix, Sendmail, Exim, etc. LDA refers to the Local Delivery Agent: Procmail, Maildrop, etc.
  
 
BEFORE:
 
BEFORE:
  
    [MTA] ---> [LDA] ---> (User's Mailbox)
+
[MTA] ---> [LDA] ---> (User's Mailbox)
 +
 
  
 
AFTER:
 
AFTER:
  
    [MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
+
[MTA] ---> [DSPAM] ---> [LDA] ---> (User's Mailbox)
                        \
+
                    \
                        \--> [Quarantine]
+
                      \--> [Quarantine]
          [End User] ------> [Web UI]
+
        [End User] ------> [Web UI]
  
==== 2. As a POP3 Proxy ====
+
<br>
  
If you don't want to tinker with your existing mail server setup, DSPAM can
+
=== As a POP3 Proxy ===
be combined with one of a few open source programs designed to act as a POP3
+
----
proxy. This means spam is filtered whenever the user checks their mail,
+
If you don't want to tinker with your existing mail server setup, DSPAM can be combined with one of a few open source programs designed to act as a POP3 proxy. This means spam is filtered whenever the user checks their mail, rather than when it is delivered. The benefit to this is that you can set up a small machine on your network that will connect to your existing mail server, so no integration is needed. It also allows your users to arbitrarily point their mail client at it if they desire filtering. The drawback to this approach is that the POP3 protocol has no way to tell the mail client that a message is spam, and so the user will have to download the spam (tagged, of course).
rather than when it is delivered. The benefit to this is that you can set up
+
a small machine on your network that will connect to your existing mail server,
+
so no integration is needed. It also allows your users to arbitrarily point their
+
mail client at it if they desire filtering. The drawback to this approach is
+
that the POP3 protocol has no way to tell the mail client that a message is
+
spam, and so the user will have to download the spam (tagged, of course).
+
  
 
BEFORE:
 
BEFORE:
  
    [End User] ---> [POP3 Server]
+
[End User] ---> [POP3 Server]
  
 
AFTER:
 
AFTER:
  
    [End User] ---> [POP3 Proxy] <--> [DSPAM]
+
[End User] ---> [POP3 Proxy] <--> [DSPAM]
                    \
+
                  \
                      \--> [POP3 Server]
+
                  \--> [POP3 Server]
  
==== 3. As an SMTP Relay ====
+
<br>
  
Newer versions of DSPAM have seen features that allow it to function more
+
=== As an SMTP Relay ===
easily as an SMTP relay. An SMTP relay sits in front of your existing mail
+
----
server (requiring no integration). To use an SMTP relay, the MX records for  
+
Newer versions of DSPAM have gained features that allow it to function more easily as an SMTP relay. An SMTP relay sits in front of your existing mail server (requiring no integration). To use an SMTP relay, the MX records for your domains are repointed to the relay machine running DSPAM. DSPAM then relays the good (and optionally bad) mail to the existing SMTP server. This allows you to use DSPAM with even a Windows-based destination mail server as no integration is necessary. See doc/relay.txt for one example of how to do this with Postfix.
your domains are repointed to the relay machine running DSPAM. DSPAM then  
+
relays the good (and optionally bad) mail to the existing SMTP server. This
+
allows you to use DSPAM with even a Windows-based destination mail server
+
as no integration is necessary. See doc/relay.txt for one example of how to
+
do this with Postfix.
+
  
 
BEFORE:
 
BEFORE:
  
  { Internet } ---> [Company Mail Server]
+
{ Internet } ---> [Company Mail Server]
  
 
AFTER:
 
AFTER:
  
  { Internet } --->  [ Inbound SMTP Relay  ]  --->  [Company Mail Server]
+
{ Internet } --->  [ Inbound SMTP Relay  ]  --->  [Company Mail Server]
                        ( MTA <> DSPAM )    SMTP  
+
                        ( MTA <> DSPAM )    SMTP  
                          \                    or
+
                        \                    or
                          \--> [Quarantine]  LMTP
+
                          \--> [Quarantine]  LMTP
            [End User] ------> [Web UI]
+
            [End User] ------> [Web UI]
  
 +
<br>
  
==== UPGRADING DSPAM ====
+
== INSTALLATION ==
 +
----
 +
<br>
  
Follow the steps sequentially from the base version you are running up to
+
=== UPGRADING DSPAM ===
the top.
+
----
 +
Follow the steps sequentially from the base version you are running up to the top.
  
 +
<br>
 +
==== Upgrading from 3.8 ====
  
===== UPGRADING FROM 3.6 =====
+
1. Ensure MySQL is using the new database schema. The following clauses should be executed for upgrading pre-3.9.0 DSPAM MySQL schema to the 3.9.0 schema:
 +
ALTER TABLE `dspam_signature_data`
 +
  CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
 +
  CHANGE `data` `data` LONGBLOB NOT NULL,
 +
  CHANGE `length` `length` INT UNSIGNED NOT NULL;
 +
ALTER TABLE `dspam_stats`
 +
  CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
 +
  CHANGE `spam_learned` `spam_learned` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `innocent_learned` `innocent_learned` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `spam_misclassified` `spam_misclassified` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `innocent_misclassified` `innocent_misclassified` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `spam_corpusfed` `spam_corpusfed` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `innocent_corpusfed` `innocent_corpusfed` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `spam_classified` `spam_classified` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `innocent_classified` `innocent_classified` BIGINT UNSIGNED NOT NULL;
 +
ALTER TABLE `dspam_token_data`
 +
  CHANGE `uid` `uid` INT UNSIGNED NOT NULL,
 +
  CHANGE `spam_hits` `spam_hits` BIGINT UNSIGNED NOT NULL,
 +
  CHANGE `innocent_hits` `innocent_hits` BIGINT UNSIGNED NOT NULL;
  
  
1. Add 'Tokenizer' setting to dspam.conf
+
If you are using preference extension with DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM preference MySQL schema to the 3.9.0 schema:
  The 'Tokenizer' setting in 3.8.0 replaces tokenizer definitions in the  
+
ALTER TABLE `dspam_preferences`
  "Feature" clause of previous version configurations. See src/dspam.conf
+
  CHANGE `uid` `uid` INT UNSIGNED NOT NULL;
  (after make) for more information about this setting.
+
 
 +
 
 +
If you are using virtual users (with AUTO_INCREMENT) in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:
 +
ALTER TABLE `dspam_virtual_uids`
 +
  CHANGE `uid` `uid` INT UNSIGNED NOT NULL AUTO_INCREMENT;
 +
 
 +
 
 +
If you are using virtual user aliases (aka: DSPAM in relay mode) in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids MySQL schema to the 3.9.0 schema:
 +
ALTER TABLE `dspam_virtual_uids`
 +
      CHANGE `uid` `uid` INT UNSIGNED NOT NULL;
 +
 
 +
 
 +
If you need to speed up the MySQL purging script and can afford to use more disk space for the DSPAM MySQL data, then consider executing the following clause for adding three additional indices:
 +
ALTER TABLE `dspam_token_data`
 +
  ADD INDEX(`spam_hits`),
 +
  ADD INDEX(`innocent_hits`),
 +
  ADD INDEX(`last_hit`);
 +
 
 +
 
 +
2. Ensure PostgreSQL is using the new database schema. The following clauses should be executed for upgrading pre-3.9.0 DSPAM PostgreSQL schema to the 3.9.0 schema:
 +
ALTER TABLE dspam_preferences ALTER COLUMN uid TYPE integer;
 +
ALTER TABLE dspam_signature_data ALTER COLUMN uid TYPE integer;
 +
ALTER TABLE dspam_stats ALTER COLUMN uid TYPE integer;
 +
ALTER TABLE dspam_token_data ALTER COLUMN uid TYPE integer;
 +
DROP INDEX IF EXISTS id_token_data_sumhits;
 +
 
 +
 
 +
If you are using virtual users in DSPAM, then you should execute the following clause for upgrading pre-3.9.0 DSPAM virtual uids to the 3.9.0 schema:
 +
ALTER TABLE dspam_virtual_uids ALTER COLUMN uid TYPE integer;
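One possible way to apply the PostgreSQL statements from step 2 is to collect them in a file and hand that file to psql in a single transaction, so a failure part-way through leaves the schema unchanged. The sketch below only writes the file and prints the command; it does not connect to anything. The function name `write_pgsql_upgrade` and the database name `dspam` are assumptions for illustration, not part of DSPAM.

```shell
#!/bin/sh
# Sketch only: gather the pre-3.9.0 -> 3.9.0 PostgreSQL statements into a file.
# The virtual-uids statement is left out because it applies only to
# installations that use virtual users.
write_pgsql_upgrade() {   # $1 = output file
    cat > "$1" <<'EOF'
ALTER TABLE dspam_preferences ALTER COLUMN uid TYPE integer;
ALTER TABLE dspam_signature_data ALTER COLUMN uid TYPE integer;
ALTER TABLE dspam_stats ALTER COLUMN uid TYPE integer;
ALTER TABLE dspam_token_data ALTER COLUMN uid TYPE integer;
DROP INDEX IF EXISTS id_token_data_sumhits;
EOF
}

sql_file=$(mktemp)
write_pgsql_upgrade "$sql_file"

# Dry run: print the command instead of executing it.
# "dspam" is a hypothetical database name; substitute your own.
echo "psql --single-transaction -f $sql_file dspam"
```

Printing the command rather than running it keeps the sketch safe to try on a machine without a database; review the generated file before running the printed command.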
 +
 
 +
<br>
 +
 
 +
==== Upgrading From 3.6 ====
 +
 
 +
1. Add 'Tokenizer' setting to dspam.conf. The 'Tokenizer' setting in 3.8.0 replaces tokenizer definitions in the "Feature" clause of previous version configurations. See src/dspam.conf (after make) for more information about this setting.
 +
 
 
   
 
   
2. Check calls to dspam_logrotate
+
2. Check calls to dspam_logrotate. Earlier versions of 3.6 did not require a leading "-l" flag when specifying the log file. This flag is now required.
  Earlier versions of 3.6 did not prepend a leading "-l" flag to specifying
+
  log file selection. This is now required.
+
  
3. Ensure 3.6.0 misaligned hash databases are converted
 
  Version 3.6.0 failed to align hash databases to 8-byte boundaries. If you
 
  are upgrading from v3.6.0 and are using the hash_drv storage driver, you
 
  should run cssconvert to upgrade your .css files to a fully aligned format.
 
  
4. Invert "SupressWebStats" setting in dspam.conf
+
3. Ensure 3.6.0 misaligned hash databases are converted. Version 3.6.0 failed to align hash databases to 8-byte boundaries. If you are upgrading from v3.6.0 and are using the hash_drv storage driver, you should run cssconvert to upgrade your .css files to a fully aligned format.
  SupressWebStats has been changed to simply WebStats, and the setting is
+
  inverted. Be sure to update this in dspam.conf.
+
  
5. Add "ProcessorURLContext" setting in dspam.conf
 
  ProcessorURLContext has been added to toggle whether URL specific tokens
 
  are created in the tokenizer process. The "on" value is default for previous
 
  versions of DSPAM.
 
  
===== UPGRADING FROM 3.4 =====
+
4. Invert "SupressWebStats" setting in dspam.conf. SupressWebStats has been renamed to simply WebStats, and the setting is inverted. Be sure to update this in dspam.conf.
 +
 
 +
 
 +
5. Add "ProcessorURLContext" setting in dspam.conf. ProcessorURLContext has been added to toggle whether URL-specific tokens are created in the tokenizer process. The "on" value matches the default behavior of previous versions of DSPAM.
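The inversion described in step 4 can be sketched as a small filter: an old `SupressWebStats on` line becomes `WebStats off`, and `SupressWebStats off` becomes `WebStats on`. The function name `invert_webstats` is hypothetical, and the sample file exists only for the demonstration; the setting names come from the steps above.

```shell
#!/bin/sh
# Sketch only: print a config with the old SupressWebStats line rewritten
# into the renamed, inverted WebStats form. The input file is not modified.
invert_webstats() {   # $1 = path to a dspam.conf (or a copy of it)
    sed -e 's/^SupressWebStats[[:space:]][[:space:]]*on[[:space:]]*$/WebStats off/' \
        -e 's/^SupressWebStats[[:space:]][[:space:]]*off[[:space:]]*$/WebStats on/' \
        "$1"
}

# Demonstration on a throwaway file rather than a real configuration.
tmp=$(mktemp)
printf 'SupressWebStats on\nProcessorURLContext on\n' > "$tmp"
invert_webstats "$tmp"
rm -f "$tmp"
```

Writing to standard output instead of editing in place makes it easy to compare the result with the original before replacing dspam.conf.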
 +
 
 +
<br>
 +
 
 +
==== Upgrading From 3.4 ====
  
 
Follow all of the steps above, and the following steps:
 
Follow all of the steps above, and the following steps:
  
1. Add "ProcessorBias" setting to dspam.conf
+
1. Add "ProcessorBias" setting to dspam.conf. ProcessorBias has been added to dspam.conf and must be specified. Since bias was the default behavior in previous versions of DSPAM, you will need to add "ProcessorBias on" to dspam.conf. If you have specifically disabled bias, or are using a technique such as Markovian discrimination, you may leave this feature off.
  ProcessorBias has been added to dspam.conf and must be specified.
+
  Since ProcessorBias is the default behavior for previous versions of DSPAM,
+
  you will need to add "ProcessorBias on" to dspam.conf. If you have
+
  specifically disabled bias, or are using a technique such as Markovian
+
  discrimination, you may leave this feature off.
+
  
2. Ensure references to SBLQueue are changed to RABLQueue.
 
  Older versions of DSPAM used the SBLQueue setting to write files for a
 
  DSPAM SBL setup. This has been renamed to RABLQueue. Please change this in
 
  dspam.conf if you are writing to a SBL/RABL installation.
 
  
3. Add "TestConditionalTraining" setting to dspam.conf
+
2. Ensure references to SBLQueue are changed to RABLQueue. Older versions of DSPAM used the SBLQueue setting to write files for a DSPAM SBL setup. This has been renamed to RABLQueue. Please change this in dspam.conf if you are writing to an SBL/RABL installation.
  TestConditionalTraining has been added to dspam.conf and must be specified
+
  to be enabled. Since TestConditionalTraining is the default behavior
+
  in DSPAM, it is strongly recommended that you add
+
  "TestConditionalTraining on" to dspam.conf
+
  
4. Ensure PostgreSQL installations have a lookup_tokens function
 
  PostgreSQL systems running v8.0+ must create the function lookup_tokens
 
  added to pgsql_objects.sql. The driver now checks your version and uses this
 
  function to improve performance on 8.0+.
 
  
5. Ensure you are specifying the correct storage driver.
+
3. Add "TestConditionalTraining" setting to dspam.conf. TestConditionalTraining has been added to dspam.conf and must be specified to be enabled. Since TestConditionalTraining is the default behavior in DSPAM, it is strongly recommended that you add "TestConditionalTraining on" to dspam.conf.
  hash_drv is now the new default storage driver. hash_drv has no dependencies
+
  and is extremely fast/efficient. If you're not familiar with it, you should
+
  check out the readme. If you were previously using SQLite, you will now need
+
  to specify it as the storage driver: --with-storage-driver=sqlite_drv
+
  
  NOTE: Berkeley DB drivers (libdb3_drv, libdb4_drv) are deprecated and have
 
        been removed from the build. You will need to select an alternative
 
        storage driver in order to upgrade.
 
  
 +
4. Ensure PostgreSQL installations have a lookup_tokens function. PostgreSQL systems running v8.0+ must create the function lookup_tokens, which has been added to pgsql_objects.sql. The driver now checks your version and uses this function to improve performance on 8.0+.
  
==== FRESH INSTALLATION ====
 
  
===== 0. PREREQUISITES =====
+
5. Ensure you are specifying the correct storage driver. hash_drv is now the default storage driver. hash_drv has no dependencies and is extremely fast/efficient. If you're not familiar with it, you should check out the readme. If you were previously using SQLite, you will now need to specify it as the storage driver: --with-storage-driver=sqlite_drv
  
  DSPAM can use one of many different backends to store its information, and
+
 
  you will need to decide on one and install the appropriate software before
+
''NOTE:''<br>
  you can build DSPAM. The following storage backends are presently available:  
+
Berkeley DB drivers (libdb3_drv, libdb4_drv) are deprecated and have been removed from the build. You will need to select an alternative storage driver in order to upgrade.
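The dspam.conf changes from steps 1 to 3 of the 3.4 upgrade can be sketched as follows. This is an illustration under stated assumptions, not an official migration tool: the function name `upgrade_conf_34` and the queue path in the demonstration are made up, and the script should be run on a copy of the configuration. An existing `ProcessorBias off` line is left alone, in line with the note that bias may stay disabled.

```shell
#!/bin/sh
# Sketch only: apply the 3.4 -> 3.6 setting changes to a dspam.conf copy.
upgrade_conf_34() {   # $1 = path to the config copy (rewritten in place)
    conf=$1
    # Step 2: rename SBLQueue to RABLQueue (portable sed, no -i option).
    sed 's/^SBLQueue\([[:space:]]\)/RABLQueue\1/' "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
    # Steps 1 and 3: append each setting only when no line sets it already.
    for key in ProcessorBias TestConditionalTraining; do
        grep -q "^$key[[:space:]]" "$conf" || echo "$key on" >> "$conf"
    done
}

# Demonstration on a throwaway file; /var/spool/rabl is an example value.
tmp=$(mktemp)
printf 'SBLQueue /var/spool/rabl\nProcessorBias off\n' > "$tmp"
upgrade_conf_34 "$tmp"
cat "$tmp"
rm -f "$tmp"
```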
 +
 
 +
<br>
 +
 
 +
=== FRESH INSTALLATION ===
 +
----
 +
<br>
 +
 
 +
'''PREREQUISITES'''
 +
 
 +
DSPAM can use one of many different backends to store its information, and you will need to decide on one and install the appropriate software before you can build DSPAM. The following storage backends are presently available:  
 
   
 
   
  Driver      Requirements
+
    Driver      Requirements
 
   -------------------------------------------------------------------------
 
   -------------------------------------------------------------------------
T mysql_drv:  MySQL client libraries      (and a server to connect to)  
+
  T mysql_drv:  MySQL client libraries      (and a server to connect to)  
T pgsql_drv:  PostgreSQL client libraries (and a server to connect to)
+
  T pgsql_drv:  PostgreSQL client libraries (and a server to connect to)
  sqlite_drv:  SQLite v2.7.7 or above  
+
    sqlite_drv:  SQLite v2.7.7 or above  
  sqlite3_drv: SQLite v3.x
+
    sqlite3_drv: SQLite v3.x
*T hash_drv:    None (Self-Contained Hash-Based Driver)
+
*T hash_drv:    None (Self-Contained Hash-Based Driver)
 
+
 
   Legend:
 
   Legend:
 
     * Default storage driver
 
     * Default storage driver
 
     T Thread-safe (Required for running DSPAM in server daemon mode)
 
     T Thread-safe (Required for running DSPAM in server daemon mode)
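As a quick illustration of the legend, the thread-safety column of the table can be encoded as a lookup, for example to check a driver choice before planning a daemon-mode build. The function name `is_thread_safe` is hypothetical and not shipped with DSPAM; the driver names and their markings are taken from the table above.

```shell
#!/bin/sh
# Sketch only: succeed when a storage driver is marked "T" (thread-safe)
# in the driver table.
is_thread_safe() {   # $1 = driver name
    case "$1" in
        mysql_drv|pgsql_drv|hash_drv) return 0 ;;   # marked T in the table
        *) return 1 ;;                               # sqlite_drv, sqlite3_drv, unknown names
    esac
}

for drv in mysql_drv pgsql_drv sqlite_drv sqlite3_drv hash_drv; do
    if is_thread_safe "$drv"; then
        echo "$drv: thread-safe, eligible for server daemon mode"
    else
        echo "$drv: not marked thread-safe"
    fi
done
```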
  
  In general, MySQL is one of the faster solutions with a smaller storage
 
  footprint, and is well suited for both small and large-scale implementations.
 
  
  The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm)
+
In general, MySQL is one of the faster solutions with a smaller storage footprint, and is well suited for both small and large-scale implementations.
  is the fastest solution by far and requires no dependencies, supports
+
  an auto-extend feature to grow the file size as needed, and is very
+
  fast and compact. It does, however, lack some features (such as merged
+
  groups support) and uses a lot of memory to mmap() users.
+
  
  Documentation for any additional setup of your selected storage driver can
 
  be found in the doc/ directory. You'll need to follow any steps outlined in
 
  the storage driver documentation before continuing.
 
  
  You can download MySQL from http://www.mysql.com.
+
The hash driver (inspired by Bill Yerazunis' CRM Sparse Spectra algorithm) is the fastest solution by far, has no dependencies, supports an auto-extend feature to grow the file size as needed, and is very compact. It does, however, lack some features (such as merged groups support) and uses a lot of memory to mmap() users.
  You can download PostgreSQL from http://www.postgresql.com.
+
  You can download SQLite from http://www.sqlite.org.
+
  
===== 1. CONFIGURATION =====
 
  
  DSPAM uses autoconf, so configuration is fairly standardized with other
+
Documentation for any additional setup of your selected storage driver can be found in the doc/ directory. You'll need to follow any steps outlined in the storage driver documentation before continuing.
  UNIX-based software:
+
  
  ./configure [options]
 
  
  DSPAM supports the configuration options below. Generally, the default
+
You can download MySQL from http://www.mysql.com.
  configuration is more than acceptable, so it's a good idea not to tweak too
+
  many settings unless you know what you are doing.
+
  
  PATH SWITCHES
+
You can download PostgreSQL from http://www.postgresql.com.
  
    --prefix=DIR
+
You can download SQLite from http://www.sqlite.org.
    Specify an alternative root prefix for installation.  The default is
+
    /usr/local. This does not affect the location of dspam.conf (which
+
    defaults to /usr/local/etc). Use --sysconfdir= for this.
+
  
    --sysconfdir=DIR
+
<br>
    Specify an alternative home for the dspam.conf file. The default is
+
==== CONFIGURATION ====
    prefix/etc.
+
  
    --with-dspam-home=DIR
+
DSPAM uses autoconf, so configuration is fairly standardized with other UNIX-based software:
    Specify an alternative DSPAM home for installation. This can alternatively
+
./configure [options]
    be changed in dspam.conf, but is convenient to do on the configure line.
+
    The default is $prefix/var/dspam, or /usr/local/var/dspam.
+
  
    --with-logdir=DIR
+
DSPAM supports the configuration options below. Generally, the default configuration is more than acceptable, so it's a good idea not to tweak too many settings unless you know what you are doing.
    Specify an alternative log directory. The default is $dspam_home/log. Do
+
    not set this to /var/log unless DSPAM will have permissions to write to
+
    the directory.
+
  
  FILESYSTEM SCALE
+
<br>
 +
===== PATH SWITCHES =====
  
    The default filesystem scale is "small-scale", and writes each user to
+
--prefix=DIR
    its own directory in the top-level DSPAM home data directory.   
+
Specify an alternative root prefix for installation.  The default is /usr/local. This does not affect the location of dspam.conf (which defaults to /usr/local/etc). Use --sysconfdir= for this.
    The following two switches allow the scale to be changed to be more
+
    suitable for larger installations.
+
  
    --enable-large-scale
 
    Switch for large-scale implementation.  User data will be stored as
 
    $HOME/data/u/s/user instead of $HOME/data/user
 
  
    --enable-domain-scale
+
--sysconfdir=DIR
    Switch for domain-scale implementation.  When used, DSPAM expects
+
Specify an alternative home for the dspam.conf file. The default is prefix/etc.
    username@domain to be passed in as the user id and user data will be
+
    stored as $HOME/data/domain.com/user and $HOME/opt-in/domain/user.dspam
+
    instead of $HOME/data/user
+
  
  INTEGRATION SWITCHES
 
  
    --with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
+
--with-dspam-home=DIR
    Specify your storage driver selection(s).  A storage driver is a driver
+
Specify an alternative DSPAM home for installation. This can alternatively be changed in dspam.conf, but is convenient to do on the configure line. The default is $prefix/var/dspam, or /usr/local/var/dspam.
    written specifically for DSPAM to store tokens, signature data, and
+
    perform other proprietary operations. The default driver is hash_drv.
+
    The following drivers have been provided:
+
  
    mysql_drv:  MySQL Drivers
 
    pgsql_drv:  PostgreSQL Drivers
 
    sqlite_drv:  SQLite v2.x Drivers
 
    sqlite3_drv: SQLite v3.x Drivers
 
    hash_drv:    Self-Contained Hash Database
 
  
    If you are a packager, or wish to have multiple drivers built for any
+
--with-logdir=DIR
    reason, you may specify multiple drivers by separating them with commas.
+
Specify an alternative log directory. The default is $dspam_home/log. Do not set this to /var/log unless DSPAM will have permissions to write to the directory.
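To make the relationship between the path switches concrete, the sketch below builds a configure line in which every location is spelled out relative to one chosen prefix, following the defaults described above ($prefix/etc, $prefix/var/dspam, and $dspam_home/log). The function name `configure_line` and the /opt/dspam prefix are illustrative assumptions; only the four flags come from the text. The command is printed, not executed.

```shell
#!/bin/sh
# Sketch only: print a ./configure invocation with all four path switches
# set explicitly, so nothing depends on how the defaults are derived.
configure_line() {   # $1 = installation prefix
    prefix=$1
    sysconfdir="$prefix/etc"         # home of dspam.conf
    dspam_home="$prefix/var/dspam"   # DSPAM home
    logdir="$dspam_home/log"         # log directory
    echo "./configure --prefix=$prefix --sysconfdir=$sysconfdir --with-dspam-home=$dspam_home --with-logdir=$logdir"
}

# /opt/dspam is an example prefix, not a DSPAM default.
configure_line /opt/dspam
```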
    This will cause the storage driver specified in dspam.conf to be
+
    dynamically loaded at runtime rather than statically linked. If you wish
+
    to build only one driver, but dynamically, then specify it twice as in
+
    --with-storage-driver=mysql_drv,mysql_drv.
+
  
    If you will be compiling DSPAM to operate as a server daemon or to deliver
+
<br>
    via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the
+
===== FILESYSTEM SCALE =====
    chart earlier in this document).
+
  
    You may also need to use some of the driver-specific configure flags
+
The default filesystem scale is "small-scale", and writes each user to its own directory in the top-level DSPAM home data directory. The following two switches allow the scale to be changed to be more suitable for larger installations.
    (discussed in the DRIVER SPECIFIC CONFIGURATION OPTIONS section below).
+
  
    --disable-trusted-user-security
 
    Administrators who wish to disable trusted user security may do so by
 
    using this configure flag.  This will cause DSPAM to treat each user as
 
    if they were "trusted" which could allow them to potentially execute
 
    arbitrary commands on the server via DSPAM. Because of this, administrators
 
    should only use this option on either a closed server, or configure their
 
    DSPAM binary to be executable only by users who can be trusted.  This
 
    option SHOULD NOT be used as a solution to your MTA dropping privileges
 
    prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this
 
    document.
 
  
    --enable-homedir
+
--enable-large-scale
    When enabled, instead of checking for $HOME/$USER/opt-in/
+
Switch for large-scale implementation.  User data will be stored as $HOME/data/u/s/user instead of $HOME/data/user.
    $USER[.dspam|.nodspam], DSPAM will check for a .dspam|.nodspam file in the
+
    user's home directory. DSPAM will also store each user's data in ~/.dspam
+
    when this option is enabled. Because of this, DSPAM will automatically
+
    install and run setuid root so that it can read each user's home directory.
+
  
    Note:
 
  
      This function is incompatible with most implementations of the Web UI,
+
--enable-domain-scale
      since it requires access to read each user's home directory. Therefore,  
+
Switch for domain-scale implementation. When used, DSPAM expects username@domain to be passed in as the user id and user data will be stored as $HOME/data/domain.com/user and $HOME/opt-in/domain/user.dspam instead of $HOME/data/user
      only use this option if you will not be using the Web UI or plan on
+
      doing something asinine like running it as root.
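
The three filesystem scales described above can be sketched as a small path-mapping function. This is only an illustration, not DSPAM's actual code: the function name `user_data_dir` and the `scale` keywords are hypothetical, and it assumes that the `u/s` components in the large-scale layout are the first two characters of the username.

```python
from pathlib import PurePosixPath


def user_data_dir(dspam_home, user, scale="small"):
    """Return the directory where a user's data would live (illustrative only)."""
    data = PurePosixPath(dspam_home) / "data"
    if scale == "small":
        # Default layout: $HOME/data/user
        return data / user
    if scale == "large":
        # Assumption: the two extra levels are the first two characters
        # of the username, giving $HOME/data/u/s/user
        if len(user) < 2:
            raise ValueError("this sketch needs a username of two or more characters")
        return data / user[0] / user[1] / user
    if scale == "domain":
        # Domain scale expects username@domain as the user id:
        # $HOME/data/domain.com/user
        local, sep, domain = user.partition("@")
        if not sep or not local or not domain:
            raise ValueError("domain scale expects username@domain")
        return data / domain / local
    raise ValueError("unknown scale: " + scale)


print(user_data_dir("/usr/local/var/dspam", "user", "large"))
```

Spreading users across nested directories keeps any single directory from growing very large, which is why the larger scales suit bigger installations.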
+
  
    --enable-daemon
+
<br>
    Builds DSPAM with support for daemon mode, and builds associated dspamc
+
===== INTEGRATION SWITCHES =====
    thin client. Pthreads is required to build for daemon mode and the
+
    storage driver used must be thread-safe.
+
  
  DRIVER SPECIFIC CONFIGURE SWITCHES
+
--with-storage-driver=DRIVER[,DRIVER2[...,DRIVERN]]
 +
Specify your storage driver selection(s).  A storage driver is a driver written specifically for DSPAM to store tokens, signature data, and perform other proprietary operations.  The default driver is hash_drv. The following drivers have been provided:
  
    Some storage drivers have their own custom configuration switches:
+
mysql_drv:   MySQL Drivers
 +
pgsql_drv:  PostgreSQL Drivers
 +
sqlite_drv:  SQLite v2.x Drivers
 +
sqlite3_drv: SQLite v3.x Drivers
 +
hash_drv:    Self-Contained Hash Database
  
    mysql_drv:
 
      --with-mysql-includes=DIR
 
      Specify a path to the MySQL includes
 
  
      --with-mysql-libraries=DIR
+
If you are a packager, or wish to have multiple drivers built for any reason, you may specify multiple drivers by separating them with commas. This will cause the storage driver specified in dspam.conf to be dynamically loaded at runtime rather than statically linked. If you wish to build only one driver, but dynamically, then specify it twice as in:
      Specify a path to the MySQL libraries
+
--with-storage-driver=mysql_drv,mysql_drv.
      (Currently links to -lmysqlclient, also -lcrypto on some systems)
+
  
      --enable-virtual-users
 
      Tells DSPAM to create virtual user ids.  Use this if your users don't
 
      actually exist on the system (e.g. in /etc/passwd if using a password
 
      file)
 
  
      --enable-preferences-extension
+
If you will be compiling DSPAM to operate as a server daemon or to deliver via SMTP/LMTP, you will need to use a thread-safe driver (outlined in the chart earlier in this document).
      MySQL supports the preferences extension, which stores user preferences
+
You may also need to use some of the driver-specific configure flags (discussed in the DRIVER SPECIFIC CONFIGURATION OPTIONS section below).
      in mysql instead of flat files (the built-in method)
+
  
      --disable-mysql4-initialization
 
      If you are compiling libdspam for use with a third party application,
 
      and the third party application makes its own calls to libmysqlclient,
 
      you should use this option to disable libdspam's initialization and
 
      cleanup of libmysqlclient, and allow the application to manage this.
 
      This option suppresses libdspam's calls to mysql_server_init and
 
      mysql_server_end.
 
  
      Note:
+
--disable-trusted-user-security
 +
Administrators who wish to disable trusted user security may do so by using this configure flag.  This will cause DSPAM to treat each user as if they were "trusted" which could allow them to potentially execute arbitrary commands on the server via DSPAM. Because of this, administrators should only use this option on either a closed server, or configure their DSPAM binary to be executable only by users who can be trusted.  This option SHOULD NOT be used as a solution to your MTA dropping privileges prior to calling DSPAM. Instead, see the TRUSTED SECURITY section of this document.
  
      Please see the file doc/mysql_drv.txt for more information
 
      about configuring the mysql_drv storage driver.
 
  
    pgsql_drv:
+
--enable-homedir
      --with-pgsql-includes=DIR
+
When enabled, instead of checking for $HOME/$USER/opt-in/$USER[.dspam|.nodspam], DSPAM will check for a .dspam|.nodspam file in the user's home directory. DSPAM will also store each user's data in ~/.dspam when this option is enabled. Because of this, DSPAM will automatically install and run setuid root so that it can read each user's home directory.
      Specify a path to the PgSQL includes
+
  
      --with-pgsql-libraries=DIR
 
      Specify a path to the PgSQL libraries
 
      (Currently links to -lpq, and netlibs on some systems)
 
  
      --enable-virtual-users
+
''NOTE:''<br>
      Tells DSPAM to create virtual user ids. Use this if your users don't
+
This function is incompatible with most implementations of the Web UI, since it requires access to read each user's home directory. Therefore, only use this option if you will not be using the Web UI or plan on doing something asinine like running it as root.
      actually exist on the system (e.g. in /etc/passwd if using a password
+
      file)
+
  
      --enable-preferences-extension
 
      Postgres supports the preferences extension, which stores user
 
      preferences in pgsql instead of flat files (the built-in method)
 
  
      Note:
+
--enable-daemon
 +
Builds DSPAM with support for daemon mode, and builds associated dspamc thin client. Pthreads is required to build for daemon mode and the storage driver used must be thread-safe.
  
      Please see the file doc/pgsql_drv.txt for more information about
+
<br>
      configuring the pgsql_drv storage driver.
+
  
    sqlite_drv:
+
===== DRIVER SPECIFIC CONFIGURE SWITCHES =====
    sqlite3_drv:
+
      --with-sqlite-includes=DIR
+
      Specify a path to the SQLite includes
+
  
      --with-sqlite-libraries=DIR
+
Some storage drivers have their own custom configuration switches:
      Specify a path to the SQLite libraries
+
  
  DEBUGGING SWITCHES
 
  
    --enable-debug
+
mysql_drv:
    Turns on support for debugging output. This option allows you to turn on
+
--with-mysql-includes=DIR
    debugging messages for all or some users by editing dspam.conf or setting
+
Specify a path to the MySQL includes
    --debug on the commandline. Enabling debug in configure only adds support
+
    for debug to be compiled in; it must still be activated using one of the 
+
    options prescribed above. Debugging support itself doesn't use up very
+
    many additional resources, so it should be safe to leave enabled on
+
    non-enterprise class systems.
+
  
    --enable-verbose-debug
 
    Turns on extremely verbose debugging output. --enable-debug is implied.
 
    Never use this on production builds!
 
  
    Note:
+
--with-mysql-libraries=DIR
 +
Specify a path to the MySQL libraries (Currently links to -lmysqlclient, also -lcrypto on some systems)
  
    When verbose debug is compiled in, DSPAM performs many additional
 
    mathematical calculations regardless of whether or not it's been
 
    activated. You shouldn't use --enable-verbose-debug for production builds
 
    unless you have serious issues you can't resolve.
 
  
  FEATURE ACTIVATION
+
--enable-virtual-users
 +
Tells DSPAM to create virtual user ids.  Use this if your users don't actually exist on the system (e.g. in /etc/passwd if using a password file)
  
    --enable-clamav
 
    Enables support for Clam Antivirus. DSPAM can interface directly with
 
    clamd to perform virus scanning and can be configured to react in
 
    different ways to viruses. See dspam.conf for more information.
 
  
  ADDITIONAL CONFIGURATION OPTIONS
+
--enable-preferences-extension
 +
MySQL supports the preferences extension, which stores user preferences in mysql instead of flat files (the built-in method)
  
    The remainder of configuration options are located in dspam.conf, which
 
    is installed in sysconfdir (default: /usr/local/etc) upon a make install.
 
    It is generally a good idea to review dspam.conf and make any changes
 
    necessary prior to using DSPAM.
 
  
===== 2. BUILDING AND INSTALLING =====
+
--disable-mysql4-initialization
 +
If you are compiling libdspam for use with a third party application, and the third party application makes its own calls to libmysqlclient, you should use this option to disable libdspam's initialization and cleanup of libmysqlclient, and allow the application to manage this. This option suppresses libdspam's calls to mysql_server_init and mysql_server_end.
  
  After you have run configure with the correct options, build and install
 
  DSPAM by performing:
 
  
  make && make install
+
''NOTE:''<br>
 +
Please see the file doc/mysql_drv.txt for more information about configuring the mysql_drv storage driver.
  
  Note:
 
  
    If you are a developer wanting to link to the core engine of dspam,
+
pgsql_drv:
    libdspam will be built during this process. Please see the
+
  --with-pgsql-includes=DIR
    example.c file for examples of how to link to and use libdspam. Static
+
Specify a path to the PgSQL includes
    and dynamic libraries are built in the .libs directory. Needed headers
+
    will be installed in $prefix$/include/dspam.
+
  
===== 3. PERMISSIONS =====
 
  
  In the typical UNIX environment, you'll need to worry about the following
+
--with-pgsql-libraries=DIR
  permissions:
+
Specify a path to the PgSQL libraries (Currently links to -lpq, and netlibs on some systems)
  
  The CGI User: This is the user your web server (most likely Apache) is
 
    running as. This is commonly 'nobody' or 'web'. You can find this in
 
    Apache's httpd.conf by searching for 'User'. The CGI user will need
 
    the ability to access the following components of DSPAM:
 
      - Ability to execute the dspam binary
 
      - Ability to read and write to dspam_home/data/
 
      - Trusted user permissions in dspam.conf ("Trust [username]")
 
      - The execution 'Group' used must match the group dspam is running as
 
        (this is typically 'mail', 'dspam', or similar)
 
   
 
  The MTA User: This is the user your mail server software is running as when
 
    it executes DSPAM. This is usually daemon, mail, exim, etc. This is
 
    typically different from the user the MTA runs and polices itself as, to
 
    avoid security problems. Consult your MTA's documentation for more info.
 
    The MTA user will require:
 
      - The ability to execute the dspam binary
 
      - Trusted user permissions in dspam.conf ("Trust [username]")
 
  
  Systems Administrators: In order to perform administrative functions,
+
--enable-virtual-users
    systems administrators will require:
+
Tells DSPAM to create virtual user ids. Use this if your users don't actually exist on the system (e.g. in /etc/passwd if using a password file)
      - The ability to execute dspam-related binaries
+
      - Trusted user permissions in dspam.conf ("Trust [username]")
+
  
  Note:
 
 
    If the MTA is communicating with DSPAM via LMTP (explained later), then
 
    execution permissions are not necessary
 
  
  Note about FreeBSD:
+
--enable-preferences-extension
 +
Postgres supports the preferences extension, which stores user preferences in pgsql instead of flat files (the built-in method)
  
    FreeBSD's default MTA user is 'mailnull'
 
    FreeBSD's default delivery agent also changes its uid, and so in order
 
    to call it, dspam must be installed as setuid root to work on the
 
    commandline properly. This is done automatically on install.
 
  
 +
''NOTE:''<br>
 +
Please see the file doc/pgsql_drv.txt for more information about configuring the pgsql_drv storage driver.
  
  Understanding Trusted User Security
 
  
  DSPAM has tighter security for untrusted users on the system to prevent
+
sqlite_drv:
  them from touching other users' data or passing arbitrary commands to the
+
sqlite3_drv:
  delivery agent DSPAM calls. "Trusted User Security" is a simple system
+
  whereby any unsafe functions are not available to a user calling dspam
+
  unless they are within dspam.conf's trusted user list.
+
  
  Local non-privileged users should be able to use DSPAM without any problems
 
  while remaining untrusted, as long as they behave. For example, an untrusted
 
  user cannot set their DSPAM username to any name other than their username.
 
  Untrusted users are also limited to the delivery options set by the
 
  system administrator, and cannot redirect how DSPAM delivers mail.
 
  
  A list of trusted users is maintained in dspam.conf. This file should
+
--with-sqlite-includes=DIR
  include a list of trusted users who should be allowed to set the dspam user,
+
Specify a path to the SQLite includes
  passthru parameters, and other information that would be potentially
+
  dangerous for a malicious user to be able to set.  You'll need to ensure
+
  that your CGI user, MTA user, and system administrators are on the list.
+
  
===== 4. MAIL SERVER INTEGRATION =====
 
  
  As previously mentioned, there are three popular ways to implement DSPAM:
+
--with-sqlite-libraries=DIR
 +
Specify a path to the SQLite libraries
  
  As a delivery proxy:
+
<br>
    The default approach integrates DSPAM directly with the mail server and
+
===== DEBUGGING SWITCHES =====
    filters spam as mail comes in. Please see the appropriate instructions
+
    in doc/ pertaining to your MTA.
+
  
  As a POP3 proxy:
+
--enable-debug
    This alternative approach implements a POP3 proxy where users
+
Turns on support for debugging output. This option allows you to turn on debugging messages for all or some users by editing dspam.conf or setting --debug on the commandline. Enabling debug in configure only adds support for debug to be compiled in; it must still be activated using one of the options prescribed above. Debugging support itself doesn't use up very many additional resources, so it should be safe to leave enabled on non-enterprise class systems.
    connect to the proxy to check their email, and email is filtered when
+
    being downloaded. The POP3 proxy is a much easier approach, as it
+
    requires much less integration work with the mail server (and is ideal
+
    for implementing DSPAM on Exchange, etcetera). Please see the file
+
    doc/pop3filter.txt.
+
  
  As an SMTP Relay:
 
    DSPAM can be configured as an SMTP relay, a.k.a. an appliance. You
 
    can set it up to sit in front of your real mail server and then point
 
    your MX records at it. DSPAM will then pass along the good mail to
 
    your real SMTP server. See doc/relay.txt for more information. The
 
    example provided uses Postfix and MySQL.
 
  
  Trusted users and the MTA
+
--enable-verbose-debug
 +
Turns on extremely verbose debugging output. --enable-debug is implied. Never use this on production builds!
  
  If you are using an MTA that changes its userid to match the destination
 
  user before calling DSPAM, you won't be able to provide pass-thru
 
  arguments to DSPAM (these are the commandline arguments that DSPAM in turn
 
  passes to the local delivery agent, in such a configuration).
 
  You will need to pre-configure the "default" pass-thru arguments in DSPAM.
 
  This can be done by declaring an untrusted delivery agent in dspam.conf.
 
  When DSPAM is called by an untrusted user, it will automatically force their
 
  DSPAM user id and passthru delivery agent arguments specified in dspam.conf.
 
  
  This information will override any passthru commandline parameters
+
''NOTE:''<br>
  specified by the user. For example:
+
When verbose debug is compiled in, DSPAM performs many additional mathematical calculations regardless of whether or not it's been activated. You shouldn't use --enable-verbose-debug for production builds unless you have serious issues you can't resolve.
  
  UntrustedDeliveryAgent      "/bin/mail -d $u"
+
<br>
 +
===== FEATURE ACTIVATION =====
  
  The variable $u informs DSPAM that you would like the destination username
+
--enable-clamav
  to be used in the position $u is specified, so when DSPAM calls your LDA
+
Enables support for Clam Antivirus. DSPAM can interface directly with clamd to perform virus scanning and can be configured to react in different ways to viruses. See dspam.conf for more information.
  for user 'bob', it will call it with:
+
  
  /bin/mail -d bob
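
The $u substitution described for UntrustedDeliveryAgent can be illustrated with a short sketch. This is not DSPAM's implementation; the function name `expand_delivery_agent` is hypothetical, and splitting the template into arguments before substituting is a design choice of this example.

```python
import shlex


def expand_delivery_agent(template, username):
    """Expand $u in a delivery agent template into an argument list."""
    # Split first, then substitute, so a username containing spaces
    # cannot turn into extra commandline arguments.
    return [arg.replace("$u", username) for arg in shlex.split(template)]


print(expand_delivery_agent("/bin/mail -d $u", "bob"))
```

Because the template is fixed by the administrator and only the username varies, an untrusted caller has no way to change which delivery agent or flags are used.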
+
<br>
 +
===== ADDITIONAL CONFIGURATION OPTIONS =====
  
===== 5. ALIASES =====
+
The remainder of configuration options are located in dspam.conf, which is installed in sysconfdir (default: /usr/local/etc) upon a make install. It is generally a good idea to review dspam.conf and make any changes necessary prior to using DSPAM.
  
  There are essentially two different ways a user might train DSPAM. The first
+
<br>
  is by using the Web UI, which allows them to retrain via the "History"
+
  tab. This works quite well, as users must visit the Web UI occasionally
+
  to review their quarantine anyway (and reverse any false positives). We'll
+
  discuss this shortly in section 1.1.8.
+
  
  The more common approach to training, discussed here, is to allow users to
+
==== BUILDING AND INSTALLING ====
  simply forward their spam to an email address where DSPAM can analyze and
+
  learn it. DSPAM uses a signature-based system, where a serial number of
+
  sorts is appended to each email processed by DSPAM. DSPAM reads this serial
+
  number when the user forwards (or bounces) a message to what is called their
+
  "spam email address". The serial number points to temporary information
+
  stored on the server (for 14 days by default) containing all of the
+
  information necessary for DSPAM to relearn the message. This is necessary
+
  in order to relearn the *exact* message DSPAM originally processed.
+
  
  Note:
+
After you have run configure with the correct options, build and install
 +
DSPAM by performing:
  
    If you are using an IMAP based system, Web-based email, or other form of
+
make && make install
    email management where the original messages are stored on the server in
+
    pristine format, you can turn this signature feature off by setting
+
    "TrainPristine on" in dspam.conf. DSPAM will then use the message itself
+
    that you provide it to train, which MUST be identical to the original
+
    message in order to retrain properly.
+
  
  Because DSPAM learns each user's specific email behavior, it's necessary
 
  to identify the user in order to program their specific filtering database.
 
  This can be done in one of three ways:
 
  
  The Simple Way:
+
''NOTE:''<br>
 +
If you are a developer wanting to link to the core engine of dspam, libdspam will be built during this process.  Please see the example.c file for examples of how to link to and use libdspam. Static and dynamic libraries are built in the .libs directory. Needed headers will be installed in $prefix$/include/dspam.
  
    If you are using the MySQL or PgSQL storage drivers, the original
+
<br>
    numeric user id can be embedded in the signature, requiring only one
+
    central spam alias to be necessary for the entire system. To configure
+
    this, uncomment the appropriate UIDInSignature option in dspam.conf:
+
  
    # MySQLUIDInSignature    on
+
==== PERMISSIONS ====
    # PgSQLUIDInSignature    on 
+
  
    Now all you'll need is a single system-wide alias, and DSPAM will train
+
In the typical UNIX environment, you'll need to worry about the following
    the appropriate user when it sees the signature. An example of an alias
+
permissions:
    might look like:
+
  
    spam:"|/usr/local/bin/dspam --user root --class=spam --source=error"
 
  
    Similarly, you may also wish to have a false-positive alias for users who
+
The CGI User: This is the user your web server (most likely Apache) is running as. This is commonly 'nobody' or 'web'. You can find this in Apache's httpd.conf by searching for 'User'. The CGI user will need the ability to access the following components of DSPAM:
    prefer to tag spam rather than quarantine it:
+
* Ability to execute the dspam binary
 +
* Ability to read and write to dspam_home/data/
 +
* Trusted user permissions in dspam.conf ("Trust [username]")
 +
* The execution 'Group' used must match the group dspam is running as (this is typically 'mail', 'dspam', or similar).
  
    notspam:"|/usr/local/bin/dspam --user root --class=innocent --source=error"
 
  
    Note:
+
The MTA User: This is the user your mail server software is running as when it executes DSPAM. This is usually daemon, mail, exim, etc. This is typically different from the user the MTA runs and polices itself as, to avoid security problems. Consult your MTA's documentation for more info. The MTA user will require:
 +
* The ability to execute the dspam binary
 +
* Trusted user permissions in dspam.conf ("Trust [username]")
  
    The 'root' user represents any active dspam user. It is necessary to
 
    supply a username on the commandline or DSPAM will bail on
 
    an error, however the user will be changed internally once the signature
 
    is read.
 
  
  The Kind-of-Simple Way:
+
Systems Administrators: In order to perform administrative functions, systems administrators will require:
 +
* The ability to execute dspam-related binaries
 +
* Trusted user permissions in dspam.conf ("Trust [username]")
  
    If you're not using one of the above storage drivers, the next easiest
 
    way to configure aliases is to have DSPAM parse the 'To:' header of the
 
    message and use a catch-all subdomain to direct all mail into DSPAM for
 
    retraining. You can then instruct your users to email addresses like
 
    '[email protected]'. The ParseToHeaders option (available
 
    in dspam.conf) will parse the To: header of forwarded messages and
 
    set the username to either 'bob' or '[email protected]', depending
 
    on how it is configured. DSPAM can also set the training mode to either
 
    "learn spam" or "learn notspam" depending on whether the user specified
 
    a spam- or notspam- address in the To: header.
 
  
    This is ideal if you don't want to set up a separate alias for each user
+
''NOTE:''<br>
    on your system (The Hard Way). If you're fortunate enough to have a
+
If the MTA is communicating with DSPAM via LMTP (explained later), then execution permissions are not necessary.
    mail server that can perform regular expression matching, you can set up
+
    your system without a subdomain, and just use addresses like
+
    [email protected]. For the rest of us, it will be necessary to set up
+
    a subdomain catch-all directly into DSPAM. For example:
+
  
    @relearn.domain.tld "|/usr/local/bin/dspam"
 
  
    Don't forget to set the appropriate ParseToHeaders and related options in
+
''NOTE about FreeBSD:''<br>
    dspam.conf as well. More specific instructions can be found in dspam.conf
+
FreeBSD's default MTA user is 'mailnull'. FreeBSD's default delivery agent also changes its uid, and so in order to call it, dspam must be installed as setuid root to work on the commandline properly. This is done automatically on install.
    itself. In most cases, the following will suffice:
+
  
    ParseToHeaders on
 
    ChangeUserOnParse user
 
    ChangeModeOnParse on
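
The effect of parsing the To: header can be sketched roughly as follows. This is a hypothetical illustration rather than DSPAM's own parser: the function name `parse_retrain_address`, the `full_address` parameter, and the sample address spam-bob@relearn.domain.tld are assumptions made for the example. The class names "spam" and "innocent" are the ones used with --class elsewhere in this document.

```python
def parse_retrain_address(address, full_address=False):
    """Map a retraining address to a (username, class) pair, or None."""
    local, _, domain = address.partition("@")
    # Check "notspam-" first for clarity; the two prefixes do not overlap
    # at the start of a string, so the order is not significant.
    for prefix, mail_class in (("notspam-", "innocent"), ("spam-", "spam")):
        if local.startswith(prefix):
            name = local[len(prefix):]
            # Either the bare username or the username with its domain,
            # mirroring the two behaviours described above.
            user = name + "@" + domain if full_address else name
            return user, mail_class
    return None


print(parse_retrain_address("spam-bob@relearn.domain.tld"))
```

An address without either prefix yields None here, meaning the sketch would leave the user and training mode unchanged.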
 
  
  The Old Way (A.K.A. The Hard Way)
 
  
    If neither of the easy ways are possible, you're stuck with doing it
+
'''Understanding Trusted User Security'''
    the hard way. This means you'll need a separate spam alias (and notspam
+
    alias, if users are tagging mail) for each user. To do this, you will
+
    need to create an email address for each user, so that DSPAM can
+
    analyze and learn for that specific user.  For example:
+
  
    spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"
+
DSPAM has tighter security for untrusted users on the system to prevent them from touching other users' data or passing arbitrary commands to the delivery agent DSPAM calls. "Trusted User Security" is a simple system whereby any unsafe functions are not available to a user calling dspam unless they are within dspam.conf's trusted user list.
  
    You will end up having one alias per mail user on the system, two if you
 
    do not use DSPAM's CGI quarantine (an additional one using notspam-). Be
 
    sure the aliases are unique and each username matches the name after the
 
    --user flag.  A tool has been provided called dspam_genaliases.  This tool
 
    will read the /etc/passwd file and write out a dspam aliases file that can
 
    be included in your master aliases table. 
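
As an illustration of the per-user aliases described above, the following sketch builds alias lines in the same shape as the spam-bob example. It is not the dspam_genaliases tool and makes no claim about that tool's output; the function name `alias_lines` and its parameters are hypothetical, and the binary path is the one used in the examples in this document.

```python
DSPAM_BIN = "/usr/local/bin/dspam"  # path taken from the examples above


def alias_lines(users, include_notspam=True):
    """Build one spam alias (and optionally one notspam alias) per user."""
    lines = []
    for user in users:
        lines.append(
            'spam-%s: "|%s --user %s --class=spam --source=error"'
            % (user, DSPAM_BIN, user)
        )
        if include_notspam:
            # Only needed when users tag mail instead of using the quarantine.
            lines.append(
                'notspam-%s: "|%s --user %s --class=innocent --source=error"'
                % (user, DSPAM_BIN, user)
            )
    return lines


for line in alias_lines(["bob", "alice"]):
    print(line)
```

The name after --user always matches the name in the alias, which is the property the text above asks you to verify for every entry.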
 
  
    To report spam, the user should be instructed to forward each spam to
+
Local non-privileged users should be able to use DSPAM without any problems while remaining untrusted, as long as they behave. For example, an untrusted user cannot set their DSPAM username to any name other than their username. Untrusted users are also limited to the delivery options set by the system administrator, and cannot redirect how DSPAM delivers mail.
    spam-user@yourhost
+
  
    It doesn't really matter what you name these aliases, so long as the flags
 
    being passed to dspam are correct for each user.  It might be a good idea
 
    to create an alias custom to your network, so that spammers don't forward
 
    spam into it.  For example, notspam-yourcompany-bob or something. 
 
  
  Note About Security:
+
A list of trusted users is maintained in dspam.conf. This file should include a list of trusted users who should be allowed to set the dspam user, passthru parameters, and other information that would be potentially dangerous for a malicious user to be able to set.  You'll need to ensure that your CGI user, MTA user, and system administrators are on the list.
  
    You might be wondering if a user can forward a spam to another user's
+
<br>
    address, or whether a spammer can forward a spam to another user's
+
    notspam address. The answer is "no". The key to all mail-based retraining
+
    is the signature embedded in each email. The signature is stored with
+
    each user's own user id, and so not only does the incoming message have
+
    to bear a valid signature, but it also has to be stored on the system with
+
    the correct user id. This prevents any kind of alias abuse.
+
  
===== 6. NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS =====
+
==== MAIL SERVER INTEGRATION ====
 +
As previously mentioned, there are three popular ways to implement DSPAM:
  
  Non-SQL Based Nightly Purge
 
  
    If you are NOT running a SQL-based solution, then you should configure
 
    dspam_clean to run under cron nightly. This clean tool will read all
 
    signature databases and purge signatures that are older than 14 days
 
    (configurable), purge abandoned tokens, and remove unimportant tokens. 
 
    Without this tool, old signatures will continue to pile up.
 
    Be sure the user running cleanup has full read/write permissions on the
 
    DSPAM data files.
 
  
    0 0 * * * /usr/local/bin/dspam_clean [options]
+
'''As a delivery proxy'''
  
    See the dspam_clean description for more information
+
The default approach integrates DSPAM directly with the mail server and filters spam as mail comes in. Please see the appropriate instructions in doc/ pertaining to your MTA.
  
  SQL-Based Nightly Purge
 
  
    SQL-Based solutions include a nightly SQL script to perform the same basic
 
    tasks as dspam_clean, and it does it much faster and with more finesse.
 
    You can find instructions about each driver's purge functions in
 
    the driver's README (doc/[driver].txt) for performing nightly
 
    maintenance. Most SQL drivers will include a purge script in the
 
    src/tools.[driver] directory. For example:
 
  
    0 0 * * * mysql --user=[user] --pass=[pass] [db] < /path/to/purge-4.1.sql
+
'''As a POP3 proxy'''
  
  Log Rotation
+
This alternative approach implements a POP3 proxy where users connect to the proxy to check their email, and email is filtered when being downloaded.  The POP3 proxy is a much easier approach, as it requires much less integration work with the mail server (and is ideal for implementing DSPAM on Exchange, etcetera). Please see the file doc/pop3filter.txt.
  
    The system log and user logs can fill up fairly quickly, when all that's
 
    really needed to generate graphs are the last two to three weeks of data.
 
    You can configure a nightly log cleanup using dspam_logrotate:
 
  
    0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
 
  
===== 7. NOTIFICATIONS =====
+
'''As an SMTP Relay'''
  
  DSPAM is capable of sending three different notifications to users:
+
DSPAM can be configured as an SMTP relay, a.k.a. an appliance. You can set it up to sit in front of your real mail server and then point your MX records at it. DSPAM will then pass along the good mail to your real SMTP server. See doc/relay.txt for more information. The example provided uses Postfix and MySQL.
  
    - A "First Run" message sent to each user when they receive their first
 
      message through DSPAM.
 
  
    - A "First Spam" message sent to each user when they receive their first
+
'''Trusted users and the MTA'''
      spam
+
  
    - A "Quarantine Full" message sent to each user when their quarantine box
+
If you are using an MTA that changes its userid to match the destination user before calling DSPAM, you won't be able to provide pass-thru arguments to DSPAM (these are the commandline arguments that DSPAM in turn passes to the local delivery agent, in such a configuration). You will need to pre-configure the "default" pass-thru arguments in DSPAM. This can be done by declaring an untrusted delivery agent in dspam.conf. When DSPAM is called by an untrusted user, it will automatically force their DSPAM user id and passthru delivery agent arguments specified in dspam.conf.
      is > 2MB in size.
+
  
  These notifications can be activated by copying the txt/ directory from the
+
This information will override any passthru commandline parameters specified by the user. For example:
  distribution into DSPAM's home (by default /usr/local/var/dspam).  You will
+
UntrustedDeliveryAgent      "/bin/mail -d $u"
  want to modify these templates prior to installing them to reflect the
+
  correct email addresses and URLs (look for 'configureme' and 'yourdomain').
+
  
  NOTE: The quarantine warning is reset when the user clicks 'Delete All', but
+
The variable $u informs DSPAM that you would like the destination username to be used in the position $u is specified, so when DSPAM calls your LDA for user 'bob', it will call it with:
  is not reset if they use "Delete Selected". If the user doesn't wish to
+
  /bin/mail -d bob
  receive reminders, they should use the "Delete Selected" function instead
+
  of "Delete All".
+
  
  You'll need to also set "Notifications" to "on" in dspam.conf.
+
<br>
 +
===== ALIASES =====
 +
There are essentially two different ways a user might train DSPAM. The first is by using the Web UI, which allows them to retrain via the "History" tab. This works quite well, as users must visit the Web UI occasionally to review their quarantine anyway (and reverse any false positives). We'll discuss this shortly in section 1.1.8.
  
===== 8. THE WEB UI =====
 
  
  The Web UI (CGI client) can be run from any executable location on
+
The more common approach to training, discussed here, is to allow users to simply forward their spam to an email address where DSPAM can analyze and learn it. DSPAM uses a signature-based system, where a serial number of sorts is appended to each email processed by DSPAM. DSPAM reads this serial number when the user forwards (or bounces) a message to what is called their "spam email address". The serial number points to temporary information stored on the server (for 14 days by default) containing all of the information necessary for DSPAM to relearn the message. This is necessary in order to relearn the *exact* message DSPAM originally processed.
  a web server, and detects its user's identity from the REMOTE_USER
+
  environment variable. This means you'll need to use HTTP password
+
  authentication to access the CGI (Any type of authentication will work,
+
  so long as Apache supports the module). This is also convenient in that you
+
  can set up authentication using almost any existing system you have.
+
  The only catch is that you'll need the usernames to match the actual
+
  DSPAM usernames used on the system. A copy of the shadow password file
+
  will suffice for most common installs.
+
  
  The accompanying files in the webui/ folder should be copied into your
 
  document root and cgi-bin, as specified.
 
  
    Note:
+
''NOTE:''<br>
 +
If you are using an IMAP based system, Web-based email, or other form of email management where the original messages are stored on the server in pristine format, you can turn this signature feature off by setting "TrainPristine on" in dspam.conf. DSPAM will then use the message itself that you provide it to train, which MUST be identical to the original message in order to retrain properly.
  
    Some authentication mechanisms are case insensitive and will
 
    authenticate the user regardless of the case they type it in.  DSPAM,
 
    on the other hand, is case sensitive and the case of the username used
 
    will need to match the case on the system.  If you suffer from this
 
    authentication problem, and are certain all of your users' usernames are
 
    in lowercase, you can add the following line of code to the CGI right
 
    after the call to &ReadParse...
 
  
    $ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});
+
Because DSPAM learns each user's specific email behavior, it must identify the user in order to train that user's filtering database. This can be done in one of three ways:
  
  The CGI will need to function in the same group as the dspam agent in order
 
  to work with the files in dspam_home.  The best way to do this is to create
 
  a separate virtualhost specifically for the CGI and assign it to run in the
 
  MTA group using Apache's suexec. If you are using procmail, additional
 
  configuration may also be necessary (see below). 
 
  
  Note:
 
  
    Apache users do NOT take on the identity of the groups specified in
+
''' The Simple Way '''
    /etc/group so you will need to specifically assign the group in
+
    httpd.conf.
+
  
  Note about Procmail:
+
If you are using the MySQL or PgSQL storage drivers, the original numeric user id can be embedded in the signature, so only one central spam alias is needed for the entire system. To configure this, uncomment the appropriate UIDInSignature option in dspam.conf:
 +
# MySQLUIDInSignature    on
 +
# PgSQLUIDInSignature    on 
  
      Because the DSPAM Web UI is a CGI script, DSPAM will not retain its
 
      setuid privileges when called. If you are running procmail, this will
 
      become a problem as procmail requires root privileges to deliver. The
 
      easiest hack around this is to create a procmail.dspam binary and make it
 
      setuid root, then make it executable only by the mail group (or
 
      whatever group DSPAM and the CGI run in).
 
  
  The DSPAM Web UI has a minimal configuration inside the configure.pl script.
+
Now all you'll need is a single system-wide alias, and DSPAM will train the appropriate user when it sees the signature. An example of an alias might look like:
  You'll want to check and make sure all of the settings are correct. In
+
spam:"|/usr/local/bin/dspam --user root --class=spam --source=error"
  most cases, the only settings you will need to change are the large-scale
+
  or domain-scale flags.
+
  
  BEFORE PROCEEDING:
 
    Check and make sure (Again) that the CGI user from Apache's httpd.conf is
 
    added as a trusted user in dspam.conf.
 
  
  Default Preferences
+
Similarly, you may also wish to have a false-positive alias for users who prefer to tag spam rather than quarantine it:
 +
notspam:"|/usr/local/bin/dspam --user root --class=innocent --source=error"
  
  Now would be a good time to set the system's default preferences. This can
 
  be done using the dspam_admin tool.  For example:
 
  
    dspam_admin ch pref default trainingMode TEFT
+
''NOTE:''<br>
    dspam_admin ch pref default spamAction quarantine
+
The 'root' user represents any active dspam user. A username must be supplied on the command line or DSPAM will exit with an error; however, the user is changed internally once the signature is read.
    dspam_admin ch pref default spamSubject "[SPAM]"
+
 
    dspam_admin ch pref default enableWhitelist on
+
 
    dspam_admin ch pref default showFactors off
+
 
 +
''' The Kind-of-Simple Way '''
 +
 
 +
If you're not using one of the above storage drivers, the next easiest way to configure aliases is to have DSPAM parse the 'To:' header of the message and use a catch-all subdomain to direct all mail into DSPAM for retraining. You can then instruct your users to email addresses like '[email protected]'. The ParseToHeaders option (available in dspam.conf) will parse the To: header of forwarded messages and set the username to either 'bob' or '[email protected]', depending on how it is configured. DSPAM can also set the training mode to either "learn spam" or "learn notspam" depending on whether the user specified a spam- or notspam- address in the To: header.
 +
 
 +
 
 +
This is ideal if you don't want to set up a separate alias for each user on your system (The Hard Way). If you're fortunate enough to have a mail server that can perform regular expression matching, you can set up your system without a subdomain, and just use addresses like [email protected]. For the rest of us, it will be necessary to set up a subdomain catch-all directly into DSPAM. For example:
 +
@relearn.domain.tld "|/usr/local/bin/dspam"
 +
 
 +
 
 +
Don't forget to set the appropriate ParseToHeaders and related options in dspam.conf as well. More specific instructions can be found in dspam.conf itself. In most cases, the following will suffice:
 +
ParseToHeaders on
 +
ChangeUserOnParse user
 +
ChangeModeOnParse on
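As a rough illustration of what these options accomplish, the sketch below maps a To: header onto a username and a training class. It is not DSPAM's parser. The function name, the sample addresses under relearn.domain.tld, and the "full" mode value are assumptions made for the example; the spam-/notspam- prefixes come from the description above.

```python
from email.utils import parseaddr


def parse_retrain_address(to_header: str, change_user: str = "user"):
    """Return (username, classification) for a spam-/notspam- address, else None."""
    _, address = parseaddr(to_header)
    local, _, domain = address.partition("@")
    for prefix, classification in (("notspam-", "innocent"), ("spam-", "spam")):
        if local.startswith(prefix):
            name = local[len(prefix):]
            # "user" keeps only the local part; "full" (assumed value) keeps name@domain.
            username = name if change_user == "user" else f"{name}@{domain}"
            return username, classification
    # Not a retraining address: leave the user and mode unchanged.
    return None


if __name__ == "__main__":
    print(parse_retrain_address("spam-bob@relearn.domain.tld"))
    print(parse_retrain_address("notspam-bob@relearn.domain.tld", "full"))
```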
 +
 
 +
 
 +
 
 +
''' The Old Way (A.K.A. The Hard Way) '''
 +
 
 +
If neither of the easy ways is possible, you're stuck with doing it the hard way. This means you'll need a separate spam alias (and notspam alias, if users are tagging mail) for each user. To do this, you will need to create an email address for each user, so that DSPAM can analyze and learn for that specific user. For example:
 +
spam-bob: "|/usr/local/bin/dspam --user bob --class=spam --source=error"
 +
 
 +
You will end up having one alias per mail user on the system, two if you do not use DSPAM's CGI quarantine (an additional one using notspam-). Be sure the aliases are unique and each username matches the name after the --user flag.  A tool has been provided called dspam_genaliases.  This tool will read the /etc/passwd file and write out a dspam aliases file that can be included in your master aliases table. 
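To make the shape of the generated file concrete, here is a minimal sketch of the kind of output such a tool produces. It is not the dspam_genaliases source: the function name and the `min_uid` filter are hypothetical, and the alias format simply follows the spam-bob example above.

```python
def generate_aliases(passwd_text: str,
                     dspam_path: str = "/usr/local/bin/dspam",
                     min_uid: int = 0) -> list[str]:
    """Build spam-/notspam- alias lines for every account in passwd-format text."""
    aliases = []
    for entry in passwd_text.splitlines():
        fields = entry.split(":")
        # Skip comments and malformed lines (passwd format: name:passwd:uid:...).
        if entry.startswith("#") or len(fields) < 3 or not fields[2].isdigit():
            continue
        user, uid = fields[0], int(fields[2])
        if uid < min_uid:  # hypothetical filter to leave out system accounts
            continue
        for prefix, cls in (("spam", "spam"), ("notspam", "innocent")):
            aliases.append(
                f'{prefix}-{user}: "|{dspam_path} --user {user} '
                f'--class={cls} --source=error"'
            )
    return aliases


if __name__ == "__main__":
    sample = "root:x:0:0:root:/root:/bin/sh\nbob:x:1001:1001::/home/bob:/bin/sh\n"
    print("\n".join(generate_aliases(sample, min_uid=1000)))
```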
 +
 
 +
 
 +
To report spam, the user should be instructed to forward each spam to spam-user@yourhost.
 +
 
 +
 
 +
It doesn't really matter what you name these aliases, so long as the flags being passed to dspam are correct for each user.  It might be a good idea to create an alias custom to your network, so that spammers don't forward spam into it.  For example, notspam-yourcompany-bob or something. 
 +
 
 +
 
 +
''NOTE about Security:''
 +
 
 +
You might be wondering if a user can forward a spam to another user's address, or whether a spammer can forward a spam to another user's notspam address. The answer is "no". The key to all mail-based retraining is the signature embedded in each email. The signature is stored with each user's own user id, and so not only does the incoming message have to bear a valid signature, but it also has to be stored on the system with the correct user id. This prevents any kind of alias abuse.
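The ownership check described in this note can be sketched as follows. The data model is hypothetical (DSPAM's storage drivers keep signatures in their own formats); the point is only that a signature is usable when it both exists and was stored under the same user id that is trying to retrain with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredSignature:
    owner_uid: int   # user id the signature was stored under
    tokens: tuple    # data needed to relearn the original message


def lookup_signature(store: dict, signature_id: str, uid: int):
    """Return the stored signature only if it exists and belongs to uid."""
    record = store.get(signature_id)
    if record is None or record.owner_uid != uid:
        # Unknown signature, or one stored for a different user: refuse to retrain.
        return None
    return record


if __name__ == "__main__":
    store = {"sig-1": StoredSignature(owner_uid=1001, tokens=("free", "offer"))}
    print(lookup_signature(store, "sig-1", 1001) is not None)  # owner may retrain
    print(lookup_signature(store, "sig-1", 1002) is not None)  # another user may not
```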
 +
 
 +
<br>
 +
 
 +
==== NIGHTLY MAINTENANCE AND HOUSEKEEPING CRONS ====
 +
<br>
 +
===== Non-SQL Based Nightly Purge =====
 +
If you are NOT running a SQL-based solution, then you should configure dspam_clean to run under cron nightly. This clean tool will read all signature databases and purge signatures that are older than 14 days (configurable), purge abandoned tokens, and remove unimportant tokens. Without this tool, old signatures will continue to pile up. Be sure the user running cleanup has full read/write permissions on the DSPAM data files.
 +
0 0 * * * /usr/local/bin/dspam_clean [options]
 +
''See the dspam_clean description for more information''
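The age-based part of this purge amounts to a cutoff comparison. The sketch below only illustrates that idea, under the assumption that each signature carries a creation timestamp; the function name and dictionary layout are hypothetical and do not reflect how dspam_clean reads its databases.

```python
from datetime import datetime, timedelta


def purge_old_signatures(signatures: dict, now: datetime,
                         max_age_days: int = 14) -> dict:
    """Return only the signatures created within the last max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    # Keep entries created at or after the cutoff; older ones are dropped.
    return {sid: created for sid, created in signatures.items() if created >= cutoff}


if __name__ == "__main__":
    now = datetime(2006, 3, 1)
    signatures = {
        "recent": now - timedelta(days=3),
        "stale": now - timedelta(days=20),
    }
    print(sorted(purge_old_signatures(signatures, now)))
```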
 +
 
 +
<br>
 +
===== SQL-Based Nightly Purge =====
 +
SQL-Based solutions include a nightly SQL script to perform the same basic tasks as dspam_clean, and it does it much faster and with more finesse. You can find instructions about each driver's purge functions in the driver's README (doc/[driver].txt) for performing nightly maintenance. Most SQL drivers will include a purge script in the src/tools.[driver] directory. For example:
 +
0 0 * * * mysql --user=[user] --password=[pass] [db] < /path/to/purge-4.1.sql
 +
 
 +
<br>
 +
===== Log Rotation =====
 +
The system log and user logs can fill up fairly quickly, when all that's really needed to generate graphs are the last two to three weeks of data. You can configure a nightly log cleanup using dspam_logrotate:
 +
 
 +
0 0 * * * dspam_logrotate -a 30 -d /usr/local/var/dspam/data
 +
 
 +
<br>
 +
 
 +
==== NOTIFICATIONS ====
 +
DSPAM is capable of sending three different notifications to users:
 +
* A "First Run" message sent to each user when they receive their first message through DSPAM.
 +
* A "First Spam" message sent to each user when they receive their first spam.
 +
* A "Quarantine Full" message sent to each user when their quarantine box is > 2MB in size.
 +
 
 +
These notifications can be activated by copying the txt/ directory from the distribution into DSPAM's home (by default /usr/local/var/dspam).  You will want to modify these templates prior to installing them to reflect the correct email addresses and URLs (look for 'configureme' and 'yourdomain').
 +
 
 +
 
 +
''NOTE:''<br>
 +
The quarantine warning is reset when the user clicks 'Delete All', but is not reset if they use "Delete Selected".  If the user doesn't wish to receive reminders, they should use the "Delete Selected" function instead of "Delete All".
 +
 
 +
You'll need to also set "Notifications" to "on" in dspam.conf.
 +
 
 +
<br>
 +
 
 +
==== THE WEB UI ====
 +
The Web UI (CGI client) can be run from any executable location on a web server, and detects its user's identity from the REMOTE_USER environment variable. This means you'll need to use HTTP password authentication to access the CGI (any type of authentication will work, so long as Apache supports the module). This is also convenient in that you can set up authentication using almost any existing system you have. The only catch is that the usernames must match the actual DSPAM usernames used on the system. A copy of the shadow password file will suffice for most common installs.
 +
 
 +
 
 +
The accompanying files in the webui/ folder should be copied into your document root and cgi-bin, as specified.
 +
 
 +
 
 +
''NOTE:''<br>
 +
Some authentication mechanisms are case insensitive and will authenticate the user regardless of the case they type it in.  DSPAM, on the other hand, is case sensitive and the case of the username used will need to match the case on the system.  If you suffer from this