Trinity College Dublin, Department of Computer Science
Citation:
Delany, Sarah Jane; Cunningham, Pádraig; Tsymbal, Alexey. 'A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2005-19, 2005, pp6
Series/Report no.:
Computer Science Technical Report TCD-CS-2005-19
Abstract:
The problem of concept drift has recently received
considerable attention in machine learning
research. One important practical problem where
concept drift needs to be addressed is spam filtering.
The literature on concept drift shows that
ensembles are among the most promising approaches,
and a variety of techniques for ensemble construction
have been proposed. In this paper we consider
an alternative lazy learning approach to concept
drift whereby a single case-based classifier
for spam filtering keeps itself up-to-date through
a case-base maintenance protocol. We present an
evaluation that shows that the case-base maintenance
approach is more effective than a variety of
ensemble techniques. The evaluation is complicated
by the overriding importance of False Positives
(FPs) in spam filtering. The ensemble approaches
can have very good performance on FPs
because it is possible to bias an ensemble more
strongly away from FPs than it is to bias the single
classifier. However, this comes at considerable
cost in overall accuracy.