A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering
File Type:
PDF
Item Type:
Technical Report
Date:
2005
Citation:
Delany, Sarah Jane; Cunningham, Padraig; Tsymbal, Alexey. 'A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2005-19, 2005, pp6
Download Item:
TCD-CS-2005-19.pdf (PDF) 91.47Kb
Abstract:
The problem of concept drift has recently received
considerable attention in machine learning
research. One important practical problem where
concept drift needs to be addressed is spam filtering.
The literature on concept drift shows that
among the most promising approaches are ensembles,
and a variety of techniques for ensemble construction
has been proposed. In this paper we consider
an alternative lazy learning approach to concept
drift whereby a single case-based classifier
for spam filtering keeps itself up-to-date through
a case-base maintenance protocol. We present an
evaluation that shows that the case-base maintenance
approach is more effective than a variety of
ensemble techniques. The evaluation is complicated
by the overriding importance of False Positives
(FPs) in spam filtering. The ensemble approaches
can have very good performance on FPs
because it is possible to bias an ensemble more
strongly away from FPs than it is to bias the single
classifier. However, this comes at considerable
cost in overall accuracy.
Sponsor:
Enterprise Ireland
Science Foundation Ireland
Publisher:
Trinity College Dublin, Department of Computer Science
Type of material:
Technical Report
Collections:
Series/Report no:
Computer Science Technical Report
TCD-CS-2005-19
Availability:
Full text available
Keywords:
Computer Science
Licences: