The University of Dublin | Trinity College -- Ollscoil Átha Cliath | Coláiste na Tríonóide
Trinity's Access to Research Archive

Please use this identifier to cite or link to this item: http://hdl.handle.net/2262/13439

Title: A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering
Author: Delany, Sarah Jane
Cunningham, Pádraig
Tsymbal, Alexey
Sponsor: Enterprise Ireland
Science Foundation Ireland
Keywords: Computer Science
Issue Date: 2005
Publisher: Trinity College Dublin, Department of Computer Science
Citation: Delany, Sarah Jane; Cunningham, Pádraig; Tsymbal, Alexey. 'A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2005-19, 2005, pp6
Series/Report no.: Computer Science Technical Report
TCD-CS-2005-19
Abstract: The problem of concept drift has recently received considerable attention in machine learning research. One important practical problem where concept drift needs to be addressed is spam filtering. The literature on concept drift shows that among the most promising approaches are ensembles, and a variety of techniques for ensemble construction has been proposed. In this paper we consider an alternative lazy learning approach to concept drift whereby a single case-based classifier for spam filtering keeps itself up-to-date through a case-base maintenance protocol. We present an evaluation that shows that the case-base maintenance approach is more effective than a variety of ensemble techniques. The evaluation is complicated by the overriding importance of False Positives (FPs) in spam filtering. The ensemble approaches can have very good performance on FPs because it is possible to bias an ensemble more strongly away from FPs than it is to bias the single classifier. However, this comes at considerable cost in overall accuracy.
URI: https://www.cs.tcd.ie/publications/tech-reports/reports.05/TCD-CS-2005-19.pdf
http://hdl.handle.net/2262/13439
Appears in Collections: Computer Science Technical Reports
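
The abstract notes that an ensemble can be biased away from False Positives more strongly than a single classifier can. One common way to do this, shown here only as an illustrative sketch and not as the method used in the report, is to raise the number of ensemble members that must vote "spam" before a message is labelled spam. The function name `ensemble_is_spam`, the parameter `min_spam_votes`, and the example votes are all hypothetical.

```python
from typing import Sequence


def ensemble_is_spam(votes: Sequence[bool], min_spam_votes: int) -> bool:
    """Label a message as spam only if enough ensemble members vote spam.

    votes: one boolean per ensemble member, True meaning "spam".
    min_spam_votes: hypothetical bias knob; a higher value makes a spam
    label harder to reach, which lowers False Positives but lets more
    spam through.
    """
    return sum(votes) >= min_spam_votes


# Hypothetical outputs of a three-member ensemble for one message.
votes = [True, True, False]

# Simple majority vote: two of three members suffice.
majority = ensemble_is_spam(votes, min_spam_votes=2)

# Unanimous vote: every member must agree before the message is blocked.
unanimous = ensemble_is_spam(votes, min_spam_votes=len(votes))
```

With these votes the majority rule labels the message spam while the unanimous rule lets it through, which illustrates the trade-off the abstract describes: fewer False Positives at the price of overall accuracy.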

Files in This Item:

File: TCD-CS-2005-19.pdf
Size: 91.48 kB
Format: Adobe PDF


This item is protected by original copyright


Items in TARA are protected by copyright, with all rights reserved, unless otherwise indicated.

 
