An Analysis of Case-Base Editing in a Spam Filtering System

File Type: PDF
Item Type: Technical Report
Date: 2004-08
Citation:
Delany, Sarah Jane; Cunningham, Padraig. 'An Analysis of Case-Base Editing in a Spam Filtering System'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2004-29, 2004, pp14

Abstract:
Because of the volume of spam email and its evolving nature, any
deployed Machine Learning-based spam filtering system will need to have
procedures for case-base maintenance. Key to this will be procedures to edit the
case-base to remove noise and eliminate redundancy. In this paper we present a
two-stage process to do this. We present a new noise reduction algorithm called
Blame-Based Noise Reduction that removes cases that are observed to cause
misclassification. We also present an algorithm called Conservative
Redundancy Reduction that is much less aggressive than the state-of-the-art
alternatives and has significantly better generalisation performance in this
domain. These new techniques are evaluated against the alternatives in the
literature on four datasets of 1000 emails each (50% spam and 50% non-spam).
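The abstract describes Blame-Based Noise Reduction only as removing cases "that are observed to cause misclassification". The sketch below is one possible way to make that idea concrete. It is not the report's algorithm. It assumes a k-nearest-neighbour classifier with Euclidean distance over numeric feature vectors and a leave-one-out pass. During that pass, each case collects a hypothetical liability set (cases it helped misclassify) and a coverage set (cases it helped classify correctly). The most-blamed cases are examined first, and a case is dropped only if everything it covered is still classified correctly without it. The names `knn_predict` and `blame_based_noise_reduction` are illustrative, as are the toy data.

```python
import math
from collections import Counter


def knn_predict(query, cases, active, k=3, exclude=None):
    """Majority-vote k-NN over the case indices in `active`, skipping `exclude`.

    `cases` is a list of (feature_vector, label) pairs. Returns the predicted
    label (None if no neighbours are available) and the neighbour indices used.
    """
    candidates = sorted(
        (i for i in active if i != exclude),
        key=lambda i: math.dist(query, cases[i][0]),  # Euclidean distance (assumed)
    )
    neighbours = candidates[:k]
    if not neighbours:
        return None, []
    votes = Counter(cases[i][1] for i in neighbours)
    return votes.most_common(1)[0][0], neighbours


def blame_based_noise_reduction(cases, k=3):
    """Return the indices of the cases kept after blame-based editing (sketch)."""
    active = set(range(len(cases)))
    liability = {i: set() for i in active}  # cases i helped misclassify (assumed definition)
    coverage = {i: set() for i in active}   # cases i helped classify correctly (assumed definition)

    # Leave-one-out pass: credit or blame each neighbour of every case.
    for t in active:
        query, true_label = cases[t]
        predicted, neighbours = knn_predict(query, cases, active, k, exclude=t)
        for n in neighbours:
            neighbour_label = cases[n][1]
            if predicted != true_label and neighbour_label != true_label:
                liability[n].add(t)
            elif predicted == true_label and neighbour_label == true_label:
                coverage[n].add(t)

    # Examine the most-blamed cases first; cases never blamed are always kept.
    suspects = sorted(
        (i for i in liability if liability[i]),
        key=lambda i: len(liability[i]),
        reverse=True,
    )
    for c in suspects:
        trial = active - {c}
        # Drop c only if every case it covered is still classified correctly.
        still_ok = all(
            knn_predict(cases[t][0], cases, trial, k, exclude=t)[0] == cases[t][1]
            for t in coverage[c]
            if t in trial
        )
        if still_ok:
            active = trial
    return sorted(active)


# Hypothetical toy case base: (x, y) vectors; the "spam" case at (33, 0) sits among the ham cases.
cases = [((25, 0), "ham"), ((33, 9), "ham"), ((40, 0), "ham"), ((50, 0), "ham"),
         ((65, 0), "spam"), ((75, 0), "spam"), ((72, 8), "spam"), ((33, 0), "spam")]
print(blame_based_noise_reduction(cases, k=1))  # -> [0, 1, 2, 3, 4, 5, 6]
```

With `k=1` on this toy data, the mislabelled case at (33, 0) is blamed for three misclassifications and removed. The ham case at (40, 0) is blamed once, for misclassifying that noisy case, but it is kept. Without it, the ham case at (50, 0) would be classified as spam. This coverage check is what makes the editing conservative in this sketch.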
Sponsors: Science Foundation Ireland; Enterprise Ireland
Author: Delany, Sarah Jane; Cunningham, Padraig
Publisher: Trinity College Dublin, Department of Computer Science
Type of material: Technical Report
Series/Report no.: Computer Science Technical Report TCD-CS-2004-29
Availability: Full text available
Keywords: Computer Science