Generating Estimates of Classification Confidence for a Case-Based Spam Filter
File Type:
PDF
Item Type:
Technical Report
Date:
2005-02-05
Citation:
Delany, Sarah Jane; Cunningham, Padraig; Doyle, Donal. 'Generating Estimates of Classification Confidence for a Case-Based Spam Filter'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2005-20, 2005, pp12
Abstract:
Producing estimates of classification confidence is surprisingly
difficult. One might expect that classifiers that can produce numeric
classification scores (e.g. k-Nearest Neighbour or Naive Bayes)
could readily produce confidence estimates based on thresholds. In fact,
this proves not to be the case, probably because these are not probabilistic
classifiers in the strict sense. The numeric scores coming from
k-Nearest Neighbour or Naive Bayes classifiers are not well correlated
with classification confidence. In this paper we describe a case-based
spam filtering application that would benefit significantly from an ability
to attach confidence predictions to positive classifications (i.e. messages
classified as spam). We show that 'obvious' confidence metrics for
a case-based classifier are not effective. We propose an ensemble-like solution
that aggregates a collection of confidence metrics and show that
this offers an effective solution in this spam filtering domain.
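As a rough illustration of the idea in the abstract, the sketch below computes two confidence metrics for a k-Nearest Neighbour spam classifier and aggregates them so that a spam prediction counts as confident only if every metric clears its threshold. Everything here is an assumption for illustration, not the report's method: the metric names (`vote_fraction`, `similarity_ratio`), the cosine similarity, the toy case base, the thresholds, and the unanimous-agreement rule are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical case representation: (feature vector, is_spam label).
Case = Tuple[Sequence[float], bool]
# A confidence metric maps (query, case base, k) to a score; higher means more confidently spam.
Metric = Callable[[Sequence[float], List[Case], int], float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def vote_fraction(query: Sequence[float], case_base: List[Case], k: int) -> float:
    """An 'obvious' metric (hypothetical): fraction of the k nearest neighbours labelled spam."""
    ranked = sorted(case_base, key=lambda case: cosine(query, case[0]), reverse=True)
    nearest = ranked[:k]
    return sum(1 for _, is_spam in nearest if is_spam) / len(nearest)


def similarity_ratio(query: Sequence[float], case_base: List[Case], k: int) -> float:
    """Hypothetical metric: summed similarity to the k nearest spam cases
    divided by summed similarity to the k nearest non-spam cases."""
    spam_sims = sorted((cosine(query, f) for f, s in case_base if s), reverse=True)[:k]
    ham_sims = sorted((cosine(query, f) for f, s in case_base if not s), reverse=True)[:k]
    denominator = sum(ham_sims)
    return sum(spam_sims) / denominator if denominator > 0 else math.inf


def confident_spam(query: Sequence[float], case_base: List[Case],
                   metrics: List[Tuple[Metric, float]], k: int = 3) -> bool:
    """Aggregate (assumed rule): confident only if every metric meets its threshold."""
    return all(metric(query, case_base, k) >= threshold for metric, threshold in metrics)


# Toy two-feature case base and arbitrary thresholds, for illustration only.
CASE_BASE: List[Case] = [
    ([1.0, 0.0], True), ([0.9, 0.1], True), ([0.8, 0.2], True),
    ([0.0, 1.0], False), ([0.1, 0.9], False), ([0.2, 0.8], False),
]
METRICS: List[Tuple[Metric, float]] = [(vote_fraction, 1.0), (similarity_ratio, 2.0)]

print(confident_spam([1.0, 0.05], CASE_BASE, METRICS))  # clear-cut spam -> True
print(confident_spam([0.6, 0.5], CASE_BASE, METRICS))   # borderline query -> False
```

Requiring unanimous agreement is the simplest aggregation rule; a majority vote or a weighted combination would be alternatives. In practice the thresholds would likely be tuned on held-out messages so that confident spam predictions rarely turn out to be legitimate mail; the values above are arbitrary.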
Publisher:
Trinity College Dublin, Department of Computer Science
Type of material:
Technical Report
Series/Report no:
Computer Science Technical Report; TCD-CS-2005-20
Availability:
Full text available
Keywords:
Computer Science