Trinity College Dublin, Department of Computer Science
Delany, Sarah Jane; Cunningham, Pádraig; Doyle, Dónal. 'Generating Estimates of Classification Confidence for a Case-Based Spam Filter'. Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2005-20, 2005, 12 pp.
Computer Science Technical Report TCD-CS-2005-20
Producing estimates of classification confidence is surprisingly
difficult. One might expect that classifiers that can produce numeric
classification scores (e.g. k-Nearest Neighbour or Naive Bayes)
could readily produce confidence estimates based on thresholds. In fact,
this proves not to be the case, probably because these are not probabilistic
classifiers in the strict sense. The numeric scores coming from
k-Nearest Neighbour or Naive Bayes classifiers are not well correlated
with classification confidence. In this paper we describe a case-based
spam filtering application that would benefit significantly from an ability
to attach confidence predictions to positive classifications (i.e. messages
classified as spam). We show that ‘obvious’ confidence metrics for
a case-based classifier are not effective. We propose an ensemble-like solution
that aggregates a collection of confidence metrics and show that
this offers an effective solution in this spam filtering domain.
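
To make the idea of aggregating confidence metrics concrete, the following is a minimal sketch only. The abstract does not name the individual metrics or the aggregation rule, so everything here is an illustrative assumption rather than the report's method: the Jaccard similarity over token sets, the two metrics (`vote_fraction` and `similarity_share`), the thresholds, and the rule that every metric must clear its threshold before a spam prediction is flagged as confident.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity between two token sets (an illustrative choice)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def neighbours(query, cases, k):
    """Return the k most similar (similarity, label) pairs, best first."""
    scored = [(jaccard(query, tokens), label) for tokens, label in cases]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


def vote_fraction(nn, label):
    """Hypothetical metric 1: share of neighbours that agree with the label."""
    return sum(1 for _, l in nn if l == label) / len(nn)


def similarity_share(nn, label):
    """Hypothetical metric 2: share of total similarity held by agreeing neighbours."""
    total = sum(s for s, _ in nn)
    agree = sum(s for s, l in nn if l == label)
    return agree / total if total else 0.0


def classify_with_confidence(query, cases, k=3, thresholds=(1.0, 0.9)):
    """Majority-vote k-NN plus an aggregated confidence flag.

    Assumption: confidence is reported only for spam predictions, and only
    when every metric reaches its (arbitrarily chosen) threshold.
    """
    nn = neighbours(query, cases, k)
    label = Counter(l for _, l in nn).most_common(1)[0][0]
    metrics = (vote_fraction(nn, label), similarity_share(nn, label))
    confident = label == "spam" and all(
        m >= t for m, t in zip(metrics, thresholds)
    )
    return label, confident


# Toy case base of (token set, label) pairs, invented for illustration.
CASES = [
    ({"cheap", "pills", "buy"}, "spam"),
    ({"buy", "cheap", "watches"}, "spam"),
    ({"cheap", "offer", "buy", "now"}, "spam"),
    ({"meeting", "agenda", "monday"}, "ham"),
    ({"lunch", "monday", "buy"}, "ham"),
]

if __name__ == "__main__":
    print(classify_with_confidence({"buy", "cheap", "pills"}, CASES))
    print(classify_with_confidence({"buy", "monday", "cheap"}, CASES))
```

In this sketch a message whose nearest neighbours are all spam is flagged as a confident spam prediction, while one with a mixed neighbourhood is still classified as spam but without the confidence flag. A different aggregation rule (for example, flagging confidence when any single metric clears its threshold) would be an equally plausible reading of "ensemble-like".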