Overfitting and Diversity in Classification Ensembles based on Feature Selection
Citation:
Cunningham, Padraig. 'Overfitting and Diversity in Classification Ensembles based on Feature Selection'. Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2000-07, 2000, 8 pp.
Abstract:
This paper addresses Wrapper-like approaches to feature subset selection and the
production of classifier ensembles based on members with different feature
subsets. The paper starts with the observation that if an insufficient amount of data
is used to guide the Wrapper search then the feature selection will overfit the data.
If the objective of the feature selection exercise is to build a better predictor,
rather than identify important features for data mining reasons, then ensembles
offer a solution. Overfitting may be used to provide diversity in ensembles
provided the overfitted members have variety. The paper concludes with an
assessment of entropy as a measure of diversity in classifier ensembles. A
tentative conclusion is that diversity is not such a problem where a large number
of features is involved but needs to be monitored for problems with smaller
numbers of features, say fewer than 25.
Author: Cunningham, Padraig
Publisher:
Trinity College Dublin, Department of Computer Science
Type of material:
Technical Report
Series/Report no:
Computer Science Technical Report
TCD-CS-2000-07
Availability:
Full text available
Keywords:
Computer Science