Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets
Citation: Cunningham, Padraig; Loughrey, John. 'Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets'. - Dublin, Trinity College Dublin, Department of Computer Science, TCD-CS-2005-17, 2005, pp11
In wrapper-based feature selection, the more states that are visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has a high internal accuracy while generalizing poorly. When this occurs, we say that the algorithm has overfitted to the training data. We outline a set of experiments to show this, and we introduce a modified genetic algorithm that addresses this overfitting problem by stopping the search before overfitting occurs. This new algorithm, called GAWES (Genetic Algorithm With Early Stopping), reduces the level of overfitting and yields feature subsets with better generalization accuracy.
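The early-stopping idea in the abstract can be illustrated with a minimal sketch. This is not GAWES itself: the abstract does not give the stopping criterion, the genetic operators, or the classifier, so everything below is an assumption made for illustration. The sketch assumes synthetic data (two informative features, eight noise features), a nearest-centroid classifier, resubstitution accuracy on the training set as the "internal accuracy", a held-out validation set, and a hypothetical `patience` parameter that halts the search once held-out accuracy stops improving.

```python
import random

def make_data(n, rng, n_informative=2, n_noise=8):
    """Synthetic binary data (assumed): informative features shift with the label."""
    rows = []
    for k in range(n):
        y = k % 2  # alternate labels so both classes are always present
        x = [rng.gauss(1.5 * y, 1.0) for _ in range(n_informative)]
        x += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append((x, y))
    return rows

def accuracy(mask, fit_rows, eval_rows):
    """Nearest-centroid accuracy using only the features selected by mask."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    cents = {}
    for label in (0, 1):
        members = [x for x, y in fit_rows if y == label]
        cents[label] = [sum(x[i] for x in members) / len(members) for i in idx]
    correct = 0
    for x, y in eval_rows:
        dist = {lab: sum((x[i] - c) ** 2 for i, c in zip(idx, cen))
                for lab, cen in cents.items()}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(eval_rows)

def search_with_early_stopping(train, valid, rng, pop_size=20,
                               max_generations=50, patience=5):
    """Toy genetic search over feature masks, halted by held-out accuracy."""
    n_features = len(train[0][0])
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best_mask, best_valid, stale = None, -1.0, 0
    history = []  # (internal accuracy, held-out accuracy) per generation
    for _ in range(max_generations):
        # Fitness is the internal accuracy, which the search can overfit.
        pop.sort(key=lambda m: accuracy(m, train, train), reverse=True)
        leader = pop[0]
        internal = accuracy(leader, train, train)
        held_out = accuracy(leader, train, valid)
        history.append((internal, held_out))
        if held_out > best_valid:
            best_mask, best_valid, stale = list(leader), held_out, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop before further search overfits the training data
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            j = rng.randrange(n_features)
            child[j] = not child[j]  # single-bit mutation
            children.append(child)
        pop = parents + children
    return best_mask, best_valid, history

rng = random.Random(0)
train, valid = make_data(40, rng), make_data(40, rng)
mask, valid_acc, history = search_with_early_stopping(train, valid, rng)
print(mask, valid_acc, len(history))
```

Comparing the two columns of `history` shows how the internal accuracy of the leading subset relates to its held-out accuracy as the search proceeds. The returned mask is the one with the best held-out accuracy seen before the search stopped, not the one with the best internal accuracy.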
Science Foundation Ireland
Publisher: Trinity College Dublin, Department of Computer Science
Series/Report no: Computer Science Technical Report