Active learning query selection with historical information
Citation:
Michael Davy, 'Active learning query selection with historical information', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 2009, pp 193

Abstract:
This work describes novel methods and techniques to decrease the cost of employing active
learning in text categorisation problems. The cost of performing active learning is a combination
of labelling effort and computational overhead. Reducing the cost of active learning
allows for accurate classifiers to be constructed inexpensively, increasing the number of real-world
problems where machine learning solutions can be successfully applied. In this thesis
we investigate strategies and techniques to reduce both computational expense and labelling
effort in active learning.
Critical to the success of active learning is the query selection strategy, which is responsible
for identifying informative unlabelled examples. Selecting only the most informative
examples will reduce labelling effort as redundant and uninformative examples are ignored.
The majority of query selection strategies select queries based on the labelling predictions
of the current classifier. This thesis suggests that information from prior iterations of active
learning can help select more informative queries in the current iteration. We propose
History-based query selection strategies, which incorporate predictions from prior iterations
of active learning into the selection of the current query. These strategies have been shown
to increase the accuracy of classifiers produced using active learning, thereby reducing labelling
effort. In addition, History-based query selection strategies are very efficient since
information is reused from previous iterations of active learning.
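
As an illustration of the idea in the paragraph above, the following is a minimal sketch of one possible history-based score: each pool example's prediction uncertainty is summed over the last k iterations instead of being read from the current classifier alone. The function names, the entropy-based score, the window size k and the toy probabilities are all assumptions made for this sketch and are not taken from the thesis. It also assumes that pool indices stay fixed between iterations.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-class prediction with positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def history_uncertainty(history, i, k):
    """Sum example i's prediction entropy over the last k iterations.

    history[t][i] is the positive-class probability that the classifier
    trained at iteration t assigned to pool example i (hypothetical layout).
    """
    return sum(binary_entropy(preds[i]) for preds in history[-k:])

def select_query(history, k):
    """Return the index of the pool example with the highest history-based score."""
    pool = range(len(history[-1]))
    return max(pool, key=lambda i: history_uncertainty(history, i, k))

# Toy predictions for three pool examples over three iterations (invented values).
history = [
    [0.10, 0.90, 0.45],
    [0.15, 0.85, 0.55],
    [0.50, 0.80, 0.55],
]

current_only = select_query(history, k=1)  # looks at the latest iteration only
with_history = select_query(history, k=3)  # also reuses the two earlier iterations
```

In this toy data, example 0 looks most uncertain to the current classifier, while example 2 has been close to the decision boundary in every iteration, so the two settings choose different queries. The stored predictions are reused as they are, so the history-based score needs no extra classifier training.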
Another contributing factor to the cost of active learning is computational expense. Query
selection strategies can require considerable computation to identify the most informative examples.
We investigate pre-filtering optimisation for the computationally inefficient error
reduction sampling (ERS) query selection strategy. Pre-filtering restricts the number of unlabelled
examples considered to a small subset of the pool, constructed using a query selection
strategy. Optimising ERS using pre-filtering was found to simultaneously reduce both computational
overhead and labelling effort.
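
The pre-filtering step described above can be sketched as a two-stage selection: a cheap score ranks the whole pool, and the expensive estimate is computed only for the top m candidates. This is a sketch under stated assumptions, not the thesis's implementation. The names prefilter and select_with_prefilter, the uncertainty score used as the cheap filter, and the toy numbers are hypothetical. expected_error is a stand-in for the ERS estimate, which in practice would involve retraining the classifier for each candidate and possible label.

```python
def prefilter(pool, cheap_score, m):
    """Keep the m pool examples ranked highest by a cheap selection score."""
    return sorted(pool, key=cheap_score, reverse=True)[:m]

def select_with_prefilter(pool, cheap_score, expected_error, m):
    """Run the expensive error estimate only on the pre-filtered subset."""
    candidates = prefilter(pool, cheap_score, m)
    return min(candidates, key=expected_error)

# Toy pool: invented positive-class probabilities and invented error estimates.
probs = {"a": 0.52, "b": 0.95, "c": 0.40, "d": 0.10, "e": 0.49}
toy_error = {"a": 0.30, "b": 0.40, "c": 0.20, "d": 0.35, "e": 0.25}
calls = []  # records how often the expensive estimate is evaluated

def uncertainty(x):
    """Cheap score: 1.0 at the decision boundary, 0.0 for a certain prediction."""
    return 1.0 - 2.0 * abs(probs[x] - 0.5)

def expected_error(x):
    """Hypothetical stand-in for the costly ERS estimate of future error."""
    calls.append(x)
    return toy_error[x]

query = select_with_prefilter(list(probs), uncertainty, expected_error, m=3)
```

Here the expensive estimate is evaluated for three candidates, not for all five pool examples. The subset size m controls the trade-off between computation and how much of the pool the expensive strategy gets to see.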
Author: Davy, Michael
Advisor: Luz, Saturnino
Qualification name: Doctor of Philosophy (Ph.D.)
Publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Type of material: thesis
Availability: Full text available