A knowledge-light approach to regression using case-based reasoning
Citation:
Neil McDonnell, 'A knowledge-light approach to regression using case-based reasoning' [thesis], Trinity College (Dublin, Ireland), School of Computer Science & Statistics, 2007, pp. 165.

Abstract:
Case-based reasoning (CBR) is among the most influential paradigms in modern machine
learning. It advocates a strategy of storing specific experiences in the form of cases, and
solving new problems by re-using solutions from similar past cases. The most difficult aspect
of CBR is deciding how to adapt past solutions to precisely match the circumstances of new
problems. No generally applicable method of doing this has been found; different domains and
tasks have their own individual characteristics, and successful adaptation has usually relied on
the presence of explicit, hand-coded domain knowledge. Such knowledge is usually difficult
both to acquire and maintain. For this reason, most CBR systems in operation today are
‘retrieval only’ in that they do not attempt to adapt the solutions of past cases to solve new
problems.
For certain machine learning tasks, however, customisation of old solutions can be
performed using only knowledge contained within the set of stored cases. One such task is
regression (i.e. predicting the value of a numeric variable). Regression is among the oldest
machine learning tasks, dating back to Francis Galton’s work on predicting the heights of
parents and their children in nineteenth century England. A modern example would be to
predict tomorrow’s stock market prices based on today’s financial data. Many different
approaches to solving regression problems have been developed over the years, for example
k-NN, locally weighted linear regression and artificial neural networks.
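As a point of reference for the methods named above, plain k-NN regression can be sketched in a few lines. This is an illustrative sketch only; the function name and toy data are my own, not taken from the thesis:

```python
# Minimal k-NN regression sketch: predict a numeric target as the
# mean of the targets of the k stored cases nearest to the query.
import math

def knn_predict(cases, query, k=3):
    """cases: list of (feature_vector, target) pairs; query: feature vector."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(cases, key=lambda c: dist(c[0], query))[:k]
    return sum(target for _, target in nearest) / k

cases = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_predict(cases, (2.5,), k=3))  # mean of the 3 nearest targets: 20.0
```

Note that the prediction is a simple average of retrieved solutions; no adaptation of past solutions takes place, which is the gap the thesis addresses.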
The aim of this thesis is to apply CBR to the problem of regression. It begins by analysing
previous attempts to do this, paying particular attention to those aspects that might be
improved. One CBR-based approach from the mid-1990s is examined in considerable detail.
It works by finding the differences between a new problem and a similar past problem, then
searching for a pair of stored cases with the same differences between them. These stored
cases indicate the effect of the differences on the solution. This ‘case differences’ approach has
much to recommend it. In particular, the knowledge needed to solve new problems is
automatically generated from stored cases—no additional external knowledge must be added.
Unfortunately, it also suffers from some theoretical limitations that greatly restrict its use.
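The retrieve-match-apply idea described above can be sketched for a one-dimensional numeric domain. This is a hedged illustration of the general 'case differences' scheme, with names and data of my own invention, not the thesis's algorithm:

```python
# Illustrative 'case differences' sketch for scalar problems:
# 1. retrieve the stored case nearest the query;
# 2. find a pair of stored cases whose problem difference matches
#    the gap between the query and the retrieved case;
# 3. apply that pair's solution difference to the retrieved solution.

def case_diff_predict(cases, query):
    """cases: list of (problem, solution) pairs with scalar problems."""
    base = min(cases, key=lambda c: abs(c[0] - query))
    gap = query - base[0]
    pairs = [(a, b) for a in cases for b in cases if a is not b]
    a, b = min(pairs, key=lambda p: abs((p[1][0] - p[0][0]) - gap))
    return base[1] + (b[1] - a[1])

cases = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
print(case_diff_predict(cases, 3.0))  # 20.0 + (20.0 - 10.0) = 30.0
```

The adaptation knowledge (the effect of a problem difference on the solution) comes entirely from the case base itself, which is the property the abstract highlights; the theoretical limitations arise when no stored pair exhibits the required difference.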
This thesis presents two new CBR-based regression algorithms that build on the strengths of
previous approaches while addressing their limitations. One is a minor variant of the
traditional k-NN algorithm, while the other uses the case differences approach and is more
sophisticated. The main contribution of the second algorithm is that it uses locally weighted
linear regression as a guide to help choose past cases that are likely to be useful for solving new
problems. It also takes steps to increase robustness when basing predictions on noisy datasets.
An experimental evaluation of the new techniques shows that they perform well relative to
standard regression algorithms on a range of datasets.
Author: McDonnell, Neil
Advisor: Cunningham, Pádraig
Qualification name: Doctor of Philosophy (Ph.D.)
Publisher: Trinity College (Dublin, Ireland). School of Computer Science & Statistics
Type of material: thesis
Availability: Full text available