Computerising a library catalogue using optical character recognition

File Type:

PDF

Item Type:

thesis

Date:

1993

Author:

Anderson, Glynn

Access:

openAccess

Citation:

Glynn Anderson, 'Computerising a library catalogue using optical character recognition', [thesis], Trinity College (Dublin, Ireland). School of Computer Science & Statistics, 1993, pp 164

Download Item:

Anderson TCD THESIS 2842 Computerising a.pdf (PDF) 67.76Mb

Abstract:

Trinity College Library contains several million books. Catalogues for the more modern books have been computerised to allow readers a fast and efficient means of locating a book. The 1872 Printed Catalogue, which lists books owned by the library before 1872, has not yet been computerised. The catalogue lists 165,000 books, some of which are the most valuable in the library. The purpose of this project is to write a computer program that will automatically computerise the catalogue using optical character recognition (OCR). OCR is the process by which a digital picture of a portion of text is converted into computer-readable text. Each character on the page is represented by a group or 'blob' of dots or pixels. The role of the computer is twofold: first, to decide which pixels should be grouped together (i.e. which belong to the same character), and second, to decide what character each of the blobs of pixels represents. The output of the OCR program is sent to a database and will eventually be incorporated into the existing DYNIX© database currently in use in the library. The thesis contains a review of several different approaches to OCR, including feature vector analysis, discrimination trees, stroke analysis and neural networks. The implementation and results of a selection of these methods are described. The recognition or classification method used in this project, template matching, has not been implemented before as a primary classification method. The results of this thesis show that template matching compares very favourably with other classification methods. The thesis describes the considerable work undertaken in deriving a good matching algorithm, which is the key to the success of template matching. The segmentation of lines and characters is described in full, including the development of a very efficient perimeter tracing algorithm.
Before the final chapters on results, conclusion and future work, there is a chapter explaining how a state machine is used, while classifying, to delimit the fields within each entry on a catalogue page.
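
The two steps named in the abstract (grouping pixels into blobs, then deciding which character each blob represents) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the matching algorithm, so the pixel-agreement score used here is an assumption, and the function names (`find_blobs`, `crop`, `match_score`, `classify`), the two tiny templates and the sample image are all hypothetical.

```python
from collections import deque


def find_blobs(image):
    """Group foreground pixels (value 1) into 8-connected blobs using breadth-first search."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def crop(blob):
    """Return the blob as a tight binary grid (its bounding box)."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    top, left = min(ys), min(xs)
    grid = [[0] * (max(xs) - left + 1) for _ in range(max(ys) - top + 1)]
    for y, x in blob:
        grid[y - top][x - left] = 1
    return grid


def match_score(grid, template):
    """Assumed score: fraction of agreeing pixels, or 0.0 if the sizes differ."""
    if len(grid) != len(template) or len(grid[0]) != len(template[0]):
        return 0.0
    agree = sum(g == t for g_row, t_row in zip(grid, template)
                for g, t in zip(g_row, t_row))
    return agree / (len(grid) * len(grid[0]))


def classify(grid, templates):
    """Pick the label whose template agrees best with the blob."""
    return max(templates, key=lambda label: match_score(grid, templates[label]))


# Hypothetical templates and a 3x4 sample image holding two characters.
TEMPLATES = {
    "I": [[1], [1], [1]],
    "L": [[1, 0], [1, 0], [1, 1]],
}
IMAGE = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
]

text = "".join(classify(crop(b), TEMPLATES) for b in find_blobs(IMAGE))
print(text)
```

A real system would have to normalise size and cope with noise, touching characters and multi-part characters such as 'i'. The sketch ignores all of these and only rejects blobs whose size differs from the template.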

URI:

http://hdl.handle.net/2262/96773

Advisor:

Byrne, John G.

Qualification name:

Master in Science (M.Sc.)

Publisher:

Trinity College (Dublin, Ireland). School of Computer Science & Statistics

Availability:

Full text available

Keywords:

Computer Science, M.Sc., M.Sc. Trinity College Dublin 1993, Printed Catalogue, 1872 Catalogue, Trinity College Library
