CASSANDRA: A Collaborative Anti Spam System Allowing Node Decentralised Research Algorithms
TCD-CS-2003-49.pdf (PDF) 910.4Kb
Email has become a critical tool in many people's lives, both professionally and personally. It is easily accessible, inexpensive, fast, versatile and far-reaching, both in the number and in the range of people to whom it can deliver information. For exactly the same reasons, email is exploited by mass-marketers, and the problem of unsolicited bulk email (spam) is reaching epidemic proportions: half of all email sent worldwide each day is spam. Some sources estimate that spam will cost companies a total of US$20.5 billion in 2003 alone.
The first generation of content-based spam filters has proven inadequate to the task of separating legitimate email from spam. This is due both to the implicit assumptions in their design and to the filters' lack of the sophistication needed to classify all email correctly. Collaborative filtering has proven more successful than content-based filtering, but it still suffers from poor scalability and performance, because its data is centralised and because it assumes that all users are equally likely to receive the same spam.
This thesis reviews the state of the art in content-based and collaborative spam filters and identifies the assumptions that cause their poor accuracy, performance and scalability. It then introduces the concept of personalised, collaborative spam filtering. Personalised, collaborative filters do not make the same inaccurate assumptions as other filters; they are characterised by disseminating information about new spam to the users who are most likely to receive that spam. A versatile, extensible framework is presented that details how personalised, collaborative spam filters can be built and how they interact. The framework defines objects and messages that can be composed to form different network topologies and filtering systems.
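To make the idea of personalised dissemination concrete, the sketch below shows one possible policy, assuming (hypothetically) that each peer keeps the set of spam signatures it has seen and that peers with overlapping spam histories are the ones most likely to receive the same new spam. The Jaccard similarity measure and the names `jaccard`, `choose_recipients` and `histories` are illustrative assumptions, not the mechanism defined in the thesis.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two peers' spam-signature histories (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def choose_recipients(reporter: str,
                      histories: Dict[str, Set[str]],
                      k: int = 2) -> List[str]:
    """Pick the k peers whose spam history most resembles the reporter's.

    Hypothetical policy: peers that have seen similar spam in the past are
    assumed to be the most likely to receive the newly reported spam.
    """
    mine = histories[reporter]
    candidates = [p for p in histories if p != reporter]
    # Sort by descending similarity; ties are broken by peer name for determinism.
    ranked = sorted(candidates, key=lambda p: (-jaccard(mine, histories[p]), p))
    return ranked[:k]


histories = {
    "alice": {"s1", "s2", "s3"},
    "bob": {"s1", "s2"},
    "carol": {"s9"},
    "dave": {"s2", "s3", "s4"},
}
print(choose_recipients("alice", histories))  # ['bob', 'dave']
```

Here a new spam reported by `alice` would be forwarded to `bob` (similarity 2/3) and `dave` (similarity 1/2) but not to `carol`, whose history shares nothing with hers. This targets the traffic at likely recipients instead of broadcasting every report to all users.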
A peer-to-peer (P2P), signature-based proof-of-concept instance of the framework is implemented on a popular Mail User Agent (MUA). The implementation is tested by real users on simulated spam and non-spam email. The results of this experiment show that the implementation is accurate and efficient; they also validate the framework and point to directions for future work.
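A signature-based filter of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: the signature here is a SHA-1 digest of the lower-cased, whitespace-collapsed message body, and `signature`, `SignatureFilter`, `report_spam` and `receive_from_peer` are hypothetical names; the thesis's actual signature scheme and MUA integration are not reproduced here.

```python
import hashlib
import re
from typing import Set


def signature(body: str) -> str:
    """Hypothetical signature: SHA-1 of the body after light normalisation.

    Lower-casing and collapsing whitespace makes trivially re-formatted
    copies of the same spam map to the same signature.
    """
    normalised = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


class SignatureFilter:
    """Local store of spam signatures reported by the user or by peers."""

    def __init__(self) -> None:
        self.known: Set[str] = set()

    def report_spam(self, body: str) -> str:
        """Record a message the user marked as spam and return its
        signature so it can be shared with peers."""
        sig = signature(body)
        self.known.add(sig)
        return sig

    def receive_from_peer(self, sig: str) -> None:
        """Accept a spam signature disseminated by another peer."""
        self.known.add(sig)

    def is_spam(self, body: str) -> bool:
        """Classify a message as spam if its signature is already known."""
        return signature(body) in self.known


alice, bob = SignatureFilter(), SignatureFilter()
sig = alice.report_spam("Buy CHEAP pills now!")
bob.receive_from_peer(sig)                       # P2P dissemination step
print(bob.is_spam("buy cheap   pills\nnow!"))    # True
print(bob.is_spam("Meeting moved to 3pm"))       # False
```

Only the digest travels between peers, not the message text. An exact hash like this one is easily evaded by inserting random text into each copy, which is why practical signature-based systems often use fuzzier signatures; the simple hash is used here only to keep the sketch short.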
Author: GRAY, ALAN