Computational Shedding in Stream Computing
Citation: GUERIN, DAVID, Computational Shedding in Stream Computing, Trinity College Dublin. School of Computer Science & Statistics, 2019
PhD Thesis.pdf (Accepted for publication (author's copy) - Peer Reviewed) 2.376 MB
Stream Computing, a generic on-line data processing paradigm, has emerged as the preferred approach to processing continuous data streams. Data streams are bursty: the data rate can spike temporarily to orders of magnitude above normal expected levels. During such spikes queues fill, and producing timely application results becomes difficult. The classic response is to shed input data. However, Load Shedding (LS) reduces application output accuracy because relevant data is discarded before it is processed. Further, LS rates tend to be proportional to the input rate of the stream, so high data rates can lead to high data loss during overload events. For many classes of applications this has a particularly negative impact on the quality of the output result. This thesis presents a new approach, Computational Shedding (CS), to the problem of maintaining application result accuracy while attempting to avoid input data loss during transient bursty data events. Rather than shedding input data within the stream, we propose to adapt the application and temporarily shed tasks or subtasks of the executing application, reducing the per-message processing cost in the stream. This mechanism provides an opportunity to temporarily increase processing capacity, thereby forgoing the need for deliberate data loss during a bursty data event. We have evaluated this approach against traditional LS techniques in terms of measures such as application output accuracy and application processing duration. In experimentation, we found that, in applicable applications, subtasks can be discarded for a time while the remaining subtasks continue to produce a valid, imprecise result. CS was compared to LS alternatives, which simply do not process any discarded data. The results of our evaluation show that CS leads to more timely and accurate results than the LS alternatives.
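The contrast between the two strategies described in the abstract can be illustrated with a toy model. The sketch below is not the thesis implementation: the three-stage pipeline, the `optional` flag, the per-tick processing `budget`, and the function names are all hypothetical, and queueing between ticks is deliberately ignored. It only shows how skipping an optional subtask lowers the per-message cost enough to absorb a burst that Load Shedding would answer by discarding messages.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    cost: float     # processing cost per message (arbitrary time units)
    optional: bool  # hypothetical flag: may be skipped during overload


# Hypothetical application pipeline; names and costs are illustrative only.
PIPELINE = [
    Subtask("parse", 1.0, optional=False),
    Subtask("enrich", 2.0, optional=True),
    Subtask("score", 1.0, optional=False),
]

FULL_COST = sum(s.cost for s in PIPELINE)
REDUCED_COST = sum(s.cost for s in PIPELINE if not s.optional)


def load_shedding(arrivals, budget):
    """Always run the full pipeline; discard messages that do not fit the budget."""
    processed = dropped = 0
    for n in arrivals:
        done = min(n, int(budget // FULL_COST))
        processed += done
        dropped += n - done
    return processed, dropped


def computational_shedding(arrivals, budget):
    """Skip optional subtasks during a burst instead of discarding messages."""
    precise = imprecise = dropped = 0
    for n in arrivals:
        if n * FULL_COST <= budget:
            precise += n  # normal load: full pipeline, precise result
        else:
            done = min(n, int(budget // REDUCED_COST))
            imprecise += done   # burst: reduced pipeline, imprecise result
            dropped += n - done  # only if even the reduced pipeline overflows
    return precise, imprecise, dropped


arrivals = [2, 2, 5, 2]  # messages per tick; the third tick is a burst
print(load_shedding(arrivals, budget=10))           # (processed, dropped)
print(computational_shedding(arrivals, budget=10))  # (precise, imprecise, dropped)
```

In this toy run, Load Shedding discards part of the burst, while Computational Shedding processes every message and marks the burst-time results as imprecise. How much accuracy an imprecise result retains depends on the application and is not modeled here.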
Author: GUERIN, DAVID
Publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material: Thesis
Availability: Full text available