Ego-hand Gesture Recognition in Trimmed and Untrimmed Videos for Interactions in Augmented and Virtual Reality.
Citation: Chalasani, Tejo Krishna, Ego-hand Gesture Recognition in Trimmed and Untrimmed Videos for Interactions in Augmented and Virtual Reality, Trinity College Dublin. School of Computer Science & Statistics, 2021
Hand gestures are a means of communication in our daily lives. Using hand gestures to interact with the virtual environment in Augmented and Virtual Reality is a natural extension from the real to the virtual scenario. Recognising hand gestures as seen by head-mounted Augmented and Virtual Reality devices, termed ego-hand gestures, is therefore a problem worth exploring. Deep neural networks have recently been used for ego-hand gesture recognition because they are robust to the difficulties the problem poses, such as varying lighting conditions, background environments, skin colour, ego-motion and motion blur. However, they need a large amount of data for training and testing, which makes data collection and annotation very laborious. This work proposed a novel data augmentation technique and published a new ego-hand gesture dataset (the Green Screen Ego-hand Gesture Dataset) that together reduce the labour-intensive data collection process while still allowing generalisable networks to be trained. A new deep neural network architecture for ego-hand gesture recognition in trimmed videos, which pays specific attention to the ego-hands in the images, was also proposed. This network was trained and tested on several available trimmed ego-hand gesture recognition datasets, including the proposed Green Screen Ego-hand Gesture Dataset, and advanced the state-of-the-art recognition performance on all of the tested datasets.

Trimmed video recognition is not applicable in a real-world scenario, since real-world scenarios involve untrimmed videos. In contrast to trimmed videos, untrimmed videos contain gestures interspersed with non-gesture images. StAG LSTM, an extension to the LSTM framework, was proposed to train ego-hand gesture recognition deep neural networks on untrimmed videos. This extension reduces the number of heuristics used, which are a standard part of existing methods. A new loss function (IG Loss) that better optimises the network, and a new evaluation metric that is more useful than the current metrics for measuring the accuracy of untrimmed video recognition, were also proposed. StAG LSTM with IG Loss advanced the state-of-the-art performance on ego-hand gesture recognition in untrimmed videos.
Author: Chalasani, Tejo Krishna
Publisher: Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material: Thesis
Availability: Full text available