Applications in Image Aesthetics Using Deep Learning: Attribute Prediction, Image Captioning and Score Regression
Citation:
Ghosal, Koustav, Applications in Image Aesthetics Using Deep Learning: Attribute Prediction, Image Captioning and Score Regression, Trinity College Dublin. School of Computer Science & Statistics, 2021
Download Item:
Dissertation_Koustav.pdf (PhD Thesis) 19.33Mb
Abstract:
Image Aesthetics is the branch of computer vision that studies the aesthetic properties of photographs, i.e. the factors that make an image look pleasing or dull. Such factors extend beyond the physical properties of an image, such as object category or location, to subtler and more ambiguous concepts such as "candid expression", "harsh lighting", "bad placement" etc. Nevertheless, the problems in Image Aesthetics have traditionally been modelled as classical computer vision tasks such as classification and regression. As with most other problems in computer vision, deep learning based strategies have proved more effective in this area as well, outperforming the classical approaches by a wide margin. Automated systems for Image Aesthetics Analysis now have widespread applications, from professional multimedia content development to casual creatives in social media and advertising.
In this thesis, we study three different applications in Image Aesthetics using deep learning: attribute classification, captioning and score prediction. First, we study the capacity of deep neural networks to capture geometric attributes, i.e. those which depend on the arrangement of objects within the image. Based on this, we propose a system that predicts the dominant aesthetic attributes in a photograph, such as the Rule of Thirds, leading lines etc. Second, we develop an aesthetic image captioning framework by exploiting "in the wild" user feedback from the web. Given an image, our framework generates critical feedback such as "nice composition but the foreground is out of focus". Third, we investigate the limitations of traditional convolutional neural networks with respect to global relational reasoning and the handling of photographs of arbitrary aspect ratio and resolution. We present a visual attention based graph neural network that addresses these limitations and advances the state-of-the-art in aesthetic score prediction.
Sponsor:
Science Foundation Ireland (SFI for RF)
Description:
APPROVED
Author: Ghosal, Koustav
Advisor:
Smolic, Aljosa
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material:
Thesis
Availability:
Full text available