Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision
Citation:
Ulicny, Matej. Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision. Trinity College Dublin, School of Computer Science & Statistics, 2021.
Abstract:
Convolutional neural networks (CNNs) have become a paradigm for designing vision-based intelligent systems. These models are governed by a vast number of parameters, which are learned thanks to the availability of annotated datasets. Image data is available in multiple formats, including JPEG, which uses Discrete Cosine Transform (DCT) coefficients to efficiently encode and compress visual information. We first propose to use the DCT coefficients of JPEG images directly as input to CNN models, removing the need to fully decode the JPEG format before applying CNNs.
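The DCT coefficients in question are those a JPEG codec computes blockwise over the image. As a minimal sketch of what such a CNN input would look like, the following computes the orthonormal 8x8 DCT-II coefficients per block with NumPy; the function names and the grayscale, unquantized setting are illustrative assumptions, not the thesis's actual pipeline:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix (the transform JPEG applies per block)."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # spatial index
    T = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a grayscale image into blocks and return their 2-D DCT coefficients,
    i.e. the values a JPEG decoder holds before inverse-transforming to pixels.
    These coefficient maps would be fed to the CNN instead of decoded pixels."""
    T = dct_matrix(block)
    h, w = image.shape
    coeffs = np.empty_like(image, dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            B = image[i:i + block, j:j + block]
            coeffs[i:i + block, j:j + block] = T @ B @ T.T
    return coeffs
```

Because the transform matrix is orthonormal, the pixel block is recovered exactly by `T.T @ C @ T`, so no information is lost by operating on coefficients rather than pixels.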
Furthermore, we propose to use DCT basis functions to express the convolutional filters in any layer of a CNN, and we show that this provides an advantageous regularization during training. Expressing weights within DCT bases can increase performance and speed up training. We improve several popular models on standard benchmarks: ImageNet classification accuracy by 1%, MS COCO object detection average precision by 1%, and Pascal VOC semantic segmentation IoU score by 1.1%. We also propose to exploit properties of natural images by restricting the set of basis functions used during training. Suppressing the low-frequency component in the first layer can make models insensitive to illumination effects. High-frequency truncation on multiple layers can in turn add stability and efficiently compress a model without any significant loss in accuracy. Using the DCT bases provides a prior that reduces overfitting, especially when compression is applied, and helps with generalization when fewer samples are available.
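Expressing a filter in DCT bases means the network learns the coefficients of a fixed 2-D DCT expansion rather than the raw weights, and restricting the basis set amounts to zeroing some coefficients. A minimal sketch, assuming a k x k filter and a simple i+j frequency cut-off (the actual truncation rule in the thesis may differ):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its rows are 1-D DCT basis vectors."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    T = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def filter_from_dct_coeffs(coeffs: np.ndarray) -> np.ndarray:
    """Build a k x k convolutional filter as a linear combination of 2-D DCT
    basis functions: W = sum_ij coeffs[i, j] * outer(T[i], T[j]).
    Training would learn `coeffs` instead of the raw filter weights."""
    T = dct_matrix(coeffs.shape[0])
    return T.T @ coeffs @ T

def truncate_high_freq(coeffs: np.ndarray, keep: int) -> np.ndarray:
    """Zero out basis functions whose combined frequency index i + j exceeds
    `keep` (an illustrative cut-off), compressing the filter representation."""
    n = coeffs.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.where(i + j <= keep, coeffs, 0.0)
```

Suppressing the lowest-frequency (DC) basis function in the same way, by zeroing `coeffs[0, 0]`, corresponds to the illumination-insensitive first layer described above.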
Lastly, the standard DCT-based compression is modified and extended so that it applies to any weight tensor used in neural networks. We propose to reshape a tensor into a 2-dimensional matrix and reorder its rows based on pairwise distances between the columns, in order to make the matrix more coherent. The reordered matrix is transformed via a 1-dimensional DCT and its high frequencies are truncated. We further correct the scale and bias parameters of batch normalization layers to account for the compression of the preceding layers. Promising results are achieved even without any model fine-tuning, and a short fine-tuning of one epoch can lead to models with 3 times fewer parameters without a loss in accuracy.
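The compression pipeline above can be sketched as follows. The exact distance-based reordering criterion is the thesis's; here a greedy nearest-neighbour ordering of rows stands in for it, and the batch-normalization correction step is omitted, so this is an illustrative approximation rather than the actual method:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    T = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def greedy_row_order(M: np.ndarray) -> np.ndarray:
    """Greedy nearest-neighbour ordering of rows (a stand-in for the thesis's
    distance-based reordering) so that adjacent rows are similar, making the
    matrix smoother, and hence more compressible, along axis 0."""
    remaining = list(range(1, M.shape[0]))
    order = [0]
    while remaining:
        d = [np.linalg.norm(M[r] - M[order[-1]]) for r in remaining]
        order.append(remaining.pop(int(np.argmin(d))))
    return np.array(order)

def compress_tensor(W: np.ndarray, keep: int):
    """Reshape a weight tensor to 2-D, reorder its rows, apply a 1-D DCT down
    each column, and keep only the `keep` lowest-frequency rows."""
    M = W.reshape(W.shape[0], -1)
    order = greedy_row_order(M)
    T = dct_matrix(M.shape[0])
    C = T @ M[order]              # 1-D DCT along the reordered row axis
    return C[:keep], order, T     # rows keep..n-1 (high frequencies) dropped

def decompress_tensor(C: np.ndarray, order: np.ndarray, T: np.ndarray, shape):
    """Pad the truncated spectrum with zeros, invert the orthonormal DCT,
    undo the row permutation, and restore the original tensor shape."""
    Cfull = np.zeros((T.shape[0], C.shape[1]))
    Cfull[:C.shape[0]] = C
    M = T.T @ Cfull               # inverse of the orthonormal 1-D DCT
    out = np.empty_like(M)
    out[order] = M                # undo the row permutation
    return out.reshape(shape)
```

With `keep` equal to the full row count the reconstruction is exact; smaller values trade reconstruction error for a proportionally smaller stored spectrum.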
Sponsor:
Science Foundation Ireland (SFI)
Grant Number:
Description:
APPROVED
Author: Ulicny, Matej
Advisor:
Dahyot, Rozenn
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material:
Thesis
Availability:
Full text available