Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision
Citation:
Ulicny, Matej. Convolutional Neural Networks based on Discrete Cosine Transform with Applications in Computer Vision. Trinity College Dublin, School of Computer Science & Statistics, 2021.
Abstract:
Convolutional neural networks (CNNs) have become a paradigm for designing vision-based intelligent systems. These models are governed by a vast number of parameters, which are learned thanks to the availability of annotated datasets. Image data is available in multiple formats, including JPEG, which uses Discrete Cosine Transform (DCT) coefficients to efficiently encode and compress visual information. We first propose to use the DCT coefficients of JPEG images directly as input to CNN models, removing the need to fully decode the JPEG format before applying CNNs.
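The DCT coefficients in question are those a JPEG codec computes blockwise over the image. As a minimal sketch of what such a CNN input would look like, the following computes the orthonormal 8x8 DCT-II coefficients per block with NumPy; the function names and the grayscale, unquantized setting are illustrative assumptions, not the thesis's actual pipeline:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix (the transform JPEG applies per block)."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # spatial index
    T = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def blockwise_dct(image: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a grayscale image into blocks and return their 2-D DCT coefficients,
    i.e. the values a JPEG decoder holds before inverse-transforming to pixels.
    These coefficient maps would be fed to the CNN instead of decoded pixels."""
    T = dct_matrix(block)
    h, w = image.shape
    coeffs = np.empty_like(image, dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            B = image[i:i + block, j:j + block]
            coeffs[i:i + block, j:j + block] = T @ B @ T.T
    return coeffs
```

Because the transform matrix is orthonormal, the pixel block is recovered exactly by `T.T @ C @ T`, so no information is lost by operating on coefficients rather than pixels.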
Furthermore, we propose to use DCT basis functions to express the convolutional filters in any layer of a CNN, and we show that this provides an advantageous regularization during training. Expressing weights within DCT bases can increase performance and speed up training. We improve several popular models on standard benchmarks: ImageNet classification accuracy by 1%, MS COCO object detection average precision by 1%, and Pascal VOC semantic segmentation IoU score by 1.1%. We also propose to exploit properties of natural images by restricting the set of basis functions used during training. Suppressing the low-frequency component in the first layer can make models insensitive to illumination effects. High-frequency truncation on multiple layers can in turn add stability and efficiently compress a model without any significant loss in accuracy. Using the DCT bases provides a prior that reduces overfitting, especially when compression is applied, and helps with generalization when fewer samples are available.
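Expressing a filter in DCT bases means the network learns the coefficients of a fixed 2-D DCT expansion rather than the raw weights, and restricting the basis set amounts to zeroing some coefficients. A minimal sketch, assuming a k x k filter and a simple i+j frequency cut-off (the actual truncation rule in the thesis may differ):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; its rows are 1-D DCT basis vectors."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    T = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def filter_from_dct_coeffs(coeffs: np.ndarray) -> np.ndarray:
    """Build a k x k convolutional filter as a linear combination of 2-D DCT
    basis functions: W = sum_ij coeffs[i, j] * outer(T[i], T[j]).
    Training would learn `coeffs` instead of the raw filter weights."""
    T = dct_matrix(coeffs.shape[0])
    return T.T @ coeffs @ T

def truncate_high_freq(coeffs: np.ndarray, keep: int) -> np.ndarray:
    """Zero out basis functions whose combined frequency index i + j exceeds
    `keep` (an illustrative cut-off), compressing the filter representation."""
    n = coeffs.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.where(i + j <= keep, coeffs, 0.0)
```

Suppressing the lowest-frequency (DC) basis function in the same way, by zeroing `coeffs[0, 0]`, corresponds to the illumination-insensitive first layer described above.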
Lastly, the standard DCT-based compression is modified and extended so that it applies to any weight tensor used in neural networks. We propose to reshape a tensor into a 2-dimensional matrix and reorder its rows based on pairwise distances between the columns, in order to make the matrix more coherent. The reordered matrix is transformed via a 1-dimensional DCT and its high frequencies are truncated. We further correct the scale and bias parameters of batch normalization layers to account for the compression of the preceding layers. Promising results are achieved even without any model fine-tuning, and a short fine-tuning of one epoch can lead to models with 3 times fewer parameters without a loss in accuracy.
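The compression pipeline above can be sketched as follows. The exact distance-based reordering criterion is the thesis's; here a greedy nearest-neighbour ordering of rows stands in for it, and the batch-normalization correction step is omitted, so this is an illustrative approximation rather than the actual method:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    T = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    T[0, :] *= np.sqrt(1.0 / n)
    T[1:, :] *= np.sqrt(2.0 / n)
    return T

def greedy_row_order(M: np.ndarray) -> np.ndarray:
    """Greedy nearest-neighbour ordering of rows (a stand-in for the thesis's
    distance-based reordering) so that adjacent rows are similar, making the
    matrix smoother, and hence more compressible, along axis 0."""
    remaining = list(range(1, M.shape[0]))
    order = [0]
    while remaining:
        d = [np.linalg.norm(M[r] - M[order[-1]]) for r in remaining]
        order.append(remaining.pop(int(np.argmin(d))))
    return np.array(order)

def compress_tensor(W: np.ndarray, keep: int):
    """Reshape a weight tensor to 2-D, reorder its rows, apply a 1-D DCT down
    each column, and keep only the `keep` lowest-frequency rows."""
    M = W.reshape(W.shape[0], -1)
    order = greedy_row_order(M)
    T = dct_matrix(M.shape[0])
    C = T @ M[order]              # 1-D DCT along the reordered row axis
    return C[:keep], order, T     # rows keep..n-1 (high frequencies) dropped

def decompress_tensor(C: np.ndarray, order: np.ndarray, T: np.ndarray, shape):
    """Pad the truncated spectrum with zeros, invert the orthonormal DCT,
    undo the row permutation, and restore the original tensor shape."""
    Cfull = np.zeros((T.shape[0], C.shape[1]))
    Cfull[:C.shape[0]] = C
    M = T.T @ Cfull               # inverse of the orthonormal 1-D DCT
    out = np.empty_like(M)
    out[order] = M                # undo the row permutation
    return out.reshape(shape)
```

With `keep` equal to the full row count the reconstruction is exact; smaller values trade reconstruction error for a proportionally smaller stored spectrum.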
Sponsor:
Science Foundation Ireland (SFI)
Grant Number:
Description:
APPROVED
Author: Ulicny, Matej
Advisor:
Dahyot, Rozenn
Publisher:
Trinity College Dublin. School of Computer Science & Statistics. Discipline of Computer Science
Type of material:
Thesis
Availability:
Full text available