
dc.contributor.author: Smolic, Aljosa
dc.date.accessioned: 2021-02-18T17:30:40Z
dc.date.available: 2021-02-18T17:30:40Z
dc.date.issued: 2020
dc.date.submitted: 2020
dc.identifier.citation: Chao, F.-Y., Ozcinar, C., Zhang, L., Hamidouche, W., Deforges, O., Smolic, A., "Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio," 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Macau, China, 2020, pp. 355-358
dc.identifier.other: Y
dc.identifier.uri: http://hdl.handle.net/2262/95242
dc.description.abstract: Omnidirectional videos (ODVs) with spatial audio enable viewers to perceive audio and visual signals from all 360° directions when consuming ODVs with head-mounted displays (HMDs). By predicting salient audio-visual regions, ODV systems can be optimized to deliver an immersive, high-quality audio-visual experience. Despite intense recent effort on ODV saliency prediction, the current literature still does not consider the impact of auditory information in ODVs. In this work, we propose an audio-visual saliency (AVS360) model that incorporates a 360° spatio-temporal visual representation and spatial auditory information in ODVs. The proposed AVS360 model comprises two 3D residual networks (ResNets) that encode visual and audio cues: the first is embedded with a spherical representation technique to extract 360° visual features, and the second extracts audio features from the log mel-spectrogram. We emphasize sound source locations by integrating an audio energy map (AEM) generated from the spatial audio description (i.e., ambisonics), and model equator viewing behavior with an equator center bias (ECB). The audio and visual features are combined and fused with the AEM and ECB via an attention mechanism. Our experimental results show that the AVS360 model significantly outperforms five state-of-the-art saliency models. To the best of our knowledge, this is the first work to develop an audio-visual saliency model for ODVs. The code will be made publicly available to foster future research on audio-visual saliency in ODVs.
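The two priors named in the abstract — an audio energy map derived from ambisonics and an equator center bias — can be illustrated on an equirectangular grid. This is a minimal sketch, not the paper's formulation: the grid resolution, the Gaussian form of the ECB with its sigma, and the first-order B-format beamforming used for the energy map are all assumptions made here for illustration.

```python
import numpy as np

# Hypothetical equirectangular resolution (rows = latitude bands).
H, W = 32, 64
lat = np.linspace(np.pi / 2, -np.pi / 2, H)          # +90° .. -90°
lon = np.linspace(-np.pi, np.pi, W, endpoint=False)  # -180° .. 180°

# ECB: a Gaussian over latitude peaking at the equator (sigma assumed).
sigma = np.deg2rad(20.0)
ecb = np.exp(-(lat ** 2) / (2 * sigma ** 2))[:, None] * np.ones((1, W))

def audio_energy_map(w, x, y, z, lat, lon):
    """AEM sketch from first-order ambisonic (B-format) channels.

    w, x, y, z: 1-D sample arrays for one time window. The beamformer
    and axis convention below are assumptions, not the paper's method.
    """
    lt, ln = np.meshgrid(lat, lon, indexing="ij")
    # Unit direction vector of every pixel.
    dx = np.cos(lt) * np.cos(ln)
    dy = np.cos(lt) * np.sin(ln)
    dz = np.sin(lt)
    # Steer a simple beam to each direction; mean square = energy.
    s = (w[None, None, :] +
         dx[..., None] * x[None, None, :] +
         dy[..., None] * y[None, None, :] +
         dz[..., None] * z[None, None, :])
    aem = (s ** 2).mean(axis=-1)
    return aem / aem.max()

# Toy example: a single source straight ahead on the +X axis.
t = np.linspace(0, 1, 500)
sig = np.sin(2 * np.pi * 5 * t)
aem = audio_energy_map(sig, sig, 0 * sig, 0 * sig, lat, lon)

i, j = np.unravel_index(aem.argmax(), aem.shape)
print(i, j)  # loudest pixel sits near the equator row, at lon = 0
```

Multiplying (or attention-weighting) a visual saliency map by `aem` and `ecb` is one plausible way such priors bias predictions toward sound sources and the equator; the paper fuses them via an attention mechanism.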
dc.language.iso: en
dc.relation.uri: https://ieeexplore.ieee.org/abstract/document/9301766
dc.rights: Y
dc.subject: Audio-visual saliency
dc.subject: Spatial sound
dc.subject: Ambisonics
dc.subject: Omnidirectional video (ODV)
dc.subject: Virtual Reality (VR)
dc.title: Towards Audio-Visual Saliency Prediction for Omnidirectional Video with Spatial Audio
dc.title.alternative: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), IEEE, China, 2020.
dc.type: Conference Paper
dc.contributor.sponsor: Science Foundation Ireland (SFI)
dc.type.supercollection: scholarly_publications
dc.type.supercollection: refereed_publications
dc.identifier.peoplefinderurl: http://people.tcd.ie/smolica
dc.identifier.rssinternalid: 223806
dc.identifier.doi: 10.1109/VCIP49819.2020.9301766
dc.rights.ecaccessrights: openAccess
dc.contributor.sponsorGrantNumber: 15/RP/2776
dc.relation.doi: 10.1109/VCIP49819.2020.9301766
dc.relation.cites: Cites
dc.subject.TCDTheme: Creative Technologies
dc.subject.TCDTheme: Digital Engagement
dc.subject.TCDTag: Image Processing
dc.subject.TCDTag: Information technology in education
dc.subject.TCDTag: Multimedia & Creativity
dc.identifier.rssuri: https://ieeexplore.ieee.org/abstract/document/9301766
dc.subject.darat_impairment: Other
dc.status.accessible: N

