dc.contributor.author | Smolic, Aljosa | |
dc.contributor.author | Rana, Aakanksha | |
dc.contributor.author | Ozcinar, Cagri | |
dc.date.accessioned | 2019-11-08T16:02:13Z | |
dc.date.available | 2019-11-08T16:02:13Z | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019 | en |
dc.identifier.citation | Rana, A., Ozcinar, C. & Smolic, A., Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 44th International Conference on Acoustics, Speech, and Signal Processing, (ICASSP), Forthcoming., 2019, 2012-2016 | en |
dc.identifier.other | Y | |
dc.identifier.uri | https://ieeexplore.ieee.org/document/8683318 | |
dc.identifier.uri | http://hdl.handle.net/2262/90360 | |
dc.description | PUBLISHED | en |
dc.description.abstract | Ambisonics, i.e., full-sphere surround sound, is quintessential with 360◦ visual content to provide a realistic virtual reality (VR) experience. While 360◦ visual content capture has gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging due to the required sound-field microphones or information about the sound-source locations. In this paper, we introduce the novel problem of generating Ambisonics in 360◦ videos using audio-visual cues. To this end, firstly, a novel 360◦ audio-visual video dataset of 265 videos with annotated sound-source locations is introduced. Secondly, a pipeline is designed for the automatic Ambisonics estimation problem. Benefiting from deep-learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode the audio into the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360◦ input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360◦ audio-visual analysis for future investigations. | en |
dc.format.extent | 2012-2016 | en |
dc.language.iso | en | en |
dc.rights | Y | en |
dc.subject | Virtual reality | en |
dc.subject | 360 video | en |
dc.subject | Spatial sound | en |
dc.subject | Ambisonics | en |
dc.subject | Multi-modal | en |
dc.subject | Deep learning | en |
dc.title | Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality | en |
dc.title.alternative | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | en |
dc.title.alternative | 44th International Conference on Acoustics, Speech, and Signal Processing, (ICASSP), Forthcoming. | en |
dc.type | Conference Paper | en |
dc.type.supercollection | scholarly_publications | en |
dc.type.supercollection | refereed_publications | en |
dc.identifier.peoplefinderurl | http://people.tcd.ie/smolica | |
dc.identifier.rssinternalid | 199013 | |
dc.rights.ecaccessrights | openAccess | |
dc.subject.TCDTheme | Creative Technologies | en |
dc.subject.TCDTag | Multimedia & Creativity | en |
dc.identifier.rssuri | https://v-sense.scss.tcd.ie/wp-content/uploads/2019/02/ICASSP2019_multimodal.pdf | |
dc.status.accessible | N | en |
dc.contributor.sponsor | Science Foundation Ireland (SFI) | en |
dc.contributor.sponsorGrantNumber | 15/RP/2776 | en |