Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality
File Type:
PDF
Item Type:
Conference Paper
Date:
2019
Access:
openAccess
Citation:
Rana, A., Ozcinar, C. & Smolic, A., Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 44th International Conference on Acoustics, Speech, and Signal Processing, (ICASSP), Forthcoming., 2019, 2012-2016
Abstract:
Ambisonics, i.e., full-sphere surround sound, is essential alongside 360° visual content to provide a realistic virtual reality (VR) experience. While 360° visual content capture has gained a tremendous boost recently, the estimation of the corresponding spatial sound is still challenging because it requires sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360° videos using the audio-visual cue. With this aim, firstly, a novel 360° audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for the automatic Ambisonic estimation problem. Benefiting from deep learning based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode to the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360° input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360° audio-visual analysis for future investigations.
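The last step named in the abstract, encoding an estimated sound-source location to the B-format, can be illustrated with the standard first-order Ambisonic panning equations. The sketch below is not the authors' implementation: the function name `encode_b_format`, the traditional 1/sqrt(2) scaling of the W channel, and the angle conventions (azimuth counter-clockwise from the front, elevation upward from the horizontal plane) are assumptions made for illustration.

```python
import math


def encode_b_format(samples, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order B-format channels (W, X, Y, Z).

    Assumed conventions (not taken from the paper):
      - W is scaled by 1/sqrt(2), as in the traditional B-format definition.
      - Azimuth is measured counter-clockwise from the front (+X axis).
      - Elevation is measured upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)

    # Direction-dependent gains for each channel.
    gain_w = 1.0 / math.sqrt(2.0)          # omnidirectional component
    gain_x = math.cos(az) * math.cos(el)   # front-back component
    gain_y = math.sin(az) * math.cos(el)   # left-right component
    gain_z = math.sin(el)                  # up-down component

    w = [gain_w * s for s in samples]
    x = [gain_x * s for s in samples]
    y = [gain_y * s for s in samples]
    z = [gain_z * s for s in samples]
    return w, x, y, z


# Usage: a short mono signal placed directly to the listener's left.
mono = [1.0, 0.5, -0.25]
w, x, y, z = encode_b_format(mono, azimuth_deg=90.0, elevation_deg=0.0)
```

With the source at 90 degrees azimuth and zero elevation, the signal appears in the Y channel at full gain, while X and Z are zero up to floating-point error. Other channel orderings and normalisations exist, so the gains would need adjusting to match whichever convention a given player or dataset expects.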
Sponsor:
Science Foundation Ireland (SFI)
Grant Number:
15/RP/2776
Author's Homepage:
http://people.tcd.ie/smolica
Description:
PUBLISHED
Other Titles:
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; 44th International Conference on Acoustics, Speech, and Signal Processing, (ICASSP), Forthcoming.
Type of material:
Conference Paper
Availability:
Full text available
Keywords:
Virtual reality, 360 video, Spatial sound, Ambisonics, Multi-model, Deep learning
Subject (TCD):
Creative Technologies, Multimedia & Creativity