D. Kelly, F. Pitie, A. Kokaram, F. Boland, A Comparative Error Analysis of Audio-Visual Source Localization, International Workshop on Multi Camera and Multi-modal Sensor Fusion, ECCV 2008, Marseille, October 2008
This paper examines the accuracy of audio-video based localization
using multiple cameras and multiple microphones. Covariance mapping
theory is used to determine the accuracy of audio and video based
localization. Both modalities are compared in terms of their ability to
provide accurate location estimates of a moving audio-visual source.
Video is found to be significantly more accurate than audio. The
problem of audio-video fusion is also examined. The fusion of audio and
video location estimates is applied in the audio domain, the video domain
and the positional domain. The accuracy of these three fusion strategies
for 3D localization is examined from a theoretical basis. The best localization
performance is found when fusion is applied in the positional
domain. Fusing audio and video data in the video domain is found to
exhibit the worst localization performance. This analysis is confirmed by
measuring the accuracy of each fusion strategy in localizing a moving
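
The positional-domain fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes independent audio and video errors and diagonal covariances (one variance per axis); these are simplifying assumptions made here for illustration, not the model used in the paper. The function name `fuse_positions` and all numeric values are hypothetical.

```python
def fuse_positions(pos_a, var_a, pos_v, var_v):
    """Inverse-variance weighted fusion of two 3D position estimates.

    Assumes independent errors and per-axis variances (diagonal covariance).
    This is an illustrative simplification, not the paper's formulation.
    """
    fused_pos, fused_var = [], []
    for xa, sa, xv, sv in zip(pos_a, var_a, pos_v, var_v):
        w_a, w_v = 1.0 / sa, 1.0 / sv  # weights are inverse variances
        fused_var.append(1.0 / (w_a + w_v))
        fused_pos.append((w_a * xa + w_v * xv) / (w_a + w_v))
    return fused_pos, fused_var


# Hypothetical estimates in metres; variances in square metres.
audio_pos, audio_var = [1.0, 2.0, 1.5], [0.09, 0.09, 0.16]
video_pos, video_var = [1.2, 2.1, 1.4], [0.01, 0.01, 0.04]

fused_pos, fused_var = fuse_positions(audio_pos, audio_var, video_pos, video_var)
print(fused_pos, fused_var)
```

Under these assumptions the fused variance on each axis is smaller than either input variance, and the fused position is pulled toward the more accurate (here, video) estimate.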
Items in TARA are protected by copyright, with all rights reserved, unless otherwise indicated.