Supervisor: Dr Lin Wang
With the prevalence of personal devices such as smartphones and laptops, it is now common for many people to record the same social event (e.g. a public talk or a music concert) on their own devices. However, due to an undesirable field of view, environmental noise, and room reverberation, the audio and visual recordings captured by an individual device are usually of poor quality. The microphones and cameras embedded in multiple devices can instead be combined into an ad-hoc audio-visual sensing network. This project aims to develop novel audio-visual signal processing and machine learning algorithms that exploit recordings from multiple smartphones to improve sound quality and to generate desirable audio and video content. Possible research directions include (but are not limited to):

1) Device localization: robustly determine the locations of the smartphones using ambient information extracted from the audio and visual recordings.

2) Enhanced audio content generation: exploit the spatial information captured by distributed smartphones for acoustic scene analysis and target speech extraction in noisy and adverse environments, and render spatialized audio that gives listeners an immersive perception.

3) Joint audio-visual content generation: spatially and temporally synchronize the audio and visual information captured by distributed smartphones, and render enjoyable multi-view and immersive multimedia presentations.

The project will be conducted in collaboration with the Centre for Intelligent Sensing (CIS) and the Centre for Digital Music (C4DM).
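To illustrate the kind of problem underlying directions 1) and 3): recordings of the same sound made by two unsynchronized devices arrive with an unknown relative delay, which can be estimated from the signals themselves. The sketch below is a minimal, self-contained example (not part of the project specification) of one standard technique, GCC-PHAT (generalized cross-correlation with phase transform); the function name and parameter choices are illustrative assumptions.

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000):
    """Estimate the delay of `sig` relative to `ref` in seconds using
    GCC-PHAT. Positive result means `sig` lags behind `ref`.
    (Illustrative sketch; names and defaults are assumptions.)"""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    # Phase transform: discard magnitude, keep only phase information,
    # which sharpens the correlation peak under reverberation.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Rearrange so index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Toy example: two "devices" hear the same wideband source,
# with device B delayed by a known 40 samples.
fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(fs)  # 1 s of synthetic sound
delay_samples = 40
device_a = source
device_b = np.concatenate((np.zeros(delay_samples),
                           source[:-delay_samples]))
est = gcc_phat(device_b, device_a, fs=fs)
print(round(est * fs))  # estimated delay in samples
```

In an ad-hoc network such pairwise delays serve double duty: they temporally align the streams for joint rendering, and (scaled by the speed of sound) they provide range differences usable for device localization.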