School of Electronic Engineering and Computer Science

Context-Adaptive Sound Separation with Deep Networks

Supervisor: Dr Sebastian Ewert

Given an audio recording of a piece of music, the goal of sound separation is to isolate, or un-mix, the individual sounds (related to instruments, notes, melody, background) that make up the recording. Sound separation is often described as a key technology in music processing: it gives direct access to individual sound events and thus widens the ways a recording can be analysed, processed and re-used (e.g. for re-mixing and up-mixing sounds, or for musical analysis). Unfortunately, without prior assumptions about the recording, sound separation is a mathematically ill-posed problem with no unique solution. However, if rich prior knowledge can be provided, the results are useful for a variety of applications.

The goal of this PhD is to investigate, in the context of audio sound separation, different approaches to incorporating prior knowledge into deep neural networks, or to imposing informed constraints on the learning process. Building on recent successes in deep learning, prior knowledge can then be used to constrain the expressive and analytical power of neural networks, potentially yielding higher separation quality. In this context, the use and development of (Bayesian) graph networks, variational autoencoders and other methods combining probabilistic modelling with neural networks is of central interest.
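As a rough illustration of how probabilistic modelling and neural networks can be combined, the sketch below computes the evidence lower bound (ELBO) of a variational autoencoder on a toy spectrogram frame using plain NumPy. The KL-divergence term is where a prior over the latent space constrains the model. All dimensions, weights and the toy input are hypothetical, and a real system would use a trained deep network on actual audio features; this is only a minimal sketch of the underlying mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "spectrogram frame": 16 frequency bins (hypothetical size).
x = rng.random(16)

# Randomly initialised linear encoder/decoder weights; a real VAE would
# use trained multi-layer networks here.
d_latent = 4
W_mu = rng.standard_normal((d_latent, 16))
W_logvar = rng.standard_normal((d_latent, 16))
W_dec = rng.standard_normal((16, d_latent))

def encode(x):
    # Encoder maps the frame to the parameters of a Gaussian posterior q(z|x).
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar):
    # Reparameterisation trick: z = mu + sigma * eps with eps ~ N(0, I),
    # which keeps the sample differentiable w.r.t. mu and logvar.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    # Decoder reconstructs the frame from the latent code.
    return W_dec @ z

def elbo(x):
    mu, logvar = encode(x)
    z = reparameterize(mu, logvar)
    x_hat = decode(z)
    # Gaussian reconstruction log-likelihood (up to additive constants).
    recon = -np.sum((x - x_hat) ** 2)
    # KL( q(z|x) || N(0, I) ): the prior's constraint on the latent space.
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon - kl

print(elbo(x))
```

In an informed-separation setting, the standard-normal prior in the KL term could be replaced by a structured prior encoding musical knowledge (e.g. a score or instrument model), which is one way the constraints discussed above enter the learning objective.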