Neural Synthesis of Sound Effects
This PhD project explores the application of deep generative neural networks to sound effects synthesis, with applications in film, video games and radio. When sound designers are asked to create sound effects for a multimedia production, they have essentially two options: pre-recorded samples or procedural audio. In the first case, they usually rely on large libraries of high-quality audio recordings, from which they select, edit and mix samples for each event they need to sonify. For a realistic result, especially in video games where the same actions are repeated many times, several samples are selected for every event and randomised at playback. This creates challenges in terms of memory requirements, asset management and implementation time. Alternatively, procedural audio aims to synthesise sound effects in real time from a set of input parameters. In video games, these parameters might come from the specific interaction of a character with the environment. The challenges of this approach lie in developing procedural models that can synthesise realistic, high-quality audio, and in finding the right parameter values for each sound event.
In recent years, generative neural networks have taken a great leap forward, with new, more accurate and more flexible neural synthesis models being proposed, especially for speech synthesis. Despite this research effort, much work remains before the synthesis of musical instrument tones and sound effects yields results comparable to recorded samples. Large libraries of high-quality recorded samples are available and can be used as ground truth to evaluate and improve existing models or to develop new ones. Recent developments in data representation, latent space control and latent variable disentanglement show promising results towards designing flexible and controllable architectures capable of generating a wide range of sounds.
C4DM theme affiliation: Audio Engineering, Sound Synthesis