Supervisor: Dr Sebastian Ewert
The development of music compression peaked between 1990 and 2000. At the time, the automatic identification of perceptually irrelevant information in audio streams enabled a drastic decrease in bitrate and had a massive impact on how we listen to, buy and produce music today. Since then, however, the underlying concepts have hardly changed and can be found in essentially identical form in all subsequent audio coding standards (MPEG Surround, SAOC, MPEG-H, ...). In lossless audio compression, linear prediction remains a central component that has not been replaced since the 1970s. The goal of this PhD is to design deep neural networks that learn to anticipate music signals, and thus to create a conceptually novel approach to further reducing the bitrate. Such a method can, but need not, be combined with existing perceptual coding techniques. The project could involve various types of networks for temporal modelling, possibly with a specific focus on attention-based models to incorporate offline knowledge. It could also involve adversarial training: relative to the frame rate of the spectral front-end, music streams are rather stationary processes, and adversarial training helps a model learn the behaviour at the boundaries between such processes.
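To make the role of prediction in lossless coding concrete, here is a minimal sketch of the classical linear-prediction baseline mentioned above. It is not part of the project specification. The function names, the synthetic test signal and the predictor order are assumptions chosen for illustration. The idea is that an encoder stores the prediction residual rather than the signal itself. The better the predictor anticipates the signal, the smaller the residual and the fewer bits it needs. A learned predictor such as a neural network would take the place of the least-squares fit.

```python
import numpy as np


def past_samples(x, order):
    """Matrix whose row for target x[n] holds x[n-1], x[n-2], ..., x[n-order]."""
    return np.stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)], axis=1
    )


def fit_linear_predictor(x, order):
    """Least-squares fit of coefficients a so that x[n] ~ sum_k a[k] * x[n-1-k]."""
    coeffs, *_ = np.linalg.lstsq(past_samples(x, order), x[order:], rcond=None)
    return coeffs


def prediction_residual(x, coeffs):
    """Difference between each sample and its prediction from past samples."""
    order = len(coeffs)
    return x[order:] - past_samples(x, order) @ coeffs


# Hypothetical test signal: two sinusoids plus a little noise (assumption,
# standing in for a short, roughly stationary stretch of music).
rng = np.random.default_rng(0)
n = np.arange(4096)
signal = (
    np.sin(2 * np.pi * 0.01 * n)
    + 0.5 * np.sin(2 * np.pi * 0.037 * n)
    + 0.01 * rng.standard_normal(n.size)
)

order = 8  # illustrative choice, not a recommendation
coeffs = fit_linear_predictor(signal, order)
residual = prediction_residual(signal, coeffs)

# Prediction gain: how much smaller the residual is than the signal, in dB.
gain_db = 10 * np.log10(np.var(signal[order:]) / np.var(residual))
print(f"prediction gain: {gain_db:.1f} dB")
```

A higher prediction gain means a lower-variance residual, which an entropy coder can represent with fewer bits. A real codec would also need to quantise and transmit the coefficients, or use a predictor known to both encoder and decoder.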