School of Electronic Engineering and Computer Science

Deep Learning for Evaluation: Differentiable Metrics from the Crowd

Supervisor: Dr Sebastian Ewert

Evaluation is of central importance not only in music informatics but in signal processing and machine learning in general: only with proper evaluation is it possible to state whether a proposed system is likely to do what we want it to do. The more accurate and representative the evaluation, the more conclusive the results. In music processing, for example, listening tests are often the only reliable way to obtain perceptually relevant evaluation metrics. In machine learning, however, such evaluations must be conducted thousands or millions of times as the parameters of a system are adjusted to maximize an evaluation measure; here, listening tests are far too time-consuming and expensive. As a stopgap, most machine learning approaches employ simplistic evaluation measures such as a Euclidean distance or a cross-entropy.

The goal of this PhD is to develop neural networks that learn to predict the results of listening tests, with training data obtained through crowd-sourcing services such as Amazon Mechanical Turk or similar platforms. Such a network is intrinsically differentiable and can therefore serve directly as an objective function in machine learning, improving upon the simplistic measures currently in use. In this sense, the goal of this PhD is not to replace listening tests but to improve how we train our models in general.
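To illustrate the core idea, the following is a minimal sketch of using a learned, differentiable metric as a training objective in place of a Euclidean distance. The "network" here is a hypothetical placeholder (fixed weights `W`, a smooth nonlinearity, and finite-difference gradients); in the actual project, its weights would be trained on crowd-sourced listening-test data and gradients would come from backpropagation.

```python
# Hypothetical "learned" metric: a tiny fixed function that maps the
# difference between a system output and a reference to a scalar score
# (lower = predicted to sound better). The weights W are arbitrary
# placeholders standing in for parameters fitted to listening-test data.
W = [0.9, 0.4, 1.3, 0.2]

def learned_metric(output, reference):
    # Weighted squared error passed through a smooth saturating
    # nonlinearity, mimicking a bounded perceptual rating.
    s = sum(w * (o - r) ** 2 for w, o, r in zip(W, output, reference))
    return s / (1.0 + s)

def grad(f, x, eps=1e-6):
    # Finite-difference gradient; a real learned metric would be a
    # neural network providing exact gradients via backpropagation.
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

reference = [0.2, -0.5, 0.7, 0.1]   # target signal (toy example)
output = [1.0, 1.0, -1.0, 1.0]      # initial system output

# Gradient descent on the system output, using the learned metric as
# the objective instead of a plain Euclidean distance.
for _ in range(200):
    g = grad(lambda x: learned_metric(x, reference), output)
    output = [o - 0.5 * gi for o, gi in zip(output, g)]
```

Because the metric is differentiable end to end, the same gradient-descent machinery that normally minimizes a Euclidean loss can minimize the predicted listening-test score instead.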