Optical Music Recognition using Deep Learning
The proposed PhD focuses on developing novel techniques for optical music recognition (OMR) using deep neural networks (DNNs). The research will be carried out in collaboration with Steinberg Media Technologies, opening up the opportunity to work with leading music notation software and to test the research outcomes in it.
Musicians, composers, arrangers, orchestrators and other users of music notation have long dreamed of simply taking a photo or scan of sheet music and bringing it into a music notation application, where they could make changes, rearrange, transpose, or simply listen to it being played by the computer. The PhD aims to investigate and demonstrate a novel approach to converting images of sheet music into a semantic representation such as MusicXML and/or MEI. The research will be carried out in the context of designing a music recognition engine capable of ingesting, optically correcting, processing and recognising multiple pages of handwritten or printed music, whether from images captured by a mobile phone or from low-resolution copyright-free scans from the International Music Score Library Project (IMSLP). The main objective is to output semantic mark-up identifying as many notational elements and as much text as possible, along with their positions in the original image. Prior solutions have used algorithmic approaches, involving layers of algorithmic rules applied to traditional feature detection techniques such as edge detection. An opportunity exists to develop and evaluate new approaches based on DNNs and other machine learning techniques.
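The desired output described above couples each recognised element to its position in the source image. A minimal sketch of such mark-up is shown below; the element and attribute names are illustrative placeholders, not real MusicXML or MEI vocabulary.

```python
# Sketch: semantic mark-up linking each recognised notational element back to
# its bounding box in the source image. The tag and attribute names here are
# invented for illustration and are NOT MusicXML/MEI vocabulary.
import xml.etree.ElementTree as ET

def build_markup(detections):
    """detections: list of (label, x, y, width, height) tuples in image pixels."""
    root = ET.Element("score-image")
    for label, x, y, w, h in detections:
        ET.SubElement(root, "element", {
            "type": label,
            "x": str(x), "y": str(y),
            "width": str(w), "height": str(h),
        })
    return ET.tostring(root, encoding="unicode")

# Hypothetical detections: a treble clef and a quarter note.
xml = build_markup([("clef-G", 12, 40, 30, 80), ("note-quarter", 60, 55, 14, 40)])
```

A real system would emit proper MusicXML or MEI and store the image coordinates alongside it, so that corrections in the editor can be traced back to the page.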
State-of-the-art Optical Music Recognition (OMR) can already recognise clean sheet music with very high accuracy, but fixing the remaining errors may take as long as, or longer than, transcribing the music into notation software by hand. A new method that improves recognition rates would allow users who are less adept at inputting notes into a music notation application to get better results more quickly. Another challenge to tackle is the variability in the quality of the input (particularly from images captured with smartphones) and how best to preprocess the images to improve recognition quality for subsequent stages of the pipeline. The application of cutting-edge techniques in data science and machine learning, particularly convolutional neural networks (CNNs), may yield better results than traditional methods. To this end, research will start from testing VGG-like architectures (https://arxiv.org/abs/1409.1556) and residual networks (e.g. ResNet, https://arxiv.org/pdf/1512.03385.pdf) for the recognition of handwritten and/or low-resolution printed sheet music. The same techniques may also prove useful in earlier stages of the pipeline, such as document detection and feature detection. It would be desirable to recognise close to all individual objects in the score. One of the first objectives will be to establish a methodology for determining the differences between the reference data and the recognised data. Furthermore, data augmentation can be supported by existing Steinberg software. The ideal candidate would have previous experience of training machine learning models and would be familiar with Western music notation. Being well versed in image acquisition, processing techniques, and computer vision would be a significant advantage.
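One possible form for the evaluation methodology mentioned above is to flatten the reference and recognised scores into symbol sequences and compute a symbol error rate from their edit distance. This is only a baseline sketch: real OMR evaluation must also account for two-dimensional score structure, and the symbol labels below are invented for illustration.

```python
# Sketch of a baseline evaluation: symbol error rate between a reference
# score and a recognised score, both flattened to symbol sequences.

def edit_distance(ref, hyp):
    """Classic dynamic-programming Levenshtein distance."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def symbol_error_rate(ref, hyp):
    """Edit distance normalised by the reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)

# Hypothetical symbol vocabularies: one wrong pitch out of four symbols.
ref = ["clef-G", "note-C4-quarter", "note-D4-quarter", "barline"]
hyp = ["clef-G", "note-C4-quarter", "note-E4-quarter", "barline"]
# symbol_error_rate(ref, hyp) -> 0.25
```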
C4DM theme affiliation:
Machine Learning (Postgraduate)
The aim of the module is to give students an understanding of machine learning methods, including pattern recognition, clustering and neural networks, and to allow them to apply such methods in a range of areas.
Machine Learning for Visual Data Analysis (Postgraduate)
The module will cover the following topics: The Discrete Fourier Transform and the frequency content of images. The design and use of Gabor filters. Principal Component Analysis for denoising and compression. Unsupervised classification via feature space clustering. Texture segmentation with Gabor filters.
Optical Music Recognition with Deep Learning
Optical Music Recognition (OMR) is concerned with digitising sheet music into a machine-readable format. Being able to compose, transcribe and edit music simply by taking a picture of a music sheet would greatly ease musicians' workload. Such automation would also make scores searchable and allow quantitative analysis of musical pieces. The problem comes down to a simple question: how can computers be made to read music? The output of this process is a machine-readable file, such as a MIDI, MusicXML or MEI file. The objective is to output semantic mark-up identifying as many notational elements as possible, along with their positions in the original image.
Prior solutions have used algorithmic approaches, involving layers of algorithmic rules applied to traditional feature detection techniques such as edge detection. One approach we want to investigate further is using deep neural networks to solve the problem. Before this step, another very important stage must be addressed: handling the variable quality of the input picture of the music sheet. Image preprocessing steps will be taken that later aid the training step.
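As one example of such a preprocessing step, a page can be binarised with Otsu's global thresholding before recognition. This is a minimal sketch assuming a flat list of greyscale values; a real pipeline for phone photos would add deskewing, dewarping and adaptive (local) thresholding.

```python
# Sketch of one preprocessing step: Otsu's method picks the greyscale
# threshold that maximises the between-class variance of the histogram,
# separating ink from paper before recognition.

def otsu_threshold(pixels):
    """pixels: flat list of grey values in 0..255. Returns the threshold
    maximising between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_b = 0.0   # running sum of grey values in the "background" class
    w_b = 0       # running pixel count of the "background" class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarise(pixels):
    """Map each pixel to pure black (ink) or white (paper)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]
```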
An OMR pipeline should be able to capture the correct position of each notational element, its distinctive features, and the relationships between notes. Pacha et al. (2018) proposed an end-to-end trainable object detector that can detect almost the full vocabulary of modern music notation in handwritten scores. Using deep convolutional networks on a dataset with symbol-level annotations, they achieve a mean average precision of up to 80%.
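Mean average precision scores detections against ground-truth boxes via an overlap criterion, intersection over union (IoU). A short sketch of that criterion follows; the (x, y, width, height) box format is an assumption for illustration.

```python
# Sketch: intersection over union (IoU), the overlap measure under which a
# detected symbol box is matched to a ground-truth box when computing
# detection metrics such as mean average precision.

def iou(a, b):
    """a, b: boxes as (x, y, width, height) in pixels."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Coordinates of the intersection rectangle (may be empty).
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Identical boxes overlap perfectly:
# iou((0, 0, 10, 10), (0, 0, 10, 10)) -> 1.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a fixed threshold such as 0.5.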
The OMR pipeline has four main blocks, and we want to tackle them one by one, applying a deep learning technique to each and comparing it to the existing techniques. If the deep learning techniques show improvements, an end-to-end network is the final goal of our work. Since the existing datasets do not offer enough classes and data, the first step for us will be data augmentation. This will be done using digitised music sheets from the music notation software Dorico, thereby providing ground truth. These sheets will be subjected to image degradation techniques, and the degraded images will be used as inputs to our pipeline. The next step will be designing the methodology for object recognition and reconstruction using the deep neural network approach.
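The degradation step above can be sketched as follows, here with salt-and-pepper noise on a flat list of greyscale values. This is a minimal assumed example; a full augmentation suite would also apply blur, rotation, uneven lighting and paper texture to the clean Dorico renders.

```python
# Sketch of one image degradation used for data augmentation: flip a fixed
# fraction of pixels to pure black or pure white (salt-and-pepper noise),
# simulating scanning artefacts on otherwise clean rendered pages.
import random

def salt_and_pepper(pixels, amount=0.05, seed=0):
    """pixels: flat list of grey values in 0..255.
    Flips a fraction `amount` of the pixels to 0 or 255."""
    rng = random.Random(seed)  # seeded so augmentation is reproducible
    out = list(pixels)
    n_noisy = int(len(out) * amount)
    for idx in rng.sample(range(len(out)), n_noisy):
        out[idx] = rng.choice((0, 255))
    return out
```

The clean page then serves as ground truth, with the degraded copy as the training input.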