Lyrics Alignment For Polyphonic Music
Two approaches to lyrics alignment for polyphonic music are proposed. The first uses the ability of the attention mechanism to model class probabilities, while the second treats the task as a reinforcement learning game.
Lyrics alignment provides a fine-grained analysis of vocal music. Besides its direct applications in karaoke and in navigation within a song, it could benefit other singing-content analysis tasks, such as singing voice separation and lyrics transcription, by providing word- or phoneme-level annotations. Compared with speech-to-text alignment or speech recognition, lyrics alignment is more challenging because of the accompaniment. In addition, a sung word may be pronounced differently from its spoken form, and syllable durations are usually adapted to the melody. The variety of singing techniques makes the signal even harder to model.
Regarding training data for a machine learning model, word-level alignment requires considerable human annotation effort, so a large word-level dataset is hard to obtain. However, since phrase-level (or line-level) lyrics alignment is relatively easy to obtain, word-level lyrics alignment can be addressed as the problem of deriving strong labels from weakly labeled data.
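To make the weak-to-strong idea concrete, here is a minimal sketch of attention-based pooling, the kind of model often used for weakly labelled audio, assuming that each phrase comes with only the set of words it contains. Frame-level word probabilities are pooled by per-word attention weights into phrase-level probabilities, which can be trained against the phrase labels; after training, the frame-level probabilities serve as strong (word-position) labels. All names (`attention_pool`, `cla_logits`, `att_logits`, `word_peaks`) and the toy numbers are hypothetical illustrations, not the proposal's actual model, and the logits would in practice come from a neural network over audio features.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(cla_logits, att_logits):
    """Pool frame-level word probabilities into phrase-level probabilities.

    cla_logits[t][k]: hypothetical classifier score that word k is sung at frame t.
    att_logits[t][k]: hypothetical attention score for frame t and word k.
    Returns (phrase_probs, frame_probs, weights), with weights indexed [k][t].
    """
    n_frames, n_words = len(cla_logits), len(cla_logits[0])
    frame_probs = [[sigmoid(cla_logits[t][k]) for k in range(n_words)]
                   for t in range(n_frames)]
    phrase_probs, weights = [], []
    for k in range(n_words):
        # For each word, the attention weights over frames sum to 1.
        w = softmax([att_logits[t][k] for t in range(n_frames)])
        weights.append(w)
        phrase_probs.append(sum(w[t] * frame_probs[t][k] for t in range(n_frames)))
    return phrase_probs, frame_probs, weights


def bce(p, y, eps=1e-7):
    # Binary cross-entropy against the weak (phrase-level) label y in {0, 1}.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def word_peaks(frame_probs):
    # Naive strong-label read-out: the most probable frame for each word.
    n_frames, n_words = len(frame_probs), len(frame_probs[0])
    return [max(range(n_frames), key=lambda t: frame_probs[t][k])
            for k in range(n_words)]


# Toy phrase: 4 frames, 2 words; word 0 is sung early, word 1 late.
cla = [[3.0, -3.0], [2.0, -2.0], [-2.0, 2.0], [-3.0, 3.0]]
att = [[2.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
phrase, frames, w = attention_pool(cla, att)
loss = sum(bce(p, y) for p, y in zip(phrase, [1, 1]))  # both words occur in the phrase
print(word_peaks(frames))  # [0, 3]
```

Only the phrase-level loss uses labels, so no word-level annotation is needed during training. The argmax read-out ignores word order; a practical system would likely also enforce a monotonic word sequence when extracting boundaries.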
Another way to view this problem is as a reinforcement learning game. This is close to how humans align lyrics: a person does not need to know the whole lyrics phrase to align a word with a segment of audio.
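One possible way to cast alignment as a sequential game is sketched below, purely as an assumption about how such a formulation might look. An agent scans audio frames left to right and, at each frame, decides whether the current word ends there; the reward penalises the distance to a reference boundary (available only for training data) and any words left unaligned. The class `AlignmentEnv`, the per-frame `end_scores`, and the threshold policy are all hypothetical; the policy deliberately looks only at the current frame, echoing the point that a word can be aligned without the whole phrase.

```python
class AlignmentEnv:
    """Hypothetical toy environment for word-level lyrics alignment.

    The agent scans audio frames left to right while aligning words in order.
    Actions: 0 = the current word continues, 1 = the current word ends at this frame.
    Reward: minus the frame error against a reference end frame (training only),
    plus a penalty for words left unaligned when the audio runs out.
    """

    def __init__(self, n_frames, ref_ends):
        self.n_frames = n_frames
        self.ref_ends = ref_ends
        self.reset()

    def reset(self):
        self.frame, self.word = 0, 0
        return self.frame, self.word

    def step(self, action):
        reward = 0.0
        if action == 1:
            reward -= abs(self.frame - self.ref_ends[self.word])
            self.word += 1
        self.frame += 1
        remaining = len(self.ref_ends) - self.word
        if self.frame == self.n_frames and remaining > 0:
            # Audio exhausted before every word was aligned.
            reward -= self.n_frames * remaining
        done = remaining == 0 or self.frame == self.n_frames
        return (self.frame, self.word), reward, done


def run_episode(env, end_scores, threshold=0.5):
    """Roll out a local threshold policy that only sees the current frame's
    hypothetical word-end score, not the whole phrase."""
    state, done = env.reset(), False
    ends, total = [], 0.0
    while not done:
        frame, _ = state
        action = 1 if end_scores[frame] > threshold else 0
        if action == 1:
            ends.append(frame)
        state, reward, done = env.step(action)
        total += reward
    return ends, total


env = AlignmentEnv(n_frames=6, ref_ends=[1, 4])
print(run_episode(env, [0.1, 0.8, 0.2, 0.3, 0.9, 0.1]))  # ([1, 4], 0.0)
```

In a real system the fixed threshold would presumably be replaced by a policy learned from audio features (for example with a policy-gradient or Q-learning method), and the reward design would be a key research choice.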
C4DM theme affiliation: Music Informatics, Machine Listening