School of Electronic Engineering and Computer Science

Reasoning about Natural Language Data using Random Matrix Theory

Supervisor: Dr Mehrnoosh Sadrzadeh

Research group(s): Theory

Distributional semantics models natural language by computing the meanings of words from how frequently they occur in different contexts. A set of contexts is fixed, and vector representations for words are built in the space spanned by these contexts. These models have found wide application in areas such as named entity extraction, parsing, disambiguation, paraphrasing and summarisation. Their Achilles heel, however, has been their inability to represent the meanings of sentences. Recently, this issue has been resolved by encoding the grammatical structures of sentences into linear maps that act on word meanings. The problem that remains is that the resulting representations can be matrices, cubes and, in principle, tensors of any higher order. Learning these tensors and reasoning about them has been a challenge for the field.

The random matrix theories (RMT) of Dyson and Wigner, and their extensions to higher-order tensors, were developed for exactly this purpose: to analyse the statistical properties of linear objects that go beyond vectors. RMT has found many applications in fundamental physics, notably quantum field theory and string theory, as well as in chaotic systems, complex networks and financial correlations.

This PhD is about applying RMT and its higher-order tensor extensions to natural language data. This interdisciplinary research programme at the interface of computer science and theoretical physics was initiated by Kartsaklis, Ramgoolam and Sadrzadeh. The PhD will be supervised jointly by Ramgoolam and Sadrzadeh.