School of Electronic Engineering and Computer Science

Information Extraction and Computational Linguistics: A Case for Probabilistic Datalog

Supervisor: Dr Thomas Roelleke

Research group(s): Risk & Information Management

Probabilistic Datalog (PDatalog) is a rule-based programming paradigm that provides a high-level data abstraction. PDatalog can be applied to information management tasks such as classification, summarisation, semantic (knowledge-based) retrieval, prediction and recommendation. This project aims to explore how methods and algorithms from information extraction and computational linguistics can be modelled in PDatalog. The syntax and meaning of language can be captured in rules (ontologies), and the semantics of a text can be modelled as a set of facts and rules. The project will investigate how probabilistic reasoning can be applied to extract information and to reason about language, which raises numerous challenges. The main hypothesis is that many knowledge engineers (data analysts) can benefit from a high-level abstraction for modelling the methods used in information extraction and computational linguistics.
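To make the idea of modelling a text as probabilistic facts and rules more concrete, the Python sketch below evaluates two extraction rules over a handful of probabilistic facts. It is not PDatalog syntax and not a PDatalog engine. The predicate names (`proper_noun`, `preceded_by_to`, `in_gazetteer`, `location`), the probabilities and the function `derive` are hypothetical and chosen only for illustration. The sketch assumes independent facts: a conjunction in a rule body multiplies probabilities, and alternative rules that derive the same fact are combined as 1 - prod(1 - p_i).

```python
from math import prod

# Hypothetical probabilistic facts for a sentence such as
# "She moved to London in May."  Key: (predicate, term) -> probability.
facts = {
    ("proper_noun", "London"): 0.9,
    ("proper_noun", "May"): 0.6,
    ("preceded_by_to", "London"): 1.0,
    ("preceded_by_in", "May"): 1.0,
    ("in_gazetteer", "London"): 0.95,
}

# Hypothetical rules for the head predicate "location":
#   location(X) :- proper_noun(X) & preceded_by_to(X).
#   location(X) :- in_gazetteer(X).
rules = {
    "location": [
        ["proper_noun", "preceded_by_to"],
        ["in_gazetteer"],
    ],
}


def derive(head, facts, rules):
    """Return {term: probability} for facts derived for `head`.

    Assumes all facts are independent events:
    - body conjunction: product of the body facts' probabilities;
    - several rules for the same head: 1 - prod(1 - p_i).
    """
    terms = {term for (_, term) in facts}
    result = {}
    for term in sorted(terms):
        # Probability of each rule body being true for this term.
        body_probs = [
            prod(facts.get((pred, term), 0.0) for pred in body)
            for body in rules[head]
        ]
        # Combine alternative derivations as independent events.
        p = 1.0 - prod(1.0 - q for q in body_probs)
        if p > 0.0:
            result[term] = p
    return result


print(derive("location", facts, rules))
```

Here London is supported by both rules, so its probability is higher than either rule gives alone, while May satisfies neither rule body and is not derived. The combination 1 - prod(1 - p_i) is only exact when the derivations share no facts; a full probabilistic Datalog evaluation has to account for shared facts (for example via event expressions), which this sketch does not attempt.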