Research Themes
My research is in the area of Human Centered Machine Learning, using Machine Learning, Computer Vision and Signal Processing methodologies to learn, from multiple sources, concepts that enable Intelligent Systems to understand, communicate and collaborate with humans. It currently revolves around three themes:
- Learning to recognise behaviour, emotions and cognitive states of people by analysing their images, video and neuro-physiological signals
- Learning across modalities, in particular at the intersection of language and vision, using large, pre-trained language and audio-visual models
- Learning from generative models, and learning to control the generation process for privacy, interpretability and controllability
Multimodal Machine Learning (Vision and Language)
This line of work is concerned with learning across modalities, in particular at the intersection of language and vision, by utilising, fine-tuning and adapting large, pre-trained Language and Vision-Language models.
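As a rough illustration of what such pre-trained Vision-Language models offer out of the box, the sketch below scores a single image against a few textual prompts with the publicly available CLIP model; the checkpoint, image path and prompts are placeholders chosen for illustration, not artefacts of this research.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Off-the-shelf vision-language model; checkpoint name is an example only.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical input image
texts = ["a photo of a dog", "a photo of a cat"]

# Encode image and text jointly and compute image-text similarity scores.
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```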
Affective Computing
This line of research is concerned with the recognition of behaviour, emotions and cognitive states of people by analysing their images, video and neuro-physiological signals. In a recent line of work this extends to the analysis of mental health conditions, such as schizophrenia and depression.
Generation and Learning
This line of research is concerned with learning from generative models and with learning to control generation for privacy, interpretability and controllability. This includes learning representations in the latent space of generative models so as to control local changes, controlling image generation with natural language, and controlling generation so as to anonymise datasets, so that they can be used for training machine learning models in a privacy-preserving manner.
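A minimal sketch of the kind of latent-space control mentioned above: given a pre-trained generator and a semantic direction in its latent space, a local edit amounts to shifting the latent code along that direction. The toy generator, latent dimensionality and random direction below are stand-ins; in practice the generator would be a large pre-trained model and the direction would be learned rather than sampled.

```python
import torch

# Stand-in for a pre-trained generator G: latent code z -> image.
# In practice this would be e.g. a StyleGAN-type model.
class ToyGenerator(torch.nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim, 1024), torch.nn.ReLU(),
            torch.nn.Linear(1024, 3 * 64 * 64), torch.nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, 3, 64, 64)

G = ToyGenerator()
z = torch.randn(1, 512)                 # sampled latent code
d = torch.randn(512)                    # assumed semantic direction (learned in practice)
d = d / d.norm()

image = G(z)                            # original sample
edited = G(z + 3.0 * d)                 # locally edited sample along the direction
```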
Learning with few/no/noisy/uncertain/imprecise annotations
This line of research is concerned with learning in the absence of reliable annotations. This includes self-supervised representation learning, unsupervised learning with clustering objectives, and learning with labels of a different granularity than that of the downstream task.
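As a small, generic illustration of training with unreliable labels (a classic bootstrapping heuristic, not the method of the NoiseBox paper referenced below), the sketch mixes the given noisy labels with the model's own predictions; the mixing weight beta is an arbitrary example value.

```python
import torch
import torch.nn.functional as F

def soft_bootstrap_loss(logits, noisy_targets, beta=0.95):
    # Targets are a convex combination of the (possibly noisy) labels and the
    # model's own predictions, reducing the impact of mislabelled samples.
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(noisy_targets, num_classes=logits.size(1)).float()
    targets = beta * one_hot + (1.0 - beta) * probs.detach()
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```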
Key references:
- NoiseBox: Towards More Efficient and Effective Learning with Noisy Labels. IEEE Transactions on Circuits and Systems for Video Technology, 2024.
- Linear Maximum Margin Classifier for Learning from Uncertain Data. IEEE Trans. Pattern Anal. Mach. Intell., 2018.
- Unsupervised convolutional neural networks for motion estimation. In 2016 IEEE International Conference on Image Processing (ICIP 2016), Phoenix, AZ, USA, September 25-28, 2016.
Video Understanding
This line of research is concerned with the analysis of video for retrieval, summarisation and activity/action recognition.
Key references:
- FIVR: Fine-Grained Incident Video Retrieval. IEEE Trans. Multim., 2019.
- TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition. In 30th British Machine Vision Conference (BMVC 2019), Cardiff, UK, September 9-12, 2019.
- ViSiL: Fine-Grained Spatio-Temporal Video Similarity Learning. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), Seoul, Korea (South), October 27 - November 2, 2019.