Research Themes

My research is in the area of Human Centered Machine Learning, using Machine Learning, Computer Vision and Signal Processing methodologies to learn, from multiple sources, concepts that enable Intelligent Systems to understand, communicate and collaborate with humans. Currently it revolves around three themes:

  • Learning to recognise behaviour, emotions and cognitive states of people by analysing their images, video and neuro-physiological signals
  • Learning across modalities, and in particular at the intersections of language and vision, using large, pretrained language and audio-visual models
  • Learning from generative models, and learning to control generation for privacy, interpretability and controllability

Multimodal Machine Learning (Vision and Language)

This line of work is concerned with learning across modalities, and in particular at the intersections of language and vision, utilising, fine-tuning and adapting large, pre-trained Language and Vision-Language models.

Key references:
  1. Get Confused Cautiously: Textual Sequence Memorization Erasure with Selective Entropy Maximization
    Zhaohan Zhang, Ziquan Liu, and Ioannis Patras
    In International Conference on Computational Linguistics (COLING), 2025
    generation-and-learning multimodal-ml
  2. Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer
    Zengqun Zhao, Yu Cao, Shaogang Gong, and 1 more author
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
    affective-computing multimodal-ml
  3. CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition
    Zhonglin Sun, Siyang Song, Ioannis Patras, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), 2025
    multimodal-ml generation-and-learning
  4. CLIPCleaner: Cleaning Noisy Labels with CLIP
    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras
    In ACM International Conference on Multimedia (ACM MM), 2024
    learning-from-few-samples multimodal-ml
  5. A Simple Baseline for Knowledge-Based Visual Question Answering
    Alexandros Xenos, Themos Stafylakis, Ioannis Patras, and 1 more author
    In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
    multimodal-ml
  6. Improving Fairness using Vision-Language Driven Image Augmentation
    Moreno D’Incà, Christos Tzelepis, Ioannis Patras, and 1 more author
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
    generation-and-learning multimodal-ml
  7. EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition
    Niki Maria Foteinopoulou, and Ioannis Patras
    arXiv preprint, 2023
    multimodal-ml
  8. Parts of Speech-Grounded Subspaces in Vision-Language Models
    James Oldfield, Christos Tzelepis, Yannis Panagakis, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2023
    generation-and-learning multimodal-ml
  9. Prompting Visual-Language Models for Dynamic Facial Expression Recognition
    Zengqun Zhao, and Ioannis Patras
    In British Machine Vision Conference (BMVC), 2023
    affective-computing multimodal-ml
  10. ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences
    Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, and 1 more author
    arXiv preprint, 2022
    generation-and-learning multimodal-ml

Affective Computing

This line of research is concerned with recognising the behaviour, emotions and cognitive states of people by analysing their images, video and neuro-physiological signals. A recent line of work extends this to the analysis of mental health conditions, such as schizophrenia and depression.

Key references:
  1. Enhancing Zero-Shot Facial Expression Recognition by LLM Knowledge Transfer
    Zengqun Zhao, Yu Cao, Shaogang Gong, and 1 more author
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025
    affective-computing multimodal-ml
  2. Prompting Visual-Language Models for Dynamic Facial Expression Recognition
    Zengqun Zhao, and Ioannis Patras
    In British Machine Vision Conference (BMVC), 2023
    affective-computing multimodal-ml
  3. AMIGOS: A Dataset for Affect, Personality and Mood Research on Individuals and Groups
    Juan Abdon Miranda Correa, Mojtaba Khomami Abadi, Nicu Sebe, and 1 more author
    IEEE Transactions on Affective Computing, 2021
    affective-computing
  4. SchiNet: Automatic Estimation of Symptoms of Schizophrenia from Facial Behaviour Analysis
    Mina Bishay, Petar Palasek, Stefan Priebe, and 1 more author
    IEEE Transactions on Affective Computing, 2021
    affective-computing
  5. Pairwise Ranking Network for Affect Recognition
    Georgios Zoumpourlis, and Ioannis Patras
    In International Conference on Affective Computing and Intelligent Interaction (ACII), 2021
    affective-computing
  6. DEAP: A Database for Emotion Analysis Using Physiological Signals
    Sander Koelstra, Christian Mühl, Mohammad Soleymani, and 6 more authors
    IEEE Transactions on Affective Computing, 2012
    affective-computing
  7. Learning from Label Relationships in Human Affect
    Niki Maria Foteinopoulou, and Ioannis Patras
    In ACM International Conference on Multimedia (ACM MM), 2022
    affective-computing

Generation and Learning

This line of research is concerned with learning from generative models and with learning to control generation for privacy, interpretability and controllability. This includes learning representations in the latent space of generative models so as to control local changes, controlling image generation with natural language, and controlling generation so as to anonymise datasets so that they can be used for training machine learning models in a privacy-preserving manner.

Key references:
  1. Get Confused Cautiously: Textual Sequence Memorization Erasure with Selective Entropy Maximization
    Zhaohan Zhang, Ziquan Liu, and Ioannis Patras
    In International Conference on Computational Linguistics (COLING), 2025
    generation-and-learning multimodal-ml
  2. CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition
    Zhonglin Sun, Siyang Song, Ioannis Patras, and 1 more author
    In Advances in Neural Information Processing Systems (NeurIPS), 2025
    multimodal-ml generation-and-learning
  3. Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization
    James Oldfield, Markos Georgopoulos, Grigorios G. Chrysos, and 5 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
    generation-and-learning
  4. LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
    Zhonglin Sun, Chen Feng, Ioannis Patras, and 1 more author
    In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024
    generation-and-learning
  5. Self-Supervised Facial Representation Learning with Facial Region Awareness
    Zheng Gao, and Ioannis Patras
    In IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024
    generation-and-learning
  6. Improving Fairness using Vision-Language Driven Image Augmentation
    Moreno D’Incà, Christos Tzelepis, Ioannis Patras, and 1 more author
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
    generation-and-learning multimodal-ml
  7. Parts of Speech-Grounded Subspaces in Vision-Language Models
    James Oldfield, Christos Tzelepis, Yannis Panagakis, and 2 more authors
    In Advances in Neural Information Processing Systems (NeurIPS), 2023
    generation-and-learning multimodal-ml
  8. HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
    Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, and 2 more authors
    In IEEE/CVF International Conference on Computer Vision (ICCV), 2023
    generation-and-learning
  9. Attribute-preserving Face Dataset Anonymization via Latent Code Optimization
    Simone Barattin, Christos Tzelepis, Ioannis Patras, and 1 more author
    In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    generation-and-learning
  10. PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs
    James Oldfield, Christos Tzelepis, Yannis Panagakis, and 2 more authors
    In The Eleventh International Conference on Learning Representations (ICLR), 2023
    generation-and-learning
  11. ContraCLIP: Interpretable GAN generation driven by pairs of contrasting sentences
    Christos Tzelepis, James Oldfield, Georgios Tzimiropoulos, and 1 more author
    arXiv preprint, 2022
    generation-and-learning multimodal-ml

Learning with few/no/noisy/uncertain/imprecise annotations

This line of research is concerned with learning in the absence of reliable annotations. This includes self-supervised representation learning, unsupervised learning with clustering objectives, and learning with labels of a different granularity from that of the downstream task.

Key references:
  1. CLIPCleaner: Cleaning Noisy Labels with CLIP
    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras
    In ACM International Conference on Multimedia (ACM MM), 2024
    learning-from-few-samples multimodal-ml
  2. Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
    Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, and Ioannis Patras
    In European Conference on Computer Vision (ECCV), 2024
    learning-from-few-samples
  3. NoiseBox: Towards More Efficient and Effective Learning with Noisy Labels
    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras
    IEEE Transactions on Circuits and Systems for Video Technology, 2024
    learning-from-few-samples
  4. Self-Supervised Representation Learning with Cross-Context Learning between Global and Hypercolumn Features
    Zheng Gao, Chen Feng, and Ioannis Patras
    In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
    learning-from-few-samples
  5. SimDETR: Simplifying self-supervised pretraining for DETR
    Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, and 2 more authors
    2023
    learning-from-few-samples
  6. MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset
    Chen Feng, and Ioannis Patras
    In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    learning-from-few-samples
  7. DivClust: Controlling Diversity in Deep Clustering
    Ioannis Maniadis Metaxas, Georgios Tzimiropoulos, and Ioannis Patras
    In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    learning-from-few-samples
  8. SSR: An Efficient and Robust Framework for Learning with Unknown Label Noise
    Chen Feng, Georgios Tzimiropoulos, and Ioannis Patras
    In British Machine Vision Conference (BMVC), 2022
    learning-from-few-samples
  9. Adaptive Soft Contrastive Learning
    Chen Feng, and Ioannis Patras
    In International Conference on Pattern Recognition (ICPR), 2022
    learning-from-few-samples
  10. Linear Maximum Margin Classifier for Learning from Uncertain Data
    Christos Tzelepis, Vasileios Mezaris, and Ioannis Patras
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018
    learning-from-few-samples
  11. Unsupervised convolutional neural networks for motion estimation
    Aria Ahmadi, and Ioannis Patras
    In IEEE International Conference on Image Processing (ICIP), 2016
    learning-from-few-samples

Video Understanding

This line of research is concerned with the analysis of video for retrieval, summarisation and activity/action recognition.

Key references:
  1. Video Summarization Using Deep Neural Networks: A Survey
    Evlampios E. Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, and 2 more authors
    Proceedings of the IEEE, 2021
    video-understanding
  2. AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization
    Evlampios E. Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, and 2 more authors
    IEEE Transactions on Circuits and Systems for Video Technology, 2021
    video-understanding
  3. Unsupervised Video Summarization via Attention-Driven Adversarial Learning
    Evlampios E. Apostolidis, Eleni Adamantidou, Alexandros I. Metsai, and 2 more authors
    In International Conference on MultiMedia Modeling (MMM), 2020
    video-understanding
  4. FIVR: Fine-Grained Incident Video Retrieval
    Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, and 1 more author
    IEEE Transactions on Multimedia, 2019
    video-understanding
  5. TARN: Temporal Attentive Relation Network for Few-Shot and Zero-Shot Action Recognition
    Mina Bishay, Georgios Zoumpourlis, and Ioannis Patras
    In British Machine Vision Conference (BMVC), 2019
    video-understanding
  6. ViSiL: Fine-Grained Spatio-Temporal Video Similarity Learning
    Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, and 1 more author
    In IEEE/CVF International Conference on Computer Vision (ICCV), 2019
    video-understanding