Muhammad Uzair Khattak

PhD Candidate, EPFL, Switzerland - MSc from MBZUAI, Abu Dhabi - BSc from SEECS, NUST, Pakistan.


Lausanne, Switzerland

Hi, I am Muhammad Uzair, a PhD candidate at VILAB at EPFL, supervised by Prof. Amir Zamir and PD Dr. Federico Tombari. Previously, I completed my MSc in Computer Vision at the IVAL lab at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), where I was kindly supervised by Dr. Salman Khan and Dr. Fahad Khan. I am also grateful to be co-supervised and mentored by Dr. Muzammal Naseer.

My research focuses on adapting foundational multi-modal models to vision tasks, including image recognition, object detection and video action recognition. The goal is to steer these foundational models toward downstream tasks with limited data (few-/zero-shot settings) while preserving their pre-trained ability to generalize to novel tasks.

Currently, I am focusing on scaling up multi-task foundational models and on complex video reasoning using Large Multi-modal Models (LMMs).

Email / Google Scholar / GitHub / Twitter / CV


Sep 1, 2024 I started my PhD studies at EPFL, Switzerland.
May 9, 2024 We have released CVRR-ES: Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs. More details on the project page.
Feb 22, 2024 Invited talk on multi-modal learning at Amazon Prime Video.
Feb 5, 2024 Invited talk on our recent ProText work at Cohere For AI. (Slides / Recording)
Jan 5, 2024 We have released ProText, a novel framework to adapt Vision-Language models with text-only data. More details on the project page!
Dec 16, 2023 Invited talk on our recent PromptSRC work at Computer Vision Talks.
Dec 15, 2023 Invited talk at WADLA 2023 Deep Learning Workshop. (Recording)
Nov 4, 2023 The paper and code for our NeurIPS'23 work PromptAlign have been released!


Selected publications

* denotes joint first authors


2024

  1. How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
    Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, and Salman Khan
    arXiv preprint arXiv:2405.03690, 2024
  2. Learning to Prompt with Text Only Supervision for Vision-Language Models
    Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer, Luc Van Gool, and Federico Tombari
    arXiv preprint arXiv:2401.02418, 2024


2023

  1. MaPLe: Multi-modal Prompt Learning
    Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, and Fahad Shahbaz Khan
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  2. Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
    Jameel Hassan, Hanan Gani, Noor Hussein, Muhammad Uzair Khattak+, Muzammal Naseer, Fahad Shahbaz Khan, and Salman Khan
    Advances in Neural Information Processing Systems, 2023
  3. Fine-tuned CLIP Models are Efficient Video Learners
    Hanoona Rasheed*, Muhammad Uzair Khattak*, Muhammad Maaz, Salman Khan, and Fahad Shahbaz Khan
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  4. Self-regulating Prompts: Foundational Model Adaptation without Forgetting
    Muhammad Uzair Khattak*, Syed Talal Wasim*, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2023
  5. Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
    Syed Talal Wasim*, Muhammad Uzair Khattak*, Muzammal Naseer, Salman Khan, Mubarak Shah, and Fahad Shahbaz Khan
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2023


2022

  1. Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
    Hanoona Bangalath*, Muhammad Maaz*, Muhammad Uzair Khattak, Salman H Khan, and Fahad Shahbaz Khan
    Advances in Neural Information Processing Systems, Oct 2022
  2. Investigating and Improving Common Loop Closure Failures in Visual SLAM
    Saran Khaliq, Muhammad Latif Anjum, Wajahat Hussain, Muhammad Uzair Khattak, and Momen Rasool
    Autonomous Robots, Oct 2022