Mohamed Afham

I'm a graduate student at the Technical University of Darmstadt - Visual Inference Lab, working with Prof. Stefan Roth under ELLIS. My broader research interest lies in the intersection of Computer Vision and Machine Learning. My graduate study is generously supported by the ELIZA Scholarship from the German Academic Exchange Service (DAAD).

Previously, I spent a wonderful year at Meta AI as an AI Resident, working on long-form video representation learning. I completed my bachelor's degree at the University of Moratuwa, Sri Lanka, where my thesis was on Learning Representations for 3D Point Cloud Processing, advised by Dr. Ranga Rodrigo. During my undergraduate studies, I did a research internship with Prof. Salman Khan at MBZUAI, UAE.

I'm interested in broad areas of Computer Vision and Machine Learning, with a focus on the subdomains of Self-Supervised Learning, 3D Vision, and Learning with Limited Labels (few-shot, zero-shot).

Email  /  CV  /  Google Scholar  /  Twitter  /  LinkedIn  /  Github


[Sep 2023]   Admitted as a graduate student at Technical University of Darmstadt in Germany.
[Jul 2023]   One paper accepted at ICCV 2023 Workshops.
[Oct 2022]   Two papers accepted at ECCV 2022 Workshops.
[Jul 2022]   Joined Meta AI at New York City as an AI Resident.
[Mar 2022]   Serving as a reviewer for ECCV 2022, IROS 2022 and IET-Computer Vision journal.
[Mar 2022]   One paper accepted at CVPR 2022.
[Jan 2022]   One paper accepted at ICASSP 2022.
[Nov 2021]   Serving as a reviewer for CVPR 2022.
[Oct 2021]   One paper accepted at BMVC 2021.
[Oct 2021]   Towards Accurate Cross-Domain In-Bed Human Pose Estimation: preprint available on arXiv.
[Sep 2021]   Our team NFP Undercover finished as 2nd runners-up at the IEEE VIP Cup.
[Jun 2021]   Joined VeracityAI as an Associate Machine Learning Engineer.
[Apr 2021]   Rich Semantics Improve Few-Shot Learning: preprint available on arXiv.
[Nov 2020]   Our team Wanderers won the IEEE SMC hackathon.
[Oct 2020]   Joined MBZUAI as a Research Assistant.


I'm fascinated by the progress the computer vision community has made toward models that see and understand the world as humans do. In particular, I'm intrigued by the results of models trained with self-supervision or in label-constrained settings.

CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Mohamed Afham, Isuru Dissanayake, Dinithi Dissanayake, Amaya Dharmasiri, Kanchana Thilakarathna, Ranga Rodrigo

CVPR 2022
Paper / Code / Project Page
  • Description: Introduced a joint learning objective encapsulating intra-modal correspondence within point cloud modality and cross-modal correspondence between point cloud and 2D image modalities, leveraging contrastive learning.

  • Outcome: Produced state-of-the-art performance in downstream tasks such as 3D object classification, few-shot object classification and 3D object part segmentation, outperforming previous unsupervised learning methods.
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding
Mohamed Afham, Satya Narayan Shukla, Omid Poursaeed, Pengchuan Zhang, Ashish Shah, Sernam Lim

ICCV 2023, Workshop on Resource Efficient Deep Learning for Computer Vision
  • Description: We propose a task-agnostic, unsupervised and scalable approach based on Kernel Temporal Segmentation (KTS) for adaptive sampling and tokenizing long videos.

  • Outcome: Produced competitive performance on several benchmarks for long video modeling, specifically in tasks such as video classification and temporal action localization.
Visual-Semantic Contrastive Alignment for Few-Shot Image Classification
Mohamed Afham, Ranga Rodrigo

ECCV 2022, Workshop on Computer Vision in the Wild
  • Description: Proposed an auxiliary multimodal contrastive learning objective between visual and semantic class prototypes to enhance the visual class-discriminative capability of several few-shot baselines.

  • Outcome: Outperformed standard meta-learning baselines in few-shot learning by simply plugging in the proposed multimodal contrastive learning objective.
Towards Accurate Cross-Domain In-Bed Human Pose Estimation
Mohamed Afham*, Udith Haputhanthri*, Jathurshan Pradeepkumar*, Mithunjha Anandakumar, Ashwin De Silva, Chamira Edussooriya
(* denotes equal contribution)

Paper / Code
  • Description: Proposed a novel learning strategy with two-fold data augmentation and self-supervised knowledge distillation to reduce the domain discrepancy between labeled source domain and unlabeled target domain.

  • Outcome: Improved performance on SLP dataset over two standard pose estimation baselines.
Rich Semantics Improve Few-Shot Learning
Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan

BMVC 2021
Paper / Code / Presentation
  • Description: Proposed a multi-modal architecture for few-shot learning which leverages the class-level descriptions to learn better representations.

  • Outcome: Improved state-of-the-art performance on CUB, VGG-Flowers, and ShapeWorld, with competitive performance on miniImageNet.

Meta AI, New York, USA
AI Resident
Jul 2022 - Jul 2023

VeracityAI, Colombo, Sri Lanka
Associate Machine Learning Engineer
Jun 2021 - Feb 2022

Mohamed Bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
Research Assistant
Oct 2020 - Apr 2021
Advisor: Salman Khan

Technical University of Darmstadt, Germany
Master's + PhD in Computer Science
Oct 2023 - Present

University of Moratuwa, Sri Lanka
Bachelor's degree in Engineering, specializing in Electronics and Telecommunication
Aug 2017 - Jul 2022

I borrowed this website layout from here!