
Hi, this is Nahid. I am an independent researcher with the Cohere Labs community, working on multimodal learning, computer vision, and embodied AI.

I recently created Maya, a multilingual multimodal LLM. Working at the intersection of these areas, I develop models that perceive, reason, and act in the physical world.
My current interests include:

  • Spatial understanding in VLMs
  • Physics-aware world models
  • Multimodal Learning
  • Causal Learning

Publications

  • The Spatial Blindspot of Vision-Language Models. Nahid Alam et al. arXiv preprint.

  • Behind Maya: Building a Multilingual Vision-Language Model.
    Nahid Alam et al. CVPR 2025 Workshop (VLMs4All).
    arXiv · Google Scholar

  • Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA.
    Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Islam.
    CVPR 2025 Workshop (ReGenAI), Oral.
    arXiv · Google Scholar

  • Embedding Geometries of Contrastive Language-Image Pre-Training.
    Jason Chuan-Chih Chou, Nahid Alam. ECCV 2024 Workshop (Beyond Euclidean).
    arXiv · Google Scholar

More at Google Scholar


Recent Projects

  • Maya: Multilingual multimodal foundation model (two CVPR workshop papers)
  • Gemma3n-VLA: Vision-Language-Action model built with Hugging Face LeRobot
  • GR00T-N1 Hackathon: Bimanual robot manipulation with multimodal control

๐ŸŒ Connect