
Hi, this is Nahid. I am an independent researcher with the Cohere Labs community, working on multimodal learning, computer vision, and embodied AI.

I recently created Maya, a multilingual multimodal LLM. Working at the intersection of these areas, I develop models that perceive, reason, and act in the physical world.
My current interests include:

  • Spatial understanding in VLMs
  • Physics-aware world models
  • Multimodal Learning
  • Causal Learning

Publications

  • The Spatial Blindspot of Vision-Language Models. Nahid Alam et al. arXiv preprint.

  • Behind Maya: Building a Multilingual Vision-Language Model.
    Nahid Alam et al. CVPR 2025 Workshop (VLMs4All).
    arXiv · Google Scholar

  • Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA.
    Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Islam.
    CVPR 2025 Workshop (ReGenAI), Oral.
    arXiv · Google Scholar

  • Embedding Geometries of Contrastive Language-Image Pre-Training.
    Jason Chuan-Chih Chou, Nahid Alam. ECCV 2024 Workshop (Beyond Euclidean).
    arXiv · Google Scholar

More at Google Scholar


Recent Projects

  • Maya: Multilingual multimodal foundation model (two CVPR workshop papers)
  • Gemma3n-VLA: Vision-Language-Action model built with Hugging Face LeRobot
  • GR00T-N1 Hackathon: Bimanual robot manipulation with multimodal control

๐ŸŒ Connect