Harsh Agrawal Personal Website

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

A. Szot, B. Mazoure, O. Attia, A. Timofeev, H. Agrawal, D. Hjelm, Z. Gan, Z. Kira, A. Toshev

2025


Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Z. Li, K. You, H. Zhang, D. Feng, H. Agrawal, X. Li, M. P. S. Moorthy, J. Nichols, Y. Yang, Z. Gan

ICLR 2025


Grounding Multimodal Large Language Models in Actions

A. Szot, B. Mazoure, H. Agrawal, D. Hjelm, Z. Kira, A. Toshev

NeurIPS 2024


Large Language Models as Generalizable Policies for Embodied Tasks

A. Szot, M. Schwarzer, H. Agrawal, B. Mazoure, W. T. K. Metcalf, N. Mackraz, D. Hjelm, A. Toshev

ICLR 2024


Simple and Effective Synthesis of Indoor 3D Scenes

J. Y. Koh*, H. Agrawal*, D. Batra, R. Tucker, A. Waters, H. Lee, Y. Yang, J. Baldridge, P. Anderson

AAAI 2023


Housekeep: Tidying Virtual Households using Commonsense Reasoning

Y. Kant, A. Ramachandran, S. Yenamandra, I. Gilitschenski, D. Batra, A. Szot*, H. Agrawal*

ECCV 2022


SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

A. Moudgil, A. Majumdar, H. Agrawal, S. Lee, D. Batra

NeurIPS 2021


Known unknowns: Learning novel concepts using reasoning-by-elimination

H. Agrawal, E. A. Meirom, Y. Atzmon, S. Mannor, G. Chechik

UAI 2021 (Long Talk)


The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

X. Zhao, H. Agrawal, D. Batra, A. Schwing

ICCV 2021


Contrast and Classify: Alternate Training for Robust VQA

Y. Kant, A. Moudgil, D. Batra, D. Parikh, H. Agrawal

ICCV 2021


Spatially Aware Multimodal Transformers for TextVQA

Y. Kant, D. Batra, P. Anderson, A. Schwing, D. Parikh, J. Lu, H. Agrawal

ECCV 2020


Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

J. Aneja*, H. Agrawal*, D. Batra, A. Schwing

ICCV 2019


nocaps: novel object captioning at scale

H. Agrawal*, K. Desai*, Y. Wang, X. Chen, R. Jain, M. Johnson, D. Batra, D. Parikh, S. Lee, P. Anderson

ICCV 2019


Sort Story: Sorting Jumbled Images and Captions into Stories

H. Agrawal*, A. Chandrasekaran*, D. Batra, D. Parikh, M. Bansal

EMNLP 2016


Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

A. Das*, H. Agrawal*, C. L. Zitnick, D. Parikh, D. Batra

Computer Vision and Image Understanding (CVIU) 2017

EMNLP 2016

ICML 2016 Workshop on Visualization for Deep Learning (Best Student Paper)


Object-Proposal Evaluation Protocol is 'Gameable'

N. Chavali*, H. Agrawal*, A. Mahendru*, D. Batra

CVPR 2016 (Spotlight)


EvalAI: Towards Better Evaluation Systems for AI Agents

D. Yadav, R. Jain, H. Agrawal, P. Chattopadhyay, T. Singh, A. Jain, S. B. Singh, S. Lee, D. Batra

AI Systems Workshop (SOSP 2019)


CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

H. Agrawal, C. S. Mathialagan, Y. Goyal, N. Chavali, P. Banik, A. Mohapatra, A. Osman, D. Batra

Book Chapter: Mobile Cloud Visual Media Computing, 265-290


Fabrik: An Online Collaborative Neural Network Editor

U. Garg, V. Prabhu, D. Yadav, R. Ramrakhya, H. Agrawal, D. Batra

AI Systems Workshop (SOSP 2019)