Harsh Agrawal Personal Website

From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons

A. Szot, B. Mazoure, O. Attia, A. Timofeev, H. Agrawal, D. Hjelm, Z. Gan, Z. Kira, A. Toshev

2025

Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

Z. Li, K. You, H. Zhang, D. Feng, H. Agrawal, X. Li, M. P. S. Moorthy, J. Nichols, Y. Yang, Z. Gan

ICLR 2025

Grounding Multimodal Large Language Models in Actions

A. Szot, B. Mazoure, H. Agrawal, D. Hjelm, Z. Kira, A. Toshev

NeurIPS 2024

Large Language Models as Generalizable Policies for Embodied Tasks

A. Szot, M. Schwarzer, H. Agrawal, B. Mazoure, W. T. K. Metcalf, N. Mackraz, D. Hjelm, A. Toshev

ICLR 2024

Simple and Effective Synthesis of Indoor 3D Scenes

J. Y. Koh*, H. Agrawal*, D. Batra, R. Tucker, A. Waters, H. Lee, Y. Yang, J. Baldridge, P. Anderson

AAAI 2023

Housekeep: Tidying Virtual Households using Commonsense Reasoning

Y. Kant, A. Ramachandran, S. Yenamandra, I. Gilitschenski, D. Batra, A. Szot*, H. Agrawal*

ECCV 2022

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

A. Moudgil, A. Majumdar, H. Agrawal, S. Lee, D. Batra

NeurIPS 2021

Known unknowns: Learning novel concepts using reasoning-by-elimination

H. Agrawal, E. A. Meirom, Y. Atzmon, S. Mannor, G. Chechik

UAI 2021 (Long Talk)

The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

X. Zhao, H. Agrawal, D. Batra, A. Schwing

ICCV 2021

Contrast and Classify: Alternate Training for Robust VQA

Y. Kant, A. Moudgil, D. Batra, D. Parikh, H. Agrawal

ICCV 2021

Spatially Aware Multimodal Transformers for TextVQA

Y. Kant, D. Batra, P. Anderson, A. Schwing, D. Parikh, J. Lu, H. Agrawal

ECCV 2020

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

J. Aneja*, H. Agrawal*, D. Batra, A. Schwing

ICCV 2019

nocaps: novel object captioning at scale

H. Agrawal*, K. Desai*, Y. Wang, X. Chen, R. Jain, M. Johnson, D. Batra, D. Parikh, S. Lee, P. Anderson

ICCV 2019

Sort Story: Sorting Jumbled Images and Captions into Stories

H. Agrawal*, A. Chandrasekaran*, D. Batra, D. Parikh, M. Bansal

EMNLP 2016

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

A. Das*, H. Agrawal*, C. L. Zitnick, D. Parikh, D. Batra

Computer Vision and Image Understanding (CVIU) 2017

EMNLP 2016

ICML 2016 Workshop on Visualization for Deep Learning (Best Student Paper)

Object-Proposal Evaluation Protocol is 'Gameable'

N. Chavali*, H. Agrawal*, A. Mahendru*, D. Batra

CVPR 2016 (Spotlight)

EvalAI: Towards Better Evaluation Systems for AI Agents

D. Yadav, R. Jain, H. Agrawal, P. Chattopadhyay, T. Singh, A. Jain, S. B. Singh, S. Lee, D. Batra

AI Systems Workshop (SOSP 2019)

CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

H. Agrawal, C. S. Mathialagan, Y. Goyal, N. Chavali, P. Banik, A. Mohapatra, A. Osman, D. Batra

Book Chapter: Mobile Cloud Visual Media Computing, 265-290

Fabrik: An Online Collaborative Neural Network Editor

U. Garg, V. Prabhu, D. Yadav, R. Ramrakhya, H. Agrawal, D. Batra

AI Systems Workshop (SOSP 2019)