Harsh Agrawal Personal Website
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
A. Szot, B. Mazoure, O. Attia, A. Timofeev, H. Agrawal, D. Hjelm, Z. Gan, Z. Kira, A. Toshev
2025
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Z. Li, K. You, H. Zhang, D. Feng, H. Agrawal, X. Li, M. P. S. Moorthy, J. Nichols, Y. Yang, Z. Gan
ICLR 2025
Grounding Multimodal Large Language Models in Actions
A. Szot, B. Mazoure, H. Agrawal, D. Hjelm, Z. Kira, A. Toshev
NeurIPS 2024
Large Language Models as Generalizable Policies for Embodied Tasks
A. Szot, M. Schwarzer, H. Agrawal, B. Mazoure, W. T. K. Metcalf, N. Mackraz, D. Hjelm, A. Toshev
ICLR 2024
Simple and Effective Synthesis of Indoor 3D Scenes
J. Y. Koh*, H. Agrawal*, D. Batra, R. Tucker, A. Waters, H. Lee, Y. Yang, J. Baldridge, P. Anderson
AAAI 2023
Housekeep: Tidying Virtual Households using Commonsense Reasoning
Y. Kant, A. Ramachandran, S. Yenamandra, I. Gilitschenski, D. Batra, A. Szot*, H. Agrawal*
ECCV 2022
SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation
A. Moudgil, A. Majumdar, H. Agrawal, S. Lee, D. Batra
NeurIPS 2021
Known unknowns: Learning novel concepts using reasoning-by-elimination
H. Agrawal, E. A. Meirom, Y. Atzmon, S. Mannor, G. Chechik
UAI 2021 (Long Talk)
Contrast and Classify: Alternate Training for Robust VQA
Y. Kant, A. Moudgil, D. Batra, D. Parikh, H. Agrawal
ICCV 2021
Spatially Aware Multimodal Transformers for TextVQA
Y. Kant, D. Batra, P. Anderson, A. Schwing, D. Parikh, J. Lu, H. Agrawal
ECCV 2020
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
J. Aneja*, H. Agrawal*, D. Batra, A. Schwing
ICCV 2019
nocaps: novel object captioning at scale
H. Agrawal*, K. Desai*, Y. Wang, X. Chen, R. Jain, M. Johnson, D. Batra, D. Parikh, S. Lee, P. Anderson
ICCV 2019
Sort Story: Sorting Jumbled Images and Captions into Stories
H. Agrawal*, A. Chandrasekaran*, D. Batra, D. Parikh, M. Bansal
EMNLP 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
A. Das*, H. Agrawal*, C. L. Zitnick, D. Parikh, D. Batra
Computer Vision and Image Understanding (CVIU) 2017
EMNLP 2016
ICML 2016 Workshop on Visualization for Deep Learning (Best Student Paper)
Object-Proposal Evaluation Protocol is 'Gameable'
N. Chavali*, H. Agrawal*, A. Mahendru*, D. Batra
CVPR 2016 (Spotlight)
EvalAI: Towards Better Evaluation Systems for AI Agents
D. Yadav, R. Jain, H. Agrawal, P. Chattopadhyay, T. Singh, A. Jain, S. B. Singh, S. Lee, D. Batra
AI Systems Workshop (SOSP 2019)
CloudCV: Large Scale Distributed Computer Vision as a Cloud Service
H. Agrawal, C. S. Mathialagan, Y. Goyal, N. Chavali, P. Banik, A. Mohapatra, A. Osman, D. Batra
Book Chapter: Mobile Cloud Visual Media Computing, 265-290
Fabrik: An Online Collaborative Neural Network Editor
U. Garg, V. Prabhu, D. Yadav, R. Ramrakhya, H. Agrawal, D. Batra
AI Systems Workshop (SOSP 2019)