Vision and Language Group @ MIL
Deep Modular Co-Attention Networks for Visual Question Answering
Python · 458 stars · 89 forks
A lightweight, scalable, and general framework for visual question answering research
Python · 331 stars · 64 forks
A PyTorch reimplementation of bottom-up-attention models
Jupyter Notebook · 301 stars · 76 forks
Implementation of the CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Python · 278 stars · 28 forks
imp
A family of highly capable yet efficient large multimodal models
Python · 193 stars · 15 forks
A VideoQA dataset based on videos from ActivityNet
Python · 91 stars · 10 forks