bfshi - Overview

Baifeng Shi (bfshi)

Scaling Vision Pre-Training to 4K Resolution
Python, 221 stars, 10 forks

When do we not need larger vision models?
Python, 413 stars, 15 forks

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Python, 3.8k stars, 315 forks

Official code for "TOAST: Transfer Learning via Attention Steering"
Python, 188 stars, 10 forks

Official code for "Top-Down Visual Attention from Analysis by Synthesis" (CVPR 2023 highlight)
Jupyter Notebook, 167 stars, 13 forks