I am a member of technical staff at xAI. I obtained my Ph.D. from the University of Wisconsin-Madison in May 2024, under the supervision of Prof. Yong Jae Lee. During my Ph.D., I was fortunate to work with Dr. Chunyuan Li at Microsoft Research. Before that, I obtained my bachelor’s degree (with honors) at Zhejiang University, where I worked with Prof. Xiaogang Jin and Prof. Fei Wu.
I am generally interested in computer vision and machine learning. My recent focus is on building steerable large models. My first project in this direction is LLaVA.
I am a core contributor to Grok-1.5V and Grok-2. I led the vision effort of Grok-3 and Grok-3 Reasoning.
selected publications
-
Blog
LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
Jan 2024
-
Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)
CVPR, 2024
-
Visual Instruction Tuning (LLaVA)
NeurIPS, 2023 (Oral, top 0.5%)
-
Learning Customized Visual Models with Retrieval-Augmented Knowledge
CVPR, 2023 (Highlight, top 2.5%)
-
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR, 2023
-
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li*, Haotian Liu*, Liunian Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, and Jianfeng Gao
NeurIPS, Datasets and Benchmarks Track, 2022
-
Masked Discrimination for Self-Supervised Learning on Point Clouds
ECCV, 2022
-
YolactEdge: Real-time Instance Segmentation on the Edge
ICRA, 2021