
I am a member of technical staff at xAI. I obtained my Ph.D. from the University of Wisconsin-Madison in May 2024, under the supervision of Prof. Yong Jae Lee. During my Ph.D., I was fortunate to work with Dr. Chunyuan Li at Microsoft Research. Before this, I obtained my bachelor's degree (with honors) at Zhejiang University, where I worked with Prof. Xiaogang Jin and Prof. Fei Wu.

I am generally interested in computer vision and machine learning. My recent focus is on building steerable large models, and my first project in this direction is LLaVA.

I am a core contributor to Grok-1.5V and Grok-2. I led the vision effort of Grok-3 and Grok-3 Reasoning.

selected publications

  1. LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

    Blog, Jan 2024

  2. Improved Baselines with Visual Instruction Tuning (LLaVA-1.5)

    CVPR, 2024

  3. Visual Instruction Tuning (LLaVA)

    NeurIPS, 2023 (Oral, top 0.5%)

  4. Learning Customized Visual Models with Retrieval-Augmented Knowledge

    CVPR, 2023 (Highlight, top 2.5%)

  5. GLIGEN: Open-Set Grounded Text-to-Image Generation

    CVPR, 2023

  6. ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

    Chunyuan Li*, Haotian Liu*, Liunian Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, and Jianfeng Gao

    NeurIPS, Datasets and Benchmarks Track, 2022

  7. Masked Discrimination for Self-Supervised Learning on Point Clouds

    ECCV, 2022

  8. YolactEdge: Real-time Instance Segmentation on the Edge

    ICRA, 2021