Yixiao Ge

News:

  • [Jul 2025] I join XPENG Robotics and establish the Multimodal Intelligence Department.
  • [Jul 2025] We release ARC-Hunyuan-Video-7B, a multimodal model for structured comprehension of real-world short videos.
  • [Jun 2025] Four papers are accepted to ICCV 2025, including one oral presentation.
  • [May 2025] Two papers are accepted to ICML 2025.
  • [Feb 2025] Three papers are accepted to CVPR 2025.
  • [Jan 2025] One paper is accepted to NAACL 2025 Findings.
  • [Sep 2024] One paper is accepted to NeurIPS 2024 as a spotlight presentation.
  • [Jul 2024] Two papers are accepted to ECCV 2024, and one paper is accepted to TMLR.
  • [Jul 2024] Excited to release two open-source projects, MLLM-NPU and Open-MAGVIT2.
  • [May 2024] One paper is accepted to the main conference of ACL 2024.
  • [Apr 2024] Excited to release SEED-X, the latest version of the SEED series.
  • [Feb 2024] Nine papers are accepted to CVPR 2024.
  • [Feb 2024] Excited to release YOLO-World, a real-time open-vocabulary object detector.
  • [Jan 2024] One paper is accepted to ICLR 2024.
  • [Jan 2024] Excited to release LLaMA Pro, the SOTA model in the LLaMA family.
  • [Dec 2023] One paper is accepted to AAAI 2024.
  • [Nov 2023] Glad to launch SEED-Bench-2 and ViT-Lens-2!
  • [Oct 2023] Excited to unveil SEED-LLaMA (SEED-2), featuring in-context emergent capabilities.
  • [Sep 2023] Three papers are accepted to NeurIPS 2023.
  • [Aug 2023] Glad to release ViT-Lens, advancing omni-modal representation learning.
  • [Aug 2023] Glad to release SEED-Bench, the most comprehensive MLLM benchmark to date.
  • [Jul 2023] Glad to release SEED, an image tokenizer tailored for LLMs.
  • [Jan-Jul 2023] 11 papers were accepted to ICLR/CVPR/ICML/KDD/ICCV 2023.
  • [Jan-Nov 2022] 11 papers were accepted to ICLR/CVPR/IJCAI/ECCV 2022 and AAAI 2023, including 2 oral presentations.
  • [Mar-Jul 2021] 5 papers were accepted to CVPR/ICCV 2021.
  • [Jan-Sep 2020] 3 papers were accepted to ICLR/ECCV/NeurIPS 2020, including 1 spotlight presentation.

Publications

(* equal contribution, # corresponding author / project lead)

Selected Preprints:

  • ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts

    A powerful multimodal model designed for understanding real-world videos.

    Yuying Ge, Yixiao Ge#, Chen Li, Teng Wang, Junfu Pu, Yizhuo Li, Lu Qiu, Jin Ma, Lisheng Duan, Xinyu Zuo, Jinwen Luo, Weibo Gu, Zexuan Li, Xiaojing Zhang, Yangyu Tao, Han Hu, Di Wang, Ying Shan

  • SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

    The latest version of the SEED series, moving toward multimodal models for real-world applications.

    Yuying Ge*, Sijie Zhao*, Jinguo Zhu*, Yixiao Ge#, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

2025:

  • Scalable Image Tokenization with Index Backpropagation Quantization

    Fengyuan Shi, Zhuoyan Luo, Yixiao Ge#, Yujiu Yang, Ying Shan, Limin Wang#

    ICCV, 2025 [Paper] [Code]

  • Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

    Shijie Ma, Yuying Ge, Teng Wang, Yuxin Guo, Yixiao Ge, Ying Shan

    ICCV, 2025 [Project] [Paper] [Code]

  • Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos

    Yi Chen, Yuying Ge, Weiliang Tang, Yizhuo Li, Yixiao Ge, Mingyu Ding, Ying Shan, Xihui Liu

    ICCV, 2025 (Oral) [Project] [Paper] [Code]

  • AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

    Junhao Cheng, Yuying Ge, Yixiao Ge, Jing Liao, Ying Shan

    ICCV, 2025 [Paper] [Code]

  • HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding

    Rui Yang, Lin Song, Yicheng Xiao, Runhui Huang, Yixiao Ge, Ying Shan, Hengshuang Zhao

    ICML, 2025 [Poster] [Code]

  • LoRA-Gen: Specializing Large Language Model via Online LoRA Generation

    Yicheng Xiao, Lin Song, Rui Yang, Cheng Cheng, Yixiao Ge, Xiu Li, Ying Shan

  • Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

    Yuying Ge, Yizhuo Li, Yixiao Ge, Ying Shan

    CVPR, 2025 [Paper] [Code]

  • ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models

    Xubing Ye, Yukang Gan, Yixiao Ge#, Xiao-Ping Zhang, Yansong Tang#

    CVPR, 2025 [Paper] [Project]

  • VoCo-LLaMA: Towards Vision Compression with Large Language Models

    Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge#, Yansong Tang#

    CVPR, 2025 [Paper] [Project] [Code]

  • Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots

    Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo

    NAACL Findings, 2025 [Paper] [Data] [Code]

2024:

  • GrootVL: Tree Topology is All You Need in State Space Model

    Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

    NeurIPS, 2024 (Spotlight) [Paper] [Code]

  • Vision-Language Instruction Tuning: A Review and Analysis

    Chen Li, Yixiao Ge, Dian Li, Ying Shan

    TMLR, 2024 [Paper] [Code] [Data]

  • ST-LLM: Large Language Models Are Effective Temporal Learners

    Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li

    ECCV, 2024 [Paper] [Code]

  • DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

    Yunpeng Bai, Xintao Wang, Yan-pei Cao, Yixiao Ge, Chun Yuan, Ying Shan

    ECCV, 2024 [Paper] [Code]

  • LLaMA Pro: Progressive LLaMA with Block Expansion

    SOTA foundation models within the LLaMA family, excelling at general tasks, code, and math.

    Chengyue Wu, Yukang Gan, Yixiao Ge#, Zeyu Lu, Jiahao Wang, Ye Feng, Ping Luo, Ying Shan

    ACL, 2024 [Project] [Paper] [Code] [Model]

  • YOLO-World: Real-Time Open-Vocabulary Object Detection

    A real-time open-vocabulary object detector with SOTA performance.

    Tianheng Cheng*, Lin Song*#, Yixiao Ge#, Wenyu Liu, Xinggang Wang#, Ying Shan

    CVPR, 2024 [Project] [Paper] [Code]

  • ViT-Lens: Towards Omni-modal Representations

    Advances omni-modal representation learning with modality lenses. Supports 3D point clouds, depth, audio, tactile, and EEG, and enables any-modality-to-text and any-modality-to-image generation.

    Weixian Lei, Yixiao Ge#, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou#

  • SEED-Bench: Benchmarking Multimodal Large Language Models

    Comprises 24K multiple-choice questions with accurate human annotations, spanning 27 evaluation dimensions that cover both text and image generation.

    Bohao Li*, Yuying Ge*, Yixiao Ge#, Guangzhi Wang, Rui Wang, Ruimao Zhang#, Ying Shan

  • UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

    Xiaohan Ding, Yiyuan Zhang, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan

    CVPR, 2024 [Paper] [Code]

  • BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning

    Ruyang Liu, Chen Li, Yixiao Ge, Ying Shan, Thomas H. Li, Ge Li

    CVPR, 2024 [Paper] [Code]

  • SmartEdit: Exploring Complex Instruction-based Image Editing with Large Language Models

    Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan

    CVPR, 2024 [Paper] [Code]

  • Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

    Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Mike Zheng Shou

  • Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

    Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue

    CVPR, 2024 [Paper] [Code]

  • LoRA-Sparse: Low-Rank Approximation for Sparse Large Language Models

    Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan

  • Making LLaMA SEE and Draw with SEED Tokenizer

    Offers unified multimodal comprehension and generation, featuring emergent multi-turn in-context capabilities, akin to an AI assistant.

    Yuying Ge*, Sijie Zhao*, Ziyun Zeng, Yixiao Ge#, Chen Li, Xintao Wang, Ying Shan

  • Cached Transformers: Improving Transformers with Differentiable Memory Cache

    Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo

2023:

  • GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

    Rui Yang, Lin Song, Yanwei Li, Sijie Zhao, Yixiao Ge, Xiu Li, Ying Shan

    NeurIPS, 2023 [Project] [Paper] [Demo] [Code]

  • Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

    Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou

    NeurIPS, 2023 [Project] [Paper] [Code]

  • Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

    Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, Ying Shan

    NeurIPS, 2023 [Paper] [Code]

  • Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

    Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou

    ICCV, 2023 [Project] [Paper] [Demo] [Code]

  • Exploring Model Transferability through the Lens of Potential Energy

    Xiaotong Li, Zixuan Hu, Yixiao Ge, Ying Shan, Lingyu Duan

    ICCV, 2023 [Paper] [Code]

  • BoxSnake: Polygonal Instance Segmentation with Box Supervision

    Rui Yang, Lin Song, Yixiao Ge, Xiu Li

    ICCV, 2023 [Paper] [Code]

  • Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

    Yuxin Fang*, Shusheng Yang*, Shijie Wang*, Yixiao Ge, Ying Shan, Xinggang Wang

    ICCV, 2023 [Paper] [Code]

  • Binary Embedding-based Retrieval at Tencent

    Yukang Gan*, Yixiao Ge*, Chang Zhou*, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan

    KDD, 2023 [Paper] [Code]

  • π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

    Chengyue Wu, Teng Wang, Yixiao Ge#, Zeyu Lu, Ruisong Zhou, Ying Shan, Ping Luo

    ICML, 2023 [Paper] [Code]

  • Accelerating Vision-Language Pretraining with Free Language Modeling

    Teng Wang, Yixiao Ge, Feng Zheng, Ran Cheng, Ying Shan, Xiaohu Qie, Ping Luo

    CVPR, 2023 [Paper] [Code]

  • Masked Visual Reconstruction in Language Semantic Space

    Shusheng Yang, Yixiao Ge#, Kun Yi, Dian Li, Ying Shan, Xiaohu Qie, Xinggang Wang#

    CVPR, 2023 [Paper] [Code]

  • Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

    Ziyun Zeng*, Yuying Ge*, Xihui Liu, Bin Chen#, Ping Luo, Shu-Tao Xia, Yixiao Ge#

    CVPR, 2023 [Paper] [Code]

  • All in One: Exploring Unified Video-Language Pre-training

    Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou

    CVPR, 2023 [Paper] [Code]

  • Masked Image Modeling with Denoising Contrast

    Kun Yi*, Yixiao Ge*#, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, Xiaohu Qie

    ICLR, 2023 [Paper] [Code]

  • Darwinian Model Upgrades: Model Evolving with Selective Compatibility

    Binjie Zhang*, Shupeng Su*, Yixiao Ge#, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan

  • Video-Text Pre-training with Learned Regions

    Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang

    AAAI, 2023 [Paper] [Code]

2022:

  • MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

    Yuying Ge, Yixiao Ge, Xihui Liu, Jinpeng Wang, Jianping Wu, Ying Shan, Xiaohu Qie, Ping Luo

    ECCV, 2022 [Paper] [Code]

  • Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space

    Wenqi Shao#, Xun Zhao, Yixiao Ge#, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo

    ECCV, 2022 [Paper] [Code]

  • mc-BEiT: Multi-choice Discretization for Image BERT Pre-training

    Xiaotong Li, Yixiao Ge, Kun Yi, Zixuan Hu, Ying Shan, Lingyu Duan

    ECCV, 2022 [Paper] [Code]

  • Towards Universal Backward-Compatible Representation Learning

    Binjie Zhang, Yixiao Ge#, Yantao Shen, Shupeng Su, Fanzi Wu, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan

    IJCAI, 2022 (Long Oral) [Paper] [Code]

  • Bridging Video-text Retrieval with Multiple Choice Questions

    Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, Xiaohu Qie, Ping Luo

    CVPR, 2022 (Oral) [Paper] [Code]

  • Object-aware Video-language Pre-training for Retrieval

    Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou

    CVPR, 2022 [Paper] [Code]

  • Hot-Refresh Model Upgrades with Regression-Alleviating Compatible Training in Image Retrieval

    Binjie Zhang, Yixiao Ge#, Yantao Shen, Yu Li, Chun Yuan#, Xuyuan Xu, Yexin Wang, Ying Shan

    ICLR, 2022 [Paper] [Code]

  • Dynamic Token Normalization Improves Vision Transformer

    Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

    ICLR, 2022 [Paper] [Code]

  • Uncertainty Modeling for Out-of-Distribution Generalization

    Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, Lingyu Duan

    ICLR, 2022 [Paper] [Code]

  • Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID

    Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li

    IEEE TNNLS, 2022 [Project] [Paper]

2021:

  • Progressive Correspondence Pruning by Consensus Learning

    Chen Zhao*, Yixiao Ge*, Feng Zhu, Rui Zhao, Hongsheng Li, Mathieu Salzmann

    ICCV, 2021 [Project] [Paper] [Code]

  • Online Pseudo Label Generation by Hierarchical Cluster Dynamics for Adaptive Person Re-identification

    Yi Zheng, Shixiang Tang, Guolong Teng, Yixiao Ge, Kaijian Liu, Donglian Qi, Jing Qin, Dapeng Chen

  • Refining Pseudo Labels with Clustering Consensus over Generations for Unsupervised Object Re-identification

    Xiao Zhang*, Yixiao Ge*, Yu Qiao, Hongsheng Li

  • DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

    Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

    CVPR, 2021 [Paper] [Code]

  • Mutual CRF-GNN Network for Few-shot Learning

    Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang

2020:

  • Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID

    Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Hongsheng Li

    NeurIPS, 2020 [Project] [Paper] [Code]

  • Self-supervising Fine-grained Region Similarities for Large-scale Image Localization

    Yixiao Ge, Haibo Wang, Feng Zhu, Rui Zhao, Hongsheng Li

    ECCV, 2020 (Spotlight) [Project] [Paper] [Code]

  • Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification

    Yixiao Ge, Dapeng Chen, Hongsheng Li

    ICLR, 2020 [Project] [Paper] [Code]

Before 2020:

  • FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

    Yixiao Ge*, Zhuowan Li*, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li

    NeurIPS, 2018 [Project] [Paper] [Code]