Yuxi Xiao

Yuxi Xiao 肖宇曦 /ˈjuː-ʃee shyao/

I am a third-year Ph.D. student at the State Key Laboratory of CAD&CG, Zhejiang University, advised by Prof. Xiaowei Zhou. Currently, I am also a research intern at ByteDance Seed. Pronounced like “yoo-shee shyao.”

My recent research focuses on three directions:

Developing 3D/4D foundation models for reconstruction and spatial perception (see my SpatialTracker series).
Defining and improving spatial abilities for multimodal large language models with 3D foundation models and principles of cognitive science (see my SpatialTree series).
Creating spatial AI agents capable of perceiving, manipulating, and learning from the real physical world.

I am open to collaborations. Please feel free to contact me if you are interested in my research.

Email / Google Scholar / Twitter / GitHub

ZJU

Zhejiang University

Sep. 2023 - Present

PhD Student

ByteDance Seed

Mar. 2025 - Present

Top Seed Research Intern

Ant Group

Feb. 2023 - Mar. 2025

Research Intern

WHU

Wuhan University

Sep. 2019 - Jun. 2023

Bachelor's Degree

Blog

Notes from projects, prototyping diaries, and thoughts on spatial AI. I drop new entries whenever a project teaches me something worth sharing.

CVPR 2026 · Research

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Exploring how spatial abilities emerge and branch out in multimodal large language models: a taxonomy, a benchmark, and a transfer analysis.

MLLMs · Spatial AI · Foundation Models

Explore the project →

ICCV 2025 · Research

SpatialTrackerV2: 3D Point Tracking Made Easy

Making 3D point tracking practical with foundation priors, handling long videos efficiently, and lessons learned from building a production-ready tracking system.

3D Tracking · Foundation Models · Computer Vision

Explore the project →

Research

(* indicates equal contribution)

SpatialTree: How Spatial Abilities Branch Out in MLLMs
Yuxi Xiao^*, Longfei Li^*, Shen Yan, Xinhang Liu, Sida Peng, Yunchao Wei^†, Xiaowei Zhou^†, Bingyi Kang^†
CVPR, 2026
project page / arXiv / code

SpatialTrackerV2: 3D Point Tracking Made Easy
Yuxi Xiao, Jianyuan Wang, Nan Xue, Nikita Karaev, Iurii Makarov, Bingyi Kang, Xing Zhu, Hujun Bao, Yujun Shen, Xiaowei Zhou^†
ICCV, 2025
project page / arXiv / code / Online Demo

SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao^*, Qianqian Wang^*, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen^†, Xiaowei Zhou^†
CVPR, 2024 (selected as highlight paper)
project page / arXiv / code

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang^*, Qiuyu Wang^*, Yuxi Xiao^*, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen^†, Yujun Shen^†
CVPR, 2024 (selected as highlight paper)
project page / arXiv / code

Volumetric Wireframe Parsing from Neural Attraction Fields
Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu
CVPR, 2024
project page / arXiv / code

Level-S²fM: Structure from Motion on Neural Level Set of Implicit Surfaces
Yuxi Xiao, Nan Xue, Tianfu Wu, Gui-Song Xia
CVPR, 2023
project page / arXiv / code

DeepMLE: A Robust Deep Maximum Likelihood Estimator for Two-view Structure from Motion
Yuxi Xiao, Li Li, Xiaodi Li, Jian Yao
IROS, 2022
project page / arXiv / code