Puhao Li | 李浦豪

I am currently a Ph.D. student in the Department of Automation, Tsinghua University, advised by Prof. Song-Chun Zhu. I am also a research intern in the General Vision Lab at the Beijing Institute for General Artificial Intelligence (BIGAI), where I am grateful to be advised by Dr. Tengyu Liu and Dr. Siyuan Huang. Previously, I obtained my B.Eng. degree from Tsinghua University in 2023.

My research interests lie at the intersection of robotic manipulation and 3D computer vision. My long-term goal is to develop embodied intelligent systems that can interpret human intent, interact naturally with people in diverse environments, and continually learn reusable low-level skills as well as high-level common sense. Currently, I am working on 3D scene understanding and robotic manipulation learning, pushing the boundaries of how robots operate in complex settings.

Email / CV / Google Scholar / Github / Twitter

	Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation Yuyang Li, Yinghan Chen, Zihang Zhao, Puhao Li, Tengyu Liu, Siyuan Huang, Yixin Zhu arXiv 2025 [Paper] [Code] [Data] [Hardware] [Project Page] We introduce TacThru, an STS sensor enabling simultaneous visual perception and robust tactile signal extraction, and TacThru-UMI, an imitation learning framework that leverages these multimodal signals for robotic manipulation.
	Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation Yuyang Li, Wenxin Du, Chang Yu, Puhao Li, Zihang Zhao, Tengyu Liu, Chenfanfu Jiang, Yixin Zhu, Siyuan Huang NeurIPS* 2025 (Spotlight) [Paper] [Code] [Docs] [NVIDIA Tech Blog] We develop Taccel, a high-performance GPU-based simulator, combining ABD and IPC, for simulating robots with vision-based tactile sensors.
	ControlVLA: Few-shot Object-centric Adaptation for Pre-trained VLA models Puhao Li, Yingying Wu, Ziheng Xi, Wanlin Li, Yuzhe Huang, Zhiyuan Zhang, Yinghan Chen, Jianan Wang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang CoRL 2025 [Paper] [Code] [Project Page] We introduce ControlVLA, a few-shot object-centric adaptation method for pre-trained VLA models. By reducing demonstration requirements, ControlVLA lowers the barrier to deploying robots in diverse scenarios.
	GWM: Towards Scalable Gaussian World Models for Robotic Manipulation Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, Siyuan Huang ICCV* 2025 [Paper] [Code] [Project Page] We present Gaussian World Model (GWM), a world model that predicts future dynamics and enables robotic manipulation using 3D Gaussian Splatting.
	Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation Ziyin Xiong, Yinghan Chen, Puhao Li, Yixin Zhu, Tengyu Liu, Siyuan Huang IROS 2025 [Paper] [Code] [Project Page] We propose Ag2x2, a learning framework for bimanual manipulation through coordination-aware visual representations that jointly encode object states and hand motion patterns while maintaining agent-agnosticism.
	ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning Kailin Li, Puhao Li, Tengyu Liu, Yuyang Li, Siyuan Huang CVPR 2025 [Paper] [Code] [Data] [Project Page] We introduce ManipTrans, a novel method for efficiently transferring human skills to dexterous robotic hands in simulation. Leveraging ManipTrans, we contribute DexManipNet, a large-scale dexterous manipulation dataset with diverse tasks.
	MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans Huangyue Yu, Baoxiong Jia, Yixin Chen, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang CVPR* 2025 [Paper] [Code] [Data] [Project Page] We present MetaScenes, a large-scale 3D scene dataset constructed from real-world scans. It features 706 scenes with 15,366 objects across a wide range of types, with realistic layouts, visually accurate appearances and physical plausibility.
	PhysPart: Physically Plausible Part Completion for Interactable Objects Rundong Luo, Haoran Geng, Congyue Deng, Puhao Li, Zan Wang, Baoxiong Jia, Leonidas Guibas, Siyuan Huang ICRA 2025 [Paper] [Project Page] We propose a diffusion-based part generation model that utilizes geometric conditioning through classifier-free guidance and formulates physical constraints as a set of stability and mobility losses to guide the sampling process.
	PhyRecon: Physically Plausible Neural Scene Reconstruction Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bing Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang NeurIPS 2024 [Paper] [Code] [Project Page] We introduce PhyRecon, which enables physically plausible 3D scene reconstruction. PhyRecon features a joint optimization framework incorporating both differentiable rendering and physics-based objectives.
	Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang IROS 2024 (Oral Pitch) [Paper] [Code] [Project Page] We introduce Ag2Manip, which enables various robotic manipulation tasks without any domain-specific demonstrations. Ag2Manip also supports robust imitation learning of manipulation skills in the real world.
	Grasp Multiple Objects with One Hand Yuyang Li, Bo Liu, Yiran Geng, Puhao Li, Yaodong Yang, Yixin Zhu, Tengyu Liu, Siyuan Huang RA-L, presented at IROS 2024 (Oral Presentation) [Paper] [Code] [Data] [Project Page] We introduce MultiGrasp, a two-stage framework for simultaneous multi-object grasping with multi-finger dexterous hands. In addition, we contribute Grasp'Em, a large-scale synthetic multi-object grasping dataset.
	Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang CVPR 2024 (Highlight) [Paper] [Code] [Project Page] We introduce a novel two-stage framework that employs scene affordance as an intermediate representation, effectively linking 3D scene grounding and conditional motion generation.
	An Embodied Generalist Agent in 3D World Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang ICML 2024; ICLR 2024 @ LLMAgents Workshop [Paper] [Code] [Data] [Project Page] We introduce LEO, an embodied multi-modal and multi-task generalist agent that excels in perceiving, grounding, reasoning, planning, and acting in the 3D world.
	Diffusion-based Generation, Optimization, and Planning in 3D Scenes Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu CVPR 2023 [Paper] [Code] [Project Page] [Hugging Face] We introduce SceneDiffuser, a unified conditional generative model for 3D scene understanding. In contrast to prior work, SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented.
	GenDexGrasp: Generalizable Dexterous Grasping Puhao Li, Tengyu Liu, Yuyang Li, Yiran Geng, Yixin Zhu, Yaodong Yang, Siyuan Huang ICRA 2023 [Paper] [Code] [Data] [Project Page] We introduce GenDexGrasp, a versatile dexterous grasping method that can generalize to out-of-domain robotic hands. In addition, we contribute MultiDex, a large-scale synthetic dexterous grasping dataset.
	DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation Ruicheng Wang, Jialiang Zhang, Jiayi Chen, Yinzhen Xu, Puhao Li, Tengyu Liu, He Wang ICRA 2023 (Oral Presentation, Outstanding Manipulation Paper Finalist) [Paper] [Code] [Data] [Project Page] We introduce DexGraspNet, a large-scale dexterous grasping dataset built in simulation. DexGraspNet offers greater physical stability and higher diversity than previous grasping datasets.

	Tsinghua University, China 2023.09 - present Ph.D. Student Advisor: Prof. Song-Chun Zhu
	Beijing Institute for General Artificial Intelligence (BIGAI), China 2021.09 - present Research Intern Advisor: Dr. Tengyu Liu and Dr. Siyuan Huang
	Tsinghua University, China 2019.08 - 2023.06 Undergraduate Student

Feel free to contact me if you have any questions. Thanks for visiting 😊
This page is designed based on Jon Barron's website and deployed on Github Pages.
