Miao Liu - Academic Website
I'm an Assistant Professor at Tsinghua University, College of Artificial Intelligence.
I'm currently leading the MEOW (Modeling Egocentric Omni World) Lab, which is committed to the following research agenda: designing human-centered AI that sees through your eyes, learns your skills, and understands your intentions (构建能“看你所见、学你所会、懂你所想”的下一代人本智能系统: building the next generation of human-centered intelligent systems that see what you see, learn what you do, and understand what you think).
Previously, I was a Research Scientist at Meta GenAI, focusing primarily on egocentric vision and generative AI models. I completed my Ph.D. in Robotics at Georgia Tech, advised by Prof. James Rehg, and worked closely with Prof. Yin Li from the University of Wisconsin–Madison. I was fortunate to collaborate with Prof. Siyu Tang and Prof. Michael Black during my visit to ETH Zurich and the Max Planck Institute. I also enjoyed a wonderful internship at Facebook Reality Labs, where I worked with Dr. Chao Li, Dr. Lingni Ma, Dr. Kiran Somasundaram, and Prof. Kristen Grauman on egocentric action recognition and localization. I am honored to have received several awards, including Best Paper Candidate at CVPR 2022 and ECCV 2024, and the BMVC Best Student Paper Award. As a primary contributor, I helped construct several widely recognized egocentric video datasets, including Ego4D, Ego-Exo4D, EGTEA Gaze+, and the BEHAVIOR Vision Suite. I have also designed multiple models to be deployed in the next generation of smart glasses developed by Meta Reality Labs. During my time at Meta GenAI, I was deeply involved in the training and evaluation of large-scale generative multimodal models, including EMU, Llama 3, and Llama 4 (multimodal components only).
*The background image of Jaime Lannister charging alone at Daenerys and her dragon captures what it often takes to do science: you must be willing to stand as the lonely warrior.
Our research is dedicated to Bridging Minds and Machines: leveraging egocentric vision and generative AI to build AI systems that understand and anticipate human behavior and intentions, and thereby assist people in their daily lives. Our key research directions include:
- Human Skill Transfer: Facilitating skill transfer between humans and from humans to robots through augmented reality, enabling efficient and natural human-AI collaboration.
- Personalized AI Systems: Building generative AI models that continuously evolve based on user interaction history and preferences, capable of understanding context and adapting to individual users.
- AI Agents with Theory of Mind: Developing proactive AI agents that model users’ intentions and cognitive load, leading to more intuitive and seamless human-AI interaction.
My group is always looking for talented students to join us on this journey. For students from Mainland China, please see the note here. For international students, please contact me directly via email.
News
Selected Publications
- Taiying Peng, Jiacheng Hua, Miao Liu†, Feng Lu†. In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting, accepted by Neural Information Processing Systems (NeurIPS) 2025. [arXiv] †: Co-corresponding Author
- Zeyi Huang*, Yuyang Ji*, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund Vanvalkenburgh, Shengxin Zha, Bolin Lai, Licheng Yu, Ning Zhang, Yong Jae Lee†, Miao Liu†. Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs, accepted by Computer Vision and Pattern Recognition Conference (CVPR) 2025 [arXiv]
- Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg, Sangmin Lee, Ning Zhang, Tong Xiao. Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation, accepted by Computer Vision and Pattern Recognition Conference (CVPR) 2025 (Spotlight). [arXiv]
- Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu. LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning, accepted by European Conference on Computer Vision (ECCV) 2024 (Oral, Best Paper Award Candidate 15/8585). [arXiv]
- Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu†, James M. Rehg†. Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation, accepted by European Conference on Computer Vision (ECCV) 2024. [arXiv] †: Co-corresponding Author
- Yunhao Ge*, Yihe Tang*, Jiashu Xu*, Cem Gokmen*, Chengshu Li, Wensi Ai, Benjamin Jose Martinez, Arman Aydin, Mona Anvari, Ayush K Chakravarthy, Hong-Xing Yu, Josiah Wong, Sanjana Srivastava, Sharon Lee, Shengxin Zha, Laurent Itti, Yunzhu Li, Roberto Martín-Martín, Miao Liu, Pengchuan Zhang, Ruohan Zhang, Li Fei-Fei, Jiajun Wu. BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation, accepted by Computer Vision and Pattern Recognition Conference (CVPR) 2024 (Spotlight). [arXiv] *: Equal Contribution
- With Kristen Grauman et al. Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives, accepted by Computer Vision and Pattern Recognition Conference (CVPR) 2024 (Oral). [arXiv]
- Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James Rehg, Vamsi Krishna Ithapu, Ruohan Gao. The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective, accepted by Computer Vision and Pattern Recognition Conference (CVPR) 2024. [arXiv]
- Bolin Lai, Miao Liu†, Fiona Ryan, James M. Rehg. In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond, accepted by International Journal of Computer Vision (IJCV). [arXiv] †: Student Mentor
- Bolin Lai*, Hongxin Zhang*, Miao Liu*, Aryan Pariani*, Fiona Ryan, Wenqi Jia, Shirley Anugrah Hayati, James M. Rehg, Diyi Yang. Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduction Games, accepted by Findings of the Association for Computational Linguistics (ACL) 2023. [arXiv] *: Equal Contribution
- Bolin Lai, Miao Liu†, Fiona Ryan, James M. Rehg. In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation, accepted by British Machine Vision Conference (BMVC) 2022 (Spotlight, Best Student Paper Prize). [arXiv] †: Student Mentor, Co-corresponding Author
- Wenqi Jia*, Miao Liu*, James M. Rehg. Generative Adversarial Network for Future Hand Segmentation from Egocentric Video, accepted by European Conference on Computer Vision (ECCV) 2022. [arXiv] *: Equal Contribution
- Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li. Egocentric Activity Recognition and Localization on a 3D Map, accepted by European Conference on Computer Vision (ECCV) 2022. [arXiv]
- With Kristen Grauman et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video, accepted by Computer Vision and Pattern Recognition Conference (CVPR) 2022 (Oral, Best Paper Finalist, 33/8161). [arXiv] Key driver for the Social Benchmark and Forecasting Benchmark
- Miao Liu, Dexin Yang, Yan Zhang, Zhaopeng Cui, James M. Rehg, and Siyu Tang. 4D Human Body Capture from Egocentric Video via 3D Scene Grounding, accepted by International Conference on 3D Vision (3DV) 2021. [arXiv] [project page]
- Yin Li, Miao Liu, and James M. Rehg. In the Eye of the Beholder: Gaze and Actions in First Person Video, accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2021. [arXiv]
- Miao Liu, Xin Chen, Yun Zhang, Yin Li, and James M. Rehg. Attention Distillation for Learning Video Representations, accepted by British Machine Vision Conference (BMVC) 2020 (Oral, acceptance rate 5.0%). [pdf] [project page]
- Yun Zhang*, Shibo Zhang*, Miao Liu, Elyse Daly, Samuel Battalio, Santosh Kumar, Bonnie Spring, James M. Rehg, Nabil Alshurafa. SyncWISE: Window Induced Shift Estimation for Synchronization of Video and Accelerometry from Wearable Sensors, accepted by Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp) 2020. [pdf] *: Equal Contribution
- Miao Liu, Siyu Tang, Yin Li, and James M. Rehg. Forecasting Human Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Vision, accepted by European Conference on Computer Vision (ECCV) 2020 (Oral, acceptance rate 2.0%). [pdf] [project page]
- Yin Li, Miao Liu, and James M. Rehg. In the Eye of Beholder: Joint Learning of Gaze and Actions in First Person Video, accepted by European Conference on Computer Vision (ECCV) 2018. [pdf]
Teaching
- 2026 Spring: Multimodal Generative AI System Design (80940012)
Students
- Ph.D. Students
Jiacheng Hua, 2025 -
Yichi Zhang, 2025 -
Chi Zhang, 2026 -
Jinzhao Li, 2026 -
Yuhang Wu, 2026 -
Alumni
Contact
- miaoliu@mail.tsinghua.edu.cn; lmaptx4869@gmail.com