Yan Huang's Homepage

Yan Huang received the BSc degree from the University of Electronic Science and Technology of China (UESTC) in 2012 and the PhD degree from the University of Chinese Academy of Sciences (UCAS) in 2017. Since July 2017, he has been with the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), as an associate professor. His research interests include multimodal embodied AI and computer vision. He has received awards including the Young Scientist Award of CSIG, Presidential Special Award of CAS, Excellent Doctoral Thesis of both CAS and CAAI, NVIDIA Pioneering Research Award, Baidu Fellowship, CVPR Workshop Best Paper Award, ICPR Best Student Paper Award, and RACV Best Poster Award. CV

Kehan Chen, Dong An, Yan Huang, Rongtao Xu, Yifei Su, Yonggen Ling, Ian Reid, and Liang Wang, Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), accepted, 2025. PDF

Qisen Ma, Yan Huang, Zikun Liu, Hyunhee Park and Liang Wang, Hierarchical Multimodal Knowledge Matching for Training-Free Open-Vocabulary Object Detection, IEEE Transactions on Image Processing (IEEE TIP), accepted, 2025. PDF

Peiyan Li, Hongtao Wu, Yan Huang, Chilam Cheang, Liang Wang, and Tao Kong, GR-MG: Leveraging Partially-Annotated Data Via Multi-Modal Goal-Conditioned Policy, IEEE Robotics and Automation Letters (IEEE RAL), 10(2): 1912-1919, 2025. PDF

Yan Huang, Yuming Wang, Yunan Zeng, Junshi Huang, Zhenhua Chai, and Liang Wang, Unpaired Image-text Matching via Multimodal Aligned Conceptual Knowledge, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 47(7): 5160-5176, 2025. PDF

Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, and Liang Wang, ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 47(7): 5130-5145, 2025. PDF

Ke Han, Yan Huang, Liang Wang, and Zikun Liu, Self-Supervised Recovery and Guide for Low-Resolution Person Re-Identification, IEEE Transactions on Information Forensics and Security (IEEE TIFS), 19: 6252-6263, 2024. PDF

Kai Niu, Yanyi Liu, Yuzhou Long, Yan Huang, Liang Wang, and Yanning Zhang, An Overview of Text-based Person Search: Recent Advances and Future Directions, IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 34(9): 7803-7819, 2024. PDF

Kai Niu, Linjiang Huang, Yuzhou Long, Yan Huang, Liang Wang, and Yanning Zhang, Comprehensive Attribute Prediction Learning for Person Search by Language, IEEE Transactions on Image Processing (IEEE TIP), 33: 1990-2003, 2024. PDF

Leqi Ding, Lei Liu, Yan Huang, Chenglong Li, Cheng Zhang, Wei Wang, and Liang Wang, Text-to-Image Vehicle Re-Identification: Multi-Scale Multi-View Cross-Modal Alignment Network and a Unified Benchmark, IEEE Transactions on Intelligent Transportation Systems (IEEE TITS), 25(7): 7673-7686, 2024. PDF

Keji He, Ya Jing, Yan Huang, Zhihe Lu, Dong An, and Liang Wang, Memory-Adaptive Vision-and-Language Navigation, Pattern Recognition (PR), accepted, 2024. PDF

Yan Huang, Yuming Wang, and Liang Wang, Efficient Image and Sentence Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 45(3): 2970-2983, 2023. PDF

Chong Liu, Yuqi Zhang, Hongsong Wang, Weihua Chen, Fan Wang, Yan Huang, Yi-Dong Shen, and Liang Wang, Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training, IEEE Transactions on Image Processing (IEEE TIP), 32: 3622-3633, 2023. PDF

Zhengxiong Luo, Yan Huang, Shang Li, Liang Wang, and Tieniu Tan, End-to-End Alternating Optimization for Real-World Blind Super Resolution, International Journal of Computer Vision (IJCV), 131: 3152–3169, 2023. PDF

Yan Huang, Jingdong Wang, and Liang Wang, Few-Shot Image and Sentence Matching via Aligned Cross-Modal Memory, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 44(6): 2968-2983, 2022. PDF

Jianhua Yang, Yan Huang, Kai Niu, Linjiang Huang, Zhanyu Ma, and Liang Wang, Actor and Action Modular Network for Text-based Video Segmentation, IEEE Transactions on Image Processing (IEEE TIP), 31: 4474-4489, 2022. PDF

Wenlong Cheng, Wei Tang, Yan Huang, Yiwen Luo, and Liang Wang, A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval, IEEE Transactions on Multimedia (IEEE TMM), 25: 4067-4080, 2022. PDF

Hongyuan Yu, Houwen Peng, Yan Huang, Hao Du, Jianlong Fu, Liang Wang, and Haibin Ling, Cyclic Differentiable Architecture Search, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 45(1): 211-228, 2022. PDF

Zerui Chen, Yan Huang, Hongyuan Yu, and Liang Wang, Learning a Robust Part-Aware Monocular 3D Human Pose Estimator via Neural Architecture Search, International Journal of Computer Vision (IJCV), 130: 56–75, 2022. PDF

Yuchun Fang, Zhengye Xiao, Wei Zhang, Yan Huang, Liang Wang, Nozha Boujemaa, and Donald Geman, Attribute Prototype Learning for Interactive Face Retrieval, IEEE Transactions on Information Forensics and Security (IEEE TIFS), 16: 2593-2607, 2021. PDF

Chao Fan, Hongyuan Yu, Yan Huang, Caifeng Shan, Liang Wang, and Chenglong Li, SiamON: Siamese Occlusion-aware Network for Visual Tracking, IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 33(1): 186-199, 2021. PDF

Linjiang Huang, Yan Huang, Wanli Ouyang, and Liang Wang, Two-Branch Relational Prototypical Network for Weakly Supervised Temporal Action Localization, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 44(9): 5729-5746, 2022. PDF

Linjiang Huang, Yan Huang, Wanli Ouyang, and Liang Wang, Modeling Sub-Actions for Weakly Supervised Temporal Action Localization, IEEE Transactions on Image Processing (IEEE TIP), 30: 5154-5167, 2021. PDF

Aihua Zheng, Menglan Hu, Bo Jiang, Yan Huang, Yan Yan, and Bin Luo, Adversarial-Metric Learning for Audio-Visual Cross-Modal Matching, IEEE Transactions on Multimedia (IEEE TMM), 24: 338-351, 2021. PDF

Hongyuan Yu, Yan Huang, Lihong Pi, Chengquan Zhang, Xuan Li, and Liang Wang, End-to-end Video Text Detection with Online Tracking, Pattern Recognition (PR), accepted, 2021. PDF

Ke Han, Yan Huang, Chunfeng Song, Liang Wang, and Tieniu Tan, Adaptive Super-Resolution for Person Re-Identification with Low-Resolution Images, Pattern Recognition (PR), accepted, 2021. PDF

Yan Huang, Qi Wu, Wei Wang, and Liang Wang, Image and Sentence Matching via Semantic Concepts and Order Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 42(3): 636-650, 2020. PDF

Kai Niu, Yan Huang, Wanli Ouyang, and Liang Wang, Improving Description-based Person Re-identification by Multi-granularity Image-text Alignments, IEEE Transactions on Image Processing (IEEE TIP), 29: 5542-5556, 2020. PDF

Weining Wang, Yan Huang, and Liang Wang, Long Video Question Answering: A Matching-guided Attention Model, Pattern Recognition (PR), accepted, 2020. PDF

Kai Niu, Yan Huang, and Liang Wang, Re-ranking Image-text Matching by Adaptive Metric Fusion, Pattern Recognition (PR), accepted, 2020. PDF

Chunfeng Song, Yongzhen Huang, Yan Huang, Ning Jia, and Liang Wang, GaitNet: An End-to-end Network for Gait Based Human Identification, Pattern Recognition (PR), accepted, 2019. PDF

Linjiang Huang, Yan Huang, Wanli Ouyang, and Liang Wang, Part-Aligned Pose-Guided Recurrent Network for Action Recognition, Pattern Recognition (PR), 96: 165-176, 2019. PDF

Yan Huang, Wei Wang, and Liang Wang, Video Super-resolution via Bidirectional Recurrent Convolutional Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 40(4): 1015-1028, 2018. PDF

Yan Huang, Wei Wang, Liang Wang, and Tieniu Tan, Conditional High-order Boltzmann Machines for Supervised Relation Learning, IEEE Transactions on Image Processing (IEEE TIP), 26(9): 4297-4310, 2017. PDF

Yan Huang, Wei Wang, and Liang Wang, Unconstrained Multimodal Multi-Label Learning, IEEE Transactions on Multimedia (IEEE TMM), 17(11): 1923-1935, 2015. PDF

Peiyan Li, Yixiang Chen, Hongtao Wu, Xiao Ma, Xiangnan Wu, Yan Huang, Liang Wang, Tao Kong, and Tieniu Tan, BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models, Neural Information Processing Systems (NeurIPS), accepted, 2025. PDF

Yixiang Chen, Peiyan Li, Yan Huang, Jiabing Yang, Kehan Chen, and Liang Wang, EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow, IEEE International Conference on Computer Vision (ICCV), accepted, 2025. PDF

Zhigang Wang, Yifei Su, Chenhui Li, Dong Wang, Yan Huang, Xuelong Li, and Bin Zhao, Open-Vocabulary Octree-Graph for 3D Scene Understanding, IEEE International Conference on Computer Vision (ICCV), accepted, 2025. PDF

Zhuming Wang, Yihao Zheng, Jiarui Li, Yaofei Wu, Yan Huang, Zun Li, Lifang Wu, and Liang Wang, VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition, ACM Conference on Multimedia (MM), accepted, 2025. PDF

Yifei Su, Dong An, Kehan Chen, Weichen Yu, Baiyang Ning, Yonggen Ling, Yan Huang, and Liang Wang, Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation, AAAI Conference on Artificial Intelligence (AAAI), pp. 7060-7068, 2025. PDF

Keji He, Kehan Chen, Jiawang Bai, Yan Huang, Qi Wu, Shu-Tao Xia, and Liang Wang, Everyday Object Meets Vision-and-Language Navigation Agent via Backdoor, Neural Information Processing Systems (NeurIPS), pp. 49684-49705, 2024. PDF

Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Tianzhu Zhang, and Liang Wang, Free Lunch for Gait Recognition: A Novel Relation Descriptor, European Conference on Computer Vision (ECCV), pp. 39–56, 2024. PDF

Yunan Zeng, Yan Huang, Jinjin Zhang, Zequn Jie, Zhenhua Chai, and Liang Wang, Investigating Compositional Challenges in Vision-Language Models for Visual Grounding, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14141-14151, 2024. PDF

Keji He, Chenyang Si, Zhihe Lu, Yan Huang, Liang Wang, and Xinchao Wang, Frequency-Enhanced Data Augmentation for Vision-and-Language Navigation, Neural Information Processing Systems (NeurIPS), pp. 4351-4364, 2023. PDF

Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, and Jing Shao, BEVBert: Multimodal Map Pre-training for Language-guided Navigation, IEEE International Conference on Computer Vision (ICCV), pp. 2737-2748, 2023. PDF

Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, and Liang Wang, Causal Intervention for Sparse-View Gait Recognition, ACM Conference on Multimedia (MM), pp. 77-85, 2023. PDF

Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, and Tieniu Tan, VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10209-10218, 2023. PDF

Ke Han, Shaogang Gong, Yan Huang, Liang Wang, and Tieniu Tan, Clothing-Change Feature Augmentation for Person Re-Identification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 22066-22075, 2023. PDF

Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, and Shuicheng Yan, Bag of Tricks for Training Data Extraction from Language Models, International Conference on Machine Learning (ICML), pp. 40306-40320, 2023. PDF

Yan Huang, Yuming Wang, Yunan Zeng, and Liang Wang, MACK: Multimodal Aligned Conceptual Knowledge for Unpaired Image-text Matching, Neural Information Processing Systems (NeurIPS), pp. 7892-7904, 2022. PDF

Kai Niu, Linjiang Huang, Yan Huang, Peng Wang, Liang Wang, and Yanning Zhang, Cross-modal Co-occurrence Attributes Alignments for Person Search by Language, ACM Conference on Multimedia (MM), pp. 4426–4434, 2022. PDF

Weichen Yu, Hongyuan Yu, Yan Huang, and Liang Wang, Generalized Inter-class Loss for Gait Recognition, ACM Conference on Multimedia (MM), pp. 141–150, 2022. PDF

Hongyuan Yu, Tian Li, Weichen Yu, Jianguo Li, Yan Huang, Liang Wang, and Alex Liu, Regularized Graph Structure Learning with Semantic Knowledge for Multi-variates Time-Series Forecasting, International Joint Conference on Artificial Intelligence (IJCAI), pp. 2362-2368, 2022. PDF

Zhengxiong Luo, Yan Huang*, Shang Li, Liang Wang, and Tieniu Tan, Learning the Degradation Distribution for Blind Image Super-Resolution, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-10, 2022. PDF

Ke Han, Chenyang Si, Yan Huang*, Liang Wang, and Tieniu Tan, Generalizable Person Re-Identification via Self-Supervised Batch Norm Test-Time Adaption, AAAI Conference on Artificial Intelligence (AAAI), pp. 817-825, 2022. PDF

Keji He, Yan Huang, Qi Wu, Jianhua Yang, Dong An, Shuanglin Sima, and Liang Wang, Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision, Neural Information Processing Systems (NeurIPS), pp. 652-663, 2021. PDF

Dong An, Yuankai Qi, Yan Huang*, Qi Wu, Liang Wang, and Tieniu Tan, Neighbor-view Enhanced Model for Vision and Language Navigation, ACM Conference on Multimedia (MM), pp. 5101-5109, 2021. (Oral) PDF

Zhengxiong Luo, Zhicheng Wang, Yan Huang, Shang Li, Liang Wang, Tieniu Tan, and Erjin Zhou, Rethinking the Heatmap Regression for Bottom-Up Human Pose Estimation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13264-13273, 2021. PDF

Zhengxiong Luo, Yan Huang*, Shang Li, Liang Wang, and Tieniu Tan, Unfolding the Alternating Optimization for Blind Super Resolution, Neural Information Processing Systems (NeurIPS), pp. 5632-5643, 2020. PDF

Kai Niu, Yan Huang, and Liang Wang, Textual Dependency Embedding for Person Search by Language, ACM Conference on Multimedia (MM), pp. 4032–4040, 2020. PDF

Zerui Chen, Yan Huang, Hongyuan Yu, Bin Xue, Ke Han, Yiru Guo, and Liang Wang, Towards Part-aware Monocular 3D Human Pose Estimation: An Architecture Search Approach, European Conference on Computer Vision (ECCV), pp. 715–732, 2020. (Spotlight) PDF

Ke Han, Yan Huang, Zerui Chen, Liang Wang, and Tieniu Tan, Prediction, Recovery and Identification: Adaptive Low-Resolution Person Re-Identification, European Conference on Computer Vision (ECCV), pp. 193–209, 2020. PDF

Linjiang Huang, Yan Huang, Wanli Ouyang, and Liang Wang, Relational Prototypical Network for Weakly Supervised Temporal Action Localization, AAAI Conference on Artificial Intelligence (AAAI), pp. 11053-11060, 2020. (Oral) PDF

Linjiang Huang, Yan Huang, Wanli Ouyang, and Liang Wang, Part-Level Graph Convolutional Network for Skeleton-Based Action Recognition, AAAI Conference on Artificial Intelligence (AAAI), pp. 11045-11052, 2020. (Oral) PDF

Yan Huang and Liang Wang, ACMM: Aligned Cross-Modal Memory For Few-Shot Image and Sentence Matching, IEEE International Conference on Computer Vision (ICCV), pp. 5774-5783, 2019. PDF

Yan Huang, Yang Long, and Liang Wang, Few-Shot Image and Sentence Matching via Gated Visual-Semantic Embedding, AAAI Conference on Artificial Intelligence (AAAI), pp. 8489-8496, 2019. (Spotlight) PDF

Weining Wang, Yan Huang, and Liang Wang, Language-driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 334-343, 2019. (Oral) PDF

Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang, Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3136-3145, 2019. PDF

Kai Niu, Yan Huang, and Liang Wang, Fusing Two Directions in Cross-domain Adaption for Real Life Person Search by Language, IEEE International Conference on Computer Vision Workshop (ICCVW), 2019. (Oral) PDF

Yan Huang, Qi Wu, Chunfeng Song, and Liang Wang, Learning Semantic Concepts and Order for Image and Sentence Matching, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6163-6171, 2018. (Spotlight) PDF

Chunfeng Song, Yan Huang, Wanli Ouyang, and Liang Wang, Mask-Guided Contrastive Attention Model for Person Re-Identification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1179-1188, 2018. PDF

Junbo Wang, Wei Wang, Yan Huang, Liang Wang, and Tieniu Tan, Multimodal Memory Modelling for Video Captioning, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7512-7520, 2018. (Spotlight) PDF

Junbo Wang, Wei Wang, Yan Huang, Liang Wang, and Tieniu Tan, Hierarchical Memory Modelling for Video Captioning, ACM Conference on Multimedia (MM), pp. 63-71, 2018. PDF

Chenglong Li, Chengli Zhu, Yan Huang, Jin Tang, and Liang Wang, Cross-Modal Ranking with Soft Consistency and Noisy Labels for Robust RGB-T Tracking, European Conference on Computer Vision (ECCV), pp. 831-847, 2018. PDF

Yan Huang, Wei Wang, and Liang Wang, Instance-aware Image and Sentence Matching with Selective Multimodal LSTM, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2310-2318, 2017. PDF

Zhen Zhou, Yan Huang, Wei Wang, Liang Wang, and Tieniu Tan, See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-based Person Re-identification, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6776-6785, 2017. PDF

Yan Huang, Wei Wang, and Liang Wang, Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution, Neural Information Processing Systems (NeurIPS), pp. 235-243, 2015. PDF

Yan Huang, Wei Wang, and Liang Wang, Conditional High-order Boltzmann Machine: A Supervised Learning Model for Relation Learning, IEEE International Conference on Computer Vision (ICCV), pp. 4265-4273, 2015. PDF

Peihao Huang, Yan Huang, Wei Wang, and Liang Wang, Deep Embedding Network for Clustering, International Conference on Pattern Recognition (ICPR), pp. 1532-1537, 2014. (Best Student Paper Award) PDF

Wei Wang, Yan Huang, Yizhou Wang, and Liang Wang, Generalized Autoencoder: A Neural Network Framework for Dimensionality Reduction, IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pp. 490-497, 2014. (Best Paper Award) PDF