Huck Yang - NVIDIA Research
About
I focus on speech-language alignment and scaling laws. Before joining NVIDIA, I worked full-time at Amazon (ASR-LM) with Andreas Stolcke on Ivan Bulyko's team, and as a Research Scientist intern with the Google Speech & Brain teams (now DeepMind), co-hosted by Bo Li and Yu Zhang on Tara N. Sainath's team.
My Ph.D. topic was noise-robust voice model adaptation (now known as post-training), advised by Prof. Chin-Hui Lee.
Before starting my Ph.D., I visited Prof. Jesper Tegnér's group, working on self-evolutionary algorithms, and interned at TSMC in mixed-signal IC design.
- Acoustic Prompting / Efficient Post-Training: I introduced the first prompt-adaptation method (i.e., trainable inputs plus label mappings) for frozen acoustic models [ICML 21], concurrent with prefix-tuning (ACL 21). Best Paper nominee for a multilingual study [Interspeech 23]; a Google-affiliated patent [ICASSP 23].
- Text Hypotheses Correction Modeling: I explored the first series of n-best-hypotheses-based generative error correction (GER) for ASR and translation pre-training and post-training [ASRU 23], and co-invented Whispering-LLaMA [EMNLP 23], HyPoradise [NeurIPS 23], and speech post-training for LLMs [ICLR 24]. Best Industry Paper Honorable Mention award for multimodal n-best correction [ACL 25].
Fun fact: I also work on Quantum ML part-time for fun. I created the first variational-circuit-based models for speech [ICASSP 21] and language understanding [ICASSP 22], and received the Xanadu AI Quantum ML Award in 2019; more recently, quantum parameter adaptation for LLMs [ICLR 25].
Jan 25, 2025
Six ICLR 25 papers and one EMNLP 25 tutorial accepted.
Oct 2, 2024
Three EMNLP 24 papers and one NeurIPS 24 paper accepted.
May 2, 2024
One ACL 24 paper (oral) and one US patent accepted.
Selected Publications
DCASE 2025
Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning
Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, et al.
SLT 2024
LLM Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Chao-Han Huck Yang, Taejin Park, Yuan Gong, Yuanchao Li, Zhehuai Chen, Yen-Ting Lin, Chen Chen, Yuchen Hu, Kunal Dhawan, et al.
ICLR 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Engsiong Chng, Chao-Han Huck Yang
ASRU 2023
Generative Speech Recognition Error Correction with Large Language Models and Task-activating Prompting
Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
AAAI 2022
Training a Resilient Q-Network against Observational Interference
Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen
ICML 2021
Voice2series: Reprogramming Acoustic Models for Time Series Classification
Chao-Han Huck Yang, Yun-Yun Tsai, Pin-Yu Chen
Research Areas
Speech-Language Alignment
Exploring semantic and non-semantic alignment for LLMs.
LLM ASR · Translation · Cross-Modal
Test-Time Scaling and Reasoning
Developing sample-efficient and cross-modal inference.
Scaling Laws · Reward Modeling · Decoding
Robust Evaluation and Causality
Building robust evaluation frameworks and intervention-resilient architectures.
Causal Inference · Robustness · Privacy
Tutorials
EMNLP 2025
Spoken Conversational Agents with Large Language Models
A comprehensive tutorial on integrating LLMs with speech recognition systems, covering task-activating prompting and cross-modal alignment techniques.
Interspeech 2025
Efficient Adaptation in Speech Language Modeling
Introduction to parameter-efficient adaptation methods for speech models, including prompt-tuning and in-context learning approaches.
Interspeech 2023
Cross-Modal Alignment for Voice Foundational Models
Overview of robust speech recognition techniques using large language models, focusing on noise-resilient architectures.