Dong Dai — Associate Professor, University of Delaware

Associate Professor
Department of Computer & Information Sciences
University of Delaware
I am an Associate Professor in the Department of Computer and Information Sciences at the University of Delaware, where I lead the Data Intelligence Research Lab (DIRLab). My research focuses on data-intensive and high-performance systems, spanning parallel file systems, metadata management, graph storage, resource scheduling, and machine learning for systems. Previously, I was an Assistant Professor at UNC Charlotte and held postdoctoral positions at Texas Tech University and Argonne National Laboratory. I received my Ph.D. in Computer Science from the University of Science and Technology of China (USTC).
Publications
* Ph.D. student mentored; † Master's/undergraduate student mentored
- [IPDPS'26] QoSFlow: Ensuring QoS of Distributed Workflows Using Interpretable Sensitivity Models. [arXiv]
- [IPDPS'26] CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems. Md. Hasanur Rashid*, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai. [arXiv]
- [npj Computational Materials] AI-assisted Rapid Crystal Structure Generation towards a Target Local Environment. O. G. Ridwan, S. Pitié, M. Soundar Raj, Dong Dai, G. Frapper, H. Xue, Q. Zhu. (IF=11.09)
- [SC'25] STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for HPC Parallel File Systems. Chris Egersdoerfer*, Philip Carns, Shane Snyder, Robert Ross, Dong Dai. [arXiv]
- [SC'25] Improving SpGEMM Performance Through Matrix-Reordering and Cluster-wise Computation. Abdullah Al Raqibul Islam*, Helen Xu, Dong Dai, Aydin Buluç. [arXiv]
- [HPDC'25] TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System. Zheng Wei, Jing Xing, Yida Gu, Wenjing Huang, Dong Dai, Guangming Tan, Dingwen Tao.
- [CUG'25] Towards Empirical Roofline Modeling of Distributed Data Services: Mapping the Boundaries of RPC Throughput. Philip Carns, Matthieu Dorier, Rob Latham, Shane Snyder, A. Gueroudji, S. Ockerman, J. Soumagne, Dong Dai, Robert Ross.
- [CCGrid'25] DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System. Md Hasanur Rashid*, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai. [arXiv]
- [IPDPS'25] IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs. Chris Egersdoerfer*, Arnav Sareen†, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai. [arXiv]
- [IPDPS'25] Be Aware of Metadata Corruption in Parallel File Systems: It Can Be Silent and Catastrophic. Saisha Kamat*, Mai Zheng, Bo Fang, Dong Dai. [PDF]
- [IPDPS'25] AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage. Md. Hasanur Rashid*, Dong Dai. [arXiv]
- [PDSW'25] LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models. Minqiu Sun*, Xin Huang, Luanzheng Guo, Nathan R. Tallent, Kento Sato, Dong Dai. [arXiv]
- [PDSW'25] RL4Sys: A Lightweight System-driven RL Framework for Drop-in Integration in System Optimization. Jiaxin Dong*, Md. Hasanur Rashid*, Helen Xu, Dong Dai.
- [IEEE TC'24] Hardware Accelerated Vision Transformer via Heterogeneous Architecture Design and Adaptive Dataflow Mapping. Y. Gao, T. Wang, L. Gong, C. Wang, Dong Dai, Y. Yang, X. Chen, X. Li, X. Zhou.
- [BigData'24] QualityNet: Error-bounded Lossy Compression Quality Prediction via Deep Surrogate. Khondoker Mirazul Mumenin*, Dong Dai, Jinzhen Wang, Sheng Di. (acceptance rate: 18.8%) [PDF]
- [PDSW'24] Understanding and Predicting Cross-Application I/O Interference in HPC Storage Systems. Chris Egersdoerfer*, Md. Hasanur Rashid*, Dong Dai, Bo Fang, Nathan Tallent. [PDF] [GitHub]
- [HotStorage'24] ION: Navigating HPC I/O Optimization Journey using Large Language Models. Chris Egersdoerfer†, Arnav Sareen†, Jean Luca Bez, Suren Byna, Dong Dai. [PDF] [GitHub] [Talk]
- [JSSPP'24] An Empirical Study of Machine Learning-based Synthetic Job Trace Generation Methods. Monish Soundar Raj†, Thomas MacDougall†, Di Zhang*, Dong Dai. [PDF] [GitHub] [Talk]
- [IPDPS'24] Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters. Di Zhang*, Monish Soundar Raj†, Bing Xie, Sheng Di, Dong Dai. (acceptance rate: 24%) [PDF] [GitHub] [Talk]
Teaching
University of Delaware
- F 2025 CISC 361 Operating Systems
- S 2025 CISC 360 Computer Architecture
- F 2024 CISC 361 Operating Systems
UNC Charlotte
- S 2023 ITCS-6050/8050 ML for Efficient Computing Systems
- S 2023 Undergraduate Research Initiative
- F 2023 Undergraduate Research Initiative
- 2019–2022 ITCS-5145 Parallel Computing
- 2020–2022 ITSC-3181 Intro to Computer Architecture
- 2018–2019 ITCS-6144/8144 Operating Systems Design
Doctoral Students
Current
- Md Hasanur Rashid — 2024–present, passed proposal defense
- Chris Egersdoerfer — 2024–present, passed preliminary exam
- Jiaxin Dong — 2024–present
- Minqiu Sun — 2024–present
- Yuan Liang — 2025–present
Graduated
- Abdullah Al Raqibul Islam — Ph.D. 2019–2025 → Research Associate @ OSU
- Di Zhang — Ph.D. 2019–2024 → Research Scientist @ Meta
Research Projects
- [Active] Moving Machine Learning into the Next-Generation Cloud Flexibly, Agilely and Efficiently
- [Active] Hybrid NVM based Computing Architecture for Machine Learning Applications
- [Active] Parallel Graph-Based Paradigm for HPC Parallel File System Checkers
- [Active] Empowering Data-driven Discovery with Provenance Infrastructure
- [Past] Partitioning Large Graphs in Deep Storage Architecture
- [Past] Tuning Extreme-scale Storage Stack through Deep Reinforcement Learning
- [Past] Uncovering Vulnerabilities in Parallel File Systems