ColinLeeo - Overview

Hi there 👋

🧑‍💻 About me

I am a graduate student at the School of Software, Tsinghua University, with a background in information and communication engineering from Beijing University of Posts and Telecommunications.

I'm interested in database systems, with a focus on storage, query optimization, and scheduling, and I'm actively involved in the evolution of open-source databases and file formats. My current work focuses on optimizing file formats and database systems for emerging hardware and AI workloads.

🚀 Current Projects and Explorations

  • Apache TsFile -- Exploring optimizations of the time-series file format for embedded and analytical workloads, with a focus on improving performance under widespread SIMD adoption and AI training scenarios.
  • MiniGU -- Contributing to the development of a Rust-based embedded graph database with GQL support, currently aimed at educational and research use in universities, and exploring native graph-vector integration for future RAG systems.
  • [SafeBound-for-Update] -- Investigating a pessimistic cardinality estimation method for graph path matching that better accommodates update operations while providing a tighter upper bound than the original SafeBound.
  • [LoadaWise-ORCA-for-NewHardwareDB] -- Designing load-aware query optimization and scheduling methods in the ORCA optimizer framework, targeting multi-model databases on emerging hardware platforms.

📜 Past Projects:

  • Apache IoTDB -- Developed the authorization and user management system, designing and implementing permission mechanisms for both the tree model and the table model.
  • TugraphDB -- Designed a metadata encoding scheme to resolve update anomalies caused by the original encoding method, enabling metadata modifications with O(1) complexity.
  • Apache HAWQ (commercial version a.k.a. OushuDB) -- Implemented resource and cluster virtualization to support multi-tenancy and affinity-based query scheduling.
  • Benchmark for Multi-Language TsFile -- A configurable performance benchmarking tool for TsFile across multiple programming languages. It supports scheduled execution and continuous monitoring, with results periodically reported to the TsFile community via GitHub issues.
  • SIMD_TS2DIFF -- A vectorized decoding approach for the core encoding/decoding method in TsFile, incorporating block-level filtering with bit-packing. This design improves time-range query performance by up to two orders of magnitude.

📰 News:

  • Gave a talk on data model conversion in IoTDB at CCF BigData 2025.
  • Released TsFile (C, C++, Python) V2.1.0 as Release Manager.
  • Honored to have become a committer of Apache TsFile.
  • Recognized as Tugraph Core Contributor of the Year by the community.
  • Honored to have become a committer of Apache IoTDB.