OpenDCAI

👋 Welcome

✨We are dedicated to advancing research and open-source tools in Data-Centric Artificial Intelligence (DCAI).✨

🚀Our goal is to develop effective and efficient DCAI systems and algorithms that support and enhance the performance of AI models and applications.

🤝 Community

Pinned

  1. DataFlow: Easy data preparation with the latest LLM-based operators and pipelines.

    Python · 3.3k stars · 274 forks

  2. Forked from OriginHubAI/MyScaleDB

    AI Database for unified, scalable SQL + vector data management, search and analytics

    C++ · 41 stars · 1 fork

  3. DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python · 260 stars · 27 forks

  4. Paper2Any: Turn a paper, text, or topic into editable research figures, technical route diagrams, and presentation slides.

    Python · 2.2k stars · 150 forks

  5. AgentFlow: The first unified agent data synthesis framework for custom agentic tasks, with an all-in-one environment.

    Python · 89 stars · 6 forks

  6. OpenWorldLib: Unified codebase for advanced world models.

    Python · 679 stars · 34 forks

Repositories

Showing 10 of 33 repositories

  • Mycel Public

    Connect people, agents, and teams for the next era of human-AI collaboration.

    Python · MIT · 104 stars · 14 forks · 4 issues · 3 pull requests

    Updated Apr 19, 2026

  • OpenWorldLib Public

    Unified Codebase for Advanced World Models.

    Python · Apache-2.0 · 679 stars · 34 forks · 3 issues · 1 pull request

    Updated Apr 18, 2026

  • DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python · Apache-2.0 · 260 stars · 27 forks · 1 issue · 0 pull requests

    Updated Apr 17, 2026

  • AgentFlow Public

    The first unified agent data synthesis framework for custom agentic tasks, with an all-in-one environment.

    Python · 89 stars · 6 forks · 0 issues · 1 pull request

    Updated Apr 16, 2026

  • Open-NotebookLM

    Python · Apache-2.0 · 58 stars · 15 forks · 3 issues · 5 pull requests

    Updated Apr 16, 2026

  • DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python · Apache-2.0 · 3,273 stars · 274 forks · 8 issues · 0 pull requests

    Updated Apr 15, 2026

  • Paper2Any Public

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python · Apache-2.0 · 2,165 stars · 150 forks · 8 issues · 4 pull requests

    Updated Apr 15, 2026

  • DataFlow-VQA

    Python · 9 stars · 3 forks · 0 issues · 0 pull requests

    Updated Apr 15, 2026

  • DataFlow-WebUI

    Python · 19 stars · 14 forks · 0 issues · 0 pull requests

    Updated Apr 15, 2026

  • Flash-MinerU Public

    Ray-powered accelerator for MinerU, turning PDF → Markdown conversion into scalable, cluster-ready data infrastructure.

    Python · AGPL-3.0 · 44 stars · 6 forks · 2 issues · 0 pull requests

    Updated Apr 14, 2026