Here are 5,695 public repositories matching this topic...

Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Updated Jan 13, 2026
  • Python

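An open-source educational chat model from ICALK, East China Normal University. An open-source Chinese-English educational dialogue large model (general-purpose base model, GPU deployment, data cleaning). With thanks to: LLaMA, MOSS, BELLE, Ziya, vLLM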

  • Updated Jul 18, 2025
  • Jupyter Notebook

A comprehensive guide to building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

  • Updated Apr 23, 2025
  • TSQL

Continuously updated paper list on advancements in Data Agents. Companion repo to our paper "A Survey of Data Agents: Emerging Paradigm or Overstated Hype?"

  • Updated Apr 1, 2026
  • Python
desbordante-core

Desbordante is a high-performance data profiler that can discover many different patterns in data using various algorithms. It also lets users run data cleaning scenarios built on these algorithms. Desbordante has a console version and an easy-to-use web application.

  • Updated Apr 1, 2026
  • C++