Kedro
Kedro is a toolbox for production-ready data pipelines.
Why Kedro?
Clean Code
Kedro is the foundation for clean data engineering and data science code. It borrows concepts from software engineering and applies them to building data pipelines.
Handles Complexity
A Kedro project provides scaffolding for complex data and machine-learning pipelines. You spend less time on tedious "plumbing" and focus instead on solving new problems.
Standardisation
Kedro standardises how data engineering and data science code is written, so teams can collaborate on problems more easily.
Production-Ready
Move seamlessly from development to production: start with exploratory code and turn it into reproducible, maintainable, and modular experiments.
Kedro's Key Concepts Explained →
Features
Pipeline Visualisation
Kedro-Viz is a blueprint of your data and machine-learning workflows. It provides data lineage, surfaces detailed pipeline execution information (such as execution time, node status, and dataset statistics), and makes it easier to collaborate with business stakeholders.
Data Catalog
The Data Catalog is a series of lightweight data connectors for saving and loading data across many different file formats and file systems. It supports S3, GCP, Azure, SFTP, DBFS, and local filesystems. Supported formats and libraries include Pandas, Spark, Dask, NetworkX, Pickle, Plotly, Matplotlib, and many more. The Data Catalog also includes data and model snapshots for file-based systems.
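To make the connector idea concrete, here is a minimal sketch in plain Python. It is not Kedro's API: `Connector`, `SimpleCatalog`, the helper functions, and the dataset names are all hypothetical, and it uses only the standard library. It shows the general pattern of a registry that maps dataset names to save and load functions, so that calling code asks for data by name and never handles file formats directly.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, NamedTuple


class Connector(NamedTuple):
    """Hypothetical connector: a file path plus functions to save and load it."""

    path: Path
    save: Callable[[Path, Any], None]
    load: Callable[[Path], Any]


def save_json(path: Path, data: Any) -> None:
    path.write_text(json.dumps(data))


def load_json(path: Path) -> Any:
    return json.loads(path.read_text())


def save_pickle(path: Path, data: Any) -> None:
    path.write_bytes(pickle.dumps(data))


def load_pickle(path: Path) -> Any:
    return pickle.loads(path.read_bytes())


class SimpleCatalog:
    """Toy registry mapping dataset names to connectors (not Kedro's class)."""

    def __init__(self, connectors: Dict[str, Connector]) -> None:
        self._connectors = dict(connectors)

    def save(self, name: str, data: Any) -> None:
        connector = self._connectors[name]
        connector.save(connector.path, data)

    def load(self, name: str) -> Any:
        connector = self._connectors[name]
        return connector.load(connector.path)


def demo() -> Dict[str, Any]:
    """Save two datasets in different formats, then load them back by name."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        catalog = SimpleCatalog(
            {
                "params": Connector(root / "params.json", save_json, load_json),
                "model": Connector(root / "model.pkl", save_pickle, load_pickle),
            }
        )
        catalog.save("params", {"learning_rate": 0.1})
        catalog.save("model", {"weights": [1, 2, 3]})
        return {"params": catalog.load("params"), "model": catalog.load("model")}


if __name__ == "__main__":
    print(demo())
```

In this sketch, switching a dataset from JSON to Pickle means changing one catalog entry and leaving the code that calls `load` and `save` untouched. A real catalog adds much more on top of this, such as remote filesystems and the snapshots mentioned above.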


Integrations
Amazon SageMaker, Apache Airflow, Apache Spark, Azure ML, Dask, Databricks, Docker, fsspec, Jupyter Notebook, Kubeflow, Matplotlib, MLflow, Plotly, Pandas, VertexAI, and more.

Dedicated IDE support
The Kedro extension for Visual Studio Code integrates Kedro projects with the editor, providing features like enhanced code navigation and autocompletion.

Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro uses software engineering best practices to help you build production-ready data engineering and data science code.
Case studies
Kedro in production at Telkomsel
Learn how Kedro is used in production at Telkomsel, Indonesia's largest telecommunications company. Kedro is used to help consume tens of TBs of data, run hundreds of feature engineering tasks, and serve dozens of ML models.
Kedro in production at Beamery
Data scientists at Beamery, a fast-growing talent lifecycle management company, explain how Kedro helps them write "production code". They describe a workflow that uses Kedro to move their proofs of concept (POCs) forward.
Testimonials
Eduardo Ohe, Principal Data Engineer
Tremendously valuable
"Kedro has streamlined our workflow process, avoiding a lot of back and forth with debugging. It allowed our company to deliver more value to our customers quickly."
Ghifari Dwiki Ramadhan, Data Engineering
We heavily use Kedro
"We use Kedro in our production environment which consumes tens of TBs of data, runs hundreds of feature engineering tasks, and serves dozens of ML models."
Ready to start?
Kedro is an open-source project. Go ahead and install it with pip or conda:
pip install kedro
or
conda install -c conda-forge kedro
For more details, see the setup documentation.