Kedro

Kedro is a toolbox for production-ready data pipelines.

Why Kedro?

Clean Code

Kedro is the foundation for clean data engineering and data science code. It borrows concepts from software engineering and applies them to building data pipelines.
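One software-engineering idea that carries over well to data pipelines is building them from small, pure functions connected by named inputs and outputs. The plain-Python sketch below illustrates that idea only. It does not use Kedro's API, and `Step`, `run`, `clean`, `average`, and the dataset names are all hypothetical.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    """Hypothetical stand-in for a pipeline step: a pure function plus
    the names of the data it reads and the name of the data it writes."""
    func: Callable
    inputs: list[str]
    output: str


def clean(raw: list[str]) -> list[float]:
    # Drop blank entries and convert the rest to floats.
    return [float(x) for x in raw if x.strip()]


def average(values: list[float]) -> float:
    return sum(values) / len(values)


def run(steps: list[Step], data: dict) -> dict:
    """Run steps in the order listed, passing data between them by name."""
    results = dict(data)
    for step in steps:
        args = [results[name] for name in step.inputs]
        results[step.output] = step.func(*args)
    return results


steps = [
    Step(clean, ["raw_readings"], "clean_readings"),
    Step(average, ["clean_readings"], "mean_reading"),
]
results = run(steps, {"raw_readings": ["1.0", " ", "2.0", "3.0"]})
print(results["mean_reading"])
```

Because each function depends only on its arguments, every step can be unit-tested on its own, and the wiring between steps is kept separate from the logic inside them.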

Handles Complexity

A Kedro project provides scaffolding for complex data and machine-learning pipelines. You spend less time on tedious "plumbing" and focus instead on solving new problems.

Standardisation

Kedro standardises how data engineering and data science code is written, which makes it easier for teams to collaborate on solving problems.

Production-Ready

Move seamlessly from development to production by turning exploratory code into reproducible, maintainable, and modular experiments.

Kedro's Key Concepts Explained →

Features

Pipeline Visualisation

Kedro-Viz is a blueprint of your data and machine-learning workflows. It provides data lineage, surfaces detailed pipeline execution information (such as execution time, node status, and dataset statistics), and makes it easier to collaborate with business stakeholders.

Data Catalog

The Data Catalog is a series of lightweight data connectors used to save and load data across many different file formats and file systems. It supports S3, GCP, Azure, sFTP, DBFS, and local filesystems. Supported data types include Pandas, Spark, Dask, NetworkX, Pickle, Plotly, Matplotlib, and many more. The Data Catalog also includes data and model snapshots for file-based systems.
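The core idea of a catalog of connectors is that pipeline code asks for data by name and the catalog decides how and where it is stored. The sketch below shows that idea in plain Python with the standard library only. It is not Kedro's API: `JSONConnector`, `CSVConnector`, `MiniCatalog`, and the dataset names are hypothetical, and the files are written to a temporary directory.

```python
import csv
import json
import tempfile
from pathlib import Path


class JSONConnector:
    """Hypothetical connector that stores a dict as a JSON file."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, data):
        self.path.write_text(json.dumps(data))

    def load(self):
        return json.loads(self.path.read_text())


class CSVConnector:
    """Hypothetical connector that stores a list of dicts as a CSV file."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, rows):
        with self.path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def load(self):
        # Values come back as strings, as csv.DictReader does no type inference.
        with self.path.open(newline="") as f:
            return list(csv.DictReader(f))


class MiniCatalog:
    """Maps dataset names to connectors so callers never touch file paths."""

    def __init__(self, connectors):
        self._connectors = dict(connectors)

    def save(self, name, data):
        self._connectors[name].save(data)

    def load(self, name):
        return self._connectors[name].load()


workdir = Path(tempfile.mkdtemp())
catalog = MiniCatalog({
    "params": JSONConnector(workdir / "params.json"),
    "customers": CSVConnector(workdir / "customers.csv"),
})

catalog.save("params", {"threshold": 0.5})
catalog.save("customers", [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Lin"}])

loaded_params = catalog.load("params")
loaded_customers = catalog.load("customers")
print(loaded_params, loaded_customers)
```

Swapping the storage format or location then means changing one catalog entry, while the code that calls `load` and `save` stays the same.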

Integrations

Kedro integrates with Amazon SageMaker, Apache Airflow, Apache Spark, Azure ML, Dask, Databricks, Docker, fsspec, Jupyter Notebook, Kubeflow, Matplotlib, MLflow, Plotly, Pandas, Vertex AI, and more.

Dedicated IDE support

The Kedro extension for Visual Studio Code integrates Kedro projects with the editor, providing features like enhanced code navigation and autocompletion for seamless development.

Kedro is an open-source Python framework hosted by the Linux Foundation (LF AI & Data). Kedro uses software engineering best practices to help you build production-ready data engineering and data science code.

Case studies

Kedro in production at Telkomsel

Learn how Kedro is used in production at Telkomsel, Indonesia's largest telecommunications company. Kedro is used to help consume tens of TBs of data, run hundreds of feature engineering tasks, and serve dozens of ML models.

Kedro in production at Beamery

Data scientists at Beamery, a fast-growing talent lifecycle management company, explain how Kedro helps them write "production-code". They describe a workflow that brings in Kedro when they want to progress their proofs of concept (POCs).

Testimonials

Eduardo Ohe, Principal Data Engineer

Tremendously valuable

"Kedro has streamlined our workflow process, avoiding a lot of back and forth with debugging. It allowed our company to deliver more value to our customers quickly."

Ghifari Dwiki Ramadhan, Data Engineering

We heavily use Kedro

"We use Kedro in our production environment which consumes tens of TBs of data, runs hundreds of feature engineering tasks, and serves dozens of ML models."

Ready to start?

Kedro is an open-source project. Go ahead and install it with pip or conda:

pip install kedro

conda install -c conda-forge kedro

For more details, see the setup documentation or watch the video.
