📖 Introduction
Diffly is a Python package for comparing Polars DataFrames and reporting in detail how they differ. Given two frames, it identifies schema differences, row-level mismatches, rows missing from either side, and changes in column values.
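As a rough illustration of the first category, a schema comparison comes down to set operations over column names plus a dtype check on the columns both sides share. The sketch below is plain Python, not diffly's implementation: `compare_schemas` is a hypothetical helper, and each schema is modelled as a dict from column name to a dtype name (a Polars `DataFrame.schema` is a similar name-to-dtype mapping).

```python
# Hypothetical helper, not part of diffly: compares two schemas modelled as
# {column name: dtype name} dicts.
def compare_schemas(left: dict[str, str], right: dict[str, str]) -> dict[str, object]:
    left_only = sorted(left.keys() - right.keys())
    right_only = sorted(right.keys() - left.keys())
    # Columns present on both sides whose dtypes disagree.
    dtype_changes = {
        name: (left[name], right[name])
        for name in sorted(left.keys() & right.keys())
        if left[name] != right[name]
    }
    return {
        "equal": not (left_only or right_only or dtype_changes),
        "left_only": left_only,
        "right_only": right_only,
        "dtype_changes": dtype_changes,
    }


# A mismatching pair: "value" changes dtype and "note" exists only on the right.
print(compare_schemas(
    {"id": "String", "value": "Float64"},
    {"id": "String", "value": "Int64", "note": "String"},
))
```

Keeping the three kinds of schema difference separate, rather than returning a single boolean, mirrors the idea of reporting *how* two frames differ rather than only *whether* they do.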
💿 Installation
You can install diffly using your favorite package manager:
pixi add diffly
conda install diffly
uv add diffly
pip install diffly
🎯 Usage
import polars as pl
from diffly import compare_frames

left = pl.DataFrame({
    "id": ["a", "b", "c"],
    "value": [1.0, 2.0, 3.0],
})
right = pl.DataFrame({
    "id": ["a", "b", "d"],
    "value": [1.0, 2.5, 4.0],
})

# Match rows between the two frames on the "id" column.
comparison = compare_frames(left, right, primary_key="id")

# Only build and print the summary when the frames differ.
if not comparison.equal():
    summary = comparison.summary(
        top_k_column_changes=1, show_sample_primary_key_per_change=True
    )
    print(summary)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Diffly Summary ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Primary key: id
Schemas
▔▔▔▔▔▔▔
Schemas match exactly (column count: 2).
Rows
▔▔▔▔
Left count Right count
3 (no change) 3
┏━┯━┯━┯━┯━┓
┃-│-│-│-│-┃               1 left only (33.33%)
┠─┼─┼─┼─┼─┨╌╌╌┏━┯━┯━┯━┯━┓╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╮
┃ │ │ │ │ ┃ = ┃ │ │ │ │ ┃ 1 equal (50.00%)          │
┠─┼─┼─┼─┼─┨╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌├╴ 2 joined
┃ │ │ │ │ ┃ ≠ ┃ │ │ │ │ ┃ 1 unequal (50.00%)        │
┗━┷━┷━┷━┷━┛╌╌╌┠─┼─┼─┼─┼─┨╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╯
              ┃+│+│+│+│+┃ 1 right only (33.33%)
              ┗━┷━┷━┷━┷━┛
Columns
▔▔▔▔▔▔▔
┌───────┬────────┬───────────────────────────┐
│ value │ 50.00% │ 2.0 -> 2.5 (1x, e.g. "b") │
└───────┴────────┴───────────────────────────┘
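To make the figures in this summary concrete, here is a plain-Python sketch that recomputes them from the example data. It is not diffly's implementation and uses only the standard library; the frames are modelled as dicts keyed by `id`, and the helpers `pct`, `row_breakdown`, and `column_changes` are hypothetical. The denominators are inferred from the printed numbers: "left only" and "right only" appear to be relative to each side's row count (1 of 3), while "equal", "unequal", and the per-column change rate appear to be relative to the 2 joined rows. Values are compared with exact equality; any tolerance handling diffly may apply to floats is outside the scope of this sketch.

```python
from collections import Counter

# Hypothetical plain-Python model of the example frames (not diffly's API):
# each frame maps the primary key "id" to a dict of the remaining columns.
LEFT = {"a": {"value": 1.0}, "b": {"value": 2.0}, "c": {"value": 3.0}}
RIGHT = {"a": {"value": 1.0}, "b": {"value": 2.5}, "d": {"value": 4.0}}


def pct(part: int, whole: int) -> float:
    """Percentage of part in whole, guarding against an empty denominator."""
    return 100 * part / whole if whole else 0.0


def row_breakdown(left: dict, right: dict) -> dict[str, tuple[int, float]]:
    """Split rows by primary key into left-only, right-only, equal and unequal."""
    joined = left.keys() & right.keys()
    n_equal = sum(1 for key in joined if left[key] == right[key])
    n_unequal = len(joined) - n_equal
    n_left_only = len(left.keys() - right.keys())
    n_right_only = len(right.keys() - left.keys())
    return {
        # Assumed denominators, inferred from the printed summary.
        "left only": (n_left_only, pct(n_left_only, len(left))),
        "right only": (n_right_only, pct(n_right_only, len(right))),
        "equal": (n_equal, pct(n_equal, len(joined))),
        "unequal": (n_unequal, pct(n_unequal, len(joined))),
    }


def column_changes(left: dict, right: dict, column: str, top_k: int = 1):
    """Change rate of one column over joined rows plus its top-k (old, new) pairs."""
    counts: Counter = Counter()
    sample_key: dict = {}
    joined = sorted(left.keys() & right.keys())
    for key in joined:
        change = (left[key][column], right[key][column])
        if change[0] != change[1]:
            counts[change] += 1
            sample_key.setdefault(change, key)  # remember the first key per change
    top = [
        (old, new, n, sample_key[(old, new)])
        for (old, new), n in counts.most_common(top_k)
    ]
    return pct(sum(counts.values()), len(joined)), top


for label, (count, share) in row_breakdown(LEFT, RIGHT).items():
    print(f"{count} {label} ({share:.2f}%)")
rate, top = column_changes(LEFT, RIGHT, "value", top_k=1)
for old, new, n, key in top:
    print(f'value {rate:.2f}% {old} -> {new} ({n}x, e.g. "{key}")')
```

Running it prints `1 left only (33.33%)`, `1 right only (33.33%)`, `1 equal (50.00%)`, `1 unequal (50.00%)` and `value 50.00% 2.0 -> 2.5 (1x, e.g. "b")`, matching the summary above. Here `top_k` plays the role that `top_k_column_changes=1` appears to play in the usage example, and the sample key corresponds to `show_sample_primary_key_per_change=True`.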
See more examples in the documentation.