Delta Lake for Ruby
Supports local files and Amazon S3
Installation
Add this line to your application’s Gemfile:
It can take 5-10 minutes to compile the gem.
Getting Started
Write data from Polars
df = Polars::DataFrame.new({"id" => [1, 2], "value" => [3.0, 4.0]})
DeltaLake.write("./events", df)
Load a table
dt = DeltaLake::Table.new("./events")
df = dt.to_polars
Get a lazy frame
lf = dt.to_polars(eager: false)
Append rows
DeltaLake.write("./events", df, mode: "append")
Overwrite a table
DeltaLake.write("./events", df, mode: "overwrite")
Add a constraint
dt.alter.add_constraint({"id_gt_0" => "id > 0"})
Drop a constraint
dt.alter.drop_constraint("id_gt_0")
Delete rows
Vacuum
dt.vacuum(dry_run: false)
Perform small file compaction
Colocate similar data in the same files
dt.optimize.z_order(["category"])
Load a previous version of a table
dt = DeltaLake::Table.new("./events", version: 1)
# or
dt.load_as_version(1)
Get the schema
Get metadata
Get history
API
This library follows the Delta Lake Python API (with a few changes to make it more Ruby-like). You can follow Python tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/delta-ruby.git
cd delta-ruby
bundle install
bundle exec rake compile
bundle exec rake test