Unify Table representations by timsaucer · Pull Request #1256 · apache/datafusion-python
and others added 30 commits September 18, 2025 15:11

docs/tests, add DataFrame view support, and improve Send/concurrency support.

Migrates the codebase from using `Table` to a `TableProvider`-based API, refactors registration and access paths to simplify catalog/context interactions, and updates documentation and examples. DataFrame view handling is improved (`into_view` is now public), the test suite is expanded to cover new registration and async SQL scenarios, and `TableProvider` now supports the `Send` trait across modules for safer concurrency. Minor import cleanup and utility adjustments (including a refined `pyany_to_table_provider`) are included.
DataFrame→TableProvider conversion, plus tests and FFI/pycapsule improvements.

-- Registration logic & API
* Refactor table provider registration logic for improved clarity and simpler call sites.
* Remove PyTableProvider registration from an internal module (reduces surprising side effects).
* Update the table registration method to call `register_table` instead of `register_table_provider`.
* Extend `register_table` to support `TableProviderExportable` so more provider types can be registered uniformly.
* Improve error messages related to registration failures (missing PyCapsule name and DataFrame registration errors).

-- DataFrame ↔ TableProvider conversions
* Introduce utility functions to simplify table provider conversions and centralize conversion logic.
* Rename `into_view_provider` → `to_view_provider` for clearer intent.
* Fix `from_dataframe` to return the correct type and update `DataFrame.into_view` to import the correct `TableProvider`.
* Remove an obsolete `dataframe_into_view` test case after the refactor.

-- FFI / PyCapsule handling
* Update `FFI_TableProvider` initialization to accept an optional parameter (improves FFI ergonomics).
* Introduce the `table_provider_from_pycapsule` utility to standardize pycapsule-based construction.
* Improve the error message when a PyCapsule name is missing to help debugging.

-- DeltaTable & specific integrations
* Update TableProvider registration for `DeltaTable` to use the correct registration method (matches the new API surface).

-- Tests, docs & minor fixes
* Add tests for registering a `TableProvider` from a `DataFrame` and from a capsule to ensure conversion paths are covered.
* Fix a typo in the `register_view` docstring and another typo in the error message for unsupported volatility type.
* Simplify version retrieval by removing exception handling around `PackageNotFoundError` (streamlines code path).
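To make the unified registration path above concrete, here is a minimal sketch of how a single `register_table` entry point might dispatch on the kind of object it receives. It is not the project's implementation: `ToyContext`, `FakeExportable`, and `FakeDataFrame` are hypothetical stand-ins, and the `__datafusion_table_provider__` hook name is assumed to be the capsule-export method that `TableProviderExportable` refers to.

```python
from typing import Any


class FakeExportable:
    """Hypothetical object that exports a provider through a capsule-style hook."""

    def __datafusion_table_provider__(self) -> str:
        # A real implementation would return a PyCapsule; a string stands in here.
        return "capsule-provider"


class FakeDataFrame:
    """Hypothetical DataFrame-like object that can be turned into a view."""

    def into_view(self) -> str:
        return "view-provider"


class ToyContext:
    """Hypothetical session context that stores registered providers in a dict."""

    def __init__(self) -> None:
        self.tables: dict[str, Any] = {}

    def register_table(self, name: str, table: Any) -> None:
        # Exportable objects are unwrapped through their export hook.
        if hasattr(table, "__datafusion_table_provider__"):
            self.tables[name] = table.__datafusion_table_provider__()
            return
        # DataFrame-like objects are converted to a view before registration.
        if hasattr(table, "into_view"):
            self.tables[name] = table.into_view()
            return
        # Anything else fails with a message that names the offending type.
        raise TypeError(
            f"cannot register object of type {type(table).__name__} as table {name!r}"
        )
```

With one entry point, call sites no longer need to know which kind of provider they hold; unsupported inputs fail early with a descriptive `TypeError`.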
* Removed unused helpers (`extract_table_provider`, `_wrap`) and dead code to simplify maintenance.
* Consolidated and streamlined table-provider extraction and registration logic; improved error handling and replaced a hardcoded error message with `EXPECTED_PROVIDER_MSG`.
* Marked `from_view` as deprecated; updated deprecation message formatting and adjusted the warning `stacklevel` so it points to caller code.
* Removed the `Send` marker from TableProvider trait objects to increase type flexibility; review threading assumptions.
* Added type hints to the `register_schema` and `deregister_table` methods.
* Adjusted tests and exceptions (e.g., changed one test to expect `RuntimeError`) and updated test coverage accordingly.
* Introduced a refactored `TableProvider` class and enhanced Python integration by adding support for extracting `PyDataFrame` in `PySchema`.

Notes:
* Consumers should migrate away from `TableProvider::from_view` to the new TableProvider integration.
* Audit any code relying on `Send` for trait objects passed across threads.
* Update downstream tests and documentation to reflect the changed exception types and deprecation.
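The `stacklevel` adjustment mentioned above can be illustrated with the standard `warnings` module. This is a sketch under assumptions: `ToyTableProvider` is a hypothetical class, and forwarding `from_view` to the plain constructor is an illustrative choice, not the project's actual migration path.

```python
import warnings


class ToyTableProvider:
    """Hypothetical provider wrapper used only to demonstrate the deprecation pattern."""

    def __init__(self, source: object) -> None:
        self.source = source

    @classmethod
    def from_dataframe(cls, df: object) -> "ToyTableProvider":
        # Preferred constructor in this sketch.
        return cls(df)

    @classmethod
    def from_view(cls, view: object) -> "ToyTableProvider":
        # stacklevel=2 attributes the warning to the caller's line
        # rather than to this method body.
        warnings.warn(
            "from_view is deprecated; use from_dataframe instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return cls(view)
```

With the default `stacklevel=1`, the warning would be reported at the `warnings.warn` call inside the library, which gives users no hint about which of their own lines needs updating.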
utilities, docs, and robustness fixes

* Normalized table-provider handling and simplified registration flow across the codebase; multiple commits centralize provider coercion and normalization.
* Introduced utility helpers (`coerce_table_provider`, `extract_table_provider`, `_normalize_table_provider`) to centralize extraction and error handling and to improve clarity.
* Simplified `from_dataframe` / `into_view` behavior: clearer implementations, direct returns of DataFrame views where appropriate, and added internal tests for DataFrame flows.
* Fixed DataFrame registration semantics: enforce `TypeError` for invalid registrations; added handling for `DataFrameWrapper` by converting it to a view.
* Added tests, including a schema registration test using a PyArrow dataset and internal DataFrame tests to cover new flows.
* Documentation improvements: expanded `from_dataframe` docstrings with parameter details, added usage examples for `into_view`, and documented deprecations (e.g., `register_table_provider` → `register_table`).
* Warning and UX fixes: synchronized deprecation `stacklevel` so warnings point to caller code; improved `__dir__` to return sorted, unique attributes.
* Cleanup: removed unused imports (including an unused error import from `utils.rs`) and other dead code to reduce noise.
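The `__dir__` improvement listed above (sorted, unique attributes) is a common pattern for wrapper objects that forward attribute access. The following is a generic sketch, assuming a hypothetical `Wrapper` class; it does not reproduce the class changed in this PR.

```python
class Wrapper:
    """Hypothetical wrapper that forwards unknown attributes to an inner object."""

    def __init__(self, inner: object) -> None:
        self._inner = inner

    def __getattr__(self, name: str):
        # Called only when normal lookup fails; delegate to the wrapped object.
        return getattr(self._inner, name)

    def __dir__(self) -> list[str]:
        # Merge the wrapper's own names with the forwarded ones.
        # The set union removes duplicates; sorted() gives a stable order.
        return sorted(set(super().__dir__()) | set(dir(self._inner)))
```

Without the set union, names defined on both the wrapper and the inner object (such as `__class__` or `__repr__`) would appear twice in tab completion.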