The Vortex Python API provides a Pythonic interface to the Vortex library via PyO3 bindings. It supports reading and writing Vortex files, compressing data, and integrating with the broader Python data ecosystem including PyArrow, Pandas, Polars, DuckDB, and Ray.
Installation
Optional integrations can be installed as extras:
pip install "vortex-data[polars,pandas,numpy,duckdb,ray]"
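After installing, it can be useful to confirm which optional integrations are actually importable in the current environment. The following is a minimal sketch using only the standard library; it assumes each extra installs a top-level module with the same name as the extra, and the helper `available_integrations` is a hypothetical name introduced here for illustration.

```python
from importlib.util import find_spec

# Assumption: each extra corresponds to a top-level module of the same name.
OPTIONAL_MODULES = ("polars", "pandas", "numpy", "duckdb", "ray")


def available_integrations(modules=OPTIONAL_MODULES):
    """Map each module name to True if it can be located for import."""
    return {name: find_spec(name) is not None for name in modules}


# Report the state of the current environment.
for name, present in available_integrations().items():
    print(f"{name}: {'installed' if present else 'missing'}")
```

The check only locates each module without importing it, so it stays cheap even for heavy packages.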
Compatibility
The Python bindings require Python 3.11 or newer. Pre-built wheels are available for:
- x86_64 Linux
- ARM64 Linux
- Apple Silicon macOS
The Linux wheels support any distribution with GLIBC version 2.17 or newer. This includes:
- Amazon Linux 2 or newer
- Ubuntu 14.04 or newer
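The requirements above (Python 3.11 or newer and, on Linux, GLIBC 2.17 or newer) can be checked from Python itself. This is a rough sketch using the standard library's `platform.libc_ver()`; the helper names `parse_version`, `python_ok`, and `glibc_ok` are hypothetical and not part of Vortex.

```python
import platform
import sys

MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 17)


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def python_ok(version_info=sys.version_info):
    """True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def glibc_ok(libc=None):
    """Check a (name, version) pair as returned by platform.libc_ver().

    Returns None when the system does not report glibc (for example macOS).
    """
    name, version = libc if libc is not None else platform.libc_ver()
    if name != "glibc":
        return None
    return parse_version(version) >= MIN_GLIBC


print("Python OK:", python_ok())
print("glibc OK:", glibc_ok())
```

Versions are compared as integer tuples rather than strings, so that 2.9 correctly sorts below 2.17.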
Usage Example
Here’s a basic example of using the Vortex Python API to write and read a Vortex file:
import vortex

# Write a Vortex file from a PyArrow table
vortex.io.write_path(my_table, "data.vortex")

# Read a Vortex file
dataset = vortex.dataset("data.vortex")
table = dataset.to_arrow()
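The example above leaves `my_table` undefined. A self-contained variant might look like the sketch below, which builds a small PyArrow table first. The Vortex calls (`vortex.io.write_path`, `vortex.dataset`, `to_arrow`) are copied from the example above and not verified against a specific release; the helpers `example_columns` and `roundtrip` and the column names are illustrative assumptions.

```python
def example_columns():
    """Plain Python data used to build the example table (illustrative only)."""
    return {"id": [1, 2, 3], "name": ["a", "b", "c"]}


def roundtrip(path="data.vortex"):
    """Write the example table to `path` and read it back."""
    # Imported here so that defining the function does not require
    # pyarrow or vortex to be installed.
    import pyarrow as pa
    import vortex

    my_table = pa.table(example_columns())

    # The calls below are taken from the usage example above.
    vortex.io.write_path(my_table, path)
    dataset = vortex.dataset(path)
    return dataset.to_arrow()
```

With both packages installed, calling `roundtrip()` is expected to return the data that was written, assuming the calls behave as the usage example describes.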
API Reference
- Data Types
- Scalars
- Arrays
- Factory Functions
- Base Class
  - Array
    - Array.__len__()
    - Array.apply()
    - Array.display_tree()
    - Array.dtype
    - Array.filter()
    - Array.from_arrow()
    - Array.from_range()
    - Array.id
    - Array.nbytes
    - Array.scalar_at()
    - Array.take()
    - Array.to_arrow_array()
    - Array.to_arrow_table()
    - Array.to_numpy()
    - Array.to_pandas()
    - Array.to_polars_dataframe()
    - Array.to_polars_series()
    - Array.to_pylist()
- Canonical Encodings
- Utility Encodings
- Compressed Encodings
- Pluggable Encodings
- Registry and Serde
- Streams and Iterators
- Expressions
- Compression
- Input and Output
- Object Store support
- S3 and S3-compatible
  - S3Store
  - S3Config
    - S3Config.access_key_id
    - S3Config.bucket
    - S3Config.checksum_algorithm
    - S3Config.conditional_put
    - S3Config.container_credentials_relative_uri
    - S3Config.copy_if_not_exists
    - S3Config.default_region
    - S3Config.disable_tagging
    - S3Config.endpoint
    - S3Config.imdsv1_fallback
    - S3Config.metadata_endpoint
    - S3Config.region
    - S3Config.request_payer
    - S3Config.s3_express
    - S3Config.secret_access_key
    - S3Config.server_side_encryption
    - S3Config.session_token
    - S3Config.skip_signature
    - S3Config.sse_bucket_key_enabled
    - S3Config.sse_customer_key_base64
    - S3Config.sse_kms_key_id
    - S3Config.unsigned_payload
    - S3Config.virtual_hosted_style_request
  - S3Credential
  - S3CredentialProvider
- Google Cloud Storage
- Azure Blob Storage
  - AzureStore
  - AzureConfig
    - AzureConfig.account_key
    - AzureConfig.account_name
    - AzureConfig.authority_host
    - AzureConfig.client_id
    - AzureConfig.client_secret
    - AzureConfig.container_name
    - AzureConfig.disable_tagging
    - AzureConfig.endpoint
    - AzureConfig.fabric_cluster_identifier
    - AzureConfig.fabric_session_token
    - AzureConfig.fabric_token_service_url
    - AzureConfig.fabric_workload_host
    - AzureConfig.federated_token_file
    - AzureConfig.msi_endpoint
    - AzureConfig.msi_resource_id
    - AzureConfig.object_id
    - AzureConfig.sas_key
    - AzureConfig.skip_signature
    - AzureConfig.tenant_id
    - AzureConfig.token
    - AzureConfig.use_azure_cli
    - AzureConfig.use_emulator
    - AzureConfig.use_fabric_endpoint
  - AzureAccessKey
  - AzureSASToken
  - AzureBearerToken
  - AzureCredential
  - AzureCredentialProvider
- HTTP
- Local
- Memory
- Common Configuration
  - ClientConfig
    - ClientConfig.allow_http
    - ClientConfig.allow_invalid_certificates
    - ClientConfig.connect_timeout
    - ClientConfig.default_content_type
    - ClientConfig.default_headers
    - ClientConfig.http1_only
    - ClientConfig.http2_keep_alive_interval
    - ClientConfig.http2_keep_alive_timeout
    - ClientConfig.http2_keep_alive_while_idle
    - ClientConfig.http2_only
    - ClientConfig.pool_idle_timeout
    - ClientConfig.pool_max_idle_per_host
    - ClientConfig.proxy_url
    - ClientConfig.timeout
    - ClientConfig.user_agent
  - RetryConfig
  - BackoffConfig
from_url()
- Dataset
  - VortexDataset
    - VortexDataset.count_rows()
    - VortexDataset.filter()
    - VortexDataset.get_fragments()
    - VortexDataset.head()
    - VortexDataset.join()
    - VortexDataset.join_asof()
    - VortexDataset.replace_schema()
    - VortexDataset.scanner()
    - VortexDataset.schema
    - VortexDataset.sort_by()
    - VortexDataset.take()
    - VortexDataset.to_batches()
    - VortexDataset.to_record_batch_reader()
    - VortexDataset.to_table()
  - VortexFragment
  - VortexScanner
- Type Aliases