🌪️ Vortex
📚 Documentation | 📊 Performance Benchmarks
Overview
Vortex is a next-generation columnar file format and toolkit designed for high-performance data analytics. It provides:
- ⚡️ Blazing Fast Performance
  - 100-200x faster random access reads than Apache Parquet
  - 2-10x faster scans with similar compression ratios and write throughput
  - Efficient support for wide tables with zero-copy/zero-parse metadata
- 🔧 Extensible Architecture
  - Modeled after Apache DataFusion's extensible approach
  - Pluggable encoding system
  - Zero-copy compatibility with Apache Arrow
Project Information
License
Vortex is licensed under the Apache License, Version 2.0.
Governance
Vortex is an independent open-source project that is not controlled by any single company. The Vortex Project is a sub-project of the Linux Foundation Projects. Its governance model is documented in CONTRIBUTING.md and is subject to the terms of the Technical Charter.
Contributing
See CONTRIBUTING.md for guidelines.
Acknowledgments 🏆
This project builds upon groundbreaking work from the academic and open-source communities:
Key Research Papers
- BtrBlocks - Efficient columnar compression
- FastLanes - High-performance integer compression
- FSST - Fast random access string compression
- ALP - Adaptive lossless floating-point compression
- Procella - YouTube's unified data system
- Cloud Object Storage Analytics - High-performance analytics
- ClickHouse - Fast analytics for everyone
Open Source Inspiration
- Apache Arrow & Apache DataFusion
- parquet2 by Jorge Leitao
- DuckDB
- Velox & Nimble
Trademarks
Copyright © Vortex a Series of LF Projects, LLC. For web site terms of use, trademark policy, and other project policies, please see https://lfprojects.org.
Thanks to all contributors who have shared their knowledge and code with the community! 🚀