Vortex provides a Spark DataSource V2 connector for reading and writing Vortex files. The
connector is published to Maven Central as dev.vortex:vortex-spark.
Installation#
Add the dependency to your build. The connector is built against Spark 4.x with Scala 2.13.
Gradle (Kotlin DSL):

```kotlin
implementation("dev.vortex:vortex-spark:<version>")
```

Maven:

```xml
<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-spark</artifactId>
  <version>${vortex.version}</version>
</dependency>
```
The connector ships as a shadow JAR that relocates its Arrow, Guava, and Protobuf dependencies to avoid classpath conflicts with Spark.
Reading Vortex Files#
Use the vortex format to read a single file or a directory of Vortex files:
```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "/path/to/data.vortex")
    .load();
```
When pointed at a directory, the connector discovers all .vortex files and creates one read
partition per file.
Column pruning is pushed down: only the columns referenced by the query are read from the file.
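As a sketch of how projection interacts with that pushdown (assuming the same `spark` session, a hypothetical directory of Vortex files, and hypothetical column names):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Only the two selected columns are decoded from each .vortex file;
// the pushed-down projection skips everything else.
// ("user_id" and "event_time" are hypothetical column names.)
Dataset<Row> events = spark.read()
    .format("vortex")
    .option("path", "/path/to/events")  // directory: one partition per file
    .load()
    .select("user_id", "event_time");
```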
Writing Vortex Files#
```java
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Overwrite)
    .save();
```
Each Spark partition produces one output file named part-{partitionId}-{taskId}.vortex.
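The naming scheme can be illustrated with a small helper (hypothetical, not part of the connector's API):

```java
// Hypothetical helper mirroring the connector's output naming scheme:
// part-{partitionId}-{taskId}.vortex
public final class VortexNaming {
    static String partFileName(int partitionId, long taskId) {
        return String.format("part-%d-%d.vortex", partitionId, taskId);
    }
}
```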
Write Options#
Save Modes#
The connector supports all standard Spark save modes: Overwrite, Append, Ignore, and
ErrorIfExists.
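For example, appending to an existing output directory only changes the mode argument (a sketch, assuming an existing `df` and a hypothetical path):

```java
import org.apache.spark.sql.SaveMode;

// Append adds new part files alongside existing ones; Ignore and
// ErrorIfExists behave as in any other Spark data source.
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Append)
    .save();
```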
Supported Types#
S3 Support#
The connector supports reading and writing to S3 paths:
```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "s3://bucket/path/to/data")
    .load();
```
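Writing to S3 follows the same pattern (a sketch; the bucket and prefix here are hypothetical):

```java
import org.apache.spark.sql.SaveMode;

// Write each Spark partition as a .vortex object under the S3 prefix.
df.write()
    .format("vortex")
    .option("path", "s3://bucket/path/to/output")
    .mode(SaveMode.Overwrite)
    .save();
```

Credentials are typically supplied through the standard AWS mechanisms (environment variables or instance profiles); the exact configuration depends on your deployment.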