New Java Data Platforms 2026
last commit 11 months ago alluxio/alluxio 7K +4
added 9 months ago
Alluxio Open Source (formerly known as Tachyon) is a Distributed Caching Platform for large-scale data.
last commit 2 days ago apache/seatunnel 9K +36
added 9 months ago
A high-performance, distributed data integration tool, capable of synchronizing vast amounts of data daily.
last commit 8 months ago alibaba/datax 17K +4
added 11 months ago
DataX is the open source data integration framework maintained by Alibaba. As a data synchronization framework, DataX abstracts the synchronization of different data sources.
last commit 1 day ago elastic/logstash 14K +12
added 1 year ago
Logstash is a server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash."
last commit 2 days ago apache/pulsar 15K +11
added 1 year ago
Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
last commit 2 days ago apache/rocketmq 22K +10
added 1 year ago
Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
last commit 17 hours ago apache/spark 43K +58
added 1 year ago
Apache Spark - A unified analytics engine for large-scale data processing.
Java AI / ML Libraries, BigData Libraries, Data Platforms, DataFrame Libraries
last commit 1 week ago apache/systemds 1K +1
added 1 year ago
An open source ML system for the end-to-end data science lifecycle.
Java Data Platforms, Data Science Libraries, AI / ML Libraries
last commit 2 days ago apache/streampipes 715
added 1 year ago
A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
last commit 1 day ago apache/nifi 6K +26
added 1 year ago
Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.
last commit 20 hours ago apache/flink 25K +33
added 1 year ago
A stream processing framework with powerful stream- and batch-processing capabilities.
Java Data Platforms, BigData Libraries, Batch Processing Libraries
last commit 1 day ago infinispan/infinispan 1K +1
added 1 year ago
An open source data grid platform and highly scalable NoSQL cloud data store.
last commit 16 hours ago apache/ignite 5K +2
added 1 year ago
Apache Ignite is a distributed database for high-performance computing with in-memory speed.
last commit 1 day ago apache/kafka 32K +17
added 1 year ago
Distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
last commit 1 day ago hazelcast/hazelcast 6K +9
added 1 year ago