New Java Data Platforms 2026

last commit 11 months ago alluxio/alluxio 7K +4

added 9 months ago

Alluxio Open Source (formerly known as Tachyon) is a distributed caching platform for large-scale data.

Java BigData Libraries Data Platforms AI / ML Libraries

last commit 2 days ago apache/seatunnel 9K +36

added 9 months ago

A high-performance, distributed data integration tool, capable of synchronizing vast amounts of data daily.

Java Integration Frameworks Data Platforms

last commit 8 months ago alibaba/datax 17K +4

added 11 months ago

DataX is an open-source data integration framework maintained by Alibaba. As a data synchronization framework, it abstracts synchronization between different data sources.

Java Data Platforms

last commit 1 day ago elastic/logstash 14K +12

added 1 year ago

Logstash is a server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash."

Java Data Platforms Logging Libraries

last commit 2 days ago apache/pulsar 15K +11

added 1 year ago

Pulsar is a distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.

Java Data Platforms

last commit 2 days ago apache/rocketmq 22K +10

added 1 year ago

Apache RocketMQ is a cloud-native messaging and streaming platform that makes it simple to build event-driven applications.

Java Data Platforms

last commit 17 hours ago apache/spark 43K +58

added 1 year ago

Apache Spark is a unified analytics engine for large-scale data processing.

Java AI / ML Libraries BigData Libraries Data Platforms DataFrame Libraries

last commit 1 week ago apache/systemds 1K +1

added 1 year ago

An open-source ML system for the end-to-end data science lifecycle.

Java Data Platforms Data Science Libraries AI / ML Libraries

last commit 2 days ago apache/streampipes 715

added 1 year ago

A self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.

Java Data Platforms IoT Libraries

last commit 1 day ago apache/nifi 6K +26

added 1 year ago

Apache NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.

Java Data Platforms

last commit 20 hours ago apache/flink 25K +33

added 1 year ago

A stream processing framework with powerful stream- and batch-processing capabilities.

Java Data Platforms BigData Libraries Batch Processing Libraries

last commit 1 day ago infinispan/infinispan 1K +1

added 1 year ago

An open-source data grid platform and highly scalable NoSQL cloud data store.

Java Caching Libraries Data Platforms

last commit 16 hours ago apache/ignite 5K +2

added 1 year ago

Apache Ignite is a distributed database for high-performance computing with in-memory speed.

Java Caching Libraries Data Platforms

last commit 1 day ago apache/kafka 32K +17

added 1 year ago

Distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Java Data Platforms Kafka Libraries

last commit 1 day ago hazelcast/hazelcast 6K +9

added 1 year ago

A unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

Java Caching Libraries Data Platforms