Stream processing made easy | Dagger
Built for scale
Dagger, or Data Aggregator, is an easy-to-use, configuration-over-code, cloud-native framework built on top of Apache Flink for stateful processing of both real-time and historical streaming data. With Dagger, you don't need to write custom applications to process data as a stream. Instead, you write SQL queries to do the processing and analysis on streaming data.
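As a rough illustration, a Dagger job can be defined by a single query over a stream. The sketch below uses Flink-style SQL; the table name `data_stream` and its fields are hypothetical placeholders, not names from Dagger itself:

```sql
-- Hypothetical streaming query: count events per service type
-- in one-minute tumbling windows. Table and column names
-- (data_stream, service_type, rowtime) are illustrative only.
SELECT
  service_type,
  COUNT(*) AS event_count,
  TUMBLE_END(rowtime, INTERVAL '60' SECOND) AS window_end
FROM data_stream
GROUP BY
  service_type,
  TUMBLE(rowtime, INTERVAL '60' SECOND)
```

The query replaces what would otherwise be a hand-written stream-processing application; Dagger handles sources, sinks and state underneath.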
Reliable & consistent processing
Provides built-in support for fault-tolerant execution that is consistent and correct regardless of data size, cluster size, processing pattern or pipeline complexity.
Robust recovery mechanism
Checkpoints, Savepoints & State-backup ensure that even in unforeseen circumstances, clusters & jobs can be brought back within minutes.
SQL and more
Define business logic in a query & kick-start your streaming job. And it is not just SQL: there is support for user-defined functions & pre-defined transformations as well.
Scale
Dagger scales in an instant, both vertically and horizontally, for high-performance streaming sinks with zero data drops.
Extensibility
Add your own sink to Dagger through a clearly defined interface, or choose from the ones already provided. Use the Kafka Source to process real-time data, or opt for the Parquet Source to stream historical data from Parquet files.
Flexibility
Add custom business logic in the form of plugins (UDFs, Transformers, Preprocessors and Post Processors) independent of the core logic.
Key Features
Stream processing platform for transforming, aggregating and enriching data in real time with ease of operation & unbelievable reliability. Dagger can be deployed in VMs or cloud-native environments, making resource provisioning and deployment simple & straightforward; the only limit to your data processing is your imagination.
Aggregations
Supports Tumble & Slide time windows. The Longbow feature supports large windows of up to 30 days.
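A sliding window, for instance, can be expressed directly in the query using Flink's hop-window syntax; the table and column names below are illustrative placeholders, not part of Dagger:

```sql
-- Hypothetical sliding-window aggregate: a 5-minute window
-- advancing every 1 minute. data_stream, customer_id and
-- rowtime are illustrative names only.
SELECT
  customer_id,
  COUNT(*) AS order_count
FROM data_stream
GROUP BY
  customer_id,
  HOP(rowtime, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE)
```

Tumbling windows partition the stream into fixed, non-overlapping intervals, while sliding (hop) windows overlap, emitting a result every slide interval.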
SQL Support
Query writing made easy through formatting, suggestions, auto-completes and template queries.
Stream Enrichment
Enrich streamed messages from HTTP endpoints or database sources to bring offline & reference data context to real-time processing.
Observability
Always know what’s going on with your deployment with built-in monitoring of throughput, response times, errors and more.
Analytics Ecosystem
Dagger can transform, aggregate, join and enrich data in real-time for operational analytics using InfluxDB, Grafana and others.
Stream Transformations
Convert messages on the fly for a variety of use-cases such as feature engineering.
Support for Real Time and Historical Data Streaming
Use the Kafka Source to process real-time data, or opt for the Parquet Source to stream historical data from Parquet files.
Proud Users
Dagger was originally created for the Gojek data processing platform, and it has been used, adapted and improved by other teams internally and externally.