Data processing
Links
Bigslice - System for fast, large-scale, serverless data processing using Go.
Reflow - Language and runtime for distributed, incremental data processing in the cloud.
Differential Dataflow - Implementation of differential dataflow using timely dataflow in Rust. (Book) (HN)
Luna - Data processing and visualization environment built on the principle that people need an immediate connection to what they are building.
Plumbing At Scale (2020) - Event Sourcing and Stream Processing Pipelines at Grab.
Nuclio - High-performance serverless event and data processing platform.
Apache Spark - Unified analytics engine for large-scale data processing. (PySpark) (PySpark Style Guide) (Article)
Baker - High-performance, composable and extendable data-processing pipeline for the big data era.
cuGraph - GPU Graph Analytics.
Opaque - Secure Apache Spark SQL.
Apache Beam - Unified programming model for batch and streaming data processing. (Web)
Stitch - Simple, extensible ETL built for data teams.
Databricks - Unified Data Analytics. (GitHub) (CLI)
AugMix - Simple Data Processing Method to Improve Robustness and Uncertainty.
Snapflow - Framework for building end-to-end functional data pipelines from modular components.
Workflow Description Language (WDL) - Way to specify data processing workflows with a human-readable and writeable syntax.