
Data processing


Last updated 4 years ago


Links

  • Bigslice - System for fast, large-scale, serverless data processing using Go.

  • Reflow - Language and runtime for distributed, incremental data processing in the cloud.

  • Self-managing serverless computing with Bigmachine (2019)

  • Bigslice: a cluster computing system for Go (2019)

  • When your data doesn't fit in memory: the basic techniques (2019) (HN)

  • Differential Dataflow - Implementation of differential dataflow using timely dataflow on Rust. (Book) (HN)

  • The Log: What every software engineer should know about real-time data's unifying abstraction (2013)

  • Luna - Data processing and visualization environment built on a principle that people need an immediate connection to what they are building.

  • Guide To The Data Lake — Modern Batch Data Warehousing (2020)

  • Plumbing At Scale (2020) - Event Sourcing and Stream Processing Pipelines at Grab.

  • Differential Dataflow! But at what COST? (2017) (HN)

  • Timely Dataflow and Total Order (2020)

  • Nuclio - High-Performance Serverless event and data processing platform.

  • Apache Spark - Unified analytics engine for large-scale data processing. (PySpark) (PySpark Style Guide) (Article)

  • Spark: The Definitive Guide Book (2018) (Code)

  • Batch - Event replay platform. Version control for data passing through your messaging systems. (HN)

  • A log/event processing pipeline you can't have (2019) (HN)

  • mm-ADT - Multi-Model Abstract Data Type. Distributed virtual machine capable of integrating a diverse collection of data processing technologies. (Code)

  • Data Preprocessing in Machine Learning (2020)

  • lakeFS - Open source layer that delivers resilience and manageability to object-storage based data lakes. (Web)

  • Baker - High performance, composable and extendable data-processing pipeline for the big data era.

  • Cylon - Fast, scalable distributed memory data parallel library for processing structured data. (Web)

  • cuGraph - GPU Graph Analytics.

  • Opaque - Secure Apache Spark SQL.

  • Apache Beam - Unified programming model for Batch and Streaming. (Web)

  • Stitch - Simple, extensible ETL built for data teams.

  • Databricks - Unified Data Analytics. (GitHub) (CLI)

  • AugMix - Simple Data Processing Method to Improve Robustness and Uncertainty.

  • Snapflow - Framework for building end-to-end functional data pipelines from modular components.

  • Workflow Description Language (WDL) - Way to specify data processing workflows with a human-readable and writeable syntax.

  • Cloudfuse - Open source serverless data solutions. Future of data pipelines. (GitHub)

  • Create your own data stream for Kafka with Python and Faker (2021)
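The chunking idea behind the "When your data doesn't fit in memory: the basic techniques (2019)" entry can be sketched in plain Python. Rows are streamed lazily and processed in fixed-size chunks, so only one chunk plus the running aggregates stay in memory. This is a minimal sketch, not code from any linked project. The CSV columns `category` and `amount` and the helper names `chunked` and `sum_by_category` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items without materializing the whole input."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def sum_by_category(stream: TextIO, chunk_size: int = 10_000) -> Dict[str, float]:
    """Sum `amount` per `category` over a CSV stream, one chunk at a time.

    The column names are hypothetical; adapt them to the real schema.
    """
    totals: Dict[str, float] = defaultdict(float)
    reader = csv.DictReader(stream)  # parses rows lazily as they are requested
    for chunk in chunked(reader, chunk_size):
        for row in chunk:
            totals[row["category"]] += float(row["amount"])
    return dict(totals)


if __name__ == "__main__":
    # A real job would pass open("big.csv", newline=""); an in-memory sample keeps this self-contained.
    sample = io.StringIO("category,amount\na,1.5\nb,2\na,3\n")
    print(sum_by_category(sample, chunk_size=2))  # {'a': 4.5, 'b': 2.0}
```

The same shape (lazy source, bounded batches, small running state) is what the distributed systems listed on this page scale out across machines. This sketch only shows the single-process version.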