Apache Beam

Development status: Active
Operating system: Cross-platform
Written in: Java, Python
Developer(s): Apache Software Foundation
Initial release: June 15, 2016
Stable release: 0.5.0 / February 2, 2017

Apache Beam is an open source, unified programming model for defining and executing data processing pipelines, including ETL, batch, and stream (continuous) processing. Beam pipelines are defined using one of the provided SDKs and executed on one of Beam's supported runners (distributed processing back-ends), including Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
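
A pipeline is written once against the Beam SDK, and the runner is chosen through pipeline options, so the same code can be submitted to any supported back-end. The sketch below is a minimal word-count pipeline in the Java SDK, following the pattern used in the Beam documentation; the file paths are placeholders, and the method names reflect a later 2.x-style API rather than the 0.5.0 release described above.

    import java.util.Arrays;

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.FlatMapElements;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class MinimalWordCount {
      public static void main(String[] args) {
        // The runner (DirectRunner, FlinkRunner, DataflowRunner, ...) is selected
        // via pipeline options, not in the pipeline code itself.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadLines", TextIO.read().from("input.txt"))          // placeholder input path
         .apply("SplitWords", FlatMapElements
             .into(TypeDescriptors.strings())
             .via((String line) -> Arrays.asList(line.split("\\s+"))))
         .apply("CountWords", Count.perElement())
         .apply("FormatResults", MapElements
             .into(TypeDescriptors.strings())
             .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
         .apply("WriteCounts", TextIO.write().to("word-counts"));      // placeholder output prefix

        p.run().waitUntilFinish();
      }
    }

Switching to a different back-end is then a matter of passing, for example, --runner=FlinkRunner or --runner=DataflowRunner as a pipeline option, rather than changing the pipeline code.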

It has been termed an "uber-API for big data".

History

Apache Beam is one implementation of the Dataflow model paper. The Dataflow model is based on previous work on distributed processing abstractions at Google, in particular on FlumeJava and MillWheel.

In 2014, Google released an open SDK implementation of the Dataflow model, along with an environment to execute Dataflow pipelines locally (non-distributed) as well as on the Google Cloud Platform service.

In 2016, Google donated the core SDK, the implementation of a local runner, and a set of IOs (data connectors) for accessing Google Cloud Platform data services to the Apache Software Foundation. Other companies and members of the community have contributed runners for existing distributed execution platforms, as well as new IOs to integrate the Beam runners with existing databases, key-value stores, and messaging systems. Additionally, new DSLs have been proposed to support specialized domain needs on top of the Beam model.
