Supriya Ghosh (Editor)

Pivotal Greenplum Database

Original author(s): Greenplum
License: Apache
Developer(s): Pivotal Software
Stable release: 4.3.11.1 / January 2017
Operating system: Red Hat Enterprise Linux 64-bit 5.x and 6.x; SUSE Linux Enterprise Server 64-bit 10 SP4, 11 SP1, and 11 SP2; Oracle Unbreakable Linux 64-bit 5.5; CentOS 64-bit 5.x and 6.x

Pivotal Greenplum Database is a database management system. It was originally developed by Greenplum, which was acquired by EMC Corporation in July 2010 and spun out into Pivotal Software in 2013.

System Overview

Pivotal Greenplum Database is an MPP (massively parallel processing) database built on open source PostgreSQL. The system consists of a master node, a standby master node, and segment nodes. All of the data resides on the segment nodes, and the catalog information is stored on the master nodes. Each segment node runs one or more segments, which are modified PostgreSQL database instances, each assigned a content identifier. The data of each table is divided among the segments based on the distribution key columns that the user specifies in the DDL statement. For each segment content identifier there is both a primary segment and a mirror segment, which do not run on the same physical host. When a SQL query enters the master node, it is parsed, optimized, and dispatched to all of the segments, which execute the query plan and either return the requested data or insert the result of the query into a database table.
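The distribution and dispatch scheme described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Greenplum's implementation: the segment count, the host names, the use of `zlib.crc32` as a stand-in hash, the "mirror on the next host" layout, and all function names (`segment_for`, `distribute`, `place_mirrors`, `parallel_sum`) are hypothetical and chosen only for illustration.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical number of segment content identifiers


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment content identifier."""
    # crc32 is only a stand-in for the database's internal hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments=NUM_SEGMENTS):
    """Divide rows among segments by hashing the distribution column."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments


def place_mirrors(hosts, num_segments=NUM_SEGMENTS):
    """Assign each content id a primary host and a different mirror host."""
    if len(hosts) < 2:
        raise ValueError("mirroring needs at least two hosts")
    layout = {}
    for content_id in range(num_segments):
        primary = hosts[content_id % len(hosts)]
        mirror = hosts[(content_id + 1) % len(hosts)]  # assumed layout
        layout[content_id] = (primary, mirror)
    return layout


def parallel_sum(segments, value_column):
    """Each segment computes a partial sum; the master combines them."""
    partials = {
        seg_id: sum(row[value_column] for row in seg_rows)
        for seg_id, seg_rows in segments.items()
    }
    return sum(partials.values()), partials


rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 101)]
segments = distribute(rows, "customer_id")
layout = place_mirrors(["sdw1", "sdw2", "sdw3"])
total, partials = parallel_sum(segments, "amount")
```

In this toy model, `total` equals the sum a single-node scan would produce, while each entry in `partials` is the work done by one segment. A distribution key with many distinct values tends to spread rows evenly, which is why the choice of key matters for parallelism.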

Bulk loading and unloading are also supported directly on the segment nodes, bypassing the master nodes. The segments can read and write external data on ETL nodes, in flat files, or in HDFS file systems residing outside of the Greenplum cluster. Greenplum is known for fast parallel data loading and unloading, as well as fast internal data transfer for operations such as CTAS (Create Table As Select).
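The idea of loading in parallel, with each segment pulling its own share of an external source rather than routing everything through the master, can be sketched as follows. This is a toy model only: it does not reproduce Greenplum's actual loading protocol, and `load_chunk`, `parallel_load`, the round-robin split, and the sample data are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def load_chunk(segment_id, lines):
    """Hypothetical per-segment loader: parse the lines given to this segment."""
    return segment_id, [tuple(line.split(",")) for line in lines]


def parallel_load(lines, num_segments):
    """Split an external source among segments and load all parts concurrently."""
    # Round-robin split: segment i takes lines i, i + n, i + 2n, ...
    chunks = [lines[i::num_segments] for i in range(num_segments)]
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        results = pool.map(load_chunk, range(num_segments), chunks)
        return dict(results)


# Stand-in for a flat file residing outside the cluster.
external_file = [f"{i},item-{i}" for i in range(10)]
loaded = parallel_load(external_file, num_segments=3)
```

Because no single process handles every row, adding workers increases load throughput in this model until the external source itself becomes the bottleneck.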

Greenplum supports the ACID principles of transaction management for concurrent data access and modification, which allows it to serve as a system-of-record database, but it is optimized for analytical workloads rather than OLTP workloads. SQL, following the SQL:2003 standard, is the interface to the data in Greenplum. User-defined functions can be written in languages such as Python, R, Perl, Java, C, or PL/pgSQL and called from within a SQL query.

In February 2015, Pivotal announced that Greenplum Database would become open source software by the end of 2015.

Competition

The primary competitors of Pivotal Greenplum Database are the other MPP database systems offered by major industry vendors, such as Teradata, Amazon Redshift, Azure Data Warehouse, and IBM Netezza. Additional competition comes from smaller competitors, from column-oriented databases such as HP Vertica, from data warehousing products with non-MPP architectures such as Oracle Exadata and IBM DB2, and from Hadoop distributions such as Cloudera and Hortonworks.

References

Pivotal Greenplum Database Wikipedia