Supriya Ghosh (Editor)

EMC ScaleIO

Developer(s): EMC Corporation
Platform: IA-32, AMD64
Development status: Active
Type: System software
Original author(s): ScaleIO (Israeli start-up)
Operating system: Microsoft Windows (2008 R2, 2012 or 2012 R2), Red Hat Enterprise Linux or CentOS (6 or 7), SUSE Linux 12, Ubuntu 14.04

EMC ScaleIO is a software-defined storage product from EMC Corporation that creates a server-based storage area network (SAN) from local application server storage, converting direct-attached storage into shared block storage. It uses existing host-based internal storage to create a scalable, high-performance, low-cost server SAN. EMC promotes its ScaleIO server storage-area network software as a way to converge computing resources and commodity storage into a “single-layer architecture.”


ScaleIO can scale from three compute/storage nodes to over 1,000 nodes that can drive up to 240 million IOPS. Developers can deploy the ScaleIO software on on-premises commodity infrastructure or in the cloud and then port their applications to a production ScaleIO instance. As of September 2015, ScaleIO is also available from EMC bundled on EMC commodity servers (officially called EMC ScaleIO Node).

ScaleIO can be deployed as storage only or as a converged infrastructure combining storage, computational and networking resources into a single block. Capacity and performance of all available resources are aggregated and made available to every participating ScaleIO server and application. Storage tiers can be created from media types and drive types whose performance or capacity characteristics best suit the application's needs. ScaleIO is available free of charge for testing (with community support) or as a paid, EMC-supported option.

History

ScaleIO was founded in 2011 by Boaz Palgi, Erez Webman, Lior Bahat, Eran Borovik, and Erez Ungar in Israel. The software was designed for high performance and large systems. The company was backed by venture capital firms including Greylock Partners and Norwest Venture Partners. A product was announced in November 2012.

EMC Corporation bought ScaleIO in June 2013 for about $200 million, only about six months after the company emerged from stealth mode. EMC began promoting ScaleIO in 2014 and 2015, marketing it in competition with EMC’s own data storage arrays. Also in 2015, EMC introduced a model of its VCE hyper-converged infrastructure hardware that supported ScaleIO storage.

At its 2015 trade show, EMC announced that ScaleIO would be made freely available to developers for testing. By May 2015, developers could download the ScaleIO software. In September 2015, EMC announced the availability of the previously software-only ScaleIO pre-bundled on EMC commodity hardware, called EMC ScaleIO Node.

Architecture

EMC ScaleIO uses storage and compute resources of commodity hardware. It combines HDDs, SSDs, and PCIe flash cards to create a virtual pool of block storage with varying performance tiers. It features on-demand performance and storage scalability, as well as enterprise-grade data protection, multi-tenant capabilities, and add-on enterprise features such as QoS, thin provisioning and snapshots. ScaleIO operates on multiple hardware platforms and supports physical and/or virtual application servers.

ScaleIO works by installing software components on application hosts. Application hosts contribute internal disks and any other direct attached storage resources to the ScaleIO cluster by installing the SDS software. Hosts can then be presented volumes from the ScaleIO cluster by leveraging the SDC software. These components can run alongside other applications on any server (physical, virtual, or cloud) using any type of storage media (disk drives, flash drives, PCIe flash cards, or cloud storage).

The ScaleIO architecture is built on two components: a data client and a data server. The ScaleIO Data Client (SDC) is a lightweight device driver installed on each host whose applications or file systems require access to the ScaleIO virtual SAN block devices. The SDC exposes block devices representing the ScaleIO volumes that are currently mapped to that host. Each SDC keeps only a small in-memory map and can track the mapping of petabytes of data with just megabytes of RAM. The inter-node protocol used by SDCs is simpler than iSCSI and uses fewer network resources.
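To illustrate how a client-side map can cover petabytes with only megabytes of RAM, here is a minimal sketch. It assumes, purely for illustration, that a volume is split into fixed-size chunks and that the client keeps one small entry per chunk naming the data server that holds it. The 1 GiB chunk size, the 4-byte entry size, and the names `VolumeMap`, `locate`, and `map_entries` are hypothetical and not taken from ScaleIO.

```python
from dataclasses import dataclass

CHUNK_SIZE = 1 << 30  # 1 GiB per chunk: an illustrative assumption, not a ScaleIO constant


@dataclass
class VolumeMap:
    """Hypothetical client-side map: one owner entry per fixed-size chunk."""
    owners: list  # owners[i] is the ID of the data server holding chunk i

    def locate(self, byte_offset: int):
        """Translate a volume byte offset into (data server ID, offset inside the chunk)."""
        chunk_index, offset_in_chunk = divmod(byte_offset, CHUNK_SIZE)
        return self.owners[chunk_index], offset_in_chunk


def map_entries(volume_bytes: int) -> int:
    """Number of map entries needed to cover a volume of the given size."""
    return -(-volume_bytes // CHUNK_SIZE)  # ceiling division


# A 4 GiB volume whose chunks live on three hypothetical data servers.
vmap = VolumeMap(owners=["sds-a", "sds-b", "sds-c", "sds-a"])
server, offset = vmap.locate(2 * CHUNK_SIZE + 4096)

# Under these assumptions, 1 PiB of volumes needs 2**20 entries;
# at an assumed 4 bytes per entry the whole map occupies 4 MiB.
entries_for_one_pib = map_entries(1 << 50)
map_bytes = entries_for_one_pib * 4
```

The point of the sketch is the ratio: the map grows with the number of chunks, not with the number of bytes stored, so a coarse chunk size keeps the client's memory footprint small.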

The ScaleIO Data Server (SDS) is situated in each host and contributes local storage to the central ScaleIO virtual SAN. Each node is part of a loosely coupled cluster.

Performance is expected to increase as servers and storage devices are added to the cluster. Additional storage and compute resources (i.e., additional servers and drives) can be added modularly. Every server in the ScaleIO cluster is used in the processing of I/O operations, making all I/O and throughput accessible to any application within the cluster. Any needed rebuilds and rebalances are processed in the background. Workloads are evenly shared with a parallel I/O architecture.

ScaleIO can be deployed either as a “two-layer” multi-server cluster, in which the application and storage are installed on separate servers, or as a “hyper-converged” option, in which the application and storage are installed on the same servers in the ScaleIO cluster, creating a low-footprint, low-cost, scalable single-layer architecture.
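The difference between the two deployment options comes down to which hosts run the SDC and which run the SDS. The sketch below models that with a hypothetical `Host` record and a `deployment_mode` helper. Neither name comes from ScaleIO, and the "mixed" label for clusters that fit neither pattern is an assumption added for completeness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of one server in the cluster."""
    name: str
    runs_sdc: bool  # consumes ScaleIO volumes (data client)
    runs_sds: bool  # contributes local storage (data server)


def deployment_mode(hosts):
    """Classify a cluster layout from the roles its hosts play."""
    if all(h.runs_sdc and h.runs_sds for h in hosts):
        return "hyper-converged"  # every host is both client and server
    clients = {h for h in hosts if h.runs_sdc}
    servers = {h for h in hosts if h.runs_sds}
    if clients and servers and not clients & servers:
        return "two-layer"  # application hosts and storage hosts are disjoint
    return "mixed"  # assumption: label for layouts that fit neither pattern


two_layer = [
    Host("app-1", runs_sdc=True, runs_sds=False),
    Host("app-2", runs_sdc=True, runs_sds=False),
    Host("stor-1", runs_sdc=False, runs_sds=True),
    Host("stor-2", runs_sdc=False, runs_sds=True),
    Host("stor-3", runs_sdc=False, runs_sds=True),
]
hyper_converged = [Host(f"node-{i}", runs_sdc=True, runs_sds=True) for i in range(1, 4)]

mode_a = deployment_mode(two_layer)
mode_b = deployment_mode(hyper_converged)
```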

Storage and compute resources can be added to or removed from the ScaleIO cluster as needed, with no downtime and minimal impact on application performance. The self-healing, auto-balancing capability of the ScaleIO cluster ensures that data is automatically rebuilt and rebalanced across resources when components are added, removed, or fail. Because every server and local storage device in the cluster is used in parallel to process I/O operations and protect data, system performance scales linearly as additional servers and storage devices are added to the configuration.
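As a rough illustration of rebalancing after a node is added, the sketch below uses a simple greedy strategy: repeatedly move one chunk from the fullest node to the emptiest until the load is even. This greedy loop is a stand-in chosen for clarity; the source does not describe ScaleIO's actual rebalancing algorithm, and the function name `rebalance` and the node names are hypothetical.

```python
def rebalance(load):
    """Greedy rebalance: move one chunk at a time from the fullest node to the emptiest.

    `load` maps node name -> number of chunks stored there.
    Returns (new_load, moves), where each move is a (source, destination) pair.
    """
    load = dict(load)  # work on a copy so the caller's data is untouched
    moves = []
    while True:
        fullest = max(load, key=load.get)
        emptiest = min(load, key=load.get)
        if load[fullest] - load[emptiest] <= 1:
            return load, moves  # as even as whole chunks allow
        load[fullest] -= 1
        load[emptiest] += 1
        moves.append((fullest, emptiest))


# Three nodes hold 40 chunks each; a fourth, empty node joins the cluster.
before = {"node-1": 40, "node-2": 40, "node-3": 40, "node-4": 0}
after, moves = rebalance(before)
```

In this example every existing node gives up ten chunks to the newcomer, so the extra capacity and the extra I/O capability are shared across the whole cluster rather than left idle on one server.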

ScaleIO software takes each data chunk to be written and spreads it across many nodes, mirroring it as well. This makes rebuilds after a disk loss very fast, because several nodes each contribute a smaller, faster rebuild effort in parallel. ScaleIO supports the VMware, Hyper-V, Xen and KVM hypervisors. It also supports OpenStack, Windows, Red Hat, SLES, CentOS, and CoreOS (Docker). Any application needing block storage can use it, including Oracle and other databases. While it is not as closely integrated with VMware as Virtual SAN, the SDC functionality has moved into the VMware kernel.
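The effect of spreading mirrored chunks on rebuild speed can be seen in a small simulation. The sketch below places two copies of every chunk on different nodes using a rotating offset, then counts which surviving nodes must supply data when one node fails. The rotation scheme and the names `place_chunks` and `rebuild_sources` are assumptions made for illustration, not ScaleIO's actual placement logic.

```python
from collections import Counter


def place_chunks(num_chunks, num_nodes):
    """Return {chunk: (primary, mirror)} with the two copies on different nodes."""
    placement = {}
    for chunk in range(num_chunks):
        primary = chunk % num_nodes
        # Rotate the mirror offset so one node's mirrors land on all other nodes.
        shift = 1 + (chunk // num_nodes) % (num_nodes - 1)
        mirror = (primary + shift) % num_nodes
        placement[chunk] = (primary, mirror)
    return placement


def rebuild_sources(placement, failed_node):
    """Count, per surviving node, how many chunks it must re-copy after a failure."""
    sources = Counter()
    for primary, mirror in placement.values():
        if primary == failed_node:
            sources[mirror] += 1   # the mirror copy survives
        elif mirror == failed_node:
            sources[primary] += 1  # the primary copy survives
    return sources


placement = place_chunks(num_chunks=60, num_nodes=6)
work = rebuild_sources(placement, failed_node=0)
```

With 60 chunks on six nodes, the failed node held 20 copies, and each of the five survivors re-copies four of them. No single node carries the whole rebuild, which is why the rebuild time shrinks as the cluster grows.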

Promotional reports were commissioned from the Enterprise Strategy Group in 2014 and 2015 including performance measurements.
