Samiksha Jaiswal (Editor)

Azure SQL Data Warehouse

Updated on
Edit
Like
Comment
Share on FacebookTweet on TwitterShare on LinkedInShare on Reddit

Azure SQL Data Warehouse is a cloud-based data warehouse-as-a-service hosted within Microsoft’s Azure platform. It has a massively parallel processing (MPP) shared nothing architecture capable of distributing query computation over a set of compute nodes running Azure SQL Database and uses Azure Storage Blobs as the underlying data storage.

Contents

By combining MPP architecture and Azure storage capabilities, SQL Data Warehouse can:

  • Grow or shrink storage independent of compute
  • Grow or shrink compute without moving data
  • Pause compute capacity while keeping data intact
  • Resume compute capacity at a moment's notice
  • It is designed to handle computational and data intensive queries on datasets exceeding 1TB. This technology is based on Microsoft’s on premise data warehousing product, Parallel Data Warehouse (PDW), which runs on the Analytics Platform System (APS).

    Azure SQL Data Warehouse decouples compute and storage enabling compute power to be adjusted independently of storage based on workload requirements any given time.

    Allocation of resources to SQL Data Warehouse is measured in Data Warehouse Units (DWUs). DWUs are a measure of underlying resources like CPU, memory, IOPS, which are allocated to your SQL Data Warehouse. Increasing the number of DWUs increases resources and performance. Specifically, DWUs help ensure that:

  • Scale a data warehouse easily, without worrying about the underlying hardware or software.
  • Predict performance improvement for a DWU level before changing the size of a data warehouse.
  • The underlying hardware and software of an instance can change or move without affecting the workload performance.
  • Microsoft can make adjustments to the underlying architecture of the service without affecting the performance of a workload.
  • Microsoft can rapidly improve performance in SQL Data Warehouse, in a way that is scalable and evenly effects the system.
  • This PaaS also features PolyBase which allows queries to span across both structured and unstructured analytical workloads using Transact-SQL(TSQL). PolyBase is also the optimal method to load data into SQL Data Warehouse as it does so in parallel across the compute nodes.

    All data is written geo-redundantly across regions for high availability and disaster recovery. Scaling computation does not require reorganization of the underlying data which allows this operation to take only the time needed to allocate new compute nodes. This data warehousing service also includes the ability to pause a data warehouse instance which also pauses the billing of the compute nodes.

    Data Tools

    SQL Data Warehouse integrates with data tools such as Azure Machine Learning for advanced analytics, Azure Data Factory for data orchestration, Azure Stream Analytics for real-time analytics, and Power BI for data visualization. Standard query tools like SQL Management Studio (SSMS), SQLCMD, and SQL Server Data Tools can also be used to execute queries.

    Use Cases

  • Disaster Recovery
  • Stream Analytics and Machine Learning
  • Enterprise-wide views and Analysis
  • History

    SQL Data Warehouse is generally available as of July 12, 2016. It was in limited public preview on June 25, 2015.

    Pricing

    Azure SQL Data Warehouse, is an elastic cloud data warehousing service that allows pausing and scaling on demand. It creates a SQL-based view across all your data which enables business insights. Pricing in January 2017 is approximately 1/10th of the cost of traditional appliance solutions.

    References

    Azure SQL Data Warehouse Wikipedia