Suvarna Garge (Editor)

POWER9

Produced: 2017
Common manufacturer(s): GlobalFoundries
Min. feature size: 14 nm (FinFET)
Designed by: IBM
Max. CPU clock rate: 4 GHz
Instruction set: Power Architecture (Power ISA v.3.0)

POWER9 is a family of superscalar, symmetric multiprocessors based on the Power Architecture, announced in August 2016 at the Hot Chips conference. POWER9-based processors will be manufactured using a 14 nm FinFET process and will come in at least four versions: 12- and 24-core versions for scale-out applications and 12- and 24-core versions for scale-up applications. More versions are possible, since the POWER9 architecture is open for licensing and modification by OpenPOWER Foundation members.


Systems using POWER9 are slated to be available in 2017.

Variants

POWER9 devices will target two different markets, scale-out and scale-up. Each variant comes in 12- and 24-core versions:

  • IBM POWER9 SO – scale-out version, optimized for dual socket computers with up to 120 GB/s bandwidth to directly attached DDR4 memory (targeted for release in 2017)
  • IBM POWER9 SU – scale-up version, optimized for four sockets or more, for large NUMA machines with up to 230 GB/s bandwidth to buffered memory
Design

The POWER9 core comes in two variants: a four-way multithreaded core called SMT4 and an eight-way multithreaded core called SMT8. The SMT4 and SMT8 cores are quite similar, in that both consist of a number of so-called slices fed by common schedulers. A slice is a rudimentary 64-bit single-threaded processing core with a load-store unit (LSU), an integer unit (ALU) and a vector scalar unit (VSU, handling SIMD and floating point). A super-slice is the combination of two slices. An SMT4 core consists of a 32 KB L1 instruction cache, a 32 KB L1 data cache, an instruction fetch unit (IFU) and an instruction sequencing unit (ISU), which feeds two super-slices. An SMT8 core has two sets of L1 caches, IFUs and ISUs to feed four super-slices. The result is that the 12-core and 24-core versions of POWER9 each consist of the same number of slices, 96 each, and the same amount of L1 cache.
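The slice arithmetic behind that last claim can be checked with a short sketch. It assumes, as the figure of 96 slices implies but the text does not state outright, that the 24-core chips are built from SMT4 cores and the 12-core chips from SMT8 cores. The class and function names (`CoreType`, `chip_totals`) are illustrative only and do not come from any IBM tooling.

```python
from dataclasses import dataclass

SLICES_PER_SUPER_SLICE = 2      # a super-slice combines two slices
L1_KB_PER_SET = 32 + 32         # one L1 set: 32 KB instruction + 32 KB data


@dataclass(frozen=True)
class CoreType:
    """Hypothetical description of a POWER9 core variant."""
    name: str
    super_slices: int   # super-slices fed by the core's ISU(s)
    l1_sets: int        # sets of L1 caches (one per IFU/ISU pair)
    threads: int        # hardware threads per core

    @property
    def slices(self) -> int:
        return self.super_slices * SLICES_PER_SUPER_SLICE


SMT4 = CoreType("SMT4", super_slices=2, l1_sets=1, threads=4)
SMT8 = CoreType("SMT8", super_slices=4, l1_sets=2, threads=8)


def chip_totals(core: CoreType, cores: int) -> dict:
    """Total slices, hardware threads and L1 cache (KB) for a chip."""
    return {
        "slices": cores * core.slices,
        "threads": cores * core.threads,
        "l1_kb": cores * core.l1_sets * L1_KB_PER_SET,
    }


if __name__ == "__main__":
    # Assumed pairing: 24 x SMT4 and 12 x SMT8.
    for core, cores in ((SMT4, 24), (SMT8, 12)):
        print(cores, core.name, chip_totals(core, cores))
```

Under that assumed pairing, both configurations come out with identical totals, which is why halving the core count while doubling the core width leaves the slice count and L1 capacity unchanged.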

A POWER9 core, whether SMT4 or SMT8, has a 12-stage pipeline (five stages shorter than that of its predecessor, the POWER8) but aims to retain a clock frequency of around 4 GHz. It will be the first to incorporate elements of the Power ISA v.3.0, which was released in December 2015, including the VSX-3 instructions. The POWER9 design is modular, so that it can be used in more processor variants and licensed for manufacture on fabrication processes other than IBM's. On chip are co-processors for compression and cryptography, as well as a large low-latency eDRAM L3 cache.

I/O

A number of on-chip facilities support high off-chip I/O performance:

  • The SO version has integrated DDR4 controllers for directly attached RAM, while the SU will use the off-chip Centaur architecture introduced with POWER8 to include high performance eDRAM L4 cache and memory controllers for DDR4 RAM.
  • The Bluelink facility, running NVLink v.2 interconnects for close attachment of Nvidia graphics co-processors and OpenCAPI accelerators.
  • General purpose PCIe v.4 connections for attaching regular ASICs, FPGAs and other peripherals as well as CAPI 2.0 and CAPI 1.0 devices designed for POWER8.
  • Multiprocessor (symmetric multiprocessor system) links to connect other POWER9 processors on the same motherboard or in other closely attached enclosures.
Supercomputers

The United States Department of Energy together with Oak Ridge National Laboratory and Lawrence Livermore National Laboratory have contracted IBM and Nvidia to build two supercomputers, the Summit and the Sierra, that will be based on POWER9 processors coupled with Nvidia's Volta GPUs. These systems are slated to go online in 2017.
