GeForce 700 series

Updated on Dec 17, 2024


Release date: May 2013
Architecture: Kepler
Codenames: GK110, GK208
Models: GeForce Series, GeForce GT Series, GeForce GTX Series
Fabrication process and transistors: 585M 28 nm (GF117), 1,020M 28 nm (GK208), 1,270M 28 nm (GK107), 3,540M 28 nm (GK104), 7,080M 28 nm (GK110)
Entry-level: GeForce GT 705, GeForce GT 710, GeForce GT 720, GeForce GT 730, GeForce GT 740, GeForce GTX 745

The GeForce 700 series is a family of graphics processing units developed by Nvidia for desktop and laptop PCs. It is mainly based on a refresh of the Kepler microarchitecture (GK-codenamed chips) used in the previous GeForce 600 series, but it also includes cards based on the earlier Fermi (GF) and later Maxwell (GM) architectures. A number of GeForce 700 series chips were released for mobile devices in April 2013. The first GeForce 700 series cards were released in 2013, starting with the GeForce GTX Titan on February 19, 2013, followed by the GeForce GTX 780 on May 23, 2013.

Overview

GK110 was designed and marketed with compute performance in mind. It contains 7.1 billion transistors. The design also aims to maximise energy efficiency by executing as many tasks in parallel as the capabilities of its streaming multiprocessors allow.

With GK110, both the register file and the L2 cache gain capacity and bandwidth over previous models. At the SMX level, GK110's register file has grown to 256 KB, made up of 65,536 32-bit registers, compared with 32,768 32-bit registers totaling 128 KB on Fermi. GK110's L2 cache has grown to up to 1.5 MB, twice the size of GF110's. The bandwidth of both the L2 cache and the register file has also doubled. Performance in register-starved scenarios improves as well, because more registers are available to each thread. This goes hand in hand with an increase in the number of registers each thread can address, from 63 registers per thread to 255 with GK110.
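The capacity figures above follow from simple arithmetic. The Python sketch below reproduces them; register_file_kib is a hypothetical helper introduced only for this illustration, and 1.5 MB is taken to mean 1.5 × 1024 KB.

```python
# Capacity arithmetic for the GK110 vs Fermi comparison (illustrative sketch).

def register_file_kib(num_registers: int, bits_per_register: int = 32) -> float:
    """Capacity in KiB of a register file holding num_registers registers."""
    return num_registers * bits_per_register / 8 / 1024

gk110_smx = register_file_kib(65_536)  # 65,536 x 32-bit registers per GK110 SMX
fermi_sm = register_file_kib(32_768)   # 32,768 x 32-bit registers per Fermi SM

# L2 cache: GK110 has up to 1.5 MB, described as twice the size of GF110's.
gk110_l2_kib = 1.5 * 1024
gf110_l2_kib = gk110_l2_kib / 2

print(gk110_smx, fermi_sm, gk110_smx / fermi_sm)  # 256.0 128.0 2.0
print(gk110_l2_kib, gf110_l2_kib)                 # 1536.0 768.0
```

Both comparisons come out to the same factor of two, which matches the doubling in bandwidth described above.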

With GK110, Nvidia also reworked the texture cache so that it can be used for compute. In compute workloads, the 48 KB texture cache acts as a read-only cache that is well suited to workloads with unaligned memory accesses. Error detection has also been added, making it safer for workloads that rely on ECC.

This series supports the DirectX 12 API.

Dynamic Super Resolution (DSR) was added to Kepler GPUs in later Nvidia drivers.

Architecture

The GeForce 700 series contains features from both GK104 and GK110. Kepler-based members of the 700 series add the following standard features to the GeForce family.

Derived from GK104:

PCI Express 3.0 interface

DisplayPort 1.2

HDMI 1.4a 4K x 2K video output

PureVideo VP5 hardware video acceleration (up to 4K x 2K H.264 decode)

Hardware H.264 encoding acceleration block (NVENC)

Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)

Bindless Textures

GPU Boost

TXAA

Manufactured by TSMC on a 28 nm process

New features from GK110:

Compute Focus SMX Improvement

CUDA Compute Capability 3.5

New Shuffle Instructions

Dynamic Parallelism

Hyper-Q (Hyper-Q's MPI functionality reserved for Tesla only)

Grid Management Unit

NVIDIA GPUDirect (GPUDirect's RDMA functionality reserved for Tesla only)

Compute focus SMX improvement

With GK110, Nvidia set out to increase compute performance. The single biggest change from GK104 is that each SMX has 64 dedicated FP64 CUDA cores rather than 8, giving it 8x the FP64 throughput of a GK104 SMX. The SMX also gets more register file space: 256 KB, twice that of a Fermi SM. The texture cache is improved as well; with 48 KB of space, it can serve as a read-only cache for compute workloads.
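The 8x figure, and the 1/3 and 1/24 double-precision ratios given in the footnotes of the desktop table, can be derived per SMX. The sketch below assumes that every Kepler SMX has 192 FP32 CUDA cores and that each FP32 core and each FP64 unit completes one fused multiply-add per clock; fp64_ratio is a hypothetical helper.

```python
from fractions import Fraction

# Assumption: every Kepler SMX has 192 single-precision (FP32) CUDA cores,
# and each FP32 core and FP64 unit completes one fused multiply-add per clock.
FP32_CORES_PER_SMX = 192

def fp64_ratio(fp64_units_per_smx: int) -> Fraction:
    """Peak FP64 throughput as a fraction of peak FP32 throughput for one SMX."""
    return Fraction(fp64_units_per_smx, FP32_CORES_PER_SMX)

gk104 = fp64_ratio(8)    # GK104 SMX: 8 FP64 units
gk110 = fp64_ratio(64)   # GK110 SMX: 64 FP64 units
print(gk104, gk110, gk110 / gk104)  # 1/24 1/3 8
```

The hardware ratio is not the whole story on GeForce cards: per the footnotes, only the GTX Titan and GTX Titan Black can run at the 1/3 rate, while other Kepler chips are fixed at 1/24.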

New shuffle instructions

At a low level, GK110 adds new instructions and operations to improve performance further. New shuffle instructions let threads within a warp share data without going back to memory, which is much faster than the previous load/share/store method. Atomic operations have also been overhauled: they execute faster, and 64-bit versions of some atomic operations that were previously available only for 32-bit data have been added.
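The following Python sketch models what a shuffle does at the data level, using a warp-wide sum as the example. It is a CPU-side simulation, not GPU code: each list element stands for one lane's register, and shfl_down and warp_reduce_sum are hypothetical helpers. Their out-of-range behaviour is modeled on CUDA's __shfl_down_sync intrinsic (spelled __shfl_down in CUDA releases from the Kepler era).

```python
WARP_SIZE = 32

def shfl_down(values, offset):
    """Model of a warp-wide shuffle-down: lane i reads lane i + offset.

    Lanes whose source would fall outside the warp keep their own value,
    mirroring CUDA's __shfl_down_sync. No memory is touched; values move
    directly between lanes.
    """
    return [values[i + offset] if i + offset < len(values) else values[i]
            for i in range(len(values))]

def warp_reduce_sum(values):
    """Tree reduction using only lane-to-lane shuffles (no shared memory)."""
    assert len(values) == WARP_SIZE
    vals = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:               # 5 steps for 32 lanes: 16, 8, 4, 2, 1
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]                  # lane 0 ends up holding the full sum

print(warp_reduce_sum(list(range(32))))  # 496
```

Each step halves the number of lanes holding useful partial sums, so a 32-lane reduction takes five shuffles instead of a round trip through shared memory per step.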

Hyper-Q

Hyper-Q increases the number of GK110 hardware work queues from 1 to 32. This matters because, with a single work queue, Fermi could at times be underoccupied when there was not enough work in that queue to fill every SM. With 32 work queues, GK110 can in many scenarios achieve higher utilization by placing different task streams on what would otherwise be an idle SMX. Hyper-Q also maps easily to MPI (Message Passing Interface), a message-passing standard frequently used in HPC. Legacy MPI-based algorithms that were originally designed for multi-CPU systems and became bottlenecked by false dependencies now have a solution: by increasing the number of MPI jobs, these algorithms can use Hyper-Q to improve efficiency without any change to the code itself.
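To make the false-dependency argument concrete, here is a toy scheduling model in Python. It is a sketch under stated assumptions, not a description of Nvidia's actual scheduler: every kernel takes one time unit, the GPU has room for all kernels at once, kernels within a stream depend on their predecessor, streams are mapped round-robin onto hardware queues, and each queue issues strictly in order. The function makespan and its parameters are hypothetical.

```python
def makespan(num_streams: int, depth: int, num_queues: int) -> int:
    """Toy model: time units to finish num_streams streams of `depth` dependent kernels.

    Hypothetical assumptions, for illustration only:
    - every kernel takes one time unit and the GPU can run all of them at once;
    - each kernel depends on the previous kernel in the same stream;
    - streams are submitted one after another, mapped round-robin onto queues;
    - each hardware queue issues in order, so a kernel waiting on its
      dependency also blocks every kernel queued behind it (false dependency).
    """
    last_issue = [0] * num_queues  # issue time of the latest kernel on each queue
    end = 0
    for stream in range(num_streams):
        q = stream % num_queues
        ready = 0  # finish time of this stream's previous kernel
        for _ in range(depth):
            start = max(last_issue[q], ready)  # in-order issue and dependency
            last_issue[q] = start
            ready = start + 1
            end = max(end, ready)
    return end

print(makespan(4, 3, num_queues=1))   # one queue, Fermi-like: 9
print(makespan(4, 3, num_queues=32))  # 32 queues, Hyper-Q-like: 3
```

Under this model, a single in-order queue needs N(D - 1) + 1 time units for N streams of depth D, because each stream's dependent kernels hold up the next stream, while one queue per stream needs only D.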

Microsoft DirectX support

NVIDIA Kepler GPUs of the GeForce 700 series fully support DirectX 11.0.

Nvidia announced that it would partially support the DirectX 12 API on all the DirectX 11-class GPUs it had shipped; these belong to the Fermi, Kepler and Maxwell architectural families.

Dynamic parallelism

Dynamic Parallelism is the ability of kernels to dispatch other kernels. With Fermi, only the CPU could dispatch a kernel, which incurs a certain amount of overhead from having to communicate back to the CPU. By letting kernels dispatch their own child kernels, GK110 saves time by not having to go back to the CPU, and in the process frees up the CPU to work on other tasks.

GeForce 700 (7xx) series

This section covers the desktop models of the GeForce 700 series. Cheaper and lower-performing products were expected to be released over time. Kepler supports DirectX 11.1 features at feature level 11_0 through the DirectX 11.1 API; however, Nvidia did not enable in Kepler hardware four non-gaming features required for feature level 11_1.

¹ Shader Processors : Texture mapping units : Render output units

² Pixel fillrate is calculated as the number of ROPs multiplied by the base core clock speed.

³ Texture fillrate is calculated as the number of TMUs multiplied by the base core clock speed.

⁴ Single precision performance is calculated as 2 times the number of shaders multiplied by the base core clock speed.

⁵ The double-precision performance of the GTX Titan and GTX Titan Black is either 1/3 or 1/24 of single-precision performance, depending on a user-selected option in the driver; setting double precision to 1/24 of single-precision performance allows single-precision performance to be boosted. Other Kepler chips' double-precision performance is fixed at 1/24 of single-precision performance. The double-precision performance of GeForce 700 series Maxwell chips is 1/32 of single-precision performance.

⁶ SLI supports connecting up to 4 identical graphics cards in a 4-way SLI configuration. Cards that support 4-way SLI also support 3-way and 2-way SLI. However, a dual-GPU card already implements 2-way SLI internally, so only 2 dual-GPU cards can be used together in SLI, giving a 4-way SLI configuration.
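As a worked example of footnotes 2 to 5, the formulas can be expressed in Python. The card below is hypothetical: its shader, TMU, ROP and clock figures are round numbers chosen only to make the arithmetic easy to follow, not the specifications of any GeForce 700 model. The function names are likewise illustrative.

```python
# Spec-table formulas from footnotes 2-5, applied to a HYPOTHETICAL card.

def pixel_fillrate_gpixels(rops: int, base_clock_mhz: float) -> float:
    """Footnote 2: ROPs x base core clock (MPixel/s converted to GPixel/s)."""
    return rops * base_clock_mhz / 1000

def texture_fillrate_gtexels(tmus: int, base_clock_mhz: float) -> float:
    """Footnote 3: TMUs x base core clock (MTexel/s converted to GTexel/s)."""
    return tmus * base_clock_mhz / 1000

def single_precision_gflops(shaders: int, base_clock_mhz: float) -> float:
    """Footnote 4: 2 x shaders x base core clock (a fused multiply-add counts as 2 FLOPs)."""
    return 2 * shaders * base_clock_mhz / 1000

def double_precision_gflops(sp_gflops: float, divisor: int) -> float:
    """Footnote 5: FP64 rate is 1/divisor of FP32 (3 or 24 on Titan, 24 on other Kepler, 32 on Maxwell)."""
    return sp_gflops / divisor

# Hypothetical configuration, not a real product.
shaders, tmus, rops, clock_mhz = 1536, 128, 32, 1000

sp = single_precision_gflops(shaders, clock_mhz)
print(pixel_fillrate_gpixels(rops, clock_mhz))    # 32.0 GPixel/s
print(texture_fillrate_gtexels(tmus, clock_mhz))  # 128.0 GTexel/s
print(sp)                                         # 3072.0 GFLOPS
print(double_precision_gflops(sp, 24))            # 128.0 GFLOPS
```

All of these figures use the base clock, as the footnotes specify, so they understate what a card achieves while GPU Boost raises the clock.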

GeForce 700M (7xxM) series

Some implementations may use different specifications.

¹ Unified shaders : Texture mapping units : Render output units

References

GeForce 700 series Wikipedia

(Text) CC BY-SA
