Neha Patil

ARM big.LITTLE

Updated on
Share on FacebookTweet on TwitterShare on LinkedIn
ARM big.LITTLE

ARM big.LITTLE is a heterogeneous computing architecture developed by ARM Holdings, coupling relatively battery-saving and slower processor cores (LITTLE) with relatively more powerful and power-hungry ones (big). Typically, only one "side" or the other will be active at once, but since all the cores have access to the same memory regions, workloads can be swapped between Big and Little cores on the fly. The intention is to create a multi-core processor that can adjust better to dynamic computing needs and use less power than clock scaling alone. ARM's marketing material promises up to a 75% savings in power usage for some activities.

Contents

In October 2011, Big.Little was announced along with the Cortex-A7, which was designed to be architecturally compatible with the Cortex-A15. In October 2012 ARM announced the Cortex-A53 and Cortex-A57 (ARMv8-A) cores, which are also compatible with each other to allow their use in a Big.Little chip. ARM later announced the Cortex-A12 at Computex 2013 followed by the Cortex-A17 in February 2014, both can also be paired in a Big.Little configuration with the Cortex-A7.

Run-state migration

There are three ways for the different processor cores to be arranged in a Big.Little design, depending on the scheduler implemented in the kernel.

Clustered switching

The clustered model approach is the first and simplest implementation, arranging the processor into identically-sized clusters of "Big" or "Little" cores. The operating system scheduler can only see one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data is then passed through the common L2 cache, the first core cluster is powered off and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the Samsung Exynos 5 Octa (5410).

In-kernel switcher (CPU migration)

CPU migration via the in-kernel switcher (IKS) involves pairing up a 'Big' core with a 'Little' core, with possibly many identical pairs in one chip. Each pair operates as one virtual core, and only one real core is (fully) powered up and running at a time. The 'Big' core is used when the demand is high and the 'Little' core is employed when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, running state is transferred, the outgoing is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete Big.Little IKS implementation was added in Linux 3.11. Big.Little IKS is an improvement of Cluster Migration, the main difference is that each pair is visible to the scheduler.

The more complex arrangement involves a non-symmetric grouping of 'Big' and 'Little' cores. A single chip could have one or two 'Big' cores and many more 'Little' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their Tegra 3 SoC.

Heterogeneous multi-processing (global task scheduling)

The most powerful use model of Big.Little architecture is heterogeneous multi-processing (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "Big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "Little" cores.

This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430).

Scheduling

The paired arrangement allows for switching to be done transparently to the operating system using the existing dynamic voltage and frequency scaling (DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) will simply see a list of frequencies/voltages and will switch between them as it sees fit, just like it does on the existing hardware. However, the low-end slots will activate the 'Little' core and the high-end slots will activate the 'Big' core.

Alternatively, all the cores may be exposed to the kernel scheduler, which will decide where each process/thread is executed. This will be required for the non-paired arrangement but could possibly also be used on the paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume all cores in a SMP system are equal.

Advantages of global task scheduling

  • Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and power savings can be correspondingly increased.
  • Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework implemented in IKS.
  • The ability to easily support non-symmetrical SoCs (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores).
  • The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.
  • References

    ARM big.LITTLE Wikipedia


    Similar Topics
    Just the Way You Are (2015 film)
    Gary Bennett (referee)
    Olivier Monterrubio
    Topics