Neha Patil (Editor)

Memory ordering

Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler during compile time, or to the memory ordering generated by a CPU during runtime.

In modern microprocessors, memory ordering characterizes the CPU's ability to reorder memory operations; it is a type of out-of-order execution. Memory reordering can be used to fully utilize the bus bandwidth of different types of memory, such as caches and memory banks.

On most modern uniprocessors, memory operations are not executed in the order specified by the program code. In single-threaded programs, all operations appear to have been executed in the order specified, with all out-of-order execution hidden from the programmer. In multi-threaded environments, however (or when interfacing with other hardware via memory buses), the reordering can become visible and lead to problems. Memory barriers can be used in these cases to avoid them.
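As a minimal sketch of the multi-threaded case, the following C11 code publishes a value from one thread to another. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical and chosen for illustration only; the release store and acquire load act as the barriers that keep the data write and the flag write in order.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;         /* ordinary, non-atomic data */
static atomic_bool ready;   /* publication flag, zero-initialized to false */

/* Writer side: fill in the data, then publish it. */
void publish(int value)
{
    payload = value;
    /* Release: the payload write may not be moved after this store. */
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: returns true and copies the data only once it is published. */
bool try_consume(int *out)
{
    /* Acquire: the payload read may not be moved before this load. */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

If both accesses to `ready` used plain variables instead, the compiler or the CPU would be free to make the flag visible before the payload, and a reader on another thread could observe the flag set while still reading a stale payload.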

Compile-time memory ordering

The compiler has some freedom to reorder operations during compile time. However, this can lead to problems if the order of memory accesses matters.

Compile-time memory barrier implementation

These barriers prevent a compiler from reordering instructions during compile time; they do not prevent reordering by the CPU during runtime.

  • The GNU inline assembler statement

    asm volatile("" ::: "memory");

    or even

    __asm__ __volatile__ ("" ::: "memory");

    forbids the GCC compiler from reordering read and write commands around it.

  • The C11/C++11 function

    atomic_signal_fence(memory_order_acq_rel);

    forbids the compiler from reordering read and write commands around it.

  • The Intel C++ Compiler provides a "full compiler fence" intrinsic:

    __memory_barrier()

  • The Microsoft Visual C++ compiler provides:

    _ReadWriteBarrier()
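The compiler-only barriers listed above might be used as in the following sketch. The macro name `COMPILER_BARRIER` and the functions are hypothetical; the example assumes the two sides run on the same CPU (for instance, ordinary code and a signal handler on the same thread), since no hardware barrier instruction is emitted.

```c
#include <stdatomic.h>

/* Hypothetical macro: a barrier for the compiler only, not for the CPU. */
#if defined(__GNUC__)
#  define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#  define COMPILER_BARRIER() atomic_signal_fence(memory_order_acq_rel)
#endif

static int data;            /* plain variable the compiler could otherwise move */
static volatile int flag;   /* set once data is valid */

void write_then_flag(int value)
{
    data = value;
    COMPILER_BARRIER();     /* the data store is not moved below this point */
    flag = 1;
}

int read_if_flagged(int fallback)
{
    if (!flag)
        return fallback;
    COMPILER_BARRIER();     /* the data load is not moved above the flag check */
    return data;
}
```

Because the CPU may still reorder the accesses at runtime, this pattern alone is not sufficient between threads running on different processors.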

In symmetric multiprocessing (SMP) microprocessor systems

There are several memory-consistency models for SMP systems:

  • Sequential consistency (all reads and all writes are in-order)
  • Relaxed consistency (some types of reordering are allowed):
      • Loads can be reordered after loads (for better working of cache coherency, better scaling)
      • Loads can be reordered after stores
      • Stores can be reordered after stores
      • Stores can be reordered after loads
  • Weak consistency (reads and writes are arbitrarily reordered, limited only by explicit memory barriers)

On some CPUs:

  • Atomic operations can be reordered with loads and stores.
  • There can be an incoherent instruction cache pipeline, which prevents self-modifying code from being executed without special instruction cache flush/reload instructions.
  • Dependent loads can be reordered (this is unique to Alpha). If the processor fetches a pointer to some data after this reordering, it might not fetch the data itself but use stale data which it has already cached and not yet invalidated. Allowing this relaxation makes cache hardware simpler and faster but leads to the requirement of memory barriers for readers and writers.
  • Some older x86 and AMD systems have weaker memory ordering.
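The difference between sequential and relaxed consistency can be illustrated with the classic "store buffering" pattern, sketched here with C11 atomics. The function names are hypothetical, and each function is meant to run on a different processor: both store to one variable and then load the other.

```c
#include <stdatomic.h>

static atomic_int x, y;   /* both start at 0 */

/* Meant to run on one processor: store x, then load y. */
int sb_thread0(void)
{
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_seq_cst);
}

/* Meant to run on another processor: store y, then load x. */
int sb_thread1(void)
{
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_seq_cst);
}
```

With `memory_order_seq_cst`, at least one of the two functions returns 1. If the orderings were weakened to `memory_order_relaxed`, a machine that lets a store be delayed past a later load could let both functions return 0.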

SPARC memory ordering modes:

  • SPARC TSO = total store order (default)
  • SPARC RMO = relaxed-memory order (not supported on recent CPUs)
  • SPARC PSO = partial store order (not supported on recent CPUs)

Hardware memory barrier implementation

Many architectures with SMP support have special hardware instructions for flushing reads and writes during runtime.

  • x86, x86-64: lfence (asm), void _mm_lfence(void); sfence (asm), void _mm_sfence(void); mfence (asm), void _mm_mfence(void)
  • PowerPC: sync (asm)
  • MIPS: sync (asm)
  • Itanium: mf (asm)
  • POWER: dcs (asm)
  • ARMv7: dmb (asm), dsb (asm), isb (asm)
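A few of the instructions above could be wrapped in a single helper, as in this sketch. The function name `full_hw_barrier` is hypothetical, a GNU-compatible compiler (GCC or Clang) is assumed for the inline assembler syntax and the predefined architecture macros, and any target not listed falls back to the `__sync_synchronize()` builtin.

```c
/* Hypothetical helper: a full hardware memory barrier chosen by target. */
static inline void full_hw_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#elif defined(__aarch64__) || (defined(__arm__) && __ARM_ARCH >= 7)
    __asm__ __volatile__("dmb ish" ::: "memory");
#elif defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("sync" ::: "memory");
#elif defined(__ia64__)
    __asm__ __volatile__("mf" ::: "memory");
#else
    __sync_synchronize();   /* GCC builtin, used for all other targets */
#endif
}

static int shared_a, shared_b;

/* Store to one location, then load another, with a full barrier between. */
int store_barrier_load(void)
{
    shared_a = 1;
    full_hw_barrier();      /* the store above completes before the load below */
    return shared_b;
}
```

The `"memory"` clobber makes each statement a compiler barrier as well, so neither the compiler nor the CPU moves memory accesses across the call.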

Compiler support for hardware memory barriers

Some compilers support builtins that emit hardware memory barrier instructions:

  • GCC, version 4.4.0 and later, has __sync_synchronize.
  • C11 and C++11 added the atomic_thread_fence() function.
  • The Microsoft Visual C++ compiler has MemoryBarrier().
  • Sun Studio Compiler Suite has __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.
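As a sketch of the portable C11 option, the following code passes a message between two threads using `atomic_thread_fence()` together with relaxed atomic accesses. The names `message`, `flag`, `post_message` and `poll_message` are hypothetical; the compiler is expected to emit whatever hardware barrier the target requires for each fence.

```c
#include <stdatomic.h>

static int message;       /* plain data */
static atomic_int flag;   /* starts at 0 */

/* Sender: write the data, fence, then raise the flag. */
void post_message(int value)
{
    message = value;
    /* Release fence: the write above is ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

/* Receiver: returns 1 and copies the data once the flag is seen. */
int poll_message(int *out)
{
    if (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        return 0;
    /* Acquire fence: the flag load is ordered before the read below. */
    atomic_thread_fence(memory_order_acquire);
    *out = message;
    return 1;
}
```

On a strongly ordered target such as x86, these two fences typically compile to no instruction at all and only constrain the compiler, while weaker architectures receive a real barrier instruction.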