Bit Manipulation Instruction Sets (BMI sets) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. Their purpose is to improve the speed of bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.
Intel published two sets: BMI (here referred to as BMI1) and BMI2; both were introduced with the Haswell microarchitecture. AMD published another two: ABM (Advanced Bit Manipulation, which is also a subset of SSE4a; Intel implements its instructions as part of SSE4.2 and BMI1), and TBM (Trailing Bit Manipulation, introduced with Piledriver-based processors as an extension to BMI1).
In the description of a patch to the GNU binutils package, AMD explicitly revealed that the first iteration of "Zen", its third-generation x86-64 architecture, will not support TBM, XOP and LWP instructions developed specifically for the "Bulldozer" microarchitecture.
ABM (Advanced Bit Manipulation)
ABM is implemented as a single instruction set only by AMD; all AMD processors support both of its instructions (POPCNT and LZCNT) or neither. Intel considers POPCNT part of SSE4.2 and LZCNT part of BMI1. POPCNT has a separate CPUID flag; however, Intel uses AMD's ABM flag to indicate LZCNT support (since LZCNT completes the ABM set).
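The CPUID flags mentioned above can be sketched as a small decoder. This is an illustrative sketch, not vendor code: the struct and function names are hypothetical, and the caller is assumed to have already read the raw register values from CPUID leaf 1, leaf 7 (subleaf 0) and extended leaf 0x80000001 (with GCC or Clang, for example, through `__get_cpuid` and `__get_cpuid_count` from `<cpuid.h>`).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the feature bits discussed in this article. */
struct bitmanip_features {
    bool popcnt; /* CPUID.01H:ECX bit 23                       */
    bool lzcnt;  /* CPUID.80000001H:ECX bit 5 (AMD's ABM flag) */
    bool bmi1;   /* CPUID.(EAX=07H,ECX=0):EBX bit 3            */
    bool bmi2;   /* CPUID.(EAX=07H,ECX=0):EBX bit 8            */
    bool tbm;    /* CPUID.80000001H:ECX bit 21                 */
};

/* Decode feature flags from raw CPUID register values.
   The register values are supplied by the caller, so this function
   is portable and can be exercised with synthetic inputs. */
static struct bitmanip_features
decode_bitmanip_features(uint32_t leaf1_ecx, uint32_t leaf7_ebx,
                         uint32_t ext1_ecx)
{
    struct bitmanip_features f;
    f.popcnt = (leaf1_ecx >> 23) & 1u;
    f.lzcnt  = (ext1_ecx  >> 5)  & 1u;
    f.bmi1   = (leaf7_ebx >> 3)  & 1u;
    f.bmi2   = (leaf7_ebx >> 8)  & 1u;
    f.tbm    = (ext1_ecx  >> 21) & 1u;
    return f;
}
```

Note that POPCNT and LZCNT are reported by different leaves, which is why a program cannot infer one from the other on Intel processors.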
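LZCNT is closely related to the Bit Scan Reverse (BSR) instruction, but it returns the number of leading zero bits rather than the index of the highest set bit, so for a non-zero source the two results add up to the operand size in bits minus one. LZCNT sets the ZF (if the result is zero) and CF (if the source is zero) flags, whereas BSR sets only ZF (if the source is zero). Unlike BSR, LZCNT also produces a defined result (the source operand size in bits) if the source operand is zero.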
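To make the relationship between the two instructions concrete, here is a minimal portable model in C. The function names are hypothetical and the loops are written for clarity rather than speed; real code would normally use a compiler builtin or intrinsic instead. Flag behavior is not modeled, only the result values.

```c
#include <stdint.h>

/* Model of 32-bit LZCNT: number of leading zero bits.
   Defined as 32 (the operand size) when the input is zero. */
static unsigned lzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of 32-bit BSR: index of the highest set bit.
   The real instruction leaves its destination undefined for a zero
   source; this model reports that case through *valid instead. */
static unsigned bsr32_model(uint32_t x, int *valid)
{
    unsigned idx = 31;
    *valid = (x != 0);
    if (x == 0)
        return 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        idx--;
    }
    return idx;
}

/* Model of 32-bit POPCNT: number of set bits.
   Each iteration clears the lowest set bit. */
static unsigned popcnt32_model(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}
```

For any non-zero input, `lzcnt32_model(x) + bsr32_model(x, &valid)` equals 31 in this model.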
BMI1 (Bit Manipulation Instruction Set 1)
BMI1 comprises the instructions enabled by the BMI bit in CPUID. Intel officially considers LZCNT part of BMI, but advertises LZCNT support using the ABM CPUID feature flag. BMI1 is available in AMD's Jaguar, Piledriver and newer processors, and in Intel's Haswell and newer processors.
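As a rough guide to what the BMI1 instructions (ANDN, BEXTR, BLSI, BLSMSK, BLSR and TZCNT) compute, the following sketch gives portable C equivalents for 32-bit operands. The function names are hypothetical, flag results are not modeled, and BEXTR is simplified to take the start position and length as separate parameters rather than packed into a control register.

```c
#include <stdint.h>

/* ANDN: bitwise AND of the inverted first operand with the second. */
static uint32_t andn32_model(uint32_t a, uint32_t b) { return ~a & b; }

/* BEXTR: extract len bits of src starting at bit position start. */
static uint32_t bextr32_model(uint32_t src, unsigned start, unsigned len)
{
    uint32_t shifted;
    if (start >= 32 || len == 0)
        return 0;
    shifted = src >> start;
    if (len >= 32)
        return shifted;
    return shifted & ((1u << len) - 1u);
}

/* BLSI: isolate the lowest set bit. */
static uint32_t blsi32_model(uint32_t x) { return x & (0u - x); }

/* BLSMSK: mask up to and including the lowest set bit. */
static uint32_t blsmsk32_model(uint32_t x) { return x ^ (x - 1u); }

/* BLSR: reset (clear) the lowest set bit. */
static uint32_t blsr32_model(uint32_t x) { return x & (x - 1u); }

/* TZCNT: number of trailing zero bits; 32 when the input is zero. */
static unsigned tzcnt32_model(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}
```

For example, with the input 0x58 (binary 0101 1000), the BLSI model yields 0x08, BLSMSK yields 0x0F, BLSR yields 0x50, and TZCNT yields 3.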
BMI2 (Bit Manipulation Instruction Set 2)
Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors that support BMI1 without BMI2; BMI2 is supported by AMD's newest Excavator architecture.
Parallel bit deposit and extract
The PDEP and PEXT instructions are new generalized bit-level compress and expand instructions. They take two inputs; one is a source, and the other is a selector. The selector is a bitmap selecting the bits that are to be packed or unpacked. PEXT copies selected bits from the source to contiguous low-order bits of the destination; higher-order destination bits are cleared. PDEP does the opposite for the selected bits: contiguous low-order bits are copied to selected bits of the destination; other destination bits are cleared. This can be used to extract any bitfield of the input, and even to do a lot of bit-level shuffling that previously would have been expensive. While what these instructions do is similar to bit-level gather-scatter SIMD instructions, PDEP and PEXT (like the rest of the BMI instruction sets) operate on general-purpose registers.
Below are a few 8-bit examples of these operations:
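The semantics described above can be illustrated with a portable software model. This is only a sketch with hypothetical function names: it walks the selector one set bit at a time, which is far slower than the hardware instructions (available to C programs through the `_pext_u32` and `_pdep_u32` intrinsics in `<immintrin.h>` on compilers that support BMI2).

```c
#include <stdint.h>

/* Model of PEXT: gather the src bits selected by mask into the
   contiguous low-order bits of the result; upper bits are zero. */
static uint32_t pext32_model(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t out_bit = 1;                      /* next low-order slot */
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);  /* lowest selected bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* drop that selector bit */
    }
    return result;
}

/* Model of PDEP: scatter the contiguous low-order bits of src to the
   positions selected by mask; unselected result bits are zero. */
static uint32_t pdep32_model(uint32_t src, uint32_t mask)
{
    uint32_t result = 0;
    uint32_t in_bit = 1;                       /* next low-order source bit */
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);  /* lowest selected bit */
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1;
    }
    return result;
}
```

With the 8-bit source 0xB6 (binary 1011 0110) and selector 0xF0, the PEXT model returns 0x0B; with selector 0x55 it returns 0x06. Going the other way, the PDEP model maps 0x0B with selector 0xF0 to 0xB0, and 0x06 with selector 0x55 to 0x14. In general, depositing an extracted value with the same selector gives back the source ANDed with the selector.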
TBM (Trailing Bit Manipulation)
TBM consists of instructions complementary to the instruction set started by BMI1; their complementary nature means they do not necessarily need to be used directly but can be generated by an optimizing compiler when supported. AMD introduced TBM together with BMI1 in its Piledriver line of processors; AMD Jaguar and upcoming Zen processors do not support TBM.
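The remark about compilers can be made concrete: several TBM instructions correspond to short C expressions that an optimizing compiler may recognize and, when TBM is enabled for the target, replace with a single instruction. The sketch below lists a subset of these expressions for 32-bit operands. The function names are hypothetical, and whether a given compiler actually performs the substitution is an assumption that should be checked against its generated code.

```c
#include <stdint.h>

/* BLCFILL: clear all trailing one bits (fill from lowest clear bit). */
static uint32_t blcfill32_model(uint32_t x) { return x & (x + 1u); }

/* BLCIC: isolate the lowest clear bit and return it as a set bit. */
static uint32_t blcic32_model(uint32_t x) { return ~x & (x + 1u); }

/* BLCMSK: mask up to and including the lowest clear bit. */
static uint32_t blcmsk32_model(uint32_t x) { return x ^ (x + 1u); }

/* BLCS: set the lowest clear bit. */
static uint32_t blcs32_model(uint32_t x) { return x | (x + 1u); }

/* BLSFILL: set all bits below the lowest set bit. */
static uint32_t blsfill32_model(uint32_t x) { return x | (x - 1u); }

/* TZMSK: mask covering only the trailing zero bits. */
static uint32_t tzmsk32_model(uint32_t x) { return ~x & (x - 1u); }
```

For example, with the input 0x57 (binary 0101 0111), BLCFILL gives 0x50 and BLCS gives 0x5F; with the input 0x58 (binary 0101 1000), BLSFILL gives 0x5F and TZMSK gives 0x07.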