Supriya Ghosh (Editor)

LEB128


LEB128, or Little Endian Base 128, is a form of variable-length code compression used to store an arbitrarily large integer in a small number of bytes. LEB128 is used in the DWARF debug file format.

Encoding format

The LEB128 format is very similar to the variable-length quantity (VLQ) format; the primary difference is that LEB128 is little-endian whereas variable-length quantities are big-endian. Both allow small numbers to be stored in a single byte while also allowing arbitrarily long numbers to be encoded. There are two versions of LEB128: unsigned LEB128 and signed LEB128. The decoder must know whether the encoded value is unsigned LEB128 or signed LEB128.

Unsigned LEB128

To encode an unsigned number using unsigned LEB128, first represent the number in binary. Then zero-extend the number up to a multiple of 7 bits (such that the most significant 7 bits are not all 0). Break the number up into groups of 7 bits. Output one encoded byte for each 7-bit group, from the least significant to the most significant group. Each byte holds its group in its 7 least significant bits. Set the most significant bit on every byte except the last one. The number zero is encoded as a single byte 0x00.
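
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `encode_uleb128` is chosen here for the example and does not come from any particular library.

```python
def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (illustrative sketch)."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        group = value & 0x7F          # take the 7 least significant bits
        value >>= 7                   # drop them from the remaining value
        if value:
            out.append(group | 0x80)  # more groups follow: set the high bit
        else:
            out.append(group)         # last group: high bit stays clear
            return bytes(out)


print(encode_uleb128(624485).hex())  # e58e26
print(encode_uleb128(0).hex())       # 00
```

Because the loop always emits at least one byte, zero comes out as the single byte 0x00 without a special case.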

As an example, here is how the unsigned number 624485 gets encoded:

          10011000011101100101  In raw binary
         010011000011101100101  Padded to a multiple of 7 bits
     0100110  0001110  1100101  Split into 7-bit groups
    00100110 10001110 11100101  Add high 1 bits on all but the last (most significant) group to form bytes
        0x26     0x8E     0xE5  In hexadecimal
        0xE5     0x8E     0x26  Output stream (least significant byte first)
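
Decoding reverses this process: each byte contributes its low 7 bits at an increasing shift until a byte with a clear high bit is seen. The sketch below assumes the input is a `bytes` object; the name `decode_uleb128` and the choice to return the number of bytes consumed are illustrative, not taken from a specific library.

```python
from typing import Tuple


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, bytes consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # place this 7-bit group
        if (byte & 0x80) == 0:            # high bit clear: last byte
            return result, pos - offset
        shift += 7


print(decode_uleb128(bytes([0xE5, 0x8E, 0x26])))  # (624485, 3)
```

Returning the consumed length lets a caller continue reading the next field from a stream of back-to-back values.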

Unsigned LEB128 and VLQ (variable-length quantity) both compress any given integer into not only the same number of bits but exactly the same bits; the two formats differ only in how those bits are arranged.
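
To make the comparison concrete, the following sketch encodes the same number both ways from one shared list of 7-bit groups. The helper names are hypothetical, and the VLQ variant shown is the common form in which the continuation bit is set on every byte except the last.

```python
def seven_bit_groups(value: int) -> list:
    """Split a non-negative integer into 7-bit groups, least significant first."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    return groups


def with_continuation_bits(groups: list) -> bytes:
    """Set the high bit on every byte except the last one."""
    return bytes(g | 0x80 for g in groups[:-1]) + bytes([groups[-1]])


def encode_uleb128(value: int) -> bytes:
    return with_continuation_bits(seven_bit_groups(value))        # little-endian


def encode_vlq(value: int) -> bytes:
    return with_continuation_bits(seven_bit_groups(value)[::-1])  # big-endian


print(encode_uleb128(624485).hex())  # e58e26
print(encode_vlq(624485).hex())      # a68e65
```

Masking off the high bit of each output byte gives the groups 0x65, 0x0E, 0x26 in one case and 0x26, 0x0E, 0x65 in the other: the same payload in opposite order.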

Signed LEB128

A signed number is represented similarly, except that the two's complement number is sign extended up to a multiple of 7 bits (ensuring that the most significant bit is zero for a positive number and one for a negative number). Then the number is broken into groups as for the unsigned encoding.
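
One way to sketch the signed encoder in Python is shown below. It relies on Python's `>>` being an arithmetic (sign-preserving) shift on integers, and it stops once the remaining value is pure sign extension of the group just emitted. The name `encode_sleb128` is illustrative.

```python
def encode_sleb128(value: int) -> bytes:
    """Encode a signed integer as signed LEB128 (illustrative sketch)."""
    out = bytearray()
    while True:
        group = value & 0x7F   # low 7 bits of the two's complement form
        value >>= 7            # arithmetic shift keeps the sign
        sign_bit_set = bool(group & 0x40)
        # Done when what remains is only sign extension of this group.
        if (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set):
            out.append(group)
            return bytes(out)
        out.append(group | 0x80)  # more groups follow


print(encode_sleb128(-624485).hex())  # 9bf159
print(encode_sleb128(64).hex())       # c000
```

The value 64 needs a second byte because its single 7-bit group, 1000000, has the top bit set and would otherwise be read back as negative.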

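For example, the signed number -624485 (0xFFF6789B as a 32-bit two's complement value) is encoded as 0x9B 0xF1 0x59. The low 20 bits of its two's complement representation are 0110_01111000_10011011. Sign-extending with a single 1 bit to 21 bits is enough to make the most significant bit 1, giving the three 7-bit groups 1011001_1110001_0011011. Emitting them from least significant to most significant, with the high bit set on all but the last byte, yields 0x9B (10011011), 0xF1 (11110001), 0x59 (01011001).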
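
A matching decoder can be sketched as follows. It accumulates the 7-bit groups as in the unsigned case and then, if bit 6 of the final byte is set, subtracts 2 to the power of the number of bits read to sign-extend the result. The function name and the `(value, bytes consumed)` return shape are assumptions made for this example.

```python
from typing import Tuple


def decode_sleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one signed LEB128 value; return (value, bytes consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:      # last byte of this value
            if byte & 0x40:         # sign bit of the final group is set
                result -= 1 << shift
            return result, pos - offset


print(decode_sleb128(bytes([0x9B, 0xF1, 0x59])))  # (-624485, 3)
print(decode_sleb128(b"\x7f"))                    # (-1, 1)
```

The single byte 0x7F decodes to -1 here, whereas an unsigned reading of the same byte gives 127, which is why the decoder has to know which variant was used.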

Uses

The DWARF file format uses both unsigned and signed LEB128 encoding for various fields.

The mpatrol debugging tool uses LEB128 in its tracing file format.

The Android project uses LEB128 in its Dalvik Executable Format (.dex) file format.

It is used to compress tables in Hewlett-Packard IA-64 exception handling.

It is used in the Linux kernel for its DWARF implementation.

It is used in WebAssembly's portable binary encoding of the modules.

It is used in LLVM's Coverage Mapping Format.

Osu! uses LEB128 in its osu! replay (.osr) format.

Related encodings

The LLVM bitcode file format uses a similar technique except that the value is broken into groups of bits of context-dependent size, with the highest bit indicating a continuation, instead of a fixed 7 bits.

Dlugosz' Variable-Length Integer Encoding uses multiples of 7 bits for the first three size breaks, but after that the increments vary. It also puts all the prefix bits at the beginning of the word, instead of at the beginning of each byte.

Protocol Buffers use the same encoding for unsigned integers, but encode signed integers by prepending the sign as the least significant bit.
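
The sign-in-the-lowest-bit mapping is what the Protocol Buffers documentation calls ZigZag encoding, used by its `sint32` and `sint64` types. The sketch below shows only the integer mapping on Python's unbounded integers; the mapped value would then be written as an ordinary unsigned varint. The function names are illustrative and this is not the Protocol Buffers library API.

```python
def zigzag_encode(value: int) -> int:
    """Map a signed integer to a non-negative one, sign in the lowest bit."""
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    return (value << 1) if value >= 0 else ((-value << 1) - 1)


def zigzag_decode(encoded: int) -> int:
    """Invert zigzag_encode."""
    # The lowest bit selects the sign; the remaining bits carry the magnitude.
    return (encoded >> 1) ^ -(encoded & 1)


print([zigzag_encode(v) for v in (0, -1, 1, -2, 2)])  # [0, 1, 2, 3, 4]
print(zigzag_decode(3))                               # -2
```

With this mapping, values of small magnitude stay small whether positive or negative, so they still fit in one or two bytes after the unsigned encoding step.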

References

LEB128 Wikipedia