Harman Patil (Editor)

UBJSON

Original author(s): The Buzz Media, LLC
Development status: Active
Operating system: Any
Stable release: Draft 12
Written in: Various languages
Platform: Cross-platform

Universal Binary JSON (UBJSON) is a computer data interchange format. It is a binary format that directly mirrors JSON but requires fewer bytes of data. It aims to keep the generality of JSON while being much easier to process.

Rationale and Objectives

UBJSON is a proposed successor to BSON, BJSON and others. UBJSON has the following goals:

  • Complete compatibility with the JSON specification – there is a 1:1 mapping between standard JSON and UBJSON.
  • Ease of implementation – only data types that are widely supported in popular programming languages are included, so that no language is left poorly supported.
  • Ease of use – it can be quickly understood and adopted.
  • Speed and efficiency – UBJSON uses data representations that are (roughly) 30% smaller than their compacted JSON counterparts and are optimized for fast parsing. Streamed serialisation is supported, meaning that a sender can begin transmitting UBJSON over a network connection before the final size of the data is known.

Data types and syntax

    UBJSON uses a single binary tuple to represent all JSON data types (both value and container types):

    type [length] [data]

    Each element in the tuple is defined as:

    type

    The type is a 1-byte ASCII character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types and the two JSON container types. There is also a no-op (used for stream keep-alive) and an end-of-container marker, used when a container of (as yet) unknown size had previously been started.

  • UTF-8 string: s or S
  • Numbers: B, i, I, L, d, D, h or H. There are seven specialisations: byte (B), int16 (i), int32 (I), int64 (L), float32 (d), float64 (D), and huge (h or H)
  • Boolean types: true (T) and false (F)
  • Null: Z
  • Object container: o or O
  • Array container: a or A
  • No-op: N - no operation, to be ignored by the receiving end
  • End of container: E
  • Huge numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.
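The fixed-size types in the list above can be sketched in a few lines of Python. This is a minimal illustration of the format as described in this section, not a reference implementation. The function name `encode_scalar` is hypothetical, treating B as a signed 8-bit value is an assumption, and floats are always written as float64 (D) for simplicity.

```python
import struct

def encode_scalar(value):
    """Encode None, a bool, an int, or a float as type marker + big-endian data."""
    if value is None:
        return b"Z"
    if value is True:
        return b"T"
    if value is False:
        return b"F"
    if isinstance(value, int):
        # Pick the smallest signed width that holds the value.
        # Assumption of this sketch: B is a signed 8-bit integer.
        for marker, fmt, bits in ((b"B", ">b", 8), (b"i", ">h", 16),
                                  (b"I", ">i", 32), (b"L", ">q", 64)):
            if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
                return marker + struct.pack(fmt, value)
        raise ValueError("integer needs the length-prefixed huge type (h or H)")
    if isinstance(value, float):
        # Simplification: always use float64 (D), never float32 (d).
        return b"D" + struct.pack(">d", value)
    raise TypeError(f"not a scalar this sketch handles: {type(value).__name__}")
```

For example, `encode_scalar(300)` yields the marker `i` followed by the two bytes `01 2C`, which is easy to spot in a hex editor.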

    length (optional)

    The length is a 1-byte or 4-byte value, depending on the type specified. Lengths are used for strings, huge numbers and containers (objects and arrays); they are omitted for all other types.

  • 1-byte: An unsigned byte (0 to 254) indicating the length of the data payload following it, for small items.
  • 1-byte: The byte value 255 indicating the container that follows has an (as yet) unknown size.
  • 4-byte: An unsigned integer (0 to 2^31-1) indicating the length of the data payload following it, for larger items.
  • The 1 and 4 byte lengths are easily differentiated because lower-case type characters are used in the 1-byte case, otherwise upper-case type characters are used.
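The length rules above can be illustrated with a short Python sketch, written under the assumptions of this section rather than taken from any particular library. The names `encode_string` and `stream_string_array` are hypothetical. The second function shows how the reserved length value 255 and the end-of-container marker E allow an array to be sent before its size is known.

```python
import struct

UNKNOWN_SIZE = 255            # 1-byte length value reserved for "size not yet known"
MAX_LONG_LENGTH = 2**31 - 1   # upper bound stated for the 4-byte length

def encode_string(text):
    """Encode a string as s + 1-byte length, or S + 4-byte length, + UTF-8 data."""
    data = text.encode("utf-8")
    if len(data) <= 254:
        # Small payload: lower-case marker and a single length byte.
        return b"s" + bytes([len(data)]) + data
    if len(data) > MAX_LONG_LENGTH:
        raise ValueError("string too long for a 4-byte length")
    # Larger payload: upper-case marker and a big-endian 4-byte length.
    return b"S" + struct.pack(">I", len(data)) + data

def stream_string_array(strings):
    """Yield the chunks of an array of strings whose size is unknown up front."""
    yield b"a" + bytes([UNKNOWN_SIZE])   # array header with unknown size
    for text in strings:
        yield encode_string(text)        # elements are sent as they arrive
    yield b"E"                           # end-of-container marker
```

Because `stream_string_array` is a generator, each chunk can be written to a socket as soon as it is produced, for instance with `for chunk in stream_string_array(source): sock.sendall(chunk)`, where `source` and `sock` are placeholders.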

    data (optional)

    A sequence of bytes representing the actual binary data for this type of value. All numbers are sent in big-endian order.
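Reading the tuple back follows the same three steps: read the type byte, read a length if the type calls for one, then read the data. The Python sketch below decodes value types only (no containers). It is an illustration under the assumptions of this section: `decode_value` is a hypothetical name, B is assumed to be a signed 8-bit value, and huge numbers are returned as `decimal.Decimal`.

```python
import struct
from decimal import Decimal

# struct formats for fixed-width numbers; ">" selects big-endian byte order.
# Treating B as a signed 8-bit value is an assumption of this sketch.
_FIXED = {b"B": ">b", b"i": ">h", b"I": ">i", b"L": ">q", b"d": ">f", b"D": ">d"}
_CONSTANTS = {b"Z": None, b"T": True, b"F": False}

def _read_length(buf, pos, marker):
    """Read a 1-byte length (lower-case marker) or a 4-byte length (upper-case)."""
    if marker.islower():
        return buf[pos], pos + 1
    (length,) = struct.unpack_from(">I", buf, pos)
    return length, pos + 4

def decode_value(buf, pos=0):
    """Decode one non-container value from buf; return (value, next_position)."""
    marker = buf[pos:pos + 1]
    pos += 1
    while marker == b"N":            # no-op markers carry no data and are skipped
        marker = buf[pos:pos + 1]
        pos += 1
    if marker in _CONSTANTS:
        return _CONSTANTS[marker], pos
    if marker in _FIXED:
        fmt = _FIXED[marker]
        (value,) = struct.unpack_from(fmt, buf, pos)
        return value, pos + struct.calcsize(fmt)
    if marker in (b"s", b"S", b"h", b"H"):
        length, pos = _read_length(buf, pos, marker)
        text = buf[pos:pos + length].decode("utf-8")
        # Huge numbers are string-encoded; Decimal keeps arbitrary precision.
        value = Decimal(text) if marker in (b"h", b"H") else text
        return value, pos + length
    raise ValueError(f"unsupported or missing type marker: {marker!r}")
```

Returning the next position alongside the value lets a caller decode several consecutive values from one buffer by feeding the returned position back in.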

    Representation

    The MIME type 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file system.
