In a computer system, a blitter is a circuit, sometimes implemented as a coprocessor or as a logic block on a microprocessor, that is dedicated to the rapid movement and modification of data within the computer's memory. A blitter can copy large quantities of data from one memory area to another relatively quickly and in parallel with the CPU, leaving the CPU, with its more complex instruction set, free for more general operations.
The name comes from the bit blit operation, which stands for bit-block transfer. A typical use for a blitter is moving a bitmap, such as windows and fonts in a graphical user interface, or sprites and backgrounds in a 2D computer game. A blit operation is more than a memory copy, because it can involve data that is not byte-aligned (hence the bit in bit blit), and because it may need to handle transparent pixels: pixels that should not overwrite the destination data.
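The "bit" part of the problem can be sketched in Python. The function below, `blit_row` (a hypothetical name), copies a run of bits from one 1-bit-per-pixel row to another at an arbitrary bit offset, using the same shift-and-mask steps a blitter performs in hardware. It assumes bits are numbered from the most significant bit of the first byte and that the run fits inside both rows; it illustrates the idea and does not model any particular chip.

```python
def blit_row(src: bytes, src_bit: int, dst: bytearray, dst_bit: int, width: int) -> None:
    """Copy `width` bits starting at bit `src_bit` of `src` to bit `dst_bit` of `dst`.

    Bits are numbered from the most significant bit of byte 0, as in a
    1-bit-per-pixel bitmap row. Neither offset needs to be byte-aligned.
    """
    src_len = len(src) * 8
    dst_len = len(dst) * 8
    s = int.from_bytes(src, "big")
    d = int.from_bytes(dst, "big")
    run_mask = (1 << width) - 1
    # Shift the source run down until it is right-aligned, then mask it off.
    run = (s >> (src_len - src_bit - width)) & run_mask
    # Shift the run up into its destination position.
    shift = dst_len - dst_bit - width
    field = run_mask << shift
    # Clear the destination field, then OR the run in; other bits are preserved.
    d = (d & ~field) | (run << shift)
    dst[:] = d.to_bytes(len(dst), "big")
```

For example, copying the four set bits of `0b00111100` (starting at bit 2) into a zeroed two-byte row at bit 6 straddles the byte boundary and yields the bytes `0x03, 0xC0`.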
In early computers with raster-graphics output, the screen buffer was normally held in main memory and updated by software running on the CPU. For many simple graphics routines, such as drawing sprites or flood-filling polygons, large amounts of memory had to be manipulated, and many CPU cycles were spent fetching and decoding instructions for repetitive loops of simple shift and mask operations. On CPUs without caches, the bus traffic needed to fetch these instructions was as significant as the traffic for the data itself.
The 1973 Xerox Alto, where the term bit blit originated, had a bit block transfer instruction implemented in microcode, which made it much faster than the same operation written as a sequence of ordinary machine instructions. The microcode was implemented by Dan Ingalls.
The MS-DOS compatible Mindset contained a custom VLSI chip to move rectangular sections of a bitmap. The hardware handled transparency and eight modes for combining the source and destination data. Released in 1984, the Mindset was claimed to have graphics up to 50x faster than PCs of the time, but the system was not successful.
The Commodore Amiga, released the following year, also has a full-featured blitter. The first US patent filing to use the term blitter was "Personal computer apparatus for block transfer of bit-mapped image data," assigned to Commodore-Amiga, Inc. Compared with the MC68000 processor, the blitter needs no memory cycles for fetching instructions and no silicon for instruction decoding, and it contains a barrel shifter to help shift pixel-accurate graphics in bitplanes. It also performs a "4 operand" boolean operation (typically destination := op(destination, source, mask)).
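The Amiga blitter's boolean operation is commonly described as a logic-function byte of eight minterms (the low byte of its BLTCON0 register), one bit for each of the eight combinations of its three source channels A, B and C. The sketch below evaluates such a byte in software; the function name `minterm_op` and the 16-bit word size are assumptions for illustration. The value 0xCA gives the usual masked copy D = (A AND B) OR (NOT A AND C), with A as the mask, B as the source image and C as the existing destination data.

```python
def minterm_op(lf: int, a: int, b: int, c: int, bits: int = 16) -> int:
    """Apply an 8-bit logic-function byte to three source words A, B and C.

    Bit n of `lf` turns on the product term for input combination
    n = (A << 2) | (B << 1) | C, so any boolean function of three inputs
    can be expressed as an OR of the selected terms.
    """
    full = (1 << bits) - 1
    result = 0
    for n in range(8):
        if lf & (1 << n):
            # Product term for combination n, e.g. n = 6 -> A AND B AND (NOT C).
            ta = a if n & 4 else ~a & full
            tb = b if n & 2 else ~b & full
            tc = c if n & 1 else ~c & full
            result |= ta & tb & tc
    return result & full


COOKIE_CUT = 0xCA  # D = (A AND B) OR (NOT A AND C): mask A, source B, background C
```

Other values select other operations; for example, 0xF0 passes A through unchanged and 0x00 clears the destination.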
The IBM 8514/A display adapter, introduced with the IBM Personal System/2 computers in April 1987, includes bit block transfer hardware.
Later models of the Atari ST include a blitter coprocessor, stylized as the BLiTTER chip. It was introduced with the Mega series and was also included in most later machines, with the exception of the Atari TT.
The short-lived Atari Transputer Workstation (1989) contained blitter hardware as part of its "Blossom" video system.
Williams Electronics' Robotron: 2084 (1982) includes two blitter chips that allow the game to have up to 80 simultaneously moving objects. Performance was measured at roughly 910 KB/second. The blitter operates on 4-bit (16-color) pixels. Color 0 is transparent, allowing for non-rectangular shapes. Williams used the same blitters in other games from the same period, including Sinistar and Joust.
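Transparency by color key, as described above, can be sketched in Python. The example assumes 4-bit pixels packed two per byte with the left pixel in the high nibble, treats color 0 as transparent, and performs no clipping; the packing order and all function names are illustrative assumptions, not details of the Williams hardware.

```python
def get_pixel(buf, i: int) -> int:
    """Read 4-bit pixel i: even pixels use the high nibble, odd pixels the low nibble."""
    byte = buf[i >> 1]
    return (byte >> 4) if i % 2 == 0 else (byte & 0x0F)


def set_pixel(buf: bytearray, i: int, color: int) -> None:
    """Write 4-bit pixel i, preserving the other pixel that shares its byte."""
    j = i >> 1
    if i % 2 == 0:
        buf[j] = (buf[j] & 0x0F) | (color << 4)
    else:
        buf[j] = (buf[j] & 0xF0) | color


def blit_keyed(sprite, sw: int, sh: int, frame: bytearray, fw: int,
               x: int, y: int, key: int = 0) -> None:
    """Copy an sw x sh sprite into a frame fw pixels wide at (x, y).

    Pixels equal to `key` are skipped, so the sprite can have any shape.
    No clipping is done: the sprite must lie entirely inside the frame.
    """
    for row in range(sh):
        for col in range(sw):
            color = get_pixel(sprite, row * sw + col)
            if color != key:  # transparent pixels leave the destination untouched
                set_pixel(frame, (y + row) * fw + x + col, color)
```

Because two pixels share each byte, writing one pixel is itself a small read-modify-write, which is part of the per-pixel work a blitter takes off the CPU.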
The TMS34010, released in 1986, is a general-purpose 32-bit processor with additional blitter-like instructions for manipulating bitmap data. It is optimized for cases that would otherwise take extra processing on a CPU, such as handling transparent pixels, working with non-byte-aligned data, and converting between bit depths. The TMS34010 served as both CPU and GPU for a number of games in the late 1980s and early 1990s, including Hard Drivin', Narc, Smash TV, Mortal Kombat, and NBA Jam.
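One common form of converting between bit depths is color expansion, in which each bit of a 1-bit source (a font glyph, for instance) becomes a full pixel in either a foreground or a background color. The sketch below shows only this general technique; the function name and the unpacked list output are assumptions, and it does not reproduce the TMS34010's actual instructions.

```python
from typing import List


def expand_1bpp(bits: bytes, width: int, fg: int, bg: int) -> List[int]:
    """Expand `width` 1-bit pixels (most significant bit first) into color indices.

    A set bit becomes `fg` and a clear bit becomes `bg`.
    """
    out = []
    for i in range(width):
        bit = (bits[i >> 3] >> (7 - (i & 7))) & 1
        out.append(fg if bit else bg)
    return out
```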
Hardware sprites are small bitmaps that can be positioned independently and are composited with the background on the fly by the video chip, so no actual modification of the frame buffer occurs. Sprite systems are more efficient for moving graphics, typically requiring one third of the memory cycles, because only the image data needs to be fetched, whereas a blit must read the source, read the destination, and write the result back; the compositing happens on-chip. The downside of sprites is a limit on the number of moving graphics per scanline, which ranges from two (Atari 2600) to eight (Commodore 64 and Atari 8-bit computers) to significantly more on 16-bit arcade hardware and consoles, and the inability to update a permanent bitmap, which makes them unsuitable for general desktop GUI acceleration.
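A per-scanline sprite compositor can be sketched in Python to show two of the properties described above: the stored background row is never written, and only a limited number of sprites are shown on each line. The `Sprite` class, the priority rule (earlier sprites in the list are drawn on top) and the default limit of eight are assumptions for illustration; real video chips differ in how they choose which sprites to drop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sprite:
    x: int                 # horizontal position of the sprite's left edge
    y: int                 # first scanline the sprite occupies
    rows: List[List[int]]  # color indices per row; 0 means transparent


def render_scanline(background_row: List[int], line: int,
                    sprites: List[Sprite], limit: int = 8) -> List[int]:
    """Return the pixels sent to the display for one scanline.

    The stored background row is copied, never modified. At most `limit`
    sprites overlapping this line are shown; any further ones are dropped.
    """
    out = list(background_row)
    active = [s for s in sprites if s.y <= line < s.y + len(s.rows)][:limit]
    # Draw lower-priority sprites first so earlier sprites end up on top.
    for s in reversed(active):
        for i, color in enumerate(s.rows[line - s.y]):
            px = s.x + i
            if color != 0 and 0 <= px < len(out):
                out[px] = color
    return out
```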
Typically, a program writes values into the blitter's registers describing the memory transfer to perform and the logical operations to apply to the data, then triggers the blitter to start. The CPU is then free to do other work while the blitter runs.
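The register-based programming model can be illustrated with a toy software blitter. The class below is entirely hypothetical: its fields stand in for hardware registers, `start` runs the transfer on a background thread to mimic the blitter working in parallel with the CPU, and `wait` plays the role of polling a busy flag. Real blitters differ in register layout, addressing and supported operations.

```python
import threading


class ToyBlitter:
    """Software model of a register-programmed blitter. All names are hypothetical."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        # "Registers" the CPU fills in before starting a transfer.
        self.src = 0          # source start address
        self.dst = 0          # destination start address
        self.width = 0        # bytes per row
        self.height = 0       # number of rows
        self.src_stride = 0   # bytes from one source row to the next
        self.dst_stride = 0   # bytes from one destination row to the next
        self.op = lambda s, d: s  # logical operation on (source byte, destination byte)
        self._done = threading.Event()
        self._done.set()      # idle until started

    def start(self) -> None:
        """Trigger the transfer; returns at once while the copy runs in the background."""
        self._done.clear()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self) -> None:
        m = self.memory
        for row in range(self.height):
            s = self.src + row * self.src_stride
            d = self.dst + row * self.dst_stride
            for i in range(self.width):
                m[d + i] = self.op(m[s + i], m[d + i]) & 0xFF
        self._done.set()

    @property
    def busy(self) -> bool:
        return not self._done.is_set()

    def wait(self) -> None:
        """Block until the transfer finishes, like polling a busy flag."""
        self._done.wait()


# Usage: copy a 2x2-byte block from address 0 (stride 2) to address 32 (stride 8).
mem = bytearray(64)
mem[0:4] = b"\x01\x02\x03\x04"
blitter = ToyBlitter(mem)
blitter.src, blitter.dst = 0, 32
blitter.width, blitter.height = 2, 2
blitter.src_stride, blitter.dst_stride = 2, 8
blitter.start()
# ... the CPU could do unrelated work here ...
blitter.wait()
```

Setting `op` to, say, `lambda s, d: s | d` would merge the source into the destination instead of overwriting it.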
The destination of the transfer is usually the frame buffer, but a blitter can also be used for non-graphics work. For example, a blitter can fill an area of memory with zeroes more quickly than the CPU can. In addition, simple arithmetic operations can be built from its basic logical operations.
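As an illustration of arithmetic built from logical operations, the sketch below adds two fixed-width values using only AND, XOR and a one-bit shift, with each loop iteration standing in for one hypothetical blitter pass. The function name and the 16-bit width are assumptions; it shows the principle rather than a documented routine for any specific blitter.

```python
def add_with_logic(a: int, b: int, bits: int = 16) -> int:
    """Add two `bits`-wide values using only AND, XOR and a one-bit shift.

    Overflow wraps around, as it would in a fixed-width memory word.
    """
    full = (1 << bits) - 1
    a &= full
    b &= full
    while b:
        carry = (a & b) << 1  # positions that generate a carry
        a ^= b                # sum of each bit position, ignoring carries
        b = carry & full      # carries feed the next pass; overflow is discarded
    return a
```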
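The image at right helps illustrate how a blitter may use a 'mask' to decide which pixels to transfer and which to leave untouched. The mask works like a stencil, determining which pixels of the source image are written to destination memory. The logical operation would be Dest = (Background AND Mask) OR Sprite. For this to work, the mask must hold 1 bits where the background should show through and 0 bits where the sprite goes, and the sprite image must be 0 everywhere outside its shape.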
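The masking formula can be checked with a short Python sketch. Here each byte holds one pixel, each mask byte is assumed to be 0xFF where the background should show and 0x00 where the sprite goes, and the sprite is 0 outside its shape; the function name is illustrative.

```python
def masked_blit(background: bytes, mask: bytes, sprite: bytes) -> bytes:
    """Compute Dest = (Background AND Mask) OR Sprite, byte by byte.

    Mask bits are 1 where the background is kept and 0 where the sprite goes;
    sprite bits must be 0 outside the shape so the OR leaves the background intact.
    """
    return bytes((bg & m) | sp for bg, m, sp in zip(background, mask, sprite))
```

For a background of color 3 and a two-pixel sprite of color 9 in the middle, the result keeps the outer pixels and replaces the inner ones.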
The Amiga stored frame buffers as separate bitplanes (for example, five 1-bit images combined to produce a 32-colour display), which made masking very convenient, because a mask needed only one bitplane. Other systems could perform masking with a transparent colour.
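With bitplanes, a single mask plane can be applied to every plane of the image. One straightforward way to obtain it, assumed in the sketch below, is to OR all of the sprite's planes together, so that any pixel with a nonzero colour index counts as opaque. This mask has 1 bits where the sprite is opaque, the inverse of the mask in the formula given earlier, so it is complemented before the AND. Planes are represented as lists of integers with one bit per pixel, and the function name is hypothetical.

```python
from typing import List


def planar_cookie_cut(bg_planes: List[List[int]],
                      sprite_planes: List[List[int]]) -> List[List[int]]:
    """Composite a planar sprite onto planar background data.

    A pixel is transparent when its colour index is 0, i.e. when its bit is 0
    in every plane, so one mask plane (the OR of all sprite planes) serves
    every bitplane.
    """
    shape = [0] * len(sprite_planes[0])
    for plane in sprite_planes:
        shape = [s | w for s, w in zip(shape, plane)]
    # Keep background bits where the sprite is transparent, sprite bits where it is opaque.
    return [
        [(bg & ~m) | sp for bg, sp, m in zip(bgp, spp, shape)]
        for bgp, spp in zip(bg_planes, sprite_planes)
    ]
```

With two planes of four pixels, a background of colour 1 and a sprite with colours 0, 2, 3, 0 produces colours 1, 2, 3, 1.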
Blitters have been replaced by the modern graphics processing unit. Modern GPUs are designed primarily for 3D graphics, and have added the ability to modify bitmaps in mathematically advanced ways, allowing arbitrary image transformations, texture decompression and filtering, shading for illumination models, alpha blend compositing operations, and depth-buffer comparison/update.
Graphics processing units have evolved beyond pure graphics accelerators with the addition of general-purpose programmable floating-point units applicable to general-purpose computing. They differ from most CPUs in being massively parallel processors optimized for data-parallel throughput rather than for low-latency execution of individual instructions.
In this respect, GPUs have also taken over a role that used to be filled by DSPs such as the Motorola 56001, which were sometimes used for geometry, image and sound processing in workstations, accelerator cards and gaming machines of the intermediate 16/32-bit era, such as the Atari Falcon, Macintosh Quadra AV, and Sega Saturn.