Creating a 3D image for display consists of a series of steps. First, the objects to be displayed are loaded up into memory from individual models. The display system then applies mathematical functions to transform the models into a common coordinate system, the world view. From this world view, a series of polygons (typically triangles) is created that approximates the original models as seen from a particular viewpoint, the camera. Next, a compositing system produces an image by rendering the triangles and applying textures to the outside. Textures are small images that are painted onto the triangles to produce realism. The resulting image is then combined with various special effects, and moved into the display buffers. This basic conceptual layout is known as the display pipeline.
In general terms, the display changes little from one frame to the next: in any given frame-to-frame transition the objects in the display are likely to move slightly, but their shapes and textures are unlikely to change at all. Changing the geometry is a relatively lightweight operation for the CPU, loading the textures from memory is considerably more expensive, and sending the resulting rendered frame to the framebuffer is the most expensive operation of all.
For example, consider typical rendering settings of the era: 24-bit color, basic 3D compositing with trilinear filtering, and no anti-aliasing. At 640 x 480 resolution this would require 1,900 MB/s of memory bandwidth; at 1024 x 768 resolution it would require 4,900 MB/s. Even basic anti-aliasing would be expected to roughly double those figures. For reference, SGI's then-current RealityEngine2 machines featured a then-high memory bandwidth of about 10,000 MB/s, which was the reason these machines were widely used in 3D graphics. A typical PC of the era using AGP 2X could offer only 508 MB/s.
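The size of this gap can be checked with simple arithmetic on the figures quoted above. The following Python sketch is illustrative only: the function names are invented for this example, and the factor of two for anti-aliasing is the rough estimate given in the text, not a measured value.

```python
# Bandwidth figures quoted in the text, in MB/s.
REQUIRED = {"640x480": 1900, "1024x768": 4900}
AVAILABLE = {"AGP 2X PC": 508, "RealityEngine2": 10000}


def shortfall(required_mb_s, available_mb_s):
    """Return the required bandwidth as a multiple of what is available."""
    return required_mb_s / available_mb_s


def with_antialiasing(required_mb_s, factor=2.0):
    """Apply the text's rough 'double it' estimate for basic anti-aliasing."""
    return required_mb_s * factor


for mode, need in REQUIRED.items():
    for system, have in AVAILABLE.items():
        ratio = shortfall(need, have)
        print(f"{mode} on {system}: needs {ratio:.2f}x the available bandwidth")
```

On these numbers the AGP 2X machine falls short by a factor of roughly four to ten, while 1024 x 768 with anti-aliasing (about 9,800 MB/s) would come close to saturating even the RealityEngine2.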
The first attack on this problem was the introduction of graphics accelerators that handled the texture storage and mapping. These cards, like the original Voodoo Graphics, had the CPU re-calculate the geometry for every frame, and then send the resulting series of coordinates to the card. The card then handled the rest of the operation: applying the textures to the geometry, rendering the frame, applying filtering or anti-aliasing, and outputting the results to a local framebuffer. The bandwidth needs in such a system were dramatically reduced; a scene with 10,000 triangles might need 500 to 1000 kB/s, depending on how many of the geometry points could be shared between triangles.
As scene complexity increased, the need to re-generate the geometry for what was essentially a fixed set of objects started to become a bottleneck of its own. Much greater improvements in performance could be had if the graphics card also stored and manipulated the polygons. In such a system, the entire display pipeline could be run on the card, requiring minimal interaction with the CPU. This would require the graphics card to be much "smarter"; as opposed to the very simple operations involved in applying textures, the card would now need a complete processor able to calculate the functions used in 3D modeling. At the time a number of companies were exploring this path with so-called "transform and lighting" (T&L) cards, but the complexity and cost of the systems would be considerable.
One solution that was studied during this period was the concept of tiled rendering. This was based on the observation that small changes in camera position could be simulated by manipulating small 2D images, the "tiles". For instance, movement of the camera into the scene can be simulated by taking each tile and making it slightly larger. Likewise, other movements in the scene can be simulated by applying the appropriate affine transform. This process is only approximate: as the movement increases, the visual fidelity decreases. Even so, in most cases such a system may reduce the need to re-calculate geometry to once every two to three frames on average.
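The idea of reusing a tile by warping it can be illustrated with a small 2D affine transform. The following Python sketch is not taken from any Talisman implementation; the `Affine` type, the helper names, the 32 x 32 tile size, and the 1.125 scale factor are all assumptions chosen for illustration. It shows how a small move of the camera into the scene could be approximated by enlarging a tile about its centre.

```python
from typing import NamedTuple, Tuple


class Affine(NamedTuple):
    """2D affine map: x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a: float
    b: float
    tx: float
    c: float
    d: float
    ty: float

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        """Transform a single point."""
        return (self.a * x + self.b * y + self.tx,
                self.c * x + self.d * y + self.ty)


def scale_about(s: float, cx: float, cy: float) -> Affine:
    """Uniform scale by s that leaves the point (cx, cy) fixed."""
    return Affine(s, 0.0, cx * (1.0 - s), 0.0, s, cy * (1.0 - s))


def translate(dx: float, dy: float) -> Affine:
    """Pure shift, e.g. to approximate a small sideways camera move."""
    return Affine(1.0, 0.0, dx, 0.0, 1.0, dy)


# Corners of a hypothetical 32 x 32 pixel tile.
TILE_CORNERS = [(0, 0), (32, 0), (32, 32), (0, 32)]

# Camera moves slightly into the scene: enlarge the tile about its centre.
zoom = scale_about(1.125, 16, 16)
print([zoom.apply(x, y) for x, y in TILE_CORNERS])
```

Only the four corner points are transformed here; a real system would resample the tile's pixels through the same transform, and would fall back to re-rendering once the approximation drifted too far from the true image.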
The problem with this approach is that not all tiles necessarily have to be re-rendered every time, only those that contain objects close to the camera. If the entire geometry is sent to the card then this task can be handled entirely on-card, but this requires cards of similar complexity to T&L systems. If the geometry is kept under the control of the CPU, then ideally the card should be able to ask the CPU to re-render only those objects in tiles that are outdated. In many cases, this would require the CPU's rendering pipeline to be changed. In any event, the card and/or drivers need to know about the ordering and position of the objects, something that is normally hidden in the code.
Talisman was a complete suite of software and hardware that attempted to solve the tiled rendering problem. The system shared some information about the tiles and the objects within them in order to find out which tiles were outdated. If a tile became outdated, the CPU was asked to re-render the objects in that tile, and send the results back into the driver and then to the card. Once a particular tile was rendered on the card, it was stored on the card in compressed format so it could be re-used on future frames. Microsoft calculated that each tile could be re-used for about four frames on average, thereby reducing load on the CPU by about four times.
In Talisman, image buffers were broken down into 32 x 32 pixel "chunks" that were individually rendered using the 3D objects and textures provided by the CPU. Pointers to the chunks were then stored in a z-ordered (front to back) list for every 32 scan-lines on the display. One concern is that the chunks cannot be cleanly "stitched together", a problem that has sometimes been visible in various videogames using software rendering. To avoid this, Talisman also kept a separate "edge buffer" for every chunk, holding an "overflow" area that would cover gaps in the mapping.
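The chunk layout described above amounts to a simple grid calculation. The following Python sketch assumes only the 32-pixel chunk size given in the text; the function names are hypothetical, and the resolutions are the ones quoted earlier in the article.

```python
CHUNK = 32  # chunk edge length in pixels, as given in the text


def chunk_grid(width, height, chunk=CHUNK):
    """Return (columns, rows) of chunks needed to cover a display."""
    cols = -(-width // chunk)   # ceiling division
    rows = -(-height // chunk)
    return cols, rows


def chunk_of_pixel(x, y, chunk=CHUNK):
    """Return the (column, row) of the chunk containing pixel (x, y)."""
    return x // chunk, y // chunk


def strip_of_scanline(y, chunk=CHUNK):
    """Return the index of the 32-line strip containing scan-line y."""
    return y // chunk


for width, height in [(640, 480), (1024, 768)]:
    cols, rows = chunk_grid(width, height)
    print(f"{width} x {height}: {cols} x {rows} = {cols * rows} chunks, {rows} strips")
```

Under these assumptions a 1024 x 768 display is covered by 768 chunk positions arranged in 24 strips, each strip having its own z-ordered list.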
In a conventional 3D system, geometry is periodically generated, sent to the card for composition, composed into a framebuffer, and then eventually picked up by the video hardware for display. Talisman systems essentially reversed this process: the screen was divided into 32-line-high strips, and while the video hardware was drawing one of these strips, it would call the Talisman side and tell it to prepare the details for the next strip.
The system would respond by retrieving any chunks that were visible in that strip given the current camera location. In the typical case many of the chunks would be obscured by other chunks, and could be ignored during compositing, saving time. This is the reason for the z-sorting of the chunks, which allows them to be efficiently retrieved in "visibility order". If the chunks could be modified without distortion, the proper affine transform was called to update the chunk in-place. If it could not, say because the camera had moved too much since the last full update, the CPU was asked to provide new geometry for that chunk, which the card then rendered and placed back in storage.
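The per-strip decision process described above can be sketched as follows. This Python example is a simplified model, not Talisman's actual algorithm: the `Chunk` fields, the `error` estimate, the `ERROR_LIMIT` threshold, and the treatment of every chunk as fully opaque are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    z: float         # distance from the camera; smaller means nearer
    columns: range   # chunk columns this image covers in the strip (hypothetical)
    error: float     # estimated distortion if warped again (hypothetical metric)


ERROR_LIMIT = 1.0  # assumed threshold beyond which a warp is not acceptable


def plan_strip(chunks, error_limit=ERROR_LIMIT):
    """Walk a strip's chunks front to back and decide what to do with each."""
    covered = set()
    plan = []
    for chunk in sorted(chunks, key=lambda c: c.z):  # visibility order
        visible = set(chunk.columns) - covered
        if not visible:
            # Entirely hidden behind nearer chunks: no work needed.
            plan.append((chunk.name, "skip"))
            continue
        # Reuse the stored image if an affine warp is still close enough,
        # otherwise ask the CPU for fresh geometry.
        action = "warp" if chunk.error <= error_limit else "re-render"
        plan.append((chunk.name, action))
        covered |= set(chunk.columns)  # simplification: chunks are opaque
    return plan


strip = [
    Chunk("wall", z=10.0, columns=range(0, 8), error=0.2),
    Chunk("tree", z=5.0, columns=range(2, 4), error=1.5),
    Chunk("rock", z=20.0, columns=range(3, 6), error=0.1),
]
print(plan_strip(strip))
```

In this example the nearby tree has drifted too far and is re-rendered, the wall is warped in place, and the rock is skipped because nearer chunks hide it completely.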
Talisman had no analog of a framebuffer, rendering chunks on demand directly to the screen as the monitor's scan line progressed down the display. This parallels the Atari 2600, which uses a similar system to render 2D images on the screen, a method known as "racing the beam". In both cases, this reduced the amount of memory needed, and the memory bandwidth used between the display system and video hardware. In both cases this also required dramatically tighter integration between the video system and the programs running it. In the case of Talisman, the programs were required to store their objects in a particular format that the Talisman software drivers understood, allowing them to be quickly picked up from memory during interrupts.
The Talisman effort was Microsoft's attempt to commercialize concepts that had been experimented with for some time. In particular, the PixelFlow system developed at a Hewlett-Packard research lab at the University of North Carolina at Chapel Hill can be considered Talisman's direct parent.
When Talisman was first made widely public at the 1996 SIGGRAPH meeting, Microsoft promised a dramatic reduction in the cost of implementing a graphics subsystem. The company planned on working with vendors to sell the concept of Talisman for inclusion into other companies' display systems. That is, Microsoft hoped Talisman would be a part of a larger media chip, as opposed to an entire 3D system that would stand alone in a machine. The basic system would support 20,000 to 30,000 polygons on a 1024 x 768 display at 32 bit/pixel, with a 40 Mpixel/s polygon rendering rate and a 320 Mpixel/s image layer compositing rate.
At the time, Microsoft was working with several vendors to develop a reference implementation known as Escalante. Samsung and 3DO were working together to design a single-chip DSP-like "Media Signal Processor" (MSP), combining Talisman functionality with additional media functionality. Cirrus Logic would provide a VLSI chip that would retrieve data placed in memory by the MSP, apply effects, and send it off for display. Known as the "Polygon Object Processor" (POP), this chip was periodically polled by another Cirrus Logic chip, the "Image Layer Compositor" (ILC), which was tied to the video circuitry. Additionally, Escalante was intended to feature 4 MB of RDRAM on two 600 MHz 8-bit channels, offering 1.2 GB/s throughput. Later, Philips entered the fray with a planned new version of its TriMedia processor, which implemented most of Talisman in a single CPU, as did Trident Microsystems, with similar plans.
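The quoted RDRAM throughput follows directly from the channel figures. The following Python sketch reproduces the arithmetic; it assumes that "600 MHz" means 600 million 8-bit transfers per second on each channel, and the function name is hypothetical.

```python
def peak_throughput_bytes(channels, transfers_per_s, bits_per_transfer):
    """Peak memory throughput in bytes per second across all channels."""
    return channels * transfers_per_s * bits_per_transfer // 8


# Escalante figures from the text: two 600 MHz, 8-bit RDRAM channels.
escalante = peak_throughput_bytes(
    channels=2,
    transfers_per_s=600_000_000,  # assumption: one transfer per clock tick
    bits_per_transfer=8,
)
print(escalante / 1e9, "GB/s")
```

This is a peak figure; sustained throughput in a real system would be lower.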
It was in the midst of the Talisman project that the first-person shooter genre started to come to the fore in gaming. This created market demand for accelerators that could be used with existing games with minimal changes. By the time the Escalante reference design was ready for production, market forces had already resulted in a series of newer card designs with such improved performance that the Talisman cards simply could not compete. Cards with large amounts of RAM arranged to allow for extremely high speeds solved the bandwidth issue, brute-forcing the problem instead of attempting to solve it through clever implementation.
Additionally, the Talisman concept required tight integration between the display system and the software using it. Unlike the new 3D cards coming to market at the time, Talisman systems would have to be able to ask the CPU to re-render portions of the image in order to update their chunks. This required the games to have a specific organization in memory in order to respond to these requests. To aid developers in this task, Direct3D was changed to more closely match the Talisman needs. However, for any game that had already been written, or for developers who did not want to be tied to Talisman, this made the D3D system slower and considerably less interesting.
As a result of these changes, Talisman never became a commercial product. Cirrus Logic and Samsung both gave up on the system some time in 1997, leading Microsoft to abandon plans to release Escalante in 1997, and to external observers it appeared the entire project was dead.
There was a brief rebirth soon after, however, when Fujitsu claimed to be working on a single-chip implementation that would be available in 1998, with rumors of similar projects at S3 Graphics and ATI Technologies. None of these systems ever shipped and Talisman was quietly killed. This was much to the delight of the third-party graphics accelerator vendors, as well as the people within Microsoft who supported them in the market with DirectX.
Nevertheless, several of the ideas pioneered in the Talisman system have since become common in most accelerators. In particular, texture compression is now widely used. On more recent cards, compression has also been used on the z-buffers to reduce memory demands while sorting the display. The idea of using "chunks" to sort the display has also been used in a small number of cards, where it is referred to as tile-based rendering, but like Talisman in general these have never become competitive in the desktop space due to the rapid changes in the market. However, many recent graphics processors specifically designed for mobile devices (such as cell phones) employ a tile-based approach. Only the one key idea of Talisman, asking for updates to geometry only "when needed", has not been attempted since.