Zipper (data structure) - Alchetron, the free social encyclopedia

A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages. The zipper was described by Gérard Huet in 1997. It includes and generalizes the gap buffer technique sometimes used with arrays.

The zipper technique is general in the sense that it can be adapted to lists, trees, and other recursively defined data structures. Such modified data structures are usually referred to as "a tree with zipper" or "a list with zipper" to emphasize that the structure is conceptually a tree or list, while the zipper is a detail of the implementation.

A layman's explanation for a tree with zipper would be an ordinary computer filesystem with operations to go to parent (often cd ..), and the possibility to go downwards (cd subdirectory). The zipper is the pointer to the current path. Behind the scenes the zippers are efficient when making (functional) changes to a data structure, where a new, slightly changed, data structure is returned from an edit operation (instead of making a change in the current data structure).

Example: Bidirectional list traversal

Many common data structures in computer science can be expressed as the structure generated by a few primitive constructor operations or observer operations. These include the structure of finite lists, which can be generated by two operations:

Empty constructs an empty list,

Cons(x, L) constructs a list by prepending or concatenating value x in front of list L.

A list such as [1, 2, 3] is therefore the declaration Cons(1, Cons(2, Cons(3, Empty))). It is possible to describe the location in such a list as the number of steps from the front of the list to the target location. More formally, a location in the list is the number of Cons operations required to reconstruct the whole list from that particular location. For example, in Cons(1, Cons(2, Cons( X, Cons(4, Empty)))) a Cons(2, L) and a Cons(1, L) operation would be required to reconstruct the list relative to position X otherwise known as Cons( X, Cons(4, Empty)). This recording together with the location is called a zipped representation of the list or a list-zipper.

To be clear, a location in the list is not just the number of Cons operations, but also all of the other information about them; in this case, the values that must be reconnected. Here, these may be conveniently represented in as a separate list in the order of application from the target location. Specifically, from the context of "3" in the list [1, 2, 3], a recording (commonly referred to as a 'path') could be represented as [2, 1] where Cons(2, L) is applied followed by (Cons 1, L) to reconstitute the original list starting from [X, 4].

A list-zipper always represents the entire data structure. However, this information is from the perspective of a specific location within that data structure. Consequently, a list-zipper is a pair consisting of both the location as a context or starting point, and a recording or path that permits reconstruction from that starting location. In particular, the list-zipper of [1, 2, 3, 4] at the location of "3" may be represented as ([2, 1], [3, 4]). Now, if "3" is changed to "10", then the list-zipper becomes ([2, 1], [10, 4]). The list may then be efficiently reconstructed: [1, 2, 10, 4] or other locations traversed to.

With the list represented this way, it is easy to define relatively efficient operations on immutable data structures such as Lists and Trees at arbitrary locations. In particular, applying the zipper transform to a tree makes it is easy to insert or remove values at any particular location in the tree.

Uses

The zipper is often used where there is some concept of focus or of moving around in some set of data, since its semantics reflect that of moving around but in a functional non-destructive manner.

The zipper has been used in

Xmonad, to manage focus and placement of windows

Huet's papers cover a structural editor based on zippers and a theorem prover

A filesystem (ZipperFS) written in Haskell offering "...transactional semantics; undo of any file and directory operation; snapshots; statically guaranteed the strongest, repeatable read, isolation mode for clients; pervasive copy-on-write for files and directories; built-in traversal facility; and just the right behavior for cyclic directory references."

Clojure has extensive support for zippers.

Zipper contexts and differentiation

It has been shown that the type of the items in the context list produced by the zipper transformation is the "derivative" of the original type in a sense that is related to differentiation in calculus by decategorification. Most datatypes are constructed from products and sums of datatypes; any given datatype looks like a polynomial or a Taylor series, and the representation of the type of context items looks like the derivative of that polynomial or series. In a recursive datatype like a list or a tree, the derivative is taken with respect to the recursion variable.

Consider a recursive data structure like a binary tree labeled by data of type A.

The derivative is computed by introducing a recursion variable R

and we recover the original data structure by finding the fixed point R = T ( A , R ) . The datatype of the context is

By taking the fixed point R = T ( A , R ) , we find that a zipper for a tree consists of a "path" and a downward subtree, where a path is a context list of triples consisting of

a value for the root of the tree (type A)

a choice of left or right subtree in which to find the hole (type 2), and

the value of the other subtree (type R).

In general, then, a zipper for a datatype T parameterized by some other type A and a recursion variable R consists of two parts: a context list with items of type d T ( A , R ) d R | R = T ( A , R ) and a copy of the downward substructure T ( A , R ) | R = T ( A , R ) .

Direct modification

In a non-purely-functional programming language, it may be more convenient to simply traverse the original data structure and modify it directly (perhaps after deep cloning it, to avoid affecting other code that might hold a reference to it).

Generic zipper

The Generic Zipper is a technique to achieve the same goal as the conventional zipper by capturing the state of the traversal in a continuation while visiting each node. (The Haskell code given in the reference uses generic programming to generate a traversal function for any data structure, but this is optional – any suitable traversal function can be used.)

However, the Generic Zipper involves inversion of control, so some uses of it require a state machine (or equivalent) to keep track of what to do next.

References

Zipper (data structure) Wikipedia

(Text) CC BY-SA