Literate programming

Updated on Dec 02, 2024

Edit

Comment

Originally published 1983		Author Donald Knuth

Similar Donald Knuth books, Computer program books

Donald knuth literate programming 66 97

Literate programming is a programming paradigm introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated.

The literate programming paradigm, as conceived by Knuth, represents a move away from writing programs in the manner and order imposed by the computer, and instead enables programmers to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an uninterrupted exposition of logic in an ordinary human language, much like the text of an essay, in which macros are included to hide abstractions and traditional source code.

Literate programming (LP) tools are used to obtain two representations from a literate source file: one suitable for further compilation or execution by a computer, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source. While the first generation of literate programming tools were computer language-specific, the later ones are language-agnostic and exist above the programming languages.

History and philosophy

Literate programming was first introduced by Donald E. Knuth. The main intention behind this approach was to treat program as a literature understandable to human beings. This approach was implemented at Stanford university as a part of research on algorithms and digital typography. This implementation was further called as “WEB” by Donald Knuth since he believed that it was one of the few three-letter words of English that hadn’t already been applied to computer. However, it correctly resembles the complicated nature of software delicately pieced together from simple materials.

Concept

Literate programming is writing out the program logic in a human language with included (separated by a primitive markup) code snippets and macros. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe human abstractions created while solving the programming problem, and hiding chunks of code or lower-level macros. These macros are similar to the algorithms in pseudocode typically used in teaching computer science. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a meta-language on top of the underlying programming language.

A preprocessor is used to substitute arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.

Advantages

According to Knuth, literate programming provides higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on, but is grown naturally in the process of exposition of one's thoughts during a program's creation. The resulting documentation allows the author to restart his own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order, and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. Applicability of the concept to programming on a large scale, that of commercial-grade programs, is proven by an edition of TeX code as a literate program.

Contrast with documentation generation

Literate programming is very often misunderstood to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called documentation generation – or to voluminous commentaries included with code. This is backwards: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; in literate programming code is embedded in documentation, with the code following the structure of the documentation.

This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.

Workflow

Implementing literate programming consists of two steps:

Weaving: Generating comprehensive document about program and its maintenance.
Tangling: Generating machine executable code

Weaving and Tangling are done on the same source so that they are consistent with each other.

Example

A classic example of literate programming is the literate implementation of the standard Unix wc word counting program. Knuth presented a CWEB version of this example in Chapter 12 of his Literate Programming book. The same example was later rewritten for the noweb literate programming tool. This example provides a good illustration of the basic elements of literate programming.

Creation of macros

The following snippet of the wc literate program shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language, and hide chunks of code or other macros. The mark-up notation consists of double angle brackets ("<<...>>") that indicate macros, the "@" symbol which indicates the end of the code section in a noweb file. The "<<*>>" symbol stands for the "root", topmost node the literate programming tool will start expanding the web of macros from. Actually, writing out the expanded source code can be done from any section or subsection (i.e. a piece of code designated as "<<name of the chunk>>=", with the equal sign), so one literate program file can contain several files with machine source code.

The unraveling of the chunks can be done in any place in the literate program text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text that envelops the whole program.

Program as a web—macros are not just section names

Macros are not the same as "section names" in standard documentation. Literate programming macros can hide any chunk of code behind themselves, and be used inside any low-level machine language operators, often inside logical operators such as "if", "while" or "case". This is illustrated by the following snippet of the wc literate program.

In fact, macros can stand for any arbitrary chunk of code or other macros, and are thus more general than top-down or bottom-up "chunking", or than subsectioning. Knuth says that when he realized this, he began to think of a program as a web of various parts.

Order of human logic, not that of the compiler

In a noweb literate program besides the free order of their exposition, the chunks behind macros, once introduced with "<<...>>=", can be grown later in any place in the file by simply writing "<<name of the chunk>>=" and adding more content to it, as the following snippet illustrates ("plus" is added by the document formatter for readability, and is not in the code).

Record of the train of thought

The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code a literate program contains the explanation of concepts on each level, with lower level concepts deferred to their appropriate place, which allows for better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such exposition of ideas creates the flow of thought that is like a literary work. Knuth wrote a "novel" which explains the code of the computer strategy game Colossal Cave Adventure.

Tools

The first published literate programming environment was WEB, introduced by Donald Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's TeX: The program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe. The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems and can produce TeX and PDF documentation.

There are various other implementations of the literate programming concept:

Axiom, which is evolved from scratchpad, a computer algebra system developed by IBM. It is now being developed by Tim Daly, one of the developer of scratchpad, Axiom is totally written as a literate program.

noweb is independent of the programming language of the source code. It is well known for its simplicity, given the need of using only two text markup conventions and two tool invocations, and it allows for text formatting in HTML rather than going through the TeX system.

Literate is a "modern literate programming system." Like noweb, it works with any programming language, but it produces pretty-printed and syntax-highlighted HTML, and it tries to retain all the advantages of CWEB, including output formatted like CWEB. Other notable advantages compared with older tools include being based on Markdown and generating well-formatted "tangled" code. See External Links.

FunnelWeb is another LP tool that can produce HTML documentation output. It has more complicated markup (with "@" escaping any FunnelWeb command), but has many more flexible options. Like noweb, it is independent of the programming language of the source code.

Nuweb can translate a single LP source into any number of code files in any mix of languages together with documentation in LaTeX. It does it in a single invocation; it does not have separate weave and tangle commands. It does not have the extensibility of noweb, but it can use the listings package of LaTeX to provide pretty-printing and the hyperref package to provide hyperlinks in PDF output. It also has extensive indexing and cross-referencing facilities including cross-references from the generated code back to the documentation, both as automatically generated comments and as strings that the code can use to report its behaviour. Vimes is a type-checker for Z notation which shows the use of nuweb in a practical application. Around 15,000 lines of nuweb source are translated into nearly 15,000 lines of C/C++ code and over 460 pages of documentation. See External links.

Molly is a LP tool written in Perl, which aims to modernize and scale it with "folding HTML" and "virtual views" on code. It uses "noweb" markup for the literate source files. See External links.

Codnar is an inverse literate programming tool available as a Ruby Gem (see External links). Instead of the machine-readable source code being extracted out of the literate documentation sources, the literate documentation is extracted out of the normal machine-readable source code files. This allows these source code files to be edited and maintained as usual. The approach is similar to that used by popular API documentation tools, such as JavaDoc. Such tools, however, generate API reference documentation, while Codnar generates a linear narrative describing the code, similar to that created by classical LP tools. Codnar can co-exist with API documentation tools, allowing both a reference manual and a linear narrative to be generated from the same set of source code files.

The Leo text editor is an outlining editor which supports optional noweb and CWEB markup. The author of Leo mixes two different approaches: first, Leo is an outlining editor, which helps with management of large texts; second, Leo incorporates some of the ideas of literate programming, which in its pure form (i.e., the way it is used by Knuth Web tool or tools like "noweb") is possible only with some degree of inventiveness and the use of the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy and in some ways similar to literate programming.

The Haskell programming language has native support for semi-literate programming, generally inspired by CWEB but with a significantly reduced functionality and simpler implementation. When aiming for TeX output, one writes a plain LaTeX file where source code is marked by a given surrounding environment; LaTeX can be set up to handle that environment, while the Haskell compiler looks for the right markers to identify Haskell statements to compile, removing the TeX documentation as if they were comments. However, as described above, this is not literate programming in the sense intended by Knuth. Haskell's functional, modular nature makes literate programming directly in the language somewhat easier, but it is not nearly as powerful as one of the WEB tools where "tangle" can reorganize in arbitrary ways.

The Web 68 Literate Programming system uses Algol 68 as the underlying programming language, although there is nothing in the pre-processor 'tang' to force the use of that language.

Emacs org-mode for literate programming through Babel, which allows embedding blocks of source code from multiple programming languages within a single text document. Blocks of code can share data with each other, display images inline, or be parsed into pure source code using the noweb reference syntax.

CoffeeScript supports a "literate" mode, which enables programs to be compiled from a source document written in Markdown with indented blocks of code.

Wolfram Language, formerly known as Mathematica, is written in notebooks which combine text with code.

Swift (programming language), created by Apple Inc. can be edited in Playgrounds which provide an interactive programming environment that evaluates each statement and displays live results as the code is edited. Playgrounds also allow the user to add Markup language along with the code that provide headers, inline formatting and images.

Julia (programming language) supports the iJulia mode of development which - inspired by iPython - works in the format of notebooks, which combine text, graphs, etc. with the written code.

References

Literate programming Wikipedia

(Text) CC BY-SA