Puneet Varma (Editor)

Wizard of Oz experiment


In the field of human–computer interaction, a Wizard of Oz experiment is a research experiment in which subjects interact with a computer system that they believe to be autonomous, but which is actually operated, wholly or partly, by an unseen human being.

Concept

The phrase Wizard of Oz (originally OZ Paradigm) has come into common usage in experimental psychology, human factors, ergonomics, linguistics, and usability engineering to describe a testing or iterative design methodology in which an experimenter (the “wizard”), in a laboratory setting, simulates the behavior of a theoretical intelligent computer application, often by going into another room and intercepting all communications between participant and system. Sometimes this is done with the participant’s prior knowledge; sometimes it is a low-level deception employed to manage the participant’s expectations and encourage natural behavior.

For example, a test participant may think he or she is communicating with a computer using a speech interface, when the participant’s words are actually being secretly entered into the computer by a person in another room (the “wizard”) and processed as a text stream, rather than as an audio stream. The missing system functionality that the wizard provides may be implemented in later versions of the system (or may even be speculative capabilities that current-day systems do not have), but its precise details are generally considered irrelevant to the study. In testing situations, the goal of such experiments may be to observe the use and effectiveness of a proposed user interface by the test participants, rather than to measure the quality of an entire system.

Origin

John F. (“Jeff”) Kelley coined the phrases “Wizard of OZ” and “OZ Paradigm” for this purpose circa 1980 to describe the method he developed during his dissertation work at Johns Hopkins University. (His dissertation advisor was the late professor Alphonse Chapanis, the “Godfather of Human Factors and Engineering Psychology”.) Amusingly enough, in addition to some one-way mirrors and such, there literally was a blackout curtain separating Jeff, as the “Wizard”, from view by the participant during the study.

The “Experimenter-in-the-Loop” technique had been pioneered at Chapanis’ Communications Research Lab at Johns Hopkins as early as 1975 (J. F. Kelley arrived in 1978). W. Randolph Ford used the technique with his innovative CHECKBOOK program to obtain language samples in a naturalistic setting. In Ford’s method, a preliminary version of the natural language processing system was placed in front of the user. When the user entered syntax that was not recognized, the software responded with a “Could you rephrase that?” prompt. After the session, the algorithms for processing the newly obtained samples were created or enhanced, and another session took place. This approach led to the eventual development of his natural language processing technique, "Multi-Stage Pattern Reduction". Dr. Ford's recollection was that Dr. Kelley did in fact coin the phrase "Wizard of Oz Paradigm", but that the technique had been employed in at least two separate studies before Dr. Kelley started conducting studies at the Johns Hopkins Telecommunications Lab.

A similar early use of the technique, to model a Natural Language Understanding system being developed at the Xerox Palo Alto Research Center, was carried out by Allen Munro and Don Norman around 1975 at the University of California, San Diego. Again, the name "Wizard of Oz" had not yet been applied to the technique. The results were published in a 1977 paper by the team (Bobrow, et al.).

In that use of the technique, the experimenter (the “Wizard”) sat at a terminal in an adjacent room separated by a one-way mirror so that the subject could be observed. Every input from the user was processed correctly by a combination of software processing and real-time experimenter intervention. As the process was repeated in subsequent sessions, more and more software components were added, so the experimenter had less and less to do during each session. Eventually the growth of the phrase/word dictionary reached an asymptote and the experimenter could “go get a cup of coffee” during the session (which at this point was a cross-validation of the final system’s unattended performance).
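The division of labor described above, with software handling what it can and the experimenter covering the rest, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `WizardSession` class, its keyword-to-response `rules` table, and the `wizard` callable standing in for the hidden experimenter are all hypothetical names, not taken from any of the original studies.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WizardSession:
    """One simulated session (hypothetical structure, for illustration only)."""

    rules: dict[str, str]             # assumed: keyword -> canned software response
    wizard: Callable[[str], str]      # stands in for the unseen human experimenter
    interventions: list[tuple[str, str]] = field(default_factory=list)

    def software_reply(self, text: str) -> Optional[str]:
        """Return the software's response, or None if no rule matches."""
        lowered = text.lower()
        for keyword, response in self.rules.items():
            if keyword in lowered:
                return response
        return None

    def respond(self, text: str) -> str:
        """Answer with software when possible; otherwise hand over to the wizard."""
        reply = self.software_reply(text)
        if reply is None:
            reply = self.wizard(text)
            # Log the input the software could not handle, so that a rule
            # for it can be written before the next session.
            self.interventions.append((text, reply))
        return reply

    def intervention_rate(self, total_inputs: int) -> float:
        """Fraction of inputs that needed the wizard."""
        return len(self.interventions) / total_inputs if total_inputs else 0.0
```

Under this sketch, the logged interventions are the raw material for the next round of software components, and a falling intervention rate across sessions corresponds to the experimenter having less and less to do.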

A final point: Dr. Kelley's recollection of the coinage of the term is backed up by that of the late professor Al Chapanis. In their 1985 University of Michigan technical report, Green and Wei-Haas state the following: The first appearance of the "Wizard of Oz" name in print was in Jeff Kelley's thesis (Kelley, 1983a, 1983b, 1984a). It is thought the name was coined in response to a question at a graduate seminar at Hopkins (Chapanis, 1984; Kelley, 1984b). "What happens if the subject sees the experimenter [behind the "curtain" in an adjacent room acting as the computer]?" Kelley answered: "Well, that's just like what happened to Dorothy in the Wizard of Oz." And so the name stuck. (Cited by permission.)

There is also a passing reference to planned use of the "Wizard of Oz experiments" in a 1982 proceedings paper by Ford and Smith.

One fact, presented in Kelley's dissertation, about the etymology of the term in this context: Dr. Kelley did originally have a definition for the “OZ” acronym (aside from the obvious parallels with the 1900 book The Wonderful Wizard of Oz by L. Frank Baum). “Offline Zero” was a reference to the fact that an experimenter (the “Wizard”) was interpreting the users’ inputs in real time during the simulation phase.

Similar experimental setups had occasionally been used earlier, but without the "Wizard of Oz" name. Design researcher Nigel Cross conducted studies in the 1960s with "simulated" computer-aided design systems where the purported simulator was actually a human operator, using text and graphical communication via CCTV. As he explained, "All that the user perceives of the system is this remote-access console, and the remainder is a black box to him. ... one may as well fill the black box with people as with machinery. Doing so provides a comparatively cheap simulator, with the remarkable advantages of the human operator's flexibility, memory, and intelligence, and which can be reprogrammed to give a wide range of computer roles merely by changing the rules of operation. It sometimes lacks the real computer's speed and accuracy, but a team of experts working simultaneously can compensate to a sufficient degree to provide an acceptable simulation." Cross later referred to this as a kind of Reverse Turing test.

Significance

The Wizard of OZ method (unlike the eponymous “wizard” in the film) is very powerful. In its original application, Dr. Kelley was able to create a simple keyboard-input natural language recognition system that far exceeded the recognition rates of any of the far more complex systems of the day.

The prevailing view among many computer scientists and linguists at the time was that, for a computer to “understand” natural language well enough to assist in useful tasks, the software would have to be attached to a formidable “dictionary” having a large number of categories for each word. The categories would enable a very complex parsing algorithm to unravel the ambiguities inherent in naturally produced language. The daunting task of creating such a dictionary led many to believe that computers simply would never truly “understand” language until they could be “raised” and “experience life” as humans, since humans seem to apply a life’s worth of experiences to the interpretation of language.

The key enabling factor for the first use of the OZ method was that the system was designed to work in a single context (calendar-keeping), which constrained the complexity of language encountered from users to the extent where a simple language processing model was sufficient to meet the goals of the application. The processing model was a two-pass keyword/keyphrase matching approach, based loosely on the algorithms employed in Weizenbaum’s famous Eliza program. By inducing participants to generate language samples in the context of solving an actual task (using a computer that they believed actually understood what they were typing), the variety and complexity of the lexical structures gathered was greatly reduced and simple keyword matching algorithms could be developed to address the actual language collected.
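To make the idea of two-pass keyword/keyphrase matching in a single context more concrete, here is a minimal sketch. The text does not say what each pass did in Kelley's program, so the split used here is an assumption: the first pass collapses known multi-word keyphrases into a canonical token, and the second pass maps the first recognized token to an action. The vocabulary tables and function names are invented for illustration.

```python
import re

# Hypothetical vocabulary for a calendar-keeping task. None of these entries
# come from the original dictionaries.
KEYPHRASES = {
    "set up": "SCHEDULE",
    "get rid of": "CANCEL",
    "what is on": "QUERY",
}
KEYWORDS = {
    "schedule": "SCHEDULE",
    "book": "SCHEDULE",
    "cancel": "CANCEL",
    "delete": "CANCEL",
    "show": "QUERY",
    "list": "QUERY",
}


def first_pass(text):
    """Pass 1 (assumed): replace multi-word keyphrases with a canonical token."""
    # Lowercase and strip punctuation so matching is on plain words only.
    cleaned = " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())
    # Try longer phrases first so they are not shadowed by shorter ones.
    for phrase in sorted(KEYPHRASES, key=len, reverse=True):
        cleaned = re.sub(rf"\b{re.escape(phrase)}\b", KEYPHRASES[phrase], cleaned)
    return cleaned.split()


def second_pass(tokens):
    """Pass 2 (assumed): return the action for the first recognized token."""
    for token in tokens:
        if token in KEYPHRASES.values():  # canonical token produced by pass 1
            return token
        if token in KEYWORDS:
            return KEYWORDS[token]
    return None  # unrecognized input: a case for the wizard or a new rule


def recognize(text):
    return second_pass(first_pass(text))
```

Because the task is restricted to one domain, a small table like this can cover most of what users type. Inputs for which `recognize` returns `None` are the ones that would prompt new dictionary entries.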

This first use of OZ was in the context of an iterative design approach. In the early development sessions, the experimenter simulated the system in toto, performing all the database queries and composing all the responses to the participants by hand. As the process matured, the experimenter was able to replace human interventions, piece by piece, with newly developed code (which, at each phase, was designed to accurately process all the inputs that were generated in preceding steps). By the end of the process, the experimenter was able to observe the sessions in a “hands-off” mode (and measure the recognition rates of the completed program).

OZ was important because it addressed the obvious criticism:

Who can afford to use an iterative method to build a separate natural language system (dictionaries, syntax) for each new context? Wouldn’t you be forever adding new structures and algorithms to handle each new batch of inputs?

The answer turned out to be:

By using an empirical approach like OZ, anyone can afford to do this; Dr. Kelley’s dictionary and syntax growth reached asymptote (achieving from 86% to 97% recognition rates, depending on the measurements employed) after only 16 experimental trials and the resulting program, with dictionaries, was less than 300k of code.
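The notion of dictionary growth reaching an asymptote can be illustrated with a small sketch. The source does not describe how the plateau or the recognition rates were computed, so the plateau criterion here (a run of trials that add no new items) and all function names and parameters are assumptions made for illustration.

```python
def dictionary_growth(trials):
    """Return how many previously unseen items each trial contributes.

    `trials` is a sequence of collections, one per experimental trial, each
    holding the vocabulary items observed in that trial.
    """
    seen = set()
    growth = []
    for items in trials:
        new_items = set(items) - seen
        growth.append(len(new_items))
        seen |= new_items
    return growth


def has_plateaued(growth, window=3, max_new=0):
    """True when each of the last `window` trials added at most `max_new` items.

    Both `window` and `max_new` are arbitrary illustrative thresholds.
    """
    return len(growth) >= window and all(g <= max_new for g in growth[-window:])


def recognition_rate(recognized, total):
    """Fraction of inputs the program handled without human help."""
    return recognized / total if total else 0.0
```

With made-up data such as `[{"cancel", "book"}, {"book", "show"}, {"show"}, {"cancel"}, {"book"}]`, the per-trial growth shrinks to zero and stays there, which is the pattern the plateau check looks for.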

In the 23 years that followed initial publication, the OZ method has been employed in a wide variety of settings, notably in the prototyping and usability testing of proposed user interface designs in advance of having actual application software in place.

References

Wizard of Oz experiment Wikipedia