Eye movement in scene viewing

Updated on Apr 25, 2026

Eye movement in scene viewing refers to the visual processing of information presented in scenes. This phenomenon has been studied in areas such as cognitive psychology and psychophysics, where eye movements can be monitored under experimental conditions. A core aspect of these studies is the division of eye movements into saccades, the rapid movements of the eyes between points, and fixations, the periods in which the gaze rests on a point. Several factors influence eye movement in scene viewing: the task and knowledge of the viewer (top-down factors) and the properties of the image being viewed (bottom-up factors). Studying eye movement in scene viewing helps researchers understand visual processing in more natural environments.
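
In practice, the division into saccades and fixations is made algorithmically from raw gaze samples. One common approach, not named in the studies cited here, is velocity-threshold identification (I-VT): a sample whose angular velocity exceeds a threshold is labelled saccadic, and all other samples are labelled fixational. The sketch below assumes gaze positions already converted to degrees of visual angle, a fixed sampling rate and an illustrative threshold of 30 °/s. The function names `classify_ivt` and `group_events` are hypothetical.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # gaze position (x, y) in degrees of visual angle


def classify_ivt(samples: List[Sample], rate_hz: float,
                 threshold_deg_s: float = 30.0) -> List[str]:
    """Label each gaze sample 'fixation' or 'saccade' by point-to-point velocity.

    The 30 deg/s default is an illustrative assumption, not a value taken
    from the studies cited in this article.
    """
    if not samples:
        return []
    dt = 1.0 / rate_hz
    labels = ["fixation"]  # the first sample has no preceding velocity
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / dt  # deg/s
        labels.append("saccade" if velocity > threshold_deg_s else "fixation")
    return labels


def group_events(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse runs of identical labels into (label, first_index, last_index)."""
    events = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            events.append((labels[start], start, i - 1))
            start = i
    return events


# Toy 500 Hz trace: a fixation, a 10-degree rightward jump, a second fixation.
gaze = ([(0.0, 0.0)] * 5
        + [(2.5, 0.0), (5.0, 0.0), (7.5, 0.0), (10.0, 0.0)]
        + [(10.0, 0.0)] * 5)
events = group_events(classify_ivt(gaze, rate_hz=500))
print(events)  # [('fixation', 0, 4), ('saccade', 5, 8), ('fixation', 9, 13)]
```

Research pipelines usually add velocity smoothing and minimum-duration rules on top of this basic thresholding; those steps are omitted to keep the sketch short.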

Typically, when presented with a scene, viewers show short fixation durations and long saccade amplitudes in the early phase of viewing, reflecting ambient processing. This is followed by longer fixations and shorter saccades in later phases of viewing, reflecting focal processing (Pannasch et al., 2008).
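
The shift from ambient to focal processing can be quantified by comparing mean fixation duration and mean saccade amplitude between an early and a late viewing window. The sketch below assumes each fixation is recorded with its onset time, its duration and the amplitude of the saccade that led to it. The 2000 ms split point, the `Fixation` record and the toy values are hypothetical, not parameters or data from Pannasch et al. (2008).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Fixation:
    onset_ms: float         # time since scene onset
    duration_ms: float      # how long the gaze rested at this location
    saccade_amp_deg: float  # amplitude of the saccade that led to this fixation


def phase_summary(fixations: List[Fixation],
                  split_ms: float = 2000.0) -> Dict[str, Dict[str, float]]:
    """Mean fixation duration and saccade amplitude before and after split_ms."""
    phases = {
        "early": [f for f in fixations if f.onset_ms < split_ms],
        "late": [f for f in fixations if f.onset_ms >= split_ms],
    }
    return {
        name: {
            "mean_duration_ms": mean(f.duration_ms for f in group),
            "mean_amplitude_deg": mean(f.saccade_amp_deg for f in group),
        }
        for name, group in phases.items()
        if group  # skip a phase that contains no fixations
    }


# Toy data only: short fixations with long saccades early, the reverse later.
trial = [
    Fixation(200, 180, 8.0), Fixation(420, 200, 7.0), Fixation(660, 190, 9.0),
    Fixation(2100, 350, 2.0), Fixation(2500, 400, 3.0), Fixation(2950, 380, 2.5),
]
summary = phase_summary(trial)
print(summary)
```

An ambient-to-focal pattern shows up as a longer mean duration and a shorter mean amplitude in the `"late"` entry than in the `"early"` one.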

Eye movement behaviour in scene viewing differs across stages of cognitive development. Fixation durations shorten and saccade amplitudes lengthen with increasing age. In children, saccade amplitudes reach adult-like values earlier (at 4–6 years old) than fixation durations do (at 6–8 years old). Nevertheless, the typical progression from ambient to focal processing during scene viewing has been observed from the age of 2 years (Helo, Pannasch, Sirri & Rämä, 2014).

Spatial variation

Several factors affect where the eyes fixate: bottom-up factors inherent to the stimulus and top-down factors inherent to the viewer. Even an initial glimpse of a scene has been found to generate an abstract representation of the image that can be stored in memory and used to guide subsequent eye movements (Castelhano & Henderson, 2007).

Among bottom-up factors, eye guidance can be affected by the local contrast or salience of features in an image (Itti & Koch, 2000). Examples include an area with a large difference in luminance (Parkhurst et al., 2002), a greater density of edges (Mannan, Ruddock & Wooding, 1996), or binocular disparity, which signals the relative depth of objects in the scene (Jansen et al., 2009).
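
Full saliency models such as that of Itti & Koch (2000) combine several feature channels (such as intensity, colour and orientation) across spatial scales. The sketch below computes only one ingredient, local luminance contrast, taken here as the standard deviation of luminance in a square window around each pixel, and then reports where that single feature peaks. The image representation (a list of rows of luminance values), the window radius and the function names are assumptions for illustration.

```python
from statistics import pstdev
from typing import List, Tuple

Image = List[List[float]]  # luminance values, row-major


def local_contrast(img: Image, radius: int = 1) -> Image:
    """RMS contrast (population std. dev. of luminance) in a (2r+1)x(2r+1) window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the window at the image border.
            patch = [img[yy][xx]
                     for yy in range(max(0, y - radius), min(h, y + radius + 1))
                     for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = pstdev(patch)
    return out


def most_salient(contrast: Image) -> Tuple[int, int]:
    """(row, col) of the maximum of the contrast map; ties go to the first in row-major order."""
    cells = ((y, x) for y in range(len(contrast)) for x in range(len(contrast[0])))
    return max(cells, key=lambda p: contrast[p[0]][p[1]])


# Uniform grey field with one bright pixel at row 2, column 4.
scene = [[0.5] * 7 for _ in range(7)]
scene[2][4] = 1.0
contrast_map = local_contrast(scene)
peak = most_salient(contrast_map)
print(peak)
```

Every window that contains the bright pixel has the same contrast, so the reported peak is one of the bright pixel's immediate neighbours rather than necessarily the pixel itself. Uniform regions score zero.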

Top-down factors have more of an impact than bottom-up features in determining fixation positions. Behaviourally relevant, interesting information in a scene is more salient than low-level features, drawing fixations more frequently and sooner after scene onset (Onat, Açik, Schumann & König, 2014). Local colour at a location also influences where fixations occur. The presence of colour can increase the likelihood of an item being processed as a semantic object, as it aids discrimination of the object and makes it more interesting to view (Amano & Foster, 2014). When viewers are semantically primed by being presented with consistently similar scenes, the density of fixations increases and fixation durations decrease (Henderson, Weeks Jr., & Hollingworth, 1999).

Information separate from what is presented in a scene also affects which areas are fixated. Linguistic input can guide eye movements anticipatorily: if an item in the scene is mentioned verbally, the listener is more likely to move their gaze to that object (Staub, Abbott & Bogartz, 2012). Among factors relating to viewers rather than the scene, cross-cultural research has found differences. Westerners tend to concentrate on focal objects in a scene, looking at them sooner and more often, whereas East Asians attend more to contextual information, making more saccades to the background of the scene (Chua, Boland & Nisbett, 2002).

Temporal variation

Fixations during scene viewing last about 300 ms on average, although there is large variability around this value. Some of this variability can be explained by global properties of an image, which affect both bottom-up and top-down processing.

During natural scene viewing, masking an image by replacing it with a grey field during fixations increases fixation durations (Henderson & Pierce, 2008). More subtle degradations, such as decreasing the luminance of the image during fixations, also lengthen fixations (Henderson, Nuthmann & Luke, 2013). The effect is asymmetric, in that increasing luminance also lengthens fixations (Walshe & Nuthmann, 2014). In contrast, manipulations that affect top-down processing, such as blurring or phase noise, lengthen fixations when used to degrade a scene and shorten them when used to enhance it (Henderson, Olejarczyk, Luke & Schmidt, 2014; Einhäuser et al., 2006).

Furthermore, temporal and spatial aspects interact in a complex manner. When a picture is first presented on the screen, fixations made within the first second are more likely to be directed toward the left side of the scene, whereas for the remainder of the presentation they are more likely to be directed toward the right side (Ossandón et al., 2014).
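
A time-dependent bias of this kind can be measured by splitting fixations at a time point after image onset and computing, for each window, the proportion that land on the left half of the image. The 1000 ms split mirrors the "first second" mentioned above; the (onset, x) tuple format, the image width and the toy data are assumptions, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

# (onset in ms after image onset, horizontal gaze position in pixels)
FixationXY = Tuple[float, float]


def left_proportion_by_window(fixations: List[FixationXY], image_width: float,
                              split_ms: float = 1000.0) -> Dict[str, float]:
    """Share of fixations on the left half of the image, before vs. after split_ms."""
    midline = image_width / 2.0
    windows: Dict[str, List[bool]] = {"first_second": [], "remainder": []}
    for onset, x in fixations:
        key = "first_second" if onset < split_ms else "remainder"
        windows[key].append(x < midline)  # True counts as a left-half fixation
    # Omit a window that received no fixations to avoid dividing by zero.
    return {k: sum(v) / len(v) for k, v in windows.items() if v}


# Toy trial on an 800-pixel-wide image.
toy = [(150, 210), (380, 330), (640, 520), (900, 250),
       (1300, 610), (1700, 450), (2200, 700), (2600, 300)]
props = left_proportion_by_window(toy, image_width=800)
print(props)  # {'first_second': 0.75, 'remainder': 0.25}
```

In an actual study these proportions would be pooled over many trials and participants before comparing the two windows.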

References

Eye movement in scene viewing Wikipedia

(Text) CC BY-SA
