Viewpoint Effects

Introduction

At left you see a number of images of a car, each one from a different viewpoint. Despite the fact that each image activates a radically different set of retinal receptors, you have no difficulty determining that they all depict the same object. This ability is known as viewpoint invariance or viewpoint constancy (the identity of the object remains invariant or constant across changes in viewpoint), and the means by which the visual system achieves such viewpoint constancy is a source of considerable debate in the perception literature.

In this activity, you will read about how competing theories of object recognition explain viewpoint constancy and how these theoretical disagreements have been addressed in experiments. Then you can complete some trials of a short demonstration experiment and see what the results of such an experiment look like.

Instructions

If this is your first visit to this activity, start by clicking the first link at left to read about theories of object recognition.

Theories of Object Recognition

One strategy that the visual system might use to achieve viewpoint invariance would be to store objects in memory in a “viewpoint-free” way. Biederman’s (1987) recognition by components model, described in the textbook, takes this approach. Suppose we encoded a structural description of a car: a representation that says that it should have four wheels on the bottom, doors and windows above them, headlights in the front, etc. Then, as long as we see all (or at least most) of these parts in the right configuration, we should be able to tell that the object is a car, no matter which viewpoint we see it from.

Theories like recognition by components require complex representations. However, once an object’s structural description has been built, it is relatively simple to match it to a description in memory and thus identify the object. An alternative theoretical approach posits that the visual system stores simpler representations but uses complex procedures to match them to memory and recognize an object. A very simple version of this type of theory would say that we store raw images in memory and then “mentally rotate” new images until they come into register with something we remember. Because representations are tied to particular viewpoints in these theories, they are known as view-based theories.

Experiments on Viewpoint Invariance

Structural description and view-based theories make different predictions about how long it should take to recognize an object from a never-before-seen viewpoint. In view-based theories, an object seen from a novel view will require a certain amount of processing time in order to match the novel image with an image stored in memory. Furthermore, most view-based theories hold that the farther an object is rotated away from an encoded view, the longer it should take to recognize.

In contrast, structural description theories predict that the time required to recognize an object is determined solely by how long it takes to build the object’s structural description. It generally should not matter, according to these theories, whether the object is seen from a known or a novel viewpoint. Thus, as long as all (or enough of) the object’s parts are in view, structural description theories predict that recognition time should be invariant with the object’s orientation.
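
The contrast between the two predictions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear form of the view-based cost, the function names, and the values of base_ms and ms_per_degree are hypothetical and do not come from either theory or from any experiment.

```python
# Illustrative sketch only: all names and numbers here are hypothetical.

def view_based_rt(angle_deg, base_ms=600.0, ms_per_degree=2.0):
    """Predicted recognition time if matching cost grows with rotation.

    Assumes (for illustration) that the extra time is proportional to the
    angular distance from the stored view.
    """
    return base_ms + ms_per_degree * abs(angle_deg)


def structural_rt(angle_deg, base_ms=600.0):
    """Predicted recognition time if only the structural description matters.

    The angle is ignored: as long as enough parts are visible, the time
    does not depend on orientation.
    """
    return base_ms


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(angle, view_based_rt(angle), structural_rt(angle))
```

Under these assumptions the view-based prediction rises steadily with rotation angle, while the structural description prediction stays flat. That difference in shape is what the experiments described next are designed to detect.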

It is difficult to test these predictions in experiments with common objects like cars, because participants will have already encountered these objects from many viewpoints before they enter an experiment (remember that the theories only make different predictions about what happens when an object is seen from a new viewpoint). Therefore, researchers have designed sets of “novel objects” to use in their experiments. The figure at left shows three such objects, which we will use in a demonstration experiment.

Beginning Experimental Trials...

Click the text at left to begin an experimental trial. Fix your gaze on the cross that appears in the center of the rectangle. You will then see an image of an object. Examine it for as long as you like, then click on the image. Half a second later, another image will appear. Your task is to decide whether the two images were of the same object (possibly depicted from different viewpoints) or of two different objects.

Click the “Same” link if you think the two images were of the same object.

Click the “Different” link if you think they were of different objects.

Try to respond as quickly and as accurately as you can.

Click on “Typical Results” to the left to skip the trials and see typical results of this type of experiment.

Typical Results

The graph at left shows results from 24 participants in an experiment very similar to (but considerably longer than!) the one you just did. In the demonstration experiment that you did, the first image in the trial was always the one shown in the upper-left corner of the graph. The second image was sometimes of a different object (these trials are not of great theoretical interest) and sometimes of the same object. For “same” trials, the second image depicted the object from one of the views shown on the x-axis. Note that as you move from left to right along this axis, the viewpoint of the second image becomes more and more dissimilar to the viewpoint of the initial image.

In the graph, the y-axis represents the time (in milliseconds) to respond that the two images were of the same object, given the changes in viewpoint indicated on the x-axis. Note that it took participants longer to conclude that the two images were of the same object the more the second view was rotated relative to the first. This pattern indicates that the visual system did not recognize the objects by extracting a viewpoint-free structural description. Instead, it appears that the first image was represented in a view-based manner, then some time-consuming process was used to match the second image to the representation of the first image.
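
One way to summarize a graph like this is with the slope of response time against rotation angle: a clearly positive slope is the pattern view-based theories predict, while a slope near zero is what structural description theories predict. The sketch below computes an ordinary least-squares slope. The function name and the numbers in the usage lines are made-up placeholders, not the data from the 24 participants.

```python
# Illustrative sketch only: rt_slope is a hypothetical helper, and the
# example values below are placeholders, not real experimental data.

def rt_slope(angles_deg, rts_ms):
    """Least-squares slope of response time (ms) per degree of rotation."""
    if len(angles_deg) != len(rts_ms) or len(angles_deg) < 2:
        raise ValueError("need two or more paired observations")
    n = len(angles_deg)
    mean_x = sum(angles_deg) / n
    mean_y = sum(rts_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in angles_deg)
    if sxx == 0:
        raise ValueError("angles must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(angles_deg, rts_ms))
    return sxy / sxx


if __name__ == "__main__":
    placeholder_angles = [0, 30, 60, 90]       # degrees of rotation (made up)
    placeholder_rts = [600, 660, 720, 780]     # response times in ms (made up)
    print(rt_slope(placeholder_angles, placeholder_rts))
```

A slope reliably above zero means each additional degree of rotation costs extra time, which is the signature of a view-based matching process.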

While this experiment (and many more like it) supports view-based theories, other types of experiments seem to favor structural description theories. The debate between the two theoretical camps has been spirited, with both sides constantly generating new experimental evidence in support of their views. As in many scientific debates, the end result will most likely be that neither camp is completely right or completely wrong, and that the visual system uses representations that are tied to viewpoint in some ways and to structural descriptions in others.
