Created on 19 June 1997.
Last modified on 5 August 2000.

Shepard's Tones

Introduction

In 1964, the psychologist Roger N. Shepard published a paper entitled Circularity in Judgments of Relative Pitch in the Journal of the Acoustical Society of America [Shepard-64]. It described the set of tones reproduced here. Shepard crafted these tones to eliminate all relative pitch discrimination information. As a result, when they are played in sequence, each tone sounds higher than all tones preceding it and lower than all tones following it (and vice versa when the sequence is played in the opposite order). Because the sequence contains only twelve tones and is played in a continuous loop, every tone sounds both higher and lower than every other at some point in the sequence.

This phenomenon was used as early as the late nineteenth century in orchestral works. Other variations have also been produced. For example, Jean-Claude Risset has produced a rhythmic variant in which tempo appears to increase (or decrease) continuously.

Many follow-up studies have been done. One of the more interesting is documented in a paper entitled Some Observations on the Auditory Staircase Illusion in Perceptual and Motor Skills (volume 39, 1974, pp. 212-214), which describes how the illusion breaks down when the pause between tones is eliminated. Depending on the speed of your computer, you may be able to use the demo applet to interactively explore some of the suggestions made in the paper.

The interactive demonstration

Depending on the speed of your network connection, it may take some time for all of the tones to be retrieved. Progress is displayed in red immediately following the main title of this page. When loading is complete, the message line is replaced with a selectable link that can be used to launch the demo applet.

The basic demo involves nothing more than pressing the Start button in the applet window and listening as the tones are played. Each one should clearly be higher in pitch than those that preceded it, but the overall range of pitches will not increase appreciably no matter how long you listen. A pair of radio buttons labelled Up and Down controls the direction in which the sequence is played. Try both directions to convince yourself that a rising sequence cannot be mistaken for a falling one.

The sequence can be stopped again using the Stop button. While it is stopped, you can play the tones one at a time (in the currently selected direction) using the Step button.

The Speed slider controls the length of the pause between successive tones. Move it right to decrease the pause. If your computer is fast enough, you can eliminate the pause entirely (if not, you will just create a muddled sequence that staggers along intermittently). When the sequence is played smoothly with no pause, the illusion usually breaks down and the transition to a lower pitch is distinctly audible.

The remaining two controls, Display and Components, are useful, in conjunction with the figures displayed in the window, for explaining the cause of the illusion. The top figure always shows the first tone of the sequence. The diagram shows pitch (log frequency) on the horizontal axis and amplitude (loudness) on the vertical axis. Each red vertical bar represents a single frequency component. Initially there are six such components comprising the single complex (and therefore shrill-sounding) tone. Each component is an octave higher (a factor of two higher in frequency) than the one to its immediate left.
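The octave spacing described above can be sketched in a few lines of Python. This is only an illustration: the page does not state the frequencies the applet actually uses, so `BASE_HZ` and the function name `component_frequencies` are assumptions chosen for the example.

```python
# Hypothetical value: the lowest component of the first tone.
# The applet's real base frequency is not given on this page.
BASE_HZ = 55.0

def component_frequencies(n_components=6, base_hz=BASE_HZ):
    """Frequencies (Hz) of one complex tone whose components are
    spaced exactly one octave (a factor of two) apart."""
    return [base_hz * 2 ** i for i in range(n_components)]

first_tone = component_frequencies()
print(first_tone)  # [55.0, 110.0, 220.0, 440.0, 880.0, 1760.0]
```

Because the horizontal axis of the figure is logarithmic, these six frequencies appear as six equally spaced bars.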

When the Display switch is selected, the bottom figure shows the most recently played tone, whose number is displayed in the label at the top of the vertical axis. As the sequence progresses, one can see how the amplitude envelope, drawn in blue, remains stationary as the frequency components shift right (or left). At each step the components move a semitone to the right, and so the perceived pitch of the tone should rise by a semitone; there is no illusion in this respect. As the progression continues, however, the stationary amplitude envelope forces the higher components to diminish in loudness and the lower components to grow. The average spectral content therefore remains roughly the same, as the increasing pitch of each component is offset by the diminishing contribution of the higher components and the increasing contribution of the lower ones.
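The balancing act described above can be checked numerically. The sketch below assumes a raised-cosine envelope spanning six octaves; the page does not say what shape the applet's blue envelope actually has, so `envelope`, `tone`, and `centroid` are illustrative names and the envelope shape is a stand-in. Positions are measured in octaves above the left edge of the envelope.

```python
import math

N_COMPONENTS = 6  # components per tone, as in the default display

def envelope(octaves, span=N_COMPONENTS):
    """Assumed bell-shaped envelope: a raised cosine over `span` octaves,
    zero at both ends and fixed in place on the log-frequency axis."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * octaves / span)

def tone(step, n=N_COMPONENTS):
    """(position in octaves, amplitude) pairs for tone `step`.
    Each step shifts every component up by one semitone (1/12 octave)."""
    return [(step / 12 + i, envelope(step / 12 + i, n)) for i in range(n)]

def centroid(components):
    """Amplitude-weighted mean position on the log-frequency axis."""
    total = sum(a for _, a in components)
    return sum(p * a for p, a in components) / total

for step in range(12):
    print(step, round(centroid(tone(step)), 3))
```

Under this assumed envelope, every component rises by a semitone per step, yet the weighted centre of the spectrum stays very close to the middle of the envelope throughout the sequence.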

The Components control allows you to select the number of components in each tone. When the tones contain only two components, it is easy to hear the lower component come to dominate as the two progress towards the right. On the other hand, the pitch is not heard to fall in the transition from the twelfth tone to the first, because the component in the middle at that point dominates the one that disappears from the right end and reappears at the left. When only three components are present, however, the effect of the growing low-frequency components is already masked quite effectively. With six components, only trained musicians (and others with similar experience) can distinguish the individual components.
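The two-component case is simple enough to tabulate. The following sketch again assumes a raised-cosine envelope, here spanning two octaves; the applet's real envelope is not documented on this page, and the function names are hypothetical.

```python
import math

def envelope(octaves, span):
    """Assumed raised-cosine envelope, zero at both ends of `span` octaves."""
    return 0.5 - 0.5 * math.cos(2 * math.pi * octaves / span)

def two_component_tone(step):
    """(position in octaves, amplitude) for the low and high components."""
    low = step / 12
    high = low + 1
    return (low, envelope(low, 2)), (high, envelope(high, 2))

def dominant_position(step):
    """Position of whichever component is louder in tone `step`."""
    return max(two_component_tone(step), key=lambda c: c[1])[0]

for step in range(12):
    (_, a_low), (_, a_high) = two_component_tone(step)
    print(f"tone {step + 1:2d}: low={a_low:.2f} high={a_high:.2f}")
```

In this model the low component grows steadily while the high one fades. At the wrap from the twelfth tone back to the first, the loud component moves from 11/12 of an octave to a full octave above the left edge, a rise of one semitone, while the component that jumps from the right end to the left is nearly silent on both sides of the jump.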

As for all demo applets, the Dismiss button at the bottom of the applet window will remove the window from the screen. Selecting the APPLET link at the top of this page will then return it to the screen just as you left it.

Discussion

In the field of Human-Computer Interaction, there is a small branch that investigates the use of non-speech audio in the interface. The goal of many researchers in this area is to create a rich acoustical environment in which useful information is conveyed through sound. To accomplish this, the information must be encoded in some way. A common suggestion is to encode the information using pitch: high tones mean one thing, low tones mean another. This demonstration shows that care must be taken, if such an encoding is used, to retain relative pitch discrimination cues. It may be argued that such contrived acoustical spectra will never arise naturally, but complex spectra (i.e., different timbres) provide an effective tool in perceptually separating sound sources that are intended to be logically distinct. When these spectra must be synthesized, computational speed limits constrain their complexity. It is in such circumstances that contrived spectra are likely to arise.
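To make the encoding hazard concrete, here is a minimal sketch of what can go wrong when values are mapped onto tones like these. It assumes a simplified listener model that is not taken from this page: the direction heard between two tones follows the shorter way around the twelve-step pitch circle. The function name `sounds_higher` is hypothetical.

```python
def sounds_higher(a, b, steps=12):
    """True if tone b is assumed to sound higher than tone a, i.e. the
    upward distance around the circle is shorter than the downward one.
    This is a simplified model, not a measured result."""
    up = (b - a) % steps
    return 0 < up < steps / 2

# Three values encoded as tones 0, 4 and 8 of the twelve-tone sequence.
for a, b in [(0, 4), (4, 8), (8, 0)]:
    print(a, "->", b, "rises" if sounds_higher(a, b) else "does not rise")
```

Under this model all three transitions are heard as rising, so the listener cannot recover which of the three encoded values is the largest.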

Scott Flinn (flinn@cs.ubc.ca)