3D Virtual Sound Project: Introduction

ejhong@cs.stanford.edu

Introduction

The addition of 3D sound to a virtual environment has several significant advantages. Spatialized sound creates a more realistic and immersive experience for the user. It can be used to cue the user to events occurring within the visual field of view (e.g., a virtual cue stick striking a virtual billiard ball) or to events occurring outside it. In addition, sound provides information about source distance and the surrounding environment. The goal of this project was to implement a 3D audio library for use with a virtual reality system that would allow a user to perceive localized sound in real time. This was achieved by convolving multiple sound sources with head related transfer functions in real time and by using pre-processed interleaved sound files. The system does not perform any complex environmental context simulation (which is computationally expensive), but it allows for simple simulation of reverb by using pre-processed sound files that have been convolved with a synthetic room reverberation response.

Human Sound Localization

Some of the factors that have been identified as contributing to our ability to localize sound are interaural time differences (ITD), interaural intensity differences (IID), head and source movement cues, and spectral shaping by the outer ears [Begault94]. A sound source positioned off the plane that splits the head between the eyes produces a wavefront that travels different distances to the listener's eardrums. This difference in distance results in a different time of arrival at each eardrum (the ITD) and a difference in intensity due to shadowing by the head (the IID). Head and source movement provide additional information about the location of a sound, since we combine our known head motion with the effect it has on the perceived sound. This is especially important for resolving front-back and up-down ambiguity. However, these factors do not completely explain our ability to localize sound. Assuming a spherical head, there are generally two possible sound source positions for a given set of IID and ITD cues. Yet humans are generally able to distinguish these locations even without head movement. One explanation that has been identified is the spectral shaping of a sound that occurs due to the shape of our outer ears, or pinnae [Gardner73].
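The ITD for an idealized spherical head can be approximated with the classic Woodworth formula, ITD = (a/c)(theta + sin theta), where a is the head radius, c the speed of sound, and theta the source azimuth. The sketch below illustrates this; the default head radius and speed of sound are illustrative values, not measurements from this project.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate interaural time difference (seconds) for a
    spherical head using the Woodworth formula:
        ITD = (a / c) * (theta + sin(theta)),
    where theta is the source azimuth in radians (0 = straight
    ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# A source directly to the side produces the largest ITD,
# roughly 0.66 ms for an average-sized head.
print(round(woodworth_itd(90) * 1000, 2))  # ITD in milliseconds
```

Note that the formula is symmetric front-to-back, which is exactly the ambiguity described above: a source at 30 degrees in front and its mirror position behind the head yield the same ITD.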

The Head Related Transfer Function

The shape of our outer ears causes the spectrum and timing of a sound source to be modified before it reaches our eardrums. This filtering by the pinnae is called the head related transfer function (HRTF). The HRTF varies with the relative position between the pinnae and the sound source, and it encompasses both IID and ITD cues. Because HRTFs are unique for different relative positions, they allow us to resolve front-back and up-down ambiguity in sound source location. HRTF measurements can be made by placing a microphone in the ear canal of an individual or a dummy head and recording an analytic impulse signal at various azimuths and elevations. These HRTFs can then be applied in a 3D virtual audio system by convolving an input sound source with the HRTF measured for the given relative position between listener and source, creating the perception of a sound source at that position. Because all of our pinnae are different, HRTFs should ideally be measured individually; however, this is impractical for most applications. This project used the HRTFs made available to the public by Bill Gardner and Keith Martin of the MIT Media Lab, which were measured using a KEMAR dummy head. I have found that most individuals were able to adequately perceive localized sound using this set of HRTFs, given adequate head or source movement cues.
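The core operation described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the impulse responses here are tiny placeholders (real KEMAR HRIRs are a few hundred taps), and a real-time system would perform the convolution block by block (e.g., with an overlap-add scheme) rather than over the whole signal at once.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right head related impulse
    response (HRIR) pair to produce a stereo binaural signal. The
    HRIR pair would be chosen from the measured set nearest to the
    current source azimuth and elevation."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Placeholder HRIRs: a pure delay plus attenuation on the right
# channel crudely mimics the ITD/IID for a source on the left.
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6, 0.0])

stereo = spatialize(np.array([1.0, 0.5]), hrir_l, hrir_r)
print(stereo.shape)  # (5, 2): N + M - 1 samples, 2 channels
```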

The Perception of Distance

Under anechoic conditions, the intensity of a sound source decreases according to the inverse square law as the distance between listener and source increases. Under reverberant conditions, however, the overall intensity of a sound source often does not vary much with listener position; instead, the change in the proportion of reverberant to direct sound (the R/D ratio) generally produces the perception of distance [Sheeline82]. In addition, frequency-dependent modifications occur as a function of distance.
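A minimal sketch of how these two cues combine: attenuate the direct signal with distance while leaving the reverberant signal at a roughly constant level, so that the R/D ratio grows as the source recedes. The function and signal names here are illustrative; `wet` stands in for the pre-processed, reverb-convolved version of the same source described in the introduction.

```python
import numpy as np

def mix_direct_reverb(dry, wet, distance_m, ref_distance_m=1.0):
    """Crude distance cue: scale the direct (dry) signal's amplitude
    by 1/d (so its intensity falls off as 1/d^2, the inverse square
    law) while the reverberant (wet) signal stays constant. The
    reverb therefore dominates the mix at large distances."""
    d = max(distance_m, ref_distance_m)
    direct_gain = ref_distance_m / d  # amplitude ~ 1/d, intensity ~ 1/d^2
    return direct_gain * dry + wet

dry = np.array([1.0, 1.0])
wet = np.array([0.1, 0.1])
near = mix_direct_reverb(dry, wet, 1.0)   # direct sound dominates
far = mix_direct_reverb(dry, wet, 10.0)   # reverberant sound dominates
```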




Last modified: March 20, 1996 by Eugene Jhong