
3D Virtual Sound Project: Introduction
ejhong@cs.stanford.edu

Introduction
The addition of 3D sound to a virtual environment has several significant
advantages. Spatialized sound creates a more realistic and immersive
experience for the user. It can be used to cue the user to events occurring
within the visual field of view (e.g. a virtual cue stick making
contact with a virtual billiard ball) or to events occurring
outside the visual field. In addition, sound provides information about
source distance and the surrounding environment. The goal of this project
was to implement a 3D audio library for use with a virtual reality
system that would allow a user to perceive localized sound in real time.
This was achieved by convolving multiple sound sources with head-related
transfer functions in real time and by using pre-processed interleaved sound
files. The system does not attempt complex environmental acoustics simulation
(as this is computationally expensive) but allows for simple simulation of reverb by
using pre-processed sound files that have been convolved with a synthetic room
reverberation response.
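
To make that pre-processing step concrete, the sketch below (in C) performs
the one-time convolution of a dry mono signal with a synthetic room impulse
response. This is a minimal illustration of the idea only; the function name
preconvolve_reverb and the buffer layout are my own illustrative choices,
not the project's actual code.

    #include <stddef.h>

    /* Offline pre-processing sketch: direct-form linear convolution of
     * a dry mono signal x (length n) with a synthetic room impulse
     * response h (length m).  The output y must have room for
     * n + m - 1 samples.  Run once per sound file; the result is saved
     * and simply played back later, avoiding the cost of a long reverb
     * convolution at run time. */
    void preconvolve_reverb(const float *x, size_t n,
                            const float *h, size_t m,
                            float *y)
    {
        for (size_t i = 0; i < n + m - 1; i++) {
            float acc = 0.0f;
            /* Sum over all (signal, impulse-response) sample pairs
             * that overlap at output position i. */
            size_t k_lo = (i >= m - 1) ? i - (m - 1) : 0;
            size_t k_hi = (i < n) ? i : n - 1;
            for (size_t k = k_lo; k <= k_hi; k++)
                acc += x[k] * h[i - k];
            y[i] = acc;
        }
    }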
Human Sound Localization
Among the factors identified as contributing to our ability to localize
sound are interaural time differences (ITD), interaural intensity
differences (IID), head and source movement cues, and spectral shaping
by the outer ears [Begault94].
A sound source positioned off the median plane (the vertical plane that
divides the head symmetrically between the ears) produces a wavefront
that travels different distances to the listener's two eardrums. This
difference in distance results in a difference in time of arrival at the
eardrums (the ITD) and a difference in intensity due to shadowing by the
head (the IID). Head and source movement provide further information
about the location of a sound, since we combine knowledge of our own
head motion with the effect that motion has on the perceived sound. This
is especially important for resolving front-back and up-down
ambiguities. However, these first three factors do not completely
explain our ability to localize sound. Assuming a spherical head, there
are generally two possible sound source positions consistent with a
given set of IID and ITD cues. Yet humans are generally able to
distinguish between these locations even without head movement. One
explanation that has been identified is the spectral shaping of a sound
that occurs due to the shape of our outer ears, or pinnae [Gardner73].
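
The front-back ambiguity can be seen directly in the classic Woodworth
spherical-head approximation of the ITD. The sketch below uses typical
textbook values for head radius and the speed of sound; it is purely
illustrative and not part of the project code.

    #include <math.h>
    #include <stdio.h>

    #define PI          3.14159265358979
    #define HEAD_RADIUS 0.0875  /* meters; typical adult head radius  */
    #define SPEED_SOUND 343.0   /* meters/second at room temperature  */

    /* Woodworth's spherical-head approximation of the ITD for a
     * distant source at azimuth theta (radians from straight ahead,
     * in the range 0..pi).  Rear azimuths fold onto mirrored front
     * azimuths, which is exactly the front-back ambiguity described
     * above. */
    double itd_seconds(double theta)
    {
        if (theta > PI / 2.0)
            theta = PI - theta;  /* fold rear quadrant onto the front */
        return (HEAD_RADIUS / SPEED_SOUND) * (theta + sin(theta));
    }

    int main(void)
    {
        printf("ITD at  30 deg: %.0f us\n",
               itd_seconds( 30.0 * PI / 180.0) * 1e6);
        printf("ITD at 150 deg: %.0f us\n",
               itd_seconds(150.0 * PI / 180.0) * 1e6);
        /* Both lines print the same value (about 261 us): ITD alone
         * cannot distinguish a source in front from its mirror image
         * behind the listener. */
        return 0;
    }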
The Head Related Transfer Function
The shape of our outer ears causes the spectrum and timing of a sound
source to be modified before it reaches our eardrums. This filtering by
the pinnae is described by the head-related transfer function (HRTF). The
HRTF varies with the relative position between the pinnae and the sound
source, and it encompasses both IID and ITD cues.
Since HRTFs are unique for different relative positions, they allow us to
resolve front-back and up-down sound source location ambiguity.
HRTF measurements can be made by placing a microphone in the ear canal
of an individual or a dummy head and recording the response to an
impulsive test signal played from various azimuths and elevations. These HRTFs can then be
applied to a 3D virtual audio system by convolving an input sound source
with the HRTF measured for the given relative position between listener
and source. This should have the effect of creating the perception of
the sound source at the given relative position.
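
A minimal sketch of this convolution step follows, assuming the HRTF is
available as a pair of time-domain impulse responses (HRIRs) for the left
and right ears and that processing happens block by block, as a real-time
system requires. The function and parameter names are hypothetical.

    #include <stddef.h>

    /* Binaural rendering sketch: convolve a block of mono input with
     * the left- and right-ear head-related impulse responses (HRIRs)
     * measured for the source's current direction, writing interleaved
     * stereo.  "hist" must hold the previous (taps - 1) input samples
     * so that filter state carries across block boundaries. */
    void spatialize_block(const float *in, size_t n,
                          const float *hrir_l, const float *hrir_r,
                          size_t taps,
                          const float *hist,      /* taps-1 past inputs */
                          float *out_interleaved) /* 2 * n samples      */
    {
        for (size_t i = 0; i < n; i++) {
            float l = 0.0f, r = 0.0f;
            for (size_t k = 0; k < taps; k++) {
                /* Reach into the history buffer for samples that
                 * precede the current block. */
                float s = (i >= k) ? in[i - k]
                                   : hist[taps - 1 + i - k];
                l += s * hrir_l[k];
                r += s * hrir_r[k];
            }
            out_interleaved[2 * i]     = l;  /* left channel  */
            out_interleaved[2 * i + 1] = r;  /* right channel */
        }
    }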
Because everyone's pinnae are different, HRTFs should ideally be measured
individually. However, this is impractical in most settings. This project
used the HRTFs made available to the public by Bill Gardner and Keith Martin
of the MIT Media Lab
which were measured using a KEMAR dummy head. I have found that most
individuals were able to perceive localized sound adequately using this
set of HRTFs, given sufficient head or source movement cues.
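
Since the HRTFs are measured only at discrete positions, a running system
must map an arbitrary source direction onto one of the measured ones. The
sketch below simply rounds to the nearest point of a uniform 10-degree
grid; the actual KEMAR data uses 10-degree elevation steps but non-uniform
azimuth spacing, so the uniform grid here is a simplifying assumption.

    #include <math.h>

    /* Grid spacing assumed for this sketch only. */
    #define AZ_STEP 10.0
    #define EL_STEP 10.0

    /* Map a source direction (degrees) to the indices of the nearest
     * measured HRTF in a hypothetical [elevation][azimuth] table.
     * A negative el_idx means below the horizontal plane.  A practical
     * system might instead interpolate between neighboring HRTFs to
     * avoid audible jumps as the source moves. */
    void nearest_hrtf(double az_deg, double el_deg,
                      int *az_idx, int *el_idx)
    {
        /* Wrap azimuth into [0, 360). */
        az_deg = fmod(fmod(az_deg, 360.0) + 360.0, 360.0);
        *az_idx = (int)floor(az_deg / AZ_STEP + 0.5)
                  % (int)(360.0 / AZ_STEP);
        *el_idx = (int)floor(el_deg / EL_STEP + 0.5);
    }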
The Perception of Distance
Under anechoic conditions, the intensity of a sound source decreases
according to the inverse square law as the distance between listener and
source increases. However, under reverberant conditions, the overall
intensity of a sound source often does not vary much with listener position.
The change in the proportion of reverberant sound to direct sound (the R/D
ratio) generally produces the perception of distance [Sheeline82].
In addition, frequency-dependent modifications (such as the attenuation of
high frequencies by the air) occur as a function of distance.
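
These observations suggest a simple distance model: attenuate the direct
(HRTF-processed) path with distance while keeping the pre-convolved
reverberant copy at a roughly constant level, so that the R/D ratio grows
naturally as the source recedes. The reference distance and fixed reverb
gain below are illustrative assumptions, not measured values.

    #include <stddef.h>

    #define REF_DIST    1.0f  /* distance (m) at which direct gain = 1 */
    #define REVERB_GAIN 0.2f  /* fixed level for pre-convolved reverb  */

    /* Mix a direct-path block with its pre-convolved reverberant
     * version.  Direct amplitude falls off as 1/distance (the inverse
     * square law in power), while the reverberant level stays constant,
     * so the reverberant-to-direct (R/D) ratio -- the dominant distance
     * cue -- increases as the source moves away. */
    void mix_distance(const float *direct, const float *reverb,
                      size_t n, float distance, float *out)
    {
        float d = (distance > REF_DIST) ? distance : REF_DIST;
        float direct_gain = REF_DIST / d;  /* 1/r amplitude falloff */
        for (size_t i = 0; i < n; i++)
            out[i] = direct_gain * direct[i] + REVERB_GAIN * reverb[i];
    }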
Last modified: March 20, 1996 by Eugene Jhong