A new tool for real-time visualization of acoustic sound fields
has been developed for CNMAT's sound spatialization theatre. Unique
features of the theatre and the acoustic and volumetric modeling
software are described.
We have built a sound spatialization theatre into the main
performance and lecture space at the Center for New Music and
Audio Technologies. The theatre features a flexible suspension
system built primarily for loudspeakers.
Each speaker hangs from a rotating beam. The pivot point for
each speaker runs in a track that slides along rails bolted to
the ceiling. With height adjustment of each suspension cable,
this system safely allows speakers to be moved anywhere in the
room and oriented along two of the three possible axes. Rotational
symmetry of the concentric drivers in Meyer HM-1 speakers obviates
the need for adjustments around the third or "roll"
axis.
Rather than use a smaller number of subwoofers spatially separated
from the mid- and high-frequency speakers, we chose to place a
subwoofer at each of the 8 channel locations. Admittedly this is
not common practice, but when confronted with the question of how
to route low frequencies from several primary speakers to a few
spatially dissociated subwoofers, it became clear that our research
interests would be better served by having full-range performance
at each speaker location.
Real-time, low-latency audio signal processing for the speaker
array is performed using a multiprocessor Silicon Graphics Octane
workstation or a Macintosh PowerPC system, each of which is equipped
with 8 discrete channels of digital-to-analog conversion.
Optimizing the speaker array positioning and sound processing
for each performance in the theatre is challenging. The traditional
empirical approach is far too time-consuming to support situations
in which there are weekly (and sometimes daily) performances with
varied configurations. It takes too long to evaluate the effects
of new speaker positions and software parameter changes for all
listening positions. It is easy to optimize the listening experience
for the lucky person in the "sweet spot" at the expense
of the rest of the audience. The challenge is to find a compromise
where as many listeners as possible experience the intent of the
sound designer and as few listeners as possible endure disastrous
seats.
To aid sound designers and composers in achieving a good compromise
for the diverse applications of the theatre, we have developed
software for visualizing speaker signals, a model of the acoustic
sound field in the room, and interpretations of the field according
to perceptual models. Important examples of prior work in this
area include visualization of acoustic simulation (Stettner and
Greenberg, 1989) and visualization of beam tracing (Monks et
al., 1996). Unique features of the work described here include
the emphasis on interactive, real-time visualization, the use
of a highly configurable performance space, and the focus on adapting
the processing and space to achieve diverse artistic goals.
The visualization software is part of a complete system managing
audio, gestural flow and visual display. The heart of the system
is a database describing the room. It contains geometric
information such as the shape of the room; the positions and
orientations of the speakers, microphones and audience seating;
and the locations of live performers and their musical instruments.
Acoustic properties of each object in the room include its
frequency-dependent radiation pattern and the location of its
acoustic "center."
This database is used by the spatial sound processing software
to process source signals to create an audience percept of virtual
sources from arbitrary regions in space. The desired percept may
also involve creating the illusion that listeners are in a room
of a different size than the actual theatre (Lehnert and Blauert,
1991). The location of these sources is controlled in real-time
through gestures (Hand, 1997) or arbitrary control messages arriving
from the network (Wright and Freed, 1997).
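One way to picture the room database described above is as a small set of records, one per object in the room. The sketch below is only an illustration: the class names, field names and units are hypothetical and are not taken from the CNMAT software. Orientation is stored as two angles only, following the observation that the roll axis is not needed for these speakers.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoomObject:
    """One entry in a hypothetical room database."""
    name: str
    kind: str                                # e.g. "speaker", "microphone", "seat"
    position: tuple                          # (x, y, z) in metres (assumed unit)
    orientation: tuple = (0.0, 0.0)          # (azimuth, elevation) in degrees
    acoustic_center: Optional[tuple] = None  # falls back to position if not given
    radiation: dict = field(default_factory=dict)  # frequency in Hz -> pattern data

    def __post_init__(self):
        if self.acoustic_center is None:
            self.acoustic_center = self.position


@dataclass
class RoomDatabase:
    """Room outline plus every object placed in the room."""
    vertices: list                           # floor-plan corner points
    objects: list = field(default_factory=list)

    def of_kind(self, kind):
        """Return all objects of one kind, e.g. every speaker."""
        return [obj for obj in self.objects if obj.kind == kind]
```

A spatialization or visualization process could then query, for example, `db.of_kind("speaker")` to obtain the current speaker positions and orientations.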
The visualization software has access to the room database and
real-time parameter estimates from the spatialization software.
Since it has no access to the real sound pressure levels in the
room, it must estimate these based on an acoustic model of the
room. The image source method was used (Heewon and Byung-Ho, 1988)
because of its amenability to real-time computation.
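As a rough illustration of the image source idea, the sketch below mirrors a point source across the six walls of a rectangular room and lists the direct and first-order reflected arrivals at a listening position. The rectangular geometry, the single frequency-independent reflection coefficient, the 1/r spreading law, the speed of sound and all function names are assumptions made for this example; it is not the algorithm of the cited paper nor the code used in the theatre.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room temperature


def first_order_images(source, room):
    """Return (position, bounce_count) for the source and its six images.

    The room is a box with one corner at the origin and size room = (lx, ly, lz).
    """
    images = [(tuple(source), 0)]
    for axis in range(3):
        for wall in (0.0, room[axis]):
            mirrored = list(source)
            mirrored[axis] = 2.0 * wall - source[axis]  # reflect across the wall
            images.append((tuple(mirrored), 1))
    return images


def arrivals(source, listener, room, reflection=0.8):
    """Return (delay_seconds, amplitude) for each path, earliest first."""
    result = []
    for position, bounces in first_order_images(source, room):
        distance = math.dist(position, listener)
        amplitude = (reflection ** bounces) / max(distance, 1e-6)
        result.append((distance / SPEED_OF_SOUND, amplitude))
    return sorted(result)
```

Because each path reduces to one distance computation, the cost grows only with the number of image sources retained, which is consistent with the real-time motivation given above.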
The reader is advised to explore the color images and animations
available on CNMAT's web site [www.cnmat.berkeley.edu/AcousticVisualization]
for a clearer indication of the system's potential. Sound pressure
levels are shown using a color map on a horizontal cut plane through
the space. This movable plane is typically set to the average
height of the audience's ears in the room. This surface may be
moved to show, for example, effects of tiered seating or to evaluate
the experience of a performer who may be standing on a raised
stage. Several simultaneous cut surfaces may be necessary, for
example, for balcony seating in large theatres.
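The cut-plane display can be thought of as a grid of level estimates at a fixed height. The following sketch evaluates such a grid under strong simplifying assumptions that are ours and not the paper's: each speaker is a free-field point source described by its level at 1 m, levels fall by 6 dB per doubling of distance, and contributions are summed incoherently. The function names and the grid spacing are hypothetical.

```python
import math


def spl_at(point, speakers):
    """Incoherent sum of free-field levels at one point, in dB.

    Each speaker is given as (position, level_db_at_1m).
    """
    power = 0.0
    for position, level_at_1m in speakers:
        distance = max(math.dist(point, position), 0.1)  # avoid log of zero
        level = level_at_1m - 20.0 * math.log10(distance)
        power += 10.0 ** (level / 10.0)
    return 10.0 * math.log10(power)


def cut_plane(speakers, width, depth, height, step=0.5):
    """Return rows of SPL values (dB) on the horizontal plane z = height."""
    nx = int(width / step) + 1
    ny = int(depth / step) + 1
    return [
        [spl_at((ix * step, iy * step, height), speakers) for ix in range(nx)]
        for iy in range(ny)
    ]
```

Each value in the returned grid would then be mapped to a color; raising or lowering `height` corresponds to moving the cut plane.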
It is interesting to contrast this volumetric visualization with
traditional audio metering where scalar signal levels are displayed
for various nodes in the signal processing chain. Such metering
is useful for managing signal levels in the electrical elements
of the audio system to avoid distortion and speaker overload.
However, it is hard even for experienced sound engineers to use
scalar metering to predict actual sound pressure levels in many
locations in a venue.
A commonly adopted strategy for sound localization with speaker
arrays is the summing localization model (Blauert, 1997), known
in its general form as vector panning (Jot, 1999; Pulkki, 1997).
Virtual sources are placed between pairs of speakers by weighting
the signal level of the source appropriately for each speaker
(Chowning, 1970).
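For a single speaker pair in the horizontal plane, the gain computation behind vector panning can be sketched as follows. The code expresses the source direction as a weighted sum of the two speaker direction vectors and normalizes the weights to constant power, in the spirit of the vector base formulation. The function name, the use of degrees and the constant-power normalization are choices made for this example.

```python
import math


def pan_gains(source_deg, speaker1_deg, speaker2_deg):
    """Return (g1, g2) so that g1*l1 + g2*l2 points toward the source.

    Angles are azimuths in degrees as seen from the listener.
    """
    def unit(deg):
        angle = math.radians(deg)
        return math.cos(angle), math.sin(angle)

    px, py = unit(source_deg)
    x1, y1 = unit(speaker1_deg)
    x2, y2 = unit(speaker2_deg)
    det = x1 * y2 - x2 * y1
    if abs(det) < 1e-9:
        raise ValueError("speaker directions must not be collinear")
    # Solve the 2x2 linear system with Cramer's rule.
    g1 = (px * y2 - x2 * py) / det
    g2 = (x1 * py - px * y1) / det
    norm = math.hypot(g1, g2)  # constant-power normalization
    return g1 / norm, g2 / norm
```

A source midway between the speakers receives equal gains, and a source in the direction of one speaker is sent to that speaker alone. Note that the gains depend only on directions seen from one reference position, which is why listeners elsewhere in the room may hear something different.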
In CNMAT's venue, vector panning failed to provide good virtual
source imaging for most of the audience. This may be explained
by the precedence effect, also known as the "law of the first
wavefront" (Blauert, 1997), which may work against summing
localization. As the difference in the time of arrival of wavefronts
from the two speakers increases towards one millisecond, the source
of the earliest wavefront is perceived as the actual source, regardless
of the amplitude weighting applied by vector panning. Visualization
of an isosurface along which the wavefront time difference is constant
illustrates the geometric implications of this perceptual phenomenon.
In the figure below, the listening locations between the two surfaces
have wavefront-arrival-time differences less than the value determining
the isosurfaces.
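The geometry can be made concrete with a few lines of code. The sketch below computes the difference in arrival time of the direct wavefronts from two speakers at a listening position and tests whether that position lies between the two isosurfaces for a given threshold. The speed of sound, the function names and the default threshold of one millisecond (taken from the discussion above) are assumptions; reflections are ignored. Surfaces of constant time difference are sheets of a hyperboloid with the two speakers at its foci.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_time_difference(point, speaker_a, speaker_b):
    """Seconds by which speaker_a's wavefront arrives after speaker_b's.

    The result is negative when speaker_a's wavefront arrives first.
    """
    path_difference = math.dist(point, speaker_a) - math.dist(point, speaker_b)
    return path_difference / SPEED_OF_SOUND


def within_threshold(point, speaker_a, speaker_b, threshold=0.001):
    """True if the point lies between the two isosurfaces |dt| = threshold."""
    return abs(arrival_time_difference(point, speaker_a, speaker_b)) < threshold
```

With speakers 4 m apart and a 1 ms threshold, the region on the line joining the speakers is only about 0.34 m wide, since 1 ms corresponds to roughly 0.343 m of path difference.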
This representation is also effective with other important
time delay effects in spatial hearing such as the varied values
of the echo threshold, backward masking, and multiple event thresholds
(Blauert, 1997).
We have generalized this isosurface representation to multiple
speaker arrays by allowing the user to select appropriate pairs
of speakers and simultaneously display the multiple surfaces.
One of the discouraging observations about these surfaces is that
optimal, i.e., precedence-effect-free, listening regions can be
quite small. Indeed, only a privileged few in an audience find
themselves in or near the "sweet spot." With the goal
of providing a spatially enveloping auditory experience to a larger
segment of the audience, we have explored techniques to limit the
influence of the precedence effect.
Inspired by the "Clifton effect" (Clifton, 1987; Clifton
and Freyman, 1996) we have explored the use of roving time-delays
on the direct signals from each speaker to break down precedence.
Clifton and her associates have demonstrated that precedence breaks
down for a while when the delay structure is altered to favor
another speaker as the leading signal source. Precedence is then
reestablished with further stimulation from the new temporal configuration.
The idea behind the continual roving of speaker time-delays is
to continually inhibit the establishment of precedence. Initial
results of this technique appear promising.
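The text does not specify how the delays are roved, so the following is only one possible reading, offered as a sketch: each speaker's delay follows a bounded random walk, so that the speaker with the shortest delay changes over time. The delay range, the step size, the control rate implied by one step and all names are hypothetical.

```python
import random


def roving_delays(num_speakers, num_steps, max_delay_ms=5.0, step_ms=0.5, seed=0):
    """Yield one list of per-speaker delays (ms) for each control step."""
    rng = random.Random(seed)
    delays = [rng.uniform(0.0, max_delay_ms) for _ in range(num_speakers)]
    for _ in range(num_steps):
        # Nudge every delay by a small random amount and clamp it to the range.
        delays = [
            min(max_delay_ms, max(0.0, d + rng.uniform(-step_ms, step_ms)))
            for d in delays
        ]
        yield list(delays)


def leading_speaker(delays):
    """Index of the speaker whose signal is emitted first."""
    return min(range(len(delays)), key=delays.__getitem__)
```

In a real system the delay changes would need to be interpolated at audio rate to avoid audible clicks or pitch artifacts; that step is omitted here.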
We have also successfully applied the decorrelation techniques
developed by Gary Kendall and his associates (Kendall, 1995).
Here features of both the magnitude and phase spectra are made
to differ in the speakers where the vector panning is operative.
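One widely described way to decorrelate two speaker feeds is to filter each with a different FIR filter whose magnitude response is flat and whose phase response is random. The sketch below builds such a filter with a direct inverse DFT. It covers only the phase part of the idea; the magnitude differences mentioned above are not modeled. The filter length, the seeding scheme and the function names are assumptions for illustration.

```python
import cmath
import math
import random


def random_phase_fir(length=64, seed=0):
    """Real FIR filter with unit magnitude and random phase in every DFT bin."""
    rng = random.Random(seed)
    spectrum = [0j] * length
    spectrum[0] = 1.0 + 0j  # DC bin must be real
    for k in range(1, (length + 1) // 2):
        value = cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        spectrum[k] = value
        spectrum[length - k] = value.conjugate()  # keeps the filter real
    if length % 2 == 0:
        spectrum[length // 2] = 1.0 + 0j  # Nyquist bin must be real
    # Direct inverse DFT; adequate for short filters.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / length)
            for k in range(length)).real / length
        for n in range(length)
    ]


def zero_lag_correlation(a, b):
    """Normalized correlation of two equal-length filters at zero lag."""
    dot = sum(x * y for x, y in zip(a, b))
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / energy
```

Filtering each speaker feed with a filter built from a different seed leaves the long-term spectrum unchanged while lowering the correlation between feeds; `zero_lag_correlation` gives a quick check of how different two such filters are.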
These efforts to reduce the effects of precedence so that a larger
number of people in the audience can have a compelling spatial
experience may have additional perceptual consequences. In particular,
the decorrelation techniques give rise to considerable ambiguity
as to the location of the source. Here there appears to be a real
trade-off between the enveloping nature of the spatial audio experience
and the precision of localization.
One recent application of the theatre is the "virtual
string quartet." We are able to spatialize four independent
sound sources in real time on a Macintosh G3. In the example illustrated
below, the four sources are stored sounds of instrumentalists recorded
under anechoic conditions. We have extended this spatialization
technique to allow for the movement of the listener as well as
the sources. We have also experimented with the use of filters
to simulate the directivity of the instruments.
Instead of the mixing console, we have begun to use the desktop
computer with multi-channel I/O for the diffusion of prerecorded
electro-acoustic music and in concerts with live performance.
With the current SGI and Macintosh G3 technologies we have achieved
sound I/O latencies solidly under 7 milliseconds, an acceptable
delay for many situations.
We gratefully acknowledge support from: Alias/Wavefront, Edmund
Campion, Edmund O'Neill Foundation, Gibson Guitar, Meyer Sound
and Silicon Graphics. Richard Andrews, Tom Johnson, Tibor Knowles
and Matt Wright developed the speaker mounting and audio patching
system for the theatre. René Caussé, Jean-Marc Jot
and John Meyer provided essential insights and data on room and
loudspeaker acoustics.
J. Blauert (1997), Spatial hearing: the psychophysics of
human sound localization. Cambridge: MIT Press.
J. M. Chowning (1970), "The simulation of moving sound sources,"
proceedings of the Audio Engineering Society 39th Convention,
New York, NY, USA.
R. K. Clifton (1987), "Breakdown of echo suppression in the
precedence effect," Journal of the Acoustical Society
of America, vol. 82, pp. 1834-1835.
R. K. Clifton and R. L. Freyman (1996), "The precedence effect:
Beyond echo suppression," in Binaural and spatial hearing,
R. Gilkey and T. Anderson, Eds.: Lawrence Erlbaum, Hillsdale, NJ.
C. Hand (1997), "A survey of 3D interaction techniques,"
Computer Graphics Forum, vol. 16, num. 5, pp. 269-81.
L. Heewon and L. Byung-Ho (1988), "An efficient algorithm
for the image model technique," Applied Acoustics,
vol. 24, num. 2, pp. 87-115.
J. M. Jot (1999), "Real-time spatial processing of sounds
for music, multimedia and interactive human-computer interfaces,"
Multimedia Systems, vol. 7, num. 1, pp. 55-69.
G. S. Kendall (1995), "The decorrelation of audio signals
and its impact on spatial imagery," Computer Music Journal,
vol. 19, num. 4, pp. 71-87.
H. Lehnert and J. Blauert (1991), "Virtual auditory environment,"
proceedings of the Fifth International Conference on Advanced
Robotics. Robots in Unstructured Environments (Cat. No.91TH0376-4),
Pisa, Italy.
M. Monks, B. M. Oh, and J. Dorsey (1996), "Acoustic Simulation
and Visualization using a New Unified Beam Tracing and Image Source
Approach," proceedings of the Convention of the Audio Engineering
Society (1996).
V. Pulkki (1997), "Virtual sound source positioning using
vector base amplitude panning," Journal of the Audio Engineering
Society, vol. 45, num. 6, pp. 456-66.
A. Stettner and D. P. Greenberg (1989), "Computer graphics
visualization for acoustic simulation," proceedings of the
Conference Proceedings, Boston, MA, USA.
M. Wright and A. Freed (1997), "Open Sound Control: A New
Protocol for Communicating with Sound Synthesizers," proceedings
of the International Computer Music Conference, Thessaloniki,
Greece.