A new tool for real-time visualization of acoustic sound fields has been developed for a new sound spatialization theatre. The theatre is described and several applications of the acoustic and volumetric modeling software are presented.
The Center for New Music and Audio Technologies, CNMAT, is an interdisciplinary research center at the University of California at Berkeley. Our sound spatialization theatre is built into the main performance and lecture space at CNMAT's facility.
A unique feature of the theatre is a flexible suspension system built primarily for loudspeakers. Each speaker hangs from a rotating beam. The pivot point for each speaker runs in a track that slides along rails bolted to the ceiling. With height adjustment of each suspension cable, this system safely allows speakers to be moved anywhere in the room and oriented along two of the three possible axes.
Rotational symmetry of the concentric drivers in Meyer HM-1 speakers obviates the need for adjustments around the third or "roll" axis.
Rather than use subwoofers that are fewer in number and spatially separated from the medium to high frequency speakers, we chose to place a subwoofer at each of the 8 channel locations. Admittedly this is not common practice, but when confronted with the question of how to manage the delivery of low frequencies from several primary speakers to a few spatially disassociated subwoofers it became clear that our research interests would be better served by having full range performance at each speaker location.
Real-time, low-latency audio signal processing for the speaker array is performed using a multiprocessor Silicon Graphics Octane workstation or a Macintosh PowerPC system, each of which is equipped with 8 discrete channels of digital-to-analog conversion.
Most current applications of spatial audio are based on a model where source material is spatially encoded for an ideal room with a predetermined speaker geometry. The result is often unsatisfactory because of the difficulty in adapting real rooms to the ideal. We are working on a more general model where the source material may be from instruments and performers in the room, and therefore real-time spatial processing is required for all sources.
Optimizing the speaker array positioning and sound processing for each performance in the theatre is challenging. The traditional empirical approach is far too time-consuming to support situations in which there are weekly (and sometimes daily) performances with varied configurations. The problem with the trial and error approach is the difficulty of evaluating the effects of new speaker positions and software parameter changes for all listening positions. It is easy to optimize the listening experience for the lucky person in the "sweet spot" at the expense of the rest of the audience. The challenge is to find a compromise where as many listeners as possible experience the intent of the sound designer and as few listeners as possible endure disastrous seats.
To aid sound designers and composers in achieving a good compromise for the diverse applications of the theatre, we have developed software for visualizing source signals, a model of the acoustic sound field in the room, and interpretations of the field according to perceptual models. Important examples of prior work in this area include (Stettner and Greenberg, 1989) and (Monks et al., 1996). Unique features of the work described here include the emphasis on interactive, real-time visualization, the use of a highly configurable performance space, and the focus on adapting the processing and space to achieve diverse artistic goals.
The visualization software is part of a complete system managing audio, gestural flow and visual display. The heart of the system is a database describing the room. It contains information on geometric features such as the shape of the room; the positions and orientations of sources, microphones and audience seating; and the locations of live performers and their musical instruments. Acoustic properties of each object in the room include frequency-dependent radiation patterns and the location of their acoustic centers.
This database is used by the spatial sound processing software to process source signals to create an audience percept of virtual sources from arbitrary regions in space. The desired percept may also involve creating the illusion that listeners are in a room of a different size than the actual theatre. The location of these sources is controlled in real-time through gestures or arbitrary control messages arriving from the network.
The visualization software has access to the room database and real-time parameter estimates from the spatialization software. Since it has no access to the real sound pressure levels in the room it must estimate these based on an acoustic model of the room. The image source method was used because of its amenability to real-time computation.
Volumetric visualization of the time-varying sound pressure level in CNMAT's sound spatialization theatre is illustrated in Figure 1. The sound sources in this case are organ pipes. Pressure is shown using a color map on horizontal cut planes through the space. These movable planes are typically set to the average heights of the audience members' and performers' ears. Multiple simultaneous cut surfaces may be necessary, for example, for balcony seating in large theatres. It is interesting to contrast this volumetric visualization with traditional audio metering, where scalar signal levels are displayed for various nodes in the signal processing chain. Such metering is useful for managing signal levels in the electrical elements of the audio system to avoid distortion and speaker overload. However, it is hard even for experienced sound engineers to use scalar metering to predict actual sound pressure levels in many locations in a venue.
A commonly adopted strategy for sound localization with speaker arrays is the summing localization model (Blauert, 1997), known in its general form as vector panning (Jot, 1999; Pulkki, 1997). Virtual sources are placed between pairs of speakers by adjusting the signal level of the source appropriately for each speaker.
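As a minimal sketch of the pairwise case, assuming a two-dimensional layout with the listener at the origin and the speakers at known azimuths, the gains can be found by expressing the source direction as a linear combination of the two speaker direction vectors, as in vector base amplitude panning (Pulkki, 1997), and then normalizing for constant power. The function name and the angle convention are illustrative assumptions and do not describe the theatre's software.

```python
import math

def pair_gains(source_deg, spk1_deg, spk2_deg):
    """Gains (g1, g2) that place a virtual source between two speakers."""
    p = (math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg)))
    l1 = (math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg)))
    l2 = (math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg)))
    # Solve p = g1*l1 + g2*l2 for (g1, g2) with Cramer's rule.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("speakers are collinear with the listener")
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    if g1 < -1e-9 or g2 < -1e-9:
        raise ValueError("source direction lies outside the speaker pair")
    g1, g2 = max(g1, 0.0), max(g2, 0.0)
    # Normalize so that g1^2 + g2^2 = 1 (constant power across the pan).
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source midway between the speakers receives equal gains, and a source in the direction of one speaker is fed to that speaker alone.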
In CNMAT's venue, vector panning failed to provide good virtual source imaging for most of the audience. This may be explained by the precedence effect, also known as the "law of the first wavefront" (Blauert, 1997), which may work against summing localization. As the difference in the time of arrival of wavefronts from the two speakers increases towards one millisecond, the source of the earliest wavefront is perceived as the actual source, regardless of the amplitude weighting performed by vector panning. Visualization of an isosurface along which the wavefront time difference is constant illustrates the geometric implications of this perceptual phenomenon.
In the figure below the listening locations between the two surfaces
have wavefront-arrival-time differences less than the value determining
This representation is also effective with other important
time delay effects in spatial hearing such as the varied values
of the echo threshold, backward masking, and multiple event thresholds
We have generalized this isosurface representation to arrays of more than two speakers by allowing the user to select the speaker pairs of interest and display their surfaces simultaneously.
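The quantity behind these surfaces can be sketched as follows, assuming free-field propagation, a speed of sound of 343 m/s, and the one-millisecond figure mentioned above as the threshold. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C

def arrival_time_difference(listener, spk_a, spk_b, c=SPEED_OF_SOUND):
    """Seconds by which the wavefront from spk_b lags the one from spk_a."""
    return (math.dist(listener, spk_b) - math.dist(listener, spk_a)) / c

def inside_region(listener, spk_a, spk_b, threshold=1e-3):
    """True when the arrival-time difference stays below the threshold."""
    return abs(arrival_time_difference(listener, spk_a, spk_b)) < threshold
```

Because a constant time difference is a constant path-length difference, each isosurface is one sheet of a hyperboloid with the two speakers at its foci. Evaluating `inside_region` over the sample points of a cut plane marks the seats that fall between the two sheets.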
One of the discouraging observations about these surfaces is that optimal, i.e., precedence-effect-free, listening regions can be quite small. Indeed, only a privileged few in an audience find themselves in or near the "sweet spot." With the goal of providing a spatially enveloping auditory experience to a larger segment of the audience, we have explored techniques to limit the influence of the precedence effect.
Inspired by the "Clifton effect" (Clifton, 1987; Clifton and Freyman, 1996), we have explored the use of roving time-delays on the direct signals from each speaker to break down precedence. Clifton and her associates have demonstrated that precedence breaks down for a while when the delay structure is altered to favor another speaker as the leading signal source. Precedence is then reestablished with further stimulation from the new temporal configuration. The idea behind the continual roving of speaker time-delays is to continually inhibit the establishment of precedence. Initial results of this technique appear promising.
We have also successfully applied the decorrelation techniques developed by Gary Kendall and his associates (Kendall, 1995). Here features of both the magnitude and phase spectra are made to differ in the speakers where the vector panning is operative.
These efforts to reduce the effects of precedence so that a larger number of people in the audience can have a compelling spatial experience may have additional perceptual consequences. In particular, the decorrelation techniques give rise to considerable ambiguity as to the location of the source. Here there appears to be a real trade-off between the enveloping nature of the spatial audio experience and the precision of localization.
Acoustic models have to take phase from coherent sources of sound into account. Figure 3 shows the sound pressure level of a sine tone at a particular frequency in the room. At low frequencies destructive and constructive interference create markedly different sound levels around the room.
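A minimal sketch of this interference, assuming ideal in-phase point sources in free field and a speed of sound of 343 m/s: each source contributes a complex phasor whose phase depends on path length, and the phasors are summed before taking the magnitude. The function name is hypothetical.

```python
import cmath
import math

def pressure_amplitude(listener, sources, freq_hz, c=343.0):
    """Magnitude of the summed complex pressure from in-phase point sources."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber in radians per metre
    total = 0j
    for pos in sources:
        r = math.dist(listener, pos)
        # 1/r spherical spreading and a phase delay proportional to distance
        total += cmath.exp(-1j * k * r) / r
    return abs(total)
```

With sources at distances of 2 m and 3 m from the listener, a path difference of one wavelength (343 Hz) gives a summed amplitude of 1/2 + 1/3, while a path difference of half a wavelength (171.5 Hz) gives 1/2 - 1/3.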
When a loudspeaker is placed close to a hard wall, reflected waves interfere with the direct source, distorting the frequency and phase response of the loudspeaker. These effects are modeled by introducing the reflections as further sources, as illustrated by the smaller speakers outside the room in Figure 4.
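The position of such a reflected source can be sketched as a mirror image across the wall plane. This assumes a flat, infinite wall described by a point on it and a normal vector; the function name is hypothetical, and a fuller model would also scale the image's output by the wall's reflection coefficient.

```python
def mirror_source(source, plane_point, plane_normal):
    """Image of a point source reflected across a planar wall."""
    n_len = sum(c * c for c in plane_normal) ** 0.5
    n = [c / n_len for c in plane_normal]
    # Signed distance from the wall plane to the source along the normal.
    d = sum((s - p) * c for s, p, c in zip(source, plane_point, n))
    # Move the source twice that distance back through the plane.
    return tuple(s - 2.0 * d * c for s, c in zip(source, n))
```

For a wall in the plane x = 0, a source at (1, 2, 3) has its image at (-1, 2, 3).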
The high-level programming tool that binds the spatial visualization system together is Tcl/Tk, a scripting language and graphical user interface toolkit.
The Visualization Toolkit (VTK), a C++ class library for visualizing data, provides a set of bindings to the Tcl language that allow access to all the classes in the system. Through this scriptable interface, it is possible to create visualizations which change interactively and dynamically in response to user input data. A VTK data object is maintained internally for each sample region in the listening space. This data object is synchronized with the sample points in the region.
Interactive tasks such as moving and orienting sound sources take advantage of user-interface event bindings in Tk. Real-time operation based on monitoring signals being supplied to the sound sources is achieved by a Tcl thread that repeatedly requests current energy estimates, computes the acoustic model and visualization of that model, and renders the scene.
Three kinds of objects are modeled: active sound sources for loudspeakers and musical instruments, passive reflective objects for walls and diffusers, and finally listening points.
Listening space geometry is represented using a set of polygonal faces corresponding to walls, ceilings, floor, etc. Each face is decorated with information describing its acoustic properties, such as frequency-dependent reflection coefficients. Small room models can be easily described numerically. More complicated models may be imported from a specialized 3D modeling package. Maya is interesting in this respect because it supports storage of arbitrary data (e.g. acoustic data) in nodes of its scene graph.
Listening points are represented as two-dimensionally sampled bounded surfaces in three-dimensional space. This representation allows for fine sampling at important locations without the computational load that would be required for complete volumetric models. Common surfaces used include cut planes corresponding to ear level of seated listeners and performers; and meshes of planes for tiered seating.
Sources are represented as polygonal models. Each model is decorated with information describing its acoustic properties, such as its acoustic center location and frequency-dependent directivity.
In this acoustic modeling technique, information is computed for each listening point in turn. Conceptually, lines are projected from the listening point to the acoustic center of each source and to the acoustic center of reflections of each source from the passive reflecting objects in the space. The lengths of these lines are used to estimate energy reaching the listening point. The solid angles of each line are used to compute the effect of source directivity, and energy loss as a function of frequency and angle of incidence. This method leads to the following expression for the space-complexity of modeling up to the n'th order reflections of a listening space with f faces of defining geometry and s direct sound sources.
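The expression itself does not survive in this text. A reconstruction consistent with the culling argument that follows, offered as an assumption rather than as the authors' exact formula, counts the s direct sources, then up to f first-order images of each, and then up to (f-1) new images of every image at each further order:

```latex
N(n) \;\le\; s\left(1 + f\sum_{k=1}^{n}(f-1)^{k-1}\right)
      \;=\; s\left(1 + f\,\frac{(f-1)^{n}-1}{f-2}\right), \qquad f > 2 .
```

For a box-shaped room (f = 6) with a single source, this gives at most 7 sources for n = 1 and 37 for n = 2.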
Note that when calculating the next successive order of reflections, a source r that was created by reflecting a source s across a face g culls away face g for the next iteration. This must be true for the following reason: g's surface normal must have been facing s in order for the reflection to have been performed. For the new virtual source r, g's surface normal therefore points away from r, and g is discarded from consideration for further reflections of r. This explains the (f-1) term above. However, in more complex models, particularly those with non-convex geometry, more than just the previous face will be culled for a given reflected source. Thus, the expression above is an upper bound on the number of sources that could be generated when calculating reflections.
Separate computation and display of direct and reverberant energy is simple to achieve with image source models by introducing upper and lower "reflection limits". The upper reflection limit terminates the reflection-generation process, essentially halting the recursive process by which source reflections are generated. The display software maintains an active source list consisting of all the sources, both direct and virtual, whose reflection order lies between the lower and upper reflection limits, inclusive. A lower reflection limit of zero indicates that the direct sources should be included in the active source list. Setting the upper and lower reflection limits equal to one another allows the acoustic power from a single order of reflections to be modeled.
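The selection of active sources can be sketched as a filter on reflection order. The `Source` record and the function name are hypothetical; order 0 stands for a direct source and order k for a k-th order image.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    name: str
    order: int  # 0 for a direct source, k for a k-th order image

def active_sources(sources, lower, upper):
    """Sources whose reflection order lies in [lower, upper], inclusive."""
    if lower < 0 or upper < lower:
        raise ValueError("need 0 <= lower <= upper")
    return [s for s in sources if lower <= s.order <= upper]
```

Calling `active_sources(all_sources, 0, 0)` keeps only the direct sound, while `active_sources(all_sources, 1, 1)` isolates the first-order reflections.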
Once the geometric implications of the relative positions of sources, listening points, and reflecting objects are calculated, the actual acoustic modeling calculation can be performed. The simplest computation uses sine wave probe tones, directly calculating and summing vectors for the phase and amplitude of wave fronts arriving at each listening point. For real-time modeling an optimization is required. We use energy estimates of adjacent frequency bands, averaged at the visual display rate, to avoid the expense of a sequence of convolutions at the full audio sample rate. This method allows for plausible approximations of energy, although pathological locations where cancellations may occur would not be accurately displayed.
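A minimal sketch of the energy-based shortcut, assuming inverse-square spreading and phase-blind (incoherent) summation per band. The data layout is hypothetical: each source carries its current band energies together with per-band gains that stand in for directivity and accumulated wall absorption.

```python
import math

def band_levels_db(listener, sources, floor=1e-12):
    """Per-band level in dB at a listening point from summed band energies.

    sources: list of (position, band_energies, band_gains) tuples.
    """
    n_bands = len(sources[0][1])
    totals = [0.0] * n_bands
    for pos, energies, gains in sources:
        # Squared distance, clamped to avoid division by zero at the source.
        r2 = max(sum((a - b) ** 2 for a, b in zip(listener, pos)), 1e-6)
        for i in range(n_bands):
            totals[i] += energies[i] * gains[i] / r2  # inverse-square spreading
    return [10.0 * math.log10(max(e, floor)) for e in totals]
```

Because energies rather than phasors are added, two equal sources always raise the level by about 3 dB, which is why cancellation nulls cannot appear in a display computed this way.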
Multiple Source Spatialization
One recent application of the theatre is the "virtual string quartet." We are able to spatialize four independent sound sources in real time on a Macintosh G3. In the example illustrated below the four sources are stored sounds of instrumentalists recorded under anechoic conditions. We have extended this spatialization technique to allow for the movement of the listener as well as the sources. We have also experimented with the use of filters to simulate the directivity of the instruments.
Instead of the mixing console we have begun to use the desktop computer with multi-channel I/O in the diffusion of prerecorded electro-acoustic music and in concerts with live performance. With the current SGI and Macintosh G3 technologies we have achieved sound I/O latencies solidly under 7 milliseconds, an acceptable delay for many situations.
Conclusion and Directions
The visualization system described here is a valuable tool for spatial sound researchers, sound engineers and composers using CNMAT's sound spatialization theatre. Our current direction is to integrate this system into the OSedit framework and to develop a new SDIF representation for spatial audio.
We gratefully acknowledge support from: Alias/Wavefront, Edmund Campion, Edmund ONeill foundation, Gibson Guitar, Meyer Sound and Silicon Graphics. Richard Andrews, Tom Johnson, Tibor Knowles and Matt Wright developed the speaker mounting and audio patching system for the theatre. René Caussé, Jean-Marc Jot and John Meyer provided essential insights and data on room and loudspeaker acoustics.
J. Blauert (1997), Spatial hearing: the psychophysics of human sound localization. Cambridge: MIT Press.
J. M. Chowning (1970), "The simulation of moving sound sources," proceedings of the Audio Engineering Society 39th Convention, New York, NY, USA.
R. K. Clifton (1987), "Breakdown of echo suppression in the precedence effect," Journal of the Acoustical Society of America, vol. 82, pp. 1834-1835.
R. K. Clifton and R. L. Freyman (1996), "The precedence effect: Beyond echo suppression," in Binaural and spatial hearing, R. Gilkey and T. Anderson, Eds.: Lawrence Erlbaum, Hillsdale, NJ.
C. Hand (1997), "A survey of 3D interaction techniques," Computer Graphics Forum, vol. 16, num. 5, pp. 269-81.
L. Heewon and L. Byung-Ho (1988), "An efficient algorithm for the image model technique," Applied Acoustics, vol. 24, num. 2, pp. 87-115.
J. M. Jot (1999), "Real-time spatial processing of sounds for music, multimedia and interactive human-computer interfaces," Multimedia Systems, vol. 7, num. 1, pp. 55-69.
G. S. Kendall (1995), "The decorrelation of audio signals and its impact on spatial imagery," Computer Music Journal, vol. 19, num. 4, pp. 71-87.
H. Lehnert and J. Blauert (1991), "Virtual auditory environment," proceedings of the Fifth International Conference on Advanced Robotics. Robots in Unstructured Environments (Cat. No.91TH0376-4), Pisa, Italy.
M. Monks, B. M. Oh, and J. Dorsey (1996), "Acoustic Simulation and Visualization using a New Unified Beam Tracing and Image Source Approach," proceedings of the Convention of the Audio Engineering Society (1996).
V. Pulkki (1997), "Virtual sound source positioning using vector base amplitude panning," Journal of the Audio Engineering Society, vol. 45, num. 6, pp. 456-66.
A. Stettner and D. P. Greenberg (1989), "Computer graphics visualization for acoustic simulation," proceedings of the Conference Proceedings, Boston, MA, USA.
M. Wright and A. Freed (1997), "Open Sound Control: A New Protocol for Communicating with Sound Synthesizers," proceedings of the International Computer Music Conference, Thessaloniki, Greece.
A. Freed (1994), "Codevelopment of user interface, control and digital signal processing with the HTM environment," proceedings of the 5th International Conference on Signal Processing Applications and Technology, Dallas, TX, USA.
A. Freed, X. Rodet, and P. Depalle (1993), "Synthesis and control of hundreds of sinusoidal partials on a desktop computer without custom hardware," proceedings of the Fourth International Conference on Signal Processing Applications and Technology ICSPAT '93, Santa Clara, CA, USA.
A. Freed (1998), "Real-Time Inverse Transform Additive Synthesis for Additive and Pitch Synchronous Noise and Sound Spatialization," proceedings of the AES 104th Convention, San Francisco, CA.
M. Bosi and S. E. Forshay (1994), "High quality audio coding for HDTV: an overview of AC-3," proceedings of the International Workshop on HDTV '94, Turin, Italy.
M. Feibus (1996), "Microsoft's DirectSound," Windows Sources, pp. 203(3).
J. K. Ousterhout (1994), Tcl and the Tk toolkit. Reading, Mass.: Addison-Wesley.
W. J. Schroeder, K. M. Martin, and W. E. Lorensen (1996), "The design and implementation of an object-oriented toolkit for 3D graphics and visualization."
Alias/WaveFront (1998), "Maya 1.0," Toronto, Canada: Alias/WaveFront.