Perceptual Scheduling in Real-time Music and Audio Applications
PhD Dissertation

Wednesday April 18th, 2001  1:10 - 2:30 PST
405 Soda Hall (Fujitsu Seminar Room)

Amar Chaudhary
Computer Science Division - EECS
U.C. Berkeley


Slides [ pdf | ppt ]

Academic research in computer music and commercial sound systems is moving from special-purpose hardware towards software implementations on general-purpose computers. The enormous gains in general-purpose processor performance give musicians and composers richer and more complex control of sound in their performances and compositions. Just as geometric modeling has given graphic designers more control of their scenes and objects (e.g., independent control of size, position and texture), sound synthesis allows musicians more control of musical parameters such as duration, frequency and timbre. Examples of sound-synthesis algorithms include additive synthesis, resonance modeling, frequency-modulation (FM) synthesis and physical models. Applications called synthesis servers allow musicians to dynamically specify models for these algorithms and synthesize sound from them in real time in response to user input. A synthesis server is an expressive, software-only musical instrument.

However, the widespread use of synthesis servers has been frustrated by high computational requirements. This problem is particularly acute for the sinusoidal and resonance models described in this dissertation. Typical sinusoidal and resonance models contain hundreds of elements, called partials, that together represent an approximation of the original sound. Even though computers are now running at clock rates above 1 GHz, it is still not possible to use many large models in polyphonic or multi-channel settings. For example, a typical composition might include eight models with 120 partials each, or 960 partials total. Additionally, current operating systems do not guarantee the quality of service (QoS) necessary for interactive real-time musical performance, particularly when the system is running at or near full computational capacity. Traditional approaches that pre-compute audio samples or perform optimal scheduling off-line do not lend themselves to musical applications that are built dynamically and must be responsive to variations in live musical performance.
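
To give a rough sense of scale for the 960-partial example, the sketch below renders a naive per-sample oscillator bank and counts oscillator updates per second. It is only an illustration: the 44.1 kHz sample rate, the function names and the (frequency, amplitude) representation of a partial are assumptions made here, not details taken from the dissertation, and practical synthesis servers may use faster methods.

```python
import math

SAMPLE_RATE = 44_100  # assumption: CD-quality rate, not stated in the abstract


def render_additive(partials, num_samples, sample_rate=SAMPLE_RATE):
    """Sum one sinusoid per (freq_hz, amplitude) partial for each output sample.

    The work per sample grows linearly with the number of partials.
    """
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        out.append(sum(a * math.sin(2.0 * math.pi * f * t) for f, a in partials))
    return out


def oscillator_updates_per_second(num_models, partials_per_model,
                                  sample_rate=SAMPLE_RATE):
    """Count sinusoid evaluations needed per second of audio."""
    return num_models * partials_per_model * sample_rate


# Eight models of 120 partials each, as in the example above.
print(oscillator_updates_per_second(8, 120))  # 42336000
```

Under these assumptions, 960 partials require roughly 42 million sinusoid evaluations per second for a single output channel, before any control processing or multi-channel output is considered.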

We introduce a novel approach to reducing the computational requirements in real-time music applications, called perceptual scheduling, in which QoS guarantees are maintained using voluntary reduction of computation based on measures of perceptual salience. When a potential QoS failure is detected, the perceptual scheduler requests that the synthesis algorithms reduce computational requirements. Each algorithm reduces its computation using specific psychoacoustic metrics that preserve audio quality while reducing computational complexity.
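
The voluntary reduction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the `Partial` class, the `reduce_partials` function and the fixed cost per partial are hypothetical, and raw amplitude stands in for the psychoacoustic salience metrics, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    freq_hz: float
    amplitude: float


def reduce_partials(partials, budget_us, cost_per_partial_us):
    """Keep the most salient partials that fit within a time budget.

    Assumption: salience is approximated by amplitude, and every partial
    costs the same number of microseconds to synthesize.
    """
    max_count = max(budget_us // cost_per_partial_us, 0)
    if max_count >= len(partials):
        return list(partials)  # no reduction needed
    ranked = sorted(partials, key=lambda p: p.amplitude, reverse=True)
    return ranked[:max_count]


model = [
    Partial(100.0, 0.5),
    Partial(200.0, 0.1),
    Partial(300.0, 0.9),
    Partial(400.0, 0.3),
]

# A budget of 20 us at 10 us per partial leaves room for two partials.
kept = reduce_partials(model, budget_us=20, cost_per_partial_us=10)
print([p.freq_hz for p in kept])  # [300.0, 100.0]
```

In this sketch the scheduler would call `reduce_partials` only when it predicts a missed deadline, so a lightly loaded system synthesizes the full model unchanged.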

This dissertation describes the perceptual scheduling framework and its application to musical works using additive synthesis and resonance modeling. Reduction strategies are developed based on the results of listening experiments. The reduction strategies and the perceptual scheduling framework are implemented in "Open Sound World," a prototype programming system for synthesis servers. This implementation is then tested on several short musical examples. The computation saved is measured for each example. The quality of the audio output from the servers with and without perceptual scheduling enabled is evaluated by human listeners in a controlled experiment. The results of this experiment have been encouraging. In one example, the average CPU time decreased by about 75%, yet listeners perceived little degradation in audio quality.

The perceptual scheduling framework can be applied to other compute-intensive algorithms in computer music, such as granular synthesis, pitch detection and sound spatialization. It can also be applied to other perceptually oriented computational tasks, such as real-time graphics and video processing.


Use the links below to watch an on-demand replay of this seminar:

RealMedia: Replay RealMedia
MBone 1Mbps: High Bitrate Replay with Mash Tools