Wessel, David, wessel@cnmat.berkeley.edu
Wright, Matthew, matt@cnmat.berkeley.edu
Khan, Shafqat Ali [no email]

Center for New Music and Audio Technologies
1750 Arch Street
Berkeley, CA 94709
phone (510) 643-9990, fax (510) 642-7918

ICMC 1998 Long Paper Proposal

Preparation for Improvised Performance: Machine Learning, Knowledge Representation, Listening, and Synthesis in Collaborations with a "Kyal" Singer

Key words: melodic process, data mining, machine listening, improvisation, statistical models.

Content area: Interactive Performance Systems, Machine recognition of audio, Real-time systems

In this report we chronicle preparations made for a series of improvisations with an accomplished "Kyal" vocalist. "Kyal" is a Hindustani classical music tradition practiced in both North India and Pakistan. At the outset of this collaboration between two western computer musicians and a Pakistani singer, it was clear that boundaries between musical cultures and aesthetic concerns had to be reckoned with. Neither of the western musicians aspired to become a Hindustani classical musician, nor did the vocalist want to westernize his style. The strategy was to create a musical common ground upon which a vital dialogue could take place. Certain features of classical performance were left intact: the singer played a leading role, and the computer parts spanned a range of functions from accompaniment to foil; the "raga" was the point of departure for the pitch material, and the "tal" for the rhythmic organization. The primary aesthetic concern was responsive interaction among the performers.

The extensive preparations we undertook for performance were motivated by our desire to develop software listening assistants that accurately tracked the singer and that were capable of making musically appropriate abstractions about his melodic and rhythmic structures in real time. These abstractions in turn were used to inform the compositional algorithms the computer musicians guided in performance.

In the machine configuration for the performance, the singer was tightly mic'ed and each of the computer performers had a gestural input device: a Wacom digitizing tablet for one and a specially constructed poly-point tactile interface for the other. PowerPC Macintoshes running MAX and its MSP signal processing extensions processed and interpreted both the gestural data and the signal from the vocalist's microphone, and controlled rhythmic, melodic, and drone processes. In addition, two SGI machines were used for real-time additive synthesis and resonance modeling. The four computers were on an ethernet intranet, and communication among them used the Open Sound Control (OSC) protocol.
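As a rough illustration of the kind of event-based message that could carry a tracked pitch value from one machine to another, the following Python sketch packs a single OSC message. It assumes the byte layout of the later published OSC 1.0 specification (null-padded strings, a type tag string, big-endian arguments), which may differ from the protocol as it stood at the time of this work. The address "/voice/pitch" and the function names are hypothetical and are not taken from our system.

```python
import struct


def osc_string(s: str) -> bytes:
    """Encode an OSC-string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    raw = s.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_message(address: str, *args) -> bytes:
    """Pack an address and int/float/str arguments into one OSC message.

    Assumption: OSC 1.0 layout (address, ",tags", then arguments in order).
    """
    tags = ","
    payload = b""
    for a in args:
        if isinstance(a, bool):
            # bool is a subclass of int in Python; reject it to avoid surprises.
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(a, int):
            tags += "i"
            payload += struct.pack(">i", a)   # 32-bit big-endian integer
        elif isinstance(a, float):
            tags += "f"
            payload += struct.pack(">f", a)   # 32-bit big-endian float
        elif isinstance(a, str):
            tags += "s"
            payload += osc_string(a)
        else:
            raise TypeError(f"unsupported argument type: {type(a).__name__}")
    return osc_string(address) + osc_string(tags) + payload


# Hypothetical example: report a tracked fundamental frequency in Hz.
message = osc_message("/voice/pitch", 440.0)
```

The resulting bytes could be sent as a single UDP datagram to the receiving machine; the transport details are left out of this sketch.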

Preparations began by gathering material from the singer and extensively data mining the pitch, amplitude, and timbral envelope analyses. Recordings were made by providing the singer with a drone, harmonium, and rhythmic accompaniment in headphones. A few hours of dry, isolated vocal performance were obtained in this manner, and this material was thoroughly analyzed. Pitch functions were of particular interest, as this music involves expressive continuous pitch variation. A variety of statistical models were applied to the data using tools from both S-Plus and MATLAB; these included the application of splines to model amplitude and pitch functions so that they could be controlled in synthesis with the event-based OSC protocol. Hidden Markov models were also applied to the pitch material in the spirit of Narmour and Krumhansl's melodic process notions. This extensive data set allowed us to train our pitch tracker, envelope follower, event detector, and melodic and rhythmic classifiers for use in real-time machine listening. It also allowed us to produce a melodic process model that could be driven around in performance under gestural control.
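To make concrete why a spline suits event-based control, the sketch below reconstructs a continuous pitch contour from a handful of breakpoints, so that only the breakpoints need to travel as events. It uses cubic Hermite interpolation with finite-difference tangents, chosen purely for illustration; it is not a statement of which spline family was fitted in S-Plus or MATLAB. The function name, the breakpoint times, and the pitch values in cents are all hypothetical.

```python
from bisect import bisect_right


def hermite_contour(times, values):
    """Return a function f(t) interpolating (times[i], values[i]) with cubic Hermite segments.

    Assumptions: times are strictly increasing; tangents are finite-difference
    estimates (central in the interior, one-sided at the two ends).
    """
    n = len(times)
    if n < 2 or len(values) != n:
        raise ValueError("need at least two breakpoints of equal length")
    if any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")

    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((values[hi] - values[lo]) / (times[hi] - times[lo]))

    def evaluate(t):
        # Clamp to the covered interval, then locate the segment containing t.
        t = min(max(t, times[0]), times[-1])
        i = min(bisect_right(times, t) - 1, n - 2)
        h = times[i + 1] - times[i]
        u = (t - times[i]) / h
        # Standard cubic Hermite basis functions on u in [0, 1].
        h00 = 2 * u**3 - 3 * u**2 + 1
        h10 = u**3 - 2 * u**2 + u
        h01 = -2 * u**3 + 3 * u**2
        h11 = u**3 - u**2
        return (h00 * values[i] + h10 * h * slopes[i]
                + h01 * values[i + 1] + h11 * h * slopes[i + 1])

    return evaluate


# Hypothetical breakpoints: time in seconds, pitch in cents above the drone.
contour = hermite_contour([0.0, 0.2, 0.5, 0.9], [0.0, 200.0, 150.0, 400.0])
samples = [contour(k * 0.01) for k in range(91)]  # resample at 100 Hz for synthesis
```

The contour passes exactly through each breakpoint, so a synthesizer receiving only those four events can regenerate the glide between them at whatever control rate it needs.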

Our representation and performance engine provide for deterministic control of rhythmic fine structure (reported on at ICMC97). What is new this year are the methods we have developed for navigating among a large and varied class of rhythmic structures. Spatial layouts of rhythmic classes are used in combination with controls that exaggerate or diminish their features. No attempt was made to mimic "tabla" style or sound, but our effort to provide a rhythmic foundation for the performances is informed by classical practice. In addition, we combine melodic process and rhythmic structure in the performance model.
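The following Python sketch suggests one way such navigation could work; it is an illustrative assumption, not a description of our performance engine. Each rhythmic class is placed at a point in a plane and described by a feature vector (here, hypothetical accent strengths per pulse). A cursor position, such as one read from a tablet, blends the classes by inverse-distance weighting, and an exaggeration factor scales the blend's deviation from the average of all classes. All names and numbers are invented for the example.

```python
import math


def blend_weights(cursor, positions, power=2.0):
    """Inverse-distance weights that sum to 1; a cursor exactly on a class selects it alone."""
    inverse = []
    for index, (x, y) in enumerate(positions):
        d = math.hypot(cursor[0] - x, cursor[1] - y)
        if d == 0.0:
            return [1.0 if j == index else 0.0 for j in range(len(positions))]
        inverse.append(1.0 / d**power)
    total = sum(inverse)
    return [w / total for w in inverse]


def navigate(cursor, classes, exaggeration=1.0):
    """Blend class feature vectors at the cursor, then scale deviation from the average.

    classes: list of ((x, y), feature_vector) pairs with equal-length vectors.
    exaggeration: 0 gives the average pattern, 1 the plain blend, >1 a caricature.
    """
    positions = [p for p, _ in classes]
    vectors = [v for _, v in classes]
    weights = blend_weights(cursor, positions)
    dim = len(vectors[0])
    blended = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]
    neutral = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    return [n + exaggeration * (b - n) for b, n in zip(blended, neutral)]


# Hypothetical layout: two rhythmic classes, four pulses each.
classes = [
    ((0.0, 0.0), [1.0, 0.0, 0.5, 0.0]),
    ((1.0, 0.0), [1.0, 0.5, 0.0, 0.5]),
]
pattern = navigate((0.25, 0.0), classes, exaggeration=1.5)
```

With exaggeration above 1 the result can leave the range of the original features, so a real system would presumably clip or rescale before driving synthesis.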

This work, in addition to its new contributions concerning the use of statistical models of melodic process and rhythmic structure, integrates a variety of computer music practices into a reactive improvised performance practice.