An Adaptive Conductor Follower

Michael Lee, Guy Garnett, and David Wessel

Introduction


Conducting an ensemble of musicians poses difficult recognition and parameter estimation problems. Musicians must extract information such as tempo and volume from the conductor's gestures. The difficulty of this task increases if the musicians are expected to adapt to markedly different conducting styles in addition to dealing with variations within a given style. An artificial musician integrated into a human ensemble requires the same recognition, estimation, and adaptive capabilities as its human counterparts.

We have developed an adaptive artificial musician that addresses these problems using MAX (Puckette 1986), MAXNet (Lee 1991), a Buchla Lightning, and a Mattel Power Glove. Our musician is able to understand gestures and control tempo, volume, and other performance parameters. In this report we focus on adaptive strategies for interpreting the gestures of a conductor, or conductor following. We discuss the control parameters of our musician and the adaptive methods used to control them. We also describe our training method and environment, and evaluate the effectiveness of our adaptive follower.

Conducting

The conductor is responsible for controlling the shape, or interpretation, of a piece. There are two separate contexts for conveying this interpretation to an ensemble: rehearsal time and performance time. During rehearsal, the ensemble alters its performance of a piece until it matches the conductor's interpretation; the primary goal is working out fine details such as phrasing, balance, and tempo deformations. There is much stopping and repetition of short segments to facilitate learning.

A performance is just another rehearsal in all respects but one: the conductor and ensemble endeavor to keep going even if mistakes are made. Further refinements in interpretation are made during performance, as are compensations for a different environment (such as a full hall instead of an empty rehearsal studio) or changes in ensemble personnel.

Conductor Follower and Musical Instrument Control

We can model the task of an individual musician as three subtasks: monitor the conductor's physical gestures and recognize the implied control data (conductor following), combine these control data with a score representation into a performance representation (performance interpretation), and translate the performance interpretation into musical instrument control gestures (instrument control). This decomposition simplifies the overall task and decouples performance interpretation and instrument control from conductor following.
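To make the decomposition concrete, the following sketch (hypothetical Python, with placeholder function names and values that stand in for the actual MAX patches) shows how the three subtasks compose into a pipeline.

    # Hypothetical sketch of the three-subtask decomposition; the functions
    # and values are placeholders for the MAX patches that implement each stage.

    def follow_conductor(gesture_frame):
        # Conductor following: recognize the control data (tempo, volume, ...)
        # implied by the physical gesture.  Placeholder values only.
        return {"tempo_bpm": 92.0, "volume": 0.7}

    def interpret_performance(control, score_event):
        # Performance interpretation: combine control data with a score
        # representation into a performance representation.
        return {"pitch": score_event["pitch"],
                "duration_s": 60.0 / control["tempo_bpm"] * score_event["beats"],
                "loudness": control["volume"] * score_event["dynamic"]}

    def control_instrument(event):
        # Instrument control: translate the performance representation into
        # instrument control gestures (here a note-on style tuple).
        return ("note_on", event["pitch"], event["loudness"], event["duration_s"])

    gesture_frame = {"baton_x": 0.4, "baton_y": 0.9}           # one sample of gesture data
    score_event = {"pitch": 60, "beats": 1.0, "dynamic": 0.8}  # one note from the score
    print(control_instrument(interpret_performance(follow_conductor(gesture_frame), score_event)))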

Neural Networks for Conductor Following

Our architecture for simultaneously producing classification and parameter-estimation information is modular and can be divided into three pieces: a classification module, a group of parameter-estimator modules, and glue to combine the classifier and estimator outputs.

Both the classifier and estimator modules consist of a preprocessor section and a feed-forward neural network. The preprocessors are designed to inject any a priori knowledge or structure about the problem into the system. All feed-forward networks are trained by back-propagation (Rumelhart 1986) using a training set that reflects the prior probability distribution of the classes. For the classifier, augmenting the cost function with the constraint that all the output values sum to one results in a net that returns the Bayesian posterior probabilities of class membership for a given feature vector (Bourlard 1990).
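As an illustration only, the toy classifier below uses a softmax output layer, which enforces the same sum-to-one property directly, so its outputs can likewise be read as class-membership probabilities; the network sizes and feature values are arbitrary and not those of our system.

    import numpy as np

    # Toy classifier whose outputs sum to one.  A softmax output layer is used
    # here for brevity; the system described above instead augments the cost
    # function with the sum-to-one constraint during back-propagation training.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # hidden layer, 4 input features
    W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # output layer, 3 gesture classes

    def classify(x):
        h = np.tanh(W1 @ x + b1)                    # hidden activations
        z = W2 @ h + b2                             # class scores
        p = np.exp(z - z.max())
        return p / p.sum()                          # probabilities summing to one

    feature_vector = np.array([0.2, -0.5, 0.1, 0.9])
    print(classify(feature_vector))                 # three values summing to one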

The difference between the classifier and estimator modules is that the estimators act as multidimensional function approximators rather than strict classifiers. Note that the feature vectors of the estimator modules may partially or completely overlap with each other or with the feature vector of the classifier.
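The combination performed by the glue is not spelled out here; one plausible form, shown below purely as an assumption, weights each estimator's output by the classifier's posterior probability for the corresponding class.

    import numpy as np

    # Assumed form of the glue: weight each estimator's parameter estimate by
    # the classifier's posterior probability for its class.  This is only one
    # plausible combination, not necessarily the one used in our system.
    def combine(posteriors, estimates):
        # posteriors: shape (n_classes,), summing to one
        # estimates:  shape (n_classes, n_params), one parameter vector per class
        return posteriors @ estimates               # posterior-weighted average

    posteriors = np.array([0.7, 0.2, 0.1])          # from the classifier module
    estimates = np.array([[96.0, 0.8],              # (tempo, volume) per class
                          [88.0, 0.6],
                          [120.0, 0.9]])
    print(combine(posteriors, estimates))           # blended tempo and volume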

Neural Networks for Tempo Tracking

In our first tempo tracking model, we restricted the baton movements to one dimension. We considered a simple conducting style in which the bottom of the beat indicated the actual downbeat, and tempo was computed from the time between downbeats. This method has two serious disadvantages. First, the tempo measurement arrives too late because the beat has already finished. Second, there is no way to vary the tempo within the beat. To handle subtle tempo fluctuations within a beat, we need a prediction mechanism and more resolution within the beat.
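The computation in this first model is straightforward; the fragment below (with made-up times) shows it for reference.

    # Tempo from successive downbeat times (in seconds); the measurement is
    # only available after the beat has finished.  The times here are made up.
    def tempo_from_downbeats(t_prev, t_curr):
        return 60.0 / (t_curr - t_prev)             # beats per minute

    print(tempo_from_downbeats(10.00, 10.65))       # a 0.65 s beat, roughly 92 BPM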

We can get more resolution by calculating tempo from the time between half beats. Experimental data we collected show that the top of the beat curve does not necessarily occur exactly halfway through the beat; its position varies widely from conductor to conductor and from tempo to tempo, so the time measure must be adjusted for each conductor. This solution gives control every half beat but still computes the tempo after the fact.
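As an illustration, once the conductor-specific split of the beat is known, a half-beat interval can be converted to tempo as sketched below; the 0.45 fraction is an assumed value standing in for the learned, per-conductor adjustment.

    # Half-beat tempo sketch.  UP_FRACTION is the fraction of the beat spent
    # between the downbeat and the top of the beat curve; it is an assumed
    # constant here, whereas in practice it must be adapted to the conductor.
    UP_FRACTION = 0.45

    def tempo_from_half_beat(interval_s, rising):
        fraction = UP_FRACTION if rising else (1.0 - UP_FRACTION)
        beat_period = interval_s / fraction         # estimated full-beat period
        return 60.0 / beat_period                   # beats per minute

    print(tempo_from_half_beat(0.29, rising=True))  # estimate after the rising half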

Prediction can be added to the system by using previous velocity and acceleration measures to predict the time until the next half beat. If the resolution is fine enough, we can get an instantaneous measure of the tempo.
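One simple way to realize such a predictor, shown here as a kinematic sketch rather than the trained networks used in our system, extrapolates the time until the baton's vertical motion reverses from its current velocity and acceleration.

    import math

    # Kinematic sketch of half-beat prediction: assuming roughly constant
    # acceleration, estimate the time until the baton's vertical motion
    # reverses (the top or bottom of the beat).  Illustration only.
    def time_to_turn(velocity, acceleration):
        if acceleration == 0:
            return math.inf                         # no reversal predicted
        t = -velocity / acceleration                # solve v + a*t = 0
        return t if t > 0 else math.inf

    # Baton moving upward at 1.2 m/s and decelerating at 4 m/s^2:
    print(time_to_turn(1.2, -4.0))                  # about 0.3 s until the top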

Training

There are two adaptive elements in our artificial musician: the tempo tracker and the gesture classifier. To train the tempo-tracker compensator, we measured the up and down half-beat times at various tempos by asking the user to conduct along with a metronome. The time measurements were stored and split into a training set and a test set, and separate feed-forward neural networks were then trained to approximate the tempo for the up and down halves of the beat. Training data for the classifier was collected by asking the user to perform hand gestures corresponding to each gesture class.
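A minimal version of that training step might look like the sketch below; the data are synthetic placeholders for the metronome-conducted measurements, and the plain back-propagation loop stands in for the MAXNet networks used in our system.

    import numpy as np

    # Illustrative back-propagation training of one half-beat compensator net,
    # on synthetic data standing in for the metronome-conducted measurements.
    rng = np.random.default_rng(0)
    tempo = rng.uniform(60, 160, size=(200, 1))                         # target tempos (BPM)
    half_time = (60.0 / tempo) * 0.45 + rng.normal(0, 0.01, (200, 1))   # up-half times (s)

    # Normalize inputs/outputs and split into training and test sets.
    x = (half_time - half_time.mean()) / half_time.std()
    y = (tempo - tempo.mean()) / tempo.std()
    x_tr, x_te, y_tr, y_te = x[:150], x[150:], y[:150], y[150:]

    # One hidden layer of 8 tanh units, linear output; trained by gradient descent.
    W1, b1 = rng.normal(0, 0.5, (1, 8)), np.zeros(8)
    W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)
    lr = 0.05
    for _ in range(2000):
        h = np.tanh(x_tr @ W1 + b1)                          # forward pass
        pred = h @ W2 + b2
        err = pred - y_tr                                    # output error
        dW2 = h.T @ err / len(x_tr); db2 = err.mean(0)       # back-propagate
        dh = (err @ W2.T) * (1 - h ** 2)
        dW1 = x_tr.T @ dh / len(x_tr); db1 = dh.mean(0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

    test_pred = np.tanh(x_te @ W1 + b1) @ W2 + b2
    print("test RMS error (normalized):", np.sqrt(((test_pred - y_te) ** 2).mean()))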

Conclusions

We have used neural networks to address classification and parameter estimation in a real-time conducting application. The networks were able to adapt to the user's gestures, resulting in a simple, flexible conductor follower that responds to a variety of different users. Because of their learning ability, neural networks can help obtain subtle, complex, dynamic control information from a wide variety of conductors.

References


Bourlard, H., Morgan, N., Wellekens, C.J., "Statistical Inference in Multilayer Perceptrons and Hidden Markov Models with Applications in Continuous Speech Recognition," Neurocomputing, Fogelman, F., Herault, J., eds., NATO ASI Series, Vol. F68, 1990.

Lee, M., Freed, A., Wessel, D., "Real-Time Neural Network Processing of Gestural and Acoustic Signals," Proc. of the Int. Computer Music Conf., Montreal, 1991.

Lee, M., Freed, A., Wessel, D., "Neural Networks for Simultaneous Classification and Parameter Estimation in Musical Instrument Control," Proc. of SPIE Conf. on Adaptive and Learning Systems, Orlando, 1992.

Puckette, M., "Interprocess Communication and Timing in Real-time Computer Music Performance," Proc. of the Int. Computer Music Conf., The Hague, 1986.

Rumelhart, D.E., McClelland, J.L., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vols. 1 and 2, MIT Press, Cambridge, 1986.