An Adaptive Conductor Follower
Michael Lee, Guy Garnett, and David Wessel
Conducting an ensemble of musicians poses difficult recognition and parameter
estimation problems. Musicians must extract information such as tempo and
volume from the conductor's gestures. The difficulty of this task increases
if the musicians are expected to adapt to markedly different conducting
styles in addition to dealing with variations within a given style. An artificial
musician integrated into a human ensemble requires the same recognition,
estimation, and adaptive capabilities as its human counterparts.
We have developed an adaptive artificial musician that addresses these problems
using MAX (Puckette 1986), MAXNet (Lee et al. 1991), a Buchla Lightning, and a
Mattel Power Glove. Our musician is able to understand gestures and control
tempo, volume, and other performance parameters. In this report we focus
on adaptive strategies for interpreting the gestures of a conductor or conductor
following. We discuss the control parameters of our musician and the adaptive
methods used to control these parameters. We also discuss our training method
and environment, and evaluate the effectiveness of our adaptive follower.
The conductor is responsible for controlling the shape or interpretation
of a piece. He has two separate contexts for explaining his interpretation
to an ensemble: rehearsal time and performance time. During rehearsal, the
ensemble alters its performance of a piece until it matches the conductor's
interpretation; the primary goal is working out fine details such as phrasing,
balance, and tempo deformations. There is much stopping and repetition of
short segments to facilitate learning.
A performance is just another rehearsal in most respects except one: the
conductor and ensemble endeavor to keep going even if mistakes are made.
Further refinements in interpretation are made during performance as well,
along with compensation for a different environment (such as a full hall
instead of an empty rehearsal studio) or different ensemble personnel.
Conductor Follower and Musical Instrument Control
We can model the task of an individual musician as three subtasks: monitor
the conductor's physical gestures and recognize the implied control data
(conductor following), combine these control data with a score representation
into a performance representation (performance interpretation), and translate
the performance interpretation into musical instrument control gestures
(instrument control). This decomposition simplifies the overall task and
decouples performance interpretation and instrument control from conductor
following.
Neural Networks for Conductor Following
Our architecture for simultaneously producing classification and parameter
estimate information is modular and can be divided into three pieces: a
classification module, a group of parameter estimator modules, and glue
to combine the classifier and estimator outputs.
Both the classifier and estimator modules consist of a preprocessor section
and a feed-forward neural network. The preprocessors are designed to inject
any a priori knowledge or structure about the problem into the system. All
feed-forward networks are trained by back-propagation (Rumelhart and McClelland
1986) using
a training set that reflects the prior probability distribution of classes.
For the classifier, augmenting the cost function with the constraint that
all the output values sum to one results in a net which returns the Bayesian
posterior probabilities of class membership for a given feature vector
(Bourlard et al. 1990).
The difference between the classifier and estimator modules is that the
estimators act as multidimensional function approximators rather than strict
classifiers. Note that the feature vectors of the estimator modules may
partially or completely overlap with one another or with the feature vector
of the classifier.
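As a concrete illustration, the classifier's posterior outputs can be combined
with the class-specific estimator outputs as follows. This is a minimal sketch
in Python; the function names and the weighted-sum "glue" are our assumptions,
since the exact combination rule is not specified above.

```python
import math

def softmax(outputs):
    """Normalize raw network outputs so they sum to one, so they can be
    read as Bayesian posterior class probabilities (Bourlard et al. 1990)."""
    m = max(outputs)
    exps = [math.exp(v - m) for v in outputs]
    total = sum(exps)
    return [e / total for e in exps]

def combine(class_outputs, estimates):
    """Hypothetical glue: blend each class-specific estimator's parameter
    estimate, weighted by the posterior probability of its class."""
    posteriors = softmax(class_outputs)
    return sum(p * e for p, e in zip(posteriors, estimates))
```

With a softmax output stage the classifier's outputs sum to one, so this glue
amounts to taking the expected parameter value over the possible gesture
classes.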
Neural Networks for Tempo Tracking
In our first tempo tracking model, we restricted the baton movements to
one dimension. We considered a simple conducting style where the bottom
of the beat indicated the actual downbeat. Tempo was computed using the
time between downbeats. This method has two serious disadvantages.
First, the tempo measurement completes too late because the beat has already
finished. Second, there is no way to vary the tempo within the beat. To
handle subtle tempo fluctuations within a beat, we need a prediction mechanism
and we need more resolution within a beat.
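The original downbeat-to-downbeat scheme can be sketched as follows (a
hypothetical Python fragment; the function name is ours). It makes both
disadvantages concrete: the estimate is available only after the beat has
finished, and it cannot change within a beat.

```python
def tempo_from_downbeats(downbeat_times):
    """Naive tempo estimate: beats per minute from the interval between
    the last two detected downbeats. Only available after a beat ends."""
    if len(downbeat_times) < 2:
        return None  # need at least two downbeats to measure an interval
    interval = downbeat_times[-1] - downbeat_times[-2]  # seconds per beat
    return 60.0 / interval  # beats per minute
```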
We can get more resolution by calculating tempo from the time between half
beats. Our experimental data show that the top of the beat curve does not
necessarily occur exactly halfway through the beat; its position varies widely
from conductor to conductor and from tempo to tempo. The time measure
should therefore be adjusted for each conductor. This solution now gives
control every half beat but still computes it after the fact.
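The half-beat computation with a conductor-specific correction might look
like the following sketch. The `up_fraction` parameter is an assumed, learned
quantity; in the system described here it would come from the trained
compensator rather than being a fixed constant.

```python
def tempo_from_half_beat(segment_time, up_fraction=0.5):
    """Estimate beats per minute from a single half-beat segment.

    segment_time: measured duration of the bottom-to-top segment, in seconds.
    up_fraction: learned, conductor-specific fraction of the full beat period
        occupied by the upward segment (it need not be exactly 0.5).
    """
    period = segment_time / up_fraction  # inferred full-beat duration
    return 60.0 / period
```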
Prediction can be added to the system by using previous velocity and acceleration
measures to predict the time until the next half beat. If the resolution
is fine enough, we can get an instantaneous measure of the tempo.
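Under the simplifying assumption of roughly constant deceleration toward the
beat extremum, this prediction can be sketched kinematically. The closed-form
extrapolation below is our illustration of the idea, not the trained predictor
described in this report.

```python
def time_to_half_beat(velocity, acceleration):
    """Predict seconds until the baton reaches the next beat extremum
    (the velocity zero-crossing), assuming constant deceleration.

    Returns None when the baton is not decelerating toward a turnaround,
    in which case no prediction can be made from these two measures alone."""
    if acceleration == 0 or velocity * acceleration >= 0:
        return None
    return -velocity / acceleration
```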
There are two adaptive elements in our artificial musician: the tempo tracker
and the gesture classifier. To train the tempo-tracker compensator, we measured
the up and down half-beat times for various tempos. This measurement was
taken by asking the user to conduct along with a metronome. Time measurements
were stored and split into a training and test set. Separate feed-forward
neural networks were then trained to approximate the tempo in both directions.
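A minimal version of such a feed-forward approximator, trained by plain
back-propagation on (half-beat time, tempo) pairs, might look like this. The
network size, learning rate, and normalized tempo targets are illustrative
assumptions, not the parameters of the actual system.

```python
import math
import random

def train_tempo_net(samples, hidden=4, lr=0.05, epochs=5000, seed=1):
    """Train a one-hidden-layer net by back-propagation to map a half-beat
    duration (seconds) to a normalized tempo value. Returns a predict()
    function. Sizes and rates are illustrative, not tuned."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input-to-hidden
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden-to-output
    b2 = rng.uniform(-1, 1)
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    for _ in range(epochs):
        for x, target in samples:
            h = [sig(w1[i] * x + b1[i]) for i in range(hidden)]
            y = sum(w2[i] * h[i] for i in range(hidden)) + b2
            err = y - target  # gradient of 0.5 * err**2 w.r.t. y
            for i in range(hidden):
                gh = err * w2[i] * h[i] * (1.0 - h[i])  # hidden-unit delta
                w2[i] -= lr * err * h[i]
                w1[i] -= lr * gh * x
                b1[i] -= lr * gh
            b2 -= lr * err
    def predict(x):
        h = [sig(w1[i] * x + b1[i]) for i in range(hidden)]
        return sum(w2[i] * h[i] for i in range(hidden)) + b2
    return predict
```

In practice one net would be trained on the upward half-beat measurements and
a second on the downward ones, matching the two-direction training described
above.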
Training data for the classifier were collected by asking the user to make
hand gestures corresponding to each gesture class.
We have used neural networks to address classification and parameter estimation
in a real-time conducting application. The networks were able to adapt to
the user's gestures, resulting in a simple, flexible conductor follower that
responds to a variety of different users. Because of their learning ability,
neural networks can help obtain subtle, complex, dynamic control information
from a wide variety of conductors.
Bourlard, H., Morgan, N., Wellekens, C.J., "Statistical Inference in
Multilayer Perceptrons and Hidden Markov Models with Applications in Continuous
Speech Recognition," Neurocomputing, Fogelman, F., Herault,
J., eds., NATO ASI Series, Vol. F68, 1990.
Lee, M., Freed, A., Wessel, D., "Real-Time Neural Network Processing
of Gestural and Acoustic Signals," Proc. of the Int. Computer Music
Conf., Montreal, 1991.
Lee, M., Freed, A., Wessel, D., "Neural Networks for Simultaneous Classification
and Parameter Estimation in Musical Instrument Control," Proc. of SPIE
Conf. on Adaptive and Learning Systems, Orlando, 1992.
Puckette, M., "Interprocess Communication and Timing in Real-time Computer
Music Performance," Proc. of the Int. Computer Music Conf., The Hague, 1986.
Rumelhart, D.E., McClelland, J.L. Parallel Distributed Processing: Explorations
in the Microstructure of Cognition, Vols. 1 and 2, MIT Press, Cambridge, 1986.