Instruments That Learn
David Wessel
Extract from Computer Music Journal, Vol. 15, No. 4, Winter 1991
Musicians often speak of a rather special and very personal relationship
with their instrument. Indeed, many instrumentalists adapt the instrument
physically to particularities of their playing style, choosing the
bridge, string, bow, or mouthpiece-reed combinations, and so on. On
more poetic occasions a musician will speak as if the instrument has come
to know something of its player. It would seem quite natural then to think
about intelligent instruments that could adapt in some automated way to
a personal playing style.
Before letting the intelligent instrument go too far towards fantasy, I
would like to present some learning technologies that are now ripe enough
for real applications and that might prove suggestive for musical use. Let
us look for a moment at the use of adaptive neural networks for handwritten
character recognition. While I describe this handwriting classifier I would
like the reader to keep in mind conducting gestures and signals.
At AT&T Bell Laboratories, Isabelle Guyon and her coworkers (Guyon et al.
1991) have developed a system that exploits Time Delay Neural Networks (Waibel
et al. 1989) for writer-independent and writer-adaptive on-line character
recognition. Their recognition system is targeted for use in touch terminals
and for signature verification. One of the essential features of their approach
that distinguishes it from much of the work on character recognition is
that the sensing and encoding of the writer's gesture sequence plays a fundamental
role. A character is not just a pixel map but a sequence of sample points
in a plane. Guyon represents a character at the output of the preprocessing
stage as a sampled sequence of features that describe the state of the pen
(up or down), the coordinates of the pen, and the slope and curvature of
the pen's trajectory. With some additional features like velocity, this
sort of representation scheme would seem the right sort of thing for conducting
gestures.
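As a rough illustration of this kind of representation, the sketch below
turns a sampled pen (or baton) trajectory into a sequence of feature vectors.
It is not Guyon's preprocessing stage: the function name pen_features, the
use of direction cosines in place of a raw slope, the turning angle as a
stand-in for curvature, and the speed estimate are all assumptions made for
the example.

```python
import math

def pen_features(samples):
    """Convert equally spaced samples (x, y, pen_down) into feature vectors.

    Returns one tuple per interior sample:
    (pen state, x, y, cos of direction, sin of direction, speed, turning angle).
    """
    features = []
    for i in range(1, len(samples) - 1):
        x0, y0, _ = samples[i - 1]
        x1, y1, down = samples[i]
        x2, y2, _ = samples[i + 1]
        # Direction of travel, estimated across the two neighbouring samples.
        # Cosine and sine are used instead of dy/dx to avoid infinite slopes.
        direction = math.atan2(y2 - y0, x2 - x0)
        # Speed in distance units per sample step (hypothetical extra feature).
        speed = math.hypot(x2 - x0, y2 - y0) / 2.0
        # Turning angle between incoming and outgoing segments, wrapped to
        # (-pi, pi], used here as a simple proxy for curvature.
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        diff = angle_out - angle_in
        turn = math.atan2(math.sin(diff), math.cos(diff))
        features.append((1.0 if down else 0.0, x1, y1,
                         math.cos(direction), math.sin(direction),
                         speed, turn))
    return features
```

A straight horizontal stroke gives a turning angle of zero at every interior
sample, while a sharp corner shows up as a large turning angle at the sample
where the direction changes.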
The sequence of feature vectors at the output of the preprocessing stage
is scanned by a Time Delay Neural Network recognizer. This network is constructed
so that simple local topological features are combined through successive
layers of units into more complex and global features until the output layer.
Each of the units in the output layer identifies a character. This network
is trained on a large set of characters from a large group of writers using
the back propagation learning algorithm (Rumelhart, Hinton, and Williams
1986), and a scheme for emphasizing the learning of atypical writing styles.
Guyon and her colleagues have further developed their system so that personalized
characters and writing styles can be added to the core of the writer-independent
neural network. With these techniques they have obtained classification
accuracy of better than 96 percent on test examples, a result far superior
to state-of-the-art optical character recognition applied to the same materials.
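The scanning idea behind a time-delay network can be sketched in a few lines.
Each unit looks at a short window of consecutive feature vectors, and the same
weights are reused at every position in the sequence, so stacking layers lets
later units respond to longer stretches of the gesture. The code below is only
a minimal sketch of that principle: the layer sizes, the tanh nonlinearity,
and the summing of output units over time are assumptions, not details of the
Guyon et al. recognizer.

```python
import math

def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer to a sequence of feature vectors.

    frames:  list of T vectors, each a list of F floats.
    weights: one kernel per output unit; each kernel is a list of D rows
             of F floats, where D is the number of frames the unit sees.
    biases:  one float per output unit.
    Returns T - D + 1 output vectors, using the same weights at every step.
    """
    delay = len(weights[0])
    outputs = []
    for t in range(len(frames) - delay + 1):
        window = frames[t:t + delay]
        out = []
        for kernel, bias in zip(weights, biases):
            total = bias
            for row, frame in zip(kernel, window):
                total += sum(w * x for w, x in zip(row, frame))
            out.append(math.tanh(total))
        outputs.append(out)
    return outputs

def classify(frames, layers):
    """Run stacked time-delay layers, then sum each output unit over time.

    layers is a list of (weights, biases) pairs. Returns the index of the
    winning output unit and the list of summed scores.
    """
    activations = frames
    for weights, biases in layers:
        activations = time_delay_layer(activations, weights, biases)
    scores = [sum(column) for column in zip(*activations)]
    return scores.index(max(scores)), scores
```

In this sketch each output unit stands for one character (or, by analogy, one
conducting gesture), and the weights would be set by back-propagation rather
than chosen by hand.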
At CNMAT, Mike Lee, Adrian Freed, and I have been exploring the use
of neural networks in conjunction with musical instrument controllers like
the Zeta Guitar and with alternate controllers like Max Mathews's Radio Baton
and Don Buchla's Lightning. These latter devices can supply real-time spatial
coordinate sequences.
We have added neural-network objects to the MAX programming environment
(Puckette and Zicarelli 1990) so that our experimentation can be carried
out in a live-performance context. Our quite preliminary results are encouraging.
We have obtained reliable recognition of complex guitar strumming gestures
and limited numbers of spatial gestures. In both of these preliminary experiments
the musician supplied a set of personal gestures, each of which was to provoke
a specific response, thus providing the training materials for the back-propagation
learning algorithm.
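The training procedure just described, in which a player supplies example
gestures paired with the responses they should provoke, can be sketched as
follows. This is not the MAX neural-network object mentioned above. It is a
small stand-alone illustration, and the function names, the single hidden
layer, the softmax output with cross-entropy error, the learning rate, and
the fixed-length feature vectors are all assumptions.

```python
import math
import random

def forward(net, x):
    """Return hidden activations and output probabilities for one gesture."""
    w1, w2 = net
    inputs = list(x) + [1.0]  # trailing 1.0 feeds the bias weight
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in w1]
    scores = [sum(w * v for w, v in zip(row, hidden + [1.0])) for row in w2]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]

def train_gesture_net(examples, n_hidden=4, epochs=500, rate=0.2, seed=0):
    """Train on (feature_vector, response_index) pairs by back-propagation."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    n_out = max(label for _, label in examples) + 1
    # One row of weights per unit; the last entry of each row is the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
          for _ in range(n_out)]
    for _ in range(epochs):
        for x, label in examples:
            hidden, probs = forward((w1, w2), x)
            # Output error for softmax with cross-entropy loss.
            d_out = [p - (1.0 if k == label else 0.0)
                     for k, p in enumerate(probs)]
            # Propagate the error back through the tanh hidden units.
            d_hid = [(1.0 - h * h) * sum(d_out[k] * w2[k][j]
                                         for k in range(n_out))
                     for j, h in enumerate(hidden)]
            for k in range(n_out):
                for j, value in enumerate(hidden + [1.0]):
                    w2[k][j] -= rate * d_out[k] * value
            for j in range(n_hidden):
                for i, value in enumerate(list(x) + [1.0]):
                    w1[j][i] -= rate * d_hid[j] * value
    return w1, w2

def respond(net, x):
    """Return the index of the response the network selects for a gesture."""
    _, probs = forward(net, x)
    return probs.index(max(probs))
```

In use, a player would record a few examples of each personal gesture, call
train_gesture_net on them, and then map the index returned by respond to
whatever musical response that gesture is meant to trigger.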
With such procedures and much more research, we might conceivably move towards
adaptive, personalizable instruments. There is a special and intriguing
dilemma here. As suggested earlier, some types of music like jazz emphasize
the development of personal playing styles. Others like those based on traditional
Western performance practice emphasize a standardization of playing style.
With adaptive instruments there will be a new twist; one will have to decide
when to standardize or fix the instrument and let the musician learn the
appropriate gesture and when to let the instrument adapt to the specialized
approach of a player. How to rig the training harnesses on ourselves as
players and on our instruments as expressively responsive musical tools
will be a question of scientific, aesthetic, and social concern.
References
- Guyon, I. et al. 1991. "Design of a Neural Character Recognizer
for a Touch Terminal." Pattern Recognition 24(2): 105-119.
- Puckette, M. and D. Zicarelli. 1990. MAX: An Interactive Graphic
Programming Environment. Menlo Park, CA: Opcode Systems, Inc.
- Rumelhart, D.E., G.E. Hinton, and R.J. Williams. 1986. "Learning
Internal Representations by Error Propagation." In D.E. Rumelhart and
J. McClelland, eds. Parallel Distributed Processing: Explorations in
the Microstructure of Cognition, vol.1. Cambridge, MA: MIT Press, pp.
318-362.
- Waibel, A. et al. 1989. "Phoneme Recognition Using Time-Delay Neural
Networks." IEEE Transactions on Acoustics, Speech, and Signal Processing
37(3): 328-339.