Instruments That Learn

David Wessel

Extract from Computer Music Journal, Vol. 15, No. 4, Winter 1991

Musicians often speak of a rather special and very personal relationship with their instrument. Indeed, many instrumentalists adapt the instrument physically to the particularities of their playing style: choosing the bridge, strings, bow, or mouthpiece-reed combination, and so on. On more poetic occasions a musician will speak as if the instrument has come to know something of its player. It would seem quite natural, then, to think about intelligent instruments that could adapt in some automated way to a personal playing style.

Before letting the intelligent instrument go too far towards fantasy, I would like to present some learning technologies that are now ripe enough for real applications and that might prove suggestive for musical use. Let us look for a moment at the use of adaptive neural networks for handwritten character recognition. While I describe this handwriting classifier I would like the reader to keep in mind conducting gestures and signals.

At AT&T Bell Laboratories, Isabelle Guyon and her coworkers (Guyon et al. 1991) have developed a system that exploits Time Delay Neural Networks (Waibel et al. 1989) for writer-independent and writer-adaptive on-line character recognition. Their recognition system is targeted for use in touch terminals and for signature verification. One of the essential features of their approach, and one that distinguishes it from much of the work on character recognition, is that the sensing and encoding of the writer's gesture sequence plays a fundamental role. A character is not just a pixel map but a sequence of sample points in a plane. Guyon represents a character at the output of the preprocessing stage as a sampled sequence of features that describe the state of the pen (up or down), the coordinates of the pen, and the slope and curvature of the pen's trajectory. With some additional features like velocity, this sort of representation scheme would seem well suited to conducting gestures.
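To make the representation concrete, here is a minimal sketch in Python of how a sampled pen or baton trajectory might be turned into a sequence of feature vectors. It is an illustration only, not the preprocessing used by Guyon and her coworkers: the function name pen_features, the tuple layout of the samples, and the finite-difference estimates of slope, curvature, and speed are all assumptions made for the example.

```python
import math

def pen_features(samples):
    """Turn uniformly sampled (x, y, pen_down) tuples into feature dicts.

    Hypothetical helper: slope, curvature, and speed are estimated with
    simple finite differences, so the first and last samples are skipped.
    """
    features = []
    for i in range(1, len(samples) - 1):
        x0, y0, _ = samples[i - 1]
        x1, y1, pen_down = samples[i]
        x2, y2, _ = samples[i + 1]
        # Direction of travel into and out of the current point.
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        # Curvature approximated by the turning angle, wrapped to [-pi, pi].
        delta = angle_out - angle_in
        turn = math.atan2(math.sin(delta), math.cos(delta))
        features.append({
            "pen_down": pen_down,
            "x": x1,
            "y": y1,
            # Slope as the angle of the chord between the two neighbours.
            "slope": math.atan2(y2 - y0, x2 - x0),
            "curvature": turn,
            # Speed as distance covered per sample step.
            "speed": math.hypot(x2 - x0, y2 - y0) / 2.0,
        })
    return features
```

Calling pen_features on three samples that trace a right-angle corner yields a single feature vector whose curvature is a quarter turn; a straight stroke yields a curvature of zero.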

The sequence of feature vectors at the output of the preprocessing stage is scanned by a Time Delay Neural Network recognizer. This network is constructed so that simple local topological features are combined through successive layers of units into more complex and global features until the output layer is reached. Each of the units in the output layer identifies a character. This network is trained on a large set of characters from a large group of writers using the back-propagation learning algorithm (Rumelhart, Hinton, and Williams 1986) and a scheme for emphasizing the learning of atypical writing styles. Guyon and her colleagues have further developed their system so that personalized characters and writing styles can be added to the core of the writer-independent neural network. With these techniques they have obtained classification accuracy of better than 96 percent on test examples, a result far superior to state-of-the-art optical character recognition applied to the same materials.
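The scanning idea can be sketched in a few lines of Python. Each unit in a time-delay layer looks at a short window of consecutive frames and applies the same weights at every window position, so stacking layers widens the stretch of the gesture that each unit sees. The layer sizes, the tanh nonlinearity, the summing of output activations over time, and the names time_delay_layer, random_layer, and classify are assumptions chosen for illustration; the weights here are random, so this shows only the forward pass and not a trained recognizer.

```python
import math
import random

def time_delay_layer(frames, weights, biases):
    """Apply one time-delay layer to a list of feature vectors.

    weights[u][d][f] is the weight of unit u for input feature f at
    delay d. The same weights are reused at every window position.
    Returns len(frames) - n_delays + 1 output vectors.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        row = []
        for unit_weights, bias in zip(weights, biases):
            total = bias
            for d in range(n_delays):
                for w, x in zip(unit_weights[d], frames[t + d]):
                    total += w * x
            row.append(math.tanh(total))
        outputs.append(row)
    return outputs

def random_layer(n_units, n_delays, n_inputs, rng):
    """Build small random weights and zero biases (untrained, for demonstration)."""
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_delays)]
               for _ in range(n_units)]
    return weights, [0.0] * n_units

def classify(frames, layers):
    """Run frames through the layers, then sum each output unit over time.

    The input must be long enough that every layer still has at least
    one full window to look at.
    """
    activations = frames
    for weights, biases in layers:
        activations = time_delay_layer(activations, weights, biases)
    n_classes = len(activations[0])
    scores = [sum(step[c] for step in activations) for c in range(n_classes)]
    return scores.index(max(scores)), scores
```

With ten input frames of four features each, a first layer of eight units spanning three frames produces eight output frames, and a second layer of three output units spanning five of those produces four; classify then reports the output unit with the largest summed activation.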

At CNMAT, Mike Lee, Adrian Freed, and I have been exploring the use of neural networks in conjunction with musical instrument controllers like the Zeta Guitar and with alternate controllers like Max Mathews's Radio Baton and Don Buchla's Lightning. These latter devices can supply real-time spatial coordinate sequences.

We have added neural-network objects to the MAX programming environment (Puckette and Zicarelli 1990) so that our experimentation can be carried out in a live-performance context. Our quite preliminary results are encouraging. We have obtained reliable recognition of complex guitar strumming gestures and of limited numbers of spatial gestures. In both of these preliminary experiments the musician supplied a set of personal gestures, each of which was to provoke a specific response, thus providing the training materials for the back-propagation learning algorithm.
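The training procedure just described, in which each personal gesture is paired with the response it should provoke, can be sketched as a small supervised learning loop. The following Python is a toy illustration under stated assumptions and is not the MAX neural-network object: it assumes each gesture has already been reduced to a fixed-length feature vector, and the names train_gesture_net, forward, and recognize, the single hidden layer, and the learning rate are all hypothetical choices.

```python
import math
import random

def forward(x, w1, w2):
    """Return (hidden activations, output probabilities) for one input."""
    xb = list(x) + [1.0]  # trailing 1.0 feeds the bias weight
    hidden = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
    hb = hidden + [1.0]
    logits = [sum(w * v for w, v in zip(row, hb)) for row in w2]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return hidden, [e / total for e in exps]

def train_gesture_net(examples, n_hidden=6, epochs=300, rate=0.2, seed=0):
    """Fit a one-hidden-layer network by back-propagation.

    examples: list of (feature_vector, response_index) pairs supplied
    by the player. The last weight in each row is the bias.
    """
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    n_out = max(label for _, label in examples) + 1
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
          for _ in range(n_out)]
    for _ in range(epochs):
        for x, label in examples:
            hidden, probs = forward(x, w1, w2)
            # Output error for softmax with cross-entropy loss.
            d_out = [p - (1.0 if k == label else 0.0)
                     for k, p in enumerate(probs)]
            # Propagate the error back through w2 and the tanh units.
            d_hid = [(1.0 - h * h) * sum(d_out[k] * w2[k][j]
                                         for k in range(n_out))
                     for j, h in enumerate(hidden)]
            for k in range(n_out):
                for j, v in enumerate(hidden + [1.0]):
                    w2[k][j] -= rate * d_out[k] * v
            for j in range(n_hidden):
                for i, v in enumerate(list(x) + [1.0]):
                    w1[j][i] -= rate * d_hid[j] * v
    return w1, w2

def recognize(x, w1, w2):
    """Return the index of the most probable response for one gesture."""
    _, probs = forward(x, w1, w2)
    return probs.index(max(probs))
```

In use, the player would record a few examples of each gesture, call train_gesture_net once on the collected pairs, and then call recognize on each new gesture during performance to select the response.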

With such procedures and much more research, we might conceivably move towards adaptive, personalizable instruments. There is a special and intriguing dilemma here. As suggested earlier, some types of music, like jazz, emphasize the development of personal playing styles. Others, like those based on traditional Western performance practice, emphasize a standardization of playing style. With adaptive instruments there will be a new twist: one will have to decide when to standardize or fix the instrument and let the musician learn the appropriate gesture, and when to let the instrument adapt to the specialized approach of a player. How to rig the training harnesses on ourselves as players and on our instruments as expressively responsive musical tools will be a question of scientific, aesthetic, and social concern.

References

Guyon, I. et al. 1991. "Design of a Neural Character Recognizer for a Touch Terminal." Pattern Recognition 24(2): 105-119.
Puckette, M. and D. Zicarelli. 1990. MAX: An Interactive Graphic Programming Environment. Menlo Park, CA: Opcode Systems, Inc.
Rumelhart, D.E., G.E. Hinton, and R.J. Williams. 1986. "Learning Internal Representations by Error Propagation." In D.E. Rumelhart and J. McClelland, eds. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1. Cambridge, MA: MIT Press, pp. 318-362.
Waibel, A. et al. 1989. "Phoneme Recognition Using Time-Delay Neural Networks." IEEE Transactions on Acoustics, Speech, and Signal Processing 37: 328-339.