7. Describing Rhythmic Behavior,
Representing Rhythmic Structure

In this chapter I discuss a class of models for rhythmic tapping that have developed in the literature over the last few decades. I identify what I see as a shortcoming of these models and suggest a supplementary view that allows for different mental processes at different timescales. In this light, I then describe work initiated by Bilmes (1993) and continued by a group consisting of Jeff Bilmes, Matt Wright, David Wessel, and myself (Iyer et al. 1997). This work culminated in the design and implementation of a novel representation for rhythmic structure, incorporating considerations specific to groove-based music such as those discussed in the previous chapter.

Rhythmic tapping. The well-known studies of Wing & Kristofferson (1973), Vorberg & Hambuch (1984), Jagacinski et al. (1988), and Vorberg & Wing (1996) are concerned with unraveling the cognitive command structure in the timing of motor activity. Wing & Kristofferson (1973) conducted experiments in which subjects tapped a finger periodically at a moderate, steady rate. Subjects' tapping was initially paced by periodic tones; then the tones ceased and the tapping continued unpaced. The experimenters studied how accurately the original period was maintained in unpaced tapping. From their data they developed a model in which it was assumed that there were two independent sources of variability in this unpaced phase: 1) variation in the timing of centrally generated, feed-forward, periodic commands, and 2) variation in implementation (mechanical noise in the effector, nerve response delays, and so forth). The variability in tapping was taken to be the sum of these two random variables. With elementary statistics on the interval values, one could determine which of the two sources gave rise to a given change in the overall variability. For example, it was found that the typical negative correlation between successive intervals ("negative covariance at lag-1") is associated entirely with variance in implementation. In other words, the natural "swing" inherent in a person's tapped steady pulse is due to variability not in the centrally generated command originating in the brain, but in the motor processes associated with the body.
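The two-level model can be sketched as a small simulation; the parameter values below are illustrative, not taken from the original data.

```python
import random

def simulate_intervals(n, sigma_clock, sigma_motor, period=500.0):
    """Wing-Kristofferson two-level model: the k-th inter-tap
    interval is I_k = C_k + M_{k+1} - M_k, where C_k are central
    clock intervals and M_k are motor implementation delays (ms)."""
    clock = [random.gauss(period, sigma_clock) for _ in range(n)]
    motor = [random.gauss(0.0, sigma_motor) for _ in range(n + 1)]
    return [clock[k] + motor[k + 1] - motor[k] for k in range(n)]

def lag1_autocov(xs):
    """Sample autocovariance of successive intervals (lag 1)."""
    m = sum(xs) / len(xs)
    return sum((a - m) * (b - m)
               for a, b in zip(xs[:-1], xs[1:])) / (len(xs) - 1)

random.seed(1)
# With zero clock noise, the negative lag-1 covariance predicted by
# the model (-sigma_motor^2, here -100 ms^2) comes entirely from the
# motor delays -- the "implementation" source of variability.
ivals = simulate_intervals(20000, sigma_clock=0.0, sigma_motor=10.0)
```

Adding clock noise raises the overall interval variance without affecting the lag-1 covariance, which is how the two sources can be teased apart from tapping data alone.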

Pressing et al. (1996) worked with experienced musicians and found that systematic variations in so-called polyrhythmic tapping, measurable through variance-covariance analysis of tapped intervals, imply a hierarchical cognitive structure in rhythm production. Specifically, the microtiming inflections of a musician tapping 4:3 (four equally spaced taps over the span of three tapped beats) differ from those of one tapping 3:4, even though the two are theoretically the "same" rhythm. This systematic variation fits a hierarchical model involving a central clock, a separate process of referential timing, and motor delays.

The general cognitive model proposed by the above authors for the production of polyrhythms is as follows:

  1. A central clock-like command process directly triggers the motor delay process of the "ground" rhythmic stream, played by one hand.
  2. Via an intervening subpulse (subdivision) process, the immediately preceding ground-stream clock element cues the motor delay in the "figure" stream.
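As a structural sketch, the two steps above might look like the following; the motor-delay values are fixed, hypothetical offsets (in milliseconds), whereas the actual model treats clock intervals and delays as random variables.

```python
def polyrhythm_onsets(ground_beat=400.0, n_ground=3, n_figure=4,
                      motor_ground=5.0, motor_figure=8.0):
    """Structural sketch of the two-level production model for a
    4:3 polyrhythm (times in ms). Noise is omitted for clarity."""
    cycle = ground_beat * n_ground
    # 1. The central clock directly triggers the ground-stream taps.
    ground = [k * ground_beat + motor_ground for k in range(n_ground)]
    # 2. Each figure tap is cued from the immediately preceding
    #    ground clock element, via a subpulse process that times
    #    the remaining offset within that clock interval.
    figure = []
    for j in range(n_figure):
        target = j * cycle / n_figure
        preceding_tick = int(target // ground_beat) * ground_beat
        subpulse_offset = target - preceding_tick
        figure.append(preceding_tick + subpulse_offset + motor_figure)
    return ground, figure

ground, figure = polyrhythm_onsets()
```

The dependency structure is the point here: figure-stream taps inherit the timing of the preceding ground clock element, so clock variability propagates asymmetrically into the figure stream, which is what makes 4:3 measurably different from 3:4.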

It is possible that the results showing the difference between 3:4 and 4:3 are a recasting of the work of Drake and Palmer (1993, discussed in the previous chapter) at a lower cognitive level. If there is indeed some amount of microtiming that is related to meter, then it could show up in the difference between the so-called figure and ground stream. Pressing et al. make no reference to meter, but it is clear that the ground stream functions metrically in this context. The central command clock functions as a pulse; the inter-pulse intervals are subdivided for accurate timing of the figure stream. However, Magill and Pressing (1997) have attempted to posit an asymmetric central clock, based on a view of West African music as treating its so-called timeline patterns as the "ground" rather than a figure. This work was discussed and critiqued earlier, in the chapter on meter.

A subtactus clock. A problem I see with all of the references cited above is that none of them proposes a model for polyrhythmic tapping in which the central clock pulse is the common subdivision of the generated rhythms. Instead, they posit internal timekeepers equal in length to one of the tactus-level pulses. In the above model, subdivision of the ground pulse occurs only when necessary; it does not continue in the absence of subdividing material. This is odd, considering that the oft-cited Povel (1981) has shown that people tend to perceive rhythms in terms of a common underlying unit, and have the least difficulty perceiving rhythms that conform to such a structure.

To be sure, the notion of a second, faster cognitive clock seems to go against some theories of rhythm perception. According to Brower, we process the structure of rapid (sub-tactus) rhythms in a more qualitative fashion. Instead of measuring individual durations against the background of an internally generated metric grid, the listener recognizes sub-pulse rhythms by their qualities of "evenness or unevenness, twoness or threeness, accentedness or unaccentedness, and so on." (Brower 1993: 25) She cites Preusser (1972), who found experimental evidence for a difference between the integrated, immediate, passive processing of rapid rhythmic gestalts and the intellectualized, cognitive, active processing of slower rhythms. She takes performance variations (e.g., 2:1 becomes 1.75:1, as described in Fraisse 1982) as evidence that the fast clock is not the most accurate for music. But Brower's claims about the unreliability of fast clocks appear to stem from generalizations about unskilled rhythmic behavior.

However, my own experience as a performer has suggested that what we call a groove is easier to reach if the members of the ensemble collectively focus on the ongoing small subdivisions of the beat, whether or not they are filled by musical events. Similarly, C.K. Ladzekpo teaches us to feel the abstract, ongoing rhythmic subdivisions as a constant reservoir of rhythmic intensity, a rapid, energetic, incessant movement, which he calls the "yell." (Ladzekpo 1995) The awareness of this continual activity helps the performer animate a simple rhythmic pattern. It seems quite similar to the "imagined movement" that we have discussed in the case of pulse perception, but now at the faster rate of the pulse subdivision. By maintaining a sense of these abstract subpulses, and indeed by imagining them not silent but loud, the performer not only enhances rhythmic precision but also derives cues for appropriate durations and intensities for individual notes.

Contrary to Brower's claims, Ladzekpo's teachings suggest that one can learn to internalize this fast subpulse such that one need not rely on its physical reinforcement. This relates to a model by Ivry (1998), which posits a bank of timers in our neural apparatus, each related to the time constants of our various limbs, digits, and other effectors as they perform various tasks. If we are able to generalize from locomotor-type activities to an abstract concept of musical pulse, it is equally possible that, with practice, we can learn to internalize a faster clock. The latter would be related to the temporal structure of digital, manual, and lingual motion, which, as I mentioned in chapter 3, occurs at a timescale substantially faster than the tactus/locomotor rate.

If all this is granted, it seems as though one could posit two simultaneous central clocks, one of whose frequencies is a multiple of the other. Again, with practice, an individual would learn to yoke together the faster clock associated with the physical activity of music-making, such as rapid finger motion, with a slower tactus-level pulse. I emphasize the notion of practicing this behavior because, as demonstrated in the previous chapter, groove-based musical activity involves highly skilled and precise temporal acuity, far from the simplicity of a typical tapping experiment. Also, significantly, groove-based musical activity is quite corporeal in nature; it is not just an abstract form of knowledge, but also a concrete skill requiring physical dexterity.

A tripartite model. Bilmes (1993) has developed a tripartite model for expressive timing in the performance of groove-based music. In addition to the salient moderate-tempo pulse or tactus, another important pulse cycle is defined at the finest temporal resolution relevant to a given piece of music. It is called the temporal atom or tatum (in homage to the great African-American improvising pianist Art Tatum): the smallest cognitively meaningful subdivision of the main beat. Multiple tatum rates may be active simultaneously, particularly in ensemble performance. In Western notation, tatums typically correspond to sixteenth notes or triplets, though they may vary over the course of a performance. As noted above, groove-based music is characterized in part by focused attentiveness to events at this fine level. The tactus and the tatum thus provide at least two distinct clocks for rhythmic synchronization and communication among musicians.

In Bilmes's scheme, a performance displays musical phenomena that may be represented on three timescales. First, the musical referent or "score" corresponds to the most basic representation of the performed music as it would be notated in Western terms, using quantized rhythmic values (tatums) that subdivide the main pulse. All note-events are represented at this level. Secondly, at a relatively large timescale, inter-onset intervals are stretched and compressed through tempo variation. This variation may be represented as a tempo curve -- a function of musical time vs. score time. However, particularly in percussive music, there is no real musical continuum separate from the note-events; score time is quantized in units of tatums. In fact, the tempo curve operates on tatums, modifying their durations such that their sequential sum corresponds to the integral of the tempo curve. In this way, the tatum may be regarded as a sampling rate.
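To make the "tatum as sampling rate" idea concrete, here is a minimal sketch (not the original implementation) that turns a tempo curve into a sequence of tatum durations whose sum approximates the integral of the curve:

```python
def tatum_durations(tempo_bpm, n_tatums, tatums_per_beat=4):
    """Sample a tempo curve at the tatum rate. tempo_bpm maps score
    time (in beats) to beats per minute; each tatum's duration in
    seconds approximates the integral of 60/tempo over its span,
    evaluated here at the tatum's midpoint."""
    dt = 1.0 / tatums_per_beat              # tatum length in score time
    return [60.0 / tempo_bpm((k + 0.5) * dt) * dt
            for k in range(n_tatums)]

# Constant 120 BPM with sixteenth-note tatums: 0.125 s per tatum.
flat = tatum_durations(lambda s: 120.0, 8)

# A linear accelerando from 100 toward 140 BPM over two beats yields
# steadily shrinking tatum durations.
accel = tatum_durations(lambda s: 100.0 + 20.0 * s, 8)
```

Playback then amounts to emitting note-events at the running sum of these per-tatum durations, so tempo variation never requires a continuous timebase separate from the tatum stream.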

Thirdly, the tatum-relative temporal deviations capture many of the expressive microtiming variations discussed in the previous chapter. Deviations quantify the microscopic delays or anticipations of note-events relative to the theoretical tatum onsets. In other words, they represent the microscopic values by which note onsets differ from rigid quantization, over a metronomic background. Deviations take on continuous values from -0.5 to +0.5 tatums, so that all possible rhythmic placements are allowed. In the case of multiple simultaneous tatum rates, this may allow for a redundant representation, in that a given note-event may be described in a number of different ways. For example, swung notes may be rendered as deviated sextuplets or as differently deviated sixteenth notes. We include this ambiguity purposefully, because such ambiguities occur frequently and naturally in the types of music under study.
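The swing example can be made concrete with a quick sketch (grid sizes and deviation values chosen for illustration): a note falling at two-thirds of the beat can be written either on a sextuplet grid with zero deviation or on a sixteenth-note grid with a deviation of minus one-third of a tatum.

```python
def onset(beat, tatum_index, deviation, tatums_per_beat):
    """Onset time in beats: a quantized tatum position plus a
    deviation measured in fractions of a tatum (-0.5..+0.5)."""
    return beat + (tatum_index + deviation) / tatums_per_beat

# The second note of a swung pair, at 2/3 of the beat:
as_sextuplet = onset(0, 4, 0.0, 6)        # fifth sextuplet slot, undeviated
as_sixteenth = onset(0, 3, -1.0 / 3, 4)   # fourth sixteenth, pulled earlier
```

Both descriptions denote the same physical onset; which one a transcriber (or listener) chooses reflects an interpretation of the underlying grid, which is exactly the ambiguity the representation preserves.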

In his 1993 work, Bilmes used signal-processing techniques to extract each of these three quantities from a musical performance. The work demonstrated that analysis of deviations can shed some light on musicians' internal representations of the rhythmic content of their performances, particularly in ensemble contexts that feature fixed and variable rhythmic groups.

Representation & Implementation. More recently, we have elaborated upon the above model to develop a powerful representation for musical rhythm. Implemented in MAX, a graphical, object-oriented music-programming environment (Puckette 1991), the representation includes features such as pitch, accent, rhythmic deviations, tempo variation, note durations (which are found to carry important rhythmic information and are therefore treated independently), and probabilistic processes. It may be used in conjunction with MIDI instruments or other synthesizers or sound modules.

To facilitate the use of the representation, we have developed various Editors and Players, for creating and musically enacting the data structures, respectively. One of our main goals has been interactive music performance, so our Players have been designed with real-time control in mind. A Player agent steps through the data structures, scheduling and playing the note-events. Multiple data structures are handled with ease thanks to MAX's parallel architecture. Players may also improvise by selecting from banks of rhythmic data or by creating new structures in real time. The Editors consist of graphical user interfaces for creating and modifying data structures in the representation.

The basic unit of our representation is the Cell, a data structure containing a Duration and any number of Note_layers. A given Note_layer contains either a discrete, regular Tatum_grid whose elements contain Notes, or a linked list of Notes occurring at fractional points of the cell duration. The presence of multiple Note_layers allows a rich variety of rhythmic possibilities at the most fundamental level, including multiple tatum rates, "a-tatum" rhythms, hierarchy, and stratification. A Note is a vector containing data about the note type, loudness/velocity, note duration, and microrhythmic deviation (in the discrete case). Thus, expressive microtiming against a metronomic background stands on equal footing with other continually modulating musical parameters.
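A minimal Python sketch of these data structures follows; the original is implemented as MAX objects, so the class and field names here are paraphrases, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

@dataclass
class Note:
    """One note-event: type, loudness, duration, and (in the
    discrete case) a microrhythmic deviation in tatums."""
    note_type: int           # e.g., a pitch or drum identifier
    velocity: int            # loudness
    duration: float          # note duration, carried independently
    deviation: float = 0.0   # -0.5..+0.5 tatums

@dataclass
class TatumGrid:
    """Discrete layer: a regular grid whose slots may hold Notes."""
    slots: List[Optional[Note]]

@dataclass
class FreeNotes:
    """An 'a-tatum' layer: Notes placed at fractional points of the
    Cell's duration (standing in for the linked list of the text)."""
    events: List[Tuple[float, Note]]

@dataclass
class Cell:
    """The basic unit: a Duration plus any number of Note_layers."""
    duration: float
    layers: List[Union[TatumGrid, FreeNotes]] = field(default_factory=list)

snare = Note(note_type=38, velocity=80, duration=0.25, deviation=-0.1)
cell = Cell(duration=1.0,
            layers=[TatumGrid([Note(36, 100, 0.25), None, snare, None])])
```

Note how the deviation field sits alongside velocity and duration as just another per-note parameter, which is the sense in which microtiming "stands on equal footing" with the other musical dimensions.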

Among the various tools for manipulating the data structures, we provide a way to exaggerate or de-emphasize rhythmic features by the use of non-linear (power-law) compression and expansion. This technique applies most effectively to deviation and accent data, where subtle expressive features may be either softened or enhanced via a continuous controller.
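A sketch of that reshaping, assuming deviation values bounded in magnitude by half a tatum (an accent layer would use a different bound; the function name is invented for this example):

```python
def reshape(values, gamma, max_mag=0.5):
    """Power-law reshaping of deviation (or accent) values.
    Magnitudes are normalized to max_mag, raised to gamma, and
    rescaled: gamma < 1 exaggerates subtle features, gamma > 1
    de-emphasizes them, and gamma = 1 leaves the data unchanged."""
    out = []
    for v in values:
        mag = min(abs(v) / max_mag, 1.0)   # normalize into 0..1
        sign = 1.0 if v >= 0 else -1.0
        out.append(sign * (mag ** gamma) * max_mag)
    return out
```

Mapping gamma to a continuous controller lets a performer sweep smoothly from a deadpan, quantized feel to a caricature of the original microtiming.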

Modularity. Note that the overall design privileges hierarchy at the intra-cellular level, and emphasizes "heterarchy" or modularity at the multi-cellular level. This prioritization favors a modular approach to musical organization. As was pointed out in the chapter on music and embodiment, the modular concept of musical form has special relevance to African and African-American musics. For example, rhythmic textures often arise from the superposition of various cyclic musical patterns. A prime instance of this trait occurs in Afro-Cuban rumba, which features fixed cyclic clave and wood-block ostinati, relatively stable repetitive low- and mid-range conga drum patterns, and a variable, heavily improvised quinto (high conga drum) part, all combining to form an extremely rich emergent texture.

This modular approach may also occur at a higher hierarchical level. Musical pieces may have a number of different repeated sections or "spaces" that cycle for arbitrary lengths of time; the transitions among these spaces are often cued in an improvisatory fashion, quite possibly without a preordained large-scale temporal structure or a strictly linear notion of overall musical time. As mentioned earlier, the music of James Brown provides many examples of this type. The linguistics-derived structural notion of large-scale recursive depth may be replaced or supplemented, for the musician or the listener, by a concept of large-scale organizational breadth. We characterize these methods of musical organization as modular; large musical structures are assembled from small, fully formed constituent units. This mosaic concept functions as an important aesthetic guideline in African and African-American musics, appearing in many different manifestations in the cultures of the continent and their diaspora.

In our implementation, Cells may be combined either in series or in parallel into larger Cells, or they may be cycled indefinitely. The system employs a novel method for handling large numbers of complex rhythmic structures, using features of the MAX collection object, a versatile and open-ended data structure. This method facilitates many of the modular combination techniques mentioned above, such as the stratification of different-length rhythmic cycles, creation of composite beat schemes, repetition, rhythmic progression, and most importantly, improvisatory manipulation of these structures. Thus, a user may create a number of different Cells and select rapidly from among them in real time, superimposing, serializing, or otherwise manipulating the constituent units.
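A dictionary-based sketch of these combination operations follows; the actual implementation uses the MAX coll object, and `serial`, `parallel`, and `cycled` are names invented for this example.

```python
def serial(cells):
    """Series combination: child Cells play one after another."""
    return {"mode": "serial",
            "duration": sum(c["duration"] for c in cells),
            "children": list(cells)}

def parallel(cells):
    """Parallel combination: children start together; the composite
    lasts as long as the longest child."""
    return {"mode": "parallel",
            "duration": max(c["duration"] for c in cells),
            "children": list(cells)}

def cycled(cell, repeats):
    """Cycle a Cell a given number of times (indefinite cycling
    would stream this rather than materialize it)."""
    return serial([cell] * repeats)

# Stratifying different-length cycles, rumba-style: a 4-beat clave
# pattern against a 2-beat quinto figure cycled twice.
clave = {"duration": 4.0}
quinto = {"duration": 2.0}
layered = parallel([clave, cycled(quinto, 2)])
```

Because composites are themselves Cells (here, dicts with the same shape), a performer can keep nesting, cycling, and superimposing them, which is what supports the improvisatory manipulation described above.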

Applications. The richness of control over many meaningful musical quantities distinguishes our representation from those in more common use, such as music notation programs and drum machines, while the focus on musical modularity gives the program a different emphasis from traditional sequencers. In addition, as mentioned above, the representation supports creative applications in improvised performance with electronic instruments. We use our implementation to collect rhythmic data from musicians for analysis, and to develop hypotheses and models of rhythm cognition. We have also begun to employ probabilistic processes to construct a useful preliminary representation of rhythmic improvisation, which has been exploited in performance.

 
