Timbral Prototypes

CAST Transforms: Timbral Prototypes

revised 2/3/98

Introduction

A timbral prototype is a set of data and an algorithm that can produce data to play a timbre or a set of related timbres.

There are currently four kinds of timbral prototypes that the synth can play:

Data files: .fmt, .format, .f0, .F0, and (probably broken) .erb files. These can be interpolated.
Resonances
Shepard Tones
Neural Networks

The timbral prototype transform of the synth allows each voice to be set to one of these kinds of timbral prototypes. Each of these timbral prototypes has its own way of being loaded into the synth, its own rules for setting a voice to play it, and its own ways of being controlled musically.

By default, each voice in the synth is initially set to no timbral prototype at all, so the synth will not make any sound until you set the desired voices to the desired timbral prototypes.

Data Files

There are several timbral prototype file formats that the synth can read and play. These files typically come from running some sort of analysis program, but they can also be created by hand or produced synthetically by a program.

Data files are loaded from disk into the synth's memory when the synthesizer starts up. As it loads each timbral prototype file, the synth prints the file's name, the amount of RAM needed to store its data, and its length in seconds. (We plan to add a way to load files from disk while the synth is already running.) The synth keeps a pool of available data files in memory, numbered in the order in which they were loaded, i.e., the order in which they appear on the synth command line or in synthconfig's "Select Datasets" window. For example, if you start the synth with these command-line arguments

softcast foo.fmt bar.F0 grrzl.fmt

then there will be three data files loaded in the synth's memory, numbered 0 through 2:

data file 0: foo.fmt
data file 1: bar.F0
data file 2: grrzl.fmt

The timbre_index message sets a voice to play an interpolated combination of one or more data files; the arguments are the indices (i.e., numbers) of the desired data files.

Interpolation of Data Files

The CAST synth does a crude form of timbral interpolation ("morphing") using linear interpolation of partial amplitudes and frequencies. When a voice is set to a number of data files, the synth will do the following:

Given a single virtual time for the voice, look up the data corresponding to that time in each of the data files.
For each partial number up to the maximum the synth will synthesize per voice, do the following:
- If this partial number is "dead" at the given time in any of the data files this voice is set to, make this partial dead for the voice as a whole.
- Otherwise, the amplitude for this partial will be a weighted sum of the amplitudes from the data files, and the frequency will be a weighted sum of the frequencies. The weights for this weighted sum are specified by the client with the interpolation_weights message.
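
The steps above can be sketched in Python. This is only an illustration of the rule as described here, not the synth's actual code: the function name, the (frequency, amplitude) tuple layout, and the use of None to mark a dead partial are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# A partial is (frequency_hz, amplitude), or None when it is "dead" at this time.
# This representation is hypothetical; the real file formats are not shown here.
Partial = Optional[Tuple[float, float]]


def interpolate_frame(frames: List[List[Partial]],
                      weights: List[float],
                      max_partials: int) -> List[Partial]:
    """Blend one time-slice taken from several data files, partial by partial."""
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per data file")
    result: List[Partial] = []
    for k in range(max_partials):
        # A partial number missing from a file is treated as dead in that file.
        column = [f[k] if k < len(f) else None for f in frames]
        if any(p is None for p in column):
            result.append(None)  # dead in any file -> dead for the whole voice
            continue
        freq = sum(w * p[0] for w, p in zip(weights, column))
        amp = sum(w * p[1] for w, p in zip(weights, column))
        result.append((freq, amp))
    return result


# Made-up data: the third partial is dead in the first file.
file_a = [(440.0, 0.8), (880.0, 0.4), None]
file_b = [(220.0, 0.4), (440.0, 0.2), (660.0, 0.1)]
blend = interpolate_frame([file_a, file_b], [0.5, 0.5], max_partials=3)
```

With equal weights, the first blended partial lands halfway between the two source partials in both frequency and amplitude, and the third partial is dead because it is dead in one of the files.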

This method of interpolation is not "smart" at all about time. If you interpolate between a 1-second file and a 2-second file, the result will be only one second long, because after 1 second all the partials from the shorter file are dead. If you interpolate between a tone with a fast attack and a tone with a slow attack, the resulting tone will have two attacks, one fast and one slow, rather than a single medium-speed attack.

This method of interpolation is also not "smart" about partial numbering. In a .format file, the partial index numbers are supposed to be just arbitrary integers to match up the data for a given sinusoidal track over time. But this form of interpolation reifies these arbitrary index numbers, using them to determine which partials correspond to each other in the timbral prototypes being interpolated. When these partial numbers correspond to harmonic numbers for harmonic tones, this interpolation works reasonably well.

Parameters for Data File Timbral Prototypes

Message	Argument(s)	Type(s)	Description
`timbre_index`	list of data file indices	Ints	Specifies a set of data file timbral prototypes to interpolate amongst.
`interpolation_weights`	list of interpolation weights from 0. to 1. These should normally add up to 1.0.	Floats	Specifies the weighting of each data file in the interpolation that produces the additive synthesis data for this voice. There should be as many weights as the number of prototypes the voice is set to.

Examples using Data File Timbral Prototypes

For the following examples, run softcast with two timbral prototypes,
e.g., demosoftcast -voices 2 /usr/local/cast/lib/data/trump-saxhonk-morph/*.fmt

First prototype:

/voices/0/tp/timbre_index 0
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Second prototype:

/voices/0/tp/timbre_index 1
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Heterophony

/voices/0/tp/timbre_index 0
/voices/1/tp/timbre_index 0
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0
/voices/1/tm/goto 0.5
/voices/1/tm/rate 0.95

Morphing 100% Trumpet

/voices/0/tp/timbre_index 0 1
/voices/0/tp/interpolation_weights 0.0 1.0
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Morphing 25/75

/voices/0/tp/timbre_index 0 1
/voices/0/tp/interpolation_weights 0.25 0.75
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Morphing 50/50

/voices/0/tp/timbre_index 0 1
/voices/0/tp/interpolation_weights 0.5 0.5
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Morphing 75/25

/voices/0/tp/timbre_index 0 1
/voices/0/tp/interpolation_weights 0.75 0.25
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Morphing 100% Sax

/voices/0/tp/timbre_index 0 1
/voices/0/tp/interpolation_weights 1.0 0.0
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0
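
The morphing examples above differ only in their weights, so a client could generate them. The following Python sketch just formats the message text in the same form as the examples; the helper name morph_messages is hypothetical, and how the messages are actually delivered to the synth is outside the scope of this sketch.

```python
def morph_messages(voice, indices, weights):
    """Return the message lines for one morph setting, as plain text."""
    if len(indices) != len(weights):
        raise ValueError("need exactly one weight per data file index")
    prefix = f"/voices/{voice}"
    return [
        f"{prefix}/tp/timbre_index " + " ".join(str(i) for i in indices),
        f"{prefix}/tp/interpolation_weights "
        + " ".join(repr(float(w)) for w in weights),
        f"{prefix}/tm/goto 0.0",   # restart virtual time
        f"{prefix}/tm/rate 1.0",   # play at normal speed
    ]


# Five steps from all of data file 1 to all of data file 0.
sweep = [morph_messages(0, [0, 1], [w, 1.0 - w])
         for w in (0.0, 0.25, 0.5, 0.75, 1.0)]

for step in sweep:
    print("\n".join(step))
    print()
```

Each pair of weights adds up to 1.0, as the parameter table recommends.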

Resonances

The resonance model allows you to specify a timbre as a sum of sine waves with exponentially decaying amplitudes. The frequency, initial amplitude, and decay rate of each sine wave are specified independently. In their original formulation, resonance models were implemented with filter banks, which is why the time constant of the decaying exponential is specified as a bandwidth in Hz.

The bandwidth of a resonance determines the speed at which it decays to silence. A high bandwidth gives a fast decay, a low bandwidth gives a slow decay, and a bandwidth of zero causes a partial to sustain indefinitely. The bandwidth parameter has no effect on the width of the frequency region that makes up the resonance; in this model all resonances are single sinusoids.
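
As a rough illustration of the relationship between bandwidth and decay, here is a Python sketch of one decaying sinusoid. It assumes the common two-pole resonator convention, in which the amplitude envelope is exp(-pi * bandwidth * t); the constants the synth really uses are not given in this document and may differ. The function names are made up for the example.

```python
import math


def envelope(t: float, amp: float, bandwidth_hz: float) -> float:
    """Amplitude envelope at time t (seconds).

    ASSUMPTION: decay rate = pi * bandwidth, as for a two-pole filter.
    """
    return amp * math.exp(-math.pi * bandwidth_hz * t)


def resonance_sample(t: float, freq_hz: float, amp: float,
                     bandwidth_hz: float) -> float:
    """Value of a single resonance (one decaying sine wave) at time t."""
    return envelope(t, amp, bandwidth_hz) * math.sin(2.0 * math.pi * freq_hz * t)


# Higher bandwidth -> faster decay; zero bandwidth -> no decay at all.
slow = envelope(1.0, 1.0, 2.0)
fast = envelope(1.0, 1.0, 10.0)
sustained = envelope(1.0, 1.0, 0.0)
```

Note that the bandwidth appears only in the envelope, never in the sine term, which matches the statement that each resonance is a single sinusoid.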

Currently, we rely on a database of resonances from previously analyzed sounds. We plan to integrate other analysis tools, such as those used for modal models and Prony's method.

We enhanced this resonance model with the addition of cosine-shaped attacks. When a resonance first starts playing (i.e., when its virtual time is zero), its amplitude is determined by a half-period of a cosine curve, increasing to the maximum amplitude and then decaying exponentially.
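
One way to picture the attack is the sketch below. It is an assumption-laden illustration, not the synth's code: it assumes the rising half-cosine runs from zero to the peak over the attack time, that the exponential decay starts when the attack ends, and that the decay rate is pi times the bandwidth.

```python
import math


def attack_then_decay(t: float, peak: float, attack_time: float,
                      bandwidth_hz: float) -> float:
    """Amplitude at virtual time t: half-cosine rise, then exponential decay."""
    if t < 0.0:
        return 0.0
    if t < attack_time:
        # Half a period of a cosine, rising smoothly from 0 to peak.
        return peak * 0.5 * (1.0 - math.cos(math.pi * t / attack_time))
    # ASSUMPTION: decay is measured from the end of the attack.
    return peak * math.exp(-math.pi * bandwidth_hz * (t - attack_time))


start = attack_then_decay(0.0, 1.0, 0.05, 2.0)
halfway = attack_then_decay(0.025, 1.0, 0.05, 2.0)
top = attack_then_decay(0.05, 1.0, 0.05, 2.0)
```

An attack time of zero falls straight through to the decay branch, which reproduces the plain resonance model.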

The resonance data sent to the synth is stored in each voice. This is different from the data file timbral prototypes, where there is a library of in-memory timbral prototypes and a message to associate a voice with a timbral prototype by index. Instead, since the amount of data for a resonance timbral prototype is so small, we just send all of the data to the voice we want to play that data.

Parameters for Resonance Timbral Prototypes

Message	Argument(s)	Type(s)	Description
`resonances`	List of frequency (Hz), gain (dB), and bandwidth (Hz) for each desired resonance.	Floats	This message provides all the data to make up a resonance timbral prototype with any number of resonances. For each resonance there should be three float arguments, specifying that resonance's frequency, gain, and bandwidth.
`resonances_scaled`	Just like the args to `resonances`, but scaled.	Floats	A specialized version of `resonances` for clients that know about the internal workings of the resonance model and can pre-scale their data by the appropriate magic constants.
`tone_bank`	List of frequency (Hz) and gain (dB) for each desired partial.	Floats	Just like `resonances`, except that the partials specified by `tone_bank` sustain forever, and therefore do not need any bandwidth specified.
`res_attack_time`	List of attack time in seconds for each partial.	Floats	Length of time for cosine-shaped attack before exponential decay in resonance model.

Note that gain is expressed in dB, where 0 dB means a maximum-amplitude sine wave. Most of the time these gains will be negative, between 0 and -90 dB.
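
For reference, a gain in dB can be converted to a linear amplitude as in the Python sketch below. It assumes the usual amplitude convention of 20 dB per factor of ten, with 1.0 standing for the maximum amplitude; the document does not say which convention the synth uses internally.

```python
def db_to_amplitude(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude, where 0 dB -> 1.0.

    ASSUMPTION: amplitude (20 * log10) convention, not power (10 * log10).
    """
    return 10.0 ** (gain_db / 20.0)


full_scale = db_to_amplitude(0.0)    # maximum amplitude
minus_six = db_to_amplitude(-6.0)    # roughly half amplitude
minus_ninety = db_to_amplitude(-90.0)
```

Under this convention, the -6.0 dB gain used in the examples corresponds to about half of full amplitude.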

Resonance Examples

For the following examples simply run demosoftcast.

Here is an example of a single exponentially decaying sine wave:

/voices/0/tp/resonances 440.0 -6.0 2.0
/voices/0/tm/goto 0.0

It doesn't take many of these resonances to make interesting sounds. Here are six obtained by analyzing a recording of a marimba:

/voices/0/tp/resonances 54.806 -44.21311 1.11059 131.292 -16.43748 0.77083 394.547 -36.65619 0.42492 526.121 -26.90901 3.03617 653.126 -34.51409 5.84818 1571.813 -11.19356 11.46624
/voices/0/tm/goto 0.0

You can change the damping by adjusting the time machine's rate parameter:

/voices/0/tm/rate 2.0
/voices/0/tm/start 0.0

/voices/0/tm/rate 0.4
/voices/0/tm/start 0.0

/voices/0/tm/rate 1.0
/voices/0/tm/start 0.0
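
To get a feel for how the rate changes the damping, the Python sketch below estimates how long a resonance takes to fall by 60 dB. It rests on two assumptions that this document does not confirm: that the envelope is exp(-pi * bandwidth * virtual_time), and that virtual time advances at rate times real time.

```python
import math


def decay_time_60db(bandwidth_hz: float, rate: float = 1.0) -> float:
    """Real time (seconds) for the envelope to drop by 60 dB (a factor of 1000)."""
    if bandwidth_hz <= 0.0 or rate <= 0.0:
        return math.inf  # zero bandwidth or a stopped clock never decays
    return math.log(1000.0) / (math.pi * bandwidth_hz * rate)


normal = decay_time_60db(2.0, 1.0)
doubled = decay_time_60db(2.0, 2.0)   # rate 2.0: decays twice as fast
slowed = decay_time_60db(2.0, 0.4)    # rate 0.4: rings 2.5 times longer
```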

A sine wave reference:

/voices/0/tp/tone_bank 440.0 -6.0

Stop the reference:

/voices/0/tp/resonances 440.0 -6.0 2.0
/voices/0/tm/goto 0.0
/voices/0/tm/rate 1.0

Shepard Tones

Shepard tones are special tones that "wrap around" to the exact same tone when played an octave higher or lower. For example, if you play an ascending chromatic scale with a Shepard Tone, each note will sound a half step higher than the previous note, but the last note will sound exactly the same as the first note.

Thus, the fundamental frequency given as the argument to the shepard_freq message has an arbitrary octave; for example, any of the frequencies 12.5, 25, 50, 100, 200, 400, 800, 1600, or even 0.09765625 or 107374182400 would produce exactly the same tone.

Shepard tones consist of a series of octave-spaced partials (not the usual harmonic series, but rather the 1st, 2nd, 4th, 8th, 16th, 32nd... partials) whose amplitudes are determined by a bell-shaped curve. (Actually, in this implementation, the amplitudes of the partials are determined by a hack that doesn't really make a bell curve. We're thinking about improving this and also giving the user some control over the shape of this spectral envelope.) This series of partials goes all the way from the lower limits of human hearing (around 20 Hz) to the Nyquist frequency.
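
The octave equivalence can be seen in a small Python sketch that lists the partial frequencies for a given argument. The 44100 Hz sample rate, the function names, and the Gaussian envelope over log frequency (with its center and width) are all assumptions chosen for illustration; as noted above, the synth's real amplitude rule is a hack that is not a true bell curve.

```python
import math


def shepard_partials(freq_hz: float, sample_rate: float = 44100.0,
                     low_limit_hz: float = 20.0) -> list:
    """Octave-spaced partial frequencies from about low_limit_hz up to Nyquist."""
    if freq_hz <= 0.0:
        raise ValueError("frequency must be positive")
    nyquist = sample_rate / 2.0
    f = freq_hz
    # Fold the given frequency into the octave [low_limit_hz, 2 * low_limit_hz).
    while f >= 2.0 * low_limit_hz:
        f /= 2.0
    while f < low_limit_hz:
        f *= 2.0
    partials = []
    while f < nyquist:
        partials.append(f)
        f *= 2.0
    return partials


def bell_amplitude(f: float, center_hz: float = 640.0,
                   width_octaves: float = 2.0) -> float:
    """Hypothetical bell-shaped weight: a Gaussian over log2 frequency."""
    distance = math.log2(f / center_hz)
    return math.exp(-0.5 * (distance / width_octaves) ** 2)


tone_100 = shepard_partials(100.0)
tone_200 = shepard_partials(200.0)
weights = [bell_amplitude(f) for f in tone_100]
```

Because halving and doubling only move the argument between octaves, 100 Hz and 200 Hz produce the same list of partials, starting at 25 Hz.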

Parameters for Shepard Tone Timbral Prototypes

Message	Argument(s)	Type(s)	Description
`shepard_freq`	"Fundamental" frequency (Hz) of Shepard Tone	Float	Sets the voice to play a Shepard Tone with the given frequency.

Neural Networks

CNMAT is researching uses of neural networks for additive synthesis control. In our current implementation, training a network is a separate phase from running it to produce sound data. Already-trained networks are compiled into a particular version of the synthesizer, and can be accessed by name as timbral prototypes. More information on networks is available if you're inside the CNMAT subdomain.

Parameters for Neural Network Timbral Prototypes

Message	Argument(s)	Type(s)	Description
`set_net`	Name of a neural net compiled into the synth	String	Sets the voice to use the given neural network as its timbral prototype.
`net_inputs`	Values for input units of neural network, generally in the range 0 to 1.	Floats	The meanings of the input units vary for each network and depend on how the network was trained. Our current research thrust is for the net inputs to correspond to musically meaningful parameters like pitch, loudness, brightness, vowel, etc.
`net_pitch`	Frequency (Hz)	Float	For nets that output a harmonic series, this parameter determines the frequencies.
