SDIF: Things to Think About

by Matt Wright, 10/15/97, updated 9/29/99

This page contains elaborations of some not-so-obvious issues about the current version of the SDIF spec.

The topics in this file are in no particular order.

Two Meanings of the Word "Stream"

These SDIF documents use two meanings for the word "stream."

The first meaning has to do with the idea of "streaming" data over the Internet or some other data channel in SDIF format, so that SDIF data arrives continually as it is used by the receiver, rather than being transmitted all at once as a file. In this sense we say "an SDIF stream" or "an SDIF file or stream".

The second meaning has to do with the notion of a series of frames at different times that describe the same data, e.g., a set of fundamental frequency estimate frames that together constitute an estimated fundamental frequency envelope for a sound. All of these frames would have the same value in their Stream ID field, and we'd say that together the frames make up a stream of data within an SDIF file or stream.

The meaning is generally clear from the context.

4-Byte Frame and Matrix Type IDs

Frames and matrices begin with a 4-byte field indicating their type. For all the common types defined in the first version of the SDIF standard, these 4 bytes are ASCII characters beginning with the numeral "1" and continuing with three more characters that have some mnemonic value, like "TDS" for "time domain samples," "TRC" for "tracks," etc.

The "1" corresponds to the version number of the SDIF specification and will be used to create future derivatives.

As the sets of standard frame and matrix types grow, not all new types will have meaningful ASCII type IDs. It's important to remember that these are just arbitrary 4-byte type fields.
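
One consequence for software: it seems safest to compare type IDs as raw bytes and to convert them to text only for display. Here is a minimal Python sketch of that idea. The function name describe_type_id and the hexadecimal fallback format are our own choices for illustration, not part of the SDIF spec.

```python
def describe_type_id(type_id: bytes) -> str:
    """Return a printable label for a 4-byte frame or matrix type ID.

    Hypothetical helper: printable ASCII IDs are shown as text,
    anything else is shown as hexadecimal.
    """
    if len(type_id) != 4:
        raise ValueError("type ID must be exactly 4 bytes")
    # Printable ASCII occupies the range 0x20..0x7E.
    if all(0x20 <= b <= 0x7E for b in type_id):
        return type_id.decode("ascii")
    return "0x" + type_id.hex()


print(describe_type_id(b"1TRC"))                          # 1TRC
print(describe_type_id(bytes([0xFF, 0x02, 0x00, 0x2A])))  # 0xff02002a
```

Comparisons such as type_id == b"1FQ0" then work the same way whether or not the ID happens to be readable ASCII.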

We expect to partition this namespace to allow each group to have its own range of reserved type IDs for custom or experimental applications. Here's one idea: a first byte with a particular value, such as 11111111 (binary), marks a type ID as belonging to this reserved range; the second byte identifies the institution whose type it is (from a centrally managed table of SDIF developers); and the last two bytes provide each institution with 65,536 unique IDs.
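
To make this idea concrete, here is a sketch in Python of how such an ID might be packed and unpacked. Everything in it is an assumption made for illustration: the marker value 0xFF (the 11111111 byte), the big-endian order of the last two bytes, and the function names. No such scheme has been adopted.

```python
import struct

CUSTOM_MARKER = 0xFF  # the 11111111 byte suggested in the text (assumption)


def pack_custom_type(institution: int, local_id: int) -> bytes:
    """Build a 4-byte type ID: marker, institution byte, 16-bit local ID."""
    if not 0 <= institution <= 0xFF:
        raise ValueError("institution must fit in one byte")
    if not 0 <= local_id <= 0xFFFF:
        raise ValueError("local_id must fit in two bytes")
    # ">BBH" = big-endian: unsigned byte, unsigned byte, unsigned 16-bit int.
    return struct.pack(">BBH", CUSTOM_MARKER, institution, local_id)


def unpack_custom_type(type_id: bytes):
    """Return (institution, local_id), or None for an ordinary type ID."""
    if len(type_id) != 4:
        raise ValueError("type ID must be exactly 4 bytes")
    marker, institution, local_id = struct.unpack(">BBH", type_id)
    if marker != CUSTOM_MARKER:
        return None
    return institution, local_id


custom = pack_custom_type(institution=2, local_id=42)
print(custom.hex())                 # ff02002a
print(unpack_custom_type(custom))   # (2, 42)
print(unpack_custom_type(b"1TRC"))  # None
```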

Stream IDs

We imagine that most programs that need to generate new Stream IDs will just use random numbers.

When merging SDIF files, you would not normally expect to see the same Stream ID in both input files. If you do, you should change one of them so that the streams that came from separate input files have different Stream IDs in the output. (If you find yourself in this situation, it may mean that the files originally came from the same place, and you should bring this to the user's attention.)
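
The merging rule above could be sketched as follows. This is only an illustration under several assumptions: frames are represented as (time, stream_id, frame_type) tuples rather than real SDIF frames, new IDs are drawn from the positive 31-bit range, and the merged output is sorted by time tag. The name merge_streams is hypothetical.

```python
import random


def merge_streams(frames_a, frames_b, rng=None):
    """Merge two frame lists, renumbering streams in B that collide with A.

    Returns (merged_frames, remap), where remap maps each colliding
    Stream ID from frames_b to its replacement.
    """
    rng = rng or random.Random()
    ids_a = {sid for _, sid, _ in frames_a}
    ids_b = {sid for _, sid, _ in frames_b}
    used = ids_a | ids_b
    remap = {}
    for sid in sorted(ids_a & ids_b):
        # Draw random IDs until we find one not used by either file.
        new_id = rng.randrange(1, 2**31)
        while new_id in used:
            new_id = rng.randrange(1, 2**31)
        used.add(new_id)
        remap[sid] = new_id
    renamed_b = [(t, remap.get(sid, sid), ftype) for t, sid, ftype in frames_b]
    merged = sorted(list(frames_a) + renamed_b, key=lambda frame: frame[0])
    return merged, remap


a = [(0.0, 1, "1FQ0"), (0.5, 1, "1FQ0")]
b = [(0.25, 1, "1TRC"), (0.3, 7, "1TRC")]
merged, remap = merge_streams(a, b)
if remap:
    # A collision may mean both files came from the same source.
    print("warning: renumbered colliding Stream IDs:", sorted(remap))
```

A non-empty remap is the natural place to hook in the warning to the user suggested above.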

"Compound" Frame Types

A frame can contain any number of matrices, so it would be possible for a single frame to contain matrices with multiple sound descriptions, e.g., STFT results and fundamental frequency estimates. A frame has a single time tag, so all information in a frame must pertain to the same time.

In general, we discourage this use of SDIF, because it makes the frame type semantics weak. What frame type should be used when a frame contains matrix types from two or more standard frame types? A program that does something with fundamental frequency information, for example, would normally read through the frames of an SDIF file looking for 1FQ0. If fundamental frequency estimates could be "hidden" inside other frames, this program would have to go through the matrices of every single frame.

We can imagine special cases where a frame might include a matrix of another type. For example, some applications might like to associate a fundamental frequency estimate (or set of estimates) with each individual frame of sinusoidal tracks. This information would of course be useful as a separate stream of 1FQ0 frames, but for applications that somehow rearranged the 1TRC frames, it could be useful to have the 1FQ0 matrices inside the 1TRC frames themselves.
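
The difference between the two kinds of search can be shown with a toy model. The Frame class below is a stand-in invented for this example (a real frame carries matrix data, not just a list of matrix type IDs), and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Toy stand-in for an SDIF frame: type, time tag, matrix type IDs."""
    frame_type: bytes
    time: float
    matrix_types: list = field(default_factory=list)


def by_frame_type(frames, wanted):
    """Cheap search: look only at each frame's own type ID."""
    return [f for f in frames if f.frame_type == wanted]


def by_matrix_type(frames, wanted):
    """Expensive search: inspect the matrices of every single frame."""
    return [f for f in frames if wanted in f.matrix_types]


frames = [
    Frame(b"1FQ0", 0.0, [b"1FQ0"]),
    Frame(b"1TRC", 0.0, [b"1TRC", b"1FQ0"]),  # 1FQ0 "hidden" in a 1TRC frame
    Frame(b"1TRC", 0.01, [b"1TRC"]),
]

print(len(by_frame_type(frames, b"1FQ0")))   # 1
print(len(by_matrix_type(frames, b"1FQ0")))  # 2
```

A reader that only checks frame types misses the estimate carried inside the 1TRC frame, which is why we would still expect such data to be available as a separate 1FQ0 stream.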

Time Tags

Since every frame of data has its own time tag, SDIF data is in general sampled at arbitrary time points. Programs that read or synthesize SDIF files or streams should not assume that frames will come at a uniform time rate. This implies that, for many frame types, synthesizers will need to do some sort of interpolation to supply sound data at intermediate points between frames.
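
As an illustration, here is a minimal Python sketch of linear interpolation between frames with non-uniform time tags. It assumes each frame has been reduced to a (time, value) pair, such as one fundamental frequency estimate per frame, sorted by time. Linear interpolation and holding the first and last values outside the covered range are choices made for this example; other frame types may call for other schemes.

```python
from bisect import bisect_right


def interpolate(frames, t):
    """Linearly interpolate a value at time t from sorted (time, value) pairs."""
    if not frames:
        raise ValueError("need at least one frame")
    times = [frame_time for frame_time, _ in frames]
    # Outside the covered range, hold the nearest value (assumption).
    if t <= times[0]:
        return frames[0][1]
    if t >= times[-1]:
        return frames[-1][1]
    # Index of the first frame whose time tag is strictly greater than t.
    i = bisect_right(times, t)
    (t0, v0), (t1, v1) = frames[i - 1], frames[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Frames are deliberately not evenly spaced in time.
f0_frames = [(0.0, 100.0), (0.1, 110.0), (0.5, 150.0)]
print(interpolate(f0_frames, 0.05))  # halfway between 100 and 110
print(interpolate(f0_frames, 0.3))   # halfway between 110 and 150
```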

Negative Time Tags

Here are some possible uses for negative time tags.

One idea is to put all "initialization" information like synthesis patch configurations in frames with negative times.

Suppose a single SDIF file contains data from two separate sounds that the user wants to offset in time from each other. If time zero were defined with respect to the later of the two sounds, the earlier sound would start before time zero.

Suppose a program added a reverse reverberation or pre-echo effect to an SDIF file. It would make sense for the time base of the file to remain the same after the effect was added, but the beginning of the reverb should come before the time zero at which the original sound starts.

Extra Columns For Resonances

The original SDIF spec had four extra columns in the resonances matrix type:

Salience
Member
Number
Group

Here's how they were described:

Salience is for a parameter that specifies a sort order to enable pruning based presumably on a perceptual judgment of relative importance of each partial.

In many applications it is useful to accompany the resonance frame with a pitch matrix to describe the virtual pitches associated with the resonant sound. "Harmonic" refers to which virtual pitch this partial is a harmonic member of. Virtual pitches are ordered from highest score and numbered here from 1 upwards. Which harmonic number of that virtual pitch (from 1) is in the "Number" column. These two fields are necessary to accommodate inharmonic partials.

The "Group" column allows resonances to be grouped. The value is an arbitrary number common to all resonances in a group. A common use of this is to group frequency-proximate resonances into clumps. This facilitates frequency scaling that preserves the beat frequencies between closely spaced resonances.
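
One possible reading of the beat-preserving scaling mentioned above is sketched below in Python. It is our interpretation, not something the original description spells out: each group's mean frequency is scaled by the requested factor, and every member of the group is shifted by the same number of Hz, so the frequency differences within a group (and hence the beat rates) stay the same. The (frequency, group) representation and the function name are hypothetical.

```python
from collections import defaultdict


def scale_preserving_beats(resonances, factor):
    """Scale group centers by factor while keeping in-group spacing in Hz.

    resonances is a list of (frequency_hz, group) pairs.
    """
    by_group = defaultdict(list)
    for freq, group in resonances:
        by_group[group].append(freq)
    # Shift that moves each group's mean frequency to factor * mean.
    shift = {
        group: (factor - 1.0) * sum(freqs) / len(freqs)
        for group, freqs in by_group.items()
    }
    return [(freq + shift[group], group) for freq, group in resonances]


resonances = [(440.0, 1), (442.0, 1), (880.0, 2)]
for freq, group in scale_preserving_beats(resonances, 2.0):
    print(group, freq)
```

In this example the two resonances in group 1 are 2 Hz apart before and after scaling, whereas multiplying every frequency by 2 would have doubled their beat rate to 4 Hz.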

These extra fields seem apropos to other matrix types besides resonances, and seem like potentially useful optional columns.
