SDIF Specification

9/29/99 This version by Matthew Wright, incorporating changes made in June 1999 by Xavier Rodet, Diemo Schwarz, and Matthew Wright. 3/24/4 update fixed "A frame may not contain more than one matrix with the same MatrixTypeID."

Overview

This section gives a brief overview of the SDIF format. The links from this section to more detailed descriptions below make this section also an overview or table of contents of this specification.

The SDIF format is a sequence of frames, similar to chunks in the IFF/AIFF/RIFF formats, but not strictly compatible. Every frame's length is a multiple of 8 bytes.

Every SDIF file or stream must begin with a small opening frame.

The body of the file or stream is a contiguous sequence of time-tagged frames, sorted in ascending temporal order, with multiple kinds of frames allowed in a single file or stream. A collection of standard frame and matrix types defines formats for storing common sound representations that are part of the SDIF standard.

SDIF has the notion of a "stream", a series of frames at different times that represent the same sonic object over time. The sequence of time-tagged frames is therefore a set of interleaved streams. (Note that there are two meanings of the word "stream" used with SDIF.)

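A frame consists of a frame header (a frame type ID, the frame size, a time tag, a stream ID, and a matrix count) followed by zero or more matrices.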

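The data in a frame are stored in 2D (or 1D or 0D) matrices of floating point numbers, with each column corresponding to a parameter like frequency or amplitude and each row representing an object like a filter, sinusoid, or noise band. A matrix consists of a matrix header (a matrix type ID, a matrix data type, a row count, and a column count), the matrix data, and any padding bytes needed to make the matrix size a multiple of 8 bytes.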

Data Types

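SDIF is an interchange format, so portability is very important. The header fields in this specification use the types char[4], int32, and float64; the element types allowed in matrix data are listed in the table of MatrixDataTypes.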

General Format of an SDIF Frame

All frames within an SDIF file or stream have the following format:

Field          Type      Description
FrameTypeID    char[4]   A unique 4-byte code indicating what kind of frame this is.
FrameDataSize  int32     The size, in bytes, of the frame, not including the FrameTypeID or FrameDataSize fields.
Data           anything  The contents of the frame. The size must be a multiple of 8 bytes.

The point of having each frame's size be a multiple of 8 bytes is that it ensures that 8-byte data will be properly word-aligned on machines with 64-bit architectures.
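The general frame layout above can be walked with a short reader. This is a minimal sketch, not a reference implementation: the function name iter_frames is hypothetical, and big-endian byte order for the header fields is an assumption carried over from the MatrixDataType table, which describes matrix data as big-endian.

```python
import io
import struct

def iter_frames(stream):
    """Yield (frame_type_id, data) for each frame in a binary stream."""
    while True:
        head = stream.read(8)
        if not head:
            return  # clean end of stream
        if len(head) < 8:
            raise ValueError("truncated frame header")
        # Assumption: header fields are big-endian, like the matrix data.
        type_id, size = struct.unpack(">4si", head)
        if size < 0 or size % 8 != 0:
            raise ValueError(f"frame data size {size} is not a multiple of 8")
        data = stream.read(size)
        if len(data) < size:
            raise ValueError("truncated frame data")
        yield type_id, data

# Hypothetical demo bytes: an opening frame, then an empty experimental frame.
demo = (struct.pack(">4siii", b"SDIF", 8, 3, 1)
        + struct.pack(">4sidii", b"xABC", 16, 0.0, 0, 0))
frames = list(iter_frames(io.BytesIO(demo)))
```

Because the size field excludes the first 8 bytes, the reader can skip any frame without understanding its contents.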

Opening Frame

All SDIF files must begin with an "opening frame" that identifies the file as SDIF.

Field                     Type      Value or description
FrameTypeID               char[4]   'SDIF'
OpeningFrameSize          int32     8
SDIFSpecVersion           int32     Version number of the SDIF Spec used to create this file. SDIF as described in this document is version 3.
SDIFStandardTypesVersion  int32     Version number of the collection of standard frame and matrix types used to create this file.

Note that the Opening Frame has a size field just like any other frame, so if we found more things to put here we could make room for them in a future revision of the SDIF spec. Programs reading SDIF data should pay attention to the OpeningFrameSize rather than assume that the opening frame will be 16 bytes.
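As an illustration of honoring OpeningFrameSize, the following sketch reads the two version fields and then skips whatever else the opening frame may contain. The function name read_opening_frame is hypothetical, big-endian byte order is assumed, and the 16-byte "future" opening frame in the demo is invented purely to exercise the skip.

```python
import io
import struct

def read_opening_frame(stream):
    """Return (SDIFSpecVersion, SDIFStandardTypesVersion) from the start of a stream."""
    head = stream.read(8)
    if len(head) < 8:
        raise ValueError("stream too short for an opening frame")
    type_id, size = struct.unpack(">4si", head)  # assumption: big-endian
    if type_id != b"SDIF":
        raise ValueError("not an SDIF stream")
    if size < 8:
        raise ValueError("opening frame too small to hold both version fields")
    body = stream.read(size)
    if len(body) < size:
        raise ValueError("truncated opening frame")
    # Only the first 8 bytes are defined here; any remainder is skipped.
    return struct.unpack_from(">ii", body)

# Hypothetical future opening frame with 8 extra bytes, followed by other data.
demo = io.BytesIO(struct.pack(">4siii", b"SDIF", 16, 3, 1) + bytes(8) + b"next")
versions = read_opening_frame(demo)
rest = demo.read()
```

After the call, the stream is positioned at the first byte following the opening frame, whatever its declared size.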


Data Frames

The actual sound description data in an SDIF file are stored in a series of frames. A frame consists of a frame header followed by zero or more matrices. Most frames will consist of a single matrix.

Because a frame can contain any number of matrices, there is a possibility of "compound frames" containing multiple descriptions of the same sound at the same time.

Here's the format of the Frame Header:

Field        Type     Description
FrameTypeID  char[4]  A unique 4-byte code indicating what kind of frame this is.
FrameSize    int32    The size in bytes of the frame, not counting the FrameTypeID or FrameSize.
Time         float64  The time of the data frame.
StreamID     int32    The stream ID number.
MatrixCount  int32    The number of matrices in the frame.
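The frame header can be decoded in one step with a fixed layout. This is a sketch under assumptions: the names FRAME_HEADER, FrameHeader, and parse_frame_header are hypothetical, and big-endian byte order is assumed for all five fields.

```python
import struct
from typing import NamedTuple

# 4-byte ID, int32 size, float64 time, int32 stream ID, int32 matrix count.
FRAME_HEADER = struct.Struct(">4sidii")  # assumption: big-endian, 24 bytes

class FrameHeader(NamedTuple):
    frame_type_id: bytes
    frame_size: int
    time: float
    stream_id: int
    matrix_count: int

def parse_frame_header(buf):
    """Decode the frame header found at the start of buf."""
    return FrameHeader(*FRAME_HEADER.unpack_from(buf))

# Hypothetical frame with no matrices: FrameSize covers Time, StreamID, MatrixCount.
raw = FRAME_HEADER.pack(b"xABC", 16, 1.5, 7, 0)
header = parse_frame_header(raw)
```

The header occupies 24 bytes in total, of which 16 (everything after FrameSize) are counted in FrameSize.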

Frame Type ID

There is a collection of types of SDIF frames, with one frame type per sound representation standardized in SDIF, e.g., fundamental frequency estimates, discrete STFT results, sinusoidal tracks, etc. The standard SDIF frame types are defined centrally by the maintainers of the SDIF standard. Each frame type has a corresponding 4-byte ID to identify frames of that type. The definition of a standard SDIF frame type includes a list of the types of matrices that must always appear in a frame of that type. A frame may always contain additional matrices beyond those required by the standard.

The semantics of frame type IDs are simply that they are unique integers. However, we select frame type IDs for standard frame types based on treating the 4 bytes as characters, similar to the way the "type" and "creator" fields for Macintosh files work. We use only characters allowed as identifiers in XML.

Programs should ignore frame types that they do not recognize while processing frame types that they do understand.
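One way to follow this rule is a dispatch table keyed by frame type ID, where a missing entry means "skip". The sketch below assumes frames have already been split into (type ID, data) pairs; the function name process_frames and the type IDs xAAA and xZZZ are hypothetical.

```python
def process_frames(frames, handlers):
    """Apply the matching handler to each frame; silently skip unknown types."""
    results = []
    for type_id, data in frames:
        handler = handlers.get(type_id)
        if handler is None:
            continue  # unrecognized frame type: ignore it and keep going
        results.append(handler(data))
    return results

# Hypothetical experimental frame types; only xAAA has a handler.
frames = [(b"xAAA", bytes(16)), (b"xZZZ", bytes(24)), (b"xAAA", bytes(32))]
handlers = {b"xAAA": len}
sizes = process_frames(frames, handlers)
```

The xZZZ frame is dropped without affecting the frames around it.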

Frame Type IDs beginning with lowercase "x" are reserved for experimental sound representations.

Stream ID

Stream IDs allow a set of frames at different times to be associated together into a single "stream", a continuous series of data. Frames with the same Stream ID belong to the same stream. Thus, a single SDIF file or stream can have multiple interleaved series of frame data of the same type.

All of the frames with a given Stream ID must be of the same frame type.
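A reader or writer might enforce this rule by remembering the frame type first seen for each Stream ID. The following is a sketch only; check_stream_types and the type IDs used in the demo are hypothetical.

```python
def check_stream_types(headers):
    """headers: iterable of (frame_type_id, stream_id) pairs.

    Returns a dict mapping each stream ID to its frame type, or raises
    ValueError if one stream ID is used with two different frame types.
    """
    seen = {}
    for frame_type_id, stream_id in headers:
        first = seen.setdefault(stream_id, frame_type_id)
        if first != frame_type_id:
            raise ValueError(
                f"stream {stream_id} mixes frame types {first!r} and {frame_type_id!r}"
            )
    return seen

ok = check_stream_types([(b"xAAA", 1), (b"xBBB", 2), (b"xAAA", 1)])

try:
    check_stream_types([(b"xAAA", 1), (b"xBBB", 1)])
    rejected = False
except ValueError:
    rejected = True
```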

These 4-byte IDs do not carry any semantic information; they serve only to differentiate streams from each other. The optional Stream Information Matrix provides a way to document the meanings of the different streams.


Time Tag

The time tag is a 64-bit floating point number that indicates the time to which the given frame applies. The units are seconds, as measured from a "time zero."

Most frame types are applicable to a single instant of time, e.g., the center point of a STFT window. This avoids issues of overlapping frames. It also implies that synthesizers for these frame types will have some sort of interpolation model to smoothly change data between frames.

A few frame types, e.g., time domain samples and envelope breakpoint functions, represent data that span a given interval of time; in these cases the frame's time tag indicates the beginning of the interval.


Matrices

A frame's data consists of zero or more 2D matrices. The order of matrices within a frame does not matter. A frame may not contain more than one matrix with the same MatrixTypeID.

Each column corresponds to a parameter (or "slot") such as frequency or amplitude and each row represents an object such as a filter, sinusoid, or noise band. Therefore, in general, one-dimensional matrices should have one column and multiple rows. Elements of matrices appear in row-major order. The order of rows within a matrix may matter.
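Row-major order means that all columns of the first row come first, then all columns of the second row, and so on. As a small sketch (the function names and the sample values are hypothetical), element (row, col) of a matrix sits at position row * ColumnCount + col in the flat sequence:

```python
def element_index(row, col, column_count):
    """Position of element (row, col) in a row-major flat sequence."""
    return row * column_count + col

def split_rows(flat, row_count, column_count):
    """Regroup a row-major flat sequence into a list of rows."""
    if len(flat) != row_count * column_count:
        raise ValueError("element count does not match RowCount * ColumnCount")
    return [flat[r * column_count:(r + 1) * column_count] for r in range(row_count)]

# Three hypothetical sinusoids, each row holding (frequency, amplitude).
flat = [440.0, 0.5, 880.0, 0.25, 1320.0, 0.125]
rows = split_rows(flat, 3, 2)
last_amplitude = flat[element_index(2, 1, 2)]
```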

Field                Type                                 Description
MatrixTypeID         char[4]                              A unique 4-byte code indicating what kind of matrix this is.
MatrixDataType       int32                                The type code of the matrix data. See the table below.
RowCount             int32                                The number of rows in the matrix.
ColumnCount          int32                                The number of columns in the matrix.
MatrixData           float32, float64, int32, or int64    The matrix data itself, in row-major order.
OptionalBytePadding  byte[n]                              Optional padding bytes to make the total size of the matrix a multiple of 8 bytes.

The MatrixTypeID, MatrixDataType, RowCount, and ColumnCount fields comprise the "matrix header".
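The four-field matrix header has a fixed 16-byte layout, so it can be decoded much like the frame header. In this sketch the names MATRIX_HEADER, MatrixHeader, and parse_matrix_header are hypothetical, and big-endian byte order is assumed.

```python
import struct
from typing import NamedTuple

# 4-byte ID followed by three int32 fields.
MATRIX_HEADER = struct.Struct(">4siii")  # assumption: big-endian, 16 bytes

class MatrixHeader(NamedTuple):
    matrix_type_id: bytes
    matrix_data_type: int
    row_count: int
    column_count: int

def parse_matrix_header(buf, offset=0):
    """Decode the matrix header found at buf[offset:]."""
    return MatrixHeader(*MATRIX_HEADER.unpack_from(buf, offset))

# Hypothetical 3x2 float32 matrix header.
raw = MATRIX_HEADER.pack(b"xABC", 0x4, 3, 2)
mh = parse_matrix_header(raw)
```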

MatrixTypeID

The MatrixTypeID is just like a FrameTypeID, except that there is a namespace of Matrix Types distinct from the namespace of Frame Types.

Most frame types consist of a single matrix; in these cases the Matrix Type ID is the same as the Frame Type ID. Some frame types consist of a main matrix of data plus a few extra fields in a secondary 1D matrix, e.g., time-domain-sample frames must include the sampling rate as well as the actual sample values. In these cases the naming convention is for the info matrix's MatrixTypeID to begin with the character "I" (for "info"), and have the same 3 final characters as the FrameTypeID.
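The naming convention for info matrices can be expressed as a one-line transformation. This is only a sketch of the convention as worded above; the function name is hypothetical, and xTDS is an invented experimental frame type used for illustration.

```python
def info_matrix_type_id(frame_type_id):
    """Derive the info matrix ID: 'I' plus the last 3 characters of the frame type ID."""
    if len(frame_type_id) != 4:
        raise ValueError("frame type IDs are exactly 4 bytes")
    return b"I" + frame_type_id[1:]

info_id = info_matrix_type_id(b"xTDS")  # hypothetical frame type
```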

Matrix Columns

The definition of a standard matrix type includes a list of all the required columns for that matrix type, an optional list of optional columns for that matrix type, and an explanation of how to interpret these columns. It is illegal for an SDIF matrix of a given type to have fewer columns than the number of required columns for that matrix type. On the other hand, an SDIF matrix may always have more columns after the required columns; in this case, the leftmost columns are the required columns and the remaining columns contain optional data.

The optional columns in the matrix type definition specify the interpretation of the columns immediately after the required columns.

If there are additional columns in a matrix after the required and optional columns, their interpretation is application-specific.

For example, suppose a standard matrix type has required frequency and gain columns and an optional phase column. This table shows how to interpret matrices of this type with varying numbers of columns:

Number of columns  Interpretation according to this example
0                  Illegal
1                  Illegal
2                  Frequency, Gain
3                  Frequency, Gain, Phase
4                  Frequency, Gain, Phase, and one extra column
5                  Frequency, Gain, Phase, and two extra columns
6 or more          Frequency, Gain, Phase, and n-3 extra columns
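The interpretation rule in this table can be sketched as a function that labels the columns of a matrix given its type's required and optional column lists. The function name label_columns and the "extraN" labels are hypothetical; the column names come from the example above.

```python
def label_columns(column_count, required, optional):
    """Return one label per column, or raise ValueError if required columns are missing."""
    if column_count < len(required):
        raise ValueError(
            f"{column_count} columns, but {len(required)} are required"
        )
    defined = list(required) + list(optional)
    names = defined[:column_count]
    # Columns beyond the required and optional ones are application-specific.
    extras = column_count - len(names)
    return names + [f"extra{i + 1}" for i in range(extras)]

required = ["Frequency", "Gain"]
optional = ["Phase"]
two = label_columns(2, required, optional)
three = label_columns(3, required, optional)
five = label_columns(5, required, optional)

try:
    label_columns(1, required, optional)
    one_is_legal = True
except ValueError:
    one_is_legal = False
```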


Matrix Data Type

The MatrixDataType field represents the type of each data element in the matrix. All data elements in a matrix must have the same type. The definition of a standard matrix type includes a list of the allowable MatrixDataTypes for that matrix type.

Here is the table of MatrixDataTypes:

MatrixDataType  Type Name  Meaning                                      Num Bytes
0x4             float32    32-bit big-endian IEEE 754 float             4
0x8             float64    64-bit big-endian IEEE 754 float             8
0x104           int32      32-bit big-endian two's complement integer   4
0x108           int64      64-bit big-endian two's complement integer   8
0x204           uint32     32-bit big-endian unsigned integer           4
0x301           UTF8byte   Byte of UTF8-encoded text                    1
0x401           byte       Arbitrary byte                               1

The low-order byte of the MatrixDataType encodes the number of bytes taken by each datum. This allows programs that see an unrecognized MatrixDataType to skip over the matrix.
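Extracting the element size is a single mask operation, which is what lets a reader step over a matrix whose type code it has never seen. In this sketch the names datum_size, KNOWN_TYPES, and describe are hypothetical, and 0x502 is an invented code standing in for an unrecognized type.

```python
KNOWN_TYPES = {
    0x4: "float32",
    0x8: "float64",
    0x104: "int32",
    0x108: "int64",
    0x204: "uint32",
    0x301: "UTF8byte",
    0x401: "byte",
}

def datum_size(matrix_data_type):
    """Bytes per element, taken from the low-order byte of the type code."""
    return matrix_data_type & 0xFF

def describe(matrix_data_type):
    """Return (type name, bytes per element); unknown codes still yield a size."""
    name = KNOWN_TYPES.get(matrix_data_type, "unrecognized")
    return name, datum_size(matrix_data_type)

known = describe(0x108)
unknown = describe(0x502)  # hypothetical code not in the table
```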

The "Arbitrary byte" MatrixDataType allows any data to be embedded in an SDIF matrix, and should be used only as a last resort. Whenever possible, data represented in SDIF should use standard matrix and frame types. If no standard type will do the job, it is best to define a new matrix type where the rows and columns have some reasonable meaning. Only if a data object cannot be represented by a normal matrix is it appropriate to define an arbitrary binary format and use a matrix of "Arbitrary bytes."

Text in SDIF

SDIF matrices with text data use UTF-8 instead of ASCII. UTF-8 is a character encoding of Unicode that is backwards-compatible with ASCII: each byte with a high-order bit of zero is an ASCII character. Non-ASCII characters are represented by a variable number of bytes with a high-order bit of one. Thus, if a matrix of UTF-8 characters contains any non-ASCII characters, the number of characters will be less than the number of bytes.

UTF-8 strings must be null-terminated, and the null-termination byte must be counted as a matrix element in the row and column counts.
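The difference between character count and element count is easy to see in a small sketch. The function name text_to_matrix is hypothetical, and the one-column layout is an assumption based on the text example in the Byte Padding section.

```python
def text_to_matrix(text):
    """Encode text as null-terminated UTF-8; return (data, RowCount, ColumnCount)."""
    data = text.encode("utf-8") + b"\x00"
    # The terminating null byte counts as a matrix element.
    return data, len(data), 1

data, row_count, column_count = text_to_matrix("café")
```

Here the string has 4 characters, but the accented letter takes 2 bytes and the terminator adds 1, so the matrix has 6 rows.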

Byte Padding

Note that the matrix header does not indicate the number of bytes in the matrix. The size of the MatrixData for a given matrix is:

RowCount * ColumnCount * (size of the MatrixDataType)

Since the low-order byte of the MatrixDataType encodes the number of bytes taken by each datum, the size of the matrix data can be computed from the matrix header.

If the size of the MatrixData is not a multiple of 8 bytes, it must be followed by enough padding bytes to make the total matrix size a multiple of 8 bytes, namely 8 minus the remainder of the MatrixData size divided by 8.

These padding bytes are not counted anywhere in the matrix header, but they are counted in the FrameSize field of the frame header.

For example, suppose a frame contains a single matrix containing 65 bytes of UTF-8 text (including the null termination character). The matrix will have one column and 65 rows, with MatrixDataType 0x301. This makes the MatrixData 65 bytes long, so there are 7 padding bytes. The total size of the matrix is therefore 16 bytes (for the matrix header) plus 65 bytes (of data) plus 7 bytes (of padding) = 88 bytes. Therefore, the FrameSize will be 16 bytes (the size of the frame header, not including FrameTypeID or FrameSize) plus 88 bytes (the total size of the matrix) = 104 bytes.
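The arithmetic in this example can be checked with a few helper functions. This is a sketch; the function and constant names are hypothetical, while the sizes (16-byte matrix header, 16 counted bytes of frame header) come from the field tables in this document.

```python
MATRIX_HEADER_SIZE = 16         # MatrixTypeID + MatrixDataType + RowCount + ColumnCount
FRAME_HEADER_COUNTED_SIZE = 16  # Time + StreamID + MatrixCount

def padding_bytes(data_size):
    """Padding needed to round data_size up to a multiple of 8 (0 if already aligned)."""
    return (8 - data_size % 8) % 8

def matrix_total_size(row_count, column_count, matrix_data_type):
    """Header, data, and padding for one matrix."""
    data_size = row_count * column_count * (matrix_data_type & 0xFF)
    return MATRIX_HEADER_SIZE + data_size + padding_bytes(data_size)

def frame_size(matrix_sizes):
    """Value of the FrameSize field for a frame holding the given matrices."""
    return FRAME_HEADER_COUNTED_SIZE + sum(matrix_sizes)

text_matrix = matrix_total_size(65, 1, 0x301)  # 65 bytes of UTF-8 text
total = frame_size([text_matrix])
```

The outer modulo in padding_bytes covers the case where the data size is already a multiple of 8, so no padding is added.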

