This section gives a brief overview of the SDIF format. The links from this section to more detailed descriptions below make this section also an overview or table of contents of this specification.
The SDIF format is a sequence of frames, similar to chunks in the IFF/AIFF/RIFF formats, but not strictly compatible. Every frame's length is a multiple of 8 bytes.
Every SDIF file or stream must begin with a small opening frame.
The body of the file or stream is a contiguous sequence of time-tagged frames, sorted in ascending temporal order, with multiple kinds of frames allowed in a single file or stream. A collection of standard frame and matrix types defines formats for storing common sound representations that are part of the SDIF standard.
SDIF has the notion of a "stream", a series of frames at different times that represent the same sonic object over time. The sequence of time-tagged frames is therefore a set of interleaved streams. (Note that there are two meanings of the word "stream" used with SDIF.)
A frame consists of the following:
The data in a frame are stored in 2D (or 1D or 0D) matrices of floating point numbers, with each column corresponding to a parameter like frequency or amplitude and each row representing an object like a filter, sinusoid, or noise band. A matrix consists of the following:
SDIF is an interchange format so portability is very important. Here is a list of the data types used in SDIF:
All frames within an SDIF file or stream have the following format:
FrameTypeID | char[4] | A unique 4-byte code indicating what kind of frame this is |
FrameDataSize | int32 | The size, in bytes, of the frame, not including the FrameTypeID or FrameDataSize. |
Data | anything, as long as the size is a multiple of 8 bytes. | The contents of the frame |
The point of having each frame's size be a multiple of 8 bytes is that it ensures that 8-byte data will be properly word-aligned on machines with 64-bit architectures.
All SDIF files must begin with an "opening frame" that identifies the file as SDIF.
FrameTypeID | char[4] | 'SDIF' |
OpeningFrameSize | int32 | 8 |
SDIFSpecVersion | int32 | Version number of the SDIF Spec used to create this file. SDIF as described in this document is version 3. |
SDIFStandardTypesVersion | int32 | Version number of the collection of standard frame and matrix types used to create this file. |
Note that the Opening Frame has a size field just like any other frame, so if we found more things to put here we could make room for them in a future revision of the SDIF spec. Programs reading SDIF data should pay attention to the OpeningFrameSize rather than assume that the opening frame will be 16 bytes.
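A minimal sketch of reading the opening frame while honoring OpeningFrameSize, as recommended above. The function name `read_opening_frame` is hypothetical, and the sketch assumes the header fields are big-endian, like the matrix data types listed later in this document.

```python
import struct

def read_opening_frame(buf):
    """Return (spec_version, types_version, body_offset) for an SDIF buffer."""
    # Assumption: header integers are big-endian, hence the ">" prefix.
    type_id, size = struct.unpack_from(">4si", buf, 0)
    if type_id != b"SDIF":
        raise ValueError("not an SDIF file or stream")
    if size < 8:
        raise ValueError("OpeningFrameSize too small for the two version fields")
    spec_version, types_version = struct.unpack_from(">ii", buf, 8)
    # Use the size field, not a fixed 16 bytes, to find where the body starts,
    # so that any fields added by a future revision are skipped.
    body_offset = 8 + size
    return spec_version, types_version, body_offset
```

A reader built this way keeps working if a later revision of the spec enlarges the opening frame.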
Why the opening frame's format is special.
The actual sound description data in an SDIF file are stored in a series of frames. A frame consists of a frame header followed by zero or more matrices. Most frames will consist of a single matrix.
Because a frame can contain any number of matrices, there is a possibility of "compound frames" containing multiple descriptions of the same sound at the same time.
Here's the format of the Frame Header:
FrameTypeID | char[4] | A unique 4-byte code indicating what kind of frame this is |
FrameSize | int32 | The size in bytes of the frame, not counting the FrameTypeID or FrameSize. |
Time | float64 | The time of the data frame. |
StreamID | int32 | The stream ID number. |
MatrixCount | int32 | The number of matrices in the frame |
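The frame header table above can be sketched as a fixed 24-byte record. The names `FrameHeader`, `FRAME_HEADER`, and `read_frame_header` are hypothetical, and big-endian byte order for the header fields is an assumption based on the matrix data types used elsewhere in SDIF.

```python
import struct
from collections import namedtuple

FrameHeader = namedtuple("FrameHeader", "type_id size time stream_id matrix_count")

# char[4], int32, float64, int32, int32 with no implicit padding (24 bytes).
# Assumption: all header fields are big-endian.
FRAME_HEADER = struct.Struct(">4sidii")

def read_frame_header(buf, offset=0):
    """Unpack one frame header starting at the given byte offset."""
    return FrameHeader(*FRAME_HEADER.unpack_from(buf, offset))
```

Because FrameSize excludes the FrameTypeID and FrameSize fields, the matrices of a frame occupy `size - 16` bytes after the 24-byte header.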
There is a collection of types of SDIF frames, with one frame type per sound representation standardized in SDIF, e.g., fundamental frequency estimates, discrete STFT results, sinusoidal tracks, etc. The standard SDIF frame types are defined centrally by the maintainers of the SDIF standard. Each frame type has a corresponding 4-byte ID to identify frames of that type. The definition of a standard SDIF frame type includes a list of the types of matrices that must always appear in a frame of that type. A frame may always contain additional matrices beyond those required by the standard.
The semantics of frame type IDs is simply that they're unique integers. However, we select frame type IDs for standard frame types based on treating the 4 bytes as characters, similar to the way the "type" and "creator" fields for Macintosh files work. We use only characters allowed as identifiers in XML.
Programs should ignore frame types that they do not recognize while processing frame types that they do understand.
Frame Type IDs beginning with lowercase "x" are reserved for experimental sound representations.
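One way a reader might ignore unrecognized frame types is to rely on the size field to step over them. The following is a sketch only: `iter_known_frames` is a hypothetical name, `offset` is assumed to point at the first frame after the opening frame, and header fields are assumed to be big-endian.

```python
import struct

def iter_known_frames(buf, offset, known_types):
    """Yield (type_id, payload) for recognized frames and skip all others."""
    while offset + 8 <= len(buf):
        # Assumption: big-endian FrameTypeID and FrameSize fields.
        type_id, size = struct.unpack_from(">4si", buf, offset)
        start = offset + 8          # payload begins after the ID and size
        end = start + size
        if size < 0 or end > len(buf):
            raise ValueError("truncated or corrupt frame")
        if type_id in known_types:
            yield type_id, buf[start:end]
        # Recognized or not, the size field tells us where the next frame is.
        offset = end
```

The payload yielded here still contains the Time, StreamID, and MatrixCount fields followed by the matrices.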
Stream IDs allow a set of frames at different times to be associated together into a single "stream", a continuous series of data. Frames with the same Stream ID belong to the same stream. Thus, a single SDIF file or stream can have multiple interleaved series of frame data of the same type.
All of the frames with a given Stream ID must be of the same frame type.
These 4-byte IDs do not carry any semantic information; they serve only to differentiate streams from each other. The optional Stream Information Matrix provides a way to document the meanings of the different streams.
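The stream rule above, together with the ascending time order required of the file body, can be checked in a single pass. This is a sketch with a hypothetical function name, `check_frames`. It assumes frame headers have already been parsed into `(type_id, time, stream_id)` tuples, and it assumes "ascending" permits equal time tags, for example frames from different streams at the same instant.

```python
def check_frames(frames):
    """Validate frames given as (type_id, time, stream_id) in file order.

    Returns a dict mapping each stream ID to its frame type.
    """
    stream_types = {}
    previous_time = None
    for type_id, time, stream_id in frames:
        # Assumption: equal time tags are allowed, decreasing ones are not.
        if previous_time is not None and time < previous_time:
            raise ValueError("frames are not in ascending time order")
        previous_time = time
        # The first frame seen for a stream fixes that stream's frame type.
        expected = stream_types.setdefault(stream_id, type_id)
        if expected != type_id:
            raise ValueError("stream %d mixes frame types" % stream_id)
    return stream_types
```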
The time tag is a 64-bit floating-point number that indicates the time to which the given frame applies. The units are seconds, as measured from a "time zero."
Most frame types are applicable to a single instant of time, e.g., the center point of a STFT window. This avoids issues of overlapping frames. It also implies that synthesizers for these frame types will have some sort of interpolation model to smoothly change data between frames.
A few frame types, e.g., time domain samples and envelope breakpoint functions, represent data that span a given interval of time; in these cases the frame's time tag indicates the beginning of the interval.
A frame's data consists of zero or more 2D matrices. The order of matrices within a frame does not matter. A frame may not contain more than one matrix with the same MatrixTypeID.
Each column corresponds to a parameter (or "slot") such as frequency or amplitude and each row represents an object such as a filter, sinusoid, or noise band. Therefore, in general, one-dimensional matrices should have one column and multiple rows. Elements of matrices appear in row-major order. The order of rows within a matrix may matter.
MatrixTypeID | char[4] | A unique 4-byte code indicating what kind of matrix this is. |
MatrixDataType | int32 | The type code of the matrix data. See the table below. |
RowCount | int32 | The number of rows in the matrix. |
ColumnCount | int32 | The number of columns in the matrix. |
MatrixData | Elements of the type given by MatrixDataType (e.g., float32, float64, int32, or int64) | The matrix data itself, in row-major order. |
OptionalBytePadding | byte[n] | Optional padding bytes to make the total size of the matrix be a multiple of 8 bytes |
The MatrixTypeID, MatrixDataType, RowCount, and ColumnCount fields comprise the "matrix header".
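As an illustration of the matrix header and the row-major layout, the sketch below reads one numeric matrix into a list of rows. The names `MATRIX_HEADER`, `ELEMENT_FORMATS`, and `read_matrix` are hypothetical. The type codes come from the MatrixDataType table in this document, text and arbitrary-byte matrices are left out for brevity, and big-endian header fields are an assumption.

```python
import struct

# char[4] plus three int32 fields: the 16-byte matrix header.
MATRIX_HEADER = struct.Struct(">4siii")

# MatrixDataType code -> struct format character for one element.
ELEMENT_FORMATS = {0x4: "f", 0x8: "d", 0x104: "i", 0x108: "q", 0x204: "I"}

def read_matrix(buf, offset=0):
    """Return (matrix_type_id, rows) where rows is a list of lists."""
    type_id, data_type, row_count, col_count = MATRIX_HEADER.unpack_from(buf, offset)
    code = ELEMENT_FORMATS[data_type]  # KeyError for types this sketch omits
    flat = struct.unpack_from(
        ">%d%s" % (row_count * col_count, code), buf, offset + MATRIX_HEADER.size
    )
    # Row-major order: all columns of row 0, then all columns of row 1, ...
    rows = [list(flat[r * col_count:(r + 1) * col_count]) for r in range(row_count)]
    return type_id, rows
```

With this layout, element (row r, column c) sits at flat index `r * ColumnCount + c`.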
The MatrixTypeID is just like a FrameTypeID, except that there is a namespace of Matrix Types distinct from the namespace of Frame Types.
Most frame types consist of a single matrix; in these cases the Matrix Type ID is the same as the Frame Type ID. Some frame types consist of a main matrix of data plus a few extra fields in a secondary 1D matrix, e.g., time-domain-sample frames must include the sampling rate as well as the actual sample values. In these cases the naming convention is for the info matrix's MatrixTypeID to begin with the character "I" (for "info"), and have the same 3 final characters as the FrameTypeID.
The definition of a standard matrix type includes a list of all the required columns for that matrix type, an optional list of optional columns for that matrix type, and an explanation of how to interpret these columns. It is illegal for an SDIF matrix of a given type to have fewer columns than the number of required columns for that matrix type. On the other hand, an SDIF matrix may always have more columns after the required columns; in this case, the leftmost columns are the required columns and the remaining columns contain optional data.
The optional columns in the matrix type definition specify the interpretation of the columns immediately after the required columns.
If there are additional columns in a matrix after the required and optional columns, their interpretation is application-specific.
For example, suppose a standard matrix type has required frequency and gain columns and an optional phase column. This table shows how to interpret matrices of this type with varying numbers of columns:
Number of columns | Interpretation according to this example |
0 | Illegal |
1 | Illegal |
2 | Frequency, Gain |
3 | Frequency, Gain, Phase |
4 | Frequency, Gain, Phase, and one extra column |
5 | Frequency, Gain, Phase, and two extra columns |
6 or more | Frequency, Gain, Phase, and n-3 extra columns |
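The interpretation rule in the table above can be expressed as a small function. This is a sketch: `interpret_columns` is a hypothetical name, and the column lists passed in are the ones from this example, not a real standard matrix type.

```python
def interpret_columns(column_count, required, optional):
    """Return (named_columns, extra_count) for a matrix with column_count columns."""
    if column_count < len(required):
        raise ValueError("fewer columns than the required columns: illegal")
    # Required columns come first, then optional ones in their defined order.
    named = (list(required) + list(optional))[:column_count]
    # Anything beyond required + optional is application-specific.
    extra_count = max(0, column_count - len(required) - len(optional))
    return named, extra_count
```

For instance, `interpret_columns(5, ["Frequency", "Gain"], ["Phase"])` yields the three named columns and an extra count of 2, matching the row for 5 columns.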
The MatrixDataType field represents the type of each data element in the matrix. All data elements in a matrix must have the same type. The definition of a standard matrix type includes a list of the allowable MatrixDataTypes for that matrix type.
Here is the table of MatrixDataTypes:
MatrixDataType | Type Name | Meaning | Num Bytes |
0x4 | float32 | 32-bit big-endian IEEE 754 float | 4 |
0x8 | float64 | 64-bit big-endian IEEE 754 float | 8 |
0x104 | int32 | 32-bit big-endian two's complement integer | 4 |
0x108 | int64 | 64-bit big-endian two's complement integer | 8 |
0x204 | uint32 | 32-bit big-endian unsigned integer | 4 |
0x301 | UTF8byte | Byte of UTF8-encoded text | 1 |
0x401 | byte | Arbitrary byte | 1 |
The low-order byte of the MatrixDataType encodes the number of bytes taken by each datum. This allows programs that see an unrecognized MatrixDataType to skip over the matrix.
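A sketch of how a reader might use the low-order byte to step over a matrix whose MatrixDataType it does not recognize. The function names are hypothetical, big-endian header fields are assumed, and the padding rule applied here is the one given later in this document (matrices are padded to a multiple of 8 bytes).

```python
import struct

def element_size(data_type):
    """Bytes per element, taken from the low-order byte of MatrixDataType."""
    return data_type & 0xFF

def next_matrix_offset(buf, offset):
    """Return the offset just past the matrix that starts at `offset`."""
    # Assumption: big-endian matrix header fields.
    _type_id, data_type, rows, cols = struct.unpack_from(">4siii", buf, offset)
    data_bytes = rows * cols * element_size(data_type)
    padding = (-data_bytes) % 8     # 0 when already a multiple of 8
    return offset + 16 + data_bytes + padding
```

Note that nothing in `next_matrix_offset` depends on knowing what the data type means, only on how wide each element is.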
The "Arbitrary byte" MatrixDataType allows any data to be embedded in an SDIF matrix, and should be used only as a last resort. Whenever possible, data represented in SDIF should use standard matrix and frame types. If no standard type will do the job, it is best to define a new matrix type where the rows and columns have some reasonable meaning. Only if a data object cannot be represented by a normal matrix is it appropriate to define an arbitrary binary format and use a matrix of "Arbitrary bytes."
SDIF matrices with text data use UTF-8 instead of ASCII. UTF-8 is a character encoding of Unicode that is backwards-compatible with ASCII: each byte with a high-order bit of zero is an ASCII character. Non-ASCII characters are represented by a variable number of bytes, each with a high-order bit of one. Thus, if a matrix of UTF-8 characters contains any non-ASCII characters, the number of characters will be fewer than the number of bytes.
UTF-8 strings must be null-terminated, and the null-termination byte must be counted as a matrix element in the row and column counts.
Note that the matrix header does not indicate the number of bytes in the matrix. The size of the MatrixData for a given matrix is:
RowCount * ColumnCount * (size of the MatrixDataType)
Since the low-order byte of the MatrixDataType encodes the number of bytes taken by each datum, the size of the matrix data can be computed from the matrix header.
If the size of the MatrixData is not a multiple of 8 bytes, there must be enough padding bytes to make the total matrix size a multiple of eight bytes: eight minus the remainder of the size of the MatrixData divided by 8.
These padding bytes are not counted anywhere in the matrix header, but they are counted in the FrameSize field of the frame header.
For example, suppose a frame contains a single matrix containing 65 bytes of UTF-8 text (including the null termination character). The matrix will have one column and 65 rows, with MatrixDataType 0x301. This makes the MatrixData 65 bytes long, so there are 7 padding bytes. The total size of the matrix is therefore 16 bytes (for the matrix header) plus 65 bytes (of data) plus 7 bytes (of padding) = 88 bytes. Therefore, the FrameSize will be 16 bytes (the size of the frame header, not including FrameTypeID or FrameSize) plus 88 bytes (the total size of the matrix) = 104 bytes.
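The worked example above can be reproduced with a short sketch that builds a one-column UTF-8 text matrix. The function name `encode_text_matrix` and the matrix type ID `xTXT` are hypothetical. Zero-valued padding bytes and big-endian header fields are assumptions, since this section does not specify either.

```python
import struct

def encode_text_matrix(matrix_type_id, text):
    """Encode text as a one-column UTF8byte (0x301) matrix with padding."""
    # The null terminator counts as a matrix element.
    data = text.encode("utf-8") + b"\x00"
    # RowCount is the number of bytes, not characters; ColumnCount is 1.
    header = struct.pack(">4siii", matrix_type_id, 0x301, len(data), 1)
    # Assumption: padding bytes are zero. They are not counted in the header.
    padding = b"\x00" * ((-len(data)) % 8)
    return header + data + padding

matrix = encode_text_matrix(b"xTXT", "a" * 64)   # 65 bytes of data
# Time (8) + StreamID (4) + MatrixCount (4) = 16, plus the padded matrix.
frame_size = 16 + len(matrix)
print(len(matrix), frame_size)                   # prints "88 104"
```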