Database of Challenging Musical Sounds for Evaluation and Refinement of Pitch Estimators

Adrian Freed and Tristan Jehan


Speech researchers have made the most thorough study of the performance of pitch estimation algorithms. A key to their work is the evaluation of algorithm performance against standardized databases of speech that have been "hand" analyzed. Such a database does not exist for musical signals. As a result, pitch estimation papers in the computer music community describe algorithms evaluated using short sound examples often chosen to show new work in the best light. It is thus impossible to predict performance of published algorithms in real musical situations, and difficult for researchers to identify fruitful areas for new work. We describe a publicly available database of musical sound files intended to redress these difficulties.

Musical Sound Database

Sounds in this database can be grouped into two important and hitherto poorly represented categories:

  1. Complete musical phrases are used to evaluate the impact of estimation errors in common and realistic musical contexts.
  2. Challenging examples are used to identify particular points of weakness from which an algorithm may suffer. Included are sounds with: pitch-synchronous and additive noise, room ambiance, cross-talk from adjacent strings, ambiguous octaves, inharmonicity, missing fundamentals, glissandi, vibrato and trills.
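As an illustration of how estimation errors on such material might be tallied, the following is a minimal sketch, not the authors' evaluation procedure. It assumes a per-frame reference pitch track is available for each sound. The function names, the 50-cent tolerance and the error categories are assumptions chosen for the example.

```python
import math

def cents(f_est, f_ref):
    """Signed deviation of f_est from f_ref in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_est / f_ref)

def score_track(estimates, references, tolerance_cents=50.0):
    """Compare per-frame pitch estimates (Hz) with reference values (Hz).

    None marks a frame with no pitch. The tolerance is an assumed value.
    Returns counts of correct frames, octave errors, other gross errors,
    and frames where only one of the two tracks reports a pitch.
    """
    counts = {"correct": 0, "octave": 0, "gross": 0, "voicing": 0}
    for est, ref in zip(estimates, references):
        if est is None and ref is None:
            counts["correct"] += 1
        elif est is None or ref is None:
            counts["voicing"] += 1
        else:
            dev = abs(cents(est, ref))
            # Distance from the nearest whole number of octaves.
            octave_residue = abs(dev - 1200.0 * round(dev / 1200.0))
            if dev <= tolerance_cents:
                counts["correct"] += 1
            elif octave_residue <= tolerance_cents:
                counts["octave"] += 1
            else:
                counts["gross"] += 1
    return counts
```

Counting octave errors separately from other gross errors reflects the "ambiguous octaves" category above, where an estimator may lock onto a pitch one or more octaves away from the reference.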


The database will be available online in early 1998. You will be able to submit your own files to this database by filling in a form at the site. This form represents a contract that establishes you as the owner of the rights to the submitted files and grants permission for their analysis and redistribution.

AIFF is the chosen format for sound file samples and SDIF for analyses of these files. The SDIF pitch frame type allows for a weighted set of pitches, which facilitates virtual pitches for inharmonic sounds and the management of multiple pitch estimates.
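The idea of a weighted set of pitches per analysis frame can be sketched as a small in-memory structure. This is only an illustration: it does not reproduce the SDIF byte layout or any SDIF library API, and the class and field names (PitchCandidate, PitchFrame, weight) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class PitchCandidate:
    frequency_hz: float
    weight: float  # relative salience; assumed non-negative

@dataclass
class PitchFrame:
    time_s: float
    candidates: list = field(default_factory=list)

    def normalized(self):
        """Return candidates with weights rescaled to sum to 1."""
        total = sum(c.weight for c in self.candidates)
        if total <= 0:
            return []
        return [PitchCandidate(c.frequency_hz, c.weight / total)
                for c in self.candidates]

    def best(self):
        """Return the highest-weighted candidate, or None if the frame is empty."""
        if not self.candidates:
            return None
        return max(self.candidates, key=lambda c: c.weight)

# An octave-ambiguous frame: two plausible pitches with different weights.
frame = PitchFrame(time_s=0.5, candidates=[
    PitchCandidate(110.0, 3.0),
    PitchCandidate(220.0, 1.0),
])
```

Keeping every candidate with its weight, rather than a single value, lets a later stage resolve octave ambiguity or report a virtual pitch for an inharmonic sound without discarding the alternatives.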

Database Overview


For these string sounds a wide range of playing techniques was used, including: open strings, low and high stopped, low and high frequency vibrato, narrow and wide trill, timbre change, sul ponticello, glissandi, tremolo near and away from bridge, pizzicato, pizzicato stopped, slow bow change, harmonics, damped harmonics, hammer-ons and pull-offs, picked, left and right hand damping, slaps, bottleneck slide and pops.



In parallel with the archival activity of assembling this database, we are exploring automatic segmentation and parameter estimation tools to develop analyses of the sounds against which algorithms may be judged. Early results using a wavelet technique are very promising. The wavelet method identifies each pitch period and provides a "voiced/unvoiced" estimate. Combining this with energy-based techniques yields good estimates for the pitched regions of a phrase. The estimator is robust to both impulsive and continuous noise.
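One way a voiced/unvoiced estimate might be combined with an energy measure is sketched below. The wavelet method itself is not reproduced here: the voiced flags are taken as given from some detector. The frame length, the -40 dB relative threshold and the function names are assumptions made for the example.

```python
def frame_energy(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def pitched_regions(voiced_flags, energies, rel_threshold_db=-40.0):
    """Return (first_frame, last_frame) pairs of contiguous pitched frames.

    A frame counts as pitched when it is flagged voiced and its energy is
    within rel_threshold_db of the loudest frame (threshold is assumed).
    """
    peak = max(energies, default=0.0)
    if peak <= 0.0:
        return []
    floor = peak * 10.0 ** (rel_threshold_db / 10.0)
    regions, start = [], None
    n = min(len(voiced_flags), len(energies))
    for i in range(n):
        keep = voiced_flags[i] and energies[i] >= floor
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, n - 1))
    return regions
```

Requiring both conditions means a frame that the detector marks as voiced but that is nearly silent, such as a decaying tail, is excluded from the pitched regions.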

Future Work


This work assembles materials developed over many years of work with support from: