The resynthesis of a phrase using this technique requires a considerable
number of parameters to be generated. Indeed, at a given time,
each partial requires at least 3 parameters: its frequency, its amplitude,
and its phase. So at a sampling rate of 44.1 kHz, controlling 1000 oscillators
for one second of sound requires 3*1000*44100 parameters to be generated.
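As a quick back-of-the-envelope check, here is a minimal sketch in Python;
the figures are simply the ones quoted above.

    # Parameter budget for sample-rate additive resynthesis,
    # using the illustrative figures from the text.
    sample_rate = 44_100       # control values per second if updated every sample
    num_partials = 1_000       # oscillators to drive
    params_per_partial = 3     # frequency, amplitude, phase

    params_per_second = params_per_partial * num_partials * sample_rate
    print(f"{params_per_second:,} parameters per second")   # 132,300,000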
Now we train a neural network on the results of the previous
analysis so that it learns the spectral behavior of the instrument. We
use SNNS, the Stuttgart Neural Network Simulator, for the training process.
Once the network is trained, we obtain a very compact model that takes
pitch and loudness functions as inputs, and generates the suitable spectral
parameters as outputs. More details on that procedure can be
found in our ICMC'98
publication.
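For illustration, here is a minimal sketch of that training stage in Python.
The original work used SNNS; scikit-learn's MLPRegressor stands in here, and
the file names, array shapes and network size are assumptions, not the actual
setup.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Hypothetical analysis data, one row per analysis frame:
    # inputs hold (pitch, loudness), targets hold the partial parameters.
    inputs  = np.load("suling_phrase1_pitch_loudness.npy")  # shape (frames, 2)
    targets = np.load("suling_phrase1_partials.npy")        # shape (frames, 3 * n_partials)

    # A small feed-forward network learns the mapping from the two
    # control functions to the full set of spectral parameters.
    model = MLPRegressor(hidden_layer_sizes=(40,), max_iter=2000)
    model.fit(inputs, targets)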
All we need now to generate the numerous synthesis parameters in real time
are the two functions, pitch and loudness. No more training
data is involved. When we feed the network the pitch and loudness envelopes
of the phrase it was trained on, we obtain a convincing neural network
approximation of the previously analysed phrase.
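Schematically, the resynthesis stage then reduces to feeding those two
envelopes back through the trained model. This continues the sketch above;
the file names and shapes are again assumptions.

    # Only the pitch and loudness envelopes are needed at this point.
    pitch    = np.load("suling_phrase1_pitch_envelope.npy")     # shape (frames,)
    loudness = np.load("suling_phrase1_loudness_envelope.npy")  # shape (frames,)

    # One prediction per control frame; each row holds the frequency,
    # amplitude and phase of every partial, ready to drive the
    # bank of sinusoidal oscillators.
    spectral_params = model.predict(np.column_stack([pitch, loudness]))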
A visual comparison illustrates how much spectral detail is captured
by the neural net.
But we are not interested in simply reproducing phrases that the
model was trained on (although this already represents a huge data
reduction). We want a model general enough to play any new envelopes
presented to it. In other words, we want to model the behavior of the
whole instrument.
Therefore we analysed a second
suling phrase. We then present the pitch and loudness contours of
this second phrase to the network that was trained on the
first phrase. Although the network was never trained on that particular
phrase, it produces a very good generalization, again in real time.