Building a voice model

The fundamental building block of our voice change technology is the Speaker Dependent Modivoice Model (SDMM). The model is built by observing speech data from media sources, and this proprietary model captures the essential character of the target speaker's voice. Alternatively, the voice recordings used for model building may be provided by a voice donor in a studio recording session.


Speech Cross-generation Application

The cross-generation system is what produces the voice change effect. It requires two models: SDMM-x for the input (operator) speaker and SDMM-y for the target speaker, and both are used for morphing. An illustration of how this works is shown below.
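To show why cross-generation needs both an input model and a target model, here is a minimal sketch built on a deliberately simple assumption: each speaker model is reduced to the per-dimension mean and standard deviation of that speaker's acoustic feature frames. This is a textbook mean-variance mapping, not the proprietary SDMM, and the names `SpeakerStats` and `convert` are hypothetical. The operator's model normalizes the input frames, and the target's model rescales them.

```python
import numpy as np


class SpeakerStats:
    """Hypothetical stand-in for an SDMM: per-feature mean and std of a speaker's frames."""

    def __init__(self, frames: np.ndarray):
        # frames has shape (num_frames, num_features), e.g. log-F0 or cepstral coefficients
        self.mean = frames.mean(axis=0)
        self.std = frames.std(axis=0) + 1e-8  # small offset avoids division by zero


def convert(frames: np.ndarray, source: SpeakerStats, target: SpeakerStats) -> np.ndarray:
    # Remove the operator's (source) statistics, then impose the target's statistics.
    z = (frames - source.mean) / source.std
    return z * target.std + target.mean


# Usage with synthetic features standing in for two speakers
rng = np.random.default_rng(0)
operator_frames = rng.normal(5.0, 2.0, size=(1000, 3))
target_frames = rng.normal(-1.0, 0.5, size=(1000, 3))

model_x = SpeakerStats(operator_frames)  # plays the role of SDMM-x (input speaker)
model_y = SpeakerStats(target_frames)    # plays the role of SDMM-y (target speaker)
converted = convert(operator_frames, model_x, model_y)
```

After conversion, the operator's frames take on the target speaker's feature statistics. A real voice conversion system models far more than first- and second-order statistics, but the sketch shows the same division of labour: one model describes where the input comes from, the other describes where it should go.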

Sound Demo

Voice change:

Operator speech (input):
Target speech (output):


Speech Re-generation Application

The re-generation system uses a single SDMM for the input speaker to restore recordings corrupted by unwanted distortions such as background noise or acoustic reverberation. This is not the primary application area for Modivoice, but we include it here to demonstrate the versatility of the SDMM technology.

Sound Demo

Speech signal degradation – Ambient noise:

Degraded input speech:
Clean output speech:

Speech signal degradation – Reverberation:

Degraded input speech:
Clean output speech: