We propose a novel application of acoustic-to-articulatory inversion (AAI) to the quality assessment of voice-converted speech. Humans speak effortlessly through the coordinated movement of various articulators and muscles. This movement contributes to naturalness, intelligibility, and speaker identity (which is partially present in voice-converted speech). Hence, during voice conversion (VC), information related to speech production is lost. In this paper, this loss is quantified for a male voice by showing an increase in root-mean-square error (RMSE) for voice-converted speech (up to 12.7% for the tongue tip), followed by a decrease in mutual information (I) (by 8.7%). Similar results are obtained for a female voice. This observation is extended by showing that articulatory features can be used as an objective measure. The effectiveness of the proposed measure over mel-cepstral distortion (MCD) is illustrated by comparing the correlation of each with the mean opinion score (MOS). Moreover, the preference score of MCD contradicted the ABX test by 100%, whereas the proposed measure supported the ABX test by 45.8% and 16.7% in the case of female-to-male and male-to-female VC, respectively.
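
The two standard quantities named in the abstract, RMSE over an articulatory trajectory and MCD over cepstral frames, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the frames are assumed to be time-aligned already, MCD uses the common convention of excluding the 0th (energy) coefficient, and the numbers in the usage lines are toy values, not results from the paper.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equal-length trajectories."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("trajectories must be non-empty and equal in length")
    squared = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared / len(reference))


def relative_increase_percent(baseline, value):
    """Percentage change of `value` with respect to `baseline`."""
    return 100.0 * (value - baseline) / baseline


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: the 0th coefficient of every frame is skipped.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("frame sequences must be non-empty and equal in length")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared = sum((a - b) ** 2 for a, b in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


# Toy usage: compare the inversion error on natural and converted speech.
measured = [0.0, 1.0, 2.0, 3.0]          # hypothetical measured trajectory
natural_est = [0.1, 1.1, 1.9, 3.1]       # hypothetical AAI estimate, natural speech
converted_est = [0.2, 1.2, 1.8, 3.2]     # hypothetical AAI estimate, converted speech
natural_rmse = rmse(measured, natural_est)
converted_rmse = rmse(measured, converted_est)
increase = relative_increase_percent(natural_rmse, converted_rmse)
```

A larger RMSE for converted speech than for natural speech, expressed as a relative increase, is the kind of comparison the abstract reports per articulator.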

Added on April 17, 2020


  • Contributed by : Consortium
  • Product Type : Research Paper
  • License Type : Freeware
  • System Requirement : Not Applicable
  • Author : Avni Rajpal, Nirmesh J. Shah, Mohammadi Zaki, Hemant A. Patil
