Support vector regression for tongue position inference

Alexander Sepúlveda



The articulatory inversion task consists of recovering the articulators' positions, or the vocal tract shape, from the acoustic speech signal. The availability of large corpora of parallel acoustic and articulatory data has made data-driven methods a viable alternative for solving the speech inversion problem. This paper presents a method for inferring tongue positions based on support vector regression techniques. The acoustic speech signal is parametrized with perceptual linear prediction (PLP) coefficients; then, a nonlinear transformation function is applied to the regressors. Model assessment is performed by measuring the similarity between the estimated and reference signals and by measuring the correlation between the inputs and the residuals. The proposed method shows promising results.
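The pipeline described above can be sketched in a few lines. The sketch below uses synthetic feature vectors as a stand-in for PLP coefficients and a hypothetical smooth target as a stand-in for measured tongue positions; the RBF kernel supplies the nonlinear transformation of the regressors, and the `C` and `epsilon` values are illustrative, not the tuned values of the paper. Both assessment measures from the abstract are computed at the end.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for PLP feature vectors (13 coefficients per frame);
# a real system would extract these from the acoustic speech signal.
n_frames, n_plp = 1200, 13
X = rng.normal(size=(n_frames, n_plp))

# Hypothetical "tongue position" trajectory: a smooth nonlinear function
# of the features plus a little noise (purely illustrative, not EMA data).
y = np.tanh(X[:, 0]) + 0.5 * np.sin(X[:, 1]) + 0.05 * rng.normal(size=n_frames)

# RBF-kernel support vector regression; hyperparameters are illustrative.
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X[:1000], y[:1000])
y_hat = model.predict(X[1000:])

# Assessment 1: similarity between the estimated and reference signals,
# measured here as their Pearson correlation.
r = np.corrcoef(y_hat, y[1000:])[0, 1]

# Assessment 2: correlation between each input coefficient and the
# residuals; values near zero suggest little information was left behind.
resid = y[1000:] - y_hat
input_resid_corr = np.array(
    [np.corrcoef(X[1000:, j], resid)[0, 1] for j in range(n_plp)]
)

print(f"estimate/reference correlation: {r:.3f}")
print(f"max |input-residual correlation|: {np.abs(input_resid_corr).max():.3f}")
```

On this synthetic data the estimate/reference correlation is high and the input-residual correlations stay small, mirroring the two evaluation criteria used in the paper.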







Copyright (c) 2014 TECCIENCIA