Phoneme Recognition System Using Articulatory-Type Information

Alberto Patiño Saucedo, Franklin Alexander Sepulveda Sepulveda, Diego Ferney Gómez Cajas

Abstract


This work is framed within the development of phoneme recognition systems and seeks to establish whether incorporating information about the movement of the articulators improves their performance. For this purpose, two systems are developed and compared, in both of which the acoustic model is obtained by training hidden Markov models. The first system represents the speech signal with Mel-frequency cepstral coefficients (MFCCs); the second uses the same cepstral coefficients together with articulatory parameters. The experiments were conducted on the MOCHA-TIMIT database. The results show a significant increase in the system's performance when articulatory parameters are added, compared with the system based on MFCCs alone.
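To illustrate the feature construction the abstract describes, the following is a minimal sketch (not the authors' implementation) of building the two feature sets compared in the paper: MFCCs alone versus MFCCs concatenated with articulatory trajectories. It assumes `librosa` for MFCC extraction, and that the MOCHA-TIMIT electromagnetic articulography (EMA) channels have already been loaded and resampled to the MFCC frame rate; file names and the EMA matrix are hypothetical.

```python
# Sketch: MFCCs vs. MFCCs + articulatory parameters for one utterance.
import numpy as np
import librosa

def mfcc_features(wav_path, n_mfcc=13, frame_ms=25, hop_ms=10):
    """13 MFCCs per 25 ms frame with a 10 ms hop (typical ASR settings)."""
    y, sr = librosa.load(wav_path, sr=16000)  # MOCHA-TIMIT audio is 16 kHz
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(sr * frame_ms / 1000),
        hop_length=int(sr * hop_ms / 1000),
    )
    return mfcc.T  # shape: (n_frames, n_mfcc)

def augment_with_articulatory(mfcc, ema):
    """Concatenate acoustic and articulatory vectors frame by frame."""
    n = min(len(mfcc), len(ema))           # guard against off-by-one frame counts
    return np.hstack([mfcc[:n], ema[:n]])  # shape: (n, n_mfcc + n_ema_channels)

# Hypothetical usage: both feature sets would then be fed to HMM training
# (e.g., with HTK, as cited by the authors).
# mfcc  = mfcc_features("fsew0_001.wav")
# ema   = np.load("fsew0_001_ema.npy")    # (n_frames, channels): tongue/lip/jaw x-y
# feats = augment_with_articulatory(mfcc, ema)
```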


Keywords


Mel-cepstrum coefficients; Hidden Markov Models; Articulatory parameters; Phoneme recognition

DOI: http://dx.doi.org/10.18180/tecciencia.2015.9.3



Copyright (c) 2015 TECCIENCIA