Keerthi Anand V D
Keywords: Speaker Recognition, Speech Signal, DWT, GMM Classifier


Speaker recognition plays an important role in biometric identification, where a person is identified from the information available in his or her speech signal. In any speaker recognition system, feature extraction using signal processing techniques is a key stage. This paper presents an efficient speaker recognition system that extracts energy features from speech signals using the Discrete Wavelet Transform (DWT). The extracted DWT energy features are then modeled with a Gaussian mixture model (GMM) classifier to recognize the speaker. Experimental results show the effectiveness of the system, which achieves a recognition accuracy of 96.31% using 4th-level DWT features with 16 Gaussian densities.
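The pipeline summarized above (DWT energy features per frame, one GMM per enrolled speaker, identification by highest likelihood) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract fixes only the 4th decomposition level and the 16 Gaussian densities, so the Haar wavelet, the use of log energies, the 256-sample frames with a 128-sample hop, diagonal covariances, the variance floor, and every function name (`haar_dwt`, `dwt_energy_features`, `extract_features`, `train_gmm`, `identify`) are hypothetical choices made for this sketch.

```python
import math
import random

# --- DWT energy features (Haar wavelet and log energies are assumptions) ---

def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT: returns [cD1, ..., cDL, cAL]."""
    approx = list(signal)
    bands = []
    s = 1.0 / math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) % 2:  # pad odd lengths by repeating the last sample
            approx.append(approx[-1])
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) * s for i in pairs]
        approx = [(approx[i] + approx[i + 1]) * s for i in pairs]
        bands.append(detail)
    bands.append(approx)
    return bands


def dwt_energy_features(frame, levels=4):
    """Log energy of each sub-band: levels + 1 values per frame."""
    return [math.log(sum(c * c for c in band) + 1e-10)
            for band in haar_dwt(frame, levels)]


def extract_features(signal, levels=4, size=256, hop=128):
    """Frame the signal (hypothetical 256/128 framing) and featurize each frame."""
    return [dwt_energy_features(signal[i:i + size], levels)
            for i in range(0, len(signal) - size + 1, hop)]


# --- Diagonal-covariance GMM trained with EM (one model per speaker) ---

def log_gauss_diag(x, mean, var):
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def train_gmm(data, k=16, iters=20, seed=0, var_floor=1e-3):
    """Fit k diagonal Gaussians to the feature vectors; needs len(data) >= k."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = [list(x) for x in rng.sample(data, k)]  # init means at random frames
    mu = [sum(x[j] for x in data) / n for j in range(d)]
    gvar = [max(sum((x[j] - mu[j]) ** 2 for x in data) / n, var_floor) for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each frame
        resp = []
        for x in data:
            logp = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                    for c in range(k)]
            total = logsumexp(logp)
            resp.append([math.exp(lp - total) for lp in logp])
        # M-step: re-estimate weights, means and floored variances
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-10
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc, var_floor)
                            for j in range(d)]
    return weights, means, variances


def avg_log_likelihood(model, data):
    weights, means, variances = model
    return sum(logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                          for w, m, v in zip(weights, means, variances)])
               for x in data) / len(data)


def identify(models, data):
    """Return the enrolled speaker whose GMM gives the highest average log-likelihood."""
    return max(models, key=lambda name: avg_log_likelihood(models[name], data))
```

To use it, train one model per enrolled speaker, e.g. `models = {name: train_gmm(extract_features(sig)) for name, sig in enrollment.items()}` (where `enrollment` is a hypothetical dict of training signals), then call `identify(models, extract_features(test_signal))`. Diagonal covariances keep each EM iteration cheap, and with 16 densities each speaker needs at least 16 training frames.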
