COMPARATIVE ANALYSIS OF THE EFFECTIVENESS OF NEURAL NETWORKS AT DIFFERENT VALUES OF THE SNR RATIO
DOI: https://doi.org/10.37943/20TTRV6747

Keywords: artificial neural networks, convolutional neural network, recurrent neural network, voice activity detector, signal-to-noise ratio

Abstract
This work presents a comparative analysis of the effectiveness of two neural network architectures, CNN and RNN, at different signal-to-noise ratios (SNR). The study showed that convolutional neural networks (CNN) are more effective in speech signal recognition tasks across SNR levels and languages. The CNN consistently outperformed the RNN under all conditions, particularly at low SNR. As the SNR increased, the accuracy gap between the CNN and the RNN narrowed, but the CNN remained ahead, which indicates its higher adaptability and ability to learn under varying levels of noise and interference. The advantage of the CNN is most pronounced at low SNR values, where the accuracy of the RNN degrades more sharply. At an SNR of 3 dB, recognition accuracy for the Kazakh language reached 80% with the CNN, whereas the RNN achieved about 75%. When the SNR was increased to 21 dB, the difference in accuracy between the CNN and the RNN decreased, but the CNN continued to lead, reaching 88% accuracy compared to 86% for the RNN. In addition, the results showed that the effectiveness of the CNN and the RNN depended on the language on which they were trained: networks trained on Kazakh speech showed the best results in recognizing Kazakh speech but also successfully recognized Russian. This highlights the importance of considering language features when developing and training neural networks to improve their performance in multilingual environments.
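The key experimental variable in the study is the SNR at which test signals are presented. As a minimal illustrative sketch (not the authors' code), the snippet below shows one common way to mix clean speech with background noise at a prescribed SNR in dB; the function name mix_at_snr and all variable names are assumptions introduced for illustration.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then return the noisy mixture (illustrative sketch, not the authors' code)."""
    # Tile or trim the noise to match the length of the speech segment.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR(dB) = 10 * log10(P_speech / P_noise)  ->  required noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)

    return speech + noise

# Hypothetical usage: build test mixtures at SNR levels such as those in the abstract.
# for snr_db in (3, 9, 15, 21):
#     noisy = mix_at_snr(clean_utterance, background_noise, snr_db)
```

Sweeping such mixtures over a range of SNR values allows the recognition accuracy of the CNN and RNN models to be compared under progressively noisier conditions, as reported in the abstract.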