COMPARATIVE ANALYSIS OF THE EFFECTIVENESS OF NEURAL NETWORKS AT DIFFERENT VALUES OF THE SNR RATIO
DOI: https://doi.org/10.37943/20TTRV6747

Keywords: artificial neural networks, convolutional neural network, recurrent neural network, voice activity detector, signal-to-noise ratio

Abstract
This work presents a comparative analysis of the effectiveness of two neural network architectures, CNN and RNN, at different signal-to-noise ratios (SNR). The study showed that convolutional neural networks (CNN) are more effective in speech signal recognition tasks across SNR levels and languages. The CNN consistently outperformed the RNN under all conditions, particularly at low SNR. As the SNR increased, the accuracy gap between the CNN and the RNN narrowed, but the CNN remained ahead, which indicates its higher adaptability and ability to learn under varying levels of noise and interference. The advantage of the CNN is most pronounced at low SNR values, where the accuracy of the RNN degrades more sharply. At an SNR of 3 dB, recognition accuracy for the Kazakh language reached 80% with the CNN, whereas the RNN achieved about 75%. When the SNR was increased to 21 dB, the difference in accuracy between the CNN and the RNN decreased, but the CNN continued to lead, reaching 88% accuracy compared to 86% for the RNN. In addition, the results showed that the effectiveness of the CNN and the RNN depended on the language on which they were trained: networks trained on Kazakh speech showed the best results in recognizing Kazakh speech but also successfully recognized Russian. This highlights the importance of considering language features when developing and training neural networks to improve their performance in multilingual environments.
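The key experimental variable in the study is the SNR at which test signals are presented. As a minimal illustrative sketch (not the authors' code), the snippet below shows one common way to mix clean speech with background noise at a prescribed SNR in dB; the function name mix_at_snr and all variable names are assumptions introduced for illustration.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then return the noisy mixture (illustrative sketch, not the authors' code)."""
    # Tile or trim the noise to match the length of the speech segment.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)

    # SNR(dB) = 10 * log10(P_speech / P_noise)  ->  required noise power.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    noise = noise * np.sqrt(target_noise_power / noise_power)

    return speech + noise

# Hypothetical usage: build test mixtures at SNR levels such as those in the abstract.
# for snr_db in (3, 9, 15, 21):
#     noisy = mix_at_snr(clean_utterance, background_noise, snr_db)
```

Sweeping such mixtures over a range of SNR values allows the recognition accuracy of the CNN and RNN models to be compared under progressively noisier conditions, as reported in the abstract.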