DEEP AND MACHINE LEARNING MODELS FOR RECOGNIZING STATIC AND DYNAMIC GESTURES OF THE KAZAKH ALPHABET
DOI: https://doi.org/10.37943/18JYLU4904
Keywords: Hand gesture recognition, neural networks, SVM, LSTM, CNN, MediaPipe.
Abstract
A growing body of research applies computer vision libraries and artificial intelligence tools to recognition tasks. For Kazakh Sign Language gestures, the most common approaches use machine learning and artificial neural network models trained with supervised learning, together with deep learning models that process sequential data. The object of this research is the Kazakh Sign Language alphabet, and the work aims to facilitate communication for people with disabilities. The subject of the research is machine learning methods, together with artificial neural network and deep learning models, for gesture classification and recognition. The research spans machine learning, deep learning, neural networks, and computer vision.
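The supervised setting described above can be illustrated with a linear support vector machine, one of the methods named in the keywords. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each static gesture has already been turned into a fixed-length feature vector (for instance, 21 hand landmarks × 3 coordinates = 63 values, the landmark count MediaPipe Hands reports). It then trains a one-vs-rest linear SVM with the Pegasos stochastic subgradient method on toy 4-dimensional data. The function names and the hyperparameters `lam` and `epochs` are hypothetical choices for illustration.

```python
import random


def train_pegasos(X, y, lam=0.01, epochs=200, seed=0):
    """Binary linear SVM (labels +1/-1) trained with the Pegasos stochastic
    subgradient method. A constant 1 is appended to each sample so the last
    weight acts as a bias (it is regularized too, a common simplification)."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # one shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            # Regularization shrink, then a hinge-loss subgradient step if the margin is violated
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def score(w, x):
    """Decision value w . [x, 1] of one binary model."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_ovr(X, labels, **kwargs):
    """One-vs-rest: train one binary SVM per gesture class."""
    return {
        cls: train_pegasos(X, [1 if lab == cls else -1 for lab in labels], **kwargs)
        for cls in sorted(set(labels))
    }


def predict_ovr(models, x):
    """Return the class whose binary SVM gives the highest decision value."""
    return max(models, key=lambda cls: score(models[cls], x))


if __name__ == "__main__":
    # Toy stand-in for gesture feature vectors: three well-separated clusters
    rng = random.Random(42)
    centers = {"A": [1, 0, 0, 0], "B": [0, 1, 0, 0], "C": [0, 0, 1, 0]}
    X, labels = [], []
    for cls, center in centers.items():
        for _ in range(20):
            X.append([v + rng.gauss(0, 0.1) for v in center])
            labels.append(cls)
    models = train_ovr(X, labels)
    acc = sum(predict_ovr(models, x) == lab for x, lab in zip(X, labels)) / len(X)
    print(f"training accuracy: {acc:.2f}")
```

One-vs-rest keeps each binary problem simple. A production system would more likely rely on a library implementation such as scikit-learn's `LinearSVC` or a kernel SVM.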
The main challenge lies in recognizing dynamic hand gestures. Of the 42 letters of the Kazakh Sign Language alphabet, 12 are dynamic. Capturing, processing, and recognizing gestures that involve motion is a highly complex task. Solving it calls for modern technologies and unconventional approaches that combine several recognition methods and algorithms into a hybrid neural network model for gesture recognition. Gesture recognition is a classification task, and classification is one of the main directions of pattern recognition, whose theory forms the fundamental basis of recognition. The paper discusses pattern recognition systems, the environments and application areas of these systems, and the requirements for their development and improvement. It presents tasks such as license plate recognition, facial recognition, and gesture recognition, and it addresses computer vision for image recognition, specifically the recognition of hand gestures. The software developed in this work will make it possible to test the effectiveness of the trained model, use it for laboratory purposes, and adjust it to improve its performance.
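For the 12 dynamic letters a single frame is not enough, and the keywords list LSTM, a recurrent network designed for sequential data. The sketch below is not the authors' model; it only shows how a clip of hand landmarks could flow through an LSTM. Each frame's 21 (x, y, z) landmarks, in the layout MediaPipe Hands returns (index 0 is the wrist), are made wrist-relative, scaled, and flattened to 63 values. An LSTM cell then consumes the frames in order, and a softmax layer over its final hidden state yields class probabilities. The weights are random and untrained, and the helper names (`normalize_frame`, `LSTMCell`, `classify_sequence`) and layer sizes are hypothetical.

```python
import math
import random


def normalize_frame(landmarks):
    """Make 21 (x, y, z) landmarks wrist-relative (index 0 = wrist in the
    MediaPipe Hands layout), scale so the farthest point is at distance 1,
    and flatten to a 63-value feature vector."""
    wx, wy, wz = landmarks[0]
    rel = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in rel) or 1.0
    return [coord / scale for point in rel for coord in point]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), candidate (g) and output (o) gates."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n = n_in + n_hidden
        # Small random weights and zero biases: untrained, for illustration only
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(n_hidden)]
                  for gate in "ifgo"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifgo"}
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = x + h  # concatenate the current input with the previous hidden state
        pre = {gate: [s + bj for s, bj in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifgo"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        g_cand = [math.tanh(v) for v in pre["g"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g_cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def classify_sequence(frames, cell, W_out):
    """Run the LSTM over a clip of landmark frames and return softmax
    probabilities computed from the final hidden state."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    for landmarks in frames:
        h, c = cell.step(normalize_frame(landmarks), h, c)
    logits = matvec(W_out, h)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(1)
    n_hidden, n_classes = 16, 12  # 12 = number of dynamic letters stated in the abstract
    cell = LSTMCell(63, n_hidden)
    W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_classes)]
    # Synthetic clip: 10 frames, each with 21 random (x, y, z) landmarks
    clip = [[(rng.random(), rng.random(), rng.random()) for _ in range(21)] for _ in range(10)]
    probs = classify_sequence(clip, cell, W_out)
    print(len(probs), round(sum(probs), 6))
```

Because the weights are untrained, the probabilities come out close to uniform. In practice the LSTM and output layer would be trained on labeled clips with a deep learning framework (for example PyTorch's `nn.LSTM`), and in a hybrid design a CNN, also listed in the keywords, could serve as a per-frame feature extractor.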
License
Copyright (c) 2024. Articles are open access under the Creative Commons License.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish a manuscript in this journal agree to the following terms:
- The authors retain authorship of their work and transfer to the journal the right of first publication under the terms of the Creative Commons Attribution License, which allows others to freely distribute the published work with a mandatory link to the original work and to its first publication in this journal.
- Authors have the right to conclude independent additional agreements on the non-exclusive distribution of the work in the form in which it was published by this journal (for example, to post the work in an institutional electronic repository or to publish it as part of a monograph), provided that they include a link to the first publication of the work in this journal.
- Other terms stated in the Copyright Agreement.