DEEP AND MACHINE LEARNING MODELS FOR RECOGNIZING STATIC AND DYNAMIC GESTURES OF THE KAZAKH ALPHABET

Samat Mukhanov; Raissa Uskenbayeva; Abdul Ahmad Rakhim; Im Cho Young; Aknur Yemberdiyeva; Zhansaya Bekaulova

doi:10.37943/18JYLU4904

Authors

Samat Mukhanov International Information Technology University, Kazakhstan https://orcid.org/0000-0001-8761-4272
Raissa Uskenbayeva Satbayev University, Kazakhstan https://orcid.org/0000-0002-8499-2101
Abdul Ahmad Rakhim Universiti Tenaga Nasional, Malaysia https://orcid.org/0000-0001-7923-0105
Im Cho Young Gachon University, Korea https://orcid.org/0000-0003-0184-7599
Aknur Yemberdiyeva International Information Technology University, Kazakhstan https://orcid.org/0009-0005-5078-2412
Zhansaya Bekaulova International Information Technology University, Kazakhstan https://orcid.org/0009-0000-9339-9222

Keywords:

Hand gesture recognition, neural networks, SVM, LSTM, CNN, MediaPipe.

Abstract

Research increasingly applies computer vision libraries and artificial intelligence tools to practical tasks. Among the most common approaches to recognizing gestures of the Kazakh sign language are machine learning and deep learning models of artificial neural networks trained with supervised learning, together with deep learning models for processing sequential data. The research object is the Kazakh sign language alphabet, with the aim of facilitating communication for people with disabilities. The research subject comprises machine learning methods and models of artificial neural networks and deep learning for gesture classification and recognition. The research areas are machine learning, deep learning, neural networks, and computer vision.
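To make supervised learning on hand keypoints concrete, the following minimal sketch shows one common way to turn a single camera frame into a fixed-length feature vector for a static-gesture classifier. It assumes the 21 (x, y, z) hand landmarks that MediaPipe Hands reports per detected hand, with index 0 as the wrist. The function name `landmarks_to_features` and the wrist-centred, max-distance scaling are illustrative choices, not the preprocessing used in this paper.

```python
import math
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # one (x, y, z) hand keypoint

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 keypoints per detected hand
WRIST = 0           # index 0 is the wrist in the MediaPipe hand model


def landmarks_to_features(points: Sequence[Landmark]) -> List[float]:
    """Turn one frame of hand landmarks into a translation- and scale-invariant vector.

    Hypothetical preprocessing for a static-gesture classifier (e.g. an SVM):
    coordinates are shifted so the wrist becomes the origin, then divided by the
    largest wrist-to-keypoint distance so hand size and camera distance cancel out.
    """
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")
    wx, wy, wz = points[WRIST]
    # Translate: express every keypoint relative to the wrist.
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in points]
    # Scale: the farthest keypoint from the wrist ends up at distance 1.
    scale = max(math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in shifted)
    if scale == 0.0:
        raise ValueError("degenerate hand: all landmarks coincide with the wrist")
    # Flatten to 21 * 3 = 63 values in landmark order.
    return [c / scale for p in shifted for c in p]
```

The resulting 63-value vector could be passed to any supervised classifier, for example an SVM. Normalizing first keeps the classifier from learning where the hand sits in the frame or how far it is from the camera.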

The main challenge lies in recognizing dynamic hand gestures. The Kazakh sign language alphabet has 42 letters, 12 of which are dynamic. Capturing, processing, and recognizing gestures in motion is a highly complex task. It calls for modern technologies and unconventional approaches that combine several recognition methods and algorithms into a hybrid neural network model for gesture recognition.

Gesture recognition is a classification task and therefore a branch of pattern recognition, whose theory is the fundamental basis of recognition. The paper discusses pattern recognition systems, their environment and application areas, and the requirements for their development and improvement. It presents tasks such as license plate recognition, facial recognition, and gesture recognition, and also addresses computer vision for image recognition, specifically of hand gestures. The software under development will make it possible to test the effectiveness of the trained model, apply it for laboratory purposes, and make adjustments that improve the model.
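The difference between static and dynamic letters can be illustrated with a hypothetical front end for a hybrid recognizer: it buffers a window of landmark frames and measures how much the hand moves, so a still hand can be sent to a single-frame classifier and a moving one to a sequence model such as an LSTM. This is a sketch under stated assumptions, not the architecture described in the paper. The class `GestureRouter`, the 30-frame window, and the 0.01 motion threshold are placeholders, and each frame is assumed to be a list of 21 (x, y, z) landmarks in normalized image coordinates.

```python
import math
from collections import deque
from typing import Deque, List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # one (x, y, z) hand keypoint

WINDOW = 30              # placeholder: frames buffered per decision (about 1 s at 30 fps)
MOTION_THRESHOLD = 0.01  # placeholder: mean per-frame landmark displacement treated as motion


def mean_motion(frames: Sequence[Sequence[Landmark]]) -> float:
    """Average Euclidean displacement of every landmark between consecutive frames."""
    if len(frames) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        for p, q in zip(prev, cur):
            total += math.dist(p, q)
            count += 1
    return total / count if count else 0.0


class GestureRouter:
    """Hypothetical hybrid front end that decides which model family sees the input."""

    def __init__(self, window: int = WINDOW, threshold: float = MOTION_THRESHOLD) -> None:
        self.window = window
        self.threshold = threshold
        # Oldest frames drop out automatically once the window is full.
        self.frames: Deque[List[Landmark]] = deque(maxlen=window)

    def push(self, frame: Sequence[Landmark]) -> str:
        """Add one frame; return 'buffering', 'static', or 'dynamic'."""
        self.frames.append(list(frame))
        if len(self.frames) < self.window:
            return "buffering"
        # A moving hand goes to the sequence model (e.g. an LSTM over the whole
        # window); a still hand goes to a single-frame classifier (e.g. SVM or CNN).
        if mean_motion(list(self.frames)) > self.threshold:
            return "dynamic"
        return "static"
```

A fixed motion threshold is only a heuristic, so in practice its value would need tuning on recorded gestures; the same buffered window is also the natural input for the sequence branch.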

Author Biographies

Samat Mukhanov, International Information Technology University, Kazakhstan

PhD, Senior Lecturer, Department of Computer Engineering

Raissa Uskenbayeva, Satbayev University, Kazakhstan

Doctor of Technical Sciences, Professor, Vice-Rector for Academic Affairs

Abdul Ahmad Rakhim, Universiti Tenaga Nasional, Malaysia

PhD, Professor, Department of Computing and Informatics

Im Cho Young, Gachon University, Korea

PhD, Professor, Faculty of Computer Engineering

Aknur Yemberdiyeva, International Information Technology University, Kazakhstan

Master of Technical Sciences, Lecturer, Department of Computer Engineering

Zhansaya Bekaulova, International Information Technology University, Kazakhstan

Master of Technical Sciences, Senior Lecturer, Department of Computer Engineering
