COMPARATIVE EFFECTIVENESS OF RULE-BASED AND MACHINE LEARNING METHODS IN SENTIMENT ANALYSIS OF KAZAKH LANGUAGE TEXTS

Mukhtar Amirkumar; Kamila Orynbekova; Assem Talasbek; Dauren Ayazbayev; Selcuk Cankurt

doi:10.37943/17RHPH9724

Authors

Mukhtar Amirkumar SDU University https://orcid.org/0009-0005-7714-0292
Kamila Orynbekova SDU University https://orcid.org/0000-0002-2182-2914
Assem Talasbek SDU University https://orcid.org/0000-0002-0944-1772
Dauren Ayazbayev SDU University https://orcid.org/0000-0001-9973-2145
Selcuk Cankurt Vistula University https://orcid.org/0000-0003-0581-1913

DOI:

https://doi.org/10.37943/17RHPH9724

Keywords:

sentiment analysis, machine learning, rule-based approach, Logistic Regression, Multinomial Naive Bayes

Abstract

Sentiment analysis is an increasingly important task in natural language processing (NLP), used to gauge public opinion across diverse sectors. This research compares rule-based and machine learning (ML) methods for sentiment analysis of the Kazakh language. Because Kazakh remains under-represented in computational linguistics, the study evaluates datasets drawn from news articles, literature, and Amazon product reviews to compare the efficiency, adaptability, and overall performance of the two approaches.

Using evaluation metrics that include accuracy, precision, recall, and computational efficiency, the study analyzes the strengths and limitations of rule-based techniques versus ML models such as Logistic Regression, Multinomial Naive Bayes, Decision Trees, Random Forest, and XGBoost. The findings suggest that rule-based methods excel at identifying nuanced emotional expressions in literary texts, while ML models are more adaptable and robust and are particularly effective at handling the linguistic variation found in news and reviews.

Despite these strengths, the study also reveals significant limitations of the rule-based approach, especially in contexts beyond literary analysis. This points to a need for future research to integrate sentiment dictionaries or domain-specific lexicons that cover a wider range of linguistic styles, which could broaden the applicability of sentiment analysis tools for Kazakh and similar less-studied languages.

This investigation contributes to the sentiment analysis literature, offering researchers and practitioners insight into the complexities of applying NLP technologies across diverse linguistic landscapes and advancing the methodology of sentiment analysis for the Kazakh language.
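
To make the comparison described in the abstract more concrete, the following is a minimal sketch of a lexicon-driven rule-based classifier together with the accuracy, precision, and recall metrics it would be scored on. It is not the authors' implementation: the tiny lexicon, the negation rule based on the particle "емес", the function names, and the sample sentences are all illustrative assumptions.

```python
# Hypothetical toy lexicon (assumption): a real system would rely on a
# curated Kazakh sentiment dictionary rather than five hand-picked words.
LEXICON = {"жақсы": 1, "керемет": 1, "ұнады": 1, "жаман": -1, "нашар": -1}
NEGATOR = "емес"  # assumed rule: flips the polarity of the word before it


def rule_based_label(text):
    """Return 1 (positive) if the summed word polarity is above zero, else 0."""
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        # Flip the polarity when the sentiment word is directly negated.
        if polarity and i + 1 < len(tokens) and tokens[i + 1] == NEGATOR:
            polarity = -polarity
        score += polarity
    return 1 if score > 0 else 0


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Illustrative labelled sentences (1 = positive, 0 = negative), not real data.
samples = [
    ("кітап өте жақсы", 1),
    ("қызмет нашар", 0),
    ("тауар жақсы емес", 0),
    ("фильм керемет", 1),
    ("маған ұнамады", 0),
    ("бәрі ойдағыдай", 1),  # positive, but none of its words are in the lexicon
]

y_true = [label for _, label in samples]
y_pred = [rule_based_label(text) for text, _ in samples]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

The last sample is misclassified because none of its words appear in the toy lexicon, which lowers recall while precision stays perfect. This mirrors, on a very small scale, the coverage limitation of rule-based methods that the abstract attributes to missing sentiment dictionaries; an ML model would be scored with the same `evaluate` function on its own predictions.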
