COMPARATIVE ANALYSIS OF MACHINE LEARNING ALGORITHMS TO IDENTIFY EXTREMIST TEXTS IN THE KAZAKH LANGUAGE

Shynar Mussiraliyeva; Milana Bolatbek; Aigerim Zhumakhanova; Zhanar Medetbek; Moldir Sagynay

doi:10.37943/14DKRN4681

Authors

Shynar Mussiraliyeva Al-Farabi Kazakh National University https://orcid.org/0000-0001-5794-3649
Milana Bolatbek Al-Farabi Kazakh National University https://orcid.org/0000-0002-2153-180X
Aigerim Zhumakhanova Al-Farabi Kazakh National University https://orcid.org/0009-0008-0210-4037
Zhanar Medetbek Al-Farabi Kazakh National University https://orcid.org/0000-0001-7536-5889
Moldir Sagynay Al-Farabi Kazakh National University https://orcid.org/0009-0004-1377-5742

DOI:

https://doi.org/10.37943/14DKRN4681

Keywords:

machine learning model, classification, extremist text.

Abstract

The article explores models and methods for classifying text content in order to identify destructive information in social networks. The study applies machine learning techniques, namely support vector machines, naive Bayes classifiers, random tree methods, decision trees, the k-Nearest Neighbors algorithm, logistic regression, and gradient boosting, to identify extremist texts. The research findings demonstrate the effectiveness of these methods for this identification task.

The article also offers an overview of existing research, methods, and software products for the analysis of extremist texts. It highlights the significance of case-based learning, deductive learning models, and automated data collection and analysis techniques, which contribute to the overall understanding and detection of extremist content.

The article further discusses the relevance and future prospects of the presented research. It emphasizes the need to expand the corpus of documents studied, enabling a more comprehensive analysis of texts, including those in photo, audio, and video formats. The development of complex models for recognizing hidden extremist propaganda is also identified as a key direction for future work.

By addressing these areas of focus, the research presented in the article aims to advance the field of identifying and combating extremist content within social networks. The incorporation of advanced techniques and technologies is crucial to effectively detect and address the presence of such content in various forms and formats.

Author Biographies

Shynar Mussiraliyeva, Al-Farabi Kazakh National University

Candidate of Physical and Mathematical Sciences, Head of the department “Information systems”

Milana Bolatbek, Al-Farabi Kazakh National University

PhD, Senior Lecturer of the department “Information systems”

Aigerim Zhumakhanova, Al-Farabi Kazakh National University

Master of Technical Sciences, Lecturer of the department “Information systems”

Zhanar Medetbek, Al-Farabi Kazakh National University

Master of Military Affairs and Security, Lecturer of the department “Information systems”

Moldir Sagynay, Al-Farabi Kazakh National University

Master of Technical Sciences, Lecturer of the department “Information systems”
