DETECTION OF HATE SPEECH ON SOCIAL MEDIA UTILIZING MACHINE LEARNING

Authors

DOI:

https://doi.org/10.37943/22SKSG8575

Keywords:

hate speech, machine learning, natural language processing, detection, social media

Abstract

This article investigates the identification of hate speech on social media using machine learning and deep learning techniques. The research uses metrics such as F-measure, AUC-ROC, precision, accuracy, and recall to assess the effectiveness of the various approaches. The findings indicate that deep learning models, particularly the bidirectional long short-term memory (BiLSTM) architecture, consistently outperform other methods on the classification task. The research emphasizes the importance of sophisticated neural network designs in capturing the nuances of hostile and offensive content online. The study offers insights for promoting early identification and prevention of cyberbullying, fostering safer and more inclusive online environments. Future research may explore real-time detection systems, hybrid approaches, or the integration of complementary components to further improve the technology for tackling this significant social issue.
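
As a minimal illustration of how such evaluation metrics are typically computed, the following Python sketch uses scikit-learn on placeholder labels and probabilities for the three classes considered in the study (hate speech, offensive language, neutral); it is not the authors' evaluation code, and the class encoding is an assumption.

# Hedged sketch: computing accuracy, precision, recall, F1-measure, and
# AUC-ROC with scikit-learn. Labels and probabilities are illustrative
# placeholders, not data from the study.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Assumed encoding: 0 = neutral, 1 = offensive language, 2 = hate speech
y_true = [2, 0, 1, 1, 0, 2, 0, 1]
y_pred = [2, 0, 1, 0, 0, 2, 0, 1]
# Per-class probability estimates, e.g. from a classifier's predict_proba()
y_proba = [
    [0.05, 0.10, 0.85], [0.80, 0.15, 0.05], [0.10, 0.75, 0.15],
    [0.55, 0.35, 0.10], [0.90, 0.05, 0.05], [0.10, 0.10, 0.80],
    [0.70, 0.20, 0.10], [0.15, 0.70, 0.15],
]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
print("AUC-ROC  :", roc_auc_score(y_true, y_proba, multi_class="ovr"))  # one-vs-rest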

A sample of tweets was annotated by specialists, who categorized each tweet as hate speech, offensive language, or neutral. The researchers applied shallow learning methodologies and integrated word embeddings such as Word2Vec and GloVe to enhance the efficacy of the deep learning models. The results indicate that BiLSTM surpasses shallow learning methods in detecting hate speech on Twitter, highlighting the efficacy of deep learning approaches in recognizing and tracking hate speech on social media platforms. When different deep learning and machine learning models are compared across datasets, the results reveal that deep learning techniques are usually more effective. Among the classical algorithms, KNN and SVM achieve reasonably high accuracy, whereas Naïve Bayes performs the poorest. While deep learning approaches provide better results, tree-based models such as Random Forest and Decision Trees offer more consistent accuracy. Neural network models such as LSTM, CNN, and BiLSTM perform well, with LSTM-based methods excelling in particular. The model presented is the most successful strategy for the classification task, achieving the highest accuracy, precision, recall, and F1-score, each at 95%. The research aids in the development of advanced tools and methodologies to mitigate hate speech on social media and foster positive online interactions. Future research may investigate alternative deep learning architectures, such as transformers, to enhance hate speech detection efficacy. The advancement of interpretable AI methodologies for identifying hate speech and delivering transparent predictions could enhance user confidence and support better content moderation decisions.
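
A minimal sketch of a BiLSTM tweet classifier of the kind described above, written with the Keras API; the vocabulary size, sequence length, and other hyperparameters are illustrative assumptions rather than values reported in the study.

# Hedged sketch of a BiLSTM classifier for three-way tweet labelling
# (hate speech / offensive language / neutral). Hyperparameters are
# assumed for illustration, not taken from the paper.
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed tokenizer vocabulary size
MAX_LEN = 50         # assumed padded tweet length
EMBED_DIM = 100      # common Word2Vec/GloVe dimensionality
NUM_CLASSES = 3      # hate speech, offensive language, neutral

model = models.Sequential([
    # The embedding layer could instead be initialized with pre-trained
    # Word2Vec or GloVe vectors, as the abstract describes.
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random integer-encoded sequences stand in for tokenized, padded tweets.
x_train = np.random.randint(1, VOCAB_SIZE, size=(256, MAX_LEN))
y_train = np.random.randint(0, NUM_CLASSES, size=(256,))
model.fit(x_train, y_train, epochs=1, batch_size=32, verbose=0)

In such a setup, initializing the embedding layer with pre-trained Word2Vec or GloVe vectors, as the abstract reports, typically improves performance over embeddings learned from scratch on a small annotated corpus.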

Author Biographies

Aziza Zhidebayeva, Academician A.Kuatbekov Peoples' Friendship University, Kazakhstan

Candidate of Technical Sciences, Senior Lecturer, Department of Computer Science and Mathematics

Sabira Akhmetova, Mukhtar Auezov South Kazakhstan University, Kazakhstan

Candidate of Physical and Mathematical Sciences, Associate Professor, Department of Information Systems

Satmyrza Mamikov, Academician A.Kuatbekov Peoples' Friendship University, Kazakhstan

Candidate of Pedagogical Sciences, Associate Professor, Department of Computer Science and Mathematics

Mukhtar Kerimbekov, Academician A.Kuatbekov Peoples' Friendship University, Kazakhstan

Candidate of Pedagogical Sciences, Associate Professor, Department of Computer Science and Mathematics

Sapargali Aldeshov, Ozbekali Zhanibekov South Kazakhstan Pedagogical University, Kazakhstan

Candidate of Pedagogical Sciences, Associate Professor, Department of Computer Science and Mathematics

Guldana Shaimerdenova, Mukhtar Auezov South Kazakhstan University, Kazakhstan

PhD, Associate Professor, Department of Information Communication Technologies

References

Alsubait, T., & Alfageh, D. (2021). Comparison of machine learning techniques for cyberbullying detection on YouTube Arabic comments. International Journal of Computer Science & Network Security, 21(1), 1-5. https://doi.org/10.22937/IJCSNS.2021.21.1.1

Dewani, A., Memon, M. A., & Bhatti, S. (2021). Cyberbullying detection: advanced preprocessing techniques & deep learning architecture for Roman Urdu data. Journal of Big Data, 8(1), 160. https://doi.org/10.1186/s40537-021-00550-7

Hall, D. L., Silva, Y. N., Wheeler, B., Cheng, L., & Baumel, K. (2022). Harnessing the power of interdisciplinary research with psychology-informed cyberbullying detection models. International Journal of Bullying Prevention, 4(1), 47-54. https://doi.org/10.1007/s42380-021-00107-5

Arce-Ruelas, K. I., Alvarez-Xochihua, O., Pellegrin, L., Cardoza-Avendaño, L., & González-Fraga, J. Á. (2022). Automatic cyberbullying detection: A Mexican case in high school and Higher Education Students. IEEE Latin America Transactions, 20(5), 770-779. https://doi.org/10.1109/TLA.2022.9693561

Ahmed, M. T., Rahman, M., Nur, S., Islam, A. Z. M. T., & Das, D. (2021). Natural language processing and machine learning based cyberbullying detection for Bangla and Romanized Bangla texts. TELKOMNIKA (Telecommunication Computing Electronics and Control), 20(1), 89-97. http://doi.org/10.12928/telkomnika.v20i1.18630

Toktarova, A., Sultan, D., & Azhibekova, Z. (2024, May). Review of Machine Learning Models in Cyberbullying Detection Problem. In 2024 IEEE 4th International Conference on Smart Information Systems and Technologies (SIST) (pp. 233-238). IEEE. http://doi.org/10.1109/SIST61555.2024.10629223

Al-Marghilani, A. (2022). Artificial intelligence-enabled cyberbullying-free online social networks in smart cities. International Journal of Computational Intelligence Systems, 15(1), 9. https://doi.org/10.1007/s44196-022-00063-y

Theng, C. P., Othman, N. F., Abdullah, R. S., Anawar, S., Ayop, Z., & Ramli, S. N. (2021). Cyberbullying detection in Twitter using sentiment analysis. International Journal of Computer Science & Network Security, 21(11), 1-10. https://doi.org/10.22937/IJCSNS.2021.21.11.1

Sadiq, S., Mehmood, A., Ullah, S., Ahmad, M., Choi, G. S., & On, B. W. (2021). Aggression detection through deep neural model on Twitter. Future Generation Computer Systems, 114, 120-129. https://doi.org/10.1016/j.future.2020.07.050

Sarac Essiz, E., & Oturakci, M. (2021). Artificial bee colony–based feature selection algorithm for cyberbullying. The Computer Journal, 64(3), 305-313. https://doi.org/10.1093/comjnl/bxaa066

Gomez, C. E., Sztainberg, M. O., & Trana, R. E. (2022). Curating cyberbullying datasets: A human-AI collaborative approach. International Journal of Bullying Prevention, 4(1), 35-46. https://doi.org/10.1007/s42380-021-00114-6

Salawu, S., Lumsden, J., & He, Y. (2022). A mobile-based system for preventing online abuse and cyberbullying. International Journal of Bullying Prevention, 4(1), 66-88. https://doi.org/10.1007/s42380-021-00115-5

Mladenović, M., Ošmjanski, V., & Stanković, S. V. (2021). Cyber-aggression, cyberbullying, and cyber-grooming: A survey and research challenges. ACM Computing Surveys (CSUR), 54(1), 1-42. https://doi.org/10.1145/3424246

Sangwan, S. R., & Bhatia, M. P. S. (2021). Denigrate comment detection in low-resource Hindi language using attention-based residual networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 21(1), 1-14. https://doi.org/10.1145/3431729

Yan, R., Li, Y., Li, D., Wang, Y., Zhu, Y., & Wu, W. (2021). A stochastic algorithm based on reverse sampling technique to fight against the cyberbullying. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(4), 1-22. https://doi.org/10.1145/3441455

Yin, C. J., Ayop, Z., Anawar, S., Othman, N. F., & Zainudin, N. M. (2021). Slangs and short forms of Malay Twitter sentiment analysis using supervised machine learning. International Journal of Computer Science & Network Security, 21(11), 294-300. https://doi.org/10.22937/IJCSNS.2021.21.11.40

Jacobs, G., Van Hee, C., & Hoste, V. (2022). Automatic classification of participant roles in cyberbullying: Can we detect victims, bullies, and bystanders in social media text?. Natural Language Engineering, 28(2), 141-166. https://doi.org/10.1017/S135132492000056X

Kumari, K., Singh, J. P., Dwivedi, Y. K., & Rana, N. P. (2021). Multi-modal aggression identification using convolutional neural network and binary particle swarm optimization. Future Generation Computer Systems, 118, 187-197. https://doi.org/10.1016/j.future.2021.01.014

Abbas, A. M. (2021). Social network analysis using deep learning: applications and schemes. Social Network Analysis and Mining, 11(1), 106. https://doi.org/10.1007/s13278-021-00799-z

Toktarova, A., Beissenova, G., Kozhabekova, P., Makhanova, Z., Tulegenova, B., Rakhymbek, N., ... & Azhibekova, Z. (2021). Automatic offensive language detection in online user generated contents. Journal of Theoretical and Applied Information Technology, 99(9), 2054-2067. https://www.elibrary.ru/item.asp?id=46818459

Published

2025-06-30

How to Cite

Zhidebayeva, A., Akhmetova, S., Mamikov, S., Kerimbekov, M., Aldeshov, S., & Shaimerdenova, G. (2025). DETECTION OF HATE SPEECH ON SOCIAL MEDIA UTILIZING MACHINE LEARNING. Scientific Journal of Astana IT University, 22, 71–87. https://doi.org/10.37943/22SKSG8575

Issue

Section

Information Technologies