data preprocessing, natural language processing, thematic clustering, research abstracts


In today's academic environment, the rapid growth of research publications calls for advanced methods to organize and understand extensive collections of academic work. This study aims to systematically categorize a large set of research paper abstracts from Kazakhstani institutions, focusing on identifying key themes and potential opportunities for interdisciplinary collaboration. The dataset includes 13,356 abstracts from the Scopus database, covering a wide range of academic fields. The methodology goes beyond traditional manual analysis by using advanced text analysis tools to organize the text data efficiently. This initial preprocessing phase is crucial for summarizing each abstract's core content. The next steps of the analysis use the organized data to find and group similar thematic areas, taking into account the complex and multi-dimensional nature of academic research topics. The results reveal a diverse array of research themes, highlighting the dynamic academic contributions from Kazakhstan. Prominent clusters include environmental science, technological advancements, linguistics, and cultural studies. These insights provide an overview of current research directions and highlight the potential for cross-disciplinary partnerships. The findings also have important implications for decision-makers, scholars, and educational institutions, because they illuminate key research areas and collaborative possibilities. This thematic overview can guide the shaping of research policies, the fostering of academic connections, and the efficient distribution of resources within the scholarly community. Ultimately, this study adds to the academic conversation by offering a way to navigate and use the wealth of information in scientific literature, promoting a more collaborative and integrated research environment.
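The abstract does not name the specific tools used, so the following is only a minimal sketch of what a "preprocess, then group by theme" workflow can look like. It assumes, purely for illustration, TF-IDF weighting with a smoothed IDF and a cosine-based (spherical) k-means. The stopword list, the toy abstracts, and all function names are hypothetical and are not taken from the study.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"a", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {
            t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors (cosine similarity for unit vectors)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, k, iterations=10):
    """Cluster unit vectors by cosine similarity (spherical k-means).

    Seeding is deterministic: start from the first vector, then repeatedly
    add the vector least similar to the centroids chosen so far.
    """
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the normalised sum of its members.
        for j in range(k):
            total = {}
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


# Made-up toy abstracts standing in for real Scopus records.
toy_abstracts = [
    "Water pollution and climate change in the Aral Sea region",
    "Climate change effects on water resources and soil pollution",
    "Kazakh language morphology and syntax in modern linguistics",
    "Syntax and morphology of Turkic language families in linguistics",
]
labels = kmeans(tfidf_vectors(toy_abstracts), k=2)
print(labels)
```

On a real corpus of thousands of abstracts, the number of clusters would have to be chosen empirically, and a library implementation would normally replace these hand-written helpers.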






How to Cite

Biloshchytskyi, A., Shamgunova, M., & Biloshchytska, S. (2024). EXPLORATION OF THE THEMATIC CLUSTERING AND COLLABORATION OPPORTUNITIES IN KAZAKHSTANI RESEARCH. Scientific Journal of Astana IT University, 17(17), 106–121. https://doi.org/10.37943/17ALVR8114



Information Technologies