scientometrics, search for scientific partners, scientific collaboration, clustering of publications, n-gram analysis, determination of research directions


The article addresses the problem of clustering scientists' publications by finding similarities in their abstracts and texts through n-gram analysis and cross-references, and the related task of identifying potential project groups for research and educational projects from the clustering results. In world practice, scientific partners are usually selected without a comprehensive assessment of their activities, and most well-known indexes for evaluating scientists' research activities do not take full account of citation information. The methods developed in this study for evaluating the scientific activities of scientists and universities, and for selecting scientific partners for educational and scientific projects on a scientific basis, make it possible to organize the work of universities effectively. A probabilistic topic model is constructed that clusters scientists' publications by scientific field while taking the citation network into account, which is an important step toward identifying subject scientific spaces. Constructing the model solves the problem of citation graph clustering becoming increasingly unstable as the number of clusters decreases. The main objective of the work is to select suitable partners for collaboration in scientific and educational projects. To this end, a method for choosing project executors has been developed that uses fuzzy logical inference to harmonize expert opinions on candidate requirements, which supports multi-criteria selection of potential partners. Software modules have also been created for the automated collection of information on scientists' publications and citations from international scientometric databases, together with a visualization module and a user interface that supports evaluation of the scientific activities of university teaching staff. Choosing partners for grants or strategic collaborations remains a pertinent issue, especially in a globalized and highly mobile scientific community. The approach described here clusters the scientific publications of potential project partners, performs comparative citation analysis of these publications, and establishes their proximity through n-gram analysis of abstracts. These methods provide a scientific basis for informed partner selection, which is crucial for initiating and advancing research projects. The selection of partners for research project teams is therefore a pressing task.
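The idea of n-gram proximity between abstracts can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so the Jaccard overlap of word bigrams below is an assumption made for illustration, and the function names `word_ngrams` and `ngram_similarity` and the sample texts are hypothetical.

```python
import re


def word_ngrams(text, n=2):
    """Return the set of word n-grams found in a lower-cased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(text_a, text_b, n=2):
    """Jaccard overlap of two n-gram sets, a value in [0, 1].

    Assumption: Jaccard is used here only as a simple stand-in for
    whatever proximity measure the article actually defines.
    """
    a, b = word_ngrams(text_a, n), word_ngrams(text_b, n)
    if not a and not b:
        return 0.0  # no n-grams in either text, nothing to compare
    return len(a & b) / len(a | b)


# Hypothetical abstracts, not taken from any real publication.
abstract_1 = "clustering of scientific publications"
abstract_2 = "clustering of scientific partners"
score = ngram_similarity(abstract_1, abstract_2)
```

Pairwise scores of this kind could serve as one input to a clustering step, alongside the citation links between the same publications.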




How to Cite

Biloshchytskyi, A., Kuchansky, O., Mukhatayev, A., Biloshchytska, S., Andrashko, Y., Toxanov, S., & Faizullin, A. (2023). CLUSTERING OF SCIENTISTS’ PUBLICATIONS, CONSIDERING FINDING SIMILARITIES IN ABSTRACTS AND TEXTS OF PUBLICATIONS BASED ON N-GRAM ANALYSIS AND IDENTIFYING POTENTIAL PROJECT GROUPS. Scientific Journal of Astana IT University, 16(16).