Scientific Journal of Astana IT University

PHYSICALLY BASED EVALUATION OF SNOWPACK SENSITIVITY TO TEMPERATURE PERTURBATIONS IN EAST KAZAKHSTAN

Mon, 30 Mar 2026 00:00:00 +0500

Seasonal snowpack is a critical regulator of water supply and flood risk in continental climates, yet its reliable assessment in Central Asia is constrained by sparse observations. This study applies the multilayer Snow Thermal Model, driven by ERA5-Land reanalysis, to simulate snowpack evolution in East Kazakhstan during the 2022-2023 season and evaluates its performance against snow depth and snow water equivalent observations from the Kazhydromet network.

The model reproduced snow accumulation, peak storage, and melt onset with high accuracy, with explained variance above 90%. Analysis of energy fluxes and stratigraphy further revealed that more than half of the simulated meltwater was produced while air temperatures were below freezing. This indicates that snowmelt is controlled primarily by a positive surface energy balance, dominated by net radiation and turbulent heat fluxes, rather than by air temperature alone.
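To make the energy-balance argument concrete, the sketch below converts a positive surface energy balance into melt. It assumes the snow surface is already at 0 °C and that all surplus energy goes into melting. The function name and flux values are illustrative and are not taken from the Snow Thermal Model. Air temperature enters only indirectly, through the sign of the sensible heat flux, which is why melt can occur under subfreezing air.

```python
# Physical constants
LATENT_HEAT_FUSION = 334_000.0  # J/kg, latent heat of fusion of ice
RHO_WATER = 1000.0              # kg/m^3, density of liquid water


def melt_mm_per_hour(net_radiation, sensible_heat, latent_heat,
                     ground_heat=0.0, cold_content_change=0.0):
    """Melt in mm water equivalent per hour from a surface energy balance.

    All fluxes are in W/m^2 and positive toward the snow surface.
    Assumes the surface layer is already at 0 °C; energy spent on warming
    the pack (cold_content_change) is not available for melt.
    """
    surplus = (net_radiation + sensible_heat + latent_heat
               + ground_heat - cold_content_change)
    if surplus <= 0.0:
        return 0.0  # no net energy input, so no melt
    joules_per_m2 = surplus * 3600.0                   # one hour of surplus flux
    metres_water = joules_per_m2 / (RHO_WATER * LATENT_HEAT_FUSION)
    return metres_water * 1000.0                       # m -> mm


# Sunny day with air at about -3 °C: sensible and latent fluxes are negative
# (air colder and drier than the surface), but net radiation dominates.
print(round(melt_mm_per_hour(150.0, -20.0, -30.0), 2))  # 1.08
```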

Perturbation experiments further highlight the disproportionate sensitivity of the snow regime to modest thermal changes. A uniform +2 °C warming reduced peak snow water equivalent by nearly one third and advanced melt onset by two to three weeks, while a −1 °C cooling increased snow storage and prolonged snow duration. These threshold-driven responses show that even small climatic deviations or biases in forcing data can shift runoff timing and seasonal water availability. For water managers, this implies that operational planning must explicitly account for temperature sensitivity, since minor departures from average conditions can trigger substantial changes in spring flood risk.
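One threshold behind such nonlinear responses is the rain/snow partition of precipitation. The sketch below applies a uniform temperature offset to a forcing series and recomputes snowfall with a single phase threshold. The 1 °C threshold and the six-day series are hypothetical. The multilayer model also contains further thresholds, such as melt onset at 0 °C, which this sketch ignores.

```python
def perturb(temps_c, delta_c):
    """Apply a uniform offset (°C) to every value of an air-temperature series."""
    return [t + delta_c for t in temps_c]


def snowfall_total(temps_c, precip_mm, threshold_c=1.0):
    """Sum the precipitation that falls as snow (air temperature below threshold)."""
    return sum(p for t, p in zip(temps_c, precip_mm) if t < threshold_c)


# Hypothetical daily forcing: air temperature (°C) and precipitation (mm w.e.)
temps = [-8.0, -3.0, -0.5, 0.5, 1.5, -1.5]
precip = [4.0, 6.0, 5.0, 3.0, 2.0, 8.0]

for delta in (-1.0, 0.0, 2.0):
    print(delta, snowfall_total(perturb(temps, delta), precip))
```

With these values, the +2 °C offset turns two snow days into rain days, so snowfall drops from 26 mm to 18 mm. The −1 °C offset adds one snow day, raising snowfall to 28 mm.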

Overall, the study demonstrates that reanalysis-driven, physically based snow modeling provides robust diagnostics in data-scarce regions, surpassing empirical methods in both accuracy and explanatory power. The findings establish the value of this approach for climate sensitivity analysis, flood preparedness, and water resource planning in snow-dominated basins.

BUILDING A DEEP SEARCH CRAWLER FOR THE KAZAKH LANGUAGE: A REPRODUCIBLE WEB-SCALE PIPELINE

Mon, 30 Mar 2026 00:00:00 +0500

We present a reproducible, web-scale pipeline for building a Kazakh-language corpus from the national e-government portal. The system treats the website as a directed graph and performs breadth-first traversal to preserve section hierarchy. Static acquisition relies on robust HTTP requests and HTML parsing; for pages with dynamic widgets, we selectively enable a headless browser layer to render the final DOM prior to extraction. We define a minimal JSON schema aligned with downstream NLP needs (URL, category, titles, cleaned descriptions) and implement normalization (Unicode NFC/NFKC, transliteration repair for Kazakh, boilerplate removal) and fragment-level deduplication. To strengthen the scientific contribution, we formalize the crawling–extraction process as an optimization under resource constraints and propose field-level quality metrics (precision, recall, F1), category coverage, and completeness gains attributable to headless rendering. Our experimental protocol compares static parsing against a hybrid static+headless setup on multiple portal categories, reports field-wise effectiveness with confidence intervals, and analyzes dominant error sources (DOM drift, client-side rendering, code-switching). Ablation studies quantify the impact of normalization and deduplication. We also outline ethical access practices (robots.txt compliance, throttling, conditional requests) and provide artifacts to ensure reproducibility (versioned scripts, schema validators, logging). We release open-source scripts, detailed runbooks, and a small, labeled benchmark to facilitate fair comparisons and independent replication across institutions. The resulting corpus targets low-resource Kazakh NLP and e-government analytics, supporting tasks such as classification, terminology normalization, named-entity recognition, and LLM adaptation.
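The breadth-first traversal, normalization, and fragment-level deduplication described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' released scripts. `get_links` stands in for a hypothetical fetch-and-parse step, which could use static HTTP or headless rendering. Only NFC normalization and whitespace cleanup are shown; transliteration repair and boilerplate removal are omitted. SHA-256 over the case-folded text is an assumed deduplication key.

```python
import hashlib
import re
import unicodedata
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=1000):
    """Breadth-first traversal of a site graph.

    get_links(url) returns the outgoing URLs of a page (hypothetical
    fetch/parse step). Each visited page is recorded with its depth and
    parent, so the section hierarchy can be rebuilt afterwards.
    """
    visited = {start_url}
    queue = deque([(start_url, 0, None)])
    order = []
    while queue and len(order) < max_pages:
        url, depth, parent = queue.popleft()
        order.append((url, depth, parent))
        for link in get_links(url):
            if link not in visited:          # enqueue each URL only once
                visited.add(link)
                queue.append((link, depth + 1, url))
    return order


def normalize_text(text):
    """Canonical Unicode composition (NFC) plus whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def dedup_fragments(fragments):
    """Keep the first occurrence of each fragment after normalization."""
    seen, unique = set(), []
    for fragment in fragments:
        key = hashlib.sha256(
            normalize_text(fragment).casefold().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(fragment)
    return unique


# Toy site graph standing in for real pages
site = {"/": ["/services", "/news"],
        "/services": ["/services/tax", "/"],
        "/news": [],
        "/services/tax": []}
print(bfs_crawl("/", lambda url: site.get(url, [])))
print(dedup_fragments(["Салық  қызметі", "салық қызметі", "Жаңалықтар"]))
```

Recording `(url, depth, parent)` rather than only URLs is what lets the breadth-first order double as a record of the portal's section tree.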
Overall, the proposed pipeline demonstrates that selective headless rendering combined with rigorous normalization is a practical and effective strategy for high-quality data acquisition in dynamically rendered public portals.
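The field-level precision, recall, and F1 mentioned in this abstract can be computed for each schema field against the labeled benchmark. The sketch below assumes that a prediction is correct only on an exact string match after whitespace stripping. The paper's matching rule may differ, for example by using normalized or fuzzy matching, and the record layout is hypothetical.

```python
def field_prf(predicted, gold, field):
    """Precision, recall and F1 for one schema field over a labeled page set.

    predicted, gold: dicts mapping URL -> record dict. A prediction counts
    as correct only if it is non-empty and exactly equals the gold value.
    """
    tp = fp = fn = 0
    for url, gold_record in gold.items():
        g = (gold_record.get(field) or "").strip()
        p = (predicted.get(url, {}).get(field) or "").strip()
        if p and p == g:
            tp += 1
        else:
            if p:
                fp += 1  # something was extracted, but it is wrong
            if g:
                fn += 1  # the gold value was not recovered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {"a": {"title": "Tax services"}, "b": {"title": "Licences"}, "c": {"title": "News"}}
pred = {"a": {"title": "Tax services"}, "b": {"title": "Menu"}}
print(field_prf(pred, gold, "title"))
```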

IMPROVING THE EFFICIENCY OF PIPELINE LEAK DETECTION SYSTEMS USING NEURAL NETWORKS

Mon, 30 Mar 2026 00:00:00 +0500

Pipeline systems for oil and gas transportation are complex distributed infrastructure facilities whose efficient and safe operation largely depends on the application of modern information and communication technologies. In the context of industrial digital transformation, intelligent monitoring systems capable of continuous acquisition, transmission, and analysis of telemetry data for the early detection of emergency conditions have become particularly relevant. One of the most critical challenges is the timely detection and accurate localization of leaks, which can result in significant economic losses, environmental damage, and threats to public safety. The objective of this study is to develop an approach for determining the coordinates of a pipeline leak based on intelligent processing of measurement data using machine learning methods. The proposed solution is intended for integration into information and communication systems for supervisory control and digital monitoring of pipeline transport. A two-layer multilayer perceptron implemented in the MATLAB environment is employed as the data analysis tool, enabling the development of a computationally efficient algorithm suitable for practical use in decision-support systems operating in near-real-time conditions. The neural network was trained on experimental datasets generated for various leak locations and flow rate values of the transported medium and was tested on independent datasets. The influence of the number of neurons in the hidden layer on leak localization accuracy was investigated. Maximum and root mean square localization errors were used as performance metrics. The results demonstrate that increasing model complexity by raising the number of neurons beyond 3–4 does not lead to a significant improvement in accuracy and may be accompanied by overfitting, thereby reducing the reliability of the algorithm when processing new data. 
It was found that the optimal neural network architecture comprises three neurons in the hidden layer, providing a root mean square error of approximately 2 km and a maximum error not exceeding 5.5 km. The obtained results confirm the effectiveness of neural network methods for intelligent analysis of telemetry information and demonstrate the feasibility of developing scalable information and communication systems for early leak detection. The practical significance of this work lies in improving the accuracy of accident localization and reducing pipeline operation risks through the implementation of intelligent data processing algorithms within digital industrial monitoring platforms.
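The two performance metrics named above can be written down directly. The same applies to the idea of stopping at the smallest hidden layer that performs about as well as the best one. This is a sketch, not the study's MATLAB code. The tolerance rule in `smallest_adequate_size` is a hypothetical selection criterion. It is included only to illustrate why 3 neurons can be preferred over larger layers with similar error.

```python
import math


def localization_errors(predicted_km, true_km):
    """Return (RMSE, maximum absolute error) of leak-position estimates, in km."""
    if len(predicted_km) != len(true_km) or not true_km:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - t for p, t in zip(predicted_km, true_km)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, max(abs(e) for e in errors)


def smallest_adequate_size(test_rmse_by_size, tolerance_km=0.1):
    """Smallest hidden-layer size whose RMSE on independent test data is
    within tolerance_km of the best size (hypothetical selection rule)."""
    best = min(test_rmse_by_size.values())
    return min(n for n, rmse in test_rmse_by_size.items()
               if rmse <= best + tolerance_km)


print(localization_errors([10.0, 52.0, 98.5], [12.0, 50.0, 100.0]))
```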

DECENTRALIZED IDENTITY AND ACCESS MANAGEMENT IN INTERNET OF THINGS SYSTEMS BASED ON BLOCKCHAIN

Mon, 30 Mar 2026 00:00:00 +0500

The exponential proliferation of Internet of Things (IoT) devices presents critical challenges to traditional centralized identity and access management systems, which are plagued by issues of scalability, single points of failure, and significant privacy risks. While blockchain technology offers a promising decentralized alternative, its direct application is often hindered by low transaction throughput, high costs, and the computational limitations of IoT devices. This study addresses these challenges by proposing and formally evaluating HybID-AC, a novel hybrid architecture for decentralized identity and access management tailored for large-scale, heterogeneous IoT ecosystems. The methodology involves a dual-layer design that separates global trust anchoring from local execution. A highly scalable, feeless Directed Acyclic Graph (DAG) based distributed ledger serves as a public "anchor layer" for registering W3C standard Decentralized Identifiers (DIDs) and access policy hashes. All high-frequency access control operations are processed off-chain at the "edge layer" using the DIDComm v2 peer-to-peer protocol, Attribute-Based Access Control (ABAC) for fine-grained policy enforcement, and Zero-Knowledge Proofs (ZKP) to ensure privacy-preserving attribute verification. The results of our analytical evaluation demonstrate that the HybID-AC architecture achieves orders-of-magnitude improvements in latency and cost-efficiency compared to fully on-chain models, maintaining consistent performance as the network scales. Furthermore, we introduce an original probabilistic model that provides a quantitative metric for assessing the integral security risk of ABAC policies against attribute compromise. The study concludes that this hybrid approach effectively resolves the inherent trade-offs of blockchain in an IoT context, offering a robust, scalable, and interoperable framework that empowers devices with self-sovereign identity while ensuring security and privacy by design.
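The split between anchoring and local enforcement can be illustrated as follows. Only a hash of the access policy is anchored, and an edge node refuses to enforce any local policy copy that no longer matches that hash. The sketch also shows one simple way to quantify attribute-compromise risk for a conjunctive ABAC rule, under an independence assumption. It is not the HybID-AC implementation or the paper's probabilistic model. The policy layout is hypothetical, and DID resolution, DIDComm messaging, and ZKP verification are omitted.

```python
import hashlib
import json
import math


def policy_hash(policy):
    """SHA-256 of a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def evaluate(policy, attributes, anchored_hash):
    """Grant access only if the local policy matches its anchored hash and
    the subject presents every required attribute value."""
    if policy_hash(policy) != anchored_hash:
        return False  # policy copy was altered or is out of date
    return all(attributes.get(name) == value
               for name, value in policy["require"].items())


def conjunctive_compromise_probability(p_compromised):
    """Probability that an attacker holds all attributes of an AND-policy,
    assuming each attribute is compromised independently."""
    return math.prod(p_compromised)


policy = {"resource": "valve-7",
          "require": {"role": "operator", "site": "plant-A"}}
anchor = policy_hash(policy)  # the value that would be registered on the ledger
print(evaluate(policy, {"role": "operator", "site": "plant-A"}, anchor))  # True
print(evaluate(policy, {"role": "guest", "site": "plant-A"}, anchor))     # False
```

Canonical JSON matters here: two semantically identical policies must produce the same hash regardless of key order, or legitimate copies would be rejected.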

COMPARATIVE ANALYSIS OF DEEP LEARNING MODELS FOR CHEST DISEASE DIAGNOSIS USING NIH X-RAY DATASET

Dinara Kaibassova, Kalizhan Akhmetov — Mon, 30 Mar 2026 00:00:00 +0500

The integration of deep learning in medical image analysis has significantly advanced computer-aided diagnosis, particularly in chest radiography. However, selecting an optimal convolutional neural network (CNN) architecture for reliable disease classification remains a critical challenge due to data variability, annotation quality, and architectural trade-offs. This study presents a comparative evaluation of three CNN models (DenseNet121, ResNet50, and a custom SimpleCNN) for automated detection of pulmonary infiltrations using a subset of the NIH Chest X-ray dataset. To ensure computational feasibility, only one archive segment was used, and preprocessing included filtering, normalization, and resizing images to 224×224 pixels. Models were trained using cross-entropy loss with the Adam optimizer for five epochs and evaluated on a 20% test split. Performance was assessed using several diagnostic metrics essential in medical imaging (accuracy, precision, recall, F1-score, and AUC-ROC) to provide a more complete picture than overall accuracy alone. The ResNet50 model achieved the highest test accuracy and the most balanced trade-off between precision and recall, outperforming DenseNet121 and SimpleCNN. Although overall performance remained moderate, the findings confirm that pre-trained deep architectures generalize more effectively than shallow networks under limited data conditions. The study underscores the impact of dataset size, image resolution, and label quality on diagnostic outcomes. These results form a methodological baseline for further research, where improvements are expected through training on the complete dataset, using full-resolution images, and refining model hyperparameters. Ultimately, this comparative framework contributes to identifying optimal CNN architectures for future clinical diagnostic support systems.
Additionally, this study highlights the limitations of small-scale datasets and emphasizes the importance of data augmentation and extended training strategies for improving model performance in medical imaging tasks.
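The five diagnostic metrics used in this study can be computed from binary labels and predicted probabilities as sketched below. AUC-ROC is obtained from its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative. This is an illustrative pure-Python version with an assumed 0.5 decision threshold. Libraries such as scikit-learn provide equivalent, faster implementations.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and ROC AUC for 0/1 labels and scores."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: share of positive/negative pairs ranked correctly (ties count 1/2)
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}


print(binary_metrics([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))
```

The example shows why accuracy alone can mislead. Accuracy and AUC are both 0.75, but recall is only 0.5 because one infiltration case falls below the threshold.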

INTELLIGENT DRONE ROUTE PLANNING IN URBAN ENVIRONMENTS: OPTIMIZATION AND SAFETY

Manara Seksembayeva, Zhaksylyk Amangaliyev, Miras Anuarbekov — Mon, 30 Mar 2026 00:00:00 +0500

This study develops and evaluates an intelligent route-planning model for unmanned aerial vehicles operating in urban environments with the objective of minimizing total flight time while maintaining flight safety. The proposed approach addresses a practical last-mile delivery scenario in a smart-city context and is implemented using Google OR-Tools within a constrained Traveling Salesman Problem framework.

Open geospatial data from OpenStreetMap are used to obtain building geometries, street topology, and available height-related attributes, while meteorological data from OpenWeatherMap are used to account for wind conditions in flight-time estimation. Safe cruising altitude is determined from the maximum surrounding obstacle height with an additional safety margin. If explicit building-height data are unavailable, height is approximated from the number of building levels.
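The altitude rule described above can be sketched as follows. It takes the maximum surrounding obstacle height plus a safety margin, and estimates height from the number of levels when no explicit height is available. The OpenStreetMap keys `height` and `building:levels` are real tags. However, the 3 m per level and the 20 m margin are placeholder values, not the parameters used in the study.

```python
def building_height_m(tags, metres_per_level=3.0):
    """Height of one OSM building in metres, or None if it cannot be estimated.

    Uses the 'height' tag when it parses as a number (an optional trailing
    'm' is accepted), otherwise 'building:levels' times metres_per_level.
    """
    raw = tags.get("height")
    if raw is not None:
        try:
            return float(str(raw).strip().removesuffix("m").strip())
        except ValueError:
            pass  # unparseable height: fall back to the number of levels
    levels = tags.get("building:levels")
    if levels is not None:
        try:
            return float(levels) * metres_per_level
        except ValueError:
            pass
    return None


def safe_cruise_altitude_m(buildings, margin_m=20.0):
    """Highest estimated obstacle along the route plus a safety margin."""
    heights = [h for h in (building_height_m(t) for t in buildings) if h is not None]
    return max(heights, default=0.0) + margin_m


nearby = [{"height": "45 m"}, {"building:levels": "16"}, {"building": "yes"}]
print(safe_cruise_altitude_m(nearby))  # 68.0
```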

A prototype system was developed to visualize routes, estimate route distance, flight time, and delivery cost, and export missions in MAVLink/QGroundControl-compatible format. Experimental scenarios for real delivery points in the Astana metropolitan area, including Astana, Kosshy, and Koyandy, demonstrate the practical feasibility of the proposed open-data-driven routing approach and show improvements over a baseline sequential routing strategy.
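To show how wind changes flight time and why visiting order matters, the sketch below computes leg times from the wind triangle. It then compares a baseline sequential order with the best order found by exhaustive search. Exhaustive search is used here only because it is short and exact for a handful of stops. It stands in for the OR-Tools solver used in the study and does not reproduce it. Planar coordinates in metres, a constant wind, and the example points are assumptions.

```python
import itertools
import math


def leg_time_s(a, b, airspeed, wind):
    """Flight time (s) from a to b (planar metres) in a constant wind (m/s).

    The drone crabs into the crosswind, so its ground speed along the track
    is the along-track wind plus sqrt(airspeed^2 - crosswind^2).
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    ux, uy = dx / dist, dy / dist
    along = wind[0] * ux + wind[1] * uy
    cross = wind[0] * uy - wind[1] * ux
    if abs(cross) >= airspeed:
        raise ValueError("crosswind exceeds airspeed")
    ground_speed = along + math.sqrt(airspeed ** 2 - cross ** 2)
    if ground_speed <= 0.0:
        raise ValueError("headwind exceeds airspeed")
    return dist / ground_speed


def tour_time_s(order, points, airspeed, wind):
    """Total time for depot (index 0) -> stops in `order` -> depot."""
    route = [0, *order, 0]
    return sum(leg_time_s(points[i], points[j], airspeed, wind)
               for i, j in zip(route, route[1:]))


def best_order(points, airspeed, wind):
    """Exact shortest-time visiting order by trying all permutations."""
    stops = range(1, len(points))
    return min(itertools.permutations(stops),
               key=lambda order: tour_time_s(order, points, airspeed, wind))


points = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]  # depot + 3 drop-offs
wind = (3.0, 0.0)                                       # 3 m/s blowing toward +x
baseline = tour_time_s((1, 2, 3), points, 12.0, wind)
best = best_order(points, 12.0, wind)
print(best, round(baseline), round(tour_time_s(best, points, 12.0, wind)))
```

Because a tailwind and a headwind do not cancel out, the time matrix is asymmetric. This is one reason a solver that works on times rather than distances is needed.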

The scientific contribution of the study lies in the integration of open geospatial building data, weather-aware flight-time estimation, and safety-oriented altitude selection into a reproducible urban UAV route-planning framework. The proposed method can support the development of safe and cost-efficient drone delivery services in dense urban environments.

Experimental scenarios with increasing numbers of delivery points showed that the proposed method remains computationally efficient enough for practical smart-city logistics applications, with sub-second solver runtime in the tested cases.

DETECTING AND RANKING URBAN BOTTLENECKS FROM PASSIVE SPEED AGGREGATES

Bakbergen Mendaliyev, Didar Yedilkhan, Aidarbek Shalakhmetov — Mon, 30 Mar 2026 00:00:00 +0500

Urban traffic congestion remains a persistent problem, yet many cities still lack dense sensor networks, calibrated simulation models, or detailed origin-destination data for operational bottleneck monitoring. This study develops a lightweight framework for detecting and ranking urban bottlenecks using passive probe-based speed aggregates alone. For each road segment, a free-flow benchmark is estimated from high night-time speeds. Hourly median speeds are then converted into travel times, and cumulative delay is normalized by segment length and mildly regularized to reduce instability on short urban links. The framework is applied to a 20-day probe-data sample for Astana containing 6.34 million link-hour observations across 22,333 segments. After quality checks and coverage filtering, 8,634 segments remain in the final analysis set.
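The delay computation described above can be sketched for a single segment as follows. The 00:00 to 05:00 night window, the 85th percentile, and the 0.2 km constant added to the length are hypothetical choices. They stand in for the paper's benchmark definition and its "mild regularization". The function names are illustrative.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def delay_index(hourly_speeds, length_km, night_hours=range(0, 5),
                q=85.0, shrink_km=0.2):
    """Length-normalized cumulative delay (minutes per km) for one segment.

    hourly_speeds: (hour_of_day, median_speed_kmh) pairs.
    Free-flow speed = q-th percentile of night-time medians; delay is the
    excess of each hour's travel time over the free-flow travel time.
    """
    night = [v for h, v in hourly_speeds if h in night_hours and v > 0]
    v_free = percentile(night, q)
    tt_free_h = length_km / v_free
    delay_h = sum(max(0.0, length_km / v - tt_free_h)
                  for _, v in hourly_speeds if v > 0)
    # Adding a small constant to the length damps extreme values on very short links
    return delay_h * 60.0 / (length_km + shrink_km)


obs = [(h, 60.0) for h in range(5)] + [(8, 30.0), (17, 20.0)]
print(round(delay_index(obs, 1.0), 3))  # 2.5
```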

The main results are as follows:

The estimated free-flow benchmark aligns closely with posted speed limits and remains stable under alternative percentile and night-window definitions.
Congestion is strongly concentrated: 1,336 segments account for about 50% of the total delay, while 3,868 segments account for 80%, indicating a pronounced Pareto-type structure.
The ranking remains robust after excluding days affected by major external disturbances, which suggests that the main bottleneck pattern is not driven by a small number of atypical days.
Spatial diagnostics reveal significant positive autocorrelation in congestion severity, and the clustering pattern becomes stronger when road connectivity is represented with a network-based weight matrix rather than a purely geometric nearest-neighbour specification.
Local cluster analysis identifies corridor cores of severe delay together with adjacent transition links, showing that the most critical bottlenecks are spatially connected rather than randomly scattered.
Clustering of normalized 24-hour delay profiles reveals three evening-oriented regimes that differ mainly in congestion intensity.
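Two of the diagnostics summarized above can be sketched as follows: delay concentration and global spatial autocorrelation (Moran's I). The `weights` dictionary can encode either a geometric nearest-neighbour structure or road-network adjacency, which is the distinction drawn in the results. The toy values are hypothetical, and no significance test, such as a permutation test, is included.

```python
def segments_for_share(delays, share):
    """Smallest number of segments whose delays add up to `share` of the total."""
    total = sum(delays)
    cumulative = 0.0
    for n, d in enumerate(sorted(delays, reverse=True), start=1):
        cumulative += d
        if cumulative >= share * total:
            return n
    return len(delays)


def morans_i(values, weights):
    """Global Moran's I; weights maps (i, j) neighbour pairs (i != j) to w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(weights.values())
    numerator = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    denominator = sum(d * d for d in dev)
    return (n / s0) * numerator / denominator


# Four consecutive links on one corridor, adjacency in both directions
chain = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0, (2, 3): 1.0, (3, 2): 1.0}
print(segments_for_share([50, 30, 10, 5, 5], 0.5))
print(round(morans_i([9.0, 8.0, 1.0, 0.5], chain), 3))
```

A positive I means that severe links tend to neighbour other severe links, as on a congested corridor. A negative I would mean that severe and mild links alternate.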

Taken together, these findings show that routinely collected passive probe data can recover meaningful and operationally useful congestion structure even when a city lacks dense fixed-sensor coverage or a calibrated simulation model. The proposed workflow is transparent, reproducible, and suitable for corridor prioritization, before-and-after evaluation, and future digital-twin-based traffic management.

A TIME-AWARE TEMPORAL BERT FRAMEWORK FOR LONGITUDINAL DETECTION OF DEPRESSIVE AND SUICIDE-RELATED RISK PATTERNS IN SOCIAL MEDIA

Mon, 30 Mar 2026 00:00:00 +0500

In this paper, we introduce a time-aware deep learning model designed to identify and predict signs of depression and suicidal ideation across social networks. Whereas standard static text classifiers typically analyze posts in isolation, our method tracks the long-term progression of a person's emotional state by combining contextual language embeddings with temporal encodings and specific psycholinguistic markers. We gathered our primary dataset from Twitter, Reddit, and Facebook, ensuring that all user histories were strictly anonymized and ordered chronologically. The study evaluates multiple neural network architectures, specifically Temporal BERT (Bidirectional Encoder Representations from Transformers), a time-encoded BiLSTM (Bidirectional Long Short-Term Memory), and a temporal transformer with positional features. Our experiments demonstrate that accounting for the chronological dimension substantially improves classification accuracy, allowing earlier detection of declining mental health. The Temporal BERT model achieved the highest F1 score (harmonic mean of precision and recall) and AUC (Area Under the Receiver Operating Characteristic Curve) values on several datasets, outperforming both standard (static) BERT and basic recurrent models. Analysis of temporal trajectories also revealed distinct clusters of user behavior (stable, improving, and deteriorating), which makes the conclusions more interpretable and helps explain individual emotional dynamics. The early-warning module was evaluated at 7-, 14-, and 21-day prediction horizons and showed that risk-related deterioration patterns could be identified in advance of the reference event. Across all evaluated horizons, Temporal BERT achieved the strongest Recall@k performance, meaning that it more consistently captured at-risk users among the top-ranked predictions.
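Two ingredients mentioned above can be sketched briefly. The first is a sinusoidal encoding of elapsed time that can be concatenated with a post's text embedding. The second is the Recall@k metric used to evaluate the early-warning rankings. The encoding scheme is an assumption, not the paper's exact Temporal BERT design: it uses transformer-style sinusoids over days since a user's first post, with a base of 365.

```python
import math


def time_encoding(days_elapsed, dim=8, base=365.0):
    """Sinusoidal features of elapsed time, one sin/cos pair per frequency."""
    features = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))  # frequencies decrease geometrically
        features.append(math.sin(days_elapsed * freq))
        features.append(math.cos(days_elapsed * freq))
    return features


def recall_at_k(scores, labels, k):
    """Share of truly at-risk users (label 1) found among the k top-scored users."""
    positives = sum(labels)
    if positives == 0:
        return float("nan")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / positives


print(recall_at_k([0.9, 0.2, 0.8, 0.4], [1, 0, 0, 1], k=2))  # 0.5
```

Recall@k suits an early-warning setting because reviewers can only inspect a limited number of flagged users. What matters is how many at-risk users fall inside that budget.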

This article emphasizes that depressive and suicide-related risk signals are often not evident in isolated posts but emerge through longitudinal behavioral patterns. The proposed approach may support earlier and more sensitive identification of elevated risk patterns in digital mental health monitoring settings. At the same time, such use requires strict ethical safeguards, rigorous anonymization, and human-in-the-loop oversight. Future research should extend the framework toward multimodal, multilingual, and socially contextualized modeling.

AUTOMATED AVALANCHE MONITORING: ENGINEERING AND SOFTWARE SOLUTIONS

Mon, 30 Mar 2026 00:00:00 +0500

An autonomous avalanche hazard monitoring system has been developed and piloted in the East Kazakhstan Region to enable continuous, data-driven early detection and prediction of snow avalanches in mountainous environments. The system integrates a hardware–software ecosystem that overcomes the limitations of traditional manual observations by combining real-time data acquisition, transmission, and predictive analytics. The prototype includes base stations, autonomous snow-temperature measuring rails, meteorological sensors, and a secure web interface with an API for reliable data management.

Field deployments were conducted in three avalanche-prone areas with diverse terrain and climate conditions: Glubokoe district (Mountain Ulbinka), Altai district (Zubovsk), and Ulan district (Taynty river basin). The hardware, including 6-meter modular aluminum masts and sensor-equipped snow rails, was designed for extreme environments, operating reliably within a temperature range of –60 °C to +50 °C and withstanding strong winds and snow loads. The system supports autonomous operation in remote regions with minimal maintenance requirements.

The monitoring network collects high-resolution environmental data, including air temperature, humidity, wind parameters, atmospheric pressure, snow depth, and vertical snow temperature gradients. Data are transmitted every 15 minutes via LoRa, with LTE/Wi-Fi as backup, and stored in a centralized MySQL database. A dedicated software platform enables data visualization, processing, and integration with analytical modules, while a mobile application provides real-time monitoring and alerts.

Logistic regression models were applied to estimate avalanche probability based on meteorological and snowpack data, demonstrating the effectiveness of combining continuous monitoring with statistical forecasting. The system provides a scalable and adaptable framework for avalanche hazard assessment, early warning, and informed decision-making, contributing to improved safety in mountainous regions.
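The sketch below shows how a rail's readings could be turned into a vertical snow temperature gradient. That gradient is then combined with meteorological variables in a logistic model. Only sensors at or below the snow surface are used for the gradient. The feature names, coefficients, and intercept are hypothetical placeholders; the fitted models from the study are not reproduced here.

```python
import math


def snow_temperature_gradient(heights_m, temps_c, snow_depth_m):
    """Least-squares slope dT/dz (°C per m) from rail sensors inside the snowpack.

    Sensors above the snow surface (height > snow_depth_m) or with missing
    readings (None) are ignored.
    """
    pairs = [(z, t) for z, t in zip(heights_m, temps_c)
             if t is not None and z <= snow_depth_m]
    if len(pairs) < 2:
        raise ValueError("need at least two sensors inside the snowpack")
    n = len(pairs)
    mean_z = sum(z for z, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    sxx = sum((z - mean_z) ** 2 for z, _ in pairs)
    sxy = sum((z - mean_z) * (t - mean_t) for z, t in pairs)
    return sxy / sxx


def avalanche_probability(features, coefficients, intercept):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * x for name, x in features.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


# Rail heights (m) and temperatures (°C); the top sensor is above the snow
gradient = snow_temperature_gradient([0.0, 0.2, 0.4, 0.6, 1.0],
                                     [-1.0, -3.0, -5.0, -7.0, -15.0], 0.7)
features = {"new_snow_cm": 35.0, "wind_ms": 12.0, "abs_gradient_c_per_m": abs(gradient)}
coefficients = {"new_snow_cm": 0.05, "wind_ms": 0.1, "abs_gradient_c_per_m": 0.08}  # hypothetical
print(round(gradient, 2), round(avalanche_probability(features, coefficients, -4.0), 3))
```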

ASSESSMENT OF PASSIVE DEORBITING OF THE KAZAKH EARTH REMOTE SENSING SATELLITES KAZEOSAT-1 AND KAZEOSAT-2

Aigul Kulakayeva, Berik Zhumazhanov, Yevgeniya Daineko, Aigul Nurlankyzy — Mon, 30 Mar 2026 00:00:00 +0500

In this work, passive deorbiting of the Kazakh Earth remote sensing satellites KazEOSat-1 and KazEOSat-2, operating in sun-synchronous low Earth orbits and not equipped with onboard deorbiting systems, is investigated. The object of the study is the dynamics of their orbital motion and the processes of aerodynamic drag during the final stage of operation. Using numerical modeling, time-dependent variations of the main orbital elements were obtained, the rates of orbital altitude decay were estimated, and possible timelines for passive deorbiting were evaluated. In addition, a sensitivity analysis of the results to variations in aerodynamic parameters was performed. It was established that the differences in the orbital evolution of the KazEOSat-1 and KazEOSat-2 spacecraft are mainly determined by the combination of orbital altitude and ballistic coefficient. In particular, the lower orbital altitude and smaller ballistic coefficient of KazEOSat-2 lead to more intense aerodynamic drag and, consequently, to accelerated orbital decay. The orbital degradation of KazEOSat-1 occurs at a significantly slower rate. A distinctive feature of the presented results is their applicability to real operating spacecraft that are not equipped with deorbiting systems, as well as the consideration of uncertainties in aerodynamic characteristics, which made it possible to obtain well-grounded estimates of passive deorbiting timelines. The results of the study can be used in planning the end-of-life phase of Earth remote sensing spacecraft, in assessing the risks of non-compliance with international recommendations on orbital deorbiting timelines, and in substantiating the need for passive or active deorbiting systems in the development of future spacecraft of a similar type.
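The dependence on altitude and ballistic coefficient described above follows from the drag-induced decay of a near-circular orbit: da/dt = −ρ √(μa) C_D A / m. The sketch below integrates this relation using an exponential atmosphere. The reference density, scale height, re-entry altitude, and ballistic coefficients are placeholder values chosen for illustration, not the KazEOSat parameters. A real assessment would use a solar-activity-dependent atmosphere model.

```python
import math

MU_EARTH = 3.986004418e14  # m^3/s^2, Earth's gravitational parameter
R_EARTH = 6_378_137.0      # m, equatorial radius


def density(h_m, rho_ref=5e-13, h_ref_m=500e3, scale_height_m=60e3):
    """Exponential atmosphere with one scale height (placeholder parameters)."""
    return rho_ref * math.exp(-(h_m - h_ref_m) / scale_height_m)


def decay_days(h0_km, ballistic_coeff, h_end_km=120.0, dt_s=3600.0, max_years=50.0):
    """Days for a circular orbit to decay from h0_km to h_end_km.

    ballistic_coeff = m / (C_D * A) in kg/m^2, so a smaller value means
    stronger drag. Integrates da/dt = -rho * sqrt(mu * a) / ballistic_coeff
    with explicit Euler steps of dt_s seconds.
    """
    a = R_EARTH + h0_km * 1e3
    t = 0.0
    while a - R_EARTH > h_end_km * 1e3:
        a -= density(a - R_EARTH) * math.sqrt(MU_EARTH * a) / ballistic_coeff * dt_s
        t += dt_s
        if t > max_years * 365.25 * 86400:
            return math.inf  # did not re-enter within the horizon
    return t / 86400.0


# Sensitivity to the ballistic coefficient (e.g. from uncertainty in C_D)
for b in (50.0, 100.0, 150.0):
    print(b, round(decay_days(500.0, b)))
```

Varying the ballistic coefficient in this way is a simple analogue of the sensitivity analysis on aerodynamic parameters mentioned above. Halving m/(C_D A) roughly halves the decay time in this model.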

AI-BASED QUESTION GENERATION FOR AVIATION TRAINING: COMPARING RETRIEVAL-AUGMENTED GENERATION AND FINE-TUNED MODELS

Aruzhan Tugambayeva, Aivar Sakhipov — Mon, 30 Mar 2026 00:00:00 +0500

This study examines how retrieval-augmented and fine-tuned architectures influence the cognitive complexity, terminology usage, and pedagogical characteristics of automatically generated aviation-related questions. The objective is to determine how different modeling strategies affect not only linguistic quality but also the educational value of generated content. A retrieval-augmented generation pipeline was implemented by combining vector-based document retrieval using Facebook AI Similarity Search (FAISS) with the Mistral-7B language model, which contains seven billion parameters, applied to a curated knowledge base of 238 aviation documents. In parallel, a T5-small language model with 60 million parameters was fine-tuned using the Low-Rank Adaptation (LoRA) method on a dataset of 920 aviation context–question pairs.
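The retrieval step of such a pipeline can be illustrated with a self-contained stand-in. Bag-of-words cosine similarity replaces the dense embeddings and FAISS index, and the top-ranked passages are placed into a prompt for the generator. The prompt wording and the toy documents are hypothetical, and the call to Mistral-7B is omitted.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    return sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:k]


def build_prompt(passages, topic):
    """Assemble retrieved passages into a generation prompt (hypothetical wording)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Context:\n{context}\n\n"
            f"Write one exam question about {topic} that requires applying the context.")


docs = ["Stall speed increases with load factor in a turn.",
        "Runway edge lights are white; threshold lights are green.",
        "A METAR reports wind, visibility, and cloud ceiling."]
passages = retrieve("How does load factor affect stall speed?", docs, k=1)
print(build_prompt(passages, "stall speed"))
```

Grounding the prompt in retrieved passages, rather than relying only on what a small model memorized during fine-tuning, is one plausible reason for the broader terminology coverage reported below.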

Both systems were evaluated on a test set of 116 examples. The evaluation framework included expert-based assessment aligned with Bloom's taxonomy of cognitive learning objectives, as well as domain-specific criteria such as aviation terminology coverage and lexical diversity. In addition, widely used text similarity metrics were employed: Bilingual Evaluation Understudy (BLEU), the longest-common-subsequence variant of Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), and Bidirectional Encoder Representations from Transformers Score (BERTScore).
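Of the similarity metrics listed above, the longest-common-subsequence variant of ROUGE (ROUGE-L) is simple enough to sketch. It scores the longest common subsequence (LCS) of tokens between a generated question and a reference question. The version below uses whitespace tokenization and combines LCS precision and recall with an F1 score. Standard toolkits add stemming and other tokenization details.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """F1 of LCS-based precision (over candidate) and recall (over reference)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l_f1("what is the stall speed", "what is stall speed"), 3))
```

Because ROUGE-L rewards token overlap with a single reference, a model that paraphrases references closely can score well without asking deeper questions. This is consistent with the contrast between similarity scores and Bloom levels reported below.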

The results reveal distinct differences in the cognitive profiles of the generated questions. All questions produced by the fine-tuned model corresponded to the Knowledge level of Bloom's taxonomy, indicating a strong emphasis on factual recall. In contrast, the retrieval-augmented system generated questions that more frequently addressed higher cognitive levels, particularly Comprehension (53.3%) and Application (40.0%). It also demonstrated broader coverage of aviation terminology (92.2% compared to 44.0%) and greater output diversity (112 unique questions versus 56). Conversely, the fine-tuned model achieved higher similarity scores and approximately five times faster inference speed.
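The domain-specific criteria reported above can be approximated as follows. Terminology coverage is taken here as the share of questions containing at least one glossary term. Diversity is measured both as the number of distinct questions after case and whitespace normalization and as the distinct-2 ratio of unique to total word bigrams. All three definitions are assumptions, since the exact formulas behind the reported percentages are not given.

```python
def term_coverage(questions, glossary):
    """Share of questions mentioning at least one glossary term (substring match)."""
    terms = [t.lower() for t in glossary]
    hits = sum(1 for q in questions if any(t in q.lower() for t in terms))
    return hits / len(questions) if questions else 0.0


def unique_questions(questions):
    """Number of distinct questions after lower-casing and collapsing whitespace."""
    return len({" ".join(q.lower().split()) for q in questions})


def distinct_n(questions, n=2):
    """Unique n-grams divided by total n-grams across all questions."""
    grams = []
    for q in questions:
        tokens = q.lower().split()
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0


questions = ["What is V1?", "what is  V1?", "Explain the crosswind landing technique."]
glossary = ["V1", "crosswind", "flaps"]
print(term_coverage(questions, glossary), unique_questions(questions),
      round(distinct_n(questions), 3))
```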

NUMERICAL STUDY OF THE WATER SURFACE MOVEMENT DURING A DAM BREAK ON A REAL TERRAIN OF THE TASOTKEL RESERVOIR, ZHAMBYL REGION, KAZAKHSTAN

Mon, 30 Mar 2026 00:00:00 +0500

This work presents a comprehensive numerical simulation of the dam-break process at the Tasotkel reservoir, using the real terrain of the Zhambyl region. The study focuses on how the released water mass propagates over complex topography and how terrain irregularities influence wave dynamics and inundation patterns. A Volume of Fluid (VOF) method was employed to model the free-surface evolution of dam-break flows and the subsequent flooding of the downstream valley, allowing accurate tracking of interface deformation and of water movement over uneven ground. To validate the numerical model and the applied methodology, a set of controlled dam-break simulations was carried out, including tests in a channel with a trapezoidal recess and in an inclined channel. Their outcomes agreed closely with known reference data, confirming the capability and accuracy of the proposed computational approach. The results show that the model captures key flow features, such as wave arrival time, flow depth, and velocity variations over different terrain structures. It can therefore be used for precise and efficient assessment of dam-break consequences, including inundation zones and possible risks to downstream infrastructure or settlements. The main objective of this study is to provide a robust numerical framework for analyzing floods and inundation caused by dam failures on realistic three-dimensional terrain. The developed tool contributes to improved risk analysis, emergency planning, and safety assessment of hydraulic structures.
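For reference, the standard single-fluid form of the VOF method is written below. A volume fraction α (1 in water, 0 in air) is advected with the flow velocity **u**. The mixture density ρ and viscosity μ entering the incompressible momentum equation are weighted by α, with subscripts w and a denoting water and air. The study's specific solver settings, such as surface tension, turbulence model, and interface compression, are not stated in the abstract. This is therefore the generic textbook formulation rather than the authors' exact system.

```latex
\begin{aligned}
&\nabla \cdot \mathbf{u} = 0,\\
&\frac{\partial \alpha}{\partial t} + \nabla \cdot (\alpha \mathbf{u}) = 0, \qquad 0 \le \alpha \le 1,\\
&\rho = \alpha \rho_w + (1 - \alpha)\rho_a, \qquad \mu = \alpha \mu_w + (1 - \alpha)\mu_a,\\
&\frac{\partial (\rho \mathbf{u})}{\partial t} + \nabla \cdot (\rho\, \mathbf{u} \otimes \mathbf{u})
  = -\nabla p + \nabla \cdot \left[ \mu \left( \nabla \mathbf{u} + \nabla \mathbf{u}^{\mathsf{T}} \right) \right] + \rho\, \mathbf{g}.
\end{aligned}
```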

DEVELOPMENT OF A METHOD FOR AUTOMATIC DOCUMENT RECOVERY FOLLOWED BY ANALYSIS OF INTEGRITY AND ABSENCE OF ENCRYPTION FOR FORENSIC PURPOSES

Mon, 30 Mar 2026 00:00:00 +0500

As digital infrastructures grow increasingly complex, the need for robust forensic tools that can recover and interpret Office documents, particularly Microsoft Word (.docx) files, has become paramount. Traditional recovery tools often struggle with file integrity verification and fail to determine whether a document is encrypted, leading to limited courtroom admissibility and investigative delays. To address this, this work presents ForenDOC, a systematic approach for the automated recovery and forensic examination of fragmented Office Open XML documents obtained from volatile memory sources. The methodology begins with byte-level capture using raw image formats to preserve unallocated and slack space data. It proceeds with signature-based scanning to detect probable document file offsets, followed by automated Extensible Markup Language (XML) schema validation to guarantee structural integrity and filter out corrupted data. To ensure data uniqueness, Secure Hash Algorithm 1 (SHA-1) hashing and textual deduplication are implemented. Furthermore, the framework utilizes an entropy-based analysis using a Shannon entropy threshold of 5.0 to distinguish readable material from encrypted or obfuscated segments, facilitating the prompt triage of suspicious files. The system functions strictly offline via a read-only interface, enforcing stringent security protocols in accordance with ISO/IEC 27001 and National Institute of Standards and Technology (NIST) Special Publication 800-101 standards. The retrieved documents undergo processing via a custom machine learning pipeline. This includes a Random Forest model for encryption detection, achieving 94.7% precision, and a Bidirectional Long Short-Term Memory (BiLSTM) network for semantic classification spanning legal, fraud, medical, darknet, religious, and economic sectors. 
Experimental validation of 7,680 memory fragments yielded 970 signature matches, from which ForenDOC successfully isolated exactly 12 structurally viable files. This highlights the system's efficiency in filtering out approximately 98.7% of corrupted data—or false positives—that traditional carving tools would otherwise present to investigators. The results validate the practicality of integrating low-level recovery methods with sophisticated classification models within a cohesive forensic framework. The suggested approach improves evidential reliability and investigation efficiency, providing a scalable tool for digital forensics that adheres to international compliance requirements.
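Three of the low-level steps described above can be sketched with the standard library: signature scanning, SHA-1 deduplication, and the Shannon entropy test. This is not ForenDOC itself. The sketch assumes that the 5.0 threshold is in bits per byte. It also assumes that entropy is computed on extracted document content rather than on the ZIP container, whose compressed streams are high-entropy by design. XML schema validation and the machine learning stages are omitted.

```python
import hashlib
import math
from collections import Counter

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # .docx (OOXML) files are ZIP containers


def find_signatures(data, signature=ZIP_LOCAL_HEADER):
    """Byte offsets of every occurrence of a file signature in a raw image."""
    offsets, start = [], 0
    while (i := data.find(signature, start)) != -1:
        offsets.append(i)
        start = i + 1
    return offsets


def shannon_entropy(block):
    """Shannon entropy in bits per byte (0 for constant data, 8 for uniform bytes)."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def triage(fragments, threshold=5.0):
    """Drop exact duplicates by SHA-1 and flag high-entropy fragments."""
    seen, results = set(), []
    for fragment in fragments:
        digest = hashlib.sha1(fragment).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        results.append((digest, shannon_entropy(fragment) > threshold))
    return results


image = b"\x00" * 16 + ZIP_LOCAL_HEADER + b"..." + ZIP_LOCAL_HEADER
print(find_signatures(image))  # [16, 23]
```

Signature scanning alone produces many candidates, and every embedded ZIP header counts as a match. This is why structural validation after carving removes most of the 970 signature matches reported above.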

A NOVEL ACOUSTIC-ASSISTED CHIP-OFF FRAMEWORK FOR DATA EXTRACTION FROM DAMAGED HARD DISK DRIVES

Mon, 30 Mar 2026 00:00:00 +0500

This article discusses best practices for extracting data from damaged mobile phones and hard drives while preserving the integrity of the storage hardware. It emphasizes that data recovery is essential for digital forensics and cybersecurity, and that a common approach can be applied to recovering data from mobile devices. In many cases, a step-by-step low-level acquisition, rather than a quick logical acquisition, reveals hidden artifacts or recently deleted files; sometimes it is the only reliable option.

Hard disk failures are usually divided into two categories: logical errors and physical damage. The recovery platform combines proven diagnostics, predictive analysis, and purpose-built tools, with interventions ranging from magnetic head replacement and disk imaging to repairing file system parameters so that the data become readable again. One of the new ideas is acoustic sensing. Much as a mechanic listens to a running engine, the system listens to the drive's acoustic response and uses it to automate the detection of mechanical defects, since clicking or stuttering sounds can reveal a great deal. The study includes two models: Model A detects noise-related faults on the hard drive, and Model B attempts to recover the data. This combined approach is used to extract data from damaged hard drives.
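To illustrate how an acoustic response can flag mechanical events such as clicks, the sketch below marks audio frames whose RMS energy exceeds a multiple of the median frame energy. It is a plain energy-threshold detector, not the paper's Model A. The frame length, the factor k, and the synthetic signal are assumptions.

```python
import math
import statistics


def frame_rms(signal, frame=256):
    """RMS energy of consecutive, non-overlapping frames."""
    out = []
    for start in range(0, len(signal), frame):
        chunk = signal[start:start + frame]
        out.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return out


def flag_clicks(signal, frame=256, k=4.0):
    """Indices of frames whose RMS exceeds k times the median frame RMS."""
    rms = frame_rms(signal, frame)
    baseline = statistics.median(rms)  # robust to a few loud frames
    return [i for i, r in enumerate(rms) if r > k * baseline]


# Synthetic recording: a quiet 50 Hz hum sampled at 8 kHz, plus two clicks
signal = [0.01 * math.sin(2 * math.pi * 50 * n / 8000) for n in range(2048)]
signal[800] = 1.0
signal[1600] = 1.0
print(flag_clicks(signal))  # [3, 6]
```

Using the median rather than the mean as the baseline keeps the threshold stable when several frames contain defects.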

With the proliferation of Internet of Things devices, acoustic-assisted chip-off methods support forensically sound repair and inspection of damaged equipment, such as sensors and industrial components. These results should be of interest to research groups, corporate lawyers, and criminologists, since they broaden the coverage and improve the reliability of data recovery operations.

DEVELOPMENT AND VERIFICATION OF CYBER SECURITY ARCHITECTURE FOR UNMANNED AERIAL VEHICLE TELEMETRY BASED ON SIMULATION MODELLING

Mon, 30 Mar 2026 00:00:00 +0500

The rapid development and widespread adoption of unmanned technologies have led to significant advancements across various fields of human activity. At the same time, the risks associated with the unauthorized use of unmanned aerial systems have increased. This has led to the emergence of a distinct area of research focused on countermeasures and the protection of various components and platforms within unmanned aerial systems. Despite the existence of current methods for detecting attacks and anomalies, their effectiveness is significantly reduced under complex operational scenarios, including dynamically changing environments, interference, small target sizes, and low radar visibility. To address this issue, this study presents the main findings of a comprehensive analysis of contemporary cyber threats and vulnerabilities arising in unmanned aerial vehicle (UAV) systems. Based on this analysis, an up-to-date classification of existing types of attacks on the basic architecture of UAVs has been compiled. This enabled an examination of the main protection methods for ensuring the security of UAV systems and components, as well as the classification of methods for detecting cyberattacks on their systems. Based on the data obtained, a multi-level protection architecture was developed, comprising three main levels: a secure communication channel, a secure flight controller, and a secure ground control station.

The software environment developed in Python 3.12 for simulating telemetry streams generated packets in Micro Air Vehicle Link (MAVLink) format carried over the User Datagram Protocol (UDP) and the Transmission Control Protocol (TCP), simulated attacks, and detected network anomalies in the UAV telemetry system. The results include the processing of 97 MAVLink packets, with an anomaly-injection share of 10%, totalling 118 units. The average MAVLink packet delay was 0.037 seconds, indicating stable operation of the telemetry channel. Experimental verification over 100 cycles demonstrated the ability to detect packet structure violations, false identifiers, coordinate substitution, and delay anomalies.
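The four anomaly classes listed above lend themselves to simple rule checks. The sketch below is a hypothetical illustration, not the authors' detector: packets are plain dictionaries rather than real MAVLink frames, and the field names, the allowed system ID, and both thresholds are invented for the example. Coordinate substitution is flagged when consecutive positions imply a physically implausible ground speed.

```python
"""Hypothetical rule-based checks for simulated telemetry packets.

Packets are plain dicts, not real MAVLink frames; field names, allowed system IDs,
and thresholds are illustrative assumptions.
"""
import math

REQUIRED_FIELDS = {"sysid", "seq", "t_sent", "t_recv", "lat", "lon"}
ALLOWED_SYSIDS = {1}          # hypothetical: the only legitimate vehicle ID
MAX_DELAY_S = 0.2             # hypothetical delay-anomaly threshold (seconds)
MAX_SPEED_MS = 50.0           # hypothetical plausibility limit for position jumps (m/s)
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def check_packet(pkt, prev=None):
    """Return a list of anomaly labels for one packet (an empty list means clean)."""
    if REQUIRED_FIELDS - set(pkt):
        return ["structure_violation"]        # other checks need the missing fields
    issues = []
    if pkt["sysid"] not in ALLOWED_SYSIDS:
        issues.append("false_identifier")
    if pkt["t_recv"] - pkt["t_sent"] > MAX_DELAY_S:
        issues.append("delay_anomaly")
    if prev is not None:
        dt = pkt["t_sent"] - prev["t_sent"]
        dist = haversine_m(prev["lat"], prev["lon"], pkt["lat"], pkt["lon"])
        # A jump faster than the vehicle could fly suggests substituted coordinates.
        if dt > 0 and dist / dt > MAX_SPEED_MS:
            issues.append("coordinate_substitution")
    return issues
```

Returning a list rather than a single label lets one packet carry several anomalies at once, which is convenient when counting detections per class across cycles.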

A DUAL-PATH MULTI-TASK FRAMEWORK FOR STRICT THREE-CURVE COBB ANGLE ESTIMATION IN IDIOPATHIC SCOLIOSIS

Beibit Abdikenov, Ayan Kokhan, Temirlan Karibekov — Mon, 30 Mar 2026 00:00:00 +0500

Adolescent idiopathic scoliosis management depends on reproducible Cobb angle measurement across three clinically defined spinal regions: proximal thoracic, main thoracic, and thoracolumbar/lumbar. Although manual measurement remains the reference standard, it is observer-dependent and time-consuming, with inter-observer variability exceeding five degrees even among experienced readers. Most automated deep learning approaches target a single dominant curve or use unconstrained outputs, which limits their applicability to structured clinical workflows requiring strict regional assignment. This study presents a dual-path multi-task framework for simultaneous estimation of all three regional Cobb angles from posteroanterior spinal radiographs. The architecture integrates a ConvNeXt-Tiny encoder, vertebral localization heads, direct global angle regression via soft-argmax, and a geometric tilt-aggregation pathway. A learned per-region sigmoid gate fuses the global and geometric pathways, providing a fixed but optimized balance between statistical and anatomical estimation. The model was developed on 21,294 radiographs with leakage-controlled partitioning into training (N = 17,262), validation (N = 2,016), and test (N = 2,016) subsets. Training employed a two-stage curriculum with severity-aware sampling and hard replay for difficult cases. Three independent runs (seeds 42, 52, 62) were ensembled with test-time augmentation. On the primary held-out set (N = 2,015), the ensemble achieved a mean absolute error of 2.24 degrees (proximal thoracic 2.21, main thoracic 1.97, thoracolumbar/lumbar 2.54), with near-zero Bland-Altman bias (0.03 degrees), good-to-excellent intraclass correlation coefficients (0.884–0.971), and 90.4% of predictions within 5 degrees. At the 40-degree treatment threshold, sensitivity was 0.934 and specificity was 0.994. These findings support the feasibility of strict three-curve automation for reader-in-the-loop clinical workflows.
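Three components named in this abstract can be illustrated in isolation: a soft-argmax (the softmax-weighted mean of bin centres, which keeps angle regression differentiable), a Cobb angle computed from signed vertebral tilts as the spread between the most tilted vertebrae in a region, and a per-region sigmoid gate blending the two pathways. This is a minimal sketch under those textbook definitions, not the authors' implementation; the bin centres, tilts, and gate logit are hypothetical inputs.

```python
"""Illustrative sketch (not the authors' code) of soft-argmax regression,
tilt-based Cobb aggregation, and sigmoid-gated fusion of two angle estimates."""
import math


def soft_argmax(logits, bin_centers):
    """Differentiable argmax: expectation of bin centres under softmax(logits)."""
    m = max(logits)                                   # subtract max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    return sum(w * c for w, c in zip(weights, bin_centers)) / sum(weights)


def cobb_from_tilts(tilts_deg):
    """Cobb angle of one region as max tilt minus min tilt (signed tilts, degrees)."""
    return max(tilts_deg) - min(tilts_deg)


def gated_fusion(global_deg, geometric_deg, gate_logit):
    """Blend g * global + (1 - g) * geometric, with g = sigmoid(gate_logit)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return g * global_deg + (1.0 - g) * geometric_deg
```

Because the gate is a single learned logit per region rather than a function of the image, the resulting balance is fixed after training, which matches the "fixed but optimized" description above.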

SCALABLE NEAR-DUPLICATE DETECTION IN KAZAKH SCIENTIFIC TEXTS VIA SEMANTIC EMBEDDINGS AND OPTIMIZED CANDIDATE FILTERING

Mon, 30 Mar 2026 00:00:00 +0500

This work considers the problem of efficient detection of near-duplicate documents in Kazakh scientific texts, which is particularly challenging due to the agglutinative nature of the language and the high computational cost of pairwise document comparison. Traditional approaches based on lexical similarity are ineffective under such conditions, while semantic models, although more accurate, are computationally expensive and scale poorly. To overcome these limitations, the study proposes a scalable framework that combines semantic similarity modeling with optimization techniques, including text canonicalization, efficient indexing, and multi-stage candidate filtering. The canonicalization process reduces morphological variability, increasing the stability of similarity estimation for Kazakh texts. The indexing mechanism, based on dense vector representations, enables efficient selection of candidate pairs using approximate nearest neighbor search. The hierarchical filtering strategy further reduces the number of comparisons, while a transformer-based model provides accurate semantic matching. The proposed approach is evaluated on a large-scale dataset of Kazakh scientific abstracts and near-duplicate pairs. The results demonstrate that the framework achieves high detection accuracy while significantly reducing computational costs compared to exhaustive pairwise comparison. The use of dynamic threshold adjustment allows effective handling of overlapping similarity distributions between duplicate and non-duplicate classes. The obtained results confirm that the combination of linguistic preprocessing and computational optimization is crucial for scalable near-duplicate detection in low-resource agglutinative languages such as Kazakh. The proposed framework can be applied in plagiarism detection, document deduplication, and large-scale text analysis systems.
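The staged idea (canonicalize, cheaply discard implausible pairs, then score survivors semantically and apply a threshold) can be sketched as below. This is a toy stand-in, not the proposed framework: character 3-gram count vectors replace transformer embeddings, a length-ratio check replaces the ANN index, and the threshold is fixed rather than dynamically adjusted. The loop still enumerates all pairs; a real system would let an ANN index propose candidates instead.

```python
"""Toy multi-stage near-duplicate filter (a sketch, not the paper's pipeline).

Assumptions: character 3-gram counts stand in for transformer embeddings, a length
filter stands in for ANN candidate retrieval, and thresholds are hypothetical.
"""
import math
import re
import unicodedata
from collections import Counter
from itertools import combinations


def canonicalize(text):
    """NFC-normalise, lowercase, drop punctuation, digits and underscores, collapse spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def embed(text, n=3):
    """Sparse character n-gram counts used as a stand-in embedding."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    dot = sum(count * v[g] for g, count in u.items() if g in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def near_duplicates(docs, min_len_ratio=0.7, threshold=0.8):
    """Return (i, j, score) for pairs passing the length filter and the similarity threshold."""
    canon = [canonicalize(d) for d in docs]
    vecs = [embed(c) for c in canon]
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        a, b = len(canon[i]), len(canon[j])
        # Stage 1: skip pairs whose lengths differ too much to be near-duplicates.
        if min(a, b) < min_len_ratio * max(a, b):
            continue
        # Stage 2: similarity scoring on the surviving candidates only.
        score = cosine(vecs[i], vecs[j])
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs
```

Python's `\w` is Unicode-aware for `str` patterns, so Kazakh Cyrillic letters such as қ, ғ, ә survive canonicalization.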

KAZMORPHLM: MORPHEME-AWARE LANGUAGE MODEL FOR KAZAKH AUTOMATIC SPEECH RECOGNITION

Yerlan Karabaliyev, Kateryna Kolesnikova — Mon, 30 Mar 2026 00:00:00 +0500

This paper presents KazMorphLM, a morpheme-aware language model for automatic speech recognition (ASR) in the Kazakh language. Kazakh belongs to the Turkic family and is characterised by a highly agglutinative morphology, in which a single root can generate a large number of inflected forms through productive suffixation. This property causes severe data sparsity for conventional word-level language models and reduces recognition accuracy.

The proposed model introduces three main innovations. First, a rule-based morpheme segmenter uses an inventory of 230 suffixes across fourteen grammatical categories and includes phonological validation through vowel harmony and consonant assimilation rules. Second, a two-level interpolated n-gram architecture combines a 7-gram morpheme-level model with a 5-gram word-level model using an interpolation ratio of 0.6 to 0.4 and Witten–Bell smoothing. Third, a four-channel rescoring mechanism integrates acoustic confidence, word-level and morpheme-level language-model probabilities, and a vowel-harmony consistency score.
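The smoothing and interpolation steps can be illustrated at bigram order. The sketch below uses the standard interpolated Witten–Bell estimate, P(w|h) = (c(h,w) + T(h)·P_lower(w)) / (c(h) + T(h)), where T(h) is the number of distinct tokens seen after history h, and then mixes morpheme-level and word-level probabilities linearly. Bigrams instead of 7-gram and 5-gram models, the hand-segmented toy corpus, and assigning the 0.6 weight to the morpheme model are all simplifying assumptions.

```python
"""Bigram sketch of Witten-Bell smoothing and two-level linear interpolation.

Simplifications (assumptions): bigrams instead of 7-gram morpheme / 5-gram word
models, hand-segmented toy data, and the 0.6 weight given to the morpheme model.
"""
from collections import Counter, defaultdict


class WittenBellBigram:
    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.followers = defaultdict(set)        # distinct tokens seen after each history
        for tokens in sentences:
            seq = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(seq[1:])
            for h, w in zip(seq, seq[1:]):
                self.bigrams[(h, w)] += 1
                self.followers[h].add(w)
        self.history_counts = Counter()
        for (h, _), c in self.bigrams.items():
            self.history_counts[h] += c
        self.n_tokens = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # one extra slot reserved for unseen tokens

    def p_unigram(self, w):
        # Witten-Bell interpolation of the unigram MLE with a uniform distribution.
        t = len(self.unigrams)
        return (self.unigrams[w] + t / self.vocab_size) / (self.n_tokens + t)

    def prob(self, w, h):
        c_h = self.history_counts[h]
        if c_h == 0:                              # unseen history: fall back to unigram
            return self.p_unigram(w)
        t_h = len(self.followers[h])
        return (self.bigrams[(h, w)] + t_h * self.p_unigram(w)) / (c_h + t_h)


def interpolated_prob(word_model, morph_model, word, prev_word, morphemes,
                      prev_morpheme="<s>", lam=0.6):
    """lam * P_morph(morpheme sequence of word) + (1 - lam) * P_word(word | prev_word)."""
    p_morph = 1.0
    h = prev_morpheme
    for m in morphemes:                           # chain rule over the word's morphemes
        p_morph *= morph_model.prob(m, h)
        h = m
    return lam * p_morph + (1.0 - lam) * word_model.prob(word, prev_word)
```

Usage: train one `WittenBellBigram` on word sequences and another on morpheme sequences (for example кітап + тар + оқы + ды + м), then score each hypothesis word with `interpolated_prob`. The morpheme model can assign useful probability to inflected forms the word model has never seen, which is the sparsity argument made in the abstract.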

KazMorphLM was integrated into a hybrid ASR pipeline combining NVIDIA FastConformer and Meta MMS-1B acoustic models. On the FLEURS test set, the system achieves a word error rate of 6.86%, a 14.6% relative improvement over word-level KenLM rescoring. The results indicate that higher-order morpheme modelling is essential for agglutinative languages and that corpus quality outweighs corpus size. The approach is applicable to other morphologically rich Turkic languages.
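Word error rate is conventionally the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. The generic sketch below computes it with a Levenshtein alignment, together with the relative reduction between two systems; the sentences in the checks are made up.

```python
"""Generic word error rate (WER) and relative-improvement sketch."""


def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))             # cost of building hyp prefixes from nothing
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,           # deletion of a reference word
                          curr[j - 1] + 1,       # insertion of a hypothesis word
                          prev[j - 1] + (r != h))  # substitution, or free match
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Total edit errors divided by total reference words over (reference, hypothesis) pairs."""
    errors = sum(word_errors(r, h) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return errors / words


def relative_reduction(baseline_wer, new_wer):
    return (baseline_wer - new_wer) / baseline_wer
```

If the 14.6% figure is a relative reduction defined this way, the word-level KenLM baseline corresponds to roughly 6.86 / (1 − 0.146) ≈ 8.03% WER.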

DYNAMIC DETERMINATION OF INFORMATION SYSTEM SECURITY PARAMETERS BASED ON ATTACK GRAPHS AND MARKOV MODELS UNDER CONDITIONS OF UNCERTAINTY

Mon, 30 Mar 2026 00:00:00 +0500

The article presents an approach to the dynamic determination of information system security parameters under conditions of uncertainty and incomplete monitoring data. An attack graph is used as the structural foundation, describing possible compromise trajectories while considering vulnerability dependencies, configurations, access rights, and protective measures. To obtain quantitative assessments, a Markov model of adversary progress is introduced, in which intermediate states represent attack stages and absorbing states correspond to the achievement of critical goals related to violations of confidentiality, integrity, and availability. A key element of the methodology is the procedure for estimating transition probabilities given sparse observations from security logs and interval-based expert estimates for poorly observed attack steps. The proposed combination of event statistics and expert constraints is supplemented by regularization and dynamic updates, which increase parameterization stability, reduce the impact of isolated incidents, and account for operational environment drift. The calculated output indicators include the probability of compromise within a given horizon, separate violation probabilities for confidentiality, integrity, and availability, and the expected time to compromise. Experimental demonstration on a typical corporate architecture confirms the model's suitability for comparing defense scenarios and quantitatively justifying countermeasures: strengthening segmentation and privilege control reduces the reachability of target states, while enhancing monitoring and response further decreases the probability of achieving goals and increases the predicted time to compromise. Signs of attacks on management planes are also considered, including vulnerabilities in secure exchange protocols and network management protocols, as well as the compromise of device firmware. 
The results can be used for risk-oriented planning of security measures under budget constraints and for forming dynamic security effectiveness indicators in a Zero Trust architecture.
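The quantities described here follow from standard absorbing Markov chain theory. With transient attack stages and absorbing goal states, the probability of reaching a goal within n steps is the absorbing-state mass of the start distribution multiplied by Pⁿ, and the expected time to compromise solves (I − Q)t = 1, where Q is the transient-to-transient block. The sketch below shows these calculations plus one possible way to blend sparse log counts with expert probabilities (Dirichlet-style pseudo-counts); the blending rule, the `strength` parameter, and all numbers are assumptions, not the article's estimation procedure.

```python
"""Absorbing Markov chain sketch for adversary progress (hypothetical numbers).

States 0..k-1 are transient attack stages; the remaining states are absorbing goals
(their rows are identity rows). The count/expert blend is one possible regulariser.
"""


def blend_row(counts, expert_probs, strength=5.0):
    """Posterior-mean style estimate: (counts + strength * expert) / (total + strength)."""
    total = sum(counts)
    return [(c + strength * p) / (total + strength) for c, p in zip(counts, expert_probs)]


def distribution_after(P, start, n_steps):
    """State distribution after n_steps from `start`; goal-state entries give
    the probability of compromise within the horizon."""
    dist = [0.0] * len(P)
    dist[start] = 1.0
    for _ in range(n_steps):
        dist = [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return dist


def expected_steps_to_absorption(P, n_transient):
    """Solve (I - Q) t = 1 by Gauss-Jordan elimination with partial pivoting."""
    k = n_transient
    A = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(k)] + [1.0] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]
```

For example, with P = [[0.5, 0.5, 0, 0], [0, 0.5, 0.25, 0.25], [0, 0, 1, 0], [0, 0, 0, 1]], the expected number of steps to reach a goal is 4 from stage 0 and 2 from stage 1. Hardening measures that lower the transition probabilities toward goals raise these values, which is how countermeasure scenarios can be compared.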

CROSS-SUBJECT EEG-BASED FATIGUE CLASSIFICATION USING MACHINE LEARNING, RIEMANNIAN GEOMETRY, AND COMPACT DEEP NEURAL NETWORKS

Mon, 30 Mar 2026 00:00:00 +0500

Drowsiness impairs perceptual processing, reaction time, and executive control, posing risks in safety-critical domains such as driving and long-duration monitoring tasks. Electroencephalography (EEG)-based fatigue detection has emerged as a powerful approach for quantifying early neurophysiological signs of vigilance decline, yet many proposed algorithms are insufficiently evaluated under strictly subject-independent conditions. To address this gap, we systematically compare classical machine learning models, Riemannian geometry-based classification, and compact deep neural architectures on a publicly available EEG dataset recorded from 11 subjects. We employ a rigorous leave-one-subject-out (LOSO) protocol, ensuring that no individual contributes data to both the training and test sets.
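A LOSO protocol holds out all epochs of one subject per fold, so each fold's test subject is entirely unseen during training. The sketch below generates such folds from a list of per-epoch subject IDs and summarizes fold scores by mean and sample standard deviation; the subject IDs are hypothetical, and using the sample (rather than population) standard deviation is an assumption. scikit-learn's `LeaveOneGroupOut` with `groups=` implements the same split.

```python
"""Leave-one-subject-out (LOSO) folds and a mean/SD summary (illustrative sketch)."""
from statistics import mean, stdev


def loso_folds(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


def summarize(fold_scores):
    """Mean and sample standard deviation of per-fold scores."""
    return mean(fold_scores), stdev(fold_scores)
```

Grouping by subject, rather than shuffling epochs, is what prevents temporally correlated epochs from one person from leaking into both sides of the split.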

The study evaluates logistic regression, support vector machines with radial basis function kernels, random forests, a Log-Euclidean Riemannian classifier, EEGNet, a transformer encoder, and a bidirectional long short-term memory (BiLSTM) network with temporal attention. Accuracy and macro-F1 scores were computed for each fold and summarized by their mean and standard deviation. The BiLSTM-attention model achieved the highest mean accuracy and macro-F1 but only moderately exceeded EEGNet and the classical baselines. Wilcoxon signed-rank tests revealed no significant difference between EEGNet and BiLSTM, although BiLSTM significantly outperformed the transformer model. Error analysis showed a notable asymmetry, with 295 false positives and 184 false negatives aggregated across folds.
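With only 11 LOSO folds, paired model comparisons are small enough for an exact Wilcoxon signed-rank test, which enumerates every sign pattern of the ranked differences. The from-scratch sketch below drops zero differences, gives tied absolute differences average ranks, and returns W⁺ and a two-sided p-value. It is a generic illustration, not the authors' analysis code; `scipy.stats.wilcoxon` is the usual library route, and its p-values may differ when ties or zeros are present.

```python
"""Exact two-sided Wilcoxon signed-rank test for paired per-fold scores (sketch).

Enumerates all 2**n sign patterns, so it is only practical for small n (about 20 or fewer).
"""
from itertools import product


def average_ranks(values):
    """1-based ranks in which tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Return (W_plus, two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]      # zero differences are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2                               # expected W+ under H0
    observed = abs(w_plus - centre)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)
```

Note that with six folds the smallest attainable two-sided p-value is 2/64 = 0.03125, so fold count directly limits how strong a "significant" difference can look.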

Band-specific analysis revealed theta activity as the strongest contributor to class separation, followed by delta and alpha rhythms. Channel-importance analysis indicated that posterior and paracentral regions were consistently more informative. These findings highlight that model complexity does not guarantee superior performance in small datasets with large inter-subject variability. The study provides a transparent, fully reproducible baseline for future fatigue-classification research and demonstrates the practical relevance of compact architectures and Riemannian geometry in low-data conditions.
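Band-specific contributions are usually computed from spectral power within conventional frequency bands. The sketch below computes relative delta, theta, alpha, and beta power from a naive discrete Fourier transform; it is an illustration only. The band edges are common conventions assumed here, not the paper's definitions, and practical pipelines typically use Welch's method (for example `scipy.signal.welch`) rather than an O(N²) DFT.

```python
"""Relative EEG band power from a naive DFT (illustrative sketch; band edges assumed)."""
import cmath
import math

BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def power_spectrum(signal, fs):
    """One-sided power per DFT bin as (frequency_hz, power) pairs; O(N^2) for clarity."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    return spectrum


def relative_band_power(signal, fs, bands=BANDS):
    """Fraction of total in-band power per band, using half-open [low, high) intervals."""
    spectrum = power_spectrum(signal, fs)
    powers = {name: sum(p for f, p in spectrum if lo <= f < hi)
              for name, (lo, hi) in bands.items()}
    total = sum(powers.values())
    return {name: p / total for name, p in powers.items()} if total else powers
```

Relative rather than absolute power is used here because it is less sensitive to between-subject differences in overall signal amplitude, which matters in a cross-subject setting.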