SYNTHETIC DATA GENERATION FOR ANN MODELING OF THE HYDRODYNAMIC PROCESSES OF IN-SITU LEACHING

The work presents an approach to enhance the forecasting capabilities of In-Situ Leaching (ISL) processes during both the production stage and early prognosis. ISL, a crucial method for resource extraction, demands rapid on-site forecasting to guide the deployment of new technological blocks. Traditional modeling techniques, though effective, are hindered by their computational demands and network throughput requirements, particularly when dealing with substantial datasets.


Introduction
The In-Situ Leaching (ISL) method, widely used in uranium mining in Kazakhstan, involves injecting a leaching solution (usually sulfuric acid) into the subsurface, where it dissolves crystallized minerals and forms a productive solution that is brought to the surface for subsequent processing. The method has allowed Kazakhstan to become a world leader in uranium production, accounting for as much as 40% of the world share [1]. Among the 15 mining sites active in Kazakhstan, ISL provides a safe, reliable and comparatively inexpensive way of extracting metal. A major problem during mining operations is that once the solution enters the subsurface, the subsequent processes are hidden below the surface, which hinders decision making [2]. Therefore, simulation techniques and software packages have been under development to support visualization, forecasting and decision making based on geological, physical and chemical modeling [3,4].
The ISL process is implemented by drilling a network of injection and production wells. The network configuration is determined by the geological and hydrological properties of the subsoil and usually follows a row or hexagonal pattern [4,5]. In the first case, rows of production and injection wells alternate; in the latter, the injection wells form a hexagonal cell with a production well at the center. The cells are grouped into technological blocks, which together make up the deposit. The solution is injected into and extracted from the formation through filters located along the well [6].
The efficiency of production directly depends on the decisions made regarding the choice of drilling patterns (well location), filter positioning, well flow rates and other parameters that directly affect the hydrodynamics and chemistry of the leaching process [7]. These decisions are made depending on the specific geological and hydrogeological conditions in each block [8].
Decision making can be based on various techniques, as well as on the results of modeling the ISL process. Modeling of the ISL process consists of the following stages:
• geological modeling;
• hydrodynamic modeling;
• modeling of the kinetics of chemical processes of the reacting components;
• economic assessment.
Geological modeling is the process of contouring the ore body and determining the lithological and filtration properties of the formation [9]. The process typically involves geostatistical calculations to interpolate values in the interwell space from known data obtained through geophysical surveys [6]. Hydrodynamic modeling is usually carried out by numerical calculation based on conservation laws and Darcy's law, which describes the movement of fluids in a porous medium [10]. Modeling the kinetics of chemical processes is usually based on the law of mass action [10]. Economic assessment is carried out by determining the operating time of the unit and its operating and capital costs [11].
The work [12] highlights the results of mathematical modeling to create a geological and technological model of uranium ISL, as well as the use of mathematical modeling to solve geotechnological and environmental problems. The paper [13] details the application of a 3D reactive transport approach using the Hytec code at an operational scale in Kazakhstan. Hytec technology uses computing clusters to solve mass transport problems, in particular to solve chemical equations for most geochemical reactions, including aqueous complexation, redox, dissolution/precipitation and sorption. The technology also has a hydrodynamic module that describes the filtration of solutions during ISL.
Mass transport modeling is computationally intensive for 3D problems [14]. The work [14] considers a classic approach to increasing computational performance: converting sequential computations on central processors into parallel computations on high-performance graphics processors. Because the hydrodynamic processes of ISL are non-stationary, calculations must be repeated with each change in production modes. On average, a process unit operates for several years, while hydrodynamic conditions can change daily and, in some cases, hourly. Hydrodynamic modeling involves calculating the pressure distribution in the reservoir, which requires solving elliptic differential equations. On medium-power machines, such calculations can take up to two days per block, depending on the operating time of the block, the dimensions of the computational grid, the frequency of changes in well flow rates, and other factors.
Acceleration of resource-intensive calculations can potentially be achieved through the use of neural network modeling. The effective use of neural networks to speed up modeling tasks has been demonstrated in solving biological problems [15].
The authors of this article suggest that a properly trained neural network model will significantly speed up hydrodynamic modeling of ISL. The neural network is trained on a set of input and output data; because calculating pressure fields is a complex problem, a significant amount of training data is required. However, no information is available on the actual pressure distribution throughout the entire area under consideration during production. Training data can nevertheless be generated through traditional deterministic calculations based on the conservation laws and Darcy's law.
The authors of the current work propose a technique for generating synthetic data to form the input layer of neural networks in order to simulate the process of uranium mining using the ISL method.
The paper discusses the background of applying machine learning approaches to the simulation of physical processes in the "Methods" section, and the approach proposed by the authors for generating training datasets in the "Generation of synthetic data" section. The latter section describes the conventional method of hydrodynamic simulation used to generate the training data, the processing of these data, and their splitting into input and output datasets for learning purposes.

Methods
Currently, machine learning (ML) methods are used in mining, in particular in processes associated with drilling, blasting, logistics, processing and transportation of enriched metals [16]. In the mining industry, ML is used at various stages of operation, from exploration to final reclamation (Figure 1) [17]. Increasing demand for raw materials, complex geological structures of ores and declining ore grades necessitate high-quality mineralogical analysis for efficient mining operations. Mineralogical data play a critical role in estimating production duration and solving problems in ore exploration, mining and processing. Recent advances in mineral processing technology enable the use of sophisticated instrumentation and machine learning to increase the efficiency of design and operational workflows. Machine learning coupled with accurate data collection makes it possible to diagnose and optimize various enterprise parameters in real time to improve recovery and energy efficiency. Integrating machine learning can provide quantitative assessments of relationships between process units, optimize the entire process, and identify performance issues [Minerals].
Recent years have seen significant application of deep learning, a form of machine learning, in a number of areas. Unlike many other machine learning techniques, deep learning naturally takes advantage of automatic pattern discovery from data combined with modeling frameworks [18].
The use of neural networks in modeling physical processes provides advantages such as the ability to capture complex nonlinear relationships, adaptability to different types of data, learning from production data, automatic feature extraction, and parallel processing capabilities. The disadvantages include the requirement for a significant amount of training data, limited interpretability, limitations in extrapolation, and the complexity of the initial configuration [19].
The Physics-Informed Neural Networks (PINNs) approach has been widely used. PINNs are a type of universal function approximator for problems described by partial differential equations, and they include knowledge of the physical laws governing a given dataset in the learning process [20]. In particular, DeepXDE technology, developed by Lu's group at the University of Pennsylvania, is used to solve forward and inverse problems, as well as partial differential equations. Typical examples include solving the Poisson, Laplace and Helmholtz equations [21]. PINNs combine data-driven deep learning techniques with the governing equations of a physical system to simultaneously fit data and enforce fundamental laws of physics. This is achieved by incorporating physics constraints into the loss function, allowing the neural network to optimize its weights and biases to minimize both fitting errors and violations of the underlying physics.
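The idea of a loss function that penalizes both data misfit and violations of the physics can be illustrated with a minimal sketch. It is not taken from DeepXDE or from the authors' code: the function names, the `weight` parameter and the use of a five-point finite-difference Laplacian are assumptions made for illustration (actual PINNs typically obtain derivatives by automatic differentiation of the network output).

```python
def laplacian_residual(h, i, j):
    """Five-point discrete Laplacian of h at interior node (i, j), unit spacing."""
    return h[i - 1][j] + h[i + 1][j] + h[i][j - 1] + h[i][j + 1] - 4.0 * h[i][j]


def physics_informed_loss(pred, observed, weight=1.0):
    """Mean squared data misfit plus weighted mean squared PDE residual.

    pred     -- predicted field as a square list of lists
    observed -- dict mapping (row, col) to a known value (hypothetical input)
    weight   -- hypothetical balance factor between the two terms
    """
    n = len(pred)
    data_term = sum((pred[i][j] - v) ** 2 for (i, j), v in observed.items())
    data_term /= len(observed)
    interior = [(i, j) for i in range(1, n - 1) for j in range(1, n - 1)]
    physics_term = sum(laplacian_residual(pred, i, j) ** 2 for i, j in interior)
    physics_term /= len(interior)
    return data_term + weight * physics_term


# A linear field satisfies the Laplace equation, so both terms vanish.
n = 5
harmonic = [[float(i + j) for j in range(n)] for i in range(n)]
known = {(0, 0): 0.0, (4, 4): 8.0}
print(physics_informed_loss(harmonic, known))
```

A prediction that matches the known values but violates the equation at some node is still penalized through the second term, which is the property the paragraph above describes.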
In neural networks, the input layer is the initial vector that receives and processes input data. It consists of neurons (also known as nodes or units), the number of which is equal to the size of the input data. Each neuron in the input layer represents a feature or attribute of the input data. For example, when solving problems on a computational grid, the values of the input layer can be the values at the grid nodes and/or boundary and/or initial conditions.
The main function of the input layer is to pass input data forward through the network by applying weights and biases to the input values and passing them on to subsequent layers, often called hidden layers.These hidden layers then perform calculations and transformations on this data to learn and extract relevant features, which ultimately results in network output, which may be classification, regression, or some other tasks depending on the architecture and purpose of the network.
The current work describes the process of generating input layer data and final data for network training. The input layer is formed by normalizing and vectorizing the initial pressure distribution in the reservoir, while the final data is calculated by traditional numerical modeling. In other words, a set of synthetic data is generated to train a neural network to solve the problem of determining reservoir pressure at given well flow rates during uranium ISL.

Generation of synthetic data (3D domain state before and after)
A two-dimensional area of the ISL technological block is considered, with specified positions of production and injection wells (Figure 2). The initial hydraulic head at the well nodes is set by equation (1), where $h_{well}$ is the hydraulic head at the injection or production well, $q_{well}$ is the well flow rate, $\Delta V$ is the volume of the well assembly, and $K_f$ is the rock filtration coefficient. In all other nodes, the pressure is zero.
At the moment, the neural network is trained on the positions of the wells only; accordingly, the filtration characteristics are considered homogeneous and isotropic. The mass conservation law in continuum mechanics can be formulated as a partial differential equation. The change in fluid density $\rho$ with time $t$, accompanied by the flow $\vec{u}$, is described by equation (2), where $\rho$ is the fluid density, $t$ is time, $\vec{u}$ is the Darcy filtration rate, and $\delta$ is the Dirac delta function.
All fluids in the rock, including groundwater and injected solutions, are assumed to be incompressible, which means $\rho = \text{const}$. The equation therefore simplifies to the form (3). Flow in a porous medium at low velocities is described by Darcy's law (4), written in terms of the filtration rate and the absolute flow velocity. Substituting equation (4) into (3) yields equation (5). Equation (5) is elliptic and, in continuum mechanics problems, is usually solved using iterative methods.
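For reference, the chain from mass conservation to the elliptic equation can be written out in a standard textbook form. This is a sketch under stated assumptions rather than the authors' exact notation: the wells are treated as point sources with volumetric rates $q_w$ at positions $\vec{x}_w$ (taken positive where fluid is added), Darcy's law is written in terms of the hydraulic head $h$ and the filtration coefficient $K_f$, and the symbol $\vec{u}$ for the filtration rate is chosen for illustration.

```latex
\begin{align}
\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho\,\vec{u})
  &= \rho \sum_{w} q_w\,\delta(\vec{x} - \vec{x}_w)
  && \text{mass conservation with point sources} \tag{2} \\
\nabla \cdot \vec{u}
  &= \sum_{w} q_w\,\delta(\vec{x} - \vec{x}_w)
  && \text{incompressible fluid, } \rho = \text{const} \tag{3} \\
\vec{u}
  &= -K_f\,\nabla h
  && \text{Darcy's law} \tag{4} \\
\nabla \cdot \left(K_f\,\nabla h\right)
  &= -\sum_{w} q_w\,\delta(\vec{x} - \vec{x}_w)
  && \text{substituting (4) into (3)} \tag{5}
\end{align}
```

For a homogeneous and isotropic medium $K_f$ is constant, so the last line reduces to a Poisson equation for $h$, which is Laplace's equation away from the wells.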
This second-order partial differential equation is solved using semi-implicit over-relaxation and under-relaxation schemes. Over-relaxation is applied at even iteration steps, $n = 2k$ (6), and under-relaxation at odd iteration steps, $n = 2k+1$ (7). Thus, if the final result for a specific well configuration is obtained in $N$ iterations, $h^0$ is the data set for the input layer and $h^N$ is the data set of expected results.
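The alternation of over- and under-relaxation can be sketched on a small grid. The code below is an illustrative stand-in, not the authors' scheme: it uses a plain Gauss-Seidel sweep with a relaxation factor, and the factors `omega_over` and `omega_under`, the tolerance, and the treatment of wells as nodes with fixed head are all assumptions. Boundaries are no-flow, implemented by averaging only over the neighbors that exist.

```python
def solve_head(n, wells, omega_over=1.5, omega_under=0.8,
               tol=1e-8, max_iter=20000):
    """Relax the discrete Laplace equation on an n x n grid.

    wells maps (row, col) to a fixed hydraulic head at that node.
    Returns (h0, hN, iterations): the initial field, the relaxed field,
    and the number of sweeps performed.
    """
    h0 = [[0.0] * n for _ in range(n)]
    for (i, j), head in wells.items():
        h0[i][j] = head
    h = [row[:] for row in h0]
    for it in range(max_iter):
        # Even sweeps use over-relaxation, odd sweeps under-relaxation.
        omega = omega_over if it % 2 == 0 else omega_under
        max_change = 0.0
        for i in range(n):
            for j in range(n):
                if (i, j) in wells:
                    continue  # well nodes keep their prescribed head
                nbrs = [h[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < n and 0 <= b < n]
                new = (1.0 - omega) * h[i][j] + omega * sum(nbrs) / len(nbrs)
                max_change = max(max_change, abs(new - h[i][j]))
                h[i][j] = new
        if max_change < tol:
            return h0, h, it + 1
    return h0, h, max_iter


# One production and one injection well on an 11 x 11 grid (hypothetical layout).
h0, hN, sweeps = solve_head(11, {(2, 2): 7.0, (8, 8): -7.0})
print(sweeps, round(hN[2][3], 3))
```

In this sketch `h0` plays the role of the input-layer data and `hN` the expected result for one well configuration.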
Numerous data grids have been generated for training the neural network. The positions of both injection and production wells were chosen at random within the computational domain. Hydraulic head values at the grid nodes where wells reside were set to 7 and -7 meters for production and injection wells respectively, which corresponds to flow rates of 5 and -5 m³/hour. The dimensions of the grid are fixed and in the current case equal 170x170 for an area of 170x170 meters, i.e., the area of each cell in the grid is 1 m². Boundary conditions were set to no-flow Neumann.
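Random placement of wells on a fixed grid can be sketched as follows. The head values of 7 and -7 meters and the 170x170 grid follow the text; the numbers of wells per grid, the function name and the seeding are hypothetical choices made for illustration.

```python
import random


def random_well_grid(n=170, n_production=3, n_injection=6, head=7.0, seed=None):
    """Build an initial head grid with wells at distinct random nodes.

    Production wells get +head, injection wells get -head, as in the text;
    the well counts are illustrative assumptions.
    Returns (grid, wells), where wells maps (row, col) to the head value.
    """
    rng = random.Random(seed)
    # Sampling without replacement guarantees that no two wells share a node.
    cells = rng.sample(range(n * n), n_production + n_injection)
    grid = [[0.0] * n for _ in range(n)]
    wells = {}
    for k, cell in enumerate(cells):
        i, j = divmod(cell, n)
        value = head if k < n_production else -head
        grid[i][j] = value
        wells[(i, j)] = value
    return grid, wells


grid, wells = random_well_grid(seed=42)
print(len(wells), "wells placed on a", len(grid), "x", len(grid[0]), "grid")
```

Calling the function repeatedly with different seeds yields a collection of distinct initial grids, each of which can then be passed to a conventional solver.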
Consider a neural network with $L$ layers. Layer $i$ has $n_i$ neurons, where $i$ is the layer number. The input and output layers have the same number of neurons, $n_1 = n_L$, which in the context of the current task equals the number of nodes in the computational grid. The model includes the weights $w$ between all neighboring layers, the biases $b$ and the activation function $f$. Thus, the activation of each neuron is calculated using the following formula:
$$a_j^{(i)} = f\left(\sum_{k=1}^{n_{i-1}} w_{jk}^{(i)}\, a_k^{(i-1)} + b_j^{(i)}\right) \qquad (8)$$
where $a_k^{(i-1)}$ is the activation of neuron $k$ in the previous layer.
Regression problems in neural network modeling use specific activation functions that transform data into a certain range (usually from -1 to 1). For example, the hyperbolic tangent activation function is well suited for regression problems:
$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (9)$$
Data preparation is a critical step preceding the training process and has a significant impact on the efficiency and accuracy of the neural model. The main stages are: collecting and sampling data; integrating data from different sources; and data processing. The last stage includes vectorization and data normalization [22].
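The layer-by-layer activation with a hyperbolic tangent can be sketched as a plain forward pass. The layer sizes, the random initialization and the helper names are assumptions for illustration; a real model for a 170x170 grid would be far larger and would use trained weights.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Propagate x through the layers: a = tanh(W a_prev + b) for each layer."""
    a = x
    for weights, biases in layers:
        a = [math.tanh(sum(w * v for w, v in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a


# A toy network for a 4 x 4 grid: 16 inputs, 8 hidden neurons, 16 outputs.
rng = random.Random(0)
sizes = [16, 8, 16]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [0.0] * 16
x[5], x[10] = 1.0, -1.0  # normalized well values at two nodes
y = forward(x, layers)
print(len(y), min(y) >= -1.0, max(y) <= 1.0)
```

Because every layer ends in a hyperbolic tangent, the output stays within [-1, 1], which is why the expected results must be normalized to the same range before training.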
In the current work, data are collected from the results of hydrodynamic modeling. The data sets $h^0$ and $h^N$ are reduced to one-dimensional arrays $L_l$ and $E_o$, holding the input layer and the expected results, respectively.
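Reducing the two grids to one-dimensional arrays can be sketched as below. Row-major ordering and the function names are assumptions; any fixed ordering works as long as the same one is used for inputs, expected results and the later conversion back to a grid.

```python
def flatten(grid):
    """Flatten a 2D grid (list of rows) into a 1D list in row-major order."""
    return [value for row in grid for value in row]


def unflatten(vector, n_cols):
    """Rebuild a 2D grid with n_cols columns from a row-major 1D list."""
    return [vector[k:k + n_cols] for k in range(0, len(vector), n_cols)]


def make_sample(h0, hN):
    """Pair the flattened initial field with the flattened solved field."""
    return flatten(h0), flatten(hN)


# Toy 2 x 3 fields standing in for the initial and final head grids.
h0 = [[7.0, 0.0, 0.0], [0.0, 0.0, -7.0]]
hN = [[7.0, 3.1, 0.4], [-0.4, -3.1, -7.0]]
inputs, expected = make_sample(h0, hN)
print(inputs)
print(unflatten(expected, 3) == hN)
```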
As noted above, training a neural network requires normalizing both the input data and the arrays of expected results used to compute the loss function. There are a number of normalization algorithms for data processing: Min-Max Normalization, Decimal Scaling Normalization, Z-Score Normalization, Median Normalization, Sigmoid Normalization, and Tanh estimators (Figure 5). Min-Max Normalization scales data to a fixed range (usually between 0 and 1) by subtracting the minimum value and dividing by the range. Decimal Scaling Normalization shifts the decimal point of the data values to achieve normalization without altering the distribution. Z-Score Normalization rescales data to have a mean of 0 and a standard deviation of 1 by subtracting the mean and dividing by the standard deviation. Median Normalization divides each value by the median, centering the distribution around 1. Sigmoid Normalization applies a sigmoid function to squash data into the range between 0 and 1 and is often used in logistic regression to handle outliers. Tanh Normalization applies the hyperbolic tangent function to map data to the range [-1, 1], which is useful for data with negative values.
For the current task, the activation function produces values in the range [-1; 1], where -1 corresponds to the minimum flow rate at the wells and 1 to the maximum. Taking into account the study of the effectiveness of various normalization methods in neural network modeling conducted in [23], it is reasonable to use the Min-Max algorithm to normalize the hydrodynamic data. Min-Max normalization scales the data to a fixed range between -1 and 1 while maintaining the relative relationships between the original data points, so that data obtained through AI prediction can be converted back to the original format.
Using Min-Max normalization, the input layer data is prepared using the following formula:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}\,(high - low) + low \qquad (10)$$
for each value $x$ from the $L_l$ array, where $x_{min}$ and $x_{max}$ are the minimum and maximum values of the array, $high = 1$ and $low = -1$. The model is to be trained on a fixed grid size; grids of other dimensions are assumed to be resized to match the number of nodes in the input and output layers, which can be achieved through ordinary interpolation.
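Min-Max scaling to the range [-1, 1], together with the inverse transform needed to convert predictions back to physical values, can be sketched as follows. The function names and the choice to return the original bounds alongside the scaled data are illustrative assumptions.

```python
def min_max_scale(values, low=-1.0, high=1.0):
    """Scale values linearly so that min maps to low and max maps to high.

    Returns the scaled list and the (min, max) bounds needed to invert it.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant array")
    span = v_max - v_min
    scaled = [(v - v_min) / span * (high - low) + low for v in values]
    return scaled, (v_min, v_max)


def min_max_restore(scaled, bounds, low=-1.0, high=1.0):
    """Invert min_max_scale using the stored (min, max) bounds."""
    v_min, v_max = bounds
    return [(s - low) / (high - low) * (v_max - v_min) + v_min for s in scaled]


heads = [-7.0, 0.0, 3.5, 7.0]
scaled, bounds = min_max_scale(heads)
print(scaled)                           # [-1.0, 0.0, 0.5, 1.0]
print(min_max_restore(scaled, bounds))  # [-7.0, 0.0, 3.5, 7.0]
```

Storing the bounds with each sample, or fixing them globally for the whole dataset, is what makes it possible to map a network output back to hydraulic head values.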
The eventual efficiency advantage of a pre-trained neural network over the conventional method can be estimated through Big-O notation of computational complexity.
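As a rough sketch of such an estimate, the two costs can be written per change of hydrodynamic regime. The symbols are assumptions introduced here: $M = n_x n_y$ is the number of grid nodes, $N$ is the number of relaxation sweeps until convergence, and the network is taken to be fully connected with layer sizes $n_1, \dots, n_L$.

```latex
\begin{align}
C_{\text{solver}} &= O(N \cdot M), \qquad M = n_x\, n_y \\
C_{\text{ANN}} &= O\!\left(\sum_{i=1}^{L-1} n_i\, n_{i+1}\right), \qquad n_1 = n_L = M
\end{align}
```

Under these assumptions the comparison depends on how $N$ grows with the grid size and on the widths of the hidden layers, so the estimate should be evaluated for the specific architecture that is eventually trained.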

Conclusion
Current modeling techniques provide a set of tools to determine the dynamics of the ISL process during the production stage as well as during early prognosis. The continuous development of a deposit requires a quick on-site forecasting capability before new technological blocks are launched. Traditional methods, however, are computationally expensive and may require cluster computing, which cannot be implemented at every deposit, or, in the case of remote computing, high network throughput because of the large size of the datasets.
The introduction of AI technologies such as neural networks might significantly decrease calculation times via forward propagation through a trained neural model. However, steps are required to convert a traditional numerical method dataset into a form appropriate for neural modeling.
The absence of training data during production, with the necessary parameters hidden underground, makes training an AI model a challenge.
In the current work, a technique has been proposed for generating training data for the most resource-intensive CFD problems arising during ISL modeling. Traditional numerical modeling has been used to generate training datasets with inputs and expected outputs, in this case specifically for varying well network patterns.
Further work has been conducted to convert the obtained data into a structure suitable for a neural network. The data have then been normalized to correspond to the data ranges imposed by the activation functions used in the network.
Neural networks, once trained, can dramatically cut computation times through forward propagation: the calculation passes forward through a configured network only once. In contrast, conventional methods require solving an elliptic equation iteratively each time the hydrodynamic regime changes, which occurs daily over the several years of a technological block's operation.
The next step of the current research is to train the network and to obtain predictions with sufficient accuracy.

Figure 1. Share of published articles on the use of ML at various stages of field exploitation [17]

Figure 2. The domain under consideration

Figure 3. Some examples of generated grids

Figure 4. Neural network configuration example

Figure 5. MSE errors when using various data normalization methods [23]

Figure 6.Schematic representation of data preparation for neural network