FEATURE SELECTION METHODS FOR LSTM-BASED RIVER WATER LEVEL AND DISCHARGE FORECASTING
DOI:
https://doi.org/10.37943/21EHLH9882
Keywords:
LSTM, feature selection, water level forecasting, ERA5-Land, PCA, flood monitoring
Abstract
Accurate forecasting of river discharge and water levels is essential for effective water resource management, flood mitigation, and public safety. This study compares correlation-based and PCA-based feature selection methods for LSTM forecasting models in the Uba River basin at the city of Shemonaiha in the East Kazakhstan region. The dataset spans 1995 to 2021, with 1995 to 2019 used for training and validation and 2020 to 2021 for testing. Both feature selection methods reduced the original predictor set to 13 features while generally maintaining predictive accuracy. An ensemble of 10 LSTM models was trained on 60-day input sequences to forecast discharge and water levels over a 10-day horizon; the ensemble reduced the variance caused by random initialization and stabilized predictions. Performance was evaluated using the Nash-Sutcliffe Efficiency (NSE). Correlation-based selection performed comparably to the full-feature baseline on the 2020 test set, suggesting that removing highly correlated predictors did not reduce the model's short-term forecasting capacity. The model with PCA-selected features lagged slightly at longer lead times in 2020 but showed advantages at most lead times in the 2021 forecasts. However, overall predictive performance declined in 2021 compared to 2020, indicating that hydrological conditions in 2021 deviated more from the historical training record and suggesting the need to update the model with relevant training data. Both feature selection methods reduced dimensionality while preserving predictive performance, though neither was universally superior across all forecast lead times. These results emphasize the value of systematic feature selection in hydrological modeling and highlight the importance of model adaptability to evolving environmental conditions.
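The abstract names three building blocks that are easy to illustrate: the Nash-Sutcliffe Efficiency, a correlation-based filter for redundant predictors, and 60-day input windows paired with 10-day targets. The following Python sketch shows one plausible form of each using NumPy only. It is not the authors' implementation: the function names, the greedy filtering rule, the 0.9 correlation threshold, and the synthetic data are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance


def drop_correlated_features(X, names, threshold=0.9):
    """Greedy filter (assumed rule): keep a column only if its absolute
    Pearson correlation with every already kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


def make_windows(features, target, n_input=60, n_output=10):
    """Pair each n_input-step feature window with the next n_output targets."""
    X, y = [], []
    for start in range(len(target) - n_input - n_output + 1):
        X.append(features[start:start + n_input])
        y.append(target[start + n_input:start + n_input + n_output])
    return np.array(X), np.array(y)


# Synthetic demonstration data (hypothetical, not from the study).
t = np.arange(200)
a = np.sin(t / 10.0)
b = 2.0 * a + 1.0          # linear copy of `a`, so it is redundant
c = np.cos(t / 10.0)       # only weakly correlated with `a`
features = np.column_stack([a, b, c])

selected = drop_correlated_features(features, ["a", "b", "c"])
X_win, y_win = make_windows(features, a)
print(selected, X_win.shape, y_win.shape)
```

A perfect forecast gives an NSE of 1, and a forecast that always predicts the observed mean gives 0, which is why NSE is read as skill relative to a mean-value baseline. In a multi-step setup like the one described, NSE would typically be computed separately for each of the 10 lead times.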
License
Copyright (c) 2025. Articles are open access under the Creative Commons License.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.