COMPARATIVE ANALYSIS OF DEEP LEARNING MODELS FOR CHEST DISEASE DIAGNOSIS USING NIH X-RAY DATASET

Dinara Kaibassova; Kalizhan Akhmetov

doi:10.37943/25DECX4995

Authors

Dinara Kaibassova, Astana IT University, https://orcid.org/0000-0002-8410-7758
Kalizhan Akhmetov, Astana IT University, https://orcid.org/0009-0008-5488-0329

DOI:

https://doi.org/10.37943/25DECX4995

Keywords:

chest X-ray, deep learning, convolutional neural networks, ResNet50, DenseNet121, medical image analysis, diagnostic accuracy, transfer learning, AUC-ROC, NIH Chest X-ray Dataset

Abstract

The integration of deep learning into medical image analysis has significantly advanced computer-aided diagnosis, particularly in chest radiography. However, selecting a convolutional neural network (CNN) architecture for reliable disease classification remains challenging because of data variability, annotation quality, and architectural trade-offs. This study presents a comparative evaluation of three CNN models (DenseNet121, ResNet50, and a custom SimpleCNN) for automated detection of pulmonary infiltration using a subset of the NIH Chest X-ray dataset. To keep computation feasible, only one archive segment of the dataset was used, and preprocessing consisted of filtering, normalization, and resizing images to 224×224 pixels. Models were trained with cross-entropy loss and the Adam optimizer for five epochs and evaluated on a 20% test split. Performance was assessed with several diagnostic metrics that are standard in medical imaging (accuracy, precision, recall, F1-score, and AUC-ROC) to give a fuller picture than overall accuracy alone. ResNet50 achieved the highest test accuracy and the most balanced trade-off between precision and recall, outperforming DenseNet121 and SimpleCNN. Although the results are moderate overall, they indicate that pre-trained deep architectures generalize more effectively than shallow networks under limited data conditions. The study underscores the impact of dataset size, image resolution, and label quality on diagnostic outcomes. These results form a methodological baseline for further research, in which improvements are expected from training on the complete dataset, using full-resolution images, and tuning model hyperparameters. Ultimately, this comparative framework contributes to identifying suitable CNN architectures for future clinical diagnostic support systems.
Additionally, this study highlights the limitations of small-scale datasets and emphasizes the importance of data augmentation and extended training strategies for improving model performance in medical imaging tasks.
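
The five metrics named in the abstract can be illustrated with a minimal sketch for the binary case (infiltration present or absent). This is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, and the numbers are not results from the study. AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive).

    The threshold of 0.5 is an assumed default, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair. This O(n^2) form is meant for
    clarity on small examples, not for large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (not taken from the NIH dataset).
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

metrics = classification_metrics(toy_labels, toy_scores)
auc = auc_roc(toy_labels, toy_scores)
print(metrics, auc)
```

Precision, recall and F1 depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason to report them together rather than relying on accuracy alone.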

Author Biographies

Dinara Kaibassova, Astana IT University

PhD, Associate Professor, School of Software Engineering

Kalizhan Akhmetov, Astana IT University

Master’s student, School of Software Engineering

