COMPARATIVE ANALYSIS OF DEEP LEARNING MODELS FOR CHEST DISEASE DIAGNOSIS USING NIH X-RAY DATASET
DOI:
https://doi.org/10.37943/25DECX4995
Keywords:
chest X-ray, deep learning, convolutional neural networks, ResNet50, DenseNet121, medical image analysis, diagnostic accuracy, transfer learning, AUC-ROC, NIH Chest X-ray Dataset
Abstract
The integration of deep learning into medical image analysis has significantly advanced computer-aided diagnosis, particularly in chest radiography. However, selecting a convolutional neural network (CNN) architecture for reliable disease classification remains a critical challenge because of data variability, annotation quality, and architectural trade-offs. This study presents a comparative evaluation of three CNN models (DenseNet121, ResNet50, and a custom SimpleCNN) for automated detection of pulmonary infiltration using a subset of the NIH Chest X-ray dataset. To keep computation feasible, only one archive segment of the dataset was used; preprocessing comprised filtering, normalization, and resizing images to 224×224 pixels. The models were trained with cross-entropy loss and the Adam optimizer for five epochs and evaluated on a 20% test split. Performance was assessed with several diagnostic metrics that are essential in medical imaging (accuracy, precision, recall, F1-score, and AUC-ROC) to give a fuller picture than overall accuracy alone. ResNet50 achieved the highest test accuracy and the most balanced trade-off between precision and recall, outperforming DenseNet121 and SimpleCNN. Although the absolute results were moderate, the findings indicate that pre-trained deep architectures generalize more effectively than shallow networks under limited-data conditions. The study underscores the impact of dataset size, image resolution, and label quality on diagnostic outcomes. These results form a methodological baseline for further research, in which improvements are expected from training on the complete dataset, using full-resolution images, and refining model hyperparameters. Ultimately, this comparative framework contributes to identifying suitable CNN architectures for future clinical diagnostic support systems.
Additionally, this study highlights the limitations of small-scale datasets and emphasizes the importance of data augmentation and extended training strategies for improving model performance in medical imaging tasks.
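As a rough illustration of the five metrics named in the abstract, the sketch below computes accuracy, precision, recall, F1-score, and AUC-ROC for a binary task in plain Python. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are assumptions made for this example only; they are not taken from the study and do not reproduce its results. AUC-ROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Return accuracy, precision, recall, F1 and AUC-ROC for binary labels.

    y_true holds 0/1 labels; y_score holds predicted probabilities for class 1.
    The threshold is an illustrative assumption, not a value from the study.
    """
    # Turn scores into hard predictions at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC-ROC as a pairwise ranking statistic: the share of
    # (positive, negative) pairs ranked correctly, ties counting 0.5.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
                   for p in pos for n in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auc_roc": auc}


# Toy data for illustration only (not results from the study).
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8, 0.1]
toy_metrics = binary_metrics(toy_labels, toy_scores)
```

On this toy input, accuracy, precision, recall, and F1-score are all 0.75, while AUC-ROC is 0.8125. The difference shows why a threshold-independent ranking measure is reported alongside the thresholded metrics.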
License
Copyright (c) 2026 Articles are open access under the Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish a manuscript in this journal agree to the following terms:
- The authors reserve the right to authorship of their work and transfer to the journal the right of first publication under the terms of the Creative Commons Attribution License, which allows others to freely distribute the published work with a mandatory link to the original work and the first publication of the work in this journal.
- Authors have the right to conclude independent additional agreements that relate to the non-exclusive distribution of the work in the form in which it was published by this journal (for example, to post the work in the electronic repository of the institution or publish as part of a monograph), providing the link to the first publication of the work in this journal.
- Other terms stated in the Copyright Agreement.