OVERVIEW OF TRANSFORMER-BASED MODELS FOR MEDICAL IMAGE SEGMENTATION
DOI: https://doi.org/10.37943/13BKBF2003

Keywords: Computer Vision, Transformers, Image Processing, Premedical Diagnostics, Segmentation

Abstract
Premedical diagnostics is the process of examining patient survey results. Correct premedical diagnostics can improve patient management and reduce the burden on the medical sector. Interpreting medical images such as computed tomography and X-ray scans is an obligatory step before further treatment; however, the shortage of clinicians causes delays at this step. We reviewed two state-of-the-art algorithms proposed for medical image segmentation: TransUnet and Swin-Unet. We conducted a theoretical comparison of the algorithms in terms of their applicability to premedical diagnostics, considering both segmentation quality and training speed. The comparison is based on the original source code provided by the authors of the respective articles. We chose these two algorithms because they share a similar U-shaped architecture, are highly cited, and show competitive DICE scores on images of various human organs. Some architectural features were also important: both models inherit key elements of U-net. TransUnet is a hybrid Transformer-CNN model; it consists of a Transformer encoder and a convolutional decoder, with some additional computation required in the bottleneck. Swin-Unet is a fully Transformer-based model. These architectural differences lead to different numbers of trainable parameters. Deeper architectures with more parameters usually show better performance; according to our review, however, Swin-Unet has fewer parameters yet achieves a better DICE score and Hausdorff distance. It should be noted that the balance between false positive and false negative predictions is important in medical image processing: it is crucial to avoid overloading the medical sector while also not missing any sick patients. Precision and recall can be used to evaluate the ratio of these two kinds of incorrect predictions. Therefore, we also examined published results on caries segmentation, where precision and DICE were reported. In this specific case, TransUnet shows better DICE and recall values but worse precision.
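To make the evaluation criteria above concrete, below is a minimal, illustrative Python sketch of the metrics the comparison relies on (DICE, Hausdorff distance, precision, and recall), computed on binary segmentation masks. This is not the evaluation code from the TransUnet or Swin-Unet repositories; the mask shapes, the smoothing constant `eps`, and all function names are assumptions made for demonstration.

```python
# Illustrative metrics for binary segmentation masks (a sketch, not the
# authors' evaluation code). Assumes 2-D boolean NumPy arrays.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_score(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    """DICE = 2|P ∩ G| / (|P| + |G|); 1.0 means a perfect overlap."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def precision_recall(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6):
    """Precision = TP/(TP+FP) falls with false alarms (overloaded clinics);
    recall = TP/(TP+FN) falls with missed findings (missed patients)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return (tp + eps) / (tp + fp + eps), (tp + eps) / (tp + fn + eps)

def hausdorff_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance (in pixels) between the foreground
    point sets of the two masks; lower means closer contours."""
    p, g = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])

# Toy example: a prediction shifted one pixel down from the ground truth.
gt = np.zeros((64, 64), dtype=bool)
gt[20:40, 20:40] = True
pred = np.zeros_like(gt)
pred[21:41, 20:40] = True
prec, rec = precision_recall(pred, gt)
print(f"DICE: {dice_score(pred, gt):.3f}")                   # ~0.95
print(f"precision: {prec:.3f}, recall: {rec:.3f}")           # ~0.95 each
print(f"Hausdorff: {hausdorff_distance(pred, gt):.1f} px")   # 1.0
```

In this toy run the one-pixel shift degrades precision and recall symmetrically; in practice, a model that over-segments accumulates false positives (lower precision, unnecessary referrals), while one that under-segments accumulates false negatives (lower recall, missed patients), which is the trade-off the caries segmentation comparison above highlights.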
References
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., & Jackel, L.D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541-551. https://doi.org/10.1162/neco.1989.1.4.541
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90. https://doi.org/10.1145/3065386
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., & Fei-Fei, L. (2009, June). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition (pp. 248-255). IEEE. https://doi.org/10.1109/CVPR.2009.5206848
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. https://doi.org/10.48550/arXiv.2010.11929
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18 (pp. 234-241). Springer International Publishing. https://doi.org/10.1007/978-3-319-24574-4_28
Cao, H., Wang, Y., Chen, J., Jiang, D., Zhang, X., Tian, Q., & Wang, M. (2023, February). Swin-Unet: Unet-like pure transformer for medical image segmentation. In Computer Vision–ECCV 2022 Workshops: Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part III (pp. 205-218). Cham: Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-25066-8_9
Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., ... & Zhou, Y. (2021). Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. https://doi.org/10.48550/arXiv.2102.04306
Jia, Q., & Shu, H. (2022, July). BiTr-Unet: A CNN-Transformer combined network for MRI brain tumor segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 7th International Workshop, BrainLes 2021, Held in Conjunction with MICCAI 2021, Virtual Event, September 27, 2021, Revised Selected Papers, Part II (pp. 3-14). Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-09002-8_1
Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.A., ... & Jodoin, P.M. (2018). Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Transactions on Medical Imaging, 37(11), 2514-2525. https://doi.org/10.1109/TMI.2018.2837502
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., ... & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012-10022).
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2117-2125).
Multi-Atlas Labeling Beyond the Cranial Vault - Workshop and Challenge. Synapse multi-organ computed tomography dataset [Data set]. Synapse.org. https://repo-prod.prod.sagebase.org/repo/v1/doi/locate?id=syn3193805&type=ENTITY. https://doi.org/10.7303/SYN3193805
Chen et al. (2021). TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. GitHub. https://github.com/Beckschen/TransUNet
Cao et al. (2022). Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. GitHub. https://github.com/HuCaoFighting/Swin-Unet
Ying, S., Wang, B., Zhu, H., Liu, W., & Huang, F. (2022). Caries segmentation on tooth X-ray images with a deep network. Journal of Dentistry, 119, 104076. https://doi.org/10.1016/j.jdent.2022.104076
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. arXiv preprint arXiv:1803.02155. https://doi.org/10.48550/arXiv.1803.02155