An Analytical Study of Vision Transformer Models for Medical Image Classification and Segmentation

Authors

  • Wafaa Ayoub Kassara, Accounting Technologies Department, Technical College of Management, Baghdad Middle Technical University

DOI:

https://doi.org/10.63964/JATUC.43.1.2026.17

Keywords:

Vision Transformer (ViT), Medical Image Analysis, Image Classification, Image Segmentation, Unified Models.

Abstract

This study examines the importance of analyzing medical images through precise segmentation and localization in order to capture fine detail. It presents a model that performs image classification and segmentation within a single integrated framework, motivated by the fact that traditional deep learning methods perform image segmentation, analysis, and classification separately. The unified model, based on the Vision Transformer (ViT), uses an architecture built on a shared encoder and two decoders. A single unified ViT encoder creates a rich, integrated representation of the input image, which is then used concurrently by dedicated task-specific decoders: progressive upsampling for segmentation and a simple head for classification. This approach leverages the features common to both tasks to improve efficiency and specificity while reducing architectural redundancy. In extensive testing on a standard set of medical imaging data, the model outperformed currently used hybrid architectures and single-task models in both accuracy and efficiency. By integrating segmentation and classification into a single system, the model represents significant progress toward more efficient, user-friendly, and effective AI systems that can improve clinical workflow and support better diagnostic outcomes for patients.
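
The shared-encoder, two-decoder layout described in the abstract can be illustrated by tracing tensor shapes through it. The following is a minimal sketch in plain Python, not the authors' implementation. It involves no learned weights, and every name and number in it (`UnifiedViTConfig`, the 224-pixel input, 16-pixel patches, 768-wide tokens, the class counts, the halving of channels at each 2x upsampling stage) is an assumption chosen for illustration and does not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedViTConfig:
    # All values are illustrative assumptions, not taken from the study.
    image_size: int = 224      # input height and width in pixels
    patch_size: int = 16       # side of each square patch (power of two here)
    embed_dim: int = 768       # width of each token produced by the encoder
    num_classes: int = 4       # image-level classification labels
    num_seg_classes: int = 2   # per-pixel segmentation labels


def _grid_side(cfg):
    """Number of patches along one side of the image."""
    if cfg.image_size % cfg.patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if cfg.patch_size & (cfg.patch_size - 1) != 0:
        raise ValueError("this sketch assumes patch_size is a power of two")
    return cfg.image_size // cfg.patch_size


def encoder_output_shape(cfg):
    """Shape (tokens, embed_dim) of the shared encoder's patch tokens."""
    side = _grid_side(cfg)
    return (side * side, cfg.embed_dim)


def segmentation_decoder_shapes(cfg):
    """Shapes (channels, height, width) through progressive 2x upsampling."""
    side = _grid_side(cfg)
    channels = cfg.embed_dim
    shapes = [(channels, side, side)]  # tokens reshaped into a 2-D grid
    while side < cfg.image_size:
        side *= 2                                      # double the resolution
        channels = max(channels // 2, cfg.num_seg_classes)
        shapes.append((channels, side, side))
    # Final projection to one score map per segmentation class.
    shapes.append((cfg.num_seg_classes, side, side))
    return shapes


def classification_head_shape(cfg):
    """Pooled tokens pass through one linear layer: embed_dim -> num_classes."""
    return (cfg.num_classes,)


if __name__ == "__main__":
    cfg = UnifiedViTConfig()
    print("shared encoder tokens:", encoder_output_shape(cfg))
    for stage, shape in enumerate(segmentation_decoder_shapes(cfg)):
        print("segmentation stage", stage, shape)
    print("classification logits:", classification_head_shape(cfg))
```

Both decoders read the same encoder output, so the encoder runs once per image. Under these assumptions the segmentation branch restores full resolution in four doubling steps, while the classification branch reduces the tokens to a single vector of logits.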

Published

2026-02-15