TransCeption: Enhancing Medical Image Segmentation with an Inception-Like Transformer Design for Efficient Feature Fusion

Azad, Reza and Jia, Yiwei and Aghdam, Ehsan Khodapanah and Cohen-Adad, Julien and Merhof, Dorit (2025) TransCeption: Enhancing Medical Image Segmentation with an Inception-Like Transformer Design for Efficient Feature Fusion. Computational Visual Media, 11 (5). pp. 1079-1095. ISSN 2096-0433, 2096-0662


Abstract

While CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness, they are limited in their ability to capture long-range dependencies. Transformer-based approaches currently prevail because they enlarge the receptive field to model global contextual correlations. To further extract rich representations, some extensions of U-Net employ multi-scale feature extraction and fusion modules to obtain improved performance. Inspired by this idea, we propose TransCeption for medical image segmentation, a pure transformer-based U-shaped network that incorporates an inception-like module in the encoder and adopts a contextual bridge for better feature fusion. The design proposed in this work is based on three core principles. (i) The patch merging module in the encoder is redesigned as ResInception Patch Merging (RIPM). The Multi-Branch (MB) transformer has the same number of branches as RIPM has outputs. Combining the two modules enables the model to capture a multi-scale representation within a single stage. (ii) We apply an Intra-stage Feature Fusion (IFF) module after the MB transformer to enhance the aggregation of feature maps from all branches, with a particular focus on the interaction between the different channels at all scales. (iii) In contrast to a bridge that contains only token-wise self-attention, we propose a Dual Transformer Bridge that also includes channel-wise self-attention to exploit correlations between scales at different stages from a dual perspective. Extensive experiments on multi-organ and skin lesion segmentation tasks show the superiority of TransCeption over previous work. The code is publicly available on GitHub.
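
The distinction between token-wise and channel-wise self-attention drawn in principle (iii) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of identity projections (queries, keys and values all equal to the input) and the scaling factors are simplifying assumptions made here for illustration only. The sketch shows only that token-wise attention builds an N x N map over spatial tokens, while channel-wise attention builds a C x C map over feature channels.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def token_attention(x):
    """Token-wise self-attention on x of shape (N tokens, C channels).

    Assumption: Q = K = V = x (no learned projections), single head.
    """
    n, c = x.shape
    attn = softmax(x @ x.T / np.sqrt(c), axis=-1)  # (N, N): token-to-token weights
    return attn @ x, attn                          # output has shape (N, C)


def channel_attention(x):
    """Channel-wise self-attention on x of shape (N tokens, C channels).

    Assumption: Q = K = V = x; scaling by sqrt(N) is an illustrative choice.
    """
    n, c = x.shape
    attn = softmax(x.T @ x / np.sqrt(n), axis=-1)  # (C, C): channel-to-channel weights
    return x @ attn.T, attn                        # output has shape (N, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))  # hypothetical input: 6 tokens, 4 channels
    out_t, a_t = token_attention(x)
    out_c, a_c = channel_attention(x)
    print(a_t.shape, a_c.shape)      # (6, 6) (4, 4)
    print(out_t.shape, out_c.shape)  # (6, 4) (6, 4)
```

Both functions return an output of the same shape as the input, so under these assumptions the two views could be applied in sequence or in parallel; how the Dual Transformer Bridge actually combines them is described in the paper, not here.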

Item Type: Article
Uncontrolled Keywords: Transformers; Merging; Decoding; Image segmentation; Computational modeling; Bridges; Computer architecture; Semantics; Medical diagnostic imaging; Feature extraction; transformer; medical image segmentation; multi-scale feature fusion; inception
Subjects: 000 Computer science, information & general works > 004 Computer science
Divisions: Informatics and Data Science > Department Computational Life Science > Chair of Image Analysis and Computer Vision (Prof. Dr.-Ing. Dorit Merhof)
Depositing User: Dr. Gernot Deinzer
Date Deposited: 06 May 2026 08:07
Last Modified: 06 May 2026 08:07
URI: https://pred.uni-regensburg.de/id/eprint/66925
