Pure Vision Transformer (CT-ViT) with Noise2Neighbors Interpolation for Low-Dose CT Image Denoising

Abstract: Convolutional neural networks (CNNs) have been used for a wide variety of deep learning applications, especially in computer vision. For medical image processing, researchers have identified several challenges associated with CNNs: the generation of less informative features, limitations in capturing both high- and low-frequency information within feature maps, and the computational cost incurred when enlarging receptive fields by deepening the network. Transformers have emerged as an approach to address these limitations in medical image analysis. Preserving all spatial details of medical images is necessary to ensure accurate patient diagnosis. Hence, this research introduces a pure Vision Transformer (ViT) denoising network for medical image processing, specifically for low-dose computed tomography (LDCT) image denoising. The proposed model follows a U-Net framework that contains ViT modules integrated with a Noise2Neighbor (N2N) interpolation operation. Five different datasets containing LDCT and normal-dose CT (NDCT) image pairs were used in the experiments. To test the efficacy of the proposed model, the quantitative and visual results are compared among CNN-based (BM3D, RED-CNN, DRL-E-MP), hybrid CNN-ViT-based (TED-Net), and the proposed pure ViT-based denoising models. The findings of this study show...
Source: Journal of Digital Imaging - Category: Radiology - Source Type: research