Conditional-Based Transformer Network With Learnable Queries for 4D Deformation Forecasting and Tracking

Real-time motion management for image-guided radiation therapy plays an important role in accurate dose delivery. Forecasting future 4D deformations from in-plane image acquisitions is fundamental for accurate dose delivery and tumor targeting. However, anticipating visual representations is challenging, hindered by hurdles such as prediction from limited dynamics and the high dimensionality inherent to complex deformations. Moreover, existing 3D tracking approaches typically require both template and search volumes as inputs, which are not available during real-time treatments. In this work, we propose an attention-based temporal prediction network in which features extracted from input images are treated as tokens for the predictive task. We further employ a set of learnable queries, conditioned on prior knowledge, to predict the future latent representation of deformations. Specifically, the conditioning scheme is based on estimated time-wise prior distributions computed from future images available during the training stage. Finally, we propose a new framework that addresses temporal 3D local tracking from cine 2D images by employing latent vectors as gating variables to refine the motion fields over the tracked region. The tracker module is anchored on a 4D motion model, which provides both the latent vectors and the volumetric motion estimates to be refined. Our approach avoids auto-regression and leverages spatial transformation...
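The two core mechanisms in the abstract, learnable queries cross-attending to image tokens and latent vectors acting as gates on a coarse motion estimate, can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the dimensions, the additive conditioning on a prior mean, and the sigmoid gate are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16         # feature dimension (illustrative)
n_tokens = 8   # tokens extracted from the input cine 2D images
n_queries = 4  # learnable queries predicting future latents

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Placeholder image-feature tokens (in the paper, encoder outputs).
tokens = rng.standard_normal((n_tokens, d))

# Learnable queries, conditioned here by simply adding a prior mean
# vector -- a stand-in for the time-wise prior distributions the
# abstract describes (assumption).
queries = rng.standard_normal((n_queries, d))
prior_mean = rng.standard_normal((d,))
cond_queries = queries + prior_mean

# Cross-attention: conditioned queries attend over the image tokens
# to produce predicted future latent representations.
attn = softmax(cond_queries @ tokens.T / np.sqrt(d), axis=-1)
future_latents = attn @ tokens

# Latent-vector gating: a sigmoid of the predicted latents blends a
# coarse motion estimate (from the 4D motion model) with the latents
# to yield a refined motion field (hypothetical gating form).
coarse_motion = rng.standard_normal((n_queries, d))
gate = 1.0 / (1.0 + np.exp(-future_latents))
refined_motion = gate * coarse_motion + (1.0 - gate) * future_latents
```

In a full model the queries and projections would be trained parameters and the attention would be a multi-head transformer decoder layer; the sketch only shows the data flow from tokens to gated motion refinement.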
Source: IEEE Transactions on Medical Imaging - Category: Biomedical Engineering Source Type: research