Curation of myeloma observational study MALIMAR using XNAT: solving the challenges posed by real-world data

Conclusions
MALIMAR exemplifies the vital role that curation plays in machine learning studies that use real-world data. A research repository such as XNAT facilitates day-to-day management, ensures robustness and consistency and enhances the value of the final dataset. The types of process described here will be vital for future large-scale multi-institutional and multi-national imaging projects.

Critical relevance statement
This article showcases innovative data curation methods using a state-of-the-art image repository platform; such tools will be vital for managing the large multi-institutional datasets required to train and validate generalisable ML algorithms and future foundation models in medical imaging.

Key points
• Heterogeneous data in the MALIMAR study required the development of novel curation strategies.
• Correction of multiple problems affecting the real-world data was successful, but implications for machine learning are still being evaluated.
• Modern image repositories have rich application programming interfaces enabling data enrichment and programmatic QA, making them much more than simple “image marts”.
Source: Insights into Imaging - Category: Radiology Source Type: research