Sensors, Vol. 24, Pages 1634: Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets

Sensors, doi: 10.3390/s24051634

Authors: Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool

Advances in data acquisition, storage, and processing have led to rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of a disease and optimizing treatment. The need to integrate data from multiple sources is especially pronounced in complex diseases such as cancer, where it enables precision medicine and personalized treatment. This work proposes the Multimodal Integration of Oncology Data System (MINDS): a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS consolidates over 41,000 cases from across repositories while achieving a high compression ratio relative to the 3.78 PB source data, and it delivers sub-5-second query response times for interactive exploration. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to empower researchers with...
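The patient-centric design described above, where modality-specific metadata tables share a common case identifier and cohorts are assembled by joining across them, can be sketched in a few lines. This is a minimal illustration, not the actual MINDS schema or API: the table names, fields, and sample cases below are hypothetical.

```python
import sqlite3

# Hypothetical patient-centric metadata store: one table per modality,
# all keyed by a shared case_id (schema and data are illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE clinical  (case_id TEXT PRIMARY KEY, diagnosis TEXT, age INTEGER);
CREATE TABLE radiology (case_id TEXT, modality TEXT, series_uid TEXT);
CREATE TABLE pathology (case_id TEXT, slide_id TEXT, stain TEXT);
""")
cur.executemany("INSERT INTO clinical VALUES (?, ?, ?)", [
    ("CASE-01", "LUAD", 64),
    ("CASE-02", "BRCA", 51),
    ("CASE-03", "LUAD", 72),
])
cur.executemany("INSERT INTO radiology VALUES (?, ?, ?)", [
    ("CASE-01", "CT", "SER-001"),
    ("CASE-03", "CT", "SER-003"),
])
cur.executemany("INSERT INTO pathology VALUES (?, ?, ?)", [
    ("CASE-01", "SLIDE-001", "H&E"),
    ("CASE-02", "SLIDE-002", "H&E"),
    ("CASE-03", "SLIDE-003", "H&E"),
])

# Build a multimodal cohort: LUAD cases that have both a radiology
# series and a pathology slide, ready for paired ML training data.
cohort = cur.execute("""
    SELECT c.case_id, r.series_uid, p.slide_id
    FROM clinical c
    JOIN radiology r ON r.case_id = c.case_id
    JOIN pathology p ON p.case_id = c.case_id
    WHERE c.diagnosis = 'LUAD'
    ORDER BY c.case_id
""").fetchall()
print(cohort)
```

CASE-02 is excluded because it lacks an imaging series, showing how a metadata join naturally filters to cases with complete multimodal coverage.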
Source: Sensors - Category: Biotechnology - Source Type: research