publications | Camillo Maria Caruso

2026

ICIAP

Mask-Aware Transformers Enable Robust Learning from Incomplete Volumetric Medical Imaging

Camillo Maria Caruso, Riccardo Bruni, and Valerio Guarrasi

In Image Analysis and Processing - ICIAP 2025 Workshops, 2026

Incomplete or sparsely sampled volumetric scans are pervasive across medical imaging modalities: patient motion, shortened acquisition protocols, and hardware constraints frequently result in missing slices that undermine downstream analysis. Conventional deep learning pipelines either discard these studies or rely on voxel-wise interpolation, potentially introducing artefactual signals. We present a Mask-Aware Vision Transformer (MAViT), a modality-agnostic architecture that learns directly from incomplete 3D volumes without synthetic reconstruction. MAViT leverages a binary slice-availability mask to identify corrupted patches and selectively suppress their contribution within each self-attention block, effectively guiding feature aggregation while mitigating the impact of missing data. To benchmark robustness, we synthetically corrupt brain MRI volumes from the Alzheimer’s Disease Neuroimaging Initiative with different slice-drop rates. Despite being trained on heterogeneous missing-slice patterns, MAViT achieves state-of-the-art performance on Alzheimer’s disease classification, surpassing 3D interpolation-based and 2D slice-wise baselines. These findings indicate that mask-aware modelling offers a valuable approach to learning from incomplete volumetric data, readily extending beyond brain MRI to other imaging modalities.
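The synthetic corruption step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: uniform random slice selection, zero-filling of dropped slices, and the helper names `drop_slices` and `apply_mask` are all assumptions made here.

```python
import random

def drop_slices(num_slices, drop_rate, seed=0):
    """Return a binary availability mask: 1 = slice kept, 0 = slice dropped.

    Assumption: slices are dropped uniformly at random.
    """
    rng = random.Random(seed)
    num_dropped = round(num_slices * drop_rate)
    dropped = set(rng.sample(range(num_slices), num_dropped))
    return [0 if i in dropped else 1 for i in range(num_slices)]

def apply_mask(volume, mask):
    """Zero out dropped slices; volume is a list of 2D slices (lists of rows).

    Assumption: zero-filling is only a placeholder, since a mask-aware model
    is meant to ignore these slices rather than read their values.
    """
    return [
        slice_2d if keep else [[0.0] * len(row) for row in slice_2d]
        for slice_2d, keep in zip(volume, mask)
    ]

# Toy volume: 10 slices of 2x2 voxels, each slice filled with its own index.
volume = [[[float(z)] * 2 for _ in range(2)] for z in range(10)]
mask = drop_slices(len(volume), drop_rate=0.3)
corrupted = apply_mask(volume, mask)
```

The mask, not the zero-filled values, is what a mask-aware model would consume to decide which patches to suppress.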

2025

IMAVIS

A systematic review of intermediate fusion in multimodal deep learning for biomedical applications

Valerio Guarrasi, Fatih Aksu, Camillo Maria Caruso, and 4 more authors

Image and Vision Computing, 2025

Deep learning has revolutionized biomedical research by providing sophisticated methods to handle complex, high-dimensional data. Multimodal deep learning (MDL) further enhances this capability by integrating diverse data types such as imaging, textual data, and genetic information, leading to more robust and accurate predictive models. In MDL, differently from early and late fusion methods, intermediate fusion stands out for its ability to effectively combine modality-specific features during the learning process. This systematic review comprehensively analyzes and formalizes current intermediate fusion methods in biomedical applications, highlighting their effectiveness in improving predictive performance and capturing complex inter-modal relationships. We investigate the techniques employed, the challenges faced, and potential future directions for advancing intermediate fusion methods. Additionally, we introduce a novel structured notation that standardizes intermediate fusion architectures, enhancing understanding and facilitating implementation across various domains. Our findings provide actionable insights and practical guidelines intended to support researchers, healthcare professionals, and the broader deep learning community in developing more sophisticated and insightful multimodal models. Through this review, we aim to provide a foundational framework for future research and practical applications in the dynamic field of MDL.
CIBM

MARIA: a Multimodal Transformer Model for Incomplete Healthcare Data

Camillo Maria Caruso, Paolo Soda, and Valerio Guarrasi

Computers in Biology and Medicine, 2025

In healthcare, the integration of multimodal data is pivotal for developing comprehensive diagnostic and predictive models. However, managing missing data remains a significant challenge in real-world applications. We introduce MARIA (Multimodal Attention Resilient to Incomplete datA), a novel transformer-based deep learning model designed to address these challenges through an intermediate fusion strategy. Unlike conventional approaches that depend on imputation, MARIA utilizes a modified masked self-attention mechanism, which processes only the available data without generating synthetic values. This approach enables it to effectively handle incomplete datasets, enhancing robustness and minimizing biases introduced by imputation methods. We evaluated MARIA against 10 state-of-the-art machine learning and deep learning models across 8 diagnostic and prognostic tasks. The results demonstrate that MARIA outperforms existing methods in terms of performance and resilience to varying levels of data incompleteness, underscoring its potential for critical healthcare applications. To support transparency and encourage further research, the source code is openly available at https://github.com/cosbidev/MARIA.
arXiv

Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining

Daniele Molino, Camillo Maria Caruso, Filippo Ruffini, and 2 more authors

arXiv preprint arXiv:2506.00633, 2025

Objective: While recent advances in text-conditioned generative models have enabled the synthesis of realistic medical images, progress has been largely confined to 2D modalities such as chest X-rays. Extending text-to-image generation to volumetric CT remains a significant challenge, due to its high dimensionality, anatomical complexity, and the absence of robust frameworks that align vision-language data in 3D medical imaging. Methods: We introduce a novel architecture for Text-to-CT generation that combines a latent diffusion model with a 3D contrastive vision-language pretraining scheme. Our approach leverages a dual-encoder CLIP-style model trained on paired CT volumes and radiology reports to establish a shared embedding space, which serves as the conditioning input for generation. CT volumes are compressed into a low-dimensional latent space via a pretrained volumetric VAE, enabling efficient 3D denoising diffusion without requiring external super-resolution stages. Results: We evaluate our method on the CT-RATE dataset and conduct a comprehensive assessment of image fidelity, clinical relevance, and semantic alignment. Our model achieves competitive performance across all tasks, significantly outperforming prior baselines for text-to-CT generation. Moreover, we demonstrate that CT scans synthesized by our framework can effectively augment real data, improving downstream diagnostic performance. Conclusion: Our results show that modality-specific vision-language alignment is a key component for high-quality 3D medical image generation. By integrating contrastive pretraining and volumetric diffusion, our method offers a scalable and controllable solution for synthesizing clinically meaningful CT volumes from text, paving the way for new applications in data augmentation, medical education, and automated clinical simulation. Code at https://github.com/cosbidev/Text2CT.
JITC

Transformer-based AI approach to unravel long-term, time-dependent prognostic complexity in patients with advanced NSCLC and PD-L1 ≥50%: insights from the pembrolizumab 5-year global registry

Alessio Cortellini, Valentina Santo, Leonardo Brunetti, and 8 more authors

Journal for Immunotherapy of Cancer, 2025

Background With nearly one-third of patients with advanced non-small cell lung cancer (NSCLC) and PD-L1 Tumor Proportion Score≥50% surviving beyond 5 years following first-line pembrolizumab, long-term outcomes challenge traditional paradigms of cancer prognostication. The emergence of non-cancer-related factors and time-dependent trends underscores the need for advanced analytical frameworks to unravel their complex interplay. Methods We analyzed the Pembro-real 5Y registry, a global real-world dataset of 1050 patients treated across 61 institutions in 14 countries with a long-term follow-up and a large panel of baseline variables. Two complementary approaches were employed: ridge regression, chosen for its ability to address multicollinearity while retaining interpretability, and not another imputation method (NAIM), a transformer-based artificial intelligence model designed to handle missing data without imputation. Endpoints included risk of death at 6, 12, 24, 60 months and 5-year survival. Results The ridge regression model achieved a c-statistic of 0.66 (95% CI: 0.59 to 0.72) for the risk of death and an area under the curve (AUC) of 0.72 (95% CI: 0.65 to 0.78) for 5-year survival, identifying Eastern Cooperative Oncology Group Performance Status (ECOG-PS)≥2, increasing age, and metastatic burden as primary risk factors. However, wide CIs for some predictors highlighted statistical instability. NAIM demonstrated robust handling of missing data, with a c-index of 62.98±2.11 for risk of death and an AUC of 60.52±3.71 for 5-year survival. The comprehensive SHapley Additive exPlanations analysis revealed dynamic, time-dependent patterns, with early mortality dominated by acute factors (eg, ECOG-PS, steroids) and long-term outcomes increasingly influenced by systemic health markers (eg, absence of hypertension, increasing body mass index). 
Unexpected insights included the protective role of dyslipidemia (but not statins) and the nuanced impact of smoking status, reflecting evolving disease dynamics and host-tumor interplay. Conclusions Our integrative framework illuminates the complexity of long-term outcomes in patients with NSCLC treated with pembrolizumab, uncovering dynamic, non-linear prognostication trends. This analysis provides insights into patient trajectories, emphasizing the need for holistic, long-term management strategies.
LUNG

Long-term outcomes from pembrolizumab monotherapy in patients with advanced NSCLC, PD-L1 expression ≥50%, and poor performance status: Transformer-based AI to characterize prognostic complexity

Alessio Cortellini, Edoardo Garbo, Giulia La Cava, and 69 more authors

Lung Cancer, 2025

Background The use of first-line single agent immunotherapy in patients with advanced NSCLC and ECOG PS≥2 remains controversial, as this frail population has been largely excluded from pivotal clinical trials. Real-world evidence suggests that although median survival is poor, a subset of these patients may achieve long-term benefit. Methods We analyzed data from the Pembro-Real 5Y registry, a global real-world dataset with >5 years follow-up. The cohort included patients with advanced NSCLC, PD-L1 TPS≥50%, treated with first line pembrolizumab outside of clinical trials. Univariable analyses were conducted to identify descriptive characteristics associated with survival. To address the complexity of long-term outcome prediction, we integrated Elastic Net regression and a transformer-based AI model (NAIM). The Elastic Net model was employed to mitigate collinearity and select relevant prognostic factors, while NAIM was used to explore non-linear, time-dependent interactions between variables. Endpoints included overall survival (OS) and 5-year survival rates. Results Out of 1050 patients, 161 patients with ECOG PS≥2 were included, showing a median OS of 5.4 months (95% CI: 3.8–7.8), and a 5-year survival rate of 13.0% (95% CI: 8.1–19.9). Univariable analysis indicated that no single baseline variable was strongly predictive of 5-year survival, except for TMB, KRAS, and BRAF status, which were significantly limited by missingness. Elastic Net identified only two significant predictors of 5-year survival: high TMB (with unstable confidence intervals) and KRAS mutation. NAIM provided a dynamic perspective, confirming that bone metastases and baseline corticosteroid use were strong predictors of early mortality, whereas BMI increase and systemic health markers/host factors (e.g., hypertension and dyslipidemia) gained importance in long-term survivors.
However, NAIM exhibited a notable performance drop from training to validation suggesting overfitting and the challenge of modeling long-term outcomes using baseline static variables. Conclusions Despite the overall poor prognosis, a subset of patients with ECOG PS≥2 achieves long-term survival with pembrolizumab monotherapy, indicating that performance status alone should not preclude treatment in all cases. Our analysis highlights the limitations of traditional statistical approaches and AI-driven models in predicting long-term benefit in this heterogeneous population. Future efforts should focus on refining hybrid modeling strategies and incorporating prospective validation to better identify those who may benefit from immunotherapy beyond short-term expectations.

2024

CMPB

A Deep Learning Approach for Overall Survival Prediction in Lung Cancer with Missing Values

Camillo Maria Caruso, Valerio Guarrasi, Sara Ramella, and 1 more author

Computer Methods and Programs in Biomedicine, 2024

Background and Objective: In the field of lung cancer research, particularly in the analysis of overall survival (OS), artificial intelligence (AI) serves crucial roles with specific aims. Given the prevalent issue of missing data in the medical domain, our primary objective is to develop an AI model capable of dynamically handling this missing data. Additionally, we aim to leverage all accessible data, effectively analyzing both uncensored patients who have experienced the event of interest and censored patients who have not, by embedding a specialized technique within our AI model, not commonly utilized in other AI tasks. Through the realization of these objectives, our model aims to provide precise OS predictions for non-small cell lung cancer (NSCLC) patients, thus overcoming these significant challenges. Methods: We present a novel approach to survival analysis with missing values in the context of NSCLC, which exploits the strengths of the transformer architecture to account only for available features without requiring any imputation strategy. More specifically, this model tailors the transformer architecture to tabular data by adapting its feature embedding and masked self-attention to mask missing data and fully exploit the available ones. By making use of ad-hoc designed losses for OS, it is able to account for both censored and uncensored patients, as well as changes in risks over time. Results: We compared our method with state-of-the-art models for survival analysis coupled with different imputation strategies. We evaluated the results obtained over a period of 6 years using different time granularities obtaining a Ct-index, a time-dependent variant of the C-index, of 71.97, 77.58 and 80.72 for time units of 1 month, 1 year and 2 years, respectively, outperforming all state-of-the-art methods regardless of the imputation method used. 
Conclusions: The results show that our model not only outperforms the state-of-the-art’s performance but also simplifies the analysis in the presence of missing data, by effectively eliminating the need to identify the most appropriate imputation strategy for predicting OS in NSCLC patients.
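For readers unfamiliar with the metric, the sketch below computes the plain Harrell's C-index on right-censored data. It is not the time-dependent Ct-index reported in the abstract, only the simpler quantity it generalizes. The function name and toy data are illustrative assumptions, and the result is on a 0 to 1 scale, whereas the abstract reports percentages.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed time per patient (event or censoring time)
    events: 1 if the event (death) was observed, 0 if censored
    risks:  predicted risk score (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable if comparable else float("nan")

# Toy cohort (hypothetical): two observed deaths, two censored patients.
times = [2, 4, 6, 8]
events = [1, 0, 1, 0]
perfect = concordance_index(times, events, [0.9, 0.3, 0.6, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.6, 0.3, 0.9])
```

Censored patients still contribute: they serve as the later member of a pair whenever another patient died before their censoring time.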
arXiv

Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets

Camillo Maria Caruso, Paolo Soda, and Valerio Guarrasi

arXiv preprint arXiv:2407.11540, 2024

Handling missing values in tabular datasets presents a significant challenge in training and testing artificial intelligence models, an issue usually addressed using imputation techniques. Here we introduce "Not Another Imputation Method" (NAIM), a novel transformer-based model specifically designed to address this issue without the need for traditional imputation techniques. NAIM’s ability to avoid the necessity of imputing missing values and to effectively learn from available data relies on two main techniques: the use of feature-specific embeddings to encode both categorical and numerical features also handling missing inputs; the modification of the masked self-attention mechanism to completely mask out the contributions of missing data. Additionally, a novel regularization technique is introduced to enhance the model’s generalization capability from incomplete data. We extensively evaluated NAIM on 5 publicly available tabular datasets, demonstrating its superior performance over 6 state-of-the-art machine learning models and 5 deep learning models, each paired with 3 different imputation techniques when necessary. The results highlight the efficacy of NAIM in improving predictive performance and resilience in the presence of missing data. To facilitate further research and practical application in handling missing data without traditional imputation methods, we made the code for NAIM available at https://github.com/cosbidev/NAIM.
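The idea of masking out missing features inside self-attention can be sketched in a few lines. This is a simplified illustration and not NAIM's actual implementation (available at the repository above): it masks missing features only on the key side, uses a single head without learned projections, and the function name `masked_attention` is an assumption.

```python
import math

def masked_attention(queries, keys, values, available):
    """Scaled dot-product attention that ignores keys flagged as missing.

    available[j] is True when feature j was observed. Missing keys get a
    score of -inf, so their softmax weight is exactly zero.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) if ok else -math.inf
            for k, ok in zip(keys, available)
        ]
        top = max(scores)
        if top == -math.inf:
            # Every feature is missing: return zeros instead of dividing by zero.
            outputs.append([0.0] * len(values[0]))
            continue
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return outputs

# Three feature embeddings; the third feature is missing for this sample.
tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
available = [True, True, False]
out_a = masked_attention(tokens, tokens, tokens, available)

# Changing the placeholder stored for the missing feature has no effect.
altered = [[1.0, 0.0], [0.0, 1.0], [-99.0, 42.0]]
out_b = masked_attention(tokens, altered, altered, available)
```

Because the output does not depend on whatever placeholder sits in the missing position, no imputed value is needed.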
Ital-IA 2024

Towards AI-driven next generation personalized healthcare and well-being

Fatih Aksu, Alessandro Bria, Alice Natalina Caragliano, and 8 more authors

In 2024 Ital-IA Intelligenza Artificiale-Thematic Workshops, Ital-IA 2024, Naples, Italy, May 29-30, 2024

In the last few years, Artificial Intelligence (AI) has emerged as a game changer in many areas of society. In particular, its integration in medicine heralds a transformative approach towards personalized healthcare and well-being, promising significant improvements in diagnostic precision, therapeutic outcomes, and patient care. Our research explores the cutting-edge realms of multimodal AI, resilient AI, and healthcare robotics, aiming to harness the synergy of diverse data modalities and advanced computational models to redefine healthcare paradigms. This multidisciplinary effort seeks to bridge technology and clinical practice, advancing AI-driven next generation personalized healthcare and well-being.

2023

Ital-IA 2023

Building an AI-Enabled Metaverse for Intelligent Healthcare: Opportunities and Challenges

Valerio Guarrasi, Lorenzo Tronchin, Camillo Maria Caruso, and 8 more authors

In CEUR Workshop Proceedings, 2023

This abstract discusses the development of a metaverse for intelligent healthcare, which involves creating a virtual environment where healthcare professionals, patients, and researchers can interact and collaborate using digital technologies. The metaverse can improve the efficiency and effectiveness of healthcare services and provide new opportunities for research and innovation. AI models are necessary for analyzing patient data and providing personalized healthcare recommendations, but the data in a metaverse setting is inherently multimodal, unstructured, noisy, incomplete, limited, or partially inconsistent, which poses a challenge for AI models. However, integrating AI models becomes necessary for developing virtual scanners that simulate imaging modalities and robotics that simulate surgical procedures within a virtual environment. The ultimate goal is to leverage the power of AI to enhance the quality of healthcare in a metaverse for intelligent healthcare, which has the potential to transform the way healthcare services are delivered and improve health outcomes for patients worldwide.
IEEE Access

A Cascade of Learners for Firemen’ Emergency Events Classification

Camillo Maria Caruso, Paolo Soda, Carlo Giammichele, and 2 more authors

IEEE Access, 2023

2022

MDPI

A Multimodal Ensemble Driven by Multiobjective Optimisation to Predict Overall Survival in Non-Small-Cell Lung Cancer

Camillo Maria Caruso, Valerio Guarrasi, Ermanno Cordelli, and 8 more authors

Journal of Imaging, 2022

Lung cancer accounts for more deaths worldwide than any other cancer disease. In order to provide patients with the most effective treatment for these aggressive tumours, multimodal learning is emerging as a new and promising field of research that aims to extract complementary information from the data of different modalities for prognostic and predictive purposes. This knowledge could be used to optimise current treatments and maximise their effectiveness. To predict overall survival, in this work, we investigate the use of multimodal learning on the CLARO dataset, which includes CT images and clinical data collected from a cohort of non-small-cell lung cancer patients. Our method allows the identification of the optimal set of classifiers to be included in the ensemble in a late fusion approach. Specifically, after training unimodal models on each modality, it selects the best ensemble by solving a multiobjective optimisation problem that maximises both the recognition performance and the diversity of the predictions. In the ensemble, the labels of each sample are assigned using the majority voting rule. As further validation, we show that the proposed ensemble outperforms the models learning a single modality, obtaining state-of-the-art results on the task at hand.
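The ensemble selection described above can be illustrated with a small sketch. It rests on several assumptions not stated in the abstract: pairwise disagreement as the diversity measure, brute-force enumeration of fixed-size ensembles in place of a real multiobjective solver, and hypothetical model names `a` to `d` with toy predictions.

```python
from collections import Counter
from itertools import combinations

def majority_vote(predictions):
    """predictions: one label list per classifier; returns one label per sample."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

def disagreement(predictions):
    """Mean fraction of samples on which two classifiers disagree."""
    pairs = list(combinations(predictions, 2))
    return sum(1 - accuracy(a, b) for a, b in pairs) / len(pairs)

def pareto_ensembles(models, truth, size=3):
    """Score every ensemble of a fixed odd size; keep the non-dominated ones."""
    scored = []
    for names in combinations(sorted(models), size):
        preds = [models[n] for n in names]
        scored.append((names, accuracy(majority_vote(preds), truth), disagreement(preds)))
    # An ensemble is dominated if another is at least as good on both
    # objectives and strictly better on one.
    return [
        s for s in scored
        if not any(
            o[1] >= s[1] and o[2] >= s[2] and (o[1] > s[1] or o[2] > s[2])
            for o in scored
        )
    ]

# Toy binary predictions from four hypothetical unimodal models.
truth = [1, 0, 1, 1, 0, 0]
models = {
    "a": [1, 0, 1, 0, 0, 0],
    "b": [1, 0, 0, 1, 0, 1],
    "c": [0, 0, 1, 1, 1, 0],
    "d": [1, 0, 1, 1, 0, 1],
}
front = pareto_ensembles(models, truth)
```

In this toy case the three models that err on different samples form the only non-dominated ensemble, and their majority vote is correct on every sample even though each model alone is not. An odd ensemble size avoids ties for binary labels.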