

Machine learning is frequently being leveraged to tackle problems in the health sector, including its use for clinical decision support. Historically, its use has focused on single-modal data. In the biomedical field of machine learning, attempts to improve prediction and mimic the multimodal nature of clinical expert decision-making have been met by fusing disparate data. This review was conducted to summarize the current studies in this field and identify topics ripe for future research. We conducted the review in accordance with the PRISMA extension for Scoping Reviews to characterize multi-modal data fusion in health. Search strings were established and used in the PubMed, Google Scholar, and IEEE Xplore databases for 2011 to 2021. A final set of 128 articles was included in the analysis. The most common health areas utilizing multi-modal methods were neurology and oncology. Early fusion was the most common data merging strategy. Notably, there was an improvement in predictive performance when using data fusion. Lacking from the papers were clear clinical deployment strategies, FDA approval, and analysis of how using multimodal approaches across diverse sub-populations may reduce biases and healthcare disparities. These findings provide a summary of multimodal data fusion as applied to health diagnosis/prognosis problems. Few papers compared the outputs of a multimodal approach with a unimodal prediction; however, those that did achieved an average increase of 6.4% in predictive accuracy. Multi-modal machine learning, while more robust in its estimations than unimodal methods, has drawbacks in its scalability and the time-consuming nature of information concatenation.

Clinical decision support has long been an aim for those implementing algorithms and machine learning in the health sphere 1, 2, 3.
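To make the early-fusion strategy mentioned above concrete, here is a minimal sketch in Python: features from each modality are concatenated into one per-patient vector before any model is trained. All feature names, counts, and data here are hypothetical illustrations, not values drawn from the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patients = 4

# Hypothetical modality 1: clinical tabular features (e.g., age, lab values).
clinical = rng.normal(size=(n_patients, 3))
# Hypothetical modality 2: imaging-derived features (e.g., region volumes).
imaging = rng.normal(size=(n_patients, 5))

def early_fuse(*modalities):
    """Early fusion: concatenate per-patient feature vectors along the
    feature axis, producing one combined representation per patient."""
    return np.concatenate(modalities, axis=1)

fused = early_fuse(clinical, imaging)
print(fused.shape)  # (4, 8): one combined 8-dimensional vector per patient
```

The fused matrix would then be passed to a single downstream predictor; this simplicity is a strength of early fusion, but, as the review notes, concatenating heterogeneous information in this way can be time-consuming and can limit scalability as modalities grow.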
