Nature ML publication: why multimodal AI needs a 'deployment-first' approach
AI representation using multiple data sources

We're excited to announce the publication of the paper "Towards deployment-centric multimodal AI beyond vision and language" in the prestigious journal Nature Machine Intelligence! We are especially glad to recognise the contribution of former Simula PhD student Anastasiia Grishina, who is a co-author on the paper.

Deployable multimodal AI

This paper, featuring contributions from over 40 authors, is a powerful example of open research and of collaboration across disciplines spanning healthcare, social science, engineering, science, sustainability, and finance.

To tackle the complex, real-world challenges spanning these fields, the paper focuses on multimodal artificial intelligence (AI) and its deployability.

Multimodal AI uses machine learning to combine different types of data, such as images, text, sound, or numerical measurements. This integration helps us gain deeper insights, make better predictions, and improve decision-making in vital fields such as healthcare, science, and engineering.
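As a rough illustration (not taken from the paper), the sketch below shows one common way such data types can be combined: each modality is encoded separately and the resulting representations are concatenated before a shared prediction head. All names, dimensions, and the fusion strategy here are hypothetical choices for the example.

```python
import torch
import torch.nn as nn

class SimpleMultimodalFusion(nn.Module):
    """Toy late-fusion model: encode each modality, concatenate, then predict."""

    def __init__(self, image_dim=512, text_dim=768, sensor_dim=16, num_classes=2):
        super().__init__()
        # Per-modality projections (stand-ins for real image/text/sensor encoders)
        self.image_proj = nn.Linear(image_dim, 128)
        self.text_proj = nn.Linear(text_dim, 128)
        self.sensor_proj = nn.Linear(sensor_dim, 128)
        # Shared head operating on the fused representation
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(3 * 128, num_classes))

    def forward(self, image_feat, text_feat, sensor_feat):
        # Fuse by concatenating the projected modality embeddings
        fused = torch.cat([
            self.image_proj(image_feat),
            self.text_proj(text_feat),
            self.sensor_proj(sensor_feat),
        ], dim=-1)
        return self.head(fused)

model = SimpleMultimodalFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```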

Presently, much of the progress in multimodal AI focuses heavily on combining vision and language (think of models that describe an image or generate text from a video). A major bottleneck remains: making these advanced models deployable—meaning they can actually be used effectively in real-world settings.

The authors of the paper propose a shift to a deployment-centric workflow: the practical limits and constraints of the deployment environment, such as required speed, memory, or regulatory rules, should be considered early in the research process, not just at the end. This complements the existing focus on refining and optimising models and data.
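To make the idea concrete, here is a minimal sketch (our own illustration, not the paper's methodology) of what "considering constraints early" might look like in practice: the deployment budget is written down up front and every candidate model is screened against it from day one. The constraint names and numbers are hypothetical.

```python
from dataclasses import dataclass
import time

@dataclass
class DeploymentConstraints:
    """Hypothetical deployment budget, fixed before model development starts."""
    max_latency_ms: float   # required response time in the target setting
    max_memory_mb: float    # memory available on the target device

def meets_constraints(model_fn, sample_input, constraints, model_size_mb):
    """Check a candidate model against the deployment budget (rough sketch)."""
    start = time.perf_counter()
    model_fn(sample_input)  # one inference call on a representative input
    latency_ms = (time.perf_counter() - start) * 1000
    return (latency_ms <= constraints.max_latency_ms
            and model_size_mb <= constraints.max_memory_mb)

# Screen candidates against the budget early, rather than discovering
# violations only after training and evaluation are finished.
budget = DeploymentConstraints(max_latency_ms=50.0, max_memory_mb=256.0)
ok = meets_constraints(lambda x: x, [0.0] * 10, budget, model_size_mb=120.0)
print("fits deployment budget:", ok)
```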

Furthermore, the authors advocate broader and deeper integration across various types of data and AI systems, moving beyond the current emphasis on vision and language alone. They also stress the importance of interdisciplinary collaboration and of working closely with stakeholders (the people who will use or be affected by the AI).

The paper analyses common challenges that multimodal AI faces across different fields and then explores three critical real-world applications: pandemic response, self-driving car design, and climate change adaptation, drawing on expertise from areas such as social science, sustainability, and finance.

By encouraging open research and fostering dialogue across different disciplines, our community can speed up the development of truly deployable multimodal AI that delivers a wide-ranging, positive impact on society.

Accelerating research through collaboration

A highlight of this collaborative success is the involvement of former PhD student Anastasiia, a contributing co-author on the paper. Her participation and subsequent co-authorship were made possible through a research stay at The Alan Turing Institute (ATI) in London. An in-person workshop organised by the group leader in November 2023 was the first step in gathering ideas on the challenges and successes of multimodal AI, with real-world applications a central topic of discussion from that first meeting. After the in-person workshop, the work continued through multiple online and in-person sync-ups with topic leaders and several rounds of review.

This research stay was funded by the Norwegian Artificial Intelligence Research Consortium (NORA), which includes Simula and coordinates the six-month research exchange program for select AI PhD students in Norway. Anastasiia's achievement showcases the direct, tangible benefits of this program and the value of interdisciplinary collaboration in high-impact research.

The publication of this paper exemplifies how open, cross-institutional partnerships can address complex challenges in AI, leading to significant societal impact.

Read the paper: 

Image source: The University of Sheffield