
Panel: Foundation Models for Science: Myth or Reality?

Takeaways from the SCOPE 26 Panel

This post summarizes the panel discussion held on March 11, 2026, at the SCOPE conference (Science at the Convergence of AI and Exascale Computing), hosted at the Institut Henri Poincaré, Paris. The panel brought together Gael Varoquaux (INRIA), François Lanusse (CNRS), Gregor Kasieczka (University of Hamburg), and Pete Beckman (Northwestern University), moderated by Pedro L. C. Rodrigues (INRIA).

The term “foundation model” is everywhere, but what does it actually mean for scientific research? And is the promise real, or mostly hype? This was the animating question behind one of the liveliest sessions at SCOPE 26, and the panelists did not shy away from the hard parts.

What is a foundation model?

The discussion opened with a definitional debate. The panelists largely converged on the idea that what matters is not how a model is built, but how it is used: a foundation model is one that can be reused and adapted across a broad range of tasks. François Lanusse tied this to the older concept of transfer learning; Gael Varoquaux traced the term back to its introduction by Bommasani et al. in 2021; Pete Beckman emphasized scale as the key differentiator: a foundation model is something you could not afford to retrain yourself, built with a massive upfront investment by others, and designed to be reused by the many.

Where AI genuinely helps science — and where it doesn’t (yet)

The panelists identified a clear context where foundation models add real value: domains where no equations or simulations exist, or where data is too complex for traditional modeling. Pete Beckman illustrated this with edge cases like detecting volcanic eruptions or wildlife events in video streams, where labeled data is scarce and classical methods struggle.

Gael Varoquaux made a more philosophical point: physics has long operated under the assumption that simple mathematical laws are sufficient descriptions of the world. AI challenges that assumption: it lets us go to places where we cannot track the equations. He was careful, however, to note that predictability remains the golden rule of science: a model’s value is measured by how well it generalizes, not by the elegance of its architecture.

François Lanusse offered the most candid admission: while foundation models clearly accelerate scientific development and shrink time-to-discovery, there is not yet a clear example of a discovery that would have been impossible without them.

The scale problem: who builds the telescope?

A recurring theme was the concentration of resources required to train large models. Gregor Kasieczka drew a compelling analogy: foundation models for science are like CERN or a large radio telescope, an infrastructure that requires coordinated, community-level investment, not individual labs. He was also blunt on the current state: scientific foundation models are probably too small to reach the most interesting regime. The field is not yet at the right scale.

Pete Beckman reframed this as an opportunity: AI inverts the traditional HPC economics. Supercomputers are expensive and historically allocated to a narrow set of projects. With foundation models, the heavy compute is front-loaded in training, and inference is cheap — which means a model trained on a large supercomputer can genuinely serve the many.
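The asymmetry can be made concrete with a little arithmetic. The sketch below is an illustration only, not something presented at the panel: the function names and all cost figures are hypothetical, and costs are in arbitrary units (node-hours, say). It assumes a one-off training cost shared by everyone, a small inference cost per task, and, as the alternative, a dedicated run per task.

```python
import math


def amortized_cost_per_task(training_cost, inference_cost, n_tasks):
    """Average cost per task when one shared model serves n_tasks tasks."""
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    # The training cost is paid once and spread over every task served.
    return training_cost / n_tasks + inference_cost


def break_even_tasks(training_cost, inference_cost, dedicated_cost):
    """Smallest number of tasks at which the shared model is strictly
    cheaper per task than a dedicated run for each task.

    Returns None if inference is not cheaper than a dedicated run,
    in which case sharing never pays off under this simple model.
    """
    saving_per_task = dedicated_cost - inference_cost
    if saving_per_task <= 0:
        return None
    # Need training_cost / n + inference_cost < dedicated_cost,
    # i.e. n > training_cost / saving_per_task.
    return math.floor(training_cost / saving_per_task) + 1


# Hypothetical numbers, chosen only to show the shape of the trade-off.
TRAINING = 1_000_000   # one-off training cost
INFERENCE = 1          # cost of serving one task with the trained model
DEDICATED = 1_000      # cost of one task-specific run without the model

n_star = break_even_tasks(TRAINING, INFERENCE, DEDICATED)
print("break-even number of tasks:", n_star)
print("cost per task at 10x break-even:",
      amortized_cost_per_task(TRAINING, INFERENCE, 10 * n_star))
```

Under these assumptions the shared model only wins once enough tasks use it, which is why the argument depends on the model really being reused by many groups.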

Gael Varoquaux added a note of institutional humility: scientists are funded by society to build knowledge, not primarily to produce papers. The economic rationale for investing public money in very large foundation models needs to be clearly articulated, and it is not always obvious, especially for long-tail rare events where the benefit is diffuse.

The data dilemma

Scientific data is often heterogeneous, hard to access, and not in a shape that makes training straightforward. François Lanusse pointed to a silver lining: the process of preparing data for foundation model training forces communities to standardize and document their datasets, a side effect that benefits science well beyond AI. This mirrors a broader transition: just as scientific software moved from single-person scripts to shared community libraries, foundation model training is pushing data practices in the same direction.

Pete Beckman raised the metadata problem: the hidden knowledge in a dataset is not always findable, and asking “is there enough metadata here?” is itself scientifically valuable. Gregor Kasieczka pushed back on the scarcity framing: in physics, at least, there is a data deluge, not a drought, and the hard cases are rare events, not typical ones. Gael Varoquaux countered with medical data as the domain where scarcity is genuine and expensive, and where coverage matters more than raw volume.

Stochasticity: a bug or a feature?

A sub-debate emerged around randomness. Pete Beckman warned against replacing deterministic simulation components with stochastic models — reproducibility and explainability suffer. The counter-argument, from Gael Varoquaux and François Lanusse, was that stochasticity is not inherently bad. Science has always lived with randomness (statistical mechanics, clinical trials). The real obligation is to document and quantify it: being transparent about how a result was reached and measuring its sensitivity to stochastic choices.
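One simple way to "document and quantify" randomness is to rerun a pipeline over a fixed, recorded list of seeds and report the spread alongside the headline number. The sketch below is a minimal illustration under stated assumptions, not a method described by the panelists: `stochastic_estimate` is a hypothetical stand-in (a Monte Carlo estimate of pi) for any stochastic model or analysis, and `seed_sensitivity` is a helper name introduced here.

```python
import random
import statistics


def stochastic_estimate(seed, n_samples=1000):
    """Hypothetical stand-in for a stochastic pipeline:
    a Monte Carlo estimate of pi from n_samples random points."""
    rng = random.Random(seed)  # local generator, so each run is reproducible
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


def seed_sensitivity(run, seeds):
    """Run `run(seed)` for every seed and summarize the spread of results."""
    seeds = list(seeds)
    results = [run(seed) for seed in seeds]
    return {
        "seeds": seeds,                       # recorded so the study can be rerun
        "mean": statistics.mean(results),
        "stdev": statistics.stdev(results),   # sample standard deviation
        "min": min(results),
        "max": max(results),
    }


report = seed_sensitivity(stochastic_estimate, range(20))
print(f"mean={report['mean']:.4f}  stdev={report['stdev']:.4f}  "
      f"range=[{report['min']:.4f}, {report['max']:.4f}]")
```

Publishing the seed list with the summary lets others reproduce every individual run exactly, while the standard deviation and range show how much of a reported result could be due to the random choices alone.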

Evaluation: the unsolved problem

How do you benchmark a foundation model for science? The panelists were largely aligned: task-specific benchmarks exist and are not the main concern. The deeper issue is that models trained to perform well on known tasks are poorly suited to finding surprising things, and surprise is often where science lives.

Gregor Kasieczka raised an additional worry: if everyone uses the same foundation models, we risk correlated failures across the scientific community. Diversity of models is not just an aesthetic preference; it is a safeguard.

Calls to action

The closing remarks crystallized a few key messages:

Think in terms of goals, not tools. Gael Varoquaux’s most memorable line: we define our jobs by the things we do, but we should define them by the goals we pursue. The question is which sciences we enable, not which models we use.

Treat foundation models as shared infrastructure. Kasieczka’s CERN analogy implies a call to action: the community should organize around foundation model training the way it organizes around large instruments, with shared governance, shared credit, and shared access.

Exploit the training/inference asymmetry. Pete Beckman’s point deserves to be a planning principle: the heavy upfront cost of training a scientific foundation model, amortized across many users and tasks, can be a radically efficient use of compute.

Invest in data infrastructure as a prerequisite. Even teams not yet building foundation models should invest in data curation and documentation — the payoff comes regardless of whether a large model ever gets trained.

Build reproducibility tools alongside models. The panel converged on the view that releasing a foundation model without tools to audit it, version it, and test sensitivity to model choice leaves scientific conclusions on shaky ground. Version control, model diversity, and interpretability need to be first-class research outputs.

Notes taken live during the session by Thomas Moreau (INRIA). Quotes are reconstructed from notes and may not be verbatim.
