AI for Science: An Emerging Agenda - 2023
Details
Title: AI for Science: An Emerging Agenda
Author(s): Berens, Philipp and Cranmer, Kyle and Lawrence, Neil D. and von Luxburg, Ulrike and Montgomery, Jessica
Link(s): http://arxiv.org/abs/2303.04217
Rough Notes
AI for Science requires modelling approaches that:
- Can simulate physical and social systems for researchers to study them.
- Learn causal relationships via data and structured knowledge of the world.
- Work adaptively with domain experts to combine data-driven approaches with pre-existing domain knowledge.
Some areas of progress driven by AI research include:
- Helping to investigate how different parts of the Earth's biosphere interact, and how they are affected by climate change.
- Supporting modelling to reconstruct historical climate patterns, enabling better predictions of future climate variability.
- Helping farmers diagnose animal diseases faster, enabling better responses.
- Advancing our understanding of the nature of dark matter.
- Generating insights into the genetic processes that shape cell development.
- Allowing researchers to analyse features of natural environments more accurately.
- Helping model how different neural circuits fire to deliver different behaviours in animals.
Current areas of research in AI for science:
- How to effectively combine observations, data-driven models and physical models to better understand complex systems? Such methods need to operate across different levels of granularity.
- How do such AI systems align with what researchers already know, and how can such systems help uncover causal relationships in data?
- How can AI systems be integrated into the scientific process safely and robustly?
Some directions in Simulation-based Inference (SBI):
- Diagnostic checks of the self-consistency of the Bayesian joint distribution, which measure the scientific quality of the posterior regions computed by Bayesian SBI methods (a minimal calibration sketch follows this list).
- Hybrid modelling, i.e. combining data-driven knowledge with mechanistic components specified from existing domain knowledge.
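A minimal sketch of one such self-consistency check, in the spirit of simulation-based calibration (SBC; my own toy construction, not code from the paper): draw parameters from the prior, simulate data, and check that the rank of each true parameter among its posterior samples is uniformly distributed. A conjugate Gaussian model stands in here for a learned SBI posterior.

```python
# Simulation-based calibration (SBC) style self-consistency check.
# Toy stand-in: a conjugate Gaussian model where the posterior is exact;
# in real SBI the posterior samples would come from the learned inference network.
import numpy as np

rng = np.random.default_rng(0)
prior_mu, prior_sd = 0.0, 1.0   # prior over parameter theta
noise_sd = 0.5                  # known observation noise
n_rounds, n_post = 1000, 99     # SBC rounds and posterior draws per round

ranks = []
for _ in range(n_rounds):
    theta = rng.normal(prior_mu, prior_sd)      # theta ~ prior
    x = rng.normal(theta, noise_sd)             # x ~ simulator(theta)
    # Exact conjugate posterior theta | x (replace with SBI posterior samples).
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / noise_sd**2)
    post_mu = post_var * (prior_mu / prior_sd**2 + x / noise_sd**2)
    post_samples = rng.normal(post_mu, np.sqrt(post_var), size=n_post)
    ranks.append(np.sum(post_samples < theta))  # rank of true theta

# If the joint distribution is self-consistent, ranks are uniform on 0..n_post.
hist, _ = np.histogram(ranks, bins=10, range=(0, n_post + 1))
print("rank histogram (should be roughly flat):", hist)
```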
Some desiderata for AI systems with causality:
- Be able to operate outside the iid setting, in environments different from those they were trained on.
- Be able to learn about tasks from few examples of the task in different conditions, and then be able to rapidly adapt to new environments.
- Be able to support users to analyse the impact of different interventions on a system, giving explanations/ways to attribute credit to different actions.
- Be able to respond to different ways of transmitting information between individuals and groups.
Some directions for AI systems with causality:
- Transfer learning - Applying what was learnt from one task/domain to another task/domain.
- Multi-task learning - Enabling a system to solve multiple tasks in multiple environments.
- Adversarial learning - To reduce the vulnerability of models to performance degradation on out-of-distribution data.
- Causal representation learning - Defining variables that are related by causal models.
- RL approaches that reward agents for identifying policies based on invariances over different conditions (a toy invariance sketch follows this list).
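To make the invariance idea concrete, a toy sketch (my own construction, not from the paper) in the spirit of invariant causal prediction: regressing on the causal parent of Y gives a stable coefficient across environments, while regressing on a spurious correlate does not.

```python
# Toy invariance check across environments (invariant-causal-prediction flavour).
# X1 -> Y is the stable causal mechanism; X2 is correlated with Y in a way
# that changes between environments (anti-causal, spurious).
import numpy as np

rng = np.random.default_rng(1)

def make_env(n, spurious_strength):
    x1 = rng.normal(size=n)
    y = 2.0 * x1 + rng.normal(scale=0.1, size=n)                 # invariant mechanism
    x2 = spurious_strength * y + rng.normal(scale=0.1, size=n)   # spurious correlate
    return np.column_stack([x1, x2]), y

for name, strength in [("env A", 1.0), ("env B", -0.5)]:
    X, y = make_env(2000, strength)
    b_causal = np.linalg.lstsq(X[:, :1], y, rcond=None)[0][0]  # y ~ x1
    b_spur = np.linalg.lstsq(X[:, 1:], y, rcond=None)[0][0]    # y ~ x2
    print(f"{name}: coef on causal x1 = {b_causal:.2f}, "
          f"coef on spurious x2 = {b_spur:.2f}")
# The x1 coefficient is stable (~2.0) across environments; the x2 coefficient
# shifts with the environment, revealing it as non-causal.
```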
Neural ODEs have been shown to identify causal structures in time-series data (a minimal fitting sketch follows). Causal effects can also be described as objective functions in constrained optimization problems.
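For reference, a minimal neural ODE fit to time-series data, assuming the torchdiffeq package (illustrative only; not the setup used in the cited work): a small network parameterises dy/dt, and training backpropagates through the ODE solver so the integrated trajectory matches observations.

```python
# Minimal neural ODE fit to a time series (assumes `pip install torchdiffeq`).
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    """Small MLP parameterising the vector field dy/dt = f(y)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
    def forward(self, t, y):
        return self.net(y)

# Synthetic ground truth: a linear oscillator dy/dt = A y.
A = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
t = torch.linspace(0.0, 5.0, 25)
y0 = torch.tensor([1.0, 0.0])
with torch.no_grad():
    true_traj = odeint(lambda t, y: y @ A.T, y0, t)

func = ODEFunc()
opt = torch.optim.Adam(func.parameters(), lr=1e-2)
for step in range(200):
    opt.zero_grad()
    pred = odeint(func, y0, t)          # integrate the learned dynamics
    loss = ((pred - true_traj) ** 2).mean()
    loss.backward()                     # backprop through the ODE solver
    opt.step()
print("final MSE:", float(loss))
```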
Strategies to facilitate effective integration of domain knowledge into AI systems:
- Algorithmic design.
- AI integration in the lab.
- Effective communication and collaboration.
Domain insights can be integrated:
- With SBI.
- For augmenting data using known invariances.
- To build invariances directly into data-driven models.
- For emulation and surrogate modelling (a GP emulator sketch follows this list).
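As a concrete instance of the last item, a toy emulation sketch using a Gaussian-process surrogate from scikit-learn (my own example, not the paper's): fit the GP on a handful of expensive simulator runs, then query the cheap emulator, which also reports its own uncertainty.

```python
# Toy surrogate model: emulate an "expensive" simulator with a GP.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_simulator(x):
    """Stand-in for a costly physical simulation."""
    return np.sin(3 * x) + 0.5 * x

# A small design of simulator runs (these are the costly evaluations).
X_train = np.linspace(0, 3, 8).reshape(-1, 1)
y_train = expensive_simulator(X_train).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_train, y_train)

# The emulator is cheap to query and reports its own uncertainty,
# which can flag regions where more simulator runs are needed.
X_query = np.linspace(0, 3, 5).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)
for x, m, s in zip(X_query.ravel(), mean, std):
    print(f"x={x:.2f}  emulator={m:.3f} +/- {s:.3f}  "
          f"truth={expensive_simulator(x):.3f}")
```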
Scientific centaurs: The modern experimental lab includes a considerable digital component. By combining data from measurement devices, simulations of lab processes, and computational models of research or user objectives, virtual labs can provide a "digital sibling". In drug design, for example, virtual labs can accelerate the testing and analysis needed to identify candidate drugs, instead of relying on physical testing.

Moving from virtual labs to AI assistants requires further advances in AI system design, to create AI agents that can elicit guidance or input from domain experts. Such agents can provide useful intuitions for scientific modelling and serve as scientific sidekicks, actively helping researchers drive their research.

Such an AI assistant should be able to model the research problem of interest, alongside the goals and preferences of the expert users, even when the users themselves cannot clearly articulate them. User goals will carry uncertainty, and user behaviour may change in response to the outputs of the AI system. AI assistants should also incorporate insights from cognitive science, team decision-making, and learning strategies based on limited examples.
Need to match model to user: this requires a shared understanding of the research question, the constraints (data/compute/time/energy/function), and the user's needs in the domain environment. The model should also give the user some level of explainability. Some requirements:
- Approaches for decision-making with delayed reward or zero-shot learning to help agents solve tasks when there is little or nothing known about the reward function, and no previous behaviour to learn from.
- Interactive knowledge elicitation, combining prior knowledge from cognitive science with learning from data, e.g. generative user models (a toy example follows this list).
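One toy way to make "generative user models" concrete (my construction, not the paper's): assume the user chooses options Boltzmann-rationally given a latent goal, and run Bayesian updates over candidate goals from their observed choices.

```python
# Toy generative user model: infer a latent user goal from observed choices,
# assuming the user picks options Boltzmann-rationally (softmax over utility).
import numpy as np

rng = np.random.default_rng(2)
goals = np.array([0.0, 1.0, 2.0])              # candidate latent goals
options = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # options shown each round
beta = 3.0                                     # rationality (inverse temperature)

def choice_probs(goal):
    """P(choice | goal): softmax over negative distance to the goal."""
    util = -np.abs(options - goal)
    p = np.exp(beta * util)
    return p / p.sum()

true_goal = 1.0
posterior = np.ones(len(goals)) / len(goals)   # uniform prior over goals
for _ in range(10):
    choice = rng.choice(len(options), p=choice_probs(true_goal))
    likelihood = np.array([choice_probs(g)[choice] for g in goals])
    posterior = posterior * likelihood          # Bayes update
    posterior /= posterior.sum()

print("posterior over goals:", dict(zip(goals, np.round(posterior, 3))))
```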
#TODO Put Figure 3 here.
Some other important considerations:
- What guardrails are needed to ensure high confidence in the output of ML-based simulations? How can we manage uncertainty, especially when different failure modes have different implications? In ML for science, it is often more beneficial to reduce the risk of false positives (a simple thresholding sketch follows).
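A simple guardrail in this spirit (a sketch under assumed validation data, not from the paper): choose a decision threshold on held-out scores so the empirical false-positive rate stays under a chosen bound, and flag anything below it for human review.

```python
# Choosing a decision threshold on validation data to bound the
# false-positive rate of an ML-based detection.
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical validation scores: negatives near 0.3, positives near 0.7.
neg_scores = rng.normal(0.3, 0.1, size=5000)
pos_scores = rng.normal(0.7, 0.1, size=5000)

target_fpr = 0.01  # tolerate at most 1% false positives
# Threshold = (1 - target_fpr) quantile of negative-class scores.
threshold = np.quantile(neg_scores, 1.0 - target_fpr)

fpr = np.mean(neg_scores >= threshold)   # empirical false-positive rate
tpr = np.mean(pos_scores >= threshold)   # detection rate at that guardrail
print(f"threshold={threshold:.3f}  FPR={fpr:.3%}  TPR={tpr:.3%}")
```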