# Working groups of GdR MASCOT-NUM

- Numerical implementation
- Design of numerical experiments
- Metamodeling
- Sensitivity analysis, calibration and validation issues
- Risk and Uncertainty
- Optimization
- Industrial applications
- Environmental, agronomical and biological applications

## Numerical implementation

*Coordinators:*

*Yann Richet (IRSN) & Laurence Viry (University Joseph Fourier)*

Numerical implementation is a prerequisite for making the methods developed by our research group applicable in practice.

This issue is all the more difficult because the technological environment keeps evolving, so the same numerical problem may admit several different solutions.

This activity therefore focuses on promoting the understanding and use of technological solutions to the problems encountered in third-party simulators or in the numerical methods themselves: high-performance computing, grid computing, parametric modeling.

Algorithm template

## Design of numerical experiments

*Coordinators:*

*Clémentine Prieur (Université Joseph Fourier) & Luc Pronzato (Université Nice Sophia Antipolis)*

The extensive use of numerical simulations opens the way to new types of experimentation and design of experiments: more intensive explorations become possible, more hazardous configurations can be tested and, hopefully, better understanding and optimized responses can be achieved, together with more accurate statements about risk and failure. At the same time, complex simulations require long computations, which limits what can be learnt in reasonable time. The domain of design and analysis of computer experiments aims at choosing the inputs of a numerical model so as to achieve a prescribed objective. In particular, one may want to: (i) predict the behaviour of a numerical model from the results of a small number of runs; (ii) optimize the response of a numerical model, that is, determine the input values corresponding, for example, to the highest performance or the smallest cost; (iii) estimate how the variability of the response depends on the variability of the inputs (sensitivity analysis); (iv) estimate a probability of failure in the presence of uncertainties, when some inputs are randomized according to a given probability measure. Whereas space-filling designs are commonly used for the first objective, other types of designs may be more relevant in the other situations. Sequential strategies (or active learning), which construct a model of the numerical simulator step by step, are especially attractive. The topics considered in this working group cover the definition of design criteria related to a given objective, the construction of efficient algorithms for determining optimal experiments, the investigation of asymptotic properties of designs, and the construction of designs for simulators with several levels of predictive accuracy. Experiments on real physical systems, where observations are generally corrupted by purely random errors, are also considered.
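
As a rough illustration of objective (i), the sketch below builds a space-filling design: a Latin hypercube whose points are spread out according to the maximin criterion, that is, the design whose closest pair of points is as far apart as possible. It is a minimal sketch that assumes inputs rescaled to [0, 1]^d and uses a naive random-restart search. The function names and the `n_candidates` parameter are illustrative only, and this is not a reference implementation.

```python
import math
import random


def latin_hypercube(n, d, rng):
    """Random Latin hypercube: n points in [0, 1]^d, one point per stratum on each axis."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # random assignment of strata to points
        columns.append([(k + rng.random()) / n for k in strata])
    # Transpose the d columns into n points with d coordinates each.
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_distance(points):
    """Smallest pairwise Euclidean distance between design points (maximin criterion)."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def maximin_lhs(n, d, n_candidates=200, seed=0):
    """Among n_candidates random Latin hypercubes, keep the one with the largest min distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n, d, rng) for _ in range(n_candidates))
    return max(candidates, key=min_distance)


design = maximin_lhs(n=10, d=2)
for x1, x2 in design:
    print(f"{x1:.3f}  {x2:.3f}")
```

Each coordinate of the result falls exactly once into each of the n strata, which is the Latin hypercube property. The maximin selection then discards designs with clustered points.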

## Metamodeling

*Coordinators: Amandine Marrel (CEA) & Anthony Nouy (Ecole Centrale de Nantes)*

Different approaches can be used to build approximations of expensive computer codes. These approximations, called metamodels, are computationally cheap and can therefore replace the actual computer codes when tackling problems such as uncertainty analysis, sensitivity analysis, probabilistic inverse problems or optimization under uncertainty.

A wide variety of approximation tools exist (polynomials, splines, wavelets, neural networks…) and different approaches can be used to construct approximations, depending on the level of information on the model. They range from statistical methods using a computer code as a simple black-box to projection methods based on the equations of a model.
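
To make the black-box approach concrete, the sketch below runs a toy function standing in for an "expensive code" on a small design. It then fits a least-squares polynomial metamodel that can be evaluated at negligible cost. The function `expensive_code`, the polynomial degree and the design size are hypothetical choices made for illustration. Polynomial regression is only one of the approximation tools listed above.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial metamodel, solved through the normal equations."""
    p = degree + 1
    # Regression matrix: one column per monomial x**0, ..., x**degree.
    phi = [[x ** k for k in range(p)] for x in xs]
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(p)] for i in range(p)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(p)]
    coeffs = solve(ata, aty)
    return lambda x: sum(c * x ** k for k, c in enumerate(coeffs))


def expensive_code(x):
    """Hypothetical stand-in for a costly simulator with one input in [-1, 1]."""
    return math.exp(-x) * math.sin(3.0 * x)


# Small training design: 25 equally spaced runs of the "expensive" code.
train_x = [-1.0 + 2.0 * i / 24 for i in range(25)]
train_y = [expensive_code(x) for x in train_x]
metamodel = fit_polynomial(train_x, train_y, degree=8)

# The metamodel is now cheap to evaluate anywhere in the input domain.
for x in (-0.8, 0.1, 0.7):
    print(f"x={x:+.2f}  code={expensive_code(x):+.4f}  metamodel={metamodel(x):+.4f}")
```

The normal equations keep the sketch short. Orthogonal polynomial bases or a QR factorization are numerically preferable for higher degrees or more inputs.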

This working group will focus on challenging questions that arise when constructing metamodels for real and/or industrial problems, including (among others):

- What can we do when the number of input variables is very large?
- What can we do when the code outputs are high-dimensional vectors or functional data?
- How can the objectives of the analysis be taken into account when constructing metamodels?
- How can several output variables be handled simultaneously?
- How can multiple codes with different levels of fidelity be taken into account?
- How can complex models involving several interacting codes be handled?
- How can *a priori* constraints, such as physical constraints or monotonicity, be taken into account to ensure plausible metamodel predictions?

## Sensitivity analysis, calibration and validation issues

*Coordinators: Sébastien Da Veiga (Safran) & Nabil Rachdi (Airbus group)*

The traditional steps of sensitivity analysis involve screening and the computation of classical Sobol indices, either directly or through the preliminary construction of a metamodel. Recently, several authors have proposed new viewpoints on sensitivity analysis that go beyond Sobol indices, ranging from derivative-based indices to goal-oriented measures and density-based approaches. The goal of this working group is to continue exploring such alternative approaches while evaluating their relevance for both screening and quantitative sensitivity analysis. In particular, we will focus on feature selection techniques used in the machine learning community.
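
For reference, the first-order Sobol index of input X_i is `S_i = Var(E[Y | X_i]) / Var(Y)`. It can be estimated by a "pick-freeze" Monte Carlo scheme. The sketch below applies one common estimator of this family to the Ishigami function, a standard benchmark whose indices are known analytically. The sample size, seed and helper names are illustrative assumptions. In a real study the model would typically be the code or a metamodel of it.

```python
import math
import random


def ishigami(x, a=7.0, b=0.1):
    """Ishigami benchmark function; inputs are i.i.d. uniform on [-pi, pi]."""
    return math.sin(x[0]) + a * math.sin(x[1]) ** 2 + b * x[2] ** 4 * math.sin(x[0])


def first_order_sobol(model, d, n, seed=0):
    """Pick-freeze Monte Carlo estimates of the first-order Sobol indices.

    Assumes independent inputs uniform on [-pi, pi]^d. Uses
    V_i ~ mean(f(B) * (f(A_B^i) - f(A))), where A_B^i is sample A
    with its i-th column taken from the independent sample B.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(-math.pi, math.pi) for _ in range(d)]

    a_rows = [draw() for _ in range(n)]
    b_rows = [draw() for _ in range(n)]
    f_a = [model(x) for x in a_rows]
    f_b = [model(x) for x in b_rows]

    # Total output variance, estimated from both samples.
    f_all = f_a + f_b
    mean = sum(f_all) / len(f_all)
    var = sum((y - mean) ** 2 for y in f_all) / len(f_all)

    indices = []
    for i in range(d):
        # x_i is taken from B (shared with f(B)); the other inputs come from A.
        f_ab = [model(xa[:i] + [xb[i]] + xa[i + 1:]) for xa, xb in zip(a_rows, b_rows)]
        v_i = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)) / n
        indices.append(v_i / var)
    return indices


indices = first_order_sobol(ishigami, d=3, n=20_000)
print([round(s, 3) for s in indices])
# Analytical values for a = 7, b = 0.1: S1 ~ 0.314, S2 ~ 0.442, S3 = 0.
```

The comparison with the analytical values shows the Monte Carlo error. That error is also why each index costs n extra model runs and why metamodels are often used first.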

Moreover, calibration and validation are now routine tasks for engineers in industry. Many calibration techniques, originating from optimization (inverse problems) or Bayesian statistics (posterior density estimation through Kalman filters, MCMC, particle filters, …), are available but usually require thousands of runs of the code. They are thus impractical in our context: the aim of the working group is to investigate how metamodeling techniques coupled with these approaches can help calibrate complex industrial codes. Another aspect concerns sensitivity analysis for calibration problems. Indeed, standard global sensitivity analysis (GSA) can identify which parameters most influence the output on average, but it is not well suited to detecting the parameters relevant for calibration. This issue will be a major research area of the working group.
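
The coupling of a metamodel with a Bayesian calibration technique can be sketched with a random-walk Metropolis sampler, one of the simplest MCMC methods. Here every likelihood evaluation calls a cheap surrogate instead of the industrial code. Everything in this example is hypothetical: the surrogate `metamodel(theta, x)`, the flat prior on [0, 5], the known noise level and the synthetic measurements are assumptions chosen only to keep the example self-contained.

```python
import math
import random


def metamodel(theta, x):
    """Hypothetical cheap surrogate of the industrial code (one parameter theta)."""
    return theta * x ** 2


def log_posterior(theta, xs, ys, sigma):
    """Gaussian log-likelihood with known noise sigma, flat prior on [0, 5] (up to a constant)."""
    if not 0.0 <= theta <= 5.0:
        return -math.inf
    sq_err = sum((y - metamodel(theta, x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * sq_err / sigma ** 2


def metropolis(xs, ys, sigma, n_iter=20_000, step=0.05, theta0=1.0, seed=0):
    """Random-walk Metropolis sampler for the calibration parameter theta."""
    rng = random.Random(seed)
    theta = theta0
    logp = log_posterior(theta, xs, ys, sigma)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, xs, ys, sigma)
        # Accept with probability min(1, posterior ratio); exp(-inf) == 0 always rejects.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new
        chain.append(theta)
    return chain


# Synthetic "measurements" (hypothetical): true theta = 2, noise sd = 0.1.
data_rng = random.Random(42)
xs = [k / 10 for k in range(1, 11)]
ys = [2.0 * x ** 2 + data_rng.gauss(0.0, 0.1) for x in xs]

chain = metropolis(xs, ys, sigma=0.1)
posterior = chain[2_000:]  # discard burn-in
post_mean = sum(posterior) / len(posterior)
post_sd = math.sqrt(sum((t - post_mean) ** 2 for t in posterior) / len(posterior))
print(f"posterior mean = {post_mean:.3f}, posterior sd = {post_sd:.3f}")
```

The 20,000 posterior evaluations would be unaffordable with a code that takes minutes per run. With the surrogate, they take well under a second.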

Calibration techniques are also relevant for numerical model validation, where one may want to check that the numerical model behaves as specified. Methodological aspects of this question will be tackled in this working group.

## Risk and Uncertainty

*Coordinators:*

*Nicolas Bousquet (EDF R&D) & Bruno Sudret (ETH Zürich)*

The notion of uncertainty, used daily in engineering practice, is a key element of any study based on phenomenological modelling. Traditionally, uncertainty is described qualitatively as the lack of knowledge affecting a phenomenon. On the one hand, the aleatory part of uncertainty reflects intrinsic randomness, characterizing for instance the occurrence of a natural hazard in risk analysis. On the other hand, the so-called epistemic part of uncertainty reflects a lack of knowledge about the phenomenon itself, typically because measurements of this phenomenon are indirect and/or noisy. Hence the classical distinction between the two: aleatory uncertainty cannot be reduced by additional knowledge, unlike epistemic uncertainty.

Such definitions remain fuzzy, in the sense that they do not provide formal and consensual representations of uncertainty. Today, there is no mathematical theory of uncertainty that is fully accepted by researchers and engineers in all areas. Nonetheless, the probabilistic encoding of uncertainty, which interprets probabilities as tools for coherently representing lack of knowledge, currently appears to be the prominent modelling framework. Other theories consider different levels of fuzziness in the available knowledge (for instance fuzzy logic, Dempster-Shafer evidence theory, possibility theory, etc.) and are intrinsically related to particular (e.g., experimental) situations. Most of them, however, require probabilistic tools to be used. Others remain difficult to apply because of their complexity and computational cost.

Among all the problems linked to the modelling of uncertainties, the elicitation problem is one of the major issues encountered in the practice of phenomenological modelling. It is defined as the extraction of knowledge from sources of information other than objective (real or simulated) experiments. In many situations, a complex function links input variables to output variables of interest, and this function can be implemented as a computer model. Many of these input variables are considered random (intrinsically) or randomized, and the nature of this randomness is sometimes poorly described by experimental data. Expert information therefore appears essential to guide the choice of this randomness (typically using classical statistical models), and Bayesian tools are often recommended for this purpose. This is especially the case in structural reliability problems, where the simulated output is used to compute risk indicators (such as probabilities of failure, quantiles, etc.).
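
As a minimal illustration of such a risk indicator, the sketch below estimates a probability of failure by crude Monte Carlo on a textbook "resistance minus load" limit state with normal variables. It then compares the estimate with the exact value, which is available in this linear Gaussian case. The limit-state function and the distribution parameters are hypothetical. In practice the limit state would involve the (expensive) computer model, which is why the number of runs matters.

```python
import math
import random


def limit_state(r, s):
    """Hypothetical limit-state function: failure when load S exceeds resistance R."""
    return r - s


def mc_failure_probability(n, seed=0):
    """Crude Monte Carlo estimate of P[g(R, S) <= 0] and its coefficient of variation."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        r = rng.gauss(5.0, 0.5)  # resistance (assumed normal)
        s = rng.gauss(3.0, 0.5)  # load (assumed normal)
        if limit_state(r, s) <= 0.0:
            failures += 1
    pf = failures / n
    cov = math.sqrt((1.0 - pf) / (n * pf)) if failures else math.inf
    return pf, cov


pf_hat, cov = mc_failure_probability(n=200_000)

# In this linear Gaussian case the exact value is Phi(-beta),
# with reliability index beta = (5 - 3) / sqrt(0.5**2 + 0.5**2).
beta = 2.0 / math.sqrt(0.5 ** 2 + 0.5 ** 2)
pf_exact = 0.5 * math.erfc(beta / math.sqrt(2.0))
print(f"Monte Carlo: {pf_hat:.2e} (CoV {cov:.1%}), exact: {pf_exact:.2e}")
```

The coefficient of variation of the crude estimator is sqrt((1 - pf) / (n pf)). A target probability of order 10^-k therefore requires well over 10^k runs, which motivates metamodels and variance-reduction methods.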

Discussing new methodologies and providing research avenues on these themes is the aim of the "Risk and Uncertainty" group. More precisely, it offers a framework, through specialized conference sessions, workshops and meetings, to (among others):

- formalize the various uncertainties affecting industrial processes in a generic way;
- propose new computational approaches that handle the various kinds of uncertainties and their representations for risk and reliability analysis;
- highlight techniques for controlling the conservatism of risk indicators in applications;
- provide tools for selecting among theories of uncertainty;
- link uncertainty and model error through verification and validation (V&V) techniques.


## Optimization

*Coordinators:*

*Emmanuel Vazquez (Laboratoire des Signaux et Systèmes, CentraleSupélec)*

How to optimize the performance of a system using numerical simulations? In particular, when simulations are time-consuming, it becomes essential to consider optimization algorithms that use the information provided by the simulations as efficiently as possible. Several approaches may be used. Most of them are based on the construction of an approximation of the output of the simulation (a metamodel). For instance, the idea of the Bayesian approach for optimization is to use a random process as a model of the function to be optimized. Then, the optimization is performed by making evaluations of the function in sequence, each evaluation being chosen in order to minimize a criterion that quantifies the expected loss, under the random process model, incurred by taking the best evaluation result collected so far instead of the true unknown optimum. Both theoretical and practical aspects are considered in this working group.
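
The Bayesian approach can be sketched in a few dozen lines. A Gaussian process (kriging) model with fixed, assumed hyperparameters is conditioned on the evaluations. Each new point is then chosen to maximize the expected improvement over the best value observed so far. Expected improvement is one classical instance of the expected-loss criterion described above, because maximizing it is equivalent to minimizing the expected loss after one more evaluation. The objective function, kernel length scale, nugget and grid search are all illustrative assumptions. Practical implementations also estimate the hyperparameters from the data.

```python
import math


def kernel(x, y, length=0.2, variance=1.0):
    """Squared-exponential covariance with fixed (assumed) hyperparameters."""
    return variance * math.exp(-0.5 * ((x - y) / length) ** 2)


def cholesky(a):
    """Lower-triangular L such that a = L L^T (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def tri_solve(low, b, transpose=False):
    """Solve L x = b by forward substitution, or L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in (reversed(range(n)) if transpose else range(n)):
        if transpose:
            s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        else:
            s = sum(low[i][k] * x[k] for k in range(i))
        x[i] = (b[i] - s) / low[i][i]
    return x


def fit_gp(xs, ys, nugget=1e-6):
    """Condition a zero-mean Gaussian process on (xs, ys); return x -> (mean, sd)."""
    k_mat = [[kernel(a, b) + (nugget if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
    low = cholesky(k_mat)
    alpha = tri_solve(low, tri_solve(low, ys), transpose=True)  # K^{-1} y

    def predict(x):
        k_vec = [kernel(x, a) for a in xs]
        mean = sum(k * w for k, w in zip(k_vec, alpha))
        v = tri_solve(low, k_vec)
        var = max(kernel(x, x) - sum(t * t for t in v), 0.0)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, sd, best):
    """EI for minimization: E[max(best - Y, 0)] with Y ~ N(mean, sd**2)."""
    if sd <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / sd
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + sd * pdf


def objective(x):
    """Hypothetical expensive simulator output, to be minimized on [0, 1]."""
    return math.sin(10.0 * x) + 0.5 * x


grid = [i / 200 for i in range(201)]   # candidate points for the next evaluation
xs = [0.0, 1 / 3, 2 / 3, 1.0]          # initial design
ys = [objective(x) for x in xs]
for _ in range(12):
    predict = fit_gp(xs, ys)
    best = min(ys)
    x_next = max(grid, key=lambda x: expected_improvement(*predict(x), best))
    xs.append(x_next)
    ys.append(objective(x_next))

i_best = min(range(len(ys)), key=ys.__getitem__)
print(f"best x = {xs[i_best]:.3f}, best f = {ys[i_best]:.4f}")
```

The criterion balances a low predicted mean (exploitation) against a large predictive standard deviation (exploration). This is what lets the loop spend few evaluations of the expensive function.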

## Industrial applications

*Coordinators: Fabien Mangeant (Airbus group)*

Uncertainty spans a wide spectrum of science and practical knowledge: economics, physics, decision theory and risk assessment are the traditional sources, joined more recently by modelling, numerical analysis, advanced statistics and computer science; it even reaches epistemology, management science, psychology, public debate and democracy. The subject is of strategic interest in modelling. It is driven either by regulatory demands (e.g., safety, security or environmental control in certification or licensing processes) or by markets (e.g., industrial process optimization or business development). In both cases, a better understanding of the margins of uncertainty, and of possible actions to reduce them, is increasingly investigated for the associated risks and opportunities. As modelling and simulation become more and more common in industrial practice, many industrial challenges arise when practitioners want to perform uncertainty and sensitivity analysis. The first goal of this group is to guarantee the applicability and transfer of the scientific techniques developed in the other working groups to the difficulties currently encountered by industrial practitioners. In return, the industrial partners will submit current challenges to the scientific community, for example: high dimensionality, computational times, coupling between different codes, scarcity of observational data, particular distributions in the factor space. Workshops and dedicated sessions will be organised to foster a fruitful dialogue between the different communities.

To amplify the efforts undertaken by the GdR MASCOT-NUM, several partners have created a "Club des partenaires" to support the development of this network.

## Environmental, agronomical and biological applications

*Coordinators:*

*Robert Faivre (INRA) & Nathalie Saint-Geours*

### General description of the models and objectives

- (a) Environmental: these models mainly concern physical or physico-chemical processes, mostly modelled by differential equations with theoretical foundations. The spatial and temporal extent of the studies is often large and the resolution high, so solving these equations is computationally expensive. Depending on the application, the models may be stochastic (epidemic spread, for example). Objectives include the quantification of risk and the analysis of dynamics over a spatial domain (from 1D to 3D), often over long time horizons. The environment is mainly a physical one: ground or soil, atmosphere, a river system within a catchment (watershed), a region.
- (b) Agronomical (crop models): crop models involve a large number of processes described by empirical models with many parameters estimated from experimental data. The spatial extent is a plot (or a collection of plots) and the temporal extent a growing season, with a daily time step. The model has a single spatial unit (homogeneous crop), except when individual plants (or rows) are modelled or precision agriculture is considered. The environment of the crop can be modelled (micro-meteorology, soil, water) or included as external inputs. Decision rules (irrigation, fertilization) are included in the model with external parameters or optimized during the simulation. Except when the environment is modelled, computation is fast, but some models are stochastic.
- (c) Biological: there are many different types of biological systems, many of which are of concern for systems biology: regulatory networks, enzyme kinetics, enzymes and metabolites in a metabolic pathway, population dynamics, neurophysiology, cell functioning, ... In most cases, these models are dynamical. Objectives include identifying the most influential parameters and characterizing emergent behaviour.

Some questions are specific to a kind of model or application, but many are generic. A first goal of this group is to share the practices and experiences of a large community of practitioners. Links with the Mexico network (http://reseau-mexico.fr) will be encouraged.

A number of specific features may arise in such applications:

- a large number of applications use functional (spatial and temporal) inputs. Inputs can also take discrete or continuous values. All research work dealing with such multiple and heterogeneous inputs falls within the scope of this group;
- scale change, metamodelling and multifidelity form a major tripod of complex system modelling. We will focus on all methods developed in that direction;
- rare event and quantile estimation are key points for some environmental applications;
- some models are stochastic, and methods for conducting sensitivity analyses of stochastic models are of interest for this group;
- the link between observed and simulated data is a generic problem in most applications, in particular to calibrate or estimate input parameters;
- optimization (of agricultural practices, for example) is an objective of some models, and criteria are often multiple. All research work on methods dealing with multi-criteria optimization is welcome in this group;
- a large number of indicators can be deduced from complex system models. How should they be analysed and combined?