Working meeting "Uncertainty quantification and machine learning"

March 10th, 2020

Seminar organized by the GdR MASCOT-NUM

Organizers: Sébastien Da Veiga, Bertrand Iooss, Anthony Nouy, Guillaume Perrin, Victor Picheny

It will take place on:

March 10, 2020, at Amphithéâtre Hermite, Institut Henri Poincaré, Paris.

Presentation of the workshop

Recent advances that have emerged in parallel in the machine learning and UQ communities show that both fields could benefit even further from joint research efforts and practice sharing. In particular, trending issues in machine learning related to nonlinear approximation theory, uncertainty prediction and explainability of black-box models have deep links with methodologies explored in the UQ community. The goal of this workshop is to investigate these links and to gather a diverse audience from both research areas in order to facilitate future collaborations between them.

Agenda

9h15 - Welcome - Introduction

Theme 1: Nonlinear Approximation - Deep learning

Chair: Victor Picheny

9h30 - Anthony Nouy (Centrale Nantes): Learning with deep tensor networks - nouy20.pdf

10h15 - Mark van der Wilk (Imperial College London): Gaussian processes for uncertainty quantification in deep models - vanderwilk20.pdf

11h00 - Break

Theme 2: Explainability and interpretability

Chair: Bertrand Iooss

11h30 - Max Halford (Université Toulouse): Global explanation of machine learning with sensitivity analysis - halford20.pdf

12h15 - Lunch break (on your own)

14h30 - Christophe Labreuche (Thalès): Interpretability methods in AI and a comparison with sensitivity analysis - labreuche20.pdf

Theme 3: Prediction uncertainty

Chair: Guillaume Perrin

15h15 - Nicolas Brosse (Thalès): Uncertainties for classification tasks in Deep Neural Networks: a last layer approach - brosse20.pdf

16h00 - Break

16h15 - Sébastien Da Veiga (Safran Tech): Sampling posteriors in high dimension: potential industrial applications with UQ - daveiga20.pdf

17h00 - End

Abstracts

Anthony Nouy (Centrale Nantes): Learning with deep tensor networks

We consider the approximation of functions in a statistical learning setting by empirical risk minimization over model classes of functions in tree-based tensor format, which can be interpreted as particular cases of neural networks. We present results on their approximation power for some classes of functions (including standard regularity classes and classes of functions given by compositions of regular functions) and on their statistical complexity. We also present adaptive learning algorithms and illustrate their performance on supervised and unsupervised learning tasks.
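
To make the tensor formats concrete, here is a minimal numpy sketch (our illustration, not the speaker's code) evaluating a function represented in tensor-train format, a simple instance of a tree-based format; the monomial feature map, the ranks and the random cores are assumptions chosen for brevity. In a learning setting, the cores would instead be fitted by empirical risk minimization, e.g. with an alternating least-squares sweep.

```python
import numpy as np

# Rank-r tensor-train approximation of f(x_1, ..., x_d): a product of small
# cores contracted against univariate feature maps phi(x_k).
d, p, r = 4, 3, 2              # input dimension, features per variable, rank
rng = np.random.default_rng(0)

# Cores G_k of shape (r_{k-1}, p, r_k), with boundary ranks r_0 = r_d = 1.
ranks = [1] + [r] * (d - 1) + [1]
cores = [rng.normal(size=(ranks[k], p, ranks[k + 1])) for k in range(d)]

def phi(x):
    """Univariate feature map: monomials 1, x, x^2 (an assumed choice)."""
    return np.array([x ** j for j in range(p)])

def tt_eval(cores, x):
    """Evaluate the tensor-train function at x, contracting cores left to right."""
    v = np.ones(1)
    for k, G in enumerate(cores):
        v = v @ np.einsum('ipj,p->ij', G, phi(x[k]))
    return v.item()

print(tt_eval(cores, rng.uniform(size=d)))
```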

Mark van der Wilk (Imperial College London): Gaussian processes for uncertainty quantification in deep models

Gaussian processes (GPs) are distributions over functions with many useful mathematical properties. In particular, the closed-form solution for Bayesian inference makes them useful for uncertainty quantification. In this talk, I will discuss GPs as a building block for deep models. I will discuss why I believe that deep GPs may be a better way to do uncertainty quantification in deep models than weight-space uncertainty, and I will present some evidence for this. I will end with some thoughts on the influence of model structure on uncertainty quantification and generalisation ability.
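
As background, here is a minimal numpy sketch of the closed-form GP regression posterior that the abstract refers to; the RBF kernel, noise level and toy data are illustrative assumptions, not material from the talk.

```python
import numpy as np

def rbf(A, B, ell=0.5):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 1))                  # training inputs
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=20)   # noisy observations
Xs = np.linspace(-1, 1, 100)[:, None]                 # test inputs
sn2 = 0.01                                            # noise variance

K = rbf(X, X) + sn2 * np.eye(len(X))
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)                     # posterior mean
cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)     # posterior covariance
std = np.sqrt(np.clip(np.diag(cov), 0, None))         # pointwise uncertainty
```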

Max Halford (Université Toulouse): Global explanation of machine learning with sensitivity analysis

We present a novel framework for understanding machine learning black boxes using sensitivity analysis. To this end, we construct new test distributions, as close as possible to the original test distribution, using entropic projection. Such distributions enjoy the property that constraints can be incorporated in a feasible way and can be computed easily. Adding constraints makes it possible to stress the algorithm and thus to understand its behaviour with respect to changes in the distribution.
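
A minimal sketch of the entropic-projection idea under simplifying assumptions (a single mean constraint on one feature; the toy data are ours): minimizing the KL divergence to the empirical test distribution subject to a moment constraint yields exponentially tilted weights, whose multiplier can be found by a one-dimensional root solve.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)
g = rng.normal(size=1000)      # feature values g(x_i) on the test sample
t = 0.5                        # stressed target mean for g

def tilted_mean(lam):
    """Mean of g under the exponentially tilted weights, minus the target."""
    a = lam * g
    w = np.exp(a - a.max())    # log-sum-exp shift for numerical stability
    w /= w.sum()
    return w @ g - t

lam = brentq(tilted_mean, -50.0, 50.0)   # solve E_w[g] = t for the multiplier
a = lam * g
w = np.exp(a - a.max())
w /= w.sum()
# w defines the stressed distribution closest to the empirical one in KL;
# rerunning the black box with these sample weights probes its behaviour.
```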

Christophe Labreuche (Thalès): Interpretability methods in AI and a comparison with sensitivity analysis

Interpretability and sensitivity analysis methods originated in different communities (AI for the former, statistics for the latter) and aim at different goals (local analysis for the former, global analysis for the latter). However, interesting connections can be drawn between them. Before doing so, we will describe some challenges in a large class of interpretability methods called Feature Attribution.

Feature Attribution aims at allocating the level of influence of each feature on the output of the AI model. The Shapley value is one of the leading concepts for feature attribution; its benefit is its axiomatic justification in Cooperative Game Theory. It has been adapted to different fields of AI. In Machine Learning, the difficulty is to take into account the dependencies among features. In Decision Aiding, the features are often organized hierarchically, and the standard Shapley value is not suitable.

We will show interesting connections between Feature Attribution methods and sensitivity analysis. Under some assumptions, the Sobol indices correspond to a variant of the Shapley value.
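
To illustrate the feature-attribution side of this connection, here is a minimal Monte Carlo sketch of Shapley value estimation for a black-box model (a toy example of ours, assuming independent features; not code from the talk).

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda X: X[:, 0] + 2 * X[:, 1] * X[:, 2]   # toy black-box model
X = rng.normal(size=(500, 3))                   # background data
x = np.array([1.0, 1.0, 1.0])                   # point to explain
d, n_perm = X.shape[1], 200

phi = np.zeros(d)
for _ in range(n_perm):
    perm = rng.permutation(d)                   # random feature ordering
    z = X[rng.integers(len(X))].copy()          # hidden features from background
    prev = f(z[None, :])[0]
    for j in perm:
        z[j] = x[j]                             # reveal feature j
        cur = f(z[None, :])[0]
        phi[j] += cur - prev                    # marginal contribution of j
        prev = cur
phi /= n_perm
print(phi, phi.sum())   # attributions; the sum approximates f(x) - E[f(X)]
```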

Nicolas Brosse (Thalès): Uncertainties for classification tasks in Deep Neural Networks: a last layer approach

Uncertainty quantification for deep learning is a challenging open problem. Bayesian statistics offers a mathematically grounded framework for reasoning about uncertainties; however, computing approximate posteriors for modern neural networks still comes at a prohibitive computational cost.

We propose a family of algorithms which split the classification task into two stages: representation learning and uncertainty estimation. We compare four specific instances, where uncertainty estimation is performed via either an ensemble of Stochastic Gradient Descent or Stochastic Gradient Langevin Dynamics snapshots, an ensemble of bootstrapped logistic regressions, or via a number of Monte Carlo Dropout passes.

We evaluate their performance in terms of selective classification (risk-coverage), and their ability to detect out-of-distribution samples. Our experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and we observe that these simple methods outperform a vanilla point-estimate Stochastic Gradient Descent in some complex benchmarks like ImageNet.
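
A minimal sketch of one of the instances described above, an ensemble of bootstrapped logistic regressions on a frozen representation; the random features standing in for a trained network's penultimate layer, the ensemble size and the abstention threshold are our assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
Z = rng.normal(size=(1000, 32))    # frozen last-layer representations
y = (Z[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

ensemble = []
for _ in range(10):                # bootstrap resamples of the training set
    idx = rng.integers(0, len(Z), size=len(Z))
    ensemble.append(LogisticRegression(max_iter=1000).fit(Z[idx], y[idx]))

probs = np.mean([m.predict_proba(Z) for m in ensemble], axis=0)
conf = probs.max(axis=1)           # confidence used for selective classification
keep = conf > 0.9                  # abstain below the threshold
print('coverage:', keep.mean(),
      'risk:', (probs.argmax(1)[keep] != y[keep]).mean())
```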

Sébastien Da Veiga (Safran Tech): Sampling posteriors in high dimension: potential industrial applications with UQ

Stochastic Gradient Langevin Dynamics and its variants have recently emerged as promising algorithms for sampling posteriors. In this talk, we investigate their potential in the context of fully Bayesian Gaussian process regression. We will focus on high-dimensional problems with sparsity-inducing priors (e.g. ACOSSO models). In addition, we will show how the stochastic gradient point of view can handle a large number of design points, either by subsampling or with random Fourier features.
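
For concreteness, here is a minimal numpy sketch of SGLD on a toy Bayesian linear model; the Gaussian prior stands in for the sparsity-inducing priors mentioned above, and the data, step size and batch size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)       # noise variance 0.25

def grad_log_post(w, batch):
    """Minibatch gradient of the log-posterior: likelihood term rescaled by
    n / |batch|, Gaussian N(0, I) prior giving the -w term."""
    Xb, yb = X[batch], y[batch]
    return (n / len(batch)) * Xb.T @ (yb - Xb @ w) / 0.25 - w

w = np.zeros(d)
eps, samples = 1e-5, []
for t in range(5000):
    batch = rng.integers(0, n, size=64)
    w = w + 0.5 * eps * grad_log_post(w, batch) \
          + np.sqrt(eps) * rng.normal(size=d)   # injected Langevin noise
    if t >= 2000:
        samples.append(w.copy())                # keep samples after burn-in
print(np.mean(samples, axis=0) - w_true)        # posterior mean is close to w_true
```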