3 main courses

The winter school offers three courses, each consisting of three 90-minute sessions.

Non-parametric Inference under Local Differential Privacy

Data privacy protection is a major issue for society today because massive amounts of data are continuously collected and stored by electronic devices, on social networks, in medicine, in finance and so on. This leads to multiple sources of data concerning the same individuals (persons, funds, etc.) that can easily be aggregated in order to identify them. Therefore, privacy-preserving mechanisms have to be applied to the data before their public release. This requires quantifying the amount of privacy, and also deciding a priori whether collaboration between data holders is possible/authorized or unadvisable/forbidden.

More information: Abstract Non-parametric Inference under Local Differential Privacy

Principal Component Analysis: some recent results and applications

Several recent applications in statistics, machine learning and numerical analysis can be formulated as problems of processing high-dimensional matrices. Extracting information efficiently from these objects often requires developing new computationally efficient methods. Understanding how and when these methods work is a fascinating research topic that requires combining tools from several fields of mathematics: statistics, probability, perturbation theory and convex optimization. In this course, we will review how to use these tools in the context of Principal Component Analysis to analyse the performance of the standard PCA method. Results will include concentration bounds, asymptotic distributions and minimax lower bounds for functionals of spectral projectors. Next, we will explain how to exploit this recent theory to provide some insight into new or longstanding problems in machine learning, including Gaussian mixtures, graph clustering and domain adaptation.

Statistical inference of incomplete data models to analyse ecological networks

Ecological networks aim to describe the interactions between a set of species sharing the same ecological niche. The interactions constituting a network can be directly observed (e.g. via plant-pollinator contacts) or may need to be reconstructed based on the fluctuations of the species’ abundances across different sites. Statistical models are needed either to describe the organisation (or ‘topology’) of an observed network, or to infer the set of interactions that underlies the joint distribution of the abundances. Various models have been proposed for both purposes.

These lectures will focus on two emblematic families of models. Stochastic block models (SBMs) are dedicated to the topological analysis of observed networks and assume that species have different roles in the network and that the interactions between them depend on their respective roles. The Poisson log-normal (PLN) model is a joint species distribution model (JSDM) that relies on a Gaussian latent layer. Interestingly, both are incomplete data models, and their statistical inference raises similar issues.

After a brief reminder of the most popular methods for the inference of incomplete data models, we will show that they do not apply to SBMs or PLN models. We will introduce inference methods based on variational algorithms, which rely on an approximation of the conditional distribution of the unobserved variables given the observed data. Such algorithms have been shown to be efficient for the inference of a large class of incomplete data models, but their theoretical understanding itself remains incomplete. Finally, we will discuss various avenues for combining variational approximations with statistically grounded estimation procedures.