My research spans Bayesian deep learning, approximate Monte Carlo methods, and the mathematics of deep learning. Through research in these areas, I address questions in uncertainty quantification for deep learning, in approximate inference with big data or high-dimensional models, and in function approximation.
PhD in statistics, 2009
University of Warwick
MSc in statistics, 2005
University of Warwick
BSc in mathematics, 2004
University of Ioannina
Random variables and their distributions are central to many areas of statistical methodology. The Distributions.jl package provides Julia users and developers with tools for working with probability distributions, leveraging Julia's features for intuitive and flexible manipulation while remaining highly efficient through zero-cost abstractions.
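A minimal sketch of the package in action. The numerical values are arbitrary, but every function shown (`pdf`, `cdf`, `quantile`, `rand`, `fit_mle`) belongs to the package's generic API, which works uniformly across distribution types:

```julia
using Distributions, Random

Random.seed!(1)

d = Normal(2.0, 0.5)      # a distribution is a first-class value
pdf(d, 2.0)               # density at a point
cdf(d, 2.5)               # cumulative probability
quantile(d, 0.975)        # inverse CDF
mean(d), var(d)           # moments via generic functions

x = rand(d, 1_000)        # draw 1,000 samples
fit_mle(Normal, x)        # maximum-likelihood fit, returned as a Normal

# The same generic functions apply to any distribution type,
# univariate or multivariate:
m = MvNormal([0.0, 0.0], [1.0 0.3; 0.3 1.0])
rand(m, 5)                # 2×5 matrix, one sample per column
```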
There has recently been much work on the ‘wide limit’ of neural networks, where Bayesian neural networks (BNNs) are shown to converge to a Gaussian process (GP) as all hidden layers are sent to infinite width. However, these results do not apply to architectures that require one or more of the hidden layers to remain narrow. In this paper, we consider the wide limit of BNNs where some hidden layers, called ‘bottlenecks’, are held at finite width. The result is a composition of GPs that we term a ‘bottleneck neural network Gaussian process’ (bottleneck NNGP). Although intuitive, the subtlety of the proof is in showing that the wide limit of a composition of networks is in fact the composition of the limiting GPs. We also analyze theoretically a single-bottleneck NNGP, finding that the bottleneck induces dependence between the outputs of a multi-output network that persists through extreme post-bottleneck depths, and prevents the kernel of the network from losing discriminative power at extreme post-bottleneck depths.
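To make the composition concrete, here is an illustrative sketch (not the paper's code) for a ReLU network with arbitrary hyperparameters: the standard arc-cosine NNGP kernel recursion models the wide pre-bottleneck layers, a finite number of bottleneck units are drawn as GP samples under that kernel, and the recursion restarts with the draws as inputs.

```julia
using LinearAlgebra, Random

# One step of the ReLU (arc-cosine) NNGP kernel recursion.
function relu_step(K; sw2=2.0, sb2=0.1)
    n = size(K, 1)
    Knew = similar(K)
    for i in 1:n, j in 1:n
        c = clamp(K[i, j] / sqrt(K[i, i] * K[j, j]), -1.0, 1.0)
        θ = acos(c)
        Knew[i, j] = sb2 + sw2 / (2π) * sqrt(K[i, i] * K[j, j]) *
                     (sin(θ) + (π - θ) * cos(θ))
    end
    return Knew
end

# NNGP kernel of a depth-L wide ReLU network on the rows of X.
function nngp_kernel(X, L; sw2=2.0, sb2=0.1)
    K = sb2 .+ sw2 .* (X * X') ./ size(X, 2)
    for _ in 1:L
        K = relu_step(K; sw2=sw2, sb2=sb2)
    end
    return K
end

Random.seed!(0)
X = randn(8, 3)                # 8 inputs in R^3
K_pre = nngp_kernel(X, 4)      # kernel of the wide pre-bottleneck layers

# A width-m bottleneck: each unit is an independent GP draw with the
# pre-bottleneck kernel.
m = 2
Z = cholesky(Symmetric(K_pre + 1e-10I)).L * randn(8, m)

# Conditionally on the bottleneck, the post-bottleneck network is again
# an NNGP, now driven by Z; composing the two gives the bottleneck NNGP.
K_post = nngp_kernel(Z, 4)
```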
Traditionally, ODE parameter inference relies on solving the system of ODEs and assessing the fit of the estimated signal to the observations. However, nonlinear ODEs often do not admit closed-form solutions, and solving the equations numerically incurs prohibitive computational costs, particularly when one adopts a Bayesian approach and samples parameters from a posterior distribution. With the introduction of gradient matching, we can abandon the need to numerically solve the system of equations. Inherent in these efficient procedures is the introduction of bias into the learning problem, as we no longer sample based on the exact likelihood function. This paper presents a multiphase MCMC approach that attempts to close the gap between efficiency and accuracy. By first sampling under a surrogate likelihood, we accelerate convergence to the stationary distribution before sampling under the exact likelihood. We demonstrate that this method combines the efficiency of gradient matching with the accuracy of the exact-likelihood scheme.
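The two-phase structure can be sketched generically. In the sketch below, `surrogate_loglik` and `exact_loglik` are hypothetical placeholders for a gradient-matching objective and an ODE-solver-based likelihood, respectively, and the random-walk sampler stands in for whichever MCMC kernel is used:

```julia
using Random

# Generic random-walk Metropolis sampler; the log-likelihood is a
# swappable argument, which is all the two-phase scheme needs.
function rw_metropolis(loglik, logprior, θ0, n; step=0.1)
    θ = copy(θ0)
    lp = loglik(θ) + logprior(θ)
    chain = [copy(θ)]
    for _ in 1:n
        θp = θ .+ step .* randn(length(θ))
        lpp = loglik(θp) + logprior(θp)
        if log(rand()) < lpp - lp        # Metropolis accept/reject
            θ, lp = θp, lpp
        end
        push!(chain, copy(θ))
    end
    return chain
end

# Phase 1: burn in cheaply under the surrogate (gradient-matching)
# likelihood. Phase 2: exact-likelihood sampling initialized at the end
# of the surrogate chain, so convergence is mostly paid for at
# surrogate cost. Both likelihood functions are placeholders here.
# phase1 = rw_metropolis(surrogate_loglik, logprior, θ_init, 5_000)
# phase2 = rw_metropolis(exact_loglik, logprior, phase1[end], 2_000)
```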
Approximation of the model evidence is well known to be challenging. One promising approach is based on thermodynamic integration, but a key concern is that the thermodynamic integral can suffer from high variability in many applications. This article considers the reduction of variance that can be achieved by exploiting control variates in this setting. Our methodology applies whenever the gradient of both the log-likelihood and the log-prior with respect to the parameters can be efficiently evaluated. Results obtained on regression models and popular benchmark datasets demonstrate a significant and sometimes dramatic reduction in estimator variance and provide insight into the wider applicability of control variates to evidence estimation. Supplementary materials for this article are available online.
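In symbols (notation mine, not the article's): thermodynamic integration targets the log-evidence via power posteriors, and a gradient-based control variate leaves each expectation unchanged while shrinking its Monte Carlo variance.

```latex
% Thermodynamic integration identity over inverse temperatures t:
\log p(y) \;=\; \int_0^1 \mathbb{E}_{\theta \sim p_t}\!\left[ \log p(y \mid \theta) \right] \mathrm{d}t,
\qquad p_t(\theta) \propto p(y \mid \theta)^{t}\, p(\theta).

% The score of the power posterior needs only the gradients named in
% the abstract and has zero expectation under mild conditions:
h_t(\theta) \;=\; t\, \nabla_\theta \log p(y \mid \theta) + \nabla_\theta \log p(\theta),
\qquad \mathbb{E}_{p_t}\!\left[ h_t(\theta) \right] = 0.

% Hence, for any coefficient vector a, the corrected integrand remains
% unbiased, and the variance-minimizing choice is the usual regression
% coefficient:
\mathbb{E}_{p_t}\!\left[ \log p(y \mid \theta) - a^{\top} h_t(\theta) \right]
  = \mathbb{E}_{p_t}\!\left[ \log p(y \mid \theta) \right],
\qquad a^{*} = \operatorname{Var}[h_t]^{-1} \operatorname{Cov}\!\left[ h_t, \log p(y \mid \theta) \right].
```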
RNA editing is a mutational mechanism that specifically alters the nucleotide content of transcribed RNA. However, editing rates vary widely, and an observed rate could result from equivalent editing amongst individual cells or represent an average of variable editing within a population. Here we present a hierarchical Bayesian model that quantifies the variance of editing rates at specific sites, using RNA-seq data from both single cells and a cognate bulk sample to distinguish between these two possibilities. The model predicts high variance for specific edited sites in murine macrophages and dendritic cells, findings that we validated experimentally by targeted amplification of specific editable transcripts from single cells. The model also predicts changes in the variance of editing rates for specific sites in dendritic cells during the course of LPS stimulation. Our data demonstrate substantial variance in editing signatures amongst single cells, supporting the notion that RNA editing generates diversity within cellular populations.
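The core distinction can be caricatured with a simple overdispersion check (an illustrative sketch, not the paper's hierarchical model): if editing is equivalent across cells, per-cell edited read counts at a site are binomial around the bulk rate, whereas cell-to-cell variability produces beta-binomial overdispersion instead.

```julia
using Distributions, Random

Random.seed!(42)
pbar = 0.3                    # editing rate estimated from the bulk sample
n = 20                        # reads covering the site in each cell

# Beta(α, β) with mean pbar; a smaller κ = α + β means more
# cell-to-cell variance in editing rates.
κ = 2.0
α, β = κ * pbar, κ * (1 - pbar)

cells_uniform  = rand(Binomial(n, pbar), 200)       # equivalent editing
cells_variable = rand(BetaBinomial(n, α, β), 200)   # variable editing

# Score both hypotheses on the same counts by log-likelihood; the
# paper's hierarchical Bayesian model makes this comparison jointly
# across sites, with full posterior uncertainty.
ll_binomial(x)     = sum(logpdf.(Binomial(n, pbar), x))
ll_betabinomial(x) = sum(logpdf.(BetaBinomial(n, α, β), x))
ll_binomial(cells_variable), ll_betabinomial(cells_variable)
```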
Differential-geometric Markov chain Monte Carlo (MCMC) strategies exploit the geometry of the target distribution to achieve convergence in fewer MCMC iterations, at the cost of increased computing time per iteration. This computational overhead is regarded as a potential shortcoming of geometric MCMC in practice. This paper shows that part of the additional computation required by Hamiltonian Monte Carlo and Metropolis-adjusted Langevin algorithms yields exactly the quantities needed for concurrent implementation of the zero-variance reduction technique for MCMC estimation. Zero-variance geometric MCMC therefore emerges as an inherently unified sampling scheme, in the sense that variance reduction and geometric exploitation of the parameter space can be performed simultaneously without exceeding the computational requirements of the geometric MCMC scheme alone. A MATLAB package is provided, implementing a generic code framework for the combined methodology across a range of models.
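The reuse is easy to see in code. Below is a hedged sketch (toy target and illustrative tuning, not the MATLAB package): MALA evaluates the gradient of the log-target at every iteration anyway, and the stored gradients are exactly what the first-degree zero-variance control variate needs.

```julia
using LinearAlgebra, Random, Statistics

logtarget(θ) = -0.5 * dot(θ, θ)       # standard normal toy target
gradlog(θ)   = -θ                     # its exact score

function mala(θ0, n; ε=0.5)
    θ, g = copy(θ0), gradlog(θ0)
    Θ = zeros(n, length(θ0))
    G = zeros(n, length(θ0))
    for k in 1:n
        θp = θ .+ (ε^2 / 2) .* g .+ ε .* randn(length(θ))
        gp = gradlog(θp)
        # log acceptance ratio: target terms plus asymmetric-proposal terms
        num = logtarget(θp) - sum(abs2, θ .- θp .- (ε^2 / 2) .* gp) / (2 * ε^2)
        den = logtarget(θ)  - sum(abs2, θp .- θ .- (ε^2 / 2) .* g)  / (2 * ε^2)
        if log(rand()) < num - den
            θ, g = θp, gp
        end
        Θ[k, :] .= θ
        G[k, :] .= g                  # the gradient comes for free
    end
    return Θ, G
end

Random.seed!(7)
Θ, G = mala([2.0], 10_000)

# First-degree zero-variance estimate of E[θ]: z = (1/2)∇log π has zero
# expectation, so f - a z stays unbiased while its variance shrinks.
f, z = Θ[:, 1], G[:, 1] ./ 2
a = cov(f, z) / var(z)
mean(f), mean(f .- a .* z)            # plain vs variance-reduced estimate
```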