Workshop

Computational Strategies for Large-Scale Statistical Data Analysis

02 - 06 Jul 2018

ICMS, 15 South College Street, Edinburgh

About

Large-scale data is increasingly encountered in biology, medicine, engineering, the social sciences and economics as measurement technology advances. A distinctive feature of such data is that it usually comes with a large sample size and/or a large number of features, creating challenges for data storage, processing and analysis. Classical statistical methodology, theory and computation, on the other hand, were developed on the assumption that the entire dataset resides in a central location. As a result, most classical statistical methods face computational challenges when applied to large-scale data. Specifically, big data is known to possess the so-called 4D features: Distributed, Dirty, Dimensionality and Dynamic. These features make it very challenging to apply traditional statistical thinking to massive data.

The main aims of this workshop were to share developments in distributed data analysis and aggregated inference, with attention to the computational complexity and statistical properties of the relevant estimators; to discuss open challenges, exchange research ideas and forge collaborations across three research areas (statistics, machine learning and optimisation); to promote the development of software with justified statistical properties and efficient computational properties; and to encourage more young UK researchers to work at the interface of computing and statistics.

Programme

Meet the speakers

David Dunson

Duke University

Scaling up Bayesian Inference

Ata Kaban

University of Birmingham

Structure Aware Generalisation Error Bounds Using Random Projections

Eric Xing

Carnegie Mellon University

On System and Algorithm Co-Design and Automatic Machine Learning

Yoonkyung Lee

The Ohio State University

Dimensionality Reduction for Exponential Family Data

Jinchi Lv

University of Southern California

Asymptotics of Eigenvectors and Eigenvalues for Large Structured Random Matrices

Faming Liang

Purdue University

Markov Neighbourhood Regression for High-Dimensional Inference

Ping Ma

University of Georgia

Asympirical Analysis: a New Paradigm for Data Science

Mladen Kolar

University of Chicago

Recovery of Simultaneous Low Rank and Two-Way Sparse Coefficient Matrices, a Nonconvex Approach

Guang Cheng

Purdue University

Large-Scale Nearest Neighbour Classification with Statistical Guarantee

Yining Chen

LSE

Narrowest-Over-Threshold Detection of Multiple Change-points and Change-point-like Features

Haeran Cho

University of Bristol

Multiscale MOSUM Procedure with Localised Pruning

Jason Lee

University of Southern California

Geometry of Optimization Landscapes and Implicit Regularization of Optimization Algorithms

Chen Zhang

University College London

Variational Gaussian Approximation for Poisson Data

Jeremias Knoblauch

University of Warwick

Bayesian Online Changepoint Detection and Model Selection in High-Dimensional Data

Stanislav Volgushev

University of Toronto

Distributed Inference for Quantile Regression Processes

Hua Zhou

University of California

Global Solutions of Generalized Canonical Correlation Analysis Problems

Matteo Fasiolo

University of Bristol

Calibrated Additive Quantile Regression

Wenxuan Zhong

University of Georgia

Leverage Sampling to Overcome the Computational Challenges for Big Spatial Data

Moulinath Banerjee

University of Michigan

Divide and Conquer in Nonstandard Problems: the Super-Efficiency Phenomenon

Qifan Song

Purdue University

Bayesian Shrinkage Towards Sharp Minimaxity

Binyan Jiang

The Hong Kong Polytechnic University

Penalized Interaction Estimation for Ultra High Dimensional Quadratic Regression

Chao Zheng

Lancaster University

Revisiting Huber’s M-Estimation: a Tuning-Free Approach

Xin Bing

Cornell University

A Fast Algorithm with Minimax Optimal Guarantees for Topic Models with an Unknown Number of Topics

Cheng Qian

LSE

Covariance and Graphical Modelling for High-Dimensional Longitudinal and Functional Data

Didong Li

Duke University

Efficient Manifold and Subspace Approximations with Spherelets

Xiaoming Huo

Georgia Institute of Technology

Non-Convex Optimization and Statistical Properties