Large-scale data is increasingly encountered in biology, medicine, engineering, the social sciences and economics as measurement technology advances. A distinctive feature of such data is that it usually comes with a large sample size, a large number of features, or both, creating challenges for data storage, processing and analysis. Classical statistical methodology, theory and computation, on the other hand, were developed on the assumption that the entire data set resides in a central location. As a result, most classical statistical methods face computational challenges when applied to large-scale data in the big data era. Specifically, big data is known to possess the so-called 4D features: Distributed, Dirty, Dimensionality and Dynamic. These features make it very challenging to apply traditional statistical thinking to massive data.
The main aims of this workshop were to share developments in distributed data analysis and aggregated inference, with attention to the computational complexity and statistical properties of the relevant estimators; to discuss open challenges, exchange research ideas and forge collaborations across three research areas: statistics, machine learning and optimisation; to promote the development of software that is both statistically justified and computationally efficient; and to encourage more young UK researchers to work at the interface of computing and statistics.