Grants and Contributions:

Title:
Methods for Statistical Learning
Agreement Number:
RGPIN
Agreement Value:
$146,250.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Nova Scotia, CA
Reference Number:
GC-2017-Q1-02519
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Chipman, Hugh (Acadia University)
Program:
Discovery Grants Program - Individual
Program Purpose:

The proposed research lies at the intersection of modern statistical learning and "traditional" statistical ideas such as design and analysis of experiments and uncertainty quantification. In statistical learning, data are used to train a supervised learner, which is a flexible statistical model that predicts a response variable using the values of input variables. Statistical techniques, such as Bayesian modelling, make it possible to quantify uncertainty about the supervised learner. Design and analysis of experiments provide a way to collect relevant data for training the model.

This research program concerns the invention and application of novel supervised learning models. It will generalize the Bayesian Additive Regression Trees (BART) model, making it applicable to a wider variety of data types and incorporating new structure for the case where the response variable is numeric. Generalized data types will include classification with more than two classes, via multinomial regression. Additional structure will include monotonicity, combination with linear and linear mixed-effects models, and joint modelling of location and dispersion, with either a normal error model or a more flexible error model. As with the original BART model, a framework for statistical uncertainty will be implemented in a way that scales to large data problems. A particular focus of the BART work will be the sequential design and analysis of computer experiments. A sequential design algorithm can exploit the flexibility of BART and use its ability to quantify uncertainty when evaluating a sequential design criterion.
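The idea of uncertainty-driven sequential design can be illustrated with a minimal sketch. It is not BART: as a simple stand-in, it uses a bootstrap ensemble of one-split regression trees, and treats disagreement across the ensemble as a rough proxy for predictive uncertainty. The function `simulator`, the candidate grid, and all parameter values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulator(x):
    """Hypothetical computer model to be emulated (illustrative stand-in)."""
    return (x - 0.3) ** 2 if x < 0.6 else 0.09 + 2.0 * (x - 0.6)


def fit_stump(xs, ys):
    """Fit a one-split regression tree by least squares; return a predictor."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= cut]
        right = [y for x, y in zip(xs, ys) if x > cut]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, cut, ml, mr)
    if best is None:  # every input identical: fall back to the mean
        mean_y = statistics.fmean(ys)
        return lambda x: mean_y
    _, cut, ml, mr = best
    return lambda x: ml if x <= cut else mr


def bootstrap_ensemble(xs, ys, n_models, rng):
    """Fit stumps to bootstrap resamples of the current design."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def next_design_point(models, candidates):
    """Design criterion: pick the candidate with the largest ensemble variance."""
    return max(candidates, key=lambda x: statistics.pvariance([m(x) for m in models]))


def sequential_design(n_initial=4, n_added=6, seed=0):
    rng = random.Random(seed)
    xs = [i / (n_initial - 1) for i in range(n_initial)]  # initial space-filling grid
    ys = [simulator(x) for x in xs]
    candidates = [i / 50 for i in range(51)]
    for _ in range(n_added):
        models = bootstrap_ensemble(xs, ys, 50, rng)
        pool = [c for c in candidates if c not in xs]  # never repeat a run
        x_new = next_design_point(models, pool)
        xs.append(x_new)
        ys.append(simulator(x_new))
    return xs, ys
```

Calling `sequential_design()` returns the initial runs followed by the sequentially chosen ones. A model with a full posterior, as proposed for BART, would replace the bootstrap variance with a criterion computed from posterior draws.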

Another direction for this research program will be the application of the full suite of tools and ideas from computer experiments and classical design of experiments to the problem of evaluating the performance of statistical models through simulation. Nearly all research that presents a new statistical model relies on simulation experiments to study performance in realistic scenarios that are not amenable to theoretical study. Yet most such experiments fail to employ either statistical design or analysis methods. This research will bring the full suite of experimental design methods, including lesser-known ones such as Taguchi's robust parameter design, to simulation experiments. It will develop a systematic approach that researchers can use to understand performance over a range of sources of variation.
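As a rough illustration of treating a simulation study as a designed experiment, the sketch below runs a replicated two-level full factorial design and estimates main effects. The setup is hypothetical: it compares the sample mean with the sample median as estimators of a location parameter, and the factor names, levels, and contamination scheme are assumptions made for this example only.

```python
import itertools
import math
import random
import statistics

# Hypothetical two-level factors describing the simulation scenario.
FACTORS = {
    "n": (20, 80),
    "noise_sd": (0.5, 2.0),
    "outlier_frac": (0.0, 0.1),
}


def run_once(n, noise_sd, outlier_frac, rng):
    """Simulate one data set with true location 0; return squared errors."""
    data = []
    for _ in range(n):
        # An outlier is drawn with ten times the nominal standard deviation.
        sd = 10 * noise_sd if rng.random() < outlier_frac else noise_sd
        data.append(rng.gauss(0.0, sd))
    return statistics.fmean(data) ** 2, statistics.median(data) ** 2


def run_design(replicates=200, seed=1):
    """Run every factor combination; response is log(MSE mean / MSE median)."""
    rng = random.Random(seed)
    results = []
    for levels in itertools.product(*FACTORS.values()):
        setting = dict(zip(FACTORS, levels))
        runs = [run_once(rng=rng, **setting) for _ in range(replicates)]
        mse_mean = statistics.fmean(r[0] for r in runs)
        mse_median = statistics.fmean(r[1] for r in runs)
        results.append((setting, math.log(mse_mean / mse_median)))
    return results


def main_effects(results):
    """Main effect of a factor: mean response at high level minus at low level."""
    effects = {}
    for name, (low, high) in FACTORS.items():
        at_high = statistics.fmean(r for s, r in results if s[name] == high)
        at_low = statistics.fmean(r for s, r in results if s[name] == low)
        effects[name] = at_high - at_low
    return effects
```

Running `main_effects(run_design())` gives one estimated effect per factor. A positive effect means that moving the factor to its high level makes the mean perform worse relative to the median. A robust parameter design in the Taguchi style would go further by separating control factors from noise factors, which this sketch does not attempt.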

Many of the methods developed will be applicable to big data problems, and will often be inspired by real applications. These aspects will provide excellent training opportunities for students, developing them as "data scientists" by the time they complete their studies.