Grants and Contributions:

Title:
Computational and Inferential Tools for Machine Learning Methods in Biostatistical Research
Agreement Number:
RGPIN
Agreement Value:
$70,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Ontario, CA
Reference Number:
GC-2017-Q1-03361
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Kustra, Rafal (University of Toronto)
Program:
Discovery Grants Program - Individual
Program Purpose:

Modern machine learning methods, such as boosting, support vector machines, or neural networks, have had a great impact on statistical research and applications, mostly in terms of improved predictive and prognostic accuracy. Their enhanced ability to model complex interactions and non-linear effects could also be used to explain the underlying physical or physiological phenomena and to generate specific scientific hypotheses for further study. In applications that are not strictly predictive, however, the use of many modern methods is hampered by their black-box nature and by the lack of inferential tools that would allow one to obtain statistical confidence measures for inferred relationships. The simplest form of statistical inference, universal in classical models, concerns statements about individual covariates. For example, is the covariate "Gender" an important factor in a model of disease progression? In classical models, this question is answered by computing inferential quantities (p-values, confidence intervals) for the parameter (or small set of parameters) associated with "Gender" in the model. In contrast, machine learning methods take a non-parametric approach in which a covariate's influence on the outcome is not governed by a small set of parameters. Hence, the classical approach does not apply, and the importance of any particular covariate in the model of the outcome cannot easily be tested. While many model-specific or approximate measures have been proposed, notably the Variable Importance Metric in Random Forest models, there is no universal, statistically coherent approach in the literature. We propose to develop, validate, apply, and disseminate (in the form of freely available software packages) a set of tools for classical inference that will allow researchers to test the importance and influence of covariates of interest in non-parametric machine learning models of the outcome.