Grants and Contributions:

Title:
Topics in Statistical Modelling and Inference with High-Dimensional, Complex Data
Agreement Number:
RGPIN
Agreement Value:
$215,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Ontario, CA
Reference Number:
GC-2017-Q1-02826
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Wu, Yuehua (York University)
Program:
Discovery Grants Program - Individual
Program Purpose:

Technology is changing our lives. Data are now collected in large volumes at various granularities. Such data often exhibit complex structures. Statistical models able to capture this complexity can further our understanding of the underlying data-generating mechanism and advance relevant fields in science and engineering. However, modelling such data can be challenging, particularly when there are endogenous measurements, outliers, missing observations, or other anomalies.

The objective of this proposed research program is to tackle statistical problems encountered in modelling high-dimensional, complex data, with particular interest in spatio-temporal, financial, business intelligence, and genomic data. It will focus on the development of robust, computationally feasible, and fast model selection procedures for noisy high- or ultra-high-dimensional, complex data in the scenario where the set of candidate models may not contain the true one. It will develop a two-stage, regularization-based method for multiple change-point detection in general model formulations, e.g., generalized linear models and functional data models, which will simultaneously estimate all the change-points and perform variable selection. It will deal with multiple change-point detection in high-dimensional regression models, which include high-dimensional linear dynamic panel data models and spatio-temporal regression models. It will further the development of regression clustering using regularization methods and stochastic search. It will propose non-stationary spatio-temporal modelling that leverages geographical neighbourhood information. It will tackle spatio-temporal inverse modelling problems by effectively grouping grid data using penalized approaches. It will develop feasible association rule mining for massive business intelligence and genomic data. It will address statistical modelling of financial data, e.g., long-term implied volatility and trading data. For our proposed methods, we will carry out both theoretical and methodological investigations; in addition, we will develop computational algorithms to ensure that our methods are computationally feasible and efficient, and present results from numerical studies that validate our findings. The advancements achieved under the proposed research will have a significant impact on statistical modelling, inference, computing, and their applications in practice.