Grants and Contributions:

Title:
Causal Inference in High-Dimensional Settings
Agreement Number:
RGPIN
Agreement Value:
$105,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Ontario, CA
Reference Number:
GC-2017-Q1-01712
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Zhu, Yeying (University of Waterloo)
Program:
Discovery Grants Program - Individual
Program Purpose:

My research program is directed at the development of statistical methodologies and attendant theory in the area of causal inference through the creation of innovative methods to estimate causal treatment effects in the high-dimensional setting. Algorithms will be developed using sufficient dimension reduction, reproducing kernel Hilbert space methodology and machine learning techniques.

It is well known that, in causal inference, the convergence rate of a matching procedure depends on the dimension of the covariates being matched on. When the number of covariates is large, matching can be highly inefficient. In the first research stream, I propose employing sufficient dimension reduction to obtain the central mean subspaces for the potential outcomes. The reduced covariates are estimable non-parametrically under mild assumptions and require a weaker common support condition than either the original covariates or the propensity scores. Therefore, the proposed matching procedure is more widely applicable than existing ones in the high-dimensional setting.
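As a rough illustration of matching on reduced covariates, the minimal Python sketch below assumes the reduction directions have already been estimated by some sufficient dimension reduction method (that estimation step is not shown). The function names, the use of a single set of directions for both treatment arms, and the choice of one-nearest-neighbour matching with replacement are illustrative assumptions, not details taken from the proposal.

```python
def project(x, directions):
    """Map one covariate vector onto each direction (one score per direction)."""
    return [sum(b * v for b, v in zip(d, x)) for d in directions]


def squared_distance(a, b):
    """Squared Euclidean distance between two reduced covariate vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def att_by_matching(X, treated, y, directions):
    """Estimate the average treatment effect on the treated (ATT).

    Each treated unit is matched, with replacement, to the control unit
    that is closest in the reduced covariate space.
    `directions` is assumed to be given (hypothetical SDR output).
    """
    Z = [project(x, directions) for x in X]
    treated_idx = [i for i, t in enumerate(treated) if t == 1]
    control_idx = [i for i, t in enumerate(treated) if t == 0]
    if not treated_idx or not control_idx:
        raise ValueError("need at least one treated and one control unit")

    differences = []
    for i in treated_idx:
        # Nearest control in the low-dimensional space, not the original one.
        j = min(control_idx, key=lambda c: squared_distance(Z[i], Z[c]))
        differences.append(y[i] - y[j])
    return sum(differences) / len(differences)


# Toy data: three covariates, reduced to a single (hypothetical) direction.
X = [[0.0, 5.0, 1.0], [1.0, -2.0, 3.0], [0.1, 9.0, 9.0], [0.9, 0.0, 0.0]]
treated = [1, 1, 0, 0]
y = [3.0, 5.0, 1.0, 2.0]
directions = [[1.0, 0.0, 0.0]]
att = att_by_matching(X, treated, y, directions)
```

Because distances are computed only on the projected scores, the number of dimensions entering the matching step is the number of directions rather than the number of original covariates.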

In the second research stream, I plan to develop a new propensity score estimation method based on reproducing kernel Hilbert spaces for estimating causal effects via covariate balancing. I will first define an objective function with constraints such that balance is achieved for each covariate after weighting. Then, the generalized method of moments or empirical likelihood will be employed to estimate the parameters. By using the kernel distance, the proposed method achieves balance in the whole distribution of the covariates, not only in a finite number of moments. In addition, since the convergence rate of the kernel distance does not depend on the dimension of the covariates, the proposed method avoids the curse of dimensionality.
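To make the notion of a kernel distance between weighted samples concrete, the sketch below computes the squared kernel distance (often called the maximum mean discrepancy) between a weighted treated sample and a weighted control sample. The Gaussian kernel, the bandwidth, and the function names are assumptions made for illustration; the proposal does not specify them, and the estimation of the weights themselves is not shown.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two covariate vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def weighted_kernel_distance_sq(X1, w1, X0, w0, bandwidth=1.0):
    """Squared kernel distance between two weighted samples.

    Weights are normalized to sum to one within each sample, so the
    result compares two weighted empirical distributions.
    """
    p = [w / sum(w1) for w in w1]
    q = [w / sum(w0) for w in w0]

    def cross(A, a, B, b):
        # Weighted average of kernel evaluations over all pairs.
        return sum(
            ai * bj * gaussian_kernel(x, y, bandwidth)
            for x, ai in zip(A, a)
            for y, bj in zip(B, b)
        )

    return cross(X1, p, X1, p) - 2.0 * cross(X1, p, X0, q) + cross(X0, q, X0, q)


# Toy example: the control sample over-represents the covariate value 0.
X_treated = [[0.0], [1.0]]
X_control = [[0.0], [0.0], [1.0]]
unweighted = weighted_kernel_distance_sq(X_treated, [1, 1], X_control, [1, 1, 1])
reweighted = weighted_kernel_distance_sq(X_treated, [1, 1], X_control, [1, 1, 2])
```

In this toy example, doubling the weight of the single control unit at 1 makes the weighted control distribution coincide with the treated one, so `reweighted` is zero up to floating-point error while `unweighted` is strictly positive.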

An alternative to the kernel-based method proposed in the second research stream is to use model-averaging approaches to estimate propensity scores when the number of covariates is large. Machine learning methods, which are nonparametric algorithms and are less susceptible to model misspecification, will be combined to estimate the propensity scores. The proposed method is expected to lead to efficient estimators under regularity conditions.
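One simple way to picture the model-averaging idea is sketched below: propensity scores from several candidate models are combined by a weighted average and then used in a normalized inverse-probability-weighted estimate of the average treatment effect. The fixed model weights, the function names, and the choice of estimator are assumptions for illustration only; how the candidate models are fitted and how their weights are chosen is not specified in the proposal.

```python
def average_propensity(score_lists, model_weights):
    """Combine per-model propensity scores by a weighted average.

    `score_lists[m][i]` is model m's estimated propensity for unit i.
    `model_weights` are hypothetical, fixed weights (normalized here).
    """
    total = sum(model_weights)
    n = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(model_weights, score_lists)) / total
        for i in range(n)
    ]


def ipw_ate(y, treated, propensity):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    if any(e <= 0.0 or e >= 1.0 for e in propensity):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    num1 = den1 = num0 = den0 = 0.0
    for yi, ti, ei in zip(y, treated, propensity):
        if ti == 1:
            num1 += yi / ei
            den1 += 1.0 / ei
        else:
            num0 += yi / (1.0 - ei)
            den0 += 1.0 / (1.0 - ei)
    return num1 / den1 - num0 / den0


# Toy example with two candidate models and equal model weights.
scores_model_a = [0.4, 0.4, 0.4, 0.4]
scores_model_b = [0.6, 0.6, 0.6, 0.6]
e_hat = average_propensity([scores_model_a, scores_model_b], [0.5, 0.5])
ate = ipw_ate([3.0, 5.0, 1.0, 2.0], [1, 1, 0, 0], e_hat)
```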

The primary subfields of statistics addressed in the research include causal inference, dimension reduction, and computational machine learning methodologies, along with their interfaces. The areas of application of the proposed research are vast and include biomedical studies, public health, and the social sciences. For example, the proposed research can be applied to draw causal inferences from “big data”, such as large databases of health records in Canada and longitudinal datasets on smoking. The development of related R packages will enable us to disseminate our research and benefit applied researchers in the above-mentioned areas.