Grants and Contributions:
Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)
Our proposed research concentrates on developing tools of statistical inference, particularly for the selection of graphical models. Graphical models form an essential part of the statistician's toolbox. They find applications in various areas: for example, in medicine to model the dependence between certain gene mutations and the presence of a disease, or in finance to model the dependence between various stocks.
Graphical models are multivariate statistical models that represent the dependence relationships between variables by means of a graph. The vertices of the graph represent the variables, while the edges between the vertices encode the dependence or independence between these variables. The variables considered can be continuous (Gaussian) or discrete (multinomial), and the graphs used to represent the dependence relationships can be directed or undirected. In our research, we will consider four types of graphical models: Gaussian coloured undirected graphs, discrete directed acyclic graphs (abbreviated DAGs), discrete undirected graphs and discrete heterogeneous graphs. Each type of graph is best adapted to represent certain data sets. Our research is aimed at selecting the model that best represents a given data set, for the purpose of explanation and/or prediction.
Coloured Gaussian undirected models. These are classical graphical models with added equality restrictions on the relationship between given groups of variables. The additional constraints reduce the number of free parameters in the model. While this is an advantage because there are fewer parameters to estimate, it makes the classical methods for Gaussian graphical models inapplicable. Our main aim is to give a new method of Bayesian model selection based on a birth-and-death process.
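As an illustration of the kind of constraint involved, the following sketch writes out one common formulation, in which the equality restrictions are placed on the entries of the concentration (inverse covariance) matrix. The notation (K, Sigma, V, E) and the choice of this particular formulation are assumptions made for illustration; the summary itself does not fix them.

```latex
% Gaussian graphical model on an undirected graph G = (V, E):
% a missing edge corresponds to a zero in the concentration matrix K.
X \sim N_p(0, \Sigma), \qquad K = \Sigma^{-1}, \qquad
(i,j) \notin E \;\Longrightarrow\; K_{ij} = 0 \quad (i \neq j).

% Colouring (assumed formulation): vertices in the same colour class
% share a diagonal entry, edges in the same colour class share an
% off-diagonal entry.
K_{ii} = K_{jj} \ \text{if vertices } i, j \text{ have the same colour}, \qquad
K_{ij} = K_{kl} \ \text{if edges } (i,j), (k,l) \text{ have the same colour}.
```

Under such constraints the number of free parameters drops from one per vertex and one per edge to one per colour class.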
Discrete DAG models. Model selection in the space of such models is a notoriously hard task. It is hard because many graphs can represent the same dependence relationships between variables (we say that such models are Markov equivalent). The set of Markov equivalent DAGs can be represented by a single graph called an essential graph. Using a recent result of ours, we want to reduce the search to a search in the space of essential graphs.
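To make Markov equivalence concrete, here is a minimal Python sketch based on the classical characterization that two DAGs are Markov equivalent exactly when they have the same skeleton and the same v-structures (colliders a -> c <- b with a and b non-adjacent). The dictionary representation and all function names are hypothetical choices for illustration; this is not the search method proposed in this research.

```python
from itertools import combinations

# A DAG is represented (by assumption) as {vertex: set of its parents}.

def skeleton(dag):
    """Return the undirected edge set obtained by dropping orientations."""
    return {frozenset((p, c)) for c, parents in dag.items() for p in parents}

def v_structures(dag):
    """Return colliders (a, c, b): a -> c <- b with a and b not adjacent."""
    skel = skeleton(dag)
    found = set()
    for c, parents in dag.items():
        for a, b in combinations(sorted(parents), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(dag1, dag2):
    """Same vertices, same skeleton and same v-structures."""
    return (
        set(dag1) == set(dag2)
        and skeleton(dag1) == skeleton(dag2)
        and v_structures(dag1) == v_structures(dag2)
    )

chain = {"A": set(), "B": {"A"}, "C": {"B"}}          # A -> B -> C
reverse_chain = {"A": {"B"}, "B": {"C"}, "C": set()}  # A <- B <- C
fork = {"A": {"B"}, "B": set(), "C": {"B"}}           # A <- B -> C
collider = {"A": set(), "B": {"A", "C"}, "C": set()}  # A -> B <- C

print(markov_equivalent(chain, reverse_chain))  # True
print(markov_equivalent(chain, fork))           # True
print(markov_equivalent(chain, collider))       # False
```

The chain, the reversed chain and the fork all encode the single statement that A and C are independent given B, so they fall in one equivalence class; the collider encodes a different statement and stands alone.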
MTP2 discrete log-linear models. These are log-linear models with positive associations between the variables. We propose to do model selection through maximum likelihood estimation of the parameters of the model.
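For reference, the positivity condition can be stated as follows. This is the standard definition of multivariate total positivity of order 2 (MTP2) for a probability mass function p; the symbols x, y, and the componentwise minimum and maximum are notation introduced here, not taken from the summary.

```latex
% p is MTP2 if, for all pairs of cells x and y of the contingency table,
p(x)\, p(y) \;\le\; p(x \wedge y)\, p(x \vee y),
\qquad
(x \wedge y)_i = \min(x_i, y_i), \quad (x \vee y)_i = \max(x_i, y_i).
```

For two binary variables, the condition reduces to p(0,1) p(1,0) <= p(0,0) p(1,1), that is, an odds ratio of at least one.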
Discrete heterogeneous models. These are used to model the dependence relationships between given variables for k different subpopulations. We want to identify the similarities and differences between the k graphs underlying the subpopulations. High-dimensional discrete heterogeneous models have not been studied from a Bayesian perspective. Our approach, which differs radically from existing ones, uses the PARAFAC factorization of the k tables of probabilities.