Grants and Contributions:
Grant or Award spanning more than one fiscal year (2017-2018 to 2022-2023).
In familial genetic studies, the sharing of rare genetic variants (RVs) by relatives who have a phenotype of interest is key information for inferring the involvement of these RVs in the phenotype. I have spearheaded the development of RV sharing probabilities among distantly related subjects as a basis for linking RVs to a phenotype. Applying this approach to whole genome sequencing studies presents several challenges: 1) there is currently no validated approach for grouping intergenic variants expected to have a similar impact on gene regulation; 2) RVs are so abundant that multiple RVs occur on the same haplotype within small genomic regions, so these haplotype segments must be precisely delineated; and 3) RV sharing probabilities may be underestimated because of unknown relationships among apparently unrelated family members. I propose to address these challenges in my research program through the following specific objectives:
Define clusters of rare variants based on 3D chromosomal contacts and develop family-based statistics to test the link between these clusters and dichotomous phenotypes in whole genome sequence data.
To this end, clusters of RVs will be derived from 3D contact matrices produced by high-throughput chromosome conformation capture (Hi-C) experiments, since regions in close contact are involved in the same gene regulation processes. RV sharing statistics will be developed both over clusters within domains of 3D contacts and across distinct domains.
Accurately infer recombination events between distant relatives to improve rare variant haplotype inference.
This aim will be achieved by combining information from genetic transmission within families with population haplotype frequencies.
Model unknown relationships among family members more accurately.
Estimates of distant relatedness developed by collaborators will be integrated into the RV sharing approach.
Develop a software tool linking rare variant statistics and 3D contact databases.
The proposed methods will be implemented in a package for the Bioconductor project by extending the existing R package RVsharing.
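To make the quantity at the core of this program concrete, the sketch below computes an RV sharing probability by brute force: the probability that all affected relatives carry a rare variant, given that at least one of them carries it. It is a minimal illustration, not the RVsharing implementation, and it assumes that exactly one founder introduced a single copy of the variant (each founder equally likely), that the pedigree has no inbreeding loops, and that carrier status follows Mendelian transmission only. The pedigree encoding and all function names are hypothetical. The enumeration is exponential in the number of non-founders, so it only suits small pedigrees.

```python
from fractions import Fraction
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical encoding: person -> (father, mother), or None for a founder.
Pedigree = Dict[str, Optional[Tuple[str, str]]]


def _topological_order(pedigree: Pedigree) -> List[str]:
    """Order non-founders so that parents always come before their children."""
    placed = {p for p, parents in pedigree.items() if parents is None}
    remaining = [p for p in pedigree if p not in placed]
    order: List[str] = []
    while remaining:
        ready = [p for p in remaining if all(q in placed for q in pedigree[p])]
        if not ready:
            raise ValueError("pedigree contains a cycle or a missing parent")
        for person in ready:
            order.append(person)
            placed.add(person)
            remaining.remove(person)
    return order


def _outcomes(pedigree: Pedigree, order: List[str],
              founder: str) -> Iterator[Tuple[Dict[str, bool], Fraction]]:
    """Enumerate carrier configurations when only `founder` carries one RV copy."""
    status = {p: p == founder for p, parents in pedigree.items() if parents is None}

    def recurse(i: int, prob: Fraction):
        if i == len(order):
            yield dict(status), prob
            return
        person = order[i]
        father, mother = pedigree[person]
        # Each carrier parent transmits the variant independently with probability 1/2.
        n_carrier_parents = int(status[father]) + int(status[mother])
        p_carry = 1 - Fraction(1, 2) ** n_carrier_parents
        for carries, p in ((True, p_carry), (False, 1 - p_carry)):
            if p > 0:
                status[person] = carries
                yield from recurse(i + 1, prob * p)
        del status[person]

    yield from recurse(0, Fraction(1))


def rv_sharing_probability(pedigree: Pedigree, affected: List[str]) -> Fraction:
    """P(all affected carry the RV | at least one affected carries it)."""
    founders = [p for p, parents in pedigree.items() if parents is None]
    order = _topological_order(pedigree)
    all_carry = any_carry = Fraction(0)
    # Each founder is equally likely to have introduced the variant; the common
    # factor 1/len(founders) cancels in the ratio, so it is omitted.
    for founder in founders:
        for status, prob in _outcomes(pedigree, order, founder):
            carriers = [status[a] for a in affected]
            if all(carriers):
                all_carry += prob
            if any(carriers):
                any_carry += prob
    return all_carry / any_carry


siblings = {"F": None, "M": None, "S1": ("F", "M"), "S2": ("F", "M")}
print(rv_sharing_probability(siblings, ["S1", "S2"]))  # 1/3

cousins = {
    "G1": None, "G2": None, "S1": None, "S2": None,
    "P1": ("G1", "G2"), "P2": ("G1", "G2"),
    "C1": ("P1", "S1"), "C2": ("P2", "S2"),
}
print(rv_sharing_probability(cousins, ["C1", "C2"]))  # 1/15
```

Under these assumptions, a pair of affected siblings shares the variant with probability 1/3 and a pair of affected first cousins with probability 1/15. Because chance sharing becomes rarer as relationships become more distant, observed sharing among distant relatives carries more information about the variant's involvement in the phenotype.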
Real DNA sequence, phenotype, and family structure data will be used to train models, test methods and software, and calibrate the simulations used to evaluate statistical properties. This research program will provide new tools for researchers conducting familial genetic studies and will train graduate students in biostatistics and bioinformatics in an environment that involves such researchers.
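As a toy illustration of the first objective, the following sketch groups rare variants using 3D contacts. It assumes that a contact matrix is available as a dictionary of bin-pair contact frequencies at a fixed bin size, and it forms clusters by single linkage: bins joined by any contact at or above a threshold end up in the same cluster. The thresholding rule, the 5 kb bin size, the function names, and the contact values are all hypothetical choices made for illustration. The proposal does not specify how clusters will be defined, and a real analysis would start from normalized contact matrices and would distinguish contacts within and across domains.

```python
from typing import Dict, List, Set, Tuple


def contact_clusters(contacts: Dict[Tuple[int, int], float],
                     threshold: float) -> List[Set[int]]:
    """Single-linkage clusters of genomic bins whose contact frequency >= threshold."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for (i, j), freq in contacts.items():
        root_i, root_j = find(i), find(j)  # registers both bins, even below threshold
        if freq >= threshold:
            parent[root_i] = root_j        # merge the two clusters
    clusters: Dict[int, Set[int]] = {}
    for b in list(parent):
        clusters.setdefault(find(b), set()).add(b)
    return list(clusters.values())


def group_variants(positions: Dict[str, int], clusters: List[Set[int]],
                   bin_size: int) -> Dict[int, List[str]]:
    """Assign each variant (id -> base-pair position) to the cluster containing its bin."""
    bin_to_cluster = {b: k for k, cluster in enumerate(clusters) for b in cluster}
    grouped: Dict[int, List[str]] = {}
    for variant, pos in positions.items():
        k = bin_to_cluster.get(pos // bin_size)
        if k is not None:  # variants in bins with no recorded contact are left out
            grouped.setdefault(k, []).append(variant)
    return grouped


# Toy input, not real data: 5 kb bins and made-up contact frequencies.
contacts = {(0, 1): 12.0, (1, 5): 8.0, (2, 3): 1.0, (3, 4): 15.0}
clusters = contact_clusters(contacts, threshold=5.0)
variants = {"rv_a": 7_500, "rv_b": 27_000, "rv_c": 16_000}
print(group_variants(variants, clusters, bin_size=5_000))
# {0: ['rv_a', 'rv_b'], 2: ['rv_c']}
```

Here rv_a and rv_b fall in bins 1 and 5, which are linked through bin 1's strong contacts, so they form one cluster even though they are about 20 kb apart on the linear sequence. Single linkage is used only because it is the simplest rule that turns pairwise contacts into groups; it tends to chain distant bins together, so a stricter clustering rule would likely be preferable in practice.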