Grants and Contributions:

Title:
Topology and Transformers in tandem: developing methodology to combine these cutting-edge tools
Agreement Number:
1012497
Agreement Value:
$183,333.00
Agreement Date:
Dec 1, 2023 - Mar 31, 2026
Description:
The Core AI4Design proposal seeks to integrate two powerful methods for data analysis, namely Topological Data Analysis (TDA) and machine learning (ML). These two approaches have so far been relatively independent of each other in practice. This proposal aims to develop new techniques that combine the strengths of both approaches to create next-generation tools for analyzing complex data. More precisely, the many types of sequence data where transformers are used (sequences of words in natural language texts, sequences of programming code, or sequences of video frames and audio) are all highly complex: the data themselves are high-dimensional (e.g. video), they involve very high-order structure (e.g. the way in which pixels in video allow us to see a cat moving cannot be summarized in a handful of equations), and the corpora of data needed for training are huge. TDA, on the other hand, is very effective at summarizing high-order structure and actual geometry in data. We propose to develop a systematic approach for combining TDA and transformers for more efficient learning and analysis of complex data. The proposal comprises two main streams. The first stream aims to develop a hierarchical version of TDA structures that uses successive layers, with each layer pooling the points of the layer below, akin to the operation of convolutional neural networks. The second stream targets the ML paradigm of transformer networks and proposes to use TDA persistence images as the data representation served to transformers. The developed tools will be tested on datasets where the current state of the art uses transformers and where "shape" or geometry is believed to play a role in the underlying learning problem but has not yet been leveraged, because such information is absent from the typical input for transformers.
An example is the discovery of new proteins predicted to have desirable luminosity or therapeutic properties. The state of the art in this field uses transformers to generate sequences of amino acids likely to have the desired properties (these are then synthesized and tested). It is known that the geometry of a fluorescent protein contributes directly to luminosity, and folded shape information is available for many of the proteins used to train the transformer, but this geometric information is not fed to the transformer. We propose to summarize it with TDA persistence images and use these to train slightly modified transformers. Our aim is to demonstrate improved generative ability in the TDA-transformer compared to a basic transformer alone.
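As an illustration of the second stream only, the following is a minimal sketch of how a persistence diagram could be rasterized into a persistence image and cut into patch tokens of the kind a transformer consumes. It is not the project's method: the function names (persistence_image, image_to_tokens), the grid resolution, the Gaussian width sigma, the linear persistence weighting, and the patch size are all assumptions chosen for illustration, and the Gaussian is evaluated at grid points rather than integrated over each pixel.

```python
import numpy as np

def persistence_image(diagram, resolution=16, sigma=0.1,
                      birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Rasterize (birth, death) pairs into a resolution x resolution array.

    Hypothetical helper: each point is moved to (birth, persistence)
    coordinates, weighted by its persistence, and smoothed with an
    isotropic Gaussian evaluated on a regular grid.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]  # persistence = death - birth

    # Grid points spanning the chosen birth and persistence ranges.
    xs = np.linspace(birth_range[0], birth_range[1], resolution)
    ys = np.linspace(pers_range[0], pers_range[1], resolution)
    gx, gy = np.meshgrid(xs, ys)

    image = np.zeros((resolution, resolution))
    for b, p in zip(births, pers):
        # Linear weighting (an assumption): long-lived features count more.
        image += p * np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2 * sigma ** 2))
    return image

def image_to_tokens(image, patch=4):
    """Split a square image into non-overlapping patches, one row per patch."""
    n = image.shape[0] // patch
    blocks = image.reshape(n, patch, n, patch).transpose(0, 2, 1, 3)
    return blocks.reshape(n * n, patch * patch)

# Toy diagram with two features; values are made up for the example.
toy_diagram = [(0.1, 0.5), (0.2, 0.9)]
img = persistence_image(toy_diagram)
tokens = image_to_tokens(img)  # 16 tokens, each of length 16
```

Each row of tokens could then be projected linearly and given a positional encoding before entering a transformer; how the proposal actually combines these tokens with the amino-acid sequence input is not specified in this record.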
Organization:
National Research Council Canada
Expected Results:
In the short term, anticipated outcomes will be strengthened collaborations across industry, academia, and government to support research excellence. In the medium term, anticipated outcomes will be the development of new and potentially disruptive technologies with collaborators. In the long term, anticipated outcomes will be collaborative solutions to public policy challenges and stronger innovation systems.
Location:
Ottawa, Ontario, CA K1N 6N5
Reference Number:
172-2023-2024-Q3-1012497
Agreement Type:
Grant
Report Type:
Grants and Contributions
Recipient Business Number:
119278877
Recipient Type:
Academia
Recipient's Legal Name:
University of Ottawa
Federal Riding Name:
Ottawa–Vanier
Federal Riding Number:
35078
Program:
Collaborative Science, Technology and Innovation Program - Collaborative R&D Initiatives
Program Purpose:
Collaborate on multiparty research and development programs to catalyze transformative, high-risk, high-reward research with the potential for game-changing scientific discoveries and technological breakthroughs in priority areas.
NAICS Code:
541710