Grants and Contributions:
Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)
The sentences in a document are related to each other in order to express complex ideas. For instance, two sentences can be in a contrast relation if they present opposite ideas, or in an elaboration relation if one sentence presents more detail on the idea expressed by the other. A discourse parser is a software program that, given a document as input, extracts the relations between its sentences. The output of discourse parsing can then be leveraged to support many other useful tasks, like creating a summary of the document or determining the opinions expressed by the document. The main goal of the proposed project is to improve current discourse parsing technology in several ways. First, we aim to boost the accuracy and the speed of current parsers by applying novel techniques from artificial intelligence and machine learning. Second, we will study how other text processing tasks, like summarization and text mining, can maximally benefit from the output of our more accurate and faster discourse parsers. Finally, we will extend discourse parsing beyond text, to deal with the many documents, ranging from newspaper articles to scientific publications, in which text is combined with visual material (i.e., multimodal documents). In addition to scientific impact, we expect this research to have real and significant economic and social benefits for Canada. There is a rapidly expanding demand for applications that are designed to help users understand and manipulate complex bodies of textual and multimodal documents, both in the workplace and for personal use. For instance, journalists often need to analyze many documents to check facts and discover stories, while citizens may benefit from summaries of news from different sources to build more informed opinions about current events.
Similarly, in the healthcare sector, summaries of patient histories and relevant medical literature could be very useful to doctors, while patients may need support in exploring online discussions about their condition. Finally, consider the business domain, where Canadian companies could better understand their customers and develop better products by mining online reviews. Conversely, consumers, by accessing the same information, could make more informed purchases. The proposed research has great potential to fuel these and many other applications by boosting discourse parsing performance, extending its applicability to documents that include visualizations, and turning these improvements into robust and versatile text processing technologies that can benefit both Canadian companies and citizens.