Grants and Contributions:

Title:

DeFacto: Acquiring, Curating, and Using a Bilingual Domain Aware Commonsense Knowledge Base

Agreement Number:

RGPIN

Agreement Value:

$115,000.00

Agreement Date:

May 10, 2017 -

Organization:

Natural Sciences and Engineering Research Council of Canada

Location:

Quebec, CA

Reference Number:

GC-2017-Q1-02419

Agreement Type:

Grant

Report Type:

Grants and Contributions

Additional Information:

Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:

Langlais, Philippe (Université de Montréal)

Program:

Discovery Grants Program - Individual

Program Purpose:

Automatically extracting knowledge from a large set of mostly unstructured documents (such as the Web) and organizing it into a knowledge base (KB) is a key challenge in artificial intelligence. Intuitively, such KBs should directly improve the quality of many NLP applications such as question answering, information retrieval or text analytics. Open information extraction, the task of extracting knowledge from texts without much supervision (and, in particular, without prescribing the kind of information to mine), has brought new hope for such an endeavour.

Although a number of well-designed components for extracting facts and relations (so-called tuples) from texts are now widespread and readily available, tapping information in large collections of texts still raises a number of issues. The technology embedded in a typical knowledge extraction pipeline is fraught with shortcomings: coreference resolution, named-entity resolution and parsing errors compound, so that many tuples (if not the vast majority) are simply useless. Also, most work targets very frequent entities and relations, which excludes a large quantity of information found in the domain-specific texts that are pervasive on the Web.

Our long-term objective is to develop the necessary expertise in populating, curating, maintaining and using a KB. Our proposal departs from several existing initiatives in a number of key respects. First, since specific domains are prevalent on the Web, we want our technology to be domain aware. Second, since today's world is multilingual and not everything is written in English, we further want our technology to be multilingual in nature. Last, most work is devoted to developing fully automatic technology for assisting humans. In our proposal, we are interested in measuring how much gaming with a purpose can make humans assist the computer.

To this end, this proposal targets the development of deFacto, a multi-domain, bilingual (French-English) KB acquired iteratively from texts mined from the Web, with the help of feedback collected from users via serious gaming.