Grants and Contributions:

Title:
(Re)designing Clustering Algorithms for Big Data
Agreement Number:
RGPIN
Agreement Value:
$100,000.00
Agreement Date:
May 10, 2017 -
Organization:
Natural Sciences and Engineering Research Council of Canada
Location:
Quebec, CA
Reference Number:
GC-2017-Q1-02763
Agreement Type:
Grant
Report Type:
Grants and Contributions
Additional Information:

Grant or Award spanning more than one fiscal year. (2017-2018 to 2022-2023)

Recipient's Legal Name:
Aloise, Daniel (École Polytechnique de Montréal)
Program:
Discovery Grants Program - Individual
Program Purpose:

The current pace of data generation, combined with increases in storage capacity, has given rise to new paradigms in computing. For example, according to IBM's website, approximately 695,000 status updates and 11 million instant messages are sent every minute on Facebook. However, many organizations face the problem of holding a lot of data while knowing little about what it contains. Clustering methods help to automatically identify unobserved groups in a set of data objects, and they are currently being radically transformed by the size, variety and nature of the available data, i.e., by so-called “Big Data”. This research program focuses on the development of scalable algorithms for Big Data clustering. This will be achieved by both (i) redesigning well-known, successful serial algorithms for scalability, leveraging their main theoretical ideas; and (ii) developing new algorithms and heuristics using the new programming paradigms associated with Big Data.
In summary, the objectives of this research program are:
A. Produce an extensive survey of the existing Big Data clustering methods in order to provide a complete picture of which ones can be adapted to handle Big Data.
B. Redesign successful serial clustering algorithms, leveraging their main theoretical ideas to work with the new programming paradigms and computational tools from Big Data.
C. Develop algorithms for semi-supervised Big Data clustering, incorporating supplementary information provided by the user into the clustering decision process.
D. Develop Big Data clustering algorithms for new emerging applications.
E. Provide a repository of Big Data software and clustering algorithms with guaranteed effectiveness.
The software and algorithms developed in this research program are expected to constitute new benchmarks for the field, allowing larger datasets to be tackled efficiently and effectively, and leading to new insights in commerce, industry and academia. Moreover, given the shortage of Big Data experts in Canada and worldwide, this research program will help to train specialists who will be able to master the scientific and technological issues emerging from the Big Data explosion.
In the long term, the findings of this research program will support data-mining-driven decision making in the Internet of Things (IoT) era, in which huge amounts of data are generated in real time from the most varied sources and devices (e.g., vehicles, sensors, home appliances). By combining massive data processing with machine learning techniques and algorithms, IoT devices will be able to perform clustering as well as other classification tasks on a large scale.