Grants and Contributions:
Grant or Award spanning more than one fiscal year (2017-2018 to 2022-2023).
We propose to build data models for real-time, on-the-fly pattern recognition in the large volumes of operational data produced by Cloud environments. Our pattern recognition techniques will be applied to a range of automated management tasks for Cloud environments, such as anomaly diagnosis and resource provisioning. We will also investigate the extensibility and applicability of our pattern recognition techniques to other types of big data, such as biomedical and biometric data hosted on Cloud environments.
Objectives
1. Data Collection
We will capture various types of Cloud operational data and tag them, at the collection points, with the high-level application scope in which they occur, for all application scopes of interest, e.g., workload phase, parametrized request, or method invocation. The types of operational data we plan to capture and investigate include routine logging and monitoring information, such as throughput, latency, dependability, or other application QoS metrics, as well as log statements and resource-consumption time series for CPU, network bandwidth, disk, and memory.
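As a concrete illustration of context tagging, the sketch below shows one possible record format in which every metric sample or log line carries the application scope where it was collected. The class and field names are our own hypothetical choices, not part of the proposal:

```python
from dataclasses import dataclass, field
from typing import Union
import time

# Hypothetical record format: each sample is tagged, at the collection
# point, with the application scope in which it occurred.
@dataclass
class ContextTaggedSample:
    scope_kind: str              # e.g. "workload_phase", "request", "method"
    scope_id: str                # identifier of the concrete scope instance
    metric: str                  # e.g. "latency_ms", "cpu_pct", "log_template"
    value: Union[float, str]     # numeric metric value or log template id
    timestamp: float = field(default_factory=time.time)

# Example: a latency sample tagged with the parametrized request it belongs to
sample = ContextTaggedSample("request", "GET /orders?limit=*", "latency_ms", 12.4)
print(sample.scope_kind, sample.metric, sample.value)
```

Keeping the scope on every sample lets downstream learners group data per context without re-parsing logs.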
2. Pattern Learning and Recognition
As data is collected, one or more statistical learning techniques are deployed to learn mathematical data models per context. For contexts that statistically diverge from the learned data model, we will trigger an automated system adaptation, e.g., in terms of resource allocation, and/or provide context-rich problem digests for human inspection.
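To make the "diverge from the learned model" step concrete, here is a minimal per-context sketch under simplifying assumptions of our own (one running Gaussian model per context, a z-score threshold); the proposal itself leaves the choice of statistical models open:

```python
import math
from collections import defaultdict

# Assumed simplification: one running Gaussian model per context,
# updated online with Welford's algorithm.
class ContextModel:
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def diverges(self, x, z_threshold=3.0):
        if self.n < 30:  # too little data to judge this context
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(x - self.mean) / std > z_threshold

models = defaultdict(ContextModel)

def observe(context, value):
    model = models[context]
    anomalous = model.diverges(value)   # check against the learned model
    model.update(value)                 # then fold the sample in
    return anomalous  # caller would trigger adaptation or emit a digest

# Normal latencies for one request context, then a spike
for v in [10, 11, 9, 10, 12, 11, 10, 9, 11, 10] * 4:
    observe("GET /orders", v)
print(observe("GET /orders", 500))  # the spike diverges from the model
```

In the proposed framework the `True` result would trigger resource re-allocation or a context-rich digest for the operator, rather than just a printout.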
Methodology
Our key idea is to leverage application contexts within our on-the-fly pattern learning and recognition framework by explicitly augmenting routine system logging and monitoring data collection with the application contexts in which these data occur. We expect that several complementary data modelling and analysis methods will be applied, jointly or independently, for the various contexts, e.g., for the same application scope, for different scopes of the same granularity, and/or for nested scopes of different granularities.

Different methods may work better for different data and different contexts. For example, a classification method based on clustering may work well for log template data, while a neural network method for pattern matching in resource-consumption time series could provide better pattern extraction, learning, and matching accuracy. Both methods together could provide higher accuracy for anomaly detection than either method alone.

To detect application contexts we currently use static analysis on application source code; however, we plan to include more general techniques based on binary code inspection as the project progresses. In the experimental evaluation of our prototype, we will use open-source systems in our Cloud software stack and experiment with clustering, dynamic time warping, neural networks, and other statistical learning methods.
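Since dynamic time warping is one of the named evaluation techniques, a minimal textbook implementation is sketched below. This is a plain O(n*m) dynamic program for illustration only; the project would presumably use an optimized library in practice:

```python
# Minimal dynamic time warping (DTW) distance between two numeric
# sequences. cost[i][j] holds the best alignment cost of a[:i] and b[:j].
def dtw_distance(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] stretched
                                 cost[i][j - 1],      # b[j-1] stretched
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]

# Two CPU-usage traces with the same shape but shifted in time align
# cheaply under DTW, unlike a point-wise (Euclidean) comparison.
base    = [0, 0, 1, 5, 9, 5, 1, 0, 0]
shifted = [0, 0, 0, 1, 5, 9, 5, 1, 0]
print(dtw_distance(base, shifted))  # small: same shape, different timing
print(dtw_distance(base, [0] * 9))  # larger: shapes genuinely differ
```

Tolerance to time shifts is precisely why DTW is attractive for matching resource-consumption patterns across workload runs that do not start at exactly the same moment.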