# Using Artificial Intelligence to Find Hidden Anomalies in Massive Datasets | MIT News

Identifying a malfunction in the national power grid can be like trying to find a needle in a huge haystack. Hundreds of thousands of interdependent sensors distributed across the United States capture data on electrical current, voltage, and other critical information in real time, often recording multiple measurements per second.

Researchers at the MIT-IBM Watson AI Lab have developed a computationally efficient method that can automatically identify anomalies in these real-time data streams. They demonstrated that their artificial intelligence method, which learns to model the interconnection of the power grid, is much better at detecting these problems than some other popular techniques.

Since the machine learning model they developed does not require annotated power grid anomaly data for training, it would be easier to apply in real-world situations where high-quality labeled datasets are often hard to find. The model is also flexible and can be applied to other situations where a large number of interconnected sensors collect and report data, such as traffic monitoring systems. It could, for example, identify traffic bottlenecks or reveal the cascade of traffic jams.

“In the case of a power grid, people have tried to capture the data using statistics and then define detection rules with domain knowledge to say that, for example, if the voltage increases by a certain percentage, then the network operator must be alerted. Such rule-based systems, even enhanced by statistical data analysis, require a lot of work and expertise. We show that we can automate this process and also learn patterns from the data using advanced machine learning techniques,” says lead author Jie Chen, a research staff member and manager of the MIT-IBM Watson AI Lab.

The co-author is Enyan Dai, an MIT-IBM Watson AI Lab intern and a graduate student at Pennsylvania State University. The research will be presented at the International Conference on Learning Representations.

**Probing probabilities**

The researchers started by defining an anomaly as an event that has a low probability of occurring, such as a sudden spike in voltage. They treat the power grid data as coming from a probability distribution, so if they can estimate the probability densities, they can identify the low-density values in the dataset. The data points least likely to occur are the anomalies.
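As a toy illustration of this idea, the sketch below fits a single Gaussian density to hypothetical voltage readings (a crude stand-in for the learned model described in the article) and flags the lowest-density points as anomalies. The readings, the injected spike, and the 1% threshold are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sensor readings: mostly normal voltages plus one spike.
readings = np.concatenate([rng.normal(120.0, 1.0, 1000), [135.0]])

# Fit a simple Gaussian density to the data (a stand-in for the learned model).
mu, sigma = readings.mean(), readings.std()
log_density = -0.5 * ((readings - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Flag the lowest-density 1 percent of points as anomalies.
threshold = np.quantile(log_density, 0.01)
anomalies = readings[log_density < threshold]
```

The injected 135-volt spike lands far in the tail of the fitted density, so it is among the points flagged.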

Estimating these probabilities is no easy task, especially since each sample captures multiple time series, and each time series is a set of multidimensional data points recorded over time. Moreover, the sensors that capture all of this data depend on one another: they are connected in a particular configuration, and one sensor can sometimes influence others.

To learn the complex conditional probability distribution of the data, the researchers used a special type of deep learning model called a normalizing flow, which is particularly effective at estimating the probability density of a sample.
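The density estimate in a normalizing flow comes from the change-of-variables formula: an invertible map sends data to a simple base distribution, and the log-determinant of the map's Jacobian corrects the density. A minimal one-dimensional, one-layer sketch (a real flow stacks many learned invertible layers; the shift and scale here are hand-picked, not learned):

```python
import numpy as np

# Minimal 1-D "flow": an invertible affine map x -> z = (x - shift) / scale.
def flow_log_density(x, shift, scale):
    z = (x - shift) / scale                           # map data to the base space
    log_base = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)  # standard-normal log-density of z
    log_det = -np.log(scale)                          # log |dz/dx| for the affine map
    return log_base + log_det                         # change-of-variables formula

x = np.array([119.0, 120.0, 135.0])
logp = flow_log_density(x, shift=120.0, scale=1.0)
```

Readings far from the bulk of the data map to low-density regions of the base distribution, so the 135-volt value gets a much lower log-density than the typical readings.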

They augmented this normalizing flow model with a type of graph, known as a Bayesian network, that can learn the complex causal relationship structure between the different sensors. This graph structure enables the researchers to see patterns in the data and estimate anomalies more accurately, Chen says.

“Sensors interact with each other, and they have causal relationships and depend on each other. So we need to be able to inject that dependency information into how we calculate probabilities,” he says.

This Bayesian network factorizes, or decomposes, the joint probability of the multiple time series into less complex conditional probabilities that are much easier to parameterize, learn, and evaluate. It allows the researchers to estimate the likelihood of observing particular sensor readings and to identify readings with a low probability of occurring, meaning they are anomalies.
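A toy version of this factorization, with an assumed three-sensor chain s1 → s2 → s3 and hand-picked Gaussian conditionals (none of this comes from the actual model), shows how the joint log-probability becomes a sum of simple conditional terms, and why readings that violate the learned dependencies score low:

```python
import math

# Toy Bayesian network over three sensors: s1 -> s2 -> s3.
# The joint density factorizes as p(s1) * p(s2 | s1) * p(s3 | s2),
# so the joint log-probability is a sum of simpler conditional terms.
def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def joint_log_prob(s1, s2, s3):
    lp = gauss_logpdf(s1, 0.0, 1.0)        # p(s1): marginal of the root sensor
    lp += gauss_logpdf(s2, 0.8 * s1, 0.5)  # p(s2 | s1): s2 tracks its parent s1
    lp += gauss_logpdf(s3, 0.8 * s2, 0.5)  # p(s3 | s2): s3 tracks its parent s2
    return lp

consistent = joint_log_prob(1.0, 0.8, 0.64)  # readings that follow the structure
broken = joint_log_prob(1.0, 0.8, -2.0)      # s3 contradicts its parent: anomalous
```

The reading that breaks the dependency between s2 and s3 receives a much lower joint log-probability, even though each value looks plausible in isolation.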

Their method is particularly powerful because this complex graph structure does not need to be defined in advance — the model can learn the graph on its own, unsupervised.

**A powerful technique**

They tested this framework by seeing how well it could identify anomalies in power grid data, traffic data, and water system data. The datasets they used for testing contained anomalies that had been identified by humans, so the researchers were able to compare anomalies identified by their model with real problems in each system.

Their model outperformed all baselines by detecting a higher percentage of true anomalies in each dataset.
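"Detecting a higher percentage of true anomalies" can be made concrete with standard precision and recall against the human labels; the indices below are invented for illustration:

```python
# Hypothetical comparison of model-flagged anomalies against human labels.
predicted = {3, 7, 42, 99}     # time indices the model flagged
labeled = {7, 42, 99, 150}     # time indices humans marked as true anomalies

true_pos = predicted & labeled
precision = len(true_pos) / len(predicted)  # fraction of flags that were real problems
recall = len(true_pos) / len(labeled)       # fraction of real problems that were caught
```

Recall is the "percentage of true anomalies detected" the comparison refers to; precision guards against a model that flags everything.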

“For baselines, a lot of them don’t have a graphical structure. This fully supports our hypothesis. Understanding the dependency relationships between different nodes in the graph definitely helps us,” Chen says.

Their methodology is also flexible. Armed with a large set of unlabeled data, they can tune the model to make effective anomaly predictions in other situations, such as traffic patterns.

Once the model is deployed, it would continue to learn from a steady stream of new sensor data, adapting to possible drift in the data distribution and maintaining accuracy over time, Chen says.

Although this particular project is coming to an end, he looks forward to applying the lessons he learned to other areas of deep learning research, especially on graphs.

Chen and his colleagues could use this approach to develop models that map other complex conditional relationships. They also want to explore how they can efficiently learn these patterns when the graphs get huge, perhaps with millions or billions of interconnected nodes. And rather than finding anomalies, they could also use this approach to improve the accuracy of predictions based on datasets or streamline other classification techniques.

This work was funded by the MIT-IBM Watson AI Lab and the US Department of Energy.