Industrial Big Data

Updated on Oct 12, 2024

Industrial Big Data generally refers to large volumes of diverse time-series data generated at high speed by industrial equipment. The term emerged along with the concepts of "Industrial Internet of Things" and "Industry 4.0", and differs from "Big Data", which is more common in information technology, in that data created by industrial equipment may hold more potential business value. "Industrial Big Data" takes advantage of the infrastructure provided by the "Industrial Internet" and draws actionable information from raw data to support management decision making, so that businesses can reduce maintenance costs and improve customer service. A study conducted by Accenture and General Electric forecast that the value created by the "Industrial Internet of Things" and "Industrial Big Data" could reach $500 billion by 2020.

Similarities and differences between "Industrial Big Data" and "Big Data"

The concept of "Industrial Big Data" in industry is related to "Big Data" in information technology, but the two have distinct characteristics. Both refer to data generated in high volume, high variety, and high velocity that require new processing technologies to enable better decision making, knowledge discovery and process optimization. Sometimes veracity is added as a further characteristic to emphasize the quality and integrity of the data. For "Industrial Big Data", however, there should be two more "V's". One is "Visibility", which refers to discovering unexpected insights into existing assets and/or processes, thereby turning invisible knowledge into visible value. The other is "Value", which emphasizes the objective of "Industrial Big Data" analytics: creating value. This characteristic also implies that, because of the risks and impacts industry may face, the accuracy requirements for "Industrial Big Data" analytics are much higher than for general "Big Data" analytics, such as social media or customer behavior analysis.

Compared to "Big Data" in general, "Industrial Big Data" is usually more structured, more correlated, more orderly in time and more ready for analytics. This is because it is generated by automated equipment and processes, where the environment and operations are more controlled and human involvement is reduced to a minimum. Nevertheless, the value in "Industrial Big Data" will not reveal itself simply because connectivity has been realized through the "Industrial Internet". Even when machines are connected and networked, "Industrial Big Data" usually has the characteristics of the "3B", namely:

Background

o General "Big Data" analytics often focuses on mining relationships and capturing phenomena. "Industrial Big Data" analytics, by contrast, is more interested in finding the physical root cause behind features extracted from those phenomena. Effective "Industrial Big Data" analytics therefore requires more domain know-how than general "Big Data" analytics.

Broken

o Compared to "Big Data" analytics, "Industrial Big Data" analytics favors the "completeness" of data over its "volume": to construct an accurate data-driven analytical system, data must be prepared from different working conditions. Because of communication issues and multiple data sources, data from the system may be discontinuous and unsynchronized. This is why pre-processing is an important step before analysis, to make sure the data are complete, continuous and synchronized.

Bad-Quality

o The focus of "Big Data" analytics is mining and discovery, which means that the volume of data may compensate for its low quality. For "Industrial Big Data", however, variables usually have clear physical meanings, so data integrity is vital to the development of the analytical system. Low-quality data or incorrect recordings alter the relationships between variables and can have a catastrophic impact on estimation accuracy.
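The pre-processing described under "Broken" and "Bad-Quality" can be sketched in Python as follows. This is a minimal illustration, not a method from the source: it assumes each sensor delivers (timestamp, value) pairs sorted by time, aligns the streams onto a common time grid with a zero-order hold (the latest reading at or before each grid point), marks stale readings as missing, and rejects values outside an assumed physically possible range. The sensor names, values, `max_gap` and range limits are all hypothetical.

```python
from bisect import bisect_right

def align_to_grid(samples, grid, max_gap):
    """Resample an irregular (timestamp, value) stream onto a shared time grid.

    Each grid point takes the latest sample at or before it (zero-order hold).
    If that sample is older than max_gap, or none exists yet, the point is None,
    so missing data stays visible instead of being silently filled.
    """
    times = [t for t, _ in samples]  # samples must be sorted by timestamp
    aligned = []
    for g in grid:
        i = bisect_right(times, g) - 1
        if i < 0 or g - times[i] > max_gap:
            aligned.append(None)
        else:
            aligned.append(samples[i][1])
    return aligned

def flag_implausible(values, low, high):
    """Mark readings outside the physically possible range [low, high] as None."""
    return [v if v is not None and low <= v <= high else None for v in values]

# Hypothetical streams from two sensors with different, irregular sample times.
temperature = [(0.0, 71.2), (1.1, 71.5), (2.0, 250.0), (5.2, 72.1)]
vibration = [(0.3, 0.8), (1.4, 0.9), (2.6, 1.1), (3.5, 1.0), (4.4, 1.2)]
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

temp = flag_implausible(align_to_grid(temperature, grid, max_gap=1.5), -40.0, 150.0)
vib = align_to_grid(vibration, grid, max_gap=1.5)
# Keep only grid points where every channel has a valid, recent reading.
complete = [(g, t, v) for g, t, v in zip(grid, temp, vib) if t is not None and v is not None]
```

Zero-order hold is only one choice (linear interpolation is another), and suitable values for `max_gap` and the plausibility range depend on the physics of each sensor.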

Therefore, simply transferring techniques developed for general-purpose "Big Data" analytics may not work well for "Industrial Big Data" analytics. "Industrial Big Data" requires deeper domain knowledge, clear definitions of the analytical system's functions, and timely delivery of extracted insights to the right personnel to support better decision making.

Data acquisition, storage and management infrastructure

As data from automated industrial equipment are generated at extraordinary speed and volume, the infrastructure for storing and managing these data becomes the first challenge any industry will face. Unlike traditional business intelligence, which mostly focuses on internal structured data and processes that information in regularly occurring cycles, an "Industrial Big Data" analytical system requires near real-time analytics and visualization of the results.

The first step is to collect the right data. As the automation level of modern equipment rises, data are generated by an increasing number of sensors. Identifying which parameters are related to equipment status is important for reducing the amount of data that must be collected and for increasing the efficiency and effectiveness of data analytics.
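One simple way to screen parameters for relevance, shown here as a hypothetical sketch rather than the approach used in any cited work, is to keep only sensors whose Pearson correlation with a known equipment-status indicator exceeds a threshold. The status labels, the sensor names and the 0.7 threshold below are assumptions for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no information about status
    return sxy / sqrt(sxx * syy)

def select_sensors(readings, status, threshold=0.7):
    """Names of sensors whose |correlation| with the status indicator >= threshold."""
    return sorted(
        name for name, values in readings.items()
        if abs(pearson(values, status)) >= threshold
    )

# Hypothetical history: status 0 = healthy, 1 = degraded.
status = [0, 0, 0, 1, 1, 1]
readings = {
    "bearing_temp": [60, 61, 60, 70, 72, 71],
    "spindle_vib": [0.9, 1.0, 0.9, 1.0, 0.9, 1.0],
    "ambient_humidity": [40, 40, 40, 40, 40, 40],
}
selected = select_sensors(readings, status)
```

Correlation only captures linear relationships, so a sensor dropped by this screen may still matter; the result is a starting point for review by someone with domain knowledge.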

The next step is to build a data management system that can handle large amounts of data and perform analytics in near real-time. To enable rapid decision making, data storage, management and processing need to be more tightly integrated. General Electric has built a prototype data storage infrastructure for a fleet of gas turbines. The resulting system, based on in-memory data grids (IMDG), proved able to handle challenging high-velocity, high-volume data flows while performing near real-time analytics on the data. GE believes the technology demonstrates a viable path to realizing batch "Industrial Big Data" management infrastructure. As memory becomes cheaper, such systems are expected to become central and fundamental to future industry.
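The idea of updating analytics as data arrive, instead of in periodic batches, can be illustrated with a toy in-memory store. This single-process Python sketch is a stand-in built on assumptions, not GE's IMDG system: the `RollingStore` class, the window size, the turbine IDs and the exhaust-temperature values are all hypothetical. It keeps a bounded window of recent readings per asset and maintains a running sum on ingest, so the rolling mean is available immediately.

```python
from collections import defaultdict, deque

class RollingStore:
    """Toy in-memory store keeping the last `window` readings per asset.

    The running sum is updated on every ingest, so reading the rolling mean
    never rescans history. This only illustrates the idea; it is not an IMDG.
    """

    def __init__(self, window):
        self.window = window
        self.buffers = defaultdict(deque)  # asset id -> recent readings
        self.sums = defaultdict(float)     # asset id -> sum of those readings

    def ingest(self, asset_id, value):
        buf = self.buffers[asset_id]
        buf.append(value)
        self.sums[asset_id] += value
        if len(buf) > self.window:
            # Evict the oldest reading so memory per asset stays bounded.
            self.sums[asset_id] -= buf.popleft()

    def mean(self, asset_id):
        buf = self.buffers.get(asset_id)
        if not buf:
            return None
        return self.sums[asset_id] / len(buf)

    def alerts(self, limit):
        """Assets whose current rolling mean exceeds `limit`."""
        return sorted(a for a in self.buffers if self.mean(a) > limit)

store = RollingStore(window=3)
for value in (600.0, 610.0, 620.0, 700.0):
    store.ingest("turbine-A", value)
for value in (590.0, 595.0):
    store.ingest("turbine-B", value)
```

A real in-memory data grid distributes such state across many nodes and handles failover; over very long runs, a floating-point running sum can also drift and may need periodic recomputation.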

Cyber-physical systems

Cyber-physical systems are the core technology of "Industrial Big Data". Cyber-physical systems are systems that require seamless integration between computational models and physical components. Unlike traditional operation technology, "Industrial Big Data" requires decisions to be informed by a much wider scope of information, a central part of which is equipment status. The "5C" (Connection, Conversion, Cyber, Cognition, Configuration) architecture indicates that cyber-physical systems focus on transforming raw data into actionable information, understanding process insights, and ultimately improving the process through well-informed decision making. Improved processes in turn increase productivity and reduce costs. This aligns with the mission of "Industrial Big Data", which is to reveal insights from large amounts of raw data and turn that information into value. The approach combines the power of information technology and operation technology to create an information-transparent environment that supports decisions by users at different levels.
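The 5C layers can be read as a processing pipeline. The Python sketch below is one hypothetical interpretation, not a reference implementation of the architecture: the use of vibration RMS as the health indicator, comparison against the fleet average as the "Cyber" layer, the flagging margin, and the machine names and readings are all assumptions.

```python
from math import sqrt

RMS_LIMIT = 1.0  # hypothetical vibration RMS at which health reaches 0

def connection():
    """Connection: acquire raw vibration samples per machine (hard-coded stand-in for sensors)."""
    return {
        "saw-1": [0.1, -0.1, 0.1, -0.1],
        "saw-2": [0.2, -0.2, 0.2, -0.2],
        "saw-3": [0.6, -0.6, 0.6, -0.6],
    }

def conversion(raw):
    """Conversion: turn each raw signal into a health value in [0, 1] via its RMS."""
    health = {}
    for machine, signal in raw.items():
        rms = sqrt(sum(x * x for x in signal) / len(signal))
        health[machine] = max(0.0, 1.0 - rms / RMS_LIMIT)
    return health

def cyber(health):
    """Cyber: compare each machine with the fleet average (peer comparison)."""
    fleet_avg = sum(health.values()) / len(health)
    return {m: h - fleet_avg for m, h in health.items()}

def cognition(deviation, margin):
    """Cognition: flag machines worse than the fleet by more than `margin`, worst first."""
    flagged = [m for m, d in deviation.items() if d < -margin]
    return sorted(flagged, key=lambda m: deviation[m])

def configuration(flagged, schedule):
    """Configuration: feed the decision back by rescheduling flagged machines."""
    return {m: ("maintain" if m in flagged else state) for m, state in schedule.items()}

health = conversion(connection())
flagged = cognition(cyber(health), margin=0.2)
plan = configuration(flagged, {"saw-1": "run", "saw-2": "run", "saw-3": "run"})
```

Keeping each layer as a separate function mirrors the idea that raw data is progressively turned into information, then into a decision, and finally fed back to the physical system.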

The NSF Industry/University Collaborative Research Center for Intelligent Maintenance Systems (IMS) applied such techniques to a Cosen bandsaw machine and demonstrated the technology at IMTS 2014 in Chicago. IMS developed adaptive degradation monitoring techniques to cope with the high data volume and velocity generated during cutting and with constantly changing load conditions. Based on the predicted degradation condition of the bandsaw, users are advised of the optimal time to change the bandsaw, so that safety is ensured and failed cuts are avoided. The analytical computation runs in the cloud and is accessible through the Internet and mobile devices.
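The IMS adaptive monitoring technique is not described in detail here, so the following Python code is only a simplified stand-in: it fits a least-squares line to a hypothetical wear indicator and extrapolates it to estimate how many cuts remain before a wear threshold is reached, minus a safety margin. The function names, the wear values (in percent of allowed wear), the threshold and the margin are assumptions.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cuts_until_change(cut_index, wear, threshold, safety_margin=0):
    """Estimate how many more cuts remain before the wear trend reaches `threshold`.

    Returns None when no increasing wear trend is found.
    """
    slope, intercept = fit_line(cut_index, wear)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope  # cut index where the trend hits the limit
    return max(0, math.floor(crossing) - cut_index[-1] - safety_margin)

# Hypothetical wear indicator (percent of allowed wear) after each of five cuts.
cuts = [1, 2, 3, 4, 5]
wear = [10.0, 12.0, 14.0, 16.0, 18.0]
remaining = cuts_until_change(cuts, wear, threshold=50.0, safety_margin=2)
```

Unlike the IMS approach, this sketch ignores changing load conditions; refitting the line over only the most recent cuts is one simple way to let it adapt.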

Sample repositories

Every unit in an industrial system continuously generates vast amounts of data. Each machine in a manufacturing line can generate billions of data samples per day; a Boeing 787, for example, generates over half a terabyte of data per flight. The volume of data generated by a group of units in an industrial system is therefore far beyond the capability of traditional methods, which makes handling, managing and processing it a challenge.
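The scale of billions of samples per machine per day follows from simple arithmetic: samples per day = number of sensors × sampling rate (Hz) × 86,400 seconds. The sensor count, sampling rate and sample size below are hypothetical, chosen only to show the order of magnitude.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def samples_per_day(n_sensors, rate_hz):
    """Raw samples produced per day by n_sensors, each sampled at rate_hz."""
    return n_sensors * rate_hz * SECONDS_PER_DAY

def raw_bytes_per_day(n_sensors, rate_hz, bytes_per_sample):
    """Payload size only; timestamps, metadata and protocol overhead are ignored."""
    return samples_per_day(n_sensors, rate_hz) * bytes_per_sample

# Hypothetical machine: 20 sensors sampled at 1 kHz, stored as 4-byte floats.
n = samples_per_day(20, 1_000)
size_gb = raw_bytes_per_day(20, 1_000, 4) / 1e9
```

Under these assumptions one machine produces 1,728,000,000 samples (about 6.9 GB of raw payload) per day, before any timestamps or metadata are added.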

Over the last several years, researchers and companies have actively collected, organized and analyzed huge industrial data sets. Some of these data sets are publicly available for research purposes.

The NASA data repository is one of the best-known repositories for "Industrial Big Data". Its data sets may be used for predictive analysis, fault detection, prognostics and similar tasks.
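Run-to-failure data sets of the kind used for prognostics typically record each unit from the start of operation until failure. Assuming rows of the form (unit, cycle, sensor values...), a layout that should be checked against each data set's own documentation, a common first step is to label every row with its remaining useful life (RUL): the unit's last recorded cycle minus the current cycle. The optional cap reflects the common practice of limiting RUL labels early in a unit's life, when degradation is hard to observe. The rows below are hypothetical, not taken from any NASA data set.

```python
from collections import defaultdict

def rul_labels(rows, cap=None):
    """Label each row of run-to-failure data with its remaining useful life (RUL).

    rows: list of (unit_id, cycle, *sensor_values); each unit is assumed to run
    until failure, so its last recorded cycle is treated as the failure point.
    cap: optional upper bound applied to every label.
    """
    last_cycle = defaultdict(int)
    for unit, cycle, *_ in rows:
        last_cycle[unit] = max(last_cycle[unit], cycle)
    labels = [last_cycle[unit] - cycle for unit, cycle, *_ in rows]
    return labels if cap is None else [min(r, cap) for r in labels]

# Hypothetical rows: (unit, cycle, one sensor reading).
rows = [
    (1, 1, 518.7), (1, 2, 518.9), (1, 3, 519.4),
    (2, 1, 518.6), (2, 2, 519.8),
]
labels = rul_labels(rows)
capped = rul_labels(rows, cap=1)
```

The function iterates over `rows` twice, so it expects a list rather than a one-shot iterator.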

References

Industrial Big Data Wikipedia

(Text) CC BY-SA
