Industrial Data Fabric

Andreas Mühlbauer

Data hub for production

Uniform access to distributed and heterogeneous data sources is a key to the adaptable factory. To avoid dependencies, this key must remain in the hands of the manufacturing company. This can be achieved with a separate architecture layer that creates a uniform data space across value chains, locations and clouds.

Distributed and heterogeneous data sources are key to digital production. © Hewlett Packard Enterprise

Collecting data, analyzing it, and deriving conclusions and actions from it: this is essentially the three-step process of self-learning systems. Such systems are the basis for the transition from automated to autonomous, and therefore adaptable, processes in production. The goals are to improve overall equipment effectiveness and to establish new digital business areas.

In machining, for example, machine learning methods can digitally measure the surface quality of a workpiece. On this basis, algorithms and dynamic sets of rules generate recommendations for action, or they trigger automatic actions such as sorting out the workpiece, adapting subsequent production steps or optimizing the milling process.

To do this, the manufacturing company must digitally map the production logic and production resources and provide access to the relevant quality and process parameters. By correlating process and quality parameters, data models are generated that enable continuous control and optimization of production.
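As a minimal sketch of what "correlating process and quality parameters" can mean in the simplest case, the following Python example computes the Pearson correlation between one process parameter and one quality parameter and fits a straight line through them. The numbers are made-up illustration values, and a linear relationship is an assumption made here for brevity; real data models are usually far richer.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Made-up illustration values, not measurements:
# feed rate in mm/rev (process parameter) and roughness Ra in micrometres (quality parameter).
feed_rate = [0.10, 0.15, 0.20, 0.25, 0.30]
roughness = [0.8, 1.1, 1.6, 2.0, 2.5]

r = pearson(feed_rate, roughness)
slope, intercept = fit_line(feed_rate, roughness)
print(f"correlation={r:.3f}, roughness ~ {slope:.2f} * feed_rate + {intercept:.2f}")
```

A fitted relationship of this kind is what allows a controller to estimate how far a process parameter has to be changed in order to bring a quality parameter back into tolerance.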

So much for the theory. In practice, manufacturing companies are faced with a dilemma arising from the requirements of the data cycle. These requirements, the dilemma and a solution strategy are described below.

Data cycle between production sites and head office

The quality of the data models depends to a large extent on the quantity and quality of the "learning material". A maintenance technician may need years to learn the exact behavior of an individual milling machine before and during a fault well enough to identify or prevent that fault at an early stage. If experience with hundreds or thousands of similar milling machines could be aggregated, the learning process could be reduced to weeks or days.

This is exactly what happens when data from the same machines or production steps at different locations is collected and fed into a self-learning data model. The more relevant data is available, the faster and better the learning effect. This is known as the data network effect. Industrial data in particular holds considerable potential for building sustainable competitive advantages through such effects.[1] To stay with the machining example, it is not only the data generated directly in the production process that is relevant here, such as surface roughness, milling process, feed rate or cutting depth. Measured values from the milling machine itself (e.g. vibration), environmental information (e.g. humidity), and logistical and business management parameters, for example from ERP systems, are relevant as well.

The trained models, algorithms and rules control data analysis and action in the production process. Depending on the production process, the requirements range from low response times (latency) to real-time behavior. For this reason, the data is usually processed in the factories, close to the machine or specialist system. Industrial edge systems, which form the interface between industrial and IT systems, are used for this purpose. They ensure that analysis and action can take place without data transfer to remote data centers or clouds, which keeps process stability as high as possible.

This results in a permanent data cycle between distributed production sites and the head office. The data models are continuously improved at the central location with the help of the data generated at the production sites. In turn, the pre-trained models, algorithms and rules are used for operational process control at the production sites.
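The cycle described above can be sketched in a few lines of Python. This is a simplified illustration, not a real implementation: the "model" is just a vibration limit of mean plus k standard deviations, and the names `ThresholdModel`, `train_centrally` and `EdgeNode` as well as all values are assumptions introduced for this example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ThresholdModel:
    """Hypothetical stand-in for a trained model: a single vibration limit."""
    version: int
    upper_limit: float


def train_centrally(site_readings, version, k=3.0):
    """Head office: pool readings from all sites and derive mean + k * std."""
    pooled = [v for readings in site_readings.values() for v in readings]
    return ThresholdModel(version, mean(pooled) + k * pstdev(pooled))


class EdgeNode:
    """Production site: evaluates readings locally, close to the machine."""

    def __init__(self, site):
        self.site = site
        self.model = None
        self.buffer = []  # readings kept for the next central training run

    def deploy(self, model):
        self.model = model

    def check(self, value):
        """Return True if the reading exceeds the deployed limit."""
        self.buffer.append(value)
        return self.model is not None and value > self.model.upper_limit


# Made-up vibration histories from two hypothetical plants.
history = {"plant_a": [1.0, 1.2, 0.9, 1.1], "plant_b": [1.0, 0.8, 1.1, 0.9]}
nodes = {name: EdgeNode(name) for name in history}

model = train_centrally(history, version=1)   # central step
for node in nodes.values():
    node.deploy(model)                        # models flow to the sites

alarm = nodes["plant_a"].check(2.5)           # local decision, no round trip
normal = nodes["plant_b"].check(1.1)
print(alarm, normal)
```

The readings collected in each node's `buffer` would flow back to the central location for the next training run, which closes the loop.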

Avoid data islands and dependencies

In order to establish such a data cycle along the entire value chain, data must be integrated both vertically and horizontally. Vertical integration involves transmitting data from machines or systems to central IoT or cloud platforms. This can be, for example, the temperature and vibration value with the respective time stamp, which are regularly sampled and visualized or further processed in a central system.
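To make the vertical case concrete, the following Python sketch shows what one such timestamped sample might look like before it is sent to a central system. The field names, units, machine identifier and values are assumptions chosen for illustration; real platforms define their own message schemas.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class MachineSample:
    """One sampled reading; all field names are hypothetical."""
    machine_id: str
    timestamp: str          # ISO 8601, UTC
    temperature_c: float
    vibration_mm_s: float


def to_message(sample):
    """Serialize a sample to JSON for transport to a central platform."""
    return json.dumps(asdict(sample), sort_keys=True)


sample = MachineSample(
    machine_id="mill-07",  # made-up identifier
    timestamp=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc).isoformat(),
    temperature_c=41.5,
    vibration_mm_s=2.3,
)
message = to_message(sample)
print(message)
```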

Manufacturing companies are faced with a dilemma. If they use the IoT platforms of their various machine manufacturers in a "best-of-breed" approach, data islands can arise that make overarching analysis and control more difficult. This cements or further increases the complexity that already exists in manufacturing environments - in other words, the typical "spaghetti architecture" in which multiple databases, analysis tools and applications are connected to each other in a criss-cross fashion via individual interfaces. If, on the other hand, companies only rely on one or a few cloud platforms to reduce complexity, they can become overly dependent.

Horizontal data networking can solve these problems. The data is not transferred to a central location, but is linked together via a separate data layer. Hence the name "data fabric" or "network of meshed data connections".

Controlling the data cycle with a data fabric

On the one hand, a data fabric combines distributed and heterogeneous file systems into a single namespace by abstraction. This gives a production company standardized access to data and files that can be distributed across a wide variety of systems and any number of locations. This enables holistic data management, for example to ensure access rights and other compliance requirements. The data fabric also organizes the data cycle described above. It is therefore the hub through which production sites, cloud services and partner companies are integrated into this cycle as suppliers or recipients of data and analysis models.
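The idea of a single namespace over distributed storage can be illustrated with a small Python sketch. It maps logical path prefixes to the system that actually holds the data and resolves a path by longest matching prefix. The class `GlobalNamespace`, the backend labels and the paths are hypothetical; commercial and open source data fabrics implement this very differently in detail.

```python
class GlobalNamespace:
    """Maps logical path prefixes to the storage system that holds the data."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, backend):
        """Register a backend (here just a label) under a logical prefix."""
        self._mounts[prefix.rstrip("/")] = backend

    def resolve(self, path):
        """Return (backend, relative path) for the longest matching prefix."""
        best = None
        for prefix in self._mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no backend mounted for {path}")
        return self._mounts[best], path[len(best):].lstrip("/")


ns = GlobalNamespace()
ns.mount("/plants", "central-archive")       # hypothetical backends
ns.mount("/plants/a", "edge-store-plant-a")
ns.mount("/cloud/ml", "cloud-object-store")

local = ns.resolve("/plants/a/mill-07/vibration.csv")
fallback = ns.resolve("/plants/b/mill-02/vibration.csv")
print(local, fallback)
```

Applications only ever see the logical path; where the file physically resides can change without the application noticing.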

In the "spaghetti architecture" typical of manufacturing environments, databases, analysis tools and applications are connected to each other in a criss-cross fashion via individual interfaces. © Hewlett Packard Enterprise

This approach therefore solves the complexity problem, as there is consolidated access to data, and the interaction of applications, data sources and databases is organized via a uniform data layer. It also solves the dependency problem, as the production company itself controls the cycle, and not external platforms that draw customers into their own network like a spider into its web. Even when several external platforms are used, the uniformity of the data architecture is not lost. A production company can therefore further reduce its dependency through a multi-vendor strategy without having to fear excessive complexity or data islands.

Building blocks of a data fabric

The data fabric is based on an open and permeable architecture. The most important building blocks along the process chain of acquisition, aggregation, analysis and action are described below:

Acquisition: Data is acquired using software modules that access the data sources via programming interfaces (APIs), such as the SQL database of the ERP system, the sensor data of a milling machine or the NoSQL database of a cloud application. They convert the respective industrial protocols into IP packets and thus open up the variety of data sources.
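The acquisition step can be sketched as a set of small adapters that read from different sources and emit records in one common shape. In the Python example below, an in-memory SQLite database stands in for the ERP system's SQL database and a plain list stands in for sensor values; the table layout, record format and function names are assumptions made for this illustration.

```python
import sqlite3


def read_erp_orders(conn):
    """Adapter for a SQL source: yields records in a common shape."""
    query = "SELECT order_id, material FROM orders ORDER BY order_id"
    for order_id, material in conn.execute(query):
        yield {"source": "erp", "key": str(order_id),
               "payload": {"material": material}}


def read_sensor(machine_id, raw_values):
    """Adapter for sensor values (a list stands in for the field protocol)."""
    for i, value in enumerate(raw_values):
        yield {"source": "sensor", "key": f"{machine_id}/{i}",
               "payload": {"vibration_mm_s": value}}


def acquire(*adapters):
    """Merge all adapters into a single stream of uniform records."""
    for adapter in adapters:
        yield from adapter


# Hypothetical ERP table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, material TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1001, "AlMg3"), (1002, "C45")])

records = list(acquire(read_erp_orders(conn), read_sensor("mill-07", [2.1, 2.4])))
for record in records:
    print(record)
```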

Aggregation: Data flows from the source systems into the data fabric via data pipelines and is made accessible to the target applications via messaging systems. In this process, the data is often selected and aggregated, as not all source data is usually relevant for further processing. The data can also be stored in a data lake, which aggregates the wealth of heterogeneous production-related data in order to create the most comprehensive data basis possible for machine learning.
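Selection and aggregation in such a pipeline can be illustrated with a short Python sketch: irrelevant metrics are filtered out, and the remaining values are averaged per machine, metric and time window. The record layout, the 60-second window and the sample values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def select(records, wanted_metrics):
    """Keep only the metrics that downstream applications need."""
    return (r for r in records if r["metric"] in wanted_metrics)


def aggregate(records, window_s=60):
    """Average values per (machine, metric, start of time window)."""
    buckets = defaultdict(list)
    for r in records:
        window_start = r["t"] - r["t"] % window_s
        buckets[(r["machine"], r["metric"], window_start)].append(r["value"])
    return {key: mean(values) for key, values in buckets.items()}


# Made-up raw records; "t" is seconds since the start of the shift.
raw = [
    {"machine": "mill-07", "metric": "vibration", "t": 5, "value": 2.0},
    {"machine": "mill-07", "metric": "vibration", "t": 35, "value": 3.0},
    {"machine": "mill-07", "metric": "door_open", "t": 40, "value": 1.0},
    {"machine": "mill-07", "metric": "vibration", "t": 65, "value": 4.0},
]

summary = aggregate(select(raw, {"vibration"}))
print(summary)
```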

Unlike a traditional data warehouse, a data lake can be distributed across various locations and environments, such as production sites, data centers or clouds. Multi-tenancy controls determine which users can access which data sets and how. This means that the distributed data lake can also be used across several companies without compromising the data sovereignty of the companies involved.
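How access to a shared data lake might be restricted per company can be sketched with a simple deny-by-default policy table. The class `DataLakePolicy`, the tenant and data set names, and the operations are hypothetical; real products provide much more fine-grained mechanisms.

```python
class DataLakePolicy:
    """Deny-by-default access policy for a shared data lake (illustrative)."""

    def __init__(self):
        self._grants = {}  # (tenant, dataset) -> set of permitted operations

    def grant(self, tenant, dataset, *operations):
        self._grants.setdefault((tenant, dataset), set()).update(operations)

    def is_allowed(self, tenant, dataset, operation):
        """Only explicitly granted operations are permitted."""
        return operation in self._grants.get((tenant, dataset), set())


policy = DataLakePolicy()
policy.grant("manufacturer", "vibration_history", "read", "write")
policy.grant("machine_vendor", "vibration_history", "read")

print(policy.is_allowed("machine_vendor", "vibration_history", "read"))   # granted
print(policy.is_allowed("machine_vendor", "vibration_history", "write"))  # not granted
print(policy.is_allowed("machine_vendor", "erp_orders", "read"))          # never granted
```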

Analysis: Using data taps, data analysts can access both the operationally circulating data and the distributed data lake in order to experiment with data models, train them, refine them and update them on an ongoing basis. Using stream analytics - the near-real-time analysis of event data streams - the trained models are then used to monitor the sensor data of ongoing production. For example, they detect deviations or conspicuous accumulations that indicate emerging faults. Stream analytics is therefore the basis for autonomous actions in operations as well as for more medium-term interventions, such as predictive maintenance.
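A very simple form of deviation detection on a sensor stream is a rolling z-score: each new value is compared with the mean and standard deviation of a window of recent values. The Python sketch below uses this technique as a stand-in for a trained model; the class name, window size, threshold and stream values are assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class DeviationDetector:
    """Flags values that deviate strongly from a rolling window of history."""

    def __init__(self, window=20, k=3.0, min_samples=5):
        self.values = deque(maxlen=window)
        self.k = k
        self.min_samples = min_samples

    def update(self, value):
        """Return True if value is more than k standard deviations off."""
        is_deviation = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            is_deviation = sigma > 0 and abs(value - mu) > self.k * sigma
        self.values.append(value)  # the value joins the window either way
        return is_deviation


# Made-up vibration stream with one obvious spike.
stream = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 5.0, 1.0]
detector = DeviationDetector()
flags = [detector.update(v) for v in stream]
print(flags)
```

In this sketch the spike itself is added to the window afterwards, which widens the tolerance for the following values; whether outliers should be excluded from the history is a design decision that depends on the process.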

Action: Actions are triggered on the basis of algorithms or business logic, such as the opening of a service order if a machine no longer produces the desired quality due to wear. In addition, downstream processes can be adapted on the basis of this knowledge in order to bring the quality back within the tolerance range. Systems that do this are referred to as self-optimizing or autonomous.
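The business logic behind such actions can be as simple as a handful of rules. The following Python sketch decides, for one measured workpiece, whether to pass it, open a service order because of wear, or adapt downstream steps. The function and class names, the wear index and all thresholds are hypothetical and serve only to illustrate the pattern.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDesk:
    """Hypothetical stand-in for a maintenance or ticketing system."""
    orders: list = field(default_factory=list)

    def open_order(self, machine_id, reason):
        self.orders.append({"machine": machine_id, "reason": reason})


def act_on_quality(machine_id, roughness_um, tolerance_um, wear_index,
                   desk, wear_limit=0.8):
    """Decide on an action for one measured workpiece."""
    if roughness_um <= tolerance_um:
        return "pass"
    if wear_index >= wear_limit:
        # Quality is out of tolerance and wear is the likely cause.
        desk.open_order(machine_id, "tool wear: roughness out of tolerance")
        return "service_order"
    # Out of tolerance without significant wear: compensate downstream.
    return "adjust_downstream"


desk = ServiceDesk()
results = [
    act_on_quality("mill-07", 1.2, 1.6, 0.30, desk),
    act_on_quality("mill-07", 1.9, 1.6, 0.35, desk),
    act_on_quality("mill-07", 2.2, 1.6, 0.90, desk),
]
print(results, desk.orders)
```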

Containers as a technological basis

Container virtualization is used as the technological basis for the data fabric. This allows the business logic of the data fabric to be distributed and operated uniformly across production and logistics sites, data centers and clouds. Containers and container orchestration with Kubernetes are the means of choice today for building distributed and largely platform-independent applications. The problem of data persistence can now also be solved, so that even monolithic applications, such as manufacturing execution systems (MES) or production planning and control (PPS) systems, can be "containerized". This gives companies the advantage of a homogeneous environment with corresponding efficiency and transparency benefits in operation.

When setting up such a data fabric, each company must answer the "make or buy" question for itself. Today, there is a wealth of technologies and open source tools available with which a company can set up a data fabric on its own. The alternative is commercial off-the-shelf products, such as HPE Ezmeral Data Fabric. This is a massively scalable distributed file system that can also manage data volumes in the petabyte range with high performance. It is a core component of the HPE Ezmeral Container Platform, which is used to build data fabrics and to containerize application environments. The platform also supports the analysis of distributed databases and offers functions for the persistent storage of data in container environments.

Tapping into the potential for added value

With a data fabric, manufacturing companies can create the network effects required to unlock the value creation potential of their data - for example in the form of increased operational efficiency, or by the company itself becoming a platform that offers its customers digital services. The data fabric is the data hub that enables the exchange and control of data and process logic. External data sources can also be tapped into and external parties can be granted controlled access to the company's own data sources. This also enables cross-company network effects.

Companies remain largely independent of central IoT or cloud platforms, as they achieve network effects via a decentralized architecture that they control themselves. External cloud services nevertheless remain indispensable resources, as they offer excellent tools and aggregation options. However, they can be used from a position of sovereignty, that is, with control over the data and the resulting added value.

By Florian Doerr, Lead Solution Architect IoT and Data Analytics, Hewlett Packard Enterprise

[1] James Currier, https://www.nfx.com/post/truth-about-data-network-effects/, NFX, 2019
