Data lake as the basis for Industry 4.0

Analytical data processing

Andreas Mühlbauer, 10.03.2021, 06:18

For a long time, the data warehouse was considered the central source for all data analyses. In the course of increasing digitalization, however, the data lake has overtaken the classic data warehouse. Especially in Industry 4.0, many use cases are no longer conceivable without it. What do companies need to consider when implementing the technology?

Comparison between data warehouse and data lake. © Alexander Thamm

The right architecture for analytical data processing has been clearly defined since the 1990s. A data warehouse collects the relevant data from the various operational source systems in a hub-and-spoke approach. The data is then harmonized, integrated and persisted in a multi-layered integration and refinement process. The result is a single point of truth: a universally valid, correct database that can be relied upon. Users access this treasure trove of data via reporting and analysis tools.

An essential characteristic of the data warehouse is that it provides a uniform view of company data in a strict, predefined data model optimized for evaluation. This makes it well suited to past-oriented analyses of key figures along consolidated evaluation structures. However, the high standards of correctness and harmonization usually also mean that it takes a long time to integrate data from a new source correctly, because a great deal of design and coordination work is required in advance.
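The strict, predefined data model can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the table `production_kpi`, its columns and the plant name are hypothetical and chosen only for illustration. Rows that do not fit the model are rejected at load time, which is what makes the data reliable but also what slows the onboarding of new sources.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE production_kpi ("
    " plant TEXT NOT NULL,"
    " month TEXT NOT NULL,"
    " units_produced INTEGER NOT NULL CHECK (units_produced >= 0))"
)


def load_row(row: dict) -> bool:
    """Try to load one row; return False if it violates the predefined model."""
    values = (row.get("plant"), row.get("month"), row.get("units_produced"))
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO production_kpi (plant, month, units_produced) "
                "VALUES (?, ?, ?)",
                values,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A complete row is accepted; a row with a missing key figure is rejected.
accepted = load_row({"plant": "Munich", "month": "2021-02", "units_produced": 1200})
rejected = load_row({"plant": "Munich", "month": "2021-03"})

# Only validated rows reach the consolidated view.
total = conn.execute(
    "SELECT SUM(units_produced) FROM production_kpi WHERE plant = ?", ("Munich",)
).fetchone()[0]
```

Validating on write keeps every query simple, at the price of having to agree on the model before the first row can be loaded.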

New data sources require new solutions

This problem has become particularly apparent since the emergence of new data sources such as social media or IoT data. This data is often semi-structured or unstructured, yet a data warehouse still has to fit it into its predefined data model. As these data sources grew in relevance, the idea of the data lake was born. The data lake makes all source data - internal and external, structured and polystructured - available as raw data in its unprocessed form, so that it can be used as quickly as possible.
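The idea of keeping raw data and interpreting it later can be sketched as follows. This is a simplified illustration, not a reference to any specific data lake product: the raw zone is a plain Python list of JSON lines, and the names `land_raw`, `read_temperatures` and the field `temp_c` are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def land_raw(store: list[str], record: dict[str, Any], source: str) -> None:
    """Append a record to the raw zone unchanged, tagged only with its source."""
    store.append(json.dumps({"source": source, "payload": record}))


def read_temperatures(store: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Apply a schema only at read time, when an analysis needs the data."""
    for line in store:
        entry = json.loads(line)
        payload = entry["payload"]
        if "temp_c" in payload:
            # Types are normalized here, not when the data is stored.
            yield entry["source"], float(payload["temp_c"])


raw_zone: list[str] = []
# Records of different shape are stored without prior harmonization.
land_raw(raw_zone, {"machine": "press-1", "temp_c": 71.5}, source="iot")
land_raw(raw_zone, {"user": "@plant_fan", "text": "Great factory tour!"}, source="social")
land_raw(raw_zone, {"machine": "press-2", "temp_c": "68.0", "vibration": 0.4}, source="iot")

temperatures = list(read_temperatures(raw_zone))
```

All three records land immediately, including the one without any machine data. The interpretation effort moves from the loading step to each individual analysis.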

Creation of a data lake structure. © Alexander Thamm

While the focus of the data warehouse is clearly on the past-oriented analysis of key figures along consolidated evaluation structures, the data lake offers greater agility and flexibility. It can quickly integrate diverse data sources and large volumes of data and process them into data streams. This enables complex analyses, including those that have not yet been defined when the data is stored.

Looking at these different objectives and characteristics of data lakes and data warehouses, it becomes clear that a data lake does not replace a data warehouse, but complements it. Both architectural concepts have their relevance and serve different use cases.

The data lake enables the optimization of products in industry

In industry, two business requirements in particular are driving the use of data lakes: optimizing production, and offering better or new products, sometimes even completely new business models. The basic use cases here are the "digital twin", i.e. the digital image of the machines a company operates or produces, and the connection of these machines to the data lake with near-real-time data.
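A digital twin fed with near-real-time data can be pictured as an object that mirrors the latest known state of a machine. The following is a minimal sketch under stated assumptions: the class `DigitalTwin`, the machine ID and the reading names are hypothetical, and readings are assumed to carry a numeric timestamp so that late-arriving values can be ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalTwin:
    """Mirrors the most recent known state of one machine."""

    machine_id: str
    state: dict[str, float] = field(default_factory=dict)
    last_update: Optional[float] = None

    def apply(self, timestamp: float, readings: dict[str, float]) -> bool:
        """Merge readings into the state; ignore readings older than the state."""
        if self.last_update is not None and timestamp <= self.last_update:
            return False  # late or duplicate reading, twin stays unchanged
        self.state.update(readings)
        self.last_update = timestamp
        return True


twin = DigitalTwin("press-1")
twin.apply(10.0, {"temp_c": 70.0, "rpm": 1200.0})
twin.apply(12.0, {"temp_c": 72.5})
late_applied = twin.apply(11.0, {"temp_c": 65.0})  # arrives out of order
```

In this sketch the twin keeps only the current state. The full history of readings would remain in the data lake, where it stays available for later analyses.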

While first-generation data lakes were technically very complex and connecting machines with the required timeliness was challenging, the barriers to using data lakes have since fallen. The changed market situation of commercial distribution providers and the general strategy of increased cloud usage are shifting the picture for second-generation data lakes: managing the basic platform becomes far simpler when native cloud services or dedicated managed Hadoop environments are used. Today, this makes data lakes usable for companies of almost any size.

The right strategy for companies

If a company wants to use a data lake, it needs to settle a number of questions in advance. It is advisable to clearly identify and prioritize the use cases as part of a roadmap. The components to be used initially must then be selected. Continuously searching for and evaluating alternatives among commercial, open-source and cloud service options makes it possible to create optimum added value for the company.

In addition to the functional requirements, other points must also be taken into account in industrial use. These include, in particular, the protection of trade secrets from competitors and legal aspects. Machine manufacturers also face the challenge of accessing data from their own machines in the customer's environment, because machines from different manufacturers are often used in combination and customers, in turn, do not disclose all data in order to protect their own company.

When setting up a data lake initiative, certain key conditions emerge in practice as the basis for successful implementation. They are similar to those for implementing a central data warehouse. Two are fundamental: a strong management decision to set up and use a central platform, and the resulting close cooperation between business and production IT, and possibly also product development, which has often not been practiced to date.

In addition, the operation of a data lake should be set up flexibly and holistically. A DevOps team that continuously develops the platform and keeps it stable in operation has proven to be best practice.

In summary, data warehouses and data lakes fulfill different requirements. In principle, a data lake platform is required for every Industry 4.0 initiative. The technological entry barrier for data lakes has fallen, but sound planning of the architecture is still required. The basis should be a roadmap for use cases. To maximize value creation in the long term, the necessary organizational conditions for the successful use of a data lake platform must be created alongside the technology.

Dr. Carsten Dittmar and Peter Schulz, Alexander Thamm GmbH
