Data analysis
Data quality influences AI projects
The industrial use of artificial intelligence is making slow progress in Germany. Data quality is crucial for the technology to develop its full potential.
In a survey conducted by the industry association Bitkom, around 10% of companies in Germany stated that they now use AI technology.
AI can provide relief and streamline processes in very different areas such as quality control in production, predictive maintenance or production process optimization. Many companies also expect to gain expert knowledge from its use that would not be available without AI-supported analyses and forecasts. Expectations of predictive analytics are very high. However, disillusionment often sets in when these expectations are disappointed and no additional knowledge is gained.
Every AI project starts with the selection of one or more suitable AI models. Even at this early stage, the course is set for the success of the project, because not every model produces the same results from the same inputs. The selected AI model, its configuration, and the amount and distribution of the training data all influence the results, as does the number of training runs. Data quality is a further important factor in the outcome of an AI project.
Data quality is crucial
To achieve convincing results with artificial intelligence, data quality must be considered as early as the merging of the data, that is, during the ETL process (Extract, Transform, Load). The quality of the data can be measured. Computer science defines several criteria for this:
Completeness: Data is considered complete if all content has been transferred in the ETL process.
Correctness: The rule of thumb here is that a data set is correct if it corresponds to reality.
Consistency: Data records must not logically contradict themselves or other data records within a data source.
Uniqueness: A data record is unique if the object it describes is represented only once.
Conformity: The data must correspond to the defined format.
Validity: The data corresponds to the defined value ranges.
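Several of the criteria listed above can be expressed as simple ratios. The following is a minimal sketch, not a definitive implementation: the sample records, the field names (id, ordered, shipped, qty), the ISO date format, the value range and the rule that a shipment cannot precede its order are all hypothetical assumptions chosen for illustration. Correctness is left out because it requires a comparison with reality that cannot be derived from the data alone.

```python
import re

# Hypothetical records before (source) and after (target) an ETL run.
source = [
    {"id": 1, "ordered": "2024-01-05", "shipped": "2024-01-07", "qty": 3},
    {"id": 2, "ordered": "2024-01-06", "shipped": "2024-01-04", "qty": 1},
    {"id": 3, "ordered": "2024-01-08", "shipped": "2024-01-09", "qty": 2},
    {"id": 4, "ordered": "2024-01-09", "shipped": "2024-01-10", "qty": 5},
]
target = [
    {"id": 1, "ordered": "2024-01-05", "shipped": "2024-01-07", "qty": 3},
    {"id": 2, "ordered": "2024-01-06", "shipped": "2024-01-04", "qty": 1},
    {"id": 2, "ordered": "2024-01-06", "shipped": "2024-01-04", "qty": 1},   # duplicate
    {"id": 3, "ordered": "08.01.2024", "shipped": "2024-01-09", "qty": -2},  # bad format, bad value
]  # record 4 was lost during the transfer

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed target format


def ratio(hits, total):
    """Share of hits; an empty set of records counts as fully compliant."""
    return hits / total if total else 1.0


def completeness(src, tgt, key="id"):
    """Share of source keys that arrived in the target."""
    src_keys = {r[key] for r in src}
    tgt_keys = {r[key] for r in tgt}
    return ratio(len(src_keys & tgt_keys), len(src_keys))


def uniqueness(records, key="id"):
    """Share of records that describe a distinct object."""
    return ratio(len({r[key] for r in records}), len(records))


def conformity(records, field, pattern=ISO_DATE):
    """Share of records whose field matches the defined format."""
    hits = sum(1 for r in records if pattern.fullmatch(str(r[field])))
    return ratio(hits, len(records))


def validity(records, field, low, high):
    """Share of records whose field lies within the defined value range."""
    hits = sum(1 for r in records if low <= r[field] <= high)
    return ratio(hits, len(records))


def consistency(records):
    """Share of checkable records where shipping does not precede ordering."""
    checkable = [
        r for r in records
        if ISO_DATE.fullmatch(r["ordered"]) and ISO_DATE.fullmatch(r["shipped"])
    ]
    # ISO dates sort correctly as plain strings.
    hits = sum(1 for r in checkable if r["shipped"] >= r["ordered"])
    return ratio(hits, len(checkable))


report = {
    "completeness": completeness(source, target),
    "uniqueness": uniqueness(target),
    "conformity": conformity(target, "ordered"),
    "validity": validity(target, "qty", 1, 1000),
    "consistency": consistency(target),
}

for name, score in report.items():
    print(f"{name}: {score:.2f}")
```

Each score lies between 0 and 1, which makes the criteria comparable across data sources. In a real project the rules would come from the specialist department that owns the data.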
High data quality is important when using AI because the technology is also subject to another source of distortion known as "bias". AI is made by humans, which is why it cannot act without bias: the prejudices of the development teams flow into the programming, whether intentionally or unintentionally. If bias meets low data quality in a model that has also been poorly selected, the AI will inevitably fall short of its capabilities. A simple comparison illustrates the importance of data quality: in this respect, AI is no different from image processing. It cannot turn a bad photo into a masterpiece.
This means that AI systems can only deliver correct (and unbiased) results if data is available in a cleansed and suitable format. In addition to selecting a suitable AI model, measuring data quality right at the start of an AI project is therefore crucial. The results from determining the quality are then incorporated into an analysis that shows whether further data needs to be collected or whether there are still gaps.
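How measured quality could feed into such a gap analysis can be sketched in a few lines. This is only an illustration under stated assumptions: the scores, the minimum thresholds and the helper find_gaps are hypothetical, and suitable thresholds depend on the project.

```python
# Hypothetical quality scores (0.0 to 1.0) measured at project start.
scores = {"completeness": 0.75, "uniqueness": 0.98, "validity": 0.91}

# Hypothetical minimum values agreed with the specialist department.
thresholds = {"completeness": 0.95, "uniqueness": 0.95, "validity": 0.90}


def find_gaps(scores, thresholds):
    """Return each criterion that misses its threshold, with the shortfall.

    A criterion that has not been measured yet is reported with None.
    """
    gaps = {}
    for criterion, minimum in thresholds.items():
        score = scores.get(criterion)
        if score is None:
            gaps[criterion] = None
        elif score < minimum:
            gaps[criterion] = round(minimum - score, 4)
    return gaps


gaps = find_gaps(scores, thresholds)
if gaps:
    print("Further data collection or cleansing needed:", gaps)
else:
    print("All measured criteria meet their thresholds.")
```

In this example only completeness falls short, which would suggest collecting the missing records before training begins.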
To achieve the best results, AI projects and data analyses usually draw on data from different sources. In many organizations, however, important information and data still lie dormant in data silos of varying size. One such silo can be a customer database that the sales department created itself and whose data is withheld from other departments. For data analyses, it is important that data can flow freely. This is not an exclusively technical matter: data silos are often simply the result of silo thinking, which also affects data quality.
Raising awareness of the benefits of AI
The required data often belongs to a specialist department that is also responsible for data quality. In case of doubt, this department also knows what information needs to be added or where it can be obtained. However, the influence of data quality on the success of an analysis project is often not understood there, usually because the people involved from the specialist area do not recognize the advantages that the project has for their own work. As a result, motivation is lacking, and measures to improve data quality tend to be seen as an annoying additional task. Data analysis and AI projects are therefore also change projects in which the specialist departments, as data owners, need to be convinced that better data quality will benefit them, i.e. that real added value can be achieved.
As part of such a change project, the stakeholders from the specialist departments, data analysts and data engineers must reach a common understanding in order to act as a "data product" team. When improving data quality, the focus should be on the wishes and requirements of the specialist department, as it will also have to work with the information later on. AI projects and data analyses are team projects and are therefore also a matter of corporate culture. Anyone considering using artificial intelligence should therefore first create the right internal conditions.
Elena Fomenko, Senior Consultant, AI- and Data-Driven product development, Detecon









