Interaction between man and machine
Reliably detecting human movement
Accurately detecting people in an industrial environment is a particular challenge. But what exactly makes it so difficult to reliably recognize people?
Training a machine to not only perceive but also understand the world around it is a complex technological and computational challenge. Recognizing people is an even more complex task, as the uniqueness and diversity of humans make them one of the most difficult objects to recognize unless a system is extensively trained on specific individuals. Even a change in clothing or hairstyle can lead to recognition problems. If additional factors such as a spatially extensive interaction environment or the unpredictability of human behavior are added, the technical challenges increase rapidly.
In industrial environments, for example, several people often work at high speed and perform different tasks in the same space. Attempts to track their movements from a lateral or even isometric view have so far proved inadequate, because such a view requires the system to estimate depth as well. Moreover, in a single-camera configuration, one person can easily block the view of another and create blind spots.
Another major challenge in developing machine vision systems lies less in capturing images than in processing them. For a machine to understand human movement in real time, substantial computing power is needed to deliver both speed and accuracy. And because no two environments are the same, the need for a system that not only understands the nuances of human movement but also adapts to different scenarios and lighting levels has long been an obstacle to the widespread use of such technologies.
A different perspective
Human recognition systems are typically based on a top-down perspective and capture images in a similar way to security cameras. This top-down approach is common because there is an abundance of publicly available imagery from this perspective that is used to train AI models. However, from a bird's eye view, it is difficult for recognition systems to capture the position of people in detail, especially when people overlap in the scene. This makes it less effective for support tasks such as improving work efficiency.
Omron therefore decided to train the 'AM1' camera system using images taken from the front, i.e. from the same perspective from which one colleague would see the other. However, as such images are rarely found on the internet, the AI models were created and trained using a proprietary data set. This approach gives the system, which was developed for human productivity in industrial environments, an advantage in recognizing and analyzing human movements. It uses a single top-down camera in combination with software specifically optimized to detect and interpret human motion.
Human movement patterns
The AM1 system is trained on still images rather than movement sequences. However, Omron has included a wide variety of postures and movement patterns in these images so that typical human behaviors such as walking, standing, bending or reaching are covered comprehensively. The AI learns to recognize these static poses; when frames are processed one after another in real time, the sequence of recognized poses represents dynamic movement. This approach does not require millions of training images. Instead, a carefully selected set of images covers the most important postures and scenarios, so the system can be trained efficiently without an excessively large data set.
Detection range and frame rate
The software can track up to ten people within a 7 m × 7 m area with an accuracy of over 95 percent. This makes it possible to see where and how employees move around and how long they spend in a particular location. Companies can use this information to identify bottlenecks and to make space utilization and workflows as efficient as possible. In practice, this could mean removing obstacles from pathways, shortening the most frequently used routes or reducing the likelihood of workers having to cross someone else's path. By identifying and understanding problems earlier, companies can find solutions more quickly and base them on data.
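As a minimal sketch of how tracked positions could be turned into dwell times, assume the tracker delivers one dictionary of person IDs and floor coordinates (in metres) per frame, at the ten frames per second mentioned in this article. The zone names, the zone boundaries and the function names `zone_of` and `dwell_times` are hypothetical illustrations and not part of the AM1 software.

```python
from collections import defaultdict

FRAME_INTERVAL_S = 0.1  # ten frames per second, as stated in the article

# Hypothetical zones inside the 7 m x 7 m area:
# name -> (x_min, y_min, x_max, y_max) in metres
ZONES = {
    "workbench": (0.0, 0.0, 2.0, 2.0),
    "aisle": (2.0, 0.0, 7.0, 2.0),
}


def zone_of(x, y, zones):
    """Return the name of the zone containing (x, y), or None."""
    for name, (x0, y0, x1, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None


def dwell_times(frames, zones, frame_interval_s=FRAME_INTERVAL_S):
    """Sum up the time each person spends in each zone.

    frames: list of {person_id: (x, y)} dictionaries, one per frame.
    Returns {person_id: {zone_name: seconds}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for positions in frames:
        for person_id, (x, y) in positions.items():
            zone = zone_of(x, y, zones)
            if zone is not None:
                counts[person_id][zone] += 1
    # Convert frame counts to seconds only once, to limit rounding error.
    return {
        person_id: {zone: n * frame_interval_s for zone, n in zones_hit.items()}
        for person_id, zones_hit in counts.items()
    }
```

Aggregating such dwell times over a shift would show, for instance, which zones are occupied longest and are therefore candidates for a bottleneck.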
The AM1 reaches this accuracy with a frame rate of ten images per second. The image data from the camera, or from several cameras if necessary, is fed via Ethernet into a processing hub that runs on an 'OpenVino' accelerator from Intel.
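One way to see why the frame rate matters for tracking: at ten frames per second, a person moves only a short distance between two consecutive images, so a detection in the new frame can be linked to the nearest known person. The sketch below illustrates this idea with a simple greedy nearest-neighbour match. It is not Omron's tracking method; the walking speed and the function names are assumptions made for illustration.

```python
import math

FRAME_RATE_HZ = 10       # stated in the article
WALKING_SPEED_M_S = 1.5  # assumption: a brisk walking pace


def displacement_per_frame_m(speed_m_s, frame_rate_hz):
    """Distance a person covers between two consecutive frames."""
    return speed_m_s / frame_rate_hz


def associate(previous, current, max_jump_m):
    """Greedily match known track IDs to the nearest new detection.

    previous: {track_id: (x, y)} positions from the last frame
    current:  [(x, y), ...] detections in the new frame
    Returns {track_id: index into current}. Tracks with no detection
    within max_jump_m stay unmatched.
    """
    assignments = {}
    unused = set(range(len(current)))
    for track_id, (px, py) in previous.items():
        best_index, best_distance = None, max_jump_m
        for index in sorted(unused):
            cx, cy = current[index]
            distance = math.hypot(cx - px, cy - py)
            if distance <= best_distance:
                best_index, best_distance = index, distance
        if best_index is not None:
            assignments[track_id] = best_index
            unused.discard(best_index)
    return assignments


# Allow twice the expected per-frame step as a matching radius (assumed margin).
step_m = displacement_per_frame_m(WALKING_SPEED_M_S, FRAME_RATE_HZ)
matches = associate({1: (1.0, 1.0), 2: (5.0, 5.0)},
                    [(5.1, 5.0), (1.1, 1.0)],
                    max_jump_m=2 * step_m)
```

A lower frame rate would enlarge the per-frame step and with it the matching radius, which makes mix-ups between people working close together more likely.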
In cases where larger areas need to be covered, for example in rooms larger than 7 m × 7 m, it is possible to combine the results from several cameras. By merging the images from the individual cameras and removing overlapping areas, the system can create a seamless, larger field of view.
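The following sketch shows one possible way to combine results from several cameras. Note that it merges detected positions rather than the images themselves, which is a simplification of what the article describes. It assumes that every camera reports positions in its own local floor coordinates, that the cameras are aligned with the same axes, and that each camera's offset in a shared floor plan is known. The camera names, the offsets and the merge radius are hypothetical.

```python
import math


def to_global(detections, origin):
    """Shift camera-local (x, y) positions into the shared floor plan."""
    ox, oy = origin
    return [(x + ox, y + oy) for x, y in detections]


def merge_detections(per_camera, origins, merge_radius_m=0.5):
    """Combine detections from several cameras and remove duplicates.

    per_camera: {camera_id: [(x, y), ...]} in camera-local metres
    origins:    {camera_id: (x, y)} offset of each camera's area
    Detections closer together than merge_radius_m are treated as the
    same person seen in an overlap region and are averaged.
    """
    merged = []  # entries are (x, y, number_of_contributing_detections)
    for camera_id, detections in per_camera.items():
        for gx, gy in to_global(detections, origins[camera_id]):
            for i, (mx, my, n) in enumerate(merged):
                if math.hypot(gx - mx, gy - my) <= merge_radius_m:
                    merged[i] = ((mx * n + gx) / (n + 1),
                                 (my * n + gy) / (n + 1),
                                 n + 1)
                    break
            else:
                merged.append((gx, gy, 1))
    return [(x, y) for x, y, _ in merged]


# Two 7 m wide areas placed 6 m apart, i.e. with a 1 m overlap (assumed layout).
origins = {"cam_a": (0.0, 0.0), "cam_b": (6.0, 0.0)}
per_camera = {
    "cam_a": [(6.5, 3.0), (2.0, 2.0)],
    "cam_b": [(0.6, 3.1), (4.0, 1.0)],
}
people = merge_detections(per_camera, origins)
```

In this example the person standing in the overlap strip is seen by both cameras but appears only once in the merged result.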
The 'OpenVino' accelerator
Intel's OpenVino (Open Visual Inference and Neural Network Optimization) toolkit is a software framework for accelerating deep learning models in computer vision applications. It optimizes pre-trained AI models for efficient execution on Intel hardware, including CPUs, integrated GPUs, FPGAs and dedicated accelerators.
In the case of the AM1 system, the toolkit acts as an accelerator by making the human recognition and motion interpretation models run faster and more efficiently on standard computing platforms. This significantly reduces the inference time (i.e. the time it takes to process new images and generate recognition results), which is crucial for real-time applications in industrial environments. Essentially, OpenVino helps the AM1 system deliver fast, reliable and accurate human recognition without the need for high-end or specialized hardware.
This enables the system to quickly convert raw data into useful information. 'Fast' in this context refers to the system's ability to process image data and detect the presence or posture of people in real or near real time. The AM1 system can detect and analyze human positions with minimal delay, usually within milliseconds, so that it can react immediately to changes in the environment.
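To make "fast" measurable, the per-frame processing time can be compared with the time available between two frames. At ten frames per second, that budget is 100 ms. The sketch below is a generic timing harness under that assumption. The `infer` argument stands in for whatever detection function is used. It does not call the AM1 software or the OpenVino API, and the function names are hypothetical.

```python
import time

FRAME_BUDGET_S = 0.1  # 100 ms per frame at ten frames per second


def measure_latency(infer, frames):
    """Run infer() on every frame and return the per-frame latency in seconds."""
    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        latencies.append(time.perf_counter() - start)
    return latencies


def within_budget(latencies, budget_s=FRAME_BUDGET_S):
    """True if every frame was processed before the next one arrived."""
    return all(latency <= budget_s for latency in latencies)


def worst_case_ms(latencies):
    """Slowest frame in milliseconds, the figure that matters for real time."""
    return max(latencies) * 1000.0


# Stand-in workload: a trivial function instead of a real detection model.
sample_latencies = measure_latency(lambda frame: sum(frame), [[1, 2, 3]] * 5)
```

Checking the slowest frame rather than the average is a deliberate choice here: a single frame that overruns the budget is enough to delay the response to a change in the scene.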
For industrial applications, this level of speed is essential to ensure smooth operation, avoid bottlenecks and support safety protocols. For example, if the system is used to monitor the presence of employees in a hazardous area or to optimize the efficiency of workflows, it must detect and respond to human movement without noticeable delay. Once processed, the information is relayed to a standard PC or PLC, where it is available to human operators.
Thanks to Omron's extensive data library, built up over years of developing image processing solutions, the system does not need to be trained on specific people and can recognize any human body type. Users therefore need no special programming knowledge.