Artificial Intelligence for the Physical Space

Melanie Steinbeck

Alibaba Expands Qwen Model Family with a Robotics Suite for Physical AI

Alibaba is expanding its Qwen model family with a robotics suite for Physical AI. The three models—for manipulation, navigation, and simulation—are designed to assist autonomous robots with complex tasks in real-world environments.

Illustrative image © IM Imaginary/stock.adobe.com

Today, artificial intelligence can write text, analyze images, and carry on conversations. Things get more challenging when it’s expected not only to think but also to act. It is precisely at this intersection between the digital and physical worlds that Alibaba is focusing its efforts with its new Qwen Robot Suite. The expansion of the Qwen model family is designed to help robots perceive their environment, make decisions, and perform tasks independently.

The suite comprises three core models: Qwen-RobotManip for manipulation tasks, Qwen-RobotNav for navigation, and Qwen-RobotWorld, a so-called video world model for embodied intelligence.

When AI Gets Its Hands Dirty

Large multimodal models are now impressively capable of processing text, speech, and images. However, translating these capabilities into precise physical actions remains one of the greatest challenges in robotics. This is because robots must not only understand voice commands but also translate them into concrete movements, perceive unfamiliar environments, and handle objects they have never seen before.

According to Alibaba, Qwen-RobotManip—codenamed Lira and Atlas—achieved top results in RoboChallenge, a benchmark for embodied intelligence using real robots. © Alibaba

That is exactly why the Qwen Robot Suite was developed. The models are designed to help real robots—from industrial robot arms to delivery robots to robotic dogs—perceive their environment in real time, make decisions, and carry out actions. Their ability to generalize is particularly important here: they should be able to handle new tasks, new locations, or new objects without requiring extensive adjustments.

See, Understand, Act

According to Alibaba, the three models achieve industry-leading results on dozens of recognized robotics benchmarks. These include RoboChallenge, a large-scale benchmark for embodied intelligence using real robots. Selected enterprise customers of Alibaba Cloud in the robotics industry are already testing the Qwen Robot Suite in pilot projects.

Qwen-RobotManip, codenamed Lira and Atlas, is based on Qwen3.5-4B VL and was trained using more than 38,000 hours of open-source data. The training data comes from sources including robotics repositories, videos of human manipulation tasks, and synthetically generated human-to-robot datasets. According to Alibaba, the model improves on the previous state of the art in transfer learning between different robot platforms by a factor of three. As a result, it can be deployed on various robot hardware with minimal retraining.

Qwen-RobotNav is based on Qwen3-VL and was trained on 15.6 million curated examples. The data covers trajectory planning as well as vision-language reasoning. The model serves both as a scalable navigation engine and as a unified interface for agent-based navigation systems. This makes it particularly well suited for systems that handle long-horizon tasks such as embodied question answering, in which a robot answers questions about its environment, for example where a specific object was placed.

Based on current observations, Qwen-RobotWorld predicts physically plausible future visual motion sequences. Alibaba trained the model using 8.6 million video-text pairs. These encompass more than 200 million frames, over 20 embodiment types, and 500 action categories. The model can generate synthetic video training data for robots and enable systems to simulate future motion sequences before execution. This capability is particularly well-suited for robotic manipulation, embodied planning, and complex indoor navigation.
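The idea of simulating motion sequences before execution can be illustrated with a minimal sketch. It does not use Alibaba's models or any published Qwen-RobotWorld interface. The function `toy_world_model`, the grid-cell state, and the action names are all hypothetical stand-ins, and a simple grid update takes the place of the learned video prediction. What the sketch shows is the control pattern only: roll out each candidate plan in the model first, discard plans whose predicted future contains a collision, and execute the best remaining one.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple

State = Tuple[int, int]  # (x, y) grid cell; a toy stand-in for a camera observation

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


@dataclass(frozen=True)
class Rollout:
    plan: Tuple[str, ...]
    states: Tuple[State, ...]  # predicted future states, one per simulated action
    collided: bool


def toy_world_model(state: State, plan: Sequence[str],
                    obstacles: FrozenSet[State]) -> Rollout:
    """Hypothetical stand-in for a learned world model: predict the states a plan leads to."""
    x, y = state
    states = []
    collided = False
    for action in plan:
        dx, dy = MOVES[action]
        x, y = x + dx, y + dy
        states.append((x, y))
        if (x, y) in obstacles:
            collided = True  # stop predicting once the plan hits an obstacle
            break
    return Rollout(tuple(plan), tuple(states), collided)


def choose_plan(state: State, goal: State,
                candidates: Sequence[Sequence[str]],
                obstacles: FrozenSet[State]) -> Optional[Rollout]:
    """Simulate every candidate plan, then return the collision-free one ending nearest the goal."""
    best: Optional[Rollout] = None
    best_cost: Optional[int] = None
    for plan in candidates:
        rollout = toy_world_model(state, plan, obstacles)
        if rollout.collided or not rollout.states:
            continue
        fx, fy = rollout.states[-1]
        cost = abs(fx - goal[0]) + abs(fy - goal[1])  # Manhattan distance to the goal
        if best_cost is None or cost < best_cost:
            best, best_cost = rollout, cost
    return best


# The direct route is blocked at (1, 0), so the detour is selected.
obstacles = frozenset({(1, 0)})
candidates = [("right", "right"), ("up", "right", "right", "down")]
best = choose_plan((0, 0), (2, 0), candidates, obstacles)
print(best.plan)  # ('up', 'right', 'right', 'down')
```

In a real system the rollout would be a predicted video sequence and the collision check a learned or geometric evaluation of those frames. The selection loop around it would look much the same.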

From Chatbot to Key Player

With the Qwen Robot Suite, Alibaba is bringing its Qwen architecture from the digital world into the realm of Physical AI. The technology company is thus shifting its focus away from simple chatbots and toward autonomous agents that handle complex tasks in both the digital and physical worlds.

According to Alibaba, the Qwen Robot Suite achieves top results in several robotics evaluation benchmarks across a variety of task areas. © Alibaba

The Qwen Robot Suite is intended to lay the groundwork for turning general AI models into practical agents for the physical world. General Qwen models can work directly with the robotics models and use them as specialized tools, which is meant to bridge the gap between general intelligence and physical action.

Alibaba illustrates what this might look like with an example: “Check if someone left a green umbrella at Cotti Coffee.” For a request like this, an agent-based system can use a general Qwen model as a high-level strategic planner and leverage Qwen-RobotNav for real-time execution. The system then navigates autonomously through the physical space and provides a response based on concrete observations.
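The division of labor in this example can be sketched in a few lines. The sketch is an illustration under stated assumptions, not Alibaba's implementation: `toy_planner` is a hypothetical stand-in for the general Qwen model acting as planner, `ToyNavTool` is a hypothetical stand-in for a navigation model exposed as a tool, and the "world" is a plain dictionary rather than a physical space. It shows a planner breaking a request into steps, a tool executing them, and an answer derived from what was observed at the destination.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyNavTool:
    """Hypothetical stand-in for a navigation model that the planner calls as a tool."""
    world: Dict[str, Set[str]]  # place name -> objects visible at that place
    location: str = "dock"
    log: List[str] = field(default_factory=list)

    def go_to(self, place: str) -> bool:
        if place not in self.world:
            self.log.append(f"cannot reach {place}")
            return False
        self.location = place
        self.log.append(f"arrived at {place}")
        return True

    def observe(self) -> Set[str]:
        # Return what is visible at the current location.
        return self.world.get(self.location, set())


def toy_planner(target_object: str, place: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for the general model: turn a request into tool steps."""
    return [("go_to", place), ("look_for", target_object)]


def run_request(target_object: str, place: str, nav: ToyNavTool) -> str:
    """Execute the planned steps and answer from concrete observations."""
    for step, arg in toy_planner(target_object, place):
        if step == "go_to":
            if not nav.go_to(arg):
                return f"Could not reach {arg}."
        elif step == "look_for":
            if arg in nav.observe():
                return f"Yes, a {arg} is at {nav.location}."
            return f"No {arg} seen at {nav.location}."
    return "No answer."


nav = ToyNavTool(world={"dock": set(), "Cotti Coffee": {"green umbrella", "chair"}})
print(run_request("green umbrella", "Cotti Coffee", nav))
# Yes, a green umbrella is at Cotti Coffee.
```

Keeping the planner and the navigation tool behind separate interfaces means either side could be swapped out without touching the other, which matches the tool-use arrangement the article describes.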

Looking ahead, Alibaba plans to integrate the Qwen Robot Suite into a broader ecosystem of physical agents. These agents are designed to autonomously perceive their surroundings, make spatial decisions, and perform long-term tasks in dynamic real-world environments.
