A mechanical hand is on display at Robot Mall, the world’s first Embodied Intelligent Robot 4S store, on August 13, 2025 in Beijing, China.
Video Visual China Group | Getty Images
Beijing — Alibaba Cloud is investing in new types of artificial intelligence designed to better replicate the real world, taking a different approach from chatbots such as OpenAI’s ChatGPT.
This shift recognizes the limitations of large language models trained primarily on text. Instead, developers are starting to focus on “world models” built from videos and real-world physical scenarios.
To capitalize on this trend, Alibaba has led a 2 billion yuan ($290 million) investment in ShengShu, a startup developing the AI video generation tool Vidu, the company announced on Friday. TAL Education and Baidu Ventures also participated in the Series B funding round.
The investment comes about two months after ShengShu raised 600 million yuan from Qiming Venture Partners and other backers. The company declined to disclose its valuation.
ShengShu said the funding will support the development of a “general world model” that uses AI to bridge two currently separate realms: the digital world of games and AI-generated videos and the physical world of self-driving cars and robots.
“ShengShu believes that general world models built on multimodal data such as vision, sound, and touch provide a more natural picture of how the physical world works than large language models,” the three-year-old startup said in a statement.

“We aim to connect perception and behavior,” ShengShu founder Zhu Jun added in the statement, saying this would allow AI systems to model and predict real-world behavior more consistently.
According to Artificial Analysis, ShengShu’s latest Vidu Q3 Pro model, released in January, ranks among the top 10 AI models that generate videos from text and images.
The company launched Vidu globally a few months before OpenAI’s now-shuttered Sora tool for AI video generation became widely available. Chinese short-video company Kuaishou and ByteDance have also released competing AI tools for generating videos.
World model contest
Alibaba is increasing its investment in related startups.
Last month, the Chinese tech giant and Baidu Ventures led a $50 million investment in Tripo AI, a platform that uses AI to quickly generate digital 3D models from photos. Tripo also said it is moving away from the technology used in language models toward AI tools based on physical space, and is developing its own world models.
In September, Alibaba also led a $60 million investment in PixVerse, which earlier this year released an AI world model that allows users to dictate how videos unfold during generation.
Alibaba, which got its start in e-commerce, also released a free open-source AI model for video generation, and in February released an AI model for powering robots.
ShengShu announced on Friday that it has entered into strategic partnerships with companies that develop embodied AI (systems such as humanoid robots that interact with the physical world) for use across industrial, commercial and home environments.
Kevin Kelly, co-founder of US technology magazine Wired, wrote on his Substack last month:
Ultimately, Kelly said, AI will need three things to replicate human intelligence: reasoning, understanding the physical world, and continuous learning. He said that while AI capable of continuous learning has not yet been developed, chatbots powered by LLMs supply the knowledge component, and world models are an important area in need of breakthroughs.
