Google DeepMind shared a research preview of SIMA 2 on Thursday. SIMA 2 is a next-generation generalist AI agent that integrates the language and reasoning capabilities of Gemini, Google's large language model, allowing it to understand and interact with its environment rather than merely follow instructions.
Like many of DeepMind's projects, including AlphaFold, SIMA was trained on large amounts of data: the first version learned from hundreds of hours of video game footage to play multiple 3D games the way a human would, including games it hadn't been trained on. SIMA 1, announced in March 2024, could follow basic instructions across a wide range of virtual environments, but it completed complex tasks only 31% of the time, compared with 71% for humans.
“SIMA 2 has significant changes and improvements in functionality compared to SIMA 1,” Joe Marino, a senior researcher at DeepMind, said in a press conference. “This is a more general agent. It can complete complex tasks in environments it’s never seen before. It’s also a self-improving agent, so it can actually improve itself based on its own experience. This is a step toward more general-purpose robots and AGI systems.”

SIMA 2 is powered by the Gemini 2.5 Flash-Lite model. AGI stands for artificial general intelligence, which DeepMind defines as a system capable of performing a wide range of intellectual tasks, with the ability to learn new skills and generalize knowledge across different domains.
DeepMind researchers say working with so-called embodied agents is crucial for progress toward general intelligence. Marino explained that an embodied agent interacts with a physical or virtual world through a body, observing inputs and performing actions the way a robot or a human would, whereas a disembodied agent might manage a calendar, take notes, or run code.
Jane Wang, a senior researcher at DeepMind with a background in neuroscience, told TechCrunch that SIMA 2 goes far beyond gameplay.
“We’re asking them to understand what’s actually going on, what the user wants, and be able to respond in a common sense way, which is actually very difficult,” Wang said.
By integrating Gemini, SIMA 2 roughly doubled the performance of its predecessor, combining Gemini's advanced language and reasoning abilities with the embodied skills developed through training.

Marino demonstrated SIMA 2 in "No Man's Sky," where the agent described the surface of the rocky planet around it and worked out its next steps by identifying and signaling a distress beacon. SIMA 2 also uses Gemini for internal reasoning. In another game, when asked to walk to the house the color of a ripe tomato, the agent reasoned that ripe tomatoes are red, so it should go to the red house, then found the house and approached it.
Being powered by Gemini also means SIMA 2 can follow instructions given in emoji. "If you tell it 🪓🌲, it will go chop down a tree," Marino said.
Marino also demonstrated how SIMA 2 navigates a photorealistic world newly generated by Genie, DeepMind's world model, correctly identifying and interacting with objects such as benches, trees, and butterflies.

Gemini also enables self-improvement without requiring much human data, Marino added. While SIMA 1 was trained entirely on human gameplay, SIMA 2 uses that data only as a baseline for a strong initial model. When the team places the agent in a new environment, a separate Gemini model generates new tasks for it, and another reward model scores the agent's attempts. Using these self-generated experiences as training data, the agent learns from its mistakes and gradually improves, teaching itself new behaviors through trial and error just as humans do, guided by AI-based feedback instead of human feedback.
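DeepMind hasn't published code or implementation details for this loop, but the process Marino describes maps onto a familiar pattern of training on AI-scored, self-generated experience. Here is a minimal sketch in Python, where agent, env, task_model, and reward_model are hypothetical stand-ins for components DeepMind hasn't detailed:

```python
# Illustrative sketch of the self-improvement loop described above.
# All object names and methods are assumptions for illustration only;
# they are not DeepMind APIs.

def self_improvement_loop(agent, env, task_model, reward_model,
                          iterations=1000, batch_size=64):
    experience_buffer = []
    for _ in range(iterations):
        # A separate Gemini-style model proposes a new task
        # suited to the current environment.
        task = task_model.generate_task(env.describe())

        # The agent attempts the task, producing a trajectory of
        # (observation, action) steps, much like a human playthrough.
        trajectory = agent.attempt(env, task)

        # Another model scores the attempt, standing in for
        # human feedback.
        score = reward_model.score(task, trajectory)

        # The scored, self-generated experience becomes training data.
        experience_buffer.append((task, trajectory, score))

        # Periodically fine-tune the agent on its own scored attempts,
        # so it improves through trial and error.
        if len(experience_buffer) >= batch_size:
            agent.update(experience_buffer)
            experience_buffer.clear()
```

The key design point in this pattern is that human gameplay only bootstraps the initial model; after that, task generation, evaluation, and learning all run without new human data.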
DeepMind sees SIMA 2 as a step toward more versatile, general-purpose robots.
"When you think about what a system such as a robot needs to do to perform a task in the real world, I think there are two components," Frédéric Besse, a senior staff research engineer at DeepMind, said in a press conference. "First, it needs a sophisticated understanding of the real world and of what needs to be done, as well as some amount of reasoning."
If you ask a humanoid robot at home to check how many cans of beans are in the cupboard, the system must understand all of those concepts (what beans are, what a cupboard is) and navigate to the right place. Besse said SIMA 2 focuses on that kind of high-level behavior rather than on low-level actions, such as controlling physical joints and wheels.
The team declined to share a specific timeline for bringing SIMA 2 to physical robotic systems. Besse told TechCrunch that DeepMind's recently unveiled robotics foundation model, which can reason about the physical world and even create multi-step plans to complete a mission, was trained separately and in a different way from SIMA.
There’s also no timeline for a release beyond SIMA 2’s preview, but Wang told TechCrunch that the goal is to show the world what DeepMind has been working on and see what kinds of collaborations and potential uses are possible.
