Microsoft built a fake marketplace to test its AI agent - and it failed in a surprising way

On Wednesday, Microsoft researchers released a new simulation environment designed to test AI agents, along with new research showing that current agent models may be vulnerable to manipulation. The study, conducted in collaboration with Arizona State University, raises new questions about how well AI agents perform when working unsupervised, and how quickly AI companies can deliver on the promise of an agentic future.

The simulation environment, which Microsoft calls “Magentic Marketplace,” is built as a synthetic platform for experimenting with AI agent behavior. In a typical experiment, a customer agent might try to order dinner according to a user’s instructions, while agents representing different restaurants compete to win the order.

The team’s first experiment involved 100 individual customer-side agents interacting with 300 business-side agents. Because the Marketplace source code is open source, it is easy for other groups to adapt the code to run new experiments and reproduce the results.

Ece Kamar, managing director of Microsoft Research’s AI Frontiers Lab, said this type of research will be important for understanding the capabilities of AI agents. “There are real questions about how the world changes when these agents work together and talk to each other and negotiate with each other,” Kamar said. “We want to understand these things deeply.”

The initial research examined a mix of leading models, including GPT-4o, GPT-5, and Gemini-2.5-Flash, and found some surprising weaknesses. In particular, the researchers identified several techniques that businesses could use to manipulate customer agents into buying their products. They also found that efficiency fell off as a customer agent was given more options to choose from, overwhelming the agent's attention space.

“We want these agents to help us work through a lot of options,” Kamar says. “And we find that the current models are actually overwhelmed by too many options.”

The agents also ran into trouble when asked to work together toward a common goal, apparently unsure which agent should play which role in the collaboration. Performance improved when the models were given more explicit instructions on how to collaborate, but the researchers still saw the models’ inherent capabilities as needing improvement.

“You can instruct a model step-by-step, just like you would teach a model,” Kamar says. “But if you’re essentially testing collaborative features, you would expect these models to have those features by default.”
