At first glance, the much-discussed X post from AI security researcher Summer Yue reads like satire. She had instructed the OpenClaw AI agent to review her overcrowded email inbox and suggest what to delete or archive.
Instead, the agent went on a rampage. Ignoring the cease-and-desist orders she sent from her phone, it began "speedrunning" the deletion of all her emails.
"I had to run to my Mac mini like I was defusing a bomb," she wrote, posting a screenshot of the ignored stop prompt as a receipt.
The Mac mini is an affordable Apple desktop computer that sits flat on a desk, fits in the palm of your hand, and has become a popular device for running OpenClaw. (The Mini is selling "like hotcakes," one "bewildered" Apple employee told prominent AI researcher Andrej Karpathy when he bought one to run an OpenClaw alternative called NanoClaw.)
OpenClaw is, of course, the open-source AI agent that rose to fame through its AI-only social network, Moltbook. The OpenClaw agent was a central figure in the now largely debunked Moltbook episode, in which it appeared as though AIs were conspiring against humans.
But OpenClaw's mission is not social networking, according to its GitHub page; the project aims to be a personal AI assistant that runs on your own device.
The Silicon Valley crowd went crazy for OpenClaw, and "claw" became a buzzword for agents running on personal hardware. Other such agents include ZeroClaw, IronClaw, and PicoClaw. Y Combinator's podcast team appeared in its latest episode wearing lobster costumes.
But Yue's post serves as a warning. As other X users have pointed out, if AI security researchers can run into this problem, what hope is there for the rest of us?
"Did you intentionally test the guardrails, or did you make a rookie mistake?" a software developer asked her on X.
"It was a rookie mistake," she replied. She had been testing the agent on what she called a small "toy" inbox of less important emails, where it behaved well. Having earned her trust, she decided to let it loose on the real thing.
Yue wrote that she believes the sheer volume of data in her real inbox triggered "compaction." Compaction kicks in when the context window (the running record of everything the AI has said and done in a session) grows so large that the agent begins summarizing and compressing the conversation to keep it manageable.

At that point, the AI can drop instructions that a human would consider critically important.

In this case, the agent may have discarded the most recent prompt telling it not to take action and reverted to the instructions from her "toy" inbox run.
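To make the failure mode concrete, here is a deliberately naive sketch (not OpenClaw's actual code; the message contents and limit are invented for illustration) of how a compaction pass that summarizes older context can silently erase a late stop order:

```python
# Toy compaction: when the transcript exceeds a budget, collapse everything
# between the first message and the latest one into a lossy summary.
MAX_CONTEXT_CHARS = 120  # artificially small limit for demonstration

def compact(history):
    """Return the history, summarizing the middle if it is over budget."""
    if sum(len(m) for m in history) <= MAX_CONTEXT_CHARS:
        return history
    # Naive strategy: keep the first and last messages, summarize the rest.
    # Any instruction buried in the middle simply vanishes.
    summary = "Summary: user asked for inbox triage; many emails reviewed."
    return [history[0], summary, history[-1]]

history = [
    "SYSTEM: suggest deletions, never delete without approval.",
    "USER: here are 500 emails...",
    "USER: STOP. Do not delete anything.",   # the critical instruction
    "AGENT: reviewing batch 37 of 50...",
]
compacted = compact(history)
print("STOP" in " ".join(compacted))  # prints False: the stop order is gone
```

The point is not this particular strategy but the general hazard: any lossy summarization step gets to decide which instructions survive, and it can make that call wrong.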
As several others on X have pointed out, you can't trust prompts to act as security guardrails; the model can misunderstand or ignore them.
Various people offered suggestions, ranging from the exact syntax Yue should have used to stop the agent, to ways of making guardrails actually stick, such as writing the instructions to a dedicated file or using other open-source tools.
In the interest of full transparency: TechCrunch could not independently verify what happened to Yue's inbox. (She did not respond to our request for comment, though she answered many questions and comments on X.)
But that doesn’t really matter.
The gist of this story is that agents aimed at knowledge workers are, at their current stage of development, dangerous. Those who say they use them productively have cobbled together their own ways of protecting themselves.
Someday, perhaps soon (2027? 2028?), such agents may be safe enough for everyone. We know many of us want help with things like email, grocery orders, and dentist appointments. But that day has not yet arrived.
