Cybersecurity researchers aren't satisfied with Anthropic's Fable guardrails

Anthropic released its latest model, Fable, on Tuesday, pitching it as a publicly available, limited version of Mythos, its powerful and much-hyped cybersecurity model.

However, not everyone is happy with Fable's restrictions, and many cybersecurity researchers and experts have voiced their complaints online.

“[Fable] denies any request that might have something to do with cyber, even something as innocuous as reading a blog post,” said Valentina “Chompy” Palmiotti, a prominent security researcher who works at IBM X-Force.

If a prompt triggers a guardrail, Fable pauses the chat and displays the message: “Due to safety precautions, this message has been flagged as a cybersecurity or biology topic.”

The guardrails were put in place to limit the risk of Fable being used to develop malware or compromise software, a long-standing concern within Anthropic. Restrictions on biology stem from similar concerns regarding the development of biological weapons.

When the AI giant released Mythos in April, it restricted access to a small number of businesses and organizations through Project Glasswing, an effort to deploy the model to protect critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.

But despite the good intentions, many cybersecurity experts remain uncomfortable with how haphazardly the restrictions are applied. “If you ask them to write secure code, they’ll treat it as cybersecurity work rather than software engineering best practice, and they’ll downgrade it,” cybersecurity veteran Matt Suiche told TechCrunch. Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail. “It seems to be keyword-based, so anything in the cybersecurity vocabulary will trigger the guardrails.”


Do you have more information about how hackers are using AI, or how cybersecurity companies are leveraging it? We’d love to hear from you. You can contact Lorenzo Franceschi-Bicchierai securely from any non-work device or network on Signal (+1 917 257 1382), Telegram and Keybase @lorenzofb, or email.

“But we’re still in the early stages and they’re still adapting the guardrails, so that’s understandable. I’m sure it will evolve over time as Anthropic and other frontier model companies collaborate more with today’s new generation of cybersecurity companies,” said Suiche, who is a member of the technical staff at AI cybersecurity startup Tolmo. “When you cast a net like this, it’s better to catch too much and loosen the guardrails over time than to not catch enough.”

Another researcher complained on X that “even asking for a code review” would trigger Fable’s guardrails.

Anthropic did not immediately respond to a request for comment.

Aside from the guardrails built into its models, Anthropic also requires cybersecurity professionals who want fewer restrictions to apply to a cyber validation program. Approved applicants face fewer restrictions when using Claude for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.
