With agentic features becoming a staple among foundation model companies, Anthropic is releasing Claude Sonnet 5, a more powerful, more agentic version of the lab’s mid-sized model.
“It can plan, use tools like browsers and devices, and execute autonomously at a level that required larger, more expensive models just a few months ago,” Anthropic said in a blog post.
This framing mirrors what OpenAI and Google have said about their recent releases. OpenAI’s GPT-5.6 Sol, released in preview last week, is also the company’s most agentic model to date, letting users split work among subagents for long-running autonomous tasks. Google’s Gemini 3.5 Flash, launched in May, was pitched as a shift from conversational chatbots to agentic tools that plan, build, and iterate on real work with minimal human input.
The Sonnet 5 pitch confirms that agentic capabilities are now the baseline expectation at every price point. The differentiator will no longer be who can perform agentic work best, but how cheaply and reliably that work can be done without human oversight.
Sonnet 5 promises performance close to that of Opus 4.8, but at a much lower cost. Starting Tuesday, Claude Sonnet 5 will be the default model for Free and Pro plans and will be available across all subscriptions.
At launch, Sonnet 5 will be priced at $2 per million input tokens and $10 per million output tokens until August 31st, after which the price will rise to $3 per million input tokens and $15 per million output tokens. That makes Sonnet 5 cheaper than Opus 4.8, as well as OpenAI’s GPT-5.5 and Google’s Gemini 3.1 Pro, though still more expensive than Gemini 3.5 Flash.
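To make the price change concrete, here is a minimal sketch of a cost estimator using the per-million-token rates reported above. It assumes billing is strictly linear in token counts and ignores any discounts or other pricing factors the article does not mention; the `Pricing` class, the `cost_usd` helper, and the 5-million-input / 1-million-output workload are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical container for per-million-token prices, in USD."""
    input_per_million: float
    output_per_million: float


# Rates as reported in the article: launch pricing through August 31st,
# then the higher standard rate. Billing is assumed to be linear.
LAUNCH = Pricing(input_per_million=2.0, output_per_million=10.0)
STANDARD = Pricing(input_per_million=3.0, output_per_million=15.0)


def cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a workload under the given pricing (hypothetical helper)."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_million
        + output_tokens / 1_000_000 * pricing.output_per_million
    )


if __name__ == "__main__":
    # Illustrative workload only: 5M input tokens and 1M output tokens.
    inp, out = 5_000_000, 1_000_000
    launch = cost_usd(LAUNCH, inp, out)
    standard = cost_usd(STANDARD, inp, out)
    print(f"Launch pricing:   ${launch:.2f}")    # $20.00
    print(f"Standard pricing: ${standard:.2f}")  # $30.00
    print(f"Increase: {standard / launch - 1:.0%}")  # 50%
```

Because both rates rise by the same 50 percent ($2 to $3 and $10 to $15), any mix of input and output tokens would cost 1.5 times as much after August 31st under these assumptions.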
Anthropic says the new model offers significant improvements over its predecessor, Sonnet 4.6, released in February, in agentic tasks including reasoning, tool use, coding, and knowledge work.
On one agentic coding benchmark, for example, Sonnet 5 scored 63.2%, compared with 69.2% for Opus 4.8 and 58.1% for Sonnet 4.6. On a knowledge-work benchmark, Sonnet 5 actually slightly outperformed Opus 4.8. Opus 4.8 is known for handling the most difficult problems, including those that require nuanced judgment and in-depth research.
“While Opus 4.8 remains the model of choice for higher accuracy in these tasks, Sonnet 5 provides developers with a much higher quality and lower cost option than was previously available,” says Anthropic. “Between Sonnet 5 and Opus 4.8, users can adjust their level of effort to find the right balance between cost and performance.”
According to a tester cited in the blog post, Sonnet 5 is also good at completing complex tasks on which earlier versions would get stuck, “checking its own output without being explicitly asked to do so.”
“We gave Claude Sonnet 5 a two-part job of updating Salesforce account tiers and sending launch notifications to company contacts, and it completed it end-to-end,” Daniel Shepherd, senior engineer at Zapier, said in a statement. “Before, it would stop in the middle. For routine automation, this makes things easy.”
On safety, Sonnet 5 also shows a lower rate of “undesirable behavior,” such as exploitation and complicity in deception, than previous generations, making it safer to use in agentic contexts. It is good at refusing malicious requests and resisting hijacking attempts in prompt injection attacks. It also hallucinates and behaves sycophantically less often than Sonnet 4.6.
That said, it does not match Opus 4.8 or Claude Mythos Preview on measures of inconsistent behavior. “Evaluations also show that it is far less capable of performing risky cybersecurity tasks than the current Opus model,” the blog post reads.
Lovable co-founder Fabian Hedin said in a statement that Claude Sonnet 5 “clearly and consistently rejects insecure requests.”
“At Lovable, we put a powerful tool in the hands of millions of builders,” Hedin said. “A model that knows when to say no is just as important as a model that knows how to build.”
Updated to correct that, starting August 31st, output tokens will be priced at $15 per million.
