Last month, I wrote about Mercor’s new benchmark measuring how well AI agents handle specialized professional tasks such as law and corporate analysis. The scores at the time were pretty dire, with every major lab scoring below 25%. The conclusion then was that lawyers were safe from being displaced by AI, at least for now.
But AI capabilities can change significantly in a matter of weeks.
This week’s release of Opus 4.6 shook up the leaderboard: Anthropic’s new model scored just under 30% in one-shot trials and averaged 45% when given several attempts at each problem. Notably, the release includes a number of new agentic features, including “agent swarms,” which can be useful for this type of multi-step problem solving.
Either way, the score is a big jump from the previous state of the art and suggests that progress in the underlying models is not slowing down. Mercor CEO Brendan Foody was particularly impressed, saying: “To go from 18.4% to 29.8% in a matter of months is insane.”

Thirty percent is still a long way from 100%, so lawyers don’t have to worry about being replaced by machines next week. But they should be far less confident than they were last month.
