Anthropic must keep revising its technical interview test so candidates can't cheat with Claude

Since 2024, Anthropic’s performance optimization team has given job candidates a take-home test to check their knowledge. But as AI coding tools have improved, the test has had to change significantly to stay ahead of AI-assisted cheating.

Team leader Tristan Hume explained the history of the challenge in a blog post Wednesday. “Each time a new Claude model appeared, the tests had to be redesigned,” Hume writes. “Given the same time limit, Claude Opus 4 outperformed most human applicants. It was still able to distinguish the strongest candidates, but then Claude Opus 4.5 even matched those applicants.”

This creates a serious problem for evaluating candidates. Without in-person proctoring, there is no way to tell whether someone is using AI to cheat on the test, and anyone who does will quickly rise to the top. “Under the constraints of the take-home test,” Hume writes, “there was no longer any way to distinguish between the accomplishments of the best candidates and the most competent models.”

AI cheating is already causing havoc in schools and universities around the world, so it’s ironic that AI labs now have to deal with it too. But Anthropic is also uniquely equipped to address the problem.

Ultimately, Hume designed a new test that had less to do with hardware optimization and was novel enough to stump current AI tools. As part of the post, he also shared the original test to see whether any reader could come up with a better solution.

“If you can best Opus 4.5, we’d love to hear from you,” the post reads.
