Since 2024, Anthropic’s performance optimization team has given job applicants a take-home test to make sure they know their stuff. But as AI coding tools have improved, the test has had to change a lot to stay ahead of AI-assisted cheating.
Team lead Tristan Hume describes the history of the challenge in a blog post published Wednesday. “Each new Claude model has forced us to redesign the test,” Hume wrote. “Claude Opus 4 outperforms most human applicants given the same time limit. It still allows us to distinguish strong candidates, but Claude Opus 4.5 matches them.”
The result is a serious candidate-evaluation problem. Without in-person proctoring, there’s no way to be sure an applicant isn’t using AI to cheat on the test, and anyone who does will quickly score near the top. “Within the constraints of a take-home test, we no longer had a way to distinguish the output of our top candidates from that of our most capable models,” Hume wrote.
AI-assisted cheating is already wreaking havoc on schools and universities around the world, so it’s fitting, if ironic, that even AI labs now have to contend with it. But Anthropic is uniquely equipped to deal with the problem.
In the end, Hume designed a new test, one less focused on hardware optimization, that is novel enough to stump current AI tools. As part of the post, though, he shared the original challenge to see whether any readers can come up with a better solution.
“If you can beat Opus 4.5,” the post reads, “we want to hear from you.”