Fable 5 just set a new performance record for AI automation – but it can’t replace humans just yet

ZDNET's key takeaways
- Fable 5 raises the AI success rate on remote work tasks to 16%.
- AI skills are all over the map.
- Still, agents' capabilities "quadrupled in less than eight months," CAIS said.
After a short hiatus, Anthropic's Fable 5 model is back, and it has reset the bar for automation performance.
The US government also approved the model (which Anthropic says shares similarities with Mythos 5, still available only to select organizations) on June 30. Before that release, the Center for AI Safety (CAIS) tested Fable 5 on the Remote Labor Index (RLI), a benchmark released in October 2025. It blew Opus 4.8 and GPT-5.5, both relatively new and highly regarded, out of the water.
RLI measures "how often AI agents can complete independent, economically significant projects […] with a quality that a paying client would really accept," CAIS explains in the study. Projects can include computer-aided design and photography, data analysis, video work, and more. As with similar tests of human skills, each deliverable from the tested models is evaluated by humans against the level of expertise a professional would deliver. The resulting automation rate reflects the share of projects where evaluators found that the AI produced work as good as or better than a trained human's.
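As a rough illustration of how such a rate is computed, the sketch below takes one accept/reject verdict per project and returns the percentage accepted. The function name and the sample verdicts are hypothetical and are not taken from CAIS's tooling or data.

```python
from typing import Sequence


def automation_rate(accepted: Sequence[bool]) -> float:
    """Return the percentage of projects whose AI deliverable was accepted.

    Each entry is one project's verdict: True if evaluators judged the AI's
    work at least as good as the human-made deliverable, False otherwise.
    """
    if not accepted:
        raise ValueError("need at least one evaluated project")
    return 100.0 * sum(accepted) / len(accepted)


# Hypothetical verdicts for five projects (illustration only).
verdicts = [True, False, False, False, False]
print(f"{automation_rate(verdicts):.1f}%")  # 20.0%
```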
CAIS asked Fable 5, GPT-5.5, and Opus 4.8 to design a 3D illustration of an engagement ring, create a video ad, and map a floor plan, among other exercises. The researchers provided each model with human-generated input files to get it started, much as a client would brief a human freelancer with the appropriate documentation and job information.
Fable 5 achieved an automation rate of 16.1%, a benchmark record and roughly double that of Opus 4.8, which scored 8.3%. GPT-5.5 came in third with 6.3%, but CAIS noted that all three models scored higher than any model tested before them.
For context, CAIS said, the previous published leader sits at 4.17% (Opus 4.6 with the Claude Cowork scaffold), and the top score stood at 2.5% at the time of the RLI release. The top score has "more than quadrupled in less than eight months," CAIS said, calling that a tangible sign of how fast AI agents are advancing.
Figure: Automation rates as measured by CAIS on its RLI benchmark. (Image: CAIS)
CAIS noted that its tests were interrupted by the government's shutdown of Fable 5 in mid-June, but that even these partial results set the model apart.
"Even under the worst-case scenario that Fable 5 failed all the lost projects, its [automation] rate would still be 14.6%, higher than any other model," the researchers said.
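That worst-case figure comes from counting every project the model never got to attempt as a failure. A minimal sketch of the arithmetic follows. The project counts are hypothetical, chosen only because they reproduce the reported 16.1% and 14.6%, and they are not figures stated by CAIS here.

```python
def worst_case_rate(passed: int, evaluated: int, total: int) -> float:
    """Lower bound on the automation rate, as a percentage.

    Projects that were never evaluated (total - evaluated) are all counted
    as failures, so the denominator is the full project count.
    """
    if total <= 0 or not 0 <= passed <= evaluated <= total:
        raise ValueError("expected 0 <= passed <= evaluated <= total, total > 0")
    return 100.0 * passed / total


# Hypothetical counts (assumption for illustration, not CAIS data).
PASSED, EVALUATED, TOTAL = 35, 217, 240

observed = 100.0 * PASSED / EVALUATED          # rate on evaluated projects only
lower_bound = worst_case_rate(PASSED, EVALUATED, TOTAL)

print(f"observed:   {observed:.1f}%")     # observed:   16.1%
print(f"worst case: {lower_bound:.1f}%")  # worst case: 14.6%
```

Because the bound only enlarges the denominator, it can never exceed the observed rate. A bound that still beats the runner-up's 8.3% means the ranking does not depend on the missing projects.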
What this means for freelancers
Although the models' improvement over just a few months is significant, it does not automatically translate into freelance work changing or disappearing across the board. Sixteen percent is nowhere near 100%. Besides, despite the apparent benefits, AI is not an attractive solution for every organization; security concerns and other barriers to adoption often make integrating AI tools a slow, multi-step process for many companies, at least to begin with. To fully replace freelancers, organizations would likely need a network of agents to monitor factors such as work quality, budget, and timeline; the swap is not one-to-one.
CAIS tried replacing the human evaluator with an "LLM judge," ostensibly to see how far the evaluation is from working without a human in the loop, but the model failed.
"Assessment of RLI deliverables is itself a complex, agentic task," CAIS explains. "Doing it right means opening the project files in the right applications, using those programs correctly, and making a decision the way a client would, the very computer skills that today's agents are so weak at."
That said, improving capabilities could reduce some freelance opportunities at companies that have successfully integrated AI. Furthermore, if computer-use skills are the current limitation and are poised to improve given industry investment in increasingly agentic models, that barrier may eventually disappear. Judging by how models have been progressing on other benchmarks that measure agentic ability, that may come sooner than we think.
Speaking of time: CAIS also found that a task taking a human a long time does not mean it will be difficult for AI to complete. That time-horizon relationship holds for coding, for example, but not for the broader set of remote tasks in the RLI. For now, that makes it difficult to draw conclusions about the future.
"Some quick jobs for a skilled technician are far away [for AI], such as writing music or real-time game testing, while other tasks that would take a human hours, such as digital art or coding, are completed by current models in minutes," CAIS wrote.



