
Anthropic's Claude Just Became the First AI to Autonomously Hack a Corporate Network

The UK's AI Safety Institute confirms Claude Mythos completed a 32-step cyberattack, from reconnaissance to full domain takeover. Frontier AI cyber-offense capability is now doubling every four months.

AI Learning Hub · 2 min read

The UK's AI Safety Institute dropped a finding last week that should make every security team pay attention. Anthropic's Claude Mythos Preview became the first AI model to clear "The Last Ones", a 32-step attack chain through a simulated corporate network that runs from initial reconnaissance all the way to full domain takeover.

In plain language: an AI autonomously hacked a realistic corporate environment. Not a toy CTF challenge. A simulation that typically takes a human red team about 20 hours to complete.

Mythos succeeded in 3 out of 10 runs. That might not sound high, but the benchmark wasn't designed for AI at all. It was built for human penetration testers. The fact that an AI cleared it at all — let alone in three separate runs — rewrites what we thought was possible.

The numbers that worry people

Mythos hit a 73% success rate on expert-level offensive tasks. Three weeks later, OpenAI's GPT-5.5 followed with a 71.4% rate on the same tasks. Two different labs, two different architectures, nearly identical offensive capability.

The AISI now estimates frontier cyber-offense capability is doubling every four months. At the end of 2025, they had it at seven months. The acceleration is the story, not the absolute numbers.

What does doubling every four months actually mean? If the rate holds, tasks that currently require a well-resourced threat actor will be within reach of mid-tier criminal groups sometime next year. And mid-tier groups don't have the same constraints that state actors do. They're faster, less careful, and harder to deter.
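What does that compounding look like in numbers? Here's a minimal back-of-the-envelope sketch. Only the four-month doubling time (and the earlier seven-month figure) comes from the AISI estimate above; the baseline capability index of 1.0 is an arbitrary unit chosen for illustration.

```python
# Back-of-the-envelope projection of capability growth under a fixed
# doubling time. The 4-month figure is AISI's current estimate (it was
# 7 months at the end of 2025); the 1.0 baseline is an arbitrary unit.

DOUBLING_MONTHS = 4

def capability_multiplier(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Relative capability after `months`, given a fixed doubling time."""
    return 2 ** (months / doubling_months)

for months in (4, 8, 12, 24):
    print(f"{months:2d} months: {capability_multiplier(months):5.0f}x today's baseline")

# Four-month doubling compounds to ~8x in a year and ~64x in two.
# At the old seven-month estimate, a year bought only ~3.3x.
```

That compounding, not any single benchmark score, is what moves a capability from "well-resourced threat actor" territory into mid-tier reach.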

What this doesn't mean

It doesn't mean Claude or ChatGPT will hack your laptop while you sleep. These are frontier models running in controlled test environments. The AISI benchmark simulates a corporate network, not the open internet. Real-world attacks involve messy variables — custom configurations, legacy systems, weird network topologies — that don't exist in a clean simulation.

It also doesn't mean Anthropic or OpenAI built these models to hack things. Both companies have safety teams and red-teaming processes. The capability emerged from general-purpose reasoning improvements, not from training models on offensive security tasks. That's actually the more important point: better reasoning means better hacking, whether anyone intended it or not.

The context

This finding lands in the middle of several related stories. Anthropic was excluded from new Pentagon classified AI contracts after refusing to loosen restrictions around autonomous weapons and mass surveillance. The company sued and won a temporary injunction. Meanwhile, OpenAI launched Daybreak, an AI-assisted cyber defense tool that finds vulnerabilities and generates patches.

The same capability that finds zero-days can write exploits for them. There's no firewall between offense and defense in AI. Better reasoning cuts both ways, and the doubling-every-four-months trend suggests the window between vulnerability discovery and exploitation is collapsing faster than most organizations' patch cycles.
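To make that patch-cycle comparison concrete, here's a toy model. Every number in it is an illustrative assumption, not an AISI figure: suppose the time to develop a working exploit for a disclosed vulnerability halves each time offensive capability doubles, starting from 90 days today, measured against a fixed 30-day enterprise patch cycle.

```python
# Toy model: when does time-to-exploit undercut a fixed patch cycle if it
# halves every 4 months (tracking the capability doubling time above)?
# The 90-day and 30-day starting figures are illustrative assumptions.

PATCH_CYCLE_DAYS = 30     # assumed enterprise patch cadence
HALVING_MONTHS = 4        # tracks the AISI doubling-time estimate
exploit_days = 90.0       # assumed current time-to-exploit

month = 0
while exploit_days > PATCH_CYCLE_DAYS:
    month += HALVING_MONTHS
    exploit_days /= 2
    print(f"month {month:2d}: time-to-exploit ≈ {exploit_days:.0f} days")

print(f"Crossover: exploits land inside the {PATCH_CYCLE_DAYS}-day patch window "
      f"within {month} months under these assumptions.")
```

Under those assumed numbers, unpatched systems start losing the race in well under a year. The exact crossover point matters less than the direction it's moving.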