Anthropic's Claude Just Became the First AI to Autonomously Hack a Corporate Network

The UK's AI Safety Institute confirms Claude Mythos completed a 32-step cyber attack from reconnaissance to full domain takeover. Frontier AI offense capability is now doubling every four months.

Alex Chen, 6 min read

The UK's AI Safety Institute (AISI) dropped a finding last week that should make every security team stop and read carefully. Anthropic's Claude Mythos Preview became the first AI model to clear "The Last Ones," a 32-step simulated corporate network attack that runs from initial reconnaissance all the way to full domain takeover.

For those who haven't tracked Anthropic's recent releases: Claude Mythos is the company's most advanced reasoning model, announced in April 2026 as a limited research preview. It uses what Anthropic calls "extended deliberation trees": the model explores multiple reasoning paths simultaneously and prunes unpromising branches before committing to an answer. The design goal was better performance on complex scientific and mathematical reasoning, particularly in biology and chemistry. Offensive cybersecurity was not on the intended capability list. Nobody at Anthropic expected the model to autonomously execute a full penetration test.

In plain language: an AI autonomously hacked a realistic corporate environment. Not a toy CTF challenge, but a simulation modeled on real enterprise architecture (Active Directory domains, segmented networks, hardened endpoints) that typically takes a human red team operator about 20 hours to complete.

Mythos succeeded in 3 out of 10 runs. That 30% success rate might not sound high, but the benchmark wasn't designed for AI at all. It was built for human penetration testers as a certification-level challenge. The fact that a language model cleared it at all, let alone on three separate attempts, rewrites assumptions about what's possible with current technology.

How the AISI discovered this

The AISI wasn't specifically testing for offensive cyber capability. Claude Mythos was going through a standard capability evaluation suite that Anthropic voluntarily submitted the model for, as part of the UK's frontier AI testing framework established at the 2023 Bletchley Park AI Safety Summit. The suite covers biology, chemistry, cybersecurity, and autonomous replication. The cybersecurity track includes "The Last Ones" as its hardest benchmark. When Mythos cleared it on the first run, the AISI team assumed a configuration error. They ran it nine more times under varying conditions. It succeeded on two additional attempts.

That detail matters because it reveals the shape of the capability. An AI that can breach a network 30% of the time may not sound reliable enough to worry about, until you consider that an attacker can simply rerun it against a target until one attempt lands; at that rate, a hundred attempts would yield roughly thirty successes. In cybersecurity, the defender has to win every time. The attacker only needs to win once.
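The arithmetic behind that asymmetry can be sketched in a few lines of Python. This is a back-of-the-envelope model, not anything the AISI published: it assumes every attempt is an independent trial with the same 30% success rate, which the ten evaluation runs (made under varying conditions) do not establish. The function name is illustrative.

```python
def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent runs succeeds."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every single attempt fails".
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # 0.30 is the per-run success rate reported for Mythos (3 of 10 runs).
    # Independence between attempts is an assumption of this sketch.
    for n in (1, 5, 10, 20):
        print(f"{n:>2} attempts: {p_at_least_one_success(0.30, n):.1%}")
```

Under that independence assumption, ten attempts already put the chance of at least one successful breach above 97%. Real attempts against a live target would not be independent, since a network that detects the first failure can respond, so treat this as an upper-bound intuition rather than a forecast.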

What "The Last Ones" actually tests

The benchmark isn't a single challenge. It's a 32-step campaign that chains together the full kill chain: passive reconnaissance, service enumeration, vulnerability identification, exploit selection, privilege escalation, lateral movement, credential dumping, persistence establishment, and domain controller compromise. Each step requires different tools, different knowledge domains, and different decision-making.

A human red teamer going through this would fire up nmap for scanning, Metasploit for exploitation, Mimikatz for credential extraction, and BloodHound for Active Directory pathfinding. Mythos did all of this through API calls and tool invocation: the model decided which tool to use at each step, interpreted the output, and planned the next move.

The 7 runs where Mythos failed are as interesting as the 3 where it succeeded. In most failures, it got stuck after triggering a defensive alert: the model couldn't adapt when the network started actively responding. A human operator would try a different approach; Mythos kept attempting variations on the same technique. Adapting to an active defense is still a human advantage, for now.

Anthropic's response

Anthropic released a statement within 48 hours. The company acknowledged the AISI findings, disclosed that it had not observed similar behavior in internal red-teaming before the evaluation, and announced it was restricting access to Mythos Preview while the safety team investigated. The statement noted that Anthropic's policies already prohibit using Claude for unauthorized system access, but effectively admitted that the policy framework hadn't anticipated the model developing this capability autonomously.

The company's chief security officer said Anthropic was "working closely with the AISI to understand the capability's origins and develop appropriate containment measures." No timeline was given for when Mythos Preview access would resume. For a company that has positioned itself as the safety-first AI lab, having its flagship reasoning model clear a human red-team benchmark is an uncomfortable moment. The response so far has been measured, but the question hanging over it is whether safety testing frameworks need to be restructured to catch capabilities that emerge from general reasoning improvements rather than from specific training.

The numbers that worry people

Mythos hit a 73% success rate on expert-level offensive tasks. Three weeks later, OpenAI's GPT-5.5 followed with a 71.4% rate on the same evaluation set. Two different labs, two different architectures, nearly identical offensive capability. This convergence suggests the capability isn't a fluke of Anthropic's training approach; it's a natural byproduct of building models that reason well.

The AISI now estimates frontier cyber-offense capability is doubling every four months. At the end of 2025, they had the doubling time at seven months. The acceleration is the story, not the absolute numbers. If the four-month pace holds through early 2027, tasks that currently need a well-resourced state actor will be within reach of mid-tier criminal groups sometime next year. Mid-tier groups move faster than states, take more risks, and are harder to deter through diplomatic channels.

What this means for enterprise security

The most uncomfortable implication isn't about state actors or criminal groups. It's about the everyday security posture of companies already deploying AI agents internally. An HR agent with access to employee records. A finance agent connected to the ERP system. A customer support agent with database query privileges. These agents have legitimate access. The Mythos finding raises a pointed question: what happens when an AI we already deployed and authorized does something we didn't intend?

The line between "agent doing its job" and "agent doing something dangerous" comes down to prompt boundaries, permission scoping, and execution guardrails, none of which are standardized. Most enterprise agent deployments I've seen have authorization models that would make a security architect uncomfortable. The lesson from Mythos isn't "don't deploy agents." It's "deploy them with the same security architecture you'd use for a human with admin access: least privilege, activity logging, and independent review of sensitive actions."
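To make those three controls concrete, here is a minimal sketch of a gateway that sits between an agent and the tools it can call. Everything in it is hypothetical: `AgentPolicy`, `ActionGateway`, the action names, and the reviewer callback are illustrations of the pattern, not the API of any real agent framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy: what it may do, and what needs sign-off."""
    allowed_actions: frozenset
    sensitive_actions: frozenset = frozenset()


@dataclass
class ActionGateway:
    policy: AgentPolicy
    # Independent reviewer (a human or a separate system); True means approve.
    reviewer: Optional[Callable[[str, dict], bool]] = None
    audit_log: list = field(default_factory=list)

    def request(self, agent_id: str, action: str, params: dict) -> bool:
        """Return True only if the action is in scope and, when sensitive, approved."""
        if action not in self.policy.allowed_actions:
            decision = "denied: out of scope"  # least privilege
        elif action in self.policy.sensitive_actions:
            # Sensitive actions fail closed when no reviewer is configured.
            approved = self.reviewer is not None and self.reviewer(action, params)
            decision = "approved by reviewer" if approved else "denied: review failed"
        else:
            decision = "allowed"
        # Activity logging: every request is recorded, including denials.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent_id,
            "action": action,
            "params": params,
            "decision": decision,
        })
        return not decision.startswith("denied")


if __name__ == "__main__":
    policy = AgentPolicy(
        allowed_actions=frozenset({"read_ticket", "issue_refund"}),
        sensitive_actions=frozenset({"issue_refund"}),
    )
    # Stand-in reviewer rule for the demo: approve refunds up to 100.
    gateway = ActionGateway(policy, reviewer=lambda a, p: p.get("amount", 0) <= 100)
    print(gateway.request("support-1", "read_ticket", {"id": 7}))
    print(gateway.request("support-1", "drop_table", {}))
    print(gateway.request("support-1", "issue_refund", {"amount": 5000}))
```

The design choice worth noting is that the allowlist, the log, and the review step all live outside the model. A prompt can ask an agent to behave; only an external check like this can refuse a call the agent has already decided to make.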

The dual-use problem, made concrete

This finding lands in the middle of an ongoing tension at Anthropic. The company was recently excluded from new Pentagon classified AI contracts after refusing to loosen restrictions around autonomous weapons and mass surveillance. Anthropic sued and won a temporary injunction, arguing the exclusion was procedurally improper. Meanwhile, OpenAI launched Daybreak, an AI-assisted cyber defense tool that finds vulnerabilities and generates patches automatically.

The irony is sharp: the same capability that finds zero-days can write exploits for them. There's no technical firewall between offense and defense in AI. Better reasoning cuts both ways. Every improvement in Claude's ability to understand code also improves its ability to find vulnerabilities in code. Every advancement in planning and tool use also advances the capacity for autonomous attack chains.

Anthropic's position (build the safest models, refuse weaponization contracts, but acknowledge that general intelligence includes offensive capability) is the most honest approach available. It's also the hardest to explain to a public that wants clear distinctions between good AI and bad AI.

What security teams should do now

The practical takeaway for organizations isn't to panic about AI hackers breaching your network next week. It's to recognize that the patch window is shrinking. If offense capability doubles every four months, that is three doublings in a year; assuming time-to-exploit falls in proportion, a vulnerability that would take a sophisticated actor 8 months to exploit today will take about 1 month a year from now. Most enterprise patch cycles run on 90-day SLAs. Those numbers are about to stop working.
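The doubling arithmetic is easy to check. The sketch below computes the growth factor for a given doubling time and then, as a loudly labeled assumption of mine rather than anything from the AISI, treats time-to-exploit as shrinking in inverse proportion to capability. Function names are illustrative.

```python
def capability_multiplier(months_elapsed: float, doubling_months: float) -> float:
    """Growth factor after `months_elapsed`, given a fixed doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months_elapsed / doubling_months)


def projected_exploit_months(today_months: float, months_elapsed: float,
                             doubling_months: float) -> float:
    """Time-to-exploit, ASSUMING it shrinks in inverse proportion to capability."""
    return today_months / capability_multiplier(months_elapsed, doubling_months)


if __name__ == "__main__":
    # 4 and 7 months are the AISI's current and end-of-2025 doubling estimates.
    for doubling in (4, 7):
        factor = capability_multiplier(12, doubling)
        remaining = projected_exploit_months(8, 12, doubling)
        print(f"doubling every {doubling} months: x{factor:.2f} after a year, "
              f"8-month exploit -> {remaining:.1f} months")
```

With a four-month doubling time, twelve months is three doublings, so the multiplier is 8; at the older seven-month estimate it is a little over 3. How that translates into an actual exploit timeline depends entirely on the inverse-scaling assumption, which is a simplification.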

Defensive AI tools like Daybreak help: automated vulnerability scanning and patch generation can compress the defense side of the equation too. But the asymmetry favors offense: attackers only need to find one way in; defenders need to close every path. AI makes both sides faster, but speed benefits the attacker more than the defender, at least in the near term.

One thing the Mythos finding settles: the era when AI companies could claim they didn't know what their models were capable of is over. When a model clears a benchmark designed for human penetration testers, nobody can credibly plead ignorance. The legal and regulatory implications of that shift will take years to sort out. Security teams should treat this AISI finding as a deadline, not a warning.