BIP NYC NEWS


Can Anthropic Keep Its Exploit-Writing AI Out of the Wrong Hands?

May 24, 2026  Twila Rosenbaum

Anthropic's latest large language model, Claude Mythos Preview, has ignited a critical debate in cybersecurity: can a machine designed to break into systems ever be safely used only for defense? Unveiling the model on April 7, Anthropic described Mythos as performing strongly across general tasks, but with a striking specialization in computer security. According to the company, the model can identify and exploit zero-day vulnerabilities in every major operating system and Web browser at user direction, including subtle and historically difficult-to-detect flaws. One example involved a now-patched 27-year-old vulnerability in OpenBSD, demonstrating the model's ability to uncover and weaponize long-standing security gaps.

The announcement comes with a set of controls and a new initiative called Project Glasswing, a partnership involving Apple, AWS, Microsoft, Palo Alto Networks, CrowdStrike, and dozens of other organizations. Anthropic has committed $100 million in Mythos Preview usage credits to Project Glasswing and $4 million in direct donations to open source security efforts. The goal is to empower defenders to find and patch vulnerabilities before attackers can exploit them.

The Rise of Autonomous Exploit Generation

Mythos Preview's security capabilities emerged as a downstream consequence of broader improvements in code reasoning and logic, rather than an explicit design goal. In one demonstration, the model wrote a Web browser exploit that chained together four vulnerabilities, composing a complex JIT heap spray that escaped both the renderer sandbox and the operating system sandbox. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR bypasses. Another example showed the model writing a remote code execution exploit for FreeBSD's NFS server, splitting a 20-gadget ROP chain over multiple packets to grant full root access to unauthenticated users.

These feats are not merely theoretical. The company claims it has identified thousands of high-risk and critical vulnerabilities through Mythos and is responsibly disclosing them to affected vendors. However, the dual-use nature of such capabilities is impossible to ignore. As Forrester senior analyst Erik Nost noted, the release serves both as a public relations statement that Anthropic's AI can reshape cybersecurity and as a call to action highlighting the vulnerability detection gaps that have plagued the industry for decades.

Project Glasswing: A Defensive Shield

Recognizing the potential for misuse, Anthropic launched Project Glasswing alongside the model. The initiative brings together more than 40 organizations to scan and secure first-party and open source systems using Mythos Preview. Palo Alto Networks chief product and technology officer Lee Klarich described early results as compelling, though specific findings have not been published.

Project Glasswing is not merely a technological push; it includes governance and access controls. Anthropic has limited Mythos Preview access to vetted partner organizations, and the company monitors usage patterns for signs of abuse. Still, experts remain skeptical about the long-term effectiveness of these measures. Julian Totzek-Hallhuber, senior principal solution architect at Veracode, pointed out that because no definitive answer exists for how to keep such tools out of attacker hands, defenders should assume the capability will proliferate. He urged investment in detection over prevention, identification of behavioral signatures associated with AI-assisted exploitation, and adoption of zero-trust architectures combined with aggressive patching cycles and anomaly-based detection.

The Arms Race Intensifies

The emergence of exploit-writing AI accelerates the already rapid pace of vulnerability discovery and weaponization. Traditionally, finding and exploiting zero-days required deep expertise, time, and manual effort. Mythos Preview reduces the barrier to entry, although Anthropic emphasizes that users still need to provide proper prompts. Nevertheless, the company stated that one does not need to be a security engineer to direct the model effectively.

Melissa Ruzzi, director of AI at AppOmni, offered a sobering perspective: no one can ever keep anything completely out of attackers' hands. The best that can be done is to make it more difficult for them to gain access. This sentiment echoes throughout the industry, where penetration testing tools like Cobalt Strike have long been dual-use—legitimate for authorized tests but often abused by threat actors.

Anthropic's blog post acknowledged the risk explicitly, stating that the same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them. This paradoxical reality forces defenders to rethink traditional security postures. Instead of relying solely on prevention, they must assume that adversaries will eventually obtain similar capabilities and focus on resilience, detection, and rapid response.

Skepticism and the Need for Independent Validation

Despite the impressive demonstrations, the security community has called for independent verification. Totzek-Hallhuber emphasized that Anthropic controls both the model and the narrative. Independent replication is impossible when the model is not publicly available. Until independent researchers with access can run their own evaluations, healthy skepticism is warranted. The claims cannot be tested, so they cannot be fully trusted or refuted.

Dark Reading contacted Anthropic for statistics regarding false positives and error rates but did not receive a response by press time. This lack of transparency adds to the uncertainty. While the early examples are compelling, two data points do not make a pattern. The restricted access model means that the community must rely on Anthropic's word alone, which is insufficient for building trust in such a powerful tool.

The broader industry is watching closely. If Mythos Preview performs as advertised, it could fundamentally transform vulnerability management, enabling defenders to patch flaws faster than ever. But if it falls into the wrong hands, whether through a leak, targeted theft, or the development of similar models by adversarial entities, the consequences could be severe. As Totzek-Hallhuber noted, the only reliable approach is to assume proliferation and prepare accordingly. Investing in detection, zero-trust architectures, and behavioral analytics will be critical to staying ahead in the new era of AI-assisted exploitation.

Anthropic's announcement also draws attention to the long-standing weaknesses in software security. The fact that a vulnerability in OpenBSD could sit undiscovered for 27 years highlights how long flaws can persist unnoticed in widely used software. Mythos Preview may serve as a wake-up call, pushing companies to prioritize vulnerability management and adopt more proactive security practices. The race is now on: defenders must remediate and patch before other AIs, in the wrong hands, discover these zero-days and rapidly write exploits.

In the end, the question is not whether exploit-writing AI can be contained, but whether defenders can adapt quickly enough to mitigate the risks. With Project Glasswing, Anthropic has taken a step in the right direction, but the long-term outcome will depend on global collaboration, transparent testing, and a fundamental shift in how security is approached. The era of AI-driven cybersecurity has begun, and it promises to be both transformative and turbulent.


Source: Dark Reading News

