
The Vulnerability Lifecycle Is Collapsing on One Side. The Metric Executives Need Is the Velocity Gap.
On 9 June 2026 Anthropic released Claude Fable 5, the first publicly available model in the Mythos class, and upgraded Project Glasswing partners to the unsafeguarded Claude Mythos 5 in collaboration with the US government. Nine weeks earlier, on 7 April, the company had disclosed that it built Claude Mythos Preview and chose not to release it: the model could autonomously identify thousands of zero-day vulnerabilities across every major operating system and web browser, including a 27-year-old flaw in OpenBSD (a system specifically engineered for security) and a 16-year-old flaw in FFmpeg that fuzzing tools had executed five million times without catching. The public version ships with cybersecurity queries classifier-routed to the older Claude Opus 4.8; the version with those safeguards lifted is restricted to the Glasswing defensive consortium, which includes Amazon, Apple, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.
The capability went from withheld to publicly distributed in nine weeks. The follow-on data from the intervening two months is the more consequential half of the story: reporting indicates that under one percent of the vulnerabilities Mythos surfaced through Project Glasswing have been patched, even with responsible disclosure to the vendors that own the affected code and the deepest-resourced consortium in the industry holding the pen. Discovery has moved to machine speed and is now diffusing on a release cycle measured in weeks. Remediation has not moved. The gap between the two is the variable on which executive cyber decisions should now be made.
The implication for boards, CFOs, and CISOs is narrower than the headlines suggest and sharper than open-vulnerability dashboards can show. Risk is no longer well described by counting unpatched bugs. It is described by the time between an exploit becoming reachable and the operating environment containing it, and by the number of systems any single compromise can touch before it is stopped. Those two numbers are answerable and ownable, and they are the two an executive team can actually move.
Key Takeaways
- Claude Mythos Preview (Anthropic, April 2026) autonomously found thousands of zero-days across every major OS and browser, including a 27-year-old OpenBSD flaw and a 16-year-old FFmpeg flaw the fuzzing industry had missed across five million test executions
- Mythos-class capability went from withheld to publicly distributed in nine weeks: Claude Fable 5 (released 9 June 2026) brings the model class to general availability with cyber queries classifier-routed to an older model, while the unsafeguarded Claude Mythos 5 went to Project Glasswing defenders and a US-government collaboration
- Under 1% of Mythos-discovered vulnerabilities have been patched as of mid-2026, even with responsible disclosure and the deepest-resourced consortium in the industry doing the disclosing. Discovery is now at machine speed; the ecosystem that has to absorb the patches is not
- AI also discovers zero-days in production code (Google Big Sleep, CVE-2025-6965), orchestrates multi-target intrusion campaigns at machine speed (Anthropic GTG-1002), exploits known flaws cheaply (87% success at $3.52 per attempt in the Fang study), and matches top human pen-testers on bounded scope (XBOW, mid-2025)
- The Verizon 2025 DBIR shows vulnerability exploitation reached 20% of breaches (up 34% YoY) and edge/VPN exploitation grew roughly eight-fold to 22% of exploitation actions, with only 54% of those vulnerabilities fully remediated at a 32-day median
- The metrics to report to a board no longer include the open-vulnerability count. They are the Velocity Gap (defender time-to-contain minus attacker time-to-weaponize) and the Blast Radius Index (average systems reachable from one compromise)
- The doctrine that follows is Compress & Contain: compress the exposure window through machine-speed remediation, contain the radius through isolation and regeneration. Both metrics move under executive control
What Changed: The Capability Ceiling Moved
Five public event threads, all from roughly the last two years, describe a capability ceiling that did not exist before.
Claude Mythos to Claude Fable 5 (Anthropic, April to June 2026). Anthropic disclosed the Mythos Preview system card on 7 April 2026 alongside the Project Glasswing announcement and a deliberate decision to withhold the model from general release. Mythos scored 100% on the Cybench benchmark (35 capture-the-flag cybersecurity challenges, every challenge solved on every attempt) and 83.1% on CyberGym (1,507 real-world vulnerability tasks, up from 66.6% for the prior-generation Claude Opus 4.6). Beyond benchmarks, the model identified flaws in production code that had survived decades of human review and millions of automated tests, including a 27-year-old OpenBSD vulnerability and a 16-year-old FFmpeg flaw in code that fuzzers had executed five million times without catching it. The system card also documents earlier model versions escaping sandboxes, concealing disallowed actions, and posting exploit details to public sites without being prompted.
Nine weeks later, on 9 June 2026, the withholding posture changed. Claude Fable 5 brought the Mythos class to general availability with classifier-based safeguards: cybersecurity-sensitive queries route to the older Claude Opus 4.8, with Anthropic reporting that more than 95% of sessions involve no fallback and zero compliance on harmful single-turn cyberattack requests across 30 public jailbreak techniques. The unsafeguarded Claude Mythos 5 went to Project Glasswing partners as an immediate upgrade, alongside a US-government collaboration. The diffusion pattern matters more than either individual release: frontier vulnerability-discovery capability now reaches the public on a cycle measured in weeks, behind a safeguard layer, while defenders hold the unrestricted version. For the full thesis behind the original withholding decision and what it means for AI governance, see Claude Mythos Preview: Anthropic built its most powerful model and chose not to release it.
Google Big Sleep (2025). Google's Big Sleep agent discovered CVE-2025-6965, a critical SQLite zero-day that was already known to threat actors, and used threat-intelligence context to predict and pre-empt its exploitation. Google described it as the first time an AI agent directly foiled an in-the-wild exploit. The same agent later reported twenty more flaws across widely used open-source projects. Mythos and Big Sleep together establish that frontier-grade discovery is not a single-vendor curiosity.
Anthropic GTG-1002 (November 2025). Anthropic disrupted a Chinese state-aligned espionage campaign that used Claude Code as an autonomous orchestrator across approximately thirty targets, at request rates a human team could not sustain. The campaign produced a handful of validated intrusions. The autonomy was real but imperfect: the AI hallucinated findings and overstated some results.
Fang et al. (arXiv:2404.08144, 2024). A study posted to arXiv found that a GPT-4 agent given just a CVE description exploited 87% of fifteen one-day vulnerabilities at roughly $3.52 per run. Without the CVE description, success fell to 7%. The asymmetry is structural: machines are better at weaponizing a known flaw than at finding an unknown one. Discovery is the harder, less-saturated frontier, though Mythos shows that frontier is now also moving.
XBOW on HackerOne (mid-2025). An autonomous pen-testing system topped HackerOne's US bug-bounty leaderboard. In one head-to-head, it completed in roughly 28 minutes what took a human approximately 40 hours. The result was for a specific quarter and reputation metric, and XBOW relied on a non-LLM deterministic validator to suppress false positives, but the directional point was clear: autonomous offense now matches top humans on bounded scope.
For the board-level briefing that ties these capability shifts together with the broader 2026 threat landscape, see AI-powered cyber attacks in 2026: what boards and CFOs need to act on.
The Defender Side Has Not Kept Pace
The Verizon 2025 DBIR documents the defender-side lag independently of any AI argument. Vulnerability exploitation rose to 20% of breaches, a 34% year-over-year increase. Exploitation of edge and VPN devices grew roughly eight-fold to 22% of exploitation actions. Yet only 54% of those edge vulnerabilities were fully remediated, at a 32-day median. Mandiant's M-Trends 2026 found that 2025's most-exploited bugs were all zero-days in internet-facing servers.
The Mythos under-1%-patched figure is the ecosystem-side proof of the same dynamic. Two months after disclosure, with explicit consortium support and direct vendor cooperation, less than one percent of the vulnerabilities the model surfaced have been absorbed by the patch pipeline. The bottleneck is not discovery. It is the engineering capacity, vendor coordination, and patch-distribution channels that have to push fixes downstream into production. For the assessment-practice implications of the Mythos discovery and patching-gap pair, see Project Glasswing and the new baseline for cybersecurity assessment.
One misreading worth retiring before it spreads: Mandiant's "22 seconds" figure refers to the median handoff time between an initial-access broker and the next group in the criminal supply chain. It is evidence of an automated criminal economy, not a measure of AI weaponization speed. The relevant numbers for an executive conversation about AI-driven attacker speed are the Fang exploit-cost data, the XBOW head-to-head, the GTG-1002 request-rate data, and Big Sleep's pre-emption case. The 22-second statistic measures something different.
The Velocity Gap
Two clocks start the moment a vulnerability becomes reachable. The attacker's clock measures time to weaponize. The defender's clock measures time to contain. The gap between them is the organization's exposure window.
| Side | Activity | Order of magnitude |
|---|---|---|
| Attacker | Find, weaponize, move (one uninterrupted agent loop) | Minutes to hours |
| Defender | Detect, triage, change-approval, test, deploy (human-gated) | ~32 days median (edge devices, DBIR 2025) |
The defender figure is anchored to a specific measured surface (edge and VPN devices in the DBIR sample). It is illustrative of the broader gap, not a universal median across all attack types. The attacker figure reflects autonomous-tool demonstrations (Fang, XBOW, Big Sleep, AIxCC, Mythos benchmark performance). It describes the ceiling of what is now possible, not the median attack across all incidents. The planning assumption that the ceiling moves toward becoming the floor no longer needs a long horizon: Mythos-class capability went from withheld to general availability in nine weeks, gated by a safeguard layer rather than by scarcity. Each release cycle from here raises the floor.
The metric that follows: Velocity Gap = T(contain) − T(weaponize), with both clocks measured from the moment the vulnerability becomes reachable. When the gap is positive, the organization has an exposure window. When the gap is zero or negative, the defender is at or ahead of the curve on that exploit. The executive question on a quarterly review is whether the gap is shrinking, holding, or widening on the systems that matter.
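As a minimal sketch, the exposure window can be computed from three timestamps per exploit and per system. Here it is expressed as the defender's time-to-contain minus the attacker's time-to-weaponize, so a positive value is the span during which a working exploit existed against an uncontained system. The function names, the one-hour tolerance, and the timestamps are illustrative assumptions, not measurements from any of the cited reports.

```python
from datetime import datetime, timedelta


def exposure_window(reachable_at: datetime,
                    weaponized_at: datetime,
                    contained_at: datetime) -> timedelta:
    """Signed exposure window for one exploit on one system.

    Both clocks start at `reachable_at`. A positive result means a working
    exploit existed before the system was contained.
    """
    time_to_weaponize = weaponized_at - reachable_at  # attacker clock
    time_to_contain = contained_at - reachable_at     # defender clock
    return time_to_contain - time_to_weaponize


def trend(previous: timedelta, current: timedelta,
          tolerance: timedelta = timedelta(hours=1)) -> str:
    """Classify quarter-over-quarter movement of the exposure window."""
    delta = current - previous
    if delta > tolerance:
        return "widening"
    if delta < -tolerance:
        return "shrinking"
    return "holding"


# Hypothetical timestamps, for illustration only.
reachable = datetime(2026, 1, 1)
q1 = exposure_window(reachable,
                     reachable + timedelta(hours=4),   # weaponized in hours
                     reachable + timedelta(days=32))   # contained in 32 days
q2 = exposure_window(reachable,
                     reachable + timedelta(hours=4),
                     reachable + timedelta(days=20))
print(q1, q2, trend(q1, q2))
```

Measuring both clocks from the same starting timestamp keeps the number comparable across systems and across quarters.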
Machines Exploit Better Than They Discover
The Fang result deserves a separate moment. The same agent's success collapsed from 87% to 7% the instant the CVE description was removed. Exploitation of a known flaw is the easier half of the lifecycle. Discovery of an unknown flaw is the harder half, and the half that AI is now beginning to crack but has not yet saturated. Mythos and Big Sleep are the leading indicators on the discovery side. Fang is the proof on the exploitation side.
The practical implication is that the universe of exploitable conditions is expanding faster than the universe of discovered vulnerabilities. Once a vulnerability is known and described, the cost of weaponizing it is now measured in dollars and minutes. The race the defender has to win is not "find every flaw before the attacker does." It is "contain every known reachable flaw before its weaponization curve runs out."
Why the Asymmetry Is Structural, Not Just a Capability Race
Both sides have access to the same frontier models. DARPA's AIxCC final round (August 2025) demonstrated AI's defensive capability directly: across 54 million lines of code, the competing systems found 77% of synthetic bugs and patched 61%, at a roughly 45-minute average and approximately $152 per task. They also discovered 18 real-world flaws. The capability is symmetric.
The friction is not.
| Why "attacker wins" is the wrong framing | Why the friction asymmetry still favors offense |
|---|---|
| AI patches too. AIxCC: 77% found, 61% auto-patched, ~45 min, ~$152 per task | One path vs. all paths. The attacker needs a single working route; the defender must close every one. AI multiplies both, but the attacker's job stays smaller |
| AI predicts and pre-empts. Big Sleep cut off a zero-day before use | No change-control board on offense. Defense ships through approvals, testing, and risk gates. Same model, asymmetric friction |
| Aggregate breach data is calm. DBIR sees no large AI boost to attackers yet | Adoption lag. Attackers integrate tooling in days; enterprises procure it in quarters and remediate on a 32-day median (DBIR, edge devices) |
| Autonomy is flaky. Even GTG-1002's AI hallucinated and overstated findings | Correlated blast radius. Shared models and libraries mean one find can hit a whole sector at once, an offense-only multiplier |
The under-1% Mythos patching figure sits squarely at the friction intersection. The capability to find the flaws was already there. The capability to patch them is also there (AIxCC proved it). The bottleneck is the deployment layer between capability and production. That is where the gap lives, and that is where it has to be closed.
The June 2026 release sequence is the cleanest demonstration of the point. Defenders received the unsafeguarded Mythos 5 through Project Glasswing, ahead of and above what the public can access through Fable 5's safeguard layer. Defense holds the capability advantage on paper. The under-1% patching figure stands anyway, because access to frontier discovery was never the binding constraint. Change-approval queues, vendor coordination, regression testing, and patch distribution are. An organization that gains Mythos-class discovery tomorrow inherits a longer findings list and the same 32-day contain clock. Capability uplift without friction reduction widens the Velocity Gap rather than closing it.
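The backlog argument can be illustrated with a toy model, assuming a fixed weekly remediation capacity and a constant weekly rate of new findings. All rates below are hypothetical and are not drawn from Glasswing, DBIR, or AIxCC data; the point is only the shape of the curve when discovery scales and remediation capacity does not.

```python
def simulate_backlog(weeks: int, found_per_week: int,
                     fixed_per_week: int, start: int = 0) -> list[int]:
    """Open findings at the end of each week under fixed remediation capacity."""
    backlog = start
    history = []
    for _ in range(weeks):
        # New findings arrive, capacity drains the queue, backlog never goes negative.
        backlog = max(backlog + found_per_week - fixed_per_week, 0)
        history.append(backlog)
    return history


# Hypothetical rates: discovery improves tenfold, remediation capacity is unchanged.
before = simulate_backlog(weeks=8, found_per_week=10, fixed_per_week=8)
after = simulate_backlog(weeks=8, found_per_week=100, fixed_per_week=8)
print(before[-1], after[-1])
```

In this sketch the only lever that shortens the queue is `fixed_per_week`, which is the friction side of the argument.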
The Doctrine: Compress & Contain
If the discovery curve is no longer controllable, and capability symmetry does not yield outcome symmetry because of the friction asymmetry, then only two variables remain under executive control. The Compress & Contain doctrine is the disciplined operation on both.
Risk decomposes into three terms: the probability that any given component is exploitable (which AI drives up and the organization cannot stop), the exposure window between weaponization and containment, and the blast radius if the race is lost. The first term is no longer controllable. The other two are.
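One way to make the decomposition concrete is a simple multiplicative score. The product form is an assumption made for illustration (the text above names the three terms but does not specify how they combine), and the input values are hypothetical.

```python
def exposure_score(p_exploitable: float,
                   exposure_window_days: float,
                   blast_radius: float) -> float:
    """Illustrative risk score: product of the three terms (assumed form)."""
    return p_exploitable * exposure_window_days * blast_radius


# Treat exploitability as a given (1.0) and vary only the two controllable terms.
baseline = exposure_score(1.0, exposure_window_days=32, blast_radius=40)
compressed = exposure_score(1.0, exposure_window_days=1, blast_radius=40)
contained = exposure_score(1.0, exposure_window_days=32, blast_radius=2)
both = exposure_score(1.0, exposure_window_days=1, blast_radius=2)
print(baseline, compressed, contained, both)
```

Under this assumed form, holding the first term fixed still leaves the score free to fall by orders of magnitude through the other two.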
Three operating verbs.
- Compress. Collapse the exposure window. Treat remediation as continuous deployment: agentic triage and machine-speed fixes that drive the contain clock down to the operational floor.
- Isolate. Cap what any single exploit can reach. Microsegmentation, default-deny egress, and service or kernel-level isolation hold the radius near zero.
- Regenerate. Deny persistence. Ephemeral instances with short TTLs, killed and respawned on a high-fidelity threat signal. The exposure window becomes an architectural constant (the TTL declared in configuration) rather than an operational variable (how fast humans react). Attacker dwell resets to roughly zero on every regeneration cycle.
Compress drives the Velocity Gap toward zero. Isolate and Regenerate drive the Blast Radius toward zero. Both metrics move under executive control, not under the attacker's.
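The claim that regeneration turns the exposure window into an architectural constant can be sketched as an upper bound on attacker dwell per instance. This assumes the instance is destroyed at its TTL regardless of detection and that the attacker holds no persistence outside the instance; the function name and the numbers are illustrative.

```python
import math


def worst_case_dwell_minutes(ttl_minutes: float,
                             contain_minutes: float,
                             signal_minutes: float = math.inf) -> float:
    """Upper bound on dwell for one instance.

    The instance ends at whichever comes first: its TTL, a threat signal
    that triggers kill-and-respawn, or human-gated containment.
    """
    return min(ttl_minutes, contain_minutes, signal_minutes)


contain_32_days = 32 * 24 * 60  # human-gated containment, in minutes

# Long-lived server: no TTL, no automated signal.
print(worst_case_dwell_minutes(math.inf, contain_32_days))
# Ephemeral instance with a hypothetical 60-minute TTL.
print(worst_case_dwell_minutes(60, contain_32_days))
```

With a TTL in place, the bound no longer depends on how fast the human-gated process runs.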
Two North Star Metrics, Six Capability Domains
The metric worth reporting to a board is no longer the open-vulnerability count. That number is now effectively unbounded, and it was only ever a proxy for risk in a world where a meaningful patch window existed. The two metrics that are answerable, improvable, and ownable are:
- Velocity Gap = T(contain) − T(weaponize). The organization's exposure window on any given reachable exploit. Moving this toward zero is the entire purpose of the Compress verb.
- Blast Radius Index = average number of systems reachable from one compromise. When this approaches zero, the Velocity Gap stops being existential: a lost race no longer ends the company.
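A minimal sketch of the Blast Radius Index, assuming the environment can be described as a directed graph in which an edge means "a compromise of this system can reach that one". The graph encoding, the system names, and the choice to exclude the compromised system itself from the count are assumptions for illustration.

```python
from collections import deque


def reachable_count(graph: dict[str, list[str]], start: str) -> int:
    """Number of other systems reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) - 1  # exclude the initially compromised system


def blast_radius_index(graph: dict[str, list[str]]) -> float:
    """Average reachable-system count over every possible starting compromise."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sum(reachable_count(graph, n) for n in nodes) / len(nodes)


# Hypothetical three-system estates.
flat = {"web": ["app", "db"], "app": ["web", "db"], "db": ["web", "app"]}
segmented = {"web": ["app"], "app": ["db"], "db": []}
print(blast_radius_index(flat), blast_radius_index(segmented))
```

Removing edges through segmentation and default-deny egress is what lowers the average in this model.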
The doctrine maps to six capability domains. Each domain drives one or both of the North Star metrics. The detail of how each domain is built (the maturity ladder, the diagnostic rubric, the control-plane reference architecture, the autonomy boundaries) is the implementation work that follows a baseline diagnostic. The shape of the work, however, is public:
| # | Capability domain | What it drives |
|---|---|---|
| 01 | Continuous Self-Discovery. Point AI agents at your own code, dependencies, and configuration before adversaries do | Find it first |
| 02 | Machine-Speed Remediation. A CD pipeline for fixes, built for a deluge of small changes | Velocity Gap → 0 |
| 03 | Regenerative Containment. Isolate small; kill-and-respawn on threat (the keystone) | Blast Radius → 0; dwell → 0 |
| 04 | Agent Security. The organization's own AI agents are now privileged, manipulable insiders | Defend the defenders |
| 05 | Behavioral Detection and Deception. Catch the intrusion behavior, not the CVE | Vulnerability-independent |
| 06 | Velocity Governance. Fast-track defensive onboarding; re-base risk models to minutes, not quarters | The meta-enabler |
Domain 03 is the keystone. Regenerative Containment is what converts the exposure window from an operational variable (how fast humans react) into an architectural constant (the instance TTL declared in configuration). It is also the domain most resistant to retrofit, which is why a credible diagnostic measures it first.
For the runtime-control layer that supports Domain 06 (Velocity Governance), see AI governance as an operating system, not a policy PDF. For the AI-agent identity foundation that makes Domain 04 (Agent Security) operational, see you cannot secure AI agents with human-era identity models. For the broader baseline shift in what "secure enough" means, see the new baseline: why AI changed what 'secure enough' means.
What This Changes for the Executive Team
Three executive moves follow directly from the Velocity Gap thesis.
The board reporting changes. Open-vulnerability counts are retired as the headline cyber metric. The Velocity Gap and the Blast Radius Index replace them, measured against named systems of consequence: revenue-critical platforms, regulated workloads, and the systems most likely to surface in M&A or regulatory exposure. The board does not need to learn the metrics' definitions to act on them. It needs to see them trending in the right direction across consecutive quarters, with the residual gap narrowed against a target curve agreed at the start of the program.
The capital allocation changes. If the keystone is Regenerative Containment and the operating levers are architectural, the investment profile shifts from "more scanners and more humans" toward "ephemeral compute, isolation primitives, and the agentic-triage layer that closes the contain clock." Savings come from the scanner-and-headcount column. New spend lands on the architecture column. The total often nets close to neutral; the risk posture changes materially.
The vendor stack changes. Vendors selling open-vuln-count dashboards are selling a metric the doctrine retires. Vendors selling Velocity Gap measurement, Blast Radius measurement, agentic remediation, and regenerative-containment primitives are selling the new stack. The procurement question shifts to whether a vendor moves either of the two North Star metrics, with evidence. The warning sign to watch for is a pitch whose underlying logic depends on a meaningful patch window still existing.
How Innovaiden's Method Maps to This
Compress & Contain is the doctrine. The implementation work runs through the Innovaiden engagement method: Anchor, Mine, Connect, Translate.
- Anchor. Success criteria are agreed against the systems an executive team actually owns. The Velocity Gap and Blast Radius targets are set on revenue-critical systems, regulated workloads, or named acquisition targets, not against the entire estate. The baseline diagnostic produces the starting measurements; the executive team approves the target curve; the rest of the work scopes to that.
- Mine. External signal first: prior incidents, exposed surface, actual reachable footprint, the systems most often named in regulatory or contractual exposure (NIS2, DORA, GDPR, eIDAS, CER, and the AI Act on the EU side; CIRCIA and sectoral regulators on the US side). For the cross-framework view of how the Velocity Gap intersects with EU regulatory consolidation, see the EU's single entry point solves the regulator's problem.
- Connect. A crosswalk pulls each finding through the seven workstreams on the methodology page. The Velocity Gap measurement, the Blast Radius measurement, and the gap-to-keystone (Domain 03) sit at the center of this matrix. Each finding produces a remediation that ladders into one of the six capability domains.
- Translate. Each audience receives the artifact they have to act on. The board receives a two-page Velocity Gap brief with the trend curve and the residual gap. The CFO receives the capital allocation model. The CISO receives the technical roadmap to the keystone. The deal team or operating partner (in a private-equity context) receives the Day-100 plan to align portfolio companies onto a single Compress & Contain baseline.
Closing
The vulnerability lifecycle is collapsing on one side. Discovery has moved to machine speed and is now diffusing on a release cycle measured in weeks: nine weeks separated the decision to withhold Mythos from its arrival in public distribution as Fable 5. Remediation, on the evidence, has not moved: 32-day medians on the systems most often exploited, and under one percent of Mythos-discovered vulnerabilities patched two months after disclosure by the deepest-resourced defensive consortium in the industry. The defender side is the variable executives can move. The Velocity Gap and the Blast Radius Index are the metrics that describe whether the program is winning. Compress & Contain is the operating doctrine that drives both numbers down.
The Velocity Gap Executive Diagnostic produces the baseline measurement, the achievable operational floor for each capability domain, and the audience-specific artifacts the engagement method delivers. It is the first conversation a CISO, CIO, CFO, board chair, or operating partner has with us when starting the Compress & Contain journey.
Diagnose Your Velocity Gap and Blast Radius
Innovaiden runs a four-week diagnostic that measures the Velocity Gap and Blast Radius across the systems an executive team actually owns, and produces the Compress & Contain roadmap from the baseline. Reach out to walk through your portfolio.
Frequently Asked Questions
What is the Velocity Gap?
The Velocity Gap is the difference between defender time-to-contain a vulnerability and attacker time-to-weaponize it: Gap = T(contain) − T(weaponize). When the gap is positive the organization has a window of exposure. When the gap is zero or negative, the defender is at or ahead of the curve on that exploit. The metric is more useful than open-vulnerability counts because counts are now effectively unbounded while the gap is bounded, measurable, and the single number that AI changes most directly.
What is the Blast Radius Index?
The Blast Radius Index is the average number of systems reachable from a single compromise. When the radius is near zero, the Velocity Gap stops being existential: the organization can lose a race on a specific exploit and still survive. Reducing the blast radius is the architectural insurance that makes the operating model survivable when an exploit lands faster than the contain clock can run.
How do Claude Mythos and Fable 5 change the executive picture?
Mythos demonstrated in April 2026 that a frontier model can autonomously find thousands of zero-days across every major operating system and web browser, including a 27-year-old flaw in OpenBSD and a 16-year-old flaw in FFmpeg that fuzzers had executed five million times without catching. The follow-on data is the more consequential half: under 1% of Mythos-discovered vulnerabilities have been patched as of mid-2026, even with responsible disclosure. On 9 June 2026 Anthropic released Claude Fable 5, the first publicly available Mythos-class model, with cybersecurity queries classifier-routed to an older model, and gave Project Glasswing defenders the unsafeguarded Claude Mythos 5. Mythos-class capability went from withheld to publicly distributed in nine weeks. Discovery has moved to machine speed and is now diffusing; remediation has not moved. The gap between the two is the variable on which executive cyber decisions should be made.
Does Fable 5's public release mean attackers now have Mythos-class cyber capability?
Not directly. Fable 5 ships with classifier-based safeguards that route cybersecurity-sensitive queries to the older Claude Opus 4.8, and Anthropic reports the model complied with zero harmful single-turn cyberattack requests across 30 public jailbreak techniques. The unsafeguarded variant, Claude Mythos 5, is restricted to Project Glasswing partners and a US-government collaboration. What changed for planning purposes is the diffusion timeline: the underlying capability is now in public distribution behind a safeguard layer, defenders hold the unrestricted version, and the precedent is set that Mythos-class capability reaches general availability. Executive planning should assume the capability floor rises with each release cycle rather than treating frontier discovery as exotic.
Does AI favor attackers or defenders?
Both can use AI, and DARPA's AIxCC final round (August 2025) showed AI can find 77% of synthetic flaws and patch 61% autonomously at roughly 45 minutes per task. The capability is symmetric. What is asymmetric is the friction around deployment: the attacker needs one working route while the defender must close every one; offense ships instantly while defense ships through change control; attackers integrate tooling in days while enterprises procure it in quarters. AI multiplies both sides, but the deployment-friction asymmetry favors offense. The Mythos under-1%-patched figure is the ecosystem-side proof of that asymmetry.
What is Compress & Contain?
Compress & Contain is the operating doctrine that follows from the Velocity Gap thesis. Three verbs: Compress (collapse the exposure window through agentic triage and machine-speed remediation), Isolate (cap what any single exploit can reach through microsegmentation, default-deny egress, and service or kernel isolation), Regenerate (deny persistence with ephemeral instances on short TTLs, killed and respawned on a high-fidelity threat signal). Compress drives the Velocity Gap toward zero. Isolate and Regenerate drive the Blast Radius toward zero. The doctrine maps to six capability domains, of which Regenerative Containment is the keystone.
Sources
- Anthropic — Claude Mythos Preview system card (frontier vulnerability-discovery capabilities, withheld from general release). April 2026.
- Anthropic — Project Glasswing: applying frontier AI to defensive cybersecurity (12-organization consortium, $100M usage credits). April 2026.
- Anthropic — Claude Fable 5 and Mythos 5 (first public Mythos-class model; cyber safeguards via classifier routing; Mythos 5 to Glasswing partners and US-government collaboration). 9 June 2026.
- CNBC — Anthropic releases Mythos-like AI model to the public, Claude Fable 5. 9 June 2026.
- Bloomberg — Anthropic releases Mythos-like model without cyber capabilities. 9 June 2026.
- Google — Cybersecurity updates: Big Sleep and the CVE-2025-6965 disclosure. 2025.
- The Record — Google's Big Sleep AI tool found 20 flaws in open-source software. 2025.
- Anthropic — Disrupting the first reported AI-orchestrated cyber espionage campaign (GTG-1002). November 2025.
- Fang, Bindu, Gupta, Kang — LLM Agents Can Autonomously Exploit One-day Vulnerabilities (arXiv:2404.08144). 2024.
- XBOW — XBOW on HackerOne: what's next, and the road to Top 1. 2025.
- Dark Reading — AI-based pen tester becomes top bug hunter on HackerOne. 2025.
- Verizon — 2025 Data Breach Investigations Report (DBIR). 2025.
- Mandiant / Google Cloud — M-Trends 2026 (initial-access-broker handoff at 22 seconds; dwell time; top exploited bugs). 2026.
- Google Threat Intelligence Group — Look What You Made Us Patch: 2025 Zero-Days in Review. 2026.
- DARPA — AI Cyber Challenge (AIxCC) results: 77% found, 61% patched, ~45 min, 18 real-world flaws. August 2025.
- CyberScoop — DARPA's AI Cyber Challenge reveals winning models (~$152 average task cost). 2025.