Citizen Hacker

How long until Mythos-level capability is generally available?


By Conor Sherman · CISO in Residence, Sysdig · Contributing author, CSA/SANS Mythos-Ready Security Program

Anthropic’s Claude Mythos Preview can autonomously discover zero-day vulnerabilities at industrial scale — finding flaws in OpenBSD, FFmpeg, and Firefox that survived decades of expert review. Anthropic concluded general release would make large-scale cyberattacks “far more likely” and restricted access to eleven partner organizations.

This project tracks one metric: Time-to-Parity (TTP) — how many days until unrestricted, downloadable AI models match Mythos across reasoning, software engineering, and cybersecurity. Historically, TTP has compressed from 440 days to 106 days — a 4x acceleration. At current rates, there is an 85% probability of full parity before the end of 2026.

Three pillars to parity

Mythos-level capability requires reasoning, software engineering, and cybersecurity. The headline number tracks the last pillar to fall.
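The "last pillar to fall" rule can be sketched in a few lines. The pillar names come from the text above; the day counts are hypothetical, not the project's data:

```python
# Hypothetical projected days-to-parity for each pillar. The headline
# number tracks the last pillar to fall, i.e. the maximum.
pillar_ttp = {"reasoning": 45, "software engineering": 80, "cybersecurity": 140}
headline_ttp = max(pillar_ttp.values())
print(headline_ttp)  # 140: cybersecurity is the last pillar to fall
```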

The Open-Weight Inversion

On the most direct cybersecurity benchmark, open-weight models have already surpassed every frontier model. The gap we’re tracking isn’t between open-weight and frontier — it’s between open-weight and a single restricted model.

The frontier tier — models with audit trails, guardrails, and usage policies — is already behind the unmonitored, decentralized tier on direct cybersecurity capability. Open-weight models should be on every security leader’s radar.

The Convergence

The gap between the frontier and open-weight tiers is compressing on every benchmark. In the charts below, the amber line tracks the best open-weight score and the blue line the best frontier score; when amber crosses above blue, open-weight has surpassed the frontier.

GPQA Diamond — frontier best vs. open-weight best (shaded area = gap)

SWE-bench Pro — frontier best vs. open-weight best (shaded area = gap)

CyberGym — frontier best vs. open-weight best (shaded area = gap)

The Trendline

Time-to-parity has compressed from ~440 days (at the 49% SWE-bench threshold) to ~106 days (at the 80% threshold). That’s a 4x compression over roughly two years. The question isn’t whether open-weight catches up — it’s how fast.
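The TTP arithmetic is simple: subtract the date the frontier tier first crossed a threshold from the date the open-weight tier did. A minimal sketch, using hypothetical crossing dates chosen to reproduce the figures above (the real dates live in the project's underlying dataset):

```python
from datetime import date

def time_to_parity(frontier_crossing: date, open_weight_crossing: date) -> int:
    """Days between the frontier and open-weight tiers first crossing
    the same benchmark threshold."""
    return (open_weight_crossing - frontier_crossing).days

# Hypothetical crossing dates for the two SWE-bench thresholds.
ttp_49 = time_to_parity(date(2023, 3, 1), date(2024, 5, 14))   # 440 days
ttp_80 = time_to_parity(date(2025, 2, 1), date(2025, 5, 18))   # 106 days
print(ttp_49, ttp_80, round(ttp_49 / ttp_80, 1))  # roughly 4x compression
```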

The Inflection Points

DeepSeek R1 (January 2025)
First open reasoning model to match o1-level capability at a fraction of the cost. It proved that reasoning-level capability could be replicated rapidly.
Chinese lab cluster (late 2025–early 2026)
GLM-5, Kimi K2.5, DeepSeek V3.2, and Qwen 3.5 all crossed the 70% mark within the same window. Multiple independent open model families reaching frontier quality made the trend structural, not a one-off.
MoE architecture democratization
Enabled frontier-class inference on single GPUs, making open-weight models operationally viable for adversaries with limited compute.

Accelerators

Three risk factors could compress the projected parity window below what the historical trendline suggests:

Distillation & output leakage
Mythos is deployed to eleven Glasswing partners. Their outputs — vulnerability reports, code fixes, reasoning traces — enter the world. If that material reaches open-weight training data, TTP compresses from the time to discover capability to the time to copy it.
Inference-time scaling
The same base weights can produce 5–15-point score swings depending on reasoning mode, scaffolding, and agent loops. The effective parity date may arrive before the benchmark parity date.
Spud (GPT-5.5/6)
OpenAI’s next frontier model. Pre-training completed March 2026; leaked figures suggest a ~40% improvement over GPT-5.4. If it is released publicly, can the open-weight ecosystem close that gap within the historical 3–6 month window? How OpenAI handles this release will be a defining moment.

“Citizen Hacker”

Just as citizen developers use AI to build software without formal training, citizen hackers will use AI to discover and exploit vulnerabilities without deep security expertise. When open-weight models reach Mythos-level capability, anyone with a GPU can run autonomous offensive campaigns — with no audit trail, no guardrails, and no way for the originating lab to intervene.

The VulnApocalypse is already underway. Research from Nicholas Carlini at UnPrompted 2025, MOAK.AI, and AISLE demonstrates that current-generation models already provide real offensive uplift. Mythos is a step-change beyond that — and eleven organizations already have access to it. This project tracks how fast unrestricted models are closing in.

CISO Response

The same convergence that makes offensive capability more accessible also makes defensive capability cheaper — open-weight models that reach 90%+ SWE capability let enterprises self-host autonomous remediation without sending proprietary code to third-party APIs. The resources below are where the practitioner community is coordinating the response.

How We Measure — the benchmarks, projection methods, confidence intervals, and what we deliberately don’t control for.

Forward projections fit logistic growth with linear fallback on the leading-edge (best-so-far) trajectory of non-restricted models. 95% bootstrap CIs from 1,000 resamples. All-model regression reported as cross-check. Probability = fraction of bootstrap draws projecting parity on or before Dec 31, 2026; combined = product of independent pillar probabilities.