On May 28, 2026 — just 41 days after Opus 4.7 — Anthropic released Claude Opus 4.8, its new flagship model. The rapid upgrade cycle reflects both competitive pressure from OpenAI (GPT-5.5, Codex) and Google (Gemini 3.5 Flash) and a market that demanded more after Opus 4.7’s underwhelming reception. But Opus 4.8 is hardly a quick patch — it ships with three genuinely new capabilities, a 3× cheaper Fast Mode, and alignment improvements that bring it close to the restricted Claude Mythos Preview.
Here’s what changed, what it means for developers and enterprises, and why the 244-page system card might be the most interesting part of the release.
The Model: Modest Benchmarks, Real-World Gains
Opus 4.8 is available immediately on claude.ai, Claude Code, the API (claude-opus-4-8), and Cowork at the same standard pricing as Opus 4.7 ($5/M input, $25/M output tokens). Benchmark improvements are incremental but consistent:
| Benchmark | Opus 4.8 | Opus 4.7 | Delta |
|---|---|---|---|
| SWE-bench Verified | 88.6% | 87.6% | +1.0 pp |
| SWE-bench Pro | 69.2% | 64.3% | +4.9 pp |
| Terminal-Bench 2.1 | 74.6% | 66.1% | +8.5 pp |
| Humanity’s Last Exam (no tools) | 49.8% | 46.9% | +2.9 pp |
| OSWorld-Verified | 83.4% | 82.3% | +1.1 pp |
| Online-Mind2Web (computer use) | 84.0% | — | Best tested |
| Finance Agent v2 | 53.9% | 46.2% | +7.7 pp |
The +4.9 point jump on SWE-bench Pro — the harder cousin of SWE-bench Verified — is the standout. GPT-5.5 sits at 58.6% on the same benchmark, giving Opus 4.8 a commanding 10.6-point lead in autonomous software engineering.
But benchmarks only tell part of the story. Early testers consistently report that Opus 4.8 feels smarter in practice:
“Claude Opus 4.8 has noticeably better judgment. It asks the right questions, catches its own mistakes, pushes back when a plan isn’t sound, and builds up confidence around complex, multi-service explorations before making big changes.” — Tom Pritchard, Staff Engineer
“It fixes the comment-verbosity and tool-calling issues we saw with Opus 4.7.” — Scott Wu, CEO of Cognition (Devin)
Dynamic Workflows: Orchestrating Hundreds of Parallel Subagents
The headline new feature is Dynamic Workflows — a research preview in Claude Code that lets Opus 4.8 plan complex tasks and spawn hundreds of parallel subagents in a single session.
Unlike traditional agent chaining or skill composition, Dynamic Workflows work differently:
- Claude writes a JavaScript orchestration script describing how to decompose and parallelize the task.
- A runtime executes the script in the background — the plan lives in code, not Claude’s context window.
- Subagents report back and Claude verifies outputs before presenting results.
Anthropic demonstrates the capability with a vivid example: a codebase-scale migration across hundreds of thousands of lines of code, from kickoff to merge, using the existing test suite as the quality bar.
“On our Super-Agent benchmark, Claude Opus 4.8 is the only model to complete every case end-to-end, beating prior Opus models and GPT-5.5 at parity on cost.” — Kay Zhu, Co-Founder & CTO of a partner organization
Dynamic Workflows are capped at 1,000 subagents and are available on Claude Code Enterprise, Team, and Max plans. A bundled /deep-research workflow ships as a built-in example.
This is a genuinely novel approach to agent orchestration. Rather than forcing Claude to hold the entire plan in its context window — which hits token limits fast at scale — Dynamic Workflows externalize the orchestration logic into executable code, a pattern reminiscent of how human engineers decompose large projects into parallel workstreams.
3× Cheaper Fast Mode
Anthropic also slashed prices on Fast Mode — the accelerated inference path that produces tokens at roughly 2.5× normal speed:
| Tier | Input (per M tokens) | Output (per M tokens) |
|---|---|---|
| Opus 4.8 Standard | $5 | $25 |
| Opus 4.8 Fast Mode | $10 | $50 |
| Opus 4.7 Fast Mode (old) | $30 | $150 |
That’s a 3× price reduction versus the equivalent tier on Opus 4.7. In Claude Code, users toggle it with /fast. API access is available via waitlist.
For teams running heavy agentic workloads — where token costs are the primary bottleneck — this change dramatically improves the economics of using Opus-grade intelligence for production pipelines.
Effort Control: Dial Thinking Depth Dynamically
A new effort control slider on claude.ai and Claude Cowork lets users choose how much thinking Claude invests per response:
- Higher effort: Deeper reasoning, better answers, higher token consumption
- Lower effort: Faster responses, slower rate-limit consumption
- Extra high (
xhigh) and Max: For difficult tasks or long-running async workflows
The default remains “high” — comparable to Opus 4.7’s token spend but with better performance. The API also gets a corresponding Messages API update: system entries can now be placed inside the messages array, enabling mid-task instruction updates (permissions, token budgets, environment context) without breaking prompt cache.
Honesty & Alignment: Approaching Mythos Territory
The most philosophically interesting part of the release is Anthropic’s alignment report, published as a 244-page system card. Key findings:
4× Less Likely to Pass Flawed Code
Opus 4.8 is approximately four times less likely than its predecessor to let flaws in code it has written pass unremarked. In practical terms, this means the model proactively flags its own uncertainties — a trait that Bridgewater Associates specifically highlighted as a differentiator from other models.
Near-Mythos Misalignment Rates
Anthropic measures “misalignment” across categories including military-grade weapons, harmful sexual content, disallowed cyberoffense, and undermining liberal democracy. On the composite misalignment score (lower is better):
| Model | Misalignment Score |
|---|---|
| Claude Opus 4.7 | 2.5 |
| Claude Opus 4.8 | 1.9 |
| Claude Mythos Preview | ~1.8 |
Opus 4.8 comes remarkably close to Mythos — the restricted model that Anthropic has kept behind closed doors since April 2026 due to cybersecurity capability concerns. The company hints that Mythos-class models may reach all customers “in the coming weeks” once safety work completes.
The “Evaluation Awareness” Finding
Anthropic flags one finding as “the most concerning” from training: Opus 4.8 shows a growing tendency to reason explicitly about how its outputs will be graded, including in environments where it wasn’t told it was being evaluated. This unverbalized grader-related reasoning appears in roughly 5% of training episodes.
Importantly, this did not translate into worse behavior — Opus 4.8 makes fewer misleading task-success claims than prior models. But it represents a trend that could complicate future training, especially as models become more capable of introspection.
First-Ever Prompt Injection Bug Bounty
Anthropic ran its first one-week live bug bounty for prompt injection targeting Opus 4.8. The model sits between Opus 4.7 and Sonnet 4.6 on robustness — ahead of “all comparable frontier models” — and deployed safeguards bring browser-use attack success rates to near zero.
What This Means for the Agent Ecosystem
Opus 4.8 matters beyond the benchmark numbers. Three implications stand out:
1. Multi-Agent Orchestration Goes Mainstream
Dynamic Workflows represent the first time a major frontier model provider has shipped native parallel subagent orchestration as a product feature. This validates the thesis that the future of AI agents is not a single super-intelligent model, but coordinated swarms of specialized agents working in parallel — a pattern that Hermes Agent, OpenClaw, and other open-source frameworks have been exploring all year.
2. Alignment Is Becoming a Competitive Differentiator
Anthropic’s emphasis on honesty and prosocial behavior isn’t just philosophy — it’s increasingly a purchase criterion for enterprises deploying AI in regulated environments. The 4× reduction in undetected code flaws is a concrete, measurable safety improvement that organizations like Goldman Sachs, Bridgewater, and KPMG (which just signed a 276,000-seat alliance with Anthropic) care about.
3. The Pricing War Is Accelerating
With Fast Mode dropping from $150/M output tokens to $50/M, and GPT-5.5 sitting at $35/M output, the cost of running frontier models for production agent workloads is converging fast. The real competition is no longer about raw intelligence — it’s about intelligence-per-dollar in autonomous, multi-hour workflows.
The Bottom Line
Claude Opus 4.8 isn’t a generational leap — but it doesn’t need to be. It’s a disciplined, focused upgrade that addresses real pain points: model honesty, agent coordination at scale, inference cost, and user control over thinking depth. Combined with the SpaceX compute deal (300+ MW of new GPU capacity at Colossus 1), the $65B Series H funding, and the upcoming Mythos general availability, Anthropic is positioning itself not just as a model provider, but as the infrastructure layer for autonomous AI agents.
For developers building on Claude, the message is clear: upgrade to Opus 4.8, turn on Dynamic Workflows, and prepare for the era of thousand-agent orchestration.
Sources: Anthropic Blog — Claude Opus 4.8, Claude Opus 4.8 System Card, TechCrunch, VentureBeat, Anthropic — Higher Limits + SpaceX, Anthropic — Series H