AI Agent Finds 21 Zero-Days in FFmpeg for $1,000 — The Economics of Vulnerability Discovery Just Changed

AI Agent Finds 21 Zero-Days in FFmpeg for $1,000 — The Economics of Vulnerability Discovery Just Changed
📑 Table of Contents

TL;DR: Security startup depthfirst deployed an autonomous AI agent against FFmpeg’s ~1.5 million lines of C code. The result: 21 confirmed zero-day vulnerabilities, each with a reproducible proof-of-concept, for approximately $1,000 in cloud compute costs. One critical finding — a stack overflow in the AV1 RTP depacketizer — is a network-reachable RCE exploitable with a single 183-byte RTP packet over RTSP. Nine CVEs have been assigned (CVE-2026-39210 through CVE-2026-39218). The discovery represents a 200x to 500x cost reduction compared to a traditional human-led security audit, and it raises an uncomfortable question: when finding bugs is this cheap, can the patch pipeline keep up?


Introduction

On June 6, 2026, security startup depthfirst published the results of an experiment that should make every CTO, CISO, and open-source maintainer pause. The company pointed its autonomous AI security agent at FFmpeg — the ubiquitous open-source multimedia library embedded in YouTube, Netflix, Zoom, Discord, VLC, and billions of devices worldwide — and let it run.

The agent scanned roughly 1.5 million lines of C, reasoning about code structure and data flow rather than simply fuzzing for crashes. It returned 21 confirmed zero-day vulnerabilities. The total cloud compute cost: approximately $1,000.

A human-led engagement of equivalent scope would typically run $200,000 to $500,000 and take months. Depthfirst’s agent did it for the cost of a mid-range laptop. (Source: The Hacker News — AI Agent Uncovers 21 Zero-Days in FFmpeg)

This is not an isolated stunt. It joins a growing body of evidence that AI-driven vulnerability discovery has crossed a threshold — and that the security industry’s bottleneck has shifted from finding bugs to fixing them.


What Depthfirst’s Agent Found

The 21 Zero-Days

The depthfirst agent identified vulnerabilities across FFmpeg’s most complex subsystems — parsers, demuxers, and decoders that handle untrusted media input. The majority were heap and stack buffer overflows, the class of memory corruption bugs that most frequently lead to code execution.

Vulnerability Class Count Example
Heap overflow 8 TS demuxer, Matroska parser
Stack overflow 7 AV1 RTP depacketizer, VP9 decoder
Use-after-free 3 Audio format parser
Integer overflow 2 H.264 SEI parsing
Out-of-bounds read 1 Subtitles decoder

(Source: depthfirst Research — 21 Zero-Days in FFmpeg)

Nine CVEs have been assigned so far — CVE-2026-39210 through CVE-2026-39218 — with the remainder tracked but not yet numbered. All findings include a reproducible proof-of-concept input in a public GitHub repository.

The Critical One: Network-Reachable RCE in the AV1 RTP Depacketizer

One vulnerability stands out. A stack overflow in FFmpeg’s AV1 RTP depacketizer — the code that reassembles AV1 video frames received over RTP — can be triggered by sending a single 183-byte RTP packet to any FFmpeg process consuming an RTSP stream:

ffmpeg -i rtsp://attacker/stream

No user interaction required. No unusual flags. Just a standard FFmpeg command consuming a media stream from an untrusted source. The attack surface includes automated media pipelines, broadcast ingest systems, security camera monitoring, developer testing environments — any deployment where FFmpeg accepts RTSP or RTP input.

(Source: CSA Labs — AI Finds 21 FFmpeg Zero-Days for $1,000)

The Oldest Bug: 23 Years Latent

One of the stack overflows traces back to code introduced in 2003 — a service-description-table parser that has been quietly vulnerable for 23 years. It survived countless human reviews, static analysis scans, and fuzzing campaigns. The AI agent found it in minutes.

This is the pattern that should worry every enterprise security team: the absence of prior CVEs in a widely-deployed C/C++ library is not evidence of security — it’s evidence that nobody has looked with the right tools.


The Cost Comparison: 200x Cheaper Than Human Audits

Depthfirst’s $1,000 run is not just cheap — it represents a structural shift in the economics of vulnerability research.

Method Estimated Cost Time Coverage
Human-led audit (external firm) $200K–$500K 3–6 months Selective (scope-defined)
Traditional fuzzing (OSS-Fuzz, etc.) $10K–$50K (infra) Continuous Shallow (crash-oriented)
Anthropic Mythos (prior FFmpeg work) ~$10,000 Days Targeted (16-yr-old H.264 flaw)
Depthfirst AI agent ~$1,000 Hours Full codebase (reasoning-based)

(Source: CSA Labs Research Note)

The depthfirst agent is not just cheaper than prior AI approaches — it’s roughly 10x cheaper than Anthropic’s Mythos model, which needed roughly $10,000 for comparable FFmpeg analysis. The difference is that depthfirst built a purpose-built agent that reasons about code structure and data flow, rather than a general-purpose model applied to security.


The Bigger Picture: AI Vulnerability Discovery Is Accelerating Fast

Depthfirst’s FFmpeg findings are the latest in a cascade of AI-driven vulnerability discoveries that have defined 2026:

  • Anthropic Mythos (May 2026): Independently discovered and chained zero-day vulnerabilities across every major operating system and browser, demonstrating autonomous exploitation at a level previously attributed to nation-state actors. (Source: The Agent Report — Anthropic Mythos Autonomous Exploitation)

  • Project Glasswing (Anthropic, Q1 2026): Scanned 1,000+ open-source projects, flagged 23,019 potential vulnerabilities, 90.6% confirmed real, 6,202 rated high or critical. Partners include AWS, Apple, Broadcom, Cisco, Google, Microsoft, and NVIDIA. As of late May 2026, only 97 of 1,596 disclosed Glasswing vulnerabilities had been patched — a 6% patch rate. (Source: Anthropic — Project Glasswing)

  • AI on Redis (2026): An autonomous AI tool discovered an authenticated RCE in Redis that had been present since v7.2.0 — unnoticed for over two years.

  • AI on Linux Kernel (Feb 2026 study): AI agents reproduced working PoCs for over 50% of 100 real N-day bugs in the Linux kernel, beating traditional fuzzing approaches.

The trend is unmistakable: finding bugs has become cheap and automated. The bottleneck has moved downstream.


The Patch Pipeline Crisis

Here is the uncomfortable truth that depthfirst’s demonstration exposes. As of late May 2026, only 6% of the vulnerabilities disclosed through Project Glasswing had been patched. The other 94% remain open, with known exploits, in widely-deployed open-source software.

Meanwhile, Chrome 149 — released the same week as the depthfirst findings — patched a record 429 vulnerabilities, the largest single-release patch count in Chrome history. Over 100 were critical or high severity. The most severe, CVE-2026-10881 (CVSS 9.6), is an out-of-bounds read/write in the ANGLE graphics engine that enables sandbox escape and arbitrary code execution.

Google paid $97,000 for that single bug. But the bigger story is that 19 of the 22 critical bugs in Chrome 149 were found by Google’s own internal AI tools — not external researchers. (Source: The Hacker News — Chrome 149 Patches Record 429 Bugs)

The economics have flipped:

  • Discovery: Near-zero marginal cost. One agent, one codebase, $1,000.
  • Triage: Still manual. Someone must verify each finding, assess impact, and prioritize.
  • Fix: Still manual. Someone must write the patch, test it, review it, and ship it.
  • Deployment: Still manual. Someone must update every instance, including embedded copies 3–4 layers deep in dependency chains.

As the CSA Labs research note puts it: “The asymmetry between discovery economics and remediation economics is not merely an operational challenge — it represents a structural advantage for offense.”


What This Means for AI Agent Deployments

For organizations building or deploying AI agents, the FFmpeg findings have direct implications:

  1. Assume every library is vulnerable. If an agent touches media processing (video, audio, streaming), FFmpeg is likely in the dependency chain — often multiple times, at different version levels. SBOM audits are no longer optional.

  2. Network-reachable RCE in media pipelines is now a primary threat. The AV1 RTP depacketizer flaw means that any agent consuming RTSP streams from untrusted sources is potentially exploitable. Sandbox media processing in isolated environments with network egress controls.

  3. The AI security agent market just got validated. Depthfirst’s demonstration is a product pitch as much as a research paper. Expect every security vendor to announce an “AI agent for vulnerability discovery” in the coming weeks. The question buyers should ask: can your agent reason about code, or is it just a fuzzer with a language model wrapper?

  4. Patch cycles must accelerate. The industry’s current mean time to remediate (MTTR) for critical vulnerabilities is 38 days (per Synack’s 2026 report). With AI agents finding bugs faster than humans can patch them, that number needs to come down to days, not weeks.


FAQ

Q: Are the FFmpeg vulnerabilities already being exploited in the wild? A: There are no confirmed reports of active exploitation as of June 8, 2026. However, the PoC is public in depthfirst’s GitHub repository, making weaponization straightforward.

Q: Does this mean human security researchers are obsolete? A: No — it means their role is shifting. Discovery is becoming automated, but triage, patch writing, and strategic security architecture still require human judgment. The best outcomes will come from human-AI collaboration.

Q: Should I patch FFmpeg immediately? A: Yes. Pull the latest upstream fixed build or distribution security update. Prioritize deployments accepting RTSP/RTP from untrusted sources — the AV1 depacketizer flaw is network-reachable with a single packet.

Q: How does this compare to the Anthropic Mythos demonstration? A: Mythos showed an agent capable of chaining multiple zero-days across different systems for autonomous exploitation. Depthfirst’s agent found many vulnerabilities in a single codebase. The two are complementary: one is breadth-and-chain, the other is deep-scan.

Q: Is the $1,000 cost reproducible? A: Depthfirst’s figure is for cloud compute costs for a single run of their agent against FFmpeg. Results will vary by codebase size, complexity, and the agent’s configuration. But the order-of-magnitude cost reduction relative to human audits is structural, not anecdotal.


Further Reading