<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://the-agent-report.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://the-agent-report.com/" rel="alternate" type="text/html" /><updated>2026-05-28T15:12:57+00:00</updated><id>https://the-agent-report.com/feed.xml</id><title type="html">The Agent Report</title><subtitle>The Agent Report is a curated magazine covering the latest developments in AI agents, agentic frameworks, tool use, autonomous systems, and the future of human-AI collaboration.
</subtitle><entry><title type="html">Meta MTIA: Four Custom AI Chips in Two Years — How Meta Is Powering Llama at Global Scale</title><link href="https://the-agent-report.com/2026/05/meta-mtia-four-chips-two-years-llama-infrastructure/" rel="alternate" type="text/html" title="Meta MTIA: Four Custom AI Chips in Two Years — How Meta Is Powering Llama at Global Scale" /><published>2026-05-28T14:00:00+00:00</published><updated>2026-05-28T14:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/meta-mtia-four-chips-two-years-llama-infrastructure</id><content type="html" xml:base="https://the-agent-report.com/2026/05/meta-mtia-four-chips-two-years-llama-infrastructure/"><![CDATA[<p><strong>March 11, 2026</strong> — Meta published a detailed technical overview of its Meta Training and Inference Accelerator (MTIA) family, revealing four successive chip generations — MTIA 300, 400, 450, and 500 — designed and deployed in rapid succession over roughly two years. The chips form the hardware backbone powering Llama inference, Muse Spark deployments, and the ranking and recommendation systems that drive billions of daily interactions across WhatsApp, Instagram, Facebook, and Messenger.</p>

<p>For the open-source AI community, the MTIA story matters because it directly shapes what Meta can deliver with Llama. Custom silicon designed in tight iteration loops with the model team means Meta can optimize the hardware-software stack end-to-end — and the fruits of that investment are now being pressed into service for GenAI workloads at unprecedented scale.</p>

<hr />

<h2 id="why-custom-silicon-matters-for-llama-and-genai">Why Custom Silicon Matters for Llama and GenAI</h2>

<p>Every day, billions of people across Meta’s platforms use AI-powered features — personalized recommendations, real-time translation, AI assistants, content moderation, and more. Serving this workload at the lowest possible cost requires purpose-built hardware. Off-the-shelf GPUs, while powerful, carry overhead in memory bandwidth, interconnect topology, and instruction set generality that Meta cannot afford at its scale.</p>

<p>Meta’s response is MTIA: a family of custom ASICs developed in close partnership with Broadcom. The chip family began with two earlier generations (MTIA 100 and MTIA 200, detailed at ISCA’23 and ISCA’25) that were initially optimized for ranking and recommendation (R&amp;R) inference — the dominant AI workload before GenAI took off.</p>

<p>Then GenAI happened. And Meta pivoted hard.</p>

<hr />

<h2 id="the-four-generations-of-mtia">The Four Generations of MTIA</h2>

<h3 id="mtia-300--the-foundation">MTIA 300 — The Foundation</h3>

<p>MTIA 300 was designed primarily for R&amp;R training workloads. Key innovations include:</p>

<ul>
  <li><strong>Built-in NIC chiplets</strong> for low-latency communication</li>
  <li><strong>Dedicated message engines</strong> for offloading communication collectives</li>
  <li><strong>Near-memory compute</strong> for reduction-based collectives</li>
</ul>

<p>While optimized for R&amp;R, these building blocks — low-latency, high-bandwidth communication components — proved foundational for GenAI inference in subsequent generations. MTIA 300 is in production today for R&amp;R training.</p>

<h3 id="mtia-400--the-genai-pivot">MTIA 400 — The GenAI Pivot</h3>

<p>As the GenAI wave surged, Meta evolved MTIA 300 into the MTIA 400, rebalancing the design to better support GenAI models while retaining R&amp;R capability. The chip features a <strong>72-accelerator scale-up domain</strong> and delivers performance competitive with leading commercial products. MTIA 400 has completed lab testing and is on the path to data-center deployment.</p>

<h3 id="mtia-450--inference-optimized">MTIA 450 — Inference-Optimized</h3>

<p>Anticipating massive GenAI inference demand, MTIA 400 transitioned into MTIA 450 with specific optimizations for inference workloads. The standout improvement: <strong>HBM bandwidth was doubled</strong> from MTIA 400, making it significantly higher than existing commercial alternatives. Meta also introduced <strong>low-precision data types co-designed for inference</strong> workloads. MTIA 450 is scheduled for mass deployment in early 2027.</p>

<h3 id="mtia-500--the-flagship">MTIA 500 — The Flagship</h3>

<p>Continuing the GenAI inference focus, MTIA 500 increases HBM bandwidth by an additional 50% over MTIA 450 and introduces further innovations in low-precision data types. Scheduled for mass deployment in 2027, it represents the culmination of two years of relentless iteration.</p>

<hr />

<h2 id="the-numbers-25-compute-growth-in-two-years">The Numbers: 25× Compute Growth in Two Years</h2>

<p>The raw specs tell the story of a team executing at remarkable velocity:</p>

<table>
  <thead>
    <tr>
      <th>Metric</th>
      <th>MTIA 300 → MTIA 500 Improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>HBM Bandwidth</td>
      <td><strong>4.5× increase</strong></td>
    </tr>
    <tr>
      <td>Compute FLOPS</td>
      <td><strong>25× increase</strong> (MX8 → MX4 precision)</td>
    </tr>
    <tr>
      <td>Generations</td>
      <td>4 in under 2 years</td>
    </tr>
  </tbody>
</table>

<p>This rapid advancement is the result of a deliberate <strong>iterative strategy</strong>. Rather than betting on a single long-cycle design, Meta builds each generation on the last using modular chiplets, incorporating the latest AI workload insights and hardware technologies on a shorter cadence. As Meta’s blog post explains: <em>“Chip designs are based on projected workloads, but by the time the hardware reaches production — often two years later — those workloads may have shifted substantially.”</em> The solution is to shorten the loop.</p>

<hr />

<h2 id="from-rr-to-genai-why-the-pivot-matters-for-the-open-source-ecosystem">From R&amp;R to GenAI: Why the Pivot Matters for the Open-Source Ecosystem</h2>

<p>The MTIA journey is a case study in how quickly an organization can realign its hardware roadmap around a paradigm shift. In 2023, Meta’s dominant AI workload was ranking and recommendation — the systems that decide what you see in your feed. By 2025, GenAI had become the primary focus, with Llama and Muse Spark driving demand for inference compute at previously unimaginable scales.</p>

<p>For developers building on Llama, the implications are significant:</p>

<ol>
  <li>
    <p><strong>Lower inference costs</strong>: Custom silicon tailored to Llama’s architecture means Meta can offer Llama API pricing that undercuts general-purpose cloud providers. As MTIA 450 and 500 come online in 2027, margins improve further.</p>
  </li>
  <li>
    <p><strong>Tighter model-hardware co-design</strong>: When the chip team and the model team work from the same playbook, the entire stack is more efficient. Meta has confirmed it tested MTIA with <strong>Llama LLMs</strong> during development, a feedback loop that benefits both sides.</p>
  </li>
  <li>
    <p><strong>Strategic independence</strong>: By owning its silicon roadmap, Meta reduces dependence on NVIDIA and other GPU vendors — a critical factor as global AI chip supply remains constrained. Hundreds of thousands of MTIA chips are already deployed in production.</p>
  </li>
</ol>

<hr />

<h2 id="the-bigger-picture-metas-ai-infrastructure-bet">The Bigger Picture: Meta’s AI Infrastructure Bet</h2>

<p>The MTIA program is part of a broader infrastructure strategy that includes massive data-center buildouts and a commitment to a <strong>diverse silicon portfolio</strong>. Meta has stated it will continue to leverage the best solutions available — both internally and externally — but MTIA is increasingly central to its plans.</p>

<p>This matters because Meta’s investment in custom silicon directly affects the open-source Llama ecosystem. Every efficiency gain in the inference stack makes it cheaper and more sustainable for Meta to run Llama-based services — and by extension, to justify continued investment in the model family.</p>

<p>The MTIA roadmap also signals something about Meta’s long-term intentions: the company is not outsourcing its AI future to chipmakers. By building its own accelerators, Meta retains control over the hardware-software interface — and that control translates into faster iteration, lower costs, and a competitive moat that grows deeper with each new chip generation.</p>

<hr />

<h2 id="whats-next">What’s Next</h2>

<p>With MTIA 450 and 500 on the horizon for 2027, and MTIA 400 entering deployment, Meta’s hardware story is only accelerating. The company has demonstrated that it can move from design to deployment faster than traditional chip-development cycles — a capability that will become increasingly valuable as AI models continue to evolve at breakneck speed.</p>

<p>For the open-source AI community, the takeaway is clear: Meta is building the infrastructure to run Llama and its successors at a scale that few organizations can match. Whether you access those models through the Llama API, run them on your own hardware, or fine-tune them for specific tasks, the economics of inference — and therefore the viability of open-source AI — will be shaped in part by what Meta achieves with MTIA in 2026 and 2027.</p>

<hr />

<p><em>This article was researched from Meta’s official blog post “Four MTIA Chips in Two Years: Scaling AI Experiences for Billions” (March 11, 2026), the ISCA’23 and ISCA’25 papers on MTIA architecture, and Meta’s published infrastructure strategy documents. All information is current as of May 28, 2026.</em></p>]]></content><author><name>The Agent Report</name></author><category term="research" /><category term="meta" /><category term="llama" /><category term="mtia" /><category term="ai-hardware" /><category term="inference" /><category term="custom-silicon" /><summary type="html"><![CDATA[Meta has unveiled four generations of its custom MTIA AI chips in under two years — MTIA 300, 400, 450, and 500 — purpose-built to run Llama, Muse Spark, and Meta's entire GenAI stack at planetary scale. With 25× compute growth and 4.5× bandwidth improvement across the family, the MTIA program is the infrastructure bedrock behind Meta's AI ambitions.]]></summary></entry><entry><title type="html">BadHost: The Starlette Vulnerability That Exposed Millions of AI Agents and MCP Servers</title><link href="https://the-agent-report.com/2026/05/badhost-starlette-cve-critical-ai-agent-vulnerability/" rel="alternate" type="text/html" title="BadHost: The Starlette Vulnerability That Exposed Millions of AI Agents and MCP Servers" /><published>2026-05-28T10:00:00+00:00</published><updated>2026-05-28T10:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/badhost-starlette-cve-critical-ai-agent-vulnerability</id><content type="html" xml:base="https://the-agent-report.com/2026/05/badhost-starlette-cve-critical-ai-agent-vulnerability/"><![CDATA[<h1 id="badhost-the-starlette-vulnerability-that-exposed-millions-of-ai-agents-and-mcp-servers">BadHost: The Starlette Vulnerability That Exposed Millions of AI Agents and MCP Servers</h1>

<p><strong>May 28, 2026</strong> — A critical authentication bypass vulnerability in Starlette, the Python ASGI framework that underpins much of the AI infrastructure ecosystem, has put millions of AI agents and MCP (Model Context Protocol) servers at risk of data theft, credential exposure, and remote code execution.</p>

<p>Tracked as <strong><a href="https://osv.dev/vulnerability/DEBIAN-CVE-2026-48710">CVE-2026-48710</a></strong> and nicknamed <strong>BadHost</strong>, the vulnerability allows attackers to bypass path-based authentication middleware with a single malformed HTTP Host header character. The flaw affects all Starlette versions prior to <strong>1.0.1</strong>, which was released on Friday.</p>

<p>“Millions of AI agents and tools around the world have been imperiled by a critical vulnerability that can allow hackers to breach the servers running them and make off with sensitive data and credentials,” Ars Technica’s Dan Goodin <a href="https://arstechnica.com/information-technology/2026/05/millions-of-ai-agents-imperiled-by-critical-vulnerability-in-open-source-package/">reported</a>.</p>

<h2 id="how-badhost-works">How BadHost Works</h2>

<p>Starlette reconstructs <code class="language-plaintext highlighter-rouge">request.url</code> by concatenating the HTTP Host header with the request path — without validating the Host value against RFC 9112 or RFC 3986 grammar. An attacker can send a crafted header like <code class="language-plaintext highlighter-rouge">Host: example.com/health?x=</code> that shifts path and query boundaries during re-parsing, making <code class="language-plaintext highlighter-rouge">request.url.path</code> point to a different endpoint than the one the ASGI server actually routed to.</p>

<p>The result: the router dispatches on the real wire path (e.g., <code class="language-plaintext highlighter-rouge">/admin</code>), but middleware sees the poisoned re-parsed path (e.g., <code class="language-plaintext highlighter-rouge">/health</code>). Any path-based security decision made in middleware can be bypassed.</p>

<p>X41 D-Sec, the security firm that discovered the bug during an audit sponsored by OSTIF, described it in stark terms:</p>

<blockquote>
  <p><em>“A single character injected into the HTTP Host header bypasses path-based authorization in Starlette, the routing core of FastAPI.”</em></p>
</blockquote>

<h2 id="the-scope-millions-of-affected-systems">The Scope: Millions of Affected Systems</h2>

<p>Starlette receives <strong>325 million downloads per week</strong> and is the foundation of FastAPI — the most popular Python web framework for AI applications. The downstream impact is staggering:</p>

<ul>
  <li><strong>vLLM</strong> — where the bug was originally discovered — the leading open-source LLM inference server</li>
  <li><strong>LiteLLM</strong> — the widely-used LLM proxy that sits in front of dozens of model providers</li>
  <li><strong>MCP servers</strong> — the Model Context Protocol infrastructure that connects AI agents to external tools, databases, and APIs</li>
  <li><strong>Agent harnesses</strong> and <strong>eval dashboards</strong></li>
  <li><strong>Google ADK-Python</strong> and <strong>Ray Serve</strong></li>
  <li><strong>BentoML</strong> and other ML serving platforms</li>
</ul>

<p>MCP servers are particularly exposed because the MCP specification mandates <strong>unauthenticated OAuth discovery endpoints</strong>, providing attackers with a reliable path to find and exploit vulnerable instances. These servers store credentials for databases, email accounts, cloud services, and internal tools — making them exceptionally valuable targets.</p>

<h3 id="data-types-exposed-by-scans">Data Types Exposed by Scans</h3>

<p>X41 D-Sec’s internet-wide scan revealed a disturbing range of exposed data across vulnerable systems:</p>

<table>
  <thead>
    <tr>
      <th>Sector</th>
      <th>Exposed Data</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Biopharma AI</td>
      <td>Clinical trial databases, M&amp;A data</td>
    </tr>
    <tr>
      <td>Identity Verification</td>
      <td>Face analysis, KYB, live PII, internal codebases</td>
    </tr>
    <tr>
      <td>IoT/Industrial</td>
      <td>SSH access to devices, remote code execution</td>
    </tr>
    <tr>
      <td>Email/SaaS</td>
      <td>Full mailbox access (read/send/delete), S3 exports</td>
    </tr>
    <tr>
      <td>HR/Recruitment</td>
      <td>Candidate PII, hiring pipeline data</td>
    </tr>
    <tr>
      <td>Cloud Monitoring</td>
      <td>AWS topology, metric queries</td>
    </tr>
    <tr>
      <td>Cybersecurity</td>
      <td>Asset inventory, live scanner access</td>
    </tr>
  </tbody>
</table>

<h2 id="why-this-matters-for-ai-agents">Why This Matters for AI Agents</h2>

<p>The BadHost vulnerability is emblematic of a structural risk in the AI agent ecosystem: <strong>trust at the wrong layer</strong>. Starlette, FastAPI, vLLM, and LiteLLM form the backbone of most Python-based AI infrastructure, yet the interaction between ASGI server behavior, framework URL construction, and middleware auth decisions created a vulnerability that no single component could fix alone.</p>

<p>As OSTIF noted in their disclosure: <em>“This bug is a classic ‘responsibility gap’ where if this maintainer didn’t patch, thousands of exposed projects would have to individually secure their projects.”</em></p>

<p>The vulnerability also highlights a limitation of current AI-powered security tools. The researchers noted that even Claude Mythos (Anthropic’s code-scanning agent) did not find CVE-2026-48710 during Project Glasswing, because the bug spans three independent layers — each behaving correctly in isolation — rather than existing in a single codebase.</p>

<h2 id="mitigation-and-response">Mitigation and Response</h2>

<p><strong>The fix</strong>: Upgrade Starlette to version <strong>1.0.1</strong> or later. The patched version rejects Host headers containing invalid characters instead of using them for URL construction.</p>

<p><strong>For those who cannot upgrade immediately</strong>:</p>

<ol>
  <li><strong>Replace <code class="language-plaintext highlighter-rouge">request.url.path</code> with <code class="language-plaintext highlighter-rouge">request.scope["path"]</code></strong> in every middleware, dependency, and decorator that makes security decisions</li>
  <li><strong>Deploy an RFC-compliant reverse proxy</strong> (nginx, Caddy, Traefik, HAProxy) that validates Host headers before forwarding to ASGI servers</li>
  <li><strong>Audit bundled and vendored Starlette</strong> — container images, virtualenvs, and pip-installed dependencies may pin vulnerable versions</li>
</ol>

<p>A free online scanner is available at <strong><a href="https://badhost.org/">badhost.org</a></strong> — developed jointly by X41 D-Sec, Persistent Security Industries, and Bintech — to check if any reachable endpoint is vulnerable. The open-source repository also includes PoC exploits, Semgrep rules for static detection, and CodeQL queries for large-scale scanning.</p>

<h2 id="the-takeaway">The Takeaway</h2>

<p>For developers building on AI agent infrastructure, BadHost is a wake-up call. The Python AI tooling ecosystem has grown so fast that foundational security assumptions at the framework layer have gone unexamined. Every team running FastAPI-based MCP servers, LLM proxies, or agent harnesses should treat this as a <strong>critical priority</strong> — scan their infrastructure, patch Starlette, and audit middleware for path-based auth patterns.</p>

<p>The vulnerability may have been disclosed, but the real impact depends on how quickly the ecosystem patches. With 325 million weekly downloads and MCP servers holding credentials to production systems, the window for exploitation is wide open.</p>

<p><em>Sources: <a href="https://arstechnica.com/information-technology/2026/05/millions-of-ai-agents-imperiled-by-critical-vulnerability-in-open-source-package/">Ars Technica — Millions of AI agents imperiled by critical vulnerability</a> | <a href="https://ostif.org/disclosing-the-badhost-vulnerability-in-starlette/">OSTIF — Disclosing the BADHOST Vulnerability</a> | <a href="https://badhost.org/">badhost.org — Scanner &amp; Details</a> | <a href="https://osv.dev/vulnerability/DEBIAN-CVE-2026-48710">OSV — CVE-2026-48710</a></em></p>]]></content><author><name>The Agent Report</name></author><category term="tools-frameworks" /><category term="starlette" /><category term="fastapi" /><category term="security" /><category term="vulnerability" /><category term="mcp" /><category term="badhost" /><category term="cve" /><category term="ai-infrastructure" /><category term="agent-security" /><summary type="html"><![CDATA[A critical vulnerability in Starlette — the Python framework powering FastAPI, vLLM, and most MCP servers — lets attackers bypass authentication with a single malformed HTTP header, exposing millions of AI agents to data theft and remote code execution.]]></summary></entry><entry><title type="html">Openclaw v2026.5.26 Makes Transcripts Core, Ships Faster Gateway and Production-Ready Channels</title><link href="https://the-agent-report.com/2026/05/openclaw-v2026-5-26-transcripts-core-gateway-performance/" rel="alternate" type="text/html" title="Openclaw v2026.5.26 Makes Transcripts Core, Ships Faster Gateway and Production-Ready Channels" /><published>2026-05-28T10:00:00+00:00</published><updated>2026-05-28T10:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/openclaw-v2026-5-26-transcripts-core-gateway-performance</id><content type="html" xml:base="https://the-agent-report.com/2026/05/openclaw-v2026-5-26-transcripts-core-gateway-performance/"><![CDATA[<p>Just two days after the <a href="/2026/05/openclaw-v2026-5-22-4100x-model-listing-meeting-notes/">v2026.5.22 release with its 4,100× model-listing optimization</a>, Openclaw is back with <strong>v2026.5.26</strong> — a stable release that makes transcripts a first-class core capability, delivers substantial gateway performance improvements, and brings Telegram, iMessage, WhatsApp, and Discord to genuine production-readiness.</p>

<p>With <strong>375,000+ GitHub stars</strong>, <strong>78,200+ forks</strong>, and <strong>61 named contributors</strong> in the release changelog alone, the project continues to consolidate its position as the leading open-source claw controller for AI agents.</p>

<h2 id="transcripts-go-core">Transcripts Go Core</h2>

<p>The defining architectural change in v2026.5.26 is the elevation of <strong>transcripts</strong> from a plugin-level concern to a core system capability. Every agent interaction — whether initiated via CLI, WebChat, media upload, follow-up, hook, or Codex mirror — now flows through a unified transcript pipeline.</p>

<h3 id="what-this-means">What This Means</h3>

<ul>
  <li><strong>Transcript-backed meeting summaries</strong> — Agent conversations are captured with full source-provider metadata, cleaned user turns, and media provenance, enabling accurate post-hoc summaries</li>
  <li><strong>Codex mirror transcripts</strong> — Codex app-server interactions are mirrored into the same transcript store, giving operators a single pane of glass across all agent activity</li>
  <li><strong>CLI/TUI replay</strong> — Transcripts support deterministic replay with hooks, making debugging and auditing dramatically simpler</li>
  <li><strong>Media provenance</strong> — Every image, file, and generated asset is tracked with its origin context in the transcript record</li>
</ul>

<p>The transcript capture happens at the gateway level, not the plugin level, which means <strong>every</strong> conversation path is covered — including system events, hook-generated turns, and fallback routing. This is a foundational change that lays the groundwork for compliance, audit, and training-data pipelines.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Access transcripts via the new CLI surface</span>
openclaw transcript list
openclaw transcript view &lt;session-id&gt;
</code></pre></div></div>

<h2 id="gateway-performance-less-rediscovery-faster-replies">Gateway Performance: Less Rediscovery, Faster Replies</h2>

<p>The v2026.5.26 release targets one of the most pervasive sources of latency in agent gateways: <strong>repeated rediscovery of the same information</strong>. The team audited every hot path in the gateway startup and reply pipeline, adding smart caching where it matters most.</p>

<h3 id="key-optimizations">Key Optimizations</h3>

<table>
  <thead>
    <tr>
      <th>Area</th>
      <th>Optimization</th>
      <th>Impact</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Plugin metadata</strong></td>
      <td>Plugin metadata snapshots are cached for the process lifetime</td>
      <td>Reply-time skill setup no longer rescans plugin metadata on every turn</td>
    </tr>
    <tr>
      <td><strong>Startup warnings</strong></td>
      <td>Startup-warning metadata is cached and reused</td>
      <td>Gateway startup avoids repeated filesystem scans</td>
    </tr>
    <tr>
      <td><strong>Auth stores</strong></td>
      <td>Auth env snapshots are prepared once and reused</td>
      <td>No repeated credential resolution on every request</td>
    </tr>
    <tr>
      <td><strong>Model cost indexes</strong></td>
      <td>Model pricing metadata is cached</td>
      <td>Usage-cost tracking is near-instant</td>
    </tr>
    <tr>
      <td><strong>Channel resolution</strong></td>
      <td>Channel routing is cached per session</td>
      <td>No repeated dispatch table rebuilds</td>
    </tr>
    <tr>
      <td><strong>Session caches</strong></td>
      <td>Session read paths avoid cloning</td>
      <td>Lower memory pressure under load</td>
    </tr>
  </tbody>
</table>

<p>The most visible impact is on <strong>visible reply delivery latency</strong>. Telegram typing/progress context is preserved, slash-command startup metadata is lazy-loaded, model hydration on hot paths is avoided, Codex profiler timing is flag-gated, and context compaction maintenance is deferred until after the user-facing reply is sent. The net effect: <strong>users see responses faster</strong>, even as the gateway continues processing background work.</p>

<h2 id="four-channels-reach-production-readiness">Four Channels Reach Production Readiness</h2>

<h3 id="telegram">Telegram</h3>
<p>Telegram receives the most extensive channel update in this release. Inbound text entities are preserved, overlapping DM replies are handled correctly, account-scoped topic caches keep forum context, outbound replies carry proper context, targeted bot-command mentions work reliably, durable group retry targets are maintained, and native progress callbacks keep users informed during long-running operations.</p>

<h3 id="imessage">iMessage</h3>
<p>iMessage now handles attachment roots correctly — images saved under <code class="language-plaintext highlighter-rouge">~/Library/Messages/Attachments</code> are read through the existing inbound path policy. Duplicate local Messages-source accounts are deduplicated at startup, direct DM history is seeded reliably, and image/group media attachment commands work as expected. The development team also addressed the long-standing issue where <code class="language-plaintext highlighter-rouge">channels.imessage.accounts</code> listing both <code class="language-plaintext highlighter-rouge">default</code> and a named account would spawn duplicate watchers.</p>

<h3 id="whatsapp">WhatsApp</h3>
<p>WhatsApp regains proper group/media behavior with restored ack identity and group-drop warnings. The update also fixes media path resolution when <code class="language-plaintext highlighter-rouge">OPENCLAW_HOME</code> differs from the OS home directory.</p>

<h3 id="discord">Discord</h3>
<p>Discord voice playback reliability is significantly improved. Large model picker menus are now bucketed alphabetically when the provider list exceeds 25 items. Media captions are merged into a single message, gateway metadata is routed through the configured proxy, numeric channel IDs work for outbound sends, self-reply echoes are suppressed, and wake-name matching is tightened without breaking fuzzy wake phrases.</p>

<h2 id="voice-and-talk-full-realtime-control">Voice and Talk: Full Realtime Control</h2>

<p>The voice subsystem receives a major architectural upgrade in v2026.5.26. The team extracted a <strong>shared realtime voice SDK</strong> that provides common primitives for turn-context tracking, output activity monitoring, consult question matching, speakable-result extraction, and alias-aware forced-consult coordination. This SDK is then reused across Discord, browser voice, Google Meet, and all other voice surfaces.</p>

<p>Key capabilities now available:</p>

<ul>
  <li><strong>Realtime Talk runs</strong> can be inspected, steered, cancelled, or followed up from both the Web UI and Discord voice</li>
  <li><strong>Wake-name handling</strong> is more tolerant of ambient noise without letting ambient speech falsely trigger agents</li>
  <li><strong>iOS Talk mode</strong> now features direct realtime voice sessions, a compact toolbar status indicator, and responsive voice waveform feedback</li>
  <li><strong>Android</strong> gains the pair-new-gateway action with improved offline voice recovery</li>
  <li><strong>Google Meet</strong> command bridges reuse the shared output activity tracking for local barge-in detection</li>
</ul>

<h2 id="safer-content-boundaries">Safer Content Boundaries</h2>

<p>Security hardening continues with several important improvements:</p>

<ul>
  <li><strong>Browser snapshot reads</strong> now honor SSRF policy before ChromeMCP or direct CDP reads</li>
  <li><strong>System-event text</strong> cannot spoof nested prompt markers — untrusted plugin/channel labels are sanitized before they reach the prompt</li>
  <li><strong>Fetched file text</strong> is wrapped as external content with metadata boundaries</li>
  <li><strong>ClickClack</strong> inbound sender allowlists are applied before agent dispatch</li>
  <li><strong>Stale device tokens</strong> are rejected during rotation</li>
  <li><strong>Serialized tool-call text</strong> is scrubbed from visible replies</li>
</ul>

<p>The team also enabled the <strong>default auth rate limiter</strong> for remote non-browser HTTP gateway auth failures when <code class="language-plaintext highlighter-rouge">gateway.auth.rateLimit</code> is unset, while preserving the loopback exemption for local development.</p>

<h2 id="providers-codex-and-local-models">Providers, Codex, and Local Models</h2>

<p>The provider layer sees steady improvements across the board:</p>

<ul>
  <li><strong>Named auth profiles</strong> allow multiple login configurations per provider, with migration support for Hermes, OpenCode, and Codex auth profiles</li>
  <li><strong>OpenAI sampling params</strong> are now forwarded through the gateway</li>
  <li><strong>Codex app-server resume/timeout/usage-limit recovery</strong> is hardened — Codex turn timeouts stay inside the Codex runtime boundary so they don’t poison shared app-server clients</li>
  <li><strong>xAI usage limits</strong> are surfaced in status output</li>
  <li><strong>Ollama</strong> receives top-p normalization to ensure consistent generation behavior</li>
  <li><strong>Local approval resolution</strong> is fixed for plugin command paths</li>
  <li><strong>Memory/local embeddings</strong> now run GGUF embeddings in an isolated worker sidecar — if the native embedding process crashes, the gateway degrades gracefully to keyword search instead of taking down the entire system</li>
</ul>

<h2 id="observability-gets-richer">Observability Gets Richer</h2>

<p>v2026.5.26 introduces several observability improvements that make it easier to understand what the gateway is doing:</p>

<ul>
  <li><strong>Activity tab</strong> — A new ephemeral tab in the Control UI shows sanitized live tool activity summaries without persisting raw telemetry</li>
  <li><strong>Gateway secret-prep traces</strong> — The diagnostics pipeline now traces secret preparation, making it easier to debug auth failures</li>
  <li><strong>Model stream progress</strong> — Users can see streaming progress for model responses</li>
  <li><strong>Explicit fast-mode status</strong> — The TUI now shows when fast-mode is active</li>
  <li><strong>OpenTelemetry LLM spans</strong> — Content spans are now emitted through the OTLP exporter, giving operators full visibility into model interactions</li>
  <li><strong>Alertable telemetry</strong> — Blocked tools, model failover, stale sessions, liveness warnings, oversized payloads, and webhook ingress all generate actionable signals</li>
</ul>

<h2 id="the-big-picture">The Big Picture</h2>

<p>Openclaw v2026.5.26 is a <strong>consolidation release</strong> — it takes capabilities that were scattered across plugins, channels, and undocumented code paths and pulls them into a coherent, performant, observable core. Transcripts as a first-class feature, a faster gateway through smarter caching, and four channels reaching genuine production readiness represent meaningful progress toward the project’s vision of being the universal control plane for AI agents.</p>

<p>The release follows a familiar pattern: a feature-packed stable release (<a href="/2026/05/openclaw-v2026-5-22-4100x-model-listing-meeting-notes/">v2026.5.22</a>) is followed by a hardening release that fixes edge cases, shores up performance, and closes security gaps. With v2026.5.27-beta.1 already published today (May 28) — bringing Pixverse video generation, enhanced security boundaries, and more reliable Codex runs — the release cadence remains relentless.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>npm <span class="nb">install</span> <span class="nt">-g</span> openclaw
</code></pre></div></div>

<hr />

<p><em>Openclaw v2026.5.26 is available now via <code class="language-plaintext highlighter-rouge">npm install -g openclaw</code>. Full release notes on <a href="https://github.com/openclaw/openclaw/releases/tag/v2026.5.26">GitHub</a>.</em></p>]]></content><author><name>The Agent Report</name></author><category term="openclaw" /><category term="openclaw" /><category term="claw-controller" /><category term="agent-autonomy" /><category term="transcripts" /><category term="gateway-performance" /><summary type="html"><![CDATA[Openclaw v2026.5.26 elevates transcripts from a plugin-level concern to a core system capability, ships a substantially faster gateway with smarter caching, brings four major channels to production-grade stability, and delivers richer realtime voice control. With 375,000+ GitHub stars, the project shows no signs of slowing.]]></summary></entry><entry><title type="html">DuckDuckGo Surges 28% as Users Flee Google’s AI Mode — The Great Search Rebellion?</title><link href="https://the-agent-report.com/2026/05/duckduckgo-surge-google-ai-mode-backlash/" rel="alternate" type="text/html" title="DuckDuckGo Surges 28% as Users Flee Google’s AI Mode — The Great Search Rebellion?" /><published>2026-05-28T08:00:00+00:00</published><updated>2026-05-28T08:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/duckduckgo-surge-google-ai-mode-backlash</id><content type="html" xml:base="https://the-agent-report.com/2026/05/duckduckgo-surge-google-ai-mode-backlash/"><![CDATA[<h1 id="duckduckgo-surges-28-as-users-flee-googles-ai-mode--the-great-search-rebellion">DuckDuckGo Surges 28% as Users Flee Google’s AI Mode — The Great Search Rebellion?</h1>

<p><strong>May 28, 2026</strong> — When Google CEO Sundar Pichai told investors earlier this month that users “love” the company’s new AI Mode in Search, he may have unintentionally triggered the most visible user exodus in the search market’s recent history.</p>

<p>According to data reported by <a href="https://www.pcgamer.com/hardware/duckduckgos-ai-free-search-saw-nearly-28-percent-more-visits-in-the-week-following-googles-insistence-that-people-love-ai-mode/">PC Gamer</a>, <strong>DuckDuckGo saw nearly 28% more visits in the week immediately following Google’s insistence that people love AI Mode</strong> — a signal that a significant portion of the search-using public may be looking for an alternative to the AI-first future being pushed by the major tech platforms.</p>

<h2 id="the-backlash-is-real">The Backlash Is Real</h2>

<p>The numbers are hard to ignore. DuckDuckGo’s chief communications officer Kamyl Bazbaz confirmed the surge, noting that while DuckDuckGo’s own AI overviews remain popular, so does the option to <strong>filter out AI-generated images from search results</strong>.</p>

<p>“People just want a choice,” Bazbaz told PC Gamer. “Amen to that,” the publication’s reporter added — a sentiment that appears to resonate with a growing number of search users.</p>

<p>The story gained explosive traction on Hacker News, where it garnered over <strong>835 points</strong> and 390 comments — making it one of the most-discussed stories of the day.</p>

<h2 id="what-is-google-ai-mode">What Is Google AI Mode?</h2>

<p>Google’s AI Mode, launched earlier this year, represents the company’s most aggressive push yet into AI-generated search results. Instead of displaying a traditional list of links, AI Mode generates comprehensive, conversational answers powered by Google’s Gemini models — complete with citations, follow-up suggestions, and synthesized information from multiple sources.</p>

<p>While Google frames this as a productivity enhancement — getting you an answer faster, without clicking through multiple pages — critics argue that it fundamentally undermines the web’s traffic economy. If users get their answers from AI summaries, <strong>the sites that produce the original content see fewer visits, less ad revenue, and ultimately less incentive to create</strong>.</p>

<p>This is the same tension that has plagued Google’s AI Overviews since their launch in 2024, but AI Mode takes it several steps further by making the AI response the <strong>default experience</strong> rather than a supplementary feature.</p>

<h2 id="duckduckgos-counter-positioning">DuckDuckGo’s Counter-Positioning</h2>

<p>DuckDuckGo has positioned itself as the anti-AI-search alternative. The privacy-focused search engine went viral earlier this year with its promise:</p>

<blockquote>
  <p>“Everything you do in DuckDuckGo is private, we don’t collect search histories or chats, and nothing is used for AI training.”</p>
</blockquote>

<p>This message has proven remarkably effective. In a world where every major tech company is racing to inject AI into every product surface, DuckDuckGo offers a <strong>radically simple value proposition</strong>: search that just searches. No AI summaries. No model training on your queries. No personalized tracking.</p>

<p>The 28% traffic surge suggests this message is landing with a substantial audience — not just privacy diehards, but mainstream users who find AI Mode intrusive, slow, or untrustworthy.</p>

<h2 id="the-numbers-behind-the-story">The Numbers Behind the Story</h2>

<p>The surge is particularly noteworthy given DuckDuckGo’s trajectory:</p>

<ul>
  <li><strong>Weekly visits</strong> : Baseline → <strong>+28%</strong></li>
  <li><strong>Hacker News rank</strong> : N/A → <strong>#3 trending (835 pts)</strong></li>
  <li><strong>User sentiment</strong> : Stable → <strong>Rapidly growing</strong></li>
</ul>

<p>DuckDuckGo has been steadily growing for years, but a 28% weekly spike is extraordinary for a mature search engine. For context, DuckDuckGo processed approximately <strong>3.5 billion searches per month</strong> in 2025. A 28% increase would represent nearly <strong>1 billion additional searches per month</strong> — a massive shift in user behavior.</p>

<h2 id="what-this-means-for-the-ai-agent-ecosystem">What This Means for the AI Agent Ecosystem</h2>

<p>The DuckDuckGo surge has implications beyond the search market itself. AI agents fundamentally depend on access to high-quality, up-to-date information — and the primary way they get it is through <strong>web search APIs and indexed content</strong>.</p>

<p>If the push toward AI-generated search results continues to cannibalize web traffic, we could see:</p>

<ol>
  <li><strong>A reduction in the quality of web content</strong> as publishers find it harder to monetize traffic from AI-intermediated searches</li>
  <li><strong>Increased reliance on specialized data sources</strong> (academic papers, newsletters, proprietary APIs) rather than general web search</li>
  <li><strong>A fragmentation of the search market</strong> into AI-powered and traditional segments, forcing agent developers to choose which backend to prioritize</li>
  <li><strong>Privacy-first search APIs</strong> becoming more attractive for agent developers who want to avoid their agent’s queries being used for training</li>
</ol>

<p>For agent developers building search-dependent workflows, the takeaway is clear: the search landscape is fragmenting, and relying exclusively on one provider’s API carries both technical and reputational risk.</p>

<h2 id="the-bigger-picture-users-want-agency-over-ai">The Bigger Picture: Users Want Agency Over AI</h2>

<p>The DuckDuckGo surge is part of a broader pattern. Recent polling and survey data consistently shows that while users find AI tools useful in specific contexts, there is <strong>growing resistance to AI being forced into every digital experience</strong>:</p>

<ul>
  <li><strong>YouTube</strong> announced this week it will <a href="https://blog.youtube/news-and-events/improving-ai-labels-viewers-creators/">automatically label AI-generated videos</a>, after creator backlash over undisclosed synthetic content.</li>
  <li><strong>Apple and Google</strong> are facing increasing scrutiny over <a href="https://www.jacquescorbytuech.com/writing/what-apple-and-google-are-doing-your-push-notifications">how push notifications are handled</a> — including AI-driven notification management that users didn’t ask for.</li>
  <li><strong>Google’s own data</strong> reportedly shows that AI Overviews have <strong>lower click-through rates</strong> than traditional search results, suggesting users may not actually “love” them as much as the company claims.</li>
</ul>

<p>The common thread is <strong>user agency</strong>. People want the ability to choose when AI helps them and when it stays out of the way. DuckDuckGo’s surge proves that “AI-free” is not just a niche selling point — it’s a competitive advantage.</p>

<h2 id="whats-next">What’s Next?</h2>

<p>DuckDuckGo’s 28% surge is a warning shot for Google and every other platform betting everything on AI. The message from users is clear: <strong>AI is a tool, not a mandate</strong>. The companies that treat it as optional — letting users opt in rather than forcing them to opt out — may end up winning the long game.</p>

<p>For DuckDuckGo, the challenge will be retaining these new users and proving that an AI-free search experience can keep up with the features and accuracy that users expect. For Google, the challenge is more fundamental: how do you convince users that AI Mode is valuable when every signal suggests they’re running away from it?</p>

<p>One thing is certain: the search wars just got a lot more interesting.</p>

<p><em>Sources: <a href="https://www.pcgamer.com/hardware/duckduckgos-ai-free-search-saw-nearly-28-percent-more-visits-in-the-week-following-googles-insistence-that-people-love-ai-mode/">PC Gamer — DuckDuckGo's AI-free search saw nearly 28% more visits</a> | <a href="https://news.ycombinator.com/item?id=48296649">Hacker News discussion</a></em></p>]]></content><author><name>The Agent Report</name></author><category term="industry" /><category term="duckduckgo" /><category term="google" /><category term="ai-mode" /><category term="search" /><category term="ai-backlash" /><category term="privacy" /><category term="user-choice" /><summary type="html"><![CDATA[DuckDuckGo's AI-free search saw a 28% traffic surge after Google's controversial claim that users love AI Mode — the biggest signal yet that the 'AI everything' strategy may be pushing users away.]]></summary></entry><entry><title type="html">Anthropic and OpenAI Finally Found Product-Market Fit — and It’s All About Coding Agents</title><link href="https://the-agent-report.com/2026/05/anthropic-openai-pmf-coding-agents-april-2026/" rel="alternate" type="text/html" title="Anthropic and OpenAI Finally Found Product-Market Fit — and It’s All About Coding Agents" /><published>2026-05-28T07:00:00+00:00</published><updated>2026-05-28T07:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/anthropic-openai-pmf-coding-agents-april-2026</id><content type="html" xml:base="https://the-agent-report.com/2026/05/anthropic-openai-pmf-coding-agents-april-2026/"><![CDATA[<h1 id="anthropic-and-openai-finally-found-product-market-fit--and-its-all-about-coding-agents">Anthropic and OpenAI Finally Found Product-Market Fit — and It’s All About Coding Agents</h1>

<p><strong>May 28, 2026</strong> — Is the AI industry’s massive infrastructure spend finally paying off? According to a <a href="https://simonwillison.net/2026/May/27/product-market-fit/">deeply researched analysis</a> by Simon Willison, the answer is a resounding <strong>yes</strong> — and the driver is not chatbots, not image generators, but <strong>coding agents</strong>.</p>

<p>Willison’s post, which rocketed to the top of Hacker News with over 830 points and 950 comments, argues that April 2026 marks a genuine inflection point for the frontier AI labs. “I think they’ve finally found product-market fit, with the coding/general-purpose agent products embodied by Claude Code/Cowork and Codex,” he writes.</p>

<h2 id="enterprise-customers-are-now-paying-api-prices">Enterprise Customers Are Now Paying API Prices</h2>

<p>The cornerstone of Willison’s argument is a seismic shift in how Anthropic and OpenAI charge their enterprise customers. At some point in the last six months, Anthropic switched its Enterprise plan to <strong>$20/seat/month plus API pricing for usage</strong>. OpenAI followed suit in April 2026, aligning Codex pricing with API token costs.</p>

<p>This is a dramatic departure from the flat-rate enterprise deals that characterized the 2024–2025 era. Now, companies signing year-long contracts are locked into full API prices — <strong>no more deep discounts</strong>.</p>

<blockquote>
  <p>“I currently subscribe to the $100/month Max plan from Anthropic and the $100/month Pro plan from OpenAI,” Willison notes. “I just ran the ccusage tool on my laptop to get an estimate of how much I would have spent if I were to pay for API tokens in the past 30 days and got $1,199.79 for Anthropic Claude Code and $980.37 for OpenAI Codex.”</p>
</blockquote>

<p>That’s <strong>$2,180.16 worth of tokens for $200</strong> — and Willison describes himself as a “moderately heavy user,” not someone running agents around the clock.</p>

<p>The pricing becomes even starker when you consider the latest model releases. <strong>GPT-5.5</strong> (released April 23rd) costs 2× the API price of GPT-5.4. <strong>Opus 4.7</strong> (April 16th) runs around 1.4× the price of Opus 4.6 when accounting for a new tokenizer. Enterprise customers face a double whammy: higher model prices and the removal of bulk discounts.</p>

<h2 id="why-this-is-product-market-fit">Why This Is Product-Market Fit</h2>

<p>Willison draws a crucial distinction between popularity and profitability. ChatGPT boasts more than <strong>900 million weekly active users</strong>, but only <strong>50 million</strong> — 5.6% — are paying consumer subscribers.</p>

<p>“Charging $10–$20/month per user is an OK business, but you’d need 1–2 billion subscribers sticking around for four years to cover $1 trillion in infrastructure,” he calculates.</p>

<p>Coding agents change this equation entirely. These tools burn vastly more tokens than chat interfaces, but they are quickly becoming <strong>daily drivers for extremely well-compensated professionals</strong>. Companies spending <strong>$200+/month per user</strong> — or in Willison’s power-user case, <strong>~$1,000/month per vendor</strong> — generate revenue at a scale that can meaningfully offset infrastructure costs.</p>

<blockquote>
  <p>“Coding agents really did change everything. These are tools which burn vastly more tokens, but are also quickly becoming daily drivers for the work carried out by extremely well-compensated professionals.”</p>
</blockquote>

<p>The models released in <strong>November 2025</strong> — GPT-5.1 and Opus 4.5 combined with their respective coding agent harnesses — elevated agents to being genuinely useful. We’ve now had six months for organizations to integrate these tools into their workflows, and the spending is following.</p>

<h2 id="the-ramp-up-enterprise-sales-teams-are-growing-fast">The Ramp-Up: Enterprise Sales Teams Are Growing Fast</h2>

<p>As further evidence, Willison points to the open job listings at both companies:</p>

<ul>
  <li><strong>OpenAI</strong>: 703 open jobs, of which <strong>229 (32.6%)</strong> relate to enterprise sales and support — account executives, “Go To Market” roles, and Forward Deployed Engineers.</li>
  <li><strong>Anthropic</strong>: 390 open jobs, with <strong>105 (26.9%)</strong> in enterprise-facing roles.</li>
</ul>

<p>“It’s pleasingly ironic that these AI labs have picked a business model with such a heavy demand on human labor — enterprise sales contracts don’t close themselves without a whole lot of humans in the mix!” Willison observes.</p>

<p>Notably, he conducted this analysis <strong>using Claude Code itself</strong> — scraping job sites, piping data into Datasette Cloud, and analyzing with Datasette Agent. Full dogfooding.</p>

<h2 id="the-ai-failure-stories-are-actually-evidence-of-pmf">The “AI Failure” Stories Are Actually Evidence of PMF</h2>

<p>The narrative around companies being “shocked” by their AI bills — most notably Uber reportedly maxing out its full-year AI budget just months into 2026 — is actually <strong>evidence for the PMF thesis</strong>, not against it.</p>

<blockquote>
  <p>“The best advice I ever heard on pricing a product was that your customer should suck air through their teeth and then say yes. Uber’s budget overrun and Microsoft’s seat cancellations look like that effect playing out in practice.”</p>
</blockquote>

<p>Microsoft’s decision to cancel Claude Code licenses — ostensibly to encourage dogfooding of Copilot CLI — was also reported to be a financial decision triggered by the June 30th end of Microsoft’s fiscal year. When your customers are making billion-dollar budget allocation decisions about your product, you’ve found product-market fit.</p>

<h2 id="the-colossus-deal-changes-everything">The Colossus Deal Changes Everything</h2>

<p>Perhaps the most staggering data point comes from an unexpected source: <strong>SpaceX’s recent S-1 filing</strong> revealed that Anthropic signed a deal for cloud services worth <strong>$1.25 billion per month through May 2029</strong> for access to compute capacity across the Colossus and Colossus II clusters.</p>

<p>The Anthropic announcement indicated this deal would allow them to “increase our usage limits for Claude Code and the Claude API,” heavily implying Colossus is being used for inference, not training. Given that Anthropic already has vast compute from other providers, the willingness to spend $1.25 billion/month from just one vendor hints at the enormous scale of inference budgets today.</p>

<h2 id="a-two-inflection-point-story">A Two-Inflection-Point Story</h2>

<p>Willison identifies two critical inflection points:</p>

<ol>
  <li>
    <p><strong>November 2025</strong> — The <em>capability</em> inflection point, when GPT-5.1 and Opus 4.5, combined with their coding agent harnesses, became genuinely useful for real work.</p>
  </li>
  <li>
    <p><strong>April 2026</strong> — The <em>revenue</em> inflection point, when the enterprise pricing shift and the resulting budget impacts made clear that these are real businesses, not just research projects.</p>
  </li>
</ol>

<blockquote>
  <p>“We’ll know for sure how real this moment is when the S-1 documents for the upcoming Anthropic and OpenAI IPOs give us some real, audited numbers to get our teeth into.”</p>
</blockquote>

<h2 id="what-this-means-for-the-agent-ecosystem">What This Means for the Agent Ecosystem</h2>

<p>For the broader AI agent ecosystem, the implications are profound:</p>

<ul>
  <li><strong>Cursor and Copilot face direct competition</strong> from Anthropic and OpenAI’s own agent products. No wonder Cursor is investing in their own models.</li>
  <li><strong>Enterprise pricing at API rates</strong> means the cost of running AI agents at scale is now transparent and predictable — but expensive.</li>
  <li><strong>The middleman squeeze</strong> is real: Anthropic’s Claude Code directly competes with the very tools that were previously Anthropic’s biggest API customers (Cursor and GitHub Copilot were reportedly responsible for $1.2 billion of Anthropic’s then-$4 billion revenue in 2025).</li>
  <li><strong>Infrastructure providers</strong> (CoreWeave, Lambda, and now SpaceX) become critical — and their own IPOs will provide visibility into the AI industry’s true scale.</li>
</ul>

<p>For developers and enterprises building on AI agents, the message is clear: the era of cheap, flat-rate agentic automation is over. But the value these tools deliver is now proven enough that organizations are willing to pay real money. That’s not a bug — it’s product-market fit.</p>

<p><em>Read the full original analysis: <a href="https://simonwillison.net/2026/May/27/product-market-fit/">"I think Anthropic and OpenAI have found product-market fit" by Simon Willison</a></em></p>]]></content><author><name>The Agent Report</name></author><category term="industry" /><category term="anthropic" /><category term="openai" /><category term="product-market-fit" /><category term="coding-agents" /><category term="claude-code" /><category term="codex" /><category term="ai-economics" /><category term="enterprise-ai" /><summary type="html"><![CDATA[Simon Willison makes the case that Anthropic and OpenAI have finally found genuine product-market fit — through coding agents. With enterprise pricing switching to API rates and companies spending $200+/month per user, April 2026 marks a new revenue inflection point for frontier AI labs.]]></summary></entry><entry><title type="html">AI Agent Terminology: 55+ Terms You Need to Know in 2026</title><link href="https://the-agent-report.com/2026/05/ai-agent-glossary-55-terms/" rel="alternate" type="text/html" title="AI Agent Terminology: 55+ Terms You Need to Know in 2026" /><published>2026-05-27T22:00:00+00:00</published><updated>2026-05-27T22:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/ai-agent-glossary-55-terms</id><content type="html" xml:base="https://the-agent-report.com/2026/05/ai-agent-glossary-55-terms/"><![CDATA[<p>The AI agent landscape has exploded in 2026. New frameworks launch weekly, protocols are being ratified in real time, and the vocabulary is evolving faster than most of us can keep up with. Whether you’re reading a research paper, evaluating a vendor, or debugging a multi-agent system at 2 a.m., you’ve probably hit a term that made you pause and think, <em>“Wait — what exactly does that mean in an agent context?”</em> For a deeper dive into the architectures and frameworks these terms describe, our <a href="/2026/05/complete-guide-to-ai-agents-2026/">Complete Guide to AI Agents</a> provides the full technical context.</p>

<p>This glossary is built for that moment. It covers the terms you’ll actually encounter: the core concepts that define how agents work, the frameworks everyone is debating on Hacker News, the technical primitives that power production systems, the safety vocabulary that regulators and red teams use, and the enterprise terminology that’s shaping how companies adopt agentic AI.</p>

<p>We’ve aimed for clarity over exhaustiveness. Every definition is 1–3 sentences, written in plain English, and grounded in how the term is used in practice — not in a whitepaper. Think of this as a field guide, not an encyclopedia. For a historical perspective on how these frameworks evolved, our <a href="/2025/04/open-source-agent-frameworks-comparison/">2025 open-source agent framework comparison</a> shows where the landscape stood before the 2026 explosion.</p>

<hr />

<h2 id="core-concepts">Core Concepts</h2>

<h3 id="ai-agent">AI Agent</h3>
<p>A software system that uses a large language model (LLM) as its reasoning engine to perceive its environment, make decisions, and take actions autonomously. Unlike a chatbot, an agent can plan multi-step tasks, use external tools, and adapt its behavior based on outcomes.</p>

<h3 id="autonomous-agent">Autonomous Agent</h3>
<p>An AI agent capable of operating with minimal or no human supervision over extended periods. Autonomous agents set their own sub-goals, recover from errors without intervention, and persist across sessions — key for production workloads like customer support triage or infrastructure monitoring.</p>

<h3 id="multi-agent-system-mas">Multi-Agent System (MAS)</h3>
<p>An architecture where multiple AI agents collaborate, compete, or negotiate to solve problems that are too complex for a single agent. Each agent may have a specialized role (researcher, coder, reviewer) and the system includes protocols for communication, task delegation, and conflict resolution.</p>

<h3 id="agentic-ai">Agentic AI</h3>
<p>A term describing AI systems that exhibit goal-directed, autonomous behavior — the quality of <em>being an agent</em> rather than a passive tool. Agentic AI implies planning, tool use, memory, and the ability to pursue objectives over multiple steps without step-by-step human prompting.</p>

<h3 id="tool-use">Tool Use</h3>
<p>The ability of an AI agent to invoke external functions, APIs, or software tools to accomplish tasks beyond text generation. Tools can include web search, code execution, file system operations, database queries, or any external capability exposed through a defined interface.</p>

<h3 id="function-calling">Function Calling</h3>
<p>A specific mechanism by which an LLM outputs structured data (typically JSON) that triggers a predefined function in the host application. Function calling is the most common implementation pattern for tool use — the model decides <em>which</em> function to call and <em>with what arguments</em> based on the user’s intent.</p>

<h3 id="reasoning">Reasoning</h3>
<p>The cognitive process by which an LLM breaks down complex problems, evaluates alternatives, and draws logical conclusions before acting. Advanced reasoning techniques — like step-by-step decomposition and self-verification — are what separate simple instruction-following from genuine agentic behavior.</p>

<h3 id="planning">Planning</h3>
<p>An agent’s ability to decompose a high-level goal into a sequence of actionable steps before execution. Effective planning involves anticipating dependencies, ordering tasks correctly, and dynamically re-planning when intermediate steps fail or produce unexpected results.</p>

<h3 id="memory-short-term--long-term">Memory (Short-Term / Long-Term)</h3>
<p><strong>Short-term memory</strong> refers to context held within the model’s context window during a single session — the current conversation, recent tool outputs, and in-flight reasoning. <strong>Long-term memory</strong> persists across sessions via external storage (vector databases, knowledge graphs, or structured logs), allowing agents to remember user preferences, past decisions, and learned patterns over days or months.</p>

<h3 id="rag-retrieval-augmented-generation">RAG (Retrieval-Augmented Generation)</h3>
<p>A technique that grounds an LLM’s responses in external knowledge by retrieving relevant documents from a database before generating an answer. In agent systems, RAG is often used as a tool — the agent queries a knowledge base, retrieves context, and uses that context to inform decisions or responses, reducing hallucination on factual queries.</p>

<h3 id="orchestration">Orchestration</h3>
<p>The coordination layer that manages how multiple agents, tools, and workflows interact within a larger system. Orchestration handles task routing, dependency management, state tracking, and error handling — it’s the conductor that keeps a multi-agent system from descending into chaos.</p>

<h3 id="agent-loop">Agent Loop</h3>
<p>The core execution cycle of an AI agent: observe (gather information from the environment or tool outputs), reason (analyze and decide what to do next), act (execute a tool call or produce output), and observe again. The loop repeats until the agent determines the task is complete or a termination condition is met.</p>

<h3 id="react-reasoning--acting">ReAct (Reasoning + Acting)</h3>
<p>A prompting and execution pattern where the agent interleaves reasoning traces with concrete actions. Instead of thinking fully and then acting, the agent thinks a step, acts, observes the result, thinks about the result, and acts again — producing more grounded and correctable behavior than pure chain-of-thought approaches.</p>

<h3 id="chain-of-thought-cot">Chain-of-Thought (CoT)</h3>
<p>A prompting technique that instructs the LLM to produce intermediate reasoning steps before giving a final answer. By verbalizing its thinking, the model often achieves higher accuracy on complex reasoning tasks — and makes its decision process interpretable to human observers.</p>

<h3 id="tree-of-thought-tot">Tree-of-Thought (ToT)</h3>
<p>An extension of chain-of-thought where the LLM explores multiple reasoning paths simultaneously, evaluates them, and prunes unpromising branches — much like a search algorithm. Tree-of-thought is especially powerful for planning and problem-solving tasks where the agent must consider several possible strategies before committing.</p>

<hr />

<h2 id="frameworks--platforms">Frameworks &amp; Platforms</h2>

<h3 id="langchain">LangChain</h3>
<p>An open-source framework for building LLM-powered applications with a focus on composability. LangChain provides abstractions for chains, agents, tools, and memory, along with a growing ecosystem of integrations — making it one of the most widely adopted starting points for agent development.</p>

<h3 id="autogen">AutoGen</h3>
<p>Microsoft’s open-source multi-agent conversation framework. AutoGen lets developers define specialized agents that communicate through structured conversations, with built-in support for human-in-the-loop patterns, code execution sandboxes, and group chat topologies.</p>

<h3 id="crewai">CrewAI</h3>
<p>A Python framework for orchestrating role-based AI agents that work together as a “crew.” CrewAI assigns each agent a defined role, goal, and backstory, then manages sequential or hierarchical task execution — popular for rapid prototyping of multi-agent workflows.</p>

<h3 id="openai-agents-sdk">OpenAI Agents SDK</h3>
<p>OpenAI’s official software development kit for building, testing, and deploying AI agents. The SDK provides primitives for tool definitions, guardrails, handoffs between agents, and tracing — designed to work natively with OpenAI models and the Responses API.</p>

<h3 id="claude-anthropic">Claude (Anthropic)</h3>
<p>Anthropic’s family of frontier LLMs, widely used as the reasoning engine in agent systems. Claude models are known for strong instruction following, long context windows (up to 200K tokens), native tool-use capabilities, and safety-focused design principles that make them popular for production agent deployments.</p>

<h3 id="hermes-agent">Hermes Agent</h3>
<p>An open-source AI agent runtime and personal assistant framework by Nous Research, designed to give users full control over their agent’s skills, plugins, memory, and model backend. Hermes Agent emphasizes local-first operation, cross-platform support, and a community-driven ecosystem of shareable skills and profiles.</p>

<h3 id="openclaw">Openclaw</h3>
<p>An open-source personal AI agent platform focused on multi-channel communication (Telegram, Discord, WhatsApp, Slack, email, voice) with a plugin architecture. Openclaw emphasizes multi-profile management, policy plugins for compliance, and the ability to run entirely on user-owned infrastructure.</p>

<h3 id="haystack">Haystack</h3>
<p>An open-source NLP framework by deepset for building search and retrieval pipelines. In the agent ecosystem, Haystack is commonly used to implement RAG backends, document processing, and knowledge retrieval — often as a tool invoked by higher-level agent frameworks.</p>

<h3 id="semantic-kernel">Semantic Kernel</h3>
<p>Microsoft’s open-source SDK for integrating LLMs into applications with an emphasis on enterprise scenarios. Semantic Kernel provides a plugin model, orchestration patterns, and native integration with the Microsoft ecosystem (Azure, Copilot, Teams).</p>

<h3 id="microsoft-copilot-studio">Microsoft Copilot Studio</h3>
<p>A low-code platform for building custom AI copilots and agents within the Microsoft 365 ecosystem. Copilot Studio enables organizations to create agents that work across Teams, SharePoint, Dynamics 365, and Power Platform — with built-in connectors to enterprise data sources.</p>

<hr />

<h2 id="technical">Technical</h2>

<h3 id="mcp-model-context-protocol">MCP (Model Context Protocol)</h3>
<p>An open protocol developed by Anthropic that standardizes how AI models connect to external tools, data sources, and services. MCP defines a client-server architecture where agents (clients) discover and invoke capabilities exposed by MCP servers — analogous to how USB standardized peripheral connections, but for AI tool integration.</p>

<h3 id="acp-agent-communication-protocol">ACP (Agent Communication Protocol)</h3>
<p>An emerging standard for how AI agents communicate with each other across different frameworks and platforms. ACP aims to solve agent-to-agent interoperability — allowing a LangChain agent to delegate work to an AutoGen agent using a common message format, capability discovery mechanism, and security model.</p>

<h3 id="gguf">GGUF</h3>
<p>A file format for storing quantized LLM weights, widely used in the local and open-source model ecosystem. GGUF enables running large models on consumer hardware by bundling model architecture metadata with compressed weights that tools like llama.cpp can load efficiently.</p>

<h3 id="lora-low-rank-adaptation">LoRA (Low-Rank Adaptation)</h3>
<p>A parameter-efficient fine-tuning technique that adds small, trainable adapter layers to a pre-trained model rather than modifying all weights. LoRA makes it practical to customize foundation models for specific agent tasks — like tool-calling or domain-specific reasoning — at a fraction of the cost and storage of full fine-tuning.</p>

<h3 id="quantization">Quantization</h3>
<p>The process of reducing the numerical precision of a model’s weights (e.g., from 16-bit to 4-bit) to decrease memory usage and inference latency. Quantization is essential for running capable agent models on edge devices, laptops, and cost-constrained cloud instances.</p>

<h3 id="fine-tuning">Fine-tuning</h3>
<p>The process of further training a pre-trained LLM on a curated dataset to improve performance on a specific task or domain. In agent development, fine-tuning is used to improve tool-calling accuracy, teach domain-specific reasoning, or align model behavior with enterprise policies.</p>

<h3 id="structured-output">Structured Output</h3>
<p>A capability where the LLM generates responses in a guaranteed format (typically JSON conforming to a schema) rather than free-form text. Structured output is critical for agent systems because tool calls, data extraction, and agent-to-agent messages must be machine-parseable with zero tolerance for malformed syntax.</p>

<h3 id="json-mode">JSON Mode</h3>
<p>A specific LLM feature that constrains the model’s output to valid JSON. While less rigorous than full structured output with schema validation, JSON mode is widely supported and sufficient for many agent tool-calling implementations.</p>

<h3 id="rate-limiting">Rate Limiting</h3>
<p>A mechanism that restricts how many requests an agent can make to an API or service within a given time window. Proper rate-limit handling — with exponential backoff, queuing, and graceful degradation — is essential for production agents that call external APIs without overwhelming them or exhausting budgets.</p>

<h3 id="token">Token</h3>
<p>The atomic unit of text that an LLM processes — roughly corresponding to a word fragment (~4 characters in English). Token count determines context window usage, API pricing, and latency, making token-aware design critical for cost-efficient agents that handle long conversations or large documents.</p>

<h3 id="context-window">Context Window</h3>
<p>The maximum number of tokens an LLM can process in a single forward pass, encompassing the system prompt, conversation history, tool outputs, and the current query. Modern agents rely on large context windows (128K–200K tokens) to maintain coherence across long, multi-turn interactions — but must still manage context strategically to avoid hitting limits.</p>

<h3 id="embedding">Embedding</h3>
<p>A numerical vector representation of text, images, or other data that captures semantic meaning in a high-dimensional space. Embeddings enable agents to perform similarity search, clustering, and retrieval — the mathematical foundation behind semantic memory and RAG systems.</p>

<h3 id="vector-database">Vector Database</h3>
<p>A specialized database optimized for storing and querying high-dimensional vectors (embeddings). Vector databases power the “retrieval” half of RAG by enabling fast nearest-neighbor search across millions of documents — letting agents find semantically relevant information even when keywords don’t match.</p>

<h3 id="agent-to-agent-communication">Agent-to-Agent Communication</h3>
<p>The mechanisms by which AI agents exchange information, delegate tasks, and coordinate actions. This can range from simple structured message passing to sophisticated protocols involving capability discovery, negotiation, and shared memory — and is a central challenge in multi-agent system design.</p>

<hr />

<h2 id="safety--alignment">Safety &amp; Alignment</h2>

<h3 id="alignment">Alignment</h3>
<p>The field of ensuring that AI systems behave in accordance with human values, intentions, and safety constraints. In agent systems, alignment means the agent pursues its goals without causing unintended harm — even when the shortest path to the goal would violate ethical or operational boundaries.</p>

<h3 id="rlhf-reinforcement-learning-from-human-feedback">RLHF (Reinforcement Learning from Human Feedback)</h3>
<p>A training technique where human evaluators rank model outputs and those rankings are used to train a reward model that fine-tunes the LLM via reinforcement learning. RLHF has been the dominant approach for teaching models to be helpful, harmless, and aligned with user intent.</p>

<h3 id="constitutional-ai">Constitutional AI</h3>
<p>Anthropic’s alignment methodology where an AI is trained to follow a written “constitution” of principles rather than relying solely on human feedback. The model self-critiques and revises its outputs against these principles, enabling scalable oversight without requiring humans to review every output.</p>

<h3 id="red-teaming">Red Teaming</h3>
<p>The adversarial practice of probing an AI system for vulnerabilities, harmful behaviors, or alignment failures before deployment. Red teams simulate attacks — from prompt injection to social engineering — to identify weaknesses that need to be addressed via guardrails, fine-tuning, or architectural changes.</p>

<h3 id="prompt-injection">Prompt Injection</h3>
<p>A security attack where malicious instructions are embedded in data that an agent processes (e.g., a web page, email, or document), causing the agent to disregard its original instructions and follow the attacker’s commands. Prompt injection is one of the most challenging unsolved security problems in agent systems.</p>

<h3 id="guardrails">Guardrails</h3>
<p>Protective constraints placed around an agent’s behavior — implemented as input filters, output validators, or runtime monitors. Guardrails can enforce content policies, prevent harmful actions, validate tool calls against schemas, and ensure the agent stays within its defined operational boundaries.</p>

<h3 id="sandboxing">Sandboxing</h3>
<p>The practice of running agent code execution, tool invocations, or entire agent instances in isolated environments with restricted permissions. Sandboxing prevents an agent from causing damage if it makes a mistake or is compromised — critical for agents that execute arbitrary code or access file systems.</p>

<h3 id="agent-safety">Agent Safety</h3>
<p>The interdisciplinary field concerned with ensuring that autonomous AI agents operate reliably, predictably, and without causing harm — even in unexpected situations. Agent safety encompasses alignment, robustness, monitoring, and the design of “off-switch” mechanisms that remain under human control.</p>

<h3 id="interpretability">Interpretability</h3>
<p>The study of understanding <em>why</em> an AI model made a specific decision, by examining its internal representations, attention patterns, or reasoning traces. In agent systems, interpretability is essential for debugging failures, building trust with users, and satisfying regulatory requirements for explainable AI.</p>

<h3 id="jailbreaking">Jailbreaking</h3>
<p>The practice of circumventing an AI system’s safety restrictions through crafted prompts, role-playing scenarios, or encoding tricks. Agent systems face heightened jailbreak risk because their tool-use and multi-step reasoning capabilities create larger attack surfaces for bypassing guardrails.</p>

<hr />

<h2 id="enterprise--industry">Enterprise &amp; Industry</h2>

<h3 id="sla-service-level-agreement">SLA (Service Level Agreement)</h3>
<p>A contractual commitment defining the expected performance, availability, and reliability of an AI agent service. For production agents, SLAs cover uptime (e.g., 99.9%), response latency, accuracy thresholds, and escalation procedures — critical for enterprise procurement and vendor evaluation.</p>

<h3 id="rpa-robotic-process-automation">RPA (Robotic Process Automation)</h3>
<p>A technology for automating structured, rule-based business processes — such as data entry, invoice processing, or form submission. While traditional RPA follows fixed scripts, the industry is converging with AI agents to create “intelligent automation” that handles exceptions and unstructured data.</p>

<h3 id="erp-agent">ERP Agent</h3>
<p>An AI agent integrated with Enterprise Resource Planning systems (SAP, Oracle, Microsoft Dynamics) to automate workflows like order-to-cash, procurement, and financial close. ERP agents represent one of the largest enterprise adoption vectors for agentic AI, with SAP deploying 200+ production agents in 2026.</p>

<h3 id="autonomous-enterprise">Autonomous Enterprise</h3>
<p>A vision of the future organization where AI agents handle the majority of operational, analytical, and decision-support tasks — with humans shifting to strategic oversight, exception handling, and creative direction. The autonomous enterprise is the endpoint of the agent adoption curve that began with RPA and is accelerating through LLM-powered agents.</p>

<h3 id="digital-worker">Digital Worker</h3>
<p>A term used in enterprise contexts to describe an AI agent that performs a specific job function — analogous to a human employee. Digital workers have defined roles, performance metrics, access permissions, and escalation paths, and are increasingly managed alongside human teams in workforce orchestration platforms.</p>

<h3 id="compliance">Compliance</h3>
<p>The requirement that AI agent systems adhere to regulatory frameworks (GDPR, SOC 2, HIPAA, EU AI Act), industry standards, and internal governance policies. Compliance covers data handling, decision auditability, bias monitoring, and the ability to explain agent actions to regulators and auditors.</p>

<h3 id="observability">Observability</h3>
<p>The practice of instrumenting agent systems to understand their internal state through logs, metrics, traces, and dashboards. In agent contexts, observability goes beyond traditional APM — it must capture reasoning chains, tool-call sequences, memory access patterns, and multi-agent interactions to enable debugging and optimization.</p>

<hr />

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is an AI agent?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "An AI agent is a software system that uses a large language model (LLM) as its reasoning engine to perceive its environment, make decisions, and take actions autonomously. Unlike a chatbot, an agent can plan multi-step tasks, use external tools, and adapt its behavior based on outcomes."
      }
    },
    {
      "@type": "Question",
      "name": "What is the difference between an AI agent and a chatbot?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "A chatbot responds to individual prompts in a stateless, turn-by-turn manner. An AI agent maintains state, plans multi-step tasks, uses tools (like APIs, databases, or code execution), and pursues goals autonomously across multiple interactions — often without step-by-step human guidance."
      }
    },
    {
      "@type": "Question",
      "name": "What are the most popular AI agent frameworks in 2026?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The leading frameworks include LangChain, AutoGen (Microsoft), CrewAI, the OpenAI Agents SDK, Hermes Agent (Nous Research), Openclaw, Semantic Kernel (Microsoft), and Haystack. Each framework has different strengths — from rapid prototyping to production-grade enterprise deployments."
      }
    },
    {
      "@type": "Question",
      "name": "What is MCP (Model Context Protocol)?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "MCP is an open protocol developed by Anthropic that standardizes how AI models connect to external tools, data sources, and services. It defines a client-server architecture where agents discover and invoke capabilities — analogous to USB for AI tool integration — and is rapidly becoming the industry standard for agent-tool connectivity."
      }
    },
    {
      "@type": "Question",
      "name": "What is RAG in the context of AI agents?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "RAG (Retrieval-Augmented Generation) grounds an LLM's responses in external knowledge by retrieving relevant documents from a database before generating an answer. In agent systems, RAG is often used as a tool — the agent queries a knowledge base, retrieves context, and uses that context to inform decisions or responses."
      }
    },
    {
      "@type": "Question",
      "name": "What is prompt injection and why is it dangerous for agents?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Prompt injection is a security attack where malicious instructions are embedded in data an agent processes — such as a web page, email, or document. It is especially dangerous for agents because their tool-use capabilities create a larger attack surface, and a compromised agent could execute harmful actions through its connected tools and APIs."
      }
    },
    {
      "@type": "Question",
      "name": "How do multi-agent systems work?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Multi-agent systems (MAS) use multiple specialized AI agents that collaborate, communicate, and coordinate to solve complex problems. Each agent typically has a defined role, and the system includes protocols for task delegation, information sharing, conflict resolution, and orchestration to produce coherent outcomes."
      }
    },
    {
      "@type": "Question",
      "name": "What is the difference between fine-tuning and RAG?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Fine-tuning permanently modifies a model's weights by training on domain-specific data, improving its inherent capabilities. RAG keeps the model unchanged but retrieves relevant external information at query time. Fine-tuning is better for teaching new skills or formats; RAG is better for giving access to frequently updated knowledge without retraining."
      }
    },
    {
      "@type": "Question",
      "name": "What is sandboxing in AI agent systems?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Sandboxing runs agent code execution, tool invocations, or entire agent instances in isolated environments with restricted permissions. It prevents an agent from causing damage if it makes a mistake or is compromised — critical for agents that execute arbitrary code, access file systems, or interact with production infrastructure."
      }
    },
    {
      "@type": "Question",
      "name": "What does an autonomous enterprise look like?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "An autonomous enterprise is an organization where AI agents handle the majority of operational, analytical, and decision-support tasks, with humans shifting to strategic oversight and exception handling. It represents the endpoint of the automation journey from RPA through AI agents, where agents run core business processes end-to-end with minimal human intervention."
      }
    }
  ]
}
</script>]]></content><author><name>The Agent Report</name></author><category term="research" /><category term="glossary" /><category term="ai-agents" /><category term="reference" /><category term="terminology" /><category term="beginners-guide" /><summary type="html"><![CDATA[Your go-to glossary of 55+ essential AI agent terms — from Agent Loop to Vector Database, MCP to RLHF. Clear definitions for developers and tech professionals navigating the agentic AI landscape in 2026.]]></summary></entry><entry><title type="html">Anthropic Launches Project Glasswing — Claude Mythos Preview, $100M Cyber Defense Initiative with AWS, Apple, Google, Microsoft, and NVIDIA</title><link href="https://the-agent-report.com/2026/05/anthropic-project-glasswing-claude-mythos-preview-cybersecurity-may27/" rel="alternate" type="text/html" title="Anthropic Launches Project Glasswing — Claude Mythos Preview, $100M Cyber Defense Initiative with AWS, Apple, Google, Microsoft, and NVIDIA" /><published>2026-05-27T14:00:00+00:00</published><updated>2026-05-27T14:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/anthropic-project-glasswing-claude-mythos-preview-cybersecurity-may27</id><content type="html" xml:base="https://the-agent-report.com/2026/05/anthropic-project-glasswing-claude-mythos-preview-cybersecurity-may27/"><![CDATA[<style>
.highlight-box { background: #1a1a2e; border-left: 4px solid #fdcb6e; padding: 1.2rem 1.5rem; margin: 1.5rem 0; border-radius: 0 8px 8px 0; }
.highlight-box p { margin: 0; }
.stat-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin: 1.5rem 0; }
.stat-card { background: #1a1a2e; border-radius: 8px; padding: 1rem; text-align: center; border: 1px solid #2a2a3e; }
.stat-card .stat-value { font-size: 1.6rem; font-weight: bold; color: #fdcb6e; }
.stat-card .stat-label { font-size: 0.85rem; color: #888; margin-top: 0.3rem; }
</style>

<p><strong>Anthropic today announced <a href="https://www.anthropic.com/glasswing">Project Glasswing</a></strong>, the most ambitious cross-industry cybersecurity initiative ever mounted by an AI company. The project brings together <strong>AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks</strong> around a single goal: securing the world’s most critical software before AI-augmented attackers can exploit it.</p>

<p>At the heart of Glasswing is <strong>Claude Mythos Preview</strong> — a new, unreleased frontier model that Anthropic describes as having crossed a threshold where AI can “surpass all but the most skilled humans at finding and exploiting software vulnerabilities.” The model has already discovered <strong>thousands of high-severity vulnerabilities</strong>, including critical flaws in every major operating system and web browser.</p>

<div class="stat-grid">
<div class="stat-card"><div class="stat-value">$100M</div><div class="stat-label">Model Usage Credits</div></div>
<div class="stat-card"><div class="stat-value">$4M</div><div class="stat-label">Open Source Donations</div></div>
<div class="stat-card"><div class="stat-value">12</div><div class="stat-label">Launch Partners</div></div>
<div class="stat-card"><div class="stat-value">40+</div><div class="stat-label">Additional Participants</div></div>
</div>

<hr />

<h2 id="what-claude-mythos-preview-found">What Claude Mythos Preview Found</h2>

<p>The model’s capabilities were demonstrated through a series of striking vulnerability discoveries conducted entirely autonomously:</p>

<p><strong>🔴 A 27-year-old vulnerability in OpenBSD</strong> — one of the most security-hardened operating systems in the world, used to run firewalls and critical infrastructure. The flaw allowed an attacker to remotely crash any machine running the OS just by connecting to it.</p>

<p><strong>🔴 A 16-year-old vulnerability in FFmpeg</strong> — the ubiquitous video encoding library used by countless applications. The vulnerable line of code had been hit <strong>five million times</strong> by automated testing tools without ever triggering a detection.</p>

<p><strong>🔴 A chain of vulnerabilities in the Linux kernel</strong> — the software powering most of the world’s servers. Mythos autonomously found and chained together several flaws to escalate from ordinary user access to complete system control.</p>

<p>All reported vulnerabilities have been patched by the respective maintainers.</p>

<hr />

<h2 id="benchmark-performance">Benchmark Performance</h2>

<p>Claude Mythos Preview’s security-specific capabilities far exceed any publicly evaluated model:</p>

<table>
  <thead>
    <tr>
      <th>Benchmark</th>
      <th>Mythos Preview</th>
      <th>Opus 4.6</th>
      <th>Improvement</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>CyberGym (Vulnerability Reproduction)</td>
      <td><strong>83.1%</strong></td>
      <td>66.6%</td>
      <td>+16.5 pp</td>
    </tr>
    <tr>
      <td>SWE-bench Verified</td>
      <td><strong>93.9%</strong></td>
      <td>80.8%</td>
      <td>+13.1 pp</td>
    </tr>
    <tr>
      <td>SWE-bench Pro (Agentic Coding)</td>
      <td><strong>77.8%</strong></td>
      <td>53.4%</td>
      <td>+24.4 pp</td>
    </tr>
    <tr>
      <td>Terminal-Bench 2.0</td>
      <td><strong>82.0%</strong></td>
      <td>65.4%</td>
      <td>+16.6 pp</td>
    </tr>
    <tr>
      <td>GPQA Diamond (Reasoning)</td>
      <td><strong>94.6%</strong></td>
      <td>91.3%</td>
      <td>+3.3 pp</td>
    </tr>
    <tr>
      <td>Humanity’s Last Exam (with tools)</td>
      <td><strong>64.7%</strong></td>
      <td>53.1%</td>
      <td>+11.6 pp</td>
    </tr>
    <tr>
      <td>OSWorld-Verified (Computer Use)</td>
      <td><strong>79.6%</strong></td>
      <td>72.7%</td>
      <td>+6.9 pp</td>
    </tr>
  </tbody>
</table>

<p>The model’s <strong>SWE-bench Pro score of 77.8%</strong> is particularly notable — it represents a <strong>24.4 percentage point leap</strong> over Opus 4.6, reflecting Mythos’ ability to handle complex, multi-step software engineering tasks autonomously.</p>

<hr />

<h2 id="how-project-glasswing-works">How Project Glasswing Works</h2>

<p>The initiative is structured around <strong>defensive deployment</strong> of Claude Mythos Preview:</p>

<ul>
  <li><strong>Launch partners</strong> (AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA, Palo Alto Networks) receive direct access to Mythos Preview for scanning their foundational systems — codebases representing a large share of the world’s shared cyberattack surface.</li>
  <li><strong>40+ additional organizations</strong> that build or maintain critical software infrastructure can apply for access to scan both first-party and open-source systems.</li>
  <li><strong>$100M in model usage credits</strong> from Anthropic covers substantial usage throughout the research preview period.</li>
  <li><strong>$4M in donations</strong> to open-source security organizations: $2.5M to Alpha-Omega and OpenSSF (via the Linux Foundation) and $1.5M to the Apache Software Foundation.</li>
  <li>After the preview period, Mythos Preview will be available to participants at <strong>$25/$125 per million input/output tokens</strong> on the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry.</li>
</ul>

<hr />

<h2 id="the-urgency-why-now">The Urgency: Why Now?</h2>

<p>Anthropic’s announcement makes a compelling case for urgency. The core argument:</p>

<blockquote>
  <p>“Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely.”</p>
</blockquote>

<p>The company notes that <strong>Claude Mythos Preview is not yet generally available</strong> and will not be made broadly accessible. Instead, the model’s cyber capabilities are being channeled exclusively through Project Glasswing for defensive purposes. Anthropic plans to develop and refine safety safeguards with an upcoming Claude Opus model before considering broader deployment of Mythos-class capabilities.</p>

<hr />

<h2 id="industry-response">Industry Response</h2>

<p>The breadth of industry participation is remarkable for a single AI company initiative:</p>

<ul>
  <li><strong>Cisco</strong> (Anthony Grieco, SVP &amp; Chief Security &amp; Trust Officer): “AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure from cyber threats, and there is no going back.”</li>
  <li><strong>AWS</strong> (Amy Herzog, VP and CISO): “Our teams analyze over 400 trillion network flows every day for threats, and AI is central to our ability to defend at scale.”</li>
  <li><strong>Microsoft</strong> (Igor Tsyganskiy, EVP of Cybersecurity): “When tested against CTI-REALM, our open-source security benchmark, Claude Mythos Preview showed substantial improvements compared to previous models.”</li>
  <li><strong>Google</strong> (Heather Adkins, VP of Security Engineering): “We have long believed that AI poses new challenges and opens new opportunities in cyber defense.”</li>
  <li><strong>Linux Foundation</strong> (Jim Zemlin, CEO): “By giving the maintainers of critical open source codebases access to a new generation of AI models that can proactively identify and fix vulnerabilities at scale, Project Glasswing offers a credible path to changing that equation.”</li>
</ul>

<hr />

<h2 id="what-this-means-for-the-agent-ecosystem">What This Means for the Agent Ecosystem</h2>

<p>Project Glasswing has implications beyond cybersecurity:</p>

<ol>
  <li>
    <p><strong>A new capability tier is confirmed.</strong> Mythos Preview’s benchmark scores — particularly the 24.4pp jump on SWE-bench Pro — validate that a significant capability leap exists beyond <a href="/2026/05/claude-opus-4-7-launch/">Claude Opus 4.7</a>. This is the model that will inform Anthropic’s next general-purpose release.</p>
  </li>
  <li>
    <p><strong>Agentic security is now a first-class use case.</strong> Autonomous vulnerability discovery and patching is one of the highest-value agent applications yet demonstrated. The model found vulnerabilities without human steering, wrote exploits autonomously, and in some cases chained multiple bugs together — all capabilities that transfer directly to non-security agent tasks. This autonomous security capability underscores the concerns raised in our <a href="/2026/05/agent-safety-trust-gap-may23/">agent safety trust gap analysis</a>, which found that only 14.4% of agents receive full security approval before deployment.</p>
  </li>
  <li>
    <p><strong>The defensive vs. offensive AI debate gets real.</strong> Anthropic is explicitly withholding Mythos from general release while deploying it defensively. This sets a precedent for how frontier AI companies might gate access to especially powerful capabilities.</p>
  </li>
  <li>
    <p><strong>Cross-industry AI security coalitions become the norm.</strong> The participation of virtually every major tech company signals that AI-powered cybersecurity is shifting from competitive differentiator to shared infrastructure problem.</p>
  </li>
  <li>
    <p><strong>Open source maintainers get AI-powered help.</strong> The $4M in donations and access program means that resource-constrained open-source projects — which power the vast majority of modern software — can now benefit from frontier AI vulnerability detection.</p>
  </li>
</ol>

<hr />

<h2 id="the-bottom-line">The Bottom Line</h2>

<p>Project Glasswing is the most significant AI security initiative to date — not just because of Claude Mythos Preview’s capabilities, but because of the <strong>unprecedented breadth of industry alignment</strong> around a defensive AI deployment model. The partnership roster reads like a who’s-who of global technology: every major cloud provider (AWS, Google Cloud, Microsoft Azure), every major chipmaker (Apple, Broadcom, NVIDIA), every major security vendor (Cisco, CrowdStrike, Palo Alto Networks), and the world’s largest financial institution (JPMorganChase).</p>

<p>Anthropic has committed to reporting publicly within 90 days on vulnerabilities fixed, lessons learned, and practical recommendations for how security practices should evolve in the AI era. If Project Glasswing succeeds at its stated goals, it could fundamentally reshape how the industry approaches software security — from reactive patching to proactive, AI-driven vulnerability discovery at scale.</p>

<p>For agent builders, the key takeaway is clear: <strong>the frontier of autonomous agent capability is advancing faster than most expected.</strong> If a model can autonomously find a 27-year-old vulnerability in OpenBSD, it can autonomously handle far more than most production agent systems ask of it today.</p>]]></content><author><name>The Agent Report</name></author><category term="industry" /><category term="anthropic" /><category term="claude" /><category term="cybersecurity" /><category term="glasswing" /><category term="mythos-preview" /><category term="zero-day" /><category term="ai-safety" /><summary type="html"><![CDATA[Anthropic today announced Project Glasswing, a landmark cybersecurity initiative backed by $100M in model usage credits and partnerships with AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. At its core is Claude Mythos Preview — an unreleased frontier model that autonomously discovered a 27-year-old vulnerability in OpenBSD and a 16-year-old flaw in FFmpeg that had survived 5 million automated test passes.]]></summary></entry><entry><title type="html">Block Open-Sourced Goose: How a YAML Recipe File Scaled an AI Agent to 60% of the Company</title><link href="https://the-agent-report.com/2026/05/block-goose-ai-agent-recipe-runner-scaled-60-percent/" rel="alternate" type="text/html" title="Block Open-Sourced Goose: How a YAML Recipe File Scaled an AI Agent to 60% of the Company" /><published>2026-05-27T12:00:00+00:00</published><updated>2026-05-27T12:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/block-goose-ai-agent-recipe-runner-scaled-60-percent</id><content type="html" xml:base="https://the-agent-report.com/2026/05/block-goose-ai-agent-recipe-runner-scaled-60-percent/"><![CDATA[<p>Block — the parent company of Square, Cash App, and Afterpay — released <strong>Goose</strong> as an open-source AI agent in early 2025. A year later, the tool has <strong>44,000+ GitHub stars, 368+ contributors, and 2,600+ forks</strong>. But the headline number isn’t open-source adoption. It’s internal adoption: <strong>roughly 60% of Block’s ~12,000 employees use Goose weekly</strong>, spanning 15 different job profiles — engineering, sales, design, product, and customer success.</p>

<p>The question that follows is obvious: how does a single CLI tool serve both an engineer debugging a flaky test and a product manager triaging a Jira ticket?</p>

<p>The answer is a <strong>30-line YAML file</strong>.</p>

<h2 id="the-architecture-local-agent-mcp-tools-recipe-workflow">The Architecture: Local Agent, MCP Tools, Recipe Workflow</h2>

<p>Goose is a Rust binary that runs entirely on the user’s machine. It connects to any major LLM provider — Anthropic, OpenAI, Gemini, Mistral, xAI, or a local model via Ollama — and uses <strong>MCP (Model Context Protocol)</strong> servers as its tool surface. The architecture has three layers that each evolve independently:</p>

<ol>
  <li><strong>The agent runtime</strong> — a core loop (plan → call tools → evaluate → repeat) that stays generic.</li>
  <li><strong>The extension system</strong> — every tool is an MCP server. Adding GitHub access, Jira integration, or an internal API is a config entry, not a code change.</li>
  <li><strong>The recipe</strong> — a YAML document bundling instructions, required extensions, parameters, and the prompt into a single shareable file.</li>
</ol>

<p>The separation is deliberate. The agent doesn’t decide which tools to load — the recipe does. The agent doesn’t free-form its way through the task — the recipe provides a numbered sequence with checkpoints.</p>

<h2 id="what-a-recipe-looks-like">What a Recipe Looks Like</h2>

<p>A recipe is a YAML file that any team member can author. Here’s an abridged example for reviewing a GitHub pull request:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">name</span><span class="pi">:</span> <span class="s">review-pr</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Review a GitHub PR for risk areas</span>
<span class="na">params</span><span class="pi">:</span>
  <span class="na">pr_url</span><span class="pi">:</span>
    <span class="na">type</span><span class="pi">:</span> <span class="s">string</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">The GitHub PR URL to review.</span>
<span class="na">extensions</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">developer</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">github</span>
    <span class="na">args</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">-y"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">@modelcontextprotocol/server-github"</span><span class="pi">]</span>
<span class="na">prompt</span><span class="pi">:</span> <span class="pi">|</span>
  <span class="s">1. Fetch the PR diff and the list of changed files.</span>
  <span class="s">2. For each file, identify: behavior changes, new dependencies,</span>
     <span class="s">missing tests, anything that looks rushed.</span>
  <span class="s">3. Group findings by severity: must-fix, should-fix, nit.</span>
  <span class="s">4. Post a single review comment with the grouped findings.</span>
</code></pre></div></div>

<p>The recipe is run with a single command:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>goose recipe run review-pr <span class="nt">--params</span> <span class="nv">pr_url</span><span class="o">=</span>https://github.com/org/repo/pull/42
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">params</code> block makes the recipe a function — you call it with different inputs instead of writing one per task. The <code class="language-plaintext highlighter-rouge">extensions</code> block loads MCP servers dynamically for the duration of the run and discards them afterward. The numbered prompt steps act as a planning skeleton — the agent doesn’t reinvent the workflow each time.</p>

<h2 id="why-this-pattern-scaled">Why This Pattern Scaled</h2>

<p>The recipe format is the architectural breakthrough that explains the adoption number. A <strong>YAML file is a thing a product manager can author</strong>. They can copy a recipe a teammate wrote, change the prompt, run it, and see what happened — no deploy, no code review, no engineering handoff.</p>

<p>For engineers, the value is different: a recipe is a <strong>committable artifact</strong> that lives in the same repo as the code it operates on. A team’s review workflow sits at <code class="language-plaintext highlighter-rouge">recipes/review-pr.yaml</code> next to the service code. New hires read the recipe to understand the workflow. Changes get reviewed like any other artifact.</p>

<p>The MCP extension layer is the multiplier. Every new internal capability is a one-time MCP server build, and then it’s available to every recipe. Block doesn’t write a separate “PR review agent” and “ticket triage agent.” They write <strong>one Goose binary</strong>, then ship a directory of recipes and a directory of MCP servers. <strong>Composition does the rest.</strong> This MCP-based composition pattern is a core theme in our <a href="/2026/05/ultimate-guide-open-source-ai-agent-frameworks/">Ultimate Guide to Open Source AI Agent Frameworks</a>.</p>

<h2 id="goose-is-now-a-foundation-project">Goose Is Now a Foundation Project</h2>

<p>In a move that changes the risk calculus for enterprise adoption, Goose has moved from Block’s governance to the <strong>Agentic AI Foundation under the Linux Foundation</strong>. The tool is now a community-governed project — no single company controls its roadmap.</p>

<p>The implications are significant. The governance risk that held back enterprise teams (“what if Block stops investing?”) is gone. Recent community activity points toward a public recipe registry, tighter MCP server interoperability, and richer parameter types including file uploads and structured objects.</p>

<h2 id="what-this-means-for-the-industry">What This Means for the Industry</h2>

<p>Goose’s recipe pattern is the strongest signal yet that the future of enterprise AI agents is not about better models — it’s about <strong>workflow abstractions that non-engineers can author</strong>. The recipe is an architectural pattern, not a Goose-specific feature. The same shape works on top of Claude Code skills, Cursor agents, or any runtime that supports YAML-defined workflows and MCP tools. For a broader view of how open-source agent tools are evolving, see our <a href="/2026/06/top-20-open-source-ai-agent-tools-2026/">Top 20 Open Source AI Agent Tools</a> ranking.</p>

<p>The takeaway for any team building internal agent platforms: if your system doesn’t have an analog for the recipe, you’re going to end up with bespoke agent builds per team. The recipe is what lets one tool serve a 12,000-person company without forking.</p>

<p>The boring abstraction — a YAML file with a name, a prompt, and an extension list — is how you reach 60% of the company.</p>]]></content><author><name>The Agent Report</name></author><category term="tools-frameworks" /><category term="goose" /><category term="block" /><category term="square" /><category term="open-source" /><category term="mcp" /><category term="enterprise-adoption" /><category term="agent-framework" /><summary type="html"><![CDATA[Block open-sourced Goose, an AI agent that scaled to 60% of its 12,000 employees. The key innovation isn't the model or the prompt — it's a YAML recipe format that any team member can author.]]></summary></entry><entry><title type="html">Hermes Agent Post-Foundation Sprint: Dashboard OAuth, Kynver Memory, Qwen 3.7-Max, and 30+ Merged PRs</title><link href="https://the-agent-report.com/2026/05/hermes-agent-post-foundation-sprint-may27/" rel="alternate" type="text/html" title="Hermes Agent Post-Foundation Sprint: Dashboard OAuth, Kynver Memory, Qwen 3.7-Max, and 30+ Merged PRs" /><published>2026-05-27T12:00:00+00:00</published><updated>2026-05-27T12:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/hermes-agent-post-foundation-sprint-may27</id><content type="html" xml:base="https://the-agent-report.com/2026/05/hermes-agent-post-foundation-sprint-may27/"><![CDATA[<style>
.highlight-box { background: #1a1a2e; border-left: 4px solid #6c5ce7; padding: 1.2rem 1.5rem; margin: 1.5rem 0; border-radius: 0 8px 8px 0; }
.highlight-box p { margin: 0; }
.stat-grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(200px, 1fr)); gap: 1rem; margin: 1.5rem 0; }
.stat-card { background: #1a1a2e; border-radius: 8px; padding: 1rem; text-align: center; border: 1px solid #2a2a3e; }
.stat-card .stat-value { font-size: 1.8rem; font-weight: bold; color: #6c5ce7; }
.stat-card .stat-label { font-size: 0.85rem; color: #888; margin-top: 0.3rem; }
</style>

<p>Just 11 days after the massive <a href="/2026/05/hermes-agent-v0140-foundation-release-may16/">v0.14.0 “Foundation” release</a>, the Hermes Agent team is showing no signs of slowing down. Today, May 27, saw a coordinated batch of <strong>30+ merged pull requests</strong> ship across the entire stack — from infrastructure and auth to new model support and security tooling.</p>

<p>The numbers tell the story: the repo has climbed from <strong>155K to 169.5K stars</strong> (+14,500 in 11 days), while <strong>forks have surged from 24,980 to 28,216</strong>, and <strong>open issues have grown to 14,219</strong> — reflecting a community that’s not just watching but building.</p>

<div class="stat-grid">
<div class="stat-card"><div class="stat-value">169.5K</div><div class="stat-label">GitHub Stars</div></div>
<div class="stat-card"><div class="stat-value">28.2K</div><div class="stat-label">Forks</div></div>
<div class="stat-card"><div class="stat-value">14.2K</div><div class="stat-label">Open Issues</div></div>
<div class="stat-card"><div class="stat-value">30+</div><div class="stat-label">PRs Merged Today</div></div>
</div>

<p>Here’s what landed in today’s sprint.</p>

<hr />

<h2 id="-dashboard-oauth-login-30156">🏠 Dashboard OAuth Login (#30156)</h2>

<p>The most user-facing change in today’s batch is the <strong>Dashboard OAuth login flow</strong>. Previously, dashboard users had to configure their provider credentials manually through config files. Now the dashboard supports a full OAuth login flow — operators can log in through their identity provider directly from the dashboard UI.</p>

<p>The implementation is backed by the new <code class="language-plaintext highlighter-rouge">dashboard.public_url</code> config option (<a href="https://github.com/NousResearch/hermes-agent/commit/a890389b69575916dfaf3980556f31f7f25c9871">commit by @benbarclay</a>), which allows operators behind reverse proxies to set the absolute base URL for OAuth callbacks. This fixes a common pain point for self-hosted deployments behind nginx, on-prem ingress controllers, and custom-domain Fly.io setups where <code class="language-plaintext highlighter-rouge">X-Forwarded-Host</code> headers aren’t reliably forwarded.</p>

<blockquote>
  <p>“When set, it is the complete authority — scheme + host + optional path prefix — and becomes the base for the OAuth <code class="language-plaintext highlighter-rouge">redirect_uri</code>.”
— Commit message on <code class="language-plaintext highlighter-rouge">HERMES_DASHBOARD_PUBLIC_URL</code></p>
</blockquote>

<p>The config follows a clean precedence chain: <strong>env var &gt; <code class="language-plaintext highlighter-rouge">config.yaml</code> &gt; auto-detected from request headers</strong>, matching the existing <code class="language-plaintext highlighter-rouge">dashboard.oauth.client_id</code> pattern.</p>

<hr />

<h2 id="-kynver-memory-provider--agentos-bridge-33158">🧠 Kynver Memory Provider + AgentOS Bridge (#33158)</h2>

<p>Memory is one of the most critical subsystems in any self-improving agent, and today Hermes gained a new backend. PR <a href="https://github.com/NousResearch/hermes-agent/pull/33158">#33158</a> adds the <strong>Kynver memory provider</strong> alongside an <strong>AgentOS bridge</strong>.</p>

<p>Kynver is a specialized memory substrate for AI agents, offering persistent, queryable storage optimized for agentic workloads. The AgentOS bridge means Hermes can now leverage AgentOS-compatible memory tools and infrastructure. This is a significant expansion of Hermes’ already rich memory ecosystem, which previously depended on filesystem-based, vector-store, and other backends.</p>

<hr />

<h2 id="-qwen-37-max-joins-the-model-catalog-32806-33129">🤖 Qwen 3.7-Max Joins the Model Catalog (#32806, #33129)</h2>

<p>Two PRs today add <strong>Qwen 3.7-Max</strong> — Alibaba’s latest frontier model — to Hermes’ model catalogs. PR <a href="https://github.com/NousResearch/hermes-agent/pull/32806">#32806</a> adds it to the Alibaba provider list, while <a href="https://github.com/NousResearch/hermes-agent/pull/33129">#33129</a> adds it to the <code class="language-plaintext highlighter-rouge">alibaba-coding-plan</code> catalog.</p>

<p>Qwen 3.7-Max has been making waves in the open-source AI community for its strong reasoning capabilities and competitive benchmark scores. Hermes users on the Alibaba provider can now select it via <code class="language-plaintext highlighter-rouge">hermes model</code> and start building agents with it immediately.</p>

<hr />

<h2 id="-api-server-session-controls-33134-29302">🔌 API Server Session Controls (#33134, #29302)</h2>

<p>The API server — Hermes’ HTTP interface for programmatic access — gets a major upgrade with <strong>session control APIs</strong>. PR <a href="https://github.com/NousResearch/hermes-agent/pull/33134">#33134</a> (salvaging <a href="https://github.com/NousResearch/hermes-agent/pull/29302">#29302</a>) introduces endpoints for:</p>
<ul>
  <li><strong>Session management</strong> — create, list, and manage active sessions</li>
  <li><strong>Chat endpoints</strong> — send messages, retrieve conversation history</li>
  <li><strong>Fork support</strong> — branch a session into a new independent context</li>
  <li><strong>SSE streaming</strong> — real-time event streaming for live agent responses</li>
</ul>

<p>This transforms the API server from a basic HTTP interface into a full-featured agent interaction platform — enabling custom UIs, CI/CD integrations, and programmatic agent orchestration.</p>

<hr />

<h2 id="️-security-plugins-pattern-matched-code-warnings-33131">🛡️ Security Plugins: Pattern-Matched Code Warnings (#33131)</h2>

<p>A new plugin category lands today: <strong>security-guidance plugins</strong>. PR <a href="https://github.com/NousResearch/hermes-agent/pull/33131">#33131</a> introduces a system that pattern-matches against dangerous code patterns in agent-written code and surfaces warnings <em>before</em> the code is executed.</p>

<p>This is especially important for self-improving agents that write and execute their own code — Hermes’ core value proposition. The security-guidance plugin catches common dangerous patterns (unsafe <code class="language-plaintext highlighter-rouge">eval()</code>, file-system traversal, shell injection vectors) and flags them with actionable remediation hints.</p>

<hr />

<h2 id="️-codex-reliability-cluster">🛠️ Codex Reliability Cluster</h2>

<p>A significant portion of today’s merged PRs focus on <strong>Codex (GitHub Copilot) provider reliability</strong> — the workhorse backend for many Hermes users:</p>

<ul>
  <li><strong>Credential pool sync on re-auth</strong> (<a href="https://github.com/NousResearch/hermes-agent/pull/33074">#33074</a>) — fixes a bug where Codex re-authentication via <code class="language-plaintext highlighter-rouge">hermes setup</code> / <code class="language-plaintext highlighter-rouge">hermes model</code> would write fresh OAuth tokens but leave the credential pool holding stale entries, causing 401 errors on every subsequent request</li>
  <li><strong>Foreign-issuer reasoning on replay</strong> (<a href="https://github.com/NousResearch/hermes-agent/pull/33156">#33156</a>, salvaging <a href="https://github.com/NousResearch/hermes-agent/pull/31629">#31629</a>) — prevents <code class="language-plaintext highlighter-rouge">HTTP 400 invalid_encrypted_content</code> errors when switching between model providers mid-conversation (e.g., from Grok to GPT-5.5)</li>
  <li><strong>Transient rs_tmp reasoning state</strong> (<a href="https://github.com/NousResearch/hermes-agent/pull/33146">#33146</a>) — drops stale temporary reasoning items that could accumulate and cause failures</li>
  <li><strong>Null output stream handling</strong> (<a href="https://github.com/NousResearch/hermes-agent/pull/33008">#33008</a>, <a href="https://github.com/NousResearch/hermes-agent/pull/33050">#33050</a>) — normalizes <code class="language-plaintext highlighter-rouge">response.output=None</code> to empty lists, preventing iteration crashes</li>
  <li><strong>Silent-hang workaround hints</strong> (<a href="https://github.com/NousResearch/hermes-agent/pull/33133">#33133</a>, <a href="https://github.com/NousResearch/hermes-agent/pull/33034">#33034</a>) — improved user-facing hints when ChatGPT silent-hang scenarios are detected</li>
  <li><strong>Homebrew CI poller nudges</strong> (<a href="https://github.com/NousResearch/hermes-agent/pull/33142">#33142</a>) — the terminal tool now detects anti-pattern CI polling scripts and nudges users toward canonical green-CI snippets</li>
</ul>

<hr />

<h2 id="-telegram-ux-cleanup">💬 Telegram UX Cleanup</h2>

<p>A cluster of three PRs addresses Telegram operational noise:</p>
<ul>
  <li><a href="https://github.com/NousResearch/hermes-agent/pull/31034">#31034</a> — quiets operational chatter in Telegram gateway</li>
  <li><a href="https://github.com/NousResearch/hermes-agent/pull/31098">#31098</a> — ignores <code class="language-plaintext highlighter-rouge">/start</code> platform pings on Telegram</li>
  <li><a href="https://github.com/NousResearch/hermes-agent/pull/31941">#31941</a> — hides compaction status noise</li>
</ul>

<p>These are small but important UX improvements — reducing noise in Telegram channels where Hermes operates as a bot makes the conversation feel more natural and less “robotic.”</p>

<hr />

<h2 id="-what-this-sprint-means">📊 What This Sprint Means</h2>

<p>Eleven days after the Foundation release, Hermes Agent’s development velocity is accelerating:</p>

<ol>
  <li><strong>The dashboard is becoming a real product</strong> — OAuth login and session control APIs point to Hermes evolving beyond a CLI-only tool into a platform with proper web UI and API access layers</li>
  <li><strong>Memory diversity is growing</strong> — the Kynver + AgentOS bridge means Hermes can plug into more enterprise and research-grade memory substrates</li>
  <li><strong>Security is front-and-center</strong> — pattern-matched security plugins for code writing is a direct response to the unique risks of self-improving agents</li>
  <li><strong>Daily reliability compounding</strong> — the Codex cluster alone fixes 7+ distinct failure modes that real users were hitting</li>
</ol>

<p>The pace is remarkable: 30+ PRs merged in a single day, spanning infrastructure (auth, config, API), models (Qwen 3.7-Max), memory systems, security, and reliability. If the Foundation release was about <em>surface area</em>, this sprint is about <em>depth</em> — making every subsystem more reliable, more secure, and more capable. The project’s momentum mirrors what we’ve documented across the broader <a href="/2026/05/hermes-agent-community-ecosystem-may25/">Hermes Agent community ecosystem</a>, which has grown to 276 documented use cases and 165K GitHub stars.</p>

<p>With <strong>169.5K stars and counting</strong>, Hermes Agent continues to be the fastest-growing open-source agent framework — and if today’s sprint is any indication, the next release (v0.15.0?) will be worth the wait.</p>]]></content><author><name>The Agent Report</name></author><category term="hermes-agent" /><category term="hermes-agent" /><category term="nous-research" /><category term="open-source" /><category term="dashboard-oauth" /><category term="kynver-memory" /><category term="qwen-37-max" /><category term="api-server" /><category term="security-plugins" /><category term="codex-reliability" /><summary type="html"><![CDATA[Eleven days after v0.14.0 'Foundation', Hermes Agent's development hasn't slowed down: Dashboard OAuth login shipped, a Kynver memory provider brings AgentOS bridge, Qwen 3.7-Max lands in model catalogs, the API server gets session controls, and a new security-plugins system delivers pattern-matched code warnings. Plus a cluster of Codex reliability fixes and Telegram UX improvements — all merged on May 27.]]></summary></entry><entry><title type="html">Ultimate Guide to Open Source AI Agent Frameworks in 2026</title><link href="https://the-agent-report.com/2026/05/ultimate-guide-open-source-ai-agent-frameworks/" rel="alternate" type="text/html" title="Ultimate Guide to Open Source AI Agent Frameworks in 2026" /><published>2026-05-27T10:00:00+00:00</published><updated>2026-05-27T10:00:00+00:00</updated><id>https://the-agent-report.com/2026/05/ultimate-guide-open-source-ai-agent-frameworks</id><content type="html" xml:base="https://the-agent-report.com/2026/05/ultimate-guide-open-source-ai-agent-frameworks/"><![CDATA[<p>The open-source AI agent framework landscape in 2026 is both richer and more turbulent than it was even twelve months ago. The year began with two major transitions: Microsoft moved AutoGen into maintenance mode and merged it with Semantic Kernel into the new <strong>Microsoft Agent Framework</strong> (GA April 2026), while OpenAI archived its experimental Swarm library and redirected users to the production-grade <strong>Agents SDK</strong>. LangGraph hit 1.0 GA. CrewAI crossed the 1.0 threshold. And TypeScript-native frameworks like Mastra and Vercel AI SDK surged past 20,000 GitHub stars, proving that the agent revolution is not Python’s alone. For context on how these frameworks fit into the broader agent landscape, see our <a href="/2026/05/complete-guide-to-ai-agents-2026/">Complete Guide to AI Agents</a>.</p>

<p>This guide is for developers and technical leaders who need to cut through the noise. We compare eight frameworks across eight criteria — language support, agent types, key features, learning curve, production readiness, best use case, GitHub stars, and 2026 momentum — with deep dives into each. The goal is not to crown a winner but to help you choose the right tool for <em>your</em> use case, team, and stack.</p>

<p><strong>Quick links:</strong> <a href="#comparison-table">Comparison Table</a> · <a href="#1-langchain--langgraph">LangChain / LangGraph</a> · <a href="#2-autogen--ag2">AutoGen / AG2</a> · <a href="#3-crewai">CrewAI</a> · <a href="#4-openai-agents-sdk">OpenAI Agents SDK</a> · <a href="#5-haystack">Haystack</a> · <a href="#6-semantic-kernel">Semantic Kernel</a> · <a href="#7-mastra">Mastra</a> · <a href="#8-vercel-ai-sdk">Vercel AI SDK</a> · <a href="#how-to-choose-a-framework">How to Choose</a> · <a href="#frequently-asked-questions">FAQ</a></p>

<hr />

<h2 id="comparison-table">Comparison Table</h2>

<p>The table below compares all eight frameworks across eight essential dimensions. Star counts are approximate and sourced from GitHub and third-party trackers as of early June 2026. Production readiness reflects consensus across multiple independent comparisons, not vendor claims.</p>

<table>
  <thead>
    <tr>
      <th>Framework</th>
      <th>Language(s)</th>
      <th>Agent Types</th>
      <th>Key Features</th>
      <th>Learning Curve</th>
      <th>Production Readiness</th>
      <th>Best Use Case</th>
      <th>~ GitHub Stars</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>LangChain / LangGraph</strong></td>
      <td>Python, JavaScript</td>
      <td>Single, Multi, Hierarchical, Swarm</td>
      <td>Stateful graphs, checkpointing, memory, human-in-the-loop, LangSmith tracing</td>
      <td>Advanced</td>
      <td>Mature</td>
      <td>Complex stateful workflows, enterprise orchestration</td>
      <td>137k / 33k</td>
    </tr>
    <tr>
      <td><strong>AutoGen (AG2)</strong></td>
      <td>Python</td>
      <td>Multi, Conversational, GroupChat</td>
      <td>Event-driven, async messaging, code execution sandboxes</td>
      <td>Intermediate</td>
      <td>Maintenance Mode</td>
      <td>Legacy multi-agent research systems (use MAF for new builds)</td>
      <td>~48k</td>
    </tr>
    <tr>
      <td><strong>CrewAI</strong></td>
      <td>Python</td>
      <td>Multi, Hierarchical, Role-based</td>
      <td>Role/Goal/Backstory agents, sequential &amp; hierarchical processes, Flows, MCP native</td>
      <td>Beginner</td>
      <td>Stable</td>
      <td>Rapid multi-agent prototyping, marketing automation</td>
      <td>~38k</td>
    </tr>
    <tr>
      <td><strong>OpenAI Agents SDK</strong></td>
      <td>Python, TypeScript</td>
      <td>Single, Multi (handoff)</td>
      <td>Handoff delegation, guardrails, tracing, sandboxed execution, provider-agnostic (100+ LLMs)</td>
      <td>Beginner</td>
      <td>Stable</td>
      <td>Delegation chains, TypeScript/Next.js teams, rapid prototyping</td>
      <td>~19k</td>
    </tr>
    <tr>
      <td><strong>Haystack</strong></td>
      <td>Python</td>
      <td>Single, Multi (pipeline agents)</td>
      <td>Typed pipelines, 50+ document stores, RAG-native, multimodal, agentic pipelines</td>
      <td>Intermediate</td>
      <td>Stable</td>
      <td>Production RAG, semantic search, question answering</td>
      <td>~22k</td>
    </tr>
    <tr>
      <td><strong>Semantic Kernel</strong></td>
      <td>.NET, Python, Java</td>
      <td>Single, Multi, Planner</td>
      <td>Enterprise SDK, Azure integration, OpenTelemetry, A2A protocol, plugin architecture</td>
      <td>Intermediate</td>
      <td>Mature</td>
      <td>.NET enterprise teams, Azure-native AI applications</td>
      <td>~28k</td>
    </tr>
    <tr>
      <td><strong>Mastra</strong></td>
      <td>TypeScript</td>
      <td>Single, Multi, Graph-based</td>
      <td>Graph workflows (then/branch/parallel), RAG, MCP, evals, 4-tier memory, 81+ providers</td>
      <td>Intermediate</td>
      <td>Stable</td>
      <td>TypeScript-native production agents, integrated framework</td>
      <td>~21k</td>
    </tr>
    <tr>
      <td><strong>Vercel AI SDK</strong></td>
      <td>TypeScript, JavaScript</td>
      <td>Single, Multi (tools-based)</td>
      <td>Streaming, React hooks, 2.8M weekly downloads, Next.js native, provider-agnostic</td>
      <td>Beginner</td>
      <td>Mature</td>
      <td>Web app AI features, React/Next.js teams, chatbots</td>
      <td>~20k</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p><strong>A note on star counts:</strong> Star counts are a lagging indicator of community size — not of production readiness. LangGraph has roughly one-quarter the stars of LangChain but more verified enterprise deployments. AutoGen has ~48k stars but is in maintenance mode. Choose by mental model and production track record, not by GitHub popularity. For a practical, ranked view of the tools built on these frameworks, see our <a href="/2026/06/top-20-open-source-ai-agent-tools-2026/">Top 20 Open Source AI Agent Tools</a> guide.</p>
</blockquote>

<hr />

<h2 id="deep-dives">Deep Dives</h2>

<h3 id="1-langchain--langgraph">1. LangChain / LangGraph</h3>

<p><strong>The most mature agent ecosystem, for teams that need ultimate control.</strong></p>

<p>LangChain is the granddaddy of the open-source LLM application ecosystem — <strong>137,000 GitHub stars</strong>, 3,900+ contributors, and 281,000 dependent repositories as of mid-2026. But for agents specifically, <strong>LangGraph</strong> is the framework that matters. Released as a standalone library in 2024 and reaching <strong>1.0 GA</strong> in October 2025, LangGraph models agent behavior as a directed state graph: nodes are computation steps, edges are conditional transitions, and checkpointers provide persistent state with Postgres or Redis backends.</p>

<p>LangGraph’s power lies in its explicitness. You define every state transition. You can pause workflows mid-execution for human approval. You can rewind to any checkpoint during debugging. This makes it the go-to for enterprises — confirmed deployments include <strong>Klarna</strong> (853 employee-equivalent agents, saving $60M), <strong>Uber</strong> (~21,000 developer-hours saved), <strong>LinkedIn</strong>, <strong>Cisco</strong>, <strong>JPMorgan</strong>, and <strong>Elastic</strong>. LangSmith provides monitoring and tracing for observability at scale.</p>

<p>The trade-off is complexity. LangGraph has a steep learning curve — expect a multi-day ramp-up before you’re productive. For teams that don’t need stateful orchestration, LangGraph is overkill. But for production systems where failure is expensive and audit trails are mandatory, nothing else in the open-source ecosystem matches its depth.</p>

<p><strong>2026 momentum:</strong> Deep Agents (launched March 2026) adds built-in planning, filesystem-based context management, and sub-agent spawning on top of LangGraph — pushing it further toward batteries-included, without sacrificing the underlying graph model.</p>

<hr />

<h3 id="2-autogen--ag2">2. AutoGen / AG2</h3>

<p><strong>Microsoft’s multi-agent pioneer — now in maintenance mode.</strong></p>

<p>AutoGen was the framework that sparked the multi-agent revolution. Originally from Microsoft Research, it introduced event-driven, conversational multi-agent systems where agents collaborate through message-passing rather than rigid pipelines. At its peak in 2025, AutoGen amassed <strong>~55,000 GitHub stars</strong> and proved that multi-agent setups could outperform single-agent solutions on benchmarks like GAIA.</p>

<p>But 2026 brought a major reset. In early 2026, Microsoft announced that AutoGen was entering <strong>maintenance mode</strong> — bug fixes only, no new features. The team merged AutoGen’s orchestration ideas with Semantic Kernel’s production infrastructure into the <strong>Microsoft Agent Framework (MAF)</strong>, which reached <strong>1.0 GA on April 3, 2026</strong>. MAF ships as a unified SDK for .NET and Python under <code class="language-plaintext highlighter-rouge">Microsoft.Agents.AI</code>, with Semantic Kernel as the foundation layer and AutoGen-style graph workflows on top.</p>

<p>The community fork <strong>AG2</strong> continues AutoGen development independently, but its long-term trajectory is uncertain. For new projects, the unambiguous guidance from Microsoft and independent analysts is: start with MAF, not AutoGen.</p>

<p><strong>Best remaining use case:</strong> Teams with existing AutoGen 0.2 or 0.4 deployments that aren’t ready to migrate, or researchers who need AutoGen’s specific conversational multi-agent paradigm for academic work. For everyone else, the migration path leads to MAF or to alternative frameworks.</p>

<hr />

<h3 id="3-crewai">3. CrewAI</h3>

<p><strong>The simplest path to multi-agent orchestration.</strong></p>

<p>If LangGraph is a precision instrument, CrewAI is a power tool for multi-agent workflows. The framework’s mental model is intuitive: define agents with roles, goals, and backstories, then assign them tasks in a sequential or hierarchical process. A working multi-agent crew can be scaffolded in <strong>under 10 minutes</strong> via the CLI — the fastest time-to-value of any framework in this comparison.</p>

<p>CrewAI hit <strong>1.0 GA</strong> in October 2025 and has since added significant capabilities: <strong>CrewAI Flows</strong> (event-driven workflows with <code class="language-plaintext highlighter-rouge">@start</code>, <code class="language-plaintext highlighter-rouge">@listen</code>, and <code class="language-plaintext highlighter-rouge">@router</code> decorators for complex branching), native <strong>MCP server support</strong> (v1.10.x), and streaming tool calls. The crewAIInc/crewAI repository has grown to approximately <strong>38,000 GitHub stars</strong>.</p>

<p>CrewAI’s strength is accessibility. The role-based abstraction maps naturally to how teams think about delegation — researcher, writer, reviewer — making it popular for content generation pipelines, marketing automation, customer service triage, and rapid prototypes. The company claims ~1.4 billion automations per month and 60% Fortune 500 adoption (though these figures are not independently audited).</p>

<p>The trade-off: CrewAI is less suited for deeply stateful, long-running agents that need persistent memory across sessions. For those use cases, LangGraph’s checkpointing model is more appropriate. But for teams that need to orchestrate multiple agents quickly without building a state machine from scratch, CrewAI remains the most approachable option.</p>

<hr />

<h3 id="4-openai-agents-sdk">4. OpenAI Agents SDK</h3>

<p><strong>Lightweight, official, and provider-agnostic.</strong></p>

<p>OpenAI’s Agents SDK, released in March 2025, is the successor to the experimental Swarm library (archived March 2025). It’s a minimalist, open-source toolkit built around a single elegant primitive: the <strong>handoff</strong>. Agents can delegate tasks to other agents, enabling triage-and-specialist architectures with remarkably little code.</p>

<p>Despite the name, the SDK is <strong>provider-agnostic</strong> — it works with 100+ LLMs, not just OpenAI models. It ships for both <strong>Python</strong> (v0.17.3 as of May 2026) and <strong>TypeScript</strong> (v0.8.3 as of April 2026), making it one of the few frameworks with first-class support in both ecosystems. Key features include built-in <strong>guardrails</strong> (parallel input validation that halts on failure), <strong>tracing</strong> via OpenAI’s observability platform, and — as of April 2026 — <strong>sandboxed code execution</strong> with providers like Modal, E2B, Cloudflare, and Vercel.</p>

<p>The Agents SDK has approximately <strong>19,000 GitHub stars</strong> (Python repo) and roughly 10.3 million monthly PyPI downloads. Its learning curve is the gentlest of any framework here: if you can write a Python function and decorate it as a tool, you can build an agent.</p>

<p><strong>Best for:</strong> Teams that want a lightweight, official toolkit without the abstraction overhead of LangChain or CrewAI. Particularly strong for delegation-heavy workflows (triage → specialist → response) and for TypeScript/Next.js teams that want the same SDK across frontend and backend. The main limitation is that the handoff model, while elegant, is less expressive than LangGraph’s state graphs for complex branching and looping workflows.</p>

<hr />

<h3 id="5-haystack">5. Haystack</h3>

<p><strong>The RAG specialist with growing agent capabilities.</strong></p>

<p>Haystack, built by Berlin-based deepset (~$45.6M raised), occupies a distinct niche: it is the framework you choose when <strong>retrieval quality is the primary constraint</strong>. While other frameworks treat RAG as a feature, Haystack was built around it from day one. Its pipeline architecture — where components (Retriever, Ranker, PromptBuilder, Generator) are composed into typed, directed graphs — maps directly to the structure of production search and question-answering systems.</p>

<p>The <strong>Haystack 2.x</strong> rewrite modernized the framework significantly, adding agentic pipelines, multimodal support (text + images), and a growing component ecosystem. With approximately <strong>22,000 GitHub stars</strong> and an Apache 2.0 license, Haystack provides 50+ document store integrations, hybrid retrieval strategies, and a REST API for deployment.</p>

<p>Haystack’s agent capabilities are structured as “agentic pipelines” — agents that can reason, use tools (including Haystack components as tools), and iterate within the pipeline framework. This is a different mental model from LangGraph’s freeform graphs or CrewAI’s role-playing, but it’s well-suited for use cases where the primary workflow is retrieval-centric and agents assist within that pipeline.</p>

<p><strong>Best for:</strong> Production RAG systems where retrieval precision and pipeline predictability matter more than open-ended agent autonomy. Teams building semantic search, enterprise knowledge management, or customer support chatbots with grounded answers. Not the best choice for purely conversational multi-agent systems where retrieval is secondary. Deepset also offers a managed cloud platform (Haystack Enterprise) for teams that want a hosted solution.</p>

<hr />

<h3 id="6-semantic-kernel">6. Semantic Kernel</h3>

<p><strong>The enterprise-grade .NET SDK — now part of Microsoft Agent Framework.</strong></p>

<p>Semantic Kernel (SK) is Microsoft’s open-source AI orchestration SDK, purpose-built for the .NET ecosystem with additional support for Python and Java. As of mid-2026, it has approximately <strong>28,000 GitHub stars</strong> and has become the default answer for enterprise .NET teams asking “how do we build AI agents without leaving our stack?”</p>

<p>SK’s architecture centers on <strong>plugins</strong> (reusable AI functions, equivalent to tools in other frameworks), <strong>planners</strong> (agents that chain plugins to accomplish goals), and <strong>memories</strong> (vector-backed semantic storage). It is model-agnostic, supporting OpenAI, Azure OpenAI, Anthropic, Google, and local models. Enterprise features include OpenTelemetry integration for observability, Azure AI Foundry deployment, and support for Google’s Agent-to-Agent (A2A) protocol for cross-framework interoperability.</p>

<p>The major 2026 development was SK’s merger with AutoGen into the <strong>Microsoft Agent Framework (MAF) 1.0</strong>, which shipped GA on April 3, 2026. In MAF, SK provides the production foundation (stability, telemetry, enterprise integration) while AutoGen contributes the multi-agent orchestration patterns. The unified <code class="language-plaintext highlighter-rouge">Microsoft.Agents.AI</code> SDK ships as first-class packages for both .NET and Python with identical API shapes.</p>

<p><strong>Best for:</strong> .NET and C# enterprise teams building AI agents on Azure. Organizations that need deep integration with the Microsoft ecosystem — Azure AI Foundry, Microsoft 365, Power Platform. Teams that value long-term support stability over cutting-edge experimentation. The main limitation is that SK’s Python experience, while solid, is secondary to its .NET-native design — Python-first teams may find other frameworks more idiomatic.</p>

<hr />

<h3 id="7-mastra">7. Mastra</h3>

<p><strong>The TypeScript-native contender with graph-based workflows.</strong></p>

<p>Mastra represents the new wave of TypeScript-first agent frameworks. Built by the team behind Gatsby (YC W25, $13M seed round), Mastra provides an integrated, opinionated framework where agents, workflows, RAG, memory, and evaluation live in a single coherent package — no stitching together of separate libraries required.</p>

<p>Mastra’s workflow engine models agent orchestration as composable graphs with <code class="language-plaintext highlighter-rouge">then()</code>, <code class="language-plaintext highlighter-rouge">branch()</code>, and <code class="language-plaintext highlighter-rouge">parallel()</code> primitives, plus suspend/resume for human-in-the-loop patterns. The <code class="language-plaintext highlighter-rouge">.network()</code> method turns any agent into a router that delegates to sub-agents. Memory is structured across four tiers: message history, working memory, semantic recall, and RAG — a more comprehensive model than most Python frameworks offer. MCP support is built-in, and Mastra connects to <strong>81 providers covering 2,436+ models</strong> via the Vercel AI SDK.</p>

<p>With approximately <strong>21,000 GitHub stars</strong> and accelerating adoption (Replit used Mastra to improve Agent 3 task success from 80% to 96%), Mastra is carving out a position as “the most complete TypeScript agent framework.” The framework has attracted enterprise users including <strong>Marsh McLennan</strong> (75,000 employees) and <strong>SoftBank</strong>.</p>

<p><strong>Best for:</strong> TypeScript teams that want a single integrated framework rather than assembling agents from separate libraries. Developers who value type safety, IDE autocomplete, and Zod schema validation in their agent pipelines. Projects where observability (built-in tracing and eval harness) and MCP connectivity are requirements from day one. The main limitation is that Mastra’s ecosystem is younger than the Python equivalents — fewer community contributions, third-party integrations, and Stack Overflow answers — though the trajectory is steeply upward.</p>

<hr />

<h3 id="8-vercel-ai-sdk">8. Vercel AI SDK</h3>

<p><strong>The web developer’s agent toolkit — massive adoption, streaming-first.</strong></p>

<p>The Vercel AI SDK is not a traditional “agent framework” in the same sense as LangGraph or CrewAI, but it is the most downloaded TypeScript AI toolkit by an enormous margin: <strong>2.8 million weekly npm downloads</strong> and approximately <strong>20,000 GitHub stars</strong>. Built by the creators of Next.js, the SDK is designed to add AI features to web applications with minimal friction.</p>

<p>Its architectural philosophy is streaming-first. The <code class="language-plaintext highlighter-rouge">useChat</code> and <code class="language-plaintext highlighter-rouge">useCompletion</code> React hooks handle the full lifecycle of AI interactions — streaming responses, tool calls, loading states, and error handling — with a few lines of code. The SDK is provider-agnostic, supporting OpenAI, Anthropic, Google, Mistral, and dozens of others through a unified interface. As of 2026, the SDK has grown agentic capabilities: the <code class="language-plaintext highlighter-rouge">generateText</code> and <code class="language-plaintext highlighter-rouge">streamText</code> functions support tool calling, multi-step reasoning, and structured output generation, effectively enabling single-agent workflows.</p>

<p>The SDK’s agent capabilities are lighter than dedicated frameworks — you won’t find built-in multi-agent orchestration, persistent memory, or human-in-the-loop checkpointing. But for the most common use case in web applications — an AI feature that uses tools, streams responses, and generates structured data — the Vercel AI SDK is faster to implement than any Python alternative.</p>

<p><strong>Best for:</strong> React and Next.js developers adding AI chat, tool use, or structured generation to web applications. Projects where streaming UX is a priority. Teams that want the DX advantages of TypeScript-native tooling. Not the right choice for complex multi-agent systems or backend-only agent deployments — pair it with LangGraph.js or Mastra for those cases.</p>

<hr />

<h2 id="how-to-choose-a-framework">How to Choose a Framework</h2>

<p>The right framework depends on your team’s language, your use case’s complexity, and your production requirements. Here’s a practical decision guide:</p>

<h3 id="choose-by-language">Choose by Language</h3>

<ul>
  <li><strong>Python-first team → LangGraph, CrewAI, or OpenAI Agents SDK.</strong> LangGraph for complex stateful workflows, CrewAI for rapid multi-agent prototyping, OpenAI Agents SDK for lightweight delegation chains.</li>
  <li><strong>TypeScript/JavaScript-first team → Mastra or Vercel AI SDK.</strong> Mastra for full agent applications with RAG and evals, Vercel AI SDK for web app AI features.</li>
  <li><strong>.NET enterprise team → Semantic Kernel / Microsoft Agent Framework.</strong> The only framework with first-class .NET support and deep Azure integration.</li>
  <li><strong>Mixed Python + TypeScript → OpenAI Agents SDK</strong> (ships for both) or <strong>LangGraph</strong> (LangGraph.js for TypeScript).</li>
</ul>

<h3 id="choose-by-complexity">Choose by Complexity</h3>

<ul>
  <li><strong>Simple single-agent with tools → OpenAI Agents SDK or Vercel AI SDK.</strong> Both minimize boilerplate for the most common agent patterns.</li>
  <li><strong>Multi-agent orchestration, fast prototype → CrewAI.</strong> Role-based abstractions get you from idea to working crew in under 10 minutes.</li>
  <li><strong>Complex stateful workflows, human-in-the-loop → LangGraph.</strong> The checkpointing and graph model are unmatched for production-critical systems.</li>
  <li><strong>RAG-first with agent augmentation → Haystack.</strong> When retrieval quality is more important than agent autonomy.</li>
</ul>

<h3 id="choose-by-production-posture">Choose by Production Posture</h3>

<ul>
  <li><strong>Enterprise, long-term support → LangGraph or Semantic Kernel/MAF.</strong> Both have 1.0 releases, stable APIs, and verified enterprise deployments.</li>
  <li><strong>Startup, rapid iteration → CrewAI or Mastra.</strong> Fastest time-to-value, growing fast, backed by venture funding.</li>
  <li><strong>Lightweight, low commitment → OpenAI Agents SDK.</strong> Minimal dependencies, works with any LLM provider.</li>
</ul>

<h3 id="special-cases">Special Cases</h3>

<ul>
  <li><strong>Building on Azure → Microsoft Agent Framework</strong> (Semantic Kernel + AutoGen merged). Native Azure AI Foundry deployment, Entra ID auth, OpenTelemetry.</li>
  <li><strong>Building on Vercel/Next.js → Vercel AI SDK.</strong> Native streaming, React hooks, Edge runtime support.</li>
  <li><strong>Building a coding agent →</strong> Consider <a href="https://github.com/langchain-ai/langgraph">LangGraph Deep Agents</a> for planning + sub-agent spawning, or <a href="https://mastra.ai">Mastra</a> for TypeScript-native tool execution.</li>
</ul>

<blockquote>
  <p><strong>If you’re migrating from AutoGen:</strong> The official successor is the Microsoft Agent Framework (GA April 2026). For teams not on the Microsoft stack, CrewAI or LangGraph are the most common migration targets based on community discussion.</p>
</blockquote>

<hr />

<h2 id="frequently-asked-questions">Frequently Asked Questions</h2>

<h3 id="which-framework-has-the-most-github-stars">Which framework has the most GitHub stars?</h3>

<p>LangChain has the most stars at approximately 137,000, followed by AutoGen (~48k) and CrewAI (~38k). However, star counts are not a reliable measure of production readiness — LangGraph (~33k stars) has more verified enterprise deployments than any framework with more stars. AutoGen, despite ~48k stars, is now in maintenance mode.</p>

<h3 id="whats-the-difference-between-langchain-and-langgraph">What’s the difference between LangChain and LangGraph?</h3>

<p>LangChain is the broader ecosystem — a platform for building LLM applications with chains, agents, tools, and output parsing. LangGraph is a specific library within that ecosystem focused on stateful, graph-based agent orchestration with checkpointing and human-in-the-loop patterns. For agent-specific work in 2026, LangGraph is the recommended starting point within the LangChain ecosystem.</p>

<h3 id="is-autogen-dead">Is AutoGen dead?</h3>

<p>Not dead, but in <strong>maintenance mode</strong> as of early 2026. Microsoft is no longer adding features to AutoGen and has merged its orchestration concepts into the Microsoft Agent Framework (MAF). The community fork AG2 continues development, but for new projects, MAF or alternative frameworks are recommended.</p>

<h3 id="which-framework-is-best-for-beginners">Which framework is best for beginners?</h3>

<p><strong>CrewAI</strong> has the gentlest learning curve for multi-agent systems — its role-based abstraction is intuitive and the CLI scaffolds working crews in minutes. For single-agent applications, the <strong>OpenAI Agents SDK</strong> is similarly approachable with minimal boilerplate.</p>

<h3 id="can-i-use-these-frameworks-with-localopen-source-models">Can I use these frameworks with local/open-source models?</h3>

<p>Yes, all eight frameworks are provider-agnostic or support multiple providers. LangGraph, OpenAI Agents SDK, Mastra, and Vercel AI SDK all support 80+ LLM providers including local models via Ollama, vLLM, or similar. Haystack and Semantic Kernel have built-in support for local models. CrewAI supports any model via LiteLLM integration.</p>

<h3 id="which-framework-is-most-production-ready">Which framework is most “production-ready”?</h3>

<p><strong>LangGraph</strong> consistently ranks #1 in production-readiness across independent comparisons, with confirmed enterprise deployments at Klarna, Uber, Cisco, LinkedIn, JPMorgan, and Elastic. <strong>Semantic Kernel</strong> (via Microsoft Agent Framework) is the most production-ready option for .NET/Azure teams, with GA 1.0 guarantees and long-term support commitments.</p>

<h3 id="should-i-build-agents-in-python-or-typescript">Should I build agents in Python or TypeScript?</h3>

<p>Python remains the dominant language for AI agent development, with the deepest ecosystem of frameworks, tools, and community resources. TypeScript is the fastest-growing alternative and is the better choice if your application stack is already JavaScript/TypeScript or if you’re building agents that integrate deeply with web applications. The gap is narrowing rapidly — frameworks like Mastra and Vercel AI SDK are closing the feature parity gap with their Python counterparts.</p>

<h3 id="how-do-these-compare-to-hosted-platforms-like-dify-or-langsmith">How do these compare to hosted platforms like Dify or LangSmith?</h3>

<p>This guide focuses on <strong>code-first frameworks</strong> — libraries and SDKs you integrate into your own application. Hosted platforms like Dify (~143k stars, visual builder), LangSmith (observability), and deepset Cloud (managed Haystack) operate at a higher level of abstraction. They’re often complementary: you might build agents with LangGraph and monitor them with LangSmith, or use CrewAI for orchestration and Dify for the end-user interface.</p>

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Which open source AI agent framework has the most GitHub stars?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "LangChain has the most stars at approximately 137,000, followed by AutoGen (~48k) and CrewAI (~38k). However, star counts are not a reliable measure of production readiness — LangGraph (~33k stars) has more verified enterprise deployments than any framework with more stars."
      }
    },
    {
      "@type": "Question",
      "name": "What's the difference between LangChain and LangGraph?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "LangChain is the broader ecosystem — a platform for building LLM applications with chains, agents, tools, and output parsing. LangGraph is a specific library within that ecosystem focused on stateful, graph-based agent orchestration with checkpointing and human-in-the-loop patterns. For agent-specific work in 2026, LangGraph is the recommended starting point."
      }
    },
    {
      "@type": "Question",
      "name": "Is AutoGen dead in 2026?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AutoGen is not dead but is in maintenance mode as of early 2026. Microsoft is no longer adding features and has merged its orchestration concepts into the Microsoft Agent Framework (MAF). The community fork AG2 continues development, but for new projects, MAF or alternative frameworks are recommended."
      }
    },
    {
      "@type": "Question",
      "name": "Which AI agent framework is best for beginners?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "CrewAI has the gentlest learning curve for multi-agent systems — its role-based abstraction is intuitive and the CLI scaffolds working crews in minutes. For single-agent applications, the OpenAI Agents SDK is similarly approachable with minimal boilerplate."
      }
    },
    {
      "@type": "Question",
      "name": "Can I use these frameworks with local or open-source LLMs?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, all eight frameworks are provider-agnostic or support multiple providers. LangGraph, OpenAI Agents SDK, Mastra, and Vercel AI SDK all support 80+ LLM providers including local models via Ollama, vLLM, or similar. Haystack and Semantic Kernel have built-in support for local models. CrewAI supports any model via LiteLLM integration."
      }
    },
    {
      "@type": "Question",
      "name": "Which open source AI agent framework is most production-ready?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "LangGraph consistently ranks #1 in production-readiness with confirmed enterprise deployments at Klarna, Uber, Cisco, LinkedIn, JPMorgan, and Elastic. Semantic Kernel (via Microsoft Agent Framework) is the most production-ready for .NET/Azure teams with GA 1.0 and long-term support commitments."
      }
    },
    {
      "@type": "Question",
      "name": "Should I build AI agents in Python or TypeScript?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Python remains dominant with the deepest ecosystem of frameworks and tools. TypeScript is the fastest-growing alternative and is better if your stack is already JavaScript/TypeScript or if you're building agents integrated with web applications. Frameworks like Mastra and Vercel AI SDK are rapidly closing the feature parity gap."
      }
    },
    {
      "@type": "Question",
      "name": "How do open source agent frameworks compare to hosted platforms?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Code-first frameworks like LangGraph and CrewAI are libraries you integrate into your own application. Hosted platforms like Dify (~143k stars), LangSmith, and deepset Cloud operate at a higher abstraction level. They are often complementary — you might build agents with LangGraph and monitor them with LangSmith."
      }
    }
  ]
}
</script>]]></content><author><name>The Agent Report</name></author><category term="research" /><category term="frameworks" /><category term="comparison" /><category term="open-source" /><category term="guide" /><category term="langchain" /><category term="autogen" /><category term="crewai" /><category term="haystack" /><category term="semantic-kernel" /><category term="mastra" /><category term="vercel-ai-sdk" /><category term="openai-agents-sdk" /><summary type="html"><![CDATA[A comprehensive, data-driven comparison of the 8 most important open-source AI agent frameworks in 2026 — LangChain/LangGraph, AutoGen, CrewAI, OpenAI Agents SDK, Haystack, Semantic Kernel, Mastra, and Vercel AI SDK — with a detailed comparison table, deep dives, and a practical decision guide.]]></summary></entry></feed>