The AI agent landscape has exploded in 2026. New frameworks launch weekly, protocols are being ratified in real time, and the vocabulary is evolving faster than most of us can keep up with. Whether you’re reading a research paper, evaluating a vendor, or debugging a multi-agent system at 2 a.m., you’ve probably hit a term that made you pause and think, “Wait — what exactly does that mean in an agent context?” For a deeper dive into the architectures and frameworks these terms describe, our Complete Guide to AI Agents provides the full technical context.
This glossary is built for that moment. It covers the terms you’ll actually encounter: the core concepts that define how agents work, the frameworks everyone is debating on Hacker News, the technical primitives that power production systems, the safety vocabulary that regulators and red teams use, and the enterprise terminology that’s shaping how companies adopt agentic AI.
We’ve aimed for clarity over exhaustiveness. Every definition is 1–3 sentences, written in plain English, and grounded in how the term is used in practice — not in a whitepaper. Think of this as a field guide, not an encyclopedia. For a historical perspective on how these frameworks evolved, our 2025 open-source agent framework comparison shows where the landscape stood before the 2026 explosion.
Core Concepts
AI Agent
A software system that uses a large language model (LLM) as its reasoning engine to perceive its environment, make decisions, and take actions autonomously. Unlike a chatbot, an agent can plan multi-step tasks, use external tools, and adapt its behavior based on outcomes.
Autonomous Agent
An AI agent capable of operating with minimal or no human supervision over extended periods. Autonomous agents set their own sub-goals, recover from errors without intervention, and persist across sessions — key for production workloads like customer support triage or infrastructure monitoring.
Multi-Agent System (MAS)
An architecture where multiple AI agents collaborate, compete, or negotiate to solve problems that are too complex for a single agent. Each agent may have a specialized role (researcher, coder, reviewer) and the system includes protocols for communication, task delegation, and conflict resolution.
Agentic AI
A term describing AI systems that exhibit goal-directed, autonomous behavior — the quality of being an agent rather than a passive tool. Agentic AI implies planning, tool use, memory, and the ability to pursue objectives over multiple steps without step-by-step human prompting.
Tool Use
The ability of an AI agent to invoke external functions, APIs, or software tools to accomplish tasks beyond text generation. Tools can include web search, code execution, file system operations, database queries, or any external capability exposed through a defined interface.
Function Calling
A specific mechanism by which an LLM outputs structured data (typically JSON) that triggers a predefined function in the host application. Function calling is the most common implementation pattern for tool use — the model decides which function to call and with what arguments based on the user’s intent.
Reasoning
The cognitive process by which an LLM breaks down complex problems, evaluates alternatives, and draws logical conclusions before acting. Advanced reasoning techniques — like step-by-step decomposition and self-verification — are what separate simple instruction-following from genuine agentic behavior.
Planning
An agent’s ability to decompose a high-level goal into a sequence of actionable steps before execution. Effective planning involves anticipating dependencies, ordering tasks correctly, and dynamically re-planning when intermediate steps fail or produce unexpected results.
Memory (Short-Term / Long-Term)
Short-term memory refers to context held within the model’s context window during a single session — the current conversation, recent tool outputs, and in-flight reasoning. Long-term memory persists across sessions via external storage (vector databases, knowledge graphs, or structured logs), allowing agents to remember user preferences, past decisions, and learned patterns over days or months.
RAG (Retrieval-Augmented Generation)
A technique that grounds an LLM’s responses in external knowledge by retrieving relevant documents from a database before generating an answer. In agent systems, RAG is often used as a tool — the agent queries a knowledge base, retrieves context, and uses that context to inform decisions or responses, reducing hallucination on factual queries.
Orchestration
The coordination layer that manages how multiple agents, tools, and workflows interact within a larger system. Orchestration handles task routing, dependency management, state tracking, and error handling — it’s the conductor that keeps a multi-agent system from descending into chaos.
Agent Loop
The core execution cycle of an AI agent: observe (gather information from the environment or tool outputs), reason (analyze and decide what to do next), act (execute a tool call or produce output), and observe again. The loop repeats until the agent determines the task is complete or a termination condition is met.
ReAct (Reasoning + Acting)
A prompting and execution pattern where the agent interleaves reasoning traces with concrete actions. Instead of thinking fully and then acting, the agent thinks a step, acts, observes the result, thinks about the result, and acts again — producing more grounded and correctable behavior than pure chain-of-thought approaches.
Chain-of-Thought (CoT)
A prompting technique that instructs the LLM to produce intermediate reasoning steps before giving a final answer. By verbalizing its thinking, the model often achieves higher accuracy on complex reasoning tasks — and makes its decision process interpretable to human observers.
Tree-of-Thought (ToT)
An extension of chain-of-thought where the LLM explores multiple reasoning paths simultaneously, evaluates them, and prunes unpromising branches — much like a search algorithm. Tree-of-thought is especially powerful for planning and problem-solving tasks where the agent must consider several possible strategies before committing.
Frameworks & Platforms
LangChain
An open-source framework for building LLM-powered applications with a focus on composability. LangChain provides abstractions for chains, agents, tools, and memory, along with a growing ecosystem of integrations — making it one of the most widely adopted starting points for agent development.
AutoGen
Microsoft’s open-source multi-agent conversation framework. AutoGen lets developers define specialized agents that communicate through structured conversations, with built-in support for human-in-the-loop patterns, code execution sandboxes, and group chat topologies.
CrewAI
A Python framework for orchestrating role-based AI agents that work together as a “crew.” CrewAI assigns each agent a defined role, goal, and backstory, then manages sequential or hierarchical task execution — popular for rapid prototyping of multi-agent workflows.
OpenAI Agents SDK
OpenAI’s official software development kit for building, testing, and deploying AI agents. The SDK provides primitives for tool definitions, guardrails, handoffs between agents, and tracing — designed to work natively with OpenAI models and the Responses API.
Claude (Anthropic)
Anthropic’s family of frontier LLMs, widely used as the reasoning engine in agent systems. Claude models are known for strong instruction following, long context windows (up to 200K tokens), native tool-use capabilities, and safety-focused design principles that make them popular for production agent deployments.
Hermes Agent
An open-source AI agent runtime and personal assistant framework by Nous Research, designed to give users full control over their agent’s skills, plugins, memory, and model backend. Hermes Agent emphasizes local-first operation, cross-platform support, and a community-driven ecosystem of shareable skills and profiles.
Openclaw
An open-source personal AI agent platform focused on multi-channel communication (Telegram, Discord, WhatsApp, Slack, email, voice) with a plugin architecture. Openclaw emphasizes multi-profile management, policy plugins for compliance, and the ability to run entirely on user-owned infrastructure.
Haystack
An open-source NLP framework by deepset for building search and retrieval pipelines. In the agent ecosystem, Haystack is commonly used to implement RAG backends, document processing, and knowledge retrieval — often as a tool invoked by higher-level agent frameworks.
Semantic Kernel
Microsoft’s open-source SDK for integrating LLMs into applications with an emphasis on enterprise scenarios. Semantic Kernel provides a plugin model, orchestration patterns, and native integration with the Microsoft ecosystem (Azure, Copilot, Teams).
Microsoft Copilot Studio
A low-code platform for building custom AI copilots and agents within the Microsoft 365 ecosystem. Copilot Studio enables organizations to create agents that work across Teams, SharePoint, Dynamics 365, and Power Platform — with built-in connectors to enterprise data sources.
Technical
MCP (Model Context Protocol)
An open protocol developed by Anthropic that standardizes how AI models connect to external tools, data sources, and services. MCP defines a client-server architecture where agents (clients) discover and invoke capabilities exposed by MCP servers — analogous to how USB standardized peripheral connections, but for AI tool integration.
ACP (Agent Communication Protocol)
An emerging standard for how AI agents communicate with each other across different frameworks and platforms. ACP aims to solve agent-to-agent interoperability — allowing a LangChain agent to delegate work to an AutoGen agent using a common message format, capability discovery mechanism, and security model.
GGUF
A file format for storing quantized LLM weights, widely used in the local and open-source model ecosystem. GGUF enables running large models on consumer hardware by bundling model architecture metadata with compressed weights that tools like llama.cpp can load efficiently.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that adds small, trainable adapter layers to a pre-trained model rather than modifying all weights. LoRA makes it practical to customize foundation models for specific agent tasks — like tool-calling or domain-specific reasoning — at a fraction of the cost and storage of full fine-tuning.
Quantization
The process of reducing the numerical precision of a model’s weights (e.g., from 16-bit to 4-bit) to decrease memory usage and inference latency. Quantization is essential for running capable agent models on edge devices, laptops, and cost-constrained cloud instances.
Fine-tuning
The process of further training a pre-trained LLM on a curated dataset to improve performance on a specific task or domain. In agent development, fine-tuning is used to improve tool-calling accuracy, teach domain-specific reasoning, or align model behavior with enterprise policies.
Structured Output
A capability where the LLM generates responses in a guaranteed format (typically JSON conforming to a schema) rather than free-form text. Structured output is critical for agent systems because tool calls, data extraction, and agent-to-agent messages must be machine-parseable with zero tolerance for malformed syntax.
JSON Mode
A specific LLM feature that constrains the model’s output to valid JSON. While less rigorous than full structured output with schema validation, JSON mode is widely supported and sufficient for many agent tool-calling implementations.
Rate Limiting
A mechanism that restricts how many requests an agent can make to an API or service within a given time window. Proper rate-limit handling — with exponential backoff, queuing, and graceful degradation — is essential for production agents that call external APIs without overwhelming them or exhausting budgets.
Token
The atomic unit of text that an LLM processes — roughly corresponding to a word fragment (~4 characters in English). Token count determines context window usage, API pricing, and latency, making token-aware design critical for cost-efficient agents that handle long conversations or large documents.
Context Window
The maximum number of tokens an LLM can process in a single forward pass, encompassing the system prompt, conversation history, tool outputs, and the current query. Modern agents rely on large context windows (128K–200K tokens) to maintain coherence across long, multi-turn interactions — but must still manage context strategically to avoid hitting limits.
Embedding
A numerical vector representation of text, images, or other data that captures semantic meaning in a high-dimensional space. Embeddings enable agents to perform similarity search, clustering, and retrieval — the mathematical foundation behind semantic memory and RAG systems.
Vector Database
A specialized database optimized for storing and querying high-dimensional vectors (embeddings). Vector databases power the “retrieval” half of RAG by enabling fast nearest-neighbor search across millions of documents — letting agents find semantically relevant information even when keywords don’t match.
Agent-to-Agent Communication
The mechanisms by which AI agents exchange information, delegate tasks, and coordinate actions. This can range from simple structured message passing to sophisticated protocols involving capability discovery, negotiation, and shared memory — and is a central challenge in multi-agent system design.
Safety & Alignment
Alignment
The field of ensuring that AI systems behave in accordance with human values, intentions, and safety constraints. In agent systems, alignment means the agent pursues its goals without causing unintended harm — even when the shortest path to the goal would violate ethical or operational boundaries.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human evaluators rank model outputs and those rankings are used to train a reward model that fine-tunes the LLM via reinforcement learning. RLHF has been the dominant approach for teaching models to be helpful, harmless, and aligned with user intent.
Constitutional AI
Anthropic’s alignment methodology where an AI is trained to follow a written “constitution” of principles rather than relying solely on human feedback. The model self-critiques and revises its outputs against these principles, enabling scalable oversight without requiring humans to review every output.
Red Teaming
The adversarial practice of probing an AI system for vulnerabilities, harmful behaviors, or alignment failures before deployment. Red teams simulate attacks — from prompt injection to social engineering — to identify weaknesses that need to be addressed via guardrails, fine-tuning, or architectural changes.
Prompt Injection
A security attack where malicious instructions are embedded in data that an agent processes (e.g., a web page, email, or document), causing the agent to disregard its original instructions and follow the attacker’s commands. Prompt injection is one of the most challenging unsolved security problems in agent systems.
Guardrails
Protective constraints placed around an agent’s behavior — implemented as input filters, output validators, or runtime monitors. Guardrails can enforce content policies, prevent harmful actions, validate tool calls against schemas, and ensure the agent stays within its defined operational boundaries.
Sandboxing
The practice of running agent code execution, tool invocations, or entire agent instances in isolated environments with restricted permissions. Sandboxing prevents an agent from causing damage if it makes a mistake or is compromised — critical for agents that execute arbitrary code or access file systems.
Agent Safety
The interdisciplinary field concerned with ensuring that autonomous AI agents operate reliably, predictably, and without causing harm — even in unexpected situations. Agent safety encompasses alignment, robustness, monitoring, and the design of “off-switch” mechanisms that remain under human control.
Interpretability
The study of understanding why an AI model made a specific decision, by examining its internal representations, attention patterns, or reasoning traces. In agent systems, interpretability is essential for debugging failures, building trust with users, and satisfying regulatory requirements for explainable AI.
Jailbreaking
The practice of circumventing an AI system’s safety restrictions through crafted prompts, role-playing scenarios, or encoding tricks. Agent systems face heightened jailbreak risk because their tool-use and multi-step reasoning capabilities create larger attack surfaces for bypassing guardrails.
Enterprise & Industry
SLA (Service Level Agreement)
A contractual commitment defining the expected performance, availability, and reliability of an AI agent service. For production agents, SLAs cover uptime (e.g., 99.9%), response latency, accuracy thresholds, and escalation procedures — critical for enterprise procurement and vendor evaluation.
RPA (Robotic Process Automation)
A technology for automating structured, rule-based business processes — such as data entry, invoice processing, or form submission. While traditional RPA follows fixed scripts, the industry is converging with AI agents to create “intelligent automation” that handles exceptions and unstructured data.
ERP Agent
An AI agent integrated with Enterprise Resource Planning systems (SAP, Oracle, Microsoft Dynamics) to automate workflows like order-to-cash, procurement, and financial close. ERP agents represent one of the largest enterprise adoption vectors for agentic AI, with SAP deploying 200+ production agents in 2026.
Autonomous Enterprise
A vision of the future organization where AI agents handle the majority of operational, analytical, and decision-support tasks — with humans shifting to strategic oversight, exception handling, and creative direction. The autonomous enterprise is the endpoint of the agent adoption curve that began with RPA and is accelerating through LLM-powered agents.
Digital Worker
A term used in enterprise contexts to describe an AI agent that performs a specific job function — analogous to a human employee. Digital workers have defined roles, performance metrics, access permissions, and escalation paths, and are increasingly managed alongside human teams in workforce orchestration platforms.
Compliance
The requirement that AI agent systems adhere to regulatory frameworks (GDPR, SOC 2, HIPAA, EU AI Act), industry standards, and internal governance policies. Compliance covers data handling, decision auditability, bias monitoring, and the ability to explain agent actions to regulators and auditors.
Observability
The practice of instrumenting agent systems to understand their internal state through logs, metrics, traces, and dashboards. In agent contexts, observability goes beyond traditional APM — it must capture reasoning chains, tool-call sequences, memory access patterns, and multi-agent interactions to enable debugging and optimization.