diff --git a/agent-framework/TOC.yml b/agent-framework/TOC.yml index f667e37c4..cf7b2f37b 100644 --- a/agent-framework/TOC.yml +++ b/agent-framework/TOC.yml @@ -198,12 +198,38 @@ items: href: integrations/ag-ui/frontend-tools.md - name: Security Considerations href: integrations/ag-ui/security-considerations.md + - name: Workflows + href: integrations/ag-ui/workflows.md - name: Human-in-the-Loop href: integrations/ag-ui/human-in-the-loop.md + - name: MCP Apps Compatibility + href: integrations/ag-ui/mcp-apps.md - name: State Management href: integrations/ag-ui/state-management.md - name: Testing with Dojo href: integrations/ag-ui/testing-with-dojo.md +- name: The Agent Development Journey + items: + - name: Overview + href: journey/index.md + - name: LLM Fundamentals + href: journey/llm-fundamentals.md + - name: From LLMs to Agents + href: journey/from-llms-to-agents.md + - name: Adding Tools + href: journey/adding-tools.md + - name: Adding Skills + href: journey/adding-skills.md + - name: Adding Middleware + href: journey/adding-middleware.md + - name: Context Providers + href: journey/adding-context-providers.md + - name: Agents as Tools + href: journey/agents-as-tools.md + - name: "Agent-to-Agent (A2A)" + href: journey/agent-to-agent.md + - name: Workflows + href: journey/workflows.md - name: DevUI items: - name: Overview diff --git a/agent-framework/agents/rag.md b/agent-framework/agents/rag.md index 792c8f000..7e035f544 100644 --- a/agent-framework/agents/rag.md +++ b/agent-framework/agents/rag.md @@ -43,6 +43,9 @@ AIAgent agent = azureOpenAIClient The `TextSearchProvider` requires a function that provides the search results given a query. This can be implemented using any search technology, e.g. Azure AI Search, or a web search engine. +> [!TIP] +> See the [Vector Stores integration](../integrations/index.md#vector-stores) documentation for more information on how to use a vector store for search results. 
+ Here is an example of a mock search function that returns pre-defined results based on the query. `SourceName` and `SourceLink` are optional, but if provided will be used by the agent to cite the source of the information when answering the user's question. diff --git a/agent-framework/integrations/ag-ui/human-in-the-loop.md b/agent-framework/integrations/ag-ui/human-in-the-loop.md index a1c4d89ee..c766b299a 100644 --- a/agent-framework/integrations/ag-ui/human-in-the-loop.md +++ b/agent-framework/integrations/ag-ui/human-in-the-loop.md @@ -629,10 +629,10 @@ The server middleware must remove approval protocol messages after processing: - **Solution**: After converting approval responses, remove both the `request_approval` tool call and its result message - **Reason**: Prevents "tool_calls must be followed by tool messages" errors -## Next Steps +## Next steps - -- **[Explore Function Tools](../../agents/tools/tool-approval.md)**: Learn more about approval patterns in Agent Framework +> [!div class="nextstepaction"] +> [MCP Apps Compatibility](./mcp-apps.md) ::: zone-end @@ -1116,12 +1116,10 @@ def transfer_funds(...): pass def close_account(...): pass ``` -## Next Steps +## Next steps -Now that you understand human-in-the-loop, you can: - -- **[Learn State Management](state-management.md)**: Manage shared state with approval workflows -- **[Explore Advanced Patterns](../../agents/tools/tool-approval.md)**: Learn more about approval patterns in Agent Framework +> [!div class="nextstepaction"] +> [MCP Apps Compatibility](./mcp-apps.md) ## Additional Resources diff --git a/agent-framework/integrations/ag-ui/index.md b/agent-framework/integrations/ag-ui/index.md index 870a31a87..ddbc53bd9 100644 --- a/agent-framework/integrations/ag-ui/index.md +++ b/agent-framework/integrations/ag-ui/index.md @@ -48,7 +48,9 @@ The Agent Framework AG-UI integration supports all 7 AG-UI protocol features: ## Build agent UIs with CopilotKit -[CopilotKit](https://copilotkit.ai/) 
provides rich UI components for building agent user interfaces based on the standard AG-UI protocol. CopilotKit supports streaming chat interfaces, frontend & backend tool calling, human-in-the-loop interactions, generative UI, shared state, and much more. You can see a examples of the various agent UI scenarios that CopilotKit supports in the [AG-UI Dojo](https://dojo.ag-ui.com/microsoft-agent-framework-dotnet) sample application. +[CopilotKit](https://copilotkit.ai/) provides rich UI components for building agent user interfaces based on the standard AG-UI protocol. CopilotKit supports streaming chat interfaces, frontend & backend tool calling, human-in-the-loop interactions, generative UI, shared state, and much more. You can see examples of the various agent UI scenarios that CopilotKit supports in the [AG-UI Dojo](https://dojo.ag-ui.com/microsoft-agent-framework-dotnet) sample application. + +To connect a CopilotKit React frontend to an Agent Framework AG-UI backend, register your endpoint as an `HttpAgent` in the CopilotKit runtime. This allows CopilotKit's frontend tools to flow through as AG-UI client tools, and all AG-UI features (streaming, approvals, state sync) work automatically. CopilotKit helps you focus on your agent’s capabilities while delivering a polished user experience without reinventing the wheel. To learn more about getting started with Microsoft Agent Framework and CopilotKit, see the [Microsoft Agent Framework integration for CopilotKit](https://docs.copilotkit.ai/microsoft-agent-framework) documentation. @@ -136,8 +138,8 @@ To get started with AG-UI integration: 1. **[Getting Started](getting-started.md)**: Build your first AG-UI server and client 2. **[Backend Tool Rendering](backend-tool-rendering.md)**: Add function tools to your agents - - +3. **[Human-in-the-Loop](human-in-the-loop.md)**: Implement approval workflows +4. 
**[State Management](state-management.md)**: Synchronize state between client and server ## Additional Resources @@ -244,14 +246,17 @@ To get started with AG-UI integration: 1. **[Getting Started](getting-started.md)**: Build your first AG-UI server and client 2. **[Backend Tool Rendering](backend-tool-rendering.md)**: Add function tools to your agents - - +3. **[Workflows](workflows.md)**: Expose multi-agent workflows through AG-UI +4. **[Human-in-the-Loop](human-in-the-loop.md)**: Implement approval workflows +5. **[MCP Apps Compatibility](mcp-apps.md)**: Use MCP Apps with your AG-UI endpoint +6. **[State Management](state-management.md)**: Synchronize state between client and server ## Additional Resources - [Agent Framework Documentation](../../overview/index.md) - [AG-UI Protocol Documentation](https://docs.ag-ui.com/introduction) - [AG-UI Dojo App](https://dojo.ag-ui.com/) - Example application demonstrating Agent Framework integration +- [CopilotKit MAF Integration](https://docs.copilotkit.ai/microsoft-agent-framework) - Connect CopilotKit React frontends to AG-UI backends - [Agent Framework GitHub Repository](https://github.com/microsoft/agent-framework) ::: zone-end diff --git a/agent-framework/integrations/ag-ui/mcp-apps.md b/agent-framework/integrations/ag-ui/mcp-apps.md new file mode 100644 index 000000000..a4701bdf7 --- /dev/null +++ b/agent-framework/integrations/ag-ui/mcp-apps.md @@ -0,0 +1,113 @@ +--- +title: MCP Apps Compatibility with AG-UI +description: Learn how Agent Framework Python AG-UI endpoints work with CopilotKit's MCPAppsMiddleware for MCP Apps integration +zone_pivot_groups: programming-languages +author: moonbox3 +ms.topic: conceptual +ms.author: evmattso +ms.date: 04/09/2026 +ms.service: agent-framework +--- + +# MCP Apps Compatibility with AG-UI + +::: zone pivot="programming-language-csharp" + +> [!NOTE] +> MCP Apps compatibility documentation for the .NET AG-UI integration is coming soon. 
+ +::: zone-end + +::: zone pivot="programming-language-python" + +Agent Framework Python AG-UI endpoints are compatible with the AG-UI ecosystem's [MCP Apps](https://docs.ag-ui.com/concepts/mcp-apps) feature. MCP Apps allows frontend applications to embed MCP-powered tools and resources alongside your AG-UI agent — no changes needed on the Python side. + +## Architecture + +MCP Apps support is provided by CopilotKit's TypeScript `MCPAppsMiddleware` (`@ag-ui/mcp-apps-middleware`), which sits between the frontend and your Agent Framework backend: + +``` +┌─────────────────────────┐ +│ Frontend │ +│ (CopilotKit / AG-UI) │ +└────────┬────────────────┘ + │ + ▼ +┌─────────────────────────┐ +│ CopilotKit Runtime / │ +│ Node.js Proxy │ +│ + MCPAppsMiddleware │ +└────────┬────────────────┘ + │ AG-UI protocol + ▼ +┌─────────────────────────┐ +│ Agent Framework │ +│ FastAPI AG-UI Endpoint │ +└─────────────────────────┘ +``` + +The middleware layer handles MCP tool discovery, iframe-proxied resource requests, and `ui/resourceUri` resolution. Your Python AG-UI endpoint receives standard AG-UI requests and is unaware of the MCP Apps layer. + +## No Python-Side Changes Required + +MCP Apps integration is entirely handled by the TypeScript middleware. Your existing `add_agent_framework_fastapi_endpoint()` setup works as-is: + +```python +from agent_framework import Agent +from agent_framework.ag_ui import add_agent_framework_fastapi_endpoint +from fastapi import FastAPI + +app = FastAPI() +agent = Agent(name="my-agent", instructions="...", client=chat_client) + +# This endpoint is MCP Apps-compatible with no additional configuration +add_agent_framework_fastapi_endpoint(app, agent, "/") +``` + +This approach is consistent with how MCP Apps works with all other AG-UI Python integrations — the MCP Apps layer is always in the TypeScript middleware, not in the Python backend. 
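To make that division of responsibilities concrete, here is a minimal plain-Python sketch of the pass-through behavior. It is illustrative only — `middleware_forward` and the `mcpApps` field are hypothetical stand-ins for this sketch, not part of the AG-UI protocol or the real `MCPAppsMiddleware`:

```python
# Illustrative sketch only: shows why the Python backend needs no changes.
# The real MCPAppsMiddleware is TypeScript; "threadId", "runId", and
# "messages" follow the AG-UI protocol, while "mcpApps" is a hypothetical
# stand-in for middleware-only configuration.

def middleware_forward(frontend_request: dict) -> dict:
    """Strip middleware-only concerns and forward a plain AG-UI request."""
    # The middleware resolves MCP tool discovery and resource URIs itself;
    # only standard AG-UI fields travel on to the Python endpoint.
    return {k: v for k, v in frontend_request.items() if k != "mcpApps"}

incoming = {
    "threadId": "abc123",
    "runId": "run_1",
    "messages": [{"role": "user", "content": "hi"}],
    "mcpApps": [{"name": "weather-widget"}],  # resolved entirely in middleware
}

backend_request = middleware_forward(incoming)
print(sorted(backend_request))  # → ['messages', 'runId', 'threadId']
```

The point is simply that middleware-only concerns never reach the FastAPI endpoint, so no Python-side handling is required.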
+ +## Setting Up the Middleware + +To use MCP Apps with your Agent Framework backend, set up a CopilotKit Runtime or Node.js proxy that includes `MCPAppsMiddleware` and points at your Python endpoint: + +```typescript +// Example Node.js proxy configuration (TypeScript) +import { MCPAppsMiddleware } from "@ag-ui/mcp-apps-middleware"; + +const middleware = new MCPAppsMiddleware({ + agents: [ + { + name: "my-agent", + url: "http://localhost:8888/", // Your MAF AG-UI endpoint + }, + ], + mcpApps: [ + // MCP app configurations + ], +}); +``` + +For full setup instructions, see the [CopilotKit MCP Apps documentation](https://docs.copilotkit.ai/copilotkit-mcp/mcp-overview) and the [AG-UI MCP Apps documentation](https://docs.ag-ui.com/concepts/mcp-apps). + +## What Is Not in Scope + +The following are explicitly **not** part of the Python AG-UI integration: + +- **No Python `MCPAppsMiddleware`**: MCP Apps middleware runs in the TypeScript layer only. +- **No FastAPI handling of iframe-proxied MCP requests**: Resource proxying is handled by the Node.js middleware. +- **No Python-side `ui/resourceUri` discovery**: Resource URI resolution is a middleware concern. + +If your application doesn't need the MCP Apps middleware layer, your Agent Framework AG-UI endpoint works directly with any AG-UI-compatible client. 
+ +## Next steps + +> [!div class="nextstepaction"] +> [State Management](./state-management.md) + +## Additional Resources + +- [AG-UI MCP Apps Documentation](https://docs.ag-ui.com/concepts/mcp-apps) +- [CopilotKit MCP Apps Documentation](https://docs.copilotkit.ai/copilotkit-mcp/mcp-overview) +- [Agent Framework GitHub Repository](https://github.com/microsoft/agent-framework) + +::: zone-end diff --git a/agent-framework/integrations/ag-ui/workflows.md b/agent-framework/integrations/ag-ui/workflows.md new file mode 100644 index 000000000..ea6bcda16 --- /dev/null +++ b/agent-framework/integrations/ag-ui/workflows.md @@ -0,0 +1,324 @@ +--- +title: Workflows with AG-UI +description: Learn how to expose Agent Framework workflows through AG-UI with step tracking, interrupt/resume, and custom events +zone_pivot_groups: programming-languages +author: moonbox3 +ms.topic: tutorial +ms.author: evmattso +ms.date: 04/09/2026 +ms.service: agent-framework +--- + +# Workflows with AG-UI + +::: zone pivot="programming-language-csharp" + +> [!NOTE] +> Workflow support for the .NET AG-UI integration is coming soon. + +::: zone-end + +::: zone pivot="programming-language-python" + +This tutorial shows you how to expose Agent Framework workflows through an AG-UI endpoint. Workflows orchestrate multiple agents and tools in a defined execution graph, and the AG-UI integration streams rich workflow events — step tracking, activity snapshots, interrupts, and custom events — to web clients in real time. 
+ +## Prerequisites + +Before you begin, ensure you have: + +- Python 3.10 or later +- `agent-framework-ag-ui` installed +- Familiarity with the [Getting Started](getting-started.md) tutorial +- Basic understanding of Agent Framework [workflows](../../workflows/index.md) + +## When to Use Workflows with AG-UI + +Use a workflow instead of a single agent when you need: + +- **Multi-agent orchestration**: Route tasks between specialized agents (for example, triage → refund → order) +- **Structured execution steps**: Track progress through defined stages with `STEP_STARTED` / `STEP_FINISHED` events +- **Interrupt / resume flows**: Pause execution to collect human input or approvals, then resume +- **Custom event streaming**: Emit domain-specific events (`request_info`, `status`, `workflow_output`) to the client + +## Wrapping a Workflow with AgentFrameworkWorkflow + +`AgentFrameworkWorkflow` is a lightweight wrapper that adapts a native `Workflow` to the AG-UI protocol. You can provide either a pre-built workflow instance or a factory that creates a new workflow per thread. + +### Direct instance + +Use a direct instance when a single workflow object can safely serve all requests (for example, stateless pipelines): + +```python +from agent_framework import Workflow +from agent_framework.ag_ui import AgentFrameworkWorkflow + +workflow = build_my_workflow() # returns a Workflow + +ag_ui_workflow = AgentFrameworkWorkflow( + workflow=workflow, + name="my-workflow", + description="Single-instance workflow.", +) +``` + +### Thread-scoped factory + +Use `workflow_factory` when each conversation thread needs its own workflow state. 
The factory receives the `thread_id` and returns a fresh `Workflow`: + +```python +from agent_framework.ag_ui import AgentFrameworkWorkflow + +ag_ui_workflow = AgentFrameworkWorkflow( + workflow_factory=lambda thread_id: build_my_workflow(), + name="my-workflow", + description="Thread-scoped workflow.", +) +``` + +> [!IMPORTANT] +> You must pass **either** `workflow` **or** `workflow_factory`, not both. The wrapper raises a `ValueError` if both are provided. + +## Registering the Endpoint + +Register the workflow with `add_agent_framework_fastapi_endpoint` the same way you would register a single agent: + +```python +from fastapi import FastAPI +from agent_framework.ag_ui import ( + AgentFrameworkWorkflow, + add_agent_framework_fastapi_endpoint, +) + +app = FastAPI(title="Workflow AG-UI Server") + +ag_ui_workflow = AgentFrameworkWorkflow( + workflow_factory=lambda thread_id: build_my_workflow(), + name="handoff-demo", + description="Multi-agent handoff workflow.", +) + +add_agent_framework_fastapi_endpoint( + app=app, + agent=ag_ui_workflow, + path="/workflow", +) +``` + +You can also pass a bare `Workflow` directly — the endpoint auto-wraps it in `AgentFrameworkWorkflow`: + +```python +add_agent_framework_fastapi_endpoint(app, my_workflow, "/workflow") +``` + +## AG-UI Events Emitted by Workflows + +Workflow runs emit a richer set of AG-UI events compared to single-agent runs: + +| Event | When emitted | Description | +|---|---|---| +| `RUN_STARTED` | Run begins | Marks the start of workflow execution | +| `STEP_STARTED` | An executor or superstep begins | `step_name` identifies the agent or step (for example, `"triage_agent"`) | +| `TEXT_MESSAGE_*` | Agent produces text | Standard streaming text events | +| `TOOL_CALL_*` | Agent invokes a tool | Standard tool call events | +| `STEP_FINISHED` | An executor or superstep completes | Closes the step for UI progress tracking | +| `CUSTOM` (`status`) | Workflow state changes | Contains `{"state": ""}` in the event 
value | +| `CUSTOM` (`request_info`) | Workflow requests human input | Contains the request payload for the client to render a prompt | +| `CUSTOM` (`workflow_output`) | Workflow produces output | Contains the final or intermediate output data | +| `RUN_FINISHED` | Run completes | May include `interrupts` if the workflow is waiting for input | + +Clients can use `STEP_STARTED` / `STEP_FINISHED` events to render progress indicators showing which agent is currently active. + +## Interrupt and Resume + +Workflows can pause execution to collect human input or tool approvals. The AG-UI integration handles this through the interrupt/resume protocol. + +### How interrupts work + +1. During execution, the workflow raises a pending request (for example, a `HandoffAgentUserRequest` asking for more details, or a tool with `approval_mode="always_require"`). +2. The AG-UI bridge emits a `CUSTOM` event with `name="request_info"` containing the request data. +3. The run finishes with a `RUN_FINISHED` event whose `interrupts` field contains a list of pending request objects: + + ```json + { + "type": "RUN_FINISHED", + "threadId": "abc123", + "runId": "run_xyz", + "interrupts": [ + { + "id": "request-id-1", + "value": { "request_type": "HandoffAgentUserRequest", "data": "..." } + } + ] + } + ``` + +4. The client renders UI for the user to respond (a text input, an approval button, etc.). + +### How resume works + +The client sends a new request with the `resume` payload containing the user's responses keyed by interrupt ID: + +```json +{ + "threadId": "abc123", + "messages": [], + "resume": { + "interrupts": [ + { + "id": "request-id-1", + "value": "User's response text or approval decision" + } + ] + } +} +``` + +The server converts the resume payload into workflow responses and continues execution from where it paused. 
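Conceptually, the server-side bookkeeping follows the shape of the plain-Python sketch below. This is an illustration of the interrupt-ID matching described above, not the actual Agent Framework implementation — `apply_resume` and `pending_interrupts` are hypothetical names used only for this sketch:

```python
# Illustrative sketch of resume bookkeeping: pending interrupts are kept by
# ID, and a resume payload is matched back to them before execution continues.

pending_interrupts = {
    "request-id-1": {"request_type": "HandoffAgentUserRequest", "data": "..."},
}

def apply_resume(resume_payload: dict, pending: dict) -> dict:
    """Map interrupt responses back to the requests that raised them."""
    responses = {}
    for item in resume_payload["interrupts"]:
        if item["id"] not in pending:
            raise KeyError(f"Unknown interrupt id: {item['id']}")
        # Pair the user's answer with the original pending request,
        # then retire that request so it cannot be answered twice.
        responses[item["id"]] = item["value"]
        del pending[item["id"]]
    return responses

resume = {"interrupts": [{"id": "request-id-1", "value": "Please refund order 12345"}]}
responses = apply_resume(resume, pending_interrupts)
print(responses["request-id-1"], len(pending_interrupts))
```

Once every pending request has a matching response, the workflow can continue from the superstep where it paused.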
+ +## Complete Example: Multi-Agent Handoff Workflow + +This example shows a customer-support workflow with three agents that hand off work to each other, use tools requiring approval, and request human input when needed. + +### Define the agents and tools + +```python +"""AG-UI workflow server with multi-agent handoff.""" + +import os + +from agent_framework import Agent, Message, Workflow, tool +from agent_framework.ag_ui import ( + AgentFrameworkWorkflow, + add_agent_framework_fastapi_endpoint, +) +from agent_framework.azure import AzureOpenAIResponsesClient +from agent_framework.orchestrations import HandoffBuilder +from azure.identity import AzureCliCredential +from fastapi import FastAPI +from fastapi.middleware.cors import CORSMiddleware + + +@tool(approval_mode="always_require") +def submit_refund(refund_description: str, amount: str, order_id: str) -> str: + """Capture a refund request for manual review before processing.""" + return f"Refund recorded for order {order_id} (amount: {amount}): {refund_description}" + + +@tool(approval_mode="always_require") +def submit_replacement(order_id: str, shipping_preference: str, replacement_note: str) -> str: + """Capture a replacement request for manual review before processing.""" + return f"Replacement recorded for order {order_id} (shipping: {shipping_preference}): {replacement_note}" + + +@tool(approval_mode="never_require") +def lookup_order_details(order_id: str) -> dict[str, str]: + """Return order details for a given order ID.""" + return { + "order_id": order_id, + "item_name": "Wireless Headphones", + "amount": "$129.99", + "status": "delivered", + } +``` + +### Build the workflow + +```python +def create_handoff_workflow() -> Workflow: + """Build a handoff workflow with triage, refund, and order agents.""" + client = AzureOpenAIResponsesClient( + project_endpoint=os.environ["AZURE_AI_PROJECT_ENDPOINT"], + deployment_name=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"], + credential=AzureCliCredential(), + ) 
+ + triage = Agent(id="triage_agent", name="triage_agent", instructions="...", client=client) + refund = Agent(id="refund_agent", name="refund_agent", instructions="...", client=client, + tools=[lookup_order_details, submit_refund]) + order = Agent(id="order_agent", name="order_agent", instructions="...", client=client, + tools=[lookup_order_details, submit_replacement]) + + def termination_condition(conversation: list[Message]) -> bool: + for msg in reversed(conversation): + if msg.role == "assistant" and (msg.text or "").strip().lower().endswith("case complete."): + return True + return False + + builder = HandoffBuilder( + name="support_workflow", + participants=[triage, refund, order], + termination_condition=termination_condition, + ) + builder.add_handoff(triage, [refund], description="Route refund requests.") + builder.add_handoff(triage, [order], description="Route replacement requests.") + builder.add_handoff(refund, [order], description="Route to order after refund.") + builder.add_handoff(order, [triage], description="Route back after completion.") + + return builder.with_start_agent(triage).build() +``` + +### Create the FastAPI app + +```python +app = FastAPI(title="Workflow AG-UI Demo") +app.add_middleware( + CORSMiddleware, + allow_origins=["*"], + allow_credentials=True, + allow_methods=["*"], + allow_headers=["*"], +) + +ag_ui_workflow = AgentFrameworkWorkflow( + workflow_factory=lambda _thread_id: create_handoff_workflow(), + name="support_workflow", + description="Customer support handoff workflow.", +) + +add_agent_framework_fastapi_endpoint( + app=app, + agent=ag_ui_workflow, + path="/support", +) + +if __name__ == "__main__": + import uvicorn + uvicorn.run(app, host="127.0.0.1", port=8888) +``` + +### Event sequence + +A typical multi-turn interaction produces events like: + +``` +RUN_STARTED threadId=abc123 +STEP_STARTED stepName=triage_agent +TEXT_MESSAGE_START role=assistant +TEXT_MESSAGE_CONTENT delta="I'll look into your refund..." 
+TEXT_MESSAGE_END +STEP_FINISHED stepName=triage_agent +STEP_STARTED stepName=refund_agent +TOOL_CALL_START toolCallName=lookup_order_details +TOOL_CALL_ARGS delta='{"order_id":"12345"}' +TOOL_CALL_END +TOOL_CALL_START toolCallName=submit_refund +TOOL_CALL_ARGS delta='{"order_id":"12345","amount":"$129.99",...}' +TOOL_CALL_END +RUN_FINISHED interrupts=[{id: "...", value: {function_approval_request}}] +``` + +The client can then display an approval dialog and resume with the user's decision. + +## Next steps + +> [!div class="nextstepaction"] +> [Human-in-the-Loop](./human-in-the-loop.md) + +## Additional Resources + +- [AG-UI Overview](index.md) +- [Getting Started](getting-started.md) +- [Agent Framework Workflows](../../workflows/index.md) +- [Agent Framework GitHub Repository](https://github.com/microsoft/agent-framework) + +::: zone-end diff --git a/agent-framework/integrations/chat-history-memory-provider.md b/agent-framework/integrations/chat-history-memory-provider.md index 7bb960cc3..e5ab2fffd 100644 --- a/agent-framework/integrations/chat-history-memory-provider.md +++ b/agent-framework/integrations/chat-history-memory-provider.md @@ -27,11 +27,14 @@ Stored messages are scoped using configurable identifiers (application, agent, u ## Prerequisites -- A vector store implementation from [Microsoft.Extensions.VectorData](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions) (for example, [`InMemoryVectorStore`](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.InMemory), [Azure AI Search](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.AzureAISearch), or [other supported stores](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors)) +- A vector store implementation from 📦 [Microsoft.Extensions.VectorData.Abstractions](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions) (for example, 📦 
[`InMemoryVectorStore`](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.InMemory), 📦 [Azure AI Search](https://www.nuget.org/packages/Microsoft.SemanticKernel.Connectors.AzureAISearch), or [other supported stores](./index.md#vector-store-abstraction-implementations)) - An embedding model configured on your vector store - Azure OpenAI or OpenAI deployment for the chat model - .NET 8.0 or later +> [!TIP] +> See the [Vector Stores integration](./index.md#vector-stores) documentation for more information on the VectorData abstraction and available implementations. + ## Usage The following example demonstrates creating an agent with the `ChatHistoryMemoryProvider` using an in-memory vector store. diff --git a/agent-framework/integrations/index.md b/agent-framework/integrations/index.md index 657becd30..9bd4a6243 100644 --- a/agent-framework/integrations/index.md +++ b/agent-framework/integrations/index.md @@ -100,6 +100,56 @@ Here is a list of existing providers that can be used. ::: zone-end +## Vector Stores + +Microsoft Agent Framework supports integration with many different vector stores. These can be useful for doing Retrieval Augmented Generation (RAG) or storage of memories. + +::: zone pivot="programming-language-csharp" + +To integrate with vector stores, we rely on the 📦 [Microsoft.Extensions.VectorData.Abstractions](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions) package which provides a unified layer of abstractions for interacting with vector stores in .NET. +These abstractions let you write simple, high-level code against a single API, and swap out the underlying vector store with minimal changes to your application. Where Agent Framework components rely on a vector store, they use these abstractions to allow you to choose your preferred implementation. 
+ +> [!TIP] +> See the [Vector databases for .NET AI apps](/dotnet/ai/vector-stores/overview) documentation for more information on how to ingest data into a vector store, generate embeddings, and do vector or hybrid searches. + +### Vector Store Abstraction Implementations + +| Implementation | C# | Uses officially supported SDK | Maintainer / Vendor | +| ---------------------------------------------------------------------------------------------------------------------------- | :------------------------: | :---------------------------: | :-----------------: | +| [Azure AI Search](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/azure-ai-search-connector) | ✅ | ✅ | Microsoft | +| [Cosmos DB MongoDB (vCore)](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/azure-cosmosdb-mongodb-connector) | ✅ | ✅ | Microsoft | +| [Cosmos DB No SQL](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/azure-cosmosdb-nosql-connector) | ✅ | ✅ | Microsoft | +| [Couchbase](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/couchbase-connector) | ✅ | ✅ | Couchbase | +| [Elasticsearch](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/elasticsearch-connector) | ✅ | ✅ | Elastic | +| [In-Memory](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/inmemory-connector) | ✅ | N/A | Microsoft | +| [MongoDB](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/mongodb-connector) | ✅ | ✅ | Microsoft | +| [Neon Serverless Postgres](https://neon.com) | Use [Postgres Connector](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/postgres-connector) | ✅ | Microsoft | +| [Oracle](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/oracle-connector) | ✅ | ✅ | Oracle | +| [Pinecone](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/pinecone-connector) | ✅ | 
❌ | Microsoft | +| [Postgres](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/postgres-connector) | ✅ | ✅ | Microsoft | +| [Qdrant](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/qdrant-connector) | ✅ | ✅ | Microsoft | +| [Redis](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/redis-connector) | ✅ | ✅ | Microsoft | +| [SQL Server](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/sql-connector) | ✅ | ✅ | Microsoft | +| [SQLite](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/sqlite-connector) | ✅ | ✅ | Microsoft | +| [Volatile (In-Memory)](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/volatile-connector) | Deprecated (use In-Memory) | N/A | Microsoft | +| [Weaviate](/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/weaviate-connector) | ✅ | ✅ | Microsoft | + +> [!IMPORTANT] +> The vector store abstraction implementations are built by a variety of sources. Not all connectors are maintained by Microsoft. When considering an implementation, be sure to evaluate quality, licensing, support, etc. to ensure they meet your requirements. Also make sure you review each provider's documentation for detailed version compatibility information. + +> [!IMPORTANT] +> Some implementations are internally using Database SDKs that are not officially supported by Microsoft or by the Database provider. The *Uses Officially supported SDK* column lists which are using officially supported SDKs and which are not. + +::: zone-end + +::: zone pivot="programming-language-python" + +Agent Framework supports using Semantic Kernel's VectorStore collections to provide vector storage capabilities to agents. +See [the vector store connectors documentation](/semantic-kernel/concepts/vector-store-connectors) to learn how to set up different vector store collections. 
+See [Creating a search tool from a VectorStore](../agents/rag.md#creating-a-search-tool-from-vectorstore) for more information on how to use these for RAG. + +::: zone-end + ## Next steps > [!div class="nextstepaction"] diff --git a/agent-framework/journey/adding-context-providers.md b/agent-framework/journey/adding-context-providers.md new file mode 100644 index 000000000..93f093df8 --- /dev/null +++ b/agent-framework/journey/adding-context-providers.md @@ -0,0 +1,125 @@ +--- +title: Adding Context Providers +description: Understand what context providers are, why agents need them, and how they inject memory, knowledge, and dynamic data into the agent's context window. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/06/2026 +ms.service: agent-framework +--- + +# Adding Context Providers + +The [previous page](adding-middleware.md) showed how middleware wraps the agent's execution pipeline with cross-cutting concerns — logging, guardrails, error handling — without touching the agent's core logic. But middleware deals with *how* the agent runs, not *what* the agent knows. So far, the agent's knowledge comes from two places: its training data and whatever the user says in the current turn. + +That's a problem. A useful agent needs more than that. It needs to recall what the user said three turns ago, know the user's preferences, or pull relevant facts from a knowledge base — all *before* it starts generating a response. Tools can fetch information, but they're reactive: the model must decide to call them. If the model doesn't realize it needs context, it won't ask for it. + +**Context providers** solve this. They're components that run before and after each agent invocation, proactively injecting relevant information into the context window and optionally extracting state from the response to be stored for future use. 
They give your agent memory, personalization, and access to external knowledge — without changing the agent's instructions or code. + +## When to use this + +Add context providers to your agent when: + +- The agent needs **conversation history** — it should remember what was said in previous turns, not just the current message. +- You want to inject **user-specific data** — profiles, preferences, account details, or session state — so the agent can personalize its responses. +- You need **retrieval-augmented generation (RAG)** — automatically fetching relevant documents or facts from a knowledge base before each response. +- The agent requires **dynamic instructions** — context that changes between invocations based on the time of day, the user's location, or other runtime conditions. +- You want to **decouple data sourcing from agent logic** — the agent doesn't need to know *where* context comes from, only that it's available. + +## Why not just use tools? + +Tools and context providers both give agents access to external information, but they work in fundamentally different ways: + +| Aspect | Tools | Context providers | +|--------|-------|-------------------| +| **Trigger** | Reactive — the model decides when to call a tool | Proactive — runs automatically before every invocation | +| **Control** | Model-driven: the model chooses which tool, when, and with what arguments | Developer-driven: you decide what context is always available | +| **Visibility** | The model must know a tool exists and judge that it's relevant | Context is injected transparently — the model sees it as part of the prompt | +| **Use case** | On-demand actions and lookups: "search the web," "query the database" | Always-present context: conversation history, user profiles, preloaded knowledge | +| **Token cost** | Tokens spent only when the tool is called | Tokens spent on every invocation (the context is always in the prompt) | + +Neither is strictly better. 
Many agents use both: context providers for information that should *always* be present (history, user profile, core knowledge), and tools for information the agent should fetch *on demand* (live search results, database queries, API calls). + +> [!TIP] +> A good rule of thumb: if the agent should have this information *every single time* it runs, use a context provider. If the agent should fetch it *only when relevant*, use a tool. + +## How context providers work + +Context providers participate in a two-phase lifecycle around each agent invocation: + +``` +┌──────────────────────────────────────────────────────────────┐ +│ Caller: agent.run("What's the return policy?") │ +└──────────────┬───────────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────────┐ +│ BEFORE RUN — each context provider injects context │ +│ │ +│ • History provider loads past conversation messages │ +│ • Memory provider retrieves relevant facts/preferences │ +│ • RAG provider searches knowledge base and adds results │ +│ • Custom provider injects user profile, time, location │ +└──────────────┬───────────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────────┐ +│ Agent core — model sees original input + all injected │ +│ context and generates a response │ +└──────────────┬───────────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────────┐ +│ AFTER RUN — each context provider processes the response │ +│ │ +│ • History provider saves the new messages │ +│ • Memory provider extracts facts to remember for later │ +│ • Custom provider updates session state │ +└──────────────────────────────────────────────────────────────┘ +``` + +Key points: + +1. **Context providers run automatically.** You register them once when creating the agent. After that, they participate in every invocation without any extra code on your part. +2. 
**Multiple providers compose together.** You can register several context providers — a history provider, a RAG provider, and a custom provider — and they all contribute to the same context window. Their contributions are merged in registration order.
3. **Providers have two hooks.** The *before* hook injects context (messages, instructions, tools) into the prompt. The *after* hook processes the response — storing messages, extracting memories, or updating state.
4. **Providers are session-aware.** Context providers receive the current session, so they can load and store data scoped to a specific conversation. See [Sessions](../agents/conversations/session.md) for how session management works.

> [!TIP]
> For a detailed view of where context providers sit in the full agent execution pipeline — alongside middleware and the chat client — see the [Agent Pipeline Architecture](../agents/agent-pipeline.md).

## Managing the context window

Every piece of context you inject consumes tokens from the model's context window. History grows with each turn. RAG results add document chunks. User profiles add metadata. If the total exceeds the model's limit, the oldest or least relevant information gets truncated — potentially losing important context.

Context window management is therefore a critical consideration when using context providers. **Compaction** strategies summarize or trim older history to stay within token limits while preserving key information. See [Compaction](../agents/conversations/compaction.md).

> [!TIP]
> For hands-on experience with memory and context providers, see [Step 4: Memory](../get-started/memory.md) in the Get Started tutorial.

> [!IMPORTANT]
> Avoid maintaining a very long context window: model performance can degrade as the context grows. If the agent's responses start to degrade, consider using compaction strategies to reduce the context size.
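The two-phase lifecycle described earlier can be sketched in plain, framework-agnostic Python. All names here (`ContextProvider`, `before_run`, `after_run`, `run_agent`) are illustrative assumptions, not the Agent Framework API; see the [Context Providers reference](../agents/conversations/context-providers.md) for the real types.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-conversation state that providers can read and write."""
    messages: list[str] = field(default_factory=list)


class ContextProvider:
    """Illustrative base class: one hook before the run, one after."""

    def before_run(self, session: Session, user_input: str) -> list[str]:
        return []  # messages to inject into the prompt

    def after_run(self, session: Session, user_input: str, response: str) -> None:
        pass  # persist state for future turns


class HistoryProvider(ContextProvider):
    """Injects prior turns before the run; saves new turns after."""

    def before_run(self, session, user_input):
        return list(session.messages)

    def after_run(self, session, user_input, response):
        session.messages += [f"user: {user_input}", f"assistant: {response}"]


class ProfileProvider(ContextProvider):
    """Always-present context: a user profile injected on every run."""

    def __init__(self, profile: str):
        self.profile = profile

    def before_run(self, session, user_input):
        return [f"user profile: {self.profile}"]


def run_agent(providers, session, user_input):
    # BEFORE RUN: contributions merge in registration order.
    context = [m for p in providers for m in p.before_run(session, user_input)]
    prompt = context + [f"user: {user_input}"]
    # Stand-in for the model call; a real agent would send `prompt` to an LLM.
    response = f"(answer generated from {len(prompt)} prompt messages)"
    # AFTER RUN: each provider may persist state for the next turn.
    for p in providers:
        p.after_run(session, user_input, response)
    return response
```

On the second turn the history provider injects the messages saved by the first turn, so the model sees a larger prompt without any change to the agent's instructions: memory as a composable component.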
+ +## Considerations + +| Consideration | Details | +|---------------|---------| +| **Token budget** | Every injected context consumes tokens. Monitor total context size carefully — especially when combining multiple providers. If context grows unbounded, important information gets truncated silently. | +| **Retrieval latency** | Context providers that query external services (databases, search indexes, APIs) add latency to every invocation. Use caching, connection pooling, and async operations to keep retrieval fast. | +| **Relevance** | Injecting irrelevant context doesn't just waste tokens — it can actively degrade the model's responses by diluting the signal. Make sure your providers inject focused, relevant information. | +| **Staleness** | Cached or preloaded context can become outdated. Design providers to refresh data at appropriate intervals, and consider whether slightly stale context is acceptable for your use case. | +| **Composability** | When multiple providers contribute to the same context window, their contributions can interact in unexpected ways. Test providers together, not just individually, to ensure the combined context makes sense. | + +## Next steps + +Now that your agent has tools, skills, middleware, and context providers, the next step is **agents as tools** — composing agents by using one agent as a tool for another, enabling specialization and delegation. 
> [!div class="nextstepaction"]
> [Agents as Tools](agents-as-tools.md)

**Go deeper:**

- [Context Providers reference](../agents/conversations/context-providers.md) — built-in and custom provider patterns
- [Conversations & Memory overview](../agents/conversations/index.md) — sessions, history, and storage
- [RAG](../agents/rag.md) — retrieval-augmented generation patterns
- [Compaction](../agents/conversations/compaction.md) — managing context window size
- [Storage](../agents/conversations/storage.md) — persisting conversation data
- [Agent Pipeline Architecture](../agents/agent-pipeline.md) — how context providers fit in the execution pipeline
- [Step 4: Memory](../get-started/memory.md) — hands-on tutorial
diff --git a/agent-framework/journey/adding-middleware.md b/agent-framework/journey/adding-middleware.md
new file mode 100644
index 000000000..44c4c2fcd
--- /dev/null
+++ b/agent-framework/journey/adding-middleware.md
@@ -0,0 +1,105 @@
+---
+title: Adding Middleware
+description: Understand why and when agents need middleware, how the middleware pipeline works, and the types of cross-cutting concerns middleware addresses.
+author: TaoChenOSU
+ms.topic: conceptual
+ms.author: taochen
+ms.date: 04/04/2026
+ms.service: agent-framework
+---
+
+# Adding Middleware
+
+The [previous page](adding-skills.md) showed how skills package reusable domain expertise — instructions, reference material, and scripts — into self-contained units that any agent can load on demand. But as you deploy agents into production, a new category of problems emerges: problems that cut across *every* interaction regardless of what the agent does.
+
+You need to log every request and response. You need guardrails that block harmful content before the model sees it. You need to enforce rate limits, catch exceptions gracefully, and inject telemetry — all without touching the agent's core logic.
Copy-pasting these concerns into every agent (or every tool, or every skill) doesn't scale and creates maintenance nightmares. + +**Middleware** solves this. Middleware lets you wrap the agent's [**execution pipeline**](../agents/agent-pipeline.md) with reusable behaviors that intercept, inspect, and modify requests and responses at well-defined points. Think of middleware as a series of concentric layers around the agent — each layer gets a chance to act on the input before it reaches the agent, and on the output before it reaches the caller. + +## When to use this + +Add middleware to your agent when: + +- You need **guardrails** to block harmful, off-topic, or policy-violating content before or after the model processes it. +- You want **centralized logging or telemetry** for all agent interactions without modifying each agent individually. +- You need to **modify requests or responses** — enriching prompts, transforming outputs, or replacing results entirely — without changing agent logic. +- You want to **enforce policies** such as rate limiting, content filtering, or authentication checks that apply to every run. +- You need to **handle exceptions** consistently — retrying on transient failures, returning graceful fallback responses, or logging errors for diagnostics. +- You want to **share state** across the pipeline — for example, tracking request timing or accumulating metrics that multiple middleware components need. + +> [!TIP] +> Agent Framework includes built-in instrumentation for tracing and metrics. See [Observability](../agents/observability.md) for details. + +## How the middleware pipeline works + +When you call your agent's run method, the request doesn't go directly to the model. Instead, it flows through a pipeline of middleware layers, each of which can inspect or modify the request, delegate to the next layer, and then inspect or modify the response on the way back. 
+ +``` +┌─────────────────────────────────────────────────────────┐ +│ Caller: agent.run("What's the weather?") │ +└──────────────┬──────────────────────────────────────────┘ + ▼ +┌─────────────────────────────────────────────────────────┐ +│ Middleware 1 (Logging) │ +│ • Logs the incoming request │ +│ • Calls next middleware │ +│ • Logs the outgoing response │ +└──────────────┬──────────────────────────────────────────┘ + ▼ +┌─────────────────────────────────────────────────────────┐ +│ Middleware 2 (Guardrails) │ +│ • Checks input against content policy │ +│ • If blocked → returns early with rejection message │ +│ • If allowed → calls next middleware │ +│ • Checks output against content policy │ +└──────────────┬──────────────────────────────────────────┘ + ▼ +┌─────────────────────────────────────────────────────────┐ +│ Agent core (model invocation, tool calls, etc.) │ +└─────────────────────────────────────────────────────────┘ +``` + +Key points: + +1. **Each middleware decides whether to continue.** A middleware can call the next layer in the chain to proceed normally, or it can short-circuit the pipeline by returning a response directly — for example, when a guardrail blocks a request. +2. **Middleware sees both directions.** A middleware runs code *before* delegating (to inspect or modify the input) and *after* the response comes back (to inspect or modify the output). This is the classic "onion" pattern. +3. **Multiple middleware chain together.** When you register several middleware components, they nest: the first registered middleware is the outermost layer, and the last registered is the innermost layer closest to the agent. + +> [!TIP] +> For a detailed view of how middleware fits into the full agent execution pipeline — including context providers and chat client layers — see the [Agent Pipeline Architecture](../agents/agent-pipeline.md). 
+ +## What middleware can do + +Agent Framework supports middleware at three layers of the pipeline — agent run, function calling, and chat client — giving you fine-grained control over where you intercept execution. Common patterns include: + +| Pattern | Example | Reference | +|---------|---------|-----------| +| Guardrails & termination | Block harmful content, limit conversation length | [Termination & Guardrails](../agents/middleware/termination.md) | +| Exception handling | Retry on transient failures, return fallback responses | [Exception Handling](../agents/middleware/exception-handling.md) | +| Result overrides | Redact sensitive data, enrich or replace agent output | [Result Overrides](../agents/middleware/result-overrides.md) | +| Shared state | Pass request IDs or timing data between middleware | [Shared State](../agents/middleware/shared-state.md) | +| Runtime context | Vary behavior based on session, user, or per-run config | [Runtime Context](../agents/middleware/runtime-context.md) | +| Scoping | Apply middleware to all runs or just a single run | [Agent vs Run Scope](../agents/middleware/agent-vs-run-scope.md) | + +For a complete walkthrough of defining and registering middleware, see [Defining Middleware](../agents/middleware/defining-middleware.md). For the full architecture overview, see the [Middleware Overview](../agents/middleware/index.md). + +## Considerations + +| Consideration | Details | +|---------------|---------| +| **Separation of concerns** | Middleware keeps cross-cutting logic out of your agent code, your tools, and your skills. Each middleware component has a single responsibility — logging, guardrails, error handling — that you can add, remove, or reorder independently. | +| **Order dependence** | Middleware forms a chain. The order you register middleware matters: a logging middleware that runs first will see the raw input, while one that runs last will see input already modified by earlier middleware. 
Plan your pipeline order deliberately. | +| **Debugging complexity** | When middleware modifies inputs or outputs, debugging requires understanding the full pipeline. A response might look wrong not because of the agent but because a middleware transformed it. Good logging middleware (placed early in the chain) helps diagnose these cases. | +| **Performance overhead** | Each middleware layer adds processing time to every request. For lightweight operations like logging, this is negligible. For expensive operations like calling an external content-moderation API, the latency adds up — especially when multiple such middleware are chained. | + +## Next steps + +Now that your agent has tools, skills, and middleware, the next step is **context providers** — components that inject memory, user profiles, and dynamic knowledge into the agent's context window before each run. + +> [!div class="nextstepaction"] +> [Context Providers](adding-context-providers.md) + +**Go deeper:** + +- [Middleware Overview](../agents/middleware/index.md) — full reference for all middleware types +- [Agent Pipeline Architecture](../agents/agent-pipeline.md) — how middleware fits into the execution pipeline diff --git a/agent-framework/journey/adding-skills.md b/agent-framework/journey/adding-skills.md new file mode 100644 index 000000000..97eeaf07d --- /dev/null +++ b/agent-framework/journey/adding-skills.md @@ -0,0 +1,119 @@ +--- +title: Adding Skills +description: Understand why and when to package agent capabilities into skills, how skills differ from tools, and when to reach for skills vs. other patterns. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/03/2026 +ms.service: agent-framework +--- + +# Adding Skills + +The [previous page](adding-tools.md) showed how tools let agents act — calling functions, querying APIs, searching the web. 
But as you build more agents, a pattern emerges: the same cluster of tools, instructions, and reference material keeps showing up together. A "file an expense report" capability isn't just one tool — it's a validation script, a set of policy documents, step-by-step instructions on how to fill out the form, and knowledge about spending limits. You end up copy-pasting this bundle from agent to agent, and it drifts out of sync. + +**Skills** solve this problem. A skill is a portable package that bundles instructions, reference material, and optional scripts into a single unit that any agent can discover and load on demand. Skills follow an [open specification](https://agentskills.io/) so they're reusable across agents, teams, and even products. + +## When to use this + +Add skills to your agent when: + +- You have a **cluster of related knowledge** — instructions, reference documents, and scripts — that logically belong together (for example, "expense reporting" or "code review guidelines"). +- **Multiple agents** need the same domain expertise and you want a single source of truth rather than duplicated instructions. +- You want to **share and distribute** agent capabilities across teams, projects, or organizations as self-contained packages. +- You need to **manage context efficiently** — skills use progressive disclosure so agents only load the detail they need, when they need it. + +## Considerations + +| Consideration | Details | +|---------------|---------| +| **Reusability** | A skill is a self-contained package. Once created, any agent can pick it up — no copy-paste, no drift between copies. | +| **Context efficiency** | Skills use progressive disclosure: the agent sees a brief description (~100 tokens) upfront and loads full instructions only when relevant. This keeps the context window lean when the skill isn't needed. | +| **Abstraction cost** | Skills add an abstraction layer on top of tools. 
For a single, standalone function tool, adding a skill wrapper is unnecessary overhead. | +| **Design effort** | You need to think about skill boundaries upfront: what belongs inside the skill and what stays outside. Poor boundaries lead to skills that are too broad (wasting context) or too narrow (losing the bundling benefit). | + +## How skills differ from tools + +Tools and skills are complementary, not competing. Understanding the distinction helps you decide when to reach for each. + +A **tool** is a single callable action — one function with a name, description, and parameter schema. When the model decides a tool is needed, it generates a structured call, Agent Framework executes it, and the result goes back to the model. Tools are the atoms of agent behavior. + +A **skill** is a package of domain expertise. It can include: + +- **Instructions** — step-by-step guidance, decision rules, and examples that tell the agent *how* to approach a domain. +- **Reference material** — policy documents, FAQs, templates, and other knowledge the agent can consult on demand. +- **Scripts** — executable code the agent can run to perform specific operations (for example, a validation script that checks expense data against policy rules). + +The key difference is one of scope: a tool gives the agent the ability to perform **one action**; a skill gives the agent the knowledge and resources to handle **an entire domain**. 
+ +| | Tool | Skill | +|---|------|-------| +| **What it provides** | A single callable action | Instructions + reference material + optional scripts | +| **How the agent uses it** | Calls it when it needs to act | Loads it when it encounters a relevant task, reads instructions, and may call scripts or consult resources | +| **Context cost** | Tool schema is always in the prompt | Only the skill name and description (~100 tokens) are in the prompt; full content is loaded on demand | +| **Portability** | Tied to the agent that registers it | Self-contained package that any compatible agent can discover | +| **Best for** | Individual actions (query a database, send an email) | Domain expertise (expense policies, code review guidelines, onboarding procedures) | + +> [!TIP] +> Think of tools as **verbs** (search, book, validate) and skills as **expertise** (travel booking knowledge, expense policy knowledge). An agent uses tools to act and skills to know how to act. + +## How skills work: progressive disclosure + +Skills are designed to be context-efficient. Instead of injecting everything into the prompt upfront, skills use a three-stage pattern: + +``` +┌──────────────────────────────────────────────────────────────────┐ +│ Stage 1: Advertise │ +│ Agent sees skill names and descriptions (~100 tokens each) │ +│ in its system prompt at the start of every run. │ +└──────────────┬───────────────────────────────────────────────────┘ + ▼ (task matches a skill's domain) +┌──────────────────────────────────────────────────────────────────┐ +│ Stage 2: Load │ +│ Agent calls load_skill to get the full instructions │ +│ (< 5000 tokens recommended). │ +└──────────────┬───────────────────────────────────────────────────┘ + ▼ (agent needs more detail) +┌──────────────────────────────────────────────────────────────────┐ +│ Stage 3: Read resources │ +│ Agent calls read_skill_resource to fetch supplementary files │ +│ (FAQs, templates, reference docs) only when needed. 
│
└──────────────────────────────────────────────────────────────────┘
```

This pattern means an agent with 10 registered skills pays roughly 1,000 tokens of context overhead — not 50,000. The agent only deepens its knowledge when the current task demands it.

Under the hood, skills are built on top of the tool infrastructure. Agent Framework advertises available skills in the agent's system prompt, then exposes `load_skill` and `read_skill_resource` as tool calls that the agent invokes to progressively load content.

> [!TIP]
> For the full details on skill structure, setup, and code examples, see the [Agent Skills](../agents/skills.md) reference.

## When to use skills vs. other patterns

As your agent grows more capable, you have several ways to organize its behavior. Here's how skills compare to tools:

| Pattern | Best for | Example |
|---------|----------|---------|
| **Individual tools** | One-off actions that don't need shared context | A `get_weather` function tool |
| **Skills** | Domain expertise with instructions, references, and optional scripts | An "expense-report" skill with policy docs, validation scripts, and step-by-step filing instructions |

## Common pitfalls

| Pitfall | Guidance |
|---------|----------|
| **Overly broad skills** | A skill called "everything-about-finance" that tries to cover accounting, taxes, expense reports, and payroll will end up with instructions that are too long and unfocused. Keep skills focused on one domain. |
| **Skipping security review** | Skill instructions are injected into the agent's context and scripts execute code. Treat skills like third-party dependencies — review them before deploying. See the [security best practices](../agents/skills.md#security-best-practices) in the skills reference. |
| **Ignoring progressive disclosure** | If your `SKILL.md` is 2,000 lines long, the agent pays a heavy context cost when it loads the skill.
Keep instructions concise and move detailed reference material to separate resource files to take full advantage of progressive disclosure. |

## Next steps

Once your agent has tools and skills, the next step is to add **middleware** — cross-cutting behaviors like guardrails, logging, and content filtering that apply to every interaction without modifying your agent's core logic.

> [!div class="nextstepaction"]
> [Adding Middleware](adding-middleware.md)

**Go deeper:**

- [Agent Skills](../agents/skills.md) — full reference with setup, code examples, scripts, and security guidance
- [Agent Skills specification](https://agentskills.io/) — the open standard behind skills
- [Tools Overview](../agents/tools/index.md) — all tool types and provider support matrix
diff --git a/agent-framework/journey/adding-tools.md b/agent-framework/journey/adding-tools.md
new file mode 100644
index 000000000..a46bab694
--- /dev/null
+++ b/agent-framework/journey/adding-tools.md
@@ -0,0 +1,228 @@
+---
+title: Adding Tools
+description: Understand why and when agents need tools, the tool-calling loop, types of tools available, and how to choose the right tool strategy.
+author: TaoChenOSU
+ms.topic: conceptual
+ms.author: taochen
+ms.date: 04/03/2026
+ms.service: agent-framework
+---
+
+# Adding Tools
+
+The [previous page](from-llms-to-agents.md) showed how wrapping an LLM in an agent gives you a persistent identity, instructions, and session management. But even with all of that, the agent can only generate content (text, images, etc.) — it can't look up today's stock price, send an email, or query your database. It answers from whatever knowledge was baked in during training and whatever context you provide in the prompt.
+
+**Tools** bridge this gap. They give the agent the ability to *act* — to reach beyond its training data and interact with the real world. Adding tools is the single most impactful step you can take to make an agent genuinely useful.
+ +## When to use this + +Add tools to your agent when: + +- The agent needs access to **real-time or external data** — live prices, weather, database records, search results — that isn't in the model's training data. +- The agent needs to **take actions** — sending emails, creating tickets, calling APIs, writing files — rather than just producing content. + +## Considerations + +| Consideration | Details | +|---------------|---------| +| **Latency** | Each tool call adds a round trip — the model generates a tool request, your code executes it, and the result is sent back before the model can continue. Multi-tool turns compound this. | +| **Token overhead** | Tool definitions (names, descriptions, parameter schemas) are included in every prompt. More tools means fewer tokens available for conversation history and the model's response. | +| **Debugging complexity** | When something goes wrong, the cause may be in the model's tool selection, the arguments it chose, or the tool's execution. You're debugging reasoning *and* code together. | +| **Reliability** | The model may call tools incorrectly, pass bad arguments, or invoke a tool when it shouldn't. Good descriptions and [tool approval](../agents/tools/tool-approval.md) mitigate this, but don't eliminate it. | + +## Why agents need tools + +As covered in [LLM Fundamentals](llm-fundamentals.md#how-llms-learn-to-use-tools), an LLM is trained to generate tokens — including a special structured format that represents a tool call. But the model itself never executes anything. It's your application (or Agent Framework) that parses the model's output, runs the actual function, and feeds the result back. + +This means tools don't change what the model *is* — they change what your agent can *do*. Without tools, an agent is a conversationalist. With tools, it becomes an operator. + +Consider a travel-booking agent. Without tools, it can discuss flights and suggest itineraries based on general knowledge. 
With tools, it can: + +- **Search** a flight API for real-time availability and pricing +- **Book** a flight on the user's behalf + +Each of those actions requires a tool — a piece of code the agent can invoke to interact with the outside world. + +## How the tool-calling loop works + +When you give an agent tools, Agent Framework automatically manages a **tool-calling loop**: + +``` +┌──────────────────────────────────────────────────────┐ +│ User: "What's the weather in Seattle?" │ +└──────────────┬───────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────┐ +│ Agent sends messages + tool definitions to LLM │ +└──────────────┬───────────────────────────────────────┘ + ▼ + ┌───────────────┐ + │ LLM responds │ + └───┬───────┬───┘ + │ │ + Tool call? No ──────────────────────────┐ + │ │ + ▼ ▼ +┌─────────────────────────────┐ ┌─────────────────────────────┐ +│ Agent Framework executes │ │ Final response: │ +│ the tool (e.g., │ │ "It's cloudy in Seattle │ +│ get_weather("Seattle")) │ │ with a high of 15°C." │ +└──────────────┬──────────────┘ └─────────────────────────────┘ + │ + ▼ +┌─────────────────────────────┐ +│ Agent sends tool result │ +│ back to the LLM │ +└──────────────┬──────────────┘ + │ + └──────► (back to "LLM responds") +``` + +:::image type="content" source="../workflows/resources/images/ai-agent.png" alt-text="Diagram showing the tool-calling loop: the LLM interacts with external tools and memory in a loop before returning a final response."::: + +Key points: + +1. **You don't need to write the loop.** Agent Framework handles detecting tool calls in the model's response, executing the tools, and feeding results back. You define the tools; the framework orchestrates the rest. +2. **Multiple tool calls per turn.** The model may call several tools (potentially in parallel) before producing a final answer — or chain tool calls where the output of one informs the next. +3. 
**The model decides when to call tools.** Based on the user's request and the tool descriptions you provide, the model judges whether a tool is needed. Good tool descriptions lead to better tool selection. + +> [!TIP] +> For a hands-on walkthrough of adding your first tool and seeing this loop in action, see [Step 2: Add Tools](../get-started/add-tools.md) in the Get Started tutorial. + +## Types of tools + +Agent Framework supports several categories of tools. Choosing the right one depends on what you need the agent to do and where the capability lives. + +### Function tools + +**Function tools** are custom functions you write and register with the agent. They run in your process, giving you full control over the logic, security boundaries, and error handling. + +Use function tools when: + +- You have custom business logic the agent needs to invoke (query a database, call an internal API, perform a calculation) +- You need the tool to run in your environment with access to your resources +- You want compile-time type safety and testability + +Function tools are the most common and flexible tool type. Most agents start here. + +> [!div class="nextstepaction"] +> [Function Tools reference](../agents/tools/function-tools.md) + +### MCP tools (Model Context Protocol) + +[MCP](https://modelcontextprotocol.io/) is an open standard that defines how applications provide tools to LLMs. Instead of writing tool logic yourself, you connect to an **MCP server** that exposes a set of tools over a standard protocol — similar to how a REST API exposes endpoints. 
+ +Agent Framework supports two flavors: + +| Flavor | What it is | When to use it | +|--------|-----------|----------------| +| **Hosted MCP tools** | MCP servers hosted and managed by Microsoft Foundry or other providers | You want turnkey access to common capabilities (for example, file search, code execution) without managing infrastructure | +| **Local MCP tools** | MCP servers you run yourself or connect to from any provider | You have a custom or third-party MCP server, or you need tools that run in your own environment | + +Use MCP tools when: + +- A prebuilt MCP server already provides the capability you need +- You want to reuse tools across multiple agents or applications through a shared server +- You're integrating with a third-party service that exposes an MCP endpoint + +> [!div class="nextstepaction"] +> [Hosted MCP Tools reference](../agents/tools/hosted-mcp-tools.md) +> [Local MCP Tools reference](../agents/tools/local-mcp-tools.md) + +### Provider-hosted tools + +Some providers offer built-in tools that run on the provider's infrastructure — no local code required. These include: + +| Tool | What it does | +|------|-------------| +| [Code Interpreter](../agents/tools/code-interpreter.md) | Executes code in a sandboxed environment on the provider's infrastructure | +| [File Search](../agents/tools/file-search.md) | Searches through files you upload to the provider | +| [Web Search](../agents/tools/web-search.md) | Searches the web for real-time information | + +Use provider-hosted tools when: + +- You need capabilities like code execution or web search without building or hosting the tool yourself +- The provider already offers a managed version that meets your requirements + +> [!NOTE] +> Provider-hosted tool availability varies by provider. See the [Tools Overview](../agents/tools/index.md) for the full provider support matrix. 
> [!NOTE]
> Some LLM providers may execute hosted tools on their infrastructure during inference, such as the [Responses API](https://developers.openai.com/api/docs/guides/migrate-to-responses) by OpenAI. Think of these inference services as semi-agentic services that combine inference with tool execution. This doesn't change how the underlying model works, but it does mean that tool execution can happen as part of the service's response generation. These services cannot execute local tools, which must be run on your own infrastructure.

## Choosing the right tool type

| Question | Recommendation |
|----------|---------------|
| Do I have custom business logic? | **Function tools** — write and register your own functions |
| Is there an MCP server that already does what I need? | **MCP tools** — connect to it instead of building from scratch, such as the [GitHub MCP server](https://github.com/github/github-mcp-server) |
| Do I need code execution, file search, or web search? | **Provider-hosted tools** — check if your provider supports them |
| Do I need tools from multiple categories? | **Mix them** — agents can use function tools, MCP tools, and provider-hosted tools simultaneously |

## Tool descriptions matter

The model selects tools based on their **names and descriptions**. A vague description leads to poor tool selection — the model may call the wrong tool, skip a tool it should use, or pass incorrect arguments.

Write tool descriptions the same way you'd write an API doc: say what the tool does, what each parameter means, and what it returns. The clearer the description, the better the model's judgment.

> [!TIP]
> Tool definitions (names, descriptions, parameter schemas) are included in the prompt and consume tokens in the context window. If you register many tools, the overhead can be significant. Only register the tools the agent actually needs.
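To make this concrete, here's a sketch of how a well-described function might become the definition the model sees. The `to_tool_schema` helper, the sample `get_inventory` tool, and the exact JSON shape are illustrative assumptions, not any specific provider's format, though most providers accept something close to this JSON-Schema style.

```python
import inspect


def get_inventory(sku: str, warehouse: str = "MAIN") -> int:
    """Queries the inventory database for product availability by SKU.

    sku: the exact product SKU, for example "AB-1234".
    warehouse: optional warehouse code; defaults to the main warehouse.
    Returns the number of units currently in stock.
    """
    return 42  # stand-in for a real database query


def to_tool_schema(fn) -> dict:
    """Builds a JSON-Schema-style tool definition from a Python function.

    Illustrative only: real frameworks also map type hints to JSON types
    and handle defaults, enums, and nested models.
    """
    parameters = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),  # this text drives tool selection
        "parameters": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in parameters},
            "required": [
                name for name, p in parameters.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }
```

Everything in `description` ends up in the prompt, which is both why good descriptions improve tool selection and why every registered tool counts against your token budget.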
+ +## Tool approval: human-in-the-loop + +Some actions are sensitive — transferring money, deleting records, sending emails. You may not want the agent to execute these tools autonomously. **Tool approval** lets you require human confirmation before a tool is executed. + +When a tool is marked as requiring approval, the agent pauses before execution and returns a response indicating that approval is needed. Your application is responsible for presenting this to the user and passing their decision back. + +This pattern is often called **human-in-the-loop** and is essential for building trustworthy agents that handle consequential actions. + +> [!div class="nextstepaction"] +> [Tool Approval reference](../agents/tools/tool-approval.md) + +## Common pitfalls + +| Pitfall | Guidance | +|---------|----------| +| **Too many tools** | Every tool definition consumes tokens. Register only the tools relevant to the agent's purpose. | +| **Vague descriptions** | "Does stuff with data" won't help the model. Be specific: "Queries the inventory database for product availability by SKU." | +| **No error handling** | Tools can fail (network errors, invalid input). Return clear error messages so the model can reason about what went wrong and try again or inform the user. | +| **Overly permissive tools** | A tool that can "run any SQL query" is a security risk. Scope tools to specific, well-defined operations. | +| **Missing approval on sensitive actions** | If a tool can make irreversible changes, add [tool approval](../agents/tools/tool-approval.md) to keep a human in the loop. | + +## Special mention: Code Interpreter Tool + +As discussed in [LLM Fundamentals](llm-fundamentals.md#what-llms-struggle-with), LLMs can make errors in precise calculations and formal logic. This is because LLMs generate answers token by token based on pattern matching — they don't actually *compute*. 
An LLM asked to multiply two large numbers isn't performing arithmetic; it's predicting what the answer "looks like" based on training data. This works surprisingly often, but fails unpredictably on edge cases. + +**Code Interpreter** solves this by letting the agent write and execute code in a sandboxed environment. Instead of guessing the answer, the model writes a Python script that computes it exactly, runs it, and uses the verified result in its response. + +> [!NOTE] +> The model may write a slightly different script each time it is asked to solve the same problem, but the results should be **mostly** consistent. + +> [!WARNING] +> Code Interpreter is not a replacement for careful reasoning on the human's part. Always check the work of the agent and verify the results independently when necessary. + +Give your agent Code Interpreter when it needs to: + +- **Perform precise calculations** — financial modeling, statistical analysis, unit conversions — where an approximate "best guess" isn't acceptable. +- **Transform or analyze data** — parse CSVs, aggregate rows, generate charts, or reshape structured data. +- **Process files** — read uploaded documents, extract content, convert formats, or generate new files. +- **Validate its own reasoning** — write test code to verify a logical claim before presenting it to the user. + +> [!TIP] +> Code Interpreter can be a provider-hosted tool — the code runs on the provider's infrastructure in a sandbox, not in your environment. This makes it safe to use without worrying about arbitrary code executing on your servers. See the [Code Interpreter reference](../agents/tools/code-interpreter.md) for setup details. + +## Next steps + +Once your agent has tools, the next step is to learn about **skills** — portable packages of instructions, reference material, and scripts that give agents domain expertise they can load on demand. 
+ +> [!div class="nextstepaction"] +> [Adding Skills](adding-skills.md) + +**Go deeper:** + +- [Tools Overview](../agents/tools/index.md) — all tool types and provider support matrix +- [Function Tools](../agents/tools/function-tools.md) — detailed function tool reference +- [Hosted MCP Tools](../agents/tools/hosted-mcp-tools.md) — Microsoft Foundry MCP servers or other providers +- [Local MCP Tools](../agents/tools/local-mcp-tools.md) — custom MCP servers +- [Tool Approval](../agents/tools/tool-approval.md) — human-in-the-loop for tools +- [Step 2: Add Tools](../get-started/add-tools.md) — hands-on tutorial diff --git a/agent-framework/journey/agent-to-agent.md b/agent-framework/journey/agent-to-agent.md new file mode 100644 index 000000000..9d8c68b54 --- /dev/null +++ b/agent-framework/journey/agent-to-agent.md @@ -0,0 +1,49 @@ +--- +title: Agent-to-Agent (A2A) +description: Enable agents to communicate across service and organizational boundaries using the A2A protocol. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/06/2026 +ms.service: agent-framework +--- + +# Agent-to-Agent (A2A) + +The [previous page](agents-as-tools.md) showed how to compose agents within a single process — one agent calls another as a function tool, and the framework handles the rest. That pattern works well when all your agents live in the same application, share the same runtime, and are maintained by the same team. + +But real-world agent systems often need to communicate across boundaries. **Agent-to-Agent (A2A)** is an [open protocol](https://a2a-protocol.org/latest/) designed for exactly this. It defines a standard way for agents to discover each other, exchange messages, and coordinate on tasks — over HTTP, across any boundary, in any language or framework. Agent Framework provides [built-in A2A integration](../integrations/a2a.md) so you can host and call A2A-compliant agents with minimal setup. 
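To ground the idea, A2A discovery revolves around an *agent card*: a JSON document an agent publishes that describes who it is and what it can do. The sketch below parses a hypothetical card; the field names are illustrative and only loosely follow the A2A specification, so check the protocol docs for the current schema and the well-known discovery endpoint:

```python
import json

# Hypothetical agent card, loosely shaped like the A2A spec's agent card.
# In practice you'd fetch this over HTTP from the remote agent's well-known
# discovery endpoint rather than hard-coding it.
card_json = """
{
  "name": "compliance-review",
  "description": "Reviews documents for regulatory compliance issues.",
  "url": "https://agents.example.com/compliance",
  "skills": [
    {"id": "review-contract", "description": "Flags risky contract clauses"}
  ]
}
"""

card = json.loads(card_json)

# A caller inspects the card to decide whether this agent fits the task,
# then sends it messages over HTTP per the protocol.
skill_ids = [skill["id"] for skill in card["skills"]]
print(card["name"], skill_ids)
```

Because the card is just JSON over HTTP, the calling agent never needs to know what language or framework the remote agent is built with.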
+ +## When to use this + +Use A2A when your agents need to cross a boundary that in-process composition can't handle: + +- **Service boundaries.** Your travel-booking agent runs as a microservice, and your expense-filing agent runs as another. They can't call each other as in-process function tools — they need a network protocol. +- **Team boundaries.** A partner team owns a "compliance-review" agent. You don't have access to their code, their model, or their deployment — you just need to send it a request and get a response. +- **Organizational boundaries.** A third-party provider offers a specialized agent (document processing, legal review, medical triage). You need a standard way to discover it, understand what it can do, and communicate with it — regardless of what framework or language it's built with. +- **Independent evolution.** Your agents need different release cycles, different teams, or different languages — without tightly coupling their implementations. + +> [!TIP] +> If your agents all live in the same process and are maintained by the same team, [agents as tools](agents-as-tools.md) is simpler and has less overhead. A2A adds value when you cross a process, service, or organizational boundary. + +## Considerations + +| Consideration | Details | +|---------------|---------| +| **Interoperability** | A2A is framework-agnostic. Your .NET agent can call a Python agent, a LangChain agent, or any agent that implements the protocol. This is A2A's primary value — it's the "HTTP of agent communication." | +| **Network overhead** | Every A2A call is an HTTP request. This adds latency compared to in-process agent-as-tool calls. For performance-sensitive paths, keep agents co-located or use A2A only where a boundary truly exists. | +| **Operational complexity** | Remote agents are distributed services. You need to handle network failures, timeouts, retries, and versioning — the same concerns you'd have with any service-to-service communication. 
| +| **Discovery at runtime** | Agent cards make discovery dynamic, but you still need to know where to look. In production, you'll typically configure known agent endpoints or use a registry. | +| **Conversation state** | The remote agent manages its own conversation state (keyed by context ID). Your agent doesn't see the remote agent's internal reasoning — only its responses. If the remote agent restarts and loses state, your conversation context may be lost. | + +## Next steps + +Now that your agents can communicate across any boundary, the final step in the journey is **workflows** — explicit, graph-based orchestration for multi-step, multi-agent processes where you need full control over execution order, state, and recoverability. + +> [!div class="nextstepaction"] +> [Workflows](workflows.md) + +**Go deeper:** + +- [A2A Integration](../integrations/a2a.md) — implementation guide for hosting and calling A2A agents +- [Agents as Tools](agents-as-tools.md) — the simpler in-process composition pattern diff --git a/agent-framework/journey/agents-as-tools.md b/agent-framework/journey/agents-as-tools.md new file mode 100644 index 000000000..f9d0237a8 --- /dev/null +++ b/agent-framework/journey/agents-as-tools.md @@ -0,0 +1,97 @@ +--- +title: Agents as Tools +description: Compose agents by using one agent as a tool for another — enabling specialization and delegation. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/06/2026 +ms.service: agent-framework +--- + +# Agents as Tools + +The [previous page](adding-context-providers.md) showed how context providers give agents memory and dynamic knowledge — information that's proactively injected before every invocation. At this point, you have a **single** agent that can use tools, load skills, run through middleware, and draw on rich context. That's powerful, but it's still one agent doing everything. 
+ +What happens when your agent's responsibilities grow beyond what a single set of instructions can handle well? As an agent accumulates tools, **tool selection degrades** — models are better at choosing among a handful of well-described tools than sorting through dozens. As instructions broaden, **focus degrades** — a system prompt that tries to cover travel booking, expense reporting, and calendar management gives the model too many roles to juggle. + +[**Agents as tools**](../agents/tools/index.md#using-an-agent-as-a-function-tool) solve this by letting you compose agents: one agent (the *outer* agent) can call another agent (the *inner* agent) as if it were a regular function tool. Each inner agent has a tight scope — its own instructions, its own tools, its own expertise. The outer agent decides when to delegate and what to ask for — exactly the same way it decides when to call any other tool. + +## When to use this + +Use agents as tools when: + +- You want to **delegate a specialized subtask** to a focused agent — for example, a general assistant that calls a dedicated "travel-booking agent" when the user asks about flights. +- The outer agent should decide **when and whether** to involve the inner agent, based on the conversation — the delegation is model-driven, not hard-coded. +- You don't need explicit control over the **execution order** between agents — you're fine with the outer agent orchestrating things through its own reasoning. + +> [!TIP] +> Each agent can also use a different model depending on its specialization and requirements. More complex agents might use larger models for reasoning, while simpler agents might use smaller, faster models for efficiency. + +## Considerations + +| Consideration | Details | +|---------------|---------| +| **Simplicity** | Agent-as-tool is the lightest multi-agent pattern. You convert an agent to a tool and hand it to another agent. It's the natural next step when one agent isn't enough. 
| +| **Latency** | Each delegation is a full agent invocation: the outer agent calls the inner agent, which calls the LLM, which may call tools of its own. Nested invocations add up. Keep inner agents focused so they resolve quickly. | +| **Routing is model-driven** | The outer agent's LLM decides when to call the inner agent, just like it decides when to call any tool. This means routing can be unpredictable — if the tool description is vague, the model may call the wrong agent or skip it entirely. Clear, specific descriptions are critical. | +| **Limited visibility** | The outer agent sees the inner agent's final text response — it doesn't see the inner agent's intermediate reasoning, tool calls, or context. If you need observability into inner agent behavior, use [tracing](../agents/observability.md). | +| **Context isolation** | The inner agent runs with its own instructions and tools. It doesn't automatically inherit the outer agent's conversation history or context. You communicate with it through the tool call arguments, just like any other function tool. | + +## How it works + +Agents as tools builds on the [tool-calling loop](adding-tools.md#how-the-tool-calling-loop-works) you already know. The only difference is that the "function" being called is itself an agent. 
+ +``` +┌──────────────────────────────────────────────────────────┐ +│ User: "Book me a flight to Paris and file the expense" │ +└──────────────┬───────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ Outer agent reasons about the request │ +│ → decides to call the travel-booking agent first │ +└──────────────┬───────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ Inner agent (travel-booking) runs as a tool: │ +│ • receives: "Book a flight to Paris" │ +│ • uses its own tools (search_flights, book_flight) │ +│ • returns: "Booked Flight AF123, $450" │ +└──────────────┬───────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ Outer agent receives the tool result │ +│ → decides to call the expense-filing agent next │ +└──────────────┬───────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ Inner agent (expense-filing) runs as a tool: │ +│ • receives: "File expense for Flight AF123, $450" │ +│ • uses its own tools (create_expense, attach_receipt) │ +│ • returns: "Expense report filed" │ +└──────────────┬───────────────────────────────────────────┘ + ▼ +┌──────────────────────────────────────────────────────────┐ +│ Outer agent synthesizes both results: │ +│ "Done! Booked Flight AF123 to Paris for $450 and filed │ +│ expense report." │ +└──────────────────────────────────────────────────────────┘ +``` + +Key points: + +1. **The inner agent looks like a function tool.** From the outer agent's perspective, calling an inner agent is no different from calling `get_weather()` or `search_database()`. The framework handles converting the agent to a tool with a name, description, and input parameter. +2. **The inner agent runs independently.** It has its own instructions, tools, and LLM invocations. 
It doesn't see the outer agent's full conversation — only the input passed through the tool call.
+3. **The outer agent sees only the final result.** The inner agent's intermediate steps (tool calls, reasoning, retries) are invisible to the outer agent. It receives a text response, just like any tool result.
+
+## Next steps
+
+Now that you can compose agents within a single process, the next step is **Agent-to-Agent (A2A)** — enabling agents to communicate across service and organizational boundaries using a standard protocol.
+
+> [!div class="nextstepaction"]
+> [Agent-to-Agent (A2A)](agent-to-agent.md)
+
+**Go deeper:**
+
+- [Tools Overview — Using an Agent as a Function Tool](../agents/tools/index.md#using-an-agent-as-a-function-tool) — code examples for C# and Python
+- [Function Tools](../agents/tools/function-tools.md) — the tool type that agent-as-tool builds on
+- [Observability](../agents/observability.md) — tracing inner agent behavior
diff --git a/agent-framework/journey/from-llms-to-agents.md b/agent-framework/journey/from-llms-to-agents.md
new file mode 100644
index 000000000..8f1304d99
--- /dev/null
+++ b/agent-framework/journey/from-llms-to-agents.md
@@ -0,0 +1,116 @@
+---
+title: From LLMs to Agents
+description: Understand what makes an AI agent more than a raw LLM call, why the agent abstraction matters, and create your first agent with instructions.
+author: TaoChenOSU
+ms.topic: conceptual
+ms.author: taochen
+ms.date: 04/03/2026
+ms.service: agent-framework
+---
+
+# From LLMs to Agents
+
+The [previous page](llm-fundamentals.md) covered how LLMs work: they take a tokenized sequence of messages and generate new tokens one at a time. But a raw LLM call is **stateless** — it has no memory, no tools wired up, and no built-in way to maintain a conversation. Every call starts from scratch.
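Statelessness is easy to see in code. In this toy sketch the `llm` function is a stand-in for a chat-completions call, not a real model; all it can "know" is whatever arrives in the one request it is given:

```python
def llm(messages):
    """Stand-in for a stateless chat-completions call (not a real model).

    All it can 'know' is whatever is in this one request's messages.
    """
    visible = [m["content"] for m in messages if m["role"] == "user"]
    return f"I can see {len(visible)} user message(s)."

# Each call starts from scratch: no history carries over on the server side.
print(llm([{"role": "user", "content": "My name is Ada."}]))
print(llm([{"role": "user", "content": "What's my name?"}]))

# To simulate memory, the caller must resend the full history every time.
history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "user", "content": "What's my name?"},
]
print(llm(history))
```

Resending history on every call is exactly the bookkeeping the agent abstraction takes off your hands.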
+ +An **agent** wraps an LLM with the structure needed to build real applications: a persistent identity, system instructions, tools, memory, and a runtime loop that orchestrates it all. This page explains what that abstraction provides and walks you through creating your first agent. + +## When to use this + +Understanding the agent abstraction helps when: + +- You're deciding whether to use raw LLM calls or Microsoft Agent Framework +- You want to understand the value that Agent Framework provides over direct API calls +- You're designing an application and need to choose the right level of abstraction + +## Trade-offs + +| Raw LLM calls | Agent Framework | +|----------------|-----------------| +| Full control over every API parameter | Opinionated abstractions that handle common patterns | +| No dependencies beyond the model SDK | Additional dependency on Agent Framework | +| You manage state, tools, and retry logic | Built-in session management, tool dispatch, and middleware for production-grade applications | +| Tightly coupled to one provider | Swap providers without changing application code | + +## What a raw LLM call looks like + +At its simplest, calling an LLM is a stateless request-response: + +``` +request: + messages: + [system] "You are a helpful assistant." + [user] "What's the capital of France?" + +response: + [assistant] "The capital of France is Paris." +``` + +This works for a single question. But for anything beyond that, you quickly hit limitations: + +- **No memory** — Chat history management differs by service. Some services support in-service chat history storage, but with raw LLM calls you must manage this yourself. Agent Framework unifies this via the session. +- **No tools** — The model can only generate text. It can't look up data, call APIs, or take actions unless you write all the orchestration code yourself. +- **No identity** — Every call requires you to re-send the system instructions. 
There's no persistent "agent" — just an API you call.
+- **No guardrails** — There's no built-in way to intercept, validate, or modify the model's behavior across calls.
+- **No encapsulation** — Every call site of the LLM needs access to, and knowledge of, the tools that should be used with it. Nothing packages these details behind an opaque agent interface.
+- **Tightly coupled** — Your code is written against a specific provider's API. Switching models means rewriting integration code.
+
+Each of these problems is solvable on its own, but solving all of them for every application is significant engineering work. That's what the agent abstraction handles for you.
+
+## What an agent adds
+
+An agent takes the raw LLM call and wraps it in a structured runtime:
+
+```
+┌──────────────────────────────────────────────────┐
+│                      Agent                       │
+│                                                  │
+│  ┌──────────────┐  ┌────────┐  ┌─────────────┐   │
+│  │ Instructions │  │ Tools  │  │   Session   │   │
+│  └──────────────┘  └────────┘  └─────────────┘   │
+│                                                  │
+│  ┌──────────────────────────────────────────┐    │
+│  │           Middleware Pipeline            │    │
+│  └──────────────────────────────────────────┘    │
+│                                                  │
+│  ┌──────────────────────────────────────────┐    │
+│  │         LLM Provider (swappable)         │    │
+│  └──────────────────────────────────────────┘    │
+└──────────────────────────────────────────────────┘
+```
+
+| Layer | What it does |
+|-------|--------------|
+| **Instructions** | Define the agent's persona, constraints, and output format. Set once, applied to every call. |
+| **Tools** | Give the agent the ability to act — call APIs, query databases, run code. The framework handles the tool-call loop automatically. |
+| **Session** | Maintain conversation history and any other multi-turn conversation state so the agent remembers what happened before. |
+| **Middleware** | Intercept requests and responses for logging, guardrails, caching, or behavioral overrides. |
+| **LLM Provider** | Abstract the LLM backend. Switch from Azure OpenAI to another provider without changing your agent code. |
+
+> [!TIP]
+> To see the full list of LLM provider options in Agent Framework, refer to [Providers](../agents/providers/index.md). To see the full agentic pipeline in Agent Framework, refer to [Agent Pipeline](../agents/agent-pipeline.md).
+
+## Your first agent: instructions only
+
+The simplest possible agent has just two things: a **model client** and **instructions** — an LLM with a persona. This is the right starting point for simple tasks such as question answering or text summarization, where the LLM's internal knowledge is sufficient.
+
+> [!IMPORTANT]
+> An agent with instructions only responds using **only** the knowledge the LLM acquired during training, plus the instructions provided. For example, if the question is "What is the capital of France?", the agent can answer "Paris" because it learned this fact during training. At this point, the agent acts purely as a wrapper around the LLM with a static persona.
+
+> [!TIP]
+> At this stage, you probably don't need a very strong model. If your questions require logical reasoning or complex understanding, you may need a reasoning model.
+
+See [Your First Agent](../get-started/your-first-agent.md) for a step-by-step guide to creating and running your first agent in Agent Framework with instructions only.
+
+See [Multi-turn Conversations](../get-started/multi-turn.md) for guidance on handling conversations that span multiple interactions with the agent, that is, adding **session management**.
+
+## Next steps
+
+To make the agent more capable, the first thing you may want to do is add **tools**. Tools give the agent the ability to act — call APIs, query databases, run code.
+ +> [!div class="nextstepaction"] +> [Adding Tools](adding-tools.md) + +**Go deeper:** + +- [Running Agents](../agents/running-agents.md) — streaming, invocation patterns +- [Providers](../agents/providers/index.md) — choose your LLM provider diff --git a/agent-framework/journey/index.md b/agent-framework/journey/index.md new file mode 100644 index 000000000..15a8788ed --- /dev/null +++ b/agent-framework/journey/index.md @@ -0,0 +1,41 @@ +--- +title: The Agent Development Journey +description: A progressive guide from LLM fundamentals to advanced agent patterns, helping you understand when and why to use each capability. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/02/2026 +ms.service: agent-framework +--- + +# The Agent Development Journey + +Building AI agents is a journey. This guide takes you from understanding the fundamentals of large language models (LLMs) through progressively more powerful agent patterns, helping you understand **when** and **why** to reach for each capability. + +Each step in the journey builds on the previous one, adding complexity only when the scenario demands it. Along the way, you'll learn the trade-offs of each approach so you can make informed decisions for your own applications. 
+ +| Step | What you'll learn | When you need it | +|------|-------------------|------------------| +| [LLM Fundamentals](llm-fundamentals.md) | How LLMs work and what they can (and can't) do | You're new to LLMs or want to understand the foundation | +| [From LLMs to Agents](from-llms-to-agents.md) | What makes an agent more than a chat completion call, and creating your first agent with instructions | You want to understand the agent abstraction | +| [Adding Tools](adding-tools.md) | Extending agents with function tools and MCP servers | Your agent needs to interact with the real world | +| [Adding Skills](adding-skills.md) | Packaging reusable agent capabilities | You want modular, shareable agent behaviors | +| [Adding Middleware](adding-middleware.md) | Intercepting and customizing agent behavior | You need guardrails, logging, or behavioral overrides | +| [Context Providers](adding-context-providers.md) | Injecting memory and dynamic context | Your agent needs to remember or access external knowledge | +| [Agents as Tools](agents-as-tools.md) | Using one agent as a tool for another | You want agent composition | +| [Agent-to-Agent (A2A)](agent-to-agent.md) | Inter-agent communication across boundaries | Your agents need to communicate across services or organizations | +| [Workflows](workflows.md) | Orchestrating multi-agent, multi-step processes | You need explicit control over complex, multi-step execution | + +## How to use this guide + +- **New to AI agents?** Start from the beginning and work through each step. +- **Experienced developer?** Jump to the step that matches your current challenge. +- **Evaluating Agent Framework?** Read the "When to use" and "Trade-offs" sections on each page to understand the design space. + +> [!TIP] +> Each page includes a **"When to use this"** section and a **"Trade-offs"** table to help you decide if that pattern fits your scenario. 
+ +## Next steps + +> [!div class="nextstepaction"] +> [LLM Fundamentals](llm-fundamentals.md) diff --git a/agent-framework/journey/llm-fundamentals.md b/agent-framework/journey/llm-fundamentals.md new file mode 100644 index 000000000..32a20d013 --- /dev/null +++ b/agent-framework/journey/llm-fundamentals.md @@ -0,0 +1,257 @@ +--- +title: LLM Fundamentals +description: Understand how large language models work, their capabilities, limitations, and why they form the foundation of AI agents. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/02/2026 +ms.service: agent-framework +--- + +# LLM Fundamentals + +Before building AI agents, it helps to understand the technology that powers them: **large language models (LLMs)**. This page gives you a developer-oriented overview of what LLMs are, how they work, what they're good at, and where they fall short — so you can make informed decisions as you build agents on top of them. + +> [!TIP] +> If you're already comfortable with LLMs and want to jump straight into building, skip ahead to [From LLMs to Agents](from-llms-to-agents.md). + +## What is an LLM? + +A large language model is a [neural network](https://en.wikipedia.org/wiki/Neural_network#In_machine_learning) trained on massive amounts of text data to predict the next token in a sequence. Through this simple training objective — *given all the previous tokens, what comes next?* — the model learns language structure and world knowledge. + +At its core, an LLM is just two things: + +1. **Model weights** — billions of numerical parameters learned during training that encode the model's knowledge. +2. **Architecture code** — the neural network structure (typically a [Transformer](https://en.wikipedia.org/wiki/Transformer_(deep_learning))) that runs the weights to produce output. 
+ +> [!TIP] +> We highly recommend watching Andrej Karpathy's [Deep Dive into LLMs like ChatGPT](https://www.youtube.com/watch?v=7xTGNNLPyMI), which covers how LLMs are trained, how they work internally, and what should be expected from them. + +### Tokens: the building blocks + +LLMs don't process raw text character by character — they work with **tokens**. A tokenizer splits input text into tokens, which are sub-word units from a fixed vocabulary. A token might be a full word (`"hello"`), part of a word (`"un"` + `"believ"` + `"able"`), a single character, or punctuation. + +For example, the sentence "Tokenization is fascinating!" might break down into tokens like: + +``` +["Token", "ization", " is", " fascinating", "!"] +``` + +> [!TIP] +> Notice the spaces before some tokens — tokenization is not always word-aligned. + +Each token maps to a number (an ID in the model's vocabulary), and the model operates entirely on these numbers — not on text. When the model produces output, it generates token IDs that are then decoded back into text. + +The tokens above might map to the following IDs in the model's vocabulary: + +``` +[4421, 2860, 382, 33733, 0] +``` + +Understanding tokens matters because they are the unit of everything in LLMs: + +- **Pricing** is typically per-token (input tokens + output tokens) +- **Context windows** are measured in tokens (not words or characters) +- **Longer prompts** use more tokens, cost more, and leave less room for the model's response + +A rough rule of thumb: 1 token ≈ ¾ of a word in English. + +> [!TIP] +> To see how text is tokenized, this is a useful [online tokenizer](https://platform.openai.com/tokenizer) provided by OpenAI. + +### How LLMs are trained + +Modern LLMs go through multiple stages of training, each building on the last to produce increasingly capable and useful models. + +#### Stage 1: Pretraining + +Pretraining is where the model learns the bulk of its knowledge. 
The model is fed massive amounts of text from the internet — books, articles, code, websites — and learns to predict the next token given all previous tokens. This stage requires enormous compute (thousands of GPUs for weeks or months) and produces a **base model**.
+
+A base model is essentially a text-completion engine. Given a prompt, it generates plausible continuations based on patterns in the training data. However, a base model isn't particularly useful as an assistant — it may continue your text in unexpected ways, generate harmful content, or simply ramble. It doesn't follow instructions reliably.
+
+#### Stage 2: Post-training
+
+Post-training transforms a base model into a useful assistant. This stage happens in multiple phases:
+
+**Supervised Fine-Tuning (SFT)** — The model is trained on curated datasets of high-quality conversations: human-written examples of ideal assistant behavior. These examples show the model *how* to follow instructions, answer questions helpfully, decline harmful requests, and format responses clearly. SFT teaches the model the role of a helpful assistant.
+
+**Reinforcement Learning from Human Feedback (RLHF)** — After SFT, human raters compare pairs of model responses and indicate which is better. This preference data trains a reward model, which is then used with **reinforcement learning** to further tune the LLM toward responses that humans prefer. RLHF helps the model learn subtle quality distinctions that are hard to capture in static examples — like being concise vs. thorough, or knowing when to ask for clarification. RLHF is most effective in **unverifiable domains**, where there is no single correct answer, in contrast to problems with a clear objective or ground truth, such as arithmetic.
+
+> [!TIP]
+> To go deeper, see OpenAI's blog post on [instruction tuning](https://openai.com/research/instruction-following) or the accompanying [paper](https://arxiv.org/abs/2203.02155).
+ +#### Stage 3: Reasoning through reinforcement learning + +More recently, reinforcement learning techniques have been applied to teach models to **reason step by step** before producing a final answer. Rather than immediately responding, these models learn to generate a chain of thought — breaking problems into sub-steps, exploring alternatives, and verifying their work. + +This is the training approach behind reasoning models (such as OpenAI's o-series). The result is models that are significantly better at math, logic, coding, and complex multi-step problems, at the cost of higher latency and token usage (the reasoning steps are generated as tokens too). + +> [!NOTE] +> There are many ways to achieve reasoning in LLMs. Please refer to this post for a detailed overview: [Reasoning in Large Language Models](https://magazine.sebastianraschka.com/p/understanding-reasoning-llms). Reinforcement learning is the most powerful approach as it allows the model to learn from **its own reasoning process**. This approach usually works in **verifiable domains**, such as mathematics, logic, and coding. This is why the resulting models are significantly better at these tasks. + +> [!TIP] +> You don't need to understand every training detail to build agents, but knowing these stages helps explain why models behave differently. A base model completes text. An SFT + RLHF model follows instructions. A reasoning model thinks step by step. When choosing a model for your agent, these differences directly affect capability, cost, and latency. + +### How inference works + +When you send a request to an LLM, the model generates its response **one token at a time** through a process called **autoregressive generation**: + +1. Your full prompt (system message, conversation history, user input) is converted into tokens and fed into the model. +2. 
The model processes all input tokens and produces a probability distribution over its vocabulary — predicting which token is most likely to come next. +3. A token is selected from that distribution (influenced by temperature and other sampling parameters). +4. That new token is **appended to the full sequence**, and the entire updated sequence is fed back into the model to generate the next token. +5. This repeats until the model produces a stop token or reaches a length limit. + +This iterative process means that conceptually, the model considers the entire token sequence for every token it generates. This is why LLMs have a fixed **context window** — a maximum number of tokens the model can handle. Everything must fit: your prompt, the conversation history, any injected context, *and* the tokens the model is generating as its response. + +> [!TIP] +> In practice, modern LLM inference engines use optimizations like [**KV-cache**](https://arxiv.org/pdf/2603.20397) — caching intermediate computations from previously processed tokens so that each new token doesn't require reprocessing the full sequence from scratch. This is why generating the first token (the "prefill" phase, which processes all input tokens) takes longer than generating subsequent tokens (the "decode" phase, which processes one token at a time using the cache). + +``` +Context window (e.g., 128K tokens) +┌────────────────────────────────────────────────────────┐ +│ System │ History │ User │ ← Generated response → │ +│ instructions│ │ input │ │ +│ (input tokens) │ (output tokens) │ +└────────────────────────────────────────────────────────┘ +``` + +Modern models offer context windows from 4K to over 1M tokens, but the context window is always finite. This is your working memory budget — everything the model needs to know must fit within it. + +> [!IMPORTANT] +> Because inference is autoregressive (one token at a time), longer responses take proportionally longer to generate. 
Each token requires a full forward pass through the model. This is why **streaming** — sending tokens to the client as they're generated rather than waiting for the complete response — is a common pattern in agent applications. + +## Key concepts for developers + +### Chat completions: the basic API pattern + +Modern LLMs are accessed through a **chat completions API** that uses a structured message format: + +| Role | Purpose | +|------|---------| +| **System** | Sets the model's behavior, persona, and constraints (the "instructions") | +| **User** | The human's input or question | +| **Assistant** | The model's previous responses (for multi-turn context) | + +A typical request looks like this (simplified): + +``` +Messages: + [system] "You are a helpful assistant that answers questions about weather." + [user] "What's the weather like in Seattle?" +``` + +The model processes all messages in the context window and generates the next assistant response. This stateless request-response pattern is the foundation that agents build upon. + +> [!NOTE] +> Depending on the model and the API, the exact format and fields of the messages may vary. And underneath, these messages are converted into a format that may look like `............`, which will then be tokenized and processed by the model. + +### Temperature and determinism + +**Temperature** controls the randomness of the model's output: + +- **Temperature = 0**: More deterministic — the model picks the most likely token each time +- **Temperature > 0**: More creative — the model samples from a broader distribution + +For agent applications, lower temperatures (0–0.3) are typically preferred for reliable, consistent behavior. Higher temperatures (0.7–1.0) suit creative tasks. + +> [!IMPORTANT] +> Even at temperature 0, LLMs are not fully deterministic. Small variations can occur due to floating-point arithmetic, batching, and infrastructure differences. 
Don't design systems that depend on identical output for identical input. + +## What LLMs are good at + +LLMs excel at tasks that involve language understanding and generation: + +- **Reasoning and analysis** — breaking down problems, comparing options, explaining concepts +- **Content generation** — writing articles, emails, reports, and code +- **Summarization** — distilling long documents into concise key points +- **Translation** — converting between natural languages, or between formats (JSON ↔ prose) +- **Code generation** — writing, explaining, and debugging code across many languages +- **Classification and extraction** — categorizing text, extracting structured data from unstructured input +- **Multimodal understanding** — many modern LLMs can process images, audio, and video alongside text, enabling tasks like describing an image, transcribing speech, or analyzing visual content +- **Structured output** — generating responses in precise formats like JSON or XML, which is essential for tool calling, data extraction, and integration with downstream systems + +> [!TIP] +> Multimodal capabilities work because images, audio, and other modalities can also be converted into tokens — just like text. Specialized encoders transform these inputs into token sequences that the model processes alongside text tokens in the same context window. The fundamental mechanism remains the same: everything is tokens. + +## What LLMs struggle with + +Understanding LLM limitations is critical for building reliable agents: + +| Limitation | What it means for your agent | +|------------|------------------------------| +| **No real-time knowledge** | The model's training data has a cutoff date. It doesn't know about events after training. | +| **Hallucinations** | LLMs can generate confident but factually incorrect responses. They "dream" plausible-sounding text rather than retrieving verified facts. | +| **No persistent memory** | Each API call is stateless. 
The model doesn't remember previous conversations unless you include them in the context window. |
+| **Limited math and logic** | While improving, LLMs can make errors in precise calculations and formal logic. |
+| **Non-deterministic** | The same prompt can produce different responses across calls. |
+| **No ability to act** | LLMs generate text — they can't send emails, query databases, or call APIs on their own. |
+
+> [!NOTE]
+> Many of these limitations are exactly what agents are designed to address. Tools give agents the ability to act, retrieve real-time information, and even run code to ground their responses, while sessions provide persistent memory. You'll see how to address each of these as you progress through this journey.
+
+## How LLMs learn to use tools
+
+LLMs can only generate tokens — they can't browse the web, query a database, or call an API on their own. So how do they "use" tools? The answer is surprisingly simple: **they're trained to output a special sequence of tokens that represents a tool call**, and external code interprets that output and does the actual work.
+
+### Tool use is just token generation
+
+Remember that an LLM generates output one token at a time. During post-training, models are fine-tuned on examples that include tool interactions. These examples teach the model a structured format — when the model determines that it needs to use a tool, instead of generating a natural language response, it generates tokens that follow a specific schema, such as:
+
+```json
+{
+  "tool": "get_weather",
+  "arguments": { "location": "Seattle" }
+}
+```
+
+To the model, this isn't fundamentally different from generating any other text. It's still predicting the next token. But because it was trained on thousands of examples of when and how to produce these structured outputs, it learns *when* a tool would be helpful, *which* tool to use, and *what arguments* to provide — all expressed as a sequence of tokens.
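The application-side half of this contract can be sketched in plain Python. This is a framework-free illustration (the function names and registry here are invented, not Agent Framework's API): the model only produces the JSON text, and your code parses it, looks up the matching function, and executes it.

```python
import json

# A registry of local functions that the application exposes as "tools".
def get_weather(location: str) -> str:
    # In a real application this would call an actual weather API.
    return f"Sunny, 72°F in {location}"

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(model_output: str) -> str:
    """Parse the model's structured output and execute the requested tool."""
    call = json.loads(model_output)   # the model only produced these tokens
    func = TOOLS[call["tool"]]        # your code selects the function...
    return func(**call["arguments"])  # ...and actually runs it

# The model emitted this JSON as text; the application does the real work.
result = dispatch_tool_call('{"tool": "get_weather", "arguments": {"location": "Seattle"}}')
print(result)
```

In a real system the result would be appended to the conversation as a tool message so the model can generate its final answer, which is exactly the loop the next page covers.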
+ +> [!NOTE] +> Different model providers use different formats for tool calls (JSON function calls, XML-like tags, special tokens), but the principle is the same: the model generates structured output that signals "I want to call this tool with these arguments." + +### How models learn when to call tools + +During training, the model sees tool definitions included in the prompt — each tool described by a name, a description of what it does, and the parameters it accepts. The training examples demonstrate the pattern: + +1. **A user asks a question** that requires external information or action. +2. **The model generates a tool call** instead of answering directly — because the training data showed that this is the correct behavior when the model doesn't have the information itself. +3. **A tool result appears in the conversation** (provided by external code during training data collection). +4. **The model generates a final response** that incorporates the tool result. + +Through this training, the model learns the judgment of *when* to call a tool (vs. answering from its own knowledge), *which* tool to select from the available options, and *how* to formulate the arguments based on the user's request. + +### Why this matters + +Understanding that tool use is "just" token generation clarifies several important points: + +- **The LLM never executes anything.** It only generates the *request*. Your application code (or an agent framework) is responsible for parsing the tool call, executing the function, and feeding the result back. This separation is a key safety boundary. +- **Tool quality depends on training.** A model's ability to use tools well depends on how thoroughly it was fine-tuned on tool-use examples. This is why some models are better at tool calling than others. +- **Tool descriptions are part of the prompt.** The tool definitions you provide consume tokens in the context window. 
More tools means fewer tokens available for conversation history and the model's response.
+- **The model can make mistakes.** Just like it can hallucinate facts, it can generate tool calls with wrong arguments, call the wrong tool, or call a tool when it shouldn't. Guardrails and validation matter.
+
+How this tool-calling capability gets wired into a full execution loop — where an agent iteratively calls tools, observes results, and decides what to do next — is the bridge from LLMs to agents, covered in the [next page](from-llms-to-agents.md).
+
+## How this connects to agents
+
+An LLM alone is a powerful but limited text-in, text-out system. To build useful applications, you need to add layers on top:
+
+| Need | LLM alone | With Agent Framework |
+|------|-----------|---------------------|
+| Focused behavior | Craft system prompts manually | Agent with instructions and identity |
+| Real-time data | Not available | Tools (function tools, MCP servers) |
+| Take actions | Not possible | Tool calling with approval workflows |
+| Memory | Re-send conversation each time | Sessions and context providers |
+| Reliability | Hope the prompt works | Middleware for guardrails and overrides |
+
+Agent Framework handles these layers so you can focus on your application logic rather than rebuilding LLM infrastructure.
+
+## Learn more
+
+- [What are Large Language Models (LLMs)?](https://azure.microsoft.com/resources/cloud-computing-dictionary/what-are-large-language-models-llms) — Microsoft Azure's overview of LLM types and use cases
+- [Deep Dive into LLMs like ChatGPT](https://www.youtube.com/watch?v=7xTGNNLPyMI) — Andrej Karpathy's three-hour introduction covering how LLMs are trained, how they work, and what to expect from them.
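As a recap of the sampling mechanics covered above, here is a toy, framework-free sketch of temperature-based token selection. The vocabulary and scores are invented for illustration; a real model samples over tens of thousands of tokens using learned logits.

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float) -> str:
    """Pick the next token from a score distribution, softened by temperature."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(logits, key=logits.get)
    # Softmax with temperature: higher temperature flattens the distribution,
    # making lower-scoring tokens more likely to be chosen.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())
    weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(weights.values())
    r = random.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point edge case: return the last token

# Hypothetical next-token scores after the prompt "The weather in Seattle is"
logits = {" rainy": 2.1, " sunny": 1.7, " mild": 0.9}
print(sample_token(logits, temperature=0))    # greedy: always " rainy"
print(sample_token(logits, temperature=0.8))  # sampled: any of the three
```

This is why temperature 0 gives near-deterministic agent behavior while higher values produce variety (and why even then, as noted above, full determinism is not guaranteed in production systems).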
+ +## Next steps + +> [!div class="nextstepaction"] +> [From LLMs to Agents](from-llms-to-agents.md) diff --git a/agent-framework/journey/workflows.md b/agent-framework/journey/workflows.md new file mode 100644 index 000000000..ad56e2198 --- /dev/null +++ b/agent-framework/journey/workflows.md @@ -0,0 +1,118 @@ +--- +title: Workflows +description: Orchestrate multi-agent, multi-step processes with explicit control over execution order, state, and human-in-the-loop patterns. +author: TaoChenOSU +ms.topic: conceptual +ms.author: taochen +ms.date: 04/06/2026 +ms.service: agent-framework +--- + +# Workflows + +> [!TIP] +> Before reaching for workflows, we recommend you first try simpler patterns to see if they meet your needs. They are easier to set up and debug. Workflows are most useful when you need guaranteed execution order that a single agent can't reliably provide on its own. + +The journey so far has covered increasingly powerful ways to build with agents. You've seen how a single agent can [use tools](adding-tools.md), [load skills](adding-skills.md), [run through middleware](adding-middleware.md), and [draw on rich context](adding-context-providers.md). You've composed agents by [using one as a tool for another](agents-as-tools.md) and connected them across service boundaries with [A2A](agent-to-agent.md). + +All of these patterns share a common trait: **the LLM decides what happens next.** The model picks which tool to call, whether to delegate, and when to stop. That's powerful for open-ended tasks where the right path depends on the conversation — but it's a liability when the process itself has rules. + +Consider scenarios like these: + +- A **document-review pipeline** where a draft must be written, reviewed, revised, and approved — in that order, every time. +- A **customer-onboarding flow** that collects information, runs a compliance check, provisions accounts, and sends a welcome email — some steps in parallel, some gated by human approval. 
+- An **analytics workflow** that gathers data from multiple sources, merges the results, and generates a report — where a failure halfway through should resume from the last checkpoint, not start over. + +In each case, the *structure* of the process is known ahead of time. The steps, their ordering, the decision points — these aren't things you want the model to figure out at runtime. You want to **define the graph explicitly** and let agents (or any other logic) execute within it. + +That's what [**workflows**](../workflows/index.md) provide. + +## The intelligence spectrum + +Agent applications don't have to be fully autonomous or fully rule-based — there's a spectrum in between, and workflows let you choose where to land. + +``` +Fully intelligent Fully deterministic +(model decides everything) (code decides everything) +◄──────────────────────────────────────────────────────────────► +│ │ │ +│ Single agent with │ Workflow with agent │ Workflow with only +│ tools — the model │ executors — the graph │ deterministic executors +│ picks every step │ controls the process, │ — no LLM involved, +│ │ agents handle the │ pure business logic +│ │ reasoning-heavy steps │ +``` + +At the left end, a single agent with tools handles everything — the model decides what to do, when to delegate, and when to stop. This is the most flexible approach, but also the least predictable. At the right end, a workflow with purely deterministic executors is essentially a traditional pipeline — fully predictable, but with no AI reasoning at all. + +Most real-world applications live **somewhere in the middle**. A workflow defines the structure — which steps run, in what order, with what gates — while individual executors within that workflow use agents for the steps that benefit from LLM reasoning. You get the predictability of an explicit process with the intelligence of AI where it matters. + +The key insight is that **you control the dial**. 
For each step in your process, you decide: + +- Should the **model** figure out what to do? → Use an [agent executor](../workflows/agents-in-workflows.md). +- Should the **code** determine the outcome? → Use a deterministic executor with regular business logic. +- Should a **human** make the call? → Use a [human-in-the-loop](../workflows/human-in-the-loop.md) gate. + +This is the real power of workflows: not replacing agents, but giving you explicit control over **how much intelligence** goes into each part of your application. + +## Choosing the right pattern + +The patterns from earlier in this journey and workflows aren't competing approaches — they're different points on the spectrum. The key question is: **who should decide what happens next?** + +| Question | If the answer is "the model" | If the answer is "the developer" | +|----------|------------------------------|----------------------------------| +| Which subtask to tackle next? | [Agents as tools](agents-as-tools.md) — the outer agent routes dynamically | [Workflows](../workflows/index.md) — the graph defines the path | +| Whether to involve another agent? | [Agents as tools](agents-as-tools.md) — model-driven delegation | [Agents in workflows](../workflows/agents-in-workflows.md) — the graph wires agents together | +| When to ask a human? | [Tool approval](../agents/tools/tool-approval.md) — reactive, per-tool | [Human-in-the-loop](../workflows/human-in-the-loop.md) — explicit gates at defined points | +| How to handle partial failure? | Retry logic in tool implementations | [Checkpoints](../workflows/checkpoints.md) — resume from the last saved state | + +In practice, most production systems **combine both**. A workflow defines the high-level process, and individual executors within that workflow use agents for the steps that benefit from LLM reasoning. The [agents in workflows](../workflows/agents-in-workflows.md) page shows exactly how to do this. 
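What "the graph defines the path" means in code can be shown with a toy, framework-free pipeline. The step names and the review rule here are invented for illustration; in Agent Framework you would express the same shape with executors and edges instead of a hand-rolled loop.

```python
# A minimal explicit pipeline: the *code* fixes the order of steps,
# while any individual step is free to call an LLM internally.

def draft(state: dict) -> dict:
    # In a real system this step might invoke an agent to write the draft.
    state["draft"] = f"Draft about {state['topic']}"
    return state

def review(state: dict) -> dict:
    # Deterministic business rule: no model involved at all.
    state["approved"] = len(state["draft"]) > 10
    return state

def publish(state: dict) -> dict:
    state["status"] = "published" if state["approved"] else "rejected"
    return state

# The graph is explicit: draft -> review -> publish, in that order, every time.
PIPELINE = [draft, review, publish]

def run(state: dict) -> dict:
    for step in PIPELINE:
        state = step(state)
    return state

result = run({"topic": "quarterly sales"})
print(result["status"])
```

Swapping the stubbed `draft` step for an agent call moves that one step leftward on the intelligence spectrum without giving up the guaranteed ordering of the whole process.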
+ +## Built-in orchestration patterns + +For common multi-agent coordination scenarios, Agent Framework provides [built-in orchestration patterns](../workflows/orchestrations/index.md) — prebuilt workflow templates that you can use directly or customize: + +| Pattern | When to use it | +|---------|----------------| +| [**Sequential**](../workflows/orchestrations/sequential.md) | Agents execute one after another in a defined order — each builds on the previous agent's output | +| [**Concurrent**](../workflows/orchestrations/concurrent.md) | Agents execute in parallel — useful when tasks are independent and you want to reduce latency | +| [**Handoff**](../workflows/orchestrations/handoff.md) | Agents transfer control to each other based on context — good for routing to specialists | +| [**Group Chat**](../workflows/orchestrations/group-chat.md) | Agents collaborate in a shared conversation — useful for debate, review, or brainstorming | +| [**Magentic**](../workflows/orchestrations/magentic.md) | A manager agent dynamically coordinates specialized agents — balances structure with flexibility | + +These orchestrations handle the boilerplate of agent coordination so you can focus on the agents themselves. + +## Workflows as agents + +One of the most powerful composition patterns is wrapping a workflow so it looks like a regular agent. The [workflows as agents](../workflows/as-agents.md) feature lets you take a complex multi-step workflow and expose it through the standard agent interface. Other agents can call it as a tool, A2A clients can invoke it over HTTP, and consumers don't need to know they're talking to a workflow at all. 
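The idea can be illustrated with a toy adapter (this is not the actual Agent Framework API; see the linked page for that). The workflow hides behind the same `run(message)` surface that an agent exposes, so a caller cannot tell the difference:

```python
class WorkflowAsAgent:
    """Toy adapter: expose a multi-step workflow through an agent-like interface."""

    def __init__(self, steps):
        self.steps = steps  # ordered workflow steps

    def run(self, message: str) -> str:
        # The caller makes one agent-style call; internally each step runs in order.
        result = message
        for step in self.steps:
            result = step(result)
        return result

# Hypothetical steps standing in for real executors or agent invocations.
pipeline_agent = WorkflowAsAgent([
    lambda text: text.strip(),
    lambda text: f"Summary: {text[:20]}",
])

print(pipeline_agent.run("  Q3 revenue grew 12% year over year  "))
```

Because the consumer only sees `run`, the same wrapped workflow could sit behind a tool call from another agent or behind an A2A endpoint without any change to the callers.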
+ +## Journey recap + +You've now seen the full spectrum of agent development patterns: + +| Pattern | Best for | +|---------|----------| +| [LLM Fundamentals](llm-fundamentals.md) | Understanding the foundation | +| [From LLMs to Agents](from-llms-to-agents.md) | The agent abstraction | +| [Adding Tools](adding-tools.md) | Agents that act on external systems | +| [Adding Skills](adding-skills.md) | Reusable, modular agent behaviors | +| [Adding Middleware](adding-middleware.md) | Cross-cutting concerns and guardrails | +| [Context Providers](adding-context-providers.md) | Memory, personalization, and RAG | +| [Agents as Tools](agents-as-tools.md) | Simple agent composition and delegation | +| [Agent-to-Agent (A2A)](agent-to-agent.md) | Cross-service agent communication | +| [Workflows](workflows.md) | Complex, multi-step orchestration with explicit control | + +Each pattern adds capability — and complexity. The best agent systems use the simplest pattern that meets their requirements, and reach for more powerful patterns only when the scenario demands it. 
+ +## Next steps + +**Go deeper:** + +- [Workflows overview](../workflows/index.md) — core concepts, architecture, and getting started +- [Executors](../workflows/executors.md) and [Edges](../workflows/edges.md) — building blocks of the workflow graph +- [Agents in Workflows](../workflows/agents-in-workflows.md) — integrating AI agents into workflow steps +- [Orchestrations](../workflows/orchestrations/index.md) — prebuilt multi-agent patterns (sequential, concurrent, handoff, group chat, magentic) +- [Human-in-the-Loop](../workflows/human-in-the-loop.md) — approval gates and external input +- [Checkpoints & Resuming](../workflows/checkpoints.md) — long-running workflow recovery +- [State Management](../workflows/state.md) — sharing data across executors +- [Workflows as Agents](../workflows/as-agents.md) — exposing workflows through the agent interface diff --git a/agent-framework/workflows/checkpoints.md b/agent-framework/workflows/checkpoints.md index 801eb83f0..7ee75e638 100644 --- a/agent-framework/workflows/checkpoints.md +++ b/agent-framework/workflows/checkpoints.md @@ -79,7 +79,19 @@ IReadOnlyList checkpoints = run.Checkpoints; ::: zone pivot="programming-language-python" -To enable checkpointing, a `CheckpointStorage` needs to be provided when creating a workflow. A checkpoint can then be accessed via the storage. +To enable checkpointing, a `CheckpointStorage` needs to be provided when creating a workflow. A checkpoint can then be accessed via the storage. 
Agent Framework ships three built-in implementations — pick the one that matches your durability and deployment needs: + +| Provider | Package | Durability | Best for | +|---|---|---|---| +| `InMemoryCheckpointStorage` | `agent-framework` | In-process only | Tests, demos, short-lived workflows | +| `FileCheckpointStorage` | `agent-framework` | Local disk | Single-machine workflows, local development | +| `CosmosCheckpointStorage` | `agent-framework-azure-cosmos` | Azure Cosmos DB | Production, distributed, cross-process workflows | + +All three implement the same `CheckpointStorage` protocol, so you can swap providers without changing workflow or executor code. + +# [In-Memory](#tab/py-ckpt-inmemory) + +`InMemoryCheckpointStorage` keeps checkpoints in process memory. Best for tests, demos, and short-lived workflows where you do not need durability across restarts. ```python from agent_framework import ( @@ -88,7 +100,6 @@ from agent_framework import ( ) # Create a checkpoint storage to manage checkpoints -# There are different implementations of CheckpointStorage, such as InMemoryCheckpointStorage and FileCheckpointStorage. checkpoint_storage = InMemoryCheckpointStorage() # Build a workflow with checkpointing enabled @@ -106,6 +117,104 @@ async for event in workflow.run(input, stream=True): checkpoints = await checkpoint_storage.list_checkpoints(workflow_name=workflow.name) ``` +# [File](#tab/py-ckpt-file) + +`FileCheckpointStorage` persists checkpoints to a local directory on disk. Best for single-machine workflows that need to survive process restarts, and for local development. + +```python +from agent_framework import ( + FileCheckpointStorage, + WorkflowBuilder, +) + +# Create a checkpoint storage backed by a directory on disk. +# storage_path is required — there is no default directory. 
+checkpoint_storage = FileCheckpointStorage("/var/lib/agent-framework/checkpoints") + +# Build a workflow with checkpointing enabled +builder = WorkflowBuilder(start_executor=start_executor, checkpoint_storage=checkpoint_storage) +builder.add_edge(start_executor, executor_b) +builder.add_edge(executor_b, executor_c) +builder.add_edge(executor_b, end_executor) +workflow = builder.build() + +# Run the workflow +async for event in workflow.run(input, stream=True): + ... + +# Access checkpoints from the storage +checkpoints = await checkpoint_storage.list_checkpoints(workflow_name=workflow.name) +``` + +See the [Security Considerations](#security-considerations) section for guidance on restricting which Python types can be deserialized via the `allowed_checkpoint_types` parameter. + +# [Azure Cosmos DB](#tab/py-ckpt-cosmos) + +`CosmosCheckpointStorage` persists checkpoints to Azure Cosmos DB NoSQL. Best for production and distributed workflows that need durable, cross-process checkpointing. Install the optional provider package: + +```bash +pip install agent-framework-azure-cosmos --pre +``` + +The database and container are created automatically on first use, with `/workflow_name` as the partition key for efficient per-workflow queries. The recommended authentication mode is managed identity / RBAC via an Azure `TokenCredential` such as `DefaultAzureCredential`: + +```python +from azure.identity.aio import DefaultAzureCredential +from agent_framework import WorkflowBuilder +from agent_framework_azure_cosmos import CosmosCheckpointStorage + +# CosmosCheckpointStorage is an async context manager — it closes the underlying +# Cosmos client on exit when it created the client itself. 
+async with (
+    DefaultAzureCredential() as credential,
+    CosmosCheckpointStorage(
+        endpoint="https://<your-account>.documents.azure.com:443/",
+        credential=credential,
+        database_name="agent-framework",
+        container_name="workflow-checkpoints",
+    ) as checkpoint_storage,
+):
+    # Build a workflow with checkpointing enabled
+    builder = WorkflowBuilder(start_executor=start_executor, checkpoint_storage=checkpoint_storage)
+    builder.add_edge(start_executor, executor_b)
+    builder.add_edge(executor_b, executor_c)
+    builder.add_edge(executor_b, end_executor)
+    workflow = builder.build()
+
+    # Run the workflow
+    async for event in workflow.run(input, stream=True):
+        ...
+
+    # Access checkpoints from the storage
+    checkpoints = await checkpoint_storage.list_checkpoints(workflow_name=workflow.name)
+```
+
+Account key authentication is also supported by passing the key directly as the `credential` argument:
+
+```python
+from agent_framework_azure_cosmos import CosmosCheckpointStorage
+
+checkpoint_storage = CosmosCheckpointStorage(
+    endpoint="https://<your-account>.documents.azure.com:443/",
+    credential="<your-account-key>",
+    database_name="agent-framework",
+    container_name="workflow-checkpoints",
+)
+```
+
+Connection details can also be supplied entirely through environment variables:
+
+| Variable | Description |
+|---|---|
+| `AZURE_COSMOS_ENDPOINT` | Cosmos DB account endpoint |
+| `AZURE_COSMOS_DATABASE_NAME` | Database name |
+| `AZURE_COSMOS_CONTAINER_NAME` | Container name |
+| `AZURE_COSMOS_KEY` | Account key (optional if using Azure credentials) |
+
+`CosmosCheckpointStorage` also accepts a pre-created `CosmosClient` (via `cosmos_client=`) or `ContainerProxy` (via `container_client=`) if your application already manages the Cosmos client lifecycle.
+
+---
+
 ::: zone-end

 ## Resuming from Checkpoints
@@ -263,7 +372,7 @@ async def on_checkpoint_restore(self, state: dict[str, Any]) -> None:

 ## Security Considerations

 > [!IMPORTANT]
-> Checkpoint storage is a trust boundary.
Whether you use the built-in storage implementations or a custom one, the storage backend must be treated as trusted, private infrastructure. **Never load checkpoints from untrusted or potentially tampered sources.** Loading a malicious checkpoint can execute arbitrary code. +> Checkpoint storage is a trust boundary. Whether you use the built-in storage implementations or a custom one, the storage backend must be treated as trusted, private infrastructure. **Never load checkpoints from untrusted or potentially tampered sources.** ::: zone pivot="programming-language-csharp" @@ -275,14 +384,48 @@ Ensure that the storage location used for checkpoints is secured appropriately. ### Pickle serialization -`FileCheckpointStorage` uses Python's [`pickle`](https://docs.python.org/3/library/pickle.html) module to serialize non-JSON-native state such as dataclasses, datetimes, and custom objects. Because `pickle.loads()` can execute arbitrary code during deserialization, a compromised checkpoint file can run malicious code when loaded. The post-deserialization type check performed by the framework cannot prevent this. +Both `FileCheckpointStorage` and `CosmosCheckpointStorage` use Python's [`pickle`](https://docs.python.org/3/library/pickle.html) module to serialize non-JSON-native state such as dataclasses, datetimes, and custom objects. To mitigate the risks of arbitrary code execution during deserialization, both providers use a **restricted unpickler** by default. Only a built-in set of safe Python types (primitives, `datetime`, `uuid`, `Decimal`, common collections, etc.) and all `agent_framework` internal types are permitted during deserialization. Any other type encountered in a checkpoint causes deserialization to fail with a `WorkflowCheckpointException`. -If your threat model does not permit pickle-based serialization, use `InMemoryCheckpointStorage` or implement a custom `CheckpointStorage` with an alternative serialization strategy. 
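To see how a restricted unpickler works mechanically, here is a simplified, standard-library-only sketch (not the framework's actual implementation): `pickle.Unpickler.find_class` is overridden so that only explicitly allow-listed types can be reconstructed, and anything else fails to load.

```python
import io
import pickle
from datetime import datetime

# Only these (module, name) pairs may be reconstructed during unpickling.
ALLOWED = {("datetime", "datetime")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module: str, name: str):
        # Refuse to reconstruct any type that isn't explicitly allow-listed.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"type {module}.{name} is not allowed")
        return super().find_class(module, name)

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# An allow-listed type round-trips successfully.
print(safe_loads(pickle.dumps(datetime(2026, 4, 6))))

# Anything outside the allow-list is rejected at load time.
try:
    import subprocess
    safe_loads(pickle.dumps(subprocess.Popen))
except pickle.UnpicklingError as err:
    print(err)
```

The framework's `allowed_checkpoint_types` parameter, described below, serves the same role as the `ALLOWED` set in this sketch: it widens the allow-list for your own application types.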
+To allow additional application-specific types, pass them via the `allowed_checkpoint_types` parameter using `"module:qualname"` format: + +```python +from agent_framework import FileCheckpointStorage + +storage = FileCheckpointStorage( + "/tmp/checkpoints", + allowed_checkpoint_types=[ + "my_app.models:SafeState", + "my_app.models:UserProfile", + ], +) +``` + +`CosmosCheckpointStorage` accepts the same parameter: + +```python +from azure.identity.aio import DefaultAzureCredential +from agent_framework_azure_cosmos import CosmosCheckpointStorage + +storage = CosmosCheckpointStorage( + endpoint="https://my-account.documents.azure.com:443/", + credential=DefaultAzureCredential(), + database_name="agent-db", + container_name="checkpoints", + allowed_checkpoint_types=[ + "my_app.models:SafeState", + "my_app.models:UserProfile", + ], +) +``` + +If your threat model does not permit pickle-based serialization at all, use `InMemoryCheckpointStorage` or implement a custom `CheckpointStorage` with an alternative serialization strategy. ### Storage location responsibility `FileCheckpointStorage` requires an explicit `storage_path` parameter — there is no default directory. While the framework validates against path traversal attacks, securing the storage directory itself (file permissions, encryption at rest, access controls) is the developer's responsibility. Only authorized processes should have read or write access to the checkpoint directory. +`CosmosCheckpointStorage` relies on Azure Cosmos DB for storage. Use managed identity / RBAC where possible, scope the database and container to the workflow service, and rotate account keys if you use key-based auth. As with file storage, only authorized principals should have read or write access to the Cosmos DB container that holds checkpoint documents. 
+ ::: zone-end ## Next Steps diff --git a/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/index.md b/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/index.md index 226a18cd8..58068a467 100644 --- a/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/index.md +++ b/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/index.md @@ -1,5 +1,5 @@ --- -title: Out-of-the-box Vector Store connectors (Preview) +title: Out-of-the-box Vector Store connectors description: Out-of-the-box Vector Store connectors zone_pivot_groups: programming-languages author: westey-m @@ -8,16 +8,13 @@ ms.author: westey ms.date: 07/08/2024 ms.service: semantic-kernel --- -# Out-of-the-box Vector Store connectors (Preview) +# Out-of-the-box Vector Store connectors ::: zone pivot="programming-language-csharp" ::: zone-end ::: zone pivot="programming-language-python" -> [!WARNING] -> The Semantic Kernel Vector Store functionality is in preview, and improvements that require breaking changes may still occur in limited circumstances before release. 
- ::: zone-end ::: zone pivot="programming-language-java" @@ -37,25 +34,25 @@ Semantic Kernel provides a number of out-of-the-box Vector Store integrations ma | Vector Store Connectors | C# | Uses officially supported SDK | Maintainer / Vendor | | ------------------------------------------------------------------ | :--------------------------: | :----------------------------: | :-------------------------------: | -| [Azure AI Search](./azure-ai-search-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [Cosmos DB MongoDB (vCore)](./azure-cosmosdb-mongodb-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [Cosmos DB No SQL](./azure-cosmosdb-nosql-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | +| [Azure AI Search](./azure-ai-search-connector.md) | ✅ | ✅ | Microsoft | +| [Cosmos DB MongoDB (vCore)](./azure-cosmosdb-mongodb-connector.md) | ✅ | ✅ | Microsoft | +| [Cosmos DB No SQL](./azure-cosmosdb-nosql-connector.md) | ✅ | ✅ | Microsoft | | [Couchbase](./couchbase-connector.md) | ✅ | ✅ | Couchbase | | [Elasticsearch](./elasticsearch-connector.md) | ✅ | ✅ | Elastic | | Chroma | Planned | | | -| [In-Memory](./inmemory-connector.md) | ✅ | N/A | Microsoft Semantic Kernel Project | +| [In-Memory](./inmemory-connector.md) | ✅ | N/A | Microsoft | | Milvus | Planned | | | -| [MongoDB](./mongodb-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [Neon Serverless Postgres](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/neon1722366567200.neon_serverless_postgres_azure_prod) |Use [Postgres Connector](./postgres-connector.md)| ✅ | Microsoft Semantic Kernel Project | +| [MongoDB](./mongodb-connector.md) | ✅ | ✅ | Microsoft | +| [Neon Serverless Postgres](https://neon.com/) |Use [Postgres Connector](./postgres-connector.md) | ✅ | Microsoft | | [Oracle](./oracle-connector.md) | ✅ | ✅ | Oracle | -| [Pinecone](./pinecone-connector.md) | ✅ | ❌ | Microsoft Semantic Kernel Project | -| [Postgres](./postgres-connector.md) | ✅ | 
✅ | Microsoft Semantic Kernel Project | -| [Qdrant](./qdrant-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [Redis](./redis-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [SQL Server](./sql-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [SQLite](./sqlite-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [Volatile (In-Memory)](./volatile-connector.md) | Deprecated (use In-Memory) | N/A | Microsoft Semantic Kernel Project | -| [Weaviate](./weaviate-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | +| [Pinecone](./pinecone-connector.md) | ✅ | ❌ | Microsoft | +| [Postgres](./postgres-connector.md) | ✅ | ✅ | Microsoft | +| [Qdrant](./qdrant-connector.md) | ✅ | ✅ | Microsoft | +| [Redis](./redis-connector.md) | ✅ | ✅ | Microsoft | +| [SQL Server](./sql-connector.md) | ✅ | ✅ | Microsoft | +| [SQLite](./sqlite-connector.md) | ✅ | ✅ | Microsoft | +| [Volatile (In-Memory)](./volatile-connector.md) | Deprecated (use In-Memory) | N/A | Microsoft | +| [Weaviate](./weaviate-connector.md) | ✅ | ✅ | Microsoft | ::: zone-end ::: zone pivot="programming-language-python" @@ -70,8 +67,8 @@ Semantic Kernel provides a number of out-of-the-box Vector Store integrations ma | [Faiss](./faiss-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | | [In-Memory](./inmemory-connector.md) | ✅ | N/A | Microsoft Semantic Kernel Project | | [MongoDB](./mongodb-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | -| [Neon Serverless Postgres](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/neon1722366567200.neon_serverless_postgres_azure_prod) |Use [Postgres Connector](./postgres-connector.md)| ✅ | Microsoft Semantic Kernel Project | -| [Oracle](./oracle-connector.md) | ✅ | ✅ | Oracle | +| [Neon Serverless Postgres](https://neon.com/) |Use [Postgres Connector](./postgres-connector.md) | ✅ | Microsoft Semantic Kernel Project | +| [Oracle](./oracle-connector.md) | ✅ | ✅ | Oracle | | 
[Pinecone](./pinecone-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | | [Postgres](./postgres-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | | [Qdrant](./qdrant-connector.md) | ✅ | ✅ | Microsoft Semantic Kernel Project | diff --git a/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/postgres-connector.md b/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/postgres-connector.md index 3323e9069..62e2d5779 100644 --- a/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/postgres-connector.md +++ b/semantic-kernel/concepts/vector-store-connectors/out-of-the-box-connectors/postgres-connector.md @@ -33,7 +33,7 @@ ms.service: semantic-kernel ## Overview -The Postgres Vector Store connector can be used to access and manage data in Postgres and also supports [Neon Serverless Postgres](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/neon1722366567200.neon_serverless_postgres_azure_prod). +The Postgres Vector Store connector can be used to access and manage data in Postgres and also supports [Neon Serverless Postgres](https://neon.com/). The connector has the following characteristics. diff --git a/semantic-kernel/media/azure-ai-foundry-attach-app-insights.png b/semantic-kernel/media/azure-ai-foundry-attach-app-insights.png index 94146a4a2..fc0db7556 100644 Binary files a/semantic-kernel/media/azure-ai-foundry-attach-app-insights.png and b/semantic-kernel/media/azure-ai-foundry-attach-app-insights.png differ