# AI Agents Walkthrough — Learn by Doing

## Before We Begin

**Diagnostic Question:** When an LLM uses a "tool" (like a search API or calculator), how is that different from the model just generating text? What has to happen for the tool to be useful?

**Checkpoint:** The user recognizes that tools give models access to external data and actions. The model must decide when to call a tool, what arguments to pass, and how to use the result.

---

## Step 1: Define a Simple Tool

<!-- hint:code language="json" highlight="1,3" -->

**Task:** Design a tool for an agent that looks up the current weather. Specify: name, description, and parameters (e.g., city, unit). Write it as a JSON schema. Why is the description important?

**Question:** What would happen if the description were vague? How would the model choose between this tool and a "get_forecast" tool?

**Checkpoint:** The user defines a tool with a clear name, description, and parameters. They understand that the model uses the description to decide when to call the tool.
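
One possible shape for such a definition is sketched below as a Python dict that serializes to JSON. The tool name `get_weather`, the field layout, and the helper `validate_args` are assumptions made for illustration; they follow common JSON Schema conventions rather than any particular vendor's API.

```python
import json

# Hypothetical tool definition. The description states what the tool does AND
# what it does not do, so a model can tell it apart from a forecast tool.
get_weather_tool = {
    "name": "get_weather",
    "description": (
        "Return the current weather conditions for a city. "
        "Use for present conditions only, not for future forecasts."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"},
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit for the result",
            },
        },
        "required": ["city"],
    },
}


def validate_args(tool: dict, args: dict) -> list:
    """Minimal illustrative check of call arguments; returns a list of problems."""
    schema = tool["parameters"]
    problems = []
    for name in schema["required"]:
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown parameter: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


print(json.dumps(get_weather_tool, indent=2))
print(validate_args(get_weather_tool, {"city": "Tokyo", "unit": "celsius"}))
```

The explicit "not for future forecasts" clause is the part that would help a model choose between this tool and a `get_forecast` tool.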

---

## Step 2: Trace the Agent Loop

<!-- hint:diagram mermaid-type="flowchart" topic="observe think act loop" -->

**Task:** A user asks: "What's the population of Tokyo?" The agent has a `search_web` tool. Trace the loop: (1) Observe — what does the agent see? (2) Think — what does it decide? (3) Act — what tool call? (4) Observe — what comes back? (5) Think/Act — what next?

**Question:** When would the loop stop? What if the search returned no results?

**Checkpoint:** The user traces observe → think → act → observe → respond. They can handle the "no results" case (retry, different query, or admit uncertainty).
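
The loop can be made concrete with a small simulation. In this sketch the `think` function is a hard-coded rule standing in for the model's reasoning, `search_web` returns canned placeholder data instead of calling a real search API, and `simplify` and `MAX_STEPS` are hypothetical names introduced only for the example.

```python
MAX_STEPS = 3  # guardrail: the loop always stops after this many iterations
STOP_WORDS = {"what's", "what", "is", "the", "of"}


def search_web(query: str) -> list:
    """Stub tool: canned results standing in for a real search API."""
    canned = {"population tokyo": ["<placeholder snippet about Tokyo's population>"]}
    return canned.get(query, [])


def simplify(question: str) -> str:
    """Reduce a question to keywords, used as a reformulated query."""
    words = question.lower().rstrip("?").split()
    return " ".join(w for w in words if w not in STOP_WORDS)


def think(question: str, history: list) -> dict:
    """Stand-in for the model: pick the next action from what was observed."""
    if not history:
        return {"action": "search_web", "query": question}
    if history[-1]["results"]:
        return {"action": "respond", "answer": f"Based on search: {history[-1]['results'][0]}"}
    if len(history) == 1:
        # First search came back empty: retry once with a different query.
        return {"action": "search_web", "query": simplify(question)}
    return {"action": "respond", "answer": "I could not find a reliable answer."}


def run_agent(question: str):
    history = []  # each entry is one act/observe pair
    for _ in range(MAX_STEPS):
        decision = think(question, history)
        if decision["action"] == "respond":
            return decision["answer"], history
        results = search_web(decision["query"])
        history.append({"query": decision["query"], "results": results})
    return "Stopped: step limit reached.", history


answer, trace = run_agent("What's the population of Tokyo?")
print(answer)
for step in trace:
    print(step)
```

The loop stops in one of three ways: the agent responds with an answer, it admits it found nothing after a retry, or it hits the step limit.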

---

## Step 3: Design a Multi-Step Workflow

**Task:** Design an agent workflow for "Summarize the top 3 results from a web search for X." List the tools needed, the order of operations, and what the agent outputs at each step. Draw a simple flow.

**Question:** Where could the agent get stuck? What guardrails would you add?

**Checkpoint:** The user designs: search → get top 3 → fetch content → summarize. They identify failure modes (empty results, timeout) and suggest retries or fallbacks.
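
A minimal sketch of that pipeline follows, with retries and fallbacks for the two failure modes named above. All three tools are passed in as plain functions; the stubs `fake_search`, `fake_fetch`, and `fake_summarize`, and the `example.com` URLs, are placeholders rather than real services.

```python
def with_retries(fn, arg, attempts=2):
    """Call fn(arg), retrying on TimeoutError; return None if every attempt fails."""
    for _ in range(attempts):
        try:
            return fn(arg)
        except TimeoutError:
            continue
    return None


def summarize_top_results(query, search, fetch, summarize, top_n=3):
    """search -> take top N -> fetch each page -> summarize each page."""
    urls = search(query)[:top_n]
    if not urls:
        # Guardrail: empty results end the workflow with an explicit status.
        return {"status": "no_results", "summaries": []}
    summaries = []
    for url in urls:
        page = with_retries(fetch, url)
        if page is None:
            # Fallback: record the failure instead of aborting the whole run.
            summaries.append({"url": url, "summary": None, "error": "fetch failed"})
        else:
            summaries.append({"url": url, "summary": summarize(page), "error": None})
    return {"status": "ok", "summaries": summaries}


# Stub tools for demonstration only.
def fake_search(query):
    return [f"https://example.com/{query}/{i}" for i in range(5)]


def fake_fetch(url):
    if url.endswith("/1"):
        raise TimeoutError("simulated timeout")
    return f"Full text of {url}. More text follows here."


def fake_summarize(text):
    return text[:40]


report = summarize_top_results("agents", fake_search, fake_fetch, fake_summarize)
for item in report["summaries"]:
    print(item)
```

Returning a per-URL error lets the agent report partial results, which is one of several reasonable choices; failing the whole request would be the stricter alternative.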

---

## Step 4: Compare ReAct vs Pre-Planning

<!-- hint:buttons type="single" prompt="Which approach adapts better to surprises?" options="ReAct,Pre-planning,Both equally" -->

**Task:** For the task "Find the GDP of Japan and France, then compare them", sketch two approaches: (A) ReAct — interleave reasoning and tool calls. (B) Pre-plan — write a 3-step plan first, then execute. What are the tradeoffs?

**Question:** When might pre-planning fail? When might ReAct be inefficient?

**Checkpoint:** The user contrasts both approaches. They note: ReAct adapts to surprises, while a plan written up front can turn out wrong once real results arrive. ReAct may make redundant calls; pre-planning may miss steps.
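
The tradeoff can be illustrated with a toy comparison. Everything here is an assumption for the sake of the example: `lookup_gdp` is a stub tool, the figures in `FAKE_GDP` are placeholders and not real GDP data, and the "reasoning" in `run_react` is a hard-coded rule standing in for a model's thought step. The surprise is that the tool only knows France under a different name.

```python
# Placeholder figures for illustration only; these are NOT real GDP data.
FAKE_GDP = {"Japan": 100.0, "French Republic": 80.0}
ALIASES = {"France": "French Republic"}


def lookup_gdp(country):
    """Stub tool: returns None when the country name is not recognized."""
    return FAKE_GDP.get(country)


def run_preplanned(countries):
    """Write the whole plan first, then execute it without revising."""
    plan = [("lookup", c) for c in countries] + [("compare", None)]
    values = {}
    for step, arg in plan:
        if step == "lookup":
            values[arg] = lookup_gdp(arg)
        elif None in values.values():
            return "plan failed: missing data"
        else:
            return max(values, key=values.get)


def run_react(countries):
    """Decide each next action after observing the previous result."""
    values, calls = {}, 0
    for country in countries:
        value = lookup_gdp(country)
        calls += 1
        if value is None and country in ALIASES:
            # Thought: the lookup failed, so try an alternative name.
            value = lookup_gdp(ALIASES[country])
            calls += 1
        values[country] = value
    if None in values.values():
        return "unknown", calls
    return max(values, key=values.get), calls


print(run_preplanned(["Japan", "France"]))
print(run_react(["Japan", "France"]))
```

The fixed plan breaks on the unexpected miss, while the interleaved version recovers at the cost of an extra tool call.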

---

## Step 5: Map MCP to Your Use Case

<!-- hint:card type="concept" title="MCP (Model Context Protocol)" -->

**Task:** Pick a tool you'd want an agent to use (e.g., read files, query a DB, call an API). Describe how it would look as an MCP tool: name, description, inputs, outputs. What does MCP give you that a custom integration doesn't?

**Question:** How would MCP help if you had 5 different tools from 5 vendors?

**Checkpoint:** The user describes one MCP tool with its schema. They understand that MCP provides a standard interface, so any MCP client can use any MCP server.
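
As a rough sketch, an MCP tool descriptor carries a `name`, a `description`, and an `inputSchema` written in JSON Schema, and clients discover and invoke tools through JSON-RPC methods named `tools/list` and `tools/call`. The code below mimics those message shapes with plain dicts. It is not a real MCP server: there is no transport or session handling, the `read_file` tool and the in-memory `FAKE_FILES` store are hypothetical, and a real implementation would normally use an MCP SDK.

```python
import json

# Hypothetical tool descriptor in the shape an MCP server advertises.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File path"}},
        "required": ["path"],
    },
}

FAKE_FILES = {"notes.txt": "hello from notes"}  # in-memory stand-in for a disk


def handle_request(request: dict) -> dict:
    """Answer a JSON-RPC style request for tool discovery or invocation."""
    method = request["method"]
    if method == "tools/list":
        result = {"tools": [READ_FILE_TOOL]}
    elif method == "tools/call":
        params = request["params"]
        text = None
        if params["name"] == "read_file":
            text = FAKE_FILES.get(params["arguments"].get("path"))
        if text is None:
            result = {"content": [{"type": "text", "text": "not found"}], "isError": True}
        else:
            result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": request["id"], "error": error}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


listing = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle_request({
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "notes.txt"}},
})
print(json.dumps(listing, indent=2))
print(json.dumps(call, indent=2))
```

Because discovery and invocation use the same message shapes for every tool, a client written once can work with tools from several vendors without vendor-specific glue code.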

---

## Step 6: Add Safety Constraints

**Task:** Your agent can run shell commands and edit files. List 3 safety measures you'd implement. For each, say what it prevents and what the tradeoff is (e.g., slower, less flexible).

**Question:** Where would you require human approval? What would you never allow?

**Checkpoint:** The user proposes: sandbox (no network), allowlist (only certain commands), human approval for destructive ops. They can justify each and name a "never allow" (e.g., `rm -rf /`).
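
Two of these measures, the allowlist and the human-approval gate, can be sketched as a policy check that runs before any command is executed. The command sets and the function name `check_command` are illustrative assumptions. The check only classifies a command string; it does not execute anything and is not a substitute for a real sandbox.

```python
import shlex

ALLOWED = {"ls", "cat", "grep"}   # read-only commands that run without review
NEEDS_APPROVAL = {"rm", "mv"}     # destructive commands gated on a human
SHELL_METACHARACTERS = ";|&`$><"  # would let one command chain or redirect


def check_command(command: str, approved: bool = False) -> str:
    """Classify a command as 'allow', 'needs_approval', or 'deny'."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "deny"
    try:
        parts = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "deny"
    if not parts:
        return "deny"
    program, args = parts[0], parts[1:]
    if program == "rm" and "/" in args:
        return "deny"  # never allowed, even with approval
    if program in ALLOWED:
        return "allow"
    if program in NEEDS_APPROVAL:
        return "allow" if approved else "needs_approval"
    return "deny"  # default-deny anything not listed


for cmd in ["ls -la", "rm build.log", "rm -rf /", "ls; rm -rf ~", "curl example.com"]:
    print(f"{cmd!r:22} -> {check_command(cmd)}")
```

The tradeoff is visible in the last case: default-deny blocks a harmless `curl` until someone adds it to the list, which costs flexibility in exchange for a smaller attack surface.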

---

## Step 7: Design a Multi-Agent System

**Task:** Design a 3-agent system for "Analyze this codebase and suggest improvements." Define: (1) Orchestrator role, (2) Two specialist agents with distinct tools, (3) How they hand off work and merge results.

**Question:** What does the orchestrator need to "know" to delegate well? How do you avoid duplicate work?

**Checkpoint:** The user defines an orchestrator and 2 specialists (e.g., Analyzer + Writer). They describe the handoff (the orchestrator assigns files, the specialists return findings) and a simple merge strategy.
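
One way to picture the handoff is an orchestrator that routes each file to exactly one specialist and then merges what comes back. In this sketch the specialists are stub functions returning placeholder findings, and the routing table `ROUTES` (by file extension) is a hypothetical delegation rule; real agents would be model-backed and have their own tools.

```python
import os


def code_specialist(files):
    """Stub specialist: a real one would analyze the source code."""
    return [{"file": f, "agent": "code", "finding": "placeholder finding"} for f in files]


def docs_specialist(files):
    """Stub specialist: a real one would review the documentation."""
    return [{"file": f, "agent": "docs", "finding": "placeholder finding"} for f in files]


# What the orchestrator "knows": which specialist handles which kind of file.
ROUTES = {".py": code_specialist, ".md": docs_specialist}


def orchestrate(files):
    batches = {}
    unassigned = []
    # dict.fromkeys drops repeated paths while keeping order, so no file
    # is analyzed twice.
    for path in dict.fromkeys(files):
        specialist = ROUTES.get(os.path.splitext(path)[1])
        if specialist is None:
            unassigned.append(path)
        else:
            batches.setdefault(specialist, []).append(path)
    # Handoff: each specialist receives its batch and returns findings.
    findings = []
    for specialist, batch in batches.items():
        findings.extend(specialist(batch))
    # Merge: one combined list, sorted by file for a stable report.
    findings.sort(key=lambda item: item["file"])
    return {"findings": findings, "unassigned": unassigned}


report = orchestrate(["app.py", "README.md", "app.py", "logo.png"])
for finding in report["findings"]:
    print(finding)
print("unassigned:", report["unassigned"])
```

Assigning each file to a single owner is a simple way to avoid duplicate work; listing unassigned files makes gaps in the routing rule visible instead of silently skipping them.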