# RAG Fundamentals Walkthrough — Learn by Doing

## Before We Begin

**Diagnostic Question:** When you ask an LLM a question about your company docs, how does it "know" the answer? What if the answer wasn't in its training data?

**Checkpoint:** The user recognizes that RAG (Retrieval-Augmented Generation) lets an LLM use external knowledge: relevant chunks are fetched, added to the prompt, and the model answers from that context.

---

## Step 1: Chunk a Document

<!-- hint:code language="text" highlight="1,3" -->

**Task:** Take a 1–2 page document (or create one). Chunk it three ways: (a) fixed-size, 100 tokens per chunk; (b) by paragraph; (c) recursive (split on paragraphs first, then fall back to sentences for any paragraph that is still too long). For each strategy, note the chunk count, whether any chunk loses context, and where the boundaries feel wrong.

**Question:** When would fixed-size chunking fail? When would paragraph boundaries work better?

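**Checkpoint:** The user produces chunks for all three strategies. They observe that fixed-size chunking can split mid-sentence, while paragraph chunking preserves meaning but yields uneven chunk sizes; recursive chunking keeps sizes bounded while mostly avoiding mid-sentence cuts.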
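A minimal sketch of the three strategies in plain Python, assuming whitespace-separated words as a stand-in for tokens (a real, model-specific tokenizer will count differently). The function names and the sample document are made up for illustration; libraries such as LangChain and LlamaIndex provide their own splitters, whose rules may differ.

```python
import re

# Assumption: whitespace-separated words stand in for tokens.


def fixed_size_chunks(text: str, size: int = 100) -> list[str]:
    """Consecutive windows of `size` words, ignoring sentence and paragraph boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def paragraph_chunks(text: str) -> list[str]:
    """One chunk per blank-line-separated paragraph, whatever its length."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def recursive_chunks(text: str, max_words: int = 100) -> list[str]:
    """Paragraphs first; oversized paragraphs are split into sentences and
    packed greedily up to `max_words`. A single sentence longer than
    `max_words` falls back to fixed-size windows."""
    chunks: list[str] = []
    for para in paragraph_chunks(text):
        if len(para.split()) <= max_words:
            chunks.append(para)
            continue
        current: list[str] = []
        count = 0
        for sentence in re.split(r"(?<=[.!?])\s+", para):
            n = len(sentence.split())
            if current and count + n > max_words:
                chunks.append(" ".join(current))  # flush the packed sentences
                current, count = [], 0
            if n > max_words:
                chunks.extend(fixed_size_chunks(sentence, max_words))
                continue
            current.append(sentence)
            count += n
        if current:
            chunks.append(" ".join(current))
    return chunks


# Made-up sample: a 5-word paragraph, then a 180-word paragraph.
doc = "The first paragraph is short.\n\n" + "This sentence has six words here. " * 30

for name, splitter in [("fixed", fixed_size_chunks),
                       ("paragraph", paragraph_chunks),
                       ("recursive", recursive_chunks)]:
    chunks = splitter(doc)
    print(name, len(chunks), [len(c.split()) for c in chunks])
# fixed 2 [100, 85]
# paragraph 2 [5, 180]
# recursive 3 [5, 96, 84]
```

The first fixed-size chunk spans both paragraphs and ends on "words", mid-sentence; every recursive chunk ends on a sentence boundary.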

---

## Step 2: Understand Embeddings (Conceptually)

<!-- hint:card type="concept" title="Embeddings" -->

**Task:** Without coding, reason about similarity. For "How do I reset my password?", rank these by how semantically close they are: (A) "Password reset instructions", (B) "The sky is blue", (C) "Forgot password? Click here." Explain your ranking.

**Question:** Keyword search would correctly ignore (B), but why might it rank (C) well below (A)? Why do embeddings help?

**Checkpoint:** The user ranks (A) or (C) closest and (B) farthest. They understand that embeddings capture meaning, so "Forgot password?" lands near "reset my password" even though "forgot" and "reset" are different words.
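After reasoning it through by hand, you can check the contrast with a short Python sketch. The keyword score is a plain count of shared words. The "embedding" vectors are hypothetical three-number toys, hand-picked to mimic what a real embedding model might produce; they are not output from any actual model. Cosine similarity is a common way to compare embedding vectors.

```python
import math
import re


def word_set(text: str) -> set[str]:
    """Lowercased words with punctuation dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_score(query: str, doc: str) -> int:
    """Crude keyword match: number of distinct words shared with the query."""
    return len(word_set(query) & word_set(doc))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product over the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


query = "How do I reset my password?"
docs = {
    "A": "Password reset instructions",
    "B": "The sky is blue",
    "C": "Forgot password? Click here.",
}
print({k: keyword_score(query, d) for k, d in docs.items()})
# {'A': 2, 'B': 0, 'C': 1}

# Hypothetical toy "embeddings", hand-picked for illustration only.
# Assumed dimensions: [account/password topic, recovery intent, weather/nature].
toy_vectors = {
    "query": [0.9, 0.8, 0.0],
    "A": [0.9, 0.7, 0.0],
    "B": [0.0, 0.1, 0.9],
    "C": [0.8, 0.9, 0.1],
}
for k in "ABC":
    print(k, round(cosine(toy_vectors["query"], toy_vectors[k]), 2))
```

Keyword scoring puts (C) below (A) because only "password" matches. The toy vectors illustrate the embedding view, where both (A) and (C) sit close to the query and (B) does not.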

---

## Step 3: Design the RAG Pipeline

<!-- hint:diagram mermaid-type="flowchart" topic="RAG pipeline" -->

**Task:** Sketch the full pipeline for "Answer questions from our company handbook." List: (1) What gets indexed, (2) Chunking strategy, (3) How retrieval works, (4) How the prompt is augmented, (5) What the model receives.

**Question:** What metadata would you store with each chunk? How would you filter (e.g., by section, date)?

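**Checkpoint:** The user designs: handbook → chunk → embed → store; query → embed → retrieve → augment → generate. They propose metadata (e.g., section, page, last-updated date) and use it to filter chunks, for example restricting a question to one section or excluding outdated pages.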
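One way to carry metadata is to store it next to each chunk and filter on it before (or after) similarity ranking. The sketch below is plain Python; the `Chunk` fields, the `filter_chunks` helper, and the handbook snippets are all hypothetical. Vector databases typically expose their own metadata-filter syntax for this instead.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Chunk:
    text: str
    section: str   # handbook chapter the chunk came from
    page: int
    updated: date  # last revision date of the source page


def filter_chunks(chunks: list[Chunk],
                  section: Optional[str] = None,
                  updated_since: Optional[date] = None) -> list[Chunk]:
    """Keep chunks matching every filter that is set; None means 'no filter'."""
    return [
        c for c in chunks
        if (section is None or c.section == section)
        and (updated_since is None or c.updated >= updated_since)
    ]


# Made-up handbook chunks for illustration.
handbook = [
    Chunk("Vacation requests need manager approval.", "Leave", 4, date(2024, 1, 15)),
    Chunk("Old policy: vacation is approved by HR.", "Leave", 5, date(2021, 3, 1)),
    Chunk("Laptops are replaced every three years.", "IT", 12, date(2023, 9, 30)),
]

# Only the surviving chunks would go on to embedding similarity search.
candidates = filter_chunks(handbook, section="Leave", updated_since=date(2023, 1, 1))
print([(c.section, c.page) for c in candidates])  # [('Leave', 4)]
```

Filtering first keeps an outdated page (page 5 here) from ever competing on similarity with the current policy.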

---

## Step 4: Tune Top-k

<!-- hint:buttons type="single" prompt="When would you choose 3 chunks vs 10?" options="3 for focus,10 for breadth,Depends on doc size" -->

**Task:** If you retrieve 3 chunks vs. 10 chunks, what changes? List pros and cons. When would you choose 3? When 10? What problem does re-ranking address?

**Question:** Why might more chunks hurt? How does re-ranking help?

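**Checkpoint:** The user explains: 3 chunks = focused, but may miss needed context; 10 chunks = broader, but may add noise that distracts the model and consume more of the context window. Re-ranking rescores a larger candidate set with a more accurate (typically slower) model, such as a cross-encoder, so the few chunks passed to the LLM are the most relevant ones.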
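A sketch of the retrieve-then-re-rank pattern: a cheap first stage keeps `n_candidates` chunks, a more expensive scorer reorders them, and only the top `k` reach the LLM. Both scorers here are hypothetical word-overlap toys; in practice the first stage would be vector similarity and the re-ranker is often a cross-encoder model.

```python
import re
from typing import Callable

Scorer = Callable[[str, str], float]


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def word_pairs(ws: list[str]) -> set[tuple[str, str]]:
    """Adjacent word pairs (bigrams), which capture some word order."""
    return set(zip(ws, ws[1:]))


def first_stage(query: str, doc: str) -> float:
    """Cheap toy score: distinct shared words (stand-in for vector similarity)."""
    return len(set(words(query)) & set(words(doc)))


def toy_reranker(query: str, doc: str) -> float:
    """Hypothetical 'expensive' scorer (stand-in for a cross-encoder):
    shared words plus a bonus for each shared word pair."""
    return first_stage(query, doc) + 2 * len(word_pairs(words(query)) & word_pairs(words(doc)))


def retrieve_then_rerank(query: str, docs: list[str], n_candidates: int, k: int,
                         first: Scorer = first_stage,
                         rerank: Scorer = toy_reranker) -> list[str]:
    # Stage 1: rank everything cheaply and keep a generous candidate pool.
    pool = sorted(docs, key=lambda d: first(query, d), reverse=True)[:n_candidates]
    # Stage 2: re-score only the pool with the expensive scorer; keep the top k.
    return sorted(pool, key=lambda d: rerank(query, d), reverse=True)[:k]


docs = [
    "My password reset link expired",
    "Password policy: passwords expire every 90 days",
    "To reset my password, open Settings",
    "The sky is blue",
]
query = "reset my password"
print(sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[0])
# My password reset link expired
print(retrieve_then_rerank(query, docs, n_candidates=3, k=1)[0])
# To reset my password, open Settings
```

The re-ranker only sees the candidate pool: if the right chunk misses the first stage, re-ranking cannot recover it, which is why `n_candidates` is usually larger than `k`.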

---

## Step 5: Build a Minimal RAG (Hands-On)

**Task:** Using a library (LangChain, LlamaIndex, or similar), build a minimal RAG: load 2–3 paragraphs, chunk them, embed, store in an in-memory vector store. Query and print the retrieved chunks. Then augment a prompt and call an LLM. Confirm the answer cites the retrieved text.

**Question:** What would you add for production? (Persistence? Metadata? Evaluation?)

**Checkpoint:** The user has a working pipeline: load → chunk → embed → store → query → retrieve → augment → generate. They can list next steps (persistence, eval, etc.).
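A library-free sketch of the same pipeline, useful for seeing every stage at once. It assumes a bag-of-words "embedding" (word counts compared with cosine similarity) as a stand-in for a real embedding model, and a plain Python list as the in-memory vector store. `InMemoryStore`, `build_prompt`, and the sample text are hypothetical, and the LLM call is left to whichever client you use. In LangChain or LlamaIndex, the same stages correspond to their loader, splitter, embedding, and vector-store components.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Hypothetical minimal vector store: a list of (vector, text) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Augment the question with numbered context so the answer can cite it."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below and cite chunk numbers like [1]. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Load: made-up handbook text. Chunk: one chunk per paragraph.
text = """Employees get 20 vacation days per year. Unused days roll over once.

Expense reports are due within 30 days of purchase. Attach all receipts.

Remote work requires manager approval and a signed equipment agreement."""
store = InMemoryStore()
for paragraph in text.split("\n\n"):
    store.add(paragraph.strip())  # embed + store

question = "How many vacation days do employees get?"
retrieved = store.search(question, k=1)  # query + retrieve
print(retrieved)
prompt = build_prompt(question, retrieved)  # augment
print(prompt)
# Generate: send `prompt` to your LLM client (not shown), then check that the
# answer cites [1] and agrees with the retrieved text.
```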

---

## Step 6: Identify a Pitfall

**Task:** A RAG system returns wrong answers. It retrieves 5 chunks and passes them to the model. List 4 possible causes (chunking, retrieval, prompt, model) and one fix for each.

**Question:** How would you debug? What would you measure first?

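**Checkpoint:** The user identifies: bad chunks (fix: better chunking), irrelevant retrieval (fix: tune k, add a similarity threshold, re-rank), a weak prompt (fix: instruct the model to answer only from the provided context), or the model ignoring context (fix: stronger instructions, plus evaluating whether answers are faithful to the retrieved text). To debug, they measure retrieval first: if the chunk containing the answer is not among the 5 retrieved, no prompt or model change can fix the answer.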
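One simple retrieval metric is hit rate@k: the fraction of test questions whose relevant chunk appears in the top k results. A minimal Python sketch, assuming a small hand-labelled set of questions, each paired with the ID of the chunk that answers it; the keyword retriever, chunk IDs, and questions are all hypothetical.

```python
import re
from typing import Callable


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


# Hypothetical chunk store keyed by chunk ID.
chunks = {
    "exp-1": "Expense reports are due within 30 days.",
    "it-1": "Laptops are replaced every three years.",
    "leave-1": "Employees get 20 vacation days per year.",
}


def keyword_retrieve(question: str) -> list[str]:
    """Toy retriever: chunk IDs ranked by shared-word count (ties keep insertion order)."""
    q = words(question)
    return sorted(chunks, key=lambda cid: len(q & words(chunks[cid])), reverse=True)


def hit_rate_at_k(eval_set: list[tuple[str, str]],
                  retrieve: Callable[[str], list[str]], k: int) -> float:
    """Fraction of questions whose gold chunk ID appears in the top-k results."""
    hits = sum(gold in retrieve(question)[:k] for question, gold in eval_set)
    return hits / len(eval_set)


# Hand-labelled (question, gold chunk ID) pairs, made up for illustration.
eval_set = [
    ("How many vacation days do I get?", "leave-1"),
    ("When are expense reports due?", "exp-1"),
    ("How often are laptops replaced?", "it-1"),
    ("Can I carry over unused leave?", "leave-1"),  # shares no words with leave-1
]

for k in (1, 3):
    print(k, hit_rate_at_k(eval_set, keyword_retrieve, k))
# 1 0.75
# 3 1.0
```

The last question misses at k=1 because it shares no words with the chunk that answers it, the same vocabulary gap discussed in Step 2. A low hit rate points at chunking or retrieval; a high hit rate with wrong answers points at the prompt or the model.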
