# RAG Fundamentals Exercises

## Exercise 1: Choose a Chunking Strategy

**Task:** For each document type, pick a chunking strategy and justify in one sentence: (a) Legal contract, (b) FAQ page, (c) Codebase documentation, (d) Blog posts.

**Validation:**
- [ ] Legal: semantic/paragraph (preserve clause boundaries)
- [ ] FAQ: by Q&A pair (natural unit)
- [ ] Code docs: semantic by function/section
- [ ] Blog: paragraph or recursive

**Hints:**
1. Legal: clauses matter; don't split mid-clause
2. FAQ: each Q&A is a natural chunk
3. Code: function, class, or section
4. Blog: paragraphs or sections

---

## Exercise 2: Design Metadata for Retrieval

**Task:** You're indexing a product docs site. What metadata would you store with each chunk? How would you use it at query time? List 4 metadata fields and one filter scenario for each.

**Validation:**
- [ ] At least: source, section, product, last_updated
- [ ] Each has a filter use case (e.g., "only v2 docs", "only API reference")

**Hints:**
1. `source`: "Which file/page"
2. `section`: "API, Getting Started, etc."
3. `product`: "Product A, B"
4. `last_updated`: Filter stale content

---

## Exercise 3: Compare Keyword vs Semantic Search

**Task:** Write a query that keyword search would handle well, and one that semantic search would handle better. For each, explain why. Example: "exact error code XYZ" vs "how to fix connection timeout".

**Validation:**
- [ ] Keyword: exact phrase, specific term, part number
- [ ] Semantic: paraphrased, conceptual, "how do I...?"

**Hints:**
1. Keyword good: "ERROR_CODE_404", "API v2.1"
2. Semantic good: "payment failed" ≈ "transaction declined", "troubleshoot slow API"

---

## Exercise 4: Write an Augmented Prompt Template

**Task:** Write a prompt template for RAG. Placeholders: `{context}` (retrieved chunks), `{question}` (user query). Include instructions: use only the context, say "I don't know" if the answer isn't there, cite the source when possible.

**Validation:**
- [ ] Has {context} and {question}
- [ ] Instructs to use only context
- [ ] Handles "not in context" case
- [ ] Asks for citation/source when possible

**Hints:**
1. "Use ONLY the following context. If the answer isn't there, say so."
2. "When possible, cite which part of the context supports your answer."
3. Template: "Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"

---

## Exercise 5: Propose Evaluation Metrics

**Task:** For a customer-support RAG bot, propose 3 metrics you'd track. For each: name, what it measures, how you'd compute it (human eval, model-as-judge, or automated).

**Validation:**
- [ ] Faithfulness: answer grounded in context (model-as-judge or human)
- [ ] Relevance: retrieved chunks match query (human or relevance model)
- [ ] Correctness: answer is factually right (human or comparison to gold)

**Hints:**
1. Faithfulness: "Does the answer only use the provided context?"
2. Relevance: "Do retrieved chunks address the question?"
3. Correctness: "Is the answer factually correct?"
