# LLM Fundamentals Quick Reference

## Transformer Pipeline

```
Input → Tokenize → Embed → Attention → Generate → Detokenize → Output
```

## Token Basics

| Term | Rule of thumb / meaning |
|---------|-------|
| ~1 token | ~4 characters (English) |
| ~1 token | ~0.75 words |
| BPE | Byte Pair Encoding (common tokenization) |
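The two rules of thumb above can be turned into a rough token estimator. This is a minimal sketch, not a real tokenizer: `estimate_tokens` and both constants are illustrative names, and actual counts depend on the model's tokenizer and on the language of the text.

```python
import math

CHARS_PER_TOKEN = 4.0    # rule of thumb from the table (English text)
WORDS_PER_TOKEN = 0.75   # rule of thumb from the table


def estimate_tokens(text: str) -> int:
    """Rough token estimate: average of the character- and word-based rules."""
    if not text:
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Round up so the estimate errs on the side of using more budget.
    return math.ceil((by_chars + by_words) / 2)
```

Use it for quick budgeting only; for billing or hard limits, count with the tokenizer that ships with the model.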

## Context Window

- Max tokens (prompt + response) the model can process in one request
- Exceeded → request rejected or oldest tokens dropped, depending on the API or application
- Typical sizes (tokens): 4K–8K (small), 32K–128K (medium), 200K+ (large)
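One common way an application stays inside the window is to drop the oldest messages first. The sketch below assumes a chat history stored as a list of strings; `trim_to_budget` and its default counter (about 4 characters per token) are hypothetical helpers, and a real application would pass in the model's own token counter.

```python
def trim_to_budget(messages: list[str], max_tokens: int, count_tokens=None) -> list[str]:
    """Keep the most recent messages whose combined token count fits max_tokens."""
    if count_tokens is None:
        # Hypothetical stand-in: ~4 characters per token, rounded up.
        count_tokens = lambda text: -(-len(text) // 4)
    kept = []
    used = 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break                        # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()                       # restore chronological order
    return kept
```

Remember to leave room for the response: the budget passed as `max_tokens` should be the window size minus the tokens reserved for output.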

## Temperature

| Range | Effect |
|-------|--------|
| 0 | Greedy decoding (always the most likely token); near-deterministic in practice |
| 0.3–0.5 | Slight variation, factual tasks |
| 0.7–1.0 | Creative, varied output |
| >1 | Increasingly random; can turn incoherent at high values |
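Mechanically, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below shows that standard softmax-with-temperature step in plain Python; the function names are illustrative, and the logits in the usage note are made up.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by temperature (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index: argmax at temperature 0, weighted sampling otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

With made-up logits such as `[2.0, 1.0, 0.1]`, a temperature of 0.5 concentrates probability on the first token, while 2.0 spreads it more evenly across all three.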

## Training vs Inference

| Phase | Purpose |
|-------|---------|
| Training | Learn from data; update weights |
| Inference | Generate output; weights fixed; hosted APIs typically bill per token |
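Per-token billing makes inference cost simple arithmetic. The sketch below assumes separate prices for input and output tokens, quoted per million tokens, which is a common but not universal pricing shape; `estimate_cost` is an illustrative helper and the prices in the usage note are placeholders, not real rates.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the cost of one request, in the same currency as the prices."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost
```

For example, with placeholder prices of 2.0 (input) and 10.0 (output) per million tokens, a request with 1,000,000 prompt tokens and 500,000 completion tokens would cost 2.0 + 5.0 = 7.0.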

## Fine-Tuning vs Prompting

| Approach | When to use |
|----------|-------------|
| Prompting | Fast, flexible iteration; no change to model weights |
| Fine-tuning | Domain-specific behavior; requires training examples and updates weights |

## Why Hallucination?

- Model completes patterns; doesn't verify facts
- Confident ≠ correct
- Mitigate: RAG (retrieval-augmented generation), output verification, lower temperature
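The RAG idea can be sketched in a few lines: fetch relevant passages, then instruct the model to answer only from them. This is a toy illustration under stated assumptions: `retrieve` ranks by shared words, where a real system would more likely use embeddings and a vector index, and both function names and the prompt wording are hypothetical.

```python
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many question words they share (toy retriever)."""
    question_words = set(question.lower().split())
    scored = [(len(question_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [doc for score, doc in scored[:k] if score > 0]


def build_grounded_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Prepend retrieved passages and ask the model to answer only from them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )
```

The resulting string is what gets sent to the model; grounding the answer in retrieved text reduces, but does not eliminate, unsupported claims.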

## Model Families

- **GPT** — OpenAI
- **Claude** — Anthropic
- **Llama** — Meta (open weights)
- **Gemini** — Google

## One-Liners

- **Tokens** — Subword units; drive cost and context limits.
- **Attention** — The model scores how relevant each earlier token is when predicting the next one.
- **Context window** — Max tokens per request; overflow is truncated or rejected.
- **Hallucination** — Confident pattern completion not grounded in verified facts.
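The Attention one-liner can be made concrete with scaled dot-product attention, the form used in standard transformers. The sketch below handles a single query in plain Python; real models apply this across many heads and layers with learned projections, and the function names here are illustrative.

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weighted average of value vectors, using the attention weights."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

A key that points in the same direction as the query gets a larger weight, so its value vector contributes more to the output.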
