# LLM Fundamentals Quiz

## Question 1

What is a token?

A) A security credential for API access
B) A subword unit used to represent text — words or parts of words
C) A type of neural network layer
D) A unit of time in model training

<!-- ANSWER: B -->
<!-- EXPLANATION: Tokens are subword units (words, word pieces, punctuation) that the model uses to represent text. Tokenization splits input into tokens before processing. -->

## Question 2

What happens when your prompt plus response exceeds the context window?

A) The model automatically switches to a larger model
B) The model runs faster
C) Older tokens are dropped or truncated; the model can't use them
D) The model compresses the text to fit

<!-- ANSWER: C -->
<!-- EXPLANATION: When the context window is exceeded, earlier tokens are typically dropped or truncated, and the model cannot attend to anything outside its context limit. Some APIs reject an over-length request with an error instead of truncating. -->
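
The truncation behavior described in this question can be illustrated with a minimal sketch. It assumes the simplest policy (keep the most recent tokens, drop the oldest); the function name `truncate_to_window` and the integer token IDs are hypothetical, and real systems may instead summarize, slide a window differently, or return an error.

```python
def truncate_to_window(token_ids, max_tokens):
    """Keep only the most recent max_tokens tokens, dropping the oldest.

    Illustrative policy only; not how any particular API behaves.
    """
    if max_tokens <= 0:
        # Guard: token_ids[-0:] would return the whole list.
        return []
    return token_ids[-max_tokens:]


history = list(range(10))               # stand-in token IDs 0..9
print(truncate_to_window(history, 4))   # [6, 7, 8, 9]
```

Anything cut off this way is invisible to the model, which is why instructions placed early in a long conversation can stop having an effect.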

## Question 3

Why do LLMs hallucinate?

A) They are programmed to make things up
B) They complete patterns confidently without verifying facts
C) They only hallucinate when temperature is 0
D) Hallucination is always due to training data errors

<!-- ANSWER: B -->
<!-- EXPLANATION: LLMs predict the next token based on patterns. They can confidently complete plausible-sounding text that isn't true because they don't have a mechanism to verify facts. -->

## Question 4

Match the transformer stage to its role:

<!-- VISUAL: quiz-matching -->

A) Tokenize — Split text into subword units
B) Embed — Map tokens to vectors
C) Attention — Weight which tokens matter for prediction
D) Generate — Predict next token

<!-- ANSWER: All match correctly -->
<!-- EXPLANATION: Tokenize splits input; Embed converts to vectors; Attention computes relevance; Generate produces the next token. -->

## Question 5

What does RLHF stand for and what is it used for?

A) Reinforcement Learning from Human Feedback — aligning model behavior with human preferences
B) Random Language Hyperparameter Finetuning — tuning model settings
C) Recursive Layer Hidden Framework — a type of architecture
D) Real-time Language Human Filter — content moderation

<!-- ANSWER: A -->
<!-- EXPLANATION: RLHF uses human feedback to train a reward model, then optimizes the policy toward that reward. It helps align model behavior with human values (helpful, harmless, honest). -->

## Question 6

What is the approximate token-to-character ratio for English text?

A) 1 token = 1 character
B) 1 token ≈ 4 characters
C) 1 token = 1 word
D) 1 token ≈ 10 characters

<!-- ANSWER: B -->
<!-- EXPLANATION: In English, 1 token is roughly 4 characters, or about 0.75 words (roughly 1.3 tokens per word). This varies by tokenizer, language, and content type. -->
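
As a rough worked example of that rule of thumb, the sketch below estimates a token count from character length. The constant `CHARS_PER_TOKEN` and the helper `estimate_tokens` are assumptions made for illustration; an actual tokenizer will give different counts, especially for code or non-English text.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb for English; an approximation, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens("Hello, world!"))  # 13 characters -> 4
```

An estimate like this is only suitable for quick budgeting; for billing or hard context limits, count with the model's own tokenizer.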

## Question 7

<!-- VISUAL: quiz-matching -->

Match each LLM concept to its description:

A) Context window → 1) Maximum sequence length the model can process at once
B) Temperature → 2) Subword units used to represent text
C) Tokens → 3) Controls randomness; higher = more varied outputs
D) Hallucination → 4) Model generating plausible but false or irrelevant content

<!-- ANSWER: A1,B3,C2,D4 -->
<!-- EXPLANATION: Context window is the max sequence length. Temperature controls output randomness. Tokens are subword units. Hallucination is confident generation of false or irrelevant content. -->

## Question 8

<!-- VISUAL: quiz-matching -->

Match each model parameter to its typical effect:

A) Temperature = 0 → 1) More deterministic, focused output
B) Temperature = 1 → 2) More creative, varied output
C) Top-p (nucleus) low → 3) Fewer token choices; more focused
D) Top-p (nucleus) high → 4) Broader token sampling; more diversity

<!-- ANSWER: A1,B2,C3,D4 -->
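<!-- EXPLANATION: Temperature 0 is effectively greedy decoding, so outputs are near-deterministic; higher temperature flattens the distribution and increases variety. Low top-p restricts sampling to the smallest set of high-probability tokens whose cumulative probability reaches p; high top-p allows broader sampling. -->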
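
The interaction between temperature and top-p can be sketched in plain Python. This is a simplified illustration, not the implementation of any specific library: the function names are hypothetical, the logits are made up, and temperature 0 is treated as greedy decoding by assumption.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy (all mass on the top token).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized over the nucleus


def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index using temperature scaling, then nucleus filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1]                 # made-up scores for three tokens
print(sample(logits, temperature=0))     # 0 (always the top token)
```

With these logits, `top_p=0.5` keeps only the top token, while `top_p=0.8` keeps the top two, which is the "fewer choices versus broader sampling" contrast the question describes.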