---
title: "Hands-on Tutorial"
subtitle: "Analyzing Political Ideology in Speeches"
author: "Seraphine F. Maerz"
date: today
format:
  html:
    theme: cosmo
    toc: true
    toc-depth: 3
    code-fold: false
    code-tools: true
    highlight-style: github
---

![](pics/logo.png){width=20%}

<a href="https://cdn.jsdelivr.net/gh/quantilab/quantilab.github.io@main/sharezone/brisbane/quallmer_tutorial.qmd" download="quallmer_tutorial.qmd">Download the tutorial file (.qmd)</a>

# Welcome!

This tutorial walks you through the **complete quallmer workflow in 5 steps**, using ideology detection in political speeches as our running example.

::: {.callout-tip}
## The 5-Step Workflow

| Step | Function | Purpose |
|------|----------|---------|
| **1** | `qlm_codebook()` | Define your coding scheme |
| **2** | `qlm_code()` | Apply LLM coding to texts |
| **3** | `qlm_replicate()` | Test robustness across models/settings |
| **4** | `qlm_compare()` / `qlm_validate()` | Assess reliability and validity |
| **5** | `qlm_trail()` | Create audit documentation |
:::

------------------------------------------------------------------------

# Getting Started

## Install Required Packages

```{r}
#| eval: false

# Install quallmer from CRAN
install.packages("quallmer")

# Other packages we'll use
install.packages("quanteda")   # For sample corpus
install.packages("dplyr")      # For data manipulation
```

## Load Packages

```{r}
#| eval: false
#| message: false
#| warning: false

library(quallmer)
library(quanteda)
library(dplyr)
```

## Set Up Your API Key

::: {.callout-important}
## API Key Required

You need an OpenAI API key to run this tutorial. Get one at [platform.openai.com](https://platform.openai.com).
:::

```{r}
#| eval: false

# Option 1: Set in your R session
Sys.setenv(OPENAI_API_KEY = "your-api-key-here")

# Option 2 (recommended): Add to your .Renviron file
# Run: usethis::edit_r_environ()
# Add: OPENAI_API_KEY=your-api-key-here
```

## Load Sample Data

We'll use US inaugural speeches from the `quanteda` package -- a small corpus perfect for learning.

```{r}
#| eval: false

# Load the five most recent inaugural speeches
inaugural_texts <- as.character(quanteda::data_corpus_inaugural[56:60])
names(inaugural_texts) <- names(quanteda::data_corpus_inaugural[56:60])

# Check what we have
names(inaugural_texts)
# [1] "2009-Obama" "2013-Obama" "2017-Trump" "2021-Biden" "2025-Trump"

# Preview one speech
substr(inaugural_texts[1], 1, 300)
```

------------------------------------------------------------------------

# Step 1: Define Your Codebook

The codebook tells the LLM **what to look for** and **how to code it**. This is the most important step -- take time to craft clear instructions!

## The `qlm_codebook()` Function

```{r}
#| eval: false

# Create the codebook
ideology_codebook <- qlm_codebook(
  name = "Ideological Scaling",

  role = "You are an expert political scientist performing ideological text scaling.",

  instructions = "Read each text carefully. Place the text on a -5 to +5 scale
    for the inclusive-exclusive ideological dimension.

    INCLUSIVE language (-5): Emphasizes equal rights, diversity, pluralism,
    and protection of minorities.

    EXCLUSIVE language (+5): Emphasizes exclusion of groups, national homogeneity,
    and restricting rights.

    Score 0 = neutral or mixed rhetoric.",

  schema = type_object(
    score = type_integer(
      "Ideological position (-5 = inclusive, +5 = exclusive)"
    ),
    explanation = type_string(
      "Brief justification for the assigned score, referring to specific text elements"
    )
  )
)
```

## Understanding the Components

| Component | Purpose | Our Example |
|-----------|---------|-------------|
| `name` | Identifies the codebook | "Ideological Scaling" |
| `role` | Sets the LLM's perspective | "Expert political scientist" |
| `instructions` | Tells the LLM what to do | Dimension definition + scoring criteria |
| `schema` | Defines output format | Score (-5 to +5) + explanation |

::: {.callout-tip}
## Tips for Good Codebooks

1. **Be specific** -- Define categories and scales clearly
2. **Provide context** -- Explain what each score means
3. **Include explanations** -- Always ask for reasoning (helps you validate!)
4. **Iterate** -- Test with a few examples and refine
:::

## Schema Options

The `schema` defines **what the LLM returns** (see [ellmer type specifications](https://ellmer.tidyverse.org/reference/index.html)):

| Type | Use Case | Example |
|------|----------|---------|
| `type_boolean()` | Yes/no questions | TRUE/FALSE |
| `type_integer()` | Whole number scores | Score from -5 to +5 |
| `type_number()` | Decimal values | Confidence score 0.0 to 1.0 |
| `type_string()` | Text/explanations | "Brief justification" |
| `type_enum()` | Fixed categories | c("positive", "negative", "neutral") |
| `type_array()` | Lists of items | Named entities, themes |
| `type_object()` | Structured data | Combine multiple fields |

------------------------------------------------------------------------

# Step 2: Code Your Data

Now we apply the codebook to our texts using `qlm_code()`.

## Run the Analysis

```{r}
#| eval: false

# Apply the codebook to inaugural speeches
coded_run1 <- qlm_code(
  inaugural_texts,
  codebook = ideology_codebook,
  model = "openai/gpt-4o-mini",
  name = "run1_ideology"
)

# View results
coded_run1
```

## Understanding the Output

The result is a `qlm_coded` object containing:

- **Coding results**: Score and explanation for each text
- **Metadata**: Model used, timestamps, codebook reference
- **Provenance**: Links to parent analyses (for replication)

```{r}
#| eval: false

# View as a data frame
as.data.frame(coded_run1)

# Access specific columns
coded_run1$score
coded_run1$explanation
```

::: {.callout-note}
## Your Turn

1. Run the code above
2. Look at the scores -- do they match your intuition?
3. Read the explanations -- are they reasonable?
:::

------------------------------------------------------------------------

# Step 3: Replicate

LLMs are not 100% reproducible. Use `qlm_replicate()` to test consistency and robustness.

## Same Settings (Test Reproducibility)

```{r}
#| eval: false

# Replicate with identical settings
coded_run2 <- qlm_replicate(
  coded_run1,
  name = "run2_same_settings"
)

coded_run2
```

## Different Temperature (Test Sensitivity)

```{r}
#| eval: false

# Higher temperature = more variation
coded_run3 <- qlm_replicate(
  coded_run1,
  params = params(temperature = 0.9),
  name = "run3_high_temp"
)

coded_run3
```

## Different Model (Test Cross-Model Consistency)

::: {.callout-note}
## Using Ollama for Local LLMs

To use Ollama models, first install Ollama from [ollama.com](https://ollama.com), then pull the model in R:

```r
install.packages("rollama")
rollama::pull_model("llama3.2:1b")
```

Ollama runs locally -- no API key needed, and your data stays on your machine.
:::

```{r}
#| eval: false

# Try a local open-source model via Ollama
coded_run4 <- qlm_replicate(
  coded_run1,
  model = "ollama/llama3.2:1b",
  name = "run4_llama"
)

coded_run4
```

::: {.callout-tip}
## Why Replicate?

- **Same settings** → Tests LLM consistency
- **Different temperature** → Tests sensitivity to randomness
- **Different models** → Tests robustness across LLMs
- **Multiple runs** → Builds confidence in your results
:::
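If you want several same-setting runs in one go, the replications can be scripted with base R. This sketch reuses only the `qlm_replicate()` call shown above; the run names are illustrative.

```{r}
#| eval: false

# Three same-setting replications in one call; names are illustrative
reruns <- lapply(1:3, function(i) {
  qlm_replicate(coded_run1, name = paste0("rerun_same_", i))
})
```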

------------------------------------------------------------------------

# Step 4: Compare and Validate

Now we assess how well our codings agree -- both across LLM runs (reliability) and against human standards (validity).

## Intercoder Reliability with `qlm_compare()`

Compare multiple LLM runs to measure agreement:

```{r}
#| eval: false

# Compare all four runs
comparison <- qlm_compare(
  coded_run1,
  coded_run2,
  coded_run3,
  coded_run4,
  by = "score",
  level = "ordinal"
)

# View results
print(comparison)
```

## Understanding the Metrics

| Metric | What It Measures | Good Value |
|--------|------------------|------------|
| Krippendorff's alpha | Overall agreement | > 0.80 |
| Fleiss' kappa | Multi-rater agreement | > 0.60 |
| Percent agreement | Simple agreement | > 80% |

::: {.callout-note}
## Interpreting Reliability

| Value | Agreement Level |
|-------|-----------------|
| < 0.40 | Poor |
| 0.40 - 0.60 | Moderate |
| 0.60 - 0.80 | Substantial |
| > 0.80 | Almost perfect |
:::
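The thresholds in the table are easy to apply programmatically. This small base-R helper (not part of quallmer) maps a coefficient to the labels above.

```{r}
#| eval: false

# Base-R helper (not part of quallmer): label agreement coefficients
interpret_agreement <- function(x) {
  cut(x,
      breaks = c(-Inf, 0.40, 0.60, 0.80, Inf),
      labels = c("Poor", "Moderate", "Substantial", "Almost perfect"),
      right = FALSE)
}

as.character(interpret_agreement(0.72))
# [1] "Substantial"
```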

## Gold Standard Validation with `qlm_validate()`

If you have human-coded data, validate against it:

```{r}
#| eval: false

# Example: Create a gold standard (normally from human coders)
gold_scores <- data.frame(
  .id = names(inaugural_texts),
  score = c(-3, -4, 4, -2, 1)  # Your human-coded scores
)
gold_standard <- as_qlm_coded(gold_scores, name = "human_gold")

# Validate LLM against gold standard
validation <- qlm_validate(
  coded_run1,
  gold = gold_standard,
  by = "score",
  level = "ordinal"
)

print(validation)
```

## Manual Review with quallmer.app

For hands-on validation, use the interactive Shiny app:

```{r}
#| eval: false

# Install and launch the app
install.packages("quallmer.app")
library(quallmer.app)
qlm_app()
```

The app allows you to:

- Review LLM-generated scores and explanations
- Mark annotations as valid/invalid
- Add your own codes for comparison
- Calculate agreement metrics

------------------------------------------------------------------------

# Step 5: Create Audit Trail

Document everything for transparency and reproducibility with `qlm_trail()`.

## Generate Documentation

```{r}
#| eval: false

# Create audit trail from all runs
qlm_trail(
  coded_run1,
  coded_run2,
  coded_run3,
  coded_run4,
  path = "ideology_analysis"
)
```

This creates two files:

- `ideology_analysis.rds` -- Complete R object (all data, reloadable)
- `ideology_analysis.qmd` -- Quarto report (human-readable documentation)
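The `.rds` file can be reloaded in a later session with base R. The structure of the returned object is defined by quallmer, so inspect it with `str()` after loading.

```{r}
#| eval: false

# Reload the complete analysis object in a later session (base R)
trail <- readRDS("ideology_analysis.rds")
str(trail, max.level = 1)
```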

## What's in the Audit Trail?

Following Lincoln & Guba's (1985) trustworthiness framework:

| Component | What It Documents |
|-----------|-------------------|
| **Codebook** | Exact instructions given to the LLM |
| **Model settings** | Model name, temperature, parameters |
| **All inputs** | The texts that were coded |
| **All outputs** | Scores and explanations |
| **Timestamps** | When each analysis was run |
| **Provenance** | Parent-child relationships between runs |
| **Session info** | Package versions, R environment |

------------------------------------------------------------------------

# Key Takeaways

::: {.callout-tip}
## Remember

- **Codebooks are crucial** -- Clear instructions = better results
- **Always replicate** -- LLMs are not 100% reproducible
- **Validation is essential** -- LLMs produce language, not truth
- **Document everything** -- Audit trails ensure transparency
:::

------------------------------------------------------------------------

# Exercises

## Exercise 1: Create Your Own Codebook

Try a different ideological dimension:

```{r}
#| eval: false

# Example: Populist rhetoric
populist_codebook <- qlm_codebook(
  name = "Populist Rhetoric",
  role = "You are a political scientist analyzing populist language.",
  instructions = "Score the text on populist rhetoric (0 = not populist, 5 = highly populist).
    Populist rhetoric includes: anti-elite sentiment, appeals to 'the people',
    us-vs-them framing, claims of representing the silent majority.",
  schema = type_object(
    score = type_integer("Populism score from 0 to 5"),
    explanation = type_string("Brief justification")
  )
)

# Apply to your data
coded_populist <- qlm_code(
  inaugural_texts,
  codebook = populist_codebook,
  model = "openai/gpt-4o-mini"
)
```

## Exercise 2: Full Workflow Practice

Run the complete 5-step workflow on your own texts:

1. Create a codebook for your research question
2. Code your data with `qlm_code()`
3. Replicate with at least 2 different settings
4. Compare runs with `qlm_compare()`
5. Generate an audit trail

------------------------------------------------------------------------

# Resources

- **Package website:** [quallmer.github.io/quallmer](https://quallmer.github.io/quallmer)
- **My Instats workshops (including fine-tuning LLMs):** [Instats Seminars](https://instats.org/expert/seraphine-maerz-2?view=Seminars)
- **Contact:** [seraphinem.github.io](https://seraphinem.github.io)

------------------------------------------------------------------------

<footer>
Copyright © 2026 by [Seraphine F. Maerz](https://seraphinem.github.io/). This page is built with [GitHub Copilot](https://github.com/features/copilot) and [Quarto](https://quarto.org/).
</footer>
