# dicerollerts

A TypeScript library for parsing, rolling, and analyzing dice expressions, plus
a small program language for tabletop-RPG-flavoured automation. Supports
standard dice notation, custom dice, dice pools, exploding/rerolling/compound
mechanics, structured-face dice, parametric dice, and full probability
analysis (exact distributions, joint distributions, and adaptive Monte Carlo).

## Install

```bash
npm install dicerollerts
```

## Quick Start

```ts
import { ProgramParser, Evaluator } from 'dicerollerts'

const result = ProgramParser.parse('4d6 drop 1')
if (result.success) {
  const evaluator = new Evaluator(
    (sides) => Math.floor(Math.random() * sides) + 1,
  )
  console.log(evaluator.run(result.program)) // e.g. 14
}
```

## One language

The library exposes a single language. Dice atoms (`d6`, `4d6`, `dF`, `d%`,
`d{1,2,3}`, `@name`, `3 @name`) are first-class expressions and compose
freely with variables, arithmetic, control flow, records, arrays, and
comprehensions:

```text
$str = 5
$attack = d20 + $str
if $attack >= 15 then 2d6 + $str else 0
```

There is no separate dice-expression "mode" and no backtick wrapping -
write the same notation everywhere. The single public parser entry point
is `ProgramParser.parse(source)`.

> **Migrating from 0.x**:
>
> - Backticks are no longer accepted. Replace `` `d20 + $mod` `` with
>   `d20 + $mod`. Run stored programs through `migrateSource(text)` (a
>   one-line regex that strips backticks) for a quick mechanical pass.
> - Arithmetic that previously lived inside a backtick block is now
>   program-level. `4d6 drop 1` and `4d6 + 2d8` behave exactly as before:
>   dice modifiers still bind tighter than `+`/`-`, so `4 + 4d6 drop 1`
>   parses as `4 + ((4d6) drop 1)`.
> - Public `DiceParser` is gone; use `ProgramParser.parse` for everything.
>   `Roller.roll` and `DiceStats.{distribution,mean,...}` now accept the
>   unified `Expression` AST emitted by the parser.
> - The `DiceExpr` wrapper node is gone. Dice atoms (`Die`, `NDice`,
>   `CustomDie`, `DiceReduce`, `StructuredDiceRoll`) appear directly in
>   `Expression`. The `source: string` cosmetic field is removed.
> - `Evaluator`'s `onDiceExpr` + `onStructuredDiceRoll` callbacks merged
>   into a single `onDiceTerm: (event: DiceTermResult) => void`. The
>   `event.kind` discriminator is `'arithmetic'` | `'structured'`.
> - `RollResult` tag renames: `binary-op-result` → `binary-expr-result`,
>   `literal-result` → `number-literal-result`, `unary-op-result` →
>   `unary-expr-result`. Constructors renamed to match (`binaryExprResult`,
>   `numberLiteralResult`, `unaryExprResult`). Op fields use the program
>   vocabulary (`add`/`subtract`/`multiply`/`divide`).
> - `ProgramParameters.list` shape: `defaultExpr` + `defaultSource` →
>   single `defaultExpression: Expression`.
> - `partsing` runtime dependency is gone - the package now ships with
>   zero runtime deps.

---

# 1. Dice Expression Language

Pass any of the following to `ProgramParser.parse(...)`. Whitespace is
generally free; `_` is treated as whitespace.

## Multiline expressions

A newline normally ends a statement. An expression may still span several lines
when a line **ends with a pending binary operator** (`+ - * / and or` and the
comparison operators) - the trailing operator signals that the right-hand side
continues on the next line:

```
$dmg = 2d6 + $str +
       $weapon_bonus +
       $crit_dice

$hit = $attack >= $ac and
       not $defender_invisible
```

The operator must be the _last_ token on the line; a newline placed _before_ an
operator ends the statement instead. A `#` comment is allowed between a trailing
operator and the newline.

## Basic dice and literals

| Notation       | Description                  |
| -------------- | ---------------------------- |
| `d6`           | One six-sided die            |
| `3d6`          | Three six-sided dice, summed |
| `d20`          | Twenty-sided die             |
| `d%` or `d100` | Percent die                  |
| `42`           | Literal number               |
| `-d6`          | Negated die                  |

## Custom dice

| Notation         | Description                 |
| ---------------- | --------------------------- |
| `d{1,1,2,2,3,4}` | Die with custom face values |
| `dF`             | Fate / Fudge die (-1, 0, 1) |
| `4dF`            | Four Fate dice              |

Custom face values may be negative or zero.

## Arithmetic

Standard precedence (multiplication / division before addition / subtraction),
left-to-right within each tier. Parentheses group sub-expressions.

| Notation    | Description                                                                                                                                                                     |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `3d6 + 4`   | Addition                                                                                                                                                                        |
| `2d8 - 1`   | Subtraction                                                                                                                                                                     |
| `d6 * 2`    | Multiplication                                                                                                                                                                  |
| `d100 / 10` | Division (truncates to integer; throws `UndefinedOutcomeError` on zero divisor - the analyser turns this into `partial-number` / `undefined` `FieldStats` rather than crashing) |
| `(2d6+1)*3` | Parenthesized                                                                                                                                                                   |

Unicode aliases: `×` and `⋅` are accepted for multiplication; `÷` for division.
`≤` for `<=`, `≥` for `>=`, `≠` for `!=`. `→` for the `->` match-arm / fold
lambda arrow. The letter `x` and the colon `:` are **not** accepted as
operators - they collide with identifiers (and `0x…` literals) and with
record-field / array-slice syntax respectively.

## Rounding

Five postfix operators turn a numeric expression into a rounded integer. They
bind looser than arithmetic - `2d6 / 3 round` rounds the quotient, not the
operand.

| Form                | Mode                             | `1.5` | `-1.5` | `2.5` |
| ------------------- | -------------------------------- | ----- | ------ | ----- |
| `… round`           | Half away from zero              | `2`   | `-2`   | `3`   |
| `… round up`        | Ceiling - toward `+∞`            | `2`   | `-1`   | `3`   |
| `… round down`      | Floor - toward `-∞`              | `1`   | `-2`   | `2`   |
| `… truncate`        | Toward zero                      | `1`   | `-1`   | `2`   |
| `… round half even` | Banker's rounding (nearest even) | `2`   | `-2`   | `2`   |

`round` is **not** JavaScript's `Math.round` - `-1.5 round` is `-2`, not `-1`.
This matches the convention used by spreadsheet `ROUND`, Python's
`decimal.ROUND_HALF_UP`, and most tabletop rulebooks.
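
The five modes in the table can be reproduced in plain TypeScript. The sketch below is illustrative only: `roundSketch` and `RoundModeSketch` are hypothetical names, not part of the library's API, and the mode strings are assumed for the example.

```ts
// Hypothetical mode names for illustration; not the library's exported types.
type RoundModeSketch =
  | 'round'
  | 'round-up'
  | 'round-down'
  | 'truncate'
  | 'round-half-even'

function roundSketch(x: number, mode: RoundModeSketch): number {
  const floor = Math.floor(x)
  if (mode === 'round-up') return Math.ceil(x)
  if (mode === 'round-down') return floor
  if (mode === 'truncate') return x < 0 ? Math.ceil(x) : floor
  if (mode === 'round') {
    // Half away from zero: round the magnitude, then restore the sign.
    const magnitude = Math.floor(Math.abs(x) + 0.5)
    return x < 0 ? -magnitude : magnitude
  }
  // 'round-half-even': exact ties go to the nearest even integer.
  const fraction = x - floor
  if (fraction < 0.5) return floor
  if (fraction > 0.5) return floor + 1
  return floor % 2 === 0 ? floor : floor + 1
}

// Compare the default mode with JavaScript's Math.round on the table's inputs.
const roundingRows = [1.5, -1.5, 2.5].map((x) => ({
  x,
  round: roundSketch(x, 'round'),
  halfEven: roundSketch(x, 'round-half-even'),
  jsMathRound: Math.round(x),
}))
console.log(roundingRows)
```

The only row where `round` and `Math.round` disagree is `-1.5`, because `Math.round` sends ties toward `+∞`.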

The keywords are not lexer-level reserved - `$round`, `$truncate`, etc. remain
valid identifiers outside the postfix slot.

The runtime helpers are also exported for consumers that fold `RoundExpr`
nodes into a normalised number themselves (UIs that display the rounded value
alongside the source, serializers that pre-compute the result):

```ts
import {
  applyRoundMode,
  roundHalfAwayFromZero,
  roundHalfEven,
} from 'dicerollerts'

applyRoundMode(1.5, 'round-half-even') // 2
roundHalfAwayFromZero(-1.5) // -2
```

## Expression sets and reducers

Group dice into a pool with square brackets. The bare bracket form
`[d, d, …]` is an array literal (list-typed); adding a reducer
suffix turns it into a scalar dice-pool expression.

| Notation              | Description                 |
| --------------------- | --------------------------- |
| `[2d6, 3d8, d10]`     | List of three rolled values |
| `[2d6, 3d8, d10] sum` | Sum of all (explicit `sum`) |
| `[2d6, 3d8] min`      | Lowest result               |
| `[2d6, 3d8] max`      | Highest result              |
| `[2d6, 3d8] average`  | Arithmetic mean (rounded)   |
| `[2d6, 3d8] median`   | Median                      |
| `[d20, d20] keep 1`   | Filter + default `sum`      |

Reducer aliases: `min` / `minimum` / `take least`; `max` / `maximum` / `take best`;
`average` / `avg`; `median` / `med`.

> **Migrating from older versions:**
>
> - Paren-list dice pools `(d, d, …)` were removed; use `[d, d, …]`.
> - `as pool` is gone - bracket literals are already list-typed, so
>   `$rolls = [d20, d20]` replaces the historic
>   `$rolls = 2d20 as pool`.
> - Bare `(expr)` parens still work for arithmetic grouping
>   (`(d6 + 1) * 2`); only the multi-element comma-separated paren
>   form is gone.

## Drop and Keep

Drop or keep some of the dice in a pool. Filters can be chained - each filter
operates on the surviving dice from the previous one.

| Notation                     | Shorthand | Description                             |
| ---------------------------- | --------- | --------------------------------------- |
| `4d6 drop 1`                 | `4d6d1`   | Drop lowest 1                           |
| `4d6 drop lowest 1`          | `4d6dl1`  | Same, explicit (`low` is also accepted) |
| `4d6 drop highest 1`         | `4d6dh1`  | Drop highest 1 (`high` also accepted)   |
| `4d6 keep 3`                 | `4d6k3`   | Keep highest 3                          |
| `4d6 keep highest 3`         | `4d6kh3`  | Same, explicit (`high` also accepted)   |
| `4d6 keep lowest 1`          | `4d6kl1`  | Keep lowest 1                           |
| `5d6 keep middle 3`          |           | Keep the central band of 3 sorted dice  |
| `5d6 drop middle 1`          |           | Drop the central die (keep the outer 4) |
| `5d6 drop high 1 drop low 1` |           | Drop high 1 then low 1 (keep middle 3)  |

The single-letter shorthands `k<N>` / `d<N>` keep highest / drop lowest; the
two-letter forms `kh<N>` / `kl<N>` / `dh<N>` / `dl<N>` name the direction
explicitly. Like the single-letter forms, they take a literal count only - use
the long `keep lowest $n` keyword form for a parametric count.

For `middle`, when the pool size minus N is odd, the extra dropped die is taken
from the upper end of the sorted pool. So `5d6 keep middle 2` keeps sorted
ranks {1, 2} (the lower-middle pair), and `5d6 drop middle 2` keeps the
complement {0, 3, 4}.
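
The `middle` rule can be sketched as a band over the sorted pool. This is an illustration of the rule as described above, using 0-based ranks; `middleBand`, `keepMiddleSketch`, and `dropMiddleSketch` are hypothetical helpers, not library functions.

```ts
// Returns the half-open rank range [start, end) of the middle band of size n.
function middleBand(size: number, n: number): { start: number; end: number } {
  const band = Math.min(Math.max(0, n), size)
  const outside = size - band
  // When `outside` is odd, the smaller half comes off the bottom,
  // so the extra dropped die comes off the top.
  const below = Math.floor(outside / 2)
  return { start: below, end: below + band }
}

function keepMiddleSketch(pool: number[], n: number): number[] {
  const sorted = pool.slice().sort((a, b) => a - b)
  const { start, end } = middleBand(sorted.length, n)
  return sorted.slice(start, end)
}

function dropMiddleSketch(pool: number[], n: number): number[] {
  const sorted = pool.slice().sort((a, b) => a - b)
  const { start, end } = middleBand(sorted.length, n)
  return sorted.slice(0, start).concat(sorted.slice(end))
}

// A hypothetical 5d6 roll; sorted it reads [1, 2, 3, 4, 5].
const middlePool = [5, 1, 4, 2, 3]
const keptMiddle = keepMiddleSketch(middlePool, 2) // ranks {1, 2} -> [2, 3]
const droppedMiddle = dropMiddleSketch(middlePool, 2) // ranks {0, 3, 4} -> [1, 4, 5]
```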

Heterogeneous filterable pools are also allowed: `[d20, d12, d10] keep 2`.

## Exploding

Roll again when a trigger condition is met. Extra rolls are added as separate
dice and contribute to the reducer (typically `sum`).

| Notation                   | Shorthand | Description                  |
| -------------------------- | --------- | ---------------------------- |
| `3d6 explode on 6`         |           | Explode on exact 6, no limit |
| `3d6 explode once on 6`    |           | Explode at most once         |
| `3d6 explode twice on 6`   |           | At most twice                |
| `3d6 explode thrice on 6`  |           | At most three times          |
| `3d6 explode 3 times on 6` |           | At most N explosions         |
| `3d6 explode on 5 or more` | `3d6e5`   | Explode on 5+                |
| `3d6 explode on 2 or less` |           | Explode on 1 or 2            |
| `3d6 explode on 3..5`      |           | Explode on a value in [3, 5] |
| `3d6 explode on max`       | `3d6em`   | Explode on the die's maximum |
| `3d6 explode max`          |           | Same                         |
| `3d6 explode always on 6`  |           | Same as `explode on 6`       |

## Rerolling

Roll again when triggered, but only the last roll counts.

| Notation                  | Shorthand | Description         |
| ------------------------- | --------- | ------------------- |
| `3d6 reroll on 1`         |           | Reroll 1s, no limit |
| `d6 reroll once on 1`     |           | Reroll at most once |
| `3d6 reroll on 2 or less` | `3d6r2`   | Reroll 1s and 2s    |

## Compound exploding

Like exploding, but extra rolls are summed into the original die - one die,
higher value.

| Notation                    | Shorthand | Description               |
| --------------------------- | --------- | ------------------------- |
| `d6 compound on 6`          |           | Compound on 6             |
| `d6 compound once on 6`     |           | At most once              |
| `3d6 compound on 6 or more` | `3d6ce6`  | Compound on 6+            |
| `3d6 compound on max`       | `3d6cem`  | Compound on the die's max |
| `3d6 compound max`          |           | Same                      |
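
Explode, reroll, and compound share the same roll-again loop and differ only in how the resulting chain is used. The sketch below shows that difference with a scripted roll function. All names are hypothetical, and treating the limit as a cap on extra rolls is an assumption for the example rather than a description of the library's internals.

```ts
type RollFn = (sides: number) => number

// Deterministic roll function for the example: replays a fixed script.
function scripted(values: number[]): RollFn {
  let i = 0
  return () => values[i++ % values.length]
}

// Roll once, then keep rolling while the latest roll triggers,
// up to `maxExtra` additional rolls.
function rollChain(
  sides: number,
  roll: RollFn,
  trigger: (value: number) => boolean,
  maxExtra = 100,
): number[] {
  const chain = [roll(sides)]
  while (chain.length <= maxExtra && trigger(chain[chain.length - 1])) {
    chain.push(roll(sides))
  }
  return chain
}

const onSix = (value: number) => value === 6

// explode: every roll in the chain counts as its own die.
const exploded = rollChain(6, scripted([6, 6, 3]), onSix) // [6, 6, 3]

// compound: the chain collapses into a single die value.
const compounded = rollChain(6, scripted([6, 6, 3]), onSix)
  .reduce((a, b) => a + b, 0) // 15

// reroll: only the final roll of the chain counts.
const rerollChain = rollChain(6, scripted([6, 6, 3]), onSix)
const rerolled = rerollChain[rerollChain.length - 1] // 3

// `once` limits the chain to one extra roll.
const explodedOnce = rollChain(6, scripted([6, 6, 3]), onSix, 1) // [6, 6]
```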

## Chaining functors with filters

Filters can chain after a functor - the filter ranks dice by their
per-original-die total (sum of an explode chain, last reroll, compound
total, or the kept-of-pair for emphasis):

```text
4d6 explode on 6 keep 3       # keep top 3 post-explode totals
4d6 reroll on 1 drop lowest 1
4d6 explode on 6 count >= 7   # successes ≥ 7 over per-die totals
5d6 explode max keep 3
4d6 reroll on 1 explode on 6 keep 3   # combined functor + filter
```

The filter must follow the functor, never precede it
(`4d6 keep 3 explode on 6` is rejected with a migration hint that
suggests the swap). Bounded-times forms (`once`, `twice`,
`N times`, `emphasis`) get exact-tier distributions. Unbounded
`always`-times also goes exact via a probability-mass cutoff
(epsilon = 1e-12); only chains that exceed the joint-enumeration
cap (e.g. `4d6 explode on 6 keep 3`, ~54M joint cells) fall back
to Monte Carlo. Combined functors (`reroll once on 1 explode once
on 6`) take the same exact path in their bounded form.
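
To make "per-original-die total" concrete, here is a small sketch of `4d6 explode on 6 keep 3` applied to a hypothetical set of explode chains. The helper name and the sample rolls are invented for the example.

```ts
// Each inner array is one original die's explode chain (hypothetical rolls).
const explodeChains: number[][] = [[6, 4], [2], [5], [6, 6, 1]]

// The filter ranks by per-original-die total, not by individual rolls.
const perDieTotals = explodeChains.map((chain) =>
  chain.reduce((a, b) => a + b, 0),
) // [10, 2, 5, 13]

function keepHighestSketch(values: number[], n: number): number[] {
  return values.slice().sort((a, b) => b - a).slice(0, n)
}

const keptTotals = keepHighestSketch(perDieTotals, 3) // [13, 10, 5]
const keptSum = keptTotals.reduce((a, b) => a + b, 0) // 28
```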

## Emphasis

Roll two dice, keep the result furthest from a center point.

| Notation                    | Description                        |
| --------------------------- | ---------------------------------- |
| `d20 emphasis`              | Furthest from average; reroll ties |
| `d20 emphasis high`         | Tie-break to higher value          |
| `d20 emphasis low`          | Tie-break to lower value           |
| `d20 emphasis reroll`       | Reroll ties (explicit)             |
| `d20 furthest from 10`      | Custom center point (reroll ties)  |
| `d20 furthest from 10 high` | Custom center, high tie-break      |
| `d20 furthest from 10 low`  | Custom center, low tie-break       |
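
A minimal sketch of the emphasis rule follows. It assumes the default center is the average face `(sides + 1) / 2`, that a tie under `reroll` rerolls both dice, and that the loop is capped at 100 iterations. These details and all names are assumptions for illustration, not confirmed library behaviour.

```ts
type TieBreak = 'high' | 'low' | 'reroll'

// Deterministic roll function for the example: replays a fixed script.
function replay(values: number[]): (sides: number) => number {
  let i = 0
  return () => values[i++ % values.length]
}

function emphasisSketch(
  sides: number,
  roll: (sides: number) => number,
  tie: TieBreak = 'reroll',
  center: number = (sides + 1) / 2, // average face of a standard die
  maxIterations = 100,
): number {
  let a = roll(sides)
  let b = roll(sides)
  for (let i = 0; i < maxIterations; i++) {
    const distanceA = Math.abs(a - center)
    const distanceB = Math.abs(b - center)
    if (distanceA !== distanceB) return distanceA > distanceB ? a : b
    if (tie === 'high') return Math.max(a, b)
    if (tie === 'low') return Math.min(a, b)
    // Assumption: a tie under 'reroll' rerolls both dice.
    a = roll(sides)
    b = roll(sides)
  }
  return a // cap reached; the library's behaviour here is not documented above
}

const furthest = emphasisSketch(20, replay([2, 15])) // 2 is further from 10.5
const tieHigh = emphasisSketch(20, replay([3, 18]), 'high') // tie -> 18
const tieLow = emphasisSketch(20, replay([3, 18]), 'low') // tie -> 3
const tieReroll = emphasisSketch(20, replay([3, 18, 5, 12])) // tie, then 5 wins
const customCenter = emphasisSketch(20, replay([4, 16]), 'high', 10) // tie -> 16
```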

## Dice pools / success counting

Count how many dice meet one or more thresholds. Thresholds can be expressed
with operators, English `on ...` triggers, bare numbers, or the `exactly`
keyword. Multiple thresholds may be chained with `and` so each die contributes
one success per matching threshold.

| Notation                         | Shorthand | Description                                             |
| -------------------------------- | --------- | ------------------------------------------------------- |
| `8d10 count >= 6`                | `8d10c6`  | Count successes >= 6                                    |
| `8d10 count on 6 or more`        |           | Same, English trigger form                              |
| `3d6 count == 5`                 |           | Count exact 5s (operator form)                          |
| `3d6 count exactly 5`            |           | Same, keyword form                                      |
| `3d6 count on 5`                 |           | Same, trigger form                                      |
| `3d6 count on 3..5`              |           | Count dice between 3 and 5 inclusive                    |
| `4d6 count <= 2`                 |           | Count values <= 2                                       |
| `4d6 count > 4`                  |           | Count values > 4 (canonicalises to `>= 5`)              |
| `4d6 count < 3`                  |           | Count values < 3 (canonicalises to `<= 2`)              |
| `5d10 count on 6 or more and 10` |           | Multi-step: each die counts once per matching threshold |

Threshold values may be a positive literal, a `$variable`, or a
parenthesised expression: `8d10 count >= (3d6)` rolls `3d6` once and
counts how many of the eight dice land at or above that value. This
applies generally to dice-modifier arguments: triggers, counts, and
keep/drop amounts all accept literals, variables, and parenthesised
expressions (e.g. `4d6 explode on $threshold`, `6d20 keep ($n + 1)`,
`5d10 reroll once on (d4)`).

The `and` separator only has this meaning inside a `count` reducer; outside it
remains the boolean operator.

### Multi-step counts

Chain thresholds with `and` to give each die one success per matching
threshold. For example, in `5d10 count >= 6 and == 10`, a 10 matches **both**
`>= 6` and `== 10`, so that die contributes 2 successes; a 7 only matches `>= 6`
and contributes 1; a 3 matches neither and contributes 0. Each segment must be
an explicit threshold form (operator, `exactly`, or `on`):

```
5d10 count >= 6 and == 10
4d12 count >= 6 and >= 10
8d10 count on 6 or more and on 10
```
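
The counting rule amounts to a nested loop over dice and thresholds. The sketch below reproduces the `5d10 count >= 6 and == 10` walkthrough on a hypothetical roll; `countSuccesses` is an illustrative helper, not a library export.

```ts
type Threshold = (value: number) => boolean

// Each die scores one success per threshold it satisfies.
function countSuccesses(dice: number[], thresholds: Threshold[]): number {
  let total = 0
  for (const die of dice) {
    for (const matches of thresholds) {
      if (matches(die)) total += 1
    }
  }
  return total
}

// 5d10 count >= 6 and == 10, with a hypothetical roll of [10, 7, 3, 6, 10].
const successes = countSuccesses(
  [10, 7, 3, 6, 10],
  [(v) => v >= 6, (v) => v === 10],
) // 2 + 1 + 0 + 1 + 2 = 6
```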

## Variables in dice expressions

Inside a dice expression, `$var` references resolve to the variable's value
at roll time. Three positions are supported.

**Additive position** - modify a roll with a constant or computed value:

```
$mod = 5
$attack = d20 + $mod
```

**Count position** (parametric count) - roll a variable number of dice:

```
$rolls = 3
$damage = $rolls D6     # rolls 3d6
```

The variable may itself be a dice expression:

```
$rolls = d4            # rolls a d4 (1-4)
$damage = $rolls D6     # rolls between 1d6 and 4d6, summed
```

**Sides position** (parametric sides):

```
$sides = 8
$roll = 1d$sides       # rolls a d8
```

**Both positions:**

```
$n = 3
$s = 6
$roll = $n D$s          # rolls 3d6
```

Parametric forms **require whitespace** between the variable name and the
`d`/`D` (`$n d6`, `$n D6`, `$n DF`, `$n d{1,2}`). The no-whitespace form
`$nD6` is rejected - it's visually ambiguous with a long variable name. The
parser accepts either lowercase `d` or uppercase `D` after the whitespace.
The renderer emits uppercase `D` as the canonical form (`$n D6`) to
distinguish parametric heads from compact literal-count dice (`3d6`).

Counts of 0 or fewer return 0; the resolved count is capped at
`MAX_DICE_COUNT = 10000` per expression - programs that try to roll more
(e.g., `100000d6`) throw a clear runtime error. The AST holds count and
sides compactly regardless of size, so `100000d6 keep 1` parses instantly; the
runtime cap only fires when you actually roll.
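
The count handling described above can be sketched as follows. The constant mirrors the documented `MAX_DICE_COUNT`. The function name, the error type, and the message are assumptions, since the text only promises "a clear runtime error".

```ts
const MAX_DICE_COUNT_SKETCH = 10_000 // mirrors the documented MAX_DICE_COUNT

function rollCountSketch(
  count: number,
  sides: number,
  roll: (sides: number) => number,
): number {
  // Counts of 0 or fewer produce 0 without rolling.
  if (count <= 0) return 0
  // The cap is checked at roll time, not at parse time.
  if (count > MAX_DICE_COUNT_SKETCH) {
    throw new Error(
      `Cannot roll ${count} dice: limit is ${MAX_DICE_COUNT_SKETCH}`,
    )
  }
  let total = 0
  for (let i = 0; i < count; i++) total += roll(sides)
  return total
}

const alwaysFour = () => 4
const zeroDice = rollCountSketch(0, 6, alwaysFour) // 0
const threeDice = rollCountSketch(3, 6, alwaysFour) // 12
```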

Parametric counts compose with all modifiers:

```
$n = d6
$attack = $n D6 keep 1        # roll N d6, keep highest 1 (where N is itself a roll)
$pool   = $n D10 count >= 6   # variable-size dice pool
$burst  = $n D6 explode on 6  # variable-count exploding dice
$fate   = $n DF               # N Fate dice
```

## Parse errors

```ts
import { ProgramParser, suggestKeyword } from 'dicerollerts'

const r = ProgramParser.parse('3d6 explod on 6')
if (!r.success) {
  r.errors[0].message // error description
  r.errors[0].position // character offset
  r.errors[0].suggestion // e.g. "Did you mean 'explode'?"
  r.errors[0].context // surrounding input text
}

suggestKeyword('explod') // "explode"
```

---

# 2. Roller and roll results

```ts
import { Roller, RR, MAX_DICE_COUNT } from 'dicerollerts'

const rollFn = (sides: number) => Math.floor(Math.random() * sides) + 1
const roller = new Roller(rollFn)

// With explicit options:
const roller2 = new Roller(rollFn, {
  maxExplodeIterations: 100, // default 100
  maxRerollIterations: 100,
  maxEmphasisIterations: 100,
})

// `expr` is an `Expression` AST node produced by `ProgramParser`.
const roll = roller.roll(expr) // RollResult tree
RR.getResult(roll) // numeric result
```

`Roller` operates on the unified `Expression` AST emitted by
`ProgramParser`. The roll result is a structured tree (`RollResult`)
describing every die rolled, kept, and discarded - useful for transcript
rendering and replay.

`MAX_DICE_COUNT` is `10_000`. `Roller` enforces this for every n-dice
expression at roll time.

To roll `structured-dice-roll` expressions (`@name` references), use
`Evaluator.run` instead - `Roller` deals only with numeric results.

## Expression utilities (`DE`)

```ts
import { DE } from 'dicerollerts'

DE.toString(expr) // canonical string representation
DE.validate(expr) // null if valid, ValidationMessage[] if not
DE.calculateBasicRolls(expr) // count of dice in expression
DE.simplify(expr) // constant folding & identity elimination
```

`DE.simplify` performs algebraic simplification on a `DiceExpression`:
literal+literal folding, unary negate folding, double-negation, identity
elimination (`x + 0 → x`, `x * 1 → x`, etc.).

---

# 3. Distribution analysis on dice expressions (`DiceStats`)

```ts
import { DiceStats } from 'dicerollerts'

// Exact distribution (for expressions whose support is enumerable):
const dist = DiceStats.distribution(expr) // Map<number, number>
DiceStats.mean(expr)
DiceStats.stddev(expr)
DiceStats.min(expr)
DiceStats.max(expr)
DiceStats.percentile(expr, 50)

// Monte Carlo (for any expression):
const mc = DiceStats.monteCarlo(expr, { trials: 50000 })
mc.mean
mc.stddev
mc.min
mc.max
mc.distribution // Map<number, number> - conditional on defined outcomes
mc.percentile(75)
mc.undefinedMass // fraction of trials that produced an undefined outcome
// (division by zero); 0 if every trial was defined

// Streaming Monte Carlo with progress:
for await (const progress of DiceStats.monteCarloAsync(expr, {
  trials: 50_000,
  chunkSize: 1000,
})) {
  // progress.completed, progress.total, progress.result
}

// Summary: exact when possible (≤ 20 basic rolls), Monte Carlo fallback:
const s = DiceStats.summary(expr)
s.min
s.max
s.mean
s.stddev
s.distribution
s.percentiles // { 25, 50, 75 }
```

Exact distribution is supported for: `die`, `custom-die`, `literal`, binary
and unary ops on those, every `dice-reduce` shape (including `dice-list-with-filter`,
chained drop/keep, sum, min, max, average, median, count), explode / reroll /
compound (both bounded `times` - `once`, `twice`, `thrice`, `N times` - and the
unbounded `always` form via a probability-mass cutoff), combined functors
(`reroll … explode …`), emphasis, and literal-count structured-dice rolls
(`dice @die [...]; 3 @die`). Parametric-count dice (`$n D6`), variable-count
`repeat $n { … }`, and chains whose joint enumeration exceeds the
five-million-cell cap fall back to Monte Carlo (or `ProgramStats`, which
can handle them in context).
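
For intuition about what an exact distribution is, the sketch below builds the distribution of a sum of dice by repeated convolution. It is not the library's algorithm. The helper names are hypothetical, and it assumes the map values are probabilities, which the `Map<number, number>` return type above does not by itself confirm.

```ts
type Distribution = Map<number, number> // outcome -> probability (assumed)

function dieDistribution(sides: number): Distribution {
  const dist = new Map<number, number>()
  for (let face = 1; face <= sides; face++) dist.set(face, 1 / sides)
  return dist
}

// Distribution of X + Y for independent X and Y.
function convolve(x: Distribution, y: Distribution): Distribution {
  const out = new Map<number, number>()
  x.forEach((px, vx) => {
    y.forEach((py, vy) => {
      const sum = vx + vy
      out.set(sum, (out.get(sum) || 0) + px * py)
    })
  })
  return out
}

function sumOfDice(count: number, sides: number): Distribution {
  let dist = new Map<number, number>()
  dist.set(0, 1) // zero dice sum to 0 with certainty
  for (let i = 0; i < count; i++) {
    dist = convolve(dist, dieDistribution(sides))
  }
  return dist
}

function meanOf(dist: Distribution): number {
  let mean = 0
  dist.forEach((p, v) => {
    mean += p * v
  })
  return mean
}

const twoD6 = sumOfDice(2, 6)
console.log(twoD6.get(7), meanOf(twoD6))
```

Each convolution multiplies the work by the number of faces, which is why large pools or long chains eventually hit an enumeration cap and need Monte Carlo instead.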

---

# 4. Program Language

A small scripting language for tabletop-RPG-flavoured automation. Variables,
conditionals, records, arrays, comprehensions, and dice declarations.

```ts
import { ProgramParser, Evaluator, ProgramStats } from 'dicerollerts'

const source = `
$str_mod = 5
$ac = 15
$attack = d20 + $str_mod
$hit = $attack >= $ac
$damage = if $hit then 2d6 + $str_mod else 0
{ attack: $attack, hit: $hit, damage: $damage }
`

const parsed = ProgramParser.parse(source)
if (!parsed.success) {
  console.error(parsed.errors)
} else {
  // Single roll:
  const evaluator = new Evaluator(
    (sides) => Math.floor(Math.random() * sides) + 1,
  )
  const result = evaluator.run(parsed.program)
  // e.g. { attack: 18, hit: true, damage: 14 }

  // Probability analysis (auto-detects strategy):
  const analysis = ProgramStats.analyze(parsed.program)
  // analysis.strategy.tier: 'constant' | 'exact' | 'monte-carlo'
  // analysis.stats: per-field distributions
}
```

## Language reference

### Statements

- **Assignment**: `$name = expr` - variables are immutable; reassigning throws.
  Names match `$[a-z_][a-z0-9_]*`.
- **Parameter declaration**: `$name is { default: ..., ... }` - see
  [Parameters](#parameters).
- **Dice declaration**: `dice @name [ ... ]` - see
  [Structured-face dice](#structured-face-dice).
- **Expression statement**: any expression. The program's final value is the
  value of the last statement.

Statements are separated by newlines. `#` starts a comment that runs to the end
of the line, whether it opens the line or follows a statement.

### Expressions

| Form                                                               | Notes                                                                                                   |
| ------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
| `42`, `-3`, `true`, `false`, `"text"`                              | Literals                                                                                                |
| `d6`, `4d6`, `4d6 drop 1`, `@name`, `3 @name`                      | Dice atoms - see [Dice Expression Language](#1-dice-expression-language)                                |
| `$name`                                                            | Variable reference                                                                                      |
| `+`, `-`, `*`, `/`                                                 | Arithmetic (`/` is integer division)                                                                    |
| `… round`, `round up`, `round down`, `truncate`, `round half even` | Postfix rounding - see [Rounding](#rounding)                                                            |
| `==`, `!=`, `<`, `<=`, `>`, `>=`                                   | Comparisons                                                                                             |
| `and`, `or`, `not`                                                 | Booleans                                                                                                |
| `if cond then a else b`                                            | Conditional (else required)                                                                             |
| `match { ... }`, `match VALUE { ... }`                             | Pattern matching - see [Match](#match-expressions)                                                      |
| `{ key: value, ... }`, `{ $var }` (shorthand)                      | Records                                                                                                 |
| `$rec.field`                                                       | Field access                                                                                            |
| `[1, 2, 3]`, `[d6, d8]`                                            | Array literal (also acts as a dice pool with a trailing modifier)                                       |
| `[...$arr, d6]`, `[...3d6]`                                        | Spread element - inline a list into an array literal                                                    |
| `$arr[0]`, `$arr[1:3]`, `$arr[:-1]`, `$arr[-3:]`                   | Indexing and slicing - see [Slicing](#array-slicing)                                                    |
| `repeat N { body }`                                                | Returns array of N body evaluations (literal `N` is exact-analyzable; variable `$n` forces Monte Carlo) |
| `for $x in arr { body }`                                           | Comprehension - see [Comprehensions](#comprehensions-and-fold)                                          |
| `<aggregator> for ...`                                             | Aggregator-prefixed comprehension                                                                       |
| `argmin`, `argmax`                                                 | Sugar for sort + slice (returns 1-element array)                                                        |
| `fold arr from init ($acc, $x) -> body`                            | Generic left fold                                                                                       |

`+` is overloaded: when either operand is a string, it concatenates; otherwise
numeric.

### Match expressions

Replace nested if-else chains with a flat form. Two modes are supported.

**Guard mode** (no value after `match`): picks the first arm whose pattern is
truthy.

```
$damage = match {
  $crit -> 4d6 + 1d4 + 4
  $hit  -> 2d6 + 4
  _     -> 0
}
```

**Value mode** (value after `match`): compares the matched value against each
arm's pattern using `==`.

```
$roll = match $roll_mode {
  "advantage"    -> 2d20 keep highest 1
  "disadvantage" -> 2d20 keep lowest 1
  _              -> d20
}
```

Arms can carry an optional `if guard` clause, evaluated only when the pattern
matches:

```
$damage = match $weapon {
  "sword" if $crit  -> 4d6
  "sword"           -> 2d6
  "dagger" if $crit -> 2d4
  "dagger"          -> 1d4
  _                 -> 0
}
```

- `_` is the wildcard (matches anything; use it as a catch-all).
- Arms are separated by newlines or commas; trailing separators are allowed.
- The first matching arm wins; remaining arms are not evaluated.
- A trailing `_ -> default` is recommended - without it, a value that matches
  no arm throws a runtime error.

### Comprehensions and fold

Iterate over arrays with optional filtering, optionally collapsing via an
aggregator prefix.

```
# Map: returns an array.
for $x in [1, 2, 3] { $x * 2 }            # [2, 4, 6]

# Map + filter.
for $x in [1, 2, 3, 4] if $x > 2 { $x }   # [3, 4]

# Aggregator prefix collapses to a single value.
sum     for $x in [1, 2, 3] { $x }                # 6
product for $x in [2, 3, 4] { $x }                # 24
max     for $x in [3, 1, 5, 2] { $x }             # 5
min     for $x in [3, 1, 5, 2] { $x }             # 1
average for $x in [2, 4, 6] { $x }                # 4
count   for $x in [1, 2, 3, 4, 5] if $x > 2       # 3 (no body - count only)
first   for $x in [1, 2, 3] if $x > 1             # 2 (short-circuits)
last    for $x in [1, 2, 3] if $x > 1             # 3
```

Aggregator body rules:

- `count`: body is forbidden (use `count for $x in arr if cond`).
- `first` / `last`: body is optional (defaults to the element itself).
- All others (and the no-aggregator form): body is required.

`sum` over records does field-wise sum (matching structured-face dice
semantics):

```
sum for $r in [{a: 1, b: 2}, {a: 3, c: 4}] { $r }
# {a: 4, b: 2, c: 4}
```
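
In host-language terms, the field-wise sum behaves like the following sketch. `sumRecords` is a hypothetical helper written for this illustration.

```ts
type NumericRecord = { [field: string]: number }

// Adds records field by field; a field missing from a record counts as 0.
function sumRecords(records: NumericRecord[]): NumericRecord {
  const total: NumericRecord = {}
  for (const rec of records) {
    for (const field of Object.keys(rec)) {
      total[field] = (total[field] || 0) + rec[field]
    }
  }
  return total
}

const summedRecords = sumRecords([
  { a: 1, b: 2 },
  { a: 3, c: 4 },
]) // { a: 4, b: 2, c: 4 }
```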

The `sort` aggregator returns the elements in sorted order (ascending by
default) rather than collapsing them. Optional `by` (sort key) and explicit
`asc` / `desc`:

```
sort for $x in [3, 1, 2] { $x }                          # [1, 2, 3]
sort for $x in [3, 1, 2] desc { $x }                     # [3, 2, 1]
sort for $r in $records by $r.priority { $r }            # records sorted by priority
sort for $r in $records by $r.priority desc { $r }       # descending
sort for $r in $records if $r.active by $r.priority desc { $r } # filter + by + desc
```

`argmin` and `argmax` are sugar for `sort` + slice `[0:1]`. They return a
single-element array containing the element with the smallest / largest key:

```
argmin for $r in $records by $r.priority { $r }   # [record_with_min_priority]
argmax for $r in $records by $r.priority { $r }   # [record_with_max_priority]
argmin for $x in [3, 1, 2] { $x }                 # [1]
```

Both forms accept a `by` clause and follow the same rules as `sort`. They
always sort in their natural direction (`argmin` ascending, `argmax`
descending) - explicit `asc` / `desc` is rejected.
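
The "sort + slice" reading of `argmin` / `argmax` can be sketched in TypeScript as below. The helper names are hypothetical, and how ties are broken is an assumption (first in sorted order wins here).

```ts
function sortByKey<T>(
  items: T[],
  key: (item: T) => number,
  descending: boolean,
): T[] {
  return items
    .slice()
    .sort((a, b) => (descending ? key(b) - key(a) : key(a) - key(b)))
}

// Sort ascending, then take the slice [0:1].
function argminSketch<T>(items: T[], key: (item: T) => number): T[] {
  return sortByKey(items, key, false).slice(0, 1)
}

// Sort descending, then take the slice [0:1].
function argmaxSketch<T>(items: T[], key: (item: T) => number): T[] {
  return sortByKey(items, key, true).slice(0, 1)
}

const tasks = [
  { label: 'rest', priority: 3 },
  { label: 'attack', priority: 1 },
  { label: 'flee', priority: 2 },
]

const lowestPriority = argminSketch(tasks, (t) => t.priority) // the 'attack' record
const highestPriority = argmaxSketch(tasks, (t) => t.priority) // the 'rest' record
const smallestNumber = argminSketch([3, 1, 2], (x) => x) // [1]
```

Because the result is a slice, an empty input yields an empty array rather than an error in this sketch.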

For arbitrary combining logic, use `fold`:

```
fold [1, 2, 3, 4] from 0 ($acc, $x) -> $acc + $x          # 10
fold [{a: 1}, {a: 2}, {a: 3}] from {total: 0}
  ($acc, $x) -> {total: $acc.total + $x.a}                # {total: 6}
```

Comprehension and fold introduce lexical scopes - inner `$x`, `$acc`, etc.
shadow outer bindings of the same name within the body.

### Array slicing

Arrays support Python-style half-open slice notation with optional negative
indices (counted from the end):

```
$arr[1:3]    # elements at indices 1, 2 (not 3)
$arr[:-1]    # all but the last element
$arr[-3:]    # the last 3 elements
$arr[:]      # full copy
```

Slices clamp out-of-range bounds rather than erroring. Slicing and indexing
can be chained: `$arr[:3][0]` returns the first element of the first three.
Step (`[a:b:c]`) is not supported. `$arr[]` is rejected.
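
The half-open, negative-index, clamping rules above are the same ones JavaScript's `Array.prototype.slice` follows, so a host-side model is short. `sliceLike` is a hypothetical name; the sketch spells out the index normalisation for clarity and makes no claim about how the library implements slicing.

```ts
// Hypothetical model of the slice rules: negative indices count from the
// end, and out-of-range bounds are clamped into [0, length].
function sliceLike<T>(xs: readonly T[], start?: number, end?: number): T[] {
  const n = xs.length
  const normalise = (i: number): number =>
    Math.min(n, Math.max(0, i < 0 ? n + i : i))
  return xs.slice(normalise(start ?? 0), normalise(end ?? n))
}

const sample = [10, 20, 30, 40, 50]

console.log(sliceLike(sample, 1, 3)) // [20, 30]
console.log(sliceLike(sample, undefined, -1)) // [10, 20, 30, 40]
console.log(sliceLike(sample, -3)) // [30, 40, 50]
console.log(sliceLike(sample)) // [10, 20, 30, 40, 50]
console.log(sliceLike(sample, 2, 99)) // [30, 40, 50] (end clamped)
console.log(sliceLike(sample, 0, 3)[0]) // 10 (slice, then index)
```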

### Spread elements

`[...expr]` flattens a list-valued expression into the enclosing array
literal. Three operand shapes are lifted to lists automatically:

```
[...$arr, 99]            # $arr is already a list; concat with 99
[...3d6]                 # 3d6 is normally a scalar sum - spread lifts
                         # it to the three individual rolls
[...3d6 sum]             # explicit sum can also be lifted to its pool
[...repeat 4 { d6 }, 0]  # repeat produces a list; spread inlines it
```

Bare-scalar values (`[...5]`, `[...d20 + 3]`) are rejected - spread is
list-only. Inside dice-pool position (`[...3d6] keep 3`), the spread is
equivalent to writing the elements out by hand.

### Structured-face dice

Define a die whose faces are records, then roll N of them with field-wise sum.
Useful for game systems where each die face yields multiple symbols (Descent,
Star Wars FFG, Genesys, etc.).

```
dice @attack [
  { damage: 1 },
  { damage: 2, range: 1 },
  { surge: 1 },
  { damage: 1, range: 1 },
  { damage: 0 },
  { damage: 2 }
]

# Bare reference rolls one face. The result carries the die's full declared
# field set, with fields the rolled face lacks defaulting to 0.
$one = @attack            # e.g. {damage: 2, range: 1, surge: 0}

# With a count, rolls N and field-wise sums (missing fields default to 0).
$total = 3 @attack        # e.g. {damage: 4, range: 1, surge: 1}

# Parametric count.
$n is { default: 5 }
$variable = $n @attack
```

Numeric-faced dice are also supported and sum numerically:

```
dice @loaded [ 6, 6, 6, 6, 6, 1 ]
100 @loaded               # heavily biased toward 6
```

Rules:

- Faces must be all-numeric or all-record (no mixing).
- Record-face values must be integers (no decimals, strings, booleans, or
  nested records).
- Every record roll (any count `>= 1`, including a bare `@d`) carries the
  field union of every field that appears in any declared face; a field
  missing from a particular rolled face contributes 0. This makes `@d` and
  `N @d` field-set-consistent, and means a mixed pool built with
  `sum for $d in [@a, @b] { $d }` can be read per symbol (`$pool.x`) without a
  `default` for any symbol whose source die is in the pool. (A `count == 0`
  roll has no schema to project and stays the empty record `{}`.)
- Structured-dice rolls cannot be combined with arithmetic or other dice in
  the same expression - keep them in their own statement (assign to a
  variable, or place inside a record) and combine at the program level.
- Each `dice @name` may be declared once per program; redeclaration throws.
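
The field-union rule above can be modelled in a few lines of TypeScript. Everything here (`Face`, `schemaOf`, `rollStructured`, `scripted`) is a hypothetical name for this sketch; it assumes a roll function shaped like the `Evaluator`'s `(sides) => 1..sides` and is not the library's implementation.

```ts
type Face = Record<string, number>

// Union of every field that appears on any declared face.
function schemaOf(faces: readonly Face[]): string[] {
  const fields = new Set<string>()
  for (const face of faces) for (const field of Object.keys(face)) fields.add(field)
  return Array.from(fields)
}

// Roll `count` dice and sum field-wise; missing fields contribute 0.
function rollStructured(
  faces: readonly Face[],
  count: number,
  roll: (sides: number) => number,
): Face {
  const total: Face = {}
  if (count === 0) return total // no schema to project: stays {}
  for (const field of schemaOf(faces)) total[field] = 0
  for (let i = 0; i < count; i++) {
    const face = faces[roll(faces.length) - 1] ?? {}
    for (const [field, val] of Object.entries(face)) {
      total[field] = (total[field] ?? 0) + val
    }
  }
  return total
}

// Scripted roller so the example is reproducible.
function scripted(values: number[]): (sides: number) => number {
  let i = 0
  return () => values[i++] ?? 1
}

const attackFaces: Face[] = [
  { damage: 1 },
  { damage: 2, range: 1 },
  { surge: 1 },
  { damage: 1, range: 1 },
  { damage: 0 },
  { damage: 2 },
]

console.log(rollStructured(attackFaces, 1, scripted([5]))) // { damage: 0, range: 0, surge: 0 }
console.log(rollStructured(attackFaces, 3, scripted([2, 3, 1]))) // { damage: 3, range: 1, surge: 1 }
console.log(rollStructured(attackFaces, 0, scripted([]))) // {}
```

Note how the single-die roll of the `{ damage: 0 }` face still carries `range` and `surge`, which is what makes `@d` and `N @d` field-set-consistent.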

See [docs/structured-dice.md](docs/structured-dice.md) for the trace shape
(`StructuredDiceRollResult`), the `flatStructuredFaces` walker for
rendering each die independently, and end-to-end consumer patterns.

#### Reading symbols from a mixed pool

Different structured dice can't be combined in one expression, so a mixed
pool (e.g. an FFG / Genesys narrative-dice roll) is built by summing a
comprehension over the dice, then read per symbol:

```
$pool = sum for $d in [@proficiency, @difficulty, @difficulty] { $d }
$pool.success - $pool.failure
```

Because every roll carries its die's full declared schema, any symbol whose
**source die is in the pool** reads directly and defaults to 0 when it wasn't
shown: `$pool.triumph` is `0` on a roll where no triumph face came up, not an
error. A symbol whose source die is **not** in the pool is still absent and
throws `Record has no field 'despair'`. Use the **`default`** operator there
(and anywhere you want an absent field, out-of-range index, or undefined
outcome to fall back) to read it as 0:

```
{
  net_success: $pool.success - $pool.failure,  # both source dice in the pool
  triumphs:    $pool.triumph,                  # 0 when no triumph face came up
  despairs:    $pool.despair default 0,        # 0 - no despair die in this pool
}
```

`primary default fallback` yields `primary`, or `fallback` when `primary`
has **no value**: a missing record field, an out-of-range index
(`$xs[99] default 0`), or an undefined outcome such as division by zero
(`($a / $b) default 0`). Every other error (type errors, undefined variables)
still propagates, so strict access remains the default everywhere else.
`default` binds tighter than arithmetic, so `$a.x default 0 - $b.y default 0`
groups as `($a.x default 0) - ($b.y default 0)`. `default` is exact in
`ProgramStats` analysis, not only at roll time.

### Parameters

Declare a variable as a parameter with `is { ... }` to give it a default value
plus optional metadata. Tools can introspect the program to render UI inputs;
callers can override defaults at runtime.

```
$str_mod is {
  default: 5,
  min: 0,
  max: 30,
  label: "STR Modifier",
  description: "Your strength bonus",
}

$weapon is {
  default: "longsword",
  enum: ["longsword", "dagger", "greataxe"],
}

$advantage is { default: false }

$attack_die is { default: d20 }

$attack = $attack_die + $str_mod
$hit = $attack >= 15
{ attack: $attack, hit: $hit }
```

Field rules:

- `default` (required): a number, boolean, string, or dice expression.
- `label`, `description`: string literals. Either `"double"` or `'single'`
  quotes work; both accept the usual `\n`, `\t`, `\\` escapes and let the
  other quote appear unescaped (`"can't"`, `'he said "hi"'`).
- `min`, `max`: only valid when the default is a number or a dice expression.
- `enum`: only valid when the default is non-number; entries must match the
  default's primitive type and the default must be a member of the enum.
- Dice-expression defaults may not reference structured-face dice (`@name`);
  use a numeric default.
- A parameter may not be reassigned with `=`.

Override at runtime:

```ts
evaluator.run(program, { parameters: { str_mod: 7, weapon: 'dagger' } })
ProgramStats.analyze(program, { parameters: { str_mod: 7 } })
```

Overrides are validated (unknown name, type mismatch, out of range, not in
enum) and throw clear errors. Without overrides, literal defaults act as
constants and dice-expression defaults are rolled per execution (or analysed
exactly).

Introspect for UI:

```ts
import { ProgramParameters } from 'dicerollerts'

const params = ProgramParameters.list(program)
// Parameter[] = [{
//   name, default?, defaultExpression?,
//   label?, description?, min?, max?, enum?
// }, ...]
```

The input kind is inferred by the consumer from the data: number with `min`
and `max` → bounded slider; string with `enum` → dropdown; boolean → toggle;
dice expression default → free-form expression input.

For consumers that want a canonical version of that dispatch rather than
re-implementing it, `classifyParameter` returns a tagged `ParameterKind`:

```ts
import { ProgramParameters, classifyParameter } from 'dicerollerts'

ProgramParameters.list(program).map(classifyParameter)
// ParameterKind[] - { kind: "boolean" } | { kind: "string-enum"; choices }
//                 | { kind: "number-enum"; choices }
//                 | { kind: "number-range"; min, max, default }
//                 | { kind: "number"; default } | { kind: "string"; default }
//                 | { kind: "expression" }
```

Precedence: dice-expression default wins, then boolean, then enum (string
or number), then number-range (both `min` and `max` present), then plain
number, then string. The classifier never throws; ambiguous specs collapse
to the most informative kind.
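
To make the precedence order concrete, here is a minimal sketch of such a dispatch. `ParamSpec`, `Kind`, and `classifySketch` are hypothetical local names, and `defaultExpression` is modelled as a string purely for illustration; the library's real `Parameter` / `ParameterKind` types and `classifyParameter` may differ in detail.

```ts
// Hypothetical local shapes, loosely following the fields listed above.
interface ParamSpec {
  default?: number | boolean | string
  defaultExpression?: string // assumed marker for a dice-expression default
  min?: number
  max?: number
  enum?: Array<number | string>
}

type Kind =
  | { kind: 'expression' }
  | { kind: 'boolean' }
  | { kind: 'string-enum'; choices: string[] }
  | { kind: 'number-enum'; choices: number[] }
  | { kind: 'number-range'; min: number; max: number; default: number }
  | { kind: 'number'; default: number }
  | { kind: 'string'; default: string }

function classifySketch(p: ParamSpec): Kind {
  // 1. A dice-expression default wins over everything else.
  if (p.defaultExpression !== undefined) return { kind: 'expression' }
  // 2. Boolean.
  if (typeof p.default === 'boolean') return { kind: 'boolean' }
  // 3. Enum (number or string choices).
  if (p.enum !== undefined) {
    const numbers = p.enum.filter((c): c is number => typeof c === 'number')
    return numbers.length === p.enum.length
      ? { kind: 'number-enum', choices: numbers }
      : { kind: 'string-enum', choices: p.enum.map(String) }
  }
  if (typeof p.default === 'number') {
    // 4. Number range needs both bounds; 5. otherwise a plain number.
    if (p.min !== undefined && p.max !== undefined) {
      return { kind: 'number-range', min: p.min, max: p.max, default: p.default }
    }
    return { kind: 'number', default: p.default }
  }
  // 6. Everything else collapses to a string; never throws.
  return { kind: 'string', default: String(p.default ?? '') }
}

console.log(classifySketch({ default: 5, min: 0, max: 30 }).kind) // 'number-range'
console.log(classifySketch({ default: 5, min: 0 }).kind) // 'number'
console.log(classifySketch({ default: 'longsword', enum: ['longsword', 'dagger'] }).kind) // 'string-enum'
console.log(classifySketch({ defaultExpression: 'd20', min: 1, max: 20 }).kind) // 'expression'
```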

### Reserved words

Cannot be used as record keys or identifiers: `if`, `then`, `else`, `true`,
`false`, `and`, `or`, `not`, `repeat`, `is`, `match`, `dice`, `for`, `in`,
`fold`, `from`, `by`, `asc`, `desc`, `argmin`, `argmax`. The wildcard `_` is
also reserved.

---

# 5. Evaluator

```ts
import { Evaluator } from 'dicerollerts'

const evaluator = new Evaluator(
  (sides) => Math.floor(Math.random() * sides) + 1,
  {
    maxRepeatIterations: 10000, // default 10000
    onDiceTerm: (event) => {
      // event.kind is 'arithmetic' | 'structured'
      // event.node is the lifted dice AST node (Die / NDice / DiceReduce / ...
      // or StructuredDiceRoll for the structured case)
      // event.result is a RollResult tree (arithmetic) or
      // StructuredDiceRollResult (structured)
    },
  },
)

evaluator.run(program)
evaluator.run(program, { parameters: { str_mod: 7 } })
```

## Tokenization

For syntax highlighting and editor integration, the package exports a
`tokenize(source)` function that classifies a source string into a flat
array of position-tagged tokens.

```ts
import { tokenize } from 'dicerollerts'

const tokens = tokenize('[d20 + $mod, d20 + $mod] max')
// → [{ kind: 'punct', text: '[', start: 0, end: 1 },
//    { kind: 'dice',  text: 'd20', ... },
//    { kind: 'whitespace', text: ' ', ... },
//    { kind: 'operator', text: '+', ... },
//    { kind: 'variable', text: '$mod', ... },
//    ...
//    { kind: 'reducer', text: 'max', ... }]
```

Properties:

- **Total coverage.** Every character of `source` is covered;
  `tokens.map(t => t.text).join('')` always equals `source`. Whitespace
  and comments (`# ...`) are first-class tokens.
- **Context-aware classification.** The same word lands in different
  `kind`s depending on grammatical role:
  - `max` in `[a, b] max` → `reducer`;
  - `max` in `max for $x in $xs { $x }` → `aggregator`;
  - `max` in `{ max: 5 }` (plain record) → `identifier`;
  - `max` in `$max` → part of `variable`;
  - `max` in `$x is { max: 10 }` → `param-field`.
- **Error recovery.** `tokenize` never throws and never returns an empty
  array for non-empty input - unrecognized runs are emitted as `error`
  tokens so half-typed editor buffers keep producing useful output.
- **Shorthand splitting.** Dice modifier shorthands split into a role
  token plus a number: `4d6k3` → `dice("4d6") filter("k") number("3")`.

The full set of token kinds:

| Kind             | Description                                            |
| ---------------- | ------------------------------------------------------ |
| `whitespace`     | runs of spaces / tabs / newlines                       |
| `comment`        | `#` to end of line                                     |
| `string`         | `"..."` or `'...'` (symmetric quotes)                  |
| `number`         | integer or decimal, including unary-minus signed forms |
| `dice`           | `d6`, `2d20`, `dF`, `d%`, `d{1,2,3}`                   |
| `variable`       | `$name` (the `$` is part of the token)                 |
| `identifier`     | bare identifier in non-keyword position                |
| `structured-die` | `@name` (the `@` is part of the token)                 |
| `keyword`        | `if`, `then`, `match`, `for`, `is`, …                  |
| `reducer`        | `sum`, `max`, … in `[...] reducer` position            |
| `aggregator`     | `sum`, `count`, `sort`, … in `... for $x in` position  |
| `functor`        | `explode`, `reroll`, … and shorthands `e`, `r`, `ce`   |
| `filter`         | `keep`, `drop`, … and shorthands `k`, `d`, `kh`, `kl`  |
| `param-field`    | `default`, `min`, `max`, … inside `$x is { ... }`      |
| `operator`       | `+ - * / == != < <= > >= = -> and or not × ⋅ ÷`        |
| `punct`          | `( ) [ ] { } , . : ; %`                                |
| `error`          | unrecognized run; tokenizer skipped to keep going      |

`Token` and `TokenKind` are exported as types. Adding a new `TokenKind`
is a minor-version bump; renaming or removing one is breaking. Consumers
can default-case unknown kinds to plain rendering.

## Hooks

`onDiceTerm` fires once per dice term the evaluator executes, in evaluation
order. It fires:

- once per iteration inside `repeat`;
- once per passing element inside a comprehension or fold;
- only for the taken branch of an `if` / `match`;
- for parameter dice defaults (rolled when no override is provided).

The callback receives a tagged `DiceTermResult` event:

- `event.kind === 'arithmetic'`: ordinary dice term (`die`, `n-dice`,
  `custom-die`, `dice-reduce`). `event.result` is a `RollResult` tree.
- `event.kind === 'structured'`: `structured-dice-roll` expression.
  `event.result` is a `StructuredDiceRollResult` carrying `name`, per-die
  `draws`, and the field-wise sum.

`event.node` is the exact AST node from the parsed program - stable across
runs.

`onScope` is a separate callback that fires at scope boundaries -
iteration boundaries (comprehensions, `repeat`, `fold`) and conditional
branch boundaries (`if`, `match`). Consumers that walk the AST in
lockstep with execution - visualizers, debuggers, the dicerun2
inline-math walker - use it to know which iteration or branch they're
in without re-implementing the evaluator's dispatch semantics:

```ts
new Evaluator(rng, {
  onDiceTerm: (e) => {
    /* dice events */
  },
  onScope: (e) => {
    switch (e.kind) {
      case 'iteration-enter':
        /* { node, index, total, element?, accumulator? } */ break
      case 'iteration-exit':
        /* { node, index } */ break
      case 'filter-skip':
        /* { node, index, element } - comprehension filter rejected */ break
      case 'sort-reorder':
        /* { node, sourceToSorted } - emitted once per sort */ break
      case 'branch-enter':
        /* { node, branch, scrutinee? } - if-then / if-else / match-arm */ break
      case 'branch-exit':
        /* { node } - pair with the matching branch-enter */ break
    }
  },
})
```

`element` carries the bound element (comprehensions + fold), `accumulator`
the running fold state. For `sort` comprehensions, body events fire in
source order - `sort-reorder.sourceToSorted` is the permutation
(`sorted[i] === source[sourceToSorted[i]]`), so consumers render the rows
in sorted order without re-running the sort.
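
Applying the permutation is a one-liner once rows have been collected in source order. `inSortedOrder` is a hypothetical helper, and the permutation below is computed by hand from the identity stated above for `[3, 1, 2]` sorted ascending; it is not output captured from the library.

```ts
// sorted[i] === source[sourceToSorted[i]]
function inSortedOrder<T>(
  sourceRows: readonly T[],
  sourceToSorted: readonly number[],
): T[] {
  return sourceToSorted.map((srcIndex) => sourceRows[srcIndex] as T)
}

// Rows gathered while body events fired, i.e. in source order for [3, 1, 2].
const rowsInSourceOrder = ['row for 3', 'row for 1', 'row for 2']
// Hand-derived: sorted is [1, 2, 3], found at source indices 1, 2, 0.
const permutation = [1, 2, 0]

console.log(inSortedOrder(rowsInSourceOrder, permutation))
// ['row for 1', 'row for 2', 'row for 3']
```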

For `branch-enter`, `branch` is a tagged union: `{ kind: 'if-then' }`,
`{ kind: 'if-else' }`, or `{ kind: 'match-arm', armIndex, isWildcard }`
(where `armIndex` is the source-order position of the matched arm and
`isWildcard` is `true` for `_`). `scrutinee` carries the matched value
for value-mode `match VAL { … }`; it is **absent** (not `undefined`) for
`if` and for guard-mode `match { … }`. Use `'scrutinee' in event` as the
value-mode discriminator - the absence pattern survives JSON round-trips.

Branch events nest with iteration events on a shared depth stack. The
canonical consumer pattern is: push on every enter, pop on every exit,
and the stack's depth at any dice event is that die's "open conditional
depth" - useful for animation staggering, debugger frame indicators,
and conditional-frequency analysis. Condition / scrutinee dice fire on
`onDiceTerm` _before_ the matching `branch-enter`; arm body dice fire
between `branch-enter` and `branch-exit`. Both callbacks fire
synchronously from the same evaluator, so push order is evaluation
order - subscribe to both and append to one buffer for a merged stream.
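
A sketch of that push/pop pattern over a merged buffer follows. The event shapes are simplified stand-ins (real dice events are tagged `'arithmetic'` / `'structured'`, and real scope events carry `node`, `index`, and so on), and the sample stream is written by hand for a program like `repeat 2 { if d20 >= 11 then 2d6 else 0 }`, so treat it as an assumed ordering rather than recorded output.

```ts
// Simplified, hypothetical event shapes for the merged stream.
type ScopeEvt = {
  kind: 'iteration-enter' | 'iteration-exit' | 'branch-enter' | 'branch-exit'
}
type DiceEvt = { kind: 'dice'; label: string }
type Merged = ScopeEvt | DiceEvt

// Push on every enter, pop on every exit, read the depth at each dice event.
function depthsAtDice(events: readonly Merged[]): Array<{ label: string; depth: number }> {
  const stack: string[] = []
  const out: Array<{ label: string; depth: number }> = []
  for (const e of events) {
    switch (e.kind) {
      case 'iteration-enter':
      case 'branch-enter':
        stack.push(e.kind)
        break
      case 'iteration-exit':
      case 'branch-exit':
        stack.pop()
        break
      case 'dice':
        out.push({ label: e.label, depth: stack.length })
        break
    }
  }
  return out
}

const mergedStream: Merged[] = [
  { kind: 'iteration-enter' },
  { kind: 'dice', label: 'd20' }, // condition dice fire before branch-enter
  { kind: 'branch-enter' },
  { kind: 'dice', label: '2d6' }, // arm body dice fire inside the branch
  { kind: 'branch-exit' },
  { kind: 'iteration-exit' },
  { kind: 'iteration-enter' },
  { kind: 'dice', label: 'd20' },
  { kind: 'branch-enter' }, // else arm: no dice
  { kind: 'branch-exit' },
  { kind: 'iteration-exit' },
]

console.log(depthsAtDice(mergedStream))
// [{ label: 'd20', depth: 1 }, { label: '2d6', depth: 2 }, { label: 'd20', depth: 1 }]
```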

If an arm body throws, `branch-exit` does not fire (the event log is
left mid-scope, mirroring the iteration convention). A `match` whose
arms all fail raises `No match arm fired` before any `branch-enter`,
so the stack stays balanced.

`recordRun` includes scope events in `log.events` only when called with
`{ captureScopeEvents: true }` (default `false`, so existing replay
consumers stay unchanged).

`onAssignment` is a third callback that fires once per execution of an
`Assignment` statement, after the RHS resolves. Drives Monaco-style
inlay hints and any other "show me the resolved value of `$name = …`"
tooling that can't run the program client-side because the RHS contains
random dice:

```ts
const values = new Map<Assignment, Value>()
new Evaluator(rng, {
  onAssignment: (e) => values.set(e.node, e.value),
}).run(program)
// values keyed by AST node identity - last write of each Assignment wins
```

Fires **once per execution**, so the same assignment inside `repeat N`
fires N times with iteration-distinct values. Consumers that want one
value per source location should key by `event.node` (or
`event.node.loc.offset`) and use last-write-wins - keying by
`event.node.name` is a footgun because different scopes can declare the
same name with distinct AST nodes. Dice events in the RHS fire on
`onDiceTerm` before the matching `onAssignment` event. Does NOT fire
for parameter declarations (`$x is { … }`) or comprehension / fold
binders.

## Replay / audit log

```ts
import {
  recordRun,
  replayRun,
  logToJSON,
  logFromJSON,
  ReplayDesyncError,
  type ExecutionLog,
} from 'dicerollerts'

const { value, log } = recordRun(program, rollFn, {
  parameters: { x: 5 },
  maxRepeatIterations: 1000,
})

// log.draws: number[] - every value returned by rollFn, in evaluation order
// log.events: ExecutionEvent[] - one entry per dice term (arithmetic or structured)

// Replay using only the recorded draws (rollFn is not called):
const { value: replayed } = replayRun(program, log)
// throws ReplayDesyncError if log under- or over-supplies draws

// JSON round-trip (Maps are not used in the log; standard JSON.stringify works):
const json = logToJSON(log)
const restored = logFromJSON(json)
```

`recordRun` and `replayRun` accept the same `parameters` and
`maxRepeatIterations` options as `Evaluator.run`. Branching, comprehensions,
and `repeat` are all handled - only the draws actually consumed during the
recording are stored, and replaying takes the same branches deterministically.
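
The underlying idea (record every value the roll function returns, then feed the same values back) can be sketched without the library. All names below are hypothetical, `attack` is a stand-in for a program, and a plain `Error` stands in for `ReplayDesyncError`; the real `recordRun` / `replayRun` do more (events, parameters, iteration caps).

```ts
type RollFn = (sides: number) => number

// Wrap a roll function so every value it returns is appended to `draws`.
function recording(roll: RollFn, draws: number[]): RollFn {
  return (sides) => {
    const value = roll(sides)
    draws.push(value)
    return value
  }
}

// Serve recorded draws back in order; fail loudly if they run out.
function replaying(draws: readonly number[]): { roll: RollFn; remaining: () => number } {
  let next = 0
  return {
    roll: () => {
      if (next >= draws.length) throw new Error('replay desync: log under-supplies draws')
      return draws[next++] as number
    },
    remaining: () => draws.length - next, // non-zero after a run means over-supply
  }
}

// Stand-in "program": roll a d20 and roll 2d6 only on 11 or more.
function attack(roll: RollFn): number {
  return roll(20) >= 11 ? roll(6) + roll(6) : 0
}

// Scripted live rolls so the example is reproducible.
const scriptedValues = [15, 4, 2]
let scriptCursor = 0
const recordedDraws: number[] = []
const liveValue = attack(recording(() => scriptedValues[scriptCursor++] as number, recordedDraws))

const replayer = replaying(recordedDraws)
const replayedValue = attack(replayer.roll)

console.log(liveValue, replayedValue) // 6 6
console.log(recordedDraws) // [15, 4, 2] - only the draws actually consumed
console.log(replayer.remaining()) // 0
```

Because the replayed d20 is again 15, the same branch is taken and the same two d6 draws are consumed, which is why storing only the consumed draws is enough.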

---

# 6. Probability analysis on programs (`ProgramStats`)

`ProgramStats.analyze()` picks one of three strategies:

- **`constant`** - no randomness, single evaluation.
- **`exact`** - covers single dice expressions, comparisons, conditionals,
  arithmetic on independent distributions, shared variables (joint
  enumeration), boolean ops, sort / slice / index over shared rolls,
  comprehensions over `sum` and `sort`, `if`-driven discriminated outputs
  (per-variant conditional stats), and more.
- **`monte-carlo`** - adaptive batched simulation; stops when per-bin
  frequencies stabilise.

```ts
import { ProgramStats } from 'dicerollerts'

const result = ProgramStats.analyze(program, {
  minTrials: 1000, // initial batch
  maxTrials: 100000, // cap (default 100000)
  batchSize: 1000, // batch increment (default 1000)
  targetRelativeError: 0.01, // 1% convergence target
  parameters: { x: 5 }, // optional overrides
  signal, // optional AbortSignal for cancellation
})

result.strategy.tier // 'constant' | 'exact' | 'monte-carlo'
result.strategy.trials // actual trials run (Monte Carlo only)
result.strategy.converged // hit target before maxTrials
result.stats // FieldStats (see below)
result.diagnostics // { classifyTimeMs, analyzeTimeMs, fellBackToMC, jointSizeMax? }

// Manual classification:
ProgramStats.classify(program) // Tier
ProgramStats.classify(program, { parameters: { x: 5 } }) // Tier
```
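
To give a feel for what `minTrials`, `maxTrials`, `batchSize`, and `targetRelativeError` control, here is a generic adaptive-batch loop for a single probability. It is a sketch only: the stopping rule (relative standard error of a binomial proportion) is an assumption chosen for illustration, and the library's actual per-bin convergence test is not documented in this section. `lcg` and `estimateProbability` are hypothetical names.

```ts
// Small deterministic generator so the sketch is reproducible.
function lcg(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0
    return state / 4294967296
  }
}

interface McOptions {
  minTrials: number
  maxTrials: number
  batchSize: number
  targetRelativeError: number
}

function estimateProbability(trial: () => boolean, opts: McOptions) {
  let trials = 0
  let hits = 0
  let converged = false
  while (trials < opts.maxTrials && !converged) {
    // Run one batch, never exceeding the cap.
    const batch = Math.min(opts.batchSize, opts.maxTrials - trials)
    for (let i = 0; i < batch; i++) if (trial()) hits++
    trials += batch
    const p = hits / trials
    // Assumed stopping rule: relative standard error of the proportion.
    const relErr = p > 0 ? Math.sqrt((1 - p) / (p * trials)) : Infinity
    converged = trials >= opts.minTrials && relErr <= opts.targetRelativeError
  }
  return { p: hits / trials, trials, converged }
}

const rand = lcg(42)
const rollD20 = () => Math.floor(rand() * 20) + 1

// P(d20 + 5 >= 15) is exactly 11/20 = 0.55; the loop should land near it.
const estimate = estimateProbability(() => rollD20() + 5 >= 15, {
  minTrials: 1000,
  maxTrials: 100000,
  batchSize: 1000,
  targetRelativeError: 0.01,
})

console.log(estimate.p, estimate.trials, estimate.converged)
```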

### Per-binding distributions

Pass `perBinding: true` to additionally get the marginal distribution of
every top-level `$assignment` and `$parameter`:

```ts
const result = ProgramStats.analyze(program, { perBinding: true })

for (const [name, binding] of result.perBinding!) {
  // binding.kind: 'assignment' | 'parameter'
  // binding.stats: FieldStats - same shape as result.stats
}
```

Loop binders (`for $x in …`) and nested-scope assignments are not
included - they don't reduce to a single distribution. The added cost is
zero in the `exact` tier (the SymDists already exist), light in `constant`, and
linear in trials × bindings for `monte-carlo` (every sampled value is
retained until aggregation). In the rare case where the exact tier
can't represent a binding's joint, that binding is _omitted_ from the
map; the same program under `monte-carlo` captures it via per-trial
hooks.

## FieldStats

`FieldStats` is a discriminated union covering the shape of the program's
output:

- **`number`**: `mean`, `stddev`, `variance`, `mode`, `min`, `max`,
  `distribution: Map<number, number>`, `cdf: Map<number, number>`,
  `percentiles` (p5/p10/p25/p50/p75/p90/p95), `skewness`, `kurtosis`,
  optional `standardError` (Monte Carlo only).
- **`partial-number`**: every field of `number` plus
  `undefinedMass: number` (strictly in `(0, 1)`). Produced when some outcomes
  are undefined (currently only division by zero, e.g. `d6 / (d6 - 1)`).
  The numeric stats are CONDITIONAL on the outcome being defined and
  `distribution` sums to 1; multiply by `(1 - undefinedMass)` to
  recover absolute probability mass.
- **`undefined`**: every outcome is undefined (e.g. `1 / 0`). No
  numeric stats - the field has just `type: 'undefined'`.
- **`boolean`**: `pTrue` (probability in `[0, 1]`), optional `standardError`.
- **`string`**: `frequencies: Map<string, number>`, optional per-bucket
  `standardErrors`.
- **`array`**: `elements: FieldStats[]`, optional `aggregate: NumberAggregateStats`
  (pooled stats when all elements are numeric), optional
  `joint: Map<string, number>` (full joint distribution over the array values),
  optional `jointTruncated: boolean`, optional
  `jointStructured: JointDistribution` (decoded form - see below).
- **`record`**: `fields: Record<string, FieldStats>`, optional
  `joint: Map<string, number>`, optional `jointTruncated: boolean`, optional
  `jointStructured: JointDistribution`.
- **`discriminated`**: `discriminator: 'kind' | 'shape'`, `variants: DiscriminatedVariant[]`.
- **`mixed`**: the output isn't of a single uniform kind across trials.

`NumberAggregateStats` (used by `array.aggregate`): `mean`, `stddev`, `min`,
`max`, `distribution`, `cdf`, `percentiles`, `count`.

## Joint distributions

When a record or array's joint distribution can be computed, `joint` is a
`Map<string, number>` keyed by **plain JSON-encoded values** (no internal
sentinel - decode with `JSON.parse`). Per-position / per-field marginals are
always available via `elements[i]` / `fields[k]`.

```ts
const result = ProgramStats.analyze(program)

if (result.stats.type === 'array' && result.stats.joint) {
  for (const [key, prob] of result.stats.joint) {
    const values = JSON.parse(key) // e.g. [1, 6]
    console.log(values, prob)
  }
}

if (result.stats.type === 'record' && result.stats.joint) {
  for (const [key, prob] of result.stats.joint) {
    const record = JSON.parse(key) // e.g. {x: 4, y: 3}
  }
}
```

### `jointStructured` - decoded tuples, no key parsing

`jointStructured` is the same distribution as `joint`, decoded once so you
never `JSON.parse` a key and numeric ordering / binning survive. It is
populated wherever `joint` (or `jointTruncated`) is, with one
`dimensions` descriptor per field / position in **canonical (sorted) order**
and one `outcomes` entry per joint cell:

```ts
interface JointDistribution {
  dimensions: ReadonlyArray<{ name: string }> // field name, or "0","1",… for arrays
  outcomes: ReadonlyArray<{ values: Value[]; p: number }> // values aligned to dimensions
  truncated?: boolean // mirrors jointTruncated: outcomes is then empty
}

const result = ProgramStats.analyze(program) // `{ atk: 1d20, dmg: 2d6 }`
const js = result.stats.type === 'record' && result.stats.jointStructured
if (js) {
  const atk = js.dimensions.findIndex((d) => d.name === 'atk')
  // Numbers stay numbers - pivot / sort / bin directly, no key parsing.
  for (const { values, p } of js.outcomes) {
    console.log(values[atk], p) // e.g. 17, 0.00139 (atk 17 with dmg 2)
  }
}
```

When the joint is truncated, `jointStructured.truncated` is `true`,
`outcomes` is empty, and `dimensions` still lists the complete field /
position set. The legacy `joint: Map<string, number>` is unchanged and
remains available alongside it.

The `joint` is omitted and `jointTruncated: true` is set when the joint
exceeds the relevant cap:

| Cap                     | Default   | Used for                                                                          |
| ----------------------- | --------- | --------------------------------------------------------------------------------- |
| `MAX_JOINT_SIZE`        | 300,000   | Most exact-tier joint distributions                                               |
| `MAX_REPEAT_JOINT_SIZE` | 300,000   | `repeat-expr` arrays (e.g. `repeat 7 { d6 }` = 6^7)                               |
| `MAX_SORT_JOINT_SIZE`   | 500,000   | Sum-over-sorted-slice path                                                        |
| `MAX_COMPOSITION_COUNT` | 1,000,000 | Multinomial-composition enumerator for homogeneous keep/drop (`9d6 keep 1`, etc.) |

These caps are exported from the internal module `sym-dist.ts` and applied by
the analysis pipeline. They are not re-exported as public API symbols, so treat
the values above as documentation of current behaviour rather than importable
constants.

## Discriminated outputs

When a program's output can take multiple record shapes, use a `kind` field
to discriminate variants. The analyser detects this and produces per-variant
statistics:

```
$hit = d20 + 5 >= 15
if $hit
  then { kind: "hit", damage: 2d6 + 3 }
  else { kind: "miss", margin: 0 }
```

The result is a `discriminated` `FieldStats` with one entry per `kind` value,
each containing the probability of that variant and the marginal stats for
its fields. The `kind` field itself is constant within each variant and is
not included in the per-variant stats.

If trials produce records with different keys but no `kind` field, the
analyser falls back to grouping by key set (`discriminator: 'shape'`). Prefer
the `kind` convention for consistent UI rendering.

`DiscriminatedVariant` shape:

```ts
interface DiscriminatedVariant {
  tag: string // kind value or shape signature
  probability: number // share of trials matching this variant
  standardError?: number // Monte Carlo only
  keys: string[] // field names in this variant (excluding kind)
  fields: Record<string, FieldStats> // marginal stats per field, conditional on variant
}
```

### Conditional field stats

When a variant's field references the same dice as the discriminating
condition, the field's stats are computed _conditional on that variant being
chosen_:

```
$attack = d20
if $attack >= 11
  then { kind: "hit", attack: $attack }
  else { kind: "miss", attack: $attack }
```

The `hit` variant's `attack` field has distribution {11..20} (each 1/10), not
the unconditional {1..20}. Same for `miss`: {1..10}. This works for arbitrary
if-then-else ladders and is computed exactly when feasible. Very large joint
distributions fall back to Monte Carlo with the same shape.
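
The `hit` / `miss` numbers above can be reproduced by brute-force enumeration. The sketch assumes equally likely outcomes (a fair d20) and uses a hypothetical `conditionalDistribution` helper; it illustrates the conditioning step only, not how the analyser computes it.

```ts
// Condition a list of equally likely outcomes on a predicate.
function conditionalDistribution(
  outcomes: readonly number[],
  keep: (v: number) => boolean,
): { probability: number; distribution: Map<number, number> } {
  const kept = outcomes.filter(keep)
  const distribution = new Map<number, number>()
  for (const v of kept) {
    distribution.set(v, (distribution.get(v) ?? 0) + 1 / kept.length)
  }
  return { probability: kept.length / outcomes.length, distribution }
}

const d20Faces = Array.from({ length: 20 }, (_, i) => i + 1)
const hitVariant = conditionalDistribution(d20Faces, (v) => v >= 11)
const missVariant = conditionalDistribution(d20Faces, (v) => v < 11)

console.log(hitVariant.probability) // 0.5
console.log(hitVariant.distribution.get(11)) // 0.1
console.log(hitVariant.distribution.has(10)) // false
console.log(missVariant.distribution.size) // 10
```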

## Diagnosing tier choices: `explainTier`

When a program lands in `monte-carlo` and you want to know why, call
`ProgramStats.explainTier`:

```ts
const explanation = ProgramStats.explainTier(program)
// explanation.tier: 'constant' | 'exact' | 'monte-carlo'
// explanation.reason: string (top-level summary)
// explanation.contributors: TierExplanationContributor[] | undefined

// Each contributor:
// {
//   location?: string,   // e.g. 'statement 3'
//   nodeType: string,    // e.g. 'repeat-expr'
//   cause: string,       // why this node prevented exact analysis
// }
```

For exact / constant tiers, `contributors` is empty. For Monte Carlo, the
output identifies the AST nodes that blocked exact analysis with short
heuristic causes (e.g. "repeat with non-constant count goes Monte Carlo",
"match has no exhaustive default arm", "dice expression contains a variable
reference"). The output is informational, not a stable parseable contract.

Optional parameter overrides apply just like in `analyze`:

```ts
ProgramStats.explainTier(program, { x: 4 })
```

## Async analysis with cancellation

```ts
const controller = new AbortController()

for await (const progress of ProgramStats.analyzeAsync(program, {
  signal: controller.signal,
  yieldEvery: 1000, // emit progress every N trials (defaults to batchSize)
  yieldEveryMs: 150, // OR every N wall-clock ms (wins when both are set)
  convergenceTimeout: 2000, // soft wall-clock cap; maxTrials still applies
  minTrials: 1000,
  maxTrials: 100000,
  batchSize: 1000,
  targetRelativeError: 0.01,
})) {
  // progress.stats: FieldStats - running snapshot
  // progress.trials: number - trials completed so far (0 for non-MC tiers)
  // progress.converged: boolean - true once threshold met
  // progress.fieldConvergence?: FieldConvergence[] - per-field breakdown
  // progress.perBinding?: ReadonlyMap<string, BindingStats>
  //   - present iff `perBinding: true` was passed; rebuilt each yield
  updateUI(progress)
}
```

For `constant` and `exact` tiers, `analyzeAsync` yields a single final
snapshot with `converged: true`. For Monte Carlo, it yields snapshots in
chunks. Calling `controller.abort()` causes the next `await` to throw an
`AbortError`.

`yieldEveryMs` gates progress by wall-clock rather than trial count and is
preferred when both are set - snapshot cost grows as new distinct outcomes
are sampled, so a fixed trial gate produces non-linear pauses on long runs.
`convergenceTimeout` caps the loop's wall-clock budget; the final progress
carries `converged: false` when the timeout fires and `fieldConvergence[]`
shows which fields are still noisy.

`fieldConvergence[]` is populated on the Monte Carlo path. Each entry is:

```ts
interface FieldConvergence {
  path: string // "", "damage", "Hit", "Hit.damage", "0" (array idx)
  converged: boolean
  samples: number // total trials for the root; fewer inside variants
  relativeError: number | null // null for mixed-type / structural nodes
}
```

For discriminated records, the variant tag is a path segment that names the
variant's frequency entry; per-variant fields nest below it (e.g. `Hit` is
the variant probability, `Hit.damage` is the damage field within Hit-tagged
trials).

## Comparing two programs

`ProgramStats.compare(a, b, options?)` analyses both programs and, when both
produce a numeric output, returns a `numeric` summary combining the two
distributions:

```ts
const result = ProgramStats.compare(programA, programB)

result.a.stats // AnalyzeResult for programA
result.b.stats // AnalyzeResult for programB

result.numeric?.probabilityAGreaterThanB
result.numeric?.probabilityAEqualsB
result.numeric?.probabilityALessThanB
result.numeric?.totalVariationDistance
result.numeric?.klDivergenceAFromB // may be Infinity when the supports differ
result.numeric?.klDivergenceBFromA
result.numeric?.meanDiff // mean(a) - mean(b)
result.numeric?.stddevDiff // stddev(a) - stddev(b)
```

When either program's output is non-numeric (record, array, boolean, string,
mixed), the `numeric` field is omitted; the per-program analyses are still
available.
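
For reference, the textbook definitions behind three of these numbers are short enough to write out. The functions below are a sketch with hypothetical names (prefixed `sketch` to avoid confusion with the library's exports); they assume independent distributions and natural-log KL, and the library's own conventions (log base, which direction `AFromB` denotes) are not confirmed here.

```ts
type Dist = ReadonlyMap<number, number>

function uniformDie(sides: number): Map<number, number> {
  const dist = new Map<number, number>()
  for (let face = 1; face <= sides; face++) dist.set(face, 1 / sides)
  return dist
}

// P(X > Y) for independent X ~ a, Y ~ b.
function sketchProbabilityGreater(a: Dist, b: Dist): number {
  let total = 0
  for (const [x, px] of a) for (const [y, py] of b) if (x > y) total += px * py
  return total
}

// Half the L1 distance between the two mass functions.
function sketchTotalVariation(a: Dist, b: Dist): number {
  const support = new Set<number>([...Array.from(a.keys()), ...Array.from(b.keys())])
  let sum = 0
  support.forEach((v) => {
    sum += Math.abs((a.get(v) ?? 0) - (b.get(v) ?? 0))
  })
  return sum / 2
}

// D(p || q) with natural log; Infinity when p has mass where q has none.
function sketchKl(p: Dist, q: Dist): number {
  let sum = 0
  for (const [v, pv] of p) {
    if (pv === 0) continue
    const qv = q.get(v) ?? 0
    if (qv === 0) return Infinity
    sum += pv * Math.log(pv / qv)
  }
  return sum
}

const d6Dist = uniformDie(6)
const d4Dist = uniformDie(4)

console.log(sketchProbabilityGreater(d6Dist, d4Dist)) // 14/24, about 0.5833
console.log(sketchTotalVariation(d6Dist, d4Dist)) // 1/3
console.log(sketchKl(d6Dist, d4Dist)) // Infinity (d6 can roll 5 and 6)
console.log(sketchKl(d4Dist, d6Dist)) // ln(1.5), about 0.4055
```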

## Stats utilities

```ts
import {
  fieldStatsToJSON,
  fieldStatsFromJSON, // JSON round-trip (Maps preserved)
  totalVariationDistance,
  klDivergence, // distribution comparison
  probabilityGreaterThan,
  probabilityLessThan,
  probabilityEqual, // P(X op Y) for independent X, Y
  boxPlotData, // quartiles / whiskers / outliers
  sampleFromDistribution, // sample n values via inverse-CDF
  fieldFromRecord,
  elementFromArray, // accessor helpers
  suggestBucketSize,
  binDistribution, // chart helpers (number bucketing)
} from 'dicerollerts'

const bucketSize = suggestBucketSize(stats.min, stats.max, 100)
const binned = binDistribution(stats.distribution, bucketSize)

const bp = boxPlotData(distribution)
// bp.min, bp.q1, bp.median, bp.q3, bp.max
// bp.iqr, bp.lowerWhisker, bp.upperWhisker, bp.outliers
```

`fieldStatsToJSON` / `fieldStatsFromJSON` preserve Maps (encoded as
`{ __map: [[k, v], ...] }` internally) so the full FieldStats tree survives a
JSON round-trip.
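
If you need the same trick for your own wrapper objects, a tagged replacer / reviver pair is enough. This is a sketch: the `__map` tag comes from the description above, but `toJSONWithMaps` / `fromJSONWithMaps` are hypothetical names and the exact encoding the library emits may differ, so prefer the exported helpers for real `FieldStats` values.

```ts
const MAP_TAG = '__map'

// Encode every Map as { __map: [[k, v], ...] } while stringifying.
function toJSONWithMaps(value: unknown): string {
  return JSON.stringify(value, (_key, v) =>
    v instanceof Map ? { [MAP_TAG]: Array.from(v.entries()) } : v,
  )
}

// Rebuild Maps from the tagged form while parsing.
function fromJSONWithMaps(text: string): unknown {
  return JSON.parse(text, (_key, v) =>
    v !== null && typeof v === 'object' && Array.isArray(v[MAP_TAG])
      ? new Map(v[MAP_TAG])
      : v,
  )
}

// A stats-like object with a numeric-keyed Map (illustrative values).
const statsLike = {
  type: 'number',
  mean: 1.5,
  distribution: new Map<number, number>([
    [1, 0.5],
    [2, 0.5],
  ]),
}

const restoredStats = fromJSONWithMaps(toJSONWithMaps(statsLike)) as typeof statsLike

console.log(restoredStats.distribution instanceof Map) // true
console.log(restoredStats.distribution.get(2)) // 0.5 (numeric keys stay numbers)
```

Encoding entries as pairs rather than as an object keeps numeric keys numeric, which a plain `Object.fromEntries` round-trip would turn into strings.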

---

# 7. Language metadata (for editor integrations)

Structured, JSON-serialisable description of the dice language's surface
vocabulary - keywords, aggregators, dice reducers/functors/filters, operators,
and parameter field names (the fields inside `$x is { ... }`). Designed for
Monaco/Monarch tokenizer rules, completion providers, hover providers, and
`--help` generators.

```ts
import {
  LANGUAGE_METADATA_VERSION,
  LANGUAGE_ENTRIES, // tagged-union flat list of every entry
  KEYWORDS,
  AGGREGATORS,
  DICE_REDUCERS,
  DICE_FUNCTORS,
  DICE_FILTERS,
  OPERATORS,
  PARAM_FIELDS, // per-kind filtered views
  ALL_TOKENS, // deduped, sorted list of every token the tokenizer should recognise
  lookupEntry, // (token) => Entry | null (matches name, operator symbol, or shorthand)
} from 'dicerollerts'

LANGUAGE_METADATA_VERSION // '1'

const explode = lookupEntry('explode')
// { kind: 'dice-functor', id: 'fn-explode', name: 'explode',
//   shorthands: ['e'], summary: '...', details: '...', examples: [...] }

lookupEntry('==') // { kind: 'operator', id: 'op-eq', ... }
lookupEntry('k') // { kind: 'dice-filter', id: 'fl-keep', ... }
```

Each entry carries a stable `id` (safe for lookup tables across versions), a
`summary` for completion list details, `details` for hover popups, and
`examples` that are guaranteed to parse - a round-trip test in
`test/language-metadata.spec.ts` runs every example through `ProgramParser`
on every CI build, so the metadata cannot drift away from the parser without
breaking the test suite.

Signature schemas for signature help, deprecation flags, and i18n are
deliberately out of scope at version `'1'`.
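
One way to feed a token list such as `ALL_TOKENS` into a tokenizer rule is to compile it into a single alternation regex. The sketch below is an illustration under assumptions: `escapeRegExp`, `tokenPattern`, and the `sampleTokens` array are hypothetical (the sample is not the library's real token list), and the helper sorts longest-first itself rather than relying on the order of the input.

```ts
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(token: string): string {
  return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Build one alternation that prefers longer tokens, so '>=' wins over '>'.
function tokenPattern(tokens: readonly string[]): RegExp {
  const alternatives = Array.from(tokens)
    .sort((a, b) => b.length - a.length || a.localeCompare(b))
    .map(escapeRegExp)
  return new RegExp(`(?:${alternatives.join('|')})`, 'g')
}

// Hypothetical sample; in real use this would be the exported token list.
const sampleTokens = ['>', '>=', '==', 'explode', 'drop']

const pattern = tokenPattern(sampleTokens)
const matches = 'd6 >= 4 explode'.match(pattern)
console.log(matches) // [ '>=', 'explode' ]
```

Word-like tokens would normally also need `\b` boundaries so that a keyword such as `drop` does not match inside a longer identifier.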

---

# 8. Semantic diagnostics

`analyzeProgram(program, options?)` runs a static pass over a parsed
program and returns an array of `SemanticDiagnostic` entries - problems
that are syntactically valid but semantically wrong. Designed for Monaco /
VSCode markers (each diagnostic carries a `SourceSpan { offset, length }`)
and pre-flight checks.

```ts
import { analyzeProgram, ProgramParser } from 'dicerollerts'

const { success, program } = ProgramParser.parse(source)
if (success) {
  const diagnostics = analyzeProgram(program, { deep: true })
  // [{ severity, code, message, span, related? }, ...]
}
```

Tier 1 (always on, zero-false-positive):

- `undefined-variable` - `$name` not bound in any enclosing scope.
- `duplicate-binding` - second binding of `$name` in the same scope.
  Includes `related` pointing at the first declaration. Nested-scope
  shadowing is intentionally not flagged.
- `unreachable-match-arm` - arms after a guard-less `_` wildcard.
- `unreachable-if-branch` - the dead branch of `if true` / `if false`.

Tier 2 (opt-in via `{ deep: true }`):

- `type-mismatch` - operand statically incompatible with the expected type
  (arithmetic on a boolean, `not` on a number, etc.). Conservative: when
  the operand has a union type, the diagnostic fires only if every member
  is incompatible - `if cond then 1 else "x"` used as a number does not
  fire.
- `never-fires-match-arm` - a literal pattern outside the scrutinee's
  exact value support, e.g. `match d6 { 7 -> ... }`. Uses
  `ProgramStats.analyze` internally; only fires when analysis succeeds at
  the `exact` or `constant` tier.

Tier 2 adds real cost (one `ProgramStats.analyze` call per match
expression). In keystroke-level loops, keep `deep: false` (the default)
or debounce the deep pass.

Each diagnostic's `span` comes from the positions the parser records on
AST nodes. ASTs built by hand with the public AST builders carry no
positions, so diagnostics on them report a zero span. The `stripLocs(node)` helper is
exported for callers that want position-free clones.
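
Editor markers are usually addressed by 1-based line and column rather than by offset, so a span needs converting before it can be shown. The following is a minimal sketch of that conversion, assuming `offset` counts UTF-16 code units from the start of the source string (an assumption, not something stated above). `positionAt` and `spanToRange` are hypothetical helpers; the four output field names follow Monaco's marker range convention.

```ts
// Shape taken from the text above: a character offset plus a length.
interface SourceSpan {
  offset: number
  length: number
}

// 1-based line/column pair.
interface Position {
  lineNumber: number
  column: number
}

// Walk the source once, counting newlines before the (clamped) offset.
function positionAt(source: string, offset: number): Position {
  const clamped = Math.max(0, Math.min(offset, source.length))
  let lineNumber = 1
  let lineStart = 0
  for (let i = 0; i < clamped; i++) {
    if (source[i] === '\n') {
      lineNumber++
      lineStart = i + 1
    }
  }
  return { lineNumber, column: clamped - lineStart + 1 }
}

// Convert a span into start/end line and column numbers.
function spanToRange(source: string, span: SourceSpan) {
  const start = positionAt(source, span.offset)
  const end = positionAt(source, span.offset + span.length)
  return {
    startLineNumber: start.lineNumber,
    startColumn: start.column,
    endLineNumber: end.lineNumber,
    endColumn: end.column,
  }
}

const source = '$x = 5\n$y + 1'
const range = spanToRange(source, { offset: 7, length: 2 }) // covers `$y`
console.log(range)
// { startLineNumber: 2, startColumn: 1, endLineNumber: 2, endColumn: 3 }
```

A zero span (`offset: 0, length: 0`) maps to an empty range at line 1, column 1, which is one reasonable way to surface diagnostics from hand-built ASTs.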

---

# 9. Programmatic AST construction

For tooling that needs to build dice expressions or programs without going
through the parser.

**Source positions.** Every Statement, Expression, MatchArm, and lifted
dice node accepts an optional `loc?: SourceSpan`. `ProgramParser.parse`
populates `loc` on every node it constructs; the constructor functions
below leave `loc` undefined so hand-built ASTs remain ergonomic. Spans
are best-effort - left-associative parser loops may include trailing
whitespace before the next operator. The dice-grammar's internal builders
do not track positions, so a `DiceReduce` node returned by the parser
carries `loc`, but the `NDice` nested inside it does not.

`stripLocs(node)` is exported for callers that want a position-free clone
(used internally by `canonicalize` so semantic equality ignores formatting).

## Dice expression AST

```ts
import {
  die,
  customDie,
  literal,
  binaryOp,
  unaryOp,
  diceReduce,
  diceExpressions,
  homogeneousDiceExpressions,
  homogeneousCustomDice,
  diceListWithFilter,
  filterableDiceArray,
  filterableDiceExpressions,
  filterableHomogeneous,
  filterableHomogeneousCustom,
  diceListWithMap,
  diceListWithMapHomogeneous,
  drop,
  keep,
  explode,
  reroll,
  compound,
  emphasis,
  upTo,
  always,
  exact,
  between,
  valueOrMore,
  valueOrLess,
  onMax,
  diceVariableRef,
  nDice,
  nDiceLit,
  nDiceVar,
  structuredDiceRoll,
} from 'dicerollerts'

// 3d6 + 5
const expr = binaryOp(
  'sum',
  diceReduce(diceExpressions(die(6), die(6), die(6)), 'sum'),
  literal(5),
)

// 4d6 drop lowest 1
const abilityScore = diceReduce(
  diceListWithFilter(filterableDiceArray([6, 6, 6, 6]), [drop('low', 1)]),
  'sum',
)

// d6 explode once on 6
const exploding = diceReduce(
  diceListWithMap([6], explode(upTo(1), exact(6))),
  'sum',
)

// 4dF
const fateDice = diceReduce(
  diceExpressions(
    customDie([-1, 0, 1]),
    customDie([-1, 0, 1]),
    customDie([-1, 0, 1]),
    customDie([-1, 0, 1]),
  ),
  'sum',
)

// 8d10 count >= 6
const dicePool = diceReduce(
  diceExpressions(...Array.from({ length: 8 }, () => die(10))),
  { type: 'count', thresholds: [valueOrMore(6)] },
)

// 5d10 count on 6 or more and 10 (multi-step; a 10 counts twice)
const multiStepPool = diceReduce(
  diceExpressions(...Array.from({ length: 5 }, () => die(10))),
  { type: 'count', thresholds: [valueOrMore(6), exact(10)] },
)
```

## Program AST

The program AST builders are exported from the package root:
`program`, `assignment`, `expressionStatement`, `numberLiteral`,
`booleanLiteral`, `stringLiteral`, `variableRef`, `diceExpr`, `binaryExpr`,
`unaryExpr`, `ifExpr`, `recordExpr`, `arrayExpr`, `repeatExpr`,
`fieldAccess`, `indexAccess`, `matchExpr`, `matchArm`, `wildcardPattern`,
`expressionPattern`, `parameterDeclaration`, `diceDeclaration`,
`comprehensionExpr`, `foldExpr`, `runtimeError`. Type exports cover every
node shape.

## Distribution algebra (`Distribution`)

For building custom analysis pipelines without the program language. All
combinators assume the input distributions are independent - for correlated
sources use `ProgramStats`, which tracks shared sources internally.

```ts
import { Distribution } from 'dicerollerts'

const d6   = Distribution.uniform([1, 2, 3, 4, 5, 6])
const sum2 = Distribution.add(d6, d6)
const hit  = Distribution.greaterOrEqualConst(sum2, 8)

// Constructors
Distribution.singleton(value)
Distribution.uniform(values)
Distribution.from(map)
Distribution.fromWeights([[v, w], ...])

// Operations
Distribution.map(d, fn)
Distribution.combine(a, b, fn)
Distribution.conditional(boolDist, thenDist, elseDist)

// Numeric
Distribution.add | subtract | multiply | negate
Distribution.repeat(d, n)            // sum of n independent copies (n ≥ 0 integer)
Distribution.mean(d) | variance(d)
Distribution.min(d) | max(d)

// Boolean
Distribution.and | or | not

// Comparisons (numeric → boolean)
Distribution.greaterThan | lessThan | greaterOrEqual | lessOrEqual | equal
Distribution.greaterThanConst | lessThanConst | ...

// Misc
Distribution.probabilityOf(d, predicate)
Distribution.fromDiceExpression(expr)  // bridge to DiceStats
```
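
To make the independence assumption concrete, the sketch below shows what adding two independent distributions amounts to (a convolution) and how that differs from doubling a single roll. It is a from-scratch illustration rather than the library's code: `Pmf`, `addIndependent`, `scale`, and `probAtLeast` are hypothetical helpers, and `uniform` here assumes distinct values.

```ts
// Illustrative only: a probability mass function as value -> probability.
type Pmf = Map<number, number>

const uniform = (values: number[]): Pmf =>
  new Map(values.map((v): [number, number] => [v, 1 / values.length]))

// Sum of two independent variables: each pair (x, y) adds p(x) * q(y) to x + y.
function addIndependent(a: Pmf, b: Pmf): Pmf {
  const out: Pmf = new Map()
  a.forEach((px, x) => {
    b.forEach((qy, y) => {
      out.set(x + y, (out.get(x + y) ?? 0) + px * qy)
    })
  })
  return out
}

// One variable multiplied by a constant: a fully correlated "sum with itself".
function scale(a: Pmf, factor: number): Pmf {
  const out: Pmf = new Map()
  a.forEach((p, x) => {
    out.set(x * factor, (out.get(x * factor) ?? 0) + p)
  })
  return out
}

// Probability mass at or above a threshold.
function probAtLeast(d: Pmf, threshold: number): number {
  let total = 0
  d.forEach((p, x) => {
    if (x >= threshold) total += p
  })
  return total
}

const d6 = uniform([1, 2, 3, 4, 5, 6])
const twoDice = addIndependent(d6, d6) // two separate rolls
const oneDieDoubled = scale(d6, 2) // the same roll counted twice

console.log(probAtLeast(twoDice, 8)) // ≈ 0.4167 (15/36)
console.log(probAtLeast(oneDieDoubled, 8)) // ≈ 0.5 (3/6)
```

The two results differ because the second case reuses one source of randomness, which is the situation the combinators do not model.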

---

# 10. Program canonicalization

`canonicalize`, `equivalent`, and `canonicalHash` produce a deterministic
canonical form for parsed programs. Useful for fork-detection (has a viewed
program actually been changed?) and content-addressable caching.

```ts
import { ProgramParser, canonicalize, equivalent, canonicalHash } from 'dicerollerts'

const a = ProgramParser.parse('$x = 5\n$x + 1').program
const b = ProgramParser.parse('$y = 5\n$y + 1').program

equivalent(a, b) // true (alpha-renaming)
canonicalHash(a) === canonicalHash(b) // true
```

The canonical form ignores cosmetic differences - comments, whitespace,
parenthesization, variable names - and applies algebraic simplifications
(constant folding, identity elimination, commutative reorder, dice fusion
like `1d6 + 1d6 → 2d6`). It is **conservative**: when in doubt it leaves a
difference rather than collapsing it, so a `false` from `equivalent` is
trustworthy (the programs differ in some structural way).

Granular controls let you disable individual normalisations:

```ts
canonicalize(program, {
  alphaRename: false, // keep variable names literal
  sortCommutative: false, // keep operand order
  sortRecordFields: false,
  reorderStatements: false,
  reorderMatchArms: false,
  foldConstants: false,
  applyIdentities: false,
  normalizeDice: false,
  normalizeParameters: false,
})
```

All flags default to `true`. Disabling individual passes weakens equivalence
in well-defined ways.
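
The content-addressable caching use case can be sketched as a memoiser keyed by a canonical hash. Everything in the block is hypothetical scaffolding: `cacheByKey` is not a library export, and `standInKey` is a whitespace-stripping placeholder standing in for something like `(p) => canonicalHash(p)`, assuming the hash is usable as a `Map` key.

```ts
// Wrap an expensive computation so every input with the same key shares one result.
function cacheByKey<P, R>(
  keyOf: (input: P) => string,
  compute: (input: P) => R,
): (input: P) => R {
  const store = new Map<string, R>()
  return (input) => {
    const key = keyOf(input)
    if (!store.has(key)) store.set(key, compute(input))
    return store.get(key) as R
  }
}

// Placeholder key for the demo: ignores whitespace only.
const standInKey = (source: string): string => source.replace(/\s+/g, '')

let computeCalls = 0
const analyzeCached = cacheByKey(standInKey, (source: string) => {
  computeCalls++
  return source.length // placeholder for real analysis work
})

analyzeCached('$x = 5')
analyzeCached('$x=5') // same key, so the cached result is reused
console.log(computeCalls) // 1
```

With a real canonical hash as the key, cosmetic edits to a stored program would hit the cache, while a structural change would produce a new entry.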

---

# 11. Pretty-printing programs (`format`)

`format(program, options?)` re-renders a parsed `Program` AST as canonically
formatted source. Use it for editor "Format Document" actions, server-side
storage normalisation, deterministic test fixtures, or to hand a normalised
view of a program back to a user.

```ts
import { ProgramParser, format } from 'dicerollerts'

const { program } = ProgramParser.parse('$x=4d6 drop 1+2  # rolled\n')
format(program)
// '$x = 4d6 drop 1 + 2  # rolled\n'
```

The round-trip invariant is `equivalentWithComments(parse(format(p)), p)` -
the formatted output, re-parsed, produces an AST with the same structure
_and the same attached comments_ as the input. Formatting is also idempotent:
`format(parse(format(parse(s)))) === format(parse(s))`.

Options:

```ts
format(program, {
  indent: 2, // spaces per indent level (default 2)
  printWidth: 80, // line-length budget for wrap decisions (default 80)
  blankLineBetweenStatements: true, // one blank line between top-level
  // statements (default true). Set false for tight forms / fixtures.
})
```

The formatter is **opinionated** - these three knobs are the entire surface.
The output follows fixed conventions: single space around binary operators,
single space between dice-modifier keywords (`4d6 drop 1`), one match-arm
per line with `->` aligned across arms within a match when alignment fits
the width budget, trailing commas on multi-line record / array literals.

Comments parse correctly and re-emit verbatim. The parser attaches each
comment as **leading** (line above), **trailing** (end-of-line after), or
**standalone** (surrounded by blank lines, attached as leading on the
following top-level statement so it travels under `canonicalize`'s reorder
passes). Comments do not influence `canonicalize` or `equivalent` - they
are pure decoration. Use `equivalentWithComments` when the comment
attachment matters.

```ts
import { ProgramParser, equivalent, equivalentWithComments } from 'dicerollerts'

const a = ProgramParser.parse('# rolled\n$x = 4d6 drop 1\n').program
const b = ProgramParser.parse('$x = 4d6 drop 1\n').program

equivalent(a, b) // true (comments ignored)
equivalentWithComments(a, b) // false (different comments)
```

Operators and number literals are re-rendered from the AST, not the source -
`&&` becomes `and`, hex literals re-render in decimal - so re-parsed output
is AST-equivalent rather than byte-identical to the input.

---

## License

Apache-2.0
