# Design Decisions

> Significant architectural choices made during development of xml-xsd-engine.
> See also `docs/adr/` for Architecture Decision Records.

---

## DD-01: Zero Runtime Dependencies

**Decision:** No npm packages are used at runtime.

**Reasons:** Browser + Deno + Bun compatibility without polyfills; no transitive vulnerability surface; predictable, auditable behavior; small install size.

**Trade-off:** Higher initial implementation effort. Some features (e.g. full XSD 1.1) take longer to reach parity with mature engines.

---

## DD-02: DOM-First (v1.x), Streaming-Capable (v1.7), Streaming-First (v2.0)

**Decision:** v1.x builds a full DOM for validation. v1.7 adds SAX-driven streaming as an opt-in path. v2.0 will make DOM optional.

**Reasons (DOM in v1.x):**
- XPath queries, PSVI, identity constraints, and diffing all require random access
- Simpler implementation path for v1.x
- Most documents in practice are small enough for DOM

**v1.7 streaming path:**
- `validateStreaming()` / `validateStreamingGenerator()` operate entirely on SAX events
- `StreamingKeyrefTracker` resolves `xs:keyref` in a single SAX pass, using a map that is consulted after parsing completes
- DFA pre-compiled per complex type (`CompiledSchema.compiledDfas`) eliminates per-element rebuild
- `xsi:type` attribute override handled in `startElement` SAX handler

**Trade-off:** Streaming mode does not support PSVI, XPath-based `xs:assert`, or cross-document identity constraints.
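
The single-pass keyref resolution listed above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the engine's `StreamingKeyrefTracker`: the `KeyrefCollector` class and its method names are hypothetical. Key and keyref values are recorded as SAX events arrive, and references are checked only once the document ends, because a keyref may appear before the key it points to.

```typescript
// Hypothetical sketch: class and method names are illustrative, not the engine's API.
interface DanglingRef {
  constraint: string;
  value: string;
}

class KeyrefCollector {
  // key constraint name -> key values seen so far
  private readonly keys = new Map<string, Set<string>>();
  // keyref values recorded during the pass, checked only after parsing
  private readonly pending: Array<{ constraint: string; refersTo: string; value: string }> = [];

  recordKey(keyName: string, value: string): void {
    let values = this.keys.get(keyName);
    if (values === undefined) {
      values = new Set<string>();
      this.keys.set(keyName, values);
    }
    values.add(value);
  }

  recordKeyref(constraint: string, refersTo: string, value: string): void {
    this.pending.push({ constraint, refersTo, value });
  }

  // Called once at end of document, when every key value has been collected.
  resolve(): DanglingRef[] {
    const dangling: DanglingRef[] = [];
    for (const ref of this.pending) {
      const known = this.keys.get(ref.refersTo);
      if (known === undefined || !known.has(ref.value)) {
        dangling.push({ constraint: ref.constraint, value: ref.value });
      }
    }
    return dangling;
  }
}
```

A SAX handler would call `recordKey` / `recordKeyref` from its element callbacks and `resolve()` from its end-of-document callback.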

---

## DD-03: Separate Parse and Validate

**Decision:** `XmlParser` knows nothing about schemas. `ValidationEngine` operates on a pre-built DOM.

**Reasons:** Clear error attribution; DOM reuse across multiple schemas; well-formedness checking without a schema; testability.

---

## DD-04: Schema Compilation Phase

**Decision:** Raw `SchemaModel` (from XSD parse) is compiled into `CompiledSchema` before validation.

**v1.7.0:** `compileSchema()` resolves all `ref=""` references, flattens extension chains, builds the identity constraint index.

**v1.7.0 (G27):** `_compileDfas()` pre-compiles one `DfaModel` per complex type into `CompiledSchema.compiledDfas`. No DFA is rebuilt per validation call; both the DOM and streaming paths use this map.

**Trade-off:** Compilation adds latency on first use. Mitigated by `SchemaCache`.

---

## DD-05: Security by Default

**Decision:** All security limits are enabled by default, and there is no opt-out.

**Reasons:** Most callers are unaware of attacks such as Billion Laughs (entity expansion) and XXE (XML external entities). Safe defaults prevent accidental exposure.

---

## DD-06: Stable Error Codes

**Decision:** `XmlErrorCode` values are never removed or renamed in minor/patch versions.

**Reasons:** CI pipelines and error-handling code depend on specific codes.

---

## DD-07: PSVI as Opt-In Map

**Decision:** PSVI annotations are stored in a `Map<XmlElement, PsviAnnotation>`. PSVI is disabled by default.

**v1.7 change:** Changed from `WeakMap` to `Map` to expose `.size` for memory monitoring.

**Reasons:** Zero overhead when not used; backward compatible. Tests and monitoring tools need the `.size` property to verify the annotation count, and `WeakMap` does not provide it.

**Trade-off:** `Map` holds strong references to `XmlElement` nodes. The PSVI map must be released when the document is no longer needed to allow GC.
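
To make the trade-off concrete, here is a small sketch of the pattern. The `XmlElement` and `PsviAnnotation` shapes below are hypothetical stand-ins, not the engine's real types; only the `Map` behavior is the point.

```typescript
// Hypothetical stand-ins for the engine's XmlElement and PsviAnnotation types.
interface XmlElement {
  tag: string;
}
interface PsviAnnotation {
  typeName: string;
  validity: "valid" | "invalid" | "notKnown";
}

const psvi = new Map<XmlElement, PsviAnnotation>();
const orderEl: XmlElement = { tag: "order" };
const itemEl: XmlElement = { tag: "item" };

psvi.set(orderEl, { typeName: "OrderType", validity: "valid" });
psvi.set(itemEl, { typeName: "ItemType", validity: "valid" });

// Map exposes .size (WeakMap does not), so the annotation count can be monitored.
const annotatedCount = psvi.size;

// Map keys are strong references: clear the map once the document is no
// longer needed so the annotated nodes become eligible for garbage collection.
psvi.clear();
```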

---

## DD-08: Two-Tier XPath LRU Cache

**Decision:** Two independent LRU maps: a cold tier of 512 entries and a hot tier of 128 entries.

**Reasons:** Validation schemas reuse a small set of expressions frequently, and those expressions benefit from the hot tier; one-time diagnostic queries do not pollute it. The cache is configurable at runtime via `configureXPathCache()`.
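
A two-tier LRU of this shape might look like the sketch below. The promotion policy is an assumption, since this document does not specify it: new entries land in the cold tier, a repeat hit promotes an entry to the hot tier, and entries evicted from the hot tier are simply dropped. The class names `Lru` and `TwoTierCache` are hypothetical.

```typescript
// Minimal LRU built on Map insertion order (the first key is the oldest).
class Lru<V> {
  private readonly map = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  delete(key: string): void {
    this.map.delete(key);
  }

  has(key: string): boolean {
    return this.map.has(key);
  }
}

// Hypothetical two-tier wrapper; sizes default to the values named above.
class TwoTierCache<V> {
  private readonly cold: Lru<V>;
  private readonly hot: Lru<V>;

  constructor(coldSize = 512, hotSize = 128) {
    this.cold = new Lru<V>(coldSize);
    this.hot = new Lru<V>(hotSize);
  }

  get(key: string): V | undefined {
    const hotHit = this.hot.get(key);
    if (hotHit !== undefined) return hotHit;
    const coldHit = this.cold.get(key);
    if (coldHit !== undefined) {
      // Assumed policy: a repeat hit promotes the entry to the hot tier.
      this.cold.delete(key);
      this.hot.set(key, coldHit);
    }
    return coldHit;
  }

  set(key: string, value: V): void {
    if (this.hot.has(key)) this.hot.set(key, value);
    else this.cold.set(key, value);
  }

  tierOf(key: string): "hot" | "cold" | "none" {
    if (this.hot.has(key)) return "hot";
    return this.cold.has(key) ? "cold" : "none";
  }
}
```

Under this policy, a burst of one-off expressions can only evict other cold entries; anything already promoted stays in the hot tier.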

---

## DD-09: SHA-256 Content-Hash Schema Caching

**Decision:** `SchemaMerger` and `SchemaCache` key by SHA-256 hash of XSD source.

**Reasons:** Same XSD content may be referenced from multiple locations; hash-based key detects content changes (`invalidateByContent()`); prevents redundant re-parsing in import chains.
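
The idea of keying by content rather than by location can be sketched as follows. This is an illustration only: `ContentHashCache`, `ParsedSchema`, and `getOrParse` are hypothetical names, parsing is replaced by a stand-in, and Node's `node:crypto` is used for brevity even though the engine itself targets runtimes beyond Node.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a parsed schema.
interface ParsedSchema {
  source: string;
}

class ContentHashCache {
  private readonly byHash = new Map<string, ParsedSchema>();
  parseCount = 0; // how many times the (stand-in) parser actually ran

  static hashOf(xsd: string): string {
    return createHash("sha256").update(xsd, "utf8").digest("hex");
  }

  // The same XSD text yields the same key regardless of where it was loaded from.
  getOrParse(xsd: string): ParsedSchema {
    const key = ContentHashCache.hashOf(xsd);
    let schema = this.byHash.get(key);
    if (schema === undefined) {
      this.parseCount++;
      schema = { source: xsd }; // stand-in for real XSD parsing
      this.byHash.set(key, schema);
    }
    return schema;
  }
}
```

Two imports that resolve to byte-identical XSD text share one cache entry, while any edit to the text produces a different key and therefore a fresh parse.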

---

## DD-10: Canonical XML as Independent Module

**Decision:** `XmlCanonical.ts` has no runtime dependency on `ValidationEngine` or `SchemaModel`.

**Reasons:** Canonicalization is a pure serialization concern; enables use without the validation stack; independently testable.

---

## DD-11: Formal 7-Stage Validation Pipeline

**Decision:** Validation is a formal pipeline with named stages.

**Reasons:** Per-stage timing; per-stage issue attribution; enables partial re-execution; clear integration point for streaming (replace `parse` stage with SAX).
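
As a rough sketch of what named stages buy, the runner below records timing and issues per stage. Everything here is hypothetical: the `Stage` and `StageReport` shapes, the `runPipeline` function, and the `check-root` stage name are invented for illustration (only a `parse` stage is mentioned in this document), and `Date.now()` is used as a coarse clock.

```typescript
// Hypothetical shapes; the engine's actual stage API may differ.
interface Stage<C> {
  name: string;
  run(ctx: C): void;
}

interface StageReport {
  name: string;
  ms: number;
  issues: string[];
}

function runPipeline<C extends { issues: string[] }>(stages: Stage<C>[], ctx: C): StageReport[] {
  const reports: StageReport[] = [];
  for (const stage of stages) {
    const issuesBefore = ctx.issues.length;
    const start = Date.now();
    stage.run(ctx);
    reports.push({
      name: stage.name,
      ms: Date.now() - start,
      // Issues appended while this stage ran are attributed to it.
      issues: ctx.issues.slice(issuesBefore),
    });
  }
  return reports;
}

// Usage with two toy stages.
interface DemoCtx {
  xml: string;
  issues: string[];
}

const demoStages: Stage<DemoCtx>[] = [
  {
    name: "parse",
    run: (ctx) => {
      if (!ctx.xml.startsWith("<")) ctx.issues.push("document does not start with '<'");
    },
  },
  {
    name: "check-root",
    run: (ctx) => {
      if (!ctx.xml.startsWith("<order")) ctx.issues.push("unexpected root element");
    },
  },
];

const stageReports = runPipeline(demoStages, { xml: "<invoice/>", issues: [] });
```

Swapping the `parse` entry for a SAX-driven stage would leave the rest of the array untouched, which is the integration point the reasons above refer to.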

---

## DD-12: DFA Per Complex Type (v1.7)

**Decision:** One `DfaModel` is compiled per named complex type at `compileSchema()` time and stored in `CompiledSchema.compiledDfas`.

**Reasons:**
- Content-model validation is the hottest inner loop in both DOM and streaming paths
- Rebuilding the DFA on every call via `buildDfa()` cost O(types × elements × validations) overall
- Pre-compiled DFAs reduce this to an O(1) map lookup plus an O(elements) DFA run per element
- `DfaModel.particleIndex` (Map-based) further replaces the O(n) `Array.find` with an O(1) lookup per child element

**Trade-off:** Compilation memory grows linearly with the number of named complex types. Schemas with thousands of types may see higher memory use during `compileSchema()`. Mitigated by `SchemaCache`, so compilation happens only once per distinct XSD source.
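
The compile-once, look-up-many pattern can be sketched with a deliberately tiny content model: a fixed sequence in which each child appears exactly once. The `DfaModel` shape, `compileSequence`, `runDfa`, and the `AddressType` example are all assumptions made for illustration; the engine's real `DfaModel` and `particleIndex` are not shown in this document.

```typescript
// Hypothetical DFA shape: transitions[state] maps a child name to the next state.
interface DfaModel {
  transitions: Array<Map<string, number>>;
  accepting: Set<number>;
}

// Compiles a fixed sequence (each child exactly once) into a linear DFA.
function compileSequence(children: string[]): DfaModel {
  const transitions: Array<Map<string, number>> = children.map(
    (childName, i) => new Map<string, number>([[childName, i + 1]]),
  );
  transitions.push(new Map<string, number>()); // final state has no outgoing edges
  return { transitions, accepting: new Set<number>([children.length]) };
}

// One Map lookup per child element, then an acceptance check at the end.
function runDfa(dfa: DfaModel, childNames: string[]): boolean {
  let state = 0;
  for (const childName of childNames) {
    const next = dfa.transitions[state]?.get(childName);
    if (next === undefined) return false;
    state = next;
  }
  return dfa.accepting.has(state);
}

// Compiled once (stand-in for CompiledSchema.compiledDfas), reused on every call.
const compiledDfas = new Map<string, DfaModel>([
  ["AddressType", compileSequence(["street", "city", "zip"])],
]);

function validateChildren(typeName: string, childNames: string[]): boolean {
  const dfa = compiledDfas.get(typeName);
  return dfa !== undefined && runDfa(dfa, childNames);
}
```

Real content models with optional and repeating particles need a proper automaton construction; the sketch only shows where the per-call cost goes once the DFA already exists.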

---

## DD-13: `parseName` Sticky Regex (v1.7)

**Decision:** `XmlParser.parseName()` and `XmlLexer._lexName()` use a module-level sticky `RegExp` (`/[A-Za-z0-9_...]+/y`) via `consumePattern()` instead of per-character callbacks.

**Reasons:** A single sticky-regex match lets V8 scan the whole name in optimized native code (potentially with SIMD string scanning). The old `consumeWhile(ch => /.../.test(ch))` evaluated a regex literal and ran a callback plus a regex test for every character, a significant overhead for element-name-heavy documents.

**Trade-off:** Sticky regexes are stateful (`lastIndex` must be reset before each use). `consumePattern()` in `StringReader` sets `lastIndex = this._pos` before each call, ensuring correctness.
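
A minimal sketch of the `lastIndex` handling described above follows. The `StringReader` here is a simplified stand-in for the engine's class, and `NAME_RE` is an ASCII-only approximation chosen for the example; the real XML `Name` production accepts a much wider character set.

```typescript
// Simplified, ASCII-only name pattern (assumption for this sketch).
// The `y` flag makes the match start exactly at lastIndex.
const NAME_RE = /[A-Za-z_][A-Za-z0-9_.\-]*/y;

// Simplified stand-in for the engine's StringReader.
class StringReader {
  private _pos = 0;
  private readonly input: string;

  constructor(input: string) {
    this.input = input;
  }

  get position(): number {
    return this._pos;
  }

  consumePattern(re: RegExp): string | null {
    // The regex object is shared at module level, so reset its state first.
    re.lastIndex = this._pos;
    const match = re.exec(this.input);
    if (match === null) return null; // nothing matches at the current position
    this._pos = re.lastIndex; // after a sticky match, lastIndex is the end of the match
    return match[0];
  }
}
```

Because `lastIndex` is assigned before every `exec`, the same `NAME_RE` object can safely be shared by several readers.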

---

## DD-14: `StreamingKeyrefTracker` Element-Name Index (v1.7)

**Decision:** `StreamingKeyrefTracker` builds a `Map<elementName, IdentityConstraint[]>` index in its constructor rather than iterating all schema constraints per SAX element.

**Reasons:** Schemas with many `xs:key`/`xs:keyref` declarations would otherwise cause O(constraints × elements) SAX event processing. The index reduces this to an O(1) average-case lookup per element, after a one-time index build in the constructor.

**Trade-off:** The index is an approximation based on the simple part of the `selector` expression. Complex multi-axis selectors (e.g. `.//a/b`) may not match correctly. Full XPath selector evaluation is deferred to a future release.
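
The sketch below shows one way such an index could be built, and why it is only an approximation. It assumes the "simple part" of a selector is its last path step, which this document does not confirm; `IdentityConstraint`, `lastStep`, and `buildConstraintIndex` are hypothetical names.

```typescript
// Hypothetical constraint shape for this sketch.
interface IdentityConstraint {
  name: string;
  kind: "key" | "keyref" | "unique";
  selector: string; // restricted XPath, e.g. "order/item"
}

// Assumption: index by the last step of the selector, ignoring the path before it.
function lastStep(selector: string): string {
  const steps = selector.split("/").filter((step) => step !== "" && step !== ".");
  return steps[steps.length - 1] ?? "";
}

function buildConstraintIndex(constraints: IdentityConstraint[]): Map<string, IdentityConstraint[]> {
  const index = new Map<string, IdentityConstraint[]>();
  for (const constraint of constraints) {
    const elementName = lastStep(constraint.selector);
    const bucket = index.get(elementName);
    if (bucket === undefined) index.set(elementName, [constraint]);
    else bucket.push(constraint);
  }
  return index;
}

const constraintIndex = buildConstraintIndex([
  { name: "itemKey", kind: "key", selector: "order/item" },
  { name: "itemRef", kind: "keyref", selector: ".//line/item" },
  { name: "customerKey", kind: "key", selector: "customer" },
]);

// Per SAX element: one Map lookup instead of a scan over every constraint.
const constraintsForItem = constraintIndex.get("item") ?? [];
```

In this sketch `.//line/item` is indexed under `item`, so it would also be considered for an `item` that is not inside a `line`. That is the kind of imprecision the trade-off describes.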

---

## DD-15: `SchemaCache.statFast` (v1.7)

**Decision:** `getOrLoad` supports an opt-in `statFast: boolean` mode that uses `fs.stat` (mtime + size) as a cache freshness fingerprint instead of reading the full file and computing SHA-256.

**Reasons:** In long-running servers running in watch mode, `getOrLoad` is called frequently for the same stable schemas. A full file read plus SHA-256 is wasteful when the file has not changed. `mtime + size` is a reliable proxy in practice for most server environments.

**Trade-off:** `mtime + size` is not collision-resistant. A file whose content changes while its size and modification time stay the same (e.g. atomic writes that preserve mtime) will not be detected as changed, so the stale cached schema keeps being served. Use `statFast: false` (default) in environments where schema files are updated programmatically without mtime changes.
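
The two freshness checks can be contrasted in a short Node-only sketch. `FileCacheSketch` is a hypothetical stand-in for `SchemaCache` that models only the fingerprint logic; the `"mtimeMs:size"` key format and the `fullReads` counter are assumptions made for the example.

```typescript
import { createHash } from "node:crypto";
import { mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface CacheEntry {
  statKey: string; // "mtimeMs:size" fingerprint
  contentHash: string; // SHA-256 of the file content
  content: string;
}

// Hypothetical stand-in for SchemaCache; only the freshness check is modeled.
class FileCacheSketch {
  private readonly entries = new Map<string, CacheEntry>();
  fullReads = 0; // how often the file was actually read and hashed

  getOrLoad(path: string, options: { statFast?: boolean } = {}): string {
    const st = statSync(path);
    const statKey = `${st.mtimeMs}:${st.size}`;
    const cached = this.entries.get(path);

    // Fast path: trust mtime + size and skip both the read and the hash.
    if (options.statFast && cached !== undefined && cached.statKey === statKey) {
      return cached.content;
    }

    // Default path: read the file and compare content hashes.
    const content = readFileSync(path, "utf8");
    this.fullReads++;
    const contentHash = createHash("sha256").update(content, "utf8").digest("hex");
    if (cached !== undefined && cached.contentHash === contentHash) {
      cached.statKey = statKey;
      return cached.content;
    }
    this.entries.set(path, { statKey, contentHash, content });
    return content;
  }
}

// Usage: the second load of an unchanged file is answered from the stat call alone.
const demoDir = mkdtempSync(join(tmpdir(), "statfast-"));
const schemaPath = join(demoDir, "demo.xsd");
writeFileSync(schemaPath, "<xs:schema/>");

const fileCache = new FileCacheSketch();
const firstLoad = fileCache.getOrLoad(schemaPath, { statFast: true });
const secondLoad = fileCache.getOrLoad(schemaPath, { statFast: true });
```

With `statFast` left at its default, every call pays for the read and the hash, which is the cost the fast path is meant to avoid.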
