# xml-xsd-engine — Completed Features

> All items below are shipped and available in the published package.
> **As of v1.7.0 (2026-04-20).**
> For planned features, see [ROADMAP.md](./ROADMAP.md).

---

## v1.0.0 — Foundation

- XML 1.0 Lexer (single-pass state machine, all token types, CRLF normalization, line/col tracking)
- XML Parser producing full DOM (XmlDocument, XmlElement, XmlText, XmlComment, XmlPI)
- Namespace resolution (scoped prefix→URI, default namespace, well-formedness enforcement)
- SAX / event-based parser (push + pull iterator API)
- XML Serializer (round-trip, mixed-content aware)
- XSD 1.0 Parser (large subset, all compositors, xs:import, xs:include)
- Schema Model (in-memory typed graph)
- Validation Engine (structure, type, attribute, occurrence, xs:any)
- Type Validator (44 built-in XSD types, all facets)
- Error system (XmlError, stable XmlErrorCode, ValidationResult, ValidationIssue)
- Output formatters (text, compact, JSON, GitHub Actions, JUnit)
- XPath 1.0 engine (axes, predicates, standard functions)
- Parse Budget and security limits (XXE blocking, depth/node/text limits, Billion Laughs protection)
- Async file I/O (readXmlFile, readXsdFile, validateFiles)
- Streaming parser (Node.js Transform stream, XmlStreamParser)
- Plugin architecture (PluginRegistry, custom type validators, entity resolvers)
- CLI xml-validate (all 5 output formats, exit codes)
- ESM + CJS dual build, TypeScript declarations
- TypeScript-first, strict types
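
The lexer entry above mentions CRLF normalization and line/col tracking. As a rough illustration of those two ideas (not the package's lexer; `normalizeLineEndings` and `positionAt` are hypothetical names chosen for this sketch), end-of-line handling as described in XML 1.0 §2.11 and a 1-based position lookup could look like this:

```typescript
// Hypothetical helpers for illustration only; they are not exports of the package.
interface Position {
  line: number; // 1-based
  col: number;  // 1-based
}

/** XML 1.0 §2.11: translate "\r\n" and any lone "\r" to a single "\n". */
function normalizeLineEndings(input: string): string {
  return input.replace(/\r\n?/g, "\n");
}

/** Returns the line/column of `offset` within already-normalized text. */
function positionAt(text: string, offset: number): Position {
  let line = 1;
  let col = 1;
  const end = Math.min(offset, text.length);
  for (let i = 0; i < end; i++) {
    if (text.charCodeAt(i) === 10 /* "\n" */) {
      line++;
      col = 1;
    } else {
      col++;
    }
  }
  return { line, col };
}

const lexSource = normalizeLineEndings("<a>\r\n  <b/>\r</a>");
const lexWhere = positionAt(lexSource, lexSource.indexOf("<b/>"));
// lexWhere is { line: 2, col: 3 }
```

A real lexer would update the position incrementally while scanning rather than rescanning from the start for each lookup.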

---

## v1.2.0 — Transforms and Batch Processing

- XML to JSON transform (xmlToJson, configurable attribute prefix, text key)
- JSON to XML transform (jsonToXml, jsonToXmlString)
- XSLT-lite transformer (xsl:template, for-each, if, choose, copy, copy-of, AVTs)
- Schema Cache (LRU + TTL)
- Batch Validation (concurrent multi-file/string validation with progress callbacks)
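
To make the "LRU + TTL" wording concrete, here is a minimal sketch of a cache that combines both policies. It is an assumption-laden illustration: `LruTtlCache`, its constructor parameters, and the injected clock are hypothetical and do not describe the package's actual Schema Cache API.

```typescript
// Illustrative only: not the package's SchemaCache.
class LruTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxEntries: number, ttlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // TTL elapsed: drop the stale entry
      return undefined;
    }
    // Re-insert so the key becomes the most recently used
    // (a Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage with a fake clock so the behavior is deterministic.
let fakeClock = 0;
const demoCache = new LruTtlCache<string>(2, 1000, () => fakeClock);
demoCache.set("a.xsd", "A");
demoCache.set("b.xsd", "B");
demoCache.get("a.xsd");                        // touch "a.xsd"; "b.xsd" is now least recently used
demoCache.set("c.xsd", "C");                   // capacity exceeded: evicts "b.xsd"
const evictedHit = demoCache.get("b.xsd");     // undefined
const keptHit = demoCache.get("a.xsd");        // "A"
fakeClock = 1500;                              // entries written at t=0 expired at t=1000
const expiredHit = demoCache.get("a.xsd");     // undefined
```

Injecting the clock keeps expiry testable without timers; a production cache would likely also need explicit invalidation.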

---

## v1.3.0 — Code Generation

- XSD → TypeScript interfaces (generateTypeScript, typePrefix, exportAll, addComments)
- XSD → JSON Schema Draft 7 (generateJsonSchema, id, title, useDefinitions)
- Async schema loader (parseXsdAsync, AsyncSchemaLoader)

---

## v1.4.0 — Error UX and Validation Control

- Validation Pipeline (formal 7-stage: parse, namespace, schema-compile, structure-validate, type-validate, identity-check, post-process)
- Schema Compiler (compileSchema, CompiledSchema — resolves refs, flattens extensions, normalizes occurrences)
- Namespace Engine (standalone NamespaceEngine, scoped prefix→URI, O(decls) push)
- Source-mapped errors (line/col on every ValidationIssue)
- Strict / lax / recover validation modes
- Error recovery mode (collect all errors)
- Profiling hooks (per-stage timing, onProfile callback)
- Compiled XPath expressions (compileXPath, reusable compiled expression)
- Structured error classification (XmlErrorCode categories, stable codes)
- Error code stability guarantee
- xml-format CLI command (indent, declaration, in-place, output)
- xml:space="preserve" propagation in serializer
- Namespace prefix control in serializer

---

## v1.5.0 — Schema Validation Depth

- Identity Constraint Engine (xs:key, xs:unique, xs:keyref — DOM-based XPath evaluation)
- xs:assert evaluator (XPath 1.0-lite assertions on complex types)
- Schema Preflight (validateSchema, checkSchema — self-consistency checks)
- SAX Instrumentation (structured events, position tracking, namespace scope)
- XPath 2.0 functions (22 string + sequence functions: upper-case, lower-case, matches, replace, tokenize, distinct-values, empty, exists, string-join, avg, sum, min, max, ...)
- xml:id uniqueness checking and xmlIds map
- xml:base propagation and resolveXmlBase
- DTD internal entity expansion (safe <!ENTITY> with limits, external entities blocked)
- SHA-256 utilities (sha256Hex, sha256Short — pure TypeScript)
- Schema Cache content-hash keying (invalidateByContent, dependency tracking)
- xs:list / xs:union hardening (nested context validation)
- Platform entry points (browser, deno, bun sub-paths, sideEffects:false)
- xs:redefine parsing groundwork

---

## v1.6.0 — Advanced Tooling

- XML Diff (diffXml, XmlChange, XmlChangeType — structural diff with fingerprint cache)
- Schema Inference (inferSchema, InferredSchema, toXsdString — infer XSD from XML samples)
- Fragment validation (validateFragment — parse + validate XML fragment string)
- Subtree validation (validateSubtree, rootType option)
- Encoding support (ParseOptions.encoding, XmlDocument.encoding)
- Source-mapped errors on all ValidationIssue kinds (line/col from lexer tokens)
- xs:keyref improvements

---

## v1.7.0 — PSVI, C14N, Performance

### New features

- PSVI — Post-Schema-Validation Infoset (PsviAnnotation, PsviMap, extractPsvi, makePsviAnnotation)
- Canonical XML C14N 1.0 (canonicalize, inclusive + exclusive modes, subtree, clearCanonicalCache)
- xs:redefine full support (load + merge + override named components)
- Mixed content model fix (mixed="true" interleaved text, elementOnly warning)
- SchemaMerger class (centralizes xs:import/include/redefine, SHA-256 cache)
- XPath cache control (configureXPathCache, xpathCacheStats, xpathCacheSize, two-tier LRU)
- Streaming code generation (generateTypeScriptStream, generateJsonSchemaStream)

### Performance fixes (P0)

- XPath two-tier LRU cache: +24% on first-expression evaluation, +4x on hot-tier hits
- Validation traversal memoization (_mergeCache + _substCache): +5%
- XmlDiff fingerprint cache: +40%
- Canonical XML delta saves: O(decls) per element
- PSVI object pool: -60% allocation
- IdentityConstraintEngine pre-index: O(1) skip for constraint-free elements
- SchemaCache maxVersionsPerKey: prevents OOM in long-running servers

### Bug fixes

- ValidationResult._errorCount never incremented — result.valid was always true (CRITICAL)
- XmlCanonical inclusive C14N rendered only new bindings (violated W3C C14N spec §2.4)
- XmlDiff false fingerprint equality (false "no change" on different subtrees)
- AssertionEvaluator non-deterministic error ordering
- ValidationPipeline not forwarding psvi option to ValidationEngine
- SchemaCache._setWithMeta premature eviction

### Test improvements

- 2276 total tests (up from 1333 in v1.6.0)
- SchemaMerger: 0% → fully covered
- Coverage: 87.11% stmts, 76.28% branch, 77.42% funcs, 89.33% lines

---

## v1.7 — Streaming Validation & Performance

> Implemented on branch `v1_7_0` as part of v1.7.0 stability work.

### v1.7 Features (All Complete)

- **DFA per compositor in SchemaCompilerLite (G27)** — `compiledDfas: ReadonlyMap<string, DfaModel>` on `CompiledSchema`; DFA compiled once at `compileSchema()` time, zero per-validation rebuild
- **Streaming validation — SAX-driven (G1)** — `validateStreaming()` / `validateStream()` / `validateStreamingGenerator()` (async generator); no DOM required
- **DFA Engine (G28)** — `buildDfa(particles, compositor)` + `runDfa(dfa, names)`; sequence / choice / all; `particleIndex` map for O(1) element lookup
- **Incremental / subtree revalidation (G38)** — `revalidateSubtree(elem, schema, opts?)` for editor "lint on edit" use cases
- **xs:keyref streaming-aware (G28b)** — `StreamingKeyrefTracker`; element-name index built at construction; accumulates key/unique tuples; validates after parse
- **Streaming result as async generator (P2)** — `validateStreamingGenerator()` yields `StreamingIssue` as produced
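
The DFA items above describe matching child element names against a compiled content model through an O(1) name→particle index. The sketch below illustrates that idea for the `xs:sequence` case only, as a small hand-written state machine. It is not the package's `buildDfa` / `runDfa`: the types and function names are hypothetical, and it assumes each element name appears at most once in the sequence.

```typescript
// Hypothetical types for illustration; not the package's DfaModel.
interface SeqParticle {
  name: string;
  minOccurs: number;
  maxOccurs: number; // Infinity stands in for "unbounded"
}

interface SeqModel {
  particles: SeqParticle[];
  particleIndex: Map<string, number>; // name -> position in the sequence
}

function buildSequenceModel(particles: SeqParticle[]): SeqModel {
  const particleIndex = new Map<string, number>();
  particles.forEach((p, i) => particleIndex.set(p.name, i));
  return { particles, particleIndex };
}

/** Returns null when `names` satisfies the sequence, else an error message. */
function runSequenceModel(model: SeqModel, names: string[]): string | null {
  const { particles, particleIndex } = model;
  let current = 0; // index of the particle currently being matched
  let count = 0;   // occurrences seen for that particle
  for (const name of names) {
    const idx = particleIndex.get(name); // O(1) lookup instead of Array.find
    if (idx === undefined) return `unexpected element <${name}>`;
    if (idx < current) return `element <${name}> out of order`;
    if (idx > current) {
      // Every particle being left or skipped must have met its minimum.
      for (let i = current; i < idx; i++) {
        const seen = i === current ? count : 0;
        if (seen < particles[i].minOccurs) return `missing element <${particles[i].name}>`;
      }
      current = idx;
      count = 0;
    }
    count++;
    if (count > particles[idx].maxOccurs) return `too many <${name}> elements`;
  }
  // At end of input, the remaining particles must all be optional or satisfied.
  for (let i = current; i < particles.length; i++) {
    const seen = i === current ? count : 0;
    if (seen < particles[i].minOccurs) return `missing element <${particles[i].name}>`;
  }
  return null;
}

const bookModel = buildSequenceModel([
  { name: "title", minOccurs: 1, maxOccurs: 1 },
  { name: "author", minOccurs: 1, maxOccurs: Infinity },
  { name: "note", minOccurs: 0, maxOccurs: 1 },
]);
const seqOk = runSequenceModel(bookModel, ["title", "author", "author", "note"]); // null
const seqMissing = runSequenceModel(bookModel, ["title", "note"]); // "missing element <author>"
```

Building the model once and reusing it for every validation mirrors the "compiled once at `compileSchema()` time" point above; `xs:choice` and `xs:all` would need different transition rules.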

### Performance Improvements

- `parseName` / `_lexName` hot path: module-level sticky `_XML_NAME_RE` via `consumePattern()` — single-pass SIMD-friendly scan
- `DfaEngine._runAll` / `_runChoice`: O(1) `particleIndex` map replaces O(n) `Array.find`
- `TypeValidator._normaliseChain`: `_wsCache` caches whitespace mode per type name — no chain rebuild on subsequent calls
- `TypeValidator._checkFacets`: single-pass loop replaces two `Array.filter()` allocations
- `_patternCache`: capped at 512 entries with LRU-lite eviction — prevents unbounded growth
- `SchemaCache.getOrLoad`: `statFast` option uses `fs.stat` mtime+size before full file read — skips read on hit
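
The `parseName` item relies on a module-level sticky regular expression. As a hedged sketch of that technique (the pattern below is a simplified ASCII-only stand-in, and `NAME_RE_SKETCH` / `consumeNameAt` are hypothetical names rather than the package's `_XML_NAME_RE` / `consumePattern()`), a sticky regex lets a lexer match at an exact offset without slicing the input:

```typescript
// Simplified, ASCII-only name pattern. The real XML Name production also
// permits many non-ASCII characters. The "y" (sticky) flag anchors the match
// at lastIndex instead of searching forward.
const NAME_RE_SKETCH = /[A-Za-z_:][A-Za-z0-9_:.\-]*/y;

/** Tries to read a name starting exactly at `pos`; returns null if none starts there. */
function consumeNameAt(input: string, pos: number): { name: string; next: number } | null {
  NAME_RE_SKETCH.lastIndex = pos;
  const match = NAME_RE_SKETCH.exec(input);
  if (match === null) return null; // a failed sticky match resets lastIndex to 0
  return { name: match[0], next: NAME_RE_SKETCH.lastIndex };
}

const nameHit = consumeNameAt("<xs:element name='a'/>", 1);  // { name: "xs:element", next: 11 }
const nameMiss = consumeNameAt("<xs:element name='a'/>", 0); // null: "<" cannot start a name
```

Because the regex object is created once at module level, each call only resets `lastIndex`; no substring or new `RegExp` is allocated per token.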

### Stability & Correctness Fixes

- `StreamingValidator._validateSimpleType`: now validates user-defined `SimpleTypeDefinition` (was silently skipped)
- `StreamingValidator._validateChildOrder`: uses `compiledDfas` (no rebuild per `endElement`)
- `StreamingValidator`: `xsi:type` attribute now overrides `frame.typeDef` — no false positives on runtime type substitution
- `TypeValidator._checkPrimitive`: added `xs:normalizedString`, `xs:token`, `xs:Name`, `xs:ID`/`xs:IDREF`/`xs:ENTITY`, `xs:NMTOKENS`/`xs:IDREFS`/`xs:ENTITIES`, `xs:anySimpleType`/`xs:anyAtomicType`
- `TypeValidator`: `xs:decimal` accepts leading-dot form `.5` per XSD spec
- `TypeValidator`: `xs:date` calendar validation rejects `2024-02-30`, `2024-04-31`, etc.
- `XsdParser.occ()`: `isNaN` guard on `minOccurs`/`maxOccurs` — malformed values no longer propagate `NaN` into DFA
- `validateFragment`: replaced dynamic `require()` with top-level static import
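
Three of the fixes above (calendar validation for `xs:date`, the leading-dot `xs:decimal` form, and the `NaN` guard on occurrence values) are easy to illustrate in isolation. The functions below are a simplified sketch with hypothetical names, not the package's `TypeValidator` or `XsdParser.occ()`. In particular, timezone suffixes and years outside four digits are ignored, and falling back to a default on malformed occurrence values is an assumption of this example.

```typescript
function isLeapYear(year: number): boolean {
  return (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0;
}

function daysInMonth(year: number, month: number): number {
  if (month === 2) return isLeapYear(year) ? 29 : 28;
  return month === 4 || month === 6 || month === 9 || month === 11 ? 30 : 31;
}

/** Simplified xs:date check: YYYY-MM-DD only, no timezone suffix. */
function isValidCalendarDate(lexical: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(lexical);
  if (m === null) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  if (month < 1 || month > 12 || day < 1) return false;
  return day <= daysInMonth(year, month);
}

/** xs:decimal lexical form: optional sign, then digits with optional fraction, or ".digits". */
function isDecimalLexical(lexical: string): boolean {
  return /^[+-]?(\d+(\.\d*)?|\.\d+)$/.test(lexical);
}

/** Parses minOccurs/maxOccurs; malformed input yields `fallback` instead of NaN (assumed policy). */
function parseOccurs(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  if (raw === "unbounded") return Infinity;
  const n = /^\d+$/.test(raw) ? Number(raw) : NaN;
  return Number.isNaN(n) ? fallback : n;
}

const leapDayOk = isValidCalendarDate("2024-02-29");   // true: 2024 is a leap year
const feb30Ok = isValidCalendarDate("2024-02-30");     // false
const leadingDotOk = isDecimalLexical(".5");           // true
const badOccurs = parseOccurs("abc", 1);               // 1
```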

### New Exports

| Symbol | Description |
|--------|-------------|
| `revalidateSubtree` | Incremental subtree revalidation (G38) |
| `validateStreamingGenerator` | Async generator streaming issues |
| `StreamingKeyrefTracker` | Public streaming xs:key/keyref tracker |
| `SchemaCache.get(key)` | Direct retrieval by key |
| `SchemaCache.has(key)` | Presence check |
| `SchemaCacheOptions.statFast` | fs.stat fast path for getOrLoad |
| `CompiledSchema.compiledDfas` | Pre-compiled DFA per complex type |
| `DfaModel.particleIndex` | O(1) name→particle index |
| `IValidator` | Shared interface for DOM + streaming validators |
| `flattenGroupParticles` | Utility to inline xs:group ref particles for DFA |

### Test improvements

- 2460 total tests (up from 2276 in v1.7.0)
- v1.7 regression suite added: 72 tests (v1.7-regression.test.ts)
- Coverage: ~86.7% stmts, ~75.47% branch, ~77.3% funcs, ~88.78% lines
