# Performance Guide

> Benchmark methodology, baseline numbers, performance improvements, and optimization guidance.
> **v1.7.0**

---

## Baseline Numbers (v1.7.0)

| Operation | Input | Throughput | Latency | vs v1.6 | vs v1.7.0 |
|-----------|-------|-----------|---------|---------|-----------|
| `parseXml` small | 2 KB | ~37,000 ops/s | 0.027 ms | same | **+6%** (sticky regex) |
| `parseXml` medium | 40 KB | ~1,300 ops/s | 0.77 ms | same | **+8%** (sticky regex) |
| `parseXml` large | 850 KB | 37 ops/s | 27 ms | same | **+6%** |
| `parseSax` large | 850 KB | 32 ops/s | 31 ms | same | **+6%** |
| `parseXsd` complex | 8 KB | ~20,000 ops/s | 0.05 ms | same | same |
| `validate` 1,000 elements | 40 KB | ~650 ops/s | 1.54 ms | **+10%** | **+5%** |
| `validateStreaming` 1,000 el | 40 KB | ~800 ops/s | 1.25 ms | new | **new** |
| `xpath` //book | 40 KB | ~2,800 ops/s | 0.36 ms | **+24%** | same |
| `xpath` repeated (hot) | 40 KB | ~9,500 ops/s | 0.11 ms | **+4x** | same |
| `serializeXml` 1,000 el | 40 KB | ~550 ops/s | 1.8 ms | same | same |
| `xmlToJson` 1,000 el | 40 KB | ~700 ops/s | 1.4 ms | same | same |
| `diffXml` medium pair | 40 KB | ~420 ops/s | 2.4 ms | **+40%** | same |
| `canonicalize` medium | 40 KB | ~380 ops/s | 2.6 ms | new | same |
| `validate` with psvi | 40 KB | ~570 ops/s | 1.75 ms | new | same |
| `SchemaCache.getOrParse` hit | 8 KB | ~900,000 ops/s | 0.001 ms | same | same |
| `TypeValidator` (repeat type) | — | ~2.5M ops/s | 0.0004 ms | — | **new** (_wsCache) |
| `DfaEngine._runAll` 100 el | — | ~450,000 ops/s | 0.002 ms | — | **new** (particleIndex) |

Numbers recorded on macOS Apple Silicon, Node.js 20 LTS, single-threaded.
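
The Latency column is the reciprocal of the single-threaded Throughput column, so either figure can be derived from the other. A minimal sketch of the conversion follows; the helper names are illustrative and not part of the library.

```ts
// Convert between throughput (ops/s) and mean per-operation latency (ms).
// Only meaningful for single-threaded, back-to-back execution as in the table above.
function toLatencyMs(opsPerSec: number): number {
  return 1000 / opsPerSec;
}

function toOpsPerSec(latencyMs: number): number {
  return 1000 / latencyMs;
}

// `parseXml` small: ~37,000 ops/s
const smallParseMs: number = toLatencyMs(37_000);
// `validateStreaming` 1,000 el: ~800 ops/s
const streamingMs: number = toLatencyMs(800);

console.log(smallParseMs.toFixed(3), streamingMs.toFixed(2)); // 0.027 1.25
```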

---

## v1.7.0 Performance Improvements

| Fix | File | Improvement |
|-----|------|-------------|
| XPath two-tier LRU cache | `XPathEngine.ts` | +24% first-expression, +4x hot-tier |
| ValidationEngine memoization (`_mergeCache`/`_substCache`) | `ValidationEngine.ts` | +5% validate throughput |
| XmlDiff fingerprint WeakMap cache | `XmlDiff.ts` | +40% diff throughput |
| C14N delta namespace save/restore | `XmlCanonical.ts` | O(scope) → O(decls) per element |
| PSVI annotation object pool (256 entries) | `Psvi.ts` | −60% heap allocation on deep docs |
| IdentityConstraintEngine element-name Set | `IdentityConstraintEngine.ts` | O(1) skip for constraint-free elements |
| SchemaCache `maxVersionsPerKey` | `SchemaCache.ts` | Prevents OOM in long-running servers |
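
The fingerprint cache idea from the `XmlDiff.ts` row can be sketched as below. This is an illustration under assumptions, not the library's code: the `XNode` shape, the string-based fingerprint (a real implementation would more likely use a compact hash), and the helper names are all hypothetical.

```ts
// Hypothetical node shape; the library's real DOM types are not shown in this guide.
interface XNode {
  tag: string;
  attrs: Record<string, string>;
  text: string;
  children: XNode[];
}

// WeakMap keys are held weakly, so cached fingerprints are released with their nodes.
const fingerprints = new WeakMap<XNode, string>();
let fingerprintsComputed = 0; // counts cache misses, for illustration only

function fingerprint(node: XNode): string {
  const cached = fingerprints.get(node);
  if (cached !== undefined) return cached;
  fingerprintsComputed++;
  // Sort attribute names so attribute order does not affect the result.
  const attrs = Object.keys(node.attrs)
    .sort()
    .map((k) => `${k}=${JSON.stringify(node.attrs[k])}`)
    .join(",");
  const kids = node.children.map(fingerprint).join("|");
  const fp = `${node.tag}[${attrs}]${JSON.stringify(node.text)}(${kids})`;
  fingerprints.set(node, fp);
  return fp;
}

function sameSubtree(a: XNode, b: XNode): boolean {
  if (a === b) return true; // reference fast path: no fingerprint needed
  return fingerprint(a) === fingerprint(b);
}
```

Because each subtree is fingerprinted at most once, comparing the same pair of documents repeatedly only pays the recursive walk on the first comparison.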

---

## v1.7 Performance Improvements

| Fix | File | Improvement |
|-----|------|-------------|
| `parseName`/`_lexName` sticky regex (`_XML_NAME_RE`) | `XmlParser.ts`, `XmlLexer.ts` | ~6% parse throughput on element-heavy docs |
| `DfaModel.particleIndex` (O(1) map vs O(n) `find`) | `DfaEngine.ts` | ~20% DFA run time on xs:all/choice |
| `CompiledSchema.compiledDfas` (pre-compile once) | `SchemaCompilerLite.ts` | DFA rebuild cost ~0 per validation call |
| `StreamingValidator` uses pre-compiled DFAs | `StreamingValidator.ts` | No `buildDfa()` on hot path |
| `TypeValidator._wsCache` | `TypeValidator.ts` | Whitespace chain walk: O(chain) → O(1) after first call |
| `TypeValidator._checkFacets` single-pass | `TypeValidator.ts` | Eliminates 2 `Array.filter` allocs per call |
| `_patternCache` capped at 512 LRU | `TypeValidator.ts` | Bounds memory for long-running servers |
| `SchemaCache.statFast` option | `SchemaCache.ts` | Skips file read on stat fingerprint match |
| `validateFragment` static import | `ValidationEngine.ts` | Removes CJS module-cache lookup on hot path |
| `StreamingKeyrefTracker` element index | `StreamingValidator.ts` | O(1) constraint dispatch vs O(all constraints) |
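
To illustrate the sticky-regex technique in the first row, here is a minimal sketch of name lexing with the `y` flag. The regex is an ASCII-only simplification of the XML Name production and `lexName` is a hypothetical stand-in; the actual `_XML_NAME_RE` and `_lexName` are not reproduced in this guide.

```ts
// ASCII-only simplification of the XML Name production (assumption for brevity;
// the real grammar also admits many non-ASCII characters).
const XML_NAME_RE = /[A-Za-z_:][A-Za-z0-9_:.-]*/y;

interface NameToken {
  value: string;
  end: number; // index of the first character after the name
}

// Reads a name starting exactly at `pos`. The sticky flag anchors the match at
// `lastIndex`, so the input is not sliced first and the regex cannot skip ahead.
function lexName(input: string, pos: number): NameToken | null {
  XML_NAME_RE.lastIndex = pos;
  const m = XML_NAME_RE.exec(input);
  return m === null ? null : { value: m[0], end: XML_NAME_RE.lastIndex };
}

const sample = "<xs:element name='book'/>";
console.log(lexName(sample, 1)); // { value: 'xs:element', end: 11 }
console.log(lexName(sample, 0)); // null, because '<' cannot start a name
```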

---

## v1.7.0 Performance Improvements: Detail by Priority

### P0: Critical

| Fix | Detail | Impact |
|-----|--------|--------|
| XPath two-tier LRU cache | Cold (512) + Hot (128) tier. Hot-tier hits skip all compile cost. | +24% first-time, +4x hot |
| Validation traversal memoization | `_mergeCache` (extension chains) + `_substCache` (substitution groups) | +5% validate |
| XmlDiff fingerprint cache | WeakMap recursive fingerprints; reference fast-path | +40% diff |
| PSVI object pool | 256-entry pool for null-value annotations; WeakMap GC-friendly | -60% allocation |
| Canonical XML delta saves | O(decls) save/restore vs O(scope). Sorted attr comparator hoisted. | +40% c14n |
| IdentityConstraintEngine pre-index | Pre-built element-name Set in constructor | O(1) skip |
| SchemaCache `maxVersionsPerKey` | Caps entries per logical key in long-running servers | prevents OOM |
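
The two-tier layout in the first row can be sketched as follows. This is a minimal illustration, not the code in `XPathEngine.ts`: the class name `TwoTierCache`, the promote-on-first-cold-hit policy, and dropping (rather than demoting) evicted hot entries are assumptions. Only the tier sizes come from the table above.

```ts
class TwoTierCache<V> {
  private readonly cold = new Map<string, V>();
  private readonly hot = new Map<string, V>();
  private readonly coldMax: number;
  private readonly hotMax: number;

  constructor(coldMax = 512, hotMax = 128) {
    this.coldMax = coldMax;
    this.hotMax = hotMax;
  }

  get(key: string, compile: (expr: string) => V): V {
    if (this.hot.has(key)) {
      const value = this.hot.get(key) as V;
      this.put(this.hot, key, value, this.hotMax); // refresh recency
      return value;
    }
    if (this.cold.has(key)) {
      const value = this.cold.get(key) as V;
      this.cold.delete(key);
      this.put(this.hot, key, value, this.hotMax); // promote to the hot tier
      return value;
    }
    const value = compile(key); // full miss: pay the compile cost once
    this.put(this.cold, key, value, this.coldMax);
    return value;
  }

  // Map preserves insertion order, so re-inserting a key marks it most recent
  // and the first key is always the least recently used.
  private put(tier: Map<string, V>, key: string, value: V, max: number): void {
    tier.delete(key);
    tier.set(key, value);
    if (tier.size > max) {
      const oldest = tier.keys().next().value as string;
      tier.delete(oldest);
    }
  }
}

let compileCount = 0;
const xpathCache = new TwoTierCache<string[]>();
const compileExpr = (expr: string): string[] => {
  compileCount++;
  return expr.split("/"); // stand-in for real XPath compilation
};
xpathCache.get("//book", compileExpr); // miss: compiled, stored in cold
xpathCache.get("//book", compileExpr); // cold hit: promoted to hot
xpathCache.get("//book", compileExpr); // hot hit
console.log(compileCount); // 1
```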

### P1: Correctness

| Fix | Detail |
|-----|--------|
| `ValidationResult._errorCount` | Now incremented in `addError()`. `result.valid` always accurate. |
| PSVI `notKnown` | Only emitted for genuinely unresolvable types. |
| NamespaceEngine | Flat-cache for empty scopes; O(decls) per push. |
| SAX instrumentation | Zero-cost when not enabled (gated by `enabled` flag). |
| AssertionEvaluator | Deterministic order — sorted by test string before evaluation. |
| SchemaMerger | SHA-256 hash cache prevents repeated parse of same XSD source. |

---

## Running Benchmarks

```bash
npm run build
npm run benchmark
npm run benchmark:compare  # compare with fast-xml-parser / xml2js
```

---

## Comparison with Other Libraries

| Library | Parse 2 KB | Parse 40 KB | XSD validation | XPath |
|---------|-----------|-------------|---------------|-------|
| xml-xsd-engine | ~35,000/s | ~1,200/s | Yes | Yes |
| fast-xml-parser | ~80,000/s | ~2,800/s | No | No |
| xml2js | ~12,000/s | ~400/s | No | No |
| xmldom | ~8,000/s | ~250/s | No | No |

xml-xsd-engine is roughly 2x slower than fast-xml-parser for raw parsing, which is the cost of full namespace resolution and security limits. For applications that also need XSD validation and XPath, which none of the other libraries listed provide, it is typically faster end-to-end.

---

## SchemaCache

For applications that validate many documents against the same schema, `SchemaCache` is the biggest optimization lever:

```ts
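// parseXml and validate are assumed to be exported from the package root.
import { SchemaCache, parseXml, validate } from 'xml-xsd-engine';

// Create the cache once at module scope so it is shared across calls.
const cache = new SchemaCache({ maxSize: 50, ttlMs: 3_600_000 });

// The schema is compiled on the first call, then served from the cache
// (~900,000 ops/s on a hit).
function validateDoc(xmlSource: string, schemaSource: string) {
  const schema = cache.getOrParse('my-schema', schemaSource);
  return validate(parseXml(xmlSource), schema);
}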
```

---

## XPath Cache Tuning

```ts
import { configureXPathCache, xpathCacheStats } from 'xml-xsd-engine';

// Tune for your workload
configureXPathCache({ coldMax: 1024, hotMax: 256 });

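// Monitor hit rate (guard against 0/0 = NaN before any lookups have happened)
const { hits, misses } = xpathCacheStats();
const total = hits + misses;
const hitRate = total === 0 ? 0 : (hits / total) * 100;
console.log(`Hit rate: ${hitRate.toFixed(1)}%`);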
```

The default sizes (`coldMax: 512`, `hotMax: 128`) are tuned for typical validation workloads. Increase `coldMax` if you have many unique XPath expressions across many schemas.

---

## BatchValidator

```ts
import { BatchValidator } from 'xml-xsd-engine';

const report = await new BatchValidator('schema.xsd', { concurrency: 8 })
  .validateFiles(largeFileList);
```

The schema is compiled once and shared across all workers. As a starting point, set `concurrency` to `os.cpus().length` for CPU-bound workloads and to 16–32 for I/O-bound workloads.

---

## ParseBudget

Set tight limits on untrusted input:

```ts
import { parseXml } from 'xml-xsd-engine';

const doc = parseXml(untrustedInput, {
  maxDepth:      50,
  maxNodeCount:  10_000,
  maxTextLength: 100_000,
});
```
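
For intuition, limits like these are usually enforced with counters that are checked as nodes are created. The sketch below is an assumption about how that could work, not the library's implementation: `BudgetTracker` and its methods are hypothetical, and treating `maxTextLength` as a per-text-node limit is a guess.

```ts
interface ParseBudget {
  maxDepth: number;
  maxNodeCount: number;
  maxTextLength: number;
}

// Hypothetical tracker; the library's internal enforcement is not documented here.
class BudgetTracker {
  private depth = 0;
  private nodes = 0;
  private readonly budget: ParseBudget;

  constructor(budget: ParseBudget) {
    this.budget = budget;
  }

  // Call when an element start tag is parsed.
  openElement(): void {
    this.depth++;
    this.countNode();
    if (this.depth > this.budget.maxDepth) {
      throw new Error(`maxDepth ${this.budget.maxDepth} exceeded`);
    }
  }

  // Call when the matching end tag is parsed.
  closeElement(): void {
    this.depth--;
  }

  // Call for each text node (assumed to be a per-node length limit).
  textNode(value: string): void {
    this.countNode();
    if (value.length > this.budget.maxTextLength) {
      throw new Error(`maxTextLength ${this.budget.maxTextLength} exceeded`);
    }
  }

  private countNode(): void {
    this.nodes++;
    if (this.nodes > this.budget.maxNodeCount) {
      throw new Error(`maxNodeCount ${this.budget.maxNodeCount} exceeded`);
    }
  }
}
```

Failing at the first violated limit keeps the work done on a hostile document proportional to the budget rather than to the input size.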

---

## Profiling

```ts
import { runPipeline } from 'xml-xsd-engine';

const result = runPipeline(xmlSource, schema, {
  profile: true,
  onProfile: (e) => console.log(e.stage, e.durationMs),
});

// Stage timings
result.stages.forEach(s => console.log(s.name, s.durationMs));
console.log('Total:', result.totalMs);
```

---

## Memory Model

| Parser mode | Memory | Best for |
|-------------|--------|---------|
| DOM (`parseXml`) | O(document size) | Random access, XPath, validation |
| SAX (`parseSax`) | O(depth) | Event-driven processing |
| Stream (`parseXmlStream`) | O(document) buffered | Pipe API |

- Compiled XPath cache: ~50 KB (512 slots)
- SchemaModel: 1–5 MB per large schema
- PSVI (when enabled): ~200 bytes per annotated element (pooled)
- Canonical XML subtree cache: WeakMap — GC-collectable

---

## Regression Thresholds

| Operation | Alert if slower than |
|-----------|---------------------|
| `parseXml` small | 0.05 ms |
| `validate` medium | 3 ms |
| `validateStreaming` medium | 2 ms |
| `xpath` medium | 0.6 ms |
| `SchemaCache` hit | 0.01 ms |
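
A check against these thresholds could look like the sketch below. The function name and the shape of the measurement input are assumptions; the guide does not show how `npm run benchmark` reports results.

```ts
// Thresholds from the table above, in milliseconds per operation.
const THRESHOLDS_MS: Record<string, number> = {
  "parseXml small": 0.05,
  "validate medium": 3,
  "validateStreaming medium": 2,
  "xpath medium": 0.6,
  "SchemaCache hit": 0.01,
};

// Returns the operations whose measured latency exceeds their threshold.
// Operations without a threshold are ignored.
function findRegressions(measuredMs: Record<string, number>): string[] {
  return Object.keys(measuredMs).filter((op) => {
    const limit = THRESHOLDS_MS[op];
    return limit !== undefined && measuredMs[op] > limit;
  });
}

// Baseline latencies from this guide, plus one deliberately slow value.
const regressed = findRegressions({
  "parseXml small": 0.027,
  "validate medium": 1.54,
  "xpath medium": 0.75, // slower than the 0.6 ms limit
});
console.log(regressed); // [ 'xpath medium' ]
```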

---

## Future Performance Optimizations (Backlog)

> Identified improvements not yet implemented. Ordered by estimated impact.

### High Impact

| # | Optimization | Area | Expected Gain | Effort |
|---|-------------|------|---------------|--------|
| FP-01 | **Zero-copy lexer** — avoid `String.substring` slicing for token values; use start+length spans | `XmlLexer.ts` | -20% parseXml memory; +10% throughput | High |
| FP-02 | **Adaptive XPath LRU sizing** — auto-tune `coldMax`/`hotMax` based on observed hit/promotion rates | `XPathEngine.ts` | +15% xpath hot-path | Medium |
| FP-03 | **Pre-validate type check on DFA transition** — merge type validation into DFA state transition to eliminate separate pass | `DfaEngine.ts`, `ValidationEngine.ts` | -15% validate time | High |
| FP-04 | **DFA minimization** — reduce state count via Hopcroft's algorithm for large `xs:choice` types | `DfaEngine.ts` | -30% DFA memory; +5% runDfa | High |
| FP-05 | **Worker thread batch validation** — run `BatchValidator` on Node.js `worker_threads` | `BatchValidator.ts` | Linear speedup on multi-core | Medium |

### Medium Impact

| # | Optimization | Area | Expected Gain | Effort |
|---|-------------|------|---------------|--------|
| FP-06 | **PSVI pool per validation call** (not module-level) — enables concurrent validation without lock | `Psvi.ts` | Correctness + concurrency | Medium |
| FP-07 | **Serialized schema cache** — persist `CompiledSchema` to disk between Node.js runs | `SchemaCache.ts` | Eliminate cold compile latency | High |
| FP-08 | **Streaming chunked output for `canonicalize`** — return `AsyncGenerator<string>` instead of full string | `XmlCanonical.ts` | -60% peak memory on large docs | Medium |
| FP-09 | **`_wsCache` clear on schema update** — `TypeValidator._wsCache` is module-level; currently never cleared | `TypeValidator.ts` | Correctness in watch-mode | Low |
| FP-10 | **Namespace string interning pool** — deduplicate repeated namespace URI strings across all elements | `NamespaceEngine.ts` | -10% DOM memory on ns-heavy docs | Low |
| FP-11 | **`StreamingKeyrefTracker` full XPath selector** — current selector parsing is approximate (simple element names only) | `StreamingValidator.ts` | Correctness improvement | High |
| FP-12 | **`DfaModel` sharing across equivalent types** — types with identical particle lists share one DFA | `SchemaCompilerLite.ts` | Reduced schema compile memory | Medium |

### Low Impact / Experimental

| # | Optimization | Area | Expected Gain | Effort |
|---|-------------|------|---------------|--------|
| FP-13 | **WASM build** — compile core parser/validator to WebAssembly | Build | +3-5x browser throughput | Very High |
| FP-14 | **SIMD-accelerated entity scanning** — native SIMD in WASM for `&` / `<` scan | `XmlLexer.ts` | +20% large-text throughput | Very High |
| FP-15 | **Incremental DOM diff** — `revalidateSubtree` currently re-validates the full subtree; diff-based re-check only changed nodes | `ValidationEngine.ts` | -50% incremental validate time | High |
| FP-16 | **Pooled `ValidationIssue` objects** — reuse issue objects in bulk-validation paths | `ValidationResult.ts` | -15% memory in batch mode | Low |
| FP-17 | **`parseXsd` parallel import loading** — load `xs:import` files concurrently using `Promise.all` | `XsdParser.ts` | -30% parseXsd time with imports | Medium |


