# Security Model

> Technical threat model, attack inventory, mitigation details, and deployment guidance.
> **v1.7.0**

---

## Threat Inventory

### T1 — XXE (XML External Entity)

**Attack:** `<!DOCTYPE x [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><root>&xxe;</root>`

**Mitigation:**
- External entity resolver is `undefined` by default
- Any `SYSTEM` or `PUBLIC` reference without an explicit resolver throws `SEC_EXTERNAL_ENTITY`
- Even with a custom resolver, the resolved content is expansion-limited

**Status:** Blocked by default.
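
The default-deny behaviour described above can be pictured with a minimal sketch. `securityError`, `resolveExternalEntity`, and `allowlistResolver` are hypothetical names used only for illustration; they are not the library's API, and the real resolver signature may differ.

```ts
// Illustrative only: these helpers are hypothetical, not part of xml-xsd-engine.
type EntityResolver = (systemId: string) => string;

function securityError(code: string, message: string): Error & { code: string } {
  return Object.assign(new Error(message), { code });
}

// Default-deny: with no resolver configured, every external reference is rejected.
function resolveExternalEntity(systemId: string, resolver?: EntityResolver): string {
  if (resolver === undefined) {
    throw securityError('SEC_EXTERNAL_ENTITY', `external entity blocked: ${systemId}`);
  }
  return resolver(systemId);
}

// An opt-in resolver that serves only pre-approved identifiers from memory,
// so no file or network access can be triggered by the document.
function allowlistResolver(allowed: ReadonlyMap<string, string>): EntityResolver {
  return (systemId) => {
    const content = allowed.get(systemId);
    if (content === undefined) {
      throw securityError('SEC_EXTERNAL_ENTITY', `external entity not allowlisted: ${systemId}`);
    }
    return content;
  };
}
```

If a resolver is needed at all, an in-memory allowlist like this keeps the attack surface small: `file:///etc/passwd` is rejected whether or not a resolver is configured.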

---

### T2 — Billion Laughs (Nested Entity Expansion)

**Attack:** A chain of entities, each referencing the previous one several times, so that expanding the last entity grows memory exponentially.

**Mitigation:**
- `maxEntityExpansion` (default 10,000) tracks total expansion count across all entities
- Circular entity definitions detected during DTD parsing
- Throws `SEC_ENTITY_EXPANSION` on limit exceeded

**Status:** Blocked by default.
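
A rough sketch of why a single global counter is enough to stop the attack. This is not the library's implementation: `expandEntities` is a hypothetical helper, and only the default of 10,000 and the `SEC_ENTITY_EXPANSION` code are taken from the text above.

```ts
// Illustrative sketch, not the library's implementation.
function expandEntities(
  text: string,
  entities: ReadonlyMap<string, string>,
  maxEntityExpansion = 10_000,
): string {
  const reference = /&([A-Za-z_][\w.-]*);/g;
  let expansions = 0; // one counter shared by all entities in the document
  let current = text;
  while (true) {
    let replacedAny = false;
    current = current.replace(reference, (match: string, name: string) => {
      const value = entities.get(name);
      if (value === undefined) return match; // unknown reference: left untouched
      expansions += 1;
      if (expansions > maxEntityExpansion) {
        throw Object.assign(new Error('entity expansion limit exceeded'), {
          code: 'SEC_ENTITY_EXPANSION',
        });
      }
      replacedAny = true;
      return value;
    });
    if (!replacedAny) return current; // nothing left to expand
  }
}
```

Because the counter is global rather than per entity, both the exponential chain and a circular definition (`a` referencing `b` referencing `a`) hit the same limit, and the intermediate string never grows beyond roughly the limit times the longest entity value.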

---

### T3 — Deep Nesting (Stack Overflow)

**Attack:** `<a><b><c>...` with thousands of levels.

**Mitigation:**
- `maxDepth` (default 500) checked at every OPEN_TAG token
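- Throws `SEC_MAX_DEPTH` as soon as an opening tag would exceed the limit
- Parser is iterative (explicit stack), not recursive, so nesting depth does not grow the call stack during parsing; a stack overflow is only a risk if `maxDepth` is raised above what the system stack limit tolerates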

**Status:** Blocked by default.
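
The idea of checking depth at every opening tag with an explicit stack can be sketched as follows. `maxNestingDepth` and its toy regex tokenizer are hypothetical and far simpler than a real lexer (comments, CDATA, and processing instructions are not handled); only the default of 500 and the `SEC_MAX_DEPTH` code come from the text above.

```ts
// Illustrative sketch with a toy tokenizer, not the library's lexer.
function maxNestingDepth(xml: string, maxDepth = 500): number {
  const tag = /<(\/?)([A-Za-z_][\w.:-]*)[^<>]*?(\/?)>/g;
  const open: string[] = []; // explicit stack on the heap, no recursion
  let deepest = 0;
  let m: RegExpExecArray | null;
  while ((m = tag.exec(xml)) !== null) {
    const closing = m[1] === '/';
    const selfClosing = m[3] === '/';
    if (closing) {
      open.pop();
      continue;
    }
    // The check runs on every opening tag, before the element is pushed.
    const depth = open.length + 1;
    if (depth > maxDepth) {
      throw Object.assign(new Error(`nesting depth ${depth} exceeds ${maxDepth}`), {
        code: 'SEC_MAX_DEPTH',
      });
    }
    deepest = Math.max(deepest, depth);
    if (!selfClosing) open.push(m[2]);
  }
  return deepest;
}
```

Since the loop never recurses, even a very large `maxDepth` only costs heap memory for the `open` array in this sketch.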

---

### T4 — Large Text Node (Memory Exhaustion)

**Attack:** Single text node with gigabytes of content.

**Mitigation:**
- `maxTextLength` (default 10 MB) checked during text accumulation
- Throws `SEC_MAX_TEXT` before the text node exceeds the limit

**Status:** Blocked by default.

---

### T5 — Attribute Flooding

**Attack:** Single element with thousands of attributes.

**Mitigation:**
- `maxAttributes` (default 256) enforced at lexer level

**Status:** Blocked by default.

---

### T6 — Node Count Explosion

**Attack:** Document with millions of elements.

**Mitigation:**
- `maxNodeCount` (default 1,000,000) tracked globally

**Status:** Blocked by default.

---

### T7 — XPath Injection

**Attack:** User-controlled XPath expression executing unexpected queries.

**Mitigation:**
- Library XPath (`xs:assert`, identity constraints) compiled from schema, not user input
- The `xpath()` API accepts arbitrary expression strings; do not build them from unsanitized user input
- XPath engine is read-only (no side effects, no I/O)

**Status:** Caller responsibility for user-supplied XPath.
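
When a user-supplied value has to appear inside an expression, one common approach is to embed it as a string literal rather than concatenating it raw. XPath 1.0 string literals have no escape character, so a value containing both quote types must be assembled with `concat()`. The helper below is a hypothetical sketch, not part of the library, and it assumes the XPath engine supports the standard `concat()` core function.

```ts
// Hypothetical helper: turns arbitrary text into a safe XPath 1.0 string literal.
function xpathLiteral(value: string): string {
  if (!value.includes("'")) return `'${value}'`; // no single quotes: wrap in '...'
  if (!value.includes('"')) return `"${value}"`; // no double quotes: wrap in "..."
  // Both quote types present: split on ' and glue the pieces back with "'".
  const parts = value.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Usage: the user value can only ever be compared as data, never parsed as syntax.
function userByName(name: string): string {
  return `//user[@name=${xpathLiteral(name)}]`;
}
```

With this helper, the classic payload `' or '1'='1` becomes the literal `"' or '1'='1"` and is compared as plain text.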

---

### T8 — XSLT Code Execution

**Attack:** XSLT stylesheet executing arbitrary operations.

**Mitigation:**
- XSLT-lite transformer is a strict subset — no `xsl:message`, no `document()`, no extension functions
- No I/O possible from XSLT
- Recursion depth limited by parser depth limit

**Status:** Controlled subset — not a general XSLT processor.

---

### T9 — Streaming xs:keyref Tuple Exhaustion *(v1.7)*

**Attack:** An XML document with millions of `xs:key` or `xs:unique` values causes `StreamingKeyrefTracker` to accumulate unbounded tuples, exhausting memory.

**Mitigation:**
- `StreamingKeyrefTrackerOptions.maxTuples` (default **50,000**) caps the total tuple count during a streaming pass
- When the limit is exceeded, `_tupleCapExceeded` is set and collection stops. `validate()` then emits a `KEYREF_TUPLE_LIMIT` **warning** (not an error), and referential-integrity checking is skipped for the overflowing document
- `StreamingValidationOptions.maxKeyTuples` propagates this limit from the top-level API

```ts
import { validateStreaming, compileSchema, parseXsd } from 'xml-xsd-engine';
const compiled = compileSchema(parseXsd(xsdSource));
// Tighten the limit for untrusted documents
const result = validateStreaming(xmlSource, compiled, { maxKeyTuples: 5_000 });
```

**Status:** Bounded by default (50,000 tuples).
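
Because the limit produces a warning rather than an error, a document that floods the tracker can still be reported as valid with its referential-integrity checks skipped. For untrusted input a caller may prefer to fail closed. The sketch below assumes the result exposes an `issues` array whose entries carry `code` and `severity`; those field names and the `ResultLike` type are assumptions for illustration, so adjust them to the actual result shape.

```ts
// Assumed result shape: field names are hypothetical, check the real types.
interface IssueLike {
  code: string;
  severity: 'error' | 'warning';
}

interface ResultLike {
  valid: boolean;
  issues: IssueLike[];
}

// Fail closed: a skipped keyref check is treated the same as a failed one.
function isAcceptable(result: ResultLike): boolean {
  const integritySkipped = result.issues.some(
    (issue) => issue.code === 'KEYREF_TUPLE_LIMIT',
  );
  return result.valid && !integritySkipped;
}
```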

---

## v1.7.0 Security Improvements

| Area | Change |
|------|--------|
| Streaming xs:keyref DoS guard | `StreamingKeyrefTrackerOptions.maxTuples` (default 50,000) prevents memory exhaustion from xs:key floods; emits `KEYREF_TUPLE_LIMIT` warning |
| Streaming `xsi:type` type-confusion | `_isDerivedFrom()` check added — unrelated types rejected with `VALID_XSI_TYPE_UNSAFE`; prevents runtime type substitution attacks |
| PSVI | Opt-in only (default false) — no type information leaked unless requested |
| Canonical XML | Pure serializer — no I/O, no entity resolution, safe on untrusted documents |
| SchemaMerger | Circular xs:import chains detected via SHA-256 hash tracking |
| XPath cache | LRU-bounded — configurable via `configureXPathCache()` |
| `ValidationResult.valid` | Bug fix: the field was always `true` in v1.6.0 and now reflects the actual validation outcome |
| SchemaCache | `maxVersionsPerKey` prevents unbounded growth in long-running processes |
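
To illustrate the `xsi:type` row, a derivation check of this kind typically walks the base-type chain from the type named in the instance up to the declared type. The sketch below is an assumption about the general technique, not the library's `_isDerivedFrom()`; the `TypeDef` shape is hypothetical, and XSD details such as `block`/`final` and the restriction/extension distinction are ignored.

```ts
// Hypothetical, simplified type table: each type optionally names its base type.
interface TypeDef {
  name: string;
  base?: string;
}

// Returns true if `candidate` equals `declared` or derives from it (transitively).
function isDerivedFrom(
  candidate: string,
  declared: string,
  types: ReadonlyMap<string, TypeDef>,
): boolean {
  const seen = new Set<string>(); // guards against malformed cyclic chains
  let current: string | undefined = candidate;
  while (current !== undefined && !seen.has(current)) {
    if (current === declared) return true;
    seen.add(current);
    current = types.get(current)?.base;
  }
  return false;
}
```

An element declared as `Address` would accept `xsi:type="USAddress"` only if the chain from `USAddress` reaches `Address`; an unrelated type such as `Invoice` is rejected.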

---

## Deployment Guidance

### Validating Untrusted XML

```ts
import { parseXml, validate } from 'xml-xsd-engine';

const doc = parseXml(untrustedXml, {
  maxDepth:      100,
  maxNodeCount:  50_000,
  maxTextLength: 1_000_000,
  maxAttributes: 64,
});
// `schema` is assumed to be a trusted schema loaded elsewhere in the application
const result = validate(doc, schema, { mode: 'strict' });
```
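
In a service, limit violations are usually best reported as a rejected request rather than an internal failure. The sketch below assumes that limit violations surface as thrown errors carrying a string `code` such as `SEC_MAX_DEPTH`; that error shape, and the names `securityCode` and `guardUntrusted`, are assumptions for illustration rather than confirmed API.

```ts
// Assumption: limit violations are thrown as errors with a string `code`
// starting with 'SEC_'. Verify against the library's actual error type.
function securityCode(err: unknown): string | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const code = (err as { code?: unknown }).code;
  return typeof code === 'string' && code.startsWith('SEC_') ? code : undefined;
}

type Outcome<T> = { ok: true; value: T } | { ok: false; rejectedBy: string };

// Runs a parse/validate step; limit violations become a rejection,
// while any other error propagates unchanged.
function guardUntrusted<T>(step: () => T): Outcome<T> {
  try {
    return { ok: true, value: step() };
  } catch (err) {
    const code = securityCode(err);
    if (code === undefined) throw err;
    return { ok: false, rejectedBy: code };
  }
}
```

A caller would wrap the `parseXml` and `validate` calls in `guardUntrusted(() => ...)` and map a `rejectedBy` outcome to a client-error response.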

### Validating Untrusted XSD (schema from user)

Run the schema preflight (`validateSchema`) first and parse only if it reports no errors:

```ts
import { validateSchema, parseXsd } from 'xml-xsd-engine';

const issues = validateSchema(untrustedXsd);
if (issues.some(i => i.severity === 'error')) {
  throw new Error('Invalid schema');
}
const schema = parseXsd(untrustedXsd); // safe after preflight
```

### Long-Running Servers

Cache schemas and configure limits:

```ts
import { SchemaCache } from 'xml-xsd-engine';

const cache = new SchemaCache({
  maxSize:           50,
  ttlMs:             3_600_000,
  maxVersionsPerKey: 2,  // prevents OOM from repeated schema changes
});
```
