# Contributing to xml-xsd-engine

Thank you for your interest in contributing!  
This document is the single authoritative guide for contributors — from setup to a merged PR.

---

## Table of Contents

1. [Code of Conduct](#1-code-of-conduct)
2. [Getting Started](#2-getting-started)
3. [Project Layout](#3-project-layout)
4. [Architecture Quick Map](#4-architecture-quick-map)
5. [Development Workflow](#5-development-workflow)
6. [Writing Tests](#6-writing-tests)
7. [Code Style](#7-code-style)
8. [Commit Messages](#8-commit-messages)
9. [Pull Request Process](#9-pull-request-process)
10. [Core Design Principles](#10-core-design-principles)
11. [Zero-Dependency Rule](#11-zero-dependency-rule)
12. [Performance Expectations](#12-performance-expectations)
13. [Error Code Stability](#13-error-code-stability)
14. [Adding Schema Tests](#14-adding-schema-tests)
15. [DFA / XPath Change Policy](#15-dfa--xpath-change-policy)
16. [Versioning & Release](#16-versioning--release)
17. [What We Will / Will Not Accept](#17-what-we-will--will-not-accept)
18. [DCO — Developer Certificate of Origin](#18-dco--developer-certificate-of-origin)

---

## 1. Code of Conduct

This project follows the [Contributor Covenant Code of Conduct](./CODE_OF_CONDUCT.md).  
Be respectful, constructive, and inclusive.

---

## 2. Getting Started

### Prerequisites

- **Node.js ≥ 18** (CI matrix: 18, 20, 22)
- **npm ≥ 9**
- TypeScript knowledge (all source is `.ts`, `strict: true`)
- Familiarity with XML/XSD concepts helps — see [docs/GLOSSARY.md](./docs/GLOSSARY.md)

### Setup

```bash
git clone https://github.com/hephesthesis/xml-xsd-engine.git
cd xml-xsd-engine
npm install       # installs TypeScript + ESLint + ts-jest (dev deps only)
npm run build     # compile to dist/cjs, dist/esm, dist/types
npm test          # must show 2056 tests passed, 0 failed
npm run lint      # must show 0 errors (≤ 100 warnings allowed)
```

Before making any changes, confirm the test suite passes cleanly. If `npm test` fails before your changes, open an issue rather than masking it.

---

## 3. Project Layout

```
src/
  browser.ts / bun.ts / deno.ts   Platform entry points (G9/G25)
  index.ts                         Main public API barrel export

  cache/        SchemaCache — LRU + TTL with content-hash keying (G40)
  cli/          xml-validate + xml-format binaries
  codegen/      XSD → TypeScript / JSON Schema generation (G14)
  errors/       XmlError, all error codes, ERROR_CATEGORY map (G37)
  io/           Async file I/O, Node.js stream parser
  namespace/    NamespaceEngine — scoped prefix→URI resolution (G35)
  parser/       XmlLexer, XmlParser, SaxParser, SaxInstrumentation,
                XPathEngine, XPath2Functions, XmlSerializer, XmlNodes
  pipeline/     ValidationPipeline (G34), SchemaCompilerLite (G27-lite)
  plugins/      PluginRegistry — extension points
  schema/       SchemaModel — compiled XSD representation
  transform/    XsltTransformer, XmlToJson, JsonToXml
  utils/        ParseBudget, sha256, xmlUtils, XmlDiff (G16),
                SchemaInference (G17), StringReader, benchmark
  validator/    ValidationEngine, TypeValidator, IdentityConstraintEngine,
                AssertionEvaluator, SchemaPreflight, BatchValidator,
                ValidationResult, XsdTypeSystem, formatters
  xsd/          XsdParser — XSD 1.0 → SchemaModel

  tests/
    fixtures/   Shared XML + XSD test files
    unit/       46 test files — one per module
```

---

## 4. Architecture Quick Map

| Layer | Module(s) | Purpose |
|---|---|---|
| Tokenizer | `XmlLexer` | Raw string → lazy token stream |
| DOM builder | `XmlParser` | Token stream → `XmlDocument` tree |
| SAX layer | `SaxParser`, `SaxInstrumentation` | Token stream → events (no DOM) |
| Namespace | `NamespaceEngine` | Scoped prefix → URI resolution |
| XSD parser | `XsdParser` | XSD source → `SchemaModel` |
| Schema compile | `SchemaCompilerLite` | `SchemaModel` → `CompiledSchema` |
| Schema lint | `SchemaPreflight` | Self-consistency checks |
| Validation | `ValidationEngine`, `IdentityConstraintEngine`, `AssertionEvaluator` | DOM × Schema → `ValidationResult` |
| Pipeline | `ValidationPipeline` | Orchestrate all 7 stages |
| XPath | `XPathEngine`, `XPath2Functions` | Query + XPath 2.0 utilities |
| Transforms | `XsltTransformer`, `XmlToJson`, `JsonToXml`, `XmlDiff`, `SchemaInference` | Data conversion + tooling |
| Output | `formatters`, CLI | Developer-facing output |

Data flow is strictly **unidirectional**: Lexer → Parser → Namespace → Schema → Validation → Output. No module reaches backward into a lower layer.

---

## 5. Development Workflow

### Branch naming

| Type | Pattern | Example |
|---|---|---|
| Feature | `feature/G<n>-short-description` | `feature/G48-xpath-full` |
| Bug fix | `fix/issue-<n>-description` | `fix/issue-42-list-facet` |
| Docs | `docs/description` | `docs/update-glossary` |
| Chore | `chore/description` | `chore/bump-jest` |

### Checklist before committing

1. `npm test` — all existing tests pass
2. `npm run lint` — no new errors
3. `npm run build` — all three tsconfig targets compile cleanly
4. New functionality has tests
5. `CHANGELOG.md` updated under `[Unreleased]`
6. Relevant docs updated (API.md, FEATURES.md, GUIDE.md as appropriate)

---

## 6. Writing Tests

### One test file per module

Each source file has a corresponding test file in `src/tests/unit/`. New modules need a new test file.

### Feature tests for new versions

New minor-version features belong in `v<major>.<minor>-features.test.ts` (e.g. `v1.7-features.test.ts`).

### Test requirements

- Every new public function must be tested: happy path + at least one error path + edge cases
- Error condition tests must assert on `err.code`, not `err.message`
- Tests must pass deterministically — no timing-sensitive assertions
- Tests must be independent — reset any state between tests
- Large XML/XSD fixtures go in `src/tests/fixtures/`, not inline strings

### Example

```ts
import { parseXml } from '../../parser/XmlParser';

describe('parseXml', () => {
  it('throws PARSE_MISMATCHED_TAG on mismatched tags', () => {
    expect(() => parseXml('<a></b>')).toThrow(
      expect.objectContaining({ code: 'PARSE_MISMATCHED_TAG' }),
    );
  });

  it('resolves default namespace', () => {
    const doc = parseXml('<root xmlns="http://example.com"/>');
    expect(doc.root!.namespaceURI).toBe('http://example.com');
  });
});
```

---

## 7. Code Style

### TypeScript

- `strict: true` — no `any`, no unchecked indexing
- Explicit return types on all exported functions
- JSDoc comments on all exported symbols
- No default exports — named exports only
- Interfaces over type aliases for object shapes

### Naming conventions

| Construct | Convention | Example |
|---|---|---|
| Classes / Interfaces | `PascalCase` | `ValidationEngine`, `ValidationIssue` |
| Functions | `camelCase` | `parseXml`, `validateFragment` |
| Constants | `SCREAMING_SNAKE` | `XSD_NS`, `ERROR_CATEGORY` |
| Private methods | `_camelCase` | `_checkTypeRefs` |
| Files | `PascalCase.ts` for classes, `camelCase.ts` for utilities | `XmlParser.ts`, `xmlUtils.ts` |

### ESLint

Run `npm run lint` before every commit. Key rules enforced:
- `no-restricted-imports` — **no runtime dependencies**
- `no-explicit-any`
- `no-unused-vars`
- `prefer-const`
- `eqeqeq`

---

## 8. Commit Messages

Follow [Conventional Commits](https://www.conventionalcommits.org/):

```
<type>(<scope>): <short description>

[optional body — explain WHY, not WHAT]

[optional footer: Fixes #123]
```

**Types:** `feat` · `fix` · `docs` · `test` · `refactor` · `perf` · `chore` · `ci`

**Examples:**

```
feat(validator): add validateFragment and validateSubtree (G45)
fix(parser): handle empty xs:list correctly (was splitting '' to [''])
docs(guide): add v1.7 fragment validation recipes
test(xpath): add edge-case tests for descendant-or-self axis
```

---

## 9. Pull Request Process

### Before opening a PR

- [ ] All existing tests pass (`npm test`)
- [ ] No new lint errors (`npm run lint`)
- [ ] Build succeeds (`npm run build`)
- [ ] New functionality has tests
- [ ] `CHANGELOG.md` updated in `[Unreleased]`
- [ ] Relevant docs updated

### PR description

Use the PR template. Include: What · Why · How · Testing · Breaking changes (even additive API changes).

### Review criteria

1. **Correctness** — does it do what it claims?
2. **Tests** — are all paths covered?
3. **Zero-dependency compliance** — no new runtime imports
4. **Performance** — does it regress benchmarks?
5. **API consistency** — naming, option patterns, error codes
6. **Type safety** — no `any`, no unsound casts
7. **Documentation** — exported symbols have JSDoc; docs updated

### Merging

- 1 approving review from a maintainer required
- All CI checks must pass (lint + typecheck + test matrix on Node 18/20/22)
- Squash-merge preferred for feature branches

---

## 10. Core Design Principles

| Principle | Detail |
|---|---|
| **Zero runtime dependencies** | No `npm install` dependency at runtime — ever. See §11. |
| **Security by default** | XXE blocked, entity expansion capped, all limits on. No opt-in required. |
| **Stable error contracts** | `XmlErrorCode` values stable across all minor/patch versions. See §13. |
| **TypeScript-first** | All source `.ts`, `strict: true`, full `.d.ts` for every export. |
| **Separation of concerns** | Each layer is independently testable. No circular module dependencies. |
| **Testability** | Zero global mutable state in source modules. |
| **Backward compatibility** | No breaking API changes within a major version. New options must have safe defaults. |

---

## 11. Zero-Dependency Rule

**xml-xsd-engine has zero runtime dependencies. This is a core design goal.**

- The `dependencies` field in `package.json` must remain empty
- No `import`/`require` of any package not in `devDependencies`
- Every algorithm is hand-implemented (`sha256`, LRU cache, XPath evaluator, etc.)

ESLint's `no-restricted-imports` enforces this automatically. If you need a utility from npm, implement a minimal version in `src/utils/`.

**Why?** Security surface (zero CVEs from upstream), bundle size (~165 kB), long-term stability, full auditability.

`devDependencies` (TypeScript, Jest, ts-jest, ESLint) are allowed — never in the published bundle.

---

## 12. Performance Expectations

New features must not regress existing benchmark numbers:

```bash
npm run build && npm run benchmark
npm run benchmark:compare   # compare against fast-xml-parser/xml2js if installed
```

**Minimum throughput thresholds:**

| Operation | Minimum |
|---|---|
| DOM parse (small ~2 KB) | ≥ 8,000 ops/s |
| DOM parse (medium ~40 KB) | ≥ 500 ops/s |
| XSD compile | ≥ 15,000 ops/s |
| Validation (1,000 elements) | ≥ 400 ops/s |

Include benchmark results in PRs that touch the hot path (lexer, parser, validation engine). See `docs/PERFORMANCE.md` for full methodology.

---

## 13. Error Code Stability

**`XmlErrorCode` values are stable across all minor and patch versions.**

- **Adding** a new code: allowed in any minor version
- **Renaming** an existing code: requires a **major version bump**
- **Removing** an existing code: requires a **major version bump**
- **Changing a code's category**: treated as a rename — requires major bump

### Adding a new error code

1. Add the literal to `XmlErrorCode` union in `src/errors/XmlError.ts`
2. Add its category to `ERROR_CATEGORY` in the same file
3. Document it in `docs/ERRORS.md` (description, trigger, example)
4. Add to `CHANGELOG.md` under `[Unreleased]`
5. Add at least one test asserting `err.code === 'YOUR_NEW_CODE'`

---

## 14. Adding Schema Tests

New XSD features require:

1. **A minimal XSD fixture** — one concept per fixture, no unnecessary complexity
2. **A valid document test** — must produce `result.valid === true` with 0 errors
3. **An invalid document test** — must assert on the exact error `code`, not `message`
4. **Edge cases** — empty elements, missing attributes, `maxOccurs="unbounded"`, namespaces, `xsi:nil`
5. **Schema preflight test** — if adding schema-level checks, test `validateSchema()` / `checkSchema()`

---

## 15. DFA / XPath Change Policy

Changes to `XPathEngine.ts` and the validation/compilation path (`SchemaCompilerLite`, `ValidationEngine`) require extra care.

### XPath changes

- New/changed behaviour must have tests covering the new case + 3 existing cases that remain unchanged
- XPath results must be deterministic (document order only)
- New XPath 2.0 functions go in `XPath2Functions.ts` as pure utilities, not in `XPathEngine.ts`

### Content model changes

Before modifying sequence/choice/all/occurrence logic:

1. Read `docs/ARCHITECTURE.md` §12 and §10
2. Read `docs/adr/0002-dfa-validation.md`
3. Test `recover` mode — must continue past errors without panicking
4. Tag the PR with `needs-dfa-review`

---

## 16. Versioning & Release

| Bump | When |
|---|---|
| **Patch** (1.x.**y**) | Bug fixes, docs, test additions — no API change |
| **Minor** (1.**x**.0) | New exports, new options with safe defaults, new error codes |
| **Major** (**x**.0.0) | Breaking API changes, error code renames/removals |

### ADR (Architecture Decision Records)

Significant design decisions go in `docs/adr/NNNN-short-title.md`. Format: Status · Context · Decision · Consequences. See existing ADRs for examples.

---

## 17. What We Will / Will Not Accept

### Will accept ✅

- Bug fixes with regression tests
- Performance improvements with benchmark data
- New XSD constructs with tests and docs
- New utility exports (zero-dep rule applies)
- New output formatters
- Documentation improvements
- ADRs recording design decisions

### Will not accept ❌

| PR type | Reason |
|---|---|
| Adds a runtime dependency | Zero-dependency rule is non-negotiable |
| Breaks existing API without major bump | Stability guarantee |
| Renames an error code | Requires major bump |
| Uses `any` type | TypeScript strict mode required |
| Adds `console.log` to source (not tests) | Use structured error reporting |
| External entity resolution | XXE attack surface |
| SOAP/WSDL/HTML parsing | Out of scope |
| Full XQuery engine | Multi-year project; out of scope |

---

## 18. DCO — Developer Certificate of Origin

By submitting a pull request, you certify that you have the right to submit the contribution under the project's MIT license (see the full [DCO](https://developercertificate.org/) text).

Sign-off with `git commit -s` or include `Signed-off-by: Your Name <email>` in the commit message body.

This project does **not** require a CLA. The DCO is sufficient.
