# Content-as-Code Metadata System

**A specification for managing version-controlled content with cascading metadata inheritance and programmatic rendering.**

---

## Overview

This folder contains the complete specification for implementing a **Content-as-Code Metadata System** in Roadcrew that enables:

✅ **Metadata inheritance** (global → scope → file-level)  
✅ **Data file references** (single source of truth)  
✅ **Cascading defaults** (reduce duplication 50-80%)  
✅ **Programmatic rendering** (write once, render many ways)  
✅ **Publishing safety** (explicit control over what customers see)  
✅ **Version control as audit trail** (Git history = change log)  

---

## Documents in This Folder

### 1. **SPECIFICATION.md** (Main)
**Complete technical specification** covering:
- Executive summary & goals
- System architecture
- Data file formats (authors.yml, licenses.yml, etc.)
- Frontmatter schema
- Metadata resolution algorithm
- Validation rules
- Build-time rendering pipeline
- Publishing & distribution
- Implementation phases (4 phases over 2-3 weeks)
- Test scenarios
- Complete workflow examples
- Migration path & backward compatibility
- Success metrics

**Read this first** to understand the full system.

### 2. **ARCHITECTURE-DIAGRAMS.md** (Visual Reference)
**12 ASCII diagrams** showing:
1. Metadata resolution cascade (5-layer hierarchy)
2. Directory structure hierarchy
3. Data file dependencies
4. Publishing pipeline
5. Rendering format decision tree
6. Metadata inheritance example
7. Reference resolution flow
8. Validation & quality checks
9. Content lifecycle (Draft → Active → Deprecated → Archived)
10. Build time vs runtime processing
11. Publishing safety: internal vs public
12. System interaction map

**Use these diagrams** when explaining the system to teammates or for quick reference.

---

## Key Concepts

### Metadata Layers (Cascading)

```
Layer 1 (Global defaults)     ← Applies to ALL files
Layer 2 (Scope defaults)      ← Applies to specs, epics, etc.
Layer 3 (Type defaults)       ← Optional, type-specific
Layer 4 (File frontmatter)    ← Individual file overrides
Layer 5 (Reference resolution)← Fetch full objects from data files

Result: Final merged metadata object
```
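
The cascade for Layers 1-4 can be sketched as an ordered merge in which each later layer overrides the one before it. This is a minimal sketch, not the resolver defined in `SPECIFICATION.md`: the function name `resolve_layers`, the shallow-merge behavior, and the sample layer contents are assumptions made for illustration.

```python
from typing import Any

Metadata = dict[str, Any]


def resolve_layers(*layers: Metadata) -> Metadata:
    """Merge metadata layers in order; later layers override earlier ones.

    Assumption: a shallow merge is enough for this illustration. The real
    resolver may treat nested keys differently.
    """
    merged: Metadata = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical layer contents (the "draft" global default is invented here).
global_defaults = {"copyright": "Copyright (c) 2025 North Star Holdings, LLC", "status": "draft"}
scope_defaults = {"roadcrew_type": "spec", "test_coverage_target": 80}
type_defaults: Metadata = {}  # Layer 3 is optional, so it may be empty
frontmatter = {"title": "Authentication Specification", "author": "sam.henry", "status": "active"}

resolved = resolve_layers(global_defaults, scope_defaults, type_defaults, frontmatter)
```

Here the file-level `status` replaces the global default, while `roadcrew_type` and `copyright` are inherited untouched. Layer 5 (reference resolution) would run on the merged result.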

### Data Files (Single Source of Truth)

- **`_defaults.yml`** - Default metadata for a scope
- **`authors.yml`** - Author registry (referenced by frontmatter)
- **`licenses.yml`** - License definitions (referenced by frontmatter)
- **`roadcrew-config.yml`** - Version & deployment config
- **`tags.yml`** - Standard categories, domains, audiences
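
Reference resolution against these data files might look like the sketch below. It assumes the registries have already been loaded into dictionaries; the registry contents, the `commercial` license key, and the function name `resolve_references` are hypothetical and not taken from the specification.

```python
from typing import Any

# Hypothetical registry contents standing in for authors.yml and licenses.yml.
AUTHORS = {"sam.henry": {"name": "Sam Henry", "role": "Technical PM"}}
LICENSES = {"commercial": {"name": "Roadcrew Commercial License"}}


def resolve_references(
    metadata: dict[str, Any],
    registries: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Replace string keys with full objects from the matching registry.

    `registries` maps a frontmatter field (e.g. "author") to its lookup table.
    An unknown key raises KeyError, so a typo fails loudly instead of
    rendering with a dangling reference.
    """
    resolved = dict(metadata)
    for field, registry in registries.items():
        key = resolved.get(field)
        if isinstance(key, str):
            if key not in registry:
                raise KeyError(f"{field} reference {key!r} not found in registry")
            # Copy the entry so callers cannot mutate the shared registry.
            resolved[field] = dict(registry[key])
    return resolved


resolved = resolve_references(
    {"title": "Authentication Specification", "author": "sam.henry", "license": "commercial"},
    {"author": AUTHORS, "license": LICENSES},
)
```

Because every file stores only the key, a change to an entry in the registry shows up in every file that references it on the next build.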

### Rendering Formats (Write Once, Use Many)

Same source file rendered as:
- **Markdown** (with navigation breadcrumbs)
- **HTML** (with meta tags and styling)
- **JSON** (for programmatic access)
- **GitHub Issues** (auto-generated issues with correct metadata)
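
One way to picture "write once, render many ways" is a table of renderers keyed by format name, each taking the same resolved metadata and body. The sketch below is illustrative only: the function names and the `RENDERERS` table are assumptions, the GitHub Issues renderer is omitted because it needs an API client, and Markdown-to-HTML conversion is out of scope, so the HTML renderer emits the body escaped inside `<pre>`.

```python
import html
import json
from typing import Any, Callable

Renderer = Callable[[dict[str, Any], str], str]


def render_markdown(meta: dict[str, Any], body: str) -> str:
    return f"# {meta['title']}\n\n{body}\n"


def render_html(meta: dict[str, Any], body: str) -> str:
    # Only string-valued metadata becomes <meta> tags; values are HTML-escaped.
    meta_tags = "\n".join(
        f'<meta name="{html.escape(key)}" content="{html.escape(value)}">'
        for key, value in meta.items()
        if isinstance(value, str)
    )
    return (
        f"<head>\n<title>{html.escape(meta['title'])}</title>\n{meta_tags}\n</head>\n"
        f"<body>\n<pre>{html.escape(body)}</pre>\n</body>"
    )


def render_json(meta: dict[str, Any], body: str) -> str:
    return json.dumps({"metadata": meta, "body": body}, indent=2, sort_keys=True)


RENDERERS: dict[str, Renderer] = {
    "markdown": render_markdown,
    "html": render_html,
    "json": render_json,
}


def render(fmt: str, meta: dict[str, Any], body: str) -> str:
    """Dispatch to the renderer registered for `fmt`."""
    return RENDERERS[fmt](meta, body)


sample_meta = {"title": "Authentication Specification", "status": "active"}
sample_body = "Describes user authentication flow..."
outputs = {fmt: render(fmt, sample_meta, sample_body) for fmt in RENDERERS}
```

Adding a format then means registering one more function, with no change to the source files.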

---

## Implementation Roadmap

### Phase 1: Foundation (2-3 hours)
Build the metadata resolver + data file structure
- Cascading defaults working
- Reference resolution (author, license, tags)
- Validation framework
- Test coverage: 90%+

### Phase 2: Rendering (3-4 hours)
Create rendering pipeline for all formats
- Markdown, HTML, JSON renderers
- Metadata footer injection
- Navigation generation
- Test coverage: 85%+

### Phase 3: Integration (2-3 hours)
Hook into Roadcrew's command pipeline
- Commands use metadata
- Templates generate with correct frontmatter
- Publishing respects metadata flags
- No internal-only files leak

### Phase 4: Commands (2-3 hours)
Expose as customer-facing commands
- `/validate-metadata` - Check metadata syntax
- `/render-content` - Render to different formats
- `/audit-metadata` - Find inconsistencies
- `/update-defaults` - Bulk edit metadata

**Total: roughly 10-15 hours (the phase estimates above sum to 9-13 hours), spread over 2-3 weeks**

---

## Example: From Spec to Publishing

### Step 1: Create minimal spec file
```markdown
---
title: "Authentication Specification"
author: "sam.henry"
status: "active"
---

## Overview
Describes user authentication flow...
```

### Step 2: Metadata resolver cascades
```
Global defaults + Specs scope + File frontmatter + Reference resolution
↓
{
  title: "Authentication Specification",
  author: { name: "Sam Henry", email: "sam@...", role: "Technical PM" },
  status: "active",
  copyright: "Copyright (c) 2025 North Star...",
  license: { name: "Roadcrew Commercial License", ... },
  roadcrew_type: "spec",
  test_coverage_target: 80,
  ...
}
```

### Step 3: Render for publishing
```markdown
---
title: "Authentication Specification"
status: "active"
---

## Overview
Describes user authentication flow...

---
**License:** Roadcrew Commercial License
**Copyright:** Copyright (c) 2025 North Star Holdings, LLC
**Last Updated:** November 4, 2025
```
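
The metadata footer in the rendered output could be produced by a small helper like the one below. This is a sketch under assumptions: the function name `render_footer` and its parameters are hypothetical, and the month name relies on Python's default (English) locale.

```python
from datetime import date


def render_footer(license_name: str, copyright_notice: str, last_updated: date) -> str:
    """Build the footer block appended to a published Markdown file."""
    # Compose the date by hand to avoid platform-specific strftime flags.
    updated = f"{last_updated:%B} {last_updated.day}, {last_updated.year}"
    return "\n".join(
        [
            "---",
            f"**License:** {license_name}",
            f"**Copyright:** {copyright_notice}",
            f"**Last Updated:** {updated}",
        ]
    )


footer = render_footer(
    "Roadcrew Commercial License",
    "Copyright (c) 2025 North Star Holdings, LLC",
    date(2025, 11, 4),
)
```

The license and copyright strings would come from the resolved metadata rather than being typed into each file.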

### Step 4: Internal-only files stripped
- Files with `internal_only: true` don't appear in `dist/`
- Sensitive metadata (email, budget code) removed
- Customers get clean, safe specs only
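
The publishing filter described above can be sketched as follows. The `internal_only` flag comes from the text; the exact sensitive key names (`email`, `budget_code`), the function names, and the placeholder email address are assumptions for illustration.

```python
from typing import Any, Optional

# Assumption: keys treated as sensitive. The authoritative list belongs in the spec.
SENSITIVE_FIELDS = {"email", "budget_code"}


def strip_sensitive(value: Any) -> Any:
    """Recursively drop sensitive keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_sensitive(v) for k, v in value.items() if k not in SENSITIVE_FIELDS}
    if isinstance(value, list):
        return [strip_sensitive(item) for item in value]
    return value


def prepare_for_publish(meta: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return public-safe metadata, or None if the file must stay out of dist/."""
    if meta.get("internal_only") is True:
        return None
    return strip_sensitive(meta)


public = prepare_for_publish(
    {
        "title": "Authentication Specification",
        "author": {"name": "Sam Henry", "email": "sam@example.com"},  # placeholder address
        "budget_code": "HYPOTHETICAL-001",
    }
)
skipped = prepare_for_publish({"title": "Internal notes", "internal_only": True})
```

Returning `None` for internal-only files makes the exclusion explicit: the build step has to handle that case before anything is written to `dist/`.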

---

## Success Metrics

Implementing this system targets the following outcomes:

| Metric | Before | After | Impact |
|--------|--------|-------|--------|
| Metadata duplication | 50-80% | <5% | Faster updates, fewer bugs |
| Author name consistency | Varies | 100% | No data entry errors |
| Copyright updates | Manual, 640 files | 1 file | 99% time savings |
| Publishing safety | Manual review | Automatic | Zero internal-only leaks |
| Content reusability | Single format | 4+ formats | More use cases |
| Build time (metadata) | N/A | <2s | Efficient CI/CD |
| Test coverage | N/A | 90%+ | Production ready |

---

## Related Files

- **Main specification:** `SPECIFICATION.md`
- **Architecture diagrams:** `ARCHITECTURE-DIAGRAMS.md`
- **Roadcrew publishing:** `.cursor/rules/08-publishing.mdc`
- **System patterns:** `memory-bank/systemPatterns.md`
- **Templates:** `templates/roadcrew/README.md`

---

## Questions?

**Why metadata files instead of a database?**  
Version control is the source of truth. Git provides an audit trail, change history, and team collaboration without extra infrastructure.

**Why cascading defaults?**  
They reduce duplication: define a value once globally and override only what differs, for an estimated 50-80% less metadata per file.

**Why separate data files?**  
Single source of truth. Edit an author's name once in `authors.yml` and it updates everywhere it is referenced.

**Why 4 rendering formats?**  
Write once, use many ways: the same spec becomes a README, an HTML page, API docs, and GitHub issues.

**When should we start?**  
Recommended: start Phase 1 after the current sprint completes. ~10-15 hours total over 2-3 weeks.

---

## Document Status

**Version:** 1.0  
**Status:** DRAFT  
**Created:** November 4, 2025  
**Last Updated:** November 4, 2025  
**Author:** Sam Henry  

**Ready for:** Team review → Architecture approval → Implementation kickoff
