---
title: TEA Step-File Architecture
description: Explanation of step-file architecture for 100% LLM compliance
---

# TEA Step-File Architecture

**Version**: 1.0
**Date**: 2026-01-27
**Purpose**: Explain step-file architecture for 100% LLM compliance

---

## Why Step Files?

### The Problem

Traditional workflow instructions suffer from "too much context" syndrome:

- **LLM Improvisation**: When given large instruction files, LLMs often improvise or skip steps
- **Non-Compliance**: Instructions like "analyze codebase then generate tests" are too vague
- **Context Overload**: 5000-word instruction files overwhelm the 200k context window
- **Unpredictable Output**: Same workflow produces different results each run

### The Solution: Step Files

**Step files** break workflows into granular, self-contained instruction units:

- **One Step = One Clear Action**: Each step file contains exactly one task
- **Explicit Exit Conditions**: LLM knows exactly when to proceed to next step
- **Context Injection**: Each step repeats necessary information (no assumptions)
- **Prevents Improvisation**: Strict "ONLY do what this step says" enforcement

**Result**: **100% LLM compliance** - workflows produce consistent, predictable, high-quality output every time.

---

## Architecture Overview

### Before Step Files (Monolithic)

```
workflow/
├── workflow.yaml          # Metadata
├── instructions.md        # 5000 words of instructions ⚠️
├── checklist.md          # Validation checklist
└── templates/            # Output templates
```

**Problems**:

- Instructions too long → LLM skims or improvises
- No clear stopping points → LLM keeps going
- Vague instructions → LLM interprets differently each time

### After Step Files (Granular)

```
workflow/
├── workflow.yaml          # Metadata (points to step files)
├── checklist.md          # Validation checklist
├── templates/            # Output templates
└── steps/
    ├── step-1-setup.md          # 200-500 words, one action
    ├── step-2-analyze.md        # 200-500 words, one action
    ├── step-3-generate.md       # 200-500 words, one action
    └── step-4-validate.md       # 200-500 words, one action
```

**Benefits**:

- Granular instructions → LLM focuses on one task
- Clear exit conditions → LLM knows when to stop
- Repeated context → LLM has all necessary info
- Subagent support → Parallel execution possible

---

## Step File Principles

### 1. Just-In-Time Loading

**Only load the current step file** - never load all steps at once.

```yaml
# workflow.yaml
steps:
  - file: steps/step-1-setup.md
    next: steps/step-2-analyze.md
  - file: steps/step-2-analyze.md
    next: steps/step-3-generate.md
```

**Enforcement**: Agent reads **one step file**, executes it, then loads **next step file**.

### 2. Context Injection

**Each step repeats necessary context** - no assumptions about what LLM remembers.

Example (step-3-generate.md):

```markdown
## Context (from previous steps)

You have:

- Analyzed codebase and identified 3 features: Auth, Checkout, Profile
- Loaded knowledge fragments: fixture-architecture, api-request, network-first
- Determined test framework: Playwright with TypeScript

## Your Task (Step 3 Only)

Generate API tests for the 3 features identified above...
```

### 3. Explicit Exit Conditions

**Each step clearly states when to proceed** - no ambiguity.

Example:

```markdown
## Exit Condition

You may proceed to Step 4 when:

- ✅ All API tests generated and saved to files
- ✅ Test files use knowledge fragment patterns
- ✅ All tests have .spec.ts extension
- ✅ Tests are syntactically valid TypeScript

Do NOT proceed until all conditions met.
```

### 4. Strict Action Boundaries

**Each step forbids actions outside its scope** - prevents LLM wandering.

Example:

```markdown
## What You MUST Do

- Generate API tests only (not E2E, not fixtures)
- Use patterns from loaded knowledge fragments
- Save to tests/api/ directory

## What You MUST NOT Do

- ❌ Do NOT generate E2E tests (that's Step 4)
- ❌ Do NOT run tests yet (that's Step 5)
- ❌ Do NOT refactor existing code
- ❌ Do NOT add features not requested
```

### 5. Subagent Support

**Independent steps can run in parallel subagents** - massive performance gain.

Example (automate workflow):

```
Step 1-2: Sequential (setup)
Step 3: Subagent A (API tests) + Subagent B (E2E tests) - PARALLEL
Step 4: Sequential (aggregate)
```

See [subagent-architecture.md](./subagent-architecture.md) for details.

---

## TEA Workflow Step-File Patterns

### Pattern 1: Sequential Steps (Simple Workflows)

**Used by**: framework, ci

```
Step 1: Setup → Step 2: Configure → Step 3: Generate → Step 4: Validate
```

**Characteristics**:

- Each step depends on previous step output
- No parallelization possible
- Simpler, run-once workflows

### Pattern 2: Parallel Generation (Test Workflows)

**Used by**: automate, atdd

```
Step 1: Setup
Step 2: Load knowledge
Step 3: PARALLEL
  ├── Subagent A: Generate API tests
  └── Subagent B: Generate E2E tests
Step 4: Aggregate + validate
```

**Characteristics**:

- Independent generation tasks run in parallel
- 40-50% performance improvement
- Most frequently used workflows

### Pattern 3: Parallel Validation (Quality Workflows)

**Used by**: test-review, nfr-assess

```
Step 1: Load context
Step 2: PARALLEL
  ├── Subagent A: Check dimension 1
  ├── Subagent B: Check dimension 2
  ├── Subagent C: Check dimension 3
  └── (etc.)
Step 3: Aggregate scores
```

**Characteristics**:

- Independent quality checks run in parallel
- 60-70% performance improvement
- Complex scoring/aggregation logic

### Pattern 4: Two-Phase Workflow (Dependency Workflows)

**Used by**: trace

```
Phase 1: Generate coverage matrix → Output to temp file
Phase 2: Read matrix → Apply decision tree → Generate gate decision
```

**Characteristics**:

- Phase 2 depends on Phase 1 output
- Not parallel, but clean separation of concerns
- Subagent-like phase isolation

### Pattern 5: Risk-Based Planning (Design Workflows)

**Used by**: test-design

```
Step 1: Load context (story/epic)
Step 2: Load knowledge fragments
Step 3: Assess risk (probability × impact)
Step 4: Generate scenarios
Step 5: Prioritize (P0-P3)
Step 6: Output test design document
```

**Characteristics**:

- Sequential risk assessment workflow
- Heavy knowledge fragment usage
- Structured output (test design document)

---

## Knowledge Fragment Integration

### Loading Fragments in Step Files

Step files explicitly load knowledge fragments:

```markdown
## Step 2: Load Knowledge Fragments

Consult `{project-root}/_bmad/tea/testarch/tea-index.csv` and load:

1. **fixture-architecture** - For composable fixture patterns
2. **api-request** - For API test patterns
3. **network-first** - For network handling patterns

Read each fragment from `{project-root}/_bmad/tea/testarch/knowledge/`.

These fragments are your quality guidelines - use their patterns in generated tests.
```

### Fragment Usage Enforcement

Step files enforce fragment patterns:

```markdown
## Requirements

Generated tests MUST follow patterns from loaded fragments:

✅ Use fixture composition pattern (fixture-architecture)
✅ Use await apiRequest() helper (api-request)
✅ Intercept before navigate (network-first)

❌ Do NOT use custom patterns
❌ Do NOT skip fragment patterns
```

---

## Step File Template

### Standard Structure

Every step file follows this structure:

```markdown
# Step N: [Action Name]

## Context (from previous steps)

- What was accomplished in Steps 1, 2, ..., N-1
- Key information LLM needs to know
- Current state of workflow

## Your Task (Step N Only)

[Clear, explicit description of single task]

## Requirements

- ✅ Requirement 1
- ✅ Requirement 2
- ✅ Requirement 3

## What You MUST Do

- Action 1
- Action 2
- Action 3

## What You MUST NOT Do

- ❌ Don't do X (that's Step N+1)
- ❌ Don't do Y (out of scope)
- ❌ Don't do Z (unnecessary)

## Exit Condition

You may proceed to Step N+1 when:

- ✅ Condition 1 met
- ✅ Condition 2 met
- ✅ Condition 3 met

Do NOT proceed until all conditions met.

## Next Step

Load `steps/step-[N+1]-[action].md` and execute.
```

### Example: Step File for API Test Generation

````markdown
# Step 3A: Generate API Tests (Subagent)

## Context (from previous steps)

You have:

- Analyzed codebase and identified 3 features: Auth, Checkout, Profile
- Loaded knowledge fragments: api-request, data-factories, api-testing-patterns
- Determined test framework: Playwright with TypeScript
- Config: use_playwright_utils = true

## Your Task (Step 3A Only)

Generate API tests for the 3 features identified above.

## Requirements

- ✅ Generate tests for all 3 features
- ✅ Use Playwright Utils `apiRequest()` helper (from api-request fragment)
- ✅ Use data factories for test data (from data-factories fragment)
- ✅ Follow API testing patterns (from api-testing-patterns fragment)
- ✅ TypeScript with proper types
- ✅ Save to tests/api/ directory

## What You MUST Do

1. For each feature (Auth, Checkout, Profile):
   - Create `tests/api/[feature].spec.ts`
   - Import necessary Playwright fixtures
   - Import Playwright Utils helpers (apiRequest)
   - Generate 3-5 API test cases covering happy path + edge cases
   - Use data factories for request bodies
   - Use proper assertions (status codes, response schemas)

2. Follow patterns from knowledge fragments:
   - Use `apiRequest({ method, url, data })` helper
   - Use factory functions for test data (not hardcoded)
   - Test both success and error responses

3. Save all test files to disk

## What You MUST NOT Do

- ❌ Do NOT generate E2E tests (that's Step 3B - parallel subagent)
- ❌ Do NOT generate fixtures yet (that's Step 4)
- ❌ Do NOT run tests yet (that's Step 5)
- ❌ Do NOT use custom fetch/axios (use apiRequest helper)
- ❌ Do NOT hardcode test data (use factories)

## Output Format

Output JSON to `/tmp/automate-api-tests-{timestamp}.json`:

```json
{
  "success": true,
  "tests": [
    {
      "file": "tests/api/auth.spec.ts",
      "content": "[full test file content]",
      "description": "API tests for Auth feature"
    }
  ],
  "fixtures": ["authData", "userData"],
  "summary": "Generated 5 API test cases for 3 features"
}
```
````

## Exit Condition

You may finish this subagent when:

- ✅ All 3 features have API test files
- ✅ All tests use Playwright Utils helpers
- ✅ All tests use data factories
- ✅ JSON output file written to /tmp/

Subagent complete. Main workflow will read output and proceed.

````

---

## Validation & Quality Assurance

### BMad Builder Validation

All 9 TEA workflows score **100%** on BMad Builder validation. Validation reports are stored in `src/workflows/testarch/*/validation-report-*.md`.

**Validation Criteria**:

- ✅ Clear, granular instructions (not too much context)
- ✅ Explicit exit conditions (LLM knows when to stop)
- ✅ Context injection (each step self-contained)
- ✅ Strict action boundaries (prevents improvisation)
- ✅ Subagent support (where applicable)

### Real-Project Testing

All 9 workflows tested with real projects:

- ✅ teach-me-testing: Tested multi-session flow with persisted progress
- ✅ test-design: Tested with real story/epic
- ✅ automate: Tested extensively with real codebases
- ✅ atdd: Tested TDD workflow (failing tests confirmed)
- ✅ test-review: Tested against known good/bad test suites
- ✅ nfr-assess: Tested with complex system
- ✅ trace: Tested coverage matrix + gate decision
- ✅ framework: Tested Playwright/Cypress scaffold
- ✅ ci: Tested GitHub Actions/GitLab CI generation

**Result**: 100% LLM compliance - no improvisation, consistent output.

---

## Maintaining Step Files

### When to Update Step Files

Update step files when:

1. **Knowledge fragments change**: Update fragment loading instructions
2. **New patterns emerge**: Add new requirements/patterns to steps
3. **LLM improvises**: Add stricter boundaries to prevent improvisation
4. **Performance issues**: Split steps further or add subagents
5. **User feedback**: Clarify ambiguous instructions

### Best Practices

1. **Keep steps granular**: 200-500 words per step (not 2000+)
2. **Repeat context**: Don't assume LLM remembers previous steps
3. **Be explicit**: "Generate 3-5 test cases" not "generate some tests"
4. **Forbid out-of-scope actions**: Explicitly list what NOT to do
5. **Test after changes**: Re-run BMad Builder validation after edits

### Anti-Patterns to Avoid

❌ **Too much context**: Steps >1000 words defeat the purpose
❌ **Vague instructions**: "Analyze codebase" - analyze what? how?
❌ **Missing exit conditions**: LLM doesn't know when to stop
❌ **Assumed knowledge**: Don't assume LLM remembers previous steps
❌ **Multiple tasks per step**: One step = one action only

---

## Performance Benefits

### Sequential vs Parallel Execution

**Before Step Files (Sequential)**:

- automate: ~10 minutes (API → E2E → fixtures → validate)
- test-review: ~5 minutes (5 quality checks sequentially)
- nfr-assess: ~12 minutes (4 NFR domains sequentially)

**After Step Files (Parallel Subagents)**:

- automate: ~5 minutes (API + E2E in parallel) - **50% faster**
- test-review: ~2 minutes (all checks in parallel) - **60% faster**
- nfr-assess: ~4 minutes (all domains in parallel) - **67% faster**

**Total time savings**: ~40-60% reduction in workflow execution time.

---

## User Experience

### What Users See

Users don't need to understand step-file architecture internals, but they benefit from:

1. **Consistent Output**: Same input → same output, every time
2. **Faster Workflows**: Parallel execution where possible
3. **Higher Quality**: Knowledge fragments enforced consistently
4. **Predictable Behavior**: No LLM improvisation or surprises

### Progress Indicators

When running workflows, users see:

```
✓ Step 1: Setup complete
✓ Step 2: Knowledge fragments loaded
⟳ Step 3: Generating tests (2 subagents running)
  ├── Subagent A: API tests... ✓
  └── Subagent B: E2E tests... ✓
✓ Step 4: Aggregating results
✓ Step 5: Validation complete
```

---

## Troubleshooting

### Common Issues

**Issue**: LLM still improvising despite step files

- **Diagnosis**: Step instructions too vague
- **Fix**: Add more explicit requirements and forbidden actions

**Issue**: Subagent output not aggregating correctly

- **Diagnosis**: Temp file path mismatch or JSON parsing error
- **Fix**: Check temp file naming convention, verify JSON format

**Issue**: Knowledge fragments not being used

- **Diagnosis**: Fragment loading instructions unclear
- **Fix**: Make fragment usage requirements more explicit

**Issue**: Workflow too slow despite subagents

- **Diagnosis**: Not enough parallelization
- **Fix**: Identify more independent steps for subagent pattern

---

## References

- **Subagent Architecture**: [subagent-architecture.md](./subagent-architecture.md)
- **Knowledge Base System**: [knowledge-base-system.md](./knowledge-base-system.md)
- **BMad Builder Validation Reports**: `src/workflows/testarch/*/validation-report-*.md`
- **TEA Workflow Examples**: `src/workflows/testarch/*/steps/*.md`

---

## Future Enhancements

1. **Dynamic Step Generation**: LLM generates custom step files based on workflow complexity
2. **Step Caching**: Cache step outputs for identical inputs (idempotent operations)
3. **Adaptive Granularity**: Automatically split steps if too complex
4. **Visual Step Editor**: GUI for creating/editing step files
5. **Step Templates**: Reusable step file templates for common patterns

---

**Status**: Production-ready, 100% LLM compliance achieved
**Validation**: All 9 workflows score 100% on BMad Builder validation
**Testing**: All 9 workflows tested with real projects, zero improvisation issues
**Next Steps**: Implement subagent patterns (see subagent-architecture.md)
````
