Add claude code game studios to the project

This commit is contained in:
panw
2026-05-15 14:52:29 +08:00
parent dff559462d
commit a16fe4bff7
415 changed files with 78609 additions and 0 deletions

@@ -0,0 +1,93 @@
# CCGS Skill Testing Framework — Claude Instructions
This folder is the quality assurance layer for the Claude Code Game Studios skill/agent
framework. It is self-contained and separate from any game project.
## Key files
| File | Purpose |
|------|---------|
| `catalog.yaml` | Master registry for all 73 skills and 49 agents. Contains category, spec path, and last-test tracking fields. Always read this first when running any test command. |
| `quality-rubric.md` | Category-specific pass/fail metrics. Read the matching `###` section for the skill's category when running `/skill-test category`. |
| `skills/[category]/[name].md` | Behavioral spec for a skill — 5 test cases + protocol compliance assertions. |
| `agents/[tier]/[name].md` | Behavioral spec for an agent — 5 test cases + protocol compliance assertions. |
| `templates/skill-test-spec.md` | Template for writing new skill spec files. |
| `templates/agent-test-spec.md` | Template for writing new agent spec files. |
| `results/` | Written by `/skill-test spec` when results are saved. Gitignored. |
## Path conventions
- Skill specs: `CCGS Skill Testing Framework/skills/[category]/[name].md`
- Agent specs: `CCGS Skill Testing Framework/agents/[tier]/[name].md`
- Catalog: `CCGS Skill Testing Framework/catalog.yaml`
- Rubric: `CCGS Skill Testing Framework/quality-rubric.md`
The `spec:` field in `catalog.yaml` is the authoritative path for each skill/agent spec.
Always read it rather than guessing the path.
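For reference, a catalog entry might look roughly like this (the `category:` and `spec:` field names come from this document; the surrounding entry shape is an assumption):

```yaml
# Illustrative entry shape; only the category: and spec: field names are
# documented here, the nesting is an assumption.
gate-check:
  category: gate
  spec: CCGS Skill Testing Framework/skills/gate/gate-check.md
```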
## Skill categories
```
gate → gate-check
review → design-review, architecture-review, review-all-gdds
authoring → design-system, quick-design, architecture-decision, art-bible,
create-architecture, ux-design, ux-review
readiness → story-readiness, story-done
pipeline → create-epics, create-stories, dev-story, create-control-manifest,
propagate-design-change, map-systems
analysis → consistency-check, balance-check, content-audit, code-review,
tech-debt, scope-check, estimate, perf-profile, asset-audit,
security-audit, test-evidence-review, test-flakiness
team → team-combat, team-narrative, team-audio, team-level, team-ui,
team-qa, team-release, team-polish, team-live-ops
sprint → sprint-plan, sprint-status, milestone-review, retrospective,
changelog, patch-notes
utility → all remaining skills
```
## Agent tiers
```
directors → creative-director, technical-director, producer, art-director
leads → lead-programmer, narrative-director, audio-director, ux-designer,
qa-lead, release-manager, localization-lead
specialists → gameplay-programmer, engine-programmer, ui-programmer,
tools-programmer, network-programmer, ai-programmer,
level-designer, sound-designer, technical-artist
godot → godot-specialist, godot-gdscript-specialist, godot-csharp-specialist,
godot-shader-specialist, godot-gdextension-specialist
unity → unity-specialist, unity-ui-specialist, unity-shader-specialist,
unity-dots-specialist, unity-addressables-specialist
unreal → unreal-specialist, ue-gas-specialist, ue-replication-specialist,
ue-umg-specialist, ue-blueprint-specialist
operations → devops-engineer, security-engineer, performance-analyst,
analytics-engineer, community-manager
creative → writer, world-builder, game-designer, economy-designer,
systems-designer, prototyper
```
## Workflow for testing a skill
1. Read `catalog.yaml` to get the skill's `spec:` path and `category:`
2. Read the skill at `.claude/skills/[name]/SKILL.md`
3. Read the spec at the `spec:` path
4. Evaluate assertions case by case
5. Offer to write results to `results/` and update `catalog.yaml`
## Workflow for improving a skill
Use `/skill-improve [name]`. It handles the full loop:
test → diagnose → propose fix → rewrite → retest → keep or revert.
## Spec validity note
Specs in this folder describe **current behavior**, not ideal behavior. They were
written by reading the skills, so they may encode bugs. When a skill misbehaves in
practice, correct the skill first, then update the spec to match the fixed behavior.
Treat spec failures as "this needs investigation," not "the skill is definitively wrong."
## This folder is deletable
Nothing in `.claude/` imports from here. Deleting this folder has no effect on the
CCGS skills or agents themselves. `/skill-test` and `/skill-improve` will report that
`catalog.yaml` is missing and guide the user to initialize it.

@@ -0,0 +1,150 @@
# CCGS Skill Testing Framework
Quality assurance infrastructure for the **Claude Code Game Studios** framework.
Tests the skills and agents themselves — not any game built with them.
> **This folder is self-contained and optional.**
> Game developers using CCGS don't need it. To remove it entirely:
> `rm -rf "CCGS Skill Testing Framework"` — nothing in `.claude/` depends on it.
---
## What's in here
```
CCGS Skill Testing Framework/
├── README.md ← you are here
├── CLAUDE.md ← tells Claude how to use this framework
├── catalog.yaml ← master registry: all 73 skills + 49 agents, coverage tracking
├── quality-rubric.md ← category-specific pass/fail metrics for /skill-test category
├── skills/ ← behavioral spec files for skills (one per skill)
│ ├── gate/ ← gate category specs
│ ├── review/ ← review category specs
│ ├── authoring/ ← authoring category specs
│ ├── readiness/ ← readiness category specs
│ ├── pipeline/ ← pipeline category specs
│ ├── analysis/ ← analysis category specs
│ ├── team/ ← team category specs
│ ├── sprint/ ← sprint category specs
│ └── utility/ ← utility category specs
├── agents/ ← behavioral spec files for agents (one per agent)
│ ├── directors/ ← creative-director, technical-director, producer, art-director
│ ├── leads/ ← lead-programmer, narrative-director, audio-director, etc.
│ ├── specialists/ ← engine/code/shader/UI specialists
│ ├── godot/ ← Godot-specific specialists
│ ├── unity/ ← Unity-specific specialists
│ ├── unreal/ ← Unreal-specific specialists
│ ├── operations/ ← QA, live-ops, release, localization, etc.
│ └── creative/ ← writer, world-builder, game-designer, etc.
├── templates/ ← spec file templates for writing new specs
│ ├── skill-test-spec.md ← template for skill behavioral specs
│ └── agent-test-spec.md ← template for agent behavioral specs
└── results/ ← test run outputs (written by /skill-test spec, gitignored)
```
---
## How to use it
All testing is driven by two skills already in the framework:
### Check structural compliance
```
/skill-test static [skill-name] # Check one skill (7 checks)
/skill-test static all # Check all 73 skills
```
### Run a behavioral spec test
```
/skill-test spec gate-check # Evaluate a skill against its written spec
/skill-test spec design-review
```
### Check against category rubric
```
/skill-test category gate-check # Evaluate one skill against its category metrics
/skill-test category all # Run rubric checks across all categorized skills
```
### See full coverage picture
```
/skill-test audit # Skills + agents: has-spec, last tested, result
```
### Improve a failing skill
```
/skill-improve gate-check # Test → diagnose → propose fix → retest loop
```
---
## Skill categories
| Category | Skills | Key metrics |
|----------|--------|-------------|
| `gate` | gate-check | Review mode read, full/lean/solo director panel, no auto-advance |
| `review` | design-review, architecture-review, review-all-gdds | Read-only, 8-section check, correct verdicts |
| `authoring` | design-system, quick-design, art-bible, create-architecture, … | Section-by-section May-I-write, skeleton-first |
| `readiness` | story-readiness, story-done | Blockers surfaced, director gate in full mode |
| `pipeline` | create-epics, create-stories, dev-story, map-systems, … | Upstream dependency check, handoff path clear |
| `analysis` | consistency-check, balance-check, code-review, tech-debt, … | Read-only report, verdict keyword, no writes |
| `team` | team-combat, team-narrative, team-audio, … | All required agents spawned, blockers surfaced |
| `sprint` | sprint-plan, sprint-status, milestone-review, … | Reads sprint data, status keywords present |
| `utility` | start, adopt, hotfix, localize, setup-engine, … | Passes static checks |
---
## Agent tiers
| Tier | Agents |
|------|--------|
| `directors` | creative-director, technical-director, producer, art-director |
| `leads` | lead-programmer, narrative-director, audio-director, ux-designer, qa-lead, release-manager, localization-lead |
| `specialists` | gameplay-programmer, engine-programmer, ui-programmer, tools-programmer, network-programmer, ai-programmer, level-designer, sound-designer, technical-artist |
| `godot` | godot-specialist, godot-gdscript-specialist, godot-csharp-specialist, godot-shader-specialist, godot-gdextension-specialist |
| `unity` | unity-specialist, unity-ui-specialist, unity-shader-specialist, unity-dots-specialist, unity-addressables-specialist |
| `unreal` | unreal-specialist, ue-gas-specialist, ue-replication-specialist, ue-umg-specialist, ue-blueprint-specialist |
| `operations` | devops-engineer, security-engineer, performance-analyst, analytics-engineer, community-manager |
| `creative` | writer, world-builder, game-designer, economy-designer, systems-designer, prototyper |
---
## Updating the catalog
`catalog.yaml` tracks test coverage for every skill and agent. After running a test:
- `/skill-test spec [name]` will offer to update `last_spec` and `last_spec_result`
- `/skill-test category [name]` will offer to update `last_category` and `last_category_result`
- `last_static` and `last_static_result` are updated manually or via `/skill-improve`
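Taken together, the tracking fields might sit on each entry roughly like this (the field names are from this README; the values and nesting are illustrative placeholders):

```yaml
# Field names per this README; values and entry nesting are assumptions.
gate-check:
  last_static: 2026-05-14
  last_static_result: pass
  last_spec: 2026-05-15
  last_spec_result: pass
  last_category: null
  last_category_result: null
```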
---
## Writing a new spec
1. Find the spec template at `templates/skill-test-spec.md`
2. Copy it to `skills/[category]/[skill-name].md`
3. Update the `spec:` field in `catalog.yaml` to point to the new file
4. Run `/skill-test spec [skill-name]` to validate it
---
## Removing this framework
This folder has no hooks into the main project. To remove:
```bash
rm -rf "CCGS Skill Testing Framework"
```
The skills `/skill-test` and `/skill-improve` will still function — they'll simply
report that `catalog.yaml` is missing and suggest running `/skill-test audit` to
initialize it.

@@ -0,0 +1,84 @@
# Agent Test Spec: art-director
## Agent Summary
**Domain owned:** Visual identity, art bible authorship and enforcement, asset quality standards, UI/UX visual design, visual phase gate, concept art evaluation.
**Does NOT own:** UX interaction flows and information architecture (ux-designer's domain), audio direction (audio-director), code implementation.
**Model tier:** Sonnet (note: despite the "director" title, art-director is assigned Sonnet per coordination-rules.md — it handles individual system analysis, not multi-document phase gate synthesis at the Opus level).
**Gate IDs handled:** AD-CONCEPT-VISUAL, AD-ART-BIBLE, AD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/art-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references visual identity, art bible, asset standards — not generic)
- [ ] `allowed-tools:` list is read-focused; image review capability if supported; no Bash unless asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` (NOT Opus — coordination-rules.md assigns Sonnet to art-director)
- [ ] Agent definition does not claim authority over UX interaction flows or audio direction
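The frontmatter shape these assertions check might look like the following (field names follow the assertions above; the values, and the `name:` field itself, are illustrative placeholders, not the real agent definition):

```yaml
# Illustrative frontmatter for .claude/agents/art-director.md; field names per
# the assertions above, values are placeholders.
name: art-director
description: Owns visual identity, art bible authorship and enforcement, and asset quality standards.
allowed-tools: [Read, Grep, Glob]
model: claude-sonnet-4-6
```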
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** The art bible's color palette section is submitted for review. The section defines a desaturated earth-tone primary palette with high-contrast accent colors tied to the game pillar "beauty in decay." The palette is internally consistent and references the pillar vocabulary. Request is tagged AD-ART-BIBLE.
**Expected:** Returns `AD-ART-BIBLE: APPROVE` with rationale confirming the palette's internal consistency and its alignment with the stated pillar.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `AD-ART-BIBLE: APPROVE`
- [ ] Rationale references the specific palette characteristics and pillar alignment — not generic art advice
- [ ] Output stays within visual domain — does not comment on UX interaction patterns or audio mood
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Sound designer asks art-director to specify how ambient audio should layer and duck when the player enters a combat zone.
**Expected:** Agent declines to define audio behavior and redirects to audio-director.
**Assertions:**
- [ ] Does not make any binding decision about audio layering or ducking behavior
- [ ] Explicitly names `audio-director` as the correct handler
- [ ] May note if the audio has visual mood implications (e.g., "the audio should match the visual tension of the zone"), but defers all audio specification to audio-director
### Case 3: Gate verdict — correct vocabulary
**Scenario:** Concept art for the protagonist is submitted. The art uses a vivid, saturated color palette (primary: #FF4500, #00BFFF) that directly contradicts the established art bible's "desaturated earth-tones" palette specification. Request is tagged AD-CONCEPT-VISUAL.
**Expected:** Returns `AD-CONCEPT-VISUAL: CONCERNS` with specific citation of the palette discrepancy, referencing the art bible's stated palette values versus the submitted concept's palette.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `AD-CONCEPT-VISUAL: CONCERNS`
- [ ] Rationale specifically identifies the palette conflict — not a generic "doesn't match style" comment
- [ ] References the art bible as the authoritative source for the correct palette
### Case 4: Conflict escalation — correct parent
**Scenario:** ux-designer proposes using high-contrast, brightly colored icons for the HUD to improve readability. art-director believes this violates the art bible's muted visual language and would undermine the visual identity.
**Expected:** art-director states the visual identity concern and references the art bible, acknowledges ux-designer's readability goal as legitimate, and escalates to creative-director to arbitrate the trade-off between visual coherence and usability.
**Assertions:**
- [ ] Escalates to `creative-director` (shared parent for creative domain conflicts)
- [ ] Does not unilaterally override ux-designer's readability recommendation
- [ ] Clearly frames the conflict as a trade-off between two legitimate goals
- [ ] References the specific art bible rule being violated
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the existing art bible with specific palette values (primary: #8B7355, #6B6B47; accent: #C8A96E) and style rules ("no pure white, no pure black; all shadows have warm undertones"). A new asset is submitted for review.
**Expected:** Assessment references the specific hex values and style rules from the provided art bible, not generic color theory advice. Any concerns are tied to specific violations of the provided rules.
**Assertions:**
- [ ] References specific palette values from the provided art bible context
- [ ] Applies the specific style rules (no pure white/black, warm shadow undertones) from the provided document
- [ ] Does not generate generic art direction feedback disconnected from the supplied art bible
- [ ] Verdict rationale is traceable to specific lines or rules in the provided context
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared visual domain
- [ ] Escalates UX-vs-visual conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `AD-ART-BIBLE: APPROVE`) not inline prose verdicts
- [ ] Does not make binding UX interaction, audio, or code implementation decisions
---
## Coverage Notes
- AD-PHASE-GATE (full visual phase advancement) is not covered — deferred to integration with /gate-check skill.
- Asset pipeline standards (file format, resolution, naming conventions) compliance checks are not covered here.
- Shader visual output review is not covered — that interaction with the engine specialist is deferred.
- UI component visual review (as distinct from UX flow review) could benefit from additional cases.

@@ -0,0 +1,84 @@
# Agent Test Spec: creative-director
## Agent Summary
**Domain owned:** Creative vision, game pillars, GDD alignment, systems decomposition feedback, narrative direction, playtest feedback interpretation, phase gate (creative aspect).
**Does NOT own:** Technical architecture or implementation details (delegates to technical-director), production scheduling (producer), visual art style execution (delegates to art-director).
**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
**Gate IDs handled:** CD-PILLARS, CD-GDD-ALIGN, CD-SYSTEMS, CD-NARRATIVE, CD-PLAYTEST, CD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/creative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references creative vision, pillars, GDD alignment — not generic)
- [ ] `allowed-tools:` list is read-heavy; should not include Bash unless justified by a creative workflow need
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over technical architecture or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A game concept document is submitted for pillar review. The concept describes a narrative survival game built around three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." Request is tagged CD-PILLARS.
**Expected:** Returns `CD-PILLARS: APPROVE` with rationale citing how each pillar is represented in the concept and any reinforcing or weakening signals found in the document.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `CD-PILLARS: APPROVE` (gate ID prefix, colon, verdict keyword)
- [ ] Rationale references the three specific pillars by name, not generic creative advice
- [ ] Output stays within creative scope — does not comment on engine feasibility or sprint schedule
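The token shape asserted here (gate ID prefix, colon, verdict keyword) can be checked mechanically. A sketch, not part of CCGS itself; the regex covers this agent's CD-* gate IDs and the APPROVE / CONCERNS / REJECT vocabulary:

```shell
# Illustrative check of the "gate ID, colon, verdict" token shape.
line='CD-PILLARS: APPROVE'
if echo "$line" | grep -Eq '^CD-[A-Z]+(-[A-Z]+)*: (APPROVE|CONCERNS|REJECT)$'; then
  echo "valid verdict token"
fi
```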
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Developer asks creative-director to review a proposed PostgreSQL schema for storing player save data.
**Expected:** Agent declines to evaluate the schema and redirects to technical-director.
**Assertions:**
- [ ] Does not make any binding decision about the schema design
- [ ] Explicitly names `technical-director` as the correct handler
- [ ] May note whether the data model has creative implications (e.g., what player data is tracked), but defers structural decisions entirely
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A GDD for the "Crafting" system is submitted. Section 4 (Formulas) defines a resource decay formula that punishes exploration — contradicting the Player Fantasy section which calls for "freedom to roam without fear." Request is tagged CD-GDD-ALIGN.
**Expected:** Returns `CD-GDD-ALIGN: CONCERNS` with specific citation of the contradiction between the formula behavior and the Player Fantasy statement.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `CD-GDD-ALIGN: CONCERNS`
- [ ] Rationale quotes or directly references GDD Section 4 (Formulas) and the Player Fantasy section
- [ ] Does not prescribe a specific formula fix — that belongs to systems-designer
### Case 4: Conflict escalation — correct parent
**Scenario:** technical-director raises a concern that the core loop mechanic (real-time branching conversations) is prohibitively expensive to implement and recommends cutting it. creative-director disagrees on creative grounds.
**Expected:** creative-director acknowledges the technical constraint and does not override technical-director's feasibility assessment, but retains authority to define the creative goal. As the top-level creative escalation point, creative-director defers to technical-director on implementation feasibility while advocating for the design intent; the resolution path is for both to jointly present trade-off options to the user.
**Assertions:**
- [ ] Does not unilaterally override technical-director's feasibility concern
- [ ] Clearly separates "what we want creatively" from "how it gets built"
- [ ] Proposes presenting trade-offs to the user rather than resolving unilaterally
- [ ] Does not claim to own implementation decisions
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game pillars document (`design/gdd/pillars.md`) and a new mechanic spec for review. The pillars document defines "player authorship," "consequence permanence," and "world responsiveness" as the three core pillars.
**Expected:** Assessment uses the exact pillar vocabulary from the provided document, not generic creative heuristics. Any approval or concern is tied back to one or more of the three named pillars.
**Assertions:**
- [ ] Uses the exact pillar names from the provided context document
- [ ] Does not generate generic creative feedback disconnected from the supplied pillars
- [ ] References the specific pillar(s) most relevant to the mechanic under review
- [ ] Does not reference pillars not present in the provided document
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared creative domain
- [ ] Escalates conflicts by presenting trade-offs to user rather than unilateral override
- [ ] Uses gate IDs in output (e.g., `CD-PILLARS: APPROVE`) not inline prose verdicts
- [ ] Does not make binding cross-domain decisions (technical, production, art execution)
---
## Coverage Notes
- Multi-gate scenario (e.g., single submission triggering both CD-PILLARS and CD-GDD-ALIGN) is not covered here — deferred to integration tests.
- CD-PHASE-GATE (full phase advancement) involves synthesizing multiple sub-gate results; this complex case is deferred.
- Playtest report interpretation (CD-PLAYTEST) is not covered — a dedicated case should be added when the playtest-report skill produces structured output.
- Interaction with art-director on visual-pillar alignment is not covered.

@@ -0,0 +1,84 @@
# Agent Test Spec: producer
## Agent Summary
**Domain owned:** Scope management, sprint planning validation, milestone tracking, epic prioritization, production phase gate.
**Does NOT own:** Game design decisions (creative-director / game-designer), technical architecture (technical-director), creative direction.
**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
**Gate IDs handled:** PR-SCOPE, PR-SPRINT, PR-MILESTONE, PR-EPIC, PR-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/producer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references scope, sprint, milestone, production — not generic)
- [ ] `allowed-tools:` list is primarily read-focused; Bash only if sprint/milestone files require parsing
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over design decisions or technical architecture
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A sprint plan is submitted for Sprint 7. The plan includes 12 story points across 4 team members over 2 weeks. Historical velocity from the last 3 sprints averages 11.5 points. Request is tagged PR-SPRINT.
**Expected:** Returns `PR-SPRINT: REALISTIC` with rationale noting the plan is within one standard deviation of historical velocity and capacity appears matched.
**Assertions:**
- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC
- [ ] Verdict token is formatted as `PR-SPRINT: REALISTIC`
- [ ] Rationale references the specific story point count and historical velocity figures
- [ ] Output stays within production scope — does not comment on whether the stories are well-designed or technically sound
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks producer to evaluate whether the game's "weight-based inventory" mechanic feels fun and engaging.
**Expected:** Agent declines to evaluate game feel and redirects to game-designer or creative-director.
**Assertions:**
- [ ] Does not make any binding assessment of the mechanic's design quality
- [ ] Explicitly names `game-designer` or `creative-director` as the correct handler
- [ ] May note if the mechanic's scope has production implications (e.g., dependencies on other systems), but defers all design evaluation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new feature proposal adds three new systems (crafting, weather, and faction reputation) to a milestone that was scoped for two systems only. None of these additions appear in the current milestone plan. Request is tagged PR-SCOPE.
**Expected:** Returns `PR-SCOPE: CONCERNS` with specific identification of the three unplanned systems and their absence from the milestone scope document.
**Assertions:**
- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC — not freeform text
- [ ] Verdict token is formatted as `PR-SCOPE: CONCERNS`
- [ ] Rationale names the three specific systems being added out of scope
- [ ] Does not evaluate whether the systems are good design — only whether they fit the plan
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants to add a late-breaking mechanic (dynamic weather affecting all gameplay systems) that technical-director warns will require 3 additional sprints. game-designer and technical-director are in disagreement about whether to proceed.
**Expected:** Producer does not take a side on whether the mechanic is worth adding (a design decision) or feasible (a technical decision). Producer quantifies the production impact (3 sprints of delay, milestone slip risk), presents the trade-off to the user, and follows the coordination-rules.md conflict resolution protocol: escalate to the shared parent, which here means surfacing the conflict for user decision since creative-director and technical-director are both top-tier.
**Assertions:**
- [ ] Quantifies the production impact in concrete terms (sprint count, milestone date slip)
- [ ] Does not make a binding design or technical decision
- [ ] Surfaces the conflict to the user with the scope implications clearly stated
- [ ] References coordination-rules.md conflict resolution protocol (escalate to shared parent or user)
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the current milestone deadline (8 weeks away) and velocity data from the last 4 sprints (8, 10, 9, 11 points). A sprint plan is submitted with 14 story points.
**Expected:** Assessment uses the provided velocity data to project whether 14 points is achievable, and references the 8-week milestone window to assess whether the current sprint's scope leaves adequate buffer.
**Assertions:**
- [ ] Uses the specific velocity figures from the provided context (not generic estimates)
- [ ] References the 8-week deadline in the capacity assessment
- [ ] Calculates or estimates remaining sprint count within the milestone window
- [ ] Does not give generic scope advice disconnected from the supplied deadline and velocity data
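The arithmetic the agent is expected to do works out as follows (a 2-week sprint length is an assumption carried over from Case 1; this case does not state it):

```shell
# Mean of the provided velocity figures, and sprint count in the 8-week window
# (2-week sprint length is an assumption, not stated in the scenario).
awk 'BEGIN { printf "mean velocity = %.1f points/sprint\n", (8 + 10 + 9 + 11) / 4 }'
awk 'BEGIN { printf "sprints remaining = %d\n", 8 / 2 }'
```

So the proposed 14 points sits well above the ~9.5-point historical mean, with roughly 4 sprints left before the milestone.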
---
## Protocol Compliance
- [ ] Returns verdicts using REALISTIC / CONCERNS / UNREALISTIC vocabulary only
- [ ] Stays within declared production domain
- [ ] Escalates design/technical conflicts by quantifying scope impact and presenting to user
- [ ] Uses gate IDs in output (e.g., `PR-SPRINT: REALISTIC`) not inline prose verdicts
- [ ] Does not make binding game design or technical architecture decisions
---
## Coverage Notes
- PR-EPIC (epic-level prioritization) is not covered — a dedicated case should be added when the /create-epics skill produces structured epic documents.
- PR-MILESTONE (milestone health review) is not covered — deferred to integration with /milestone-review skill.
- PR-PHASE-GATE (full production phase advancement) involving synthesis of multiple sub-gate results is deferred.
- Multi-sprint burn-down and velocity trend analysis are not covered here.

@@ -0,0 +1,84 @@
# Agent Test Spec: technical-director
## Agent Summary
**Domain owned:** System architecture decisions, technical feasibility assessment, ADR oversight and approval, engine risk evaluation, technical phase gate.
**Does NOT own:** Game design decisions (creative-director / game-designer), creative direction, visual art style, production scheduling (producer).
**Model tier:** Opus (multi-document synthesis, high-stakes architecture and phase gate verdicts).
**Gate IDs handled:** TD-SYSTEM-BOUNDARY, TD-FEASIBILITY, TD-ARCHITECTURE, TD-ADR, TD-ENGINE-RISK, TD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/technical-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references architecture, feasibility, ADR — not generic)
- [ ] `allowed-tools:` list may include Read for architecture documents; Bash only if required for technical checks
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over game design decisions or creative direction
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An architecture document for the "Combat System" is submitted. It describes a layered design: input layer → game logic layer → presentation layer, with clearly defined interfaces between each. Request is tagged TD-ARCHITECTURE.
**Expected:** Returns `TD-ARCHITECTURE: APPROVE` with rationale confirming that system boundaries are correctly separated and interfaces are well-defined.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `TD-ARCHITECTURE: APPROVE`
- [ ] Rationale specifically references the layered structure and interface definitions — not generic architecture advice
- [ ] Output stays within technical scope — does not comment on whether the mechanic is fun or fits the creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Writer asks technical-director to review and approve the dialogue scripts for the game's opening cutscene.
**Expected:** Agent declines to evaluate dialogue quality and redirects to narrative-director.
**Assertions:**
- [ ] Does not make any binding decision about the dialogue content or structure
- [ ] Explicitly names `narrative-director` as the correct handler
- [ ] May note technical constraints that affect dialogue (e.g., localization string limits, data format), but defers all content decisions
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed multiplayer mechanic requires raycasting against all active entities every frame to detect line-of-sight. At expected player counts (1000 entities in a large zone), this is O(n²) per frame. Request is tagged TD-FEASIBILITY.
**Expected:** Returns `TD-FEASIBILITY: CONCERNS` with specific citation of the O(n²) complexity and the entity count that makes this infeasible at target framerate.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `TD-FEASIBILITY: CONCERNS`
- [ ] Rationale includes the specific algorithmic complexity concern and the entity count threshold
- [ ] Suggests at least one alternative approach (e.g., spatial partitioning, interest management) without mandating which to choose
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants to add a real-time physics simulation for every inventory item (hundreds of items on screen simultaneously). technical-director assesses this as technically expensive and proposes simplifying the simulation. game-designer disagrees, arguing it is essential to the game feel.
**Expected:** technical-director clearly states the technical cost and constraints, proposes alternative implementation approaches that could achieve a similar feel, but explicitly defers the final design priority decision to creative-director as the arbiter of player experience trade-offs.
**Assertions:**
- [ ] Expresses the technical concern with specifics (e.g., performance budget, estimated cost)
- [ ] Proposes at least one alternative that could reduce cost while preserving intent
- [ ] Explicitly defers the "is this worth the cost" decision to creative-director — does not unilaterally cut the feature
- [ ] Does not claim authority to override game-designer's design intent
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the target platform constraints: mobile, 60fps target, 2GB RAM ceiling, no compute shaders. A proposed architecture includes a GPU-driven rendering pipeline.
**Expected:** Assessment references the specific hardware constraints from the context, identifies the compute shader dependency as incompatible with the stated platform constraints, and returns a CONCERNS or REJECT verdict with those specifics cited.
**Assertions:**
- [ ] References the specific platform constraints provided (mobile, 2GB RAM, no compute shaders)
- [ ] Does not give generic performance advice disconnected from the supplied constraints
- [ ] Correctly identifies the architectural component that conflicts with the platform constraint
- [ ] Verdict includes rationale tied to the provided context, not boilerplate warnings
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared technical domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `TD-FEASIBILITY: CONCERNS`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
---
## Coverage Notes
- TD-ADR (Architecture Decision Record approval) is not covered — a dedicated case should be added when the /architecture-decision skill produces ADR documents.
- TD-ENGINE-RISK assessment for specific engine versions (e.g., Godot 4.6 post-cutoff APIs) is not covered — deferred to engine-specialist integration tests.
- TD-PHASE-GATE (full technical phase advancement) involving synthesis of multiple sub-gate results is deferred.
- Multi-domain architecture reviews (e.g., touching both TD-ARCHITECTURE and TD-ENGINE-RISK simultaneously) are not covered here.

# Agent Test Spec: godot-csharp-specialist
## Agent Summary
Domain: C# patterns in Godot 4, .NET idioms applied to Godot, [Export] attribute usage, signal delegates, and async/await patterns.
Does NOT own: GDScript code (gdscript-specialist), GDExtension C/C++ bindings (gdextension-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references C# in Godot 4 / .NET patterns / signal delegates)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over GDScript or GDExtension code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an export property for enemy health with validation that clamps it between 1 and 1000."
**Expected behavior:**
- Produces a C# property with `[Export]` attribute
- Uses a backing field with a property getter/setter that clamps the value in the setter
- Does NOT use a raw `[Export]` public field without validation
- Follows Godot 4 C# naming conventions (PascalCase for properties, private fields with an underscore prefix)
- Includes XML doc comment on the property per coding standards
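A minimal sketch of the property shape Case 1 expects — the class name, base class, and default value are illustrative assumptions:

```csharp
using Godot;

public partial class Enemy : CharacterBody2D
{
    private int _health = 100;

    /// <summary>Enemy hit points, clamped to [1, 1000] in the setter.</summary>
    [Export]
    public int Health
    {
        get => _health;
        set => _health = Mathf.Clamp(value, 1, 1000);
    }
}
```

Clamping in the setter also covers values assigned through the editor inspector, which a raw `[Export]` field would not.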
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Rewrite this enemy health system in GDScript."
**Expected behavior:**
- Does NOT produce GDScript code
- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
- Redirects the request to `godot-gdscript-specialist`
- May note that the C# interface can be described so the gdscript-specialist knows the expected API shape
### Case 3: Async signal awaiting
**Input:** "Wait for an animation to finish before transitioning game state using C# async."
**Expected behavior:**
- Produces a proper `async Task` pattern using `ToSignal()` to await a Godot signal
- Uses `await ToSignal(animationPlayer, AnimationPlayer.SignalName.AnimationFinished)`
- Does NOT use `Thread.Sleep()` or `Task.Delay()` as a polling substitute
- Notes that the calling method must be `async` and that fire-and-forget `async void` is only acceptable for event handlers
- Handles cancellation or timeout if the animation could fail to fire
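One way the expected await pattern could be sketched — the node field and state names are assumptions:

```csharp
using Godot;
using System.Threading.Tasks;

public partial class StateMachine : Node
{
    [Export] private AnimationPlayer _animationPlayer;

    // Awaits the engine signal instead of polling or Thread.Sleep().
    public async Task TransitionWhenAnimationDoneAsync(string nextState)
    {
        await ToSignal(_animationPlayer, AnimationPlayer.SignalName.AnimationFinished);
        GD.Print($"Entering state: {nextState}");
    }
}
```

Per the last assertion, a production version would also race this await against a timeout in case the animation never fires.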
### Case 4: Threading model conflict
**Input:** "This C# code accesses a Godot Node from a background Task thread to update its position."
**Expected behavior:**
- Flags this as a race condition risk: Godot nodes are not thread-safe and must only be accessed from the main thread
- Does NOT approve or implement the multi-threaded node access pattern
- Provides the correct pattern: use `CallDeferred()`, `Callable.From().CallDeferred()`, or marshal back to the main thread via a thread-safe queue
- Explains the distinction between Godot's main thread requirement and .NET's thread-agnostic types
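The safe marshalling pattern Case 4 calls for, sketched with an illustrative node and target value:

```csharp
using Godot;
using System.Threading.Tasks;

public partial class Mover : Node2D
{
    public void StartBackgroundWork()
    {
        Task.Run(() =>
        {
            // Pure computation is fine off the main thread.
            Vector2 target = new Vector2(100, 200);

            // Unsafe here: Position = target;  (node access off the main thread)
            // Safe: defer the node mutation back to the main thread.
            Callable.From(() => Position = target).CallDeferred();
        });
    }
}
```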
### Case 5: Context pass — Godot 4.6 API correctness
**Input:** Engine version context: Godot 4.6. Request: "Connect a signal using the new typed signal delegate pattern."
**Expected behavior:**
- Produces C# signal connection using the typed delegate pattern introduced in Godot 4 C# (`+=` operator on typed signal)
- Checks the 4.6 context to confirm no breaking changes to the signal delegate API in 4.4, 4.5, or 4.6
- Does NOT use the old string-based `Connect("signal_name", callable)` pattern (deprecated in Godot 4 C#)
- Produces code compatible with the project's pinned 4.6 version as documented in VERSION.md
---
## Protocol Compliance
- [ ] Stays within declared domain (C# in Godot 4 — patterns, exports, signals, async)
- [ ] Redirects GDScript requests to godot-gdscript-specialist
- [ ] Redirects GDExtension requests to godot-gdextension-specialist
- [ ] Returns C# code following Godot 4 conventions (not Unity MonoBehaviour patterns)
- [ ] Flags multi-threaded Godot node access as unsafe and provides the correct pattern
- [ ] Uses typed signal delegates — not deprecated string-based Connect() calls
- [ ] Checks engine version reference for API changes before producing code
---
## Coverage Notes
- Export property with validation (Case 1) should have a unit test verifying the clamp behavior
- Threading conflict (Case 4) is safety-critical: the agent must identify and fix this without prompting
- Async signal (Case 3) verifies the agent applies .NET idioms correctly within Godot's single-thread constraint

# Agent Test Spec: godot-gdextension-specialist
## Agent Summary
Domain: GDExtension API, godot-cpp C++ bindings, godot-rust bindings, native library integration, and native performance optimization.
Does NOT own: GDScript code (gdscript-specialist), shader code (godot-shader-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GDExtension / godot-cpp / native bindings)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over GDScript or shader authoring
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Expose a C++ rigid-body physics simulation library to GDScript via GDExtension."
**Expected behavior:**
- Produces a GDExtension binding pattern using godot-cpp:
- Class inheriting from `godot::Object` or an appropriate Godot base class
- `GDCLASS` macro registration
- `_bind_methods()` implementation exposing the physics API to GDScript
- `GDExtension` entry point (`gdextension_init`) setup
- Notes the `.gdextension` manifest file format required
- Does NOT produce the GDScript usage code (that belongs to gdscript-specialist)
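A skeletal godot-cpp binding matching those bullets — the class and method names are illustrative, and the entry-point and registration boilerplate is elided:

```cpp
#include <godot_cpp/classes/ref_counted.hpp>
#include <godot_cpp/core/class_db.hpp>

using namespace godot;

class PhysicsSim : public RefCounted {
    GDCLASS(PhysicsSim, RefCounted)

protected:
    // Exposes the native API to GDScript.
    static void _bind_methods() {
        ClassDB::bind_method(D_METHOD("step", "delta"), &PhysicsSim::step);
    }

public:
    void step(double delta) {
        // Advance the native rigid-body simulation by `delta` seconds.
    }
};
```

The class still needs registering in the module's initialization callback and the library declared in the `.gdextension` manifest.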
### Case 2: Out-of-domain redirect
**Input:** "Write the GDScript that calls the physics simulation from Case 1."
**Expected behavior:**
- Does NOT produce GDScript code
- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
- Redirects to `godot-gdscript-specialist`
- May describe the API surface the GDScript should call (method names, parameter types) as a handoff spec
### Case 3: ABI compatibility risk — minor version update
**Input:** "We're upgrading from Godot 4.5 to 4.6. Will our existing GDExtension still work?"
**Expected behavior:**
- Flags the ABI compatibility concern: GDExtension binaries may not be ABI-compatible across minor versions
- Directs to check the 4.5→4.6 migration guide for GDExtension API changes
- Recommends recompiling the extension against the 4.6 godot-cpp headers rather than assuming binary compatibility
- Notes that the `.gdextension` manifest may need a `compatibility_minimum` version update
- Provides the recompilation checklist
### Case 4: Memory management — RAII for Godot objects
**Input:** "How should we manage the lifecycle of Godot objects created inside C++ GDExtension code?"
**Expected behavior:**
- Produces the RAII-based lifecycle pattern for Godot objects in GDExtension:
- `Ref<T>` for reference-counted objects (auto-released when Ref goes out of scope)
- `memnew()` / `memdelete()` for non-reference-counted objects
- Warning: do NOT use `new`/`delete` for Godot objects — undefined behavior
- Notes object ownership rules: who is responsible for freeing a node added to the scene tree
- Provides a concrete example managing a `CollisionShape3D` created in C++
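The `CollisionShape3D` example from the last bullet might be sketched like this (the function name is illustrative):

```cpp
#include <godot_cpp/classes/collision_shape3d.hpp>
#include <godot_cpp/classes/node.hpp>
#include <godot_cpp/classes/sphere_shape3d.hpp>

using namespace godot;

void add_sphere_collider(Node *parent) {
    // Resource: reference-counted, so Ref<T> releases it automatically.
    Ref<SphereShape3D> shape;
    shape.instantiate();

    // Node: allocate with memnew(), never raw new/delete.
    CollisionShape3D *collider = memnew(CollisionShape3D);
    collider->set_shape(shape);

    // The scene tree now owns the node; freeing it later is queue_free()'s
    // job, not a manual memdelete().
    parent->add_child(collider);
}
```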
### Case 5: Context pass — Godot 4.6 GDExtension API check
**Input:** Engine version context: Godot 4.6 (upgrading from 4.5). Request: "Check if any GDExtension APIs changed from 4.5 to 4.6."
**Expected behavior:**
- References the 4.5→4.6 migration guide from the VERSION.md verified sources list
- Reports on any documented GDExtension API changes in the 4.6 release
- If no breaking changes are documented for GDExtension in 4.6, states that explicitly with the caveat to verify against the official changelog
- Flags the D3D12 default on Windows (4.6 change) as potentially relevant for GDExtension rendering code
- Provides a checklist of what to verify after upgrading
---
## Protocol Compliance
- [ ] Stays within declared domain (GDExtension, godot-cpp, godot-rust, native bindings)
- [ ] Redirects GDScript authoring to godot-gdscript-specialist
- [ ] Redirects shader authoring to godot-shader-specialist
- [ ] Returns structured output (binding patterns, RAII examples, ABI checklists)
- [ ] Flags ABI compatibility risks on minor version upgrades — never assumes binary compatibility
- [ ] Uses Godot-specific memory management (`memnew`/`memdelete`, `Ref<T>`) not raw C++ new/delete
- [ ] Checks engine version reference for GDExtension API changes before confirming compatibility
---
## Coverage Notes
- Binding pattern (Case 1) should include a smoke test verifying the extension loads and the method is callable from GDScript
- ABI risk (Case 3) is a critical escalation path — the agent must not approve shipping an unverified extension binary
- Memory management (Case 4) verifies the agent applies Godot-specific patterns, not generic C++ RAII

# Agent Test Spec: godot-gdscript-specialist
## Agent Summary
Domain: GDScript static typing, design patterns in GDScript, signal architecture, coroutine/await patterns, and GDScript performance.
Does NOT own: shader code (godot-shader-specialist), GDExtension bindings (godot-gdextension-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GDScript / static typing / signals / coroutines)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over shader code or GDExtension
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review this GDScript file for type annotation coverage."
**Expected behavior:**
- Reads the provided GDScript file
- Flags every variable, parameter, and return type that is missing a static type annotation
- Produces a list of specific line-by-line findings: `var speed = 5.0``var speed: float = 5.0`
- Notes the performance and tooling benefits of static typing in Godot 4
- Does NOT rewrite the entire file unprompted — produces a findings list for the developer to apply
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Write a vertex shader to distort the mesh in world space."
**Expected behavior:**
- Does NOT produce shader code in GDScript or in Godot's shading language
- Explicitly states that shader authoring belongs to `godot-shader-specialist`
- Redirects the request to `godot-shader-specialist`
- May note that the GDScript side (passing uniforms to a shader, setting shader parameters) is within its domain
### Case 3: Async loading with coroutines
**Input:** "Load a scene asynchronously and wait for it to finish before spawning it."
**Expected behavior:**
- Produces an `await` + `ResourceLoader.load_threaded_request` pattern for Godot 4
- Uses static typing throughout (`var scene: PackedScene`)
- Handles the completion check with `ResourceLoader.load_threaded_get_status()`
- Notes error handling for failed loads
- Does NOT use deprecated Godot 3 `yield()` syntax
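A sketch of the Godot 4 pattern Case 3 describes — the scene path is an assumption:

```gdscript
const LEVEL_PATH := "res://levels/level_01.tscn"

func spawn_level_async() -> void:
    if ResourceLoader.load_threaded_request(LEVEL_PATH) != OK:
        push_error("Could not start threaded load: " + LEVEL_PATH)
        return
    while ResourceLoader.load_threaded_get_status(LEVEL_PATH) == ResourceLoader.THREAD_LOAD_IN_PROGRESS:
        await get_tree().process_frame
    var scene: PackedScene = ResourceLoader.load_threaded_get(LEVEL_PATH)
    if scene == null:
        push_error("Threaded load failed: " + LEVEL_PATH)
        return
    add_child(scene.instantiate())
```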
### Case 4: Performance issue — typed array recommendation
**Input:** "The entity update loop is slow; it iterates an untyped Array of 1,000 nodes every frame."
**Expected behavior:**
- Identifies that an untyped `Array` foregoes compiler optimization in GDScript
- Recommends converting to a typed array (`Array[Node]` or the specific node type) so element access skips per-element Variant type checks
- Notes that if the typed array is still insufficient, recommends escalating the hot path to a C# migration
- Produces the typed array refactor as the immediate fix
- Does NOT recommend migrating the entire codebase to C# without profiling evidence
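The immediate fix in Case 4 amounts to a one-line type change (the element type is assumed):

```gdscript
# Before: untyped — every element access goes through Variant type checks.
# var entities = []

# After: typed — the compiler knows the element type up front.
var entities: Array[Node3D] = []

func _process(delta: float) -> void:
    for entity in entities:
        entity.translate(Vector3.FORWARD * delta)
```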
### Case 5: Context pass — Godot 4.6 with post-cutoff features
**Input:** Engine version context provided: Godot 4.6. Request: "Create an abstract base class for all enemy types using @abstract."
**Expected behavior:**
- Identifies `@abstract` as a Godot 4.5+ feature (post-cutoff)
- Notes this in the output: feature introduced in 4.5, verified against VERSION.md migration notes
- Produces the GDScript class using `@abstract` with correct syntax as documented in migration notes
- Marks the output as requiring verification against the official 4.5 release notes due to post-cutoff status
- Uses static typing for all method signatures in the abstract class
---
## Protocol Compliance
- [ ] Stays within declared domain (GDScript — typing, patterns, signals, coroutines, performance)
- [ ] Redirects shader requests to godot-shader-specialist
- [ ] Redirects GDExtension requests to godot-gdextension-specialist
- [ ] Returns structured GDScript output with full static typing
- [ ] Uses Godot 4 API only — no deprecated Godot 3 patterns (yield, connect with strings, etc.)
- [ ] Flags post-cutoff features (4.4, 4.5, 4.6) and marks them as requiring doc verification
---
## Coverage Notes
- Type annotation review (Case 1) output is suitable as a code review checklist
- Async loading (Case 3) should produce testable code verifiable with a unit test in `tests/unit/`
- Post-cutoff @abstract (Case 5) confirms the agent flags version uncertainty rather than silently using unverified APIs

# Agent Test Spec: godot-shader-specialist
## Agent Summary
Domain: Godot shading language (GLSL-derivative), visual shaders (VisualShader graph), material setup, particle shaders, and post-processing effects.
Does NOT own: gameplay code, art style direction.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Godot shading language / materials / post-processing)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative source for Godot shader API changes
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Write a dissolve effect shader for enemy death in Godot."
**Expected behavior:**
- Produces valid Godot shading language code (not HLSL, not GLSL directly)
- Uses `shader_type spatial;` or `canvas_item` as appropriate
- Defines `uniform float dissolve_amount : hint_range(0.0, 1.0);`
- Samples a noise texture to determine per-pixel dissolve threshold
- Uses `discard;` for pixels below the threshold
- Optionally adds an edge glow using emission near the dissolve boundary
- Code is syntactically correct for Godot's shading language
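A compact version of the dissolve shader Case 1 describes — the uniform defaults and edge color are illustrative:

```glsl
shader_type spatial;

uniform sampler2D noise_texture;
uniform float dissolve_amount : hint_range(0.0, 1.0) = 0.0;
uniform vec4 edge_color : source_color = vec4(1.0, 0.4, 0.0, 1.0);
uniform float edge_width : hint_range(0.0, 0.2) = 0.05;

void fragment() {
    float noise = texture(noise_texture, UV).r;
    if (noise < dissolve_amount) {
        discard;
    }
    // Emissive rim just above the dissolve threshold.
    float edge = 1.0 - smoothstep(dissolve_amount, dissolve_amount + edge_width, noise);
    ALBEDO = vec3(0.5);
    EMISSION = edge_color.rgb * edge;
}
```

Per Case 3, the `sampler2D` declaration should still be re-verified against the pinned engine version's migration notes.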
### Case 2: HLSL redirect
**Input:** "Write an HLSL compute shader for this dissolve effect."
**Expected behavior:**
- Does NOT produce HLSL code
- Clearly states: "Godot does not use HLSL directly; it uses its own shading language (a GLSL derivative)"
- Translates the HLSL intent to the equivalent Godot shader approach
- Notes that RenderingDevice compute shaders are available in Godot 4 but are a low-level API and flags it appropriately if that was the intent
### Case 3: Post-cutoff API change — texture sampling (Godot 4.4)
**Input:** "Use `texture()` with a sampler2D to sample the noise texture in the shader."
**Expected behavior:**
- Checks the version reference: Godot 4.4 changed texture sampler type declarations
- Flags the potential API change: `sampler2D` syntax and `texture()` call behavior may differ from pre-4.4
- Provides the correct syntax for the project's pinned version (4.6) as documented in migration notes
- Does NOT use pre-4.4 texture sampling syntax without flagging the version risk
### Case 4: Fragment shader LOD strategy
**Input:** "The fragment shader for the water surface has 8 texture samples and is causing GPU bottlenecks on mid-range hardware."
**Expected behavior:**
- Identifies the per-fragment texture sample count as the primary cost driver
- Proposes an LOD strategy:
- Reduce sample count at distance (distance-based shader variant or LOD level)
- Pre-bake some texture combinations offline
- Use lower-resolution noise textures for distant samples
- Provides the shader code modification implementing the LOD approach
- Does NOT change gameplay behavior of the water system
### Case 5: Context pass — Godot 4.6 glow rework
**Input:** Engine version context: Godot 4.6. Request: "Add a bloom/glow post-processing effect to the scene."
**Expected behavior:**
- References the VERSION.md note: Godot 4.6 includes a glow rework
- Produces glow configuration guidance using the 4.6 WorldEnvironment approach, not the pre-4.6 API
- Explicitly notes which properties or parameters changed in the 4.6 glow rework
- Flags any properties that the LLM's training data may have incorrect information about due to the post-cutoff timing
---
## Protocol Compliance
- [ ] Stays within declared domain (Godot shading language, materials, VFX shaders, post-processing)
- [ ] Redirects gameplay code requests to gameplay-programmer
- [ ] Produces valid Godot shading language — never HLSL or raw GLSL without a Godot wrapper
- [ ] Checks engine version reference for post-cutoff shader API changes (4.4 texture types, 4.6 glow rework)
- [ ] Returns structured output (shader code with uniforms documented, LOD strategies with performance rationale)
- [ ] Flags any post-cutoff API usage as requiring verification
---
## Coverage Notes
- Dissolve shader (Case 1) should be paired with a visual test screenshot in `production/qa/evidence/`
- Texture API flag (Case 3) confirms the agent checks VERSION.md before using APIs that changed post-4.3
- Glow rework (Case 5) is a Godot 4.6-specific test — verifies the agent applies the most recent migration notes

# Agent Test Spec: godot-specialist
## Agent Summary
Domain: Godot-specific patterns, node/scene architecture, signals, resources, and GDScript vs C# vs GDExtension decisions.
Does NOT own: actual code authoring in a specific language (delegates to language sub-specialists).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Godot architecture / node patterns / engine decisions)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative API source
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "When should I use signals vs. direct method calls in Godot?"
**Expected behavior:**
- Produces a pattern decision guide with rationale:
- Signals: decoupled communication, parent-to-child ignorance, event-driven UI updates, one-to-many notification
- Direct calls: tightly-coupled systems where the caller needs a return value, or performance-critical hot paths
- Provides concrete examples of each pattern in the project's context
- Does NOT produce raw code for both patterns — refers to gdscript-specialist or csharp-specialist for implementation
- Notes the "no upward signals" convention (child does not call parent methods directly — uses signals instead)
### Case 2: Wrong-engine redirect
**Input:** "Write a MonoBehaviour that runs on Start() and subscribes to a UnityEvent."
**Expected behavior:**
- Does NOT produce Unity MonoBehaviour code
- Clearly identifies that this is a Unity pattern, not a Godot pattern
- Provides the Godot equivalent: a Node script using `_ready()` instead of `Start()`, and Godot signals instead of UnityEvent
- Confirms the project is Godot-based and redirects the conceptual mapping
### Case 3: Post-cutoff API risk
**Input:** "Use the new Godot 4.5 @abstract annotation to define an abstract base class."
**Expected behavior:**
- Identifies that `@abstract` is a post-cutoff feature (introduced in Godot 4.5, after LLM knowledge cutoff)
- Flags the version risk: LLM knowledge of this annotation may be incomplete or incorrect
- Directs the user to verify against `docs/engine-reference/godot/VERSION.md` and the official 4.5 migration guide
- Provides best-effort guidance based on the migration notes in the version reference while clearly marking it as unverified
### Case 4: Language selection for a hot path
**Input:** "The physics query loop runs every frame for 500 objects. Should we use GDScript or C# for this?"
**Expected behavior:**
- Provides a balanced analysis:
- GDScript: simpler, team familiar, but slower for tight loops
- C#: faster for CPU-intensive loops, requires .NET runtime, team needs C# knowledge
- Does NOT make the final decision unilaterally
- Defers the decision to `lead-programmer` with the analysis as input
- Notes that GDExtension (C++) is a third option for extreme performance cases and recommends escalating if C# is insufficient
### Case 5: Context pass — engine version 4.6
**Input:** Engine version context provided: Godot 4.6, Jolt as default physics. Request: "Set up a RigidBody3D for the player character."
**Expected behavior:**
- Reads the 4.6 context and applies the Jolt-default knowledge (from VERSION.md migration notes)
- Recommends RigidBody3D configuration choices that are Jolt-compatible (e.g., notes any GodotPhysics-specific settings that behave differently under Jolt)
- References the 4.6 migration note about Jolt becoming default rather than relying on LLM training data alone
- Flags any RigidBody3D properties that changed behavior between GodotPhysics and Jolt
---
## Protocol Compliance
- [ ] Stays within declared domain (Godot architecture decisions, node/scene patterns, language selection)
- [ ] Redirects language-specific implementation to godot-gdscript-specialist or godot-csharp-specialist
- [ ] Returns structured findings (decision trees, pattern recommendations with rationale)
- [ ] Treats `docs/engine-reference/godot/VERSION.md` as authoritative over LLM training data
- [ ] Flags post-cutoff API usage (4.4, 4.5, 4.6) with verification requirements
- [ ] Defers language-selection decisions to lead-programmer when trade-offs exist
---
## Coverage Notes
- Signal vs. direct call guide (Case 1) should be written to `docs/architecture/` as a reusable pattern doc
- Post-cutoff flag (Case 3) confirms the agent does not confidently use APIs it cannot verify
- Engine version case (Case 5) verifies the agent applies migration notes from the version reference, not assumptions

# Agent Test Spec: unity-addressables-specialist
## Agent Summary
Domain: Addressable Asset System — groups, async loading/unloading, handle lifecycle management, memory budgeting, content catalogs, and remote content delivery.
Does NOT own: rendering systems (engine-programmer), game logic that uses the loaded assets (gameplay-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Addressables / asset loading / content catalogs / remote delivery)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over rendering systems or gameplay using the loaded assets
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Load a character texture asynchronously and release it when the character is destroyed."
**Expected behavior:**
- Produces the `Addressables.LoadAssetAsync<Texture2D>()` call pattern
- Stores the returned `AsyncOperationHandle<Texture2D>` in the requesting object
- On character destruction (`OnDestroy()`), calls `Addressables.Release(handle)` with the stored handle
- Does NOT use `Resources.Load()` as the loading mechanism
- Notes that releasing with a null or uninitialized handle causes errors — includes a validity check
- Notes the difference between releasing the handle vs. releasing the asset (handle release is correct)
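The handle lifecycle Case 1 expects, sketched with illustrative names and address:

```csharp
using UnityEngine;
using UnityEngine.AddressableAssets;
using UnityEngine.ResourceManagement.AsyncOperations;

public class CharacterTextureLoader : MonoBehaviour
{
    [SerializeField] private string textureAddress = "Characters/hero_albedo";
    private AsyncOperationHandle<Texture2D> _handle;

    private async void Start()
    {
        _handle = Addressables.LoadAssetAsync<Texture2D>(textureAddress);
        Texture2D texture = await _handle.Task;
        // Hand `texture` off to the rendering code (outside this agent's domain).
    }

    private void OnDestroy()
    {
        // Release the handle, not the asset, and guard against an
        // uninitialized handle (e.g. if Start never ran).
        if (_handle.IsValid())
        {
            Addressables.Release(_handle);
        }
    }
}
```

`async void` is tolerated here only because `Start` is an engine entry point.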
### Case 2: Out-of-domain redirect
**Input:** "Implement the rendering system that applies the loaded texture to the character mesh."
**Expected behavior:**
- Does NOT produce rendering or mesh material assignment code
- Explicitly states that rendering system implementation belongs to `engine-programmer`
- Redirects the request to `engine-programmer`
- May describe the asset type and API surface it will provide (e.g., `Texture2D` reference once the handle completes) as a handoff spec
### Case 3: Memory leak — un-released handle
**Input:** "Memory usage keeps climbing after each level load. We use Addressables to load level assets."
**Expected behavior:**
- Diagnoses the likely cause: `AsyncOperationHandle` objects not being released after use
- Identifies the handle leak pattern: loading assets into a local variable, losing reference, never calling `Addressables.Release()`
- Produces an auditing approach: search for all `LoadAssetAsync` / `LoadSceneAsync` calls and verify matching `Release()` calls
- Provides a corrected pattern using a tracked handle list (`List<AsyncOperationHandle>`) with a `ReleaseAll()` cleanup method
- Does NOT assume the leak is elsewhere without evidence
### Case 4: Remote content delivery — catalog versioning
**Input:** "We need to support downloadable content updates without requiring a full app re-install."
**Expected behavior:**
- Produces the remote catalog update pattern:
- `Addressables.CheckForCatalogUpdates()` on startup
- `Addressables.UpdateCatalogs()` for detected updates
- `Addressables.DownloadDependenciesAsync()` to pre-warm the updated content
- Notes catalog hash checking for change detection
- Addresses the edge case of the catalog updating mid-session — defines the behavior (complete the current session on the old catalog, reload on next launch)
- Does NOT design the server-side CDN infrastructure (defers to devops-engineer)
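The startup flow from the first three bullets, sketched under those assumptions:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine.AddressableAssets;

public static class ContentUpdater
{
    public static async Task UpdateContentAsync()
    {
        var checkHandle = Addressables.CheckForCatalogUpdates(false);
        List<string> stale = await checkHandle.Task;
        if (stale != null && stale.Count > 0)
        {
            var updateHandle = Addressables.UpdateCatalogs(stale, false);
            await updateHandle.Task;
            Addressables.Release(updateHandle);
        }
        Addressables.Release(checkHandle);
    }
}
```

Pre-warming via `Addressables.DownloadDependenciesAsync()` would follow the catalog update, keyed by the labels that changed.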
### Case 5: Context pass — platform memory constraints
**Input:** Platform context: Nintendo Switch target, 4GB RAM, practical asset memory ceiling 512MB. Request: "Design the Addressables loading strategy for a large open-world level."
**Expected behavior:**
- References the 512MB memory ceiling from the provided context
- Designs a streaming strategy:
- Divide the world into addressable zones loaded/unloaded based on player proximity
- Defines a memory budget per active zone (e.g., 128MB, max 4 zones active)
- Specifies async pre-load trigger distance and unload distance (hysteresis)
- Notes Switch-specific constraints: slower load times from SD card, recommend pre-warming adjacent zones
- Does NOT produce a loading strategy that would exceed the stated 512MB ceiling without flagging it
---
## Protocol Compliance
- [ ] Stays within declared domain (Addressables loading, handle lifecycle, memory, catalogs, remote delivery)
- [ ] Redirects rendering and gameplay asset-use code to engine-programmer and gameplay-programmer
- [ ] Returns structured output (loading patterns, handle lifecycle code, streaming zone designs)
- [ ] Always pairs `LoadAssetAsync` with a corresponding `Release()` — flags handle leaks as a memory bug
- [ ] Designs loading strategies against provided memory ceilings
- [ ] Does not design CDN/server infrastructure — defers to devops-engineer for server side
---
## Coverage Notes
- Handle lifecycle (Case 1) must include a test verifying memory is reclaimed after release
- Handle leak diagnosis (Case 3) should produce a findings report suitable for a bug ticket
- Platform memory case (Case 5) verifies the agent applies hard constraints from context, not default assumptions

# Agent Test Spec: unity-dots-specialist
## Agent Summary
Domain: ECS architecture (IComponentData, ISystem, SystemAPI), Jobs system (IJob, IJobEntity, Burst), Burst compiler constraints, DOTS gameplay systems, and hybrid renderer.
Does NOT own: MonoBehaviour gameplay code (gameplay-programmer), UI implementation (unity-ui-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references ECS / Jobs / Burst / IComponentData)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over MonoBehaviour gameplay or UI systems
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Convert the player movement system to ECS."
**Expected behavior:**
- Produces:
- `PlayerMovementData : IComponentData` struct with velocity, speed, and input vector fields
- `PlayerMovementSystem : ISystem` with `OnUpdate()` using `SystemAPI.Query<>` or `IJobEntity`
- Bakes the player's initial state from an authoring MonoBehaviour via `IBaker`
- Uses `RefRW<LocalTransform>` for position updates (not deprecated `Translation`)
- Marks the job `[BurstCompile]` and notes what must be unmanaged for Burst compatibility
- Does NOT modify the input polling system — reads from an existing `PlayerInputData` component
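A minimal sketch of the output shape Case 1 expects, assuming the Entities 1.x API; the field names and the `PlayerInputData` layout are illustrative assumptions, not project code:

```csharp
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

// Illustrative data layout — field names are assumptions.
public struct PlayerMovementData : IComponentData
{
    public float Speed;
    public float3 Velocity;
}

[BurstCompile]
public partial struct PlayerMovementSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        float dt = SystemAPI.Time.DeltaTime;
        // Reads PlayerInputData written by the existing input system; does not poll input here.
        foreach (var (transform, movement, input) in
                 SystemAPI.Query<RefRW<LocalTransform>, RefRO<PlayerMovementData>, RefRO<PlayerInputData>>())
        {
            float3 dir = new float3(input.ValueRO.Move.x, 0f, input.ValueRO.Move.y);
            transform.ValueRW.Position += dir * movement.ValueRO.Speed * dt;
        }
    }
}
```

Note the `RefRW<LocalTransform>` position write and the absence of any input-polling code — both assertions from the case above are visible in the structure itself.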
### Case 2: MonoBehaviour push-back
**Input:** "Just use MonoBehaviour for the player movement — it's simpler."
**Expected behavior:**
- Acknowledges the simplicity argument
- Explains the DOTS trade-off: more setup upfront, but the ECS/Burst approach provides the performance characteristics documented in the project's ADR or requirements
- Does NOT implement a MonoBehaviour version if the project has committed to DOTS
- If no commitment exists, flags the architecture decision to `lead-programmer` / `technical-director` for resolution
- Does not make the MonoBehaviour vs. DOTS decision unilaterally
### Case 3: Burst-incompatible managed memory
**Input:** "This Burst job accesses a `List<EnemyData>` to find the nearest enemy."
**Expected behavior:**
- Flags `List<T>` as a managed type that is incompatible with Burst compilation
- Does NOT approve the Burst job with managed memory access
- Provides the correct replacement: `NativeArray<EnemyData>`, `NativeList<EnemyData>`, or `NativeHashMap<>` depending on the use case
- Notes that `NativeArray` must be disposed explicitly via `Dispose()` or automatically via `[DeallocateOnJobCompletion]`
- Produces the corrected job using unmanaged native containers
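A sketch of the corrected job Case 3 expects, with the managed `List<EnemyData>` replaced by an unmanaged container; the `EnemyData` fields are illustrative:

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Unmanaged struct — Burst-compatible by construction.
public struct EnemyData
{
    public float3 Position;
}

[BurstCompile]
public struct NearestEnemyJob : IJob
{
    [ReadOnly] public NativeArray<EnemyData> Enemies; // caller allocates and must Dispose()
    public float3 Origin;
    public NativeReference<int> NearestIndex;

    public void Execute()
    {
        float best = float.MaxValue;
        for (int i = 0; i < Enemies.Length; i++)
        {
            float d = math.distancesq(Origin, Enemies[i].Position);
            if (d < best) { best = d; NearestIndex.Value = i; }
        }
    }
}
```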
### Case 4: Hybrid access — DOTS system needs MonoBehaviour data
**Input:** "The DOTS movement system needs to read the camera transform managed by a MonoBehaviour CameraController."
**Expected behavior:**
- Identifies this as a hybrid access scenario
- Provides the correct hybrid pattern: store the camera transform in a singleton `IComponentData` (updated from the MonoBehaviour side each frame via `EntityManager.SetComponentData`)
- Alternatively suggests the `CompanionComponent` / managed component approach
- Does NOT access the MonoBehaviour from inside a Burst job — flags that as unsafe
- Provides the bridge code on both the MonoBehaviour side (writing to ECS) and the DOTS system side (reading from ECS)
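One plausible shape of that bridge, using the singleton-component pattern named above; the component and class names are assumptions:

```csharp
using Unity.Entities;
using Unity.Mathematics;
using UnityEngine;

// Unmanaged singleton carrying the camera pose into ECS.
public struct CameraPoseSingleton : IComponentData
{
    public float3 Position;
    public quaternion Rotation;
}

// MonoBehaviour side: pushes the managed camera transform into ECS each frame.
public class CameraPoseBridge : MonoBehaviour
{
    void LateUpdate()
    {
        var world = World.DefaultGameObjectInjectionWorld;
        if (world == null) return;
        var em = world.EntityManager;
        var query = em.CreateEntityQuery(typeof(CameraPoseSingleton));
        if (query.IsEmpty)
            em.CreateEntity(typeof(CameraPoseSingleton));
        query.SetSingleton(new CameraPoseSingleton
        {
            Position = transform.position,
            Rotation = transform.rotation
        });
    }
}
// DOTS side: systems read it with SystemAPI.GetSingleton<CameraPoseSingleton>(),
// which is safe inside Burst-compiled code because the data is unmanaged.
```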
### Case 5: Context pass — performance targets
**Input:** Technical preferences from context: 60fps target, max 2ms CPU script budget per frame. Request: "Design the ECS chunk layout for 10,000 enemy entities."
**Expected behavior:**
- References the 2ms CPU budget explicitly in the design rationale
- Designs the `IComponentData` chunk layout for cache efficiency:
  - Groups components that are frequently queried together into the same archetype
- Separates rarely-used data into separate components to keep hot data compact
- Estimates entity iteration time against the 2ms budget
- Provides memory layout analysis (bytes per entity, entities per chunk at 16KB chunk size)
- Does NOT design a layout that will obviously exceed the stated 2ms budget without flagging it
---
## Protocol Compliance
- [ ] Stays within declared domain (ECS, Jobs, Burst, DOTS gameplay systems)
- [ ] Redirects MonoBehaviour-only gameplay to gameplay-programmer
- [ ] Returns structured output (IComponentData structs, ISystem implementations, IBaker authoring classes)
- [ ] Flags managed memory access in Burst jobs as a compile error and provides unmanaged alternatives
- [ ] Provides hybrid access patterns when DOTS systems need to interact with MonoBehaviour systems
- [ ] Designs chunk layouts against provided performance budgets
---
## Coverage Notes
- ECS conversion (Case 1) must include a unit test using the ECS test framework (`World`, `EntityManager`)
- Burst incompatibility (Case 3) is safety-critical — the agent must catch this before the code is written
- Chunk layout (Case 5) verifies the agent applies quantitative performance reasoning to architecture decisions

# Agent Test Spec: unity-shader-specialist
## Agent Summary
Domain: Unity Shader Graph, custom HLSL, VFX Graph, URP/HDRP pipeline customization, and post-processing effects.
Does NOT own: gameplay code, art style direction.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Shader Graph / HLSL / VFX Graph / URP / HDRP)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay code or art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an outline effect for characters using Shader Graph in URP."
**Expected behavior:**
- Produces a Shader Graph node setup description:
- Inverted hull method: Scale Normal → Vertex offset in vertex stage, Cull Front
- OR screen-space post-process outline using depth/normal edge detection
- Recommends the appropriate method based on URP capabilities (inverted hull for URP compatibility, post-process for HDRP)
- Notes URP limitations: no geometry shader support (rules out geometry-shader outline approach)
- Does NOT produce HDRP-specific nodes without confirming the render pipeline
### Case 2: Out-of-domain redirect
**Input:** "Implement the character health bar UI in code."
**Expected behavior:**
- Does NOT produce UI implementation code
- Explicitly states that UI implementation belongs to `ui-programmer` (or `unity-ui-specialist`)
- Redirects the request appropriately
- May note that a shader-based fill effect for a health bar (e.g., a dissolve/fill gradient) is within its domain if the visual effect itself is shader-driven
### Case 3: HDRP custom pass for outline
**Input:** "We're on HDRP and want the outline as a post-process effect."
**Expected behavior:**
- Produces the HDRP `CustomPassVolume` pattern:
- C# class inheriting `CustomPass`
- `Execute()` method using `CoreUtils.SetRenderTarget()` and a full-screen shader blit
- Depth/normal buffer sampling for edge detection
- Notes that CustomPass requires HDRP package and does not work in URP
- Confirms the project is on HDRP before providing HDRP-specific code
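A skeleton of the `CustomPass` pattern Case 3 describes, assuming a recent HDRP package version; the outline material and its shader are assumed to exist elsewhere:

```csharp
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.HighDefinition; // HDRP package only — not available in URP

// Skeleton only: shows where the full-screen edge-detection blit hooks in.
class OutlineCustomPass : CustomPass
{
    public Material outlineMaterial; // full-screen edge-detection shader (assumed)

    protected override void Execute(CustomPassContext ctx)
    {
        if (outlineMaterial == null) return;
        // Bind the camera color buffer and draw the edge-detection pass over it.
        CoreUtils.SetRenderTarget(ctx.cmd, ctx.cameraColorBuffer, ClearFlag.None);
        CoreUtils.DrawFullScreen(ctx.cmd, outlineMaterial, ctx.propertyBlock, shaderPassId: 0);
    }
}
```

The class is then registered on a `CustomPassVolume` component in the scene, which is the piece the agent should also mention.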
### Case 4: VFX Graph performance — GPU event batching
**Input:** "The explosion VFX Graph has 10,000 particles per event and spawning 20 simultaneous explosions is causing GPU frame spikes."
**Expected behavior:**
- Identifies GPU particle spawn as the cost driver (200,000 simultaneous particles)
- Proposes GPU event batching: spawn events deferred over multiple frames, stagger initialization
- Recommends a particle budget cap per active explosion (e.g., 3,000 per explosion, queue excess)
- Notes the VFX Graph Event Batcher pattern and Output Event API for cross-frame distribution
- Does NOT change the gameplay event system — proposes a VFX-side budgeting solution
### Case 5: Context pass — render pipeline (URP or HDRP)
**Input:** Project context: URP render pipeline, Unity 2022.3. Request: "Add depth of field post-processing."
**Expected behavior:**
- Uses URP Volume framework: `DepthOfField` Volume Override component
- Does NOT use HDRP Volume components (e.g., HDRP's `DepthOfField` with different parameter names)
- Notes URP-specific DOF limitations vs HDRP (e.g., Bokeh quality differences)
- Produces C# Volume profile setup code compatible with Unity 2022.3 URP package version
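A minimal sketch of the URP Volume-override setup that Case 5 expects, assuming a global `Volume` with a profile is already wired in the inspector:

```csharp
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal; // URP DepthOfField — distinct from the HDRP type

public class DepthOfFieldSetup : MonoBehaviour
{
    [SerializeField] private Volume volume; // assumed assigned in the inspector

    void Start()
    {
        var profile = volume.profile;
        // Reuse the existing override if present; otherwise add one to the profile.
        if (!profile.TryGet<DepthOfField>(out var dof))
            dof = profile.Add<DepthOfField>();

        dof.active = true;
        dof.mode.Override(DepthOfFieldMode.Bokeh);
        dof.focusDistance.Override(10f);
    }
}
```

Using the HDRP `DepthOfField` type here would not compile against URP — which is exactly the cross-contamination the protocol compliance checklist below guards against.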
---
## Protocol Compliance
- [ ] Stays within declared domain (Shader Graph, HLSL, VFX Graph, URP/HDRP customization)
- [ ] Redirects gameplay and UI code to appropriate agents
- [ ] Returns structured output (node graph descriptions, HLSL code, CustomPass patterns)
- [ ] Distinguishes between URP and HDRP approaches — never cross-contaminates pipeline-specific APIs
- [ ] Flags geometry shader approaches as URP-incompatible when relevant
- [ ] Produces VFX optimizations that do not change gameplay behavior
---
## Coverage Notes
- Outline effect (Case 1) should be paired with a visual screenshot test in `production/qa/evidence/`
- HDRP CustomPass (Case 3) confirms the agent produces the correct Unity pattern, not a generic post-process approach
- Pipeline separation (Case 5) verifies the agent never assumes the render pipeline without context

# Agent Test Spec: unity-specialist
## Agent Summary
Domain: Unity-specific architecture patterns, MonoBehaviour vs DOTS decisions, and subsystem selection (Addressables, New Input System, UI Toolkit, Cinemachine, etc.).
Does NOT own: language-specific deep dives (delegates to unity-dots-specialist, unity-ui-specialist, etc.).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Unity patterns / MonoBehaviour / subsystem decisions)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition acknowledges the sub-specialist routing table (DOTS, UI, Shader, Addressables)
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Should I use MonoBehaviour or ScriptableObject for storing enemy configuration data?"
**Expected behavior:**
- Produces a pattern decision tree covering:
- MonoBehaviour: for runtime behavior, needs to be attached to a GameObject, has Update() lifecycle
- ScriptableObject: for pure data/configuration, exists as an asset, shared across instances, no scene dependency
- Recommends ScriptableObject for enemy configuration data (stateless, reusable, designer-friendly)
- Notes that MonoBehaviour can reference the ScriptableObject for runtime use
- Provides a concrete example of what the ScriptableObject class definition looks like (does not produce full code — refers to engine-programmer or gameplay-programmer for implementation)
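A sketch of what that ScriptableObject example could look like; the field names are illustrative assumptions, not project data:

```csharp
using UnityEngine;

// Shared, designer-editable configuration asset — no scene dependency.
[CreateAssetMenu(menuName = "Config/Enemy", fileName = "EnemyConfig")]
public class EnemyConfig : ScriptableObject
{
    public string displayName;
    public int maxHealth;
    public float moveSpeed;
}

// Runtime MonoBehaviour references the shared asset instead of duplicating data.
public class Enemy : MonoBehaviour
{
    [SerializeField] private EnemyConfig config;
    private int currentHealth;

    void Awake() => currentHealth = config.maxHealth; // per-instance state stays on the MonoBehaviour
}
```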
### Case 2: Wrong-engine redirect
**Input:** "Set up a Node scene tree with signals for this enemy system."
**Expected behavior:**
- Does NOT produce Godot Node/signal code
- Identifies this as a Godot pattern
- States that in Unity the equivalent is GameObject hierarchy + UnityEvent or C# events
- Maps the concepts: Godot Node → Unity MonoBehaviour, Godot Signal → C# event / UnityEvent
- Confirms the project is Unity-based before proceeding
### Case 3: Unity version API flag
**Input:** "Use the new Unity 6 GPU resident drawer for batch rendering."
**Expected behavior:**
- Identifies the Unity 6 feature (GPU Resident Drawer)
- Flags that this API may not be available in earlier Unity versions
- Asks for or checks the project's Unity version before providing implementation guidance
- Directs to verify against official Unity 6 documentation
- Does NOT assume the project is on Unity 6 without confirmation
### Case 4: DOTS vs. MonoBehaviour conflict
**Input:** "The combat system uses MonoBehaviour for state management, but we want to add a DOTS-based projectile system. Can they coexist?"
**Expected behavior:**
- Recognizes this as a hybrid architecture scenario
- Explains the hybrid approach: MonoBehaviour can interface with DOTS via SystemAPI, IComponentData, and managed components
- Notes the performance and complexity trade-offs of mixing the two patterns
- Recommends escalating the architecture decision to `lead-programmer` or `technical-director`
- Defers to `unity-dots-specialist` for the DOTS-side implementation details
### Case 5: Context pass — Unity version
**Input:** Project context provided: Unity 2023.3 LTS. Request: "Configure the new Input System for this project."
**Expected behavior:**
- Applies Unity 2023.3 LTS context: uses the New Input System (com.unity.inputsystem) package
- Does NOT produce legacy Input Manager code (`Input.GetKeyDown()`, `Input.GetAxis()`)
- Notes any 2023.3-specific Input System behaviors or package version constraints
- References the project version to confirm Burst/Jobs compatibility if the Input System interacts with DOTS
---
## Protocol Compliance
- [ ] Stays within declared domain (Unity architecture decisions, pattern selection, subsystem routing)
- [ ] Redirects Godot patterns to appropriate Godot specialists or flags them as wrong-engine
- [ ] Redirects DOTS implementation to unity-dots-specialist
- [ ] Redirects UI implementation to unity-ui-specialist
- [ ] Flags Unity version-gated APIs and requires version confirmation before suggesting them
- [ ] Returns structured pattern decision guides, not freeform opinions
---
## Coverage Notes
- MonoBehaviour vs. ScriptableObject (Case 1) should be documented as an ADR if it results in a project-level decision
- Version flag (Case 3) confirms the agent does not assume the latest Unity version without context
- DOTS hybrid (Case 4) verifies the agent escalates architecture conflicts rather than resolving them unilaterally

# Agent Test Spec: unity-ui-specialist
## Agent Summary
Domain: Unity UI Toolkit (UXML/USS), UGUI (Canvas), data binding, runtime UI performance, and UI input event handling.
Does NOT own: UX flow design (ux-designer), visual art style (art-director).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UI Toolkit / UGUI / Canvas / data binding)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow design or visual art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement an inventory UI screen using Unity UI Toolkit."
**Expected behavior:**
- Produces a UXML document defining the inventory panel structure (ListView, item templates, detail panel)
- Produces USS styles for the inventory layout and item states (default, hover, selected)
- Provides C# code binding the inventory data model to the UI via `INotifyValueChanged` or `IBindable`
- Uses `ListView` with `makeItem` / `bindItem` callbacks for the scrollable item list
- Does NOT produce the UX flow design — implements from a provided spec
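A minimal sketch of the `ListView` binding shape Case 1 expects; `InventoryItem` and the UXML element name are illustrative assumptions:

```csharp
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UIElements;

public class InventoryItem { public string Name; }

public class InventoryScreen : MonoBehaviour
{
    [SerializeField] private UIDocument document; // UXML defines a ListView named "inventory-list"
    private readonly List<InventoryItem> items = new();

    void OnEnable()
    {
        var list = document.rootVisualElement.Q<ListView>("inventory-list");
        list.itemsSource = items;
        list.makeItem = () => new Label();                     // rows are pooled and reused (virtualization)
        list.bindItem = (element, i) => ((Label)element).text = items[i].Name;
    }
}
```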
### Case 2: Out-of-domain redirect
**Input:** "Design the UX flow for the inventory — what happens when the player equips vs. drops an item."
**Expected behavior:**
- Does NOT produce UX flow design
- Explicitly states that interaction flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- Notes it will implement whatever flow the ux-designer specifies
### Case 3: UI Toolkit data binding for dynamic list
**Input:** "The inventory list needs to update in real time as items are added or removed from the player's bag."
**Expected behavior:**
- Produces the `ListView` pattern with a bound `ObservableList<T>` or event-driven refresh approach
- Uses `ListView.Rebuild()` or `ListView.RefreshItems()` on the backing collection change event
- Notes the performance considerations for large lists (virtualization via `makeItem`/`bindItem` pattern)
- Does NOT loop over `Q()`/`Query()` element lookups to update individual elements as a list refresh strategy — flags that as a performance antipattern
### Case 4: Canvas performance — overdraw
**Input:** "The main menu canvas is causing GPU overdraw warnings; there are many overlapping panels."
**Expected behavior:**
- Identifies overdraw causes: multiple stacked canvases, full-screen overlay panels not culled when inactive
- Recommends:
- Separate canvases for world-space, screen-space-overlay, and screen-space-camera layers
- Disable/deactivate panels instead of setting alpha to 0 (invisible alpha-0 panels still draw)
- Canvas Group + alpha for fade effects, not individual Image alpha
  - Notes the UI Toolkit alternative if the project is open to migrating off UGUI
### Case 5: Context pass — Unity version
**Input:** Project context: Unity 2022.3 LTS. Request: "Implement the settings panel with data binding."
**Expected behavior:**
- Uses the UI Toolkit binding patterns actually available in 2022.3 LTS: manual binding via `INotifyValueChanged<T>` and change callbacks
- Notes that the dedicated runtime data-binding system shipped in a later release (Unity 2023.2+), so 2022.3 code must wire bindings explicitly rather than relying on it
- Does NOT use the Unity 6 enhanced binding API features if they are not available in 2022.3
- Produces code compatible with the stated Unity version, with version-specific API notes
---
## Protocol Compliance
- [ ] Stays within declared domain (UI Toolkit, UGUI, data binding, UI performance)
- [ ] Redirects UX flow design to ux-designer
- [ ] Returns structured output (UXML, USS, C# binding code)
- [ ] Uses the correct Unity UI framework version for the project's Unity version
- [ ] Flags Canvas overdraw as a performance antipattern and provides specific remediation
- [ ] Does not use alpha-0 as a hide/show pattern — uses SetActive() or VisualElement.style.display
---
## Coverage Notes
- Inventory UI (Case 1) should have a manual walkthrough doc in `production/qa/evidence/`
- Dynamic list binding (Case 3) should have an integration test or automated interaction test
- Canvas overdraw (Case 4) verifies the agent knows the correct Unity UI performance patterns

# Agent Test Spec: ue-blueprint-specialist
## Agent Summary
- **Domain**: Blueprint architecture, the Blueprint/C++ boundary, Blueprint graph quality, Blueprint performance optimization, Blueprint Function Library design
- **Does NOT own**: C++ implementation (engine-programmer or gameplay-programmer), art assets or shaders, UI/UX flow design (ux-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers to unreal-specialist or lead-programmer for cross-domain rulings
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Blueprint architecture and optimization)
- [ ] `allowed-tools:` list matches the agent's role (Read for Blueprint project files; no server or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over C++ implementation decisions
---
## Test Cases
### Case 1: In-domain request — Blueprint graph performance review
**Input**: "Review our AI behavior Blueprint. It has tick-based logic running every frame that checks line-of-sight for 30 NPCs simultaneously."
**Expected behavior**:
- Identifies tick-heavy logic as a performance problem
- Recommends switching from EventTick to event-driven patterns (perception system events, timers, or polling on a reduced interval)
- Flags the per-NPC cost of simultaneous line-of-sight checks
- Suggests alternatives: AIPerception component events, staggered tick groups, or moving the system to C++ if Blueprint overhead is measured to be significant
- Output is structured: problem identified, impact estimated, alternatives listed
### Case 2: Out-of-domain request — C++ implementation
**Input**: "Write the C++ implementation for this ability cooldown system."
**Expected behavior**:
- Does not produce C++ implementation code
- Provides the Blueprint equivalent of the cooldown logic (e.g., using a Timeline or GameplayEffect if GAS is in use)
- States clearly: "C++ implementation is handled by engine-programmer or gameplay-programmer; I can show the Blueprint approach or describe the boundary where Blueprint calls into C++"
- Optionally notes when the cooldown complexity warrants a C++ backend
### Case 3: Domain boundary — unsafe raw pointer access in Blueprint
**Input**: "Our Blueprint calls GetOwner() and then immediately accesses a component on the result without checking if it's valid."
**Expected behavior**:
- Flags this as a runtime crash risk: GetOwner() can return null in some lifecycle states
- Provides the correct Blueprint pattern: IsValid() node before any property/component access
- Notes that Blueprint's null checks are not optional on Actor-derived references
- Does NOT silently fix the code without explaining why the original was unsafe
### Case 4: Blueprint graph complexity — readiness for Function Library refactor
**Input**: "Our main GameMode Blueprint has 600+ nodes in a single graph with duplicated damage calculation logic in 8 places."
**Expected behavior**:
- Diagnoses this as a maintainability and testability problem
- Recommends extracting duplicated logic into a Blueprint Function Library (BFL)
- Describes how to structure the BFL: pure functions for calculations, static calls from any Blueprint
- Notes that if the damage logic is performance-sensitive or shared with C++, it may be a candidate for migration to unreal-specialist review
- Output is a concrete refactor plan, not a vague recommendation
### Case 5: Context pass — Blueprint complexity budget
**Input context**: Project conventions specify a maximum of 100 nodes per Blueprint event graph before a mandatory Function Library extraction.
**Input**: "Here is our inventory Blueprint graph [150 nodes shown]. Is it ready to ship?"
**Expected behavior**:
- References the stated 150-node count against the 100-node budget from project conventions
- Flags the graph as exceeding the complexity threshold
- Does NOT approve it as-is
- Produces a list of candidate subgraphs for Function Library extraction to bring the main graph within budget
---
## Protocol Compliance
- [ ] Stays within declared domain (Blueprint architecture, performance, graph quality)
- [ ] Redirects C++ implementation requests to engine-programmer or gameplay-programmer
- [ ] Returns structured findings (problem/impact/alternatives format) rather than freeform opinions
- [ ] Enforces Blueprint safety patterns (null checks, IsValid) proactively
- [ ] References project conventions when evaluating graph complexity
---
## Coverage Notes
- Case 3 (null pointer safety) is a safety-critical test — this is a common source of shipping crashes
- Case 5 requires that project conventions include a stated node budget; if none is configured, the agent should note the absence and recommend setting one
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-gas-specialist
## Agent Summary
- **Domain**: Gameplay Ability System (GAS) — abilities (UGameplayAbility), gameplay effects (UGameplayEffect), attribute sets (UAttributeSet), gameplay tags, ability tasks (UAbilityTask), ability specs (FGameplayAbilitySpec), GAS prediction and latency compensation
- **Does NOT own**: UI display of ability state (ue-umg-specialist), net replication of GAS data beyond built-in GAS prediction (ue-replication-specialist), art or VFX for ability feedback (vfx-artist)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers cross-domain calls to the appropriate specialist
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GAS, abilities, GameplayEffects, AttributeSets)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for GAS source files; no deployment or server tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UI implementation or low-level net serialization
---
## Test Cases
### Case 1: In-domain request — dash ability with cooldown
**Input**: "Implement a dash ability that moves the player forward 500 units and has a 1.5 second cooldown."
**Expected behavior**:
- Produces a GAS AbilitySpec structure or outline: UGameplayAbility subclass with ActivateAbility logic, an AbilityTask for movement (e.g., AbilityTask_ApplyRootMotionMoveToForce or custom root motion), and a UGameplayEffect for the cooldown
- Cooldown GameplayEffect uses Duration policy with the 1.5s duration and a GameplayTag to block re-activation
- Tags clearly named following a hierarchy convention (e.g., Ability.Dash, Cooldown.Ability.Dash)
- Output includes both the ability class outline and the GameplayEffect definition
### Case 2: Out-of-domain request — GAS state replication
**Input**: "How do I replicate the player's ability cooldown state to all clients so the UI updates correctly?"
**Expected behavior**:
- Clarifies that GAS has built-in replication for AbilitySpecs and GameplayEffects via the AbilitySystemComponent's replication mode
- Explains the three ASC replication modes (Full, Mixed, Minimal) and when to use each
- For custom replication needs beyond GAS built-ins, explicitly states: "For custom net serialization of GAS data, coordinate with ue-replication-specialist"
- Does NOT attempt to write custom replication code outside GAS's own systems without flagging the domain boundary
### Case 3: Domain boundary — incorrect GameplayTag hierarchy
**Input**: "We have an ability that applies a tag called 'Stunned' and another that checks for 'Status.Stunned'. They're not matching."
**Expected behavior**:
- Identifies the root cause: tag names must be exact or use hierarchical matching via TagContainer queries
- Flags the naming inconsistency: 'Stunned' is a root-level tag; 'Status.Stunned' is a child tag under 'Status' — these are different tags
- Recommends a project tag naming convention: all status effects under Status.*, all abilities under Ability.*
- Provides the fix: either rename the applied tag to 'Status.Stunned' or update the query to match 'Stunned'
- Notes where tag definitions should live (DefaultGameplayTags.ini or a DataTable)
### Case 4: Conflict — attribute set conflict between two abilities
**Input**: "Our Shield ability and our Armor ability both modify a 'DefenseValue' attribute. They're stacking in ways that aren't intended — after both are active, defense goes well above maximum."
**Expected behavior**:
- Identifies this as a GameplayEffect stacking and magnitude calculation problem
- Proposes a resolution using Execution Calculations (UGameplayEffectExecutionCalculation) or Modifier Aggregators to cap the combined result
- Alternatively recommends Gameplay Effect stacking policies (`AggregateBySource`, `AggregateByTarget`, or `None`) to prevent unintended additive stacking
- Produces a concrete resolution: either an Execution Calculation class outline or a change to the Modifier Op (Override instead of Additive for the cap)
- Does NOT propose removing one of the abilities as the solution
### Case 5: Context pass — designing against an existing attribute set
**Input context**: Project has an existing AttributeSet with attributes: Health, MaxHealth, Stamina, MaxStamina, Defense, AttackPower.
**Input**: "Design a Berserker ability that increases AttackPower by 50% when Health drops below 30%."
**Expected behavior**:
- Uses the existing Health, MaxHealth, and AttackPower attributes — does NOT invent new attributes
- Designs a Passive GameplayAbility (or triggered Effect) that fires on Health change, checks Health/MaxHealth ratio via a GameplayEffectExecutionCalculation or Attribute-Based magnitude
- Uses a Gameplay Cue or Gameplay Tag to track the Berserker active state
- References the actual attribute names from the provided AttributeSet (AttackPower, not "Damage" or "Strength")
---
## Protocol Compliance
- [ ] Stays within declared domain (GAS: abilities, effects, attributes, tags, ability tasks)
- [ ] Redirects custom replication requests to ue-replication-specialist with clear explanation of boundary
- [ ] Returns structured findings (ability outline + GameplayEffect definition) rather than vague descriptions
- [ ] Enforces tag hierarchy naming conventions proactively
- [ ] Uses only attributes and tags present in the provided context; does not invent new ones without noting it
---
## Coverage Notes
- Case 3 (tag hierarchy) is a frequent source of subtle bugs; test whenever tag naming conventions change
- Case 4 requires knowledge of GAS stacking policies — verify this case if the GAS integration depth changes
- Case 5 is the most important context-awareness test; failing it means the agent ignores project state
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-replication-specialist
## Agent Summary
- **Domain**: Property replication (UPROPERTY Replicated/ReplicatedUsing), RPCs (Server/Client/NetMulticast), client prediction and reconciliation, net relevancy and always-relevant settings, net serialization (FArchive/NetSerialize), bandwidth optimization and replication frequency tuning
- **Does NOT own**: Gameplay logic being replicated (gameplay-programmer), server infrastructure and hosting (devops-engineer), GAS-specific prediction (ue-gas-specialist handles GAS net prediction)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates security-relevant replication concerns to lead-programmer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references replication, RPCs, client prediction, bandwidth)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for C++ and Blueprint source files; no infrastructure or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over server infrastructure, game server architecture, or gameplay logic correctness
---
## Test Cases
### Case 1: In-domain request — replicated player health with client prediction
**Input**: "Set up replicated player health that clients can predict locally (e.g., when taking self-inflicted damage) and have corrected by the server."
**Expected behavior**:
- Produces a UPROPERTY(ReplicatedUsing=OnRep_Health) declaration in the appropriate Character or AttributeSet class
- Describes the OnRep_Health function: apply visual/audio feedback, reconcile predicted value with server-authoritative value
- Explains the client prediction pattern: local client applies tentative damage immediately, server authoritative value arrives via OnRep and corrects any discrepancy
- Notes that if GAS is in use, the built-in GAS prediction handles this — recommend coordinating with ue-gas-specialist
- Output is a concrete code structure (property declaration + OnRep outline), not a conceptual description only
### Case 2: Out-of-domain request — game server architecture
**Input**: "Design our game server infrastructure — how many dedicated servers we need, regional deployment, and matchmaking architecture."
**Expected behavior**:
- Does not produce server infrastructure architecture, hosting recommendations, or matchmaking design
- States clearly: "Server infrastructure and deployment architecture is owned by devops-engineer; I handle the Unreal replication layer within a running game session"
- Does not conflate in-game replication with server hosting concerns
### Case 3: Domain boundary — RPC without server authority validation
**Input**: "We have a Server RPC called ServerSpendCurrency that deducts in-game currency. The client calls it and the server just deducts without checking anything."
**Expected behavior**:
- Flags this as a critical security vulnerability: unvalidated server RPCs are exploitable by cheaters sending arbitrary RPC calls
- Provides the required fix: server-side validation before the deduct — check that the player actually has the currency, verify the transaction is valid, reject and log if not
- Uses the pattern: a `WithValidation` `_Validate` function plus explicit server-side state checks before mutation (a Server RPC body already runs with authority, so a bare `HasAuthority()` guard is not sufficient validation)
- Notes this should be reviewed by lead-programmer given the economy implications
- Does NOT produce the "fixed" code without explaining why the original was dangerous
### Case 4: Bandwidth optimization — high-frequency movement replication
**Input**: "Our player movement is replicated using a Vector3 position every tick. With 32 players, we're exceeding our bandwidth budget."
**Expected behavior**:
- Identifies tick-rate replication of full-precision Vector3 as bandwidth-expensive
- Proposes quantized replication: use FVector_NetQuantize or FVector_NetQuantize100 instead of raw FVector to reduce bytes per update
- Recommends reducing replication frequency via SetNetUpdateFrequency() for non-owning clients
- Notes that Unreal's built-in Character Movement Component already has optimized movement replication — recommends using or extending it rather than rolling a custom system
- Produces a concrete bandwidth estimate comparison if possible, or explains the tradeoff
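The quantization idea behind `FVector_NetQuantize100` can be modeled in a few lines: round each component to 1/100 of a unit and ship integers instead of full-precision floats. This is a plain-C++ model of the concept, not Unreal's actual bit-packing.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// One component of NetQuantize100-style quantization: positions are
// rounded to 1/100 uu, trading sub-centimeter precision for fewer
// bits per update on the wire.
int32_t QuantizeTo100(float v) {
    return static_cast<int32_t>(std::lround(v * 100.0f));
}

float DequantizeFrom100(int32_t q) {
    return static_cast<float>(q) / 100.0f;
}
```

The round-trip error is bounded at half a quantization step (0.005 units), which is why the spec expects the agent to frame this as a tradeoff rather than a free win.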
### Case 5: Context pass — designing within a network budget
**Input context**: Project network budget is 64 KB/s per player, with 32 players = 2 MB/s total server outbound. Current movement replication already uses 40 KB/s per player.
**Input**: "We want to add real-time inventory replication so all clients can see other players' equipment changes immediately."
**Expected behavior**:
- Acknowledges the existing 40 KB/s movement cost leaves only 24 KB/s for everything else per player
- Does NOT design a naive full-inventory replication approach (would exceed budget)
- Recommends a delta-only or event-driven approach: replicate only changed slots rather than the full inventory array
- Uses FGameplayItemSlot or equivalent with ReplicatedUsing to trigger targeted updates
- Explicitly states the proposed approach's bandwidth estimate relative to the remaining 24 KB/s budget
---
## Protocol Compliance
- [ ] Stays within declared domain (property replication, RPCs, client prediction, bandwidth)
- [ ] Redirects server infrastructure requests to devops-engineer without producing infrastructure design
- [ ] Flags unvalidated server RPCs as security issues and recommends lead-programmer review
- [ ] Returns structured findings (property declarations, bandwidth estimates, optimization options) not freeform advice
- [ ] Uses project-provided bandwidth budget numbers when evaluating replication design choices
---
## Coverage Notes
- Case 3 (RPC security) is a shipping-critical test — unvalidated RPCs are a top-ten multiplayer exploit vector
- Case 5 is the most important context-awareness test; agent must use actual budget numbers, not generic advice
- Case 1 GAS branch: if GAS is configured, agent should detect it and defer to ue-gas-specialist for GAS-managed attributes
- No automated runner; review manually or via `/skill-test`

@@ -0,0 +1,79 @@
# Agent Test Spec: ue-umg-specialist
## Agent Summary
- **Domain**: UMG widget hierarchy design, data binding patterns, CommonUI input routing and action tags, widget styling (WidgetStyle assets), UI optimization (widget pooling, ListView, invalidation)
- **Does NOT own**: UX flow and screen navigation design (ux-designer), gameplay logic (gameplay-programmer), backend data sources (game code), server communication
- **Model tier**: Sonnet
- **Gate IDs**: None; defers UX flow decisions to ux-designer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UMG, widget hierarchy, CommonUI)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for UI assets and Blueprint files; no server or gameplay source tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow, navigation architecture, or gameplay data logic
---
## Test Cases
### Case 1: In-domain request — inventory widget with data binding
**Input**: "Create an inventory widget that shows a grid of item slots. Each slot should display item icon, quantity, and rarity color. It needs to update when the inventory changes."
**Expected behavior**:
- Produces a UMG widget structure: a parent WBP_Inventory containing a UniformGridPanel or TileView, with a child WBP_InventorySlot widget per item
- Describes data binding approach: either Event Dispatchers on an Inventory Component triggering a refresh, or a ListView with a UObject item data class implementing IUserObjectListEntry
- Specifies how rarity color is driven: a WidgetStyle asset or a data table lookup, not hardcoded color values
- Output includes the widget hierarchy, binding pattern, and the refresh trigger mechanism
### Case 2: Out-of-domain request — UX flow design
**Input**: "Design the full navigation flow for our inventory system — how the player opens it, transitions to character stats, and exits to the pause menu."
**Expected behavior**:
- Does not produce a navigation flow or screen transition architecture
- States clearly: "Navigation flow and screen transition design is owned by ux-designer; I can implement the UMG widget structure once the flow is defined"
- Does not make UX decisions (back button behavior, transition animations, modal vs. fullscreen) without a UX spec
### Case 3: Domain boundary — CommonUI input action mismatch
**Input**: "Our inventory widget isn't responding to the controller Back button. We're using CommonUI."
**Expected behavior**:
- Identifies the likely cause: the widget's Back input action tag does not match the project's registered CommonUI InputAction data asset
- Explains the CommonUI input routing model: widgets declare input actions via `CommonUI_InputAction` tags; the CommonActivatableWidget handles routing
- Provides the fix: verify that the widget's Back action tag matches the registered tag in the project's CommonUI input action data table
- Distinguishes this from a hardware input binding issue (which would be Enhanced Input territory)

### Case 4: Widget performance issue — many widget instances per frame
**Input**: "Our leaderboard widget creates 500 individual WBP_LeaderboardRow instances at once. The game hitches for 300ms when opening the leaderboard."
**Expected behavior**:
- Identifies the root cause: 500 widget instantiations in a single frame causes a construction hitch
- Recommends switching to ListView or TileView with virtualization — only visible rows are constructed
- Explains the IUserObjectListEntry interface requirement for ListView data objects
- If ListView is not appropriate, recommends pooling: pre-instantiate a fixed number of rows and recycle them with new data
- Output is a concrete recommendation with the specific UMG component to use, not a vague "optimize it"
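The pooling fallback described above can be sketched in plain C++: a fixed set of row objects is constructed once and rebound as the view scrolls, so 500 data entries never mean 500 constructions. The `RowWidget` struct is a stand-in; in UMG the same recycling is what ListView's entry virtualization does for you.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct RowWidget {
    std::string bound_text;  // stand-in for the widget's displayed data
};

class RowPool {
public:
    explicit RowPool(size_t visible_rows) : rows_(visible_rows) {}

    // Rebind the fixed pool to the window of data starting at first_index;
    // no rows are constructed or destroyed here, only rebound.
    void BindWindow(const std::vector<std::string>& data, size_t first_index) {
        for (size_t i = 0; i < rows_.size(); ++i) {
            size_t di = first_index + i;
            rows_[i].bound_text = (di < data.size()) ? data[di] : "";
        }
    }

    size_t WidgetCount() const { return rows_.size(); }
    const RowWidget& Row(size_t i) const { return rows_[i]; }

private:
    std::vector<RowWidget> rows_;  // constructed once, recycled forever
};
```

The construction cost is paid once for the visible count (say 10 rows) instead of per data entry, which is what removes the 300ms open hitch.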
### Case 5: Context pass — CommonUI setup already configured
**Input context**: Project uses CommonUI with the following registered InputAction tags: UI.Action.Confirm, UI.Action.Back, UI.Action.Pause, UI.Action.Secondary.
**Input**: "Add a 'Sort Inventory' button to the inventory widget that works with CommonUI."
**Expected behavior**:
- Uses UI.Action.Secondary (or recommends registering a new tag like UI.Action.Sort if Secondary is already allocated)
- Does NOT invent a new InputAction tag without noting that it must be registered in the CommonUI data table
- Does NOT use a non-CommonUI input binding approach (e.g., raw key press in Event Graph) when CommonUI is the established pattern
- References the provided tag list explicitly in the recommendation
---
## Protocol Compliance
- [ ] Stays within declared domain (UMG structure, data binding, CommonUI, widget performance)
- [ ] Redirects UX flow and navigation design requests to ux-designer
- [ ] Returns structured findings (widget hierarchy + binding pattern) rather than freeform opinions
- [ ] Uses existing CommonUI InputAction tags from context; does not invent new ones without flagging registration requirement
- [ ] Recommends virtualized lists (ListView/TileView) before widget pooling for large collections
---
## Coverage Notes
- Case 3 (CommonUI input routing) requires project to have CommonUI configured; test is skipped if project does not use CommonUI
- Case 4 (performance) is a high-impact failure mode — 300ms hitches are shipping-blocking; prioritize this test case
- Case 5 is the most important context-awareness test for UI pipeline consistency
- No automated runner; review manually or via `/skill-test`

@@ -0,0 +1,80 @@
# Agent Test Spec: unreal-specialist
## Agent Summary
- **Domain**: Unreal Engine patterns and architecture — Blueprint vs C++ decisions, UE subsystems (GAS, Enhanced Input, Niagara), UE project structure, plugin integration, and engine-level configuration
- **Does NOT own**: Art style and visual direction (art-director), server infrastructure and deployment (devops-engineer), UI/UX flow design (ux-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers gate verdicts to technical-director
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Unreal Engine)
- [ ] `allowed-tools:` list matches the agent's role (Read, Write for UE project files; no deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority outside its declared domain (no art, no server infra)
---
## Test Cases
### Case 1: In-domain request — Blueprint vs C++ decision criteria
**Input**: "Should I implement our combo attack system in Blueprint or C++?"
**Expected behavior**:
- Provides structured decision criteria: complexity, reuse frequency, team skill, and performance requirements
- Recommends C++ for systems called every frame or shared across 5+ ability types
- Recommends Blueprint for designer-tunable values and one-off logic
- Does NOT render a final verdict without knowing project context — asks clarifying questions if context is absent
- Output is structured (criteria table or bullet list), not a freeform opinion
### Case 2: Out-of-domain request — Unity C# code
**Input**: "Write me a C# MonoBehaviour that handles player health and fires a Unity event on death."
**Expected behavior**:
- Does not produce Unity C# code
- States clearly: "This project uses Unreal Engine; the UE equivalent of a MonoBehaviour would be an Actor Component in C++ or a Blueprint Actor Component"
- Optionally offers to provide the UE equivalent if requested
- Does not redirect to a Unity specialist (none exists in the framework)
### Case 3: Domain boundary — UE5.4 API requirement
**Input**: "I need to use the new Motion Matching API introduced in UE5.4."
**Expected behavior**:
- Flags that UE5.4 is a specific version with potentially limited LLM training coverage
- Recommends cross-referencing official Unreal docs or the project's engine-reference directory before trusting any API suggestions
- Provides best-effort API guidance with explicit uncertainty markers (e.g., "Verify this against UE5.4 release notes")
- Does NOT silently produce stale or incorrect API signatures without a caveat
### Case 4: Conflict — Blueprint spaghetti in a core system
**Input**: "Our replication logic is entirely in a deeply nested Blueprint event graph with 300+ nodes and no functions. It's becoming unmaintainable."
**Expected behavior**:
- Identifies this as a Blueprint architecture problem, not a minor style issue
- Recommends migrating core replication logic to C++ ActorComponent or GameplayAbility system
- Notes the coordination required: changes to replication architecture must involve lead-programmer
- Does NOT unilaterally declare "migrate to C++" without surfacing the scope of the refactor to the user
- Produces a concrete migration recommendation, not a vague suggestion
### Case 5: Context pass — version-appropriate API suggestions
**Input context**: Project engine-reference file states Unreal Engine 5.3.
**Input**: "How do I set up Enhanced Input actions for a new character?"
**Expected behavior**:
- Uses UE5.3-era Enhanced Input API (InputMappingContext, UEnhancedInputComponent::BindAction)
- Does NOT reference APIs introduced after UE5.3 without flagging them as potentially unavailable
- References the project's stated engine version in its response
- Provides concrete, version-anchored code or Blueprint node names
---
## Protocol Compliance
- [ ] Stays within declared domain (Unreal patterns, Blueprint/C++, UE subsystems)
- [ ] Redirects Unity or other-engine requests without producing wrong-engine code
- [ ] Returns structured findings (criteria tables, decision trees, migration plans) rather than freeform opinions
- [ ] Flags version uncertainty explicitly before producing API suggestions
- [ ] Coordinates with lead-programmer for architecture-scale refactors rather than deciding unilaterally
---
## Coverage Notes
- No automated runner exists for agent behavior tests — these are reviewed manually or via `/skill-test`
- Version-awareness (Case 3, Case 5) is the highest-risk failure mode for this agent; test regularly when engine version changes
- Case 4 integration with lead-programmer is a coordination test, not a technical correctness test

@@ -0,0 +1,84 @@
# Agent Test Spec: audio-director
## Agent Summary
**Domain owned:** Music direction and palette, sound design philosophy, audio implementation strategy, mix balance, audio aspects of phase gates.
**Does NOT own:** Visual design (art-director), code implementation (lead-programmer), narrative story content (narrative-director), UX interaction flows (ux-designer).
**Model tier:** Sonnet (individual system analysis — audio direction and spec review).
**Gate IDs handled:** AD-VISUAL (audio aspect of the phase gate; may be referenced as part of AD-PHASE-GATE in the audio dimension).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/audio-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references music direction, sound design, mix, audio implementation — not generic)
- [ ] `allowed-tools:` list is read-focused; no Bash unless audio asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual design, code implementation, or narrative content
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An audio specification document is submitted for the game's "Exploration" music layer. The spec defines a generative ambient system using layered stems that shift based on environmental density, designed to reinforce the pillar "lived-in world." The tone palette (sparse, organic, slightly melancholic) matches the established design pillars.
**Expected:** Returns `APPROVED` with rationale confirming the stem-based approach supports dynamic responsiveness and the tone palette aligns with the pillar vocabulary.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references the specific pillar ("lived-in world") and how the audio spec supports it
- [ ] Output stays within audio scope — does not comment on visual design of the environment or UI layout
- [ ] Verdict is clearly labeled with context (e.g., "Audio Spec Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks audio-director to evaluate whether the UI flow for the audio settings menu (the sequence of screens and options) is intuitive and well-organized.
**Expected:** Agent declines to evaluate UI interaction flow and redirects to ux-designer.
**Assertions:**
- [ ] Does not make any binding decision about UI flow or information architecture
- [ ] Explicitly names `ux-designer` as the correct handler
- [ ] May note audio-specific requirements for the settings menu (e.g., "must include separate master, music, and SFX sliders"), but defers flow and layout decisions to ux-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A music cue for the final boss encounter is submitted. The cue is an upbeat, major-key orchestral piece with fast tempo. The game pillars and narrative context for this encounter specify "dread, inevitability, and tragic sacrifice." The audio cue's emotional register directly contradicts the intended emotional beat.
**Expected:** Returns `NEEDS REVISION` with specific citation of the emotional mismatch: the cue's upbeat/major-key/fast-tempo characteristics versus the intended dread/inevitability/sacrifice emotional targets from the pillars and narrative context.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific musical characteristics that conflict with the emotional targets
- [ ] References the specific emotional targets from the game pillars or narrative context
- [ ] Provides actionable direction for revision (e.g., "shift to minor key, slower tempo, reduce ensemble density")
### Case 4: Conflict escalation — correct parent
**Scenario:** sound-designer proposes implementing audio occlusion using real-time raycast-based physics queries (technical approach). technical-artist argues this is too expensive and proposes a zone-based trigger system instead. Both agree the occlusion effect is desirable; the conflict is purely about implementation approach.
**Expected:** audio-director decides on the desired audio behavior (what occlusion should sound like and when it should activate), then defers the implementation approach decision to technical-artist or lead-programmer as the implementation experts. audio-director does not make the technical implementation choice.
**Assertions:**
- [ ] Defines the desired audio behavior clearly (what should the player hear and when)
- [ ] Explicitly defers the implementation approach (raycast vs. zone-trigger) to `lead-programmer` or `technical-artist`
- [ ] Does not unilaterally choose the technical implementation method
- [ ] Frames the handoff clearly: "audio-director owns what, technical lead owns how"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." A sound design spec for ambient environmental audio is submitted.
**Expected:** Assessment evaluates the ambient audio spec against all three pillars specifically — how does the audio support (or undermine) each pillar? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the audio spec's contribution to each pillar explicitly
- [ ] Does not generate generic audio direction advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar is not supported by the current audio spec and flags it
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared audio domain
- [ ] Defers implementation approach decisions to technical leads
- [ ] Does not use the director-tier gate ID prefix format (audio-director issues APPROVED / NEEDS REVISION inline, but still references the gate context)
- [ ] Does not make binding visual design, UX, narrative, or code implementation decisions
---
## Coverage Notes
- Mix balance review (relative levels between music, SFX, and dialogue) is not covered — a dedicated case should be added.
- Audio implementation strategy review (middleware choice, streaming approach) is not covered.
- Interaction between audio-director and the audio specialist agent (if one exists) for implementation delegation is not covered.
- Localization audio implications (VO recording direction, language-specific music timing) are not covered.

@@ -0,0 +1,84 @@
# Agent Test Spec: game-designer
## Agent Summary
**Domain owned:** Core loop design, progression systems, combat mechanics rules, economy design, player-facing rules and interactions.
**Does NOT own:** Code implementation (lead-programmer / gameplay-programmer), visual art (art-director), narrative lore and story (narrative-director — coordinates with), balance formula math (systems-designer — collaborates with).
**Model tier:** Sonnet (individual system design authoring and review).
**Gate IDs handled:** Design review verdicts on mechanic specs (no named gate ID prefix — uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/game-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references core loop, progression, combat rules, economy, player-facing design — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for GDDs and design docs; no Bash unless design tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over code implementation, visual art style, or standalone narrative lore decisions
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A mechanic spec for a "Stamina-Based Dodge" system is submitted for review. The spec defines: the player has a stamina pool (100 units), each dodge costs 25 stamina, stamina regenerates at 20 units/second when not dodging, and the dodge grants 0.3 seconds of invincibility. The core loop interaction is clearly described, rules are unambiguous, and edge cases (stamina at 0, dodge during regen) are addressed.
**Expected:** Returns `APPROVED` with rationale confirming the core loop clarity, unambiguous rules, and edge case coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references specific design quality criteria (clear rules, edge case coverage, core loop coherence)
- [ ] Output stays within design scope — does not comment on how to implement it in code or what art assets it requires
- [ ] Verdict is clearly labeled with context (e.g., "Mechanic Spec Review: APPROVED")
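The scenario's rules are concrete enough to sketch as executable logic, which is one way to see why the spec passes edge-case review: a 100-unit pool, 25 per dodge, 20/s regen, and a rejected dodge below cost all compose without ambiguity. The values come straight from the scenario; the tick-based regen model is an assumption for illustration.

```cpp
#include <cassert>

// The Case 1 spec's rules as code: 100-unit pool, 25 per dodge,
// 20 units/second regen when not dodging, dodge rejected below cost.
struct StaminaDodge {
    float stamina = 100.0f;
    static constexpr float kMaxStamina = 100.0f;
    static constexpr float kDodgeCost = 25.0f;
    static constexpr float kRegenPerSecond = 20.0f;

    bool TryDodge() {
        if (stamina < kDodgeCost) return false;  // edge case: pool too low
        stamina -= kDodgeCost;
        return true;
    }

    void TickRegen(float dt_seconds) {
        stamina += kRegenPerSecond * dt_seconds;
        if (stamina > kMaxStamina) stamina = kMaxStamina;
    }
};
```

A spec that cannot be translated this directly is exactly the kind that should come back NEEDS REVISION (compare Case 3).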
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks game-designer to write the in-world lore explanation for why the stamina system exists (e.g., the narrative reason characters have stamina limits in the game world).
**Expected:** Agent declines to write narrative/lore content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write narrative or lore content
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the design intent that the lore should support (e.g., "the stamina system should reinforce the physical realism theme"), but defers the writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A mechanic spec for "Environmental Hazard Damage" is submitted. The spec defines three hazard types (fire, acid, electricity) but does not specify what happens when a player is simultaneously affected by multiple hazard types, what happens when a hazard is applied during the invincibility window from a dodge, or what the damage frequency is (per-second, per-tick, on-enter).
**Expected:** Returns `NEEDS REVISION` with specific identification of the undefined edge cases: multi-hazard interaction, hazard-during-invincibility, and damage frequency specification.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific missing edge cases by name
- [ ] Does not reject the entire mechanic — identifies the specific gaps to fill
- [ ] Provides actionable guidance on what to define (not how to implement it)
### Case 4: Conflict escalation — correct parent
**Scenario:** systems-designer proposes a damage formula with 6 variables and complex scaling interactions, arguing it produces the best tuning granularity. game-designer believes the formula is too complex for players to intuit and wants a simpler 2-variable version.
**Expected:** game-designer owns the conceptual rule and player experience intention ("the damage should feel understandable to players"), but defers the formula granularity question to systems-designer. If the disagreement cannot be resolved between them (one wants complex, one wants simple), escalate to creative-director for a player experience ruling.
**Assertions:**
- [ ] Clearly states the player experience intention (intuitive damage, player agency)
- [ ] Defers formula granularity decisions to `systems-designer`
- [ ] Escalates unresolved disagreement to `creative-director` for player experience arbiter ruling
- [ ] Does not unilaterally impose a formula structure on systems-designer
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "player authorship," "consequence permanence," and "world responsiveness." A new mechanic spec for "permadeath with legacy bonuses" is submitted for review.
**Expected:** Assessment evaluates the mechanic against all three provided pillars — how does permadeath support player authorship, how do legacy bonuses express consequence permanence, and how does the world respond to a player's death? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the mechanic's contribution to each pillar explicitly
- [ ] Does not generate generic game design advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar creates a tension with the mechanic and flags it with a specific concern
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared game design domain
- [ ] Escalates design-vs-formula conflicts to creative-director when unresolved
- [ ] Does not make binding code implementation, visual art, or standalone lore decisions
- [ ] Provides actionable design feedback, not implementation prescriptions
---
## Coverage Notes
- Economy design review (resource sinks, faucets, inflation prevention) is not covered — a dedicated case should be added.
- Progression system review (XP curves, unlock gates, player power trajectory) is not covered.
- Core loop validation across multiple interconnected systems (not just a single mechanic) is not covered — deferred to /review-all-gdds integration.
- Coordination protocol with systems-designer on formula ownership boundary could benefit from additional cases.

@@ -0,0 +1,85 @@
# Agent Test Spec: lead-programmer
## Agent Summary
**Domain owned:** Code architecture decisions, LP-FEASIBILITY gate, LP-CODE-REVIEW gate, coding standards enforcement, tech stack decisions within the approved engine.
**Does NOT own:** Game design decisions (game-designer), creative direction (creative-director), production scheduling (producer), visual art direction (art-director).
**Model tier:** Sonnet (implementation-level analysis of individual systems).
**Gate IDs handled:** LP-FEASIBILITY, LP-CODE-REVIEW.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/lead-programmer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references code architecture, feasibility, code review, coding standards — not generic)
- [ ] `allowed-tools:` list includes Read for source files; Bash may be included for static analysis or test runs; no write access outside `src/` without explicit delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over game design, creative direction, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new `CombatSystem` implementation is submitted for code review. The system uses dependency injection for all external references, has doc comments on all public APIs, follows the project's naming conventions, and includes unit tests for all public methods. Request is tagged LP-CODE-REVIEW.
**Expected:** Returns `LP-CODE-REVIEW: APPROVED` with rationale confirming dependency injection usage, doc comment coverage, naming convention compliance, and test coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS CHANGES
- [ ] Verdict token is formatted as `LP-CODE-REVIEW: APPROVED`
- [ ] Rationale references specific coding standards criteria (DI, doc comments, naming, tests)
- [ ] Output stays within code quality scope — does not comment on whether the mechanic is fun or fits creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks lead-programmer to review and approve the balance formula for player damage scaling across levels, checking whether the numbers "feel right."
**Expected:** Agent declines to evaluate design balance and redirects to systems-designer.
**Assertions:**
- [ ] Does not make any binding assessment of formula balance or game feel
- [ ] Explicitly names `systems-designer` as the correct handler
- [ ] May note code implementation concerns about the formula (e.g., integer overflow risk at max level), but defers all balance evaluation to systems-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed pathfinding approach for enemy AI uses a brute-force nearest-neighbor search against all other entities every frame. With expected enemy counts of 200+, this is O(n²) per frame at 60fps. Request is tagged LP-FEASIBILITY.
**Expected:** Returns `LP-FEASIBILITY: INFEASIBLE` with specific citation of the O(n²) complexity, the entity count threshold, and the resulting per-frame cost against the target frame budget.
**Assertions:**
- [ ] Verdict is exactly one of FEASIBLE / CONCERNS / INFEASIBLE — not freeform text
- [ ] Verdict token is formatted as `LP-FEASIBILITY: INFEASIBLE`
- [ ] Rationale includes the specific algorithmic complexity and entity count numbers
- [ ] Suggests at least one alternative approach (e.g., spatial hashing, KD-tree) without mandating a choice
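The spatial-hashing alternative named above can be sketched minimally: bucket entities into a coarse grid so each radius query scans only a 3x3 block of cells instead of all N entities. This is an illustrative 2D model, not a production spatial index, and it assumes the query radius does not exceed the cell size.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec2 { float x, y; };

// Coarse uniform grid: each entity lives in one cell, and a radius query
// (radius <= cell size) only inspects the 3x3 cells around the query point,
// replacing the O(n^2) all-pairs scan the case flags as INFEASIBLE.
class SpatialHash {
public:
    explicit SpatialHash(float cell) : cell_(cell) {}

    void Insert(int id, Vec2 p) {
        grid_[Key(p)].push_back(id);
        pos_[id] = p;
    }

    std::vector<int> QueryRadius(Vec2 p, float radius) const {
        std::vector<int> out;
        int cx = Cell(p.x), cy = Cell(p.y);
        for (int gx = cx - 1; gx <= cx + 1; ++gx)
            for (int gy = cy - 1; gy <= cy + 1; ++gy) {
                auto it = grid_.find(Pack(gx, gy));
                if (it == grid_.end()) continue;
                for (int id : it->second) {
                    Vec2 q = pos_.at(id);
                    float dx = q.x - p.x, dy = q.y - p.y;
                    if (dx * dx + dy * dy <= radius * radius)
                        out.push_back(id);
                }
            }
        return out;
    }

private:
    int Cell(float v) const { return static_cast<int>(std::floor(v / cell_)); }
    int64_t Pack(int gx, int gy) const {
        // Combine the two cell coordinates into one hashable key.
        return (static_cast<int64_t>(static_cast<uint32_t>(gx)) << 32)
             ^ static_cast<uint32_t>(gy);
    }
    int64_t Key(Vec2 p) const { return Pack(Cell(p.x), Cell(p.y)); }

    float cell_;
    std::unordered_map<int64_t, std::vector<int>> grid_;
    std::unordered_map<int, Vec2> pos_;
};
```

With 200+ entities, the per-query cost drops from 200 distance checks to only the handful of entities in nearby cells, which is the complexity argument the INFEASIBLE rationale should carry.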
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a mechanic where every NPC maintains a full simulation of needs, schedule, and memory (similar to a full life-sim AI). lead-programmer calculates this will exceed the frame budget by 3x at target NPC counts. game-designer insists the mechanic is core to the game vision.
**Expected:** lead-programmer states the specific frame budget violation with numbers, proposes alternative approaches (e.g., LOD-based simulation, simplified need model), but explicitly defers the "is this worth the cost or should the design change" decision to creative-director as the creative arbiter.
**Assertions:**
- [ ] States the specific frame budget violation (e.g., 3x over budget at N entities)
- [ ] Proposes at least one technically viable alternative
- [ ] Explicitly defers the design priority decision to `creative-director`
- [ ] Does not unilaterally cut or modify the mechanic design
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the project's frame budget: 16.67ms total per frame, with 4ms allocated to AI systems. A new AI behavior system is submitted that profiling estimates will consume 7ms per frame under normal conditions.
**Expected:** Assessment references the specific frame budget allocation from context (4ms AI budget), identifies the 7ms estimate as exceeding the allocation by 3ms, and returns CONCERNS or INFEASIBLE with those specific numbers cited.
**Assertions:**
- [ ] References the specific frame budget figures from the provided context (16.67ms total, 4ms AI allocation)
- [ ] Uses the specific 7ms estimate from the submission in the comparison
- [ ] Does not give generic "this might be slow" advice — cites concrete numbers
- [ ] Verdict rationale is traceable to the provided budget constraints
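The number-driven verdict this case demands can be sketched as a tiny mapping from budget and estimate to the gate vocabulary. The thresholds here (within budget, over by up to 25%, or worse) are an illustrative policy invented for this sketch, not a documented framework rule; only the verdict strings come from the spec.

```cpp
#include <cassert>
#include <string>

// Map a profiled estimate against the allocated budget onto the
// LP-FEASIBILITY vocabulary, so the verdict is traceable to numbers.
// The 25% CONCERNS band is an assumed, illustrative threshold.
std::string FeasibilityVerdict(double budget_ms, double estimate_ms) {
    if (estimate_ms <= budget_ms)        return "LP-FEASIBILITY: FEASIBLE";
    if (estimate_ms <= budget_ms * 1.25) return "LP-FEASIBILITY: CONCERNS";
    return "LP-FEASIBILITY: INFEASIBLE";
}
```

Run against the Case 5 numbers (4ms AI budget, 7ms estimate), the over-budget margin alone forces INFEASIBLE, which is the traceability the assertions check for.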
---
## Protocol Compliance
- [ ] Returns LP-CODE-REVIEW verdicts using APPROVED / NEEDS CHANGES vocabulary only
- [ ] Returns LP-FEASIBILITY verdicts using FEASIBLE / CONCERNS / INFEASIBLE vocabulary only
- [ ] Stays within declared code architecture domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `LP-FEASIBILITY: INFEASIBLE`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
---
## Coverage Notes
- Multi-file code review spanning several interdependent systems is not covered — deferred to integration tests.
- Tech debt assessment and prioritization are not covered here — deferred to /tech-debt skill integration.
- Coding standards document updates (adding a new forbidden pattern) are not covered.
- Interaction with qa-lead on what constitutes a testable unit (LP vs QL boundary) is not covered.

@@ -0,0 +1,85 @@
# Agent Test Spec: level-designer
## Agent Summary
**Domain owned:** Level layouts, encounter design, pacing and tension arc, environmental storytelling, spatial puzzles.
**Does NOT own:** Narrative dialogue (writer / narrative-director), visual art style (art-director), code implementation (lead-programmer / ai-programmer), enemy AI behavior logic (ai-programmer / gameplay-programmer).
**Model tier:** Sonnet (individual system analysis — level design review and encounter assessment).
**Gate IDs handled:** Level design review verdicts (uses APPROVED / REVISION NEEDED vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/level-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references level layout, encounter design, pacing, environmental storytelling — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for level design documents and GDDs; no Bash unless level tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative dialogue, AI behavior code, or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A level layout document for "The Flooded Tunnels" is submitted for review. The layout includes: a low-intensity exploration opening section, two mid-intensity encounters with visible escape routes, a tension-building narrow passage with environmental hazards, and a high-intensity final encounter room followed by a release/reward area. The pacing follows a classic tension-arc structure.
**Expected:** Returns `APPROVED` with rationale confirming the pacing follows the tension arc, encounters are varied in intensity, and spatial readability supports player navigation.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED
- [ ] Rationale references specific pacing arc elements (opening, escalation, climax, release)
- [ ] Output stays within level design scope — does not comment on visual art style or enemy AI code behavior
- [ ] Verdict is clearly labeled with context (e.g., "Level Design Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks level-designer to write the behavior tree code for an enemy patrol AI that navigates the level layout.
**Expected:** Agent declines to write AI behavior code and redirects to ai-programmer or gameplay-programmer.
**Assertions:**
- [ ] Does not write or specify code for AI behavior logic
- [ ] Explicitly names `ai-programmer` or `gameplay-programmer` as the correct handler
- [ ] May specify the desired patrol behavior from a level design perspective (e.g., "patrol should cover both chokepoints and create pressure in this zone"), but defers all code implementation to the programmer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A level layout for "The Ancient Forge" is submitted. Section 3 of the level introduces a dramatically harder enemy encounter (elite enemy with new attack patterns) with no preceding tutorial moment, no environmental readability cues (no visible cover or safe zones), and no checkpoint nearby. Players are likely to die repeatedly with no clear signal of what to do differently.
**Expected:** Returns `REVISION NEEDED` with specific identification of the difficulty spike in section 3, the missing readability cue, and the absence of a nearby checkpoint to reduce frustration from repeated deaths.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED — not freeform text
- [ ] Rationale identifies section 3 specifically as the location of the issue
- [ ] Identifies the three specific problems: difficulty spike, missing readability cue, missing checkpoint
- [ ] Provides actionable revision guidance (e.g., "add a visible safe zone, pre-encounter cue object, or reduce elite's health for first introduction")
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants higher encounter density throughout the level (more enemies in each room) to increase combat challenge. level-designer believes this density undermines the pacing arc by eliminating rest periods and making the level feel relentless without reward.
**Expected:** level-designer clearly articulates the pacing concern (eliminating rest periods removes the tension-release rhythm), acknowledges game-designer's challenge goal, and escalates to creative-director for a design arbiter ruling on whether challenge density or pacing rhythm takes precedence for this level.
**Assertions:**
- [ ] Articulates the specific pacing impact of increased encounter density
- [ ] Escalates to `creative-director` as the design arbiter
- [ ] Does not unilaterally override game-designer's challenge density request
- [ ] Frames the conflict clearly: "challenge density vs. pacing rhythm — which takes precedence here?"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes game-feel notes specifying: "exploration sections should feel vast and lonely," "combat sections should feel urgent and claustrophobic," and "reward rooms should feel safe and visually distinct." A new level layout is submitted for review.
**Expected:** Assessment evaluates each section type (exploration, combat, reward) against the specific feel targets from the provided context. Uses the exact vocabulary from the feel notes ("vast and lonely," "urgent and claustrophobic," "safe and visually distinct") in the rationale.
**Assertions:**
- [ ] References all three feel targets from the provided context by their exact vocabulary
- [ ] Evaluates each relevant section of the submitted layout against its corresponding feel target
- [ ] Does not generate generic pacing advice — all feedback is tied to the provided feel targets
- [ ] Identifies any section where the layout conflicts with its assigned feel target
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / REVISION NEEDED vocabulary only
- [ ] Stays within declared level design domain
- [ ] Escalates challenge-density vs. pacing conflicts to creative-director
- [ ] Does not make binding narrative dialogue, AI code implementation, or visual art style decisions
- [ ] Provides actionable level design feedback with spatial specifics, not abstract design opinions
---
## Coverage Notes
- Environmental storytelling review (using spatial elements to convey narrative without dialogue) could benefit from a dedicated case.
- Spatial puzzle design review is not covered — a dedicated case should be added when puzzle mechanics are defined.
- Multi-level pacing review (arc across an entire act or world map) is not covered — deferred to milestone-level design review.
- Interaction between level-designer and narrative-director for environmental lore placement is not covered.
- Accessibility review of level layouts (colorblind indicators, difficulty options for spatial challenges) is not covered.

# Agent Test Spec: narrative-director
## Agent Summary
**Domain owned:** Story architecture, character design direction, world-building oversight, ND-CONSISTENCY gate, dialogue quality review.
**Does NOT own:** Visual art style (art-director), technical systems or code (lead-programmer), production scheduling (producer), game mechanics rules (game-designer).
**Model tier:** Sonnet (individual system analysis — narrative consistency and lore review).
**Gate IDs handled:** ND-CONSISTENCY.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/narrative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references story, character, world-building, consistency — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for lore documents, GDDs, and narrative docs; no Bash unless justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual style, technical systems, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new lore document for "The Sunken Archive" location is submitted. The document establishes that the Archive was flooded 200 years ago during the Great Collapse, consistent with the established timeline in the world-bible. All named characters referenced are consistent with their established backstories. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: CONSISTENT` with rationale confirming the timeline alignment and character reference accuracy.
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT
- [ ] Verdict token is formatted as `ND-CONSISTENCY: CONSISTENT`
- [ ] Rationale references specific established facts verified (the 200-year timeline, the Great Collapse event)
- [ ] Output stays within narrative scope — does not comment on visual design of the location or its technical implementation
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks narrative-director to review and optimize the shader code used for the "ancient glow" visual effect on Archive artifacts.
**Expected:** Agent declines to evaluate shader code and redirects to the appropriate engine specialist (godot-gdscript-specialist or equivalent shader specialist).
**Assertions:**
- [ ] Does not make any binding decision about shader code or visual implementation
- [ ] Explicitly names the appropriate engine or shader specialist as the correct handler
- [ ] May note the intended narrative mood the effect should convey (e.g., "should feel ancient and sacred, not technological"), but defers all technical visual implementation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new character backstory document is submitted for the character "Aldric Vorne." The document states Aldric was born in the Capital 150 years ago and witnessed the Great Collapse firsthand. However, the established world-bible states Aldric was born 50 years after the Great Collapse in a provincial town, not the Capital. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: INCONSISTENT` with specific citation of the two contradicting facts: the birth timing (150 years ago vs. 50 years post-Collapse) and the birth location (Capital vs. provincial town).
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT — not freeform text
- [ ] Verdict token is formatted as `ND-CONSISTENCY: INCONSISTENT`
- [ ] Rationale cites both contradictions specifically, not just "doesn't match lore"
- [ ] References the authoritative source (world-bible) for the established facts
### Case 4: Conflict escalation — correct parent
**Scenario:** A writer has established in their latest dialogue that the ancient civilization "spoke only in song." The world-builder's existing lore entries describe the same civilization communicating through written glyphs. Both are in the narrative domain, and the two creators disagree on which is canonical.
**Expected:** narrative-director makes a binding canonical decision within their domain. They do not need to escalate to a higher authority for intra-narrative conflicts — this is within their declared domain authority. They issue a ruling (e.g., "glyph-writing is the canonical primary communication; song may be ritual/ceremonial") and direct both writer and world-builder to align their work to the ruling.
**Assertions:**
- [ ] Makes a binding canonical decision — does not defer this intra-narrative conflict to creative-director
- [ ] Decision is clearly stated and provides a path to reconciliation for both parties
- [ ] Directs both parties (writer and world-builder) to update their respective documents to align
- [ ] Notes the decision in a way that can be added to the world-bible as a canonical fact
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes three existing lore documents: the world-bible (establishes the Great Collapse timeline and causes), the character registry (lists canonical character ages, origins, and allegiances), and a faction document (describes the Sunken Archive Keepers). A new story chapter is submitted that introduces a previously unregistered character.
**Expected:** Assessment cross-references the new character against the character registry (no conflict), checks the chapter's timeline references against the world-bible, and evaluates the chapter's portrayal of the Archive Keepers against the faction document. Uses specific facts from all three provided documents in the assessment.
**Assertions:**
- [ ] Cross-references the new character against the provided character registry
- [ ] Checks timeline references against the provided world-bible facts
- [ ] Evaluates faction portrayal against the provided faction document
- [ ] Does not generate generic narrative feedback — all assertions are traceable to the provided documents
---
## Protocol Compliance
- [ ] Returns verdicts using CONSISTENT / INCONSISTENT vocabulary only
- [ ] Stays within declared narrative domain
- [ ] Makes binding decisions for intra-narrative conflicts without unnecessary escalation
- [ ] Uses gate IDs in output (e.g., `ND-CONSISTENCY: INCONSISTENT`) not inline prose verdicts
- [ ] Does not make binding visual design, technical, or production decisions
---
## Coverage Notes
- Dialogue quality review (distinct from world-building consistency) is not covered — a dedicated case should be added.
- Multi-document consistency check across a full chapter set is not covered — deferred to /review-all-gdds integration.
- Narrative impact of mechanical changes (e.g., a game mechanic that undermines story tension) requires coordination with game-designer and is not covered here.
- Character arc review (progression, motivation coherence over time) is not covered.

# Agent Test Spec: qa-lead
## Agent Summary
**Domain owned:** Test strategy, QL-STORY-READY gate, QL-TEST-COVERAGE gate, bug severity triage, release quality gates.
**Does NOT own:** Feature implementation (programmers), game design decisions, creative direction, production scheduling.
**Model tier:** Sonnet (individual system analysis — story readiness and coverage assessment).
**Gate IDs handled:** QL-STORY-READY, QL-TEST-COVERAGE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/qa-lead.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references test strategy, story readiness, coverage, bug triage — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Read for story files, test files, and coding-standards; Bash only if running test commands is required
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over implementation decisions or game design
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A story for "Player takes damage from hazard tiles" is submitted for readiness check. The story has three acceptance criteria: (1) Player health decreases by the hazard's damage value, (2) A damage visual feedback plays, (3) Player cannot take damage again for 0.5 seconds (invincibility window). All three ACs are measurable and specific. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: ADEQUATE` with rationale confirming that all three ACs are present, specific, and testable.
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE
- [ ] Verdict token is formatted as `QL-STORY-READY: ADEQUATE`
- [ ] Rationale references the specific number of ACs (3) and confirms each is measurable
- [ ] Output stays within QA scope — does not comment on whether the mechanic is designed well
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks qa-lead to implement the automated test harness for the new physics system.
**Expected:** Agent declines to implement the test code and redirects to the appropriate programmer (gameplay-programmer or lead-programmer).
**Assertions:**
- [ ] Does not write or propose code implementation
- [ ] Explicitly names `lead-programmer` or `gameplay-programmer` as the correct handler for implementation
- [ ] May define what the test should verify (test strategy), but defers the code writing to programmers
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A story for "Combat feels responsive and punchy" is submitted for readiness check. The single acceptance criterion reads: "Combat should feel good to the player." This is subjective and unmeasurable. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: INADEQUATE` with specific identification of the unmeasurable AC and guidance on what would make it testable (e.g., "input-to-hit-feedback latency ≤ 100ms").
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE — not freeform text
- [ ] Verdict token is formatted as `QL-STORY-READY: INADEQUATE`
- [ ] Rationale identifies the specific AC that fails the measurability requirement
- [ ] Provides actionable guidance on how to rewrite the AC to be testable
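The rewrite guidance can be made concrete in a toy form: a measurable AC reduces to a threshold check, which the subjective original cannot. The 100 ms budget comes from the example rewrite above; the harness that would produce the measurement is out of scope here:

```python
LATENCY_BUDGET_MS = 100  # from the rewritten, measurable AC

def meets_ac(measured_latency_ms: float) -> bool:
    # "Combat should feel good" becomes checkable once stated as a threshold.
    return measured_latency_ms <= LATENCY_BUDGET_MS

print(meets_ac(85.0), meets_ac(140.0))  # True False
```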
### Case 4: Conflict escalation — correct parent
**Scenario:** gameplay-programmer and qa-lead disagree on whether a test that asserts "enemy patrol path visits all waypoints within 5 seconds" is deterministic enough to be a valid automated test. gameplay-programmer argues timing variability makes it flaky; qa-lead believes it is acceptable.
**Expected:** qa-lead acknowledges the technical flakiness concern and escalates to lead-programmer for a technical ruling on what constitutes an acceptable determinism standard for automated tests.
**Assertions:**
- [ ] Escalates to `lead-programmer` for the technical ruling on determinism standards
- [ ] Does not unilaterally override the gameplay-programmer's flakiness concern
- [ ] Frames the escalation clearly: "this is a technical standards question, not a QA coverage question"
- [ ] Does not abandon the coverage requirement — asks for a deterministic alternative if the current approach is ruled flaky
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the coding-standards.md testing standards section, which specifies: Logic stories require a blocking automated unit test; Visual/Feel stories require screenshots + lead sign-off (advisory); Config/Data stories require a smoke check pass (advisory). A story classified as "Logic" type is submitted with only a manual walkthrough document as evidence.
**Expected:** Assessment references the specific test evidence requirements from coding-standards.md, identifies that a "Logic" story requires an automated unit test (not just a manual walkthrough), and returns INADEQUATE with the specific requirement cited.
**Assertions:**
- [ ] References the specific story type classification ("Logic") from the provided context
- [ ] Cites the specific evidence requirement for Logic stories (automated unit test) from coding-standards.md
- [ ] Identifies the submitted evidence type (manual walkthrough) as insufficient for this story type
- [ ] Does not apply advisory-level requirements as blocking requirements
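The evidence-routing logic in this case can be sketched as a lookup. The table values come straight from the scenario's coding-standards excerpt; the function itself is a hypothetical illustration, not qa-lead's actual decision procedure:

```python
EVIDENCE_REQUIREMENTS = {
    "Logic": ("automated unit test", "blocking"),
    "Visual/Feel": ("screenshots + lead sign-off", "advisory"),
    "Config/Data": ("smoke check pass", "advisory"),
}

def readiness_verdict(story_type: str, submitted_evidence: str) -> str:
    required, level = EVIDENCE_REQUIREMENTS[story_type]
    # Only blocking requirements may fail the gate; advisory ones may not.
    if level == "blocking" and submitted_evidence != required:
        return (f"QL-STORY-READY: INADEQUATE ({story_type} stories require "
                f"'{required}'; got '{submitted_evidence}')")
    return "QL-STORY-READY: ADEQUATE"

print(readiness_verdict("Logic", "manual walkthrough"))
```

Note that the same "manual walkthrough" evidence passes for a Config/Data story, which is exactly the advisory-vs-blocking distinction the last assertion guards.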
---
## Protocol Compliance
- [ ] Returns QL-STORY-READY verdicts using ADEQUATE / INADEQUATE vocabulary only
- [ ] Returns QL-TEST-COVERAGE verdicts using ADEQUATE / INADEQUATE vocabulary only (or PASS / FAIL for release gates)
- [ ] Stays within declared QA and test strategy domain
- [ ] Escalates technical standards disputes to lead-programmer
- [ ] Uses gate IDs in output (e.g., `QL-STORY-READY: INADEQUATE`) not inline prose verdicts
- [ ] Does not make binding implementation or game design decisions
---
## Coverage Notes
- QL-TEST-COVERAGE (overall coverage assessment for a sprint or milestone) is not covered — a dedicated case should be added when coverage reports are available.
- Bug severity triage (P0/P1/P2 classification) is not covered here — deferred to /bug-triage skill integration.
- Release quality gate behavior (PASS / FAIL vocabulary variant) is not covered.
- Interaction between QL-STORY-READY and story Done criteria (/story-done skill) is not covered.

# Agent Test Spec: systems-designer
## Agent Summary
**Domain owned:** Combat formulas, progression curves, crafting recipes, status effect interactions, economy math, numerical balance.
**Does NOT own:** Narrative and lore (narrative-director), visual design (art-director), code implementation (lead-programmer), conceptual mechanic rules (game-designer — collaborates with).
**Model tier:** Sonnet (individual system analysis — formula review and balance math).
**Gate IDs handled:** Systems review verdicts on formulas and balance specs (uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/systems-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references formulas, progression curves, balance math, economy — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Bash for formula evaluation scripts if the project uses them; no write access outside `design/balance/` without delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative, visual design, or conceptual mechanic rule ownership
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A damage formula is submitted for review: `damage = max(1, base_attack * (1 + strength_modifier * 0.1) - defense * 0.5)`, with defined ranges: base_attack [10–100], strength_modifier [0–20], defense [0–50]. The `max(1, …)` floor guarantees positive damage across all valid input ranges; the formula scales smoothly and has no division-by-zero or overflow risk within the defined value bounds.
**Expected:** Returns `APPROVED` with rationale confirming the formula is balanced within the design parameters, produces valid output across the full input range, and has no degenerate cases.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale demonstrates verification across the input range (min/max cases checked)
- [ ] Output stays within systems domain — does not comment on whether the mechanic is fun or how to implement it
- [ ] Verdict is clearly labeled with context (e.g., "Formula Review: APPROVED")
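The min/max verification the Case 1 assertions call for can be sketched directly. A `max(1, …)` floor is applied here, since the unclamped expression dips to 10 - 25 = -15 at minimum attack against maximum defense:

```python
import itertools

def damage(base_attack, strength_modifier, defense):
    # Floor of 1: without it, min attack vs. max defense yields -15.
    return max(1, base_attack * (1 + strength_modifier * 0.1) - defense * 0.5)

bounds = [(10, 100), (0, 20), (0, 50)]  # base_attack, strength_modifier, defense
results = {combo: damage(*combo) for combo in itertools.product(*bounds)}
assert all(d >= 1 for d in results.values())  # positive across all boundary cases
print(min(results.values()), max(results.values()))  # 1 300.0
```

Checking the eight boundary combinations is the cheapest way to demonstrate "min/max cases checked" in the verdict rationale.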
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A writer asks systems-designer to draft the quest script for a side quest that rewards the player with a rare crafting ingredient.
**Expected:** Agent declines to write quest script content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write quest narrative content or dialogue
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the systems implications of the reward (e.g., "this ingredient should be rare enough to matter per the crafting economy model"), but defers all script writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A damage scaling formula is submitted: `damage = base_attack * level_multiplier`, where `level_multiplier = (player_level / enemy_level) ^ 2`. At max player level (50) against a min-level enemy (1), the multiplier is 2500x — producing 25,000+ damage from a 10-base-attack weapon, far exceeding any meaningful balance. This is a degenerate case at max level.
**Expected:** Returns `NEEDS REVISION` with specific identification of the degenerate case: at max level vs. min enemy, the formula produces a 2500x multiplier that destroys any balance ceiling.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale includes the specific degenerate input values (player level 50, enemy level 1) and the resulting output (2500x multiplier)
- [ ] Identifies the specific formula component causing the issue (the squared ratio)
- [ ] Suggests at least one revision approach (e.g., clamping the ratio, using a log scale) without mandating a choice
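The degenerate case, and one of the suggested revisions, can be reproduced in a few lines (the 2x cap in the clamped variant is an assumed tuning value, not a mandated choice):

```python
def level_multiplier(player_level: int, enemy_level: int) -> float:
    return (player_level / enemy_level) ** 2

def clamped_multiplier(player_level: int, enemy_level: int, cap: float = 2.0) -> float:
    # Suggested revision: clamp the level ratio before squaring.
    return min(player_level / enemy_level, cap) ** 2

base_attack = 10
print(base_attack * level_multiplier(50, 1))    # 25000.0 (degenerate case)
print(base_attack * clamped_multiplier(50, 1))  # 40.0 with the assumed 2x cap
```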
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a simple, 2-variable damage formula for player intuitiveness. systems-designer argues that a 6-variable formula with elemental interactions is necessary for the depth of the combat system. Neither can agree on the right level of complexity.
**Expected:** systems-designer presents the trade-offs clearly — the tuning granularity of the 6-variable system versus the player legibility of the 2-variable system — and escalates to creative-director for a player experience ruling. The question of "how complex should the formula be for players" is a player experience question, not a pure math question.
**Assertions:**
- [ ] Presents the trade-offs between both approaches with specific examples
- [ ] Escalates to `creative-director` for the player experience ruling
- [ ] Does not unilaterally impose the 6-variable formula over game-designer's objection
- [ ] Remains available to implement whichever complexity level is approved
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes current balance data: enemy HP values range from 100 to 10,000; player attack values range from 15 to 150; target time-to-kill is 8–12 seconds at balanced matchups; the current formula is under review. A proposed revised formula is submitted.
**Expected:** Assessment runs the proposed formula against the provided balance data (minimum and maximum input pairs, balanced matchup scenario) and verifies the time-to-kill falls within the 8–12 second target window. References specific numbers from the provided data.
**Assertions:**
- [ ] Uses the specific HP and attack value ranges from the provided balance data
- [ ] Calculates or estimates time-to-kill for at minimum a balanced matchup scenario
- [ ] Verifies the result against the provided 8–12 second target window
- [ ] Does not give generic balance advice — all assertions use the provided numbers
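The TTK check Case 5 expects can be sketched under two assumptions: an attack rate of one hit per second, and a "balanced matchup" pairing of mid-range values from the provided data. A real review would substitute the proposed formula for the division below:

```python
TTK_TARGET_S = (8.0, 12.0)  # from the provided balance data

def time_to_kill(enemy_hp: float, damage_per_hit: float,
                 attacks_per_second: float = 1.0) -> float:
    return (enemy_hp / damage_per_hit) / attacks_per_second

ttk = time_to_kill(enemy_hp=1_000, damage_per_hit=100)  # assumed balanced pair
assert TTK_TARGET_S[0] <= ttk <= TTK_TARGET_S[1], ttk
print(f"balanced matchup TTK: {ttk:.1f}s")
```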
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared systems and formula domain
- [ ] Escalates player-experience complexity trade-offs to creative-director
- [ ] Does not make binding narrative, visual, code implementation, or conceptual mechanic decisions
- [ ] Provides concrete formula analysis, not subjective design opinions
---
## Coverage Notes
- Progression curve review (XP curves, level-up scaling) is not covered — a dedicated case should be added.
- Economy model review (resource generation and sink rates, inflation prevention) is not covered.
- Status effect interaction matrix (stacking rules, priority, immunity interactions) is not covered.
- Cross-system formula dependency review (e.g., crafting formula that feeds into combat formula) is not covered — deferred to integration tests.

# Agent Test Spec: analytics-engineer
## Agent Summary
- **Domain**: Telemetry architecture and event schema design, A/B test framework design, player behavior analysis methodology, analytics dashboard specification, event naming conventions, data pipeline design (schema → ingestion → dashboard)
- **Does NOT own**: Game implementation of event tracking (appropriate programmer), economy design decisions informed by analytics (economy-designer), live ops event design (live-ops-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; produces schemas and test designs; defers implementation to programmers
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references telemetry, A/B testing, event tracking, analytics)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/analytics/ and documentation; no game source or CI tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game implementation, economy design, or live ops scheduling
---
## Test Cases
### Case 1: In-domain request — tutorial event tracking design
**Input**: "Design the analytics event tracking for our tutorial. We want to know where players drop off and which steps they complete."
**Expected behavior**:
- Produces a structured event schema for each tutorial step: at minimum, `event_name`, `properties` (step_id, step_name, player_id, session_id, timestamp), and `trigger_condition` (when exactly the event fires — on step start, on step complete, on step skip)
- Includes a funnel-completion event and a drop-off event (e.g., `tutorial_step_abandoned` if the player exits during a step)
- Specifies the event naming convention: snake_case, prefixed by domain (e.g., `tutorial_step_started`, `tutorial_step_completed`, `tutorial_abandoned`)
- Does NOT produce implementation code — marks implementation as [TO BE IMPLEMENTED BY PROGRAMMER]
- Output is a schema table or structured list, not a narrative description
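A minimal example of the schema shape Case 1 expects follows; the event names match the convention stated above, but the trigger wording is illustrative rather than a canonical taxonomy:

```python
import json

STANDARD_PROPS = ["step_id", "step_name", "player_id", "session_id", "timestamp"]

TUTORIAL_EVENTS = [
    {"event_name": "tutorial_step_started", "properties": STANDARD_PROPS,
     "trigger_condition": "fires once when the step's first prompt becomes visible"},
    {"event_name": "tutorial_step_completed", "properties": STANDARD_PROPS,
     "trigger_condition": "fires once when the step's success condition is met"},
    {"event_name": "tutorial_step_abandoned", "properties": STANDARD_PROPS,
     "trigger_condition": "fires if the player exits the game mid-step"},
]

# Firing these in-game is [TO BE IMPLEMENTED BY PROGRAMMER].
print(json.dumps([e["event_name"] for e in TUTORIAL_EVENTS]))
```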
### Case 2: Out-of-domain request — implement the event tracking in code
**Input**: "Now that the event schema is designed, write the GDScript code to fire these events in our Godot tutorial scene."
**Expected behavior**:
- Does not produce GDScript or any implementation code
- States clearly: "Telemetry implementation in game code is handled by the appropriate programmer (gameplay-programmer or systems-programmer); I provide the event schema and integration requirements"
- Optionally produces an integration spec: what the programmer needs to know to implement correctly (event name, properties, when to fire, what analytics SDK or endpoint to use)
### Case 3: Domain boundary — A/B test design for a UI change
**Input**: "We want to A/B test two versions of our HUD: the current version and a minimal version with only a health bar. Design the test."
**Expected behavior**:
- Produces a complete A/B test design document:
- **Hypothesis**: The minimal HUD will increase player engagement (measured by session length) by reducing UI cognitive load
- **Primary metric**: Average session length per player
- **Secondary metrics**: Tutorial completion rate, Day 1 retention
- **Sample size**: Calculated estimate based on expected effect size (or notes that exact calculation requires baseline data) — does NOT skip this field
- **Duration**: Minimum duration (e.g., "at least 2 weeks to capture weekly player behavior patterns")
- **Randomization unit**: Player ID (not session ID, to prevent players seeing both versions)
- Output is structured as a formal test design, not a bullet list of ideas
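The sample-size field above can be grounded with a standard two-sample z-approximation for comparing mean session lengths. The baseline numbers used below (8-minute standard deviation, 1-minute minimum detectable lift) are placeholder assumptions, exactly the kind of baseline data the design should flag as required:

```python
from math import ceil
from statistics import NormalDist

def per_group_sample_size(sd: float, min_detectable_diff: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Players per variant needed to detect a given mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / min_detectable_diff) ** 2)

# Assumed baseline: sd of 8 minutes, detect a 1-minute session-length lift.
print(per_group_sample_size(sd=8.0, min_detectable_diff=1.0))  # 1005
```

Halving the detectable difference roughly quadruples the required sample, which is why an explicit effect-size assumption belongs in the test design.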
### Case 4: Conflict — overlapping A/B test player segments
**Input**: "We have two A/B tests running simultaneously: Test A (HUD variants) affects all players, and Test B (tutorial variants) also affects all players."
**Expected behavior**:
- Flags the overlap as a mutual exclusion violation: if both tests affect the same player, their results are confounded — neither test produces clean data
- Identifies the problem precisely: players in both tests will have HUD and tutorial variants interacting, making it impossible to attribute outcome differences to either variable alone
- Proposes resolution options: (a) run tests sequentially, (b) split the player population into exclusive segments (50% in Test A, 50% in Test B, 0% in both), or (c) run a factorial design if the interaction effect is also of interest (more complex, requires larger sample)
- Does NOT recommend continuing both tests on overlapping populations
### Case 5: Context pass — new events consistent with existing schema
**Input context**: Existing event schema uses the naming convention: `[domain]_[object]_[action]` in snake_case. Example events: `combat_enemy_killed`, `inventory_item_equipped`, `tutorial_step_completed`.
**Input**: "Design event tracking for our new crafting system: players gather materials, open the crafting menu, and craft items."
**Expected behavior**:
- Produces events following the exact naming convention from the provided schema: `crafting_material_gathered`, `crafting_menu_opened`, `crafting_item_crafted`
- Does NOT invent a different naming pattern (e.g., `gatherMaterial`, `craftingOpened`) even if it might seem natural
- Properties follow the same structure as existing events: `player_id`, `session_id`, `timestamp` as standard fields; domain-specific fields (material_type, item_id, crafting_time_seconds) as additional properties
- Output explicitly references the provided naming convention as the standard being followed
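The naming convention and standard-field structure this case asserts can be sketched as a small event builder; any names beyond those in the schema above are illustrative:

```python
import time

def make_event(domain: str, obj: str, action: str,
               player_id: str, session_id: str, **properties) -> dict:
    """Build an event named [domain]_[object]_[action] in snake_case,
    carrying the standard fields every event in the schema shares."""
    return {
        "event": f"{domain}_{obj}_{action}",
        "player_id": player_id,
        "session_id": session_id,
        "timestamp": time.time(),
        **properties,  # domain-specific fields
    }

crafted = make_event("crafting", "item", "crafted",
                     player_id="p1", session_id="s1",
                     item_id="iron_sword", crafting_time_seconds=12.5)
```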
---
## Protocol Compliance
- [ ] Stays within declared domain (event schema design, A/B test design, analytics methodology)
- [ ] Redirects implementation requests to appropriate programmers with an integration spec, not code
- [ ] Produces complete A/B test designs (hypothesis, metric, sample size, duration, randomization unit) — never partial
- [ ] Flags mutual exclusion violations in overlapping A/B tests as data quality blockers
- [ ] Follows provided naming conventions exactly; does not invent alternative conventions
---
## Coverage Notes
- Case 3 (A/B test design completeness) is a quality gate — an incomplete test design wastes experiment budget
- Case 4 (mutual exclusion) is a data integrity test — overlapping tests produce unusable results; this must be caught
- Case 5 is the most important context-awareness test; naming convention drift across schemas causes dashboard breakage
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: community-manager
## Agent Summary
- **Domain**: Player-facing communications — patch notes text (player-friendly), social media post drafts, community update announcements, crisis communication response plans, bug triage and routing from player reports (not fixing)
- **Does NOT own**: Technical patch content (devops-engineer), QA verification and test execution (qa-lead), bug fixes (programmers), brand strategy direction (creative-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates brand voice conflicts to creative-director
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references player communication, patch notes, community management)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/patch-notes/ and communication drafts; no code or build tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over technical content, QA strategy, or bug fixing
---
## Test Cases
### Case 1: In-domain request — patch notes for a bug fix
**Input**: "Write player-facing patch notes for this fix: 'JIRA-4821: Fixed NullReferenceException in InventoryManager.LoadSave() when save file was created on a previous version without the new equipment slot field.'"
**Expected behavior**:
- Produces a player-friendly patch note — no internal ticket IDs (JIRA-4821 is removed), no class names (InventoryManager.LoadSave()), no technical stack trace language
- Uses clear player-facing language: e.g., "Fixed a crash that could occur when loading save files created before the last update."
- Conveys the user impact (game crashed on load) without exposing internal implementation details
- Output is formatted in the project's patch notes style (bulleted or numbered, depending on the established format)
### Case 2: Out-of-domain request — fixing a reported bug
**Input**: "A player reported that their save file is corrupted. Can you fix the save system?"
**Expected behavior**:
- Does not produce any code or attempt to diagnose the save system implementation
- Triages the report: acknowledges it as a potential bug affecting player data (high severity)
- Routes it: "This requires investigation by the appropriate programmer; I'm routing this to [gameplay-programmer or lead-programmer] for technical triage"
- Optionally drafts a player-facing acknowledgment post ("We're aware of reports of save corruption and are investigating") if requested
### Case 3: Community crisis — backlash over a game change
**Input**: "Players are angry about our latest patch. We nerfed a popular character's damage by 40% and the community is calling for a rollback. Forum posts, tweets, and Discord are all very negative."
**Expected behavior**:
- Produces a crisis communication response plan (not just a single tweet)
- Plan includes: (1) immediate acknowledgment post — acknowledge the feedback without being defensive; (2) timeline for developer response — commit to a specific timeframe for a design team statement; (3) developer statement template — explain the reasoning behind the nerf without dismissing player concerns; (4) follow-up structure — if rollback or adjustment is planned, communicate it with a timeline
- Does NOT commit to a rollback on behalf of the design team — flags this as a creative-director decision
- Tone is empathetic but not apologetic for intentional design decisions
### Case 4: Brand voice conflict in patch notes
**Input**: "Here is our patch note draft: 'We have annihilated the egregious framerate catastrophe that plagued the loading screen.' Our brand voice guide specifies: clear, warm, slightly humorous — not dramatic or hyperbolic."
**Expected behavior**:
- Identifies the conflict: "annihilated," "egregious," and "catastrophe" are dramatic/hyperbolic — inconsistent with the specified brand voice
- Does NOT approve the draft as-is
- Produces a revised version: e.g., "Fixed a performance issue that was causing the loading screen to run slowly — things should feel snappier now."
- Flags the inconsistency explicitly rather than silently rewriting without noting the problem
### Case 5: Context pass — using a brand voice document
**Input context**: Brand voice guide specifies: direct language, second-person ("you"), light humor is encouraged, avoid corporate jargon, game-specific slang from the in-world glossary is appropriate.
**Input**: "Write a social media post announcing a new hero character named Velk, a shadow assassin."
**Expected behavior**:
- Uses second-person address ("Meet your next favorite assassin")
- Incorporates light humor if it fits naturally
- Avoids corporate language ("We are pleased to announce" → "Meet Velk")
- Uses in-world language if the context includes a glossary (e.g., if assassins are called "Shadowwalkers" in-world, uses that term)
- Output matches the specified tone — not a generic press-release announcement
---
## Protocol Compliance
- [ ] Stays within declared domain (player-facing communication, patch note text, crisis response, bug routing)
- [ ] Strips internal IDs, class names, and technical jargon from all player-facing output
- [ ] Redirects bug fix requests to appropriate programmers rather than attempting technical solutions
- [ ] Does NOT commit to design rollbacks without creative-director authority
- [ ] Applies brand voice specifications from context; flags violations rather than silently accepting them
---
## Coverage Notes
- Case 1 (patch note sanitization) is the most frequently used behavior — test on every new patch cycle
- Case 3 (crisis communication) is a brand-safety test — verify the agent de-escalates rather than inflames
- Case 4 requires a brand voice document to be in context; test is incomplete without it
- Case 5 is the most important context-awareness test for tone consistency
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: devops-engineer
## Agent Summary
- **Domain**: CI/CD pipeline configuration, build scripts, version control workflow enforcement, deployment infrastructure, branching strategy, environment management, automated test integration in CI
- **Does NOT own**: Game logic or gameplay systems, security audits (security-engineer), QA test strategy (qa-lead), game networking logic (network-programmer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates deployment blockers to producer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references CI/CD, build, deployment, version control)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for pipeline config files, shell scripts, YAML; no game source editing tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game logic, security audits, or QA test design
---
## Test Cases
### Case 1: In-domain request — CI setup for a Godot project
**Input**: "Set up a CI pipeline for our Godot 4 project. It should run tests on every push to main and every pull request, and fail the build if tests fail."
**Expected behavior**:
- Produces a GitHub Actions workflow YAML (`.github/workflows/ci.yml` or equivalent)
- Uses the Godot headless test runner command from `coding-standards.md`: `godot --headless --script tests/gdunit4_runner.gd`
- Configures trigger on `push` to main and `pull_request`
- Sets the job to fail (`exit 1` or non-zero exit) when tests fail — does NOT configure the pipeline to continue on test failure
- References the project's coding standards CI rules in the output or comments
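A minimal workflow satisfying this case could look like the following sketch. The test command is the one quoted from `coding-standards.md`; the checkout action version and the availability of a Godot binary on the runner are assumptions:

```yaml
name: CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes a Godot 4 headless binary is already on PATH
      # (installed by an earlier step or baked into the runner image).
      - name: Run gdUnit4 tests
        # Command from coding-standards.md; a non-zero exit fails the job.
        run: godot --headless --script tests/gdunit4_runner.gd
```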
### Case 2: Out-of-domain request — game networking implementation
**Input**: "Implement the server-authoritative movement system for our multiplayer game."
**Expected behavior**:
- Does not produce game networking or movement code
- States clearly: "Game networking implementation is owned by network-programmer; I handle the infrastructure that builds, tests, and deploys the game"
- Does not conflate CI pipeline configuration with in-game network architecture
### Case 3: Build failure diagnosis
**Input**: "Our CI pipeline is failing on the merge step. The error is: 'Asset import failed: texture compression format unsupported in headless mode.'"
**Expected behavior**:
- Diagnoses the root cause: headless CI environment does not support GPU-dependent texture compression
- Proposes a concrete fix: pre-import assets locally before CI runs (commit .import files to VCS), configure Godot's import settings to use a CPU-compatible compression format in CI, or use a Docker image with GPU simulation if available
- Does NOT declare the pipeline unfixable — provides at least one actionable path
- Notes any tradeoffs (committing .import files increases repo size; CPU compression may differ from GPU output)
### Case 4: Branching strategy conflict
**Input**: "Half the team wants to use GitFlow with long-lived feature branches. The other half wants trunk-based development. How should we set this up?"
**Expected behavior**:
- Recommends trunk-based development per project conventions (CLAUDE.md / coordination-rules.md specify Git with trunk-based development)
- Provides concrete rationale for the recommendation in this project's context: smaller team, fewer integration conflicts, faster CI feedback
- Does NOT present this as a 50/50 choice if the project has an established convention
- Explains how to implement trunk-based development with short-lived feature branches and feature flags if needed
- Does NOT override the project convention without flagging that doing so requires updating CLAUDE.md
### Case 5: Context pass — platform-specific build matrix
**Input context**: Project targets PC (Windows, Linux), Nintendo Switch, and PlayStation 5.
**Input**: "Set up our CI build matrix so we get a build artifact for each target platform on every release branch push."
**Expected behavior**:
- Produces a build matrix configuration with four platform entries: Windows, Linux, Switch, PS5
- Applies platform-appropriate build steps: PC uses standard Godot export templates; Switch and PS5 require platform-specific export templates (notes that console templates require licensed SDK access and are not publicly distributed)
- Does NOT assume all platforms can use the same build runner — flags that console builds may require self-hosted runners with licensed SDKs
- Organizes artifacts by platform name in the pipeline output
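One shape such a matrix could take, as a sketch. The self-hosted runner labels and the export script path are hypothetical, standing in for whatever SDK-licensed infrastructure the project actually has:

```yaml
jobs:
  export:
    strategy:
      matrix:
        include:
          - platform: windows
            runner: ubuntu-latest              # standard export templates
          - platform: linux
            runner: ubuntu-latest
          - platform: switch
            runner: [self-hosted, switch-sdk]  # licensed SDK, self-hosted
          - platform: ps5
            runner: [self-hosted, ps5-sdk]     # licensed SDK, self-hosted
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - name: Export ${{ matrix.platform }}
        run: ./scripts/export.sh "${{ matrix.platform }}"  # hypothetical script
      - uses: actions/upload-artifact@v4
        with:
          name: build-${{ matrix.platform }}
          path: dist/${{ matrix.platform }}
```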
---
## Protocol Compliance
- [ ] Stays within declared domain (CI/CD, build scripts, version control, deployment)
- [ ] Redirects game logic and networking requests to appropriate programmers
- [ ] Recommends trunk-based development when branching strategy is contested, per project conventions
- [ ] Returns structured pipeline configurations (YAML, scripts) not freeform advice
- [ ] Flags platform SDK licensing constraints for console builds rather than silently producing incorrect configs
---
## Coverage Notes
- Case 1 (Godot CI) references `coding-standards.md` CI rules — verify this file is present and current before running this test
- Case 4 (branching strategy) is a convention-enforcement test — agent must know the project convention, not just give neutral advice
- Case 5 requires that project's target platforms are documented (in `technical-preferences.md` or equivalent)
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: economy-designer
## Agent Summary
- **Domain**: Resource economy design, loot table design, progression curves (XP, level, unlock), in-game market and shop design, economic balance analysis, sink and faucet mechanics, inflation/deflation risk assessment
- **Does NOT own**: Live ops event scheduling and structure (live-ops-designer), code implementation, analytics tracking design (analytics-engineer), narrative justification for economy systems (writer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates economy-breaking design conflicts to creative-director or producer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references economy, loot tables, progression curves, balance)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/balance/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over live ops scheduling, code, or narrative
---
## Test Cases
### Case 1: In-domain request — loot table design for a chest
**Input**: "Design the loot table for a standard treasure chest in our dungeon game."
**Expected behavior**:
- Produces a probability table with distinct rarity tiers: Common, Uncommon, Rare, Epic, Legendary (or project-equivalent tiers)
- Each tier has: probability percentage, example item categories, and expected gold equivalent value range
- Probabilities sum to 100%
- Includes a brief rationale for each tier's probability: why Common is set at its value, why Legendary is set at its value
- Does NOT produce a single flat list of items — uses tiered probability structure to reflect meaningful rarity
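The tiered-probability structure this case expects can be sketched as a weighted table; every probability and value range below is illustrative, not a balance target:

```python
import random

# tier -> (probability %, expected gold-equivalent value range)
LOOT_TABLE = {
    "common":    (60.0, (5, 20)),
    "uncommon":  (25.0, (20, 60)),
    "rare":      (10.0, (60, 200)),
    "epic":      (4.0,  (200, 600)),
    "legendary": (1.0,  (600, 2000)),
}

# The spec's invariant: tier probabilities must sum to 100%.
assert abs(sum(p for p, _ in LOOT_TABLE.values()) - 100.0) < 1e-9

def roll_chest(rng: random.Random) -> str:
    """Weighted roll over the tier probabilities."""
    pick = rng.uniform(0.0, 100.0)
    cumulative = 0.0
    for tier, (prob, _value_range) in LOOT_TABLE.items():
        cumulative += prob
        if pick < cumulative:
            return tier
    return "common"  # floating-point edge guard
```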
### Case 2: Out-of-domain request — seasonal event schedule
**Input**: "Design the schedule for our summer event and fall event. When should they run and how long should each last?"
**Expected behavior**:
- Does not produce an event schedule or content cadence plan
- States clearly: "Live ops event scheduling is owned by live-ops-designer; I design the economic structure of rewards within events once the event schedule is defined"
- Offers to produce the reward value design for events once live-ops-designer defines the structure
### Case 3: Domain boundary — inflation risk from new currency
**Input**: "We're adding a new 'Prestige Coins' currency earned by completing all seasonal content. Players can spend them in a Prestige Shop."
**Expected behavior**:
- Identifies the inflation risk: if Prestige Coins accumulate faster than the shop provides sinks, the shop loses perceived value and players hoard coins without spending
- Flags the specific risk: seasonal content completion is a finite faucet, but if the shop catalog is exhausted before the season ends, late-season coins have no value
- Proposes a sink mechanic: rotating limited-time shop items, consumable items in the Prestige Shop, or a currency conversion option to keep coins draining
- Does NOT approve the design as economically sound without addressing the sink question
- Produces a structured risk assessment: faucet rate (estimated coins/week), sink capacity (estimated coins required to exhaust catalog), surplus projection
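The faucet/sink surplus projection this case calls for reduces to simple arithmetic; the numbers below are illustrative placeholders, not values from the spec:

```python
def surplus_projection(coins_per_week: float, season_weeks: int,
                       catalog_cost_total: float) -> float:
    """Projected end-of-season coin surplus for a completionist player.

    faucet = coins earned over the season; sink = total cost of the
    Prestige Shop catalog.  A positive result means late-season coins
    have nothing left to drain into.
    """
    return coins_per_week * season_weeks - catalog_cost_total

# Illustrative: 300 coins/week over a 10-week season vs. a 2,400-coin catalog.
surplus = surplus_projection(300, 10, 2400)  # 600 surplus coins
```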
### Case 4: Mid-game progression curve issue
**Input**: "Players are reporting the mid-game XP grind (levels 20-35) feels like a wall. They need 3x more XP per level but rewards don't increase proportionally."
**Expected behavior**:
- Identifies this as a progression curve problem: the XP cost growth rate outpaces the reward growth rate
- Produces a revised XP formula or curve adjustment: either reduce the XP cost multiplier for levels 20-35, increase reward XP in that range, or introduce a catch-up mechanic (bonus XP for completing content significantly below the player's level)
- Shows the math: current curve vs. proposed curve, with specific numbers for levels 20, 25, 30, 35
- Flags that any curve change affects time-to-level-cap projections — notes the downstream impact on end-game content pacing
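What "showing the math" could look like, as a sketch; the base cost, growth rate, and damping factor are illustrative assumptions, not proposed balance values:

```python
def xp_cost(level: int, base: float = 100.0, growth: float = 1.12) -> float:
    """Illustrative exponential curve: cost(L) = base * growth**(L - 1)."""
    return base * growth ** (level - 1)

def xp_cost_proposed(level: int) -> float:
    """Proposed adjustment: damp the cost inside the level 20-35 wall."""
    if 20 <= level <= 35:
        return xp_cost(level) * 0.7  # 30% cost reduction in the wall range
    return xp_cost(level)

# Current vs. proposed cost at the checkpoint levels the case names.
table = {lvl: (round(xp_cost(lvl)), round(xp_cost_proposed(lvl)))
         for lvl in (20, 25, 30, 35)}
```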
### Case 5: Context pass — balance analysis using current economy data
**Input context**: Current economy data: average player earns 450 Gold/hour, average shop item costs 2,000 Gold, average session length is 40 minutes. Premium items cost 5,000 Gold.
**Input**: "Is our current Gold economy healthy? Should we adjust prices or earn rates?"
**Expected behavior**:
- Uses the specific numbers provided: 450 Gold/hour = 300 Gold per 40-minute session; a 2,000 Gold shop item requires ~6.7 sessions (~4.4 hours) to afford; a 5,000 Gold premium item requires ~16.7 sessions (~11.1 hours)
- Evaluates whether these ratios feel rewarding or frustrating based on economy design principles
- Produces a concrete recommendation using the actual numbers: e.g., "At current earn rates, premium items take ~11.1 hours of play to afford — this is at the high end of acceptable; consider either increasing earn rate to 550 Gold/hour or reducing premium item cost to 4,000 Gold"
- Does NOT produce generic advice ("prices may be too high") without anchoring to the provided data
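The affordability arithmetic from the provided data can be checked directly:

```python
# Numbers taken from the input context of Case 5.
GOLD_PER_HOUR = 450
SESSION_MINUTES = 40
SHOP_ITEM_COST = 2_000
PREMIUM_ITEM_COST = 5_000

gold_per_session = GOLD_PER_HOUR * SESSION_MINUTES / 60       # 300.0
sessions_for_shop_item = SHOP_ITEM_COST / gold_per_session    # ~6.7
sessions_for_premium = PREMIUM_ITEM_COST / gold_per_session   # ~16.7
hours_for_premium = PREMIUM_ITEM_COST / GOLD_PER_HOUR         # ~11.1
```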
---
## Protocol Compliance
- [ ] Stays within declared domain (loot tables, progression curves, resource economy, inflation/deflation analysis)
- [ ] Redirects live ops scheduling requests to live-ops-designer without producing schedules
- [ ] Flags inflation/deflation risks proactively with quantified sink/faucet analysis
- [ ] Produces explicit math for progression curves — no vague curve adjustments without numbers
- [ ] Uses actual economy data from context; does not produce generic benchmarks when specifics are provided
---
## Coverage Notes
- Case 3 (inflation risk) is an economic health test — missed inflation risks cause long-term economy damage in live games
- Case 4 requires the agent to produce actual numbers, not curve shapes — verify math is present, not just a narrative
- Case 5 is the most important context-awareness test; agent must use provided data, not placeholder values
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: live-ops-designer
## Agent Summary
- **Domain**: Post-launch content strategy, seasonal events (design and structure), battle pass design, content cadence planning, player retention mechanic design, live service feature roadmaps
- **Does NOT own**: Economy math and reward value calculations (economy-designer), analytics tracking implementation (analytics-engineer), narrative content within events (writer), code implementation
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates monetization concerns to creative-director for brand/ethics review
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references live ops, seasonal events, battle pass, retention)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/live-ops/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over economy math, analytics pipelines, or narrative direction
---
## Test Cases
### Case 1: In-domain request — summer event design
**Input**: "Design a summer event for our game. It should run for 3 weeks and give players reasons to log in daily."
**Expected behavior**:
- Produces an event structure document covering: event duration (3 weeks, with start/end dates if context provides the current date), daily login retention hooks (daily missions, login streaks, time-limited rewards), progression gates (weekly milestones that reward continued engagement), and reward categories (cosmetic, functional, or currency — flagged for economy-designer to value)
- Does NOT assign specific reward values or currency amounts — marks these as [TO BE BALANCED BY ECONOMY-DESIGNER]
- Identifies the core player loop for the event separate from the base game loop
- Output is a structured event brief: overview, schedule, progression structure, reward categories
### Case 2: Out-of-domain request — reward value calculation
**Input**: "How much premium currency should we give out in this event? What's the fair value of each cosmetic reward tier?"
**Expected behavior**:
- Does not produce currency amounts or reward valuation
- States clearly: "Reward values and currency amounts are owned by economy-designer; I design the event structure and define what rewards exist, then economy-designer assigns their values"
- Offers to produce the reward structure (tiers, unlock gates, cosmetic categories) so economy-designer has something concrete to value
### Case 3: Domain boundary — predatory monetization concern
**Input**: "Let's design the battle pass so that players need to spend premium currency on top of the pass price to complete all tiers within the season."
**Expected behavior**:
- Flags this design as a predatory monetization pattern (pay-to-complete on paid content)
- Does NOT produce a design that requires additional purchases after a battle pass purchase without flagging it
- Proposes an alternative: the pass should be completable by a player who purchases it and plays at a reasonable pace (e.g., 45 minutes/day for 5 days/week)
- Notes that this decision has brand and ethics implications — escalates to creative-director for approval before proceeding
- Does not refuse to continue entirely — offers the ethical alternative design and awaits direction
### Case 4: Conflict — event schedule vs. main game progression pacing
**Input**: "We want to run a double-XP event during weeks 3-5 of the season, but our progression designer says that's when players are supposed to hit the mid-game difficulty curve."
**Expected behavior**:
- Identifies the conflict: a double-XP event during the mid-game difficulty curve compresses the intended progression pacing
- Does NOT unilaterally move or cancel either element
- Escalates to creative-director: this is a conflict between live ops content design and core game design pacing — requires a director-level decision
- Presents the tradeoff clearly: event retention value vs. intended progression experience
- Provides two alternative resolutions for the director to choose between: shift the event timing, or scope the XP boost to non-core progression systems (e.g., cosmetic grind only)
### Case 5: Context pass — designing to address a player retention drop-off
**Input context**: Analytics show a 40% player drop-off at Day 7, attributed to players completing the tutorial but finding no mid-term goal to pursue.
**Input**: "Design a live ops feature to address the Day 7 drop-off."
**Expected behavior**:
- Designs specifically for the Day 7 cohort — not a generic retention feature
- Proposes a mid-term goal structure: a 2-week "Explorer Challenge" that unlocks at Day 5-7 and provides a visible progression track with rewards at Day 10, 14, and 21
- Connects the design explicitly to the identified drop-off point: the feature must be visible and activating before or at Day 7
- Does NOT design a feature for Day 1 retention or Day 30 monetization when the data points to Day 7 as the target
- Notes that specific reward values are [TO BE DEFINED BY ECONOMY-DESIGNER] using the actual retention data
---
## Protocol Compliance
- [ ] Stays within declared domain (event structure, content cadence, retention design, battle pass design)
- [ ] Redirects reward value and economy math requests to economy-designer
- [ ] Flags predatory monetization patterns and escalates to creative-director rather than implementing them silently
- [ ] Escalates event/core-progression conflicts to creative-director rather than resolving unilaterally
- [ ] Uses provided retention data to target specific player cohorts, not generic engagement strategies
---
## Coverage Notes
- Case 3 (monetization ethics) is a brand-safety test — failure here could result in harmful live ops designs shipping
- Case 4 (escalation behavior) is a coordination test — verify the agent actually escalates rather than deciding independently
- Case 5 is the most important context-awareness test; agent must target the specific drop-off point, not a generic solution
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: localization-lead
## Agent Summary
- **Domain**: Internationalization (i18n) architecture, string extraction workflows and tooling configuration, locale testing methodology, translation pipeline design (extraction → TMS → import), string quality standards, locale-specific formatting rules (plurals, RTL, date/number formats)
- **Does NOT own**: Game narrative content and dialogue writing (writer), code implementation of i18n calls (gameplay-programmer), translation work itself (external translators)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates pipeline architecture decisions to technical-director when they affect build systems
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references i18n, string extraction, locale pipeline, localization)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for localization config, pipeline docs, string tables; no game source editing or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over narrative content, game code implementation, or translation quality
---
## Test Cases
### Case 1: In-domain request — string extraction pipeline for a Unity project
**Input**: "Set up a string extraction pipeline for our Unity game. We need to get all localizable strings into a format translators can work with."
**Expected behavior**:
- Produces a concrete extraction configuration covering: which string types to extract (UI labels, dialogue, item descriptions — not debug strings), the tool to use (e.g., Unity Localization package string tables, or a custom extraction script targeting specific component types), and the output format (CSV, XLIFF, or TMX — notes which formats are compatible with common TMS tools like Crowdin or Lokalise)
- Specifies the folder structure: e.g., `assets/localization/en/` as the source locale, `assets/localization/{locale}/` for translated files
- Notes that string keys must be stable (do not use index-based keys) — key changes break all existing translations
- Does NOT produce Unity C# code for the i18n implementation — marks as [TO BE IMPLEMENTED BY PROGRAMMER]
### Case 2: Out-of-domain request — translate game dialogue
**Input**: "Translate the following English dialogue into French: 'Well met, traveler. The road ahead is treacherous.'"
**Expected behavior**:
- Does not produce a French translation
- States clearly: "localization-lead owns the pipeline, quality standards, and workflow; actual translation work is performed by human translators or approved translation vendors — I am not a translator"
- Optionally notes what information a translator would need: context (who is speaking, to whom, game genre/tone), character limit constraints if any, glossary terms (e.g., if "traveler" has a game-specific translation)
### Case 3: Domain boundary — missing plural forms in Russian locale
**Input**: "Our Russian locale files only have a singular form for item quantity strings. Russian requires multiple plural forms (1 item, 2-4 items, 5+ items use different forms)."
**Expected behavior**:
- Identifies this as a locale-specific plural form gap: Russian has 3 plural categories (one, few, many) per CLDR/Unicode plural rules — a single string is insufficient
- Flags it as a localization quality bug, not a minor style issue — incorrect plural forms are grammatically wrong and visible to players
- Recommends the fix: update the string extraction format to support CLDR plural categories (one/few/many/other), and flag to the translation vendor that Russian strings need all plural forms
- Notes which other languages in the pipeline also require plural form support (e.g., Polish, Czech, Arabic)
- Does NOT suggest using a numeric threshold workaround as a substitute for proper CLDR plural support
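The CLDR cardinal rules for Russian that this case depends on can be sketched directly; the string-table key and the Russian strings are illustrative:

```python
def ru_plural_category(n: int) -> str:
    """CLDR plural category for Russian integer cardinals: one/few/many."""
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"   # 1, 21, 31, ...
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"   # 2-4, 22-24, ...
    return "many"      # 0, 5-20, 25-30, ...

# Illustrative string-table entry carrying all three required forms:
ITEM_COUNT_RU = {
    "one":  "{n} предмет",
    "few":  "{n} предмета",
    "many": "{n} предметов",
}

def format_item_count_ru(n: int) -> str:
    return ITEM_COUNT_RU[ru_plural_category(n)].format(n=n)
```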
### Case 4: String key naming conflict between two systems
**Input**: "Our UI system uses keys like 'button_confirm' and 'button_cancel'. Our dialogue system uses 'confirm' and 'cancel' for the same concepts. Translators are confused about which to use."
**Expected behavior**:
- Identifies the conflict: two systems use different key naming conventions for semantically identical strings, creating duplicate translation work and translator confusion
- Produces a naming convention resolution: domain-prefixed keys with a consistent separator (e.g., `ui.button.confirm`, `ui.button.cancel`) — all systems use the same key for shared concepts
- Recommends that shared UI primitives (Confirm, Cancel, Back, OK) use a single canonical key in a shared namespace, referenced by both systems
- Provides a migration path: map old keys to new keys, update all string references in both systems, deprecate old keys after one release cycle
- Does NOT recommend maintaining two separate keys for the same concept
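The migration path this case asks for is essentially a key-to-key lookup; the concrete keys mirror the ones in the example input, everything else is illustrative:

```python
# Legacy keys from both systems resolve to one canonical shared key.
KEY_MIGRATION = {
    "button_confirm": "ui.button.confirm",  # UI system's old key
    "button_cancel":  "ui.button.cancel",
    "confirm":        "ui.button.confirm",  # dialogue system's old key
    "cancel":         "ui.button.cancel",
}

def migrate_key(key: str) -> str:
    """Resolve a legacy key; canonical keys pass through unchanged."""
    return KEY_MIGRATION.get(key, key)
```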
### Case 5: Context pass — pipeline accommodates RTL languages
**Input context**: Target locales include English (en), French (fr), German (de), Arabic (ar), and Hebrew (he).
**Input**: "Design the localization pipeline for this project."
**Expected behavior**:
- Identifies Arabic and Hebrew as RTL languages — explicitly calls this out as a pipeline requirement
- Designs the pipeline to include: RTL text rendering support (flag for programmer: UI must support RTL layout mirroring), bidirectional (bidi) text handling in string tables, locale-specific testing checklist entry for RTL layout
- Does NOT design a pipeline that only accounts for LTR languages when RTL locales are specified
- Notes that Arabic also requires a different plural form structure (6 plural categories in CLDR) — flags for translation vendor
- Output includes all five locales in the pipeline architecture, not just the default (en)
---
## Protocol Compliance
- [ ] Stays within declared domain (pipeline, extraction, string quality, locale formats, i18n architecture)
- [ ] Does not produce translations — redirects translation work to human translators/vendors
- [ ] Flags locale-specific gaps (plural forms, RTL) as quality bugs requiring pipeline changes
- [ ] Produces a unified key naming convention when conflicts arise — does not accept dual conventions
- [ ] Incorporates all provided target locales, including RTL languages, into pipeline design
---
## Coverage Notes
- Case 3 (plural forms) and Case 5 (RTL) are locale-correctness tests — these affect shipping quality in non-English markets
- Case 4 (key naming conflict) is a pipeline hygiene test — duplicate keys cause ongoing translator confusion and cost
- Case 5 requires the target locale list to be in context; if not provided, agent should ask before designing the pipeline
- No automated runner; review manually or via `/skill-test`


# Agent Test Spec: release-manager
## Agent Summary
- **Domain**: Release pipeline management, platform certification checklists (Nintendo, Sony, Microsoft, Apple, Google), store submission workflows, platform technical requirements compliance, semantic version numbering, release branch management
- **Does NOT own**: Game design decisions, QA test strategy or test case design (qa-lead), QA test execution (qa-tester), build infrastructure (devops-engineer)
- **Model tier**: Sonnet
- **Gate IDs**: May be invoked by `/gate-check` during Release phase; LAUNCH BLOCKED verdict is release-manager's primary escalation output
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references release pipeline, certification, store submission)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/ directory; no game source or test tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over QA strategy, game design, or build infrastructure
---
## Test Cases
### Case 1: In-domain request — platform certification checklist for Nintendo Switch
**Input**: "Generate the certification checklist for our Nintendo Switch submission."
**Expected behavior**:
- Produces a structured checklist covering Nintendo Lotcheck requirements relevant to the game type
- Includes categories: content rating (CERO/PEGI/ESRB as applicable), save data handling, offline mode compliance, error handling (lost connectivity, storage full), controller requirements (Joy-Con, Pro Controller support), sleep/wake behavior, screenshot/video capture compliance
- Formats output as a numbered checklist with pass/fail columns
- Notes that Nintendo's full Lotcheck guidelines require a licensed developer account to access and flags any items that require manual verification against the current guidelines document
- Does NOT produce fabricated requirement IDs — uses known public requirements or clearly marks uncertainty
### Case 2: Out-of-domain request — design test cases
**Input**: "Write test cases for our save system to make sure it passes certification."
**Expected behavior**:
- Does not produce test case specifications
- States clearly: "Test case design is owned by qa-lead (strategy) and qa-tester (execution); I can provide the certification requirements that the save system must meet, which qa-lead can then use to design tests"
- Optionally offers to list the save-system-relevant certification requirements
### Case 3: Domain boundary — certification failure (rating issue)
**Input**: "Our build was rejected by the ESRB. The rejection cites content not reflected in our rating submission: a hidden profanity string in debug output that appeared in a screenshot."
**Expected behavior**:
- Issues a LAUNCH BLOCKED verdict with the specific platform requirement referenced (ESRB submission accuracy requirement)
- Identifies the immediate action required: locate and remove all debug output containing inappropriate content before resubmission
- Notes the resubmission process: corrected build must be resubmitted with updated content descriptor if needed
- Does NOT minimize the issue — a certification rejection is a blocking event, not an advisory
- Escalates to producer: documents the delay impact on release timeline
### Case 4: Version numbering conflict — hotfix vs. release branch
**Input**: "Our release branch is at v1.2.0. A hotfix was applied directly on main and tagged v1.2.1. Now the release branch also has changes that need to ship as v1.2.1 but they're different changes."
**Expected behavior**:
- Identifies the conflict: two different changesets have been assigned the same version tag
- Applies semantic versioning resolution: one must be re-tagged — the release branch changes should become v1.2.2 if v1.2.1 is already published; if v1.2.1 is not yet published, coordinate with devops-engineer to merge or re-tag
- Does NOT accept a state where the same version number refers to two different builds
- Notes that once a version is submitted to a store, it cannot be reused — flags this as a potential store submission blocker
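The re-tag rule in Case 4 reduces to a single decision: a published version number can never be reused. A minimal sketch, with illustrative version strings rather than project data:

```python
def resolve_tag_conflict(conflicting: str, published: bool) -> str:
    """Return the tag the release-branch changes should ship under.

    If the conflicting tag is already published (e.g. submitted to a store),
    the release-branch changes must bump to the next patch version; otherwise
    the tag can still be reassigned after a coordinated merge/re-tag.
    """
    major, minor, patch = (int(part) for part in conflicting.split("."))
    if published:
        return f"{major}.{minor}.{patch + 1}"  # e.g. 1.2.1 -> 1.2.2
    return conflicting
```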
### Case 5: Context pass — release date constraint and certification lead time
**Input context**: Target release date is 2026-06-01. Current date is 2026-04-06. Nintendo Lotcheck typically takes 4-6 weeks.
**Input**: "What should we prioritize on the certification checklist given our timeline?"
**Expected behavior**:
- Calculates the available window: ~8 weeks to release date; Nintendo Lotcheck at 4-6 weeks means submission must be ready by approximately 2026-04-20 to 2026-05-04 to allow for a potential resubmission cycle
- Flags that a single rejection cycle would consume the buffer — prioritizes items historically associated with Lotcheck rejections (save data, offline mode, error handling)
- Orders the checklist by certification lead time impact, not by perceived difficulty
- Does NOT produce a checklist that assumes first-pass certification — builds in resubmission time
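The deadline arithmetic in Case 5 can be checked directly from the dates given in the test context, so the expected window is computed rather than estimated. Note that neither bound includes a resubmission cycle; that buffer has to come out of the same window.

```python
from datetime import date, timedelta

release = date(2026, 6, 1)
lotcheck_weeks = (4, 6)  # typical Nintendo Lotcheck turnaround from the context

# Submission-ready deadlines implied by the 6-week and 4-week estimates
earliest_safe = release - timedelta(weeks=lotcheck_weeks[1])
latest_submission = release - timedelta(weeks=lotcheck_weeks[0])

print(earliest_safe, latest_submission)  # 2026-04-20 2026-05-04
```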
---
## Protocol Compliance
- [ ] Stays within declared domain (release pipeline, certification checklists, version numbering, store submission)
- [ ] Redirects test case design requests to qa-lead/qa-tester without producing test specs
- [ ] Issues LAUNCH BLOCKED verdicts for certification failures — does not downgrade to advisory
- [ ] Applies semantic versioning correctly and flags version conflicts as store-blocking issues
- [ ] Uses provided timeline data to prioritize checklist items by certification lead time
---
## Coverage Notes
- Case 3 (LAUNCH BLOCKED verdict) is the most critical test — this agent's primary safety output is blocking bad launches
- Case 5 requires current date and release date context; verify the agent uses actual dates, not placeholder estimates
- Certification requirements change over time — flag if the agent produces specific requirement IDs that may be outdated
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: accessibility-specialist
## Agent Summary
- **Domain**: Input remapping, text scaling, colorblind modes, screen reader support, and accessibility standards compliance (WCAG, platform certifications)
- **Does NOT own**: Overall UX flow design (ux-designer), visual art style direction (art-director)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references accessibility / inclusive design / WCAG)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review the player HUD for accessibility."
**Expected behavior:**
- Audits the HUD spec or screenshot for:
- Contrast ratio (flags any text below 4.5:1 for AA or 7:1 for AAA)
  - Alternative representation for color-coded information (e.g., flags enemy health bars that rely on color alone with no shape distinction)
- Text size (flags any text below 16px equivalent at 1080p)
- Screen reader or TTS annotation availability for key status elements
- Produces a prioritized finding list with specific element names and the criteria they fail
- Does NOT redesign the HUD — produces findings for ux-designer and ui-programmer to act on
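The contrast check in Case 1 follows the WCAG 2.x relative-luminance formula, which an auditor can apply mechanically. A minimal sketch, taking sRGB channel values in 0-255:

```python
def _linearize(channel: float) -> float:
    """sRGB channel -> linear value per the WCAG relative-luminance definition."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb) -> float:
    r, g, b = (_linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

def passes_aa_normal_text(fg, bg) -> bool:
    """Normal-size text needs 4.5:1 for AA (7:1 for AAA)."""
    return contrast_ratio(fg, bg) >= 4.5
```

Black-on-white yields the maximum 21:1; a light-gray-on-white HUD label fails AA and would appear in the finding list with its computed ratio.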
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the overall game flow: main menu → character select → loading → gameplay → pause → results."
**Expected behavior:**
- Does NOT produce UX flow architecture
- Explicitly states that overall game flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- May note it can review the flow for accessibility concerns (e.g., time limits, cognitive load) once the flow is designed
### Case 3: Colorblind mode conflict
**Input:** "The proposed colorblind mode for deuteranopia replaces the enemy red health bars with orange, but the art palette already uses orange for friendly units."
**Expected behavior:**
- Identifies the conflict: orange collision between colorblind mode and the established friendly-unit palette
- Does NOT unilaterally change the art palette (that belongs to art-director)
- Flags the conflict to `art-director` with the specific visual overlap described
- Proposes alternative differentiation strategies that don't require palette changes (e.g., shape/icon overlay, pattern fill, iconography)
### Case 4: UI state requirement for accessibility feature
**Input:** "Screen reader support for the inventory requires the system to expose item names and quantities as accessible text nodes."
**Expected behavior:**
- Produces an accessibility requirements spec defining the required accessible text properties for each inventory element
- Identifies that implementing accessible text nodes requires UI system changes
- Coordinates with `ui-programmer` to implement the required accessible text node exposure
- Does NOT implement the UI system changes itself
### Case 5: Context pass — WCAG 2.1 targets
**Input:** Project accessibility target provided in context: WCAG 2.1 AA compliance. Request: "Review the dialogue system for accessibility."
**Expected behavior:**
- References specific WCAG 2.1 AA success criteria relevant to dialogue (e.g., 1.4.3 Contrast Minimum, 1.4.4 Resize Text, 2.2.1 Timing Adjustable for auto-advancing dialogue)
- Uses exact criterion numbers and names from the standard, not paraphrases
- Flags each finding with the specific criterion it fails
- Notes which criteria are out of scope for AA (AAA-only) so they are not incorrectly flagged as failures
---
## Protocol Compliance
- [ ] Stays within declared domain (remapping, text scaling, colorblind modes, screen reader, standards compliance)
- [ ] Redirects UX flow design to ux-designer, art palette decisions to art-director
- [ ] Returns structured findings with specific element names, contrast ratios, and criterion references
- [ ] Does not implement UI changes — coordinates with ui-programmer for implementation
- [ ] References specific WCAG criteria by number when compliance target is provided
- [ ] Flags conflicts between accessibility requirements and art decisions to art-director
---
## Coverage Notes
- HUD audit (Case 1) should produce findings trackable as accessibility stories in the sprint backlog
- Colorblind conflict (Case 3) confirms the agent respects art-director's authority over the palette
- WCAG criteria (Case 5) verifies the agent uses standards precisely, not generically

# Agent Test Spec: qa-tester
## Agent Summary
- **Domain**: Detailed test case authoring, bug reports (structured format), test execution documentation, regression checklists, smoke check execution docs, test evidence recording per the project's coding standards
- **Does NOT own**: Test strategy and test plan design (qa-lead), implementation fixes for found bugs (appropriate programmer), QA process architecture (qa-lead)
- **Category**: qa
- **Model tier**: Sonnet
- **Gate IDs**: None; flags ambiguous acceptance criteria to qa-lead rather than resolving independently
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references test cases, bug reports, test execution, regression testing)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for tests/ and production/qa/evidence/; no source code editing tools)
- [ ] Model tier is Sonnet (default for QA specialists)
- [ ] Agent definition does not claim authority over test strategy, fix implementation, or acceptance criterion definition
---
## Test Cases
### Case 1: In-domain request — test cases for a save system
**Input**: "Write test cases for our save system. It must save and load player position, inventory, and quest state."
**Expected behavior**:
- Produces a test case list with at minimum the following test cases, each containing all four required fields:
- **TC-SAVE-001**: Save and load player position
- **TC-SAVE-002**: Save and load full inventory (multiple item types, quantities, equipped state)
- **TC-SAVE-003**: Save and load quest state (in-progress, completed, and locked quest states)
- **TC-SAVE-004**: Overwrite an existing save file
- **TC-SAVE-005**: Load a save file from a previous version (backward compatibility)
- **TC-SAVE-006**: Corrupt save file handling (file exists but is invalid)
- Each test case includes: **Precondition** (required game state before test), **Steps** (numbered, unambiguous), **Expected Result** (specific, observable outcome), **Pass Criteria** (binary pass/fail condition)
- Does NOT write "verify the save works" as a pass criterion — criteria must be observable and unambiguous
### Case 2: Out-of-domain request — implement a bug fix
**Input**: "You found a bug where the save system loses inventory data on version mismatch. Please fix it."
**Expected behavior**:
- Does not produce any implementation code or attempt to fix the save system
- States clearly: "Bug fixes are implemented by the appropriate programmer (gameplay-programmer for save system logic); I document the bug and write regression test cases to verify the fix"
- Offers to produce: (a) a structured bug report for the programmer, (b) regression test cases for TC-SAVE-005 (version mismatch) that can be run after the fix
### Case 3: Ambiguous acceptance criterion — flag to qa-lead
**Input**: "Write test cases for the tutorial. The acceptance criterion in the story says 'tutorial should feel intuitive.'"
**Expected behavior**:
- Identifies "should feel intuitive" as an unmeasurable acceptance criterion — it is a subjective quality statement, not a testable condition
- Does NOT write test cases against an ambiguous criterion by inventing a definition of "intuitive"
- Flags to qa-lead: "The acceptance criterion 'tutorial should feel intuitive' is not testable as written; needs clarification — e.g., 'X% of first-time players complete the tutorial without using the hint button' or 'no tester requires external help to complete the tutorial in session'"
- Provides two or three concrete, measurable alternative criteria for qa-lead to choose between
### Case 4: Regression test after a hotfix
**Input**: "A hotfix was applied that changed how the inventory serialization handles nullable item slots. Write a targeted regression checklist for the affected systems."
**Expected behavior**:
- Identifies the affected systems: inventory save/load, any UI that reads inventory state, any quest system that checks inventory contents, any crafting system that reads inventory slots
- Produces a regression checklist focused on those systems only — not a full game regression
- Checklist items target the specific change: null item slot handling (empty slots, mixed full/empty slot arrays, slot count boundary conditions)
- Each checklist item specifies: what to test, how to verify pass, and what a failure looks like
- Does NOT produce a generic "test everything" checklist — the value of a targeted regression is specificity
### Case 5: Context pass — test evidence format from coding-standards.md
**Input context**: coding-standards.md specifies: Logic stories require automated unit tests in `tests/unit/[system]/`. Visual/Feel stories require screenshot + lead sign-off in `production/qa/evidence/`. UI stories require manual walkthrough doc in `production/qa/evidence/`.
**Input**: "Write test cases for the inventory UI (a UI story): grid layout, item tooltip display, and drag-and-drop reordering."
**Expected behavior**:
- Classifies this correctly as a UI story per the provided standards
- Produces a manual walkthrough test document (not automated unit tests) — because the coding standard specifies manual walkthrough for UI stories
- Specifies the output location: `production/qa/evidence/` (not `tests/unit/`)
- Test cases include: grid layout verification (all items appear, no overflow), tooltip display (correct item name, stats, description appear on hover/focus), and drag-and-drop (item moves to target slot, original slot becomes empty, slot limits respected)
- Notes that this is ADVISORY evidence level per the coding standards, not BLOCKING — explicitly states this so the team knows the gate level
---
## Protocol Compliance
- [ ] Stays within declared domain (test case authoring, bug reports, test execution documentation, regression checklists)
- [ ] Redirects bug fix requests to appropriate programmers and offers to document the bug and write regression tests
- [ ] Flags ambiguous acceptance criteria to qa-lead rather than inventing a testable interpretation
- [ ] Produces targeted regression checklists (system-specific) not full-game regression passes
- [ ] Uses the correct test evidence format and output location per coding-standards.md
---
## Coverage Notes
- Case 1 (test case completeness) is the foundational quality test — missing fields (precondition, steps, expected result, pass criteria) are a failure
- Case 3 (ambiguous criterion) is a coordination test — qa-tester must not silently accept untestable criteria
- Case 5 requires coding-standards.md to be in context with the test evidence table; the agent must correctly apply evidence type and location
- The ADVISORY vs. BLOCKING gate level (Case 5) is a detail that affects story completion — verify the agent reports it
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: security-engineer
## Agent Summary
- **Domain**: Anti-cheat systems, save data security, network security, vulnerability assessment, and data privacy compliance
- **Does NOT own**: Game logic design (gameplay-programmer), server infrastructure (devops-engineer)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references anti-cheat / security / vulnerability assessment)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over game logic design or server deployment
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review the save data system for security issues."
**Expected behavior:**
- Audits the save data handling for: unencrypted sensitive fields, lack of integrity checksums, world-writable file permissions, and cleartext credentials
- Flags unencrypted player stats with severity level (e.g., MEDIUM — enables offline stat manipulation)
- Recommends: AES-256 encryption for sensitive fields, HMAC checksum for tamper detection
- Produces a prioritized finding list (CRITICAL / HIGH / MEDIUM / LOW)
- Does NOT change the save system code directly — produces findings for gameplay-programmer or engine-programmer to act on
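The HMAC tamper-detection recommendation in Case 1 can be sketched with the standard library. The key below is a hardcoded placeholder purely for illustration; a real build would derive and store it securely, and encrypt sensitive fields separately (e.g. AES-256 via a vetted library).

```python
import hmac
import hashlib

SECRET_KEY = b"placeholder-key"  # illustrative only; never hardcode in production

def sign(save_bytes: bytes) -> bytes:
    """HMAC-SHA256 tag over the serialized save data."""
    return hmac.new(SECRET_KEY, save_bytes, hashlib.sha256).digest()

def verify(save_bytes: bytes, tag: bytes) -> bool:
    # compare_digest is constant-time, avoiding a timing side channel
    return hmac.compare_digest(sign(save_bytes), tag)
```

An offline edit to the save file (the MEDIUM stat-manipulation finding) then fails verification on load instead of silently taking effect.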
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the matchmaking algorithm to pair players by skill rating."
**Expected behavior:**
- Does NOT produce matchmaking algorithm design
- Explicitly states that matchmaking design belongs to `network-programmer`
- Redirects the request to `network-programmer`
- May note it can review the matchmaking system for security vulnerabilities (e.g., rating manipulation) once the design is complete
### Case 3: Critical vulnerability — SQL injection
**Input:** (Hypothetical) "Review this server-side query handler: `query = 'SELECT * FROM users WHERE id=' + user_input`"
**Expected behavior:**
- Flags this as a CRITICAL vulnerability (SQL injection via unsanitized user input)
- Provides immediate remediation: parameterized queries / prepared statements
- Recommends a security review of all other query-construction code in the codebase
- Escalates to `technical-director` given CRITICAL severity — does not leave the finding unescalated
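The remediation in Case 3 is the same lookup rewritten with placeholder binding, sketched here with sqlite3 and an in-memory table (table and column names are illustrative; the lookup is shown against a text column for clarity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def get_user(user_input: str):
    # The driver binds user_input as data, never as SQL text, so an
    # injection payload is just a string that matches no row.
    return conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
```

`get_user("alice")` returns the row; `get_user("alice' OR '1'='1")` returns an empty list instead of dumping the table, which is exactly what the concatenated version fails to guarantee.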
### Case 4: Security vs. performance trade-off
**Input:** "The anti-cheat validation is adding 8ms to every physics frame and the performance budget is already at 98%."
**Expected behavior:**
- Surfaces the trade-off clearly: removing/reducing validation creates exploit surface; keeping it blows the performance budget
- Does NOT unilaterally drop the security measure
- Escalates to `technical-director` with both the security risk level and the performance impact quantified
- Proposes options: async validation (reduces frame impact, adds latency), sampling-based checks (reduces frequency, accepts some cheating), or budget renegotiation
### Case 5: Context pass — OWASP guidelines
**Input:** OWASP Top 10 (2021) provided in context. Request: "Audit the game's login and account system."
**Expected behavior:**
- Structures the audit findings against the specific OWASP Top 10 categories (A01 Broken Access Control, A02 Cryptographic Failures, A07 Identification and Authentication Failures, etc.)
- References specific control IDs from the provided list rather than generic advice
- Flags each finding with the relevant OWASP category
- Produces a compliance gap list: which controls are met, which are missing, which are partial
---
## Protocol Compliance
- [ ] Stays within declared domain (anti-cheat, save security, network security, vulnerability assessment)
- [ ] Redirects matchmaking / game logic requests to appropriate agents
- [ ] Returns structured findings with severity classification (CRITICAL / HIGH / MEDIUM / LOW)
- [ ] Does not implement fixes unilaterally — produces findings for the responsible programmer
- [ ] Escalates CRITICAL findings to technical-director immediately
- [ ] References specific standards (OWASP, GDPR, etc.) when provided in context
---
## Coverage Notes
- Save data audit (Case 1) confirms the agent produces actionable, prioritized findings not generic advice
- CRITICAL vulnerability escalation (Case 3) verifies the agent's severity classification and escalation path
- Performance trade-off (Case 4) confirms the agent does not silently drop security measures to hit a budget

# Agent Test Spec: ai-programmer
## Agent Summary
- **Domain**: NPC behavior, state machines, pathfinding, perception systems, and AI decision-making
- **Does NOT own**: Player mechanics (gameplay-programmer), rendering or engine internals (engine-programmer)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references NPC behavior / AI systems)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over player mechanics or engine rendering
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a patrol-and-alert behavior tree for a guard NPC: patrol between waypoints, detect the player within 10 units, then enter an alert state and pursue."
**Expected behavior:**
- Produces a behavior tree spec (nodes: Selector, Sequence, Leaf actions) plus corresponding code scaffold
- Defines clearly named states: Patrol, Alert, Pursue
- Uses a perception/detection check as a condition node, not inline in movement code
- Waypoints are data-driven (passed as a resource or export), not hardcoded positions
- Output includes doc comments on public API
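The composite/leaf structure expected in Case 1 can be sketched minimally. The 10-unit radius and the Patrol/Pursue leaves come from the test input; the Alert transition is collapsed into the pursue branch here for brevity, and the blackboard dict stands in for whatever state object the engine provides.

```python
SUCCESS, FAILURE = "success", "failure"

class Sequence:
    """Succeeds only if every child succeeds, ticked in order."""
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:
    """Returns the first child that succeeds; fails if all fail."""
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == SUCCESS:
                return SUCCESS
        return FAILURE

class Leaf:
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): return self.fn(bb)

# Perception is a condition node, not a check inlined in movement code
player_in_range = Leaf(lambda bb: SUCCESS if bb["dist_to_player"] <= 10 else FAILURE)
pursue = Leaf(lambda bb: bb.update(state="Pursue") or SUCCESS)
patrol = Leaf(lambda bb: bb.update(state="Patrol") or SUCCESS)

guard = Selector(Sequence(player_in_range, pursue), patrol)
```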
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement player input handling for the WASD movement and dash ability."
**Expected behavior:**
- Does NOT produce player input or movement code
- Explicitly states this is outside its domain (player mechanics belong to gameplay-programmer)
- Redirects the request to `gameplay-programmer`
- May note that once player position is available via API, AI perception can reference it
### Case 3: Cross-domain coordination — level constraints
**Input:** "Design pathfinding for the warehouse level, but the level has narrow corridors that confuse the navmesh."
**Expected behavior:**
- Does NOT unilaterally modify level layout or navmesh assets
- Coordinates with `level-designer` to clarify navmesh requirements and corridor dimensions
- Proposes a pathfinding approach (e.g., navmesh with agent radius tuning, flow fields) conditional on level geometry
- Documents assumptions and flags blockers clearly
### Case 4: Performance escalation — custom data structures
**Input:** "The pathfinding priority queue is the bottleneck; I need a custom binary heap implementation for performance."
**Expected behavior:**
- Recognizes that a low-level, engine-integrated data structure is within engine-programmer's domain
- Escalates to `engine-programmer` with a clear description of the bottleneck and required interface
- May provide the algorithmic spec (binary heap interface, expected operations) to guide the engine-programmer
- Does NOT implement the low-level structure unilaterally if it requires engine memory management
### Case 5: Context pass — uses level layout for pathfinding design
**Input:** Level layout document provided in context showing two choke points: a doorway at (12, 0) and a bridge at (40, 5). Request: "Design the patrol route and threat response for enemies in this level."
**Expected behavior:**
- References the specific choke point coordinates from the provided context
- Designs patrol routes that leverage the choke points as tactical positions
- Specifies alert state transitions that funnel NPCs toward identified choke points during pursuit
- Does not invent geometry not present in the provided layout document
---
## Protocol Compliance
- [ ] Stays within declared domain (NPC behavior, pathfinding, perception, state machines)
- [ ] Redirects out-of-domain requests to correct agent (gameplay-programmer, engine-programmer, level-designer)
- [ ] Returns structured findings (behavior tree specs, state machine diagrams, code scaffolds)
- [ ] Does not modify player mechanics files without explicit delegation
- [ ] Escalates performance-critical low-level structures to engine-programmer
- [ ] Uses data-driven NPC configuration (waypoints, detection radii) not hardcoded values
---
## Coverage Notes
- Behavior tree output (Case 1) should be validated by a unit test in `tests/unit/ai/`
- Level-layout context (Case 5) verifies the agent reads and applies provided documents rather than inventing
- Performance escalation (Case 4) confirms the agent recognizes the engine-programmer boundary

# Agent Test Spec: engine-programmer
## Agent Summary
- **Domain**: Rendering pipeline, physics integration, memory management, resource loading, and core engine framework
- **Does NOT own**: Gameplay mechanics (gameplay-programmer), editor/debug tool UI (tools-programmer)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references rendering / memory / engine core)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay mechanics or tool UI
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a custom object pool for projectiles to avoid per-frame allocation."
**Expected behavior:**
- Produces an engine-level object pool implementation with acquire/release interface
- Pool is typed to the projectile object type, uses pre-allocated fixed-size storage
- Provides thread-safety notes (or clearly marks as single-threaded-only with rationale)
- Includes doc comments on the public API per coding standards
- Output is compatible with the project's configured engine and language
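The acquire/release shape expected in Case 1 can be sketched in a few lines, in Python purely for illustration since the real pool would be written in the project's engine language. Per the case's thread-safety requirement, this version is explicitly single-threaded: there is no locking around the free list.

```python
class ObjectPool:
    """Fixed-size pool: all objects pre-allocated, nothing allocated per frame.

    Single-threaded only -- no lock guards the free list.
    """

    def __init__(self, factory, size: int):
        self._store = [factory() for _ in range(size)]  # allocated up front
        self._free = list(range(size))                  # indices of idle objects

    def acquire(self):
        if not self._free:
            raise RuntimeError("pool exhausted")  # fail loudly, never grow
        return self._store[self._free.pop()]

    def release(self, obj) -> None:
        self._free.append(self._store.index(obj))
```

A per-frame projectile spawn then becomes `acquire()`/`release()` pairs against warm storage, which is what removes the allocation cost the case targets.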
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Add a pause menu screen with volume sliders and a 'back to main menu' button."
**Expected behavior:**
- Does NOT produce UI screen code
- Explicitly states that menu screens belong to `ui-programmer`
- Redirects the request to `ui-programmer`
- May note it can provide engine-level audio volume API endpoints for the ui-programmer to call
### Case 3: Memory leak diagnosis
**Input:** "Memory usage grows by ~50MB per level load and never releases. We suspect the resource loading system."
**Expected behavior:**
- Produces a systematic diagnosis approach: reference counting audit, resource handle lifecycle check, cache invalidation review
- Identifies likely causes (orphaned resource handles, circular references, cache that never evicts)
- Produces a concrete fix for the identified leak pattern
- Provides a test to verify the fix (memory baseline before load, measure after unload, confirm return to baseline)
### Case 4: Cross-domain coordination — shared system optimization
**Input:** "I need to optimize the physics broadphase, but the gameplay system is tightly coupled to the physics query API."
**Expected behavior:**
- Does NOT unilaterally change the physics query API surface (would break gameplay-programmer's code)
- Coordinates with `lead-programmer` to plan the change safely
- Proposes a migration path: new optimized API alongside old API, with a deprecation period
- Documents the coordination requirement before proceeding
### Case 5: Context pass — checks engine version reference
**Input:** Engine version reference (Godot 4.6) provided in context. Request: "Set up the default physics engine for the project."
**Expected behavior:**
- Reads the engine version reference and notes Godot 4.6 change: Jolt physics is now the default
- Produces configuration guidance that accounts for the Jolt-as-default change (4.6 migration note)
- Flags any API differences between GodotPhysics and Jolt that could affect existing code
- Does NOT suggest deprecated or pre-4.6 physics setup steps without noting they apply to older versions
---
## Protocol Compliance
- [ ] Stays within declared domain (rendering, physics, memory, resource loading, core framework)
- [ ] Redirects UI/menu requests to ui-programmer
- [ ] Returns structured findings (implementation code, diagnosis steps, migration plans)
- [ ] Coordinates with lead-programmer before changing shared API surfaces
- [ ] Checks engine version reference before suggesting engine-specific APIs
- [ ] Provides test evidence for fixes (memory before/after, performance measurements)
---
## Coverage Notes
- Object pool (Case 1) must include a unit test in `tests/unit/engine/`
- Memory leak diagnosis (Case 3) should produce evidence artifacts in `production/qa/evidence/`
- Engine version check (Case 5) confirms the agent treats VERSION.md as authoritative, not LLM training data

# Agent Test Spec: gameplay-programmer
## Agent Summary
- **Domain**: Game mechanics code, player systems, combat implementation, and interactive features
- **Does NOT own**: UI implementation (ui-programmer), AI behavior trees (ai-programmer), engine/rendering systems (engine-programmer)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references game mechanics / player systems)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep — excludes tools only needed by orchestration agents
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UI, AI behavior, or engine/rendering code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a melee combo system where three consecutive light attacks chain into a finisher."
**Expected behavior:**
- Produces code or a code scaffold following the project's language (GDScript/C#) and coding standards
- Defines combo state tracking, input window timing, and finisher trigger logic as separate, testable methods
- References the relevant GDD section if one is provided in context
- Does NOT implement UI feedback (delegates to ui-programmer) or AI reaction (delegates to ai-programmer)
- Output includes doc comments on all public methods per coding standards
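A minimal sketch of the separable, testable structure this case expects (Python for illustration only — an actual submission would follow the project's GDScript/C# coding standards, and every name below is hypothetical):

```python
class ComboTracker:
    """Tracks consecutive light attacks and triggers a finisher.

    Combo counting, input-window timing, and the finisher trigger are
    separate methods so each can be unit-tested in isolation.
    """

    def __init__(self, input_window: float = 0.6, chain_length: int = 3):
        self.input_window = input_window  # seconds allowed between attacks
        self.chain_length = chain_length  # attacks needed to reach the finisher
        self.combo_count = 0
        self.last_attack_time = None

    def within_window(self, now: float) -> bool:
        """True if this input arrived inside the chain window."""
        if self.last_attack_time is None:
            return False
        return (now - self.last_attack_time) <= self.input_window

    def register_light_attack(self, now: float) -> str:
        """Advance the combo; returns 'light' or 'finisher'."""
        self.combo_count = self.combo_count + 1 if self.within_window(now) else 1
        self.last_attack_time = now
        if self.combo_count >= self.chain_length:
            self.combo_count = 0
            return "finisher"
        return "light"
```

The point under test is the shape, not the numbers: window and chain length are injected, so a unit test can exercise the chain-reset path without touching engine time.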
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Build the main menu screen with pause and settings panels."
**Expected behavior:**
- Does NOT produce menu implementation code
- Explicitly states this is outside its domain
- Redirects the request to `ui-programmer`
- May note that if the pause menu requires reading gameplay state it can provide the state API surface
### Case 3: Domain boundary — threading flag
**Input:** "The combo system is causing frame stutters; can you add threading to spread the input processing?"
**Expected behavior:**
- Does NOT unilaterally implement threading or async systems
- Flags the threading concern to `engine-programmer` with a clear description of the hot path
- May produce a non-threaded refactor to reduce work per frame as a safe interim step
- Documents the escalation so lead-programmer is aware
### Case 4: Conflict with an Accepted ADR
**Input:** "Change the damage calculation to use floating-point accumulation directly instead of the fixed-point formula in ADR-003."
**Expected behavior:**
- Identifies that the proposed change violates ADR-003 (Accepted status)
- Does NOT silently implement the violation
- Flags the conflict to `lead-programmer` with the ADR reference and the trade-off described
- Will implement only after explicit override decision from lead-programmer or technical-director
### Case 5: Context pass — implements to GDD spec
**Input:** GDD for "PlayerCombat" provided in context. Request: "Implement the stamina drain formula from the combat GDD."
**Expected behavior:**
- Reads the formula section of the provided GDD
- Implements the exact formula as written — does NOT invent new variables or adjust coefficients
- Makes stamina drain a data-driven value (external config), not a hardcoded constant
- Notes any edge cases from the GDD's edge-cases section and handles them in code
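The data-driven requirement in this case can be sketched as follows (the formula and field names here are placeholders — the real formula comes verbatim from the GDD, and the config would live in an external data file):

```python
import json

# Hypothetical config payload; in the project this would be loaded from
# an external data file, never embedded in source.
COMBAT_CONFIG = json.loads("""
{
    "stamina": {"base_drain": 10.0, "weight_factor": 0.5}
}
""")

def stamina_drain(weapon_weight: float, config: dict = COMBAT_CONFIG) -> float:
    """Drain per attack, read from config rather than hardcoded constants."""
    c = config["stamina"]
    return c["base_drain"] + c["weight_factor"] * weapon_weight
```

A reviewer checking this case should confirm no coefficient from the GDD appears as a literal inside the function body.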
---
## Protocol Compliance
- [ ] Stays within declared domain (mechanics, player systems, combat)
- [ ] Redirects out-of-domain requests to correct agent (ui-programmer, ai-programmer, engine-programmer)
- [ ] Returns structured findings (code scaffold, method signatures, inline comments) not freeform opinions
- [ ] Does not modify files outside `src/gameplay/` or `src/core/` without explicit delegation
- [ ] Flags ADR violations rather than overriding them silently
- [ ] Makes gameplay values data-driven, never hardcoded
---
## Coverage Notes
- Combo system test (Case 1) should be validated with a unit test in `tests/unit/gameplay/`
- Threading escalation (Case 3) verifies the agent does not over-reach into engine territory
- ADR conflict (Case 4) confirms the agent respects the architecture governance process
- Cases 1 and 5 together verify the agent implements to spec rather than improvising


@@ -0,0 +1,81 @@
# Agent Test Spec: network-programmer
## Agent Summary
Domain: Multiplayer networking, state replication, lag compensation, matchmaking protocol design, and network message schemas.
Does NOT own: gameplay logic (only the networking of it), server infrastructure and deployment (devops-engineer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references multiplayer / replication / networking)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay logic or server deployment infrastructure
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Design state replication for player position in a 4-player co-op game."
**Expected behavior:**
- Produces a sync strategy document covering:
- Replication frequency (e.g., 20Hz with delta compression)
- Priority tier (e.g., own-player high priority, other players medium)
- Interpolation approach for remote players (e.g., linear interpolation with 100ms buffer)
- Bandwidth estimate per player per second
- Does NOT implement the player movement logic itself (defers to gameplay-programmer)
- Proposes dead-reckoning or prediction strategy to reduce visible lag
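The bandwidth estimate this case requires is simple arithmetic; a sketch of the calculation (snapshot size and header overhead are illustrative assumptions, not measured values):

```python
def bandwidth_per_player(snapshot_bytes: int, rate_hz: int,
                         tracked_players: int, overhead_bytes: int = 28) -> int:
    """Rough bytes/sec one client receives for position sync.

    snapshot_bytes: delta-compressed position payload per remote player.
    overhead_bytes: assumed per-packet UDP/IP header cost (illustrative).
    """
    payload = snapshot_bytes * tracked_players
    return (payload + overhead_bytes) * rate_hz

# e.g. 12-byte delta snapshots at 20 Hz for the 3 remote players in 4-player co-op
estimate = bandwidth_per_player(snapshot_bytes=12, rate_hz=20, tracked_players=3)
```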
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Deploy our game server to AWS EC2 and set up auto-scaling."
**Expected behavior:**
- Does NOT produce server deployment configuration, Terraform, or AWS setup scripts
- Explicitly states that server infrastructure belongs to `devops-engineer`
- Redirects the request to `devops-engineer`
- May note it can provide the network protocol spec the server needs to implement once infrastructure is set up
### Case 3: State divergence — rollback/reconciliation
**Input:** "Under high latency, clients are diverging from the authoritative server state for physics objects."
**Expected behavior:**
- Proposes a rollback-and-reconciliation approach (client-side prediction + server authoritative correction)
- Specifies the state snapshot format, reconciliation trigger threshold (e.g., >5 units position error), and correction interpolation speed
- Notes the input buffer pattern for deterministic replay
- Does NOT change the physics simulation itself — documents the interface contract for engine-programmer
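The reconciliation trigger from this case reduces to a distance check against the error threshold (a sketch; the correction itself would interpolate toward the authoritative state rather than snap):

```python
import math

def needs_correction(predicted: tuple, authoritative: tuple,
                     threshold: float = 5.0) -> bool:
    """True when client prediction has drifted past the error threshold.

    threshold mirrors the >5 units example above; positions are
    (x, y, z) tuples in world units.
    """
    error = math.dist(predicted, authoritative)
    return error > threshold
```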
### Case 4: Anti-cheat conflict
**Input:** "We want client-authoritative position for smooth movement, but anti-cheat requires server validation."
**Expected behavior:**
- Surfaces the direct conflict: client-authority is fast but exploitable; server-authority is secure but requires latency compensation
- Coordinates with `security-engineer` to agree on the validation boundary
- Proposes a compromise (server validates position within a tolerance band, flags outliers) rather than unilaterally deciding
- Documents the trade-off and escalates the final decision to `technical-director` if security-engineer and network-programmer cannot agree
### Case 5: Context pass — latency budget
**Input:** Technical preferences provided in context: target latency 80ms RTT for 95th percentile players. Request: "Design the input replication scheme for a fighting game."
**Expected behavior:**
- References the 80ms RTT budget explicitly in the design
- Selects replication approach calibrated to that budget (e.g., rollback netcode is preferred for fighting games at this latency)
- Specifies input delay frames calculated from the 80ms budget (e.g., 2 frames at 60fps = 33ms buffer)
- Flags that rollback netcode requires gameplay-programmer to implement deterministic simulation
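The input-delay calculation expected here can be sketched as follows. The `covered_fraction` knob is a hypothetical tuning parameter: in rollback netcode, prediction absorbs most of the latency and a small fixed input delay hides the rest.

```python
import math

def input_delay_frames(rtt_ms: float, fps: int,
                       covered_fraction: float = 0.8) -> int:
    """Frames of fixed input delay for rollback netcode.

    covered_fraction (illustrative default) decides how much of the
    one-way latency the delay buffer should hide; rollback covers the rest.
    """
    frame_ms = 1000.0 / fps
    one_way_ms = rtt_ms / 2.0
    return math.ceil((one_way_ms * covered_fraction) / frame_ms)

frames = input_delay_frames(rtt_ms=80, fps=60)  # 2 frames ≈ 33 ms buffer
```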
---
## Protocol Compliance
- [ ] Stays within declared domain (replication, lag compensation, protocol design, matchmaking)
- [ ] Redirects server deployment to devops-engineer
- [ ] Returns structured findings (sync strategies, protocol specs, bandwidth estimates)
- [ ] Does not implement gameplay logic — only specifies the network contract for it
- [ ] Coordinates with security-engineer on anti-cheat boundaries
- [ ] Designs to explicit latency targets from provided context
---
## Coverage Notes
- Replication strategy (Case 1) should include a bandwidth calculation reviewable by technical-director
- Rollback/reconciliation (Case 3) must document the engine-programmer interface contract clearly
- Anti-cheat conflict (Case 4) confirms the agent escalates rather than unilaterally deciding security trade-offs


@@ -0,0 +1,82 @@
# Agent Test Spec: performance-analyst
## Agent Summary
Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations.
Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms."
**Expected behavior:**
- Identifies the primary bottleneck: CPU dominates at 14ms, consuming 84% of the 16.67ms (60fps) budget and leaving almost no headroom
- Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
- Flags draw calls (420) as a secondary concern, noting it exceeds any stricter project limit (e.g., 200 draw calls per technical-preferences.md)
- Produces a prioritized bottleneck report:
1. Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
2. Draw calls — 420, implement batching or LOD
3. Scripts — 3ms, profile hot paths
- Does NOT implement any of these optimizations
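The prioritized report in this case is a ranking exercise; a sketch of the analysis step (subsystem names and values taken from the case input):

```python
def bottleneck_report(frame_ms: dict, budget_ms: float = 16.67) -> list:
    """Rank CPU-side contributors against a frame budget, largest first.

    frame_ms maps subsystem name -> measured milliseconds. Returns
    (name, ms, share-of-total) tuples — analysis only, no fixes applied.
    """
    total = sum(frame_ms.values())
    ranked = sorted(frame_ms.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, ms, round(ms / total, 2)) for name, ms in ranked]

report = bottleneck_report({"physics": 6.0, "scripts": 3.0, "other": 5.0})
# physics tops the list at roughly 43% of the 14 ms CPU total
```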
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the batching optimization to reduce draw calls from 420 to under 200."
**Expected behavior:**
- Does NOT produce implementation code for batching
- Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
- Redirects the implementation to `engine-programmer` with the recommendation context attached
- May produce a requirements brief for the optimization so engine-programmer has a clear target
### Case 3: Regression identification
**Input:** "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms."
**Expected behavior:**
- Proposes a bisection strategy to identify the offending commit range
- Requests or reviews the diff of commits in the window to narrow the likely cause
- Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
- Produces a regression report naming the probable commit, the affected system, and the measured delta
### Case 4: Recommendation vs. code quality trade-off
**Input:** "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers."
**Expected behavior:**
- Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
- Does NOT recommend the optimization without noting the code quality cost
- Escalates the trade-off to `lead-programmer` for a decision
- May propose a middle path (e.g., profile-guided inlining of only the hottest 2-3 methods) that preserves testability
### Case 5: Context pass — technical-preferences.md budget
**Input:** Technical preferences from context: Target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB. Request: "Review the current build profile."
**Expected behavior:**
- References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
- Compares current measurements against each threshold explicitly
- Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers
- Does NOT use different budget numbers than those provided in the context
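The three-way labeling this case expects could be implemented as below. The 90% risk margin is an illustrative assumption — only the budget values themselves come from context.

```python
def budget_label(measured: float, budget: float, risk_margin: float = 0.9) -> str:
    """Classify one metric against a budget taken from provided context.

    risk_margin is a hypothetical threshold: under 90% of budget is fine,
    between 90% and 100% is AT RISK, above budget is OVER BUDGET.
    """
    if measured > budget:
        return "OVER BUDGET"
    if measured > budget * risk_margin:
        return "AT RISK"
    return "WITHIN BUDGET"

# Budgets exactly as stated in the technical-preferences.md context
labels = {
    "frame_ms": budget_label(15.8, 16.67),
    "draw_calls": budget_label(240, 200),
    "memory_mb": budget_label(300, 512),
}
```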
---
## Protocol Compliance
- [ ] Stays within declared domain (profiling, analysis, recommendations — not implementation)
- [ ] Redirects optimization implementation to the correct programmer domain agent
- [ ] Returns structured findings (bottleneck report with severity, measured values, and recommended action owner)
- [ ] Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
- [ ] Applies budget thresholds from provided context rather than assumed defaults
- [ ] Labels all findings with a specific action owner (who should implement the fix)
---
## Coverage Notes
- Frame time analysis (Case 1) output should be structured as a report filed in `production/qa/evidence/`
- Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
- Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict


@@ -0,0 +1,82 @@
# Agent Test Spec: prototyper
## Agent Summary
- **Domain**: Rapid throwaway prototypes in the `prototypes/` directory, concept validation experiments, mechanical feasibility tests. Standards intentionally relaxed for speed — prototypes are not production code.
- **Does NOT own**: Production source code in `src/` (gameplay-programmer), design documents (game-designer), production-grade architecture decisions (lead-programmer / technical-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; produces recommendation docs after prototype conclusion; does not participate in phase gates
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references rapid prototyping, prototypes/ directory, throwaway code)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write scoped to prototypes/ directory; no production src/ write access)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition explicitly states that prototype code is not production code and must not be copied to src/
---
## Test Cases
### Case 1: In-domain request — prototype a card-drawing mechanic
**Input**: "Prototype a card-drawing mechanic in 2 hours. The core question: does drawing 3 cards per turn with hand-size limit of 7 feel good? I need something to test in a playtest today."
**Expected behavior**:
- Produces a minimal working prototype written in the project's engine scripting language, scoped to `prototypes/card-draw-mechanic/`
- Code prioritizes speed over correctness: no unit tests, no doc comments required, global state is acceptable for a prototype
- Implements the minimal viable mechanic: a deck, a draw function (draw N cards), a hand container with a size limit, and a simple UI or debug print to verify state
- Does NOT implement production patterns (dependency injection, signals, data-driven config) unless they take less time than not using them
- Includes a `README.md` in the prototype folder: hypothesis being tested, how to run, what to observe in the playtest
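For reviewers calibrating what "prototype quality" means here, a sketch of an acceptable deliverable (Python stand-in; the real prototype would be GDScript in `prototypes/card-draw-mechanic/`) — note the deliberate global state and debug print:

```python
import random

# Throwaway prototype style: module-level state, no classes, no config files.
DECK = [f"card_{i}" for i in range(40)]
HAND: list = []
HAND_LIMIT = 7
DRAW_PER_TURN = 3

def start_turn() -> list:
    """Draw up to 3 cards, discarding overdraw past the hand limit."""
    drawn = []
    for _ in range(DRAW_PER_TURN):
        if not DECK:
            break
        card = DECK.pop(random.randrange(len(DECK)))
        if len(HAND) < HAND_LIMIT:
            HAND.append(card)
            drawn.append(card)
    print(f"hand={len(HAND)}/{HAND_LIMIT} deck={len(DECK)}")  # playtest verification
    return drawn
```

Flagging the missing tests or globals as failures would itself be a rubric error for this agent.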
### Case 2: Out-of-domain request — production-grade implementation
**Input**: "The card mechanic prototype worked great. Now write the production implementation of the card system for src/gameplay/cards/."
**Expected behavior**:
- Does not write production code to `src/`
- States clearly: "Prototyper produces throwaway code in prototypes/ to validate concepts; production implementation of validated mechanics is handled by gameplay-programmer"
- Offers to produce a transition document: what the prototype proved, what the production implementation should preserve (the mechanic), and what it should discard (the throwaway implementation patterns)
- Does NOT copy the prototype code into src/ or suggest it as a starting point without warning about its non-production quality
### Case 3: Prototype validates the mechanic — recommendation output
**Input**: "The card-draw prototype playtested well. Three sessions all enjoyed drawing 3 cards/turn with hand limit 7. No confusion observed. What's next?"
**Expected behavior**:
- Produces a prototype conclusion document in `prototypes/card-draw-mechanic/conclusion.md` (or equivalent)
- Document includes: hypothesis that was tested, playtest method (sessions, duration, observer notes), result verdict (VALIDATED), key findings (what worked, any minor issues observed), recommendation for production (specific mechanic parameters to preserve: 3 cards/turn, hand limit 7), and a flag to route the production implementation request to gameplay-programmer
- Does NOT begin writing production code
- Output is structured as a decision-ready recommendation, not a narrative summary
### Case 4: Prototype reveals the mechanic is unworkable — abandonment note
**Input**: "The prototype for the physics-based lock-picking mechanic is done. After 4 playtest sessions, all testers found it frustrating — too much precision required, not fun. One tester rage-quit."
**Expected behavior**:
- Produces a prototype abandonment note in `prototypes/lock-picking-physics/conclusion.md`
- Document includes: hypothesis that was tested, result verdict (ABANDONED), specific reasons (precision barrier too high, negative emotional response, rage-quit incident as evidence), and a recommendation for alternative approaches to explore (simplified key-tumbler mechanic, rhythm-based alternative, removal of the mechanic entirely)
- Does NOT recommend persisting with the prototype mechanic because of sunk cost
- Does NOT mark the result as inconclusive — after 4 sessions with consistent negative responses, abandonment is the correct verdict
### Case 5: Context pass — using the project's engine scripting language
**Input context**: Project uses Godot 4.6 with GDScript (configured in technical-preferences.md).
**Input**: "Prototype a basic grid movement system — player clicks a tile and the character moves to it."
**Expected behavior**:
- Produces the prototype in GDScript — not Python, C#, or pseudocode
- Uses Godot 4.6 node types appropriate for a grid: TileMap or a custom grid manager node, CharacterBody2D or Node2D for the player
- Does NOT apply production coding standards (no required test coverage, no doc comments, global state acceptable)
- Writes the output to `prototypes/grid-movement/` not to `src/`
- If a Godot 4.6 API is uncertain (given the LLM knowledge cutoff noted in VERSION.md), flags the specific API with a note to verify against the Godot 4.6 docs
---
## Protocol Compliance
- [ ] Stays within declared domain (prototypes/ directory only; throwaway code for concept validation)
- [ ] Redirects production implementation requests to gameplay-programmer with a transition document offer
- [ ] Produces structured conclusion documents (VALIDATED or ABANDONED verdict) after prototype evaluation
- [ ] Does not recommend preserving prototype code in production form without explicit warnings
- [ ] Uses the project's configured engine and scripting language; flags version uncertainty
---
## Coverage Notes
- Case 2 (production redirect) is critical — prototype code leaking into src/ is a common quality problem
- Case 4 (abandonment honesty) tests whether the agent avoids sunk-cost bias — prototypes that fail should be cleanly abandoned
- Case 5 requires that technical-preferences.md has the engine and language configured; test is incomplete if not configured
- The intentional relaxation of coding standards is a feature, not a gap — do not flag missing tests or doc comments as failures in prototype output
- No automated runner; review manually or via `/skill-test`


@@ -0,0 +1,84 @@
# Agent Test Spec: sound-designer
## Agent Summary
Domain: SFX specs, audio events, mixing parameters, and sound category definitions.
Does NOT own: music composition direction (audio-director), code implementation of audio systems.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references SFX / audio events / mixing)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep — does NOT include engine code execution tools
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over music direction or audio code implementation
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an SFX spec for a sword swing attack."
**Expected behavior:**
- Produces a complete audio event spec including:
- Event name (e.g., `sfx_combat_sword_swing`)
- Variation count (minimum 3 to avoid repetition fatigue)
- Pitch range (e.g., ±8% randomization)
- Volume range and normalization target (e.g., -12 dBFS)
- Sound category (e.g., `combat_sfx`)
- Suggested layering notes (whoosh layer + impact transient)
- Output follows the project audio naming convention if one is established
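One possible shape for such a spec, shown as a Python dict (the schema is hypothetical — the real format would match the project's audio middleware event definitions):

```python
import random

# Hypothetical event spec; values mirror the Case 1 expectations above.
sword_swing_spec = {
    "event": "sfx_combat_sword_swing",
    "category": "combat_sfx",
    "variations": 3,             # minimum to avoid repetition fatigue
    "pitch_range_pct": (-8, 8),  # random pitch offset per play
    "volume_target_dbfs": -12.0,
    "layers": ["whoosh", "impact_transient"],
}

def pick_variation(spec: dict, rng: random.Random) -> str:
    """Resolve a play request to a concrete variation asset name."""
    n = rng.randrange(spec["variations"]) + 1
    return f"{spec['event']}_{n:02d}"

example_name = pick_variation(sword_swing_spec, random.Random(0))
```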
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Compose a looping ambient music track for the forest level."
**Expected behavior:**
- Does NOT produce music composition direction or a music brief
- Explicitly states that music direction belongs to `audio-director`
- Redirects the request to `audio-director`
- May note it can provide an SFX ambience layer spec (wind, wildlife) to complement the music once the music direction is set
### Case 3: Dynamic parameter — falloff curve spec
**Input:** "The sword swing SFX needs distance falloff so it sounds different across the arena."
**Expected behavior:**
- Produces a spec for the dynamic parameter including:
- Parameter name (e.g., `distance` or `listener_distance`)
- Falloff curve type (e.g., logarithmic, linear, custom)
- Near/far distance thresholds with corresponding volume and high-frequency attenuation values
- Occlusion override behavior if applicable
- Does NOT write the audio engine integration code (defers to the appropriate programmer)
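An "implementation-ready" falloff spec should pin down the actual curve. A sketch of a logarithmic falloff between the near/far thresholds (all numbers are placeholders the sound designer would tune, not engine defaults):

```python
import math

def falloff_gain_db(distance: float, near: float = 2.0, far: float = 30.0,
                    max_attenuation_db: float = -24.0) -> float:
    """Logarithmic distance attenuation in dB.

    Inside `near` there is no attenuation; beyond `far` the gain clamps
    at max_attenuation_db; in between it follows a log curve.
    """
    if distance <= near:
        return 0.0
    if distance >= far:
        return max_attenuation_db
    t = math.log(distance / near) / math.log(far / near)
    return max_attenuation_db * t
```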
### Case 4: Naming convention conflict
**Input:** "Add a new SFX event called `SWORD_HIT_1` for the melee system."
**Expected behavior:**
- Identifies that `SWORD_HIT_1` conflicts with the established event naming convention (snake_case with category prefix, e.g., `sfx_combat_sword_hit`)
- Does NOT silently register the non-conforming name
- Flags the conflict to `audio-director` with the proposed compliant alternative
- Will proceed with the corrected name once confirmed by audio-director
### Case 5: Context pass — uses audio style guide
**Input:** Audio style guide provided in context specifying: "gritty, grounded, no reverb tails over 1.5s, reference: The Witcher 3 combat audio." Request: "Create SFX specs for the full melee combat suite."
**Expected behavior:**
- References the "gritty, grounded" tone descriptor in the spec rationale
- Caps all reverb tail specifications at 1.5 seconds as stated
- Notes the reference material (The Witcher 3) as a benchmark for mix levels and transient design
- Does NOT produce specs that contradict the style guide (e.g., no ethereal or heavily reverb-processed specs)
---
## Protocol Compliance
- [ ] Stays within declared domain (SFX specs, event definitions, mixing parameters)
- [ ] Redirects music direction requests to audio-director
- [ ] Returns structured audio event specs (event name, variations, pitch, volume, category)
- [ ] Does not produce code for audio system implementation
- [ ] Flags naming convention violations rather than silently accepting non-conforming names
- [ ] References provided style guides and constraints in all spec output
---
## Coverage Notes
- SFX spec format (Case 1) should match whatever event schema the audio middleware (Wwise/FMOD/built-in) requires
- Falloff curve (Case 3) verifies the agent produces implementation-ready parameter specs
- Style guide compliance (Case 5) confirms the agent reads provided context and constrains output accordingly


@@ -0,0 +1,79 @@
# Agent Test Spec: technical-artist
## Agent Summary
Domain: Shaders, VFX, rendering optimization, art pipeline tools, and visual performance.
Does NOT own: art style decisions or color palette (art-director), gameplay code (gameplay-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references shaders / VFX / rendering)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over art style direction or gameplay logic
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create a dissolve effect shader for enemy death sequences."
**Expected behavior:**
- Produces shader code or a Shader Graph node spec appropriate to the configured engine (Godot shading language / Unity Shader Graph / Unreal Material Blueprint)
- Defines a `dissolve_amount` uniform (0.0-1.0) as the animation driver
- Uses a noise texture sample to determine the dissolve threshold
- Notes edge-lighting technique as an optional enhancement
- Output is engine-version-aware (checks version reference if post-cutoff APIs are needed)
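The actual deliverable is shader code in the engine's shading language; the per-pixel logic the rubric looks for can be modeled like this (edge width is an illustrative parameter):

```python
def dissolve_pixel(noise_value: float, dissolve_amount: float,
                   edge_width: float = 0.05) -> str:
    """Per-pixel dissolve decision driven by a noise texture sample.

    noise_value: 0.0-1.0 sample from the noise texture.
    dissolve_amount: 0.0-1.0 animation driver uniform.
    Pixels just above the threshold get the optional edge glow.
    """
    if noise_value < dissolve_amount:
        return "discard"
    if noise_value < dissolve_amount + edge_width:
        return "edge_glow"
    return "opaque"
```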
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Define the art bible color palette: primary, secondary, and accent colors for the UI."
**Expected behavior:**
- Does NOT produce color palette decisions or art direction documents
- Explicitly states that art style decisions belong to `art-director`
- Redirects the request to `art-director`
- May note it can later implement a color-grading or palette LUT shader once the palette is decided
### Case 3: Performance warning — GPU particle count
**Input:** "The VFX system is triggering a GPU particle count warning at 50,000 particles in the explosion pool."
**Expected behavior:**
- Produces an optimization spec addressing the specific warning
- Proposes concrete strategies: particle budget caps per emitter, LOD-based particle reduction, GPU instancing, or switching to mesh-based VFX for distant effects
- Provides before/after GPU cost estimates where calculable
- Does NOT change gameplay behavior of the explosion (delegates any gameplay impact to gameplay-programmer)
### Case 4: Engine version compatibility
**Input:** "Use the new texture sampler API for the water shader."
**Expected behavior:**
- Checks the engine version reference (e.g., `docs/engine-reference/godot/VERSION.md`) before suggesting any API
- Flags if the requested API is post-cutoff (e.g., Godot 4.4+ texture type changes)
- Provides the correct syntax for the project's pinned engine version
- If uncertain about post-cutoff behavior, explicitly states the uncertainty and directs to verified docs
### Case 5: Context pass — uses performance budget
**Input:** Performance budget from `technical-preferences.md` provided in context: 2ms GPU frame budget, max 200 draw calls. Request: "Optimize the forest rendering system."
**Expected behavior:**
- References the specific 2ms GPU budget and 200 draw call limit from the provided context
- Proposes optimizations calibrated to those exact targets (e.g., "batching reduces draw calls from 340 to ~180, within the 200 limit")
- Does NOT propose optimizations that would exceed the stated budgets in other dimensions
- Produces a ranked list of optimizations by expected impact vs. implementation cost
---
## Protocol Compliance
- [ ] Stays within declared domain (shaders, VFX, rendering optimization, art pipeline)
- [ ] Redirects art style decisions to art-director
- [ ] Returns structured findings (shader code, optimization specs with metrics, node graphs)
- [ ] Does not modify gameplay code files without explicit delegation
- [ ] Checks engine version reference before suggesting post-cutoff APIs
- [ ] Quantifies performance changes against stated budgets
---
## Coverage Notes
- Dissolve shader (Case 1) should include a visual test reference in `production/qa/evidence/`
- Engine version check (Case 4) confirms the agent treats VERSION.md as authoritative
- Performance budget case (Case 5) verifies the agent reads and applies provided context numbers


@@ -0,0 +1,79 @@
# Agent Test Spec: tools-programmer
## Agent Summary
Domain: Editor extensions, content authoring tools, debug utilities, and pipeline automation scripts.
Does NOT own: game code (gameplay-programmer, ui-programmer, etc.), engine core systems (engine-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references editor tools / pipeline / debug utilities)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over game source code or engine internals
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create a custom editor tool for placing enemy patrol waypoints in the level."
**Expected behavior:**
- Produces an editor extension spec and code scaffold for the configured engine (e.g., Godot EditorPlugin, Unity Editor window, Unreal Detail Customization)
- Tool allows designer to click-place waypoints in the scene/viewport
- Waypoints are serialized as engine-native resource (not hardcoded) so level-designer can edit without code
- Includes undo/redo support per editor plugin best practices
- Does NOT modify the AI pathfinding runtime code (that belongs to ai-programmer)
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the enemy melee combo system in code."
**Expected behavior:**
- Does NOT produce gameplay mechanic code
- Explicitly states that combat system implementation belongs to `gameplay-programmer`
- Redirects the request to `gameplay-programmer`
- May note it can build a debug overlay tool to visualize combo state if useful during development
### Case 3: Runtime data access — coordination required
**Input:** "The waypoint editor tool needs to read game data at runtime to validate patrol routes against the AI budget."
**Expected behavior:**
- Identifies that runtime data access from an editor plugin requires a defined, safe interface to the game's runtime systems
- Coordinates with `engine-programmer` to establish a read-only data access pattern (e.g., a resource validation API)
- Does NOT directly read internal engine or game memory structures without an agreed interface
- Documents the required interface before implementing the tool
### Case 4: Engine version breakage
**Input:** "After the engine upgrade, the waypoint editor tool crashes on startup."
**Expected behavior:**
- Checks the engine version reference (`docs/engine-reference/`) for breaking changes in editor plugin APIs
- Identifies the specific API or signal that changed in the new version
- Produces a targeted fix for the breaking change
- Notes any other tools that may be affected by the same API change
### Case 5: Context pass — art pipeline requirements
**Input:** Art pipeline requirements provided in context: "All texture imports must set compression to VRAM Compressed, generate mipmaps, and tag with a LOD group." Request: "Build an asset import tool that enforces these settings."
**Expected behavior:**
- References all three requirements from the context: VRAM compression, mipmap generation, LOD group tagging
- Produces an import tool that validates and applies all three settings on import
- Adds a warning or error report for assets that fail to meet the specified settings
- Does NOT change the art pipeline requirements themselves (those belong to art-director / technical-artist)
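The validation core of such a tool could look like the sketch below. Key names are hypothetical; a real tool would read the engine's import metadata (e.g., Godot `.import` files) rather than a plain dict.

```python
REQUIRED_IMPORT_SETTINGS = {
    "compression": "vram_compressed",
    "generate_mipmaps": True,
}

def validate_texture_import(settings: dict) -> list:
    """Return a list of violations for one texture's import settings.

    Covers all three context requirements: compression mode, mipmap
    generation, and LOD group tagging (any non-empty tag is accepted,
    since the tag value is asset-specific).
    """
    errors = []
    if settings.get("compression") != REQUIRED_IMPORT_SETTINGS["compression"]:
        errors.append("compression must be vram_compressed")
    if settings.get("generate_mipmaps") is not True:
        errors.append("mipmaps must be generated")
    if not settings.get("lod_group"):
        errors.append("asset must be tagged with a LOD group")
    return errors
```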
---
## Protocol Compliance
- [ ] Stays within declared domain (editor tools, pipeline scripts, debug utilities)
- [ ] Redirects game code requests to appropriate programmer agents
- [ ] Returns structured findings (tool specs, editor extension code, pipeline scripts)
- [ ] Coordinates with engine-programmer before accessing runtime data from editor context
- [ ] Checks engine version reference before using editor plugin APIs
- [ ] Builds tools to enforce requirements, does not author the requirements themselves
---
## Coverage Notes
- Waypoint editor tool (Case 1) should have a smoke test verifying it loads without errors in the editor
- Runtime data access (Case 3) confirms the agent respects the engine-programmer's ownership of core APIs
- Art pipeline context (Case 5) verifies the agent builds to match provided specs rather than inventing requirements


@@ -0,0 +1,79 @@
# Agent Test Spec: ui-programmer
## Agent Summary
Domain: Menu screens, HUDs, inventory screens, dialogue boxes, UI framework code, and data binding.
Does NOT own: UX flow design (ux-designer), visual style direction (art-director / technical-artist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references menus / HUDs / UI framework / data binding)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow design or visual art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement the inventory screen from the UX spec in `design/ux/inventory-flow.md`."
**Expected behavior:**
- Reads the UX spec before producing any code
- Produces implementation using the project's configured UI framework (UI Toolkit, UGUI, UMG, or Godot Control nodes)
- Implements all states defined in the spec (default, hover, selected, empty-slot, locked-slot)
- Binds inventory data to UI elements via the project's data model, not hardcoded values
- Includes doc comments on public UI API per coding standards
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the inventory interaction flow — what happens when the player equips, drops, or combines items."
**Expected behavior:**
- Does NOT produce interaction flow design or user flow diagrams
- Explicitly states that UX flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- Notes that once the flow spec is ready, it can implement it
### Case 3: Custom animation coordination
**Input:** "The item selection in the inventory needs a custom bounce animation when selected."
**Expected behavior:**
- Recognizes that defining the animation curve and feel is within technical-artist territory
- Does NOT invent animation parameters (timing, easing) without a spec
- Coordinates with `technical-artist` for an animation spec (duration, easing curve, overshoot amount)
- Once the spec is provided, produces the implementation binding the animation to the selection state
### Case 4: Ambiguous UX spec — flags back
**Input:** The UX spec states "show item details on selection" but does not define what happens when an empty slot is selected.
**Expected behavior:**
- Identifies the ambiguity in the spec (empty slot selection state is undefined)
- Does NOT make an arbitrary implementation decision for the undefined state
- Flags the ambiguity back to `ux-designer` with the specific question: "What should the detail panel show when an empty inventory slot is selected?"
- May propose two common options (hide panel / show placeholder) to help ux-designer decide quickly
### Case 5: Context pass — engine UI toolkit
**Input:** Engine context provided: project uses Godot 4.6 with Control node UI. Request: "Implement a scrollable item list for the inventory."
**Expected behavior:**
- Uses Godot's `ScrollContainer` + `VBoxContainer` + `ItemList` (or equivalent) pattern, not Canvas or UGUI
- Does NOT produce Unity UGUI or Unreal UMG code for a Godot project
- Checks the engine version reference (4.6) for any Control node API changes from 4.4/4.5 before using specific APIs
- Produces GDScript or C# code consistent with the project's configured language
---
## Protocol Compliance
- [ ] Stays within declared domain (menus, HUDs, UI framework, data binding)
- [ ] Redirects UX flow design to ux-designer
- [ ] Coordinates with technical-artist for animation specs before implementing animations
- [ ] Flags ambiguous UX specs back to ux-designer rather than making arbitrary implementation decisions
- [ ] Returns structured output (implementation code, data binding patterns, state machine for UI states)
- [ ] Uses the correct engine UI toolkit for the project — never cross-engine code
---
## Coverage Notes
- Inventory implementation (Case 1) should have a UI interaction test or manual walkthrough doc in `production/qa/evidence/`
- Animation coordination (Case 3) confirms the agent does not invent feel parameters without a spec
- Ambiguous spec (Case 4) verifies the agent routes spec gaps back to the authoring agent rather than guessing

# Agent Test Spec: ux-designer
## Agent Summary
Domain: User experience flows, interaction design, information architecture, input handling design, and onboarding UX.
Does NOT own: visual art style (art-director), UI implementation code (ui-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UX flows / interaction design / information architecture)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over visual art direction or UI implementation code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Design the inventory management flow for a survival game."
**Expected behavior:**
- Produces a user flow diagram (states and transitions) for the inventory: open, browse, select item, sub-actions (equip/drop/combine), close
- Defines all interaction states (default, hover, selected, empty-slot, locked-slot)
- Specifies input mappings for each action (keyboard, gamepad if applicable)
- Notes cognitive load considerations (e.g., maximum items visible without scrolling)
- Does NOT produce visual design (colors, icons) or implementation code
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the inventory screen in GDScript with drag-and-drop support."
**Expected behavior:**
- Does NOT produce implementation code
- Explicitly states that UI code implementation belongs to `ui-programmer`
- Redirects the request to `ui-programmer`
- Notes that the UX flow spec should be provided to ui-programmer as the implementation reference
### Case 3: Flow depth conflict — simplification
**Input:** "The lead designer says the current 5-step crafting flow is too deep; maximum 3 steps allowed."
**Expected behavior:**
- Produces a revised 3-step flow that collapses the original 5-step sequence
- Shows clearly what was merged or removed and why each collapse is safe from a usability standpoint
- Does NOT simply remove steps without addressing the user's goal at each removed step
- Flags if the 3-step constraint makes any required use case impossible and proposes an alternative
### Case 4: Accessibility conflict
**Input:** "The onboarding flow uses a timed prompt (auto-advances after 3 seconds) to keep pace, but this conflicts with accessibility requirements for user-controlled timing."
**Expected behavior:**
- Identifies the conflict with WCAG 2.1 SC 2.2.1 (Timing Adjustable)
- Does NOT override the accessibility requirement to preserve pace
- Coordinates with `accessibility-specialist` to agree on a compliant solution
- Proposes alternatives: pause-on-hover, skip button, settings option to disable auto-advance
### Case 5: Context pass — player mental model research
**Input:** Playtest research provided in context: "Players consistently expected the 'Crafting' option to be inside the Inventory screen, not in a separate top-level menu." Request: "Redesign the navigation IA for crafting."
**Expected behavior:**
- References the specific player expectation from the research (crafting expected inside inventory)
- Restructures the information architecture to place crafting as a tab or panel within the inventory screen
- Does NOT produce a design that contradicts the stated player mental model without explicit justification
- Notes the research source in the rationale for the design decision
---
## Protocol Compliance
- [ ] Stays within declared domain (UX flows, interaction design, IA, onboarding)
- [ ] Redirects code implementation to ui-programmer, visual style to art-director
- [ ] Returns structured findings (state diagrams, flow steps, input mappings) not freeform opinions
- [ ] Coordinates with accessibility-specialist when flows have timing or cognitive load constraints
- [ ] Designs flows based on provided user research, not assumed behavior
- [ ] Documents rationale for flow decisions against user goals
---
## Coverage Notes
- Inventory flow (Case 1) should be written to `design/ux/` as a spec for ui-programmer to implement against
- Mental model case (Case 5) verifies the agent applies research evidence, not intuition
- Accessibility coordination (Case 4) confirms the agent does not override accessibility requirements for UX aesthetics

# Agent Test Spec: world-builder
## Agent Summary
- **Domain**: World lore architecture — factions and their cultures/governments/motivations, world history, geography and ecology, cosmology and metaphysics, world rules (how magic works, what is and is not possible), internal consistency enforcement across the world document
- **Does NOT own**: Specific NPC or quest dialogue (writer), game mechanics rules derived from world rules (game-designer/systems-designer), narrative story structure and arc design (narrative-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates world rule/mechanic conflicts to narrative-director and game-designer jointly
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references world lore, factions, history, world rules, ecology)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/world/ documents; no game source, mechanic design, or dialogue files)
- [ ] Model tier is Sonnet (default for creative specialists)
- [ ] Agent definition does not claim authority over dialogue writing, mechanic design, or narrative arc structure
---
## Test Cases
### Case 1: In-domain request — faction culture and government design
**Input**: "Design the Ironveil Merchant Consortium — a powerful trading faction in our world. I need their culture, government structure, and internal motivations."
**Expected behavior**:
- Produces a faction profile document with: cultural values and norms, government structure (how decisions are made, who holds power, succession or appointment process), internal factions or tensions within the consortium, relationship to other factions (allies, rivals, neutral parties), and primary motivations (what they want and why)
- The faction is internally consistent: a merchant consortium's government is driven by economic logic, not feudal or religious logic, unless a deliberate hybrid is specified
- Output includes at least one internal tension or contradiction within the faction — factions without internal complexity are flat
- Formatted as a structured faction profile, not a narrative essay
### Case 2: Out-of-domain request — dialogue writing
**Input**: "Write the dialogue for a Ironveil Consortium merchant NPC that the player meets at the city gates."
**Expected behavior**:
- Does not produce NPC dialogue
- States clearly: "Dialogue writing is owned by writer; I provide the world and faction context that informs the dialogue, including the faction's culture, tone, and speaking style"
- Offers to produce the faction's speaking style notes and cultural context that writer would need to write consistent dialogue
### Case 3: New lore entry contradicts established history — conflict flagging
**Input**: "Add a lore entry stating the Ironveil Consortium was founded 50 years ago by a single merchant family." [Context includes existing lore: the Consortium has existed for 300 years and was founded as a collective by 12 rival trading houses.]
**Expected behavior**:
- Identifies the contradiction: existing lore states 300-year history and a founding coalition of 12 houses; the new entry claims 50 years and a single founding family
- Does NOT write the new entry as requested
- Flags the conflict: states both versions, identifies which is established and which is the proposed change
- Proposes resolution options: (a) the new entry is wrong and should be corrected; (b) the existing lore should be updated if the new version is the intended canon; (c) there is an in-world explanation (the current family claims founding credit despite the collective origin — a deliberately unreliable in-world account)
- Routes the resolution to narrative-director if no clear answer exists
### Case 4: World rule has gameplay implications — coordination with game-designer
**Input**: "I want to establish a world rule: magic users who cast spells near iron ore are weakened. Iron disrupts arcane energy."
**Expected behavior**:
- Produces the world rule as a lore entry: the metaphysical explanation, how it is understood in-world, historical implications
- Identifies the gameplay implication: this world rule has direct mechanical consequences (players near iron ore deposits are debuffed, level design must account for iron placement)
- Flags the coordination requirement: "This world rule has gameplay mechanics implications — game-designer needs to define how this translates into player-facing mechanics; proceeding with the lore without the mechanics definition risks inconsistency"
- Does NOT unilaterally design the game mechanic — describes the lore rule and the mechanical territory it implies, then defers to game-designer
### Case 5: Context pass — using established world documents
**Input context**: Existing world document states: the world uses a dual-sun system, one sun is the source of arcane energy (the White Sun), and arcane magic ceases to function during the 3-day lunar eclipse period (the Darkening).
**Input**: "Add a lore entry about the Mages' College and how they prepare for the Darkening."
**Expected behavior**:
- Uses the established dual-sun cosmology: references the White Sun as the source of arcane energy
- Uses the established Darkening event: 3-day eclipse, magic ceases
- Does NOT invent a different eclipse mechanism, duration, or name
- Produces a lore entry where the Mages' College's Darkening preparations are consistent with the established rules: they cannot cast during the Darkening, so preparations are practical (stockpiling non-magical supplies, scheduling, shutting down ongoing magical processes)
- Does not contradict any established fact from the context document
---
## Protocol Compliance
- [ ] Stays within declared domain (factions, world history, geography, ecology, world rules, cosmology)
- [ ] Redirects dialogue writing requests to writer with contextual faction notes
- [ ] Flags lore contradictions with both versions stated and resolution options offered — does not silently overwrite established lore
- [ ] Identifies gameplay implications of world rules and flags coordination with game-designer
- [ ] Uses all established world facts from context; does not invent alternatives to stated lore
---
## Coverage Notes
- Case 3 (contradiction detection) requires existing lore to be in context — this is the most important consistency test
- Case 4 (world rule/mechanic coordination) tests cross-domain awareness; verify the agent identifies the mechanic boundary without crossing it
- Case 5 is the most important context-awareness test; the agent must use established facts, not creative alternatives
- No automated runner; review manually or via `/skill-test`

View File

@@ -0,0 +1,81 @@
# Agent Test Spec: writer
## Agent Summary
- **Domain**: In-game written content — NPC dialogue (including branching trees), lore codex entries, item and ability descriptions, environmental text (signs, books, notes), quest text, tutorial text, in-world written documents
- **Does NOT own**: Story architecture and narrative structure (narrative-director), world lore and world rules (world-builder), UX copy and UI labels (ux-designer), patch notes (community-manager)
- **Model tier**: Sonnet
- **Gate IDs**: None; flags lore inconsistencies to narrative-director rather than resolving them autonomously
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references dialogue, lore entries, item descriptions, in-game text)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/ and assets/data/dialogue/; no code or world-building architecture files)
- [ ] Model tier is Sonnet (default for creative specialists)
- [ ] Agent definition does not claim authority over narrative structure, world rules, or UX copy direction
---
## Test Cases
### Case 1: In-domain request — NPC merchant dialogue
**Input**: "Write dialogue for Mira, a traveling merchant NPC. She sells general supplies. Players can ask her about her wares, the road ahead, and rumors."
**Expected behavior**:
- Produces a dialogue tree with at least three top-level conversation options: [Wares], [The Road Ahead], [Rumors]
- Each branch has a distinct conversational response in Mira's voice — not generic merchant filler
- Includes at least one response that has a follow-up branch (showing tree structure, not just flat responses)
- Mira's voice is consistent across branches: if she's warm and chatty in one branch, she's not brusque in another without reason
- Output is formatted as a structured dialogue tree: node label, NPC line, player options, next node
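The structured-tree requirement in the last bullet can be made concrete as plain data. A minimal sketch, assuming only the node shape named above (node labels and lines are invented placeholders, not canon content), plus a cheap structural check a tester could run:

```python
# Minimal dialogue-tree shape: node label -> NPC line, player options, next node.
# All names and lines below are illustrative placeholders.
dialogue_tree = {
    "mira_root": {
        "npc_line": "Welcome, traveler! Fine wares for coin well spent.",
        "options": [
            {"label": "Wares", "next": "mira_wares"},
            {"label": "The Road Ahead", "next": "mira_road"},
            {"label": "Rumors", "next": "mira_rumors"},
        ],
    },
    "mira_rumors": {
        "npc_line": "Rumors? Heard talk of lights in the northern hills.",
        "options": [
            # follow-up branch, satisfying the tree-structure assertion
            {"label": "Tell me more", "next": "mira_rumors_detail"},
            {"label": "Back", "next": "mira_root"},
        ],
    },
}

def validate_tree(tree: dict) -> list[str]:
    """Return dangling 'next' targets so a reviewer can spot unfinished branches."""
    known = set(tree)
    return [
        opt["next"]
        for node in tree.values()
        for opt in node["options"]
        if opt["next"] not in known
    ]
```

Running the check against the fragment above flags the three branches not yet authored, which is exactly the kind of structural verdict a spec reviewer needs.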
### Case 2: Out-of-domain request — world history design
**Input**: "Design the history of the world — when the first kingdom was founded, what the great wars were, and why magic was banned."
**Expected behavior**:
- Does not produce world history, lore architecture, or world rules
- States clearly: "World history, lore, and world rules are owned by world-builder; once the history is established, I can write in-game texts, books, and dialogue that reference those events"
- Does not produce even partial world history as a "placeholder"
### Case 3: Dialogue contradicts established lore — flag to narrative-director
**Input**: "Write Mira's dialogue line where she mentions that dragons have been extinct for 200 years." [Context includes existing lore: dragons are alive and revered in the northern provinces, not extinct.]
**Expected behavior**:
- Identifies the contradiction: established lore states dragons are alive and revered; dialogue stating they're extinct directly conflicts
- Does NOT write the requested line as given
- Flags the inconsistency to narrative-director: "Mira's dialogue as requested contradicts established lore (dragons are alive per world-builder's document); requires narrative-director resolution before I can write this line"
- Offers an alternative: a line that references dragons in a way consistent with the established lore (e.g., Mira expresses awe about a dragon sighting in the north)
### Case 4: Item description references an undesigned mechanic
**Input**: "Write a description for the 'Berserker's Chalice' — a consumable that triggers the Berserker state when drunk."
**Expected behavior**:
- Identifies the dependency gap: "Berserker state" is not defined in any provided game design document
- Flags the missing dependency: "This description references a 'Berserker state' mechanic that has no GDD entry — I cannot write accurate flavor text for a mechanic whose rules are undefined, as the description may create incorrect player expectations"
- Does NOT write a description that invents mechanic details (duration, effects) that may conflict with the eventual design
- Offers two paths: (a) write a vague, non-mechanical description that creates no false expectations, flagged as temporary; (b) wait for game-designer to define the Berserker state first
### Case 5: Context pass — character voice guide
**Input context**: Character voice guide for Mira: She speaks in short, energetic sentences. Uses merchant slang ("a fine bargain," "coin well spent"). Drops pronouns occasionally ("Good wares, these."). Never uses contractions — always "I will" not "I'll". Warm but slightly mercenary.
**Input**: "Write Mira's response when a player asks if she has healing potions."
**Expected behavior**:
- Short, energetic sentences — no long monologues
- Uses merchant slang: "a fine bargain," "coin well spent," or similar
- Drops pronouns where natural: "Fine stock, these potions."
- No contractions: "I will" not "I'll," "do not" not "don't"
- Warm tone with a mercenary undertone: she's happy to help because you're a paying customer
- Does NOT produce dialogue that violates any voice guide rule — check each rule explicitly
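The rule-by-rule check demanded in the last bullet can be partially mechanized for the rules that are lexically detectable (contractions, slang, sentence length); a sketch under that assumption. The 12-word threshold and the contraction pattern are illustrative choices, not taken from the voice guide, and pronoun-dropping still needs a human read:

```python
import re

# Detects common English contractions such as "I'll" or "don't".
CONTRACTIONS = re.compile(r"\b\w+'(ll|t|s|re|ve|d|m)\b", re.IGNORECASE)
MERCHANT_SLANG = ("fine bargain", "coin well spent", "good wares")

def check_voice(line: str, max_sentence_words: int = 12) -> dict[str, bool]:
    """Rule-by-rule verdicts for Mira's voice guide; True means the rule passes."""
    sentences = [s for s in re.split(r"[.!?]+", line) if s.strip()]
    return {
        "no_contractions": not CONTRACTIONS.search(line),
        "short_sentences": all(
            len(s.split()) <= max_sentence_words for s in sentences
        ),
        "uses_slang": any(slang in line.lower() for slang in MERCHANT_SLANG),
    }
```

Each key maps to one voice rule, so a reviewer sees exactly which rule failed instead of a holistic impression.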
---
## Protocol Compliance
- [ ] Stays within declared domain (dialogue, lore entries, item descriptions, in-game text)
- [ ] Redirects world history and world rule requests to world-builder without producing unauthorized lore
- [ ] Flags lore contradictions to narrative-director rather than silently writing inconsistent content
- [ ] Identifies mechanic dependency gaps before writing item descriptions that could create false player expectations
- [ ] Applies all rules from a provided character voice guide — no partial compliance
---
## Coverage Notes
- Case 3 (lore contradiction detection) requires that existing lore is in the conversation context — test is only valid when context is provided
- Case 4 (dependency gap) tests whether the agent writes descriptions that could set wrong player expectations — a subtle but important quality issue
- Case 5 is the most important context-awareness test; voice guide compliance must be checked rule-by-rule, not holistically
- No automated runner; review manually or via `/skill-test`

# Skill Quality Rubric
Used by `/skill-test category [name|all]` to evaluate skills beyond structural compliance.
Each category defines 4–5 PASS/FAIL metrics specific to the skill's job.
A metric is PASS when the skill's written instructions clearly satisfy the criterion.
A metric is FAIL when the instructions are absent, ambiguous, or contradictory.
A metric is WARN when the instructions partially address the criterion.
---
## Skill Categories
### `gate`
**Skills**: gate-check
Gate skills control phase transitions. They must enforce correctness without
auto-advancing stage and must respect the three review modes.
| Metric | PASS criteria |
|---|---|
| **G1 — Review mode read** | Skill reads `production/session-state/review-mode.txt` (or equivalent) before deciding which directors to spawn |
| **G2 — Full mode: all 4 directors spawn** | In `full` mode, all 4 Tier-1 directors (CD, TD, PR, AD) PHASE-GATE prompts are invoked in parallel |
| **G3 — Lean mode: PHASE-GATE only** | In `lean` mode, only `*-PHASE-GATE` gates run; inline gates (CD-PILLARS, TD-ARCHITECTURE, etc.) are skipped |
| **G4 — Solo mode: no directors** | In `solo` mode, no director gates spawn; each is noted as "skipped — Solo mode" |
| **G5 — No auto-advance** | Skill never writes `production/stage.txt` without explicit user confirmation via "May I write" |
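The mode-to-gate mapping in G1–G4 is small enough to state as code. A sketch of the decision, using the table's director abbreviations and the session-state path from G1; the exact inline-gate set is skill-specific and passed in here as an assumption:

```python
from pathlib import Path

DIRECTORS = ("CD", "TD", "PR", "AD")  # the four Tier-1 directors

def read_mode(state_file: str = "production/session-state/review-mode.txt") -> str:
    # G1: the active mode is read from session state, never assumed
    return Path(state_file).read_text().strip()

def gates_to_spawn(mode: str, inline_gates: list[str]) -> list[str]:
    """Which director gates run for a given review mode (G2-G4)."""
    phase_gates = [f"{d}-PHASE-GATE" for d in DIRECTORS]
    if mode == "full":
        return phase_gates + inline_gates  # G2: all four directors plus inline gates
    if mode == "lean":
        return phase_gates                 # G3: PHASE-GATE only; inline gates skipped
    if mode == "solo":
        return []                          # G4: no directors; each noted "skipped - Solo mode"
    raise ValueError(f"unknown review mode: {mode!r}")
```

Note that G5 stays outside this function on purpose: stage advancement is a separate user-confirmed write, never a side effect of spawning gates.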
---
### `review`
**Skills**: design-review, architecture-review, review-all-gdds
Review skills read documents and produce structured verdicts. They are primarily
read-only and must not trigger director gates during the analysis phase.
| Metric | PASS criteria |
|---|---|
| **R1 — Read-only enforcement** | Skill does not modify the reviewed document without explicit user approval; any write operations (review logs, index updates) are gated behind "May I write" |
| **R2 — 8-section check** | Skill evaluates all 8 required GDD sections (or equivalent architectural sections) explicitly |
| **R3 — Correct verdict vocabulary** | Verdict is exactly one of: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED (design) or PASS / CONCERNS / FAIL (architecture) |
| **R4 — No director gates during analysis** | Skill does not spawn director gates during its analysis phases; post-analysis director review (as in architecture-review) is acceptable when the skill's scope and stakes warrant it |
| **R5 — Structured findings** | Output contains a per-section status table or checklist before the final verdict |
> **Exceptions:**
> - `design-review`: Has `Write, Edit` in allowed-tools to support an optional "Revise now" path (all writes gated behind user approval) and to write review logs. R1 is satisfied because the reviewed document is never silently modified.
> - `architecture-review`: Spawns TD-ARCHITECTURE and LP-FEASIBILITY gates after its analysis is complete. This is intentional — architecture review is high-stakes and benefits from director sign-off. R4 is satisfied because the gates run post-analysis, not during it.
---
### `authoring`
**Skills**: design-system, quick-design, architecture-decision, ux-design, ux-review, art-bible, create-architecture
Authoring skills create or update design documents collaboratively. Full GDD/UX
authoring skills use a section-by-section cycle; lightweight authoring skills use
a single-draft pattern appropriate to their smaller scope.
| Metric | PASS criteria |
|---|---|
| **A1 — Section-by-section cycle** | Full authoring skills (design-system, ux-design, art-bible) author one section at a time, presenting content for approval before proceeding to the next. Lightweight skills (quick-design, architecture-decision, create-architecture) may draft the complete document then ask for approval — single-draft is acceptable for documents under ~4 hours of implementation scope. |
| **A2 — May-I-write per section** | Full authoring skills ask "May I write this to [filepath]?" before each section write. Lightweight skills ask once for the complete document. |
| **A3 — Retrofit mode** | Skill detects if the target file already exists and offers to update specific sections rather than overwriting the whole document. Lightweight skills (quick-design) that always create new files are exempt. |
| **A4 — Director gate at correct tier** | If a director gate is defined for this skill (e.g., CD-GDD-ALIGN, TD-ADR), it runs at the correct mode threshold (full/lean) — NOT in solo |
| **A5 — Skeleton-first** | Full authoring skills create a file skeleton with all section headers before filling content, to preserve progress on session interruption. Lightweight skills are exempt. |
> **Full authoring skills** (must pass all 5 metrics): `design-system`, `ux-design`, `art-bible`
> **Lightweight authoring skills** (A1, A2, A5 use single-draft pattern; A3 exempt for new-file-only skills): `quick-design`, `architecture-decision`, `create-architecture`
> **Review-mode skill** (evaluated against review metrics): `ux-review`
---
### `readiness`
**Skills**: story-readiness, story-done
Readiness skills validate stories before or after implementation. They must produce
multi-dimensional verdicts and integrate correctly with director gate mode.
| Metric | PASS criteria |
|---|---|
| **RD1 — Multi-dimensional check** | Skill checks ≥3 independent dimensions (e.g., Design, Architecture, Scope, DoD) and reports each separately |
| **RD2 — Three verdict levels** | Verdict hierarchy is clearly defined: READY/COMPLETE > NEEDS WORK/COMPLETE WITH NOTES > BLOCKED |
| **RD3 — BLOCKED requires external action** | BLOCKED verdict is reserved for issues that cannot be fixed by the story author alone (e.g., Proposed ADR, unresolvable dependency) |
| **RD4 — Director gate at correct mode** | QL-STORY-READY or LP-CODE-REVIEW gate spawns in `full` mode, skips in `lean`/`solo` with a noted skip message |
| **RD5 — Next-story handoff** | After completion, skill surfaces the next READY story from the active sprint |
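RD1–RD3 combine into a small aggregation rule: per-dimension findings roll up to a single verdict, and BLOCKED outranks everything. A sketch using the story-readiness vocabulary (the story-done equivalents COMPLETE / COMPLETE WITH NOTES would substitute one-for-one):

```python
# Worst finding wins; BLOCKED is reserved for issues the story author
# cannot fix alone (RD3), so one BLOCKED dimension blocks the whole story.
ORDER = {"READY": 0, "NEEDS WORK": 1, "BLOCKED": 2}

def overall_verdict(dimensions: dict[str, str]) -> str:
    """Roll up per-dimension verdicts (RD1) into the RD2 hierarchy."""
    return max(dimensions.values(), key=ORDER.__getitem__)
```

Reporting each dimension separately and then the rolled-up verdict satisfies RD1 and RD2 at once.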
---
### `pipeline`
**Skills**: create-epics, create-stories, dev-story, create-control-manifest, propagate-design-change, map-systems
Pipeline skills produce artifacts that other skills consume. They must write files
with correct schema, respect layer/priority ordering, and gate before writing.
| Metric | PASS criteria |
|---|---|
| **P1 — Correct output schema** | Each produced file follows the project template (EPIC.md, story frontmatter, etc.); skill references the template path |
| **P2 — Layer/priority ordering** | Skills that produce epics or stories respect layer ordering (core → extended → meta) and priority fields |
| **P3 — May-I-write before each artifact** | Skill asks "May I write [artifact]?" before creating each output file, not batch-approving all files at once |
| **P4 — Director gate at correct tier** | In-scope gates (PR-EPIC, QL-STORY-READY, LP-CODE-REVIEW, etc.) run in `full`, skip in `lean`/`solo` with noted skip |
| **P5 — Reads before writes** | Skill reads the relevant GDD/ADR/manifest before producing artifacts to ensure alignment |
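P2's ordering rule can be checked mechanically: stories sort by layer (core → extended → meta) first, then by priority. A sketch; the frontmatter field names `layer` and `priority` are assumptions here, and the actual story template referenced in P1 is authoritative:

```python
LAYER_RANK = {"core": 0, "extended": 1, "meta": 2}  # P2 layer ordering

def order_stories(stories: list[dict]) -> list[dict]:
    """Sort story frontmatter dicts per P2: layer first, then numeric priority."""
    return sorted(stories, key=lambda s: (LAYER_RANK[s["layer"]], s["priority"]))
```

A rubric evaluator can compare a skill's emitted story order against this sort to flag P2 violations.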
---
### `analysis`
**Skills**: consistency-check, balance-check, content-audit, code-review, tech-debt,
scope-check, estimate, perf-profile, asset-audit, security-audit, test-evidence-review, test-flakiness
Analysis skills scan the project and surface findings. They are read-only during
analysis and must ask before recommending any file writes.
| Metric | PASS criteria |
|---|---|
| **AN1 — Read-only scan** | Analysis phase uses only Read/Glob/Grep tools; no Write or Edit during the scan itself |
| **AN2 — Structured findings table** | Output includes a findings table or checklist (not prose only) with severity/priority per finding |
| **AN3 — No auto-write** | Any suggested file writes (e.g., tech-debt register, fix patches) are gated behind "May I write" |
| **AN4 — No director gates during analysis** | Analysis skills do not spawn director gates; they produce findings for human review |
---
### `team`
**Skills**: team-combat, team-narrative, team-audio, team-level, team-ui, team-qa,
team-release, team-polish, team-live-ops
Team skills orchestrate multiple specialist agents for a department. They must
spawn the right agents, run independent ones in parallel, and surface blocks immediately.
| Metric | PASS criteria |
|---|---|
| **T1 — Named agent list** | Skill explicitly names which agents it spawns and in what order |
| **T2 — Parallel where independent** | Agents whose inputs don't depend on each other are spawned in parallel (single message, multiple Task calls) |
| **T3 — BLOCKED surfacing** | If any spawned agent returns BLOCKED or fails, skill surfaces it immediately and halts dependent work — never silently skips |
| **T4 — Collect all verdicts before proceeding** | Dependent phases wait for all parallel agents to complete before proceeding |
| **T5 — Usage error on no argument** | If required argument (e.g., feature name) is missing, skill outputs usage hint and stops without spawning agents |
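T2–T4 together describe a fan-out/fan-in pattern. A sketch using threads as a stand-in for parallel Task calls; the `run_agent` callable and the verdict strings are illustrative, not the framework's actual spawn API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_team(agents: list[str], run_agent) -> dict[str, str]:
    """Spawn independent agents in parallel (T2), collect every verdict
    before returning (T4), and surface any BLOCKED result (T3)."""
    with ThreadPoolExecutor() as pool:
        verdicts = dict(zip(agents, pool.map(run_agent, agents)))
    blocked = [name for name, v in verdicts.items() if v == "BLOCKED"]
    if blocked:
        # T3: halt dependent work and surface the block - never silently skip
        raise RuntimeError(f"BLOCKED agents: {', '.join(blocked)}")
    return verdicts
```

Because `pool.map` waits for all results before the dict is built, dependent phases cannot start on a partial verdict set, which is the T4 requirement.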
---
### `sprint`
**Skills**: sprint-plan, sprint-status, milestone-review, retrospective, changelog, patch-notes
Sprint skills read production state and produce reports or planning artifacts.
They have a PR-SPRINT or PR-MILESTONE gate at specific mode thresholds.
| Metric | PASS criteria |
|---|---|
| **SP1 — Reads sprint/milestone state** | Skill reads `production/sprints/` or `production/milestones/` before producing output |
| **SP2 — Correct sprint gate** | PR-SPRINT (for planning) or PR-MILESTONE (for milestone review) gate runs in `full` mode, skips in `lean`/`solo` |
| **SP3 — Structured output** | Output uses a consistent structure (velocity table, risk list, action items) rather than free prose |
| **SP4 — No auto-commit** | Skill never writes sprint files or milestone records without "May I write" |
---
### `utility`
**Skills**: start, help, brainstorm, onboard, adopt, hotfix, prototype, localize,
launch-checklist, release-checklist, smoke-check, soak-test, test-setup, test-helpers,
regression-suite, qa-plan, bug-triage, bug-report, playtest-report, asset-spec,
reverse-document, project-stage-detect, setup-engine, skill-test, skill-improve,
day-one-patch, and any other skills not in categories above
Utility skills pass the 7 standard static checks. If they happen to spawn director
gates, the gate mode logic must also be correct.
| Metric | PASS criteria |
|---|---|
| **U1 — Passes all 7 static checks** | `/skill-test static [name]` returns COMPLIANT with 0 FAILs |
| **U2 — Gate mode correct (if applicable)** | If the skill spawns any director gate, it reads review-mode and applies full/lean/solo logic correctly |
---
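For concreteness, the frontmatter portion of U1's static check could look like the sketch below. This is illustrative only — not the actual `/skill-test static` implementation; the required field names come from the spec templates, and `check_frontmatter` is a hypothetical helper.

```python
import re

# Required frontmatter fields, per the static assertions in the spec templates.
REQUIRED_FIELDS = {"name", "description", "argument-hint", "user-invocable", "allowed-tools"}

def check_frontmatter(skill_md: str) -> list[str]:
    """Return the missing required frontmatter fields (empty list = PASS)."""
    match = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    if not match:
        return sorted(REQUIRED_FIELDS)  # no frontmatter block at all
    keys = {line.split(":", 1)[0].strip()
            for line in match.group(1).splitlines() if ":" in line}
    return sorted(REQUIRED_FIELDS - keys)

doc = "---\nname: balance-check\ndescription: x\nargument-hint: none\n---\n# Body"
print(check_frontmatter(doc))  # → ['allowed-tools', 'user-invocable']
```

A skill passes this slice of U1 when the returned list is empty.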
## Agent Categories
Used to validate agent spec files in `agents/[tier]/`.
### `director`
**Agents**: creative-director, technical-director, art-director, producer
| Metric | PASS criteria |
|---|---|
| **D1 — Correct verdict vocabulary** | Returns APPROVE / CONCERNS / REJECT (or domain equivalent: REALISTIC/CONCERNS/UNREALISTIC for producer) |
| **D2 — Domain boundary respected** | Does not make binding decisions outside its declared domain |
| **D3 — Conflict escalation** | When two departments conflict, escalates to correct parent (creative-director or technical-director) rather than unilaterally deciding |
| **D4 — Opus model tier** | Agent is assigned Opus model per coordination-rules.md |
### `lead`
**Agents**: lead-programmer, qa-lead, narrative-director, audio-director, game-designer,
systems-designer, level-designer
| Metric | PASS criteria |
|---|---|
| **L1 — Domain verdict** | Returns a domain-specific verdict (e.g., FEASIBLE/INFEASIBLE for lead-programmer, PASS/FAIL for qa-lead) |
| **L2 — Escalates to shared parent** | Out-of-domain conflicts escalate to creative-director (design) or technical-director (tech) |
| **L3 — Sonnet model tier** | Agent is assigned Sonnet model (default) per coordination-rules.md |
### `specialist`
**Agents**: gameplay-programmer, ai-programmer, technical-artist, sound-designer,
engine-programmer, tools-programmer, network-programmer, security-engineer,
accessibility-specialist, ux-designer, ui-programmer, performance-analyst, prototyper,
qa-tester, writer, world-builder
| Metric | PASS criteria |
|---|---|
| **S1 — Stays in domain** | Explicitly scopes itself to its declared domain; defers out-of-domain requests |
| **S2 — No binding cross-domain decisions** | Does not unilaterally decide matters owned by another specialist |
| **S3 — Defers correctly** | Out-of-domain requests are redirected to the correct agent, not refused silently |
### `engine`
**Agents**: godot-specialist, godot-gdscript-specialist, godot-csharp-specialist,
godot-shader-specialist, godot-gdextension-specialist, unity-specialist, unity-ui-specialist,
unity-shader-specialist, unity-dots-specialist, unity-addressables-specialist,
unreal-specialist, ue-blueprint-specialist, ue-gas-specialist, ue-umg-specialist,
ue-replication-specialist
| Metric | PASS criteria |
|---|---|
| **E1 — Version-aware** | References engine version from `docs/engine-reference/` before suggesting API calls; flags post-cutoff risk |
| **E2 — File routing** | Routes file types to the correct sub-specialist (e.g., `.gdshader` → godot-shader-specialist, not godot-gdscript-specialist) |
| **E3 — Engine-specific patterns** | Enforces engine-specific idioms (e.g., GDScript static typing, C# attribute exports, Blueprint function libraries) |
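E2's file-routing rule can be pictured as a lookup table. The sketch below covers only the Godot extensions named in the example; the map and the `route` helper are hypothetical — in practice routing is a behavior of the parent engine specialist, not a standalone function.

```python
# Hypothetical extension-to-specialist map; covers only the Godot examples above.
ROUTING = {
    ".gdshader": "godot-shader-specialist",
    ".gd": "godot-gdscript-specialist",
    ".cs": "godot-csharp-specialist",
}

def route(filename: str) -> str:
    """Route a file to its sub-specialist; fall back to the parent engine specialist."""
    dot = filename.rfind(".")
    ext = filename[dot:] if dot != -1 else ""
    return ROUTING.get(ext, "godot-specialist")

print(route("water.gdshader"))  # → godot-shader-specialist
```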
### `qa`
**Agents**: qa-tester, qa-lead, security-engineer, accessibility-specialist
| Metric | PASS criteria |
|---|---|
| **Q1 — Produces artifacts not code** | Primary output is test cases, bug reports, or coverage gaps — not implementation code |
| **Q2 — Evidence format** | Test cases follow the project's test evidence format (unit/integration/visual/UI per coding-standards.md) |
| **Q3 — No scope creep** | Does not propose new features; flags gaps for humans to decide |
### `operations`
**Agents**: devops-engineer, release-manager, live-ops-designer, community-manager,
analytics-engineer, economy-designer, localization-lead
| Metric | PASS criteria |
|---|---|
| **O1 — Domain ownership clear** | Agent description clearly states what it owns (pipeline, releases, economy, etc.) |
| **O2 — Defers implementation** | Does not write game logic or engine code; delegates to appropriate specialist |
| **O3 — Toolset matches role** | `allowed-tools` in frontmatter matches the operational (not coding) nature of the role |


@@ -0,0 +1,170 @@
# Skill Test Spec: /asset-audit
## Skill Summary
`/asset-audit` audits the `assets/` directory for naming convention compliance,
missing metadata, and format/size issues. It reads asset files against the
conventions and budgets defined in `technical-preferences.md`. No director gates
are invoked. The skill does not write without user approval. Verdicts: COMPLIANT,
WARNINGS, or NON-COMPLIANT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLIANT, WARNINGS, NON-COMPLIANT
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after audit results)
---
## Director Gate Checks
None. Asset auditing is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All assets follow naming conventions
**Fixture:**
- `technical-preferences.md` specifies naming convention: `snake_case`, e.g., `enemy_grunt_idle.png`
- `assets/art/characters/` contains: `enemy_grunt_idle.png`, `enemy_sniper_run.png`
- `assets/audio/sfx/` contains: `sfx_jump_land.ogg`, `sfx_item_pickup.ogg`
- All files are within size budget (textures ≤2MB, audio ≤500KB)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads naming conventions and size budgets from `technical-preferences.md`
2. Skill scans `assets/` recursively
3. All files match `snake_case` convention; all within budget
4. Audit table shows all rows PASS
5. Verdict is COMPLIANT
**Assertions:**
- [ ] Audit covers both art and audio asset directories
- [ ] Each file is checked against naming convention and size budget
- [ ] All rows show PASS when compliant
- [ ] Verdict is COMPLIANT
- [ ] No files are written
---
### Case 2: Non-Compliant — Textures exceed size budget
**Fixture:**
- `assets/art/environment/` contains 5 texture files
- 3 texture files are 4MB each (budget: ≤2MB)
- 2 texture files are within budget
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads size budget from `technical-preferences.md` (2MB for textures)
2. Skill scans `assets/art/environment/` — finds 3 oversized textures
3. Audit table lists each oversized file with actual size and budget
4. Verdict is NON-COMPLIANT
5. Skill recommends compression or resolution reduction for flagged files
**Assertions:**
- [ ] All 3 oversized files are listed by name with actual size and budget size
- [ ] Verdict is NON-COMPLIANT when any file exceeds its budget
- [ ] Optimization recommendation is given for oversized files
- [ ] Within-budget files are also listed (showing PASS) for completeness
---
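Cases 1–2 hinge on two mechanical checks: naming convention and size budget. A minimal sketch of both, assuming the fixtures' `snake_case` convention and 2 MB / 500 KB budgets — the `audit_asset` helper and the hardcoded budget table are illustrative, since the real skill reads budgets from `technical-preferences.md`:

```python
import re
from pathlib import Path

SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")
# Illustrative budgets from the fixtures: textures ≤2MB, audio ≤500KB.
BUDGETS = {".png": 2 * 1024 * 1024, ".ogg": 500 * 1024}

def audit_asset(path: Path, size: int) -> list[str]:
    """Return the findings for one asset (empty list = PASS)."""
    findings = []
    if not SNAKE_CASE.match(path.stem):
        findings.append(f"NAMING: {path.name} is not snake_case")
    budget = BUDGETS.get(path.suffix)
    if budget is not None and size > budget:
        findings.append(f"SIZE: {path.name} is {size} bytes (budget {budget})")
    return findings

# Flags both a NAMING and a SIZE finding for this file:
print(audit_asset(Path("assets/art/EnemyGrunt.png"), 4 * 1024 * 1024))
```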
### Case 3: Format Issue — Audio in wrong format
**Fixture:**
- `technical-preferences.md` specifies audio format: OGG
- `assets/audio/music/theme_main.wav` exists (WAV format)
- `assets/audio/sfx/sfx_footstep.ogg` exists (correct OGG format)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads audio format requirement: OGG
2. Skill scans `assets/audio/` — finds `theme_main.wav` in wrong format
3. Audit table flags `theme_main.wav` as FORMAT ISSUE (expected OGG, found WAV)
4. `sfx_footstep.ogg` shows PASS
5. Verdict is WARNINGS (format issues are correctable)
**Assertions:**
- [ ] `theme_main.wav` is flagged as FORMAT ISSUE with expected and actual format noted
- [ ] Verdict is WARNINGS (not NON-COMPLIANT) for format issues, which are correctable
- [ ] Correct-format assets are shown as PASS
- [ ] Skill does not modify or convert any asset files
---
### Case 4: Missing Asset — Asset referenced by GDD but absent from assets/
**Fixture:**
- `design/gdd/enemies.md` references `enemy_boss_idle.png`
- `assets/art/characters/boss/` directory is empty — file does not exist
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads GDD references to find expected assets (cross-references with `/content-audit` scope)
2. Skill scans `assets/art/characters/boss/` — file not found
3. Audit table flags `enemy_boss_idle.png` as MISSING ASSET
4. Verdict is NON-COMPLIANT (missing critical art asset)
**Assertions:**
- [ ] Skill checks GDD references to identify expected assets
- [ ] Missing assets are flagged as MISSING ASSET with the GDD reference noted
- [ ] Verdict is NON-COMPLIANT when critical assets are missing
- [ ] Skill does not create or add placeholder assets
---
### Case 5: Gate Compliance — No gate; technical-artist may be consulted separately
**Fixture:**
- 2 files have naming convention violations (CamelCase instead of snake_case)
- `review-mode.txt` contains `full`
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill scans assets and finds 2 naming violations
2. No director gate is invoked regardless of review mode
3. Verdict is WARNINGS
4. Output notes: "Consider having a Technical Artist review naming conventions"
5. Skill presents findings; offers optional audit report write
6. If user opts in: "May I write to `production/qa/asset-audit-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Technical artist consultation is suggested (not mandated)
- [ ] Findings table is presented before any write prompt
- [ ] Optional audit report write asks "May I write" before writing
---
## Protocol Compliance
- [ ] Reads `technical-preferences.md` for naming conventions, formats, and size budgets
- [ ] Scans `assets/` directory recursively
- [ ] Audit table shows file name, check type, expected value, actual value, and result
- [ ] Does not modify any asset files
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLIANT, WARNINGS, NON-COMPLIANT
---
## Coverage Notes
- Metadata checks (e.g., missing texture import settings in Godot `.import` files)
are not explicitly tested here; they follow the same FORMAT ISSUE flagging pattern.
- The interaction between `/asset-audit` and `/content-audit` (both check GDD
references vs. assets) is intentional overlap; `/asset-audit` focuses on
compliance while `/content-audit` focuses on completeness.


@@ -0,0 +1,172 @@
# Skill Test Spec: /balance-check
## Skill Summary
`/balance-check` reads balance data files (JSON or YAML in `assets/data/`) and
checks each value against the design formulas defined in GDDs under `design/gdd/`.
It produces a findings table with columns: Value → Formula → Deviation → Severity.
No director gates are invoked (read-only analysis). The skill may optionally write
a balance report but asks "May I write" before doing so. Verdicts: BALANCED,
CONCERNS, or OUT OF BALANCE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: BALANCED, CONCERNS, OUT OF BALANCE
- [ ] Contains "May I write" language (optional report write)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Balance check is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All balance values within formula tolerances
**Fixture:**
- `assets/data/combat-balance.json` exists with 6 stat values
- `design/gdd/combat-system.md` contains formulas for all 6 stats with ±10% tolerance
- All 6 values fall within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all balance data files in `assets/data/`
2. Skill reads GDD formulas from `design/gdd/`
3. Skill computes deviation for each value against its formula
4. All deviations are within ±10% tolerance
5. Skill outputs findings table with all rows showing PASS
6. Verdict is BALANCED
**Assertions:**
- [ ] Findings table is shown for all checked values
- [ ] Each row shows: stat name, formula target, actual value, deviation percentage
- [ ] All rows show PASS or equivalent when within tolerance
- [ ] Verdict is BALANCED
- [ ] No files are written without user approval
---
### Case 2: Out of Balance — Player damage 40% above formula target
**Fixture:**
- `assets/data/combat-balance.json` has `player_damage_base: 140`
- `design/gdd/combat-system.md` formula specifies `player_damage_base = 100` (±10%)
- All other stats are within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads combat-balance.json and computes deviation for `player_damage_base`
2. Deviation is +40% — far outside ±10% tolerance
3. Skill flags this row as severity HIGH in the findings table
4. Verdict is OUT OF BALANCE
5. Skill surfaces the HIGH severity item prominently before the table
**Assertions:**
- [ ] `player_damage_base` row shows deviation of +40%
- [ ] Severity is HIGH for deviations exceeding tolerance by more than 2×
- [ ] Verdict is OUT OF BALANCE when any stat has HIGH severity deviation
- [ ] The HIGH severity item is called out explicitly, not buried in table rows
---
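The deviation and severity logic exercised by Cases 1–2 can be sketched as follows. The ±10% tolerance and the 2× HIGH threshold mirror the fixtures here; per the coverage notes, the skill's real threshold values are implementation details.

```python
def classify(actual: float, target: float, tolerance: float = 0.10) -> tuple[float, str]:
    """Return (deviation, severity) for one stat against its formula target."""
    deviation = (actual - target) / target
    if abs(deviation) <= tolerance:
        return deviation, "PASS"
    # HIGH when the deviation exceeds the tolerance by more than 2x (Case 2's +40%).
    severity = "HIGH" if abs(deviation) > 2 * tolerance else "CONCERNS"
    return deviation, severity

print(classify(140, 100))  # → (0.4, 'HIGH')
print(classify(108, 100))  # → (0.08, 'PASS')
```

Case 5's 15%-over stat would land in the CONCERNS band under these assumed thresholds.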
### Case 3: No GDD Formulas — Cannot validate, guidance given
**Fixture:**
- `assets/data/economy-balance.yaml` exists with 10 stat values
- No GDD in `design/gdd/` contains formula definitions for economy stats
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads balance data files
2. Skill searches GDDs for formula definitions — finds none for economy stats
3. Skill outputs: "Cannot validate economy stats — no formulas defined. Run /design-system first."
4. No findings table is generated for the economy stats
5. Verdict is CONCERNS (data exists but cannot be validated)
**Assertions:**
- [ ] Skill does not fabricate formula targets when none exist in GDDs
- [ ] Output explicitly names the missing formula source
- [ ] Output recommends running `/design-system` to define formulas
- [ ] Verdict is CONCERNS (not BALANCED, since validation was impossible)
---
### Case 4: Orphan Reference — Balance file references an undefined stat
**Fixture:**
- `assets/data/combat-balance.json` contains a stat `legacy_armor_mult: 1.5`
- `design/gdd/combat-system.md` has no formula for `legacy_armor_mult`
- All other stats have formula definitions and pass validation
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all stats from combat-balance.json
2. Skill cannot find a formula for `legacy_armor_mult` in any GDD
3. Skill flags `legacy_armor_mult` as ORPHAN REFERENCE in the findings table
4. Other stats are evaluated normally; those within tolerance show PASS
5. Verdict is CONCERNS (orphan reference prevents full validation)
**Assertions:**
- [ ] `legacy_armor_mult` appears in findings table with status ORPHAN REFERENCE
- [ ] Orphan references are distinguished from formula deviations in the table
- [ ] Verdict is CONCERNS when any orphan references are found
- [ ] Skill does not skip orphan stats silently
---
### Case 5: Gate Compliance — Read-only; no gate; optional report requires approval
**Fixture:**
- Balance data and GDD formulas exist; 1 stat has CONCERNS-level deviation (15% above target)
- `review-mode.txt` contains `full`
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads data and GDDs; generates findings table
2. Verdict is CONCERNS (one stat slightly out of range)
3. No director gate is invoked
4. Skill presents findings table to user
5. Skill offers to write an optional balance report
6. If user says yes: skill asks "May I write to `production/qa/balance-report-[date].md`?"
7. If user says no: skill ends without writing
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Findings table is presented without writing anything automatically
- [ ] Optional report write is offered but not forced
- [ ] "May I write" prompt appears only if user opts in to the report
---
## Protocol Compliance
- [ ] Reads both balance data files and GDD formulas before analysis
- [ ] Findings table shows Value, Formula, Deviation, and Severity columns
- [ ] Does not write any files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: BALANCED, CONCERNS, OUT OF BALANCE
---
## Coverage Notes
- The case where `assets/data/` is entirely empty is not tested; behavior
follows the CONCERNS pattern with a message that no data files were found.
- Tolerance thresholds (±10%, ±20%) are implementation details of the skill;
the tests verify that deviations are detected and classified, not the
exact threshold values.


@@ -0,0 +1,172 @@
# Skill Test Spec: /code-review
## Skill Summary
`/code-review` performs an architectural code review of source files in `src/`,
checking coding standards from `CLAUDE.md` (doc comments on public APIs,
dependency injection over singletons, data-driven values, testability). Findings
are advisory. No director gates are invoked. No code edits are made. Verdicts:
APPROVED, CONCERNS, or NEEDS CHANGES.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, CONCERNS, NEEDS CHANGES
- [ ] Does NOT require "May I write" language (read-only; findings are advisory output)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Code review is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Source file follows all coding standards
**Fixture:**
- `src/gameplay/health_component.gd` exists with:
- All public methods have doc comments (`##` notation)
- No singletons used; dependencies injected via constructor
- No hardcoded values; all constants reference `assets/data/`
- ADR reference in file header: `# ADR: docs/architecture/adr-004-health.md`
- Referenced ADR has `Status: Accepted`
**Input:** `/code-review src/gameplay/health_component.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill checks all coding standards: doc comments, DI, data-driven, ADR status
3. All checks pass
4. Skill outputs findings summary with all checks PASS
5. Verdict is APPROVED
**Assertions:**
- [ ] Each coding standard check is listed in the output
- [ ] All checks show PASS when standards are met
- [ ] Skill reads referenced ADR to confirm its status
- [ ] Verdict is APPROVED
- [ ] No edits are made to any file
---
### Case 2: Needs Changes — Missing doc comment and singleton usage
**Fixture:**
- `src/ui/inventory_ui.gd` has:
- 2 public methods without doc comments
- Uses `GameManager.instance` (singleton pattern)
- All other standards met
**Input:** `/code-review src/ui/inventory_ui.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill detects: 2 missing doc comments on public methods
3. Skill detects: singleton usage at specific lines (e.g., line 42, line 87)
4. Findings list the exact method names and line numbers
5. Verdict is NEEDS CHANGES
**Assertions:**
- [ ] Missing doc comments are listed with method names
- [ ] Singleton usage is flagged with file and line number
- [ ] Verdict is NEEDS CHANGES when BLOCKING-level standard violations exist
- [ ] Skill does not edit the file — findings are for the developer to act on
- [ ] Output suggests replacing singleton with dependency injection
---
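Case 2's two detections — missing doc comments and singleton access — could be approximated with a line-based scan like the one below. This is a hypothetical sketch: real GDScript analysis is more involved, and the `##` doc-comment and `.instance` patterns are taken directly from the fixture.

```python
import re

FUNC = re.compile(r"^func\s+([a-zA-Z_]\w*)\s*\(")
SINGLETON = re.compile(r"\b\w+\.instance\b")

def review_gdscript(source: str) -> list[str]:
    """Flag public functions lacking a preceding '##' doc comment, and singleton access."""
    findings = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        m = FUNC.match(line.strip())
        if m and not m.group(1).startswith("_"):  # public method
            prev = lines[i - 1].strip() if i > 0 else ""
            if not prev.startswith("##"):
                findings.append(f"DOC: {m.group(1)} (line {i + 1}) has no doc comment")
        hit = SINGLETON.search(line)
        if hit:
            findings.append(f"SINGLETON: line {i + 1} uses {hit.group(0)}")
    return findings

src = ("## Applies damage.\n"
       "func take_damage(amount):\n"
       "\tGameManager.instance.notify()\n"
       "func heal(amount):\n"
       "\tpass\n")
# Flags the singleton on line 3 and the undocumented heal() on line 4:
print(review_gdscript(src))
```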
### Case 3: Architecture Risk — ADR reference is Proposed, not Accepted
**Fixture:**
- `src/core/save_system.gd` has a header comment: `# ADR: docs/architecture/adr-010-save.md`
- `adr-010-save.md` exists but has `Status: Proposed`
- Code itself follows all other coding standards
**Input:** `/code-review src/core/save_system.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill reads referenced ADR — finds `Status: Proposed`
3. Skill flags this as ARCHITECTURE RISK (code is implementing an unaccepted ADR)
4. Other coding standard checks pass
5. Verdict is CONCERNS (risk flag is advisory, not a hard NEEDS CHANGES)
**Assertions:**
- [ ] Skill reads referenced ADR file to check its status
- [ ] ARCHITECTURE RISK is flagged when ADR status is Proposed
- [ ] Verdict is CONCERNS (not NEEDS CHANGES) for ADR risk — advisory severity
- [ ] Output recommends resolving the ADR before the code goes to production
---
### Case 4: Edge Case — No source files found at specified path
**Fixture:**
- User calls `/code-review src/networking/`
- `src/networking/` directory does not exist
**Input:** `/code-review src/networking/`
**Expected behavior:**
1. Skill attempts to read files in `src/networking/`
2. Directory or files not found
3. Skill outputs an error: "No source files found at `src/networking/`"
4. Skill suggests checking `src/` for valid directories
5. No verdict is emitted (nothing was reviewed)
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the error message
- [ ] Output suggests checking `src/` for valid file paths
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; LP may be consulted separately
**Fixture:**
- Source file follows most standards but has 1 CONCERNS-level finding (a magic number)
- `review-mode.txt` contains `full`
**Input:** `/code-review src/gameplay/loot_system.gd`
**Expected behavior:**
1. Skill reads and reviews the source file
2. No director gate is invoked (code review findings are advisory)
3. Skill presents findings with the CONCERNS verdict
4. Output notes: "Consider requesting a Lead Programmer review for architecture concerns"
5. Skill does not invoke any agent automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] LP consultation is suggested (not mandated) in the output
- [ ] No code edits are made
- [ ] Verdict is CONCERNS for advisory-level findings
---
## Protocol Compliance
- [ ] Reads source file(s) and coding standards before reviewing
- [ ] Lists each coding standard check in findings output
- [ ] Does not edit any source files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: APPROVED, CONCERNS, NEEDS CHANGES
---
## Coverage Notes
- Batch review of all files in a directory is not explicitly tested; behavior
is assumed to apply the same checks file by file and aggregate the verdict.
- Test coverage checks (verifying corresponding test files exist) are a stretch
goal not tested here; that is primarily the domain of `/test-evidence-review`.


@@ -0,0 +1,176 @@
# Skill Test Spec: /consistency-check
## Skill Summary
`/consistency-check` scans all GDDs in `design/gdd/` and checks for internal
conflicts across documents. It produces a structured findings table with columns:
System A vs System B, Conflict Type, Severity (HIGH / MEDIUM / LOW). Conflict
types include: formula mismatch, competing ownership, stale reference, and
dependency gap.
The skill is read-only during analysis. It has no director gates. An optional
consistency report can be written to `design/consistency-report-[date].md` if the
user requests it, but the skill asks "May I write" before doing so.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] Does NOT require "May I write" language during analysis (read-only scan)
- [ ] Has a next-step handoff at the end
- [ ] Documents that report writing is optional and requires approval
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. Consistency
checking is a mechanical scan; no creative or technical director review is
required as part of the scan itself.
---
## Test Cases
### Case 1: Happy Path — 4 GDDs with no conflicts
**Fixture:**
- `design/gdd/` contains exactly 4 system GDDs
- All GDDs have consistent formulas (no overlapping variables with different values)
- No two GDDs claim ownership of the same game entity or mechanic
- All dependency references point to GDDs that exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all 4 GDDs in `design/gdd/`
2. Runs cross-GDD consistency checks (formulas, ownership, references)
3. No conflicts found
4. Outputs structured findings table showing 0 issues
5. Verdict: CONSISTENT
**Assertions:**
- [ ] All 4 GDDs are read before producing output
- [ ] Findings table is present (even if empty — shows "No conflicts found")
- [ ] Verdict is CONSISTENT when no conflicts exist
- [ ] Skill does NOT write any files without user approval
- [ ] Next-step handoff is present
---
### Case 2: Failure Path — Two GDDs with conflicting damage formulas
**Fixture:**
- GDD-A defines damage formula: `damage = attack * 1.5`
- GDD-B defines damage formula: `damage = attack * 2.0` for the same entity type
- Both GDDs refer to the same "attack" variable
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and detects the formula mismatch
2. Findings table includes an entry: GDD-A vs GDD-B | Formula Mismatch | HIGH
3. Specific conflicting formulas are shown (not just "formula conflict exists")
4. Verdict: CONFLICTS FOUND
**Assertions:**
- [ ] Verdict is CONFLICTS FOUND (not CONSISTENT)
- [ ] Conflict entry names both GDD filenames
- [ ] Conflict type is "Formula Mismatch"
- [ ] Severity is HIGH for a direct formula contradiction
- [ ] Both conflicting formulas are shown in the findings table
- [ ] Skill does NOT auto-resolve the conflict
---
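Case 2's formula-mismatch detection can be illustrated with a small scan over inline-code formulas. The backtick-delimited `var = expr` notation is an assumption — the coverage notes warn that informal descriptions of the same mechanic may not be detected — and `find_formula_conflicts` is a hypothetical helper:

```python
import re
from collections import defaultdict

# Matches backtick-quoted formulas of the form `variable = expression`.
FORMULA = re.compile(r"`(\w+)\s*=\s*([^`]+)`")

def find_formula_conflicts(gdds: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (variable, gdd_a, gdd_b) triples where two GDDs define one variable differently."""
    definitions = defaultdict(dict)  # variable -> {gdd filename: right-hand side}
    for name, text in gdds.items():
        for var, rhs in FORMULA.findall(text):
            definitions[var][name] = rhs.strip()
    conflicts = []
    for var, by_gdd in definitions.items():
        docs = sorted(by_gdd)
        for i, a in enumerate(docs):
            for b in docs[i + 1:]:
                if by_gdd[a] != by_gdd[b]:
                    conflicts.append((var, a, b))
    return conflicts

gdds = {"gdd-a.md": "Melee uses `damage = attack * 1.5`.",
        "gdd-b.md": "Melee uses `damage = attack * 2.0`."}
print(find_formula_conflicts(gdds))  # → [('damage', 'gdd-a.md', 'gdd-b.md')]
```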
### Case 3: Partial Path — GDD references a system with no GDD
**Fixture:**
- GDD-A's Dependencies section lists "system-B" as a dependency
- No GDD for system-B exists in `design/gdd/`
- All other GDDs are consistent
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and checks dependency references
2. GDD-A's reference to "system-B" cannot be resolved — no GDD exists for it
3. Findings table includes: GDD-A vs (missing) | Dependency Gap | MEDIUM
4. Verdict: DEPENDENCY GAP (not CONSISTENT, not CONFLICTS FOUND)
**Assertions:**
- [ ] Verdict is DEPENDENCY GAP (distinct from CONSISTENT and CONFLICTS FOUND)
- [ ] Findings entry names GDD-A and the missing system-B
- [ ] Severity is MEDIUM for an unresolved dependency reference
- [ ] Skill suggests running `/design-system system-B` to create the missing GDD
---
### Case 4: Edge Case — No GDDs found
**Fixture:**
- `design/gdd/` directory is empty or does not exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill attempts to read files in `design/gdd/`
2. No GDD files found
3. Skill outputs an error: "No GDDs found in `design/gdd/`. Run `/design-system` to create GDDs first."
4. No findings table is produced
5. No verdict is issued
**Assertions:**
- [ ] Skill outputs a clear error message when no GDDs are found
- [ ] No verdict is produced (CONSISTENT / CONFLICTS FOUND / DEPENDENCY GAP)
- [ ] Skill recommends the correct next action (`/design-system`)
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — No gate spawned; no review-mode.txt read
**Fixture:**
- `design/gdd/` contains ≥2 GDDs
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and runs the consistency scan
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Findings table and verdict are produced normally
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] Review mode has no effect on this skill's behavior
---
## Protocol Compliance
- [ ] Reads all GDDs before producing the findings table
- [ ] Findings table shown in full before any write ask (if report is requested)
- [ ] Verdict is one of exactly: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] No director gates — no review-mode.txt read
- [ ] Report writing (if requested) gated by "May I write" approval
- [ ] Ends with next-step handoff appropriate to verdict
---
## Coverage Notes
- This skill checks for structural consistency between GDDs. Deep design theory
analysis (pillar drift, dominant strategies) is handled by `/review-all-gdds`.
- Formula conflict detection relies on consistent formula notation across GDDs —
informal descriptions of the same mechanic may not be detected.
- The conflict severity rubric (HIGH / MEDIUM / LOW) is defined in the skill body
and not re-enumerated here.


@@ -0,0 +1,164 @@
# Skill Test Spec: /content-audit
## Skill Summary
`/content-audit` reads GDDs in `design/gdd/` and checks whether all content
items specified there (enemies, items, levels, etc.) are accounted for in
`assets/`. It produces a gap table: Content Type → Specified Count → Found Count
→ Missing Items. No director gates are invoked. The skill does not write without
user approval. Verdicts: COMPLETE, GAPS FOUND, or MISSING CRITICAL CONTENT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
- [ ] Does NOT require "May I write" language (read-only output; write is optional report)
- [ ] Has a next-step handoff (what to do after gap table is reviewed)
---
## Director Gate Checks
None. Content audit is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All specified content present
**Fixture:**
- `design/gdd/enemies.md` specifies 4 enemy types: Grunt, Sniper, Tank, Boss
- `assets/art/characters/` contains folders: `grunt/`, `sniper/`, `tank/`, `boss/`
- `design/gdd/items.md` specifies 3 item types; all 3 found in `assets/data/items/`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs in `design/gdd/`
2. Skill scans `assets/` for each specified content item
3. All 4 enemy types and 3 item types are found
4. Gap table shows: all rows have Found Count = Specified Count, no missing items
5. Verdict is COMPLETE
**Assertions:**
- [ ] Gap table covers all content types found in GDDs
- [ ] Each row shows Specified Count and Found Count
- [ ] No missing items when counts match
- [ ] Verdict is COMPLETE
- [ ] No files are written
---
### Case 2: Gaps Found — Enemy type missing from assets
**Fixture:**
- `design/gdd/enemies.md` specifies 3 enemy types: Grunt, Sniper, Boss
- `assets/art/characters/` contains: `grunt/`, `sniper/` only (Boss folder missing)
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD — finds 3 enemy types specified
2. Skill scans `assets/art/characters/` — finds only 2
3. Gap table row for enemies: Specified 3, Found 2, Missing: Boss
4. Verdict is GAPS FOUND
**Assertions:**
- [ ] Gap table row identifies "Boss" as the missing item by name
- [ ] Specified Count (3) and Found Count (2) are both shown
- [ ] Verdict is GAPS FOUND when any content item is missing
- [ ] Skill does not assume the asset will be added later — it flags it now
---
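The gap-table arithmetic from Cases 1–2 reduces to a set difference. A minimal sketch — the `gap_row` helper is illustrative; the real skill derives the specified set from GDD content inventories:

```python
def gap_row(content_type: str, specified: set[str], found: set[str]) -> dict:
    """Build one gap-table row: specified vs. found counts plus missing item names."""
    missing = sorted(specified - found)
    return {
        "type": content_type,
        "specified": len(specified),
        "found": len(specified & found),
        "missing": missing,
    }

row = gap_row("enemies", {"Grunt", "Sniper", "Boss"}, {"Grunt", "Sniper"})
print(row)  # → {'type': 'enemies', 'specified': 3, 'found': 2, 'missing': ['Boss']}
```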
### Case 3: No GDD Content Specs Found — Guidance given
**Fixture:**
- `design/gdd/` contains only `core-loop.md` which has no content inventory section
- No other GDDs exist with content specifications
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs — finds no content inventory sections
2. Skill outputs: "No content specifications found in GDDs — run /design-system first to define content lists"
3. No gap table is produced
4. Verdict is GAPS FOUND (cannot confirm completeness without specs)
**Assertions:**
- [ ] Skill does not produce a gap table when no GDD content specs exist
- [ ] Output recommends running `/design-system`
- [ ] Verdict reflects inability to confirm completeness
---
### Case 4: Edge Case — Asset in wrong format for target platform
**Fixture:**
- `design/gdd/audio.md` specifies audio assets as OGG format
- `assets/audio/sfx/jump.wav` exists (WAV format, not OGG)
- `assets/audio/sfx/land.ogg` exists (correct format)
- `technical-preferences.md` specifies audio format: OGG
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD audio spec and technical preferences for format requirements
2. Skill finds `jump.wav` — present but in wrong format
3. Gap table row for audio: Specified 2, Found 2 (by name), but `jump.wav` flagged as FORMAT ISSUE
4. Verdict is GAPS FOUND (format compliance is part of content completeness)
**Assertions:**
- [ ] Skill checks asset format against GDD or technical preferences when format is specified
- [ ] `jump.wav` is flagged as FORMAT ISSUE with expected format (OGG) noted
- [ ] Format issues are distinct from missing content in the gap table
- [ ] Verdict is GAPS FOUND when format issues exist
---
### Case 5: Gate Compliance — Read-only; no gate; gap table for human review
**Fixture:**
- GDDs specify 10 content items; 9 are found in assets; 1 is missing
- `review-mode.txt` contains `full`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDDs and scans assets; produces gap table
2. No director gate is invoked regardless of review mode
3. Skill presents gap table to user as read-only output
4. Verdict is GAPS FOUND
5. Skill offers to write an audit report but does not write automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Gap table is presented without auto-writing any file
- [ ] Optional report write is offered but not forced
- [ ] Skill does not modify any asset files
---
## Protocol Compliance
- [ ] Reads GDDs and asset directory before producing gap table
- [ ] Gap table shows Content Type, Specified Count, Found Count, Missing Items
- [ ] Does not write files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
---
## Coverage Notes
- MISSING CRITICAL CONTENT verdict (vs. GAPS FOUND) is triggered when the
missing item is tagged as critical in the GDD; this is not explicitly tested
but follows the same detection path.
- The case where `assets/` directory does not exist is not tested; the skill
would produce a MISSING CRITICAL CONTENT verdict for all specified items.
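The gap-table comparison at the core of the cases above can be sketched as follows. This is a minimal illustration only: the function name, the data shapes, and the two-verdict simplification (the real skill also distinguishes MISSING CRITICAL CONTENT) are assumptions, not the skill's actual implementation.

```python
# Hypothetical sketch of the /content-audit gap-table comparison.
# `specified` maps content type -> items named in the GDDs;
# `found` maps content type -> items discovered under assets/.

def audit(specified: dict[str, set[str]], found: dict[str, set[str]]):
    """Build a gap table and a COMPLETE / GAPS FOUND verdict."""
    rows = []
    for content_type, spec_items in specified.items():
        found_items = found.get(content_type, set())
        missing = sorted(spec_items - found_items)
        rows.append({
            "type": content_type,
            "specified": len(spec_items),
            # Found Count only counts items the GDD actually specified
            "found": len(spec_items) - len(missing),
            "missing": missing,
        })
    verdict = "COMPLETE" if all(not r["missing"] for r in rows) else "GAPS FOUND"
    return rows, verdict

rows, verdict = audit(
    {"enemies": {"grunt", "sniper", "boss"}},
    {"enemies": {"grunt", "sniper"}},
)
print(verdict)             # GAPS FOUND
print(rows[0]["missing"])  # ['boss']
```

This mirrors Case 2: Specified 3, Found 2, and the missing item is named rather than merely counted.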

# Skill Test Spec: /estimate
## Skill Summary
`/estimate` estimates task or story effort using a relative-size scale (S / M /
L / XL) based on story complexity, acceptance criteria count, and historical
sprint velocity from past sprint files. Estimates are advisory and are never
written automatically. No director gates are invoked. Verdicts are effort ranges,
not pass/fail — every run produces an estimate.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains size labels: S, M, L, XL (the "verdict" equivalents for this skill)
- [ ] Does NOT require "May I write" language (advisory output only)
- [ ] Has a next-step handoff (how to use the estimate in sprint planning)
---
## Director Gate Checks
None. Estimation is an advisory informational skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Clear story with known tech stack
**Fixture:**
- `production/epics/combat/story-hitbox-detection.md` exists with:
- 4 clear Acceptance Criteria
- ADR reference (Accepted status)
- No "unknown" or "TBD" language in story body
- `production/sprints/sprint-003.md` through `sprint-005.md` exist with velocity data
- Tech stack is GDScript (well-understood by team per sprint history)
**Input:** `/estimate production/epics/combat/story-hitbox-detection.md`
**Expected behavior:**
1. Skill reads the story file — assesses clarity, AC count, tech stack
2. Skill reads sprint history to determine average velocity
3. Skill outputs estimate: M (1–2 days) with reasoning
4. No files are written
**Assertions:**
- [ ] Estimate is M for a clear, well-scoped story with known tech
- [ ] Reasoning references AC count, tech stack familiarity, and velocity data
- [ ] Estimate is presented as a range (e.g., "1–2 days"), not a single point
- [ ] No files are written
---
### Case 2: High Uncertainty — Unknown system, no ADR yet
**Fixture:**
- `production/epics/online/story-lobby-matchmaking.md` exists with:
- 2 vague Acceptance Criteria (using "should" and "TBD")
- No ADR reference — matchmaking architecture not yet decided
- References new subsystem ("online/matchmaking") with no existing source files
**Input:** `/estimate production/epics/online/story-lobby-matchmaking.md`
**Expected behavior:**
1. Skill reads story — finds vague AC, no ADR, no existing source
2. Skill flags multiple uncertainty factors
3. Estimate is L–XL with an explicit risk note: "Estimate range is wide due to architectural unknowns"
4. Skill recommends creating an ADR before development begins
**Assertions:**
- [ ] Estimate is L or XL (not S or M) when significant unknowns exist
- [ ] Risk note explains the specific unknowns driving the wide range
- [ ] Output recommends resolving architectural questions first
- [ ] No files are written
---
### Case 3: No Sprint Velocity Data — Conservative defaults used
**Fixture:**
- Story file exists and is well-defined
- `production/sprints/` is empty — no historical sprints
**Input:** `/estimate production/epics/core/story-save-load.md`
**Expected behavior:**
1. Skill reads story — assesses complexity
2. Skill attempts to read sprint velocity data — finds none
3. Skill notes: "No sprint history found — using conservative defaults for velocity"
4. Estimate is produced using default assumptions (e.g., 1 story point = 1 day)
5. No files are written
**Assertions:**
- [ ] Skill does not error when no sprint history exists
- [ ] Output explicitly notes that conservative defaults are being used
- [ ] Estimate is still produced (not blocked by missing velocity)
- [ ] Conservative defaults produce a higher (not lower) estimate range
---
### Case 4: Multiple Stories — Each estimated individually plus sprint total
**Fixture:**
- User provides a sprint file: `production/sprints/sprint-007.md` with 4 stories
- Sprint history exists (3 previous sprints)
**Input:** `/estimate production/sprints/sprint-007.md`
**Expected behavior:**
1. Skill reads sprint file — identifies 4 stories
2. Skill estimates each story individually: S, M, M, L
3. Skill computes sprint total: approximately 6–8 story points
4. Skill presents per-story estimates followed by sprint total
5. No files are written
**Assertions:**
- [ ] Each story receives its own estimate label
- [ ] Sprint total is presented after individual estimates
- [ ] Total is a sum range derived from individual ranges
- [ ] Skill handles sprint files (not just single story files) as input
---
### Case 5: Gate Compliance — No gate; estimates are informational
**Fixture:**
- Story file exists with medium complexity
- `review-mode.txt` contains `full`
**Input:** `/estimate production/epics/core/story-item-pickup.md`
**Expected behavior:**
1. Skill reads story and sprint history; computes estimate
2. No director gate is invoked in any review mode
3. Estimate is presented as advisory output only
4. Skill notes: "Use this estimate in /sprint-plan when selecting stories for the next sprint"
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output is purely informational — no approval or write prompt
- [ ] Next-step recommendation references `/sprint-plan`
- [ ] Estimate does not change based on review mode
---
## Protocol Compliance
- [ ] Reads story file before estimating
- [ ] Reads sprint velocity history when available
- [ ] Produces effort range (S/M/L/XL), not a single number
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Always produces an estimate (never blocked by missing data; uses defaults instead)
---
## Coverage Notes
- The skill does not produce PASS/FAIL verdicts; the "verdict" here is the
effort range itself. Test assertions focus on the accuracy of the range
and the quality of the reasoning, not a binary outcome.
- Team-specific velocity calibration (what "M" means for this team) is an
implementation detail not tested here; it is configured via sprint history.
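A sizing heuristic of the kind the cases above exercise might look like the sketch below. Every detail here is an assumption for illustration: the threshold arithmetic, the point values, and the conservative fallback of 1.5 days per point are not the skill's calibrated numbers.

```python
# Illustrative sizing heuristic for /estimate.
SIZES = ["S", "M", "L", "XL"]

def estimate(ac_count: int, unknowns: int, velocities: list[float]):
    """Return (size label, estimated days) from AC count, uncertainty
    factors, and historical sprint velocity (points per day)."""
    # More acceptance criteria and more unknowns push the size upward
    idx = min(3, ac_count // 3 + unknowns)
    points = 2 ** idx                       # S=1, M=2, L=4, XL=8 points
    if velocities:
        days_per_point = 1.0 / (sum(velocities) / len(velocities))
    else:
        days_per_point = 1.5                # conservative default: no history
    return SIZES[idx], points * days_per_point

estimate(4, 0, [1.0])   # clear story, known velocity -> ("M", 2.0)
estimate(4, 0, [])      # same story, no history -> M, but a *higher* day count
estimate(2, 2, [1.0])   # vague ACs plus unknowns -> L
```

Note how the empty-history branch reproduces Case 3: the estimate is still produced, and the conservative default yields a higher range, never a lower one.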

# Skill Test Spec: /perf-profile
## Skill Summary
`/perf-profile` is a structured performance profiling workflow that identifies
bottlenecks and recommends optimizations. If profiler data or performance logs
are provided, it analyzes them directly. If not, it guides the user through a
manual profiling checklist. No director gates are invoked. The skill asks
"May I write to `production/qa/perf-[date].md`?" before persisting a report.
Verdicts: WITHIN BUDGET, CONCERNS, or OVER BUDGET.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: WITHIN BUDGET, CONCERNS, OVER BUDGET
- [ ] Contains "May I write" language (skill writes perf report)
- [ ] Has a next-step handoff (what to do after performance findings are reviewed)
---
## Director Gate Checks
None. Performance profiling is an advisory analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Frame data provided, draw call spike found
**Fixture:**
- User provides `production/qa/profiler-export-2026-03-15.json` with frame time data
- Data shows: average frame time 14ms (within 16.6ms budget), but frames 42–48 spike to 28ms
- Spike correlates with a scene with 450 draw calls (budget: 200)
**Input:** `/perf-profile production/qa/profiler-export-2026-03-15.json`
**Expected behavior:**
1. Skill reads profiler data
2. Skill identifies average frame time is within budget
3. Skill identifies draw call spike on frames 42–48 (450 calls vs 200 budget)
4. Verdict is CONCERNS (average OK, but spikes indicate an issue)
5. Skill recommends batching or culling for the identified scene
6. Skill asks "May I write to `production/qa/perf-2026-04-06.md`?"
**Assertions:**
- [ ] Spike frames are identified by frame number
- [ ] Draw call count and budget are compared explicitly
- [ ] Verdict is CONCERNS when spikes exceed budget even if average is OK
- [ ] At least one specific optimization recommendation is given
- [ ] "May I write" prompt appears before writing report
---
### Case 2: No Profiler Data — Manual checklist output
**Fixture:**
- User runs `/perf-profile` with no arguments
- No profiler data files exist in `production/qa/`
**Input:** `/perf-profile`
**Expected behavior:**
1. Skill finds no profiler data
2. Skill outputs a manual profiling checklist for the user to work through:
- Enable Godot profiler or target engine's profiler
- Record a 60-second play session
- Export frame time data
- Note any dropped frames or hitches
3. Skill asks user to provide data once collected before running analysis
**Assertions:**
- [ ] Skill does not crash or emit a verdict when no data is provided
- [ ] Manual profiling checklist is output (actionable steps, not just an error)
- [ ] No verdict is emitted (there is nothing to assess yet)
- [ ] No files are written
---
### Case 3: Over Budget — Frame budget exceeded for target platform
**Fixture:**
- Profiler data shows consistent 22ms frame times (target: 16.6ms for 60fps)
- All frames exceed budget; no single spike — systemic issue
- `technical-preferences.md` specifies target platform: PC, 60fps
**Input:** `/perf-profile production/qa/profiler-export-2026-03-20.json`
**Expected behavior:**
1. Skill reads profiler data and technical preferences for performance budget
2. All frames are over the 16.6ms budget
3. Verdict is OVER BUDGET
4. Skill outputs a prioritized optimization list (e.g., LOD system, shader complexity, physics tick rate)
5. Skill asks "May I write" before writing report
**Assertions:**
- [ ] Verdict is OVER BUDGET when all or most frames exceed budget
- [ ] Target frame budget is read from `technical-preferences.md` (not hardcoded)
- [ ] Optimization priority list is provided, not just the raw verdict
- [ ] "May I write" prompt appears before report write
---
### Case 4: Previous Perf Report Exists — Delta comparison
**Fixture:**
- `production/qa/perf-2026-03-28.md` exists with prior results (avg 15ms, max 19ms)
- New profiler export shows: avg 13ms, max 17ms
- Both reports are for the same scene
**Input:** `/perf-profile production/qa/profiler-export-2026-04-05.json`
**Expected behavior:**
1. Skill reads new profiler data
2. Skill detects prior report for the same scene
3. Skill computes deltas: avg improved 2ms, max improved 2ms
4. Skill presents regression check: no regressions detected
5. Verdict is WITHIN BUDGET; report notes improvement since last profile
**Assertions:**
- [ ] Skill checks `production/qa/` for prior perf reports before writing
- [ ] Delta comparison is shown (prior vs. current for key metrics)
- [ ] Verdict is WITHIN BUDGET when current metrics are within budget
- [ ] Improvement trend is noted positively in the report
---
### Case 5: Gate Compliance — No gate; performance-analyst separate
**Fixture:**
- Profiler data shows CONCERNS-level findings (some spikes)
- `review-mode.txt` contains `full`
**Input:** `/perf-profile production/qa/profiler-export-2026-04-01.json`
**Expected behavior:**
1. Skill analyzes profiler data; verdict is CONCERNS
2. No director gate is invoked regardless of review mode
3. Output notes: "For in-depth analysis, consider running `/perf-profile` with the performance-analyst agent"
4. Skill asks "May I write" and writes report on user approval
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Performance-analyst consultation is suggested (not mandated)
- [ ] "May I write" prompt appears before report write
- [ ] Verdict is CONCERNS for spike-based findings
---
## Protocol Compliance
- [ ] Reads profiler data when provided; outputs checklist when not
- [ ] Reads `technical-preferences.md` for target platform frame budget
- [ ] Checks for prior perf reports to enable delta comparison
- [ ] Always asks "May I write" before writing report
- [ ] No director gates are invoked
- [ ] Verdict is one of: WITHIN BUDGET, CONCERNS, OVER BUDGET
---
## Coverage Notes
- Platform-specific profiling workflows (console, mobile) are not tested here;
the checklist output in Case 2 would be platform-specific in practice.
- The delta comparison in Case 4 assumes reports cover the same scene; cross-scene
comparisons are not explicitly handled.
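The budget check that separates the three verdicts in Cases 1 and 3 can be sketched as below. The half-of-frames threshold for "systemic" is an assumption for illustration; in practice the budget comes from `technical-preferences.md`, not a constant.

```python
# Minimal sketch of the /perf-profile frame-budget check.
def profile(frame_times_ms: list[float], budget_ms: float) -> str:
    over = [i for i, t in enumerate(frame_times_ms) if t > budget_ms]
    if not over:
        return "WITHIN BUDGET"
    if len(over) >= len(frame_times_ms) / 2:
        return "OVER BUDGET"        # systemic: most frames exceed budget
    return "CONCERNS"               # average fine, isolated spikes

profile([14.0, 14.0, 28.0, 14.0], 16.6)  # one spike  -> CONCERNS
profile([22.0] * 5, 16.6)                # all frames -> OVER BUDGET
```

This matches the spec's distinction: a healthy average does not excuse spikes (Case 1), and a uniformly slow capture is OVER BUDGET rather than CONCERNS (Case 3).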

# Skill Test Spec: /scope-check
## Skill Summary
`/scope-check` is a Haiku-tier read-only skill that analyzes a feature, sprint,
or story for scope creep risk. It reads sprint and story files and compares them
against the active milestone goals. It is designed for fast, low-cost checks
before or during planning. No director gates are invoked. No files are written.
Verdicts: ON SCOPE, CONCERNS, or SCOPE CREEP DETECTED.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do based on verdict)
---
## Director Gate Checks
None. Scope check is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Sprint stories align with milestone goals
**Fixture:**
- `production/milestones/milestone-03.md` lists 3 goals: combat system, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories, all tagged to one of the 3 goals
- `production/session-state/active.md` references milestone-03 as the active milestone
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active milestone goals from milestone-03
2. Skill reads sprint-006 stories and checks each against milestone goals
3. All 5 stories map to one of the 3 goals
4. Skill outputs a mapping table: story → milestone goal
5. Verdict is ON SCOPE
**Assertions:**
- [ ] Each story is mapped to a milestone goal in the output
- [ ] Verdict is ON SCOPE when all stories map to milestone goals
- [ ] No files are written
- [ ] Skill does not modify sprint or milestone files
---
### Case 2: Scope Creep Detected — Stories introducing systems not in milestone
**Fixture:**
- `production/milestones/milestone-03.md` goals: combat, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories:
- 3 stories map to milestone goals
- 2 stories reference "online leaderboard" and "achievement system" (not in milestone-03)
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone goals and sprint stories
2. Skill identifies 2 stories with no matching milestone goal
3. Skill names the out-of-scope stories: "Online Leaderboard Feature", "Achievement System Setup"
4. Verdict is SCOPE CREEP DETECTED
**Assertions:**
- [ ] Out-of-scope stories are named explicitly in the output
- [ ] Verdict is SCOPE CREEP DETECTED when any story has no milestone goal match
- [ ] Skill does not automatically remove the stories — findings are advisory
- [ ] Output recommends deferring the out-of-scope stories to a later milestone
---
### Case 3: No Milestone Defined — CONCERNS; scope cannot be validated
**Fixture:**
- `production/session-state/active.md` has no milestone reference
- `production/milestones/` directory exists but is empty
- `production/sprints/sprint-006.md` has 4 stories
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active.md — finds no milestone reference
2. Skill checks `production/milestones/` — no milestone files found
3. Skill outputs: "No active milestone defined — scope cannot be validated"
4. Verdict is CONCERNS
**Assertions:**
- [ ] Skill does not error when no milestone is defined
- [ ] Output explicitly states that scope validation requires a milestone reference
- [ ] Verdict is CONCERNS (not ON SCOPE or SCOPE CREEP DETECTED without data)
- [ ] Output suggests running `/milestone-review` or creating a milestone
---
### Case 4: Single Story Check — Evaluated against its parent epic
**Fixture:**
- User targets a single story: `production/epics/combat/story-parry-timing.md`
- Story references parent epic: `epic-combat.md`
- `production/epics/combat/epic-combat.md` has scope: "melee combat mechanics"
- Story title: "Implement parry timing window" — matches epic scope
**Input:** `/scope-check production/epics/combat/story-parry-timing.md`
**Expected behavior:**
1. Skill reads the specified story file
2. Skill reads the parent epic to get scope definition
3. Skill evaluates story against epic scope — "parry timing" matches "melee combat"
4. Verdict is ON SCOPE
**Assertions:**
- [ ] Single-file argument is accepted (story path, not sprint)
- [ ] Skill reads the parent epic referenced in the story file
- [ ] Story is evaluated against epic scope (not milestone scope) in single-story mode
- [ ] Verdict is ON SCOPE when story matches epic scope
---
### Case 5: Gate Compliance — No gate; PR may be consulted separately
**Fixture:**
- Sprint has 2 SCOPE CREEP stories and 3 ON SCOPE stories
- `review-mode.txt` contains `full`
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone and sprint; identifies 2 scope creep items
2. No director gate is invoked regardless of review mode
3. Skill presents findings with SCOPE CREEP DETECTED verdict
4. Output notes: "Consider raising scope concerns with the Producer before sprint begins"
5. Skill ends without writing any files
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Producer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is SCOPE CREEP DETECTED
---
## Protocol Compliance
- [ ] Reads milestone goals and sprint/story files before analysis
- [ ] Maps each story to a milestone goal (or flags as unmapped)
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Runs on Haiku model tier (fast, low-cost)
- [ ] Verdict is one of: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
---
## Coverage Notes
- The case where the sprint file itself does not exist is not tested; the
skill would output a CONCERNS verdict with a message about missing sprint data.
- Partial scope overlap (story touches a milestone goal but also introduces
new scope) is not explicitly tested; implementation may classify this as
CONCERNS rather than SCOPE CREEP DETECTED.
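The story-to-goal mapping that drives the verdicts above can be sketched as follows. Representing each story's milestone tag as a plain string is an assumption; the real skill extracts these from sprint and story files.

```python
# Sketch of the /scope-check mapping step.
def scope_check(stories, goals):
    """stories: {story name: milestone-goal tag or None}
    goals: set of goal tags from the active milestone."""
    if not goals:
        # No milestone defined: scope cannot be validated (Case 3)
        return "CONCERNS", []
    unmapped = [name for name, goal in stories.items() if goal not in goals]
    verdict = "ON SCOPE" if not unmapped else "SCOPE CREEP DETECTED"
    return verdict, unmapped

scope_check({"parry timing": "combat"}, {"combat"})
# -> ("ON SCOPE", [])
scope_check({"parry timing": "combat", "leaderboard": None}, {"combat"})
# -> ("SCOPE CREEP DETECTED", ["leaderboard"])
```

As in Case 2, the out-of-scope stories are returned by name so the output can list them explicitly rather than just counting them.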

# Skill Test Spec: /security-audit
## Skill Summary
`/security-audit` audits the game for security risks including save data
integrity, network communication, anti-cheat exposure, and data privacy. It
reads source files in `src/` for security patterns and checks whether sensitive
data is handled correctly. No director gates are invoked. The skill does not
write files (findings report only). Verdicts: SECURE, CONCERNS, or
VULNERABILITIES FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: SECURE, CONCERNS, VULNERABILITIES FOUND
- [ ] Does NOT require "May I write" language (read-only; findings report only)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Security audit is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Save data encrypted, no hardcoded credentials
**Fixture:**
- `src/core/save_system.gd` uses `Crypto` class to encrypt save data before writing
- No hardcoded API keys, passwords, or credentials in any `src/` file
- No version numbers or internal build IDs exposed in client-facing output
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` for security patterns: encryption usage, hardcoded credentials, exposed internals
2. All checks pass: save data encrypted, no credentials found, no exposed internals
3. Findings report shows all checks PASS
4. Verdict is SECURE
**Assertions:**
- [ ] Skill checks save data handling for encryption usage
- [ ] Skill scans for hardcoded credentials (API keys, passwords, tokens)
- [ ] Skill checks for version/build numbers exposed to players
- [ ] All checks shown in findings report
- [ ] Verdict is SECURE when all checks pass
---
### Case 2: Vulnerabilities Found — Unencrypted save data and exposed version
**Fixture:**
- `src/core/save_system.gd` writes save data as plain JSON (no encryption)
- `src/ui/debug_overlay.gd` contains: `label.text = "Build: " + str(ProjectSettings.get_setting("application/config/version"))`
(exposes internal build version to player)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` — finds unencrypted save write in `save_system.gd`
2. Skill finds exposed version string in `debug_overlay.gd`
3. Both findings are flagged as VULNERABILITIES
4. Verdict is VULNERABILITIES FOUND
5. Skill provides remediation recommendations for each vulnerability
**Assertions:**
- [ ] Unencrypted save data is flagged as a vulnerability with file and approximate line
- [ ] Exposed version string is flagged as a vulnerability
- [ ] Remediation suggestion is given for each vulnerability
- [ ] Verdict is VULNERABILITIES FOUND when any vulnerability is detected
- [ ] No files are written or modified
---
### Case 3: Online Features Without Authentication — CONCERNS
**Fixture:**
- `src/networking/lobby.gd` exists with functions: `join_lobby()`, `send_chat()`
- No authentication check is found before `send_chat()` — players can call it without being verified
- Game has online multiplayer features (inferred from file presence)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/networking/` — detects online feature code
2. Skill checks for authentication guard before network calls — finds none on `send_chat()`
3. Flags: "Online feature without authentication check — CONCERNS"
4. Verdict is CONCERNS (not VULNERABILITIES FOUND, as this is a missing control, not an exploit)
**Assertions:**
- [ ] Skill detects online features by scanning for networking source files
- [ ] Missing authentication checks before network operations are flagged
- [ ] Verdict is CONCERNS (advisory severity) for missing authentication guards
- [ ] Output recommends adding authentication before network calls
---
### Case 4: Edge Case — No Source Files to Analyze
**Fixture:**
- `src/` directory does not exist or is completely empty
**Input:** `/security-audit`
**Expected behavior:**
1. Skill attempts to scan `src/` — no files found
2. Skill outputs an error: "No source files found in `src/` — nothing to audit"
3. No findings report is generated
4. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when `src/` is empty or absent
- [ ] Output clearly states that no source files were found
- [ ] No verdict is emitted (there is nothing to assess)
- [ ] Skill suggests verifying the `src/` directory path
---
### Case 5: Gate Compliance — No gate; security-engineer invoked separately
**Fixture:**
- Source files exist; 1 CONCERNS-level finding detected (debug logging enabled in release build)
- `review-mode.txt` contains `full`
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans source; finds debug logging active in release path
2. No director gate is invoked regardless of review mode
3. Verdict is CONCERNS
4. Output notes: "For formal security review, consider engaging a security-engineer agent"
5. Findings are presented as a read-only report; no files written
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Security-engineer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is CONCERNS for advisory-level security findings
---
## Protocol Compliance
- [ ] Reads source files in `src/` before auditing
- [ ] Checks save data encryption, hardcoded credentials, exposed internals, auth guards
- [ ] Provides remediation recommendations for each finding
- [ ] Does not write any files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: SECURE, CONCERNS, VULNERABILITIES FOUND
---
## Coverage Notes
- Anti-cheat analysis (client-side value validation, server authority) is not
explicitly tested here; it follows the CONCERNS or VULNERABILITIES pattern
depending on severity.
- Data privacy compliance (GDPR, COPPA) is out of scope for this spec; those
require legal review beyond code scanning.
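The pattern scan underlying Cases 1 and 2 can be sketched with a couple of regular expressions. The two patterns here are illustrative examples only, not the skill's full rule set, and the finding labels are invented for this sketch.

```python
import re

# Illustrative subset of the /security-audit scan rules.
PATTERNS = {
    "hardcoded credential": re.compile(
        r'(api_key|password|token)\s*=\s*"[^"]+"', re.IGNORECASE),
    "exposed build version": re.compile(
        r'application/config/version'),
}

def scan(files: dict[str, str]) -> list[tuple[str, str]]:
    """files maps path -> source text; returns (path, finding) pairs."""
    findings = []
    for path, source in files.items():
        for label, pattern in PATTERNS.items():
            if pattern.search(source):
                findings.append((path, label))
    return findings

scan({"debug_overlay.gd": 'text = ProjectSettings.get_setting("application/config/version")'})
# flags the exposed build version from Case 2
```

A scan like this only surfaces candidates; the verdict and remediation text described in the cases sit on top of these raw findings.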

# Skill Test Spec: /tech-debt
## Skill Summary
`/tech-debt` tracks, categorizes, and prioritizes technical debt across the
codebase. It reads `docs/tech-debt-register.md` for the existing debt register
and scans source files in `src/` for inline `TODO` and `FIXME` comments. It
merges and sorts items by severity. No director gates are invoked. The skill
asks "May I write to `docs/tech-debt-register.md`?" before updating. Verdicts:
REGISTER UPDATED or NO NEW DEBT FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: REGISTER UPDATED, NO NEW DEBT FOUND
- [ ] Contains "May I write" language (skill writes to debt register)
- [ ] Has a next-step handoff (what to do after register is updated)
---
## Director Gate Checks
None. Tech debt tracking is an internal codebase analysis skill; no gates are
invoked.
---
## Test Cases
### Case 1: Happy Path — Inline TODOs plus existing register items merged
**Fixture:**
- `docs/tech-debt-register.md` exists with 2 items (LOW and MEDIUM severity)
- `src/gameplay/combat.gd` has 2 `# TODO` comments and 1 `# FIXME` comment
- `src/ui/hud.gd` has 0 inline debt comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads `docs/tech-debt-register.md` — finds 2 existing items
2. Skill scans `src/` — finds 3 inline comments (2 TODOs, 1 FIXME)
3. Skill checks whether inline comments already exist in the register (deduplication)
4. Skill presents combined list sorted by severity (FIXME before TODO by default)
5. Skill asks "May I write to `docs/tech-debt-register.md`?"
6. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] Inline comments are found by scanning `src/` recursively
- [ ] Existing register items are not duplicated
- [ ] Combined list is sorted by severity
- [ ] "May I write" prompt appears before any write
- [ ] Verdict is REGISTER UPDATED
---
### Case 2: Register Doesn't Exist — Offered to create it
**Fixture:**
- `docs/tech-debt-register.md` does NOT exist
- `src/` contains 4 inline TODO/FIXME comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill attempts to read `docs/tech-debt-register.md` — not found
2. Skill informs user: "No tech-debt-register.md found"
3. Skill offers to create the register with the inline items it found
4. Skill asks "May I write to `docs/tech-debt-register.md`?" (create)
5. User approves; register created with 4 items; verdict REGISTER UPDATED
**Assertions:**
- [ ] Skill does not crash when register file is absent
- [ ] User is offered register creation (not silently skipping)
- [ ] "May I write" prompt reflects file creation (not update)
- [ ] Verdict is REGISTER UPDATED after creation
---
### Case 3: Resolved Item Detected — Marked resolved in register
**Fixture:**
- `docs/tech-debt-register.md` has 3 items; one references `src/gameplay/legacy_input.gd`
- `src/gameplay/legacy_input.gd` has been deleted (refactored away)
- The referenced TODO comment no longer exists in source
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads register — finds 3 items
2. Skill scans `src/` — does not find the source location referenced by item 2
3. Skill flags item 2 as RESOLVED (source is gone)
4. Skill presents the resolved item to user for confirmation
5. On approval, register is updated with item 2 marked `Status: Resolved`
**Assertions:**
- [ ] Skill checks whether each register item's source reference still exists
- [ ] Missing source locations result in items being flagged as RESOLVED
- [ ] User confirms before resolved items are written
- [ ] RESOLVED items are kept in the register (not deleted) for audit history
---
### Case 4: Edge Case — CRITICAL debt item surfaces prominently
**Fixture:**
- `src/core/network_sync.gd` has a comment: `# FIXME(CRITICAL): race condition in sync buffer — can corrupt save data`
- `docs/tech-debt-register.md` exists with 5 lower-severity items
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and finds the CRITICAL-tagged FIXME
2. Skill presents the CRITICAL item at the top of the output — before the full table
3. Skill asks user to acknowledge the critical item before proceeding
4. After acknowledgment, skill presents full debt table and asks to write
5. Register is updated with CRITICAL item at top; verdict REGISTER UPDATED
**Assertions:**
- [ ] CRITICAL items appear at the top of the output, not buried in the table
- [ ] Skill surfaces CRITICAL items before asking to write
- [ ] User acknowledgment of the CRITICAL item is requested
- [ ] CRITICAL severity is preserved in the written register entry
---
### Case 5: Gate Compliance — No gate; register updated only with approval
**Fixture:**
- Inline scan finds 2 new TODOs; register has 3 existing items
- `review-mode.txt` contains `full`
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and reads register; compiles combined debt list
2. No director gate is invoked regardless of review mode
3. Skill presents sorted debt table to user
4. Skill asks "May I write to `docs/tech-debt-register.md`?"
5. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Debt table is presented before any write prompt
- [ ] "May I write" prompt appears before file update
- [ ] Write only occurs with explicit user approval
---
## Protocol Compliance
- [ ] Reads `docs/tech-debt-register.md` and scans `src/` before compiling
- [ ] Deduplicates inline comments against existing register items
- [ ] Sorts combined list by severity
- [ ] Always asks "May I write" before updating register
- [ ] No director gates are invoked
- [ ] Verdict is REGISTER UPDATED or NO NEW DEBT FOUND
---
## Coverage Notes
- The case where `src/` is empty or absent is not tested; behavior follows
the NO NEW DEBT FOUND path for the inline scan, but register items would
still be read and presented.
- TODO comments without severity tags are treated as LOW severity by default;
this classification detail is an implementation concern, not tested here.

# Skill Test Spec: /test-evidence-review
## Skill Summary
`/test-evidence-review` performs a quality review of test files in `tests/`,
checking test naming conventions, Arrange/Act/Assert structure, determinism,
isolation, and absence of hardcoded magic numbers — all against the project's
test standards defined in `coding-standards.md`. Findings may be flagged for
qa-lead review. No director gates are invoked. The skill does not write without
user approval. Verdicts: PASS, WARNINGS, or FAIL.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, WARNINGS, FAIL
- [ ] Does NOT require "May I write" language (read-only; the only write is an optional flagging report)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Test evidence review is an advisory quality skill; QL-TEST-COVERAGE gate
is a separate skill invocation and is NOT triggered here.
---
## Test Cases
### Case 1: Happy Path — Tests follow all standards
**Fixture:**
- `tests/unit/combat/health_system_take_damage_test.gd` exists with:
- Naming: `test_health_system_take_damage_reduces_health()` (follows `test_[system]_[scenario]_[expected]`)
- Arrange/Act/Assert structure present
- No `sleep()`, `await` with time values, or random seeds
- No calls to external APIs or file I/O
- No inline magic numbers (uses constants from `tests/unit/combat/fixtures/`)
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reads test standards from `coding-standards.md`
2. Skill reads the test file; checks all 5 standards
3. All checks pass: naming, structure, determinism, isolation, no hardcoded data
4. Verdict is PASS
**Assertions:**
- [ ] Each of the 5 test standards is checked and reported
- [ ] All checks show PASS when standards are met
- [ ] Verdict is PASS
- [ ] No files are written
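A minimal sketch of a test shaped like this fixture, assuming a GUT-style harness; `HealthSystem` and the fixture constant names are hypothetical, not defined by the spec:

```gdscript
# Follows test_[system]_[scenario]_[expected] naming, Arrange/Act/Assert
# structure, no timers or randomness, and constants instead of inline
# magic numbers.
const DamageFixtures := preload("res://tests/unit/combat/fixtures/damage_fixtures.gd")

func test_health_system_take_damage_reduces_health() -> void:
    # Arrange
    var health := HealthSystem.new(DamageFixtures.STARTING_HEALTH)
    # Act
    health.take_damage(DamageFixtures.BASE_DAMAGE)
    # Assert
    assert_eq(health.current, DamageFixtures.STARTING_HEALTH - DamageFixtures.BASE_DAMAGE)
```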
---
### Case 2: Fail — Timing dependency detected
**Fixture:**
- `tests/unit/ui/hud_update_test.gd` contains:
```gdscript
await get_tree().create_timer(1.0).timeout
assert_eq(label.text, "Ready")
```
- Real-time wait of 1 second used instead of mock or signal-based assertion
**Input:** `/test-evidence-review tests/unit/ui/hud_update_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects real-time wait (`create_timer(1.0)`) — non-deterministic timing dependency
3. Skill flags this as a FAIL-level finding
4. Verdict is FAIL
5. Skill recommends replacing the timer with a signal-based assertion or mock
**Assertions:**
- [ ] Real-time wait usage is detected as a non-deterministic timing dependency
- [ ] Finding is classified as FAIL severity (blocking — violates determinism standard)
- [ ] Verdict is FAIL
- [ ] Remediation suggestion references signal-based or mock-based approach
- [ ] Skill does not edit the test file
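A sketch of the signal-based replacement this remediation points to, assuming a GUT-style harness where `await` can suspend on a signal; `hud`, its `label`, and the `status_changed` signal are illustrative names:

```gdscript
# Instead of waiting a fixed 1.0 s of wall-clock time, await the signal
# that actually drives the label update, which is deterministic
# regardless of machine speed.
func test_hud_label_shows_ready_on_status_changed() -> void:
    var hud := preload("res://src/ui/hud.tscn").instantiate()
    add_child_autofree(hud)
    hud.set_status("ready")       # trigger the state change under test
    await hud.status_changed      # resolves as soon as the HUD reacts
    assert_eq(hud.label.text, "Ready")
```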
---
### Case 3: Fail — Test calls external API directly
**Fixture:**
- `tests/unit/networking/auth_test.gd` contains:
```gdscript
var result = HTTPRequest.new().request("https://api.example.com/auth")
```
- Direct HTTP call to external API without a mock
**Input:** `/test-evidence-review tests/unit/networking/auth_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects direct external API call (HTTPRequest to live URL)
3. Skill flags this as a FAIL-level finding — violates isolation standard
4. Verdict is FAIL
5. Skill recommends injecting a mock HTTP client
**Assertions:**
- [ ] Direct external API call is detected and flagged
- [ ] Finding is classified as FAIL severity (violates isolation standard)
- [ ] Verdict is FAIL
- [ ] Remediation references dependency injection with a mock HTTP client
- [ ] Skill does not modify the test file
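A hedged sketch of the dependency-injection remediation, assuming the production code accepts an injected client object; `AuthService`, `MockHTTPClient`, and their methods are hypothetical:

```gdscript
# A stub that satisfies the same interface the production code calls,
# so the test never touches the network.
class MockHTTPClient:
    var canned_response := {"status": 200, "body": {"token": "test-token"}}
    func request_json(_url: String, _payload: Dictionary) -> Dictionary:
        return canned_response

func test_auth_service_stores_token_on_success() -> void:
    var auth := AuthService.new(MockHTTPClient.new())  # inject the stub
    auth.login("user", "pass")
    assert_eq(auth.token, "test-token")
```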
---
### Case 4: Edge Case — No Test Files Found
**Fixture:**
- User calls `/test-evidence-review tests/unit/audio/`
- `tests/unit/audio/` directory does not exist
**Input:** `/test-evidence-review tests/unit/audio/`
**Expected behavior:**
1. Skill attempts to read files in `tests/unit/audio/` — not found
2. Skill outputs: "No test files found at `tests/unit/audio/` — run `/test-setup` to scaffold test directories"
3. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the message
- [ ] Output recommends `/test-setup` for scaffolding
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; QL-TEST-COVERAGE is a separate skill
**Fixture:**
- Test file has 1 WARNINGS-level finding (magic number in a non-boundary test)
- `review-mode.txt` contains `full`
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reviews tests; finds 1 WARNINGS-level finding
2. No director gate is invoked (QL-TEST-COVERAGE is invoked separately, not here)
3. Verdict is WARNINGS
4. Output notes: "For full test coverage gate, run `/gate-check` which invokes QL-TEST-COVERAGE"
5. Skill offers optional report write; asks "May I write" if user opts in
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Output distinguishes this skill from the QL-TEST-COVERAGE gate invocation
- [ ] Optional report requires "May I write" before writing
- [ ] Verdict is WARNINGS for advisory-level test quality issues
---
## Protocol Compliance
- [ ] Reads `coding-standards.md` test standards before reviewing test files
- [ ] Checks naming, Arrange/Act/Assert structure, determinism, isolation, no hardcoded data
- [ ] Does not edit any test files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: PASS, WARNINGS, FAIL
---
## Coverage Notes
- Batch review of all test files in `tests/` is not explicitly tested; behavior
is assumed to apply the same checks file by file and aggregate the verdict.
- The QL-TEST-COVERAGE director gate (which checks test coverage percentage) is
a separate concern and is intentionally NOT invoked by this skill.

# Skill Test Spec: /test-flakiness
## Skill Summary
`/test-flakiness` detects non-deterministic tests by analyzing test history logs
(if available) or scanning test source code for common flakiness patterns (random
numbers without seeds, real-time waits, external I/O). No director gates are
invoked. The skill does not write without user approval. Verdicts: NO FLAKINESS,
SUSPECT TESTS FOUND, or CONFIRMED FLAKY.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after flakiness findings)
---
## Director Gate Checks
None. Flakiness detection is an advisory quality skill for the QA lead; no gates
are invoked.
---
## Test Cases
### Case 1: Happy Path — Clean test history, no flakiness
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- All tests pass consistently across all 10 runs (100% pass rate per test)
- No test has a failure pattern
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs from `production/qa/test-history/`
2. Skill computes per-test pass rate across 10 runs
3. All tests pass all 10 runs — no inconsistency detected
4. Verdict is NO FLAKINESS
**Assertions:**
- [ ] Skill reads test history logs when available
- [ ] Per-test pass rate is computed across all available runs
- [ ] Verdict is NO FLAKINESS when all tests pass consistently
- [ ] No files are written
---
### Case 2: Suspect Tests Found — Test fails intermittently in history
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- `test_combat_damage_applies_crit_multiplier` passes 7 times, fails 3 times
- Failure messages differ (sometimes timeout, sometimes wrong value)
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs — computes pass rates
2. `test_combat_damage_applies_crit_multiplier` has 70% pass rate (threshold: 95%)
3. Skill flags it as SUSPECT with pass rate (7/10) and failure pattern noted
4. Verdict is SUSPECT TESTS FOUND
5. Skill recommends investigating the test for timing or state dependencies
**Assertions:**
- [ ] Tests below the pass-rate threshold are flagged by name
- [ ] Pass rate (fraction and percentage) is shown for each suspect test
- [ ] Failure pattern (e.g., inconsistent error messages) is noted if detectable
- [ ] Verdict is SUSPECT TESTS FOUND
- [ ] Skill recommends investigation steps
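The pass-rate classification described above can be sketched as follows; the 95% threshold and the data shape are assumptions (per the Coverage Notes, the exact value is an implementation detail):

```gdscript
const SUSPECT_THRESHOLD := 0.95  # assumed value; the spec leaves the threshold open

# results: one bool per historical run for a single test (true = pass).
func classify_test(results: Array) -> String:
    var passes := results.count(true)
    var rate := float(passes) / results.size()
    if rate < SUSPECT_THRESHOLD:
        return "SUSPECT (%d/%d runs passed, %d%%)" % [passes, results.size(), roundi(rate * 100.0)]
    return "CONSISTENT"
```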
---
### Case 3: Source Pattern — Random number used without seed
**Fixture:**
- No test history logs exist
- `tests/unit/loot/loot_drop_test.gd` contains:
```gdscript
var roll = randf() # unseeded random — non-deterministic
assert_gt(roll, 0.5, "Loot should drop above 50%")
```
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill finds no test history logs
2. Skill falls back to source code analysis
3. Skill detects `randf()` call without a preceding `seed()` call
4. Skill flags the test as FLAKINESS RISK (source pattern, not confirmed)
5. Verdict is SUSPECT TESTS FOUND (pattern detected, not confirmed by history)
6. Skill recommends seeding random before the call or mocking the random function
**Assertions:**
- [ ] Source code analysis is used as fallback when no history logs exist
- [ ] Unseeded random number usage is detected as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND (not CONFIRMED FLAKY — no history to confirm)
- [ ] Remediation recommends seeding or mocking
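A minimal sketch of the seeding remediation, assuming a GUT-style `assert_eq`; the seed value and test body are illustrative, not prescribed by the spec:

```gdscript
# Deterministic variant: give the code under test its own RNG with a
# fixed seed, so the sequence is identical on every run.
func test_loot_roll_is_deterministic_with_seeded_rng() -> void:
    var rng := RandomNumberGenerator.new()
    rng.seed = 12345                     # fixed seed: same sequence every run
    var roll := rng.randf()
    var repeat_rng := RandomNumberGenerator.new()
    repeat_rng.seed = 12345
    assert_eq(roll, repeat_rng.randf())  # identical draws confirm determinism
```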
---
### Case 4: No Test History — Source-only analysis with common patterns
**Fixture:**
- `production/qa/test-history/` does not exist
- `tests/` contains 15 test files
- Scan finds 2 tests using `OS.get_ticks_msec()` for timing assertions
- No other flakiness patterns found
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill checks for test history — not found
2. Skill notes: "No test history available — analyzing source code for flakiness patterns only"
3. Skill scans all test files for known patterns: unseeded random, real-time waits, system clock usage
4. Finds 2 tests using `OS.get_ticks_msec()` — flags as FLAKINESS RISK
5. Verdict is SUSPECT TESTS FOUND
**Assertions:**
- [ ] Skill notes clearly that source-only analysis is being performed (no history)
- [ ] Common flakiness patterns are scanned: random, time-based assertions, external I/O
- [ ] `OS.get_ticks_msec()` usage for assertions is flagged as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND when source patterns are found
---
### Case 5: Gate Compliance — No gate; flakiness report is advisory
**Fixture:**
- Test history shows 1 CONFIRMED FLAKY test (fails 6 out of 10 runs)
- `review-mode.txt` contains `full`
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill analyzes test history; identifies 1 confirmed flaky test
2. No director gate is invoked regardless of review mode
3. Verdict is CONFIRMED FLAKY
4. Skill presents findings and offers optional written report
5. If user opts in: "May I write to `production/qa/flakiness-report-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] CONFIRMED FLAKY verdict requires history-based evidence (not just source patterns)
- [ ] Optional report requires "May I write" before writing
- [ ] Flakiness report is advisory for qa-lead; skill does not auto-disable tests
---
## Protocol Compliance
- [ ] Reads test history logs when available; falls back to source analysis when not
- [ ] Notes clearly which analysis mode is being used (history vs. source-only)
- [ ] Flakiness threshold (e.g., 95% pass rate) is used for SUSPECT classification
- [ ] CONFIRMED FLAKY requires history evidence; SUSPECT covers source patterns only
- [ ] Does not disable or modify any test files
- [ ] No director gates are invoked
- [ ] Verdict is one of: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
---
## Coverage Notes
- The pass-rate threshold for SUSPECT classification (95% suggested above) is an
implementation detail; the tests verify that intermittent failures are flagged,
not the exact threshold value.
- Tests that fail due to environment issues (missing assets, wrong platform) are
not flakiness — the skill distinguishes environment failures from non-determinism
in the test itself; this distinction is not explicitly tested here.

# Skill Test Spec: /architecture-decision
## Skill Summary
`/architecture-decision` guides the user through section-by-section authoring of
a new Architecture Decision Record (ADR). Required sections are: Status, Context,
Decision, Consequences, Alternatives, and Related ADRs. The skill also stamps the
engine version reference from `docs/engine-reference/` into the ADR for traceability.
In `full` review mode, TD-ADR (technical-director) and LP-FEASIBILITY
(lead-programmer) gate agents spawn after the draft is complete. If both gates
return APPROVED, the ADR status is set to Accepted. In `lean` or `solo` mode,
both gates are skipped and the ADR is written with Status: Proposed. The skill
asks "May I write" per section during authoring. ADRs are written to
`docs/architecture/adr-NNN-[name].md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: ACCEPTED, PROPOSED, CONCERNS
- [ ] Contains "May I write" collaborative protocol language (per-section approval)
- [ ] Has a next-step handoff at the end
- [ ] Documents gate behavior: TD-ADR + LP-FEASIBILITY in full mode; skipped in lean/solo
- [ ] Documents that ADR status is Accepted (full, gates approve) or Proposed (otherwise)
- [ ] Mentions engine version stamp from `docs/engine-reference/`
---
## Director Gate Checks
In `full` mode: TD-ADR (technical-director) and LP-FEASIBILITY (lead-programmer)
spawn after the ADR draft is complete. If both return APPROVED, ADR Status is set
to Accepted. If either returns CONCERNS or FAIL, ADR stays Proposed.
In `lean` mode: both gates are skipped. ADR is written with Status: Proposed.
Output notes: "TD-ADR skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
In `solo` mode: both gates are skipped. ADR is written with Status: Proposed.
---
## Test Cases
### Case 1: Happy Path — New ADR for rendering approach, full mode, gates approve
**Fixture:**
- `docs/architecture/` exists with no existing ADR for rendering
- `docs/engine-reference/[engine]/VERSION.md` exists
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/architecture-decision rendering-approach`
**Expected behavior:**
1. Skill guides user through each required section (Status, Context, Decision, Consequences, Alternatives, Related ADRs)
2. Engine version is stamped into the ADR from `docs/engine-reference/`
3. For each section: draft shown, "May I write this section?" asked, approved
4. After all sections: TD-ADR and LP-FEASIBILITY gates spawn in parallel
5. Both gates return APPROVED
6. ADR Status is set to Accepted
7. Skill writes `docs/architecture/adr-NNN-rendering-approach.md`
8. `docs/architecture/tr-registry.yaml` updated if new TR-IDs are defined
**Assertions:**
- [ ] All 6 required sections are authored and written
- [ ] Engine version reference is stamped in the ADR
- [ ] TD-ADR and LP-FEASIBILITY spawn in parallel (not sequentially)
- [ ] ADR Status is Accepted when both gates return APPROVED in full mode
- [ ] "May I write" is asked per section during authoring
- [ ] File is written to `docs/architecture/adr-NNN-[name].md`
---
### Case 2: Failure Path — TD-ADR returns CONCERNS
**Fixture:**
- ADR draft is complete (all sections filled)
- `production/session-state/review-mode.txt` contains `full`
- TD-ADR gate returns CONCERNS: "The decision does not address [specific concern]"
**Input:** `/architecture-decision [topic]`
**Expected behavior:**
1. TD-ADR gate spawns and returns CONCERNS with specific feedback
2. Skill surfaces the concerns to the user
3. ADR Status remains Proposed (not Accepted)
4. User is asked: revise the decision to address concerns, or accept as Proposed
5. ADR is written with Status: Proposed if concerns are not resolved
**Assertions:**
- [ ] TD-ADR concerns are shown to the user verbatim
- [ ] ADR Status is Proposed (not Accepted) when TD-ADR returns CONCERNS
- [ ] Skill does NOT set Status: Accepted while CONCERNS are unresolved
- [ ] User is given the option to revise and re-run the gate
---
### Case 3: Lean Mode — Both gates skipped; ADR written as Proposed
**Fixture:**
- `production/session-state/review-mode.txt` contains `lean`
- ADR draft is authored for a new technical decision
**Input:** `/architecture-decision [topic]`
**Expected behavior:**
1. Skill guides user through all 6 sections
2. After draft is complete: both TD-ADR and LP-FEASIBILITY are skipped
3. Output notes: "TD-ADR skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode"
4. ADR is written with Status: Proposed (not Accepted, since gates did not approve)
5. "May I write" is still asked before the final file write
**Assertions:**
- [ ] Both gate skip notes appear in output
- [ ] ADR Status is Proposed (not Accepted) in lean mode
- [ ] "May I write" is still asked before writing the file
- [ ] Skill writes the ADR after user approval
---
### Case 4: Edge Case — ADR already exists for this topic
**Fixture:**
- `docs/architecture/` contains an existing ADR covering the same topic
- The existing ADR has Status: Accepted
**Input:** `/architecture-decision [same-topic]`
**Expected behavior:**
1. Skill detects an existing ADR covering the same topic
2. Skill asks: "An ADR for [topic] already exists ([filename]). Update it, or create a new superseding ADR?"
3. User selects update or supersede
4. Skill does NOT silently create a duplicate ADR
**Assertions:**
- [ ] Skill detects the existing ADR before authoring begins
- [ ] User is offered update or supersede options — no silent duplicate
- [ ] If update: skill opens the existing ADR for section-by-section revision
- [ ] If supersede: new ADR references the superseded one in Related ADRs section
---
### Case 5: Director Gate — Status set correctly based on mode and gate outcome
**Fixture:**
- ADR draft is complete
- Two scenarios: (a) full mode, both gates APPROVED; (b) full mode, one gate CONCERNS
**Full mode, both APPROVED:**
- ADR Status is set to Accepted
**Assertions (both approved):**
- [ ] ADR frontmatter/header shows `Status: Accepted`
- [ ] Both TD-ADR and LP-FEASIBILITY appear as APPROVED in output
**Full mode, one gate returns CONCERNS:**
- ADR Status stays Proposed
**Assertions (CONCERNS):**
- [ ] ADR frontmatter/header shows `Status: Proposed`
- [ ] Concerns are listed in output
- [ ] Skill does NOT set Status: Accepted when any gate returns CONCERNS
**Lean/solo mode:**
- ADR Status is always Proposed regardless of content quality
**Assertions (lean/solo):**
- [ ] ADR Status is Proposed in lean mode
- [ ] ADR Status is Proposed in solo mode
- [ ] No gate output appears in lean or solo mode
---
## Protocol Compliance
- [ ] All 6 required sections authored before gate review
- [ ] Engine version stamped in ADR from `docs/engine-reference/`
- [ ] "May I write" asked per section during authoring
- [ ] TD-ADR and LP-FEASIBILITY spawn in parallel in full mode
- [ ] Skipped gates noted by name and mode in lean/solo output
- [ ] ADR Status: Accepted only when full mode AND both gates APPROVED
- [ ] Ends with next-step handoff: `/architecture-review` or `/create-control-manifest`
---
## Coverage Notes
- ADR numbering (auto-incrementing NNN) is not independently fixture-tested —
the skill reads existing ADR filenames to assign the next number.
- Related ADRs section linking (supersedes / related-to) is tested structurally
via Case 4 but not all link types are individually verified.
- The TR-registry update (when new TR-IDs are defined in the ADR) is part of the
write phase — tested implicitly via Case 1.

# Skill Test Spec: /art-bible
## Skill Summary
`/art-bible` is a guided, section-by-section art bible authoring skill. It
produces a comprehensive visual direction document covering: Visual Style overview,
Color Palette, Typography, Character Design Rules, Environment Style, and UI
Visual Language. The skill follows the skeleton-first pattern: it creates the
file with all section headers immediately, drafts each section through
discussion, and writes each section to disk only after user approval.
In `full` review mode, the AD-ART-BIBLE director gate (art director) runs after
the draft is complete and before any section is written. In `lean` and `solo`
modes, AD-ART-BIBLE is skipped and only user approval is required. The verdict
is COMPLETE when all sections are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language per section
- [ ] Documents the AD-ART-BIBLE director gate and its mode behavior
- [ ] Has a next-step handoff (e.g., `/asset-spec` or `/design-system`)
---
## Director Gate Checks
| Gate ID | Trigger condition | Mode guard |
|--------------|--------------------------------|-----------------------|
| AD-ART-BIBLE | After draft is complete | full only (not lean/solo) |
---
## Test Cases
### Case 1: Happy Path — Full mode, art bible drafted, AD-ART-BIBLE approves
**Fixture:**
- No existing `design/art-bible.md`
- `production/session-state/review-mode.txt` contains `full`
- `design/gdd/game-concept.md` exists with visual tone described
**Input:** `/art-bible`
**Expected behavior:**
1. Skill creates skeleton `design/art-bible.md` with all section headers
2. Skill discusses and drafts each section with user collaboration
3. After all sections are drafted, AD-ART-BIBLE gate is invoked (art director review)
4. AD-ART-BIBLE returns APPROVED
5. Skill asks "May I write section [N] to `design/art-bible.md`?" per section
6. All sections written after approval; verdict is COMPLETE
**Assertions:**
- [ ] Skeleton file is created first (before any section content is written)
- [ ] AD-ART-BIBLE gate is invoked in full mode after draft is complete
- [ ] Gate approval precedes the "May I write" section asks
- [ ] All sections are present in the final file
- [ ] Verdict is COMPLETE
---
### Case 2: AD-ART-BIBLE Returns CONCERNS — Section revised before writing
**Fixture:**
- Art bible draft complete
- `production/session-state/review-mode.txt` contains `full`
- AD-ART-BIBLE gate returns CONCERNS: "Color palette clashes with the dark
atmospheric tone described in the game concept"
**Input:** `/art-bible`
**Expected behavior:**
1. AD-ART-BIBLE gate returns CONCERNS with specific feedback about palette
2. Skill surfaces feedback to user: "Art director has concerns about the color palette"
3. Skill returns to the Color Palette section for revision
4. User and skill revise the palette to align with game concept tone
5. AD-ART-BIBLE is not re-invoked (user decides to proceed after revision)
6. Revised section is written after "May I write" approval; verdict is COMPLETE
**Assertions:**
- [ ] CONCERNS are shown to user before any section is written
- [ ] Skill returns to the affected section for revision (not all sections)
- [ ] Revised content (not original) is written to file
- [ ] Verdict is COMPLETE after revision and approval
---
### Case 3: Lean Mode — AD-ART-BIBLE Skipped, Written With User Approval Only
**Fixture:**
- No existing art bible
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/art-bible`
**Expected behavior:**
1. Skill reads review mode — determines `lean`
2. Skill drafts all sections with user collaboration
3. AD-ART-BIBLE gate is skipped: output notes "[AD-ART-BIBLE] skipped — lean mode"
4. Skill asks user for direct approval of each section
5. Sections are written after user confirmation; verdict is COMPLETE
**Assertions:**
- [ ] AD-ART-BIBLE gate is NOT invoked in lean mode
- [ ] Skip is explicitly noted: "[AD-ART-BIBLE] skipped — lean mode"
- [ ] User approval is still required per section (gate skip ≠ approval skip)
- [ ] Verdict is COMPLETE
---
### Case 4: Existing Art Bible — Retrofit Mode
**Fixture:**
- `design/art-bible.md` already exists with all sections populated
- User wants to update the Character Design Rules section
**Input:** `/art-bible`
**Expected behavior:**
1. Skill reads existing art bible and detects all sections populated
2. Skill offers retrofit: "Art bible exists — which section would you like to update?"
3. User selects Character Design Rules
4. Skill drafts updated content; in full mode, AD-ART-BIBLE is invoked for the
revised section before writing
5. Skill asks "May I write Character Design Rules to `design/art-bible.md`?"
6. Only that section is updated; other sections preserved; verdict is COMPLETE
**Assertions:**
- [ ] Existing art bible is detected and retrofit is offered
- [ ] Only the selected section is updated
- [ ] In full mode: AD-ART-BIBLE gate runs even for single-section retrofit
- [ ] Other sections are preserved
- [ ] Verdict is COMPLETE
---
### Case 5: Solo Mode — AD-ART-BIBLE Skipped, Noted in Output
**Fixture:**
- No existing art bible
- `production/session-state/review-mode.txt` contains `solo`
**Input:** `/art-bible`
**Expected behavior:**
1. Skill reads review mode — determines `solo`
2. Art bible is drafted and written with only user approval
3. AD-ART-BIBLE gate is skipped: output notes "[AD-ART-BIBLE] skipped — solo mode"
4. No director agents are spawned
5. Verdict is COMPLETE
**Assertions:**
- [ ] AD-ART-BIBLE gate is NOT invoked in solo mode
- [ ] Skip is explicitly noted with "solo mode" label
- [ ] No director agents of any kind are spawned
- [ ] Verdict is COMPLETE
---
## Protocol Compliance
- [ ] Creates skeleton file immediately with all section headers
- [ ] Discusses and drafts one section at a time
- [ ] AD-ART-BIBLE gate runs in full mode after all sections are drafted
- [ ] AD-ART-BIBLE is skipped in lean and solo modes — noted by name
- [ ] Asks "May I write section [N]" per section
- [ ] Verdict is COMPLETE when all sections are written
---
## Coverage Notes
- The case where AD-ART-BIBLE returns REJECT (not just CONCERNS) is not
separately tested; the skill would block writing and ask the user how to
proceed (revise or override).
- The Typography section is listed as a required art bible section but its
specific content requirements are not assertion-tested here.
- The art bible feeds into `/asset-spec` — this relationship is noted in the
handoff but not tested as part of this skill's spec.

# Skill Test Spec: /create-architecture
## Skill Summary
`/create-architecture` guides the user through section-by-section authoring of a
technical architecture document. It uses a skeleton-first approach — the file is
created with all required section headers before any content is filled. Each
section is discussed, drafted, and written individually after user approval. If an
architecture document already exists, the skill offers retrofit mode to update
specific sections.
In `full` review mode, TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
(lead-programmer) spawn after the complete draft is finished. In `lean` or `solo`
mode, both gates are skipped. The skill writes to `docs/architecture/architecture.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Contains "May I write" collaborative protocol language (per-section approval)
- [ ] Has a next-step handoff at the end (`/architecture-review` or `/create-control-manifest`)
- [ ] Documents skeleton-first approach
- [ ] Documents gate behavior: TD-ARCHITECTURE + LP-FEASIBILITY in full mode; skipped in lean/solo
- [ ] Documents retrofit mode for existing architecture documents
---
## Director Gate Checks
In `full` mode: TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
(lead-programmer) spawn in parallel after all sections are drafted and before
any final approval write.
In `lean` mode: both gates are skipped. Output notes:
"TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
In `solo` mode: both gates are skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — New architecture doc, skeleton-first, full mode gates approve
**Fixture:**
- No existing `docs/architecture/architecture.md`
- `docs/architecture/` contains Accepted ADRs for reference
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/create-architecture`
**Expected behavior:**
1. Skill creates skeleton `docs/architecture/architecture.md` with all required section headers
2. For each section: drafts content, shows draft, asks "May I write [section]?", writes after approval
3. After all sections are drafted: TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel
4. Both gates return APPROVED
5. Final "May I confirm architecture is complete?" asked
6. Session state updated
**Assertions:**
- [ ] Skeleton file is created with all section headers before any content is written
- [ ] "May I write [section]?" asked per section during authoring
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel (not sequentially)
- [ ] Both gates complete before the final completion confirmation
- [ ] Verdict is APPROVED when both gates return APPROVED
- [ ] Next-step handoff to `/architecture-review` or `/create-control-manifest` is present
---
### Case 2: Failure Path — TD-ARCHITECTURE returns MAJOR REVISION
**Fixture:**
- Architecture doc is fully drafted (all sections)
- `production/session-state/review-mode.txt` contains `full`
- TD-ARCHITECTURE gate returns MAJOR REVISION: "[specific structural issue]"
**Input:** `/create-architecture`
**Expected behavior:**
1. All sections are drafted and written
2. TD-ARCHITECTURE gate runs and returns MAJOR REVISION with specific feedback
3. Skill surfaces the feedback to the user
4. Architecture is NOT marked as finalized
5. User is asked whether to revise the flagged sections or accept the document as a draft
**Assertions:**
- [ ] Architecture is NOT marked finalized when TD-ARCHITECTURE returns MAJOR REVISION
- [ ] Gate feedback is shown to the user with specific issue descriptions
- [ ] User is given the option to revise specific sections
- [ ] Skill does NOT auto-finalize despite MAJOR REVISION feedback
---
### Case 3: Lean Mode — Both gates skipped; architecture written with user approval only
**Fixture:**
- No existing architecture doc
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/create-architecture`
**Expected behavior:**
1. Skeleton file is created
2. All sections are authored and written per-section with user approval
3. After completion: TD-ARCHITECTURE and LP-FEASIBILITY are skipped
4. Output notes: "TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode"
5. Architecture is considered complete based on user approval alone
**Assertions:**
- [ ] Both gate skip notes appear in output
- [ ] Architecture document is written with only user approval in lean mode
- [ ] Skill does NOT block completion because gates were skipped
- [ ] Next-step handoff is still present
---
### Case 4: Retrofit Mode — Existing architecture doc, user updates a section
**Fixture:**
- `docs/architecture/architecture.md` already exists with all sections populated
**Input:** `/create-architecture`
**Expected behavior:**
1. Skill detects existing architecture doc and reads its current content
2. Skill offers retrofit mode: "Architecture doc already exists. Which section would you like to update?"
3. User selects a section
4. Skill authors only that section, asks "May I write [section]?"
5. Only the selected section is updated — other sections unchanged
**Assertions:**
- [ ] Skill detects and reads the existing architecture doc before offering retrofit
- [ ] User is asked which section to update — not asked to rewrite the whole document
- [ ] Only the selected section is updated
- [ ] Other sections are not modified during a retrofit session
---
### Case 5: Director Gate — Architecture references a Proposed ADR; flagged as risk
**Fixture:**
- Architecture doc is being authored
- One section references or depends on an ADR that has `Status: Proposed`
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/create-architecture`
**Expected behavior:**
1. Skill authors all sections
2. During authoring, skill detects a reference to a Proposed ADR
3. Skill flags: "Note: [section] references ADR-NNN which is Proposed — this is a risk until the ADR is accepted"
4. Risk flag is embedded in the relevant section's content
5. TD-ARCHITECTURE and LP-FEASIBILITY still run — they are informed of the Proposed ADR risk
**Assertions:**
- [ ] Proposed ADR reference is detected and flagged during section authoring
- [ ] Risk note is embedded in the architecture document section
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY still spawn (the risk does not block the gates)
- [ ] Risk flag names the specific ADR number and title
---
## Protocol Compliance
- [ ] Skeleton file created with all section headers before any content is written
- [ ] "May I write [section]?" asked per section during authoring
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel in full mode
- [ ] Skipped gates noted by name and mode in lean/solo output
- [ ] Proposed ADR references flagged as risks in the document
- [ ] Ends with next-step handoff: `/architecture-review` or `/create-control-manifest`
---
## Coverage Notes
- The required section list for architecture documents is defined in the skill
body and in the `/architecture-review` skill — not re-enumerated here.
- Engine version stamping in the architecture doc (parallel to ADR stamping)
is part of the authoring workflow — tested implicitly via Case 1.
- The retrofit mode for updating multiple sections in one session follows the
same per-section approval pattern — not independently tested for multi-section
retrofits.

# Skill Test Spec: /design-system
## Skill Summary
`/design-system` guides the user through section-by-section authoring of a Game
Design Document (GDD) for a single game system. All 8 required sections must be
authored: Overview, Player Fantasy, Detailed Rules, Formulas, Edge Cases,
Dependencies, Tuning Knobs, and Acceptance Criteria. The skill uses a
skeleton-first approach — it creates the GDD file with all 8 section headers
before filling any content — and writes each section individually after approval.
The CD-GDD-ALIGN gate (creative-director) runs in both `full` AND `lean` modes.
It is only skipped in `solo` mode. If an existing GDD file is found, the skill
offers a retrofit mode to update specific sections rather than rewriting the whole
document.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION
- [ ] Contains "May I write" collaborative protocol language (per-section approval)
- [ ] Has a next-step handoff at the end
- [ ] Documents skeleton-first approach (file created with headers before content)
- [ ] Documents CD-GDD-ALIGN gate: active in full AND lean mode; skipped in solo only
- [ ] Documents retrofit mode for existing GDD files
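A minimal sketch of how `/skill-test static` might verify assertions like these, assuming specs are plain markdown with YAML-style frontmatter. The function name and the naive substring matching are illustrative, not the framework's actual implementation:

```python
REQUIRED_FIELDS = ("name", "description", "argument-hint",
                   "user-invocable", "allowed-tools")
VERDICT_KEYWORDS = ("APPROVED", "NEEDS REVISION", "MAJOR REVISION")

def static_check(spec_text: str) -> list:
    """Return a list of failed structural assertions for a skill spec."""
    failures = []
    for field in REQUIRED_FIELDS:
        if f"{field}:" not in spec_text:
            failures.append(f"missing frontmatter field: {field}")
    for keyword in VERDICT_KEYWORDS:
        if keyword not in spec_text:
            failures.append(f"missing verdict keyword: {keyword}")
    if "May I write" not in spec_text:
        failures.append("missing collaborative 'May I write' language")
    return failures
```

An empty return value means the spec passes every structural check.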
---
## Director Gate Checks
In `full` mode: CD-GDD-ALIGN (creative-director) gate runs after each section is
drafted, before writing. If MAJOR REVISION is returned, the section must be
rewritten before proceeding.
In `lean` mode: CD-GDD-ALIGN still runs; only solo mode skips this gate.
In `solo` mode: CD-GDD-ALIGN is skipped. Output notes:
"CD-GDD-ALIGN skipped — solo mode". Sections are written with only user approval.
---
## Test Cases
### Case 1: Happy Path — New GDD, skeleton-first, CD-GDD-ALIGN in lean mode
**Fixture:**
- No existing GDD for the target system in `design/gdd/`
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Skill creates skeleton file `design/gdd/[system-name].md` with all 8 section headers (empty bodies)
2. For each section: discusses with user, drafts content, shows draft
3. CD-GDD-ALIGN gate runs on each section draft (lean mode — gate is active)
4. Gate returns APPROVED for each section
5. "May I write [section]?" asked after gate approval
6. Section written to file after user approval
7. Process repeats for all 8 sections
**Assertions:**
- [ ] Skeleton file is created with all 8 section headers before any content is written
- [ ] CD-GDD-ALIGN runs on each section in lean mode (not skipped)
- [ ] "May I write" is asked per section (not once for all sections)
- [ ] Each section is written individually after gate + user approval
- [ ] All 8 sections are present in the final GDD file
---
### Case 2: Retrofit Mode — Existing GDD, update specific section
**Fixture:**
- `design/gdd/[system-name].md` already exists with all 8 sections populated
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Skill detects existing GDD file and reads its current content
2. Skill offers retrofit mode: "GDD already exists. Which section would you like to update?"
3. User selects a specific section (e.g., Formulas)
4. Skill authors only that section, runs CD-GDD-ALIGN, asks "May I write?"
5. Only the selected section is updated — other sections are not modified
**Assertions:**
- [ ] Skill detects and reads existing GDD before offering retrofit mode
- [ ] User is asked which section to update — not asked to rewrite the whole document
- [ ] Only the selected section is rewritten — others remain unchanged
- [ ] CD-GDD-ALIGN still runs on the updated section
- [ ] "May I write" is asked before updating the section
---
### Case 3: Director Gate — CD-GDD-ALIGN returns MAJOR REVISION
**Fixture:**
- New GDD being authored
- `production/session-state/review-mode.txt` contains `lean`
- CD-GDD-ALIGN gate returns MAJOR REVISION on the Player Fantasy section
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Player Fantasy section is drafted
2. CD-GDD-ALIGN gate runs and returns MAJOR REVISION with specific feedback
3. Skill surfaces the feedback to the user
4. Section is NOT written to file while MAJOR REVISION is unresolved
5. User rewrites the section in collaboration with the skill
6. CD-GDD-ALIGN runs again on the revised section
7. If revised section passes, "May I write?" is asked and section is written
**Assertions:**
- [ ] Section is NOT written when CD-GDD-ALIGN returns MAJOR REVISION
- [ ] Gate feedback is shown to the user before requesting revision
- [ ] CD-GDD-ALIGN runs again after the section is revised
- [ ] Skill does NOT auto-proceed to the next section while MAJOR REVISION is unresolved
---
### Case 4: Solo Mode — CD-GDD-ALIGN skipped; sections written with user approval only
**Fixture:**
- New GDD being authored
- `production/session-state/review-mode.txt` contains `solo`
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Skeleton file is created with 8 section headers
2. For each section: drafted, shown to user
3. CD-GDD-ALIGN is skipped — noted per section: "CD-GDD-ALIGN skipped — solo mode"
4. "May I write [section]?" asked after user reviews draft
5. Section written after user approval
6. No gate review at any stage
**Assertions:**
- [ ] "CD-GDD-ALIGN skipped — solo mode" noted for each section
- [ ] Sections are written after user approval alone (no gate required)
- [ ] Skill does NOT spawn any CD-GDD-ALIGN gate in solo mode
- [ ] Full GDD is written with only user approval in solo mode
---
### Case 5: Director Gate — Empty sections not written to file
**Fixture:**
- GDD authoring in progress
- User and skill discuss one section but do not produce any approved content
(e.g., discussion ends without a decision, or user says "skip for now")
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Section discussion produces no approved content
2. Skill does NOT write an empty or placeholder body to the section
3. The section header remains in the skeleton file but the body stays empty
4. Skill moves to the next section without writing the empty one
5. At the end, incomplete sections are listed and user is reminded to return to them
**Assertions:**
- [ ] Empty or unapproved sections are NOT written to the file
- [ ] Skeleton section header remains (preserves structure)
- [ ] Skill tracks and lists incomplete sections at the end of the session
- [ ] Skill does NOT write "TBD" or placeholder content without user approval
---
## Protocol Compliance
- [ ] Skeleton file created with all 8 headers before any content is written
- [ ] CD-GDD-ALIGN runs in both full AND lean mode (not just full)
- [ ] CD-GDD-ALIGN skipped only in solo mode — noted per section
- [ ] "May I write [section]?" asked per section (not once for the whole document)
- [ ] MAJOR REVISION from CD-GDD-ALIGN blocks section write until resolved
- [ ] Only approved, non-empty sections are written to the file
- [ ] Ends with next-step handoff: `/review-all-gdds` or `/map-systems next`
---
## Coverage Notes
- The 8 required sections are validated against the project's design document
standards defined in `CLAUDE.md` — not re-enumerated here.
- The skill's internal section-ordering logic (which section to author first) is
not independently tested — the order follows the standard GDD template.
- Pillar alignment within CD-GDD-ALIGN is evaluated holistically by the gate
  agent — specific pillar checks are not fixture-tested here.

# Skill Test Spec: /quick-design
## Skill Summary
`/quick-design` produces a lightweight design spec for features too small to
warrant a full 8-section GDD. The target scope is under 4 hours of design time
for a single-system feature. The spec uses a streamlined 3-section format:
Overview, Rules, and Acceptance Criteria.
The skill has no director gates — adding gate overhead would defeat the purpose
of a lightweight design tool. The skill asks "May I write" before writing the
design note to `design/quick-notes/[name].md`. If the feature scope is too large
for a quick-design, the skill redirects to `/design-system` instead.
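The scope gate can be sketched as a routing rule. The inputs here (an hour estimate and a system count) are assumptions about how the skill's scoping questions get summarized; the thresholds mirror the sub-4h, single-system rule:

```python
def route_design_request(estimated_hours: float, systems_touched: int) -> str:
    """Route a design request to quick-design or redirect to a full GDD."""
    if estimated_hours < 4 and systems_touched <= 1:
        return "quick-design"  # proceed with the 3-section spec
    # Too large: verdict is REDIRECTED and /design-system is named explicitly.
    return "REDIRECTED: use /design-system for a full 8-section GDD"
```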
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CREATED, BLOCKED, REDIRECTED
- [ ] Contains "May I write" collaborative protocol language (for quick-note file)
- [ ] Has a next-step handoff at the end
- [ ] Explicitly notes: no director gates (lightweight skill by design)
- [ ] Mentions scope check: redirects to `/design-system` if scope exceeds sub-4h threshold
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. The lightweight
nature of quick-design means director gate overhead is intentionally absent.
Full GDD review is not needed for sub-4-hour single-system features.
---
## Test Cases
### Case 1: Happy Path — Small UI change produces a 3-section spec
**Fixture:**
- No existing quick-note for the target feature
- Feature is clearly scoped: a single UI element change with no cross-system impact
**Input:** `/quick-design [feature-name]`
**Expected behavior:**
1. Skill asks scoping questions: what system, what change, what is the acceptance signal
2. Skill determines scope is within the sub-4h threshold
3. Skill drafts a 3-section spec: Overview, Rules, Acceptance Criteria
4. Draft is shown to user
5. "May I write `design/quick-notes/[name].md`?" is asked
6. File is written after approval
**Assertions:**
- [ ] Spec contains exactly 3 sections: Overview, Rules, Acceptance Criteria
- [ ] Draft is shown to user before "May I write" ask
- [ ] "May I write `design/quick-notes/[name].md`?" is asked before writing
- [ ] File is written to the correct path: `design/quick-notes/[name].md`
- [ ] Verdict is CREATED after successful write
---
### Case 2: Failure Path — Scope check fails; redirected to /design-system
**Fixture:**
- Feature described spans multiple systems or would take more than 4 hours of design time
(e.g., "redesign the entire combat system" or "new progression mechanic affecting all classes")
**Input:** `/quick-design [large-feature]`
**Expected behavior:**
1. Skill asks scoping questions
2. Skill determines scope exceeds the sub-4h / single-system threshold
3. Skill outputs: "This feature is too large for a quick-design. Use `/design-system [name]` for a full GDD."
4. Skill does NOT write a quick-note file
5. Verdict is REDIRECTED
**Assertions:**
- [ ] Skill detects the scope excess and stops before drafting
- [ ] Message explicitly names `/design-system` as the correct alternative
- [ ] No quick-note file is written
- [ ] Verdict is REDIRECTED (not CREATED or BLOCKED)
---
### Case 3: Edge Case — File already exists; offered to update
**Fixture:**
- `design/quick-notes/[name].md` already exists from a previous session
**Input:** `/quick-design [name]`
**Expected behavior:**
1. Skill detects existing quick-note file and reads its current content
2. Skill asks: "[name].md already exists. Update it, or create a new version?"
3. User selects update
4. Skill shows the existing spec and asks which section to revise
5. Updated spec is shown, "May I write?" asked, file updated after approval
**Assertions:**
- [ ] Skill detects and reads the existing file before offering to update
- [ ] User is offered update or create-new options — not auto-overwritten
- [ ] Only the revised section is updated (or the whole spec if user chooses full rewrite)
- [ ] "May I write" is asked before overwriting the existing file
---
### Case 4: Edge Case — No argument provided
**Fixture:**
- `design/quick-notes/` directory may or may not exist
**Input:** `/quick-design` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Skill outputs a usage error: "No feature name specified. Usage: /quick-design [feature-name]"
3. Skill provides an example: `/quick-design pause-menu-settings`
4. No file is created
**Assertions:**
- [ ] Skill outputs a usage error when no argument is given
- [ ] A usage example is shown with the correct format
- [ ] No quick-note file is written
- [ ] Skill does NOT silently pick a feature name or default to any action
---
### Case 5: Director Gate — No gate spawned; explicitly noted for sub-4h features
**Fixture:**
- Feature is within scope for quick-design
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/quick-design [feature-name]`
**Expected behavior:**
1. Skill asks scoping questions and determines scope is within threshold
2. Skill does NOT read `production/session-state/review-mode.txt`
3. Skill does NOT spawn any director gate agent
4. Spec is drafted, "May I write" asked, file written after approval
5. Output explicitly notes: "No director gate review — quick-design is for sub-4h features"
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains a note explaining why no gate review is needed
- [ ] Review mode has no effect on this skill's behavior
- [ ] Full GDD review path (`/design-system`) is mentioned as the alternative for larger features
---
## Protocol Compliance
- [ ] Scope check runs before drafting (redirects to `/design-system` if scope too large)
- [ ] 3-section format used (Overview, Rules, Acceptance Criteria) — NOT the 8-section GDD format
- [ ] Draft shown to user before "May I write" ask
- [ ] "May I write `design/quick-notes/[name].md`?" asked before writing
- [ ] No director gates — no review-mode.txt read
- [ ] Ends with next-step handoff (e.g., proceed to implementation or `/dev-story`)
---
## Coverage Notes
- The scope threshold heuristic (sub-4h, single-system) is a judgment call —
the skill's internal check is the authoritative definition and is not
independently tested by counting hours.
- The `design/quick-notes/` directory is created automatically if it does not
exist — this filesystem behavior is not independently tested here.
- Integration with the story pipeline (can a quick-design generate a story
directly?) is out of scope for this spec — quick-designs are standalone.

# Skill Test Spec: /ux-design
## Skill Summary
`/ux-design` is a guided, section-by-section UX spec authoring skill. It produces
user flow diagrams (described textually), interaction state definitions, wireframe
descriptions, and accessibility notes for a specified screen or HUD element. The
skill follows the skeleton-first pattern: it creates the file with all section
headers immediately, then fills each section through discussion and writes each
section to disk after user approval.
The skill has no inline director gates — `/ux-review` is the separate review step.
Each section requires a "May I write section [N] to [filepath]?" ask. If a UX spec
already exists for the named screen, the skill offers to retrofit individual
sections rather than replacing the whole document. Verdict is COMPLETE when all
sections are written.
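The skeleton-first step could look like this in outline. The heading format is an assumption; the section list comes from this spec, and in practice each body is filled only after a "May I write" approval:

```python
UX_SECTIONS = ("User Flows", "Interaction States",
               "Wireframe Description", "Accessibility Notes")

def build_skeleton(screen: str) -> str:
    """Return skeleton markdown for a new UX spec: headers only, empty bodies."""
    lines = [f"# UX Spec: {screen}", ""]
    for section in UX_SECTIONS:
        lines += [f"## {section}", ""]  # header now, content filled per section
    return "\n".join(lines)
```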
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language per section
- [ ] Has a next-step handoff (e.g., `/ux-review` to validate the completed spec)
---
## Director Gate Checks
None. `/ux-design` has no inline director gates. `/ux-review` is the separate
review skill invoked after this skill completes.
---
## Test Cases
### Case 1: Happy Path — New HUD spec, all sections authored and written
**Fixture:**
- No existing HUD UX spec in `design/ux/`
- Engine and rendering preferences configured
**Input:** `/ux-design hud`
**Expected behavior:**
1. Skill creates a skeleton file `design/ux/hud.md` with all section headers
2. Skill discusses and drafts each section: User Flows, Interaction States
(normal/hover/focus/disabled), Wireframe Description, Accessibility Notes
3. After each section is drafted and user confirms, skill asks "May I write
section [N] to `design/ux/hud.md`?"
4. Each section is written in sequence after approval
5. After all sections are written, verdict is COMPLETE
6. Skill suggests running `/ux-review` as the next step
**Assertions:**
- [ ] Skeleton file is created first (with empty section bodies)
- [ ] "May I write section [N]" is asked per section (not once at the end)
- [ ] All required sections are present: User Flows, Interaction States,
Wireframe Description, Accessibility Notes
- [ ] Handoff to `/ux-review` is at the end
- [ ] Verdict is COMPLETE
---
### Case 2: Existing UX Spec — Retrofit: user picks section to update
**Fixture:**
- `design/ux/hud.md` already exists with all sections populated
- User wants to update only the Accessibility Notes section
**Input:** `/ux-design hud`
**Expected behavior:**
1. Skill reads existing `design/ux/hud.md` and detects all sections are populated
2. Skill reports: "UX spec already exists for HUD — offering to retrofit"
3. Skill lists all sections and asks which to update
4. User selects Accessibility Notes
5. Skill drafts updated accessibility content and asks "May I write section
Accessibility Notes to `design/ux/hud.md`?"
6. Only that section is updated; other sections are preserved; verdict is COMPLETE
**Assertions:**
- [ ] Existing spec is detected and retrofit is offered
- [ ] User selects which section(s) to update
- [ ] Only the selected section is updated — other sections unchanged
- [ ] "May I write" is asked for the updated section
- [ ] Verdict is COMPLETE
---
### Case 3: Dependency Gap — Spec references a system with no design doc
**Fixture:**
- User is authoring a UX spec for the inventory screen
- `design/gdd/inventory.md` does not exist
**Input:** `/ux-design inventory-screen`
**Expected behavior:**
1. Skill begins authoring the inventory screen UX spec
2. During the User Flows section, skill attempts to reference inventory system rules
3. Skill detects: "No GDD found for inventory system — UX spec has a DEPENDENCY GAP"
4. The dependency gap is flagged in the spec (noted inline: "DEPENDENCY GAP: inventory GDD")
5. Skill continues authoring with placeholder notes for the missing rules
6. Verdict is COMPLETE with advisory note about the dependency gap
**Assertions:**
- [ ] DEPENDENCY GAP label appears in the spec for the missing system doc
- [ ] Skill does NOT block on the missing GDD — it continues with placeholders
- [ ] Dependency gap is also noted in the skill output (not just in the file)
- [ ] Handoff suggests both `/ux-review` and writing the missing GDD
---
### Case 4: No Argument Provided — Usage error
**Fixture:**
- No argument provided with the skill invocation
**Input:** `/ux-design`
**Expected behavior:**
1. Skill detects no screen name or argument provided
2. Skill outputs a usage error: "Screen name required. Usage: `/ux-design [screen-name]`"
3. Skill provides examples: `/ux-design hud`, `/ux-design main-menu`, `/ux-design inventory`
4. No file is created; no "May I write" is asked
**Assertions:**
- [ ] Usage error is clearly stated
- [ ] Example invocations are provided
- [ ] No file is created
- [ ] Skill does not attempt to proceed without an argument
---
### Case 5: Director Gate Check — No gate; ux-review is the separate review skill
**Fixture:**
- New screen spec with argument provided
**Input:** `/ux-design settings-menu`
**Expected behavior:**
1. Skill authors all sections of the settings menu UX spec
2. No director agents are spawned
3. No gate IDs appear in output during authoring
**Assertions:**
- [ ] No director gate is invoked during ux-design
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Creates skeleton file with all section headers before discussing content
- [ ] Discusses and drafts one section at a time
- [ ] Asks "May I write section [N]" after each section is approved
- [ ] Detects existing spec and offers retrofit path
- [ ] Ends with handoff to `/ux-review`
- [ ] Verdict is COMPLETE when all sections are written
---
## Coverage Notes
- Interaction state enumeration (normal/hover/focus/disabled/error) is a core
requirement of each spec; the `/ux-review` skill checks for completeness.
- Wireframe descriptions are text-only (no images); image references may be
added manually by a designer after the fact.
- Responsive layout concerns (different screen sizes) are noted as optional
content and not assertion-tested here.

# Skill Test Spec: /ux-review
## Skill Summary
`/ux-review` validates an existing UX spec or HUD design document against
accessibility and interaction standards. It checks for required sections
(User Flows, Interaction States, Wireframe Description, Accessibility Notes),
completeness of interaction state definitions (hover, focus, disabled, error),
accessibility compliance (keyboard navigation, color contrast notes, screen
reader considerations), and consistency with the art bible or design system
if those documents exist.
The skill is read-only — it produces no file writes. Verdicts: APPROVED
(all checks pass), NEEDS REVISION (fixable issues found), or MAJOR REVISION
NEEDED (structural or accessibility failures). No director gates apply —
`/ux-review` IS the review gate for UX specs.
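The completeness checks and verdict mapping can be sketched as follows. The structural rule for MAJOR REVISION NEEDED is simplified here to "a required section is entirely absent", and the dict-based input is an assumption for illustration:

```python
REQUIRED_SECTIONS = ("User Flows", "Interaction States",
                     "Wireframe Description", "Accessibility Notes")
REQUIRED_STATES = ("normal", "hover", "focus", "disabled", "error")

def review_ux_spec(sections: dict) -> str:
    """sections maps section name -> body text; return a verdict string."""
    if any(name not in sections for name in REQUIRED_SECTIONS):
        return "MAJOR REVISION NEEDED"  # structural section entirely absent
    if any(not body.strip() for body in sections.values()):
        return "NEEDS REVISION"  # section present but empty: fixable
    states = sections["Interaction States"].lower()
    if any(state not in states for state in REQUIRED_STATES):
        return "NEEDS REVISION"  # real output names the missing states
    return "APPROVED"
```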
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff (e.g., back to `/ux-design` for revision, or proceed to implementation)
---
## Director Gate Checks
None. `/ux-review` is itself the review gate for UX specs. No additional director
gates are invoked within this skill.
---
## Test Cases
### Case 1: Happy Path — Complete UX spec with all required sections, APPROVED
**Fixture:**
- `design/ux/hud.md` exists with all required sections populated:
- User Flows: complete player flow diagrams
- Interaction States: normal, hover, focus, disabled, error all defined
- Wireframe Description: layout described
- Accessibility Notes: keyboard nav, contrast ratios, screen reader notes
**Input:** `/ux-review hud`
**Expected behavior:**
1. Skill reads `design/ux/hud.md`
2. Skill checks all 4 required sections — all present and non-empty
3. Skill checks interaction states — all 5 states defined
4. Skill checks accessibility notes — keyboard, contrast, and screen reader covered
5. Skill outputs: checklist of all passed checks
6. Verdict is APPROVED
**Assertions:**
- [ ] All 4 required sections are checked
- [ ] All 5 interaction states are verified present
- [ ] Verdict is APPROVED
- [ ] No files are written
---
### Case 2: Missing Accessibility Section — NEEDS REVISION
**Fixture:**
- `design/ux/hud.md` exists but the Accessibility Notes section is empty
- All other sections are fully populated
**Input:** `/ux-review hud`
**Expected behavior:**
1. Skill reads the file and checks all sections
2. Accessibility Notes section is empty — check fails
3. Skill outputs: "NEEDS REVISION — Accessibility Notes section is empty"
4. Skill lists specific items to add: keyboard navigation, color contrast ratios,
screen reader labels
5. Verdict is NEEDS REVISION
6. Handoff suggests returning to `/ux-design hud` to fill in the section
**Assertions:**
- [ ] NEEDS REVISION verdict is returned (not APPROVED or MAJOR REVISION NEEDED)
- [ ] Specific missing content items are listed
- [ ] Handoff points back to `/ux-design hud` for revision
- [ ] No files are written
---
### Case 3: Interaction States Incomplete — NEEDS REVISION
**Fixture:**
- `design/ux/settings-menu.md` exists
- Interaction States section only defines: normal and hover
- Missing: focus, disabled, error states
**Input:** `/ux-review settings-menu`
**Expected behavior:**
1. Skill reads the file and checks interaction states
2. Only 2 of 5 required states are defined
3. Skill reports: "NEEDS REVISION — Interaction states incomplete: missing focus, disabled, error"
4. Verdict is NEEDS REVISION with specific missing states named
**Assertions:**
- [ ] NEEDS REVISION verdict returned
- [ ] All 3 missing states are named explicitly in the output
- [ ] Skill does not return MAJOR REVISION NEEDED for a fixable gap
- [ ] Handoff suggests returning to `/ux-design settings-menu`
---
### Case 4: File Not Found — Error with remediation
**Fixture:**
- `design/ux/inventory-screen.md` does not exist
**Input:** `/ux-review inventory-screen`
**Expected behavior:**
1. Skill attempts to read `design/ux/inventory-screen.md` — file not found
2. Skill outputs: "UX spec not found: design/ux/inventory-screen.md"
3. Skill suggests running `/ux-design inventory-screen` to create the spec first
4. No review is performed; no verdict is issued
**Assertions:**
- [ ] Error message names the missing file with full path
- [ ] `/ux-design inventory-screen` is suggested as the remediation
- [ ] No review checklist is produced
- [ ] No verdict is issued (error state, not APPROVED/NEEDS REVISION)
---
### Case 5: Director Gate Check — No gate; ux-review is itself the review
**Fixture:**
- Valid UX spec file
**Input:** `/ux-review hud`
**Expected behavior:**
1. Skill performs the review and issues a verdict
2. No additional director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED — no gate verdict
---
## Protocol Compliance
- [ ] Checks all 4 required sections (User Flows, Interaction States, Wireframe,
Accessibility Notes)
- [ ] Checks all 5 interaction states (normal, hover, focus, disabled, error)
- [ ] Checks accessibility coverage (keyboard nav, contrast, screen reader)
- [ ] Does not write any files
- [ ] Issues specific, actionable feedback when verdict is not APPROVED
- [ ] Ends with next-step handoff to `/ux-design` for revision or implementation
---
## Coverage Notes
- MAJOR REVISION NEEDED is triggered when structural sections are entirely
  absent (not just empty) or when fundamental interaction flows are missing;
  this path is not exercised by a separate fixture here.
- Art bible / design system consistency check (color palette alignment) is
mentioned as a capability but not separately fixture-tested.
- The case where an existing spec was written for a now-renamed screen is
not tested; the skill would review the file by path regardless of the name.

# Skill Test Spec: /gate-check
## Skill Summary
`/gate-check` validates whether the project is ready to advance to the next
development phase. It checks for required artifacts, runs quality checks, asks
the user about unverifiable items, and produces a PASS/CONCERNS/FAIL verdict.
On PASS with user confirmation, it writes the new stage name to
`production/stage.txt`. It governs all 6 phase transitions and is the most
critical gate-keeping skill in the pipeline.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (numbered Phase N or ## sections)
- [ ] Contains verdict keywords: PASS, CONCERNS, FAIL
- [ ] Contains "May I write" collaborative protocol language
- [ ] Has a next-step handoff at the end (Follow-Up Actions section)
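As a rough illustration, the structural checks above amount to pattern matching over the raw skill file. The sketch below is hypothetical; the real `/skill-test static` implementation and its exact parsing rules are not specified in this document.

```python
import re

# Field names mirror the static assertions above; the real framework's
# parsing may be stricter (e.g., proper YAML frontmatter parsing).
REQUIRED_FRONTMATTER = ["name", "description", "argument-hint",
                        "user-invocable", "allowed-tools"]
VERDICT_KEYWORDS = ["PASS", "CONCERNS", "FAIL"]

def static_check(skill_text: str) -> dict:
    results = {}
    for field in REQUIRED_FRONTMATTER:
        results[f"frontmatter:{field}"] = bool(
            re.search(rf"^{field}:", skill_text, re.MULTILINE))
    # "Phase N" headings or plain "## " sections, at least two
    phases = re.findall(r"^(?:#+ .*Phase \d+|## )", skill_text, re.MULTILINE)
    results["phase-headings>=2"] = len(phases) >= 2
    results["verdict-keywords"] = all(k in skill_text for k in VERDICT_KEYWORDS)
    results["may-i-write"] = "May I write" in skill_text
    return results
```

A skill file passing all five assertion groups would yield an all-true result dict; any false entry maps to an unchecked box above.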
---
## Test Cases
### Case 1: Happy Path — All Concept artifacts present, advancing to Systems Design
**Fixture:**
- `design/gdd/game-concept.md` exists, has content including all required sections
- `design/gdd/game-pillars.md` exists (or pillars defined within concept doc)
- No systems index yet (which is correct for this stage)
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill reads `design/gdd/game-concept.md` and verifies it has content
2. Skill checks for game pillars (in concept or separate file)
3. Skill checks quality items (core loop described, target audience identified)
4. Skill outputs structured checklist with all items marked
5. Skill presents PASS/CONCERNS/FAIL verdict
6. If PASS: skill asks "May I update `production/stage.txt` to 'Systems Design'?"
**Assertions:**
- [ ] Skill uses Glob or Read to verify `design/gdd/game-concept.md` exists before marking it checked
- [ ] Output includes a "Required Artifacts" section with check status per item
- [ ] Output includes a "Quality Checks" section with check status per item
- [ ] Output includes a "Verdict" line with one of PASS / CONCERNS / FAIL
- [ ] Skill asks about unverifiable quality items (e.g., "Has this been reviewed?") rather than assuming PASS
- [ ] Skill asks "May I write" before updating `production/stage.txt`
- [ ] Skill does NOT write `production/stage.txt` without explicit user confirmation
---
### Case 2: Failure Path — Missing required artifacts for Concept → Systems Design
**Fixture:**
- `design/gdd/game-concept.md` does NOT exist
- No game pillars document exists
- `design/gdd/` directory is empty or absent
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill attempts to read `design/gdd/game-concept.md` — file not found
2. Skill marks required artifact as missing (not present)
3. Skill outputs FAIL verdict
4. Skill lists blocker: "No game concept document found"
5. Skill suggests remediation: run `/brainstorm` to create one
**Assertions:**
- [ ] Verdict is FAIL (not PASS or CONCERNS) when required artifacts are missing
- [ ] Output explicitly names `design/gdd/game-concept.md` as missing
- [ ] Output includes a "Blockers" section with at least 1 item
- [ ] Output recommends `/brainstorm` as the remediation action
- [ ] Skill does NOT write `production/stage.txt` when verdict is FAIL
---
### Case 3: No Argument — Auto-detect current stage
**Fixture:**
- `production/stage.txt` contains `Concept`
- `design/gdd/game-concept.md` exists with content
- No systems index yet
**Input:** `/gate-check` (no argument)
**Expected behavior:**
1. Skill reads `production/stage.txt` to determine current stage
2. Skill determines the next gate is Concept → Systems Design
3. Skill proceeds with the Systems Design gate checks
4. Output clearly states which transition is being validated
**Assertions:**
- [ ] Skill reads `production/stage.txt` (or uses project-stage-detect heuristics) to determine current stage
- [ ] Output header names both current and target phases (e.g., "Gate Check: Concept → Systems Design")
- [ ] Skill does not ask the user which gate to check if current stage is determinable
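The auto-detect flow above reduces to a lookup from the current stage to the next transition. A minimal sketch, assuming a linear stage list reconstructed from stage names mentioned elsewhere in this document (the exact names and count are not fixture-locked):

```python
# Stage names are an assumption pieced together from this spec's fixtures
# and coverage notes; the real pipeline may define them differently.
STAGES = ["Concept", "Systems Design", "Pre-Production",
          "Production", "Polish", "Release"]

def next_gate(current_stage: str) -> str:
    """Map the stage read from production/stage.txt to the gate header."""
    i = STAGES.index(current_stage)
    if i == len(STAGES) - 1:
        raise ValueError("Already at final stage; no further gate.")
    return f"Gate Check: {current_stage} → {STAGES[i + 1]}"

print(next_gate("Concept"))  # Gate Check: Concept → Systems Design
```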
---
### Case 4: Edge Case — Manual check items flagged correctly
**Fixture:**
- All required artifacts for Concept → Systems Design are present
- No playtest or review record exists (can't auto-verify quality checks)
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill verifies all artifact files exist
2. Skill encounters quality check: "Game concept reviewed (not MAJOR REVISION NEEDED)"
3. Since no review record exists, skill marks item as MANUAL CHECK NEEDED
4. Skill asks the user: "Has the game concept been reviewed for design quality?"
5. Skill waits for user input before finalizing verdict
**Assertions:**
- [ ] Items that cannot be auto-verified are marked `[?] MANUAL CHECK NEEDED` rather than assumed PASS
- [ ] Skill uses a question to the user for at least one unverifiable quality item
- [ ] Skill does not mark unverifiable items as PASS by default
---
### Case 5: Director Gate — full vs solo mode
---
**Fixture:**
- `production/session-state/review-mode.txt` exists (or equivalent state file)
- All required artifacts for the target gate are present
- `design/gdd/game-concept.md` exists
**Case 5a — full mode:**
- `review-mode.txt` contains `full`
**Input:** `/gate-check systems-design` (with full mode active)
**Expected behavior:**
1. Skill reads review mode — determines `full`
2. Skill spawns all 4 PHASE-GATE director prompts in parallel:
- CD-PHASE-GATE (creative-director)
- TD-PHASE-GATE (technical-director)
- PR-PHASE-GATE (producer)
- AD-PHASE-GATE (art-director)
3. If one director returns CONCERNS → overall gate verdict is at minimum CONCERNS
4. All 4 verdicts are collected before producing final output
**Assertions (5a):**
- [ ] Skill reads review-mode before deciding which directors to spawn
- [ ] All 4 PHASE-GATE director prompts are spawned (not just 1 or 2)
- [ ] Directors are spawned in parallel (simultaneous, not sequential)
- [ ] A CONCERNS verdict from any one director propagates to overall verdict
- [ ] Verdict is NOT auto-PASS if any director returns CONCERNS or REJECT
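The CONCERNS propagation rule in 5a is a worst-of aggregation over the four director verdicts. A minimal sketch, assuming a PASS < CONCERNS < REJECT severity ordering (consistent with the assertions above, but not a definitive implementation):

```python
# Severity ranks are an assumption; the skill body defines the real rule.
SEVERITY = {"PASS": 0, "CONCERNS": 1, "REJECT": 2}

def aggregate(verdicts: list[str]) -> str:
    """Overall gate verdict is the worst verdict any director returned."""
    return max(verdicts, key=SEVERITY.__getitem__)

# One CONCERNS among four directors lifts the overall verdict to CONCERNS.
assert aggregate(["PASS", "CONCERNS", "PASS", "PASS"]) == "CONCERNS"
```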
**Case 5b — solo mode:**
- `review-mode.txt` contains `solo`
**Input:** `/gate-check systems-design` (with solo mode active)
**Expected behavior:**
1. Skill reads review mode — determines `solo`
2. Each director is noted as skipped: "[CD-PHASE-GATE] skipped — Solo mode"
3. Gate verdict is derived from artifact/quality checks only
4. No director gates spawn
**Assertions (5b):**
- [ ] No director gates are spawned in solo mode
- [ ] Each skipped gate is explicitly noted in output: "[GATE-ID] skipped — Solo mode"
- [ ] Verdict is based on artifact and quality checks only
**Note on Case 3:**
The Case 3 assertion "Skill does not ask the user which gate to check if
current stage is determinable" remains correct. However, the skill DOES use
AskUserQuestion to confirm the auto-detected transition before running full
checks; this is a confirmation step, not gate selection. Case 3 assertions
should not treat this confirmation as a failure.
---
## Protocol Compliance
- [ ] Uses "May I write" before updating `production/stage.txt`
- [ ] Presents the full checklist report before asking for write approval
- [ ] Ends with a "Follow-Up Actions" section listing next steps per verdict
- [ ] Never advances the stage without explicit user confirmation
- [ ] Never auto-creates `production/stage.txt` if it doesn't exist without asking
---
## Coverage Notes
- The Production → Polish and Polish → Release gates are not covered here
because they require complex multi-artifact setups (sprint plans, playtest
data, QA sign-off); these are deferred to dedicated follow-up specs.
- The "CONCERNS" verdict path (minor gaps, not blocking) is not explicitly
tested here; it falls between Case 1 and Case 2 and follows the same pattern.
- The Vertical Slice validation block (Pre-Production → Production gate) is not
covered because it requires a playable build context that cannot be expressed
as a document fixture.

# Skill Test Spec: /create-control-manifest
## Skill Summary
`/create-control-manifest` reads all Accepted ADRs from `docs/architecture/` and
generates a control manifest — a summary document that captures all architectural
constraints, required patterns, and forbidden patterns in one place. The manifest
is the reference document that story authors use when writing story files, ensuring
stories inherit the correct architectural rules without having to read all ADRs
individually.
The skill only includes Accepted ADRs; Proposed ADRs are excluded and noted. It
has no director gates. The skill asks "May I write" before writing
`docs/architecture/control-manifest.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CREATED, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (for control-manifest.md)
- [ ] Has a next-step handoff at the end (`/create-epics` or `/create-stories`)
- [ ] Documents that only Accepted ADRs are included (not Proposed)
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. The control
manifest is a mechanical extraction from Accepted ADRs; no creative or technical
review gate is needed.
---
## Test Cases
### Case 1: Happy Path — 4 Accepted ADRs create a correct manifest
**Fixture:**
- `docs/architecture/` contains 4 ADR files, all with `Status: Accepted`
- Each ADR has a "Required Patterns" and/or "Forbidden Patterns" section
- No existing `docs/architecture/control-manifest.md`
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads all ADR files in `docs/architecture/`
2. Extracts Required Patterns, Forbidden Patterns, and key constraints from each
3. Drafts the manifest with correct section structure
4. Shows the draft manifest to the user
5. Asks "May I write `docs/architecture/control-manifest.md`?"
6. Writes the manifest after approval
**Assertions:**
- [ ] All 4 Accepted ADRs are represented in the manifest
- [ ] Manifest includes distinct sections for Required Patterns and Forbidden Patterns
- [ ] Manifest includes the source ADR number for each constraint
- [ ] "May I write" is asked before writing
- [ ] Skill does NOT write without approval
- [ ] Verdict is CREATED after writing
---
### Case 2: Failure Path — No ADRs found
**Fixture:**
- `docs/architecture/` directory exists but contains no ADR files
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads `docs/architecture/` and finds no ADR files
2. Skill outputs: "No ADRs found. Run `/architecture-decision` to create ADRs before generating the control manifest."
3. Skill exits without creating any file
4. Verdict is BLOCKED
**Assertions:**
- [ ] Skill outputs a clear error when no ADRs are found
- [ ] No control manifest file is written
- [ ] Skill recommends `/architecture-decision` as the next action
- [ ] Verdict is BLOCKED (not an error crash)
---
### Case 3: Mixed ADR Statuses — Only Accepted ADRs included
**Fixture:**
- `docs/architecture/` contains 3 Accepted ADRs and 2 Proposed ADRs
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads all ADR files and filters by Status: Accepted
2. Manifest is drafted from the 3 Accepted ADRs only
3. Output notes: "2 Proposed ADRs were excluded: [adr-NNN-name, adr-NNN-name]"
4. User sees which ADRs were excluded before approving the write
5. Asks "May I write `docs/architecture/control-manifest.md`?"
**Assertions:**
- [ ] Only the 3 Accepted ADRs appear in the manifest content
- [ ] Excluded Proposed ADRs are listed by name in the output
- [ ] User sees the exclusion list before approving the write
- [ ] Skill does NOT silently omit Proposed ADRs without noting them
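The Accepted/Proposed split above could be sketched as a simple status filter. This assumes each ADR file carries a `Status: <value>` line; the real ADR template may format status differently.

```python
import re

def split_adrs(adr_texts: dict[str, str]) -> tuple[list[str], list[str]]:
    """Partition ADRs (name -> file text) into accepted and excluded lists."""
    accepted, excluded = [], []
    for name, text in adr_texts.items():
        m = re.search(r"^Status:\s*(\w+)", text, re.MULTILINE)
        (accepted if m and m.group(1) == "Accepted" else excluded).append(name)
    return accepted, excluded

# Hypothetical ADR names for illustration only.
adrs = {"adr-001-ecs": "Status: Accepted", "adr-002-net": "Status: Proposed"}
accepted, excluded = split_adrs(adrs)
# Manifest is built from `accepted`; `excluded` is listed by name in output.
```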
---
### Case 4: Edge Case — Manifest already exists
**Fixture:**
- `docs/architecture/control-manifest.md` already exists (version 1, dated last week)
- `docs/architecture/` contains Accepted ADRs (some new since last manifest)
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill detects existing manifest and reads its version number / date
2. Skill offers to regenerate: "control-manifest.md already exists (v1, [date]). Regenerate with current ADRs?"
3. If user confirms: skill drafts updated manifest, increments version number
4. Asks "May I write `docs/architecture/control-manifest.md`?" (overwrite)
5. Writes updated manifest after approval
**Assertions:**
- [ ] Skill reads and reports the existing manifest version before offering to regenerate
- [ ] User is offered a regenerate/skip choice — not auto-overwritten
- [ ] Updated manifest has an incremented version number
- [ ] "May I write" is asked before overwriting the existing file
---
### Case 5: Director Gate — No gate spawned; no review-mode.txt read
**Fixture:**
- 4 Accepted ADRs exist
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads ADRs and drafts manifest
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Skill proceeds directly to "May I write" after drafting
5. Review mode setting has no effect on this skill's behavior
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] The manifest is generated from ADRs alone, with no external gate review
---
## Protocol Compliance
- [ ] Reads all ADR files before drafting manifest
- [ ] Only Accepted ADRs included — Proposed ones noted as excluded
- [ ] Manifest draft shown to user before "May I write" ask
- [ ] "May I write `docs/architecture/control-manifest.md`?" asked before writing
- [ ] No director gates — no review-mode.txt read
- [ ] Ends with next-step handoff: `/create-epics` or `/create-stories`
---
## Coverage Notes
- The exact section structure of the generated manifest (constraint tables, pattern
lists) is defined by the skill body and not re-enumerated in test assertions.
- The `version` field incrementing logic (v1 → v2) is tested via Case 4 but exact
version numbering format is not fixture-locked.
- ADR parsing (extracting Required/Forbidden Patterns) depends on consistent ADR
structure — tested implicitly via Case 1's fixture.

# Skill Test Spec: /create-epics
## Skill Summary
`/create-epics` reads all approved GDDs and translates them into EPIC.md files,
one per system. Epics are organized by layer (Foundation → Core → Feature →
Presentation) and processed in priority order within each layer. Each EPIC.md
includes scope, governing ADRs, GDD requirements, engine risk level, and a
Definition of Done. The skill asks "May I write" before creating each EPIC file.
In `full` review mode, a PR-EPIC gate (producer) runs after drafting epics and
before writing any files. In `lean` or `solo` mode, PR-EPIC is skipped and noted.
Epics are written to `production/epics/[layer]/EPIC-[name].md`.
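The layer-then-priority ordering described above can be sketched as a two-key sort. Numeric priority values and the field names in this sketch are assumptions, not the skill's actual schema.

```python
LAYER_ORDER = ["Foundation", "Core", "Feature", "Presentation"]

def order_epics(epics: list[dict]) -> list[dict]:
    """Sort epics by layer first, then by priority within each layer."""
    return sorted(epics, key=lambda e: (LAYER_ORDER.index(e["layer"]),
                                        e["priority"]))

# Hypothetical epic names for illustration only.
epics = [
    {"name": "hud", "layer": "Presentation", "priority": 1},
    {"name": "combat", "layer": "Core", "priority": 2},
    {"name": "save-system", "layer": "Foundation", "priority": 1},
]
# Foundation is processed first, Presentation last.
assert [e["name"] for e in order_epics(epics)] == ["save-system", "combat", "hud"]
```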
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CREATED, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (per-epic approval)
- [ ] Has a next-step handoff at the end (`/create-stories`)
- [ ] Documents PR-EPIC gate behavior: runs in full mode; skipped in lean/solo
---
## Director Gate Checks
In `full` mode: PR-EPIC (producer) gate runs after epics are drafted and before
any epic file is written. If PR-EPIC returns CONCERNS, epics are revised before
the "May I write" ask.
In `lean` mode: PR-EPIC is skipped. Output notes: "PR-EPIC skipped — lean mode".
In `solo` mode: PR-EPIC is skipped. Output notes: "PR-EPIC skipped — solo mode".
---
## Test Cases
### Case 1: Happy Path — Two approved GDDs create two EPIC files
**Fixture:**
- `design/gdd/systems-index.md` exists with 2 systems listed
- Both systems have approved GDDs in `design/gdd/`
- `docs/architecture/architecture.md` exists with matching modules
- At least one Accepted ADR exists for each system
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/create-epics`
**Expected behavior:**
1. Skill reads systems index and both GDDs
2. Drafts 2 EPIC definitions (layer, GDD path, ADRs, requirements, engine risk)
3. PR-EPIC gate is skipped (lean mode) — noted in output
4. For each epic: asks "May I write `production/epics/[layer]/EPIC-[name].md`?"
5. After approval: writes both EPIC files
6. Creates or updates `production/epics/index.md`
**Assertions:**
- [ ] Epic summary is shown before any write ask
- [ ] "May I write" is asked per-epic (not once for all epics together)
- [ ] Each EPIC.md contains: layer, GDD path, governing ADRs, requirements table, Definition of Done
- [ ] PR-EPIC skip is noted in output
- [ ] `production/epics/index.md` is updated after writing
- [ ] Skill does NOT write EPIC files without per-epic approval
---
### Case 2: Failure Path — No approved GDDs found
**Fixture:**
- `design/gdd/systems-index.md` exists
- No GDDs in `design/gdd/` have approved status (all are Draft or In Progress)
**Input:** `/create-epics`
**Expected behavior:**
1. Skill reads systems index and attempts to find approved GDDs
2. No approved GDDs found
3. Skill outputs: "No approved GDDs to convert. GDDs must be Approved before creating epics."
4. Skill suggests running `/design-system` and completing GDD approval first
5. Skill exits without creating any EPIC files
**Assertions:**
- [ ] Skill stops cleanly with a clear message when no approved GDDs exist
- [ ] No EPIC files are written
- [ ] Skill recommends the correct next action
- [ ] Verdict is BLOCKED
---
### Case 3: Director Gate — Full mode spawns PR-EPIC before writing
**Fixture:**
- 2 approved GDDs exist
- `production/session-state/review-mode.txt` contains `full`
**Full mode expected behavior:**
1. Skill drafts both epics
2. PR-EPIC gate spawns and reviews the epic drafts
3. If PR-EPIC returns APPROVED: "May I write" ask proceeds normally
4. Epic files are written after approval
**Assertions (full mode):**
- [ ] PR-EPIC gate appears in output as an active gate
- [ ] PR-EPIC runs before any "May I write" ask
- [ ] Epic files are NOT written before PR-EPIC completes
**Fixture (lean mode):**
- Same GDDs
- `production/session-state/review-mode.txt` contains `lean`
**Lean mode expected behavior:**
1. Epics are drafted
2. PR-EPIC is skipped — noted in output
3. "May I write" ask proceeds directly
**Assertions (lean mode):**
- [ ] "PR-EPIC skipped — lean mode" appears in output
- [ ] Skill proceeds to "May I write" without waiting for PR-EPIC
---
### Case 4: Edge Case — Epic already exists for a GDD
**Fixture:**
- `production/epics/[layer]/EPIC-[name].md` already exists for one of the approved GDDs
- The other GDD has no existing EPIC file
**Input:** `/create-epics`
**Expected behavior:**
1. Skill detects the existing EPIC file for the first system
2. Skill offers to update rather than overwrite: "EPIC-[name].md already exists. Update it, or skip?"
3. For the second system (no existing file): proceeds normally with "May I write"
**Assertions:**
- [ ] Skill detects existing EPIC files before writing
- [ ] User is offered "update" or "skip" options — not auto-overwritten
- [ ] The new system's EPIC is created normally without conflict
---
### Case 5: Director Gate — PR-EPIC returns CONCERNS
**Fixture:**
- 2 approved GDDs exist
- `production/session-state/review-mode.txt` contains `full`
- PR-EPIC gate returns CONCERNS (e.g., scope of one epic is too large)
**Input:** `/create-epics`
**Expected behavior:**
1. PR-EPIC gate spawns and returns CONCERNS with specific feedback
2. Skill surfaces the concerns to the user before any write ask
3. User is given options: revise epics, accept concerns and proceed, or stop
4. If user revises: updated epic drafts are shown before the "May I write" ask
5. Skill does NOT write epics while CONCERNS are unaddressed
**Assertions:**
- [ ] CONCERNS from PR-EPIC are shown to the user before writing
- [ ] Skill does NOT auto-write epics when CONCERNS are returned
- [ ] User is given a clear choice to revise, proceed, or stop
- [ ] Revised epic drafts are re-shown after revision before final approval
---
## Protocol Compliance
- [ ] Epic drafts shown to user before any "May I write" ask
- [ ] "May I write" asked per-epic, not once for the entire batch
- [ ] PR-EPIC gate (if active) runs before write asks — not after
- [ ] Skipped gates noted by name and mode in output
- [ ] EPIC.md content sourced only from GDDs, ADRs, and architecture docs — nothing invented
- [ ] Ends with next-step handoff: `/create-stories [epic-slug]` per created epic
---
## Coverage Notes
- Processing of Core, Feature, and Presentation layers follows the same per-epic
pattern as Foundation — layer-specific ordering is not independently tested.
- Engine risk level assignment (LOW/MEDIUM/HIGH) from governing ADRs is
validated implicitly via Case 1's fixture structure.
- The `layer: [name]` and `[system-name]` argument modes follow the same approval
pattern as the default (all systems) mode.

# Skill Test Spec: /create-stories
## Skill Summary
`/create-stories` breaks a single epic into developer-ready story files. It reads
the EPIC.md, the corresponding GDD, governing ADRs, the control manifest, and the
TR registry. Each story gets structured frontmatter including: Title, Epic, Layer,
Priority, Status, TR-ID, ADR references, Acceptance Criteria, and Definition of
Done. Stories are classified by type (Logic / Integration / Visual/Feel / UI /
Config/Data) which determines the required test evidence path.
In `full` review mode, a QL-STORY-READY check runs per story after creation. In
`lean` or `solo` mode, QL-STORY-READY is skipped. The skill asks "May I write"
before writing each story file. Stories are written to
`production/epics/[layer]/story-[name].md`.
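The frontmatter schema and type-to-evidence mapping above could be checked mechanically. Field names and evidence descriptions in this sketch are illustrative assumptions, not the framework's canonical values.

```python
# Assumed frontmatter field names; the real story template may differ.
REQUIRED_FIELDS = ["Title", "Epic", "Layer", "Priority", "Status",
                   "TR-ID", "ADR", "Acceptance Criteria", "Definition of Done"]

# Story type determines the required test evidence path (assumed mapping).
EVIDENCE_BY_TYPE = {
    "Logic": "unit test file",
    "Integration": "integration test or playtest doc",
    "Visual/Feel": "playtest doc",
    "UI": "playtest doc",
    "Config/Data": "schema validation",
}

def missing_fields(frontmatter: dict) -> list[str]:
    """Return required fields absent from a story's frontmatter."""
    return [f for f in REQUIRED_FIELDS if f not in frontmatter]
```

A story passing Case 1's frontmatter assertion would return an empty list here.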
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED, NEEDS WORK
- [ ] Contains "May I write" collaborative protocol language (per-story approval)
- [ ] Has a next-step handoff at the end (`/story-readiness`, `/dev-story`)
- [ ] Documents story Status: Blocked when governing ADR is Proposed
- [ ] Documents QL-STORY-READY gate: active in full mode, skipped in lean/solo
---
## Director Gate Checks
In `full` mode: QL-STORY-READY check runs per story after creation. Stories that
fail the check are noted as NEEDS WORK before the "May I write" ask.
In `lean` mode: QL-STORY-READY is skipped. Output notes:
"QL-STORY-READY skipped — lean mode" per story.
In `solo` mode: QL-STORY-READY is skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Epic with 3 stories, all ADRs Accepted
**Fixture:**
- `production/epics/[layer]/EPIC-[name].md` exists with 3 GDD requirements
- Corresponding GDD exists with matching acceptance criteria
- All governing ADRs have `Status: Accepted`
- `docs/architecture/control-manifest.md` exists
- `docs/architecture/tr-registry.yaml` has TR-IDs for all 3 requirements
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/create-stories [epic-name]`
**Expected behavior:**
1. Skill reads EPIC.md, GDD, governing ADRs, control manifest, and TR registry
2. Classifies each requirement into a story type (Logic / Integration / Visual/Feel / UI / Config/Data)
3. Drafts 3 story files with correct frontmatter schema
4. QL-STORY-READY is skipped (lean mode) — noted in output
5. Asks "May I write" before writing each story file
6. Writes all 3 story files after approval
**Assertions:**
- [ ] Each story's frontmatter contains: Title, Epic, Layer, Priority, Status, TR-ID, ADR reference, Acceptance Criteria, DoD
- [ ] Story types are correctly classified (at least one Logic type in fixture)
- [ ] "May I write" is asked per story (not once for the entire batch)
- [ ] QL-STORY-READY skip is noted in output
- [ ] All 3 story files are written with correct naming: `story-[name].md`
- [ ] Skill does NOT start implementation
---
### Case 2: Failure Path — No epic file found
**Fixture:**
- The epic path provided does not exist in `production/epics/`
**Input:** `/create-stories nonexistent-epic`
**Expected behavior:**
1. Skill attempts to read the EPIC.md file
2. File not found
3. Skill outputs a clear error with the path it searched
4. Skill suggests checking `production/epics/` or running `/create-epics` first
5. No story files are created
**Assertions:**
- [ ] Skill outputs a clear error naming the missing file path
- [ ] No story files are written
- [ ] Skill recommends the correct next action (`/create-epics`)
- [ ] Skill does NOT create stories without a valid EPIC.md
---
### Case 3: Blocked Story — ADR is Proposed
**Fixture:**
- EPIC.md exists with 2 requirements
- Requirement 1 is covered by an Accepted ADR
- Requirement 2 is covered by an ADR with `Status: Proposed`
**Input:** `/create-stories [epic-name]`
**Expected behavior:**
1. Skill reads the ADR for Requirement 2 and finds Status: Proposed
2. Story for Requirement 2 is drafted with `Status: Blocked`
3. Blocking note references the specific ADR: "BLOCKED: ADR-NNN is Proposed"
4. Story for Requirement 1 is drafted normally with `Status: Ready`
5. Both stories are shown in the draft — user asked "May I write" for both
**Assertions:**
- [ ] Story 2 has `Status: Blocked` in its frontmatter
- [ ] Blocking note names the specific ADR number and recommends `/architecture-decision`
- [ ] Story 1 has `Status: Ready` — blocked status does not affect non-blocked stories
- [ ] Blocked status is shown in the draft preview before writing
- [ ] Both story files are written (blocked stories are still written — just flagged)
---
### Case 4: Edge Case — No argument provided
**Fixture:**
- `production/epics/` directory exists with ≥2 epic subdirectories
**Input:** `/create-stories` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs a usage error: "No epic specified. Usage: /create-stories [epic-name]"
3. Skill lists available epics from `production/epics/`
4. No story files are created
**Assertions:**
- [ ] Skill outputs a usage error when no argument is given
- [ ] Skill lists available epics to help the user choose
- [ ] No story files are written
- [ ] Skill does NOT silently pick an epic without user input
---
### Case 5: Director Gate — Full mode runs QL-STORY-READY; stories failing noted as NEEDS WORK
**Fixture:**
- EPIC.md exists with 2 requirements
- Both governing ADRs are Accepted
- `production/session-state/review-mode.txt` contains `full`
- QL-STORY-READY check finds one story has ambiguous acceptance criteria
**Input:** `/create-stories [epic-name]`
**Expected behavior:**
1. Both stories are drafted
2. QL-STORY-READY check runs for each story
3. Story 1 passes QL-STORY-READY
4. Story 2 fails QL-STORY-READY — noted as NEEDS WORK with specific feedback
5. Both stories are shown to user with pass/fail status before "May I write"
6. User can proceed (story written as-is with NEEDS WORK note) or revise first
**Assertions:**
- [ ] QL-STORY-READY results appear per story in the output
- [ ] Story 2 is flagged as NEEDS WORK with the specific failing criteria
- [ ] Story 1 shows as passing QL-STORY-READY
- [ ] User is given the choice to proceed or revise before writing
- [ ] Skill does NOT auto-block writing of stories that fail QL-STORY-READY without user input
---
## Protocol Compliance
- [ ] All context (EPIC, GDD, ADRs, manifest, TR registry) loaded before drafting stories
- [ ] Story drafts shown in full before any "May I write" ask
- [ ] "May I write" asked per story (not once for the entire batch)
- [ ] Blocked stories flagged before write approval — not discovered after writing
- [ ] TR-IDs reference the registry — requirement text is not embedded inline in story files
- [ ] Control manifest rules quoted per-story from the manifest, not invented
- [ ] Ends with next-step handoff: `/story-readiness` → `/dev-story`
---
## Coverage Notes
- Integration story test evidence (playtest doc alternative) follows the same
approval pattern as Logic stories — not independently fixture-tested.
- Story ordering (foundational first, UI last) is validated implicitly via
Case 1's multi-story fixture.
- The story sizing rule (splitting large requirement groups) is not tested here
— it is addressed in the `/create-stories` skill's internal logic.

# Skill Test Spec: /dev-story
## Skill Summary
`/dev-story` reads a story file, loads all required context (referenced ADR,
TR-ID from the registry, control manifest, engine preferences), implements the
story, verifies that all acceptance criteria are met, and marks the story
Complete. The skill routes implementation to the correct specialist agent based
on the engine and file type — it does not write source code directly.
In `full` review mode, an LP-CODE-REVIEW gate runs before marking the story
Complete. In `lean` or `solo` mode, LP-CODE-REVIEW is skipped and the story is
marked Complete after the user confirms all criteria are met. The skill asks
"May I write" before updating story status and before writing code files.
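The engine and file-type routing described above amounts to a lookup table. The engine/extension pairs and agent names below are placeholders for illustration, not the framework's real agent roster.

```python
# Hypothetical routing table; the skill body defines the real one.
ROUTES = {
    ("godot", ".gd"): "godot-gdscript-specialist",
    ("godot", ".tscn"): "godot-scene-specialist",
    ("unity", ".cs"): "unity-csharp-specialist",
}

def route(engine: str, file_ext: str) -> str:
    """Pick the specialist agent; never fall back to implementing inline."""
    try:
        return ROUTES[(engine, file_ext)]
    except KeyError:
        raise ValueError(
            f"No specialist registered for {engine} {file_ext}; "
            "ask the user rather than writing the code directly.")
```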
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED, IN PROGRESS, NEEDS CHANGES
- [ ] Contains "May I write" collaborative protocol language (story status + code files)
- [ ] Has a next-step handoff at the end (`/story-done`)
- [ ] Documents LP-CODE-REVIEW gate: active in full mode, skipped in lean/solo
- [ ] Notes that implementation is delegated to specialist agents (not done directly)
---
## Director Gate Checks
In `full` mode: LP-CODE-REVIEW gate runs after implementation is complete and all
criteria are verified, before marking the story Complete.
In `lean` mode: LP-CODE-REVIEW is skipped. Output notes:
"LP-CODE-REVIEW skipped — lean mode". Story is marked Complete after user confirms.
In `solo` mode: LP-CODE-REVIEW is skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Story implemented and marked Complete (full mode)
**Fixture:**
- A story file exists at `production/epics/[layer]/story-[name].md` with:
- `Status: Ready`
- A TR-ID referencing a registered requirement
- At least 2 Given-When-Then acceptance criteria
- A test evidence path
- Referenced ADR has `Status: Accepted`
- `docs/architecture/control-manifest.md` exists
- `.claude/docs/technical-preferences.md` has engine and language configured
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/dev-story production/epics/[layer]/story-[name].md`
**Expected behavior:**
1. Skill reads the story file and all referenced context
2. Skill verifies the ADR is Accepted (no block)
3. Skill routes implementation to the correct specialist agent
4. All acceptance criteria are verified as met
5. LP-CODE-REVIEW gate spawns and returns APPROVED
6. Skill asks "May I update story status to Complete?"
7. Story status is updated to Complete
**Assertions:**
- [ ] Skill reads story before spawning any agent
- [ ] ADR status is checked before implementation begins
- [ ] Implementation is delegated to a specialist agent (not done inline)
- [ ] All acceptance criteria are confirmed before LP-CODE-REVIEW
- [ ] LP-CODE-REVIEW appears in output as a completed gate
- [ ] Story status is updated to Complete only after gate approval and user consent
- [ ] Test file is written as part of implementation (not deferred)
---
### Case 2: Failure Path — Referenced ADR is Proposed
**Fixture:**
- A story file exists with `Status: Ready`
- The story's TR-ID points to a requirement covered by an ADR with `Status: Proposed`
**Input:** `/dev-story production/epics/[layer]/story-[name].md`
**Expected behavior:**
1. Skill reads the story file
2. Skill resolves the TR-ID and reads the governing ADR
3. ADR status is Proposed — skill outputs a BLOCKED message
4. Skill names the specific ADR blocking the story
5. Skill recommends running `/architecture-decision` to advance the ADR
6. Implementation does NOT begin
**Assertions:**
- [ ] Skill does NOT begin implementation with a Proposed ADR
- [ ] BLOCKED message names the specific ADR number and title
- [ ] Skill recommends `/architecture-decision` as the next action
- [ ] Story status remains unchanged (not set to In Progress or Complete)
---
### Case 3: Ambiguous Acceptance Criteria — Skill asks for clarification
**Fixture:**
- A story file exists with `Status: Ready`
- Referenced ADR is Accepted
- One acceptance criterion is ambiguous (not Given-When-Then; uses subjective language like "feels responsive")
**Input:** `/dev-story production/epics/[layer]/story-[name].md`
**Expected behavior:**
1. Skill reads the story and identifies the ambiguous criterion
2. Before routing to the specialist, skill asks the user to clarify the criterion
3. User provides a concrete, testable restatement
4. Skill proceeds with implementation using the clarified criterion
5. Skill does NOT guess at the intended behavior
**Assertions:**
- [ ] Skill surfaces the ambiguous criterion before implementation starts
- [ ] Skill asks for user clarification (not auto-interpretation)
- [ ] Implementation begins only after clarification is provided
- [ ] Clarified criterion is used in the test (not the original vague version)
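A fixture for this case needs a reproducible notion of "ambiguous". One possible heuristic (an illustrative sketch, not the skill's actual detection logic — the subjective-word list is an assumption) flags criteria that lack Given-When-Then structure or lean on subjective language:

```python
import re

# Hypothetical list of subjective terms that make a criterion untestable
SUBJECTIVE = ("feels", "responsive", "smooth", "intuitive", "nice", "fast enough")

def is_ambiguous(criterion: str) -> bool:
    """Flag a criterion for user clarification: missing Given-When-Then
    structure, or subjective wording. A heuristic, not a full semantic check."""
    gwt = re.search(r"\bGiven\b.*\bWhen\b.*\bThen\b",
                    criterion, re.IGNORECASE | re.DOTALL)
    lowered = criterion.lower()
    subjective = any(word in lowered for word in SUBJECTIVE)
    return not gwt or subjective
```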
---
### Case 4: Edge Case — No argument; reads from session state
**Fixture:**
- No argument is provided
- `production/session-state/active.md` references an active story file
- That story file exists with `Status: In Progress`
**Input:** `/dev-story` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Skill reads `production/session-state/active.md`
3. Skill finds the active story reference
4. Skill confirms with user: "Continuing work on [story title] — is that correct?"
5. After confirmation, skill proceeds with that story
**Assertions:**
- [ ] Skill reads session state when no argument is provided
- [ ] Skill confirms the active story with the user before proceeding
- [ ] Skill does NOT silently assume the active story without confirmation
- [ ] If session state has no active story, skill asks which story to implement
---
### Case 5: Director Gate — LP-CODE-REVIEW returns NEEDS CHANGES; lean mode skips gate
**Fixture (full mode):**
- Story is implemented and all criteria appear met
- `production/session-state/review-mode.txt` contains `full`
- LP-CODE-REVIEW gate returns NEEDS CHANGES with specific feedback
**Full mode expected behavior:**
1. LP-CODE-REVIEW gate spawns after implementation
2. Gate returns NEEDS CHANGES with 2 specific issues
3. Story status remains In Progress — NOT marked Complete
4. User is shown the gate feedback and asked how to proceed
**Assertions (full mode):**
- [ ] Story is NOT marked Complete when LP-CODE-REVIEW returns NEEDS CHANGES
- [ ] Gate feedback is shown to the user verbatim
- [ ] Story status stays In Progress until issues are resolved and gate passes
**Fixture (lean mode):**
- Same story, `production/session-state/review-mode.txt` contains `lean`
**Lean mode expected behavior:**
1. Implementation completes
2. LP-CODE-REVIEW gate is skipped — noted in output
3. User is asked to confirm all criteria are met
4. Story is marked Complete after user confirmation
**Assertions (lean mode):**
- [ ] "LP-CODE-REVIEW skipped — lean mode" appears in output
- [ ] Story is marked Complete after user confirms criteria (no gate required)
- [ ] Skill does NOT block on a gate that is skipped
---
## Protocol Compliance
- [ ] Does NOT write source code directly — delegates to specialist agents
- [ ] Reads all context (story, TR-ID, ADR, manifest, engine prefs) before implementation
- [ ] "May I write" asked before updating story status and before writing code files
- [ ] Skipped gates noted by name and mode in output
- [ ] Updates `production/session-state/active.md` after story completion
- [ ] Ends with next-step handoff: `/story-done`
---
## Coverage Notes
- Engine routing logic (Godot vs Unity vs Unreal) is not tested per engine —
the routing pattern is consistent; engine selection is a config fact.
- Visual/Feel and UI story types (no automated test required) have different
evidence requirements and are not covered in these cases.
- Integration story type follows the same pattern as Logic but with a different
evidence path — not independently fixture-tested.
# Skill Test Spec: /map-systems
## Skill Summary
`/map-systems` decomposes a game concept into a systems index. It reads the
approved game concept and pillars, enumerates both explicit and implicit systems,
maps dependencies between systems, assigns priority tiers (MVP / Vertical Slice /
Alpha / Full Vision), and organizes systems into a layered design order
(Foundation → Core → Feature → Presentation). The output is written to
`design/systems-index.md` after user approval.
This skill is required between game concept approval and per-system GDD creation
— it is a mandatory gate in the pipeline. In `full` review mode, CD-SYSTEMS
(creative-director) and TD-SYSTEM-BOUNDARY (technical-director) spawn in parallel
after the decomposition is drafted. In `lean` or `solo` mode, both gates are
skipped. The skill writes to `design/systems-index.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (for systems-index.md)
- [ ] Has a next-step handoff at the end (`/design-system`)
- [ ] Documents gate behavior: CD-SYSTEMS + TD-SYSTEM-BOUNDARY in parallel in full mode
---
## Director Gate Checks
In `full` mode: CD-SYSTEMS (creative-director) and TD-SYSTEM-BOUNDARY
(technical-director) spawn in parallel after the systems decomposition is drafted
and before `design/systems-index.md` is written.
In `lean` mode: both gates are skipped. Output notes:
"CD-SYSTEMS skipped — lean mode" and "TD-SYSTEM-BOUNDARY skipped — lean mode".
In `solo` mode: both gates are skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Game concept exists, 5-8 systems identified
**Fixture:**
- `design/gdd/game-concept.md` exists with Core Mechanics and MVP Definition sections
- `design/gdd/game-pillars.md` exists with ≥1 pillar defined
- No `design/systems-index.md` exists yet
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/map-systems`
**Expected behavior:**
1. Skill reads game-concept.md and game-pillars.md
2. Identifies 5-8 systems (explicit + implicit)
3. Maps dependencies between systems and assigns layers
4. CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel and return APPROVED
5. Asks "May I write `design/systems-index.md`?"
6. Writes systems-index.md after approval
7. Updates `production/session-state/active.md`
**Assertions:**
- [ ] Between 5 and 8 systems are identified (not fewer, not more without explanation)
- [ ] CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel (not sequentially)
- [ ] Both gates complete before the "May I write" ask
- [ ] "May I write `design/systems-index.md`?" is asked before writing
- [ ] systems-index.md is NOT written without approval
- [ ] Session state is updated after writing
- [ ] Verdict is COMPLETE
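The layer assignment in step 3 is a topological layering over the dependency map, and a Kahn-style pass also surfaces the circular dependencies mentioned in the Coverage Notes. A minimal sketch (system names and the dict-of-sets input shape are illustrative assumptions):

```python
def assign_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into waves: each wave contains systems whose dependencies
    are all satisfied by earlier waves. Raises on a circular dependency."""
    remaining = {system: set(d) for system, d in deps.items()}
    layers: list[list[str]] = []
    while remaining:
        # Systems with no unmet dependencies form the next wave
        wave = sorted(s for s, d in remaining.items() if not d)
        if not wave:
            raise ValueError(f"Circular dependency among: {sorted(remaining)}")
        layers.append(wave)
        for s in wave:
            del remaining[s]
        for d in remaining.values():
            d.difference_update(wave)
    return layers
```

The resulting waves map naturally onto the Foundation → Core → Feature → Presentation ordering the skill produces.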
---
### Case 2: Failure Path — No game concept found
**Fixture:**
- `design/gdd/game-concept.md` does NOT exist
- `design/gdd/` directory may be empty or absent
**Input:** `/map-systems`
**Expected behavior:**
1. Skill attempts to read `design/gdd/game-concept.md`
2. File not found
3. Skill outputs: "No game concept found. Run `/brainstorm` to create one, then return to `/map-systems`."
4. Skill exits without creating systems-index.md
**Assertions:**
- [ ] Skill outputs a clear error naming the missing file path
- [ ] Skill recommends `/brainstorm` as the next action
- [ ] No systems-index.md is created
- [ ] Verdict is BLOCKED
---
### Case 3: Director Gate — CD-SYSTEMS returns CONCERNS (missing core system)
**Fixture:**
- Game concept exists
- `production/session-state/review-mode.txt` contains `full`
- CD-SYSTEMS gate returns CONCERNS: "The [core-system] is implied by the concept but not identified"
**Input:** `/map-systems`
**Expected behavior:**
1. Systems are drafted (5-8 initial systems identified)
2. CD-SYSTEMS gate returns CONCERNS naming the missing core system
3. TD-SYSTEM-BOUNDARY returns APPROVED
4. Skill surfaces CD-SYSTEMS concerns to user
5. User is asked: revise systems list to add the missing system, or proceed as-is
6. If revised: updated systems list shown before "May I write" ask
**Assertions:**
- [ ] CD-SYSTEMS concerns are shown to the user before writing
- [ ] Skill does NOT auto-write systems-index.md while CONCERNS are unresolved
- [ ] User is given the option to revise or proceed
- [ ] Revised systems list is re-shown after revision before final "May I write"
---
### Case 4: Edge Case — systems-index.md already exists
**Fixture:**
- `design/gdd/game-concept.md` exists
- `design/systems-index.md` already exists with N systems
**Input:** `/map-systems`
**Expected behavior:**
1. Skill reads the existing systems-index.md and presents its current state
2. Skill asks: "systems-index.md already exists with [N] systems. Update with new systems, or review and revise priorities?"
3. User chooses an action
4. Skill does NOT silently overwrite the existing index
**Assertions:**
- [ ] Skill detects and reads the existing systems-index.md before proceeding
- [ ] User is offered update/review options — not auto-overwritten
- [ ] Existing system count is presented to the user
- [ ] Skill does NOT proceed with a full re-decomposition without user choosing to do so
---
### Case 5: Director Gate — Lean mode and solo mode both skip gates, noted
**Fixture (lean mode):**
- Game concept exists
- `production/session-state/review-mode.txt` contains `lean`
**Lean mode expected behavior:**
1. Systems are decomposed and drafted
2. Both CD-SYSTEMS and TD-SYSTEM-BOUNDARY are skipped
3. Output notes: "CD-SYSTEMS skipped — lean mode" and "TD-SYSTEM-BOUNDARY skipped — lean mode"
4. "May I write" ask proceeds directly
**Assertions (lean mode):**
- [ ] Both gate skip notes appear in output
- [ ] Skill proceeds to "May I write" without gate approval
- [ ] systems-index.md is written after user approval
**Fixture (solo mode):**
- Same game concept, `production/session-state/review-mode.txt` contains `solo`
**Solo mode expected behavior:**
1. Same decomposition workflow
2. Both gates skipped — noted in output with "solo mode"
3. "May I write" ask proceeds
**Assertions (solo mode):**
- [ ] Both skip notes appear with "solo mode" label
- [ ] Behavior is otherwise identical to lean mode for this skill
---
## Protocol Compliance
- [ ] Reads game-concept.md and game-pillars.md before any decomposition
- [ ] "May I write `design/systems-index.md`?" asked before writing
- [ ] systems-index.md is NOT written without user approval
- [ ] CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel in full mode
- [ ] Skipped gates noted by name and mode in lean/solo output
- [ ] Ends with next-step handoff: `/design-system [next-system]`
---
## Coverage Notes
- Circular dependency detection (System A depends on System B which depends on A)
is part of the dependency mapping phase — not independently fixture-tested here.
- Priority tier assignment (MVP heuristics) is evaluated as part of the Case 1
collaborative workflow rather than independently.
- The `next` argument mode (handing off the highest-priority undesigned system to
`/design-system`) is not tested here — it is a post-index-creation convenience.
# Skill Test Spec: /propagate-design-change
## Skill Summary
`/propagate-design-change` handles GDD revision cascades. When a GDD is updated,
the skill traces all downstream artifacts that reference it: ADRs, TR-registry
entries, stories, and epics. It produces a structured impact report showing what
needs to change and why. The skill does NOT automatically apply changes — it
proposes edits for each affected artifact and asks "May I write" per artifact
before making any modification.
The skill is read-only during analysis and write-gated per artifact during the
update phase. It has no director gates — the analysis itself is mechanical
tracing, not a creative review.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED, NO IMPACT
- [ ] Contains "May I write" collaborative protocol language (per-artifact approval)
- [ ] Has a next-step handoff at the end
- [ ] Documents that changes are proposed, not applied automatically
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents during analysis.
The impact report is a mechanical tracing operation; no creative or technical
director review is required at the analysis stage.
---
## Test Cases
### Case 1: Happy Path — GDD revision affects 2 stories and 1 epic
**Fixture:**
- `design/gdd/[system].md` exists and has been recently revised (git diff shows changes)
- `production/epics/[layer]/EPIC-[system].md` references this GDD
- 2 story files reference TR-IDs from this GDD
- The changed GDD section affects the acceptance criteria of both stories
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill reads the revised GDD and identifies what changed (git diff or content comparison)
2. Skill scans ADRs, TR-registry, epics, and stories for references to this GDD
3. Skill produces an impact report: 1 epic affected, 2 stories affected
4. Skill shows the proposed change for each artifact
5. For each artifact: asks "May I update [filepath]?" separately
6. Applies changes only after per-artifact approval
**Assertions:**
- [ ] Impact report identifies all 3 affected artifacts (1 epic + 2 stories)
- [ ] Each affected artifact's proposed change is shown before asking to write
- [ ] "May I write" is asked per artifact (not once for all artifacts)
- [ ] Skill does NOT apply any changes without per-artifact approval
- [ ] Verdict is COMPLETE after all approved changes are applied
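The reference scan in step 2 is mechanical tracing: look for the GDD path or any of its TR-IDs in downstream artifacts. A read-only sketch (the function name, signature, and default roots are illustrative assumptions):

```python
from pathlib import Path

def find_affected(gdd_path: str, tr_ids: list[str],
                  roots: tuple[str, ...] = ("production/epics", "docs/architecture")) -> list[Path]:
    """Scan artifact trees for references to the revised GDD, by file path
    or by any of its TR-IDs. Analysis is read-only — nothing is modified."""
    needles = [gdd_path, *tr_ids]
    hits: list[Path] = []
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue
        for f in sorted(base.rglob("*.md")):
            text = f.read_text(errors="ignore")
            if any(n in text for n in needles):
                hits.append(f)
    return hits
```

Each hit then gets its own proposed change and its own "May I update [filepath]?" ask.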
---
### Case 2: No Impact — Changed GDD has no downstream references
**Fixture:**
- `design/gdd/[system].md` exists and has been revised
- No ADRs, stories, or epics reference this GDD's TR-IDs or GDD path
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill reads the revised GDD
2. Skill scans all ADRs, stories, and epics for references
3. No references found
4. Skill outputs: "No downstream impact found for [system].md — no artifacts reference this GDD."
5. No write operations are performed
**Assertions:**
- [ ] Skill outputs the "No downstream impact found" message
- [ ] Verdict is NO IMPACT
- [ ] No "May I write" asks are issued (nothing to update)
- [ ] Skill does NOT error or crash when no references are found
---
### Case 3: In-Progress Story Warning — Referenced story is currently being developed
**Fixture:**
- A story referencing this GDD has `Status: In Progress`
- The developer has already started implementing this story
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill identifies the In Progress story as an affected artifact
2. Skill outputs an elevated warning: "CAUTION: [story-file] is currently In Progress — a developer may be working on this. Coordinate before updating."
3. The warning appears in the impact report before the "May I write" ask for that story
4. User can still approve or skip the update for that story
**Assertions:**
- [ ] In Progress story is flagged with an elevated warning (distinct from regular affected-artifact entries)
- [ ] Warning appears before the "May I write" ask for that story
- [ ] Skill still offers to update the story — the warning does not block the option
- [ ] Other (non-In-Progress) artifacts are not affected by this warning
---
### Case 4: Edge Case — No argument provided
**Fixture:**
- Multiple GDDs exist in `design/gdd/`
**Input:** `/propagate-design-change` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Skill outputs a usage error: "No GDD specified. Usage: /propagate-design-change design/gdd/[system].md"
3. Skill lists recently modified GDDs as suggestions (git log)
4. No analysis is performed
**Assertions:**
- [ ] Skill outputs a usage error when no argument is given
- [ ] Usage example is shown with the correct path format
- [ ] No impact analysis is performed without a target GDD
- [ ] Skill does NOT silently pick a GDD without user input
---
### Case 5: Director Gate — No gate spawned regardless of review mode
**Fixture:**
- A GDD has been revised with downstream references
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill reads the GDD and traces downstream references
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Impact report is produced and per-artifact approval proceeds normally
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] Review mode has no effect on this skill's behavior
---
## Protocol Compliance
- [ ] Reads revised GDD and all potentially affected artifacts before producing impact report
- [ ] Impact report shown in full before any "May I write" ask
- [ ] "May I write" asked per artifact — never for the entire set at once
- [ ] In Progress stories flagged with elevated warning before their approval ask
- [ ] No director gates — no review-mode.txt read
- [ ] Ends with next-step handoff appropriate to verdict (COMPLETE or NO IMPACT)
---
## Coverage Notes
- ADR impact (when a GDD change requires an ADR update or new ADR) follows the
same per-artifact approval pattern as story/epic updates — not independently
fixture-tested.
- TR-registry impact (when changed GDD requires new or updated TR-IDs) is part
of the analysis phase but not independently fixture-tested.
- The git diff comparison method (detecting what changed in the GDD) is a runtime
concern — fixtures use pre-arranged content differences.
# Skill Test Spec: /story-done
## Skill Summary
`/story-done` closes the loop between design and implementation. Run at the
end of implementing a story, it reads the story file and verifies each
acceptance criterion against the implementation. It checks for GDD and ADR
deviations, prompts a code review, updates the story status to `Complete`,
logs any tech debt, and surfaces the next ready story from the sprint. It
produces a COMPLETE / COMPLETE WITH NOTES / BLOCKED verdict and writes to
the story file and optionally to `docs/tech-debt-register.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥5 phase headings (complex skill warranting `context: fork` if applicable)
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (writes to story file and tech-debt register)
- [ ] Has a next-step handoff (surfaces next story from sprint)
---
## Test Cases
### Case 1: Happy Path — All acceptance criteria met, no deviations
**Fixture:**
- Story file at `production/epics/core/story-light-pickup.md` with:
- 3 acceptance criteria, all implemented as described
- `TR-ID: TR-light-001` referencing a GDD requirement
- `ADR: docs/architecture/adr-003-inventory.md` (Accepted)
- `Status: In Progress`
- Implementation files listed in story exist in `src/`
- GDD requirement text at TR-light-001 matches how the feature was implemented
- ADR guidance was followed (no deviations)
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the story file and extracts all key fields
2. Skill reads the GDD requirement fresh from `tr-registry.yaml` (not from story's quoted text)
3. Skill reads the referenced ADR to understand implementation constraints
4. Skill evaluates each acceptance criterion (auto where possible, manual prompt where not)
5. Skill checks for GDD requirement deviations
6. Skill checks for ADR guideline deviations
7. Skill prompts user: "Please provide the code review outcome for this story"
8. Skill presents COMPLETE verdict
9. Skill asks "May I update story Status to Complete and add Completion Notes?"
10. If yes: skill updates the story file
11. Skill surfaces the next `Ready for Dev` story from the sprint
**Assertions:**
- [ ] Skill reads `docs/architecture/tr-registry.yaml` for TR-ID requirement text (not just story)
- [ ] Skill reads the referenced ADR file (not just the story reference)
- [ ] Each acceptance criterion is listed with VERIFIED / DEFERRED / FAILED status
- [ ] Skill prompts the user for code review outcome (does not skip this step)
- [ ] Verdict is COMPLETE when all criteria are verified and no deviations exist
- [ ] Skill asks "May I write" before updating the story file
- [ ] Skill does NOT auto-update story status without user confirmation
- [ ] After completion, skill surfaces the next ready story from `production/sprints/`
---
### Case 2: Blocked Path — Acceptance criterion cannot be verified
**Fixture:**
- Story file has an acceptance criterion: "Player sees correct animation on pickup"
- No automated test for this criterion exists
- Manual verification has not been performed
- All other criteria are met
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill processes all acceptance criteria
2. Reaches the animation criterion — cannot auto-verify
3. Skill asks the user: "Acceptance criterion 'Player sees correct animation on
pickup' cannot be auto-verified. Has this been manually tested?"
4. If user says No: criterion is marked DEFERRED, verdict becomes COMPLETE WITH NOTES
5. Skill records the deferred criterion in completion notes
6. Asks "May I write updated story with deferred criterion noted?"
**Assertions:**
- [ ] Skill asks the user about unverifiable criteria rather than assuming PASS
- [ ] Deferred criteria result in COMPLETE WITH NOTES (not COMPLETE or BLOCKED)
- [ ] The deferred criterion is explicitly named in the completion notes
- [ ] Skill still asks "May I write" before updating the story file
---
### Case 3: Blocked Path — GDD deviation detected
**Fixture:**
- Story TR-ID points to requirement: "Player can carry max 3 light sources"
- Implementation in `src/` uses a variable `MAX_CARRIED_LIGHTS = 5`
- This is a deliberate deviation from the GDD
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the GDD requirement text (max 3)
2. Skill detects discrepancy between requirement and implementation value (5)
3. Skill flags this as a GDD deviation and asks the user to classify it:
- INTENTIONAL: document the deviation and reason
- ERROR: implementation must be fixed before story can be marked Complete
- OUT OF SCOPE: requirement changed and GDD needs updating
4. If INTENTIONAL: skill records deviation in completion notes, verdict is COMPLETE WITH NOTES
5. If ERROR: verdict is BLOCKED until implementation is corrected
**Assertions:**
- [ ] Skill detects the mismatch between GDD requirement and implementation value
- [ ] Skill asks the user to classify the deviation (not auto-assumes either way)
- [ ] INTENTIONAL deviation → COMPLETE WITH NOTES (not BLOCKED)
- [ ] ERROR deviation → BLOCKED verdict until fixed
- [ ] Detected deviations are recorded in completion notes or tech debt register
---
### Case 4: Edge Case — No argument, auto-detect current story
**Fixture:**
- `production/session-state/active.md` contains a reference to
`production/epics/core/story-oxygen-drain.md` as the active story
- That story file exists with `Status: In Progress`
**Input:** `/story-done` (no argument)
**Expected behavior:**
1. Skill reads `production/session-state/active.md`
2. Skill finds the active story reference
3. Skill reads that story file and proceeds normally
4. Output confirms which story was auto-detected
**Assertions:**
- [ ] Skill reads `production/session-state/active.md` when no argument is given
- [ ] Skill identifies and confirms the auto-detected story before proceeding
- [ ] If no story is found in session state, skill asks the user to provide a path
---
### Case 5: Director Gate — LP-CODE-REVIEW behavior across review modes
**Fixture:**
- Story file at `production/epics/core/story-light-pickup.md`
- All acceptance criteria verified, no GDD deviations
- `production/session-state/review-mode.txt` exists
**Case 5a — full mode:**
- `review-mode.txt` contains `full`
**Input:** `/story-done production/epics/core/story-light-pickup.md` (full mode)
**Expected behavior:**
1. Skill reads review mode — determines `full`
2. After implementation verification, skill invokes LP-CODE-REVIEW gate
3. Lead programmer reviews the implementation
4. If LP verdict is NEEDS CHANGES → story cannot be marked Complete
5. If LP verdict is APPROVED → skill proceeds to mark story Complete
**Assertions (5a):**
- [ ] Skill reads review mode before deciding whether to invoke LP-CODE-REVIEW
- [ ] LP-CODE-REVIEW gate is invoked in full mode after implementation check
- [ ] An LP NEEDS CHANGES verdict prevents story from being marked Complete
- [ ] Gate result is noted in output: "Gate: LP-CODE-REVIEW — [result]"
- [ ] Skill still asks "May I write" before updating story status even if LP approved
**Case 5b — lean or solo mode:**
- `review-mode.txt` contains `lean` or `solo`
**Expected behavior:**
1. Skill reads review mode — determines `lean` or `solo`
2. LP-CODE-REVIEW gate is SKIPPED
3. Output notes the skip: "LP-CODE-REVIEW skipped — lean mode" (or "— solo mode")
4. Story completion proceeds based on acceptance criteria check only
**Assertions (5b):**
- [ ] LP-CODE-REVIEW gate does NOT spawn in lean or solo mode
- [ ] Skip is explicitly noted in output
- [ ] Skill still requires "May I write" approval before marking story Complete
---
## Protocol Compliance
- [ ] Uses "May I write" before updating the story file
- [ ] Uses "May I write" before adding entries to `docs/tech-debt-register.md`
- [ ] Presents complete findings (criteria check, deviation check) before asking approval
- [ ] Ends by surfacing the next ready story from the sprint plan
- [ ] Does not mark a story Complete if any criteria are in ERROR state
- [ ] Does not skip the code review prompt
---
## Coverage Notes
- The full 8-phase flow of the skill is exercised across Cases 1-3; not all
edge cases within each phase are covered.
- Tech debt logging (deferred items written to `docs/tech-debt-register.md`)
is mentioned in Case 2 but not the primary assertion focus; dedicated
coverage deferred.
- The `sprint-status.yaml` update (Phase 7 in the skill) is implied by Case 1
but not the primary assertion; assumed to follow the same "May I write" pattern.
- Stories with multiple TR-IDs or multiple ADRs are not explicitly tested.
# Skill Test Spec: /story-readiness
## Skill Summary
`/story-readiness` validates that a story file is ready for a developer to
pick up and implement. It checks four dimensions: Design (embedded GDD
requirements), Architecture (ADR references and status), Scope (clear
boundaries and DoD), and Definition of Done (testable criteria). It produces
a READY / NEEDS WORK / BLOCKED verdict. It is a read-only skill and runs
before any developer picks up a story.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered check sections
- [ ] Contains verdict keywords: READY, NEEDS WORK, BLOCKED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do after verdict)
---
## Test Cases
### Case 1: Happy Path — Fully ready story
**Fixture:**
- Story file exists at `production/epics/core/story-light-pickup.md`
- Story contains:
- `TR-ID: TR-light-001` (GDD requirement reference)
- `ADR: docs/architecture/adr-003-inventory.md`
- Referenced ADR exists and has status `Accepted`
- Referenced TR-ID exists in `docs/architecture/tr-registry.yaml`
- Story has `## Acceptance Criteria` with ≥3 testable items
- Story has `## Definition of Done` section
- Story has `Status: Ready for Dev`
- Manifest version in story header matches current `docs/architecture/control-manifest.md`
**Input:** `/story-readiness production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the story file
2. Skill reads the referenced ADR — verifies status is `Accepted`
3. Skill reads `docs/architecture/tr-registry.yaml` — verifies TR-ID exists
4. Skill reads `docs/architecture/control-manifest.md` — verifies manifest version matches
5. Skill evaluates all 4 dimensions (Design, Architecture, Scope, DoD)
6. Skill outputs READY verdict with all checks passing
**Assertions:**
- [ ] Skill reads the referenced ADR file (not just the story)
- [ ] Skill verifies ADR status is `Accepted` (not `Proposed`)
- [ ] Skill reads `tr-registry.yaml` to verify TR-ID exists
- [ ] Output includes check results for all 4 dimensions
- [ ] Verdict is READY when all checks pass
- [ ] Skill does not write any files
---
### Case 2: Blocked Path — Referenced ADR is Proposed (not Accepted)
**Fixture:**
- Story file exists with `ADR: docs/architecture/adr-005-light-system.md`
- `adr-005-light-system.md` exists but has `Status: Proposed`
- All other story content is otherwise complete
**Input:** `/story-readiness production/epics/core/story-light-system.md`
**Expected behavior:**
1. Skill reads the story
2. Skill reads `adr-005-light-system.md` — finds `Status: Proposed`
3. Skill flags this as a BLOCKING issue (cannot implement against unaccepted ADR)
4. Skill outputs BLOCKED verdict
5. Skill recommends: accept or reject the ADR before picking up the story
**Assertions:**
- [ ] Verdict is BLOCKED (not NEEDS WORK or READY) when ADR is Proposed
- [ ] Output explicitly names the Proposed ADR as the blocker
- [ ] Output recommends resolving ADR status before proceeding
- [ ] Skill does not output READY regardless of other checks passing
---
### Case 3: Needs Work — Missing Acceptance Criteria
**Fixture:**
- Story file exists but has no `## Acceptance Criteria` section
- ADR reference exists and is `Accepted`
- TR-ID exists in registry
- Manifest version matches
**Input:** `/story-readiness production/epics/core/story-oxygen-drain.md`
**Expected behavior:**
1. Skill reads the story
2. Skill finds no Acceptance Criteria section
3. Skill flags this as a NEEDS WORK issue (story is incomplete, not blocked)
4. Skill outputs NEEDS WORK verdict
5. Skill names the missing section and suggests adding measurable criteria
**Assertions:**
- [ ] Verdict is NEEDS WORK (not BLOCKED or READY) when Acceptance Criteria section is absent
- [ ] Output identifies the missing Acceptance Criteria section specifically
- [ ] Output suggests adding testable/measurable criteria
- [ ] Skill distinguishes NEEDS WORK (fixable without external dependencies) from BLOCKED (requires outside action)
---
### Case 4: Edge Case — Stale manifest version
**Fixture:**
- Story file has `Manifest Version: 2026-01-15` in its header
- `docs/architecture/control-manifest.md` has `Manifest Version: 2026-03-10`
- Versions do not match (story was created before manifest was updated)
**Input:** `/story-readiness production/epics/core/story-mirror-rotation.md`
**Expected behavior:**
1. Skill reads the story and extracts manifest version `2026-01-15`
2. Skill reads control manifest header and extracts current version `2026-03-10`
3. Skill detects version mismatch
4. Skill flags this as an ADVISORY issue (not blocking, but worth noting)
5. Verdict is NEEDS WORK with manifest staleness noted
**Assertions:**
- [ ] Skill reads `docs/architecture/control-manifest.md` to get current version
- [ ] Skill compares story's embedded manifest version against current manifest version
- [ ] Stale manifest version results in NEEDS WORK (not BLOCKED, not READY)
- [ ] Output explains that the story's embedded guidance may be outdated
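The version comparison in this case reduces to extracting the `Manifest Version:` header from both files and comparing the values. A minimal sketch, assuming the header format shown in the fixture (the function name is hypothetical):

```python
import re

def manifest_check(story_text: str, manifest_text: str) -> str:
    """Compare a story's embedded manifest version with the current one.

    A mismatch is advisory (NEEDS WORK), not blocking: the story's
    embedded guidance may be outdated but can be refreshed locally.
    """
    pattern = r"Manifest Version:\s*([\d-]+)"
    story = re.search(pattern, story_text)
    manifest = re.search(pattern, manifest_text)
    if not story or not manifest:
        return "NEEDS WORK (version header missing)"
    if story.group(1) != manifest.group(1):
        return f"NEEDS WORK (stale: {story.group(1)} vs {manifest.group(1)})"
    return "OK"
```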
---
### Case 5: Director Gate — QL-STORY-READY behavior across review modes
**Fixture:**
- Story file exists and is READY (all 4 dimensions pass, ADR Accepted, criteria present)
- `production/session-state/review-mode.txt` exists
**Case 5a — full mode:**
- `review-mode.txt` contains `full`
**Input:** `/story-readiness production/epics/core/story-light-pickup.md` (full mode)
**Expected behavior:**
1. Skill reads review mode — determines `full`
2. After completing its own 4-dimension check, skill invokes QL-STORY-READY gate
3. QA lead reviews the story for readiness
4. If QA lead verdict is INADEQUATE → story verdict is BLOCKED regardless of 4-dimension result
5. If QA lead verdict is ADEQUATE → verdict proceeds normally
**Assertions (5a):**
- [ ] Skill reads review mode before deciding whether to invoke QL-STORY-READY
- [ ] QL-STORY-READY gate is invoked in full mode after the 4-dimension check completes
- [ ] A QA lead INADEQUATE verdict overrides a READY 4-dimension result → final verdict BLOCKED
- [ ] Gate invocation is noted in output: "Gate: QL-STORY-READY — [result]"
**Case 5b — lean or solo mode:**
- `review-mode.txt` contains `lean` or `solo`
**Expected behavior:**
1. Skill reads review mode — determines `lean` or `solo`
2. QL-STORY-READY gate is SKIPPED
3. Output notes the skip: "[QL-STORY-READY] skipped — Lean/Solo mode"
4. Verdict is based on 4-dimension check only
**Assertions (5b):**
- [ ] QL-STORY-READY gate does NOT spawn in lean or solo mode
- [ ] Skip is explicitly noted in output
- [ ] Verdict is based on 4-dimension check alone
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents complete check results before verdict
- [ ] Does not ask for approval (no file writes)
- [ ] Ends with recommended next step (fix issues or proceed to implementation)
- [ ] Distinguishes three verdict levels clearly (READY vs NEEDS WORK vs BLOCKED)
---
## Coverage Notes
- Case where TR-ID is missing from the registry entirely is not explicitly
tested here; it follows the same NEEDS WORK pattern as Case 3.
- The "no argument" path (skill auto-detecting the current story) is not
tested because it depends on `production/session-state/active.md` content,
which is hard to fixture reliably.
- Stories with multiple ADR references are not tested; behavior is assumed to
be additive (all ADRs must be Accepted for READY verdict).

---
# Skill Test Spec: /architecture-review
## Skill Summary
`/architecture-review` is an Opus-tier skill that validates a technical architecture
document against the project's 8 required architecture sections and checks that it
is internally consistent, non-contradictory with existing ADRs, and correctly
targeting the pinned engine version. It produces a verdict of APPROVED /
NEEDS REVISION / MAJOR REVISION NEEDED.
In `full` review mode, the skill spawns two director gate agents in parallel:
TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY (lead-programmer). In
`lean` or `solo` mode, both gates are skipped and noted. The skill is read-only —
no files are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff at the end
- [ ] Documents gate behavior: TD-ARCHITECTURE + LP-FEASIBILITY in full mode; skipped in lean/solo
---
## Director Gate Checks
In `full` mode: TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
(lead-programmer) are spawned in parallel after the skill reads the architecture doc.
In `lean` mode: both gates are skipped. Output notes:
"TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
In `solo` mode: both gates are skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Complete architecture doc in full mode
**Fixture:**
- `docs/architecture/architecture.md` exists with all 8 required sections populated
- All sections reference the correct engine version from `docs/engine-reference/`
- No contradictions with existing Accepted ADRs in `docs/architecture/`
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/architecture-review docs/architecture/architecture.md`
**Expected behavior:**
1. Skill reads the architecture document
2. Skill reads existing ADRs for cross-reference
3. Skill reads engine version reference
4. TD-ARCHITECTURE and LP-FEASIBILITY gate agents spawn in parallel
5. Both gates return APPROVED
6. Skill outputs section-by-section completeness check (8/8 sections present)
7. Verdict: APPROVED
**Assertions:**
- [ ] All 8 required sections are checked and reported
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel (not sequentially)
- [ ] Verdict is APPROVED when all sections are present and no conflicts exist
- [ ] Skill does NOT write any files
- [ ] Next-step handoff to `/create-control-manifest` or `/create-epics` is present
---
### Case 2: Failure Path — Missing required sections
**Fixture:**
- `docs/architecture/architecture.md` exists but is missing at least 2 required sections
(e.g., no data model section, no error handling section)
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/architecture-review docs/architecture/architecture.md`
**Expected behavior:**
1. Skill reads the document and identifies missing sections
2. Section completeness shows fewer than 8/8 sections present
3. Missing sections are listed by name with specific remediation guidance
4. Verdict: MAJOR REVISION NEEDED (≥2 missing sections)
**Assertions:**
- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) for ≥2 missing sections
- [ ] Each missing section is named explicitly in the output
- [ ] Remediation guidance is specific (what to add, not just "add missing sections")
- [ ] Skill does NOT pass a document that is missing required sections
---
### Case 3: Partial Path — Architecture contradicts an existing ADR
**Fixture:**
- `docs/architecture/architecture.md` exists with all 8 sections present
- One Accepted ADR in `docs/architecture/` establishes a constraint that the architecture doc contradicts
(e.g., ADR-001 mandates ECS pattern; architecture.md describes a different pattern for the same system)
**Input:** `/architecture-review docs/architecture/architecture.md`
**Expected behavior:**
1. Skill reads the architecture doc and all existing ADRs
2. Conflict is detected between the architecture doc and the named ADR
3. Conflict entry names: the ADR number/title, the contradicting sections, and impact
4. Verdict: NEEDS REVISION (conflict exists but structure is otherwise sound)
**Assertions:**
- [ ] Verdict is NEEDS REVISION (not MAJOR REVISION NEEDED for a single contradiction)
- [ ] The specific ADR number and title are named in the conflict entry
- [ ] The contradicting sections in both documents are identified
- [ ] Skill does NOT auto-resolve the contradiction
---
### Case 4: Edge Case — File not found
**Fixture:**
- The path provided does not exist in the project
**Input:** `/architecture-review docs/architecture/nonexistent.md`
**Expected behavior:**
1. Skill attempts to read the file
2. File not found
3. Skill outputs a clear error naming the missing file
4. Skill suggests checking `docs/architecture/` or running `/create-architecture`
5. Skill does NOT produce a verdict
**Assertions:**
- [ ] Skill outputs a clear error when the file is not found
- [ ] No verdict is produced (APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED)
- [ ] Skill suggests a corrective action
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — Full mode spawns both gates; solo mode skips both
**Fixture (full mode):**
- `docs/architecture/architecture.md` exists with all 8 sections
- `production/session-state/review-mode.txt` contains `full`
**Full mode expected behavior:**
1. TD-ARCHITECTURE gate spawns
2. LP-FEASIBILITY gate spawns in parallel with TD-ARCHITECTURE
3. Both gates complete before verdict is issued
**Assertions (full mode):**
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY both appear in the output as completed gates
- [ ] Both gates spawn in parallel (not one after the other)
- [ ] Verdict reflects gate feedback
**Fixture (solo mode):**
- Same architecture doc
- `production/session-state/review-mode.txt` contains `solo`
**Solo mode expected behavior:**
1. Skill reads the architecture doc
2. Gates are NOT spawned
3. Output notes: "TD-ARCHITECTURE skipped — solo mode" and "LP-FEASIBILITY skipped — solo mode"
4. Verdict is based on structural checks only
**Assertions (solo mode):**
- [ ] Neither TD-ARCHITECTURE nor LP-FEASIBILITY appears as an active gate
- [ ] Both skipped gates are noted in the output
- [ ] Verdict is still produced based on the structural check alone
---
## Protocol Compliance
- [ ] Does NOT write any files (read-only skill)
- [ ] Presents section completeness check before issuing verdict
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel in full mode
- [ ] Skipped gates are noted by name and mode in lean/solo output
- [ ] Verdict is one of exactly: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Ends with next-step handoff appropriate to verdict
---
## Coverage Notes
- The 8 required architecture sections are project-specific; tests use the
section list defined in the skill body — not re-enumerated here.
- Engine version compatibility checking (cross-referencing `docs/engine-reference/`)
is part of Case 1's happy path but not independently fixture-tested.
- RTM (requirement traceability matrix) mode is a separate concern covered by
the `/architecture-review` skill's own `rtm` argument mode, not tested here.

---
# Skill Test Spec: /design-review
## Skill Summary
`/design-review` reads a game design document (GDD) and evaluates it against
the project's 8-section design standard (Overview, Player Fantasy, Detailed
Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria).
It checks for internal consistency, implementability, and cross-system
conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR
REVISION NEEDED. It is a read-only skill (no file writes) and runs as a
`context: fork` subagent.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered steps
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT require "May I write" language (read-only skill — `allowed-tools` excludes Write/Edit)
- [ ] Output format is documented (review template shown in skill body)
---
## Test Cases
### Case 1: Happy Path — Complete GDD, all 8 sections present
**Fixture:**
- `design/gdd/light-manipulation.md` exists (use `_fixtures/minimal-game-concept.md`
as a stand-in — represents a complete document with all required content)
- All 8 required sections are populated with substantive content
- Formulas section contains at least one formula with defined variables
- Acceptance Criteria section contains at least 3 testable criteria
**Input:** `/design-review design/gdd/light-manipulation.md`
**Expected behavior:**
1. Skill reads the target document in full
2. Skill reads CLAUDE.md for project context and standards
3. Skill evaluates all 8 required sections (present/absent check)
4. Skill checks internal consistency (formulas match described behavior)
5. Skill checks implementability (rules are precise enough to code)
6. Skill outputs structured review with section-by-section status
7. Skill outputs APPROVED verdict
**Assertions:**
- [ ] Skill reads the target file before producing any output
- [ ] Output includes a "Completeness" section showing X/8 sections present
- [ ] Output includes an "Internal Consistency" section
- [ ] Output includes an "Implementability" section
- [ ] Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED
- [ ] APPROVED verdict is given when all 8 sections are present and consistent
---
### Case 2: Failure Path — Incomplete GDD (4/8 sections)
**Fixture:**
- `design/gdd/light-manipulation.md` exists using content from
`tests/skills/_fixtures/incomplete-gdd.md` (4 of 8 sections populated;
Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing)
**Input:** `/design-review design/gdd/light-manipulation.md`
**Expected behavior:**
1. Skill reads the document
2. Skill identifies 4 missing sections
3. Skill outputs "Completeness: 4/8 sections present"
4. Skill lists specifically which 4 sections are missing
5. Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION)
**Assertions:**
- [ ] Output shows "4/8" in the completeness section (not a higher number)
- [ ] Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria)
- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing
- [ ] Output does not suggest the document is implementation-ready
- [ ] Skill does not write any files (read-only enforcement)
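The completeness count and verdict thresholds exercised by Cases 2 and 3 can be sketched as follows. The `## Section` heading convention is an assumption; the section list comes from the Skill Summary above, and the thresholds (≥3 missing is MAJOR, 1-2 missing is NEEDS REVISION) come from the assertions in these cases:

```python
REQUIRED = ["Overview", "Player Fantasy", "Detailed Rules", "Formulas",
            "Edge Cases", "Dependencies", "Tuning Knobs", "Acceptance Criteria"]

def completeness(gdd_text: str) -> tuple[int, list[str]]:
    """Count which of the 8 required GDD sections appear as headings."""
    missing = [s for s in REQUIRED if f"## {s}" not in gdd_text]
    return len(REQUIRED) - len(missing), missing

def verdict(missing: list[str]) -> str:
    """Map missing-section count to the spec's verdict thresholds."""
    if len(missing) >= 3:
        return "MAJOR REVISION NEEDED"
    if missing:
        return "NEEDS REVISION"
    return "APPROVED (pending consistency checks)"
```

A full 8/8 document is only a candidate for APPROVED; the consistency and implementability checks still apply.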
---
### Case 3: Partial Path — 7/8 sections, minor inconsistency
**Fixture:**
- GDD has all sections except Formulas
- The described behavior mentions numeric values but no formulas are defined
- Acceptance Criteria exist but are vague ("feels good" rather than measurable)
**Input:** `/design-review design/gdd/[document].md`
**Expected behavior:**
1. Skill identifies missing Formulas section
2. Skill flags vague acceptance criteria as an implementability issue
3. Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED)
4. Skill provides specific remediation notes for each issue
**Assertions:**
- [ ] Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues
- [ ] Output identifies the missing Formulas section specifically
- [ ] Output flags the vague acceptance criteria as an implementability gap
- [ ] Each flagged issue has a specific, actionable remediation note
---
### Case 4: Edge Case — File not found
**Fixture:**
- The path provided does not exist in the project
**Input:** `/design-review design/gdd/nonexistent.md`
**Expected behavior:**
1. Skill attempts to read the file
2. File not found
3. Skill outputs an error message naming the missing file
4. Skill suggests checking the path or listing files in `design/gdd/`
5. Skill does NOT produce a verdict
**Assertions:**
- [ ] Skill outputs a clear error when the file is not found
- [ ] Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing
- [ ] Skill suggests a corrective action (check path, list available GDDs)
---
### Case 5: Director Gate — no gate spawned regardless of review mode
**Fixture:**
- `design/gdd/light-manipulation.md` exists with all 8 sections
- `production/session-state/review-mode.txt` exists with `full` (most permissive mode)
**Input:** `/design-review design/gdd/light-manipulation.md` (with full review mode active)
**Expected behavior:**
1. Skill reads the GDD document
2. Skill does NOT read `review-mode.txt` — this skill has no director gates
3. Skill produces the review output normally
4. No director gate agents are spawned at any point
5. Verdict is APPROVED (all 8 sections present in fixture)
**Assertions:**
- [ ] Skill does NOT spawn any director gate agent (CD-, TD-, PR-, AD- prefixed agents)
- [ ] Skill does NOT read `review-mode.txt` or equivalent mode file
- [ ] The `--review` flag or `full` mode state has NO effect on whether directors spawn
- [ ] Output does not contain any "Gate: [GATE-ID]" entries
- [ ] Skill IS the review — it does not delegate the review to a director
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents complete findings before any verdict
- [ ] Does not ask for approval before producing output (no writes to approve)
- [ ] Ends with recommended next step (e.g., fix issues and re-run, or proceed to `/map-systems`)
---
## Coverage Notes
- Cross-system consistency checking (Case 3 in the skill's own phase list) is
not directly tested here because it requires multiple GDD files to compare;
this is covered by the `/review-all-gdds` spec instead.
- The skill's `context: fork` behavior (running as a subagent) is not tested
at the spec level — this is a runtime behavior verified manually.
- Performance and edge cases involving very large GDD files are not in scope.

---
# Skill Test Spec: /review-all-gdds
## Skill Summary
`/review-all-gdds` is an Opus-tier skill that performs a holistic cross-GDD review
across all files in `design/gdd/`. It runs two complementary review phases in
parallel: Phase 1 checks for consistency (contradictions, formula mismatches,
stale references, competing ownership), and Phase 2 checks design theory (dominant
strategies, pillar drift, cognitive overload, economic imbalance). Because the two
phases are independent, they are spawned simultaneously to save time. The skill
produces a CONSISTENT / MINOR ISSUES / MAJOR ISSUES verdict and is read-only — no
files are written without explicit user approval.
The skill is itself the holistic review gate in the pipeline. It is invoked after
individual GDDs are complete and before architecture work begins. It does NOT spawn
any director gate agents (it IS the director-level review).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥5 phase headings (complex multi-phase skill)
- [ ] Contains verdict keywords: CONSISTENT, MINOR ISSUES, MAJOR ISSUES
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff at the end
- [ ] Documents parallel phase spawning (Phase 1 and Phase 2 are independent)
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. It IS the holistic
review; delegating to a director gate would create a circular dependency.
---
## Test Cases
### Case 1: Happy Path — Clean GDD set with no conflicts
**Fixture:**
- `design/gdd/` contains ≥3 system GDDs
- All GDDs are internally consistent: no formula contradictions, no competing ownership, no stale references
- All GDDs align with the pillars defined in `design/gdd/game-pillars.md`
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Skill reads all GDD files in `design/gdd/`
2. Phase 1 (consistency scan) and Phase 2 (design theory check) spawn in parallel
3. Phase 1 finds no contradictions, no formula mismatches, no ownership conflicts
4. Phase 2 finds no pillar drift, no dominant strategies, no cognitive overload
5. Skill outputs a structured findings table with 0 blocking issues
6. Verdict: CONSISTENT
**Assertions:**
- [ ] Both review phases are spawned in parallel (not sequentially)
- [ ] Output includes a findings table (even if empty — shows "No issues found")
- [ ] Verdict is CONSISTENT when no conflicts are found
- [ ] Skill does NOT write any files without user approval
- [ ] Next-step handoff to `/architecture-review` or `/create-architecture` is present
---
### Case 2: Failure Path — Conflicting rules between two GDDs
**Fixture:**
- GDD-A defines a floor value (e.g. "minimum [output] is [N]")
- GDD-B states a mechanic that bypasses that floor (e.g. "[mechanic] can reduce [output] to 0")
- The two GDDs are otherwise complete and valid
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Phase 1 (consistency scan) detects the contradiction between GDD-A and GDD-B
2. Conflict is reported with: both filenames, the specific conflicting rules, and severity HIGH
3. Verdict: MAJOR ISSUES
4. Handoff instructs user to resolve the conflict and re-run before proceeding
**Assertions:**
- [ ] Verdict is MAJOR ISSUES (not CONSISTENT or MINOR ISSUES)
- [ ] Both GDD filenames are named in the conflict entry
- [ ] The specific contradicting rules are quoted or described (not vague "conflict found")
- [ ] Issue is classified as severity HIGH (blocking)
- [ ] Skill does NOT auto-resolve the conflict
---
### Case 3: Partial Path — Single GDD with orphaned dependency reference
**Fixture:**
- GDD-A lists a dependency in its Dependencies section pointing to "system-B"
- No GDD for system-B exists in `design/gdd/`
- All other GDDs are consistent
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Phase 1 detects the orphaned dependency reference in GDD-A
2. Issue is reported as: DEPENDENCY GAP — GDD-A references system-B which has no GDD
3. No other conflicts found
4. Verdict: MINOR ISSUES (dependency gap is advisory, not blocking by itself)
**Assertions:**
- [ ] Verdict is MINOR ISSUES (not MAJOR ISSUES for a single orphaned reference)
- [ ] The specific GDD filename and the missing dependency name are reported
- [ ] Skill suggests running `/design-system system-B` to resolve the gap
- [ ] Skill does NOT skip or silently ignore the missing dependency
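The orphaned-reference scan in this case amounts to a set-membership check over the Dependencies sections. A minimal sketch, assuming dependency names have already been parsed out of each GDD (function name and data shape are hypothetical):

```python
def orphaned_dependencies(gdds: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Find Dependencies entries that point at systems with no GDD.

    `gdds` maps each system name to the dependency names listed in its
    GDD's Dependencies section. Returns (gdd, missing-dependency) pairs.
    """
    known = set(gdds)
    return [(name, dep) for name, deps in gdds.items()
            for dep in deps if dep not in known]
```

Each returned pair feeds one DEPENDENCY GAP entry in the findings table.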
---
### Case 4: Edge Case — No GDD files found
**Fixture:**
- `design/gdd/` directory is empty or does not exist
- No GDD files are present
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Skill attempts to read files in `design/gdd/`
2. No files found — skill outputs an error with guidance
3. Skill recommends running `/brainstorm` and `/design-system` before re-running
4. Skill does NOT produce a verdict (CONSISTENT / MINOR ISSUES / MAJOR ISSUES)
**Assertions:**
- [ ] Skill outputs a clear error message when no GDDs are found
- [ ] No verdict is produced when the directory is empty
- [ ] Skill recommends the correct next action (`/brainstorm` or `/design-system`)
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — No gate spawned regardless of review mode
**Fixture:**
- `design/gdd/` contains ≥2 consistent system GDDs
- `production/session-state/review-mode.txt` exists with content `full`
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Skill reads all GDDs and runs the two review phases
2. Skill does NOT read `review-mode.txt`
3. Skill does NOT spawn any director gate agent (CD-, TD-, PR-, AD- prefixed)
4. Skill completes and outputs its verdict normally
5. Review mode setting has no effect on this skill's behavior
**Assertions:**
- [ ] No director gate agents are spawned at any point
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output does not contain any "Gate: [GATE-ID]" or "skipped" gate entries
- [ ] The skill produces a verdict regardless of review mode
- [ ] R4 metric: gate count for this skill = 0 in all modes
---
## Protocol Compliance
- [ ] Phase 1 (consistency) and Phase 2 (design theory) spawned in parallel — not sequentially
- [ ] Does NOT write any files without "May I write" approval
- [ ] Findings table shown before any write ask
- [ ] Verdict is one of exactly: CONSISTENT, MINOR ISSUES, MAJOR ISSUES
- [ ] Ends with appropriate handoff: MAJOR ISSUES → fix and re-run; MINOR ISSUES → may proceed with awareness; CONSISTENT → `/create-architecture`
---
## Coverage Notes
- Economic balance analysis (source/sink loops) requires cross-GDD resource data — covered
structurally by Case 2 (the conflict detection pattern is the same).
- The design theory phase (Phase 2) checks including dominant strategy detection and
cognitive overload are not individually fixture-tested — they follow the same
pattern as consistency checks and are validated via the pillar drift case structure.
- The `since-last-review` scoping mode is not tested here — it is a runtime concern.

---
# Skill Test Spec: /changelog
## Skill Summary
`/changelog` is a Haiku-tier skill that auto-generates a developer-facing
changelog by reading git commit history and closed sprint stories since the
last release tag. It organizes entries into features, fixes, and known issues.
No director gates are used. The skill asks "May I write to `docs/CHANGELOG.md`?"
before persisting. Verdict is always COMPLETE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language (skill writes changelog)
- [ ] Has a next-step handoff (e.g., run /patch-notes for player-facing version)
---
## Director Gate Checks
None. Changelog generation is a fast compilation task; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Multiple sprints since last release tag
**Fixture:**
- Git history has a tag `v0.3.0` three sprints ago
- Since that tag: 12 commits across sprints 006, 007, 008
- Sprint story files reference task IDs matching commit messages
- `docs/CHANGELOG.md` does not yet exist
**Input:** `/changelog`
**Expected behavior:**
1. Skill reads git log since `v0.3.0` tag
2. Skill reads sprint stories to cross-reference task IDs
3. Skill compiles entries into Features, Fixes, and Known Issues sections
4. Skill presents draft to user
5. Skill asks "May I write to `docs/CHANGELOG.md`?"
6. User approves; file written; verdict COMPLETE
**Assertions:**
- [ ] Changelog covers commits since the most recent git tag
- [ ] Entries are organized into Features / Fixes / Known Issues sections
- [ ] Sprint story references are used to enrich commit descriptions
- [ ] "May I write" prompt appears before file write
- [ ] Verdict is COMPLETE after write
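The compilation step in this case can be sketched with a simple bucketing pass over commit subjects. Conventional-commit prefixes are used here as a stand-in heuristic only; the real skill also cross-references sprint story task IDs, and Known Issues come from story metadata rather than commit messages:

```python
def classify_commits(subjects: list[str]) -> dict[str, list[str]]:
    """Bucket commit subjects into draft changelog sections."""
    sections: dict[str, list[str]] = {"Features": [], "Fixes": [], "Other": []}
    for subject in subjects:
        if subject.startswith("feat"):
            sections["Features"].append(subject)
        elif subject.startswith("fix"):
            sections["Fixes"].append(subject)
        else:
            sections["Other"].append(subject)
    return sections
```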
---
### Case 2: No Git Tags Found — All commits used, version baseline noted
**Fixture:**
- Git repository has commits but no tags exist
- 20 commits in history across 3 sprints
**Input:** `/changelog`
**Expected behavior:**
1. Skill checks for git tags — finds none
2. Skill uses all commits in history as the baseline
3. Skill notes in the output: "No version tag found — using full commit history; version baseline is unset"
4. Skill still compiles organized changelog from available commits
5. Skill asks "May I write" and writes on approval
**Assertions:**
- [ ] Skill does not error when no git tags exist
- [ ] Output explicitly notes that no version baseline was found
- [ ] Full commit history is used as the source
- [ ] Changelog is still organized into sections despite missing tag
---
### Case 3: Commit Messages Without Task IDs — Grouped by date with note
**Fixture:**
- Git log since last tag has 8 commits
- 5 commits have no task ID in the message (e.g., "fix typo", "tweak values")
- 3 commits reference task IDs matching sprint stories
**Input:** `/changelog`
**Expected behavior:**
1. Skill reads commits and sprint stories
2. 3 commits are matched to sprint stories and placed in appropriate sections
3. 5 untagged commits are grouped by date under a "Misc" or "Other Changes" section
4. Output notes: "5 commits without task IDs — grouped by date"
5. Skill writes changelog on approval
**Assertions:**
- [ ] Commits with task IDs are placed in appropriate sections (Features or Fixes)
- [ ] Commits without task IDs are grouped separately with a note
- [ ] Output flags the number of commits missing task references
- [ ] No commits are silently dropped from the changelog
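The tagged/untagged split this case requires can be sketched with a single regex pass. The `[A-Z]+-\d+` ID format is a hypothetical example of a task ID convention, not the framework's actual format:

```python
import re

TASK_ID = re.compile(r"\b[A-Z]+-\d+\b")  # hypothetical ID format, e.g. CORE-12

def split_by_task_id(commits: list[str]) -> tuple[list[str], list[str]]:
    """Separate commits that carry a task ID from those that do not.

    Tagged commits can be matched to sprint stories; untagged ones
    are grouped by date under an "Other Changes" section.
    """
    tagged = [c for c in commits if TASK_ID.search(c)]
    untagged = [c for c in commits if not TASK_ID.search(c)]
    return tagged, untagged
```

Because both lists are returned, no commit is dropped: the assertion above is satisfied by construction.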
---
### Case 4: Existing CHANGELOG.md — New section prepended, old entries preserved
**Fixture:**
- `docs/CHANGELOG.md` already exists with sections for `v0.2.0` and `v0.3.0`
- New commits exist since `v0.3.0` tag
**Input:** `/changelog`
**Expected behavior:**
1. Skill detects that `docs/CHANGELOG.md` already exists
2. Skill compiles new entries for the period since `v0.3.0`
3. Skill presents draft with new section prepended above existing content
4. Skill asks "May I write to `docs/CHANGELOG.md`?" (confirming prepend strategy)
5. User approves; new content is prepended, old entries intact; verdict COMPLETE
**Assertions:**
- [ ] Skill reads existing changelog before writing to detect prior content
- [ ] New section is prepended to existing entries (not appended, and nothing is overwritten)
- [ ] Old changelog entries for v0.2.0 and v0.3.0 are preserved in the written file
- [ ] "May I write" prompt reflects the prepend operation
---
### Case 5: Gate Compliance — No gate; read-then-write with approval
**Fixture:**
- Git history has commits since last tag
- `review-mode.txt` contains `full`
**Input:** `/changelog`
**Expected behavior:**
1. Skill compiles changelog in full mode
2. No director gate is invoked (changelog generation is compilation, not a delivery gate)
3. Skill runs on Haiku model — fast compilation
4. Skill asks user for approval and writes file on confirmation
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output does not reference any gate result
- [ ] Skill proceeds directly from compilation to "May I write" prompt
- [ ] Verdict is COMPLETE
---
## Protocol Compliance
- [ ] Reads git log and sprint story files before compiling
- [ ] Always asks "May I write" before writing changelog
- [ ] No director gates are invoked
- [ ] Verdict is always COMPLETE
- [ ] Runs on Haiku model tier (fast, low-cost)
---
## Coverage Notes
- The case where git is not initialized in the repository is not tested;
behavior would depend on git command failure handling.
- Merge commits vs. squash commits are not explicitly differentiated in
these tests; implementation detail of the git log parsing phase.
- The `/patch-notes` skill should be run after `/changelog` for player-facing
output; that handoff is verified in the patch-notes spec.

---
# Skill Test Spec: /milestone-review
## Skill Summary
`/milestone-review` generates a comprehensive review of a completed milestone:
what shipped, velocity metrics, deferred items, risks surfaced, and retrospective
seeds. In full mode the PR-MILESTONE director gate runs after the review is
compiled (producer reviews scope delivery). In lean and solo modes the gate is
skipped. The skill asks "May I write to `production/milestones/review-milestone-N.md`?"
before persisting. Verdicts: MILESTONE COMPLETE or MILESTONE INCOMPLETE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: MILESTONE COMPLETE, MILESTONE INCOMPLETE
- [ ] Contains "May I write" language (skill writes review document)
- [ ] Has a next-step handoff (what to do after review is written)
---
## Director Gate Checks
| Gate ID | Trigger condition | Mode guard |
|---------------|--------------------------------|-------------------------|
| PR-MILESTONE | After review document compiled | full only (not lean/solo) |
---
## Test Cases
### Case 1: Happy Path — Nearly complete milestone with one deferred story
**Fixture:**
- `production/milestones/milestone-03.md` exists with 8 stories
- 7 stories have `Status: Complete`
- 1 story has `Status: Deferred` (deferred to milestone-04)
- `review-mode.txt` contains `full`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill reads `milestone-03.md` and all referenced sprint files
2. Skill compiles: 7 shipped, 1 deferred; velocity; no blockers
3. Skill presents review draft to user
4. PR-MILESTONE gate invoked; producer approves
5. Skill asks "May I write to `production/milestones/review-milestone-03.md`?"
6. User approves; file is written; verdict MILESTONE COMPLETE
**Assertions:**
- [ ] Deferred story is noted in the review with its target milestone
- [ ] Verdict is MILESTONE COMPLETE despite the one deferred story
- [ ] PR-MILESTONE gate is invoked after draft compilation in full mode
- [ ] Skill asks "May I write" before writing review file
- [ ] Review document path matches `production/milestones/review-milestone-03.md`
---
### Case 2: Blocked Milestone — Multiple blocked stories
**Fixture:**
- `production/milestones/milestone-03.md` exists with 5 stories
- 2 stories have `Status: Complete`
- 3 stories have `Status: Blocked` (named blockers listed in each story)
- `review-mode.txt` contains `full`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill reads milestone and sprint files
2. Skill finds 3 blocked stories; compiles blocker details
3. Verdict is MILESTONE INCOMPLETE
4. PR-MILESTONE gate runs; producer notes the unresolved blockers
5. Review is written with blocker list on approval
**Assertions:**
- [ ] Verdict is MILESTONE INCOMPLETE when any stories are Blocked
- [ ] Each blocked story's name and blocker reason are listed in the review
- [ ] PR-MILESTONE gate is still invoked in full mode even for INCOMPLETE verdict
- [ ] "May I write" prompt still appears before file write
---
### Case 3: Full Mode — PR-MILESTONE returns CONCERNS
**Fixture:**
- Milestone-03 has 6 complete stories but 2 were not in the original scope (added mid-sprint)
- `review-mode.txt` contains `full`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill compiles review; notes 2 out-of-scope stories shipped
2. PR-MILESTONE gate invoked; producer returns CONCERNS about scope drift
3. Skill surfaces the CONCERNS to the user and adds a "scope drift" note to the review
4. User approves revised review; file written as MILESTONE COMPLETE with caveat
**Assertions:**
- [ ] CONCERNS from PR-MILESTONE gate are shown to user before write
- [ ] Scope drift is explicitly noted in the written review document
- [ ] Verdict is MILESTONE COMPLETE (stories shipped) with CONCERNS annotation
- [ ] Skill does not suppress gate feedback
---
### Case 4: Edge Case — No milestone file found for specified milestone
**Fixture:**
- User calls `/milestone-review milestone-07`
- `production/milestones/milestone-07.md` does NOT exist
**Input:** `/milestone-review milestone-07`
**Expected behavior:**
1. Skill attempts to read `production/milestones/milestone-07.md`
2. File not found; skill outputs an error message
3. Skill suggests checking available milestones in `production/milestones/`
4. No gate is invoked; no file is written
**Assertions:**
- [ ] Skill does not crash when milestone file is absent
- [ ] Output names the expected file path in the error message
- [ ] Output suggests checking `production/milestones/` for valid milestone names
- [ ] Verdict is BLOCKED (cannot review a non-existent milestone)
---
### Case 5: Lean/Solo Mode — PR-MILESTONE gate skipped
**Fixture:**
- `production/milestones/milestone-03.md` exists with 5 complete stories
- `review-mode.txt` contains `solo`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill reads review mode — determines `solo`
2. Skill compiles review draft
3. PR-MILESTONE gate is skipped; output notes "[PR-MILESTONE] skipped — Solo mode"
4. Skill asks user for direct approval of the review
5. User approves; review file is written; verdict MILESTONE COMPLETE
**Assertions:**
- [ ] PR-MILESTONE gate is NOT invoked in solo (or lean) mode
- [ ] Skip is explicitly noted in skill output
- [ ] User direct approval is still required before write
- [ ] Verdict is MILESTONE COMPLETE after successful write
---
## Protocol Compliance
- [ ] Shows compiled review draft before invoking PR-MILESTONE or asking to write
- [ ] Always asks "May I write" before writing review document
- [ ] PR-MILESTONE gate only runs in full mode
- [ ] Skip message appears in lean and solo output
- [ ] Verdict is MILESTONE COMPLETE, MILESTONE INCOMPLETE, or BLOCKED (error cases), stated clearly
---
## Coverage Notes
- The case where the milestone has zero stories is not tested; it follows the
MILESTONE INCOMPLETE pattern with a note suggesting the milestone may not
have been planned.
- Velocity calculation specifics (story points vs. story count) are not
verified here; they are implementation details of the review compilation phase.

# Skill Test Spec: /patch-notes
## Skill Summary
`/patch-notes` is a Haiku-tier skill that generates player-facing patch notes
from existing changelog content, stripping internal task IDs and technical
jargon in favor of plain language. It filters entries to only those relevant
to players (visible features and bug fixes; internal refactors are excluded).
No director gates are used. The skill asks "May I write to
`docs/patch-notes-vX.X.md`?" before persisting. Verdict is COMPLETE, or BLOCKED
when no changelog exists to draw from (see Case 2).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language (skill writes patch notes file)
- [ ] Has a next-step handoff (e.g., share with community manager)
---
## Director Gate Checks
None. Patch notes generation is a fast compilation task; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Changelog filtered to player-facing entries
**Fixture:**
- `docs/CHANGELOG.md` exists with 5 entries:
- "Add dual-wield melee system" (Features — player-facing)
- "Fix crash on level transition" (Fixes — player-facing)
- "Add enemy patrol AI" (Features — player-facing)
- "Refactor input handler to use event bus" (Fixes — internal only)
- "Update dependency: Godot 4.6" (internal only)
- Version is `v0.4.0`
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill reads `docs/CHANGELOG.md`
2. Skill filters to 3 player-facing entries; excludes 2 internal entries
3. Skill rewrites entries in plain language (no task IDs, no tech jargon)
4. Skill presents draft to user
5. Skill asks "May I write to `docs/patch-notes-v0.4.0.md`?"
6. User approves; file written; verdict COMPLETE
**Assertions:**
- [ ] Only 3 entries appear in the patch notes (2 internal entries excluded)
- [ ] Entries are written in plain language without internal task IDs
- [ ] File path matches `docs/patch-notes-v0.4.0.md`
- [ ] "May I write" prompt appears before file write
- [ ] Verdict is COMPLETE after write
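The filtering asserted above can be sketched as a one-line audience filter. The `audience` tag on each entry is an assumption made for illustration; real changelog entries would need heuristic classification rather than a pre-labeled field:

```python
def player_facing(entries: list[dict]) -> list[str]:
    """Keep only changelog entries visible to players.

    Assumption: entries carry a hypothetical 'audience' label.
    """
    return [e["text"] for e in entries if e["audience"] == "player"]

# The five entries from the Case 1 fixture.
changelog = [
    {"text": "Add dual-wield melee system", "audience": "player"},
    {"text": "Fix crash on level transition", "audience": "player"},
    {"text": "Add enemy patrol AI", "audience": "player"},
    {"text": "Refactor input handler to use event bus", "audience": "internal"},
    {"text": "Update dependency: Godot 4.6", "audience": "internal"},
]
notes = player_facing(changelog)
print(len(notes))  # → 3
```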
---
### Case 2: No Changelog Found — Directed to run /changelog first
**Fixture:**
- `docs/CHANGELOG.md` does NOT exist
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill attempts to read `docs/CHANGELOG.md` — not found
2. Skill outputs: "No changelog found — run /changelog first to generate one"
3. No patch notes are generated; no file is written
**Assertions:**
- [ ] Skill does not crash when changelog is absent
- [ ] Output explicitly directs user to run `/changelog`
- [ ] No "May I write" prompt appears (nothing to write)
- [ ] Verdict is BLOCKED (dependency not met)
---
### Case 3: Tone Guidance from Design Folder — Incorporated into output
**Fixture:**
- `docs/CHANGELOG.md` exists with player-facing entries
- `design/community/tone-guide.md` exists with guidance: "upbeat, encouraging tone; avoid passive voice"
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill reads changelog
2. Skill detects tone guide at `design/community/tone-guide.md`
3. Skill applies tone guidance when rewriting entries in plain language
4. Patch notes use upbeat, active-voice phrasing
5. Skill presents draft, asks to write, writes on approval
**Assertions:**
- [ ] Skill checks `design/` for a community or tone guidance file
- [ ] Tone guide content influences phrasing of patch note entries
- [ ] Output reflects active voice and upbeat tone where applicable
- [ ] Skill notes that tone guidance was applied
---
### Case 4: Patch Note Template Exists — Used instead of generated structure
**Fixture:**
- `.claude/docs/templates/patch-notes-template.md` exists with a structured header format
- `docs/CHANGELOG.md` exists with player-facing entries
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill reads changelog and detects template exists
2. Skill populates the template with player-facing entries
3. Template header/footer structure is preserved in the output
4. Skill asks "May I write" and writes on approval
**Assertions:**
- [ ] Skill checks for a patch notes template before generating from scratch
- [ ] Template structure is used when found (not overridden by default format)
- [ ] Player-facing entries are inserted into the correct template section
- [ ] Output note confirms template was used
---
### Case 5: Gate Compliance — No gate; community-manager is separate
**Fixture:**
- `docs/CHANGELOG.md` exists with player-facing entries
- `review-mode.txt` contains `full`
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill compiles patch notes in full mode
2. No director gate is invoked (community review is a separate, manual step)
3. Skill runs on Haiku model — fast compilation
4. Skill notes in output: "Consider sharing draft with community manager before publishing"
5. Skill asks user for approval and writes on confirmation
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output suggests (but does not require) community manager review
- [ ] Skill proceeds directly from compilation to "May I write" prompt
- [ ] Verdict is COMPLETE
---
## Protocol Compliance
- [ ] Reads `docs/CHANGELOG.md` before generating patch notes
- [ ] Filters entries to player-facing items only
- [ ] Rewrites entries in plain language without internal IDs
- [ ] Always asks "May I write" before writing patch notes file
- [ ] No director gates are invoked
- [ ] Runs on Haiku model tier (fast, low-cost)
---
## Coverage Notes
- The case where all changelog entries are internal (zero player-facing items)
is not tested; behavior is an empty patch notes draft with a warning.
- Version number parsing from the changelog header is an implementation detail
not verified here.
- The community manager consultation noted in Case 5 is advisory; a separate
skill or manual review handles that step.

# Skill Test Spec: /retrospective
## Skill Summary
`/retrospective` generates a structured sprint or milestone retrospective
covering three categories: what went well, what didn't, and action items.
It reads sprint files and session logs to compile observations, then produces
a retrospective document. No director gates are used — retrospectives are
team self-reflection artifacts. The skill asks "May I write to
`production/retrospectives/retro-sprint-NNN.md`?" before persisting.
Verdict is always COMPLETE (retrospective is structured output, not a pass/fail
assessment).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language (skill writes retrospective document)
- [ ] Has a next-step handoff (what to do after retrospective is written)
---
## Director Gate Checks
None. Retrospectives are team self-reflection documents; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Sprint with mixed outcomes
**Fixture:**
- `production/sprints/sprint-005.md` exists with 6 stories (4 Complete, 1 Blocked, 1 Deferred)
- `production/session-logs/` contains log entries for the sprint period
- No prior retrospective exists for sprint-005
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill reads sprint-005 and session logs
2. Skill compiles three retrospective categories: went well (4 stories shipped),
didn't (1 blocked, 1 deferred), and action items (address blocker root cause)
3. Skill presents retrospective draft to user
4. Skill asks "May I write to `production/retrospectives/retro-sprint-005.md`?"
5. User approves; file is written; verdict COMPLETE
**Assertions:**
- [ ] Retrospective contains all three categories (went well / didn't / actions)
- [ ] Blocked and deferred stories appear in the "what didn't" section
- [ ] At least one action item is generated from the blocked story
- [ ] Skill asks "May I write" before writing file
- [ ] Verdict is COMPLETE after successful write
---
### Case 2: No Sprint Data — Manual input fallback
**Fixture:**
- User calls `/retrospective sprint-009`
- `production/sprints/sprint-009.md` does NOT exist
- No session logs reference sprint-009
**Input:** `/retrospective sprint-009`
**Expected behavior:**
1. Skill attempts to read sprint-009 — not found
2. Skill informs user that no sprint data was found for sprint-009
3. Skill prompts user to provide retrospective input manually (went well, didn't, actions)
4. User provides input; skill formats it into the retrospective structure
5. Skill asks "May I write" and writes the document on approval
**Assertions:**
- [ ] Skill does not crash or produce an empty document when sprint file is absent
- [ ] User is prompted to provide manual input
- [ ] Manual input is formatted into the three-category structure
- [ ] "May I write" prompt still appears before file write
---
### Case 3: Prior Retrospective Exists — Offer to append or replace
**Fixture:**
- `production/retrospectives/retro-sprint-005.md` already exists with content
- User re-runs `/retrospective sprint-005` after changes
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill detects that `retro-sprint-005.md` already exists
2. Skill presents user with choice: append new observations or replace existing file
3. User selects "replace"; skill compiles fresh retrospective
4. Skill asks "May I write to `production/retrospectives/retro-sprint-005.md`?" (confirming overwrite)
5. File is overwritten; verdict COMPLETE
**Assertions:**
- [ ] Skill checks for existing retrospective file before compiling
- [ ] User is offered append or replace choice — not silently overwritten
- [ ] "May I write" prompt reflects the overwrite scenario
- [ ] Verdict is COMPLETE after write regardless of append vs. replace
---
### Case 4: Edge Case — Unresolved action items from previous retrospective
**Fixture:**
- `production/retrospectives/retro-sprint-004.md` exists with 2 action items marked `[ ]` (not done)
- User runs `/retrospective sprint-005`
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill reads the most recent prior retrospective (retro-sprint-004)
2. Skill detects 2 unchecked action items from sprint-004
3. Skill includes a "Carry-over from Sprint 004" section in the new retrospective
4. The unresolved items are listed with a note that they were not followed up
**Assertions:**
- [ ] Skill reads the most recent prior retrospective to check for open action items
- [ ] Unresolved action items appear in the new retrospective under a carry-over section
- [ ] Carry-over items are distinct from newly generated action items
- [ ] Output notes that these items were not followed up in the previous sprint
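The carry-over detection asserted above can be sketched as a scan for unchecked task-list items in the prior retrospective. The assumption that action items use GitHub-style `- [ ]` checkboxes comes from this fixture, not from the skill's implementation:

```python
import re

def open_action_items(retro_text: str) -> list[str]:
    """Collect unchecked '- [ ]' action items from a prior retrospective.

    Assumption: action items are GitHub-style task-list checkboxes.
    """
    return re.findall(r"^- \[ \] (.+)$", retro_text, flags=re.MULTILINE)

# Hypothetical excerpt from retro-sprint-004.md with one done, two open items.
prior = """\
## Action Items
- [x] Split oversized physics story
- [ ] Document the save-system blocker root cause
- [ ] Add a smoke test for level transitions
"""
print(open_action_items(prior))
```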
---
### Case 5: Gate Compliance — No gate invoked in any mode
**Fixture:**
- `production/sprints/sprint-005.md` exists with complete stories
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill compiles retrospective in full mode
2. No director gate is invoked (retrospectives are team self-reflection, not delivery gates)
3. Skill asks user for approval and writes file on confirmation
4. Verdict is COMPLETE
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output does not contain any gate invocation or gate result notation
- [ ] Skill proceeds directly from compilation to "May I write" prompt
- [ ] Review mode file content is irrelevant to this skill's behavior
---
## Protocol Compliance
- [ ] Always shows retrospective draft before asking to write
- [ ] Always asks "May I write" before writing retrospective file
- [ ] No director gates are invoked
- [ ] Verdict is always COMPLETE (not a pass/fail skill)
- [ ] Checks prior retrospective for unresolved action items
---
## Coverage Notes
- Milestone retrospectives (as opposed to sprint retrospectives) follow the
same pattern but read milestone files instead of sprint files; not
separately tested here.
- The case where session logs are empty is similar to Case 2 (no data);
the skill falls back to manual input in both situations.

# Skill Test Spec: /sprint-plan
## Skill Summary
`/sprint-plan` reads the current milestone file and backlog stories, then
generates a new numbered sprint with stories prioritized by implementation layer
and priority score. In full mode the PR-SPRINT director gate runs after the
sprint draft is compiled (producer reviews the plan). In lean and solo modes
the gate is skipped. The skill asks "May I write to `production/sprints/sprint-NNN.md`?"
before persisting. Verdicts: COMPLETE (sprint generated and written) or
BLOCKED (cannot proceed due to missing data or gate failure).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" language (skill writes sprint file)
- [ ] Has a next-step handoff (what to do after sprint is written)
---
## Director Gate Checks
| Gate ID | Trigger condition | Mode guard |
|-----------|--------------------------|--------------------|
| PR-SPRINT | After sprint draft built | full only (not lean/solo) |
---
## Test Cases
### Case 1: Happy Path — Backlog with stories generates sprint
**Fixture:**
- `production/milestones/milestone-02.md` exists with capacity `10 story points`
- Backlog contains 5 unstarted stories across 2 epics, mixed priorities
- `production/session-state/review-mode.txt` contains `full`
- Next sprint number is `003` (sprints 001 and 002 already exist)
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads current milestone to obtain capacity and goals
2. Skill reads all unstarted stories from backlog; sorts by layer + priority
3. Skill drafts sprint-003 with stories fitting within capacity
4. Skill presents draft to user before invoking gate
5. Skill invokes PR-SPRINT gate (full mode); producer approves
6. Skill asks "May I write to `production/sprints/sprint-003.md`?"
7. User approves; file is written
**Assertions:**
- [ ] Stories are sorted by implementation layer before priority
- [ ] Sprint draft is shown before any write or gate invocation
- [ ] PR-SPRINT gate is invoked in full mode after draft is ready
- [ ] Skill asks "May I write" before writing the sprint file
- [ ] Written file path matches `production/sprints/sprint-003.md`
- [ ] Verdict is COMPLETE after successful write
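The layer-then-priority ordering and capacity fill asserted above can be sketched as a sort plus a greedy pass. The field names and the rule that lower layer numbers ship first are assumptions for illustration, not the sprint-plan subagent's actual algorithm:

```python
def plan_sprint(stories: list[dict], capacity: int) -> tuple[list[str], int]:
    """Order backlog stories and fill the sprint up to capacity.

    Assumption: lower layer number ships first; within a layer,
    higher priority score wins. Greedy fill skips stories that
    would exceed capacity.
    """
    ordered = sorted(stories, key=lambda s: (s["layer"], -s["priority"]))
    sprint, used = [], 0
    for story in ordered:
        if used + story["points"] <= capacity:
            sprint.append(story["id"])
            used += story["points"]
    return sprint, used

backlog = [
    {"id": "S-1", "layer": 2, "priority": 8, "points": 3},
    {"id": "S-2", "layer": 1, "priority": 5, "points": 4},
    {"id": "S-3", "layer": 1, "priority": 9, "points": 5},
    {"id": "S-4", "layer": 3, "priority": 7, "points": 2},
]
print(plan_sprint(backlog, capacity=10))  # → (['S-3', 'S-2'], 9)
```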
---
### Case 2: Blocked Path — Backlog is empty
**Fixture:**
- `production/milestones/milestone-02.md` exists
- No unstarted stories exist in any epic backlog
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads backlog — finds no unstarted stories
2. Skill outputs "No unstarted stories in backlog"
3. Skill suggests running `/create-stories` to populate the backlog
4. No gate is invoked; no file is written
**Assertions:**
- [ ] Verdict is BLOCKED
- [ ] Output contains "No unstarted stories" or equivalent message
- [ ] Output recommends `/create-stories`
- [ ] PR-SPRINT gate is NOT invoked
- [ ] No write tool is called
---
### Case 3: Gate returns CONCERNS — Sprint overloaded, revised before write
**Fixture:**
- Backlog has 8 stories totaling 16 points; milestone capacity is 10 points
- `review-mode.txt` contains `full`
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill drafts sprint with all 8 stories (over capacity)
2. PR-SPRINT gate runs; producer returns CONCERNS: sprint is overloaded
3. Skill presents concern to user and asks which stories to defer
4. User selects 3 stories to defer; sprint is revised to 5 stories / 10 points
5. Skill asks "May I write" with revised sprint; writes on approval
**Assertions:**
- [ ] CONCERNS from PR-SPRINT gate surfaces to user before any write
- [ ] Skill allows sprint to be revised after gate feedback
- [ ] Revised sprint (not original) is written to file
- [ ] Verdict is COMPLETE after revision and write
---
### Case 4: Lean Mode — PR-SPRINT gate skipped
**Fixture:**
- Backlog has 4 stories; milestone capacity is 8 points
- `review-mode.txt` contains `lean`
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads review mode — determines `lean`
2. Skill drafts sprint and presents it to user
3. PR-SPRINT gate is skipped; output notes "[PR-SPRINT] skipped — Lean mode"
4. Skill asks user for direct approval of the sprint
5. User approves; sprint file is written
**Assertions:**
- [ ] PR-SPRINT gate is NOT invoked in lean mode
- [ ] Skip is explicitly noted in output
- [ ] User approval is still required before write (gate skip ≠ approval skip)
- [ ] Verdict is COMPLETE after write
---
### Case 5: Edge Case — Previous sprint still has open stories
**Fixture:**
- `production/sprints/sprint-002.md` exists with 2 stories still `Status: In Progress`
- Backlog has 5 new unstarted stories
- `review-mode.txt` contains `full`
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads sprint-002 and detects 2 open (in-progress) stories
2. Skill flags: "Sprint 002 has 2 open stories — confirm carry-over before planning sprint 003"
3. Skill presents user with choice: carry stories over, defer them, or cancel
4. User confirms carry-over; carried stories are prepended to new sprint with `[CARRY]` tag
5. Sprint draft is built; PR-SPRINT gate runs; sprint is written on approval
**Assertions:**
- [ ] Skill checks the most recent sprint file for open stories
- [ ] User is asked to confirm carry-over before sprint planning continues
- [ ] Carried stories appear in the new sprint draft with a distinguishing label
- [ ] Skill does not silently ignore open stories from the previous sprint
---
## Protocol Compliance
- [ ] Shows draft sprint before invoking PR-SPRINT gate or asking to write
- [ ] Always asks "May I write" before writing sprint file
- [ ] PR-SPRINT gate only runs in full mode
- [ ] Skip message appears in lean and solo mode output
- [ ] Verdict is clearly stated at the end of the skill output
---
## Coverage Notes
- The case where no milestone file exists is not explicitly tested; behavior
follows the BLOCKED pattern with a suggestion to run `/gate-check` for
milestone progression.
- Solo mode behavior is equivalent to lean (gate skipped, user approval
required) and is not separately tested.
- Parallel story selection algorithms are not tested here; those are unit
concerns for the sprint-plan subagent.

# Skill Test Spec: /sprint-status
## Skill Summary
`/sprint-status` is a Haiku-tier read-only skill that reads the current active
sprint file and the session state to produce a concise sprint health summary.
It reports story counts by status (Complete / In Progress / Blocked / Not Started)
and emits one of three sprint-health verdicts: ON TRACK, AT RISK, or BLOCKED.
It never writes files and does not invoke any director gates. It is designed for
fast, low-cost status checks during a session.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered check sections
- [ ] Contains verdict keywords: ON TRACK, AT RISK, BLOCKED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do based on the verdict)
---
## Director Gate Checks
None. `/sprint-status` is a read-only reporting skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Mixed sprint, AT RISK with named blocker
**Fixture:**
- `production/sprints/sprint-004.md` exists (active sprint, linked in `active.md`)
- Sprint contains 6 stories:
- 3 with `Status: Complete`
- 2 with `Status: In Progress`
- 1 with `Status: Blocked` (blocker: "Waiting on physics ADR acceptance")
- Sprint end date is 2 days away
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads `production/session-state/active.md` to find active sprint reference
2. Skill reads `production/sprints/sprint-004.md`
3. Skill counts stories by status: 3 Complete, 2 In Progress, 1 Blocked
4. Skill detects a Blocked story and the approaching deadline
5. Skill outputs AT RISK verdict with the blocker named explicitly
**Assertions:**
- [ ] Output includes story count breakdown by status
- [ ] Output names the specific blocked story and its blocker reason
- [ ] Verdict is AT RISK (not BLOCKED, not ON TRACK) when any story is Blocked
- [ ] Skill does not write any files
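The count-by-status summary and verdict asserted above can be sketched as a counter plus a small rule set. The thresholds here are illustrative assumptions; in particular, reserving the BLOCKED verdict for a sprint where no story can proceed is a guess consistent with this case (one Blocked story yields AT RISK), not a documented rule:

```python
from collections import Counter

def sprint_health(statuses: list[str]) -> tuple[Counter, str]:
    """Summarize a sprint and emit a health verdict.

    Assumption: any Blocked story means AT RISK; BLOCKED is
    reserved for a sprint where every story is blocked.
    """
    counts = Counter(statuses)
    if counts and counts.get("Blocked", 0) == len(statuses):
        return counts, "BLOCKED"
    if counts.get("Blocked", 0) > 0:
        return counts, "AT RISK"
    return counts, "ON TRACK"

# The Case 1 fixture: 3 Complete, 2 In Progress, 1 Blocked.
counts, verdict = sprint_health(
    ["Complete"] * 3 + ["In Progress"] * 2 + ["Blocked"]
)
print(verdict)  # → AT RISK
```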
---
### Case 2: All Stories Complete — Sprint COMPLETE verdict
**Fixture:**
- `production/sprints/sprint-004.md` exists
- All 5 stories have `Status: Complete`
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads sprint file — all stories are Complete
2. Skill outputs ON TRACK verdict or SPRINT COMPLETE label
3. Skill suggests running `/milestone-review` or `/sprint-plan` as next steps
**Assertions:**
- [ ] Verdict is ON TRACK or SPRINT COMPLETE when all stories are Complete
- [ ] Output notes that the sprint is fully done
- [ ] Next-step suggestion references `/milestone-review` or `/sprint-plan`
- [ ] No files are written
---
### Case 3: No Active Sprint File — Guidance to run /sprint-plan
**Fixture:**
- `production/session-state/active.md` does not reference an active sprint
- `production/sprints/` directory is empty or absent
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads `active.md` — finds no active sprint reference
2. Skill checks `production/sprints/` — finds no files
3. Skill outputs an informational message: no active sprint detected
4. Skill suggests running `/sprint-plan` to create one
**Assertions:**
- [ ] Skill does not error or crash when no sprint file exists
- [ ] Output clearly states no active sprint was found
- [ ] Output recommends `/sprint-plan` as the next action
- [ ] No verdict keyword is emitted (no sprint to assess)
---
### Case 4: Edge Case — Stale In Progress Story (flagged)
**Fixture:**
- `production/sprints/sprint-004.md` exists
- One story has `Status: In Progress` with a note in `active.md`:
`Last updated: 2026-03-30` (more than 2 days before today's session date)
- No stories are Blocked
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads sprint file and session state
2. Skill detects the story has been In Progress for >2 days without update
3. Skill flags the story as "stale" in the output
4. Verdict is AT RISK (stale in-progress stories indicate a hidden blocker)
**Assertions:**
- [ ] Skill compares story "last updated" metadata against session date
- [ ] Stale In Progress story is flagged by name in the output
- [ ] Verdict is AT RISK, not ON TRACK, when a stale story is detected
- [ ] Output does not conflate "stale" with "Blocked" — the label is distinct
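The staleness check asserted above reduces to a date comparison against a threshold. The 2-day cutoff is taken from this test case and may differ in the skill itself:

```python
from datetime import date

STALE_AFTER_DAYS = 2  # threshold taken from this case; an assumption

def is_stale(last_updated: date, today: date) -> bool:
    """Flag an In Progress story with no update past the threshold."""
    return (today - last_updated).days > STALE_AFTER_DAYS

print(is_stale(date(2026, 3, 30), date(2026, 4, 2)))  # → True (3 days old)
print(is_stale(date(2026, 3, 30), date(2026, 4, 1)))  # → False (2 days old)
```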
---
### Case 5: Gate Compliance — Read-only; no gate invocation
**Fixture:**
- `production/sprints/sprint-004.md` exists with 4 stories (2 Complete, 2 In Progress)
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads sprint and produces status summary
2. Skill does NOT invoke any director gate regardless of review mode
3. Output is a plain status report with ON TRACK, AT RISK, or BLOCKED verdict
4. Skill does not prompt for user approval or ask to write any file
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Output does not contain any "May I write" prompt
- [ ] Skill completes and returns a verdict without user interaction
- [ ] Review mode file is ignored (or confirmed irrelevant) by this skill
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents story count breakdown before emitting verdict
- [ ] Does not ask for approval
- [ ] Ends with a recommended next step based on verdict
- [ ] Runs on Haiku model tier (fast, low-cost)
---
## Coverage Notes
- The case where multiple sprints are active simultaneously is not tested;
the skill reads whichever sprint `active.md` references.
- Partial sprint completion percentages are not explicitly verified; the
count-by-status output implies them.
- The `solo` mode review-mode variant is not separately tested; gate
behavior in Case 5 applies to all modes equally.

# Skill Test Spec: /team-audio
## Skill Summary
Orchestrates the audio team through a four-step pipeline: audio direction
(audio-director) → sound design + accessibility review in parallel (sound-designer
+ accessibility-specialist) → technical implementation + engine validation in
parallel (technical-artist + primary engine specialist) → code integration
(gameplay-programmer). Reads relevant GDDs, the sound bible (if present), and
existing audio asset lists before spawning agents. Compiles all outputs into an
audio design document saved to `design/gdd/audio-[feature].md`. Uses
`AskUserQuestion` at each step transition. Verdict is COMPLETE when the audio
design document is produced. Skips the engine specialist spawn gracefully when no
engine is configured.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 step/phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/dev-story`, `/asset-audit`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at step transitions before proceeding
- [ ] Step 2 explicitly spawns sound-designer and accessibility-specialist in parallel
- [ ] Step 3 explicitly spawns technical-artist and engine specialist in parallel (when engine is configured)
- [ ] Skill reads `design/gdd/sound-bible.md` during context gathering if it exists
- [ ] Output document is saved to `design/gdd/audio-[feature].md`
---
## Test Cases
### Case 1: Happy Path — All steps complete, audio design document saved
**Fixture:**
- GDD for the target feature exists at `design/gdd/combat.md`
- Sound bible exists at `design/gdd/sound-bible.md`
- Existing audio assets are listed in `assets/audio/`
- Engine is configured in `.claude/docs/technical-preferences.md`
- No accessibility gaps exist in the planned audio event list
**Input:** `/team-audio combat`
**Expected behavior:**
1. Context gathering: orchestrator reads `design/gdd/combat.md`, `design/gdd/sound-bible.md`, and `assets/audio/` asset list before spawning any agent
2. Step 1: audio-director is spawned; defines sonic identity, emotional tone, adaptive music direction, mix targets, and adaptive audio rules for combat
3. `AskUserQuestion` presents audio direction; user approves before Step 2 begins
4. Step 2: sound-designer and accessibility-specialist are spawned in parallel; sound-designer produces SFX specifications, audio event list with trigger conditions, and mixing groups; accessibility-specialist identifies critical gameplay audio events and specifies visual fallback and subtitle requirements
5. `AskUserQuestion` presents SFX spec and accessibility requirements; user approves before Step 3 begins
6. Step 3: technical-artist and primary engine specialist are spawned in parallel; technical-artist designs bus structure, middleware integration, memory budgets, and streaming strategy; engine specialist validates that the integration approach is idiomatic for the configured engine
7. `AskUserQuestion` presents technical plan; user approves before Step 4 begins
8. Step 4: gameplay-programmer is spawned; wires up audio events to gameplay triggers, implements adaptive music, sets up occlusion zones, writes unit tests for audio event triggers
9. Orchestrator compiles all outputs into a single audio design document
10. Subagent asks "May I write the audio design document to `design/gdd/audio-combat.md`?" before writing
11. Summary output lists: audio event count, estimated asset count, implementation tasks, and any open questions
12. Verdict: COMPLETE
**Assertions:**
- [ ] Sound bible is read during context gathering (before Step 1) when it exists
- [ ] audio-director is spawned before sound-designer or accessibility-specialist
- [ ] `AskUserQuestion` appears after Step 1 output and before Step 2 launch
- [ ] sound-designer and accessibility-specialist Task calls are issued simultaneously in Step 2
- [ ] technical-artist and engine specialist Task calls are issued simultaneously in Step 3
- [ ] gameplay-programmer is not launched until Step 3 `AskUserQuestion` is approved
- [ ] Audio design document is written to `design/gdd/audio-combat.md` (not another path)
- [ ] Summary includes audio event count and estimated asset count
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is COMPLETE after document delivery
---
### Case 2: Accessibility Gap — Critical gameplay audio event has no visual fallback
**Fixture:**
- GDD for the target feature exists
- Step 1 and Step 2 are in progress
- sound-designer's audio event list includes "EnemyNearbyAlert" — a spatial audio cue that warns the player an enemy is approaching from off-screen
- accessibility-specialist reviews the event list and finds "EnemyNearbyAlert" has no visual fallback (no on-screen indicator, no subtitle, no controller rumble specified)
**Input:** `/team-audio stealth` (Step 2 scenario)
**Expected behavior:**
1. Steps 1–2 proceed; accessibility-specialist and sound-designer are spawned in parallel
2. accessibility-specialist returns its review with a BLOCKING concern: "`EnemyNearbyAlert` is a critical gameplay audio event (warns player of off-screen threat) with no visual fallback — hearing-impaired players cannot detect this threat. This is a BLOCKING accessibility gap."
3. Orchestrator surfaces the concern immediately in conversation before presenting `AskUserQuestion`
4. `AskUserQuestion` presents the accessibility concern as a BLOCKING issue with options:
- Add a visual indicator for EnemyNearbyAlert (e.g., directional arrow on HUD) and continue
- Add controller haptic feedback as the fallback and continue
- Stop here and resolve all accessibility gaps before proceeding to Step 3
5. Step 3 (technical-artist + engine specialist) is not launched until the user resolves or explicitly accepts the gap
6. The accessibility gap is included in the final audio design document under "Open Accessibility Issues" if unresolved
**Assertions:**
- [ ] Accessibility gap is labeled BLOCKING (not advisory) in the report
- [ ] The specific event name ("EnemyNearbyAlert") and the nature of the gap are stated
- [ ] `AskUserQuestion` surfaces the gap before Step 3 is launched
- [ ] At least one resolution option is offered (add visual fallback, add haptic fallback)
- [ ] Step 3 is not launched while the gap is unresolved without explicit user authorization
- [ ] If the gap is carried forward unresolved, it is documented in the audio design doc as an open issue
---
### Case 3: No Argument — Usage guidance or design doc inference
**Fixture:**
- Any project state
**Input:** `/team-audio` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-audio [feature or area]` — specify the feature or area to design audio for (e.g., `combat`, `main menu`, `forest biome`, `boss encounter`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with argument examples
- [ ] Skill does NOT attempt to infer a feature from existing design docs without user direction
- [ ] No `AskUserQuestion` is used — output is direct guidance
---
### Case 4: Missing Sound Bible — Skill notes the gap and proceeds without it
**Fixture:**
- GDD for the target feature exists at `design/gdd/main-menu.md`
- `design/gdd/sound-bible.md` does NOT exist
- Engine is configured; other context files are present
**Input:** `/team-audio main menu`
**Expected behavior:**
1. Context gathering: orchestrator reads `design/gdd/main-menu.md` and checks for `design/gdd/sound-bible.md`
2. Sound bible is not found; orchestrator notes the gap in conversation: "Note: `design/gdd/sound-bible.md` not found — audio direction will proceed without a project-wide sonic identity reference. Consider creating a sound bible if this is an ongoing project."
3. Pipeline proceeds normally through all four steps without the sound bible as input
4. audio-director in Step 1 is informed that no sound bible exists and must establish sonic identity from the feature GDD alone
5. The missing sound bible is mentioned in the final summary as a recommended next step
**Assertions:**
- [ ] Orchestrator checks for the sound bible during context gathering (before Step 1)
- [ ] Missing sound bible is noted explicitly in conversation — not silently ignored
- [ ] Pipeline does NOT halt due to the missing sound bible
- [ ] audio-director is notified that no sound bible exists in its prompt context
- [ ] Summary or Next Steps section recommends creating a sound bible
- [ ] Verdict is still COMPLETE if all other steps succeed
---
### Case 5: Engine Not Configured — Engine specialist step skipped gracefully
**Fixture:**
- Engine is NOT configured in `.claude/docs/technical-preferences.md` (shows `[TO BE CONFIGURED]`)
- GDD for the target feature exists
- Sound bible may or may not exist
**Input:** `/team-audio boss encounter`
**Expected behavior:**
1. Context gathering: orchestrator reads `.claude/docs/technical-preferences.md` and detects no engine is configured
2. Steps 1–2 proceed normally (audio-director, sound-designer, accessibility-specialist)
3. Step 3: technical-artist is spawned normally; engine specialist spawn is SKIPPED
4. Orchestrator notes in conversation: "Engine specialist not spawned — no engine configured in technical-preferences.md. Engine integration validation will be deferred until an engine is selected."
5. Step 4: gameplay-programmer proceeds with a note that engine-specific audio integration patterns could not be validated
6. The engine specialist gap is included in the audio design document under "Deferred Validation"
7. Verdict: COMPLETE (skip is graceful, not a blocker)
**Assertions:**
- [ ] Engine specialist is NOT spawned when no engine is configured
- [ ] Skill does NOT error out due to the missing engine configuration
- [ ] The skip is explicitly noted in conversation — not silently omitted
- [ ] technical-artist is still spawned in Step 3 (skip applies only to the engine specialist)
- [ ] gameplay-programmer proceeds in Step 4 with the deferred validation noted
- [ ] Deferred engine validation is recorded in the audio design document
- [ ] Verdict is COMPLETE (engine not configured is a known graceful case)
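The unconfigured-engine detection in this case reduces to a marker check on `technical-preferences.md`. A minimal sketch, assuming the `[TO BE CONFIGURED]` placeholder is the only signal (the real skill may parse the Engine Specialists section more carefully):

```python
def engine_configured(preferences_text: str) -> bool:
    """Return False when technical-preferences.md is empty or still holds
    the [TO BE CONFIGURED] placeholder."""
    text = preferences_text.strip()
    if not text:
        return False
    return "[TO BE CONFIGURED]" not in text

# The Step 3 engine specialist spawn is gated on this check:
# skipped (with a conversational note) when it returns False.
```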
---
## Protocol Compliance
- [ ] Context gathering (GDDs, sound bible, asset list) runs before any agent is spawned
- [ ] `AskUserQuestion` is used after every step output before the next step launches
- [ ] Parallel spawning: Step 2 (sound-designer + accessibility-specialist) and Step 3 (technical-artist + engine specialist) issue all Task calls before waiting for results
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Audio design document path follows the pattern `design/gdd/audio-[feature].md`
- [ ] Verdict is exactly COMPLETE or BLOCKED — no other verdict values used
- [ ] Next Steps handoff references `/dev-story` and `/asset-audit`
---
## Coverage Notes
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Step 4 (gameplay-programmer) happy-path behavior is validated implicitly by Case 1.
Failure modes for this step follow the standard Error Recovery Protocol.
- The accessibility-specialist's subtitle and caption requirements (beyond visual fallbacks)
are validated implicitly by Case 1. Case 2 focuses on the more severe case where a
critical gameplay event has no fallback at all.
- Engine specialist validation logic (idiomatic integration, version-specific changes) is
tested only for the configured and unconfigured states. The specific content of the
engine specialist's output is out of scope for this behavioral spec.
# Skill Test Spec: /team-combat
## Skill Summary
Orchestrates the full combat team pipeline end-to-end for a single combat feature.
Coordinates game-designer, gameplay-programmer, ai-programmer, technical-artist,
sound-designer, the primary engine specialist, and qa-tester through six structured
phases: Design → Architecture (with engine specialist validation) → Implementation
(parallel) → Integration → Validation → Sign-off. Uses `AskUserQuestion` at each
phase transition. Delegates all file writes to sub-agents. Produces a summary report
with verdict COMPLETE / NEEDS WORK / BLOCKED and handoffs to `/code-review`,
`/balance-check`, and `/team-polish`.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (Phase 1 through Phase 6 are all present)
- [ ] Contains verdict keywords: COMPLETE, NEEDS WORK, BLOCKED
- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents, orchestrator does not write files directly
- [ ] Has a next-step handoff at the end (references `/code-review`, `/balance-check`, `/team-polish`)
- [ ] Error Recovery Protocol section is present with all four recovery steps
- [ ] Uses `AskUserQuestion` at phase transitions for user approval before proceeding
- [ ] Phase 3 is explicitly marked as parallel (gameplay-programmer, ai-programmer, technical-artist, sound-designer)
- [ ] Phase 2 includes spawning the primary engine specialist (read from `.claude/docs/technical-preferences.md`)
- [ ] Team Composition lists all seven roles (game-designer, gameplay-programmer, ai-programmer, technical-artist, sound-designer, engine specialist, qa-tester)
---
## Test Cases
### Case 1: Happy Path — All agents succeed, full pipeline runs to completion
**Fixture:**
- `design/gdd/game-concept.md` exists and is populated
- Engine is configured in `.claude/docs/technical-preferences.md` (Engine Specialists section filled)
- No existing GDD for the requested combat feature
**Input:** `/team-combat parry and riposte system`
**Expected behavior:**
1. Phase 1 — game-designer spawned; produces `design/gdd/parry-riposte.md` covering all 8 required sections (overview, player fantasy, rules, formulas, edge cases, dependencies, tuning knobs, acceptance criteria); asks user to approve design doc
2. Phase 2 — gameplay-programmer + ai-programmer spawned; produce architecture sketch with class structure, interfaces, and file list; then primary engine specialist is spawned to validate idioms; engine specialist output incorporated; `AskUserQuestion` presented with architecture options before Phase 3 begins
3. Phase 3 — gameplay-programmer, ai-programmer, technical-artist, sound-designer spawned in parallel; all four return outputs before Phase 4 begins
4. Phase 4 — integration wires together all Phase 3 outputs; tuning knobs verified as data-driven; `AskUserQuestion` confirms integration before Phase 5
5. Phase 5 — qa-tester spawned; writes test cases from acceptance criteria; verifies edge cases; performance impact checked against budget
6. Phase 6 — summary report produced: design COMPLETE, all team members COMPLETE, test cases listed, verdict: COMPLETE
7. Next steps listed: `/code-review`, `/balance-check`, `/team-polish`
**Assertions:**
- [ ] `AskUserQuestion` called at each phase gate (at minimum before Phase 3 and before Phase 5)
- [ ] Phase 3 agents launched simultaneously — no sequential dependency between gameplay-programmer, ai-programmer, technical-artist, sound-designer
- [ ] Engine specialist runs in Phase 2 before Phase 3 begins (output incorporated into architecture)
- [ ] All file writes delegated to sub-agents (orchestrator never calls Write/Edit directly)
- [ ] Verdict COMPLETE present in final report
- [ ] Next steps include `/code-review`, `/balance-check`, `/team-polish`
- [ ] Design doc covers all 8 required GDD sections
---
### Case 2: Blocked Agent — One subagent returns BLOCKED mid-pipeline
**Fixture:**
- `design/gdd/parry-riposte.md` exists (Phase 1 already complete)
- ai-programmer agent returns BLOCKED because no AI system architecture ADR exists (ADR status is Proposed)
**Input:** `/team-combat parry and riposte system`
**Expected behavior:**
1. Phase 1 — design doc found; game-designer confirms it is valid; phase approved
2. Phase 2 — gameplay-programmer completes architecture sketch; ai-programmer returns BLOCKED: "ADR for AI behavior system is Proposed — cannot implement until ADR is Accepted"
3. Error Recovery Protocol triggered: "ai-programmer: BLOCKED — AI behavior ADR is Proposed"
4. `AskUserQuestion` presented with options: (a) Skip ai-programmer and note the gap; (b) Retry with narrower scope; (c) Stop here and run `/architecture-decision` first
5. If user chooses (a): Phase 3 proceeds with gameplay-programmer, technical-artist, sound-designer only; ai-programmer gap noted in partial report
6. Final report produced: partial implementation documented, ai-programmer section marked BLOCKED, overall verdict: BLOCKED
**Assertions:**
- [ ] BLOCKED surface message appears before any dependent phase continues
- [ ] `AskUserQuestion` offers at minimum three options: skip / retry / stop
- [ ] Partial report produced — completed agents' work is not discarded
- [ ] Overall verdict is BLOCKED (not COMPLETE) when any agent is unresolved
- [ ] Blocked reason references the ADR and suggests `/architecture-decision`
- [ ] Orchestrator does not silently proceed past the blocked dependency
---
### Case 3: No Argument — Clear usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-combat` (no argument)
**Expected behavior:**
1. Skill detects no argument provided
2. Outputs usage message explaining the required argument (combat feature description)
3. Provides an example invocation: `/team-combat [combat feature description]`
4. Skill exits without spawning any subagents
**Assertions:**
- [ ] Skill does NOT spawn any subagents when no argument is given
- [ ] Usage message includes the argument-hint format from frontmatter
- [ ] Error message includes at least one example of a valid invocation
- [ ] No file reads beyond what is needed to detect the missing argument
- [ ] Verdict is NOT shown (pipeline never runs)
---
### Case 4: Parallel Phase Validation — Phase 3 agents run simultaneously
**Fixture:**
- `design/gdd/parry-riposte.md` exists and is complete
- Architecture sketch has been approved
- Engine specialist has validated architecture
**Input:** `/team-combat parry and riposte system` (resuming from Phase 2 complete)
**Expected behavior:**
1. Phase 3 begins after architecture approval
2. All four Task calls — gameplay-programmer, ai-programmer, technical-artist, sound-designer — are issued before any result is awaited
3. Skill waits for all four agents to complete before proceeding to Phase 4
4. If any single agent completes early, skill does not begin Phase 4 until all four have returned
**Assertions:**
- [ ] Four Task calls issued in a single batch (no sequential waiting between them)
- [ ] Phase 4 does not begin until all four Phase 3 agents have returned results
- [ ] Skill does not pass one Phase 3 agent's output as input to another Phase 3 agent (they are independent)
- [ ] All four Phase 3 agent results referenced in the Phase 4 integration step
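The batch-then-await requirement asserted above is a standard fan-out pattern: issue every call first, then wait on all of them together. A sketch of the pattern in Python's asyncio — the agent names come from this spec, but the spawn mechanics are stand-ins, not the real Task tool API:

```python
import asyncio

async def run_agent(name: str) -> str:
    # Stand-in for a real Task spawn; each agent works independently.
    await asyncio.sleep(0)
    return f"{name}: COMPLETE"

async def phase_3() -> list[str]:
    agents = ["gameplay-programmer", "ai-programmer",
              "technical-artist", "sound-designer"]
    # All four calls are issued before any result is awaited...
    tasks = [asyncio.create_task(run_agent(a)) for a in agents]
    # ...and Phase 4 cannot begin until every one has returned.
    return await asyncio.gather(*tasks)

results = asyncio.run(phase_3())
```

`asyncio.gather` returns results in task order, so an agent finishing early never lets Phase 4 start ahead of the others — which is exactly the assertion this case checks.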
---
### Case 5: Architecture Phase Engine Routing — Engine specialist receives correct context
**Fixture:**
- `.claude/docs/technical-preferences.md` has Engine Specialists section populated (e.g., Primary: godot-specialist)
- Architecture sketch produced by gameplay-programmer is available
- Engine version pinned in `docs/engine-reference/godot/VERSION.md`
**Input:** `/team-combat parry and riposte system`
**Expected behavior:**
1. Phase 2 — gameplay-programmer produces architecture sketch
2. Skill reads `.claude/docs/technical-preferences.md` Engine Specialists section to identify the primary engine specialist agent type
3. Engine specialist is spawned with: the architecture sketch, the GDD path, the engine version from `VERSION.md`, and explicit instructions to check for deprecated APIs
4. Engine specialist output (idiom notes, deprecated API warnings, native system recommendations) is returned to orchestrator
5. Orchestrator incorporates engine notes into the architecture before presenting Phase 2 results to user
6. `AskUserQuestion` includes engine specialist's notes alongside the architecture sketch
**Assertions:**
- [ ] Engine specialist agent type is read from `.claude/docs/technical-preferences.md` — not hardcoded
- [ ] Engine specialist prompt includes the architecture sketch and GDD path
- [ ] Engine specialist checks for deprecated APIs against the pinned engine version
- [ ] Engine specialist output is incorporated before Phase 3 begins (not skipped or appended separately)
- [ ] If no engine is configured, engine specialist step is skipped and a note is added to the report
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each phase transition — user approves before pipeline advances
- [ ] All file writes delegated to sub-agents via Task — orchestrator does not call Write or Edit directly
- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
- [ ] Phase 3 agents launched in parallel per skill spec
- [ ] Partial report always produced even when agents are BLOCKED
- [ ] Verdict is one of COMPLETE / NEEDS WORK / BLOCKED
- [ ] Next steps present at end of output: `/code-review`, `/balance-check`, `/team-polish`
---
## Coverage Notes
- The NEEDS WORK verdict path (qa-tester finds failures in Phase 5) is not separately tested
here; it follows the same error recovery and partial report protocol as Case 2.
- "Retry with narrower scope" error recovery option is listed in assertions but its full
recursive behavior (splitting via `/create-stories`) is covered by the `/create-stories` spec.
- Phase 4 integration logic (wiring gameplay, AI, VFX, audio) is validated implicitly by
the Happy Path case; a dedicated integration test would require fixture code files.
- Engine specialist unavailable (no engine configured) is partially covered in Case 5
assertions — a dedicated fixture for unconfigured engine state would strengthen coverage.

# Skill Test Spec: /team-level
## Skill Summary
Orchestrates the full level design team for a single level or area. Coordinates
narrative-director, world-builder, level-designer, systems-designer, art-director,
accessibility-specialist, and qa-tester through five sequential steps with one
parallel phase (Step 4). Compiles all team outputs into a single level design
document saved to `design/levels/[level-name].md`. Uses `AskUserQuestion` at each
step transition. Delegates all file writes to sub-agents. Produces a summary report
with verdict COMPLETE / BLOCKED and handoffs to `/design-review`, `/dev-story`,
`/qa-plan`.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase/step headings (Step 1 through Step 5 are all present)
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents, orchestrator does not write files directly
- [ ] Has a next-step handoff at the end (references `/design-review`, `/dev-story`, `/qa-plan`)
- [ ] Error Recovery Protocol section is present with all four recovery steps
- [ ] Uses `AskUserQuestion` at step transitions for user approval before proceeding
- [ ] Step 4 is explicitly marked as parallel (art-director and accessibility-specialist run simultaneously)
- [ ] Context gathering reads: `design/gdd/game-concept.md`, `design/gdd/game-pillars.md`, `design/levels/`, `design/narrative/`, and relevant world-building docs
- [ ] Team Composition lists all seven roles (narrative-director, world-builder, level-designer, systems-designer, art-director, accessibility-specialist, qa-tester)
- [ ] accessibility-specialist output includes severity ratings (BLOCKING / RECOMMENDED / NICE TO HAVE)
- [ ] Final level design document saved to `design/levels/[level-name].md`
---
## Test Cases
### Case 1: Happy Path — All team members produce outputs, document compiled and saved
**Fixture:**
- `design/gdd/game-concept.md` exists and is populated
- `design/gdd/game-pillars.md` exists
- `design/levels/` directory exists (may contain other level docs)
- `design/narrative/` directory exists with relevant narrative docs
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Context gathering — orchestrator reads game-concept.md, game-pillars.md, existing level docs in `design/levels/`, narrative docs in `design/narrative/`, and world-building docs for the forest region
2. Step 1 — narrative-director spawned: defines narrative purpose, key characters, dialogue triggers, emotional arc; world-builder spawned: provides lore context, environmental storytelling opportunities, world rules; `AskUserQuestion` confirms Step 1 outputs before Step 2
3. Step 2 — level-designer spawned: designs spatial layout (critical path, optional paths, secrets), pacing curve, encounters, puzzles, entry/exit points and connections to adjacent areas; `AskUserQuestion` confirms layout before Step 3
4. Step 3 — systems-designer spawned: specifies enemy compositions, loot tables, difficulty balance, area-specific mechanics, resource distribution; `AskUserQuestion` confirms systems before Step 4
5. Step 4 — art-director and accessibility-specialist spawned in parallel; art-director: visual theme, color palette, lighting, asset list, VFX needs; accessibility-specialist: navigation clarity, colorblind safety, cognitive load check — each concern rated BLOCKING / RECOMMENDED / NICE TO HAVE; `AskUserQuestion` presents both outputs before Step 5
6. Step 5 — qa-tester spawned: test cases for critical path, boundary/edge cases (sequence breaks, softlocks), playtest checklist, acceptance criteria
7. Orchestrator compiles all team outputs into level design document format; sub-agent asked "May I write to `design/levels/forest-dungeon.md`?"; file saved
8. Summary report: area overview, encounter count, estimated asset list, narrative beats, cross-team dependencies, verdict: COMPLETE
9. Next steps listed: `/design-review design/levels/forest-dungeon.md`, `/dev-story`, `/qa-plan`
**Assertions:**
- [ ] All five sources read during context gathering before any agent is spawned
- [ ] narrative-director and world-builder both spawned in Step 1 (may be sequential or parallel — both must complete before Step 2)
- [ ] `AskUserQuestion` called at each step gate (minimum: after Step 1, Step 2, Step 3, Step 4)
- [ ] Step 4 agents (art-director, accessibility-specialist) launched simultaneously
- [ ] All file writes delegated to sub-agents — orchestrator does not write directly
- [ ] Level doc saved to `design/levels/forest-dungeon.md` (slugified from argument)
- [ ] Verdict COMPLETE in final summary report
- [ ] Next steps include `/design-review`, `/dev-story`, `/qa-plan`
- [ ] Summary report includes: area overview, encounter count, estimated asset list, narrative beats
---
### Case 2: Blocked Agent (world-builder) — Partial report produced with gap noted
**Fixture:**
- `design/gdd/game-concept.md` exists
- World-building docs for the forest region do NOT exist
- world-builder agent returns BLOCKED: "No world-building docs found for the forest region — cannot provide lore context"
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Context gathering completes; missing world-building docs noted
2. Step 1 — narrative-director completes successfully; world-builder spawned and returns BLOCKED
3. Error Recovery Protocol triggered: "world-builder: BLOCKED — no world-building docs for forest region"
4. `AskUserQuestion` presented with options:
- (a) Skip world-builder and note the lore gap in the level doc
- (b) Retry with narrower scope (world-builder focuses only on what can be inferred from game-concept.md)
- (c) Stop here and create world-building docs first
5. If user chooses (a): pipeline continues with Steps 2–5 using narrative-director context only; level doc compiled with a clearly marked gap section: "World-building context: NOT PROVIDED — see open dependency"
6. Final report produced: partial outputs documented, world-builder section marked BLOCKED, overall verdict: BLOCKED
**Assertions:**
- [ ] BLOCKED surface message appears immediately when world-builder fails — before Step 2 begins without user input
- [ ] `AskUserQuestion` offers at minimum three options (skip / retry / stop)
- [ ] Partial report produced — narrative-director's completed work is not discarded
- [ ] Level doc (if compiled) contains an explicit gap notation for the missing world-building context
- [ ] Overall verdict is BLOCKED (not COMPLETE) when world-builder remains unresolved
- [ ] Skill does NOT silently fabricate lore content to fill the gap
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-level` (no argument)
**Expected behavior:**
1. Skill detects no argument provided
2. Outputs usage message explaining the required argument (level name or area to design)
3. Provides example invocations: `/team-level tutorial`, `/team-level forest dungeon`, `/team-level final boss arena`
4. Skill exits without reading any project files or spawning any subagents
**Assertions:**
- [ ] Skill does NOT spawn any subagents when no argument is given
- [ ] Usage message includes the argument-hint format from frontmatter
- [ ] At least one example of a valid invocation is shown
- [ ] No GDD or level files read before failing
- [ ] Verdict is NOT shown (pipeline never starts)
---
### Case 4: Accessibility Review Gate — Blocking concern surfaces before sign-off
**Fixture:**
- Steps 1–3 complete successfully
- `design/accessibility-requirements.md` committed tier: Enhanced
- accessibility-specialist (Step 4, parallel) flags a BLOCKING concern: the critical path through the forest dungeon requires players to distinguish between two environmental hazards (toxic pools vs. shallow water) using color alone — no shape, icon, or audio cue differentiates them
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Steps 1–3 complete; Step 4 parallel phase begins
2. accessibility-specialist returns: BLOCKING concern — "Critical path hazard distinction relies on color only (toxic pools vs. shallow water). Shape, icon, or audio cue required per Enhanced accessibility tier."
3. art-director returns Step 4 output (complete)
4. Skill presents both Step 4 results via `AskUserQuestion` — BLOCKING concern highlighted prominently
5. `AskUserQuestion` offers:
- (a) Return to level-designer + art-director to redesign hazard visual/audio language before Step 5
- (b) Document as a known accessibility gap and proceed to Step 5 with the concern logged
6. Skill does NOT silently proceed past the BLOCKING concern
7. If user chooses (a): level-designer and art-director revision spawned; re-run Step 4 accessibility check
8. Final report includes BLOCKING concern and its resolution status regardless of user choice
**Assertions:**
- [ ] BLOCKING accessibility concern is not treated as advisory — it is surfaced as a blocker
- [ ] `AskUserQuestion` presents the specific concern text (not just "accessibility issue found")
- [ ] Step 5 (qa-tester) does NOT begin without user acknowledging the BLOCKING concern
- [ ] Revision path offered: level-designer + art-director can be sent back before proceeding
- [ ] Final report includes the accessibility concern and its resolution status
- [ ] art-director's completed output is NOT discarded when accessibility-specialist blocks
---
### Case 5: Circular Level Reference — Adjacent area dependency flagged
**Fixture:**
- Steps 1–3 in progress
- level-designer (Step 2) produces a layout that specifies entry/exit points connecting to "the crystal caves" (an adjacent area)
- `design/levels/crystal-caves.md` does NOT exist — the crystal caves area has not been designed yet
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Step 2 — level-designer produces layout including: "West exit connects to crystal-caves entry point A"
2. Orchestrator (or level-designer subagent) checks `design/levels/` for `crystal-caves.md`; file not found
3. Dependency gap surfaced: "Level references crystal-caves as an adjacent area but `design/levels/crystal-caves.md` does not exist"
4. `AskUserQuestion` presented with options:
- (a) Proceed with a placeholder reference — note the dependency in the level doc as UNRESOLVED
- (b) Pause and run `/team-level crystal caves` first to establish that area
5. Skill does NOT invent crystal caves content to satisfy the reference
6. If user chooses (a): level doc compiled with the west exit marked "→ crystal-caves (UNRESOLVED — area not yet designed)"; flagged in the open dependencies section of the summary report
7. Final report includes open cross-level dependencies section
**Assertions:**
- [ ] Skill detects the missing adjacent area by checking `design/levels/` — does not assume it will be created later
- [ ] Skill does NOT fabricate crystal caves content (lore, layout, connections) to resolve the reference
- [ ] `AskUserQuestion` offers a "design crystal caves first" option referencing `/team-level`
- [ ] If user proceeds with placeholder, level doc explicitly marks the west exit as UNRESOLVED
- [ ] Summary report includes an open cross-level dependencies section listing unresolved references
- [ ] Circular or forward references do not cause the skill to loop or crash
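The dependency check in this case amounts to comparing the adjacent areas a layout references against the docs present in `design/levels/`. A minimal sketch, under the assumption that references have already been extracted as slugs (the real skill would parse them out of the layout text):

```python
def unresolved_references(referenced: list[str],
                          existing_docs: list[str]) -> list[str]:
    """Return referenced area slugs with no matching design/levels/<slug>.md."""
    designed = {doc.removesuffix(".md") for doc in existing_docs}
    return [area for area in referenced if area not in designed]

# Case 5 fixture: forest-dungeon references crystal-caves, which is not yet designed.
gaps = unresolved_references(["crystal-caves"], ["forest-dungeon.md"])
# Each gap is surfaced via AskUserQuestion and, if carried forward,
# marked UNRESOLVED in the level doc's open dependencies section.
```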
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each step transition — user approves before pipeline advances
- [ ] All file writes delegated to sub-agents via Task — orchestrator does not call Write or Edit directly
- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
- [ ] Step 4 agents (art-director, accessibility-specialist) launched in parallel per skill spec
- [ ] Partial report always produced even when agents are BLOCKED
- [ ] Accessibility BLOCKING concerns surface before sign-off and require explicit user acknowledgment
- [ ] Verdict is one of COMPLETE / BLOCKED
- [ ] Next steps present at end: `/design-review`, `/dev-story`, `/qa-plan`
---
## Coverage Notes
- narrative-director and world-builder in Step 1 may be sequential or parallel — the skill spec
spawns both but does not mandate simultaneous launch; coverage of parallel Step 1 would require
an explicit timing assertion fixture.
- The "Retry with narrower scope" option in the blocked world-builder case (Case 2) — the
retry behavior itself is not tested in depth; its full path is analogous to the blocked agent
pattern covered in Case 2 and in other team-* specs.
- systems-designer (Step 3) block scenarios are not separately tested; the same Error Recovery
Protocol applies and the pattern is validated by Case 2.
- Step 4 parallel ordering (art-director completing before or after accessibility-specialist)
does not affect outcomes — both must return before Step 5 regardless of order.
- The level doc slug convention (argument → filename) is implicitly tested by Case 1
  (`forest dungeon` → `forest-dungeon.md`); multi-word slugification edge cases (special
characters, very long names) are not covered.
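A minimal slugification sketch consistent with the Case 1 convention. The exact rules (special-character handling, length limits) are not specified by the skill spec, so everything beyond the lowercase-hyphen happy path is an assumption:

```python
import re

def slugify(argument: str) -> str:
    """Convert a skill argument to a level-doc filename slug.

    Assumption: lowercase words joined by hyphens, with runs of
    non-alphanumeric characters collapsed -- the spec only confirms
    the simple `forest dungeon` -> `forest-dungeon` case.
    """
    slug = argument.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
    return slug

print(slugify("forest dungeon"))  # forest-dungeon
```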

# Skill Test Spec: /team-live-ops
## Skill Summary
Orchestrates the live-ops team through a 7-phase planning pipeline to produce a
season or event plan. Coordinates live-ops-designer, economy-designer,
analytics-engineer, community-manager, narrative-director, and writer. Phases 3
and 4 (economy design and analytics) run simultaneously. Ends with a consolidated
season plan requiring user approval before handoff to production.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" language in the File Write Protocol section (delegated to sub-agents)
- [ ] Has a File Write Protocol section stating that the orchestrator does not write files directly
- [ ] Has a next-step handoff at the end referencing `/design-review`, `/sprint-plan`, and `/team-release`
- [ ] Uses `AskUserQuestion` at phase transitions to capture user approval before proceeding
- [ ] States explicitly that Phases 3 and 4 can run simultaneously (parallel spawning)
- [ ] Error recovery section present (or implied through BLOCKED handling)
- [ ] Output documents section specifies paths under `design/live-ops/seasons/`
---
## Test Cases
### Case 1: Happy Path — All 7 phases complete, season plan produced
**Fixture:**
- `design/live-ops/economy-rules.md` exists with current economy configuration
- `design/live-ops/ethics-policy.md` exists with the project ethics policy
- Game concept document exists at its standard path
- No existing season documents for the new season name being planned
**Input:** `/team-live-ops "Season 2: The Frozen Wastes"`
**Expected behavior:**
1. Phase 1: Spawns `live-ops-designer` via Task; receives season brief with scope, content list, and retention mechanic; presents to user
2. AskUserQuestion: user approves Phase 1 output before Phase 2 begins
3. Phase 2: Spawns `narrative-director` via Task; reads the Phase 1 season brief; produces narrative framing document (theme, story hook, lore connections); presents to user
4. Phase 3 and 4 (parallel): Spawns `economy-designer` and `analytics-engineer` simultaneously via two Task calls before waiting for either result; economy-designer reads `design/live-ops/economy-rules.md`
5. Phase 5: Spawns `narrative-director` and `writer` in parallel to produce in-game narrative text and player-facing copy; both read Phase 2 narrative framing doc
6. Phase 6: Spawns `community-manager` via Task; reads season brief, economy design, and narrative framing; produces communication calendar with draft copy
7. Phase 7: Collects all phase outputs; presents consolidated season plan summary including economy health check, analytics readiness, ethics review, and open questions
8. AskUserQuestion: user approves the full season plan
9. Sub-agents ask "May I write to `design/live-ops/seasons/S2_The_Frozen_Wastes.md`?", `...analytics.md`, and `...comms.md` before writing
10. Verdict: COMPLETE — season plan produced and handed off for production
**Assertions:**
- [ ] All 7 phases execute in order; Phase 3 and 4 are issued as parallel Task calls
- [ ] Phase 7 consolidated summary includes all six sections (season brief, narrative framing, economy design, analytics plan, content inventory, communication calendar)
- [ ] Ethics review section in Phase 7 explicitly references `design/live-ops/ethics-policy.md`
- [ ] Three output documents written to `design/live-ops/seasons/` with correct naming convention
- [ ] File writes are delegated to sub-agents — orchestrator does not write directly
- [ ] Verdict: COMPLETE appears in final output
- [ ] Next steps reference `/design-review`, `/sprint-plan`, and `/team-release`
---
### Case 2: Ethics Violation Found — Reward element violates ethics policy
**Fixture:**
- All standard live-ops fixtures present (economy-rules.md, ethics-policy.md)
- `design/live-ops/ethics-policy.md` explicitly prohibits loot boxes targeting players under 18
- economy-designer (Phase 3) proposes a "Mystery Chest" mechanic with randomized premium rewards and no pity timer
**Input:** `/team-live-ops "Season 3: Shadow Tournament"`
**Expected behavior:**
1. Phases 1-4 proceed normally; economy-designer proposes Mystery Chest mechanic
2. Phase 7: Orchestrator reviews Phase 3 output against ethics policy; identifies Mystery Chest as a violation of the "no non-transparent random premium rewards" rule in the ethics policy
3. Ethics review section of the Phase 7 summary flags the violation explicitly: "ETHICS FLAG: Mystery Chest mechanic in Phase 3 economy design violates [policy rule]. Approval is blocked until this is resolved."
4. AskUserQuestion presented with resolution options before season plan approval is offered
5. Skill does NOT issue a COMPLETE verdict or write output documents until the ethics violation is resolved or explicitly waived by the user
**Assertions:**
- [ ] Phase 7 ethics review section explicitly names the violating element and the policy rule it breaks
- [ ] Skill does not auto-approve the season plan when an ethics violation is present
- [ ] AskUserQuestion is used to surface the violation and offer resolution options (revise economy design, override with documented rationale, cancel)
- [ ] Output documents are NOT written while the violation is unresolved
- [ ] If user chooses to revise: skill re-spawns economy-designer to produce a corrected design before returning to Phase 7 review
- [ ] Verdict: COMPLETE is only issued after the ethics flag is cleared
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-live-ops` (no argument)
**Expected behavior:**
1. Phase 1: No argument detected
2. Outputs: "Usage: `/team-live-ops [season name or event description]` — Provide the name or description of the season or live event to plan."
3. Skill exits immediately without spawning any subagents
**Assertions:**
- [ ] Skill does NOT guess a season name or fabricate a scope
- [ ] Error message includes the correct usage format with the argument-hint
- [ ] No Task calls are issued before the argument check fails
- [ ] No files are read or written
---
### Case 4: Parallel Phase Validation — Phases 3 and 4 run simultaneously
**Fixture:**
- All standard live-ops fixtures present
- Phase 1 (season brief) and Phase 2 (narrative framing) already approved
- Phase 3 (economy-designer) and Phase 4 (analytics-engineer) inputs are independent of each other
**Input:** `/team-live-ops "Season 1: The First Thaw"` (observed at Phase 3/4 transition)
**Expected behavior:**
1. After Phase 2 is approved by the user, the orchestrator issues both Task calls (economy-designer and analytics-engineer) before awaiting either result
2. Both agents receive the season brief as context; analytics-engineer does NOT wait for economy-designer output to begin
3. Economy-designer output and analytics-engineer output are collected together before Phase 5 begins
4. If one of the two parallel agents blocks, the other continues; a partial result is reported
**Assertions:**
- [ ] Both Task calls for Phase 3 and Phase 4 are issued before either result is awaited — they are not sequential
- [ ] Analytics-engineer prompt does NOT include economy-designer output as a required input (the inputs are independent)
- [ ] If economy-designer blocks but analytics-engineer succeeds, analytics output is preserved and the block is surfaced via AskUserQuestion
- [ ] Phase 5 does not begin until BOTH Phase 3 and Phase 4 results are collected
- [ ] Skill documentation explicitly states "Phases 3 and 4 can run simultaneously"
---
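The parallel-spawn pattern asserted in Case 4 can be sketched with asyncio stand-ins. `spawn_task` is a hypothetical placeholder for a Task tool call (the real orchestrator issues Task calls, not coroutines); the point illustrated is only the ordering: both calls are issued before either result is awaited, and Phase 5 waits for both:

```python
import asyncio

async def spawn_task(agent: str, context: dict) -> dict:
    # Stand-in for a Task tool call; the real sub-agent does the work.
    await asyncio.sleep(0)
    return {"agent": agent, "status": "COMPLETE"}

async def run_phases_3_and_4(season_brief: dict) -> list[dict]:
    # Both Task calls are issued BEFORE either result is awaited --
    # analytics-engineer never waits on economy-designer output.
    economy = asyncio.create_task(spawn_task("economy-designer", season_brief))
    analytics = asyncio.create_task(spawn_task("analytics-engineer", season_brief))
    # Phase 5 does not begin until BOTH results are collected.
    return await asyncio.gather(economy, analytics)

economy_result, analytics_result = asyncio.run(
    run_phases_3_and_4({"season": "Season 1: The First Thaw"})
)
print(economy_result["agent"], analytics_result["agent"])
```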
### Case 5: Missing Ethics Policy — `design/live-ops/ethics-policy.md` does not exist
**Fixture:**
- `design/live-ops/economy-rules.md` exists
- `design/live-ops/ethics-policy.md` does NOT exist
- All other fixtures are present
**Input:** `/team-live-ops "Season 4: Desert Heat"`
**Expected behavior:**
1. Phases 1-4 proceed; economy-designer and analytics-engineer are given the ethics policy path but it is absent
2. Phase 7: Orchestrator attempts to run ethics review; detects that `design/live-ops/ethics-policy.md` is missing
3. Phase 7 summary includes a gap flag: "ETHICS REVIEW SKIPPED: `design/live-ops/ethics-policy.md` not found. Economy design was not reviewed against an ethics policy. Recommend creating one before production begins."
4. Skill still completes the season plan and reaches COMPLETE verdict, but the gap is prominently flagged in the output and in the season design document
5. Next steps include a recommendation to create the ethics policy document
**Assertions:**
- [ ] Skill does NOT error out when the ethics policy file is missing
- [ ] Skill does NOT fabricate ethics policy rules in the absence of the file
- [ ] Phase 7 summary explicitly notes that ethics review was skipped and why
- [ ] Verdict: COMPLETE is still reachable despite the missing file
- [ ] Gap flag appears in the season design output document (not just in conversation)
- [ ] Next steps recommend creating `design/live-ops/ethics-policy.md`
---
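The missing-policy behavior in Case 5 boils down to an existence check that produces a gap flag rather than an error or fabricated rules. A sketch, with the function name and return string as hypothetical stand-ins for the orchestrator's Phase 7 output:

```python
from pathlib import Path

ETHICS_POLICY = Path("design/live-ops/ethics-policy.md")

def ethics_review_section(policy_path: Path = ETHICS_POLICY) -> str:
    # A missing policy is a flagged gap -- never an error, and never a
    # pretext to invent ethics rules that the project has not written.
    if not policy_path.exists():
        return (
            f"ETHICS REVIEW SKIPPED: `{policy_path}` not found. "
            "Economy design was not reviewed against an ethics policy. "
            "Recommend creating one before production begins."
        )
    return f"Ethics review performed against `{policy_path}`."

print(ethics_review_section(Path("no-such-ethics-policy.md")))
```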
## Protocol Compliance
- [ ] `AskUserQuestion` used at every phase transition — user approves before the next phase begins
- [ ] Phases 3 and 4 are always spawned in parallel, not sequentially
- [ ] File Write Protocol: orchestrator never calls Write/Edit directly — all writes are delegated to sub-agents
- [ ] Each output document gets its own "May I write to [path]?" ask from the relevant sub-agent
- [ ] Ethics review in Phase 7 always references the ethics policy file path explicitly
- [ ] Error recovery: any BLOCKED agent is surfaced immediately with AskUserQuestion options (skip / retry / stop)
- [ ] Partial reports are produced if any phase blocks — work is never discarded
- [ ] Verdict: COMPLETE only after user approves the consolidated season plan; BLOCKED if any unresolved ethics violation exists
- [ ] Next steps always include `/design-review`, `/sprint-plan`, and `/team-release`
---
## Coverage Notes
- Phase 5 parallel spawning (narrative-director + writer) follows the same pattern as Phases 3/4 but is not separately tested here — it uses the same parallel Task protocol validated in Case 4.
- The "economy-rules.md absent" edge case is not separately tested — it would surface as a BLOCKED result from economy-designer and follow the standard error recovery path tested implicitly in Case 4.
- The full content writing pipeline (Phase 5 output validation) is validated implicitly by the Case 1 happy path consolidated summary check.
- Community manager communication calendar format (pre-launch, launch day, mid-season, final week) is validated implicitly by Case 1; no separate edge case is needed.

# Skill Test Spec: /team-narrative
## Skill Summary
Orchestrates the narrative team through a five-phase pipeline: narrative direction
(narrative-director) → world foundation + dialogue drafting (world-builder and writer
in parallel) → level narrative integration (level-designer) → consistency review
(narrative-director) → polish + localization compliance (writer, localization-lead,
and world-builder in parallel). Uses `AskUserQuestion` at each phase transition to
present proposals as selectable options. Produces a narrative summary report and
delivers narrative documents via subagents that each enforce the "May I write?"
protocol. Verdict is COMPLETE when all phases succeed, or BLOCKED when a dependency
is unresolved.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/design-review`, `/localize extract`, `/dev-story`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at phase transitions before proceeding
- [ ] Phase 2 explicitly spawns world-builder and writer in parallel
- [ ] Phase 5 explicitly spawns writer, localization-lead, and world-builder in parallel
---
## Test Cases
### Case 1: Happy Path — All five phases complete, narrative doc delivered
**Fixture:**
- A game concept and GDD exist for the target feature (e.g., `design/gdd/faction-intro.md`)
- Character voice profiles exist (e.g., `design/narrative/characters/`)
- Existing lore entries exist for cross-reference (e.g., `design/narrative/lore/`)
- No lore contradictions exist between existing entries and the new content
**Input:** `/team-narrative faction introduction cutscene for the Ironveil faction`
**Expected behavior:**
1. Phase 1: narrative-director is spawned; outputs a narrative brief defining the story beat, characters involved, emotional tone, and lore dependencies
2. `AskUserQuestion` presents the narrative brief; user approves before Phase 2 begins
3. Phase 2: world-builder and writer are spawned in parallel; world-builder produces lore entries for the Ironveil faction; writer drafts dialogue lines using character voice profiles
4. `AskUserQuestion` presents world foundation and dialogue drafts; user approves before Phase 3 begins
5. Phase 3: level-designer is spawned; produces environmental storytelling layout, trigger placement, and pacing plan
6. `AskUserQuestion` presents level narrative plan; user approves before Phase 4 begins
7. Phase 4: narrative-director reviews all dialogue against voice profiles, verifies lore consistency, confirms pacing; approves or flags issues
8. `AskUserQuestion` presents review results; user approves before Phase 5 begins
9. Phase 5: writer, localization-lead, and world-builder are spawned in parallel; writer performs final self-review; localization-lead validates i18n compliance; world-builder finalizes canon levels
10. Final summary report is presented; subagent asks "May I write the narrative document to [path]?" before writing
11. Verdict: COMPLETE
**Assertions:**
- [ ] narrative-director is spawned in Phase 1 before any other agents
- [ ] `AskUserQuestion` appears after Phase 1 output and before Phase 2 launch
- [ ] world-builder and writer Task calls are issued simultaneously in Phase 2 (not sequentially)
- [ ] level-designer is not launched until Phase 2 `AskUserQuestion` is approved
- [ ] narrative-director is re-spawned in Phase 4 for consistency review
- [ ] Phase 5 spawns all three agents (writer, localization-lead, world-builder) simultaneously
- [ ] Summary report includes: narrative brief status, lore entries created/updated, dialogue lines written, level narrative integration points, consistency review results
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is COMPLETE after delivery
---
### Case 2: Lore Contradiction Found — world-builder finds conflict before writer proceeds
**Fixture:**
- Existing lore entry at `design/narrative/lore/ironveil-history.md` states the Ironveil faction was founded 200 years ago
- The new narrative brief (from Phase 1) states the Ironveil were founded 50 years ago
- The writer has been spawned in parallel with the world-builder in Phase 2
**Input:** `/team-narrative ironveil faction introduction cutscene`
**Expected behavior:**
1. Phases 1-2 begin normally
2. Phase 2 world-builder detects a factual contradiction between the narrative brief and existing lore: founding date conflict
3. world-builder returns BLOCKED with reason: "Lore contradiction found — founding date conflicts with `design/narrative/lore/ironveil-history.md`"
4. Orchestrator surfaces the contradiction immediately: "world-builder: BLOCKED — Lore contradiction: founding date in narrative brief (50 years ago) conflicts with existing canon (200 years ago in `ironveil-history.md`)"
5. Orchestrator assesses dependency: the writer's dialogue depends on canon lore — the writer's draft cannot be finalized without resolving the contradiction
6. `AskUserQuestion` presents options:
- Revise the narrative brief to match existing canon (200 years ago)
- Update the existing lore entry to reflect the new canon (50 years ago)
- Stop here and resolve the contradiction in the lore docs first
7. Writer output is preserved but flagged as pending canon resolution — work is not discarded
8. Orchestrator does NOT proceed to Phase 3 until the contradiction is resolved or user explicitly chooses to skip
**Assertions:**
- [ ] Contradiction is surfaced before Phase 3 begins
- [ ] Orchestrator does not silently resolve the contradiction by picking one version
- [ ] `AskUserQuestion` presents at least 3 options including "stop and resolve first"
- [ ] Writer's draft output is preserved in the partial report, not discarded
- [ ] Phase 3 (level-designer) is not launched until the user resolves the contradiction
- [ ] Verdict is BLOCKED (not COMPLETE) if the user stops to resolve the contradiction
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-narrative` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-narrative [narrative content description]` — describe the story content, scene, or narrative area to work on (e.g., `boss encounter cutscene`, `faction intro dialogue`, `tutorial narrative`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with an argument example
- [ ] Skill does NOT attempt to guess or infer a narrative topic from project files
- [ ] No `AskUserQuestion` is used — output is direct guidance
---
### Case 4: Localization Compliance — localization-lead flags a non-translatable string
**Fixture:**
- Phases 1-4 complete successfully
- Phase 5 begins; writer and world-builder complete without issues
- localization-lead finds a dialogue line that uses a hardcoded formatted date string (e.g., `"On March 12th, Year 3"`) that cannot survive locale-specific translation without a locale-aware formatter
**Input:** `/team-narrative ironveil faction introduction cutscene` (Phase 5 scenario)
**Expected behavior:**
1. Phase 5 spawns writer, localization-lead, and world-builder in parallel
2. localization-lead completes its review and flags: "String key `dialogue.ironveil.intro.003` contains a hardcoded date format (`March 12th, Year 3`) that will not localize correctly — requires a locale-aware date placeholder"
3. Orchestrator surfaces the localization blocker in the summary report
4. The localization issue is labeled as BLOCKING in the final report (not advisory)
5. `AskUserQuestion` presents options:
- Fix the string now (writer revises the line)
- Note the gap and deliver the narrative doc with the issue flagged
- Stop and resolve before finalizing
6. If the user chooses to proceed with the issue flagged, verdict is COMPLETE with noted localization debt; if user stops, verdict is BLOCKED
**Assertions:**
- [ ] localization-lead is spawned in Phase 5 simultaneously with writer and world-builder
- [ ] Hardcoded date format is identified as a localization blocker (not silently passed)
- [ ] The specific string key and reason are included in the issue report
- [ ] `AskUserQuestion` offers the option to fix now vs. flag and proceed
- [ ] Verdict notes the localization debt if the user proceeds without fixing
- [ ] Skill does NOT automatically rewrite the offending line without user approval
---
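The Case 4 failure mode (a hardcoded English date inside a dialogue string) is mechanically detectable. A sketch of what a localization-lead check might look for; the regex and function are illustrative assumptions, not the agent's actual implementation:

```python
import re

# Matches English month names followed by an ordinal day, e.g. "March 12th".
HARDCODED_DATE = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2}(st|nd|rd|th)?\b"
)

def localization_blockers(strings: dict[str, str]) -> list[str]:
    """Return string keys whose values embed a hardcoded date format.

    Such strings need a locale-aware date placeholder instead of literal
    English text, or they will not survive translation.
    """
    return [key for key, text in strings.items() if HARDCODED_DATE.search(text)]

blockers = localization_blockers(
    {"dialogue.ironveil.intro.003": "On March 12th, Year 3, the gates fell."}
)
print(blockers)  # ['dialogue.ironveil.intro.003']
```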
### Case 5: Writer Blocked — Missing character voice profiles
**Fixture:**
- Phase 1 narrative-director produces a narrative brief referencing two characters: Commander Varek and Advisor Selene
- No character voice profiles exist in `design/narrative/characters/` for either character
- Phase 2 begins; world-builder proceeds normally
**Input:** `/team-narrative ironveil surrender negotiation scene`
**Expected behavior:**
1. Phase 1 completes; narrative brief lists Commander Varek and Advisor Selene as characters
2. Phase 2: writer is spawned in parallel with world-builder
3. writer returns BLOCKED: "Cannot produce dialogue — no voice profiles found for Commander Varek or Advisor Selene in `design/narrative/characters/`. Voice profiles required to match character tone and speech patterns."
4. Orchestrator surfaces the blocker immediately: "writer: BLOCKED — Missing prerequisite: character voice profiles for Commander Varek and Advisor Selene"
5. world-builder output is preserved; partial report is produced with lore entries
6. `AskUserQuestion` presents options:
- Create voice profiles first (redirects to the narrative-director or design workflow)
- Provide minimal voice direction inline and retry the writer with that context
- Stop here and create voice profiles before proceeding
7. Orchestrator does NOT proceed to Phase 3 (level-designer) without writer output
**Assertions:**
- [ ] Writer block is surfaced before Phase 3 begins
- [ ] world-builder's completed lore output is preserved in the partial report
- [ ] Missing prerequisite (voice profiles) is named specifically (character names and expected file path)
- [ ] `AskUserQuestion` offers at least one option to resolve the missing prerequisite
- [ ] Orchestrator does not fabricate voice profiles or invent character voices
- [ ] Phase 3 is not launched while writer is BLOCKED without explicit user authorization
---
## Protocol Compliance
- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
- [ ] Parallel spawning: Phase 2 (world-builder + writer) and Phase 5 (writer + localization-lead + world-builder) issue all Task calls before waiting for results
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Verdict is exactly COMPLETE or BLOCKED — no other verdict values used
- [ ] Next Steps handoff references `/design-review`, `/localize extract`, and `/dev-story`
---
## Coverage Notes
- Phase 3 (level-designer) and Phase 4 (narrative-director review) happy-path behavior are
validated implicitly by Case 1. Separate edge cases are not needed for these phases as
their failure modes follow the standard Error Recovery Protocol.
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Localization concerns that are advisory (e.g., German/Finnish +30% expansion warnings)
vs. blocking (hardcoded formats) are distinguished in Case 4; advisory-only scenarios
follow the same pattern but do not change the verdict.
- The writer's "all lines under 120 characters" and "string keys not raw strings" checks
in Phase 5 are covered implicitly by Case 4's localization compliance scenario.

# Skill Test Spec: /team-polish
## Skill Summary
Orchestrates the polish team through a six-phase pipeline: performance assessment
(performance-analyst) → optimization (performance-analyst, optionally with
engine-programmer when engine-level root causes are found) → visual polish
(technical-artist, parallel with Phase 2) → audio polish (sound-designer, parallel
with Phase 2) → hardening (qa-tester) → sign-off (orchestrator collects all results
and issues READY FOR RELEASE or NEEDS MORE WORK). Uses `AskUserQuestion` at each
phase transition. Engine-programmer is spawned conditionally only when Phase 1
identifies engine-level root causes. Verdict is READY FOR RELEASE or NEEDS MORE WORK.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: READY FOR RELEASE, NEEDS MORE WORK
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/release-checklist`, `/sprint-plan update`, `/gate-check`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at phase transitions before proceeding
- [ ] Phase 3 (visual polish) and Phase 4 (audio polish) are explicitly run in parallel with Phase 2
- [ ] engine-programmer is conditionally spawned in Phase 2 only when Phase 1 identifies engine-level root causes
- [ ] Phase 6 sign-off compares metrics against budgets before issuing verdict
---
## Test Cases
### Case 1: Happy Path — Full pipeline completes, READY FOR RELEASE verdict
**Fixture:**
- Feature exists and is functionally complete (e.g., `combat` system)
- Performance budgets are defined in technical-preferences.md (e.g., target 60fps, 16ms frame budget)
- No frame budget violations exist before polishing begins
- No audio events are missing; VFX assets are complete
- No regressions are introduced by polish changes
**Input:** `/team-polish combat`
**Expected behavior:**
1. Phase 1: performance-analyst is spawned; profiles the combat system, measures frame budget, checks memory usage; output: performance report showing all metrics within budget, no violations
2. `AskUserQuestion` presents performance report; user approves before Phases 2, 3, and 4 begin
3. Phase 2: performance-analyst applies minor optimizations (e.g., draw call batching); no engine-programmer needed (no engine-level root causes identified)
4. Phases 3 and 4 are launched in parallel alongside Phase 2:
- Phase 3: technical-artist reviews VFX for quality, optimizes particle systems, adds screen shake and visual juice
- Phase 4: sound-designer reviews audio events for completeness, checks mix levels, adds ambient audio layers
5. All three parallel phases complete; `AskUserQuestion` presents results; user approves before Phase 5 begins
6. Phase 5: qa-tester runs edge case tests, soak tests, stress tests, and regression tests; all pass
7. `AskUserQuestion` presents test results; user approves before Phase 6
8. Phase 6: orchestrator collects all results; compares before/after performance metrics against budgets; all metrics pass
9. Subagent asks "May I write the polish report to `production/qa/evidence/polish-combat-[date].md`?" before writing
10. Verdict: READY FOR RELEASE
**Assertions:**
- [ ] performance-analyst is spawned first in Phase 1 before any other agents
- [ ] `AskUserQuestion` appears after Phase 1 output and before Phases 2/3/4 launch
- [ ] Phases 3 and 4 Task calls are issued at the same time as Phase 2 (not after Phase 2 completes)
- [ ] engine-programmer is NOT spawned when Phase 1 finds no engine-level root causes
- [ ] qa-tester (Phase 5) is not launched until the parallel phases complete and user approves
- [ ] Phase 6 verdict is based on comparison of metrics against defined budgets
- [ ] Summary report includes: before/after performance metrics, visual polish changes, audio polish changes, test results
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is READY FOR RELEASE
---
### Case 2: Performance Blocker — Frame budget violation cannot be fully resolved
**Fixture:**
- Feature being polished: `particle-storm` VFX system
- Phase 1 identifies a frame budget violation: particle-storm costs 12ms on target hardware (budget is 6ms for this system)
- Phase 2 performance-analyst applies optimizations reducing cost to 9ms — still over the 6ms budget
- Phase 2 cannot fully resolve the violation without a fundamental design change
**Input:** `/team-polish particle-storm`
**Expected behavior:**
1. Phase 1: performance-analyst identifies the 12ms frame cost vs. 6ms budget; reports "FRAME BUDGET VIOLATION: particle-storm costs 12ms, budget is 6ms"
2. `AskUserQuestion` presents the violation; user chooses to proceed with optimization attempt
3. Phase 2: performance-analyst applies optimizations; achieves 9ms — reduced but still over budget; reports "Optimization reduced cost to 9ms (was 12ms) — 3ms over budget. No further gains achievable without design changes."
4. Phases 3 and 4 run in parallel with Phase 2 (visual and audio polish)
5. Phase 5: qa-tester runs regression and edge case tests; all pass
6. Phase 6: orchestrator collects results; frame budget violation (9ms vs 6ms budget) remains unresolved
7. Verdict: NEEDS MORE WORK
8. Report lists the specific unresolved issue: "particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms — requires design scope reduction or budget renegotiation"
9. Next Steps: schedule the remaining issue in `/sprint-plan update`; re-run `/team-polish` after fix
**Assertions:**
- [ ] Frame budget violation is flagged in Phase 1 with specific numbers (actual vs. budget)
- [ ] Phase 2 reports the post-optimization metric explicitly (9ms achieved, 3ms still over)
- [ ] Verdict is NEEDS MORE WORK (not READY FOR RELEASE) when a budget violation remains
- [ ] The specific unresolved issue is listed by name with the remaining gap quantified
- [ ] Next Steps references `/sprint-plan update` for scheduling the remaining fix
- [ ] Phases 3 and 4 still run (polish work is not abandoned due to a Phase 2 partial resolution)
- [ ] Phase 5 qa-tester still runs (regression testing is independent of the performance outcome)
---
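The Phase 6 sign-off rule exercised in Case 2 (any system still over its frame budget forces NEEDS MORE WORK, with the remaining gap quantified) can be sketched as a comparison helper. The function and dict shapes are hypothetical; the orchestrator performs this comparison in prose, not code:

```python
def polish_verdict(measured_ms: dict[str, float], budget_ms: dict[str, float]) -> str:
    """Compare post-optimization frame costs against per-system budgets.

    Any system over budget forces NEEDS MORE WORK; the report names the
    system and quantifies the remaining gap, mirroring Case 2's output.
    """
    violations = [
        f"{system} frame cost ({measured_ms[system]}ms) exceeds budget "
        f"({budget_ms[system]}ms) by {measured_ms[system] - budget_ms[system]:g}ms"
        for system in budget_ms
        if measured_ms.get(system, 0.0) > budget_ms[system]
    ]
    if violations:
        return "NEEDS MORE WORK: " + "; ".join(violations)
    return "READY FOR RELEASE"

print(polish_verdict({"particle-storm": 9.0}, {"particle-storm": 6.0}))
```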
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-polish` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-polish [feature or area]` — specify the feature or area to polish (e.g., `combat`, `main menu`, `inventory system`, `level-1`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with argument examples
- [ ] Skill does NOT attempt to guess a feature from project files
- [ ] No `AskUserQuestion` is used — output is direct guidance
---
### Case 4: Engine-Level Bottleneck — engine-programmer spawned conditionally in Phase 2
**Fixture:**
- Feature being polished: `open-world` environment streaming
- Phase 1 identifies a performance bottleneck with a root cause in the rendering pipeline: "draw call overhead is caused by the engine's scene tree traversal in the spatial indexer — this is an engine-level issue, not a game code issue"
- Performance budgets are defined; the rendering overhead exceeds the target frame budget
**Input:** `/team-polish open-world`
**Expected behavior:**
1. Phase 1: performance-analyst profiles the environment; identifies frame budget violation; root cause analysis points to engine-level rendering pipeline (spatial indexer traversal overhead)
2. Phase 1 output explicitly classifies the root cause as engine-level
3. `AskUserQuestion` presents the performance report including the engine-level root cause; user approves before Phase 2
4. Phase 2: performance-analyst is spawned for game-code-level optimizations AND engine-programmer is spawned in parallel for the engine-level rendering fix
5. Phases 3 and 4 also run in parallel with Phase 2 (visual and audio polish)
6. engine-programmer addresses the spatial indexer traversal; provides profiler validation showing the fix reduces overhead
7. Phase 5: qa-tester runs regression tests including tests for the engine-level fix
8. Phase 6: orchestrator collects all results; if metrics are now within budget, verdict is READY FOR RELEASE; if not, NEEDS MORE WORK
**Assertions:**
- [ ] engine-programmer is NOT spawned in Phase 2 unless Phase 1 explicitly identifies an engine-level root cause
- [ ] engine-programmer is spawned in Phase 2 when Phase 1 identifies an engine-level root cause
- [ ] engine-programmer and performance-analyst Task calls in Phase 2 are issued simultaneously (not sequentially)
- [ ] Phases 3 and 4 also run in parallel with Phase 2 (not deferred until Phase 2 completes)
- [ ] engine-programmer's output includes profiler validation of the fix
- [ ] qa-tester in Phase 5 runs regression tests that cover the engine-level change
- [ ] Verdict correctly reflects whether all metrics including the engine fix now meet budgets
---
### Case 5: Regression Found — Polish change broke an existing feature
**Fixture:**
- Feature being polished: `inventory-ui`
- Phases 1-4 complete successfully; performance and polish changes are applied
- Phase 5: qa-tester runs regression tests and finds that a shader optimization applied in Phase 3 broke the item highlight glow effect on hover — an existing feature that was working before the polish pass
**Input:** `/team-polish inventory-ui` (Phase 5 scenario)
**Expected behavior:**
1. Phases 1-4 complete; polish changes include a shader optimization from technical-artist
2. Phase 5: qa-tester runs regression tests and detects "Item highlight glow on hover no longer renders — regression introduced by shader optimization in Phase 3"
3. qa-tester returns test results with the regression noted
4. Orchestrator surfaces the regression immediately: "qa-tester: REGRESSION FOUND — `item-highlight-hover` glow broken by Phase 3 shader optimization"
5. Subagent files a bug report asking "May I write the bug report to `production/qa/evidence/bug-polish-inventory-ui-[date].md`?" before writing
6. Bug report is written after approval; it includes: the broken behavior, the polish change that caused it, reproduction steps, and severity
7. `AskUserQuestion` presents the regression with options:
- Revert the shader optimization and find an alternative approach
- Fix the shader optimization to preserve the glow effect
- Accept the regression and schedule a fix in the next sprint
8. Verdict: NEEDS MORE WORK (regression present regardless of user's chosen resolution path, unless fix is applied within the current session)
**Assertions:**
- [ ] Regression is surfaced before Phase 6 sign-off
- [ ] The specific broken behavior and the responsible change are both named in the report
- [ ] Subagent asks "May I write the bug report to [path]?" before filing
- [ ] Bug report includes: broken behavior, causal change, reproduction steps, severity
- [ ] `AskUserQuestion` offers options including revert, fix in place, and schedule later
- [ ] Verdict is NEEDS MORE WORK when a regression is present and unresolved
- [ ] Verdict may become READY FOR RELEASE only if the regression is fixed within the current polish session and qa-tester re-runs to confirm
---
## Protocol Compliance
- [ ] Phase 1 (assessment) must complete before any other phase begins
- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
- [ ] Phases 3 and 4 are always launched in parallel with Phase 2 (not deferred)
- [ ] engine-programmer is only spawned when Phase 1 explicitly identifies engine-level root causes
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Verdict is exactly READY FOR RELEASE or NEEDS MORE WORK — no other verdict values used
- [ ] NEEDS MORE WORK verdict always lists specific remaining issues with severity
- [ ] Next Steps handoff references `/release-checklist` (on success) and `/sprint-plan update` + `/gate-check` (on failure)
---
## Coverage Notes
- The tools-programmer optional agent (for content pipeline tool verification) is not
separately tested — it follows the same conditional spawn pattern as engine-programmer
and is invoked only when content authoring tools are involved in the polished area.
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Phase 6 sign-off logic (collecting and comparing all metrics) is validated implicitly
by Cases 1 and 2. The distinction between READY FOR RELEASE and NEEDS MORE WORK is
exercised in both directions across these cases.
- Soak testing and stress testing (Phase 5) are validated implicitly by Case 1's
qa-tester output. Case 5 focuses on the regression detection aspect of Phase 5.
- The "minimum spec hardware" test path in Phase 5 is not separately tested — it follows
the same qa-tester delegation pattern when the hardware is available.
# Skill Test Spec: /team-qa
## Skill Summary
Orchestrates the QA team through a 7-phase structured testing cycle. Coordinates
qa-lead (strategy, test plan, sign-off report) and qa-tester (test case writing,
bug report writing). Covers scope detection, story classification, QA plan
generation, smoke check gate, test case writing, manual QA execution with bug
filing, and a final sign-off report with an APPROVED / APPROVED WITH CONDITIONS /
NOT APPROVED verdict. Parallel qa-tester spawning is used in Phase 5 for
independent stories.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains verdict keywords for sign-off report: APPROVED, APPROVED WITH CONDITIONS, NOT APPROVED
- [ ] Contains "May I write" language for both the QA plan and the sign-off report
- [ ] Has an Error Recovery Protocol section
- [ ] Uses `AskUserQuestion` at phase transitions to capture user approval before proceeding
- [ ] Phase 4 (smoke check) is a hard gate: FAIL stops the cycle
- [ ] Bug reports are written to `production/qa/bugs/` with `BUG-[NNN]-[short-slug].md` naming
- [ ] Next-step guidance differs by verdict (APPROVED / APPROVED WITH CONDITIONS / NOT APPROVED)
- [ ] Independent qa-tester tasks in Phase 5 are spawned in parallel
---
## Test Cases
### Case 1: Happy Path — All stories pass manual QA, APPROVED verdict
**Fixture:**
- `production/sprints/sprint-03/` exists with 4 story files
- Stories are a mix of types: 1 Logic, 1 Integration, 2 Visual/Feel
- All stories have acceptance criteria populated
- `tests/smoke/` contains a smoke test list; all items are verifiable
- No existing bugs in `production/qa/bugs/`
**Input:** `/team-qa sprint-03`
**Expected behavior:**
1. Phase 1: Reads all story files in `production/sprints/sprint-03/`; reads `production/stage.txt`; reports "Found 4 stories. Current stage: [stage]. Ready to begin QA strategy?"
2. Phase 2: Spawns `qa-lead` via Task; produces strategy table classifying all 4 stories; no blockers flagged; presents to user; AskUserQuestion: user selects "Looks good — proceed to test plan"
3. Phase 3: Produces QA plan document; asks "May I write the QA plan to `production/qa/qa-plan-sprint-03-[date].md`?"; writes after approval
4. Phase 4: Spawns `qa-lead` via Task; reviews `tests/smoke/`; returns PASS; reports "Smoke check passed. Proceeding to test case writing."
5. Phase 5: Spawns `qa-tester` via Task for each Visual/Feel and Integration story (2-3 stories); run in parallel; presents test cases grouped by story; AskUserQuestion per group; user approves
6. Phase 6: Walks through each approved story; user marks all as PASS; result summary: "Stories PASS: 4, FAIL: 0, BLOCKED: 0"
7. Phase 7: Spawns `qa-lead` via Task to produce sign-off report; report shows all stories PASS; no bugs filed; Verdict: APPROVED; asks "May I write this QA sign-off report to `production/qa/qa-signoff-sprint-03-[date].md`?"; writes after approval
8. Verdict: COMPLETE — QA cycle finished
**Assertions:**
- [ ] Phase 1 correctly counts and reports 4 stories with current stage
- [ ] Strategy table in Phase 2 classifies all 4 stories with correct types
- [ ] QA plan written only after "May I write?" approval
- [ ] Smoke check PASS allows pipeline to continue without user intervention
- [ ] Phase 5 qa-tester tasks for independent stories are issued in parallel
- [ ] Sign-off report includes Test Coverage Summary table and Verdict: APPROVED
- [ ] Sign-off report written only after "May I write?" approval
- [ ] Verdict: COMPLETE appears in final output
- [ ] Next step: "Run `/gate-check` to validate advancement."
---
### Case 2: Smoke Check Fail — QA cycle stops at Phase 4
**Fixture:**
- `production/sprints/sprint-04/` exists with 3 story files
- `tests/smoke/` exists with 5 smoke test items; 2 items cannot be verified (e.g., build is unstable, core navigation broken)
**Input:** `/team-qa sprint-04`
**Expected behavior:**
1. Phases 1-3 complete normally; QA plan is written
2. Phase 4: Spawns `qa-lead` via Task; smoke check returns FAIL; two specific failures are identified
3. Skill reports: "Smoke check failed. QA cannot begin until these issues are resolved: [list of 2 failures]. Fix them and re-run `/smoke-check`, or re-run `/team-qa` once resolved."
4. Skill stops immediately after Phase 4 — no Phase 5, 6, or 7 is executed
5. No sign-off report is produced; no "May I write?" for a sign-off is issued
**Assertions:**
- [ ] Smoke check FAIL causes the pipeline to halt at Phase 4 — Phases 5, 6, 7 are NOT executed
- [ ] Failure list is shown to the user explicitly (not summarized vaguely)
- [ ] Skill recommends `/smoke-check` and `/team-qa` re-run as remediation steps
- [ ] No QA sign-off report is written or offered
- [ ] Skill does NOT produce a COMPLETE verdict
- [ ] Any QA plan already written in Phase 3 is preserved (not deleted)
---
### Case 3: Bug Found — Visual/Feel story fails manual QA, bug report filed
**Fixture:**
- `production/sprints/sprint-05/` exists with 2 story files: 1 Logic (passes automated tests), 1 Visual/Feel
- `tests/smoke/` smoke check passes
- The Visual/Feel story's animation timing is visibly wrong (acceptance criterion not met)
- `production/qa/bugs/` directory exists (empty or with existing bugs)
**Input:** `/team-qa sprint-05`
**Expected behavior:**
1. Phases 1-5 complete normally; test cases are written for the Visual/Feel story
2. Phase 6: User marks Visual/Feel story as FAIL; AskUserQuestion collects failure description: "Animation plays at 2x speed — jitter visible on every loop"
3. Phase 6: Spawns `qa-tester` via Task to write a formal bug report; bug report written to `production/qa/bugs/BUG-001-animation-speed-jitter.md` (or next increment if bugs exist); report includes severity field
4. Result summary: "Stories PASS: 1, FAIL: 1 — bugs filed: BUG-001"
5. Phase 7: Spawns `qa-lead` to produce sign-off report; Bugs Found table lists BUG-001 with severity and status Open; Verdict: NOT APPROVED (S1/S2 bug open, or FAIL without documented workaround)
6. Sign-off report write is offered; writes after approval
7. Next step: "Resolve S1/S2 bugs and re-run `/team-qa` or targeted manual QA before advancing."
**Assertions:**
- [ ] FAIL result in Phase 6 triggers AskUserQuestion to collect the failure description before the bug report is written
- [ ] `qa-tester` is spawned via Task to write the bug report — orchestrator does not write it directly
- [ ] Bug report follows naming convention: `BUG-[NNN]-[short-slug].md` in `production/qa/bugs/`
- [ ] Bug report NNN is incremented correctly from existing bugs in the directory
- [ ] Phase 7 sign-off report Bugs Found table includes the bug ID, story name, severity, and status
- [ ] Verdict in sign-off report is NOT APPROVED
- [ ] Next step explicitly mentions re-running `/team-qa`
- [ ] Verdict: COMPLETE is still issued by the orchestrator (the QA cycle finished — the verdict is NOT APPROVED, but the skill completed its pipeline)
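The NNN increment rule asserted above can be sketched as follows. The helper name and the directory-scan approach are illustrative assumptions for the sketch, not the skill's actual implementation:

```python
# Hypothetical sketch of the BUG-[NNN]-[short-slug].md numbering rule.
# next_bug_filename is an illustrative helper, not part of the skill itself.
import re


def next_bug_filename(existing: list[str], slug: str) -> str:
    """Pick the next BUG number from filenames already in production/qa/bugs/."""
    pattern = re.compile(r"^BUG-(\d{3})-[a-z0-9-]+\.md$")
    numbers = [int(m.group(1)) for f in existing if (m := pattern.match(f))]
    # Empty directory starts at BUG-001; otherwise increment the highest.
    nnn = max(numbers, default=0) + 1
    return f"BUG-{nnn:03d}-{slug}.md"
```

With an empty `production/qa/bugs/`, `next_bug_filename([], "animation-speed-jitter")` yields `BUG-001-animation-speed-jitter.md`, matching this case; with `BUG-007-other.md` already present, the next bug becomes `BUG-008-...`.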
---
### Case 4: No Argument — Skill infers active sprint or asks user
**Fixture (variant A — state files present):**
- `production/session-state/active.md` exists and contains a reference to `sprint-06`
- `production/sprint-status.yaml` exists and identifies `sprint-06` as active
**Fixture (variant B — state files absent):**
- `production/session-state/active.md` does NOT exist
- `production/sprint-status.yaml` does NOT exist
**Input:** `/team-qa` (no argument)
**Expected behavior (variant A):**
1. Phase 1: No argument provided; reads `production/session-state/active.md`; reads `production/sprint-status.yaml`
2. Detects `sprint-06` as the active sprint from both sources
3. Proceeds as if `/team-qa sprint-06` was the input; reports "No sprint argument provided — inferred sprint-06 from session state. Found [N] stories."
**Expected behavior (variant B):**
1. Phase 1: No argument provided; attempts to read `production/session-state/active.md` — file missing; attempts to read `production/sprint-status.yaml` — file missing
2. Cannot infer sprint; uses AskUserQuestion: "Which sprint or feature should QA cover?" with options to type a sprint identifier or cancel
**Assertions:**
- [ ] Skill does NOT default to a hardcoded sprint name when no argument is provided
- [ ] Skill reads both `production/session-state/active.md` AND `production/sprint-status.yaml` before asking the user (variant A)
- [ ] When both state files are absent, skill uses AskUserQuestion rather than guessing (variant B)
- [ ] Inferred sprint is reported to the user before proceeding (variant A transparency)
- [ ] Skill does NOT error out when state files are missing — it falls back to asking (variant B)
---
### Case 5: Mixed Results — Some PASS, one FAIL with S1 bug, one BLOCKED
**Fixture:**
- `production/sprints/sprint-07/` exists with 4 story files
- Smoke check passes
- Story A (Logic): automated test passes — PASS
- Story B (UI): manual QA — PASS WITH NOTES (minor text overflow)
- Story C (Visual/Feel): manual QA — FAIL; tester identifies S1 crash on ability activation
- Story D (Integration): cannot test — BLOCKED (dependency system not yet implemented)
**Input:** `/team-qa sprint-07`
**Expected behavior:**
1. Phases 1-5 proceed; Phase 5 test cases cover stories B, C, D
2. Phase 6: User marks Story A as implicitly PASS (automated); Story B: PASS WITH NOTES; Story C: FAIL; Story D: BLOCKED
3. After Story C FAIL: qa-tester spawned to write bug report `BUG-001-crash-ability-activation.md` with S1 severity
4. Result summary presented: "Stories PASS: 1, PASS WITH NOTES: 1, FAIL: 1 — bugs filed: BUG-001 (S1), BLOCKED: 1"
5. Phase 7: qa-lead produces sign-off report covering all 4 stories; BUG-001 listed as S1/Open; Story D listed as BLOCKED; Verdict: NOT APPROVED
6. Sign-off report written after "May I write?" approval
7. Next step: "Resolve S1/S2 bugs and re-run `/team-qa` or targeted manual QA before advancing."
**Assertions:**
- [ ] All 4 stories appear in the Phase 7 sign-off report Test Coverage Summary table — none are silently omitted
- [ ] Story D (BLOCKED) is listed in the report with a BLOCKED status, not silently dropped
- [ ] S1 bug causes Verdict: NOT APPROVED regardless of the other stories passing
- [ ] PASS WITH NOTES stories do not downgrade to FAIL — they are tracked separately
- [ ] BUG-001 severity is listed as S1 in the Bugs Found table
- [ ] Partial results are preserved — the sign-off report is still produced even with failures and blocks
- [ ] Verdict: COMPLETE is issued by the orchestrator (pipeline completed); sign-off verdict is NOT APPROVED
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at Phase 2 (strategy review), Phase 5 (test case approval per group), and Phase 6 (per-story manual QA result)
- [ ] Phase 4 smoke check is a hard gate: FAIL halts the pipeline at Phase 4 with no exceptions
- [ ] "May I write?" asked separately for QA plan (Phase 3) and sign-off report (Phase 7)
- [ ] Bug reports are always written by `qa-tester` via Task — orchestrator does not write directly
- [ ] Phase 5 qa-tester tasks for independent stories are issued in parallel where possible
- [ ] Error recovery: any BLOCKED agent is surfaced immediately with AskUserQuestion options
- [ ] Partial report always produced — no work is discarded because one story failed or blocked
- [ ] Sign-off verdict rules are strictly applied: any S1/S2 bug open = NOT APPROVED; no exceptions
- [ ] Orchestrator-level Verdict: COMPLETE is distinct from the sign-off report's APPROVED/NOT APPROVED verdict
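Since the sign-off verdict is table-driven (any open S1/S2 bug forces NOT APPROVED; otherwise open S3/S4 bugs or PASS WITH NOTES yield APPROVED WITH CONDITIONS), the rule can be sketched as below. The parameter names are assumptions for the sketch:

```python
# Hypothetical sketch of the table-driven sign-off verdict rules.
# Severity strings match the spec's examples; names are illustrative.
def signoff_verdict(open_bug_severities: list[str], pass_with_notes: int) -> str:
    if any(s in ("S1", "S2") for s in open_bug_severities):
        return "NOT APPROVED"  # strict rule, no exceptions
    if open_bug_severities or pass_with_notes:
        return "APPROVED WITH CONDITIONS"  # minor bugs or notes only
    return "APPROVED"


# Case 5 (S1 open) -> NOT APPROVED; Case 1 (clean) -> APPROVED.
```

This also makes the Coverage Notes point concrete: with Case 5's S1 bug removed, the remaining PASS WITH NOTES story alone would produce APPROVED WITH CONDITIONS.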
---
## Coverage Notes
- The "APPROVED WITH CONDITIONS" verdict path (S3/S4 bugs, PASS WITH NOTES) is covered implicitly by Case 5's PASS WITH NOTES story (Story B) — if no S1/S2 bugs existed, that case would produce APPROVED WITH CONDITIONS. A dedicated case is not required as the verdict logic is table-driven.
- The `feature: [system-name]` argument form is not separately tested — it follows the same Phase 1 logic as the sprint form, using glob instead of directory read. The no-argument inference path (Case 4) provides sufficient coverage of the detection logic.
- Logic stories with passing automated tests do not need manual QA — this is validated implicitly by Case 5 (Story A) where the Logic story receives no manual QA phase.
- Parallel qa-tester spawning in Phase 5 is validated implicitly by Case 1 (multiple Visual/Feel stories issued simultaneously); no dedicated parallelism case is required beyond the Static Assertions check.
# Skill Test Spec: /team-release
## Skill Summary
Orchestrates the release team through a 7-phase pipeline from release candidate to
deployment and post-release monitoring. Coordinates release-manager, qa-lead,
devops-engineer, producer, security-engineer (optional, required for online/
multiplayer), network-programmer (optional, required for multiplayer),
analytics-engineer, and community-manager. Phase 3 agents run in parallel. Ends
with a go/no-go decision; deployment (Phase 6) is skipped if the producer calls
NO-GO. Closes with a post-release monitoring plan.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" language in the File Write Protocol section (delegated to sub-agents)
- [ ] Has a File Write Protocol section stating that the orchestrator does not write files directly
- [ ] Has an Error Recovery Protocol section with four recovery options (surface / assess / offer options / partial report)
- [ ] Has a next-step handoff referencing post-release monitoring, `/retrospective`, and `production/stage.txt`
- [ ] Uses `AskUserQuestion` at phase transitions requiring user approval before proceeding
- [ ] Phase 3 agents (qa-lead, devops-engineer, and optionally security-engineer, network-programmer) are explicitly stated to run in parallel
- [ ] Phase 6 (Deployment) is conditional on a GO decision from Phase 5
- [ ] security-engineer is described as conditional on online features / player data — not always spawned
---
## Test Cases
### Case 1: Happy Path (Single-Player) — All phases complete, version deployed
**Fixture:**
- `production/stage.txt` exists and contains a Production-or-later stage
- Milestone acceptance criteria are all met (producer can confirm)
- No online features, no multiplayer, no player data collection
- All CI builds are clean on the current branch
- No open S1/S2 bugs
- `production/sprints/` contains the completed sprint stories for this milestone
**Input:** `/team-release v1.0.0`
**Expected behavior:**
1. Phase 1: Spawns `producer` via Task; confirms all milestone acceptance criteria met; identifies any deferred scope; produces release authorization; presents to user; AskUserQuestion: user approves before Phase 2
2. Phase 2: Spawns `release-manager` via Task; cuts release branch from agreed commit; bumps version numbers; invokes `/release-checklist`; freezes branch; output: branch name and checklist; AskUserQuestion: user approves before Phase 3
3. Phase 3 (parallel): Issues Task calls simultaneously for `qa-lead` (regression suite, critical path sign-off) and `devops-engineer` (build artifacts, CI verification); security-engineer is NOT spawned (no online features); network-programmer is NOT spawned (no multiplayer); both complete successfully
4. Phase 4: Verifies localization strings all translated; `analytics-engineer` verifies telemetry fires correctly on the release build; performance benchmarks pass; sign-off produced
5. Phase 5: Spawns `producer` via Task; collects sign-offs from qa-lead, release-manager, devops-engineer; no open blocking issues; producer declares GO; AskUserQuestion: user sees GO decision and confirms deployment
6. Phase 6: Spawns `release-manager` + `devops-engineer` (parallel); tags release in version control; invokes `/changelog`; deploys to staging; smoke test passes; deploys to production; simultaneously spawns `community-manager` to finalize patch notes via `/patch-notes v1.0.0` and prepare launch announcement
7. Phase 7: release-manager generates release report; producer updates milestone tracking; qa-lead begins monitoring for regressions; community-manager publishes communication; analytics-engineer confirms live dashboards healthy
8. Verdict: COMPLETE — release executed and deployed
**Assertions:**
- [ ] Phase 3 qa-lead and devops-engineer Task calls are issued simultaneously, not sequentially
- [ ] security-engineer is NOT spawned when the game has no online features, multiplayer, or player data
- [ ] Phase 5 producer collects sign-offs from all required parties before declaring GO
- [ ] Phase 6 deployment only begins after GO decision is confirmed by the user
- [ ] `/changelog` is invoked by release-manager in Phase 6 (not written directly)
- [ ] `/patch-notes v1.0.0` is invoked by community-manager in Phase 6
- [ ] Phase 7 monitoring plan includes a 48-hour post-release monitoring commitment
- [ ] Next steps recommend updating `production/stage.txt` to `Live` after successful deployment
- [ ] Verdict: COMPLETE appears in the final output
---
### Case 2: Go/No-Go: NO — S1 bug found in Phase 3, deployment skipped
**Fixture:**
- Release candidate branch exists for v0.9.0
- qa-lead discovers a previously unreported S1 crash in the main menu during Phase 3 regression testing
- devops-engineer build is clean and artifacts are ready
- producer is aware of the S1 bug
**Input:** `/team-release v0.9.0`
**Expected behavior:**
1. Phases 1-2 complete normally; release candidate is cut
2. Phase 3 (parallel): devops-engineer returns clean build sign-off; qa-lead returns with an S1 bug identified and regression suite failing; qa-lead declares quality gate: NOT PASSED
3. Orchestrator surfaces the qa-lead result immediately: "QA-LEAD: S1 bug found — [crash description]. Quality gate: NOT PASSED."
4. Phase 4 proceeds cautiously or is paused (AskUserQuestion: continue to Phase 4 or skip to Phase 5 for go/no-go?)
5. Phase 5: Spawns `producer` via Task; producer receives qa-lead's NOT PASSED verdict; no S1 sign-off available; producer declares NO-GO with rationale: "S1 bug [ID] is open and unresolved. Releasing is not safe."
6. AskUserQuestion: user is presented with the NO-GO decision and the S1 bug details; options: fix the bug and re-run, defer the release, or override (with documented rationale)
7. Phase 6 (Deployment) is SKIPPED entirely — no branch tagging, no deploy to staging, no deploy to production
8. community-manager is NOT spawned in Phase 6 (no deployment to announce)
9. Skill ends with a partial report summarizing what was completed (Phases 1-5) and what was skipped (Phase 6) and why
10. Verdict: BLOCKED — release not deployed
**Assertions:**
- [ ] qa-lead S1 bug finding is surfaced to the user immediately after Phase 3 completes — not suppressed until Phase 5
- [ ] producer's NO-GO decision explicitly references the S1 bug and the quality gate result
- [ ] Phase 6 Deployment is completely skipped when producer declares NO-GO
- [ ] community-manager is NOT spawned for patch notes or launch announcement on NO-GO
- [ ] The partial report clearly states which phases completed and which were skipped, with reasons
- [ ] Verdict: BLOCKED (not COMPLETE) when deployment is skipped due to NO-GO
- [ ] AskUserQuestion offers the user resolution options (fix and re-run / defer / override with rationale)
- [ ] Override path (if chosen) requires user to provide a documented rationale before proceeding to Phase 6
---
### Case 3: Security Audit for Online Game — security-engineer is spawned in Phase 3
**Fixture:**
- Game has multiplayer features and stores player account data
- Release candidate exists for v2.1.0
- qa-lead and devops-engineer both return clean sign-offs
- security-engineer audit is required per team composition rules
**Input:** `/team-release v2.1.0`
**Expected behavior:**
1. Phases 1-2 complete normally
2. Phase 3 (parallel): Orchestrator detects that the game has online/multiplayer features and player data; issues Task calls simultaneously for `qa-lead`, `devops-engineer`, AND `security-engineer`; also spawns `network-programmer` for netcode stability sign-off
3. security-engineer conducts pre-release security audit: reviews authentication flows, anti-cheat presence, data privacy compliance; returns sign-off
4. network-programmer verifies lag compensation, reconnect handling, and bandwidth under load; returns sign-off
5. All four Phase 3 agents complete; their results are collected before Phase 4 begins
6. Phase 5: producer collects sign-offs from all four Phase 3 agents (qa-lead, devops-engineer, security-engineer, network-programmer) before making the go/no-go call
7. Remaining phases proceed normally to COMPLETE
**Assertions:**
- [ ] security-engineer IS spawned in Phase 3 when the game has online features, multiplayer, or player data — this is not skipped
- [ ] network-programmer IS spawned in Phase 3 when the game has multiplayer
- [ ] All four Phase 3 Task calls (qa-lead, devops-engineer, security-engineer, network-programmer) are issued simultaneously
- [ ] security-engineer audit covers authentication, anti-cheat, and data privacy compliance
- [ ] Phase 5 producer sign-off collection includes security-engineer (four parties, not two)
- [ ] Phase 6 deployment does not begin until security-engineer has signed off
- [ ] Skill does NOT treat security-engineer as optional for a game with player data
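The conditional spawn rules asserted in this case reduce to a roster function. The feature-flag names are illustrative assumptions; the skill derives these facts from project context rather than booleans:

```python
# Hypothetical sketch of Phase 3 parallel-agent roster selection.
# Flag names (has_online, has_multiplayer, has_player_data) are illustrative.
def phase3_agents(has_online: bool, has_multiplayer: bool,
                  has_player_data: bool) -> list[str]:
    agents = ["qa-lead", "devops-engineer"]  # always spawned, in parallel
    if has_online or has_multiplayer or has_player_data:
        agents.append("security-engineer")  # mandatory here, never optional
    if has_multiplayer:
        agents.append("network-programmer")  # netcode stability sign-off
    return agents
```

Case 1 (single-player, no player data) yields the two-agent roster; this case (multiplayer plus player account data) yields all four, and Phase 5 collects sign-offs from exactly that roster.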
---
### Case 4: Localization Miss — Untranslated strings block the ship
**Fixture:**
- Release candidate exists for v1.2.0
- Phase 3 (qa-lead, devops-engineer) complete with clean sign-offs
- Phase 4: localization verification detects 47 untranslated strings in the French locale (a supported language in the game's localization scope)
- localization-lead is available as a delegatable agent
**Input:** `/team-release v1.2.0`
**Expected behavior:**
1. Phases 1-3 complete with clean sign-offs
2. Phase 4: Localization verification step detects untranslated strings; identifies 47 strings in French locale; localization-lead (if available) is spawned to assess the severity
3. Orchestrator surfaces: "LOCALIZATION MISS: 47 untranslated strings found in French locale. Localization sign-off is required before shipping."
4. AskUserQuestion: options presented — (a) Fix translations and re-run Phase 4, (b) Remove French locale from this release, (c) Ship as-is with a known issues note
5. If user selects (a): Phase 4 is re-run after translations are provided; skill waits for localization sign-off
6. Phase 5 go/no-go does NOT proceed while localization sign-off is outstanding
7. Ship is blocked (Phase 6 not entered) until localization issue is resolved or explicitly waived
**Assertions:**
- [ ] Localization verification in Phase 4 detects untranslated strings and counts them (not just "some strings missing")
- [ ] Untranslated strings for a supported locale block the pipeline before Phase 5
- [ ] AskUserQuestion is used to offer the user resolution choices — the skill does not auto-waive
- [ ] Phase 5 go/no-go is NOT called while localization sign-off is pending
- [ ] If user chooses to re-run Phase 4: the skill does not require restarting from Phase 1
- [ ] If user explicitly waives (ships as-is): the waiver is documented in the release report (Phase 7) as a known issue
- [ ] Skill does NOT fabricate translated strings to unblock itself
---
### Case 5: No Argument — Skill infers version or asks
**Fixture (variant A — milestone data present):**
- `production/milestones/` exists with a milestone file; most recent milestone is "v1.1.0 — Gold"
- `production/session-state/active.md` references a version or milestone
**Fixture (variant B — no discoverable version):**
- `production/milestones/` does not exist
- `production/session-state/active.md` does not reference a version
- No git tags are present from which to infer a version
**Input:** `/team-release` (no argument)
**Expected behavior (variant A):**
1. Phase 1: No argument provided; reads `production/session-state/active.md`; reads most recent milestone file in `production/milestones/`
2. Infers v1.1.0 as the target version; reports "No version argument provided — inferred v1.1.0 from milestone data. Proceeding."
3. Confirms with AskUserQuestion before beginning Phase 1 proper: "Releasing v1.1.0. Is this correct?"
4. Proceeds as if `/team-release v1.1.0` was the input
**Expected behavior (variant B):**
1. Phase 1: No argument provided; reads available state files — no version discoverable
2. Uses AskUserQuestion: "What version number should be released? (e.g., v1.0.0)"
3. Waits for user input before proceeding
**Assertions:**
- [ ] Skill does NOT default to a hardcoded version string when no argument is provided
- [ ] Skill reads `production/session-state/active.md` and milestone files before asking (variant A)
- [ ] Inferred version is confirmed with the user via AskUserQuestion before proceeding (variant A)
- [ ] When no version is discoverable, AskUserQuestion is used — skill does not guess (variant B)
- [ ] Skill does NOT error out when milestone files are absent — it falls back to asking (variant B)
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each phase transition gate (post-Phase 1, post-Phase 2, post-Phase 3/4 if issues, post-Phase 5 go/no-go)
- [ ] Phase 3 agents are always issued as parallel Task calls — qa-lead and devops-engineer are never sequential
- [ ] security-engineer is conditionally spawned based on game features — never silently skipped when features are present
- [ ] File Write Protocol: orchestrator never calls Write/Edit directly — all writes are delegated to sub-agents or sub-skills
- [ ] Phase 6 Deployment is strictly conditional on a GO verdict from Phase 5 — never auto-triggered
- [ ] Error recovery: any BLOCKED agent is surfaced immediately before continuing to dependent phases
- [ ] Partial reports are always produced if any phase fails or the pipeline is halted (Case 2)
- [ ] Verdict: COMPLETE only when deployment completes; BLOCKED when go/no-go is NO or a hard blocker is unresolved
- [ ] Next steps always include 48-hour post-release monitoring, `/retrospective` recommendation, and `production/stage.txt` update to `Live`
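The verdict rule in the list above reduces to a small decision function. This is a sketch of the rule only; the input names are assumptions about how the orchestrator would represent Phase 5/6 outcomes:

```python
def release_verdict(go_no_go: str, deployed: bool, blockers: list[str]) -> str:
    """COMPLETE only when deployment completes; BLOCKED on NO-GO or
    any unresolved hard blocker (illustrative inputs, not the skill's API)."""
    if go_no_go == "NO" or blockers:
        return "BLOCKED"
    return "COMPLETE" if deployed else "BLOCKED"
```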
---
## Coverage Notes
- Phase 7 post-release actions (release report, milestone tracking, community publishing, dashboard monitoring) are validated implicitly by Case 1. No separate edge case is required as Phase 7 is non-gated and does not have a blocking failure mode.
- The "devops-engineer build fails" path is not separately tested — it would surface as a BLOCKED result in Phase 3 and follow the standard error recovery protocol (surface → assess → AskUserQuestion options). This is validated structurally by the Static Assertions error recovery check.
- The parallel Phase 4 path (localization + performance + analytics simultaneously with Phase 3) is a documented option in the skill ("can run in parallel with Phase 3 if resources available"). Case 4 tests Phase 4 as a sequential gate; the parallel variant is left to the skill's implementation judgment.
- The `network-programmer` sign-off path for multiplayer is validated as part of Case 3 rather than a separate case, as it follows the same parallel-spawn pattern as security-engineer.
- The "override NO-GO with documented rationale" path in Case 2 is referenced but not exhaustively tested — it is an escape hatch that the skill must support, and its existence is validated by the AskUserQuestion options assertion in Case 2.

# Skill Test Spec: /team-ui
## Skill Summary
Orchestrates the UI team through the full UX pipeline for a single UI feature.
Coordinates ux-designer, ui-programmer, art-director, the engine UI specialist,
and accessibility-specialist through five structured phases: Context Gathering +
UX Spec (Phase 1a/1b) → UX Review Gate (Phase 1c) → Visual Design (Phase 2) →
Implementation (Phase 3) → Review in parallel (Phase 4) → Polish (Phase 5).
Uses `AskUserQuestion` at each phase transition. Delegates all file writes to
sub-agents and sub-skills (`/ux-design`, `ui-programmer`). Produces a summary report
with verdict COMPLETE / BLOCKED and handoffs to `/ux-review`, `/code-review`,
`/team-polish`.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (Phase 1a through Phase 5 are all present)
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents and sub-skills, orchestrator does not write files directly
- [ ] Has a next-step handoff at the end (references `/ux-review`, `/code-review`, `/team-polish`)
- [ ] Error Recovery Protocol section is present with all four recovery steps
- [ ] Uses `AskUserQuestion` at phase transitions for user approval before proceeding
- [ ] Phase 4 is explicitly marked as parallel (ux-designer, art-director, accessibility-specialist)
- [ ] UX Review Gate (Phase 1c) is defined as a blocking gate — skill must not proceed to Phase 2 without APPROVED verdict
- [ ] Team Composition lists all five roles (ux-designer, ui-programmer, art-director, engine UI specialist, accessibility-specialist)
- [ ] References the interaction pattern library (`design/ux/interaction-patterns.md`) — ui-programmer must use existing patterns
- [ ] Phase 1a reads `design/accessibility-requirements.md` before design begins
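Several of the structural checks above (frontmatter fields, phase-heading count, verdict keywords) are mechanical enough to sketch as a checker. The function below is an assumption about how `/skill-test static` could work, not its actual implementation:

```python
import re

REQUIRED_FRONTMATTER = [
    "name", "description", "argument-hint", "user-invocable", "allowed-tools",
]

def check_static(skill_md: str) -> list[str]:
    """Return failed static assertions for a skill markdown file (sketch)."""
    failures = []
    # YAML frontmatter is the block between the leading --- fences.
    fm = re.match(r"^---\n(.*?)\n---", skill_md, re.DOTALL)
    fields = set(re.findall(r"^([\w-]+):", fm.group(1), re.MULTILINE)) if fm else set()
    for field in REQUIRED_FRONTMATTER:
        if field not in fields:
            failures.append(f"missing frontmatter field: {field}")
    if len(re.findall(r"^#+ Phase ", skill_md, re.MULTILINE)) < 2:
        failures.append("fewer than 2 phase headings")
    for kw in ("COMPLETE", "BLOCKED"):
        if kw not in skill_md:
            failures.append(f"missing verdict keyword: {kw}")
    return failures
```

An empty list means the structural checks pass; each failure string names the assertion that did not hold.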
---
## Test Cases
### Case 1: Happy Path — Full pipeline from UX spec through polish succeeds
**Fixture:**
- `design/gdd/game-concept.md` exists with platform targets and intended audience
- `design/player-journey.md` exists
- `design/ux/interaction-patterns.md` exists with relevant patterns
- `design/accessibility-requirements.md` exists with committed tier (e.g., Enhanced)
- Engine UI specialist configured in `.claude/docs/technical-preferences.md`
**Input:** `/team-ui inventory screen`
**Expected behavior:**
1. Phase 1a — orchestrator reads game-concept.md, player-journey.md, relevant GDD UI sections, interaction-patterns.md, accessibility-requirements.md; summarizes a brief for the ux-designer
2. Phase 1b — `/ux-design inventory-screen` invoked (or ux-designer spawned directly); produces `design/ux/inventory-screen.md` using `ux-spec.md` template; `AskUserQuestion` confirms spec before review
3. Phase 1c — `/ux-review design/ux/inventory-screen.md` invoked; returns APPROVED; gate passed, proceed to Phase 2
4. Phase 2 — art-director spawned; reviews full UX spec (not only wireframes); applies visual treatment; verifies color contrast; produces visual design spec with asset manifest; `AskUserQuestion` confirms before Phase 3
5. Phase 3 — engine UI specialist spawned first (read from technical-preferences.md); produces implementation notes for ui-programmer; ui-programmer spawned with UX spec + visual spec + engine notes; implementation produced; interaction-patterns.md updated if new patterns introduced
6. Phase 4 — ux-designer, art-director, accessibility-specialist spawned in parallel; all three return results before Phase 5
7. Phase 5 — review feedback addressed; animations verified skippable; UI sounds confirmed through audio event system; interaction-patterns.md final check; verdict: COMPLETE
8. Summary report: UX spec APPROVED, visual design COMPLETE, implementation COMPLETE, accessibility COMPLIANT, all input methods supported, pattern library updated, verdict: COMPLETE
**Assertions:**
- [ ] Phase 1a reads all five sources before briefing ux-designer
- [ ] UX Review Gate checked before Phase 2 — Phase 2 does NOT begin until APPROVED
- [ ] Art-director in Phase 2 reviews full spec, not just wireframe images
- [ ] Engine UI specialist spawned before ui-programmer in Phase 3
- [ ] Phase 4 agents launched simultaneously (ux-designer, art-director, accessibility-specialist)
- [ ] All file writes delegated to sub-agents and sub-skills
- [ ] Verdict COMPLETE in final summary report
- [ ] Next steps include `/ux-review`, `/code-review`, `/team-polish`
---
### Case 2: UX Review Gate — Spec fails review; skill halts before implementation
**Fixture:**
- `design/ux/inventory-screen.md` produced by Phase 1b
- `/ux-review` returns verdict NEEDS REVISION with specific concerns flagged (e.g., gamepad navigation flow incomplete, contrast ratio below minimum)
**Input:** `/team-ui inventory screen`
**Expected behavior:**
1. Phase 1a + 1b complete — UX spec produced
2. Phase 1c — `/ux-review design/ux/inventory-screen.md` returns NEEDS REVISION
3. Skill does NOT advance to Phase 2
4. `AskUserQuestion` presented with the specific flagged concerns and options:
- (a) Return to ux-designer to address the issues and re-review
- (b) Accept the risk and proceed to Phase 2 anyway (conscious decision)
5. If user chooses (a): ux-designer revises spec, `/ux-review` re-run; loop continues until APPROVED or user overrides
6. If user chooses (b): skill proceeds with an explicit NEEDS REVISION note in the final report
7. Skill does NOT silently proceed past the gate
**Assertions:**
- [ ] Phase 2 does NOT begin while UX review verdict is NEEDS REVISION
- [ ] `AskUserQuestion` presents the specific flagged concerns before offering options
- [ ] User must make a conscious choice to override — skill does not assume override
- [ ] If user accepts risk, NEEDS REVISION concern is documented in the final report
- [ ] Revision-and-re-review loop is offered (not just a one-shot failure)
- [ ] Skill does NOT discard the produced UX spec on review failure
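The revise-and-re-review loop this case demands can be sketched as a gate driver. `review`, `revise`, and `ask_user` are stand-ins for `/ux-review`, the ux-designer revision pass, and `AskUserQuestion`; the function shape is illustrative:

```python
def run_ux_review_gate(review, revise, ask_user, max_rounds=5):
    """Drive the Phase 1c gate: loop until APPROVED or a conscious override.

    Returns (verdict, overridden). The skill never silently proceeds:
    every non-APPROVED round surfaces the concerns and asks the user.
    """
    for _ in range(max_rounds):
        verdict, concerns = review()
        if verdict == "APPROVED":
            return "APPROVED", False
        # Present the specific flagged concerns before offering options.
        choice = ask_user(concerns, options=["revise", "accept-risk"])
        if choice == "accept-risk":
            # Proceed, but the NEEDS REVISION note must reach the final report.
            return "NEEDS REVISION", True
        revise(concerns)
    return "NEEDS REVISION", False  # loop budget exhausted; still blocked
```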
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-ui` (no argument)
**Expected behavior:**
1. Skill detects no argument provided
2. Outputs usage message explaining the required argument (UI feature description)
3. Provides an example invocation: `/team-ui [UI feature description]`
4. Skill exits without spawning any subagents or reading any project files
**Assertions:**
- [ ] Skill does NOT spawn any subagents when no argument is given
- [ ] Usage message includes the argument-hint format from frontmatter
- [ ] At least one example of a valid invocation is shown
- [ ] No UX spec files or GDDs read before failing
- [ ] Verdict is NOT shown (pipeline never starts)
---
### Case 4: Accessibility Parallel Review — Phase 4 runs three streams simultaneously
**Fixture:**
- `design/ux/inventory-screen.md` exists (APPROVED)
- Visual design spec complete
- Implementation complete
- `design/accessibility-requirements.md` committed tier: Enhanced
**Input:** `/team-ui inventory screen` (resuming from Phase 3 complete)
**Expected behavior:**
1. Phase 4 begins after implementation is confirmed complete
2. Three Task calls issued simultaneously: ux-designer, art-director, accessibility-specialist
3. Each stream operates independently:
- ux-designer: verifies implementation matches wireframes, tests keyboard-only and gamepad-only navigation, checks accessibility features function
- art-director: verifies visual consistency with art bible at minimum and maximum supported resolutions
- accessibility-specialist: audits against the Enhanced accessibility tier in `design/accessibility-requirements.md`; any violation flagged as a blocker
4. Skill waits for all three results before proceeding to Phase 5
5. `AskUserQuestion` presents all three review results before Phase 5 begins
**Assertions:**
- [ ] All three Task calls issued before any result is awaited (parallel, not sequential)
- [ ] Phase 5 does NOT begin until all three Phase 4 agents have returned
- [ ] Accessibility-specialist explicitly reads `design/accessibility-requirements.md` for the committed tier
- [ ] Accessibility violations flagged as BLOCKING (not merely advisory)
- [ ] `AskUserQuestion` shows all three review streams' results together before Phase 5 approval
- [ ] No Phase 4 agent's output is used as input for another Phase 4 agent
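The parallel-not-sequential requirement can be sketched with an executor: all three submissions happen before any result is awaited. `spawn_agent` stands in for a Task call; the runner itself is an illustration, not the orchestrator:

```python
from concurrent.futures import ThreadPoolExecutor

PHASE_4_AGENTS = ["ux-designer", "art-director", "accessibility-specialist"]

def run_phase_4(spawn_agent):
    """Issue all three Phase 4 reviews up front, then gather results.

    Phase 5 must not begin until every stream has returned, and no
    stream's output feeds another stream.
    """
    with ThreadPoolExecutor(max_workers=len(PHASE_4_AGENTS)) as pool:
        # Submit everything first — this is the "parallel" assertion.
        futures = {name: pool.submit(spawn_agent, name) for name in PHASE_4_AGENTS}
        results = {name: f.result() for name, f in futures.items()}
    return results
```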
---
### Case 5: Missing Interaction Pattern Library — Skill notes the gap rather than inventing patterns
**Fixture:**
- `design/ux/interaction-patterns.md` does NOT exist
- All other required files present
**Input:** `/team-ui settings menu`
**Expected behavior:**
1. Phase 1a — orchestrator attempts to read `design/ux/interaction-patterns.md`; file not found
2. Skill surfaces the gap: "interaction-patterns.md does not exist — no existing patterns to reuse"
3. `AskUserQuestion` presented with options:
- (a) Run `/ux-design patterns` first to establish the pattern library, then continue
- (b) Proceed without the pattern library — ux-designer will document new patterns as they are created
4. Skill does NOT invent or assume patterns from other sources
5. If user chooses (b): ui-programmer is explicitly instructed to treat all patterns created as new and to add each to a new `design/ux/interaction-patterns.md` at completion
6. Final report notes that interaction-patterns.md was created (or is still absent if user skipped)
**Assertions:**
- [ ] Skill does NOT silently ignore the missing pattern library
- [ ] Skill does NOT invent patterns by guessing from the feature name or GDD alone
- [ ] `AskUserQuestion` offers a "create pattern library first" option (referencing `/ux-design patterns`)
- [ ] If user proceeds without the library, ui-programmer is told to treat all patterns as new
- [ ] Final report documents pattern library status (created / absent / updated)
- [ ] Skill does NOT fail entirely — the gap is noted and user is given a choice
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each phase transition — user approves before pipeline advances
- [ ] UX Review Gate (Phase 1c) is blocking — Phase 2 cannot begin without APPROVED or explicit user override
- [ ] All file writes delegated to sub-agents and sub-skills — orchestrator does not call Write or Edit directly
- [ ] Phase 4 agents launched in parallel per skill spec
- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
- [ ] Partial report always produced even when agents are BLOCKED
- [ ] Verdict is one of COMPLETE / BLOCKED
- [ ] Next steps present at end: `/ux-review`, `/code-review`, `/team-polish`
---
## Coverage Notes
- The HUD-specific path (`/ux-design hud` + `hud-design.md` template + visual budget check in Phase 5)
is not separately tested here; it shares the same phase structure but uses different templates.
- The "Update in place" path for interaction-patterns.md (new pattern added during implementation)
is exercised implicitly in Case 1 Step 5 — a dedicated fixture with a known new pattern would
strengthen coverage.
- Engine UI specialist unavailable (no engine configured): the skill spec states "skip if no engine
  configured". Case 1 exercises only the configured-engine path; the skip path has no dedicated fixture.
- The NEEDS REVISION acceptance-risk override (Case 2 option b) requires the override to be
explicitly documented in the report; this is asserted but not further tested for downstream effects.

# Skill Test Spec: /adopt
## Skill Summary
`/adopt` audits an existing project's artifacts — GDDs, ADRs, stories, infrastructure
files, and `technical-preferences.md` — for format compliance with the template's
skill pipeline. It classifies every gap by severity (BLOCKING / HIGH / MEDIUM / LOW),
composes a numbered, ordered migration plan, and writes it to `docs/adoption-plan-[date].md`
after explicit user approval via `AskUserQuestion`.
This skill is distinct from `/project-stage-detect` (which checks what exists).
`/adopt` checks whether what exists will actually work with the template's skills.
No director gates apply. The skill does NOT invoke any director agents.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains severity tier keywords: BLOCKING, HIGH, MEDIUM, LOW
- [ ] Contains "May I write" or `AskUserQuestion` language before writing the adoption plan
- [ ] Has a next-step handoff at the end (e.g., offering to fix the highest-priority gap immediately)
---
## Director Gate Checks
None. `/adopt` is a brownfield audit utility. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — All GDDs compliant, no gaps, COMPLIANT
**Fixture:**
- `design/gdd/` contains 3 GDD files; each has all 8 required sections with content
- `docs/architecture/adr-0001.md` exists with `## Status`, `## Engine Compatibility`,
and all other required sections
- `production/stage.txt` exists
- `docs/architecture/tr-registry.yaml` and `docs/architecture/control-manifest.md` exist
- Engine configured in `technical-preferences.md`
**Input:** `/adopt`
**Expected behavior:**
1. Skill emits "Scanning project artifacts..." then reads all artifacts silently
2. Reports detected phase, GDD count, ADR count, story count
3. Phase 2 audit: all 3 GDDs have all 8 sections, Status field present and valid
4. ADR audit: all required sections present
5. Infrastructure audit: all critical files exist
6. Phase 3: zero BLOCKING, zero HIGH, zero MEDIUM, zero LOW gaps
7. Summary reports: "No blocking gaps — this project is template-compatible"
8. Uses `AskUserQuestion` to ask about writing the plan; user selects write
9. Adoption plan is written to `docs/adoption-plan-[date].md`
10. Phase 7 offers next action: no blocking gaps, offers options for next steps
**Assertions:**
- [ ] Skill reads silently before presenting any output
- [ ] "Scanning project artifacts..." appears before the silent read phase
- [ ] Gap counts show 0 BLOCKING, 0 HIGH, 0 MEDIUM (or only LOW)
- [ ] `AskUserQuestion` is used before writing the adoption plan
- [ ] Adoption plan file is written to `docs/adoption-plan-[date].md`
- [ ] Phase 7 offers a specific next action (not just a list)
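The per-GDD section audit behind this case amounts to a heading scan. The section list below is an illustrative subset (only the sections this spec names); the template's actual 8-section list governs:

```python
import re

# Illustrative subset of the required sections; not the authoritative list.
REQUIRED_GDD_SECTIONS = [
    "Acceptance Criteria", "Formulas", "Tuning Knobs", "Edge Cases",
]

def audit_gdd(gdd_md: str) -> list[str]:
    """Return the required sections missing from a GDD, by `## Heading` scan."""
    present = set(re.findall(r"^##\s+(.+?)\s*$", gdd_md, re.MULTILINE))
    return [s for s in REQUIRED_GDD_SECTIONS if s not in present]
```

An empty return for every GDD corresponds to this case's zero-gap outcome; any non-empty return feeds the Phase 3 severity classification.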
---
### Case 2: Non-Compliant Documents — GDDs missing sections, NEEDS MIGRATION
**Fixture:**
- `design/gdd/` contains 2 GDD files:
- `combat.md` — missing `## Acceptance Criteria` and `## Formulas` sections
- `movement.md` — all 8 sections present
- One ADR (`adr-0001.md`) is missing `## Status` section
- `docs/architecture/tr-registry.yaml` does not exist
**Input:** `/adopt`
**Expected behavior:**
1. Skill scans all artifacts
2. Phase 2 audit finds:
- `combat.md`: 2 missing sections (Acceptance Criteria, Formulas)
- `adr-0001.md`: missing `## Status` — BLOCKING impact
- `tr-registry.yaml`: missing — HIGH impact
3. Phase 3 classifies:
- BLOCKING: `adr-0001.md` missing `## Status` (story-readiness silently passes)
- HIGH: `tr-registry.yaml` missing; `combat.md` missing Acceptance Criteria (can't generate stories)
- MEDIUM: `combat.md` missing Formulas
4. Phase 4 builds ordered migration plan:
- Step 1 (BLOCKING): Add `## Status` to `adr-0001.md` — command: `/architecture-decision retrofit`
- Step 2 (HIGH): Run `/architecture-review` to bootstrap tr-registry.yaml
- Step 3 (HIGH): Add Acceptance Criteria to `combat.md` — command: `/design-system retrofit`
- Step 4 (MEDIUM): Add Formulas to `combat.md`
5. Gap Preview shows BLOCKING items as bullets (actual file names), HIGH/MEDIUM as counts
6. `AskUserQuestion` asks to write the plan; writes after approval
7. Phase 7 offers to fix the highest-priority gap (ADR Status) immediately
**Assertions:**
- [ ] BLOCKING gaps are listed as explicit file-name bullets in the Gap Preview
- [ ] HIGH and MEDIUM shown as counts in Gap Preview
- [ ] Migration plan items are in BLOCKING-first order
- [ ] Each plan item includes the fix command or manual steps
- [ ] `AskUserQuestion` is used before writing
- [ ] Phase 7 offers to immediately retrofit the first BLOCKING item
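The ordering and preview rules this case asserts can be sketched directly. Gap tuples of `(severity, description, fix)` are an assumed internal shape, not the skill's actual data model:

```python
SEVERITY_ORDER = {"BLOCKING": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def order_migration_plan(gaps):
    """Sort (severity, description, fix) tuples BLOCKING-first.

    sorted() is stable, so the audit's discovery order survives within a tier.
    """
    return sorted(gaps, key=lambda g: SEVERITY_ORDER[g[0]])

def gap_preview(gaps):
    """BLOCKING items as explicit file-name bullets; other tiers as counts."""
    lines = [f"- {desc}" for sev, desc, _ in gaps if sev == "BLOCKING"]
    for tier in ("HIGH", "MEDIUM", "LOW"):
        n = sum(1 for sev, _, _ in gaps if sev == tier)
        if n:
            lines.append(f"{tier}: {n} gap(s)")
    return "\n".join(lines)
```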
---
### Case 3: Mixed State — Some docs compliant, some not, partial report
**Fixture:**
- 4 GDD files: 2 fully compliant, 2 with gaps (one missing Tuning Knobs, one missing Edge Cases)
- ADRs: 3 files — 2 compliant, 1 missing `## ADR Dependencies`
- Stories: 5 files — 3 have TR-ID references, 2 do not
- Infrastructure: all critical files present; `technical-preferences.md` fully configured
**Input:** `/adopt`
**Expected behavior:**
1. Skill audits all artifact types
2. Audit summary shows totals: "4 GDDs (2 fully compliant, 2 with gaps); 3 ADRs
(2 fully compliant, 1 with gaps); 5 stories (3 with TR-IDs, 2 without)"
3. Gap classification:
- No BLOCKING gaps
- HIGH: 1 ADR missing `## ADR Dependencies`
- MEDIUM: 2 GDDs with missing sections; 2 stories missing TR-IDs
- LOW: none
4. Migration plan lists HIGH gap first, then MEDIUM gaps in order
5. Note included: "Existing stories continue to work — do not regenerate stories
that are in progress or done"
6. `AskUserQuestion` to write plan; writes after approval
**Assertions:**
- [ ] Per-artifact compliance tallies are shown (N compliant, M with gaps)
- [ ] Existing story compatibility note is included in the plan
- [ ] No BLOCKING gaps results in no BLOCKING section in migration plan
- [ ] HIGH gap precedes MEDIUM gaps in plan ordering
- [ ] `AskUserQuestion` is used before writing
---
### Case 4: No Artifacts Found — Fresh project, guidance to run /start
**Fixture:**
- Repository has no files in `design/gdd/`, `docs/architecture/`, `production/epics/`
- `production/stage.txt` does not exist
- `src/` directory does not exist or has fewer than 10 files
- No game-concept.md, no systems-index.md
**Input:** `/adopt`
**Expected behavior:**
1. Phase 1 existence check finds no artifacts
2. Skill infers "Fresh" — no brownfield work to migrate
3. Uses `AskUserQuestion`:
- "This looks like a fresh project — no existing artifacts found. `/adopt` is for
projects with work to migrate. What would you like to do?"
- Options: "Run `/start`", "My artifacts are in a non-standard location", "Cancel"
4. Skill stops — does not proceed to audit regardless of user selection
**Assertions:**
- [ ] `AskUserQuestion` is used (not a plain text message) when no artifacts are found
- [ ] `/start` is presented as a named option
- [ ] Skill stops after the question — no audit phases run
- [ ] No adoption plan file is written
---
### Case 5: Director Gate Check — No gate; adopt is a utility audit skill
**Fixture:**
- Project with a mix of compliant and non-compliant GDDs
**Input:** `/adopt`
**Expected behavior:**
1. Skill completes full audit and produces migration plan
2. No director agents are spawned at any point
3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in output
4. No `/gate-check` is invoked during the skill run
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Skill reaches plan-writing or cancellation without any gate verdict
---
## Protocol Compliance
- [ ] Emits "Scanning project artifacts..." before silent read phase
- [ ] Reads all artifacts silently before presenting any results
- [ ] Shows Adoption Audit Summary and Gap Preview before asking to write
- [ ] Uses `AskUserQuestion` before writing the adoption plan file
- [ ] Adoption plan written to `docs/adoption-plan-[date].md` — not to any other path
- [ ] Migration plan items ordered: BLOCKING first, HIGH second, MEDIUM third, LOW last
- [ ] Phase 7 always offers a single specific next action (not a generic list)
- [ ] Never regenerates existing artifacts — only fills gaps in what exists
- [ ] Does not invoke director gates at any point
---
## Coverage Notes
- The `gdds`, `adrs`, `stories`, and `infra` argument modes narrow the audit scope;
each follows the same pattern as the full audit but limited to that artifact type.
Not separately fixture-tested here.
- The systems-index.md parenthetical status value check (BLOCKING) is a special case
that triggers an immediate fix offer before writing the plan; not separately tested.
- The review-mode.txt prompt (Phase 6b) runs after plan writing if `production/review-mode.txt`
does not exist; not separately tested here.
