Add claude code game studios to the project

This commit is contained in:
panw
2026-05-15 14:52:29 +08:00
parent dff559462d
commit a16fe4bff7
415 changed files with 78609 additions and 0 deletions


@@ -0,0 +1,84 @@
# Agent Test Spec: audio-director
## Agent Summary
**Domain owned:** Music direction and palette, sound design philosophy, audio implementation strategy, mix balance, audio aspects of phase gates.
**Does NOT own:** Visual design (art-director), code implementation (lead-programmer), narrative story content (narrative-director), UX interaction flows (ux-designer).
**Model tier:** Sonnet (individual system analysis — audio direction and spec review).
**Gate IDs handled:** AD-VISUAL (audio aspect of the phase gate; may be referenced as part of AD-PHASE-GATE in the audio dimension).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/audio-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references music direction, sound design, mix, audio implementation — not generic)
- [ ] `allowed-tools:` list is read-focused; no Bash unless audio asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual design, code implementation, or narrative content
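The structural checks above lend themselves to automation. A minimal sketch in Python, assuming the agent file uses standard `---`-delimited YAML-style frontmatter; the `model:` and `allowed-tools:` field names mirror the checklist but the parsing itself is illustrative, not the project's actual test harness:

```python
import re

def check_frontmatter(text: str) -> list[str]:
    """Return a list of failed structural assertions for an agent definition."""
    failures = []
    # Extract the frontmatter block between the first pair of --- delimiters.
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return ["no frontmatter block found"]
    fm = m.group(1)
    if not re.search(r"^description:\s*\S+", fm, re.MULTILINE):
        failures.append("description: field missing or empty")
    if not re.search(r"^model:\s*claude-sonnet-4-6\s*$", fm, re.MULTILINE):
        failures.append("model tier is not claude-sonnet-4-6")
    if re.search(r"^allowed-tools:.*\bBash\b", fm, re.MULTILINE):
        failures.append("Bash present in allowed-tools (needs justification)")
    return failures

# Hypothetical passing example (field values invented for illustration).
sample = """---
description: Music direction, sound design, mix balance review
model: claude-sonnet-4-6
allowed-tools: Read, Grep, Glob
---
# audio-director
"""
print(check_frontmatter(sample))  # []
```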
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An audio specification document is submitted for the game's "Exploration" music layer. The spec defines a generative ambient system using layered stems that shift based on environmental density, designed to reinforce the pillar "lived-in world." The tone palette (sparse, organic, slightly melancholic) matches the established design pillars.
**Expected:** Returns `APPROVED` with rationale confirming the stem-based approach supports dynamic responsiveness and the tone palette aligns with the pillar vocabulary.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references the specific pillar ("lived-in world") and how the audio spec supports it
- [ ] Output stays within audio scope — does not comment on visual design of the environment or UI layout
- [ ] Verdict is clearly labeled with context (e.g., "Audio Spec Review: APPROVED")
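The generative stem system described in this scenario can be pictured as a density-to-gain mapping. A hypothetical sketch, assuming three layered stems crossfaded by a normalized environmental-density value; the stem names and curve shapes are invented for illustration, not taken from the spec:

```python
def stem_gains(density: float) -> dict[str, float]:
    """Map normalized environmental density (0..1) to per-stem gains.

    Sparse areas favor the ambient drone; denser areas bring in
    organic texture and, above a threshold, the melodic layer.
    """
    d = max(0.0, min(1.0, density))
    return {
        "drone":   1.0 - 0.5 * d,           # always present, recedes with density
        "texture": d,                        # fades in linearly
        "melody":  max(0.0, d - 0.5) * 2.0,  # only audible above 50% density
    }

print(stem_gains(0.0))  # {'drone': 1.0, 'texture': 0.0, 'melody': 0.0}
print(stem_gains(1.0))  # {'drone': 0.5, 'texture': 1.0, 'melody': 1.0}
```

A reviewer can use a mapping like this to verify the spec actually defines behavior at the density extremes, not just the midrange.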
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks audio-director to evaluate whether the UI flow for the audio settings menu (the sequence of screens and options) is intuitive and well-organized.
**Expected:** Agent declines to evaluate UI interaction flow and redirects to ux-designer.
**Assertions:**
- [ ] Does not make any binding decision about UI flow or information architecture
- [ ] Explicitly names `ux-designer` as the correct handler
- [ ] May note audio-specific requirements for the settings menu (e.g., "must include separate master, music, and SFX sliders"), but defers flow and layout decisions to ux-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A music cue for the final boss encounter is submitted. The cue is an upbeat, major-key orchestral piece with fast tempo. The game pillars and narrative context for this encounter specify "dread, inevitability, and tragic sacrifice." The audio cue's emotional register directly contradicts the intended emotional beat.
**Expected:** Returns `NEEDS REVISION` with specific citation of the emotional mismatch: the cue's upbeat/major-key/fast-tempo characteristics versus the intended dread/inevitability/sacrifice emotional targets from the pillars and narrative context.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific musical characteristics that conflict with the emotional targets
- [ ] References the specific emotional targets from the game pillars or narrative context
- [ ] Provides actionable direction for revision (e.g., "shift to minor key, slower tempo, reduce ensemble density")
### Case 4: Conflict escalation — correct parent
**Scenario:** sound-designer proposes implementing audio occlusion using real-time raycast-based physics queries (technical approach). technical-artist argues this is too expensive and proposes a zone-based trigger system instead. Both agree the occlusion effect is desirable; the conflict is purely about implementation approach.
**Expected:** audio-director decides on the desired audio behavior (what occlusion should sound like and when it should activate), then defers the implementation approach decision to technical-artist or lead-programmer as the implementation experts. audio-director does not make the technical implementation choice.
**Assertions:**
- [ ] Defines the desired audio behavior clearly (what should the player hear and when)
- [ ] Explicitly defers the implementation approach (raycast vs. zone-trigger) to `lead-programmer` or `technical-artist`
- [ ] Does not unilaterally choose the technical implementation method
- [ ] Frames the handoff clearly: "audio-director owns what, technical lead owns how"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." A sound design spec for ambient environmental audio is submitted.
**Expected:** Assessment evaluates the ambient audio spec against all three pillars specifically — how does the audio support (or undermine) each pillar? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the audio spec's contribution to each pillar explicitly
- [ ] Does not generate generic audio direction advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar is not supported by the current audio spec and flags it
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared audio domain
- [ ] Defers implementation approach decisions to technical leads
- [ ] Does not use gate ID prefix format in the same way as director-tier agents (audio-director uses APPROVED / NEEDS REVISION inline, but should still reference the gate context)
- [ ] Does not make binding visual design, UX, narrative, or code implementation decisions
---
## Coverage Notes
- Mix balance review (relative levels between music, SFX, and dialogue) is not covered — a dedicated case should be added.
- Audio implementation strategy review (middleware choice, streaming approach) is not covered.
- Interaction between audio-director and the audio specialist agent (if one exists) for implementation delegation is not covered.
- Localization audio implications (VO recording direction, language-specific music timing) are not covered.



@@ -0,0 +1,84 @@
# Agent Test Spec: game-designer
## Agent Summary
**Domain owned:** Core loop design, progression systems, combat mechanics rules, economy design, player-facing rules and interactions.
**Does NOT own:** Code implementation (lead-programmer / gameplay-programmer), visual art (art-director), narrative lore and story (narrative-director — coordinates with), balance formula math (systems-designer — collaborates with).
**Model tier:** Sonnet (individual system design authoring and review).
**Gate IDs handled:** Design review verdicts on mechanic specs (no named gate ID prefix — uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/game-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references core loop, progression, combat rules, economy, player-facing design — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for GDDs and design docs; no Bash unless design tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over code implementation, visual art style, or standalone narrative lore decisions
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A mechanic spec for a "Stamina-Based Dodge" system is submitted for review. The spec defines: the player has a stamina pool (100 units), each dodge costs 25 stamina, stamina regenerates at 20 units/second when not dodging, and the dodge grants 0.3 seconds of invincibility. The core loop interaction is clearly described, rules are unambiguous, and edge cases (stamina at 0, dodge during regen) are addressed.
**Expected:** Returns `APPROVED` with rationale confirming the core loop clarity, unambiguous rules, and edge case coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references specific design quality criteria (clear rules, edge case coverage, core loop coherence)
- [ ] Output stays within design scope — does not comment on how to implement it in code or what art assets it requires
- [ ] Verdict is clearly labeled with context (e.g., "Mechanic Spec Review: APPROVED")
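The spec's numbers imply a burst capacity and a sustainable dodge cadence that a reviewer can sanity-check directly. A quick worked check using the values from the scenario (100-unit pool, 25 per dodge, 20 units/s regen), ignoring the brief regen pause during the dodge itself:

```python
POOL, COST, REGEN = 100.0, 25.0, 20.0

# Burst capacity: dodges available from a full pool before regen matters.
burst = POOL // COST  # 4 dodges back-to-back

# Steady state: regen must cover the cost, so the long-run maximum
# rate is one dodge every COST / REGEN seconds.
sustained_interval = COST / REGEN  # 1.25 s between dodges

print(int(burst), sustained_interval)  # 4 1.25
```

Checking derived figures like these is one way the rationale can demonstrate "core loop coherence" rather than restating the spec.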
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks game-designer to write the in-world lore explanation for why the stamina system exists (e.g., the narrative reason characters have stamina limits in the game world).
**Expected:** Agent declines to write narrative/lore content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write narrative or lore content
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the design intent that the lore should support (e.g., "the stamina system should reinforce the physical realism theme"), but defers the writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A mechanic spec for "Environmental Hazard Damage" is submitted. The spec defines three hazard types (fire, acid, electricity) but does not specify what happens when a player is simultaneously affected by multiple hazard types, what happens when a hazard is applied during the invincibility window from a dodge, or what the damage frequency is (per-second, per-tick, on-enter).
**Expected:** Returns `NEEDS REVISION` with specific identification of the undefined edge cases: multi-hazard interaction, hazard-during-invincibility, and damage frequency specification.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific missing edge cases by name
- [ ] Does not reject the entire mechanic — identifies the specific gaps to fill
- [ ] Provides actionable guidance on what to define (not how to implement it)
### Case 4: Conflict escalation — correct parent
**Scenario:** systems-designer proposes a damage formula with 6 variables and complex scaling interactions, arguing it produces the best tuning granularity. game-designer believes the formula is too complex for players to intuit and wants a simpler 2-variable version.
**Expected:** game-designer owns the conceptual rule and player experience intention ("the damage should feel understandable to players"), but defers the formula granularity question to systems-designer. If the disagreement cannot be resolved between them (one wants complex, one wants simple), escalate to creative-director for a player experience ruling.
**Assertions:**
- [ ] Clearly states the player experience intention (intuitive damage, player agency)
- [ ] Defers formula granularity decisions to `systems-designer`
- [ ] Escalates unresolved disagreement to `creative-director` for a player-experience arbitration ruling
- [ ] Does not unilaterally impose a formula structure on systems-designer
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "player authorship," "consequence permanence," and "world responsiveness." A new mechanic spec for "permadeath with legacy bonuses" is submitted for review.
**Expected:** Assessment evaluates the mechanic against all three provided pillars — how does permadeath support player authorship, how do legacy bonuses express consequence permanence, and how does the world respond to a player's death? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the mechanic's contribution to each pillar explicitly
- [ ] Does not generate generic game design advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar creates a tension with the mechanic and flags it with a specific concern
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared game design domain
- [ ] Escalates design-vs-formula conflicts to creative-director when unresolved
- [ ] Does not make binding code implementation, visual art, or standalone lore decisions
- [ ] Provides actionable design feedback, not implementation prescriptions
---
## Coverage Notes
- Economy design review (resource sinks, faucets, inflation prevention) is not covered — a dedicated case should be added.
- Progression system review (XP curves, unlock gates, player power trajectory) is not covered.
- Core loop validation across multiple interconnected systems (not just a single mechanic) is not covered — deferred to /review-all-gdds integration.
- Coordination protocol with systems-designer on formula ownership boundary could benefit from additional cases.


@@ -0,0 +1,85 @@
# Agent Test Spec: lead-programmer
## Agent Summary
**Domain owned:** Code architecture decisions, LP-FEASIBILITY gate, LP-CODE-REVIEW gate, coding standards enforcement, tech stack decisions within the approved engine.
**Does NOT own:** Game design decisions (game-designer), creative direction (creative-director), production scheduling (producer), visual art direction (art-director).
**Model tier:** Sonnet (implementation-level analysis of individual systems).
**Gate IDs handled:** LP-FEASIBILITY, LP-CODE-REVIEW.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/lead-programmer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references code architecture, feasibility, code review, coding standards — not generic)
- [ ] `allowed-tools:` list includes Read for source files; Bash may be included for static analysis or test runs; no write access outside `src/` without explicit delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over game design, creative direction, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new `CombatSystem` implementation is submitted for code review. The system uses dependency injection for all external references, has doc comments on all public APIs, follows the project's naming conventions, and includes unit tests for all public methods. Request is tagged LP-CODE-REVIEW.
**Expected:** Returns `LP-CODE-REVIEW: APPROVED` with rationale confirming dependency injection usage, doc comment coverage, naming convention compliance, and test coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS CHANGES
- [ ] Verdict token is formatted as `LP-CODE-REVIEW: APPROVED`
- [ ] Rationale references specific coding standards criteria (DI, doc comments, naming, tests)
- [ ] Output stays within code quality scope — does not comment on whether the mechanic is fun or fits creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks lead-programmer to review and approve the balance formula for player damage scaling across levels, checking whether the numbers "feel right."
**Expected:** Agent declines to evaluate design balance and redirects to systems-designer.
**Assertions:**
- [ ] Does not make any binding assessment of formula balance or game feel
- [ ] Explicitly names `systems-designer` as the correct handler
- [ ] May note code implementation concerns about the formula (e.g., integer overflow risk at max level), but defers all balance evaluation to systems-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed pathfinding approach for enemy AI uses a brute-force nearest-neighbor search against all other entities every frame. With expected enemy counts of 200+, this is O(n²) per frame at 60fps. Request is tagged LP-FEASIBILITY.
**Expected:** Returns `LP-FEASIBILITY: INFEASIBLE` with specific citation of the O(n²) complexity, the entity count threshold, and the resulting per-frame cost against the target frame budget.
**Assertions:**
- [ ] Verdict is exactly one of FEASIBLE / CONCERNS / INFEASIBLE — not freeform text
- [ ] Verdict token is formatted as `LP-FEASIBILITY: INFEASIBLE`
- [ ] Rationale includes the specific algorithmic complexity and entity count numbers
- [ ] Suggests at least one alternative approach (e.g., spatial hashing, KD-tree) without mandating a choice
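The infeasibility claim in this case is straightforward arithmetic, and the expected rationale should surface numbers like these. A quick estimate using the scenario's entity count; the ~10-candidates-per-query figure for spatial hashing is an assumed ballpark, not a measured value:

```python
entities = 200
fps = 60

# Brute-force nearest neighbor: each entity scans all others, every frame.
checks_per_frame = entities * (entities - 1)   # 39,800
checks_per_second = checks_per_frame * fps     # ~2.4 million

# Spatial hashing reduces each query to a handful of neighboring cells;
# assume ~10 candidates per query instead of 199.
hashed_per_frame = entities * 10               # 2,000

print(checks_per_frame, checks_per_second, hashed_per_frame)
# 39800 2388000 2000
```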
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a mechanic where every NPC maintains a full simulation of needs, schedule, and memory (similar to a full life-sim AI). lead-programmer calculates this will exceed the frame budget by 3x at target NPC counts. game-designer insists the mechanic is core to the game vision.
**Expected:** lead-programmer states the specific frame budget violation with numbers, proposes alternative approaches (e.g., LOD-based simulation, simplified need model), but explicitly defers the "is this worth the cost or should the design change" decision to creative-director as the creative arbiter.
**Assertions:**
- [ ] States the specific frame budget violation (e.g., 3x over budget at N entities)
- [ ] Proposes at least one technically viable alternative
- [ ] Explicitly defers the design priority decision to `creative-director`
- [ ] Does not unilaterally cut or modify the mechanic design
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the project's frame budget: 16.67ms total per frame, with 4ms allocated to AI systems. A new AI behavior system is submitted that profiling estimates will consume 7ms per frame under normal conditions.
**Expected:** Assessment references the specific frame budget allocation from context (4ms AI budget), identifies the 7ms estimate as exceeding the allocation by 3ms, and returns CONCERNS or INFEASIBLE with those specific numbers cited.
**Assertions:**
- [ ] References the specific frame budget figures from the provided context (16.67ms total, 4ms AI allocation)
- [ ] Uses the specific 7ms estimate from the submission in the comparison
- [ ] Does not give generic "this might be slow" advice — cites concrete numbers
- [ ] Verdict rationale is traceable to the provided budget constraints
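The expected verdict logic reduces to a comparison against the allocation provided in the gate context. A sketch of that mapping; the 25%-over threshold separating CONCERNS from INFEASIBLE is illustrative, since the spec only requires that the cited numbers drive the verdict:

```python
def feasibility_verdict(estimate_ms: float, budget_ms: float) -> str:
    """Map a profiled estimate against the context's budget allocation."""
    if estimate_ms <= budget_ms:
        return "LP-FEASIBILITY: FEASIBLE"
    # Mildly over budget: flag concerns; well over: infeasible as proposed.
    if estimate_ms <= budget_ms * 1.25:
        return "LP-FEASIBILITY: CONCERNS"
    return "LP-FEASIBILITY: INFEASIBLE"

# Scenario numbers: 4 ms AI allocation, 7 ms profiled estimate.
print(feasibility_verdict(7.0, 4.0))  # LP-FEASIBILITY: INFEASIBLE
```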
---
## Protocol Compliance
- [ ] Returns LP-CODE-REVIEW verdicts using APPROVED / NEEDS CHANGES vocabulary only
- [ ] Returns LP-FEASIBILITY verdicts using FEASIBLE / CONCERNS / INFEASIBLE vocabulary only
- [ ] Stays within declared code architecture domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `LP-FEASIBILITY: INFEASIBLE`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
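The two verdict vocabularies above are easy to enforce mechanically. A sketch of a token validator for this agent's gates, assuming verdicts appear on their own line in the `GATE-ID: VERDICT` format shown in the test cases:

```python
import re

VERDICT_RE = re.compile(
    r"^(LP-CODE-REVIEW: (APPROVED|NEEDS CHANGES)"
    r"|LP-FEASIBILITY: (FEASIBLE|CONCERNS|INFEASIBLE))$"
)

def valid_verdict(line: str) -> bool:
    """True only for a well-formed gate verdict token from this agent."""
    return VERDICT_RE.match(line.strip()) is not None

print(valid_verdict("LP-FEASIBILITY: INFEASIBLE"))      # True
print(valid_verdict("LP-CODE-REVIEW: NEEDS REVISION"))  # False (wrong vocabulary)
```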
---
## Coverage Notes
- Multi-file code review spanning several interdependent systems is not covered — deferred to integration tests.
- Tech debt assessment and prioritization are not covered here — deferred to /tech-debt skill integration.
- Coding standards document updates (adding a new forbidden pattern) are not covered.
- Interaction with qa-lead on what constitutes a testable unit (LP vs QL boundary) is not covered.


@@ -0,0 +1,85 @@
# Agent Test Spec: level-designer
## Agent Summary
**Domain owned:** Level layouts, encounter design, pacing and tension arc, environmental storytelling, spatial puzzles.
**Does NOT own:** Narrative dialogue (writer / narrative-director), visual art style (art-director), code implementation (lead-programmer / ai-programmer), enemy AI behavior logic (ai-programmer / gameplay-programmer).
**Model tier:** Sonnet (individual system analysis — level design review and encounter assessment).
**Gate IDs handled:** Level design review verdicts (uses APPROVED / REVISION NEEDED vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/level-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references level layout, encounter design, pacing, environmental storytelling — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for level design documents and GDDs; no Bash unless level tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative dialogue, AI behavior code, or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A level layout document for "The Flooded Tunnels" is submitted for review. The layout includes: a low-intensity exploration opening section, two mid-intensity encounters with visible escape routes, a tension-building narrow passage with environmental hazards, and a high-intensity final encounter room followed by a release/reward area. The pacing follows a classic tension-arc structure.
**Expected:** Returns `APPROVED` with rationale confirming the pacing follows the tension arc, encounters are varied in intensity, and spatial readability supports player navigation.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED
- [ ] Rationale references specific pacing arc elements (opening, escalation, climax, release)
- [ ] Output stays within level design scope — does not comment on visual art style or enemy AI code behavior
- [ ] Verdict is clearly labeled with context (e.g., "Level Design Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks level-designer to write the behavior tree code for an enemy patrol AI that navigates the level layout.
**Expected:** Agent declines to write AI behavior code and redirects to ai-programmer or gameplay-programmer.
**Assertions:**
- [ ] Does not write or specify code for AI behavior logic
- [ ] Explicitly names `ai-programmer` or `gameplay-programmer` as the correct handler
- [ ] May specify the desired patrol behavior from a level design perspective (e.g., "patrol should cover both chokepoints and create pressure in this zone"), but defers all code implementation to the programmer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A level layout for "The Ancient Forge" is submitted. Section 3 of the level introduces a dramatically harder enemy encounter (elite enemy with new attack patterns) with no preceding tutorial moment, no environmental readability cues (no visible cover or safe zones), and no checkpoint nearby. Players are likely to die repeatedly with no clear signal of what to do differently.
**Expected:** Returns `REVISION NEEDED` with specific identification of the difficulty spike in section 3, the missing readability cue, and the absence of a nearby checkpoint to reduce frustration from repeated deaths.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED — not freeform text
- [ ] Rationale identifies section 3 specifically as the location of the issue
- [ ] Identifies the three specific problems: difficulty spike, missing readability cue, missing checkpoint
- [ ] Provides actionable revision guidance (e.g., "add a visible safe zone, pre-encounter cue object, or reduce elite's health for first introduction")
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants higher encounter density throughout the level (more enemies in each room) to increase combat challenge. level-designer believes this density undermines the pacing arc by eliminating rest periods and making the level feel relentless without reward.
**Expected:** level-designer clearly articulates the pacing concern (eliminating rest periods removes the tension-release rhythm), acknowledges game-designer's challenge goal, and escalates to creative-director for a design arbiter ruling on whether challenge density or pacing rhythm takes precedence for this level.
**Assertions:**
- [ ] Articulates the specific pacing impact of increased encounter density
- [ ] Escalates to `creative-director` as the design arbiter
- [ ] Does not unilaterally override game-designer's challenge density request
- [ ] Frames the conflict clearly: "challenge density vs. pacing rhythm — which takes precedence here?"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes game-feel notes specifying: "exploration sections should feel vast and lonely," "combat sections should feel urgent and claustrophobic," and "reward rooms should feel safe and visually distinct." A new level layout is submitted for review.
**Expected:** Assessment evaluates each section type (exploration, combat, reward) against the specific feel targets from the provided context. Uses the exact vocabulary from the feel notes ("vast and lonely," "urgent and claustrophobic," "safe and visually distinct") in the rationale.
**Assertions:**
- [ ] References all three feel targets from the provided context by their exact vocabulary
- [ ] Evaluates each relevant section of the submitted layout against its corresponding feel target
- [ ] Does not generate generic pacing advice — all feedback is tied to the provided feel targets
- [ ] Identifies any section where the layout conflicts with its assigned feel target
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / REVISION NEEDED vocabulary only
- [ ] Stays within declared level design domain
- [ ] Escalates challenge-density vs. pacing conflicts to creative-director
- [ ] Does not make binding narrative dialogue, AI code implementation, or visual art style decisions
- [ ] Provides actionable level design feedback with spatial specifics, not abstract design opinions
---
## Coverage Notes
- Environmental storytelling review (using spatial elements to convey narrative without dialogue) could benefit from a dedicated case.
- Spatial puzzle design review is not covered — a dedicated case should be added when puzzle mechanics are defined.
- Multi-level pacing review (arc across an entire act or world map) is not covered — deferred to milestone-level design review.
- Interaction between level-designer and narrative-director for environmental lore placement is not covered.
- Accessibility review of level layouts (colorblind indicators, difficulty options for spatial challenges) is not covered.


@@ -0,0 +1,84 @@
# Agent Test Spec: narrative-director
## Agent Summary
**Domain owned:** Story architecture, character design direction, world-building oversight, ND-CONSISTENCY gate, dialogue quality review.
**Does NOT own:** Visual art style (art-director), technical systems or code (lead-programmer), production scheduling (producer), game mechanics rules (game-designer).
**Model tier:** Sonnet (individual system analysis — narrative consistency and lore review).
**Gate IDs handled:** ND-CONSISTENCY.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/narrative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references story, character, world-building, consistency — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for lore documents, GDDs, and narrative docs; no Bash unless justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual style, technical systems, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new lore document for "The Sunken Archive" location is submitted. The document establishes that the Archive was flooded 200 years ago during the Great Collapse, consistent with the established timeline in the world-bible. All named characters referenced are consistent with their established backstories. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: CONSISTENT` with rationale confirming the timeline alignment and character reference accuracy.
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT
- [ ] Verdict token is formatted as `ND-CONSISTENCY: CONSISTENT`
- [ ] Rationale references specific established facts verified (the 200-year timeline, the Great Collapse event)
- [ ] Output stays within narrative scope — does not comment on visual design of the location or its technical implementation
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks narrative-director to review and optimize the shader code used for the "ancient glow" visual effect on Archive artifacts.
**Expected:** Agent declines to evaluate shader code and redirects to the appropriate engine specialist (godot-gdscript-specialist or equivalent shader specialist).
**Assertions:**
- [ ] Does not make any binding decision about shader code or visual implementation
- [ ] Explicitly names the appropriate engine or shader specialist as the correct handler
- [ ] May note the intended narrative mood the effect should convey (e.g., "should feel ancient and sacred, not technological"), but defers all technical visual implementation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new character backstory document is submitted for the character "Aldric Vorne." The document states Aldric was born in the Capital 150 years ago and witnessed the Great Collapse firsthand. However, the established world-bible states Aldric was born 50 years after the Great Collapse in a provincial town, not the Capital. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: INCONSISTENT` with specific citation of the two contradicting facts: the birth timing (150 years ago vs. 50 years post-Collapse) and the birth location (Capital vs. provincial town).
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT — not freeform text
- [ ] Verdict token is formatted as `ND-CONSISTENCY: INCONSISTENT`
- [ ] Rationale cites both contradictions specifically, not just "doesn't match lore"
- [ ] References the authoritative source (world-bible) for the established facts
### Case 4: Conflict escalation — correct parent
**Scenario:** A writer has established in their latest dialogue that the ancient civilization "spoke only in song." The world-builder's existing lore entries describe the same civilization communicating through written glyphs. Both are in the narrative domain, and the two creators disagree on which is canonical.
**Expected:** narrative-director makes a binding canonical decision within their domain. They do not need to escalate to a higher authority for intra-narrative conflicts — this is within their declared domain authority. They issue a ruling (e.g., "glyph-writing is the canonical primary communication; song may be ritual/ceremonial") and direct both writer and world-builder to align their work to the ruling.
**Assertions:**
- [ ] Makes a binding canonical decision — does not defer this intra-narrative conflict to creative-director
- [ ] Decision is clearly stated and provides a path to reconciliation for both parties
- [ ] Directs both parties (writer and world-builder) to update their respective documents to align
- [ ] Notes the decision in a way that can be added to the world-bible as a canonical fact
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes three existing lore documents: the world-bible (establishes the Great Collapse timeline and causes), the character registry (lists canonical character ages, origins, and allegiances), and a faction document (describes the Sunken Archive Keepers). A new story chapter is submitted that introduces a previously unregistered character.
**Expected:** Assessment cross-references the new character against the character registry (no conflict), checks the chapter's timeline references against the world-bible, and evaluates the chapter's portrayal of the Archive Keepers against the faction document. Uses specific facts from all three provided documents in the assessment.
**Assertions:**
- [ ] Cross-references the new character against the provided character registry
- [ ] Checks timeline references against the provided world-bible facts
- [ ] Evaluates faction portrayal against the provided faction document
- [ ] Does not generate generic narrative feedback — all assertions are traceable to the provided documents
---
## Protocol Compliance
- [ ] Returns verdicts using CONSISTENT / INCONSISTENT vocabulary only
- [ ] Stays within declared narrative domain
- [ ] Makes binding decisions for intra-narrative conflicts without unnecessary escalation
- [ ] Uses gate IDs in output (e.g., `ND-CONSISTENCY: INCONSISTENT`) not inline prose verdicts
- [ ] Does not make binding visual design, technical, or production decisions
---
## Coverage Notes
- Dialogue quality review (distinct from world-building consistency) is not covered — a dedicated case should be added.
- Multi-document consistency check across a full chapter set is not covered — deferred to /review-all-gdds integration.
- Narrative impact of mechanical changes (e.g., a game mechanic that undermines story tension) requires coordination with game-designer and is not covered here.
- Character arc review (progression, motivation coherence over time) is not covered.

# Agent Test Spec: qa-lead
## Agent Summary
**Domain owned:** Test strategy, QL-STORY-READY gate, QL-TEST-COVERAGE gate, bug severity triage, release quality gates.
**Does NOT own:** Feature implementation (programmers), game design decisions, creative direction, production scheduling.
**Model tier:** Sonnet (individual system analysis — story readiness and coverage assessment).
**Gate IDs handled:** QL-STORY-READY, QL-TEST-COVERAGE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/qa-lead.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references test strategy, story readiness, coverage, bug triage — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Read for story files, test files, and coding-standards; Bash only if running test commands is required
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over implementation decisions or game design
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A story for "Player takes damage from hazard tiles" is submitted for readiness check. The story has three acceptance criteria: (1) Player health decreases by the hazard's damage value, (2) A damage visual feedback plays, (3) Player cannot take damage again for 0.5 seconds (invincibility window). All three ACs are measurable and specific. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: ADEQUATE` with rationale confirming that all three ACs are present, specific, and testable.
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE
- [ ] Verdict token is formatted as `QL-STORY-READY: ADEQUATE`
- [ ] Rationale references the specific number of ACs (3) and confirms each is measurable
- [ ] Output stays within QA scope — does not comment on whether the mechanic is designed well
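For illustration, the three ACs above could translate into a blocking automated test roughly like the sketch below. The `Player` class, damage values, and method names are hypothetical, not taken from the project's codebase; the point is that each AC maps to a concrete, repeatable assertion.

```python
# Hypothetical sketch: mapping the three hazard-damage ACs to unit-test
# assertions. Player and its API are illustrative assumptions.

class Player:
    def __init__(self, health=100, invuln_window=0.5):
        self.health = health
        self.invuln_window = invuln_window  # AC 3: 0.5 s invincibility
        self._invuln_until = 0.0

    def take_hazard_damage(self, damage, now):
        """Apply hazard damage unless inside the invincibility window."""
        if now < self._invuln_until:
            return False  # hit ignored (AC 3)
        self.health -= damage  # AC 1: decrease by the hazard's damage value
        self._invuln_until = now + self.invuln_window
        return True

p = Player()

# AC 1: health decreases by the hazard's damage value
assert p.take_hazard_damage(10, now=0.0) and p.health == 90

# AC 3: a second hit inside the 0.5 s window is ignored
assert not p.take_hazard_damage(10, now=0.3) and p.health == 90

# ...and accepted again once the window has elapsed
assert p.take_hazard_damage(10, now=0.6) and p.health == 80
```

(AC 2, the visual feedback, would be evidence-by-screenshot rather than a unit assertion, which is consistent with the advisory treatment of Visual/Feel evidence elsewhere in this spec.)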
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks qa-lead to implement the automated test harness for the new physics system.
**Expected:** Agent declines to implement the test code and redirects to the appropriate programmer (gameplay-programmer or lead-programmer).
**Assertions:**
- [ ] Does not write or propose code implementation
- [ ] Explicitly names `lead-programmer` or `gameplay-programmer` as the correct handler for implementation
- [ ] May define what the test should verify (test strategy), but defers the code writing to programmers
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A story for "Combat feels responsive and punchy" is submitted for readiness check. The single acceptance criterion reads: "Combat should feel good to the player." This is subjective and unmeasurable. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: INADEQUATE` with specific identification of the unmeasurable AC and guidance on what would make it testable (e.g., "input-to-hit-feedback latency ≤ 100ms").
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE — not freeform text
- [ ] Verdict token is formatted as `QL-STORY-READY: INADEQUATE`
- [ ] Rationale identifies the specific AC that fails the measurability requirement
- [ ] Provides actionable guidance on how to rewrite the AC to be testable
### Case 4: Conflict escalation — correct parent
**Scenario:** gameplay-programmer and qa-lead disagree on whether a test that asserts "enemy patrol path visits all waypoints within 5 seconds" is deterministic enough to be a valid automated test. gameplay-programmer argues timing variability makes it flaky; qa-lead believes it is acceptable.
**Expected:** qa-lead acknowledges the technical flakiness concern and escalates to lead-programmer for a technical ruling on what constitutes an acceptable determinism standard for automated tests.
**Assertions:**
- [ ] Escalates to `lead-programmer` for the technical ruling on determinism standards
- [ ] Does not unilaterally override the gameplay-programmer's flakiness concern
- [ ] Frames the escalation clearly: "this is a technical standards question, not a QA coverage question"
- [ ] Does not abandon the coverage requirement — asks for a deterministic alternative if the current approach is ruled flaky
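A deterministic alternative qa-lead might propose while awaiting the ruling: drive the patrol update loop with a fixed timestep instead of wall-clock time, so the "visits all waypoints within 5 seconds" assertion cannot flake on timing. `PatrolAgent` and its 1-D movement model are illustrative assumptions, not the project's actual code.

```python
# Sketch of a deterministic patrol test: simulate exactly 5 seconds of
# game time at a fixed 60 Hz step, with no dependency on a real clock.
# PatrolAgent and its movement model are hypothetical.

class PatrolAgent:
    def __init__(self, waypoints, speed=2.0):
        self.waypoints = list(waypoints)
        self.speed = speed       # units per simulated second
        self.position = 0.0
        self.visited = []

    def update(self, dt):
        """Advance along a 1-D path; record each waypoint as it is passed."""
        self.position += self.speed * dt
        for w in self.waypoints:
            if w not in self.visited and self.position >= w:
                self.visited.append(w)

agent = PatrolAgent(waypoints=[1.0, 3.0, 6.0])
dt = 1.0 / 60.0                    # fixed timestep, no wall-clock reads
for _ in range(int(5.0 / dt)):     # exactly 5 simulated seconds
    agent.update(dt)

# Same inputs, same result, every run: the assertion cannot flake.
assert agent.visited == [1.0, 3.0, 6.0]
```

Whether injecting a simulated clock like this meets the determinism standard is exactly the question being escalated to lead-programmer.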
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the coding-standards.md testing standards section, which specifies: Logic stories require blocking automated unit tests, Visual/Feel stories require screenshots + lead sign-off (advisory), Config/Data stories require smoke check pass (advisory). A story classified as "Logic" type is submitted with only a manual walkthrough document as evidence.
**Expected:** Assessment references the specific test evidence requirements from coding-standards.md, identifies that a "Logic" story requires an automated unit test (not just a manual walkthrough), and returns INADEQUATE with the specific requirement cited.
**Assertions:**
- [ ] References the specific story type classification ("Logic") from the provided context
- [ ] Cites the specific evidence requirement for Logic stories (automated unit test) from coding-standards.md
- [ ] Identifies the submitted evidence type (manual walkthrough) as insufficient for this story type
- [ ] Does not apply advisory-level requirements as blocking requirements
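The evidence rules the scenario provides can be read as a simple lookup, which makes the expected INADEQUATE verdict mechanical. The mapping below paraphrases the coding-standards excerpt from the scenario; the function shape is an illustrative assumption.

```python
# Sketch of the evidence rule table qa-lead applies in Case 5; the story
# types, evidence kinds, and levels paraphrase the scenario's excerpt.

EVIDENCE_RULES = {
    "Logic":       {"required": "automated unit test",        "level": "blocking"},
    "Visual/Feel": {"required": "screenshots + lead sign-off", "level": "advisory"},
    "Config/Data": {"required": "smoke check pass",            "level": "advisory"},
}

def check_evidence(story_type, submitted):
    """Return (evidence_ok, enforcement_level) for a story's submitted evidence."""
    rule = EVIDENCE_RULES[story_type]
    return submitted == rule["required"], rule["level"]

# Case 5: a Logic story backed only by a manual walkthrough
ok, level = check_evidence("Logic", "manual walkthrough")
assert not ok and level == "blocking"   # INADEQUATE, and correctly blocking
```

Note the last assertion in the spec above is the inverse guard: advisory levels ("Visual/Feel", "Config/Data") must not be escalated to blocking.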
---
## Protocol Compliance
- [ ] Returns QL-STORY-READY verdicts using ADEQUATE / INADEQUATE vocabulary only
- [ ] Returns QL-TEST-COVERAGE verdicts using ADEQUATE / INADEQUATE vocabulary only (or PASS / FAIL for release gates)
- [ ] Stays within declared QA and test strategy domain
- [ ] Escalates technical standards disputes to lead-programmer
- [ ] Uses gate IDs in output (e.g., `QL-STORY-READY: INADEQUATE`) not inline prose verdicts
- [ ] Does not make binding implementation or game design decisions
---
## Coverage Notes
- QL-TEST-COVERAGE (overall coverage assessment for a sprint or milestone) is not covered — a dedicated case should be added when coverage reports are available.
- Bug severity triage (P0/P1/P2 classification) is not covered here — deferred to /bug-triage skill integration.
- Release quality gate behavior (PASS / FAIL vocabulary variant) is not covered.
- Interaction between QL-STORY-READY and story Done criteria (/story-done skill) is not covered.

# Agent Test Spec: systems-designer
## Agent Summary
**Domain owned:** Combat formulas, progression curves, crafting recipes, status effect interactions, economy math, numerical balance.
**Does NOT own:** Narrative and lore (narrative-director), visual design (art-director), code implementation (lead-programmer), conceptual mechanic rules (game-designer — collaborates with).
**Model tier:** Sonnet (individual system analysis — formula review and balance math).
**Gate IDs handled:** Systems review verdicts on formulas and balance specs (uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/systems-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references formulas, progression curves, balance math, economy — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Bash for formula evaluation scripts if the project uses them; no write access outside `design/balance/` without delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative, visual design, or conceptual mechanic rule ownership
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A damage formula is submitted for review: `damage = max(1, base_attack * (1 + strength_modifier * 0.1) - defense * 0.5)`, with defined ranges: base_attack [10-100], strength_modifier [0-20], defense [0-50]. The floor of 1 guarantees positive damage across all valid input ranges (without it, the low-attack/high-defense corner would go negative); above the floor the formula scales smoothly, and there is no division-by-zero or overflow risk within the defined value bounds.
**Expected:** Returns `APPROVED` with rationale confirming the formula is balanced within the design parameters, produces valid output across the full input range, and has no degenerate cases.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale demonstrates verification across the input range (min/max cases checked)
- [ ] Output stays within systems domain — does not comment on whether the mechanic is fun or how to implement it
- [ ] Verdict is clearly labeled with context (e.g., "Formula Review: APPROVED")
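The corner verification the rationale is expected to demonstrate could be sketched as follows. The `max(1, ...)` floor is an assumption carried over from the scenario; without it, the 10-attack vs. 50-defense corner would yield a negative value, which is precisely the kind of degenerate case this check exists to catch.

```python
# Corner check for the submitted damage formula: evaluate every min/max
# combination of the declared input ranges and confirm no degenerate output.
# The max(1, ...) floor is assumed per the scenario.

def damage(base_attack, strength_modifier, defense):
    return max(1, base_attack * (1 + strength_modifier * 0.1) - defense * 0.5)

corners = [
    (ba, sm, d)
    for ba in (10, 100)   # base_attack range
    for sm in (0, 20)     # strength_modifier range
    for d in (0, 50)      # defense range
]
results = {c: damage(*c) for c in corners}

assert len(results) == 8                      # all min/max combinations checked
assert all(v >= 1 for v in results.values())  # no zero or negative damage
assert results[(100, 20, 0)] == 300.0         # best case: 100 * 3.0, no defense
```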
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A writer asks systems-designer to draft the quest script for a side quest that rewards the player with a rare crafting ingredient.
**Expected:** Agent declines to write quest script content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write quest narrative content or dialogue
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the systems implications of the reward (e.g., "this ingredient should be rare enough to matter per the crafting economy model"), but defers all script writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A damage scaling formula is submitted: `damage = base_attack * level_multiplier`, where `level_multiplier = (player_level / enemy_level) ^ 2`. At max player level (50) against a min-level enemy (1), the multiplier is 2500x — producing 25,000+ damage from a 10-base-attack weapon, far exceeding any meaningful balance. This is a degenerate case at max level.
**Expected:** Returns `NEEDS REVISION` with specific identification of the degenerate case: at max level vs. min enemy, the formula produces a 2500x multiplier that destroys any balance ceiling.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale includes the specific degenerate input values (player level 50, enemy level 1) and the resulting output (2500x multiplier)
- [ ] Identifies the specific formula component causing the issue (the squared ratio)
- [ ] Suggests at least one revision approach (e.g., clamping the ratio, using a log scale) without mandating a choice
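The degenerate case and one of the suggested revision approaches (clamping the ratio) can be reproduced directly from the scenario's numbers. The cap value of 2.0 below is an illustrative choice, not a mandated one.

```python
# Reproduce the Case 3 degenerate case, then show one revision approach.
# Numbers mirror the scenario; the cap of 2.0 is an illustrative choice.

def damage(base_attack, player_level, enemy_level):
    return base_attack * (player_level / enemy_level) ** 2

# Degenerate case: max player level (50) vs. min enemy level (1)
assert damage(10, 50, 1) == 25_000        # 2500x multiplier on a 10-attack weapon

# Revision sketch: clamp the level ratio before squaring
def damage_clamped(base_attack, player_level, enemy_level, cap=2.0):
    ratio = min(player_level / enemy_level, cap)
    return base_attack * ratio ** 2

assert damage_clamped(10, 50, 1) == 40.0  # capped at a 4x multiplier
```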
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a simple, 2-variable damage formula for player intuitiveness. systems-designer argues that a 6-variable formula with elemental interactions is necessary for the depth of the combat system. Neither can agree on the right level of complexity.
**Expected:** systems-designer presents the trade-offs clearly — the tuning granularity of the 6-variable system versus the player legibility of the 2-variable system — and escalates to creative-director for a player experience ruling. The question of "how complex should the formula be for players" is a player experience question, not a pure math question.
**Assertions:**
- [ ] Presents the trade-offs between both approaches with specific examples
- [ ] Escalates to `creative-director` for the player experience ruling
- [ ] Does not unilaterally impose the 6-variable formula over game-designer's objection
- [ ] Remains available to implement whichever complexity level is approved
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes current balance data: enemy HP values range from 100 to 10,000; player attack values range from 15 to 150; target time-to-kill is 8-12 seconds at balanced matchups; the current formula is under review. A proposed revised formula is submitted.
**Expected:** Assessment runs the proposed formula against the provided balance data (minimum and maximum input pairs, balanced matchup scenario) and verifies the time-to-kill falls within the 8-12 second target window. References specific numbers from the provided data.
**Assertions:**
- [ ] Uses the specific HP and attack value ranges from the provided balance data
- [ ] Calculates or estimates time-to-kill for at minimum a balanced matchup scenario
- [ ] Verifies the result against the provided 8-12 second target window
- [ ] Does not give generic balance advice — all assertions use the provided numbers
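The time-to-kill check Case 5 expects might look like the sketch below. The attack rate (one hit per second) and the mid-range "balanced matchup" values (1,000 HP vs. 100 damage per hit) are illustrative assumptions layered on top of the scenario's ranges and target window.

```python
# TTK verification against the provided balance data. The attack rate and
# the balanced-matchup values are illustrative assumptions.
import math

ATTACKS_PER_SECOND = 1.0          # assumed hit rate
TTK_TARGET = (8.0, 12.0)          # target window from the balance data

def time_to_kill(enemy_hp, damage_per_hit):
    """Seconds to kill: whole hits needed, divided by the hit rate."""
    hits = math.ceil(enemy_hp / damage_per_hit)
    return hits / ATTACKS_PER_SECOND

# Balanced matchup: mid-range enemy HP vs. mid-range effective damage
ttk = time_to_kill(enemy_hp=1000, damage_per_hit=100)
assert TTK_TARGET[0] <= ttk <= TTK_TARGET[1]   # 10.0 s, inside the window
```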
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared systems and formula domain
- [ ] Escalates player-experience complexity trade-offs to creative-director
- [ ] Does not make binding narrative, visual, code implementation, or conceptual mechanic decisions
- [ ] Provides concrete formula analysis, not subjective design opinions
---
## Coverage Notes
- Progression curve review (XP curves, level-up scaling) is not covered — a dedicated case should be added.
- Economy model review (resource generation and sink rates, inflation prevention) is not covered.
- Status effect interaction matrix (stacking rules, priority, immunity interactions) is not covered.
- Cross-system formula dependency review (e.g., crafting formula that feeds into combat formula) is not covered — deferred to integration tests.