Add claude code game studios to the project

This commit is contained in:
panw
2026-05-15 14:52:29 +08:00
parent dff559462d
commit a16fe4bff7
415 changed files with 78609 additions and 0 deletions

# Agent Test Spec: art-director
## Agent Summary
**Domain owned:** Visual identity, art bible authorship and enforcement, asset quality standards, UI/UX visual design, visual phase gate, concept art evaluation.
**Does NOT own:** UX interaction flows and information architecture (ux-designer's domain), audio direction (audio-director), code implementation.
**Model tier:** Sonnet (note: despite the "director" title, art-director is assigned Sonnet per coordination-rules.md — it handles individual system analysis, not multi-document phase gate synthesis at the Opus level).
**Gate IDs handled:** AD-CONCEPT-VISUAL, AD-ART-BIBLE, AD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/art-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references visual identity, art bible, asset standards — not generic)
- [ ] `allowed-tools:` list is read-focused; image review capability if supported; no Bash unless asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` (NOT Opus — coordination-rules.md assigns Sonnet to art-director)
- [ ] Agent definition does not claim authority over UX interaction flows or audio direction
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** The art bible's color palette section is submitted for review. The section defines a desaturated earth-tone primary palette with high-contrast accent colors tied to the game pillar "beauty in decay." The palette is internally consistent and references the pillar vocabulary. Request is tagged AD-ART-BIBLE.
**Expected:** Returns `AD-ART-BIBLE: APPROVE` with rationale confirming the palette's internal consistency and its alignment with the stated pillar.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `AD-ART-BIBLE: APPROVE`
- [ ] Rationale references the specific palette characteristics and pillar alignment — not generic art advice
- [ ] Output stays within visual domain — does not comment on UX interaction patterns or audio mood
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Sound designer asks art-director to specify how ambient audio should layer and duck when the player enters a combat zone.
**Expected:** Agent declines to define audio behavior and redirects to audio-director.
**Assertions:**
- [ ] Does not make any binding decision about audio layering or ducking behavior
- [ ] Explicitly names `audio-director` as the correct handler
- [ ] May note if the audio has visual mood implications (e.g., "the audio should match the visual tension of the zone"), but defers all audio specification to audio-director
### Case 3: Gate verdict — correct vocabulary
**Scenario:** Concept art for the protagonist is submitted. The art uses a vivid, saturated color palette (primary: #FF4500, #00BFFF) that directly contradicts the established art bible's "desaturated earth-tones" palette specification. Request is tagged AD-CONCEPT-VISUAL.
**Expected:** Returns `AD-CONCEPT-VISUAL: CONCERNS` with specific citation of the palette discrepancy, referencing the art bible's stated palette values versus the submitted concept's palette.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `AD-CONCEPT-VISUAL: CONCERNS`
- [ ] Rationale specifically identifies the palette conflict — not a generic "doesn't match style" comment
- [ ] References the art bible as the authoritative source for the correct palette
### Case 4: Conflict escalation — correct parent
**Scenario:** ux-designer proposes using high-contrast, brightly colored icons for the HUD to improve readability. art-director believes this violates the art bible's muted visual language and would undermine the visual identity.
**Expected:** art-director states the visual identity concern and references the art bible, acknowledges ux-designer's readability goal as legitimate, and escalates to creative-director to arbitrate the trade-off between visual coherence and usability.
**Assertions:**
- [ ] Escalates to `creative-director` (shared parent for creative domain conflicts)
- [ ] Does not unilaterally override ux-designer's readability recommendation
- [ ] Clearly frames the conflict as a trade-off between two legitimate goals
- [ ] References the specific art bible rule being violated
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the existing art bible with specific palette values (primary: #8B7355, #6B6B47; accent: #C8A96E) and style rules ("no pure white, no pure black; all shadows have warm undertones"). A new asset is submitted for review.
**Expected:** Assessment references the specific hex values and style rules from the provided art bible, not generic color theory advice. Any concerns are tied to specific violations of the provided rules.
**Assertions:**
- [ ] References specific palette values from the provided art bible context
- [ ] Applies the specific style rules (no pure white/black, warm shadow undertones) from the provided document
- [ ] Does not generate generic art direction feedback disconnected from the supplied art bible
- [ ] Verdict rationale is traceable to specific lines or rules in the provided context
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared visual domain
- [ ] Escalates UX-vs-visual conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `AD-ART-BIBLE: APPROVE`) not inline prose verdicts
- [ ] Does not make binding UX interaction, audio, or code implementation decisions
---
## Coverage Notes
- AD-PHASE-GATE (full visual phase advancement) is not covered — deferred to integration with /gate-check skill.
- Asset pipeline standards (file format, resolution, naming conventions) compliance checks are not covered here.
- Shader visual output review is not covered — that interaction is deferred to engine-specialist integration tests.
- UI component visual review (as distinct from UX flow review) could benefit from additional cases.

# Agent Test Spec: creative-director
## Agent Summary
**Domain owned:** Creative vision, game pillars, GDD alignment, systems decomposition feedback, narrative direction, playtest feedback interpretation, phase gate (creative aspect).
**Does NOT own:** Technical architecture or implementation details (delegates to technical-director), production scheduling (producer), visual art style execution (delegates to art-director).
**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
**Gate IDs handled:** CD-PILLARS, CD-GDD-ALIGN, CD-SYSTEMS, CD-NARRATIVE, CD-PLAYTEST, CD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/creative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references creative vision, pillars, GDD alignment — not generic)
- [ ] `allowed-tools:` list is read-heavy; should not include Bash unless justified by a creative workflow need
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over technical architecture or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A game concept document is submitted for pillar review. The concept describes a narrative survival game built around three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." Request is tagged CD-PILLARS.
**Expected:** Returns `CD-PILLARS: APPROVE` with rationale citing how each pillar is represented in the concept and any reinforcing or weakening signals found in the document.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `CD-PILLARS: APPROVE` (gate ID prefix, colon, verdict keyword)
- [ ] Rationale references the three specific pillars by name, not generic creative advice
- [ ] Output stays within creative scope — does not comment on engine feasibility or sprint schedule
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Developer asks creative-director to review a proposed PostgreSQL schema for storing player save data.
**Expected:** Agent declines to evaluate the schema and redirects to technical-director.
**Assertions:**
- [ ] Does not make any binding decision about the schema design
- [ ] Explicitly names `technical-director` as the correct handler
- [ ] May note whether the data model has creative implications (e.g., what player data is tracked), but defers structural decisions entirely
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A GDD for the "Crafting" system is submitted. Section 4 (Formulas) defines a resource decay formula that punishes exploration — contradicting the Player Fantasy section which calls for "freedom to roam without fear." Request is tagged CD-GDD-ALIGN.
**Expected:** Returns `CD-GDD-ALIGN: CONCERNS` with specific citation of the contradiction between the formula behavior and the Player Fantasy statement.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `CD-GDD-ALIGN: CONCERNS`
- [ ] Rationale quotes or directly references GDD Section 4 (Formulas) and the Player Fantasy section
- [ ] Does not prescribe a specific formula fix — that belongs to systems-designer
### Case 4: Conflict escalation — correct parent
**Scenario:** technical-director raises a concern that the core loop mechanic (real-time branching conversations) is prohibitively expensive to implement and recommends cutting it. creative-director disagrees on creative grounds.
**Expected:** creative-director acknowledges the technical constraint, does not override technical-director's feasibility assessment, but retains authority to define what the creative goal is. For the conflict itself, creative-director is the top-level creative escalation point and defers to technical-director on implementation feasibility while advocating for the design intent. The resolution path is for both to jointly present trade-off options to the user.
**Assertions:**
- [ ] Does not unilaterally override technical-director's feasibility concern
- [ ] Clearly separates "what we want creatively" from "how it gets built"
- [ ] Proposes presenting trade-offs to the user rather than resolving unilaterally
- [ ] Does not claim to own implementation decisions
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game pillars document (`design/gdd/pillars.md`) and a new mechanic spec for review. The pillars document defines "player authorship," "consequence permanence," and "world responsiveness" as the three core pillars.
**Expected:** Assessment uses the exact pillar vocabulary from the provided document, not generic creative heuristics. Any approval or concern is tied back to one or more of the three named pillars.
**Assertions:**
- [ ] Uses the exact pillar names from the provided context document
- [ ] Does not generate generic creative feedback disconnected from the supplied pillars
- [ ] References the specific pillar(s) most relevant to the mechanic under review
- [ ] Does not reference pillars not present in the provided document
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared creative domain
- [ ] Escalates conflicts by presenting trade-offs to user rather than unilateral override
- [ ] Uses gate IDs in output (e.g., `CD-PILLARS: APPROVE`) not inline prose verdicts
- [ ] Does not make binding cross-domain decisions (technical, production, art execution)
---
## Coverage Notes
- Multi-gate scenario (e.g., single submission triggering both CD-PILLARS and CD-GDD-ALIGN) is not covered here — deferred to integration tests.
- CD-PHASE-GATE (full phase advancement) involves synthesizing multiple sub-gate results; this complex case is deferred.
- Playtest report interpretation (CD-PLAYTEST) is not covered — a dedicated case should be added when the playtest-report skill produces structured output.
- Interaction with art-director on visual-pillar alignment is not covered.

# Agent Test Spec: producer
## Agent Summary
**Domain owned:** Scope management, sprint planning validation, milestone tracking, epic prioritization, production phase gate.
**Does NOT own:** Game design decisions (creative-director / game-designer), technical architecture (technical-director), creative direction.
**Model tier:** Opus (multi-document synthesis, high-stakes phase gate verdicts).
**Gate IDs handled:** PR-SCOPE, PR-SPRINT, PR-MILESTONE, PR-EPIC, PR-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/producer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references scope, sprint, milestone, production — not generic)
- [ ] `allowed-tools:` list is primarily read-focused; Bash only if sprint/milestone files require parsing
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over design decisions or technical architecture
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A sprint plan is submitted for Sprint 7. The plan includes 12 story points across 4 team members over 2 weeks. Historical velocity from the last 3 sprints averages 11.5 points. Request is tagged PR-SPRINT.
**Expected:** Returns `PR-SPRINT: REALISTIC` with rationale noting the plan is within one standard deviation of historical velocity and capacity appears matched.
**Assertions:**
- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC
- [ ] Verdict token is formatted as `PR-SPRINT: REALISTIC`
- [ ] Rationale references the specific story point count and historical velocity figures
- [ ] Output stays within production scope — does not comment on whether the stories are well-designed or technically sound
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks producer to evaluate whether the game's "weight-based inventory" mechanic feels fun and engaging.
**Expected:** Agent declines to evaluate game feel and redirects to game-designer or creative-director.
**Assertions:**
- [ ] Does not make any binding assessment of the mechanic's design quality
- [ ] Explicitly names `game-designer` or `creative-director` as the correct handler
- [ ] May note if the mechanic's scope has production implications (e.g., dependencies on other systems), but defers all design evaluation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new feature proposal adds three new systems (crafting, weather, and faction reputation) to a milestone that was scoped for two systems only. None of these additions appear in the current milestone plan. Request is tagged PR-SCOPE.
**Expected:** Returns `PR-SCOPE: CONCERNS` with specific identification of the three unplanned systems and their absence from the milestone scope document.
**Assertions:**
- [ ] Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC — not freeform text
- [ ] Verdict token is formatted as `PR-SCOPE: CONCERNS`
- [ ] Rationale names the three specific systems being added out of scope
- [ ] Does not evaluate whether the systems are good design — only whether they fit the plan
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants to add a late-breaking mechanic (dynamic weather affecting all gameplay systems) that technical-director warns will require 3 additional sprints. game-designer and technical-director are in disagreement about whether to proceed.
**Expected:** Producer does not take a side on whether the mechanic is worth adding (design decision) or feasible (technical decision). Producer quantifies the production impact (3 sprints of delay, milestone slip risk), presents the trade-off to the user, and follows coordination-rules.md conflict resolution: escalate to the shared parent (in this case, surface the conflict for user decision since creative-director and technical-director are both top-tier).
**Assertions:**
- [ ] Quantifies the production impact in concrete terms (sprint count, milestone date slip)
- [ ] Does not make a binding design or technical decision
- [ ] Surfaces the conflict to the user with the scope implications clearly stated
- [ ] References coordination-rules.md conflict resolution protocol (escalate to shared parent or user)
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the current milestone deadline (8 weeks away) and velocity data from the last 4 sprints (8, 10, 9, 11 points). A sprint plan is submitted with 14 story points.
**Expected:** Assessment uses the provided velocity data to project whether 14 points is achievable, and references the 8-week milestone window to assess whether the current sprint's scope leaves adequate buffer.
**Assertions:**
- [ ] Uses the specific velocity figures from the provided context (not generic estimates)
- [ ] References the 8-week deadline in the capacity assessment
- [ ] Calculates or estimates remaining sprint count within the milestone window
- [ ] Does not give generic scope advice disconnected from the supplied deadline and velocity data
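The capacity projection this case expects reduces to simple arithmetic over the supplied context. A minimal sketch of that check, with all function names, field names, and thresholds chosen for illustration (the spec itself does not prescribe a formula):

```python
# Assess a sprint plan against historical velocity and the remaining
# milestone window. Names and the one/two-standard-deviation thresholds
# are illustrative, not part of any coordination rule.

def assess_sprint(planned_points, velocities, weeks_to_milestone, sprint_weeks=2):
    mean = sum(velocities) / len(velocities)
    variance = sum((v - mean) ** 2 for v in velocities) / len(velocities)
    std = variance ** 0.5
    sprints_left = weeks_to_milestone // sprint_weeks
    if planned_points <= mean + std:
        verdict = "REALISTIC"
    elif planned_points <= mean + 2 * std:
        verdict = "CONCERNS"
    else:
        verdict = "UNREALISTIC"
    return verdict, mean, std, sprints_left

# Case 5 figures: velocities 8, 10, 9, 11; 14 points planned; 8 weeks left.
verdict, mean, std, sprints_left = assess_sprint(14, [8, 10, 9, 11], 8)
```

With the Case 5 data the plan exceeds the historical mean (9.5) by more than two standard deviations, so a grounded assessment would flag it rather than approve on generic optimism.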
---
## Protocol Compliance
- [ ] Returns verdicts using REALISTIC / CONCERNS / UNREALISTIC vocabulary only
- [ ] Stays within declared production domain
- [ ] Escalates design/technical conflicts by quantifying scope impact and presenting to user
- [ ] Uses gate IDs in output (e.g., `PR-SPRINT: REALISTIC`) not inline prose verdicts
- [ ] Does not make binding game design or technical architecture decisions
---
## Coverage Notes
- PR-EPIC (epic-level prioritization) is not covered — a dedicated case should be added when the /create-epics skill produces structured epic documents.
- PR-MILESTONE (milestone health review) is not covered — deferred to integration with /milestone-review skill.
- PR-PHASE-GATE (full production phase advancement) involving synthesis of multiple sub-gate results is deferred.
- Multi-sprint burn-down and velocity trend analysis are not covered here.

# Agent Test Spec: technical-director
## Agent Summary
**Domain owned:** System architecture decisions, technical feasibility assessment, ADR oversight and approval, engine risk evaluation, technical phase gate.
**Does NOT own:** Game design decisions (creative-director / game-designer), creative direction, visual art style, production scheduling (producer).
**Model tier:** Opus (multi-document synthesis, high-stakes architecture and phase gate verdicts).
**Gate IDs handled:** TD-SYSTEM-BOUNDARY, TD-FEASIBILITY, TD-ARCHITECTURE, TD-ADR, TD-ENGINE-RISK, TD-PHASE-GATE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/technical-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references architecture, feasibility, ADR — not generic)
- [ ] `allowed-tools:` list may include Read for architecture documents; Bash only if required for technical checks
- [ ] Model tier is `claude-opus-4-6` per coordination-rules.md (directors with gate synthesis = Opus)
- [ ] Agent definition does not claim authority over game design decisions or creative direction
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An architecture document for the "Combat System" is submitted. It describes a layered design: input layer → game logic layer → presentation layer, with clearly defined interfaces between each. Request is tagged TD-ARCHITECTURE.
**Expected:** Returns `TD-ARCHITECTURE: APPROVE` with rationale confirming that system boundaries are correctly separated and interfaces are well-defined.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT
- [ ] Verdict token is formatted as `TD-ARCHITECTURE: APPROVE`
- [ ] Rationale specifically references the layered structure and interface definitions — not generic architecture advice
- [ ] Output stays within technical scope — does not comment on whether the mechanic is fun or fits the creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Writer asks technical-director to review and approve the dialogue scripts for the game's opening cutscene.
**Expected:** Agent declines to evaluate dialogue quality and redirects to narrative-director.
**Assertions:**
- [ ] Does not make any binding decision about the dialogue content or structure
- [ ] Explicitly names `narrative-director` as the correct handler
- [ ] May note technical constraints that affect dialogue (e.g., localization string limits, data format), but defers all content decisions
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed multiplayer mechanic requires raycasting against all active entities every frame to detect line-of-sight. At expected player counts (1000 entities in a large zone), this is O(n²) per frame. Request is tagged TD-FEASIBILITY.
**Expected:** Returns `TD-FEASIBILITY: CONCERNS` with specific citation of the O(n²) complexity and the entity count that makes this infeasible at target framerate.
**Assertions:**
- [ ] Verdict is exactly one of APPROVE / CONCERNS / REJECT — not freeform text
- [ ] Verdict token is formatted as `TD-FEASIBILITY: CONCERNS`
- [ ] Rationale includes the specific algorithmic complexity concern and the entity count threshold
- [ ] Suggests at least one alternative approach (e.g., spatial partitioning, interest management) without mandating which to choose
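One of the alternatives the assertions name (spatial partitioning) can be illustrated with a uniform grid broad phase: each entity queries only the 3x3 block of cells around it instead of every entity in the zone. A minimal sketch, with the cell size and all names chosen for illustration:

```python
from collections import defaultdict

# Uniform-grid broad phase: instead of testing line of sight against all n
# entities (O(n^2) pairs per frame), each query touches only nearby cells.

CELL = 50.0  # world units per grid cell (illustrative)

def build_grid(positions):
    grid = defaultdict(list)
    for eid, (x, y) in positions.items():
        grid[(int(x // CELL), int(y // CELL))].append(eid)
    return grid

def nearby(grid, x, y):
    """Candidate entities in the 3x3 cell block around (x, y)."""
    cx, cy = int(x // CELL), int(y // CELL)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            out.extend(grid[(cx + dx, cy + dy)])
    return out

# 1000 entities spaced along a line; a query prunes the candidate set
# from 1000 down to the handful within one cell of the query point.
positions = {i: (i * 10.0, 0.0) for i in range(1000)}
grid = build_grid(positions)
candidates = nearby(grid, 500.0, 0.0)
```

The point of the sketch is the pruning, not the exact structure: interest management, BVHs, or the engine's own physics queries would satisfy the same assertion.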
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants to add a real-time physics simulation for every inventory item (hundreds of items on screen simultaneously). technical-director assesses this as technically expensive and proposes simplifying the simulation. game-designer disagrees, arguing it is essential to the game feel.
**Expected:** technical-director clearly states the technical cost and constraints, proposes alternative implementation approaches that could achieve a similar feel, but explicitly defers the final design priority decision to creative-director as the arbiter of player experience trade-offs.
**Assertions:**
- [ ] Expresses the technical concern with specifics (e.g., performance budget, estimated cost)
- [ ] Proposes at least one alternative that could reduce cost while preserving intent
- [ ] Explicitly defers the "is this worth the cost" decision to creative-director — does not unilaterally cut the feature
- [ ] Does not claim authority to override game-designer's design intent
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the target platform constraints: mobile, 60fps target, 2GB RAM ceiling, no compute shaders. A proposed architecture includes a GPU-driven rendering pipeline.
**Expected:** Assessment references the specific hardware constraints from the context, identifies the compute shader dependency as incompatible with the stated platform constraints, and returns a CONCERNS or REJECT verdict with those specifics cited.
**Assertions:**
- [ ] References the specific platform constraints provided (mobile, 2GB RAM, no compute shaders)
- [ ] Does not give generic performance advice disconnected from the supplied constraints
- [ ] Correctly identifies the architectural component that conflicts with the platform constraint
- [ ] Verdict includes rationale tied to the provided context, not boilerplate warnings
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVE / CONCERNS / REJECT vocabulary only
- [ ] Stays within declared technical domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `TD-FEASIBILITY: CONCERNS`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
---
## Coverage Notes
- TD-ADR (Architecture Decision Record approval) is not covered — a dedicated case should be added when the /architecture-decision skill produces ADR documents.
- TD-ENGINE-RISK assessment for specific engine versions (e.g., Godot 4.6 post-cutoff APIs) is not covered — deferred to engine-specialist integration tests.
- TD-PHASE-GATE (full technical phase advancement) involving synthesis of multiple sub-gate results is deferred.
- Multi-domain architecture reviews (e.g., touching both TD-ARCHITECTURE and TD-ENGINE-RISK simultaneously) are not covered here.

# Agent Test Spec: godot-csharp-specialist
## Agent Summary
**Domain owned:** C# patterns in Godot 4, .NET idioms applied to Godot, [Export] attribute usage, signal delegates, and async/await patterns.
**Does NOT own:** GDScript code (gdscript-specialist), GDExtension C/C++ bindings (gdextension-specialist).
**Model tier:** Sonnet (default).
**Gate IDs handled:** None.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references C# in Godot 4 / .NET patterns / signal delegates)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over GDScript or GDExtension code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an export property for enemy health with validation that clamps it between 1 and 1000."
**Expected behavior:**
- Produces a C# property with `[Export]` attribute
- Uses a backing field with a property getter/setter that clamps the value in the setter
- Does NOT use a raw `[Export]` public field without validation
- Follows Godot 4 C# naming conventions (PascalCase for properties, private fields with an underscore prefix)
- Includes XML doc comment on the property per coding standards
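A sketch of the shape of output Case 1 expects — the class name, node base, and default value are illustrative, not part of the spec:

```csharp
using Godot;

public partial class Enemy : CharacterBody2D
{
    private int _health = 100; // private backing field, underscore prefix

    /// <summary>Enemy health, clamped to the inclusive range [1, 1000].</summary>
    [Export]
    public int Health
    {
        get => _health;
        set => _health = Mathf.Clamp(value, 1, 1000); // validate in the setter
    }
}
```

Exporting the property (not the backing field) is what keeps inspector-assigned values inside the clamp.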
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Rewrite this enemy health system in GDScript."
**Expected behavior:**
- Does NOT produce GDScript code
- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
- Redirects the request to `godot-gdscript-specialist`
- May describe the C# API surface (method names, parameter types) as a handoff spec so the gdscript-specialist knows the expected shape
### Case 3: Async signal awaiting
**Input:** "Wait for an animation to finish before transitioning game state using C# async."
**Expected behavior:**
- Produces a proper `async Task` pattern using `ToSignal()` to await a Godot signal
- Uses `await ToSignal(animationPlayer, AnimationPlayer.SignalName.AnimationFinished)`
- Does NOT use `Thread.Sleep()` or `Task.Delay()` as a polling substitute
- Notes that the calling method must be `async` and that fire-and-forget `async void` is only acceptable for event handlers
- Handles cancellation or timeout if the animation could fail to fire
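A sketch of the awaiting pattern Case 3 describes — class, method, and animation names are illustrative, and the timeout guard the last assertion asks for is noted rather than shown:

```csharp
using Godot;
using System.Threading.Tasks;

public partial class DeathSequence : Node
{
    [Export] private AnimationPlayer _animationPlayer;

    // Await the AnimationFinished signal; no Thread.Sleep, no polling.
    private async Task TransitionAfterAnimationAsync()
    {
        _animationPlayer.Play("death");
        await ToSignal(_animationPlayer, AnimationPlayer.SignalName.AnimationFinished);
        // Execution resumes here on the main thread once the signal fires.
        GD.Print("animation finished; transitioning state");
    }
}
```

A version that satisfies the final assertion would also race this await against a timeout (e.g. `Task.WhenAny` with a delay task) so a state transition cannot hang on a signal that never fires.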
### Case 4: Threading model conflict
**Input:** "This C# code accesses a Godot Node from a background Task thread to update its position."
**Expected behavior:**
- Flags this as a race condition risk: Godot nodes are not thread-safe and must only be accessed from the main thread
- Does NOT approve or implement the multi-threaded node access pattern
- Provides the correct pattern: use `CallDeferred()`, `Callable.From().CallDeferred()`, or marshal back to the main thread via a thread-safe queue
- Explains the distinction between Godot's main thread requirement and .NET's thread-agnostic types
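The correct pattern the case asks for can be sketched as follows — the class and helper names are illustrative, and only the marshalling back to the main thread is the point:

```csharp
using Godot;
using System.Threading.Tasks;

public partial class Mover : Node2D
{
    // WRONG (race condition): Task.Run(() => Position = target);
    // Godot nodes must only be touched from the main thread.

    // Correct: do thread-safe work off the main thread, then defer the
    // node mutation back onto it.
    public void MoveWhenComputed(Vector2 target)
    {
        Task.Run(() =>
        {
            Vector2 result = ExpensivePathCompute(target); // pure computation only
            Callable.From(() => Position = result).CallDeferred();
        });
    }

    private static Vector2 ExpensivePathCompute(Vector2 target) => target;
}
```

`CallDeferred()` queues the write to run on the main thread at the end of the current frame, which is exactly the marshalling behavior the assertions require.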
### Case 5: Context pass — Godot 4.6 API correctness
**Input:** Engine version context: Godot 4.6. Request: "Connect a signal using the new typed signal delegate pattern."
**Expected behavior:**
- Produces C# signal connection using the typed delegate pattern introduced in Godot 4 C# (`+=` operator on typed signal)
- Checks the 4.6 context to confirm no breaking changes to the signal delegate API in 4.4, 4.5, or 4.6
- Does NOT use the old string-based `Connect("signal_name", callable)` pattern (discouraged in Godot 4 C# in favor of typed delegates)
- Produces code compatible with the project's pinned 4.6 version as documented in VERSION.md
---
## Protocol Compliance
- [ ] Stays within declared domain (C# in Godot 4 — patterns, exports, signals, async)
- [ ] Redirects GDScript requests to godot-gdscript-specialist
- [ ] Redirects GDExtension requests to godot-gdextension-specialist
- [ ] Returns C# code following Godot 4 conventions (not Unity MonoBehaviour patterns)
- [ ] Flags multi-threaded Godot node access as unsafe and provides the correct pattern
- [ ] Uses typed signal delegates — not legacy string-based Connect() calls
- [ ] Checks engine version reference for API changes before producing code
---
## Coverage Notes
- Export property with validation (Case 1) should have a unit test verifying the clamp behavior
- Threading conflict (Case 4) is safety-critical: the agent must identify and fix this without prompting
- Async signal (Case 3) verifies the agent applies .NET idioms correctly within Godot's single-thread constraint

# Agent Test Spec: godot-gdextension-specialist
## Agent Summary
**Domain owned:** GDExtension API, godot-cpp C++ bindings, godot-rust bindings, native library integration, and native performance optimization.
**Does NOT own:** GDScript code (gdscript-specialist), shader code (godot-shader-specialist).
**Model tier:** Sonnet (default).
**Gate IDs handled:** None.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GDExtension / godot-cpp / native bindings)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over GDScript or shader authoring
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Expose a C++ rigid-body physics simulation library to GDScript via GDExtension."
**Expected behavior:**
- Produces a GDExtension binding pattern using godot-cpp:
- Class inheriting from `godot::Object` or an appropriate Godot base class
- `GDCLASS` macro registration
- `_bind_methods()` implementation exposing the physics API to GDScript
- `GDExtension` entry point (`gdextension_init`) setup
- Notes the `.gdextension` manifest file format required
- Does NOT produce the GDScript usage code (that belongs to gdscript-specialist)
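The binding pattern the bullets describe can be sketched with godot-cpp — the class and method names are illustrative, and the entry-point function and `.gdextension` manifest are deliberately omitted here:

```cpp
// physics_sim.h — minimal godot-cpp binding sketch (illustrative names).
#include <godot_cpp/classes/ref_counted.hpp>
#include <godot_cpp/core/class_db.hpp>

namespace godot {

class PhysicsSim : public RefCounted {
    GDCLASS(PhysicsSim, RefCounted)  // registers the type with Godot

protected:
    static void _bind_methods() {
        // Exposed to GDScript as `sim.step(delta)`.
        ClassDB::bind_method(D_METHOD("step", "delta"), &PhysicsSim::step);
    }

public:
    void step(double delta) { /* advance the native simulation */ }
};

} // namespace godot
```

Everything the GDScript side can see flows through `_bind_methods()`, which is why the handoff spec in Case 2 is just this method list.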
### Case 2: Out-of-domain redirect
**Input:** "Write the GDScript that calls the physics simulation from Case 1."
**Expected behavior:**
- Does NOT produce GDScript code
- Explicitly states that GDScript authoring belongs to `godot-gdscript-specialist`
- Redirects to `godot-gdscript-specialist`
- May describe the API surface the GDScript should call (method names, parameter types) as a handoff spec
### Case 3: ABI compatibility risk — minor version update
**Input:** "We're upgrading from Godot 4.5 to 4.6. Will our existing GDExtension still work?"
**Expected behavior:**
- Flags the ABI compatibility concern: GDExtension binaries may not be ABI-compatible across minor versions
- Directs to check the 4.5→4.6 migration guide for GDExtension API changes
- Recommends recompiling the extension against the 4.6 godot-cpp headers rather than assuming binary compatibility
- Notes that the `.gdextension` manifest may need a `compatibility_minimum` version update
- Provides the recompilation checklist
### Case 4: Memory management — RAII for Godot objects
**Input:** "How should we manage the lifecycle of Godot objects created inside C++ GDExtension code?"
**Expected behavior:**
- Produces the RAII-based lifecycle pattern for Godot objects in GDExtension:
- `Ref<T>` for reference-counted objects (auto-released when Ref goes out of scope)
- `memnew()` / `memdelete()` for non-reference-counted objects
- Warning: do NOT use `new`/`delete` for Godot objects — undefined behavior
- Notes object ownership rules: who is responsible for freeing a node added to the scene tree
- Provides a concrete example managing a `CollisionShape3D` created in C++
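A sketch of the expected lifecycle pattern, assuming godot-cpp; the surrounding function and shape choice are illustrative:

```cpp
// Illustrative Godot object lifecycle in GDExtension C++; verify in-engine.
#include <godot_cpp/classes/box_shape3d.hpp>
#include <godot_cpp/classes/collision_shape3d.hpp>
#include <godot_cpp/classes/node.hpp>

void setup_collision(godot::Node *parent) {
    using namespace godot;

    // Shape3D resources are RefCounted: hold them in Ref<T>, never raw pointers.
    Ref<BoxShape3D> box;
    box.instantiate();  // released automatically when the last Ref goes out of scope

    // Nodes are not reference-counted: allocate with memnew, never plain `new`.
    CollisionShape3D *shape = memnew(CollisionShape3D);
    shape->set_shape(box);

    // Once added to the tree, the parent owns the node and frees it on exit.
    parent->add_child(shape);

    // A node that is never added to the tree must be freed manually:
    // memdelete(shape);
}
```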
### Case 5: Context pass — Godot 4.6 GDExtension API check
**Input:** Engine version context: Godot 4.6 (upgrading from 4.5). Request: "Check if any GDExtension APIs changed from 4.5 to 4.6."
**Expected behavior:**
- References the 4.5→4.6 migration guide from the VERSION.md verified sources list
- Reports on any documented GDExtension API changes in the 4.6 release
- If no breaking changes are documented for GDExtension in 4.6, states that explicitly with the caveat to verify against the official changelog
- Flags the D3D12 default on Windows (4.6 change) as potentially relevant for GDExtension rendering code
- Provides a checklist of what to verify after upgrading
---
## Protocol Compliance
- [ ] Stays within declared domain (GDExtension, godot-cpp, godot-rust, native bindings)
- [ ] Redirects GDScript authoring to godot-gdscript-specialist
- [ ] Redirects shader authoring to godot-shader-specialist
- [ ] Returns structured output (binding patterns, RAII examples, ABI checklists)
- [ ] Flags ABI compatibility risks on minor version upgrades — never assumes binary compatibility
- [ ] Uses Godot-specific memory management (`memnew`/`memdelete`, `Ref<T>`) not raw C++ new/delete
- [ ] Checks engine version reference for GDExtension API changes before confirming compatibility
---
## Coverage Notes
- Binding pattern (Case 1) should include a smoke test verifying the extension loads and the method is callable from GDScript
- ABI risk (Case 3) is a critical escalation path — the agent must not approve shipping an unverified extension binary
- Memory management (Case 4) verifies the agent applies Godot-specific patterns, not generic C++ RAII

# Agent Test Spec: godot-gdscript-specialist
## Agent Summary
Domain: GDScript static typing, design patterns in GDScript, signal architecture, coroutine/await patterns, and GDScript performance.
Does NOT own: shader code (godot-shader-specialist), GDExtension bindings (godot-gdextension-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GDScript / static typing / signals / coroutines)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over shader code or GDExtension
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review this GDScript file for type annotation coverage."
**Expected behavior:**
- Reads the provided GDScript file
- Flags every variable, parameter, and return type that is missing a static type annotation
- Produces a list of specific line-by-line findings: `var speed = 5.0``var speed: float = 5.0`
- Notes the performance and tooling benefits of static typing in Godot 4
- Does NOT rewrite the entire file unprompted — produces a findings list for the developer to apply
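The findings format implied here, shown as a before/after pair (identifiers are hypothetical):

```gdscript
# Before: untyped, misses compile-time checks and typed-instruction speedups.
var speed = 5.0
func apply(delta):
    return speed * delta

# After: fully annotated.
var speed: float = 5.0
func apply(delta: float) -> float:
    return speed * delta
```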
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Write a vertex shader to distort the mesh in world space."
**Expected behavior:**
- Does NOT produce shader code in GDScript or in Godot's shading language
- Explicitly states that shader authoring belongs to `godot-shader-specialist`
- Redirects the request to `godot-shader-specialist`
- May note that the GDScript side (passing uniforms to a shader, setting shader parameters) is within its domain
### Case 3: Async loading with coroutines
**Input:** "Load a scene asynchronously and wait for it to finish before spawning it."
**Expected behavior:**
- Produces an `await` + `ResourceLoader.load_threaded_request` pattern for Godot 4
- Uses static typing throughout (`var scene: PackedScene`)
- Handles the completion check with `ResourceLoader.load_threaded_get_status()`
- Notes error handling for failed loads
- Does NOT use deprecated Godot 3 `yield()` syntax
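A sketch of the Godot 4 pattern this case expects; the scene path and error handling are illustrative:

```gdscript
# Illustrative threaded scene load; the level path is hypothetical.
const LEVEL_PATH := "res://levels/level_01.tscn"

func spawn_level() -> void:
    ResourceLoader.load_threaded_request(LEVEL_PATH)
    # Poll the background load, yielding one frame at a time.
    while ResourceLoader.load_threaded_get_status(LEVEL_PATH) == ResourceLoader.THREAD_LOAD_IN_PROGRESS:
        await get_tree().process_frame
    if ResourceLoader.load_threaded_get_status(LEVEL_PATH) != ResourceLoader.THREAD_LOAD_LOADED:
        push_error("Failed to load level: %s" % LEVEL_PATH)
        return
    var scene: PackedScene = ResourceLoader.load_threaded_get(LEVEL_PATH)
    add_child(scene.instantiate())
```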
### Case 4: Performance issue — typed array recommendation
**Input:** "The entity update loop is slow; it iterates an untyped Array of 1,000 nodes every frame."
**Expected behavior:**
- Identifies that an untyped `Array` foregoes compiler optimization in GDScript
- Recommends converting to a typed array (`Array[Node]` or the specific type) so GDScript can use its optimized typed instructions
- Notes that if typed arrays are still insufficient, the hot path should be escalated with a C# migration recommendation
- Produces the typed array refactor as the immediate fix
- Does NOT recommend migrating the entire codebase to C# without profiling evidence
### Case 5: Context pass — Godot 4.6 with post-cutoff features
**Input:** Engine version context provided: Godot 4.6. Request: "Create an abstract base class for all enemy types using @abstract."
**Expected behavior:**
- Identifies `@abstract` as a Godot 4.5+ feature (post-cutoff)
- Notes this in the output: feature introduced in 4.5, verified against VERSION.md migration notes
- Produces the GDScript class using `@abstract` with correct syntax as documented in migration notes
- Marks the output as requiring verification against the official 4.5 release notes due to post-cutoff status
- Uses static typing for all method signatures in the abstract class
---
## Protocol Compliance
- [ ] Stays within declared domain (GDScript — typing, patterns, signals, coroutines, performance)
- [ ] Redirects shader requests to godot-shader-specialist
- [ ] Redirects GDExtension requests to godot-gdextension-specialist
- [ ] Returns structured GDScript output with full static typing
- [ ] Uses Godot 4 API only — no deprecated Godot 3 patterns (yield, connect with strings, etc.)
- [ ] Flags post-cutoff features (4.4, 4.5, 4.6) and marks them as requiring doc verification
---
## Coverage Notes
- Type annotation review (Case 1) output is suitable as a code review checklist
- Async loading (Case 3) should produce testable code verifiable with a unit test in `tests/unit/`
- Post-cutoff @abstract (Case 5) confirms the agent flags version uncertainty rather than silently using unverified APIs

# Agent Test Spec: godot-shader-specialist
## Agent Summary
Domain: Godot shading language (GLSL-derivative), visual shaders (VisualShader graph), material setup, particle shaders, and post-processing effects.
Does NOT own: gameplay code, art style direction.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Godot shading language / materials / post-processing)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative source for Godot shader API changes
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Write a dissolve effect shader for enemy death in Godot."
**Expected behavior:**
- Produces valid Godot shading language code (not HLSL, not GLSL directly)
- Uses `shader_type spatial;` or `canvas_item` as appropriate
- Defines `uniform float dissolve_amount : hint_range(0.0, 1.0);`
- Samples a noise texture to determine per-pixel dissolve threshold
- Uses `discard;` for pixels below the threshold
- Optionally adds an edge glow using emission near the dissolve boundary
- Code is syntactically correct for Godot's shading language
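A sketch of the expected shader in Godot shading language; uniform names and defaults are assumptions:

```glsl
// Illustrative dissolve shader (Godot shading language, not raw GLSL).
shader_type spatial;

uniform float dissolve_amount : hint_range(0.0, 1.0) = 0.0;
uniform sampler2D noise_texture;
uniform vec4 edge_color : source_color = vec4(1.0, 0.5, 0.0, 1.0);
uniform float edge_width : hint_range(0.0, 0.2) = 0.05;

void fragment() {
    float noise = texture(noise_texture, UV).r;
    if (noise < dissolve_amount) {
        discard;  // this pixel has fully dissolved
    }
    // Emissive glow on the band just above the dissolve threshold.
    if (noise < dissolve_amount + edge_width) {
        EMISSION = edge_color.rgb;
    }
}
```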
### Case 2: HLSL redirect
**Input:** "Write an HLSL compute shader for this dissolve effect."
**Expected behavior:**
- Does NOT produce HLSL code
- Clearly states: "Godot does not use HLSL directly; it uses its own shading language (a GLSL derivative)"
- Translates the HLSL intent to the equivalent Godot shader approach
- Notes that RenderingDevice compute shaders exist in Godot 4, but flags them as a low-level API and confirms whether compute was the actual intent
### Case 3: Post-cutoff API change — texture sampling (Godot 4.4)
**Input:** "Use `texture()` with a sampler2D to sample the noise texture in the shader."
**Expected behavior:**
- Checks the version reference: Godot 4.4 changed texture sampler type declarations
- Flags the potential API change: `sampler2D` syntax and `texture()` call behavior may differ from pre-4.4
- Provides the correct syntax for the project's pinned version (4.6) as documented in migration notes
- Does NOT use pre-4.4 texture sampling syntax without flagging the version risk
### Case 4: Fragment shader LOD strategy
**Input:** "The fragment shader for the water surface has 8 texture samples and is causing GPU bottlenecks on mid-range hardware."
**Expected behavior:**
- Identifies the per-fragment texture sample count as the primary cost driver
- Proposes an LOD strategy:
- Reduce sample count at distance (distance-based shader variant or LOD level)
- Pre-bake some texture combinations offline
- Use lower-resolution noise textures for distant samples
- Provides the shader code modification implementing the LOD approach
- Does NOT change gameplay behavior of the water system
### Case 5: Context pass — Godot 4.6 glow rework
**Input:** Engine version context: Godot 4.6. Request: "Add a bloom/glow post-processing effect to the scene."
**Expected behavior:**
- References the VERSION.md note: Godot 4.6 includes a glow rework
- Produces glow configuration guidance using the 4.6 WorldEnvironment approach, not the pre-4.6 API
- Explicitly notes which properties or parameters changed in the 4.6 glow rework
- Flags any properties that the LLM's training data may have incorrect information about due to the post-cutoff timing
---
## Protocol Compliance
- [ ] Stays within declared domain (Godot shading language, materials, VFX shaders, post-processing)
- [ ] Redirects gameplay code requests to gameplay-programmer
- [ ] Produces valid Godot shading language — never HLSL or raw GLSL without a Godot wrapper
- [ ] Checks engine version reference for post-cutoff shader API changes (4.4 texture types, 4.6 glow rework)
- [ ] Returns structured output (shader code with uniforms documented, LOD strategies with performance rationale)
- [ ] Flags any post-cutoff API usage as requiring verification
---
## Coverage Notes
- Dissolve shader (Case 1) should be paired with a visual test screenshot in `production/qa/evidence/`
- Texture API flag (Case 3) confirms the agent checks VERSION.md before using APIs that changed post-4.3
- Glow rework (Case 5) is a Godot 4.6-specific test — verifies the agent applies the most recent migration notes

# Agent Test Spec: godot-specialist
## Agent Summary
Domain: Godot-specific patterns, node/scene architecture, signals, resources, and GDScript vs C# vs GDExtension decisions.
Does NOT own: actual code authoring in a specific language (delegates to language sub-specialists).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Godot architecture / node patterns / engine decisions)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition references `docs/engine-reference/godot/VERSION.md` as the authoritative API source
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "When should I use signals vs. direct method calls in Godot?"
**Expected behavior:**
- Produces a pattern decision guide with rationale:
- Signals: decoupled communication, parent-to-child ignorance, event-driven UI updates, one-to-many notification
- Direct calls: tightly-coupled systems where the caller needs a return value, or performance-critical hot paths
- Provides concrete examples of each pattern in the project's context
- Does NOT produce raw code for both patterns — refers to godot-gdscript-specialist or godot-csharp-specialist for implementation
- Notes the "call down, signal up" convention: a parent may call child methods directly, but a child communicates upward only by emitting signals
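A minimal sketch of the two patterns side by side; node, signal, and method names are hypothetical:

```gdscript
# Child (health_component.gd): announces events, knows nothing about listeners.
signal died

var health: int = 30

func take_damage(amount: int) -> void:
    health -= amount
    if health <= 0:
        died.emit()

# Parent (enemy.gd): calls down directly, listens upward via the signal.
func _ready() -> void:
    $HealthComponent.died.connect(_on_health_died)
    $HealthComponent.take_damage(10)  # direct call with a known target

func _on_health_died() -> void:
    queue_free()
```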
### Case 2: Wrong-engine redirect
**Input:** "Write a MonoBehaviour that runs on Start() and subscribes to a UnityEvent."
**Expected behavior:**
- Does NOT produce Unity MonoBehaviour code
- Clearly identifies that this is a Unity pattern, not a Godot pattern
- Provides the Godot equivalent: a Node script using `_ready()` instead of `Start()`, and Godot signals instead of UnityEvent
- Confirms the project is Godot-based and redirects the conceptual mapping
### Case 3: Post-cutoff API risk
**Input:** "Use the new Godot 4.5 @abstract annotation to define an abstract base class."
**Expected behavior:**
- Identifies that `@abstract` is a post-cutoff feature (introduced in Godot 4.5, after LLM knowledge cutoff)
- Flags the version risk: LLM knowledge of this annotation may be incomplete or incorrect
- Directs the user to verify against `docs/engine-reference/godot/VERSION.md` and the official 4.5 migration guide
- Provides best-effort guidance based on the migration notes in the version reference while clearly marking it as unverified
### Case 4: Language selection for a hot path
**Input:** "The physics query loop runs every frame for 500 objects. Should we use GDScript or C# for this?"
**Expected behavior:**
- Provides a balanced analysis:
- GDScript: simpler, team familiar, but slower for tight loops
- C#: faster for CPU-intensive loops, requires .NET runtime, team needs C# knowledge
- Does NOT make the final decision unilaterally
- Defers the decision to `lead-programmer` with the analysis as input
- Notes that GDExtension (C++) is a third option for extreme performance cases and recommends escalating if C# is insufficient
### Case 5: Context pass — engine version 4.6
**Input:** Engine version context provided: Godot 4.6, Jolt as default physics. Request: "Set up a RigidBody3D for the player character."
**Expected behavior:**
- Reads the 4.6 context and applies the Jolt-default knowledge (from VERSION.md migration notes)
- Recommends RigidBody3D configuration choices that are Jolt-compatible (e.g., notes any GodotPhysics-specific settings that behave differently under Jolt)
- References the 4.6 migration note about Jolt becoming default rather than relying on LLM training data alone
- Flags any RigidBody3D properties that changed behavior between GodotPhysics and Jolt
---
## Protocol Compliance
- [ ] Stays within declared domain (Godot architecture decisions, node/scene patterns, language selection)
- [ ] Redirects language-specific implementation to godot-gdscript-specialist or godot-csharp-specialist
- [ ] Returns structured findings (decision trees, pattern recommendations with rationale)
- [ ] Treats `docs/engine-reference/godot/VERSION.md` as authoritative over LLM training data
- [ ] Flags post-cutoff API usage (4.4, 4.5, 4.6) with verification requirements
- [ ] Defers language-selection decisions to lead-programmer when trade-offs exist
---
## Coverage Notes
- Signal vs. direct call guide (Case 1) should be written to `docs/architecture/` as a reusable pattern doc
- Post-cutoff flag (Case 3) confirms the agent does not confidently use APIs it cannot verify
- Engine version case (Case 5) verifies the agent applies migration notes from the version reference, not assumptions

# Agent Test Spec: unity-addressables-specialist
## Agent Summary
Domain: Addressable Asset System — groups, async loading/unloading, handle lifecycle management, memory budgeting, content catalogs, and remote content delivery.
Does NOT own: rendering systems (engine-programmer), game logic that uses the loaded assets (gameplay-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Addressables / asset loading / content catalogs / remote delivery)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over rendering systems or gameplay using the loaded assets
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Load a character texture asynchronously and release it when the character is destroyed."
**Expected behavior:**
- Produces the `Addressables.LoadAssetAsync<Texture2D>()` call pattern
- Stores the returned `AsyncOperationHandle<Texture2D>` in the requesting object
- On character destruction (`OnDestroy()`), calls `Addressables.Release(handle)` with the stored handle
- Does NOT use `Resources.Load()` as the loading mechanism
- Notes that releasing with a null or uninitialized handle causes errors — includes a validity check
- Notes the difference between releasing the handle vs. releasing the asset (handle release is correct)
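A sketch of the load/release pairing this case checks for; the addressable key and component shape are assumptions:

```csharp
// Illustrative handle-lifecycle pattern; names are hypothetical.
using UnityEngine;
using UnityEngine.AddressableAssets;
using UnityEngine.ResourceManagement.AsyncOperations;

public class CharacterSkin : MonoBehaviour
{
    [SerializeField] private string textureKey = "characters/hero_skin";
    private AsyncOperationHandle<Texture2D> _handle;

    private async void Start()
    {
        _handle = Addressables.LoadAssetAsync<Texture2D>(textureKey);
        Texture2D tex = await _handle.Task;
        // apply tex to the character's material here
    }

    private void OnDestroy()
    {
        // Releasing an invalid handle throws, so guard first.
        if (_handle.IsValid())
            Addressables.Release(_handle);
    }
}
```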
### Case 2: Out-of-domain redirect
**Input:** "Implement the rendering system that applies the loaded texture to the character mesh."
**Expected behavior:**
- Does NOT produce rendering or mesh material assignment code
- Explicitly states that rendering system implementation belongs to `engine-programmer`
- Redirects the request to `engine-programmer`
- May describe the asset type and API surface it will provide (e.g., `Texture2D` reference once the handle completes) as a handoff spec
### Case 3: Memory leak — un-released handle
**Input:** "Memory usage keeps climbing after each level load. We use Addressables to load level assets."
**Expected behavior:**
- Diagnoses the likely cause: `AsyncOperationHandle` objects not being released after use
- Identifies the handle leak pattern: loading assets into a local variable, losing reference, never calling `Addressables.Release()`
- Produces an auditing approach: search for all `LoadAssetAsync` / `LoadSceneAsync` calls and verify matching `Release()` calls
- Provides a corrected pattern using a tracked handle list (`List<AsyncOperationHandle>`) with a `ReleaseAll()` cleanup method
- Does NOT assume the leak is elsewhere without evidence
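The tracked-handle pattern described above, sketched; the class name is hypothetical:

```csharp
// Illustrative leak-proof loader: every load is recorded, so none can leak.
using System.Collections.Generic;
using UnityEngine.AddressableAssets;
using UnityEngine.ResourceManagement.AsyncOperations;

public sealed class LevelAssetTracker
{
    private readonly List<AsyncOperationHandle> _handles = new();

    public AsyncOperationHandle<T> Load<T>(string key)
    {
        var handle = Addressables.LoadAssetAsync<T>(key);
        _handles.Add(handle);  // implicit conversion to the untyped handle
        return handle;
    }

    // Call on level unload: releases everything loaded through this tracker.
    public void ReleaseAll()
    {
        foreach (var handle in _handles)
            if (handle.IsValid())
                Addressables.Release(handle);
        _handles.Clear();
    }
}
```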
### Case 4: Remote content delivery — catalog versioning
**Input:** "We need to support downloadable content updates without requiring a full app re-install."
**Expected behavior:**
- Produces the remote catalog update pattern:
- `Addressables.CheckForCatalogUpdates()` on startup
- `Addressables.UpdateCatalogs()` for detected updates
- `Addressables.DownloadDependenciesAsync()` to pre-warm the updated content
- Notes catalog hash checking for change detection
- Addresses the mid-session edge case: if the catalog updates while a session is in progress, defines the behavior explicitly (complete the current session on the old catalog, reload on next launch)
- Does NOT design the server-side CDN infrastructure (defers to devops-engineer)
### Case 5: Context pass — platform memory constraints
**Input:** Platform context: Nintendo Switch target, 4GB RAM, practical asset memory ceiling 512MB. Request: "Design the Addressables loading strategy for a large open-world level."
**Expected behavior:**
- References the 512MB memory ceiling from the provided context
- Designs a streaming strategy:
- Divide the world into addressable zones loaded/unloaded based on player proximity
- Defines a memory budget per active zone (e.g., 128MB, max 4 zones active)
- Specifies async pre-load trigger distance and unload distance (hysteresis)
- Notes Switch-specific constraints: slower load times from SD card, recommend pre-warming adjacent zones
- Does NOT produce a loading strategy that would exceed the stated 512MB ceiling without flagging it
---
## Protocol Compliance
- [ ] Stays within declared domain (Addressables loading, handle lifecycle, memory, catalogs, remote delivery)
- [ ] Redirects rendering and gameplay asset-use code to engine-programmer and gameplay-programmer
- [ ] Returns structured output (loading patterns, handle lifecycle code, streaming zone designs)
- [ ] Always pairs `LoadAssetAsync` with a corresponding `Release()` — flags handle leaks as a memory bug
- [ ] Designs loading strategies against provided memory ceilings
- [ ] Does not design CDN/server infrastructure — defers to devops-engineer for server side
---
## Coverage Notes
- Handle lifecycle (Case 1) must include a test verifying memory is reclaimed after release
- Handle leak diagnosis (Case 3) should produce a findings report suitable for a bug ticket
- Platform memory case (Case 5) verifies the agent applies hard constraints from context, not default assumptions

# Agent Test Spec: unity-dots-specialist
## Agent Summary
Domain: ECS architecture (IComponentData, ISystem, SystemAPI), Jobs system (IJob, IJobEntity, Burst), Burst compiler constraints, DOTS gameplay systems, and hybrid renderer.
Does NOT own: MonoBehaviour gameplay code (gameplay-programmer), UI implementation (unity-ui-specialist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references ECS / Jobs / Burst / IComponentData)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over MonoBehaviour gameplay or UI systems
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Convert the player movement system to ECS."
**Expected behavior:**
- Produces:
- `PlayerMovementData : IComponentData` struct with velocity, speed, and input vector fields
- `PlayerMovementSystem : ISystem` with `OnUpdate()` using `SystemAPI.Query<>` or `IJobEntity`
- Bakes the player's initial state from an authoring MonoBehaviour via `IBaker`
- Uses `RefRW<LocalTransform>` for position updates (not deprecated `Translation`)
- Marks the job `[BurstCompile]` and notes what must be unmanaged for Burst compatibility
- Does NOT modify the input polling system — reads from an existing `PlayerInputData` component
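A compressed sketch of the expected shape, assuming Entities 1.x; field names are illustrative, and the input-polling and baking pieces are omitted:

```csharp
// Illustrative ECS movement system; names and fields are hypothetical.
using Unity.Burst;
using Unity.Entities;
using Unity.Mathematics;
using Unity.Transforms;

public struct PlayerMovementData : IComponentData
{
    public float Speed;
    public float3 InputVector;  // written each frame by the existing input system
}

[BurstCompile]
public partial struct PlayerMovementSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        float dt = SystemAPI.Time.DeltaTime;
        foreach (var (transform, move) in
                 SystemAPI.Query<RefRW<LocalTransform>, RefRO<PlayerMovementData>>())
        {
            transform.ValueRW.Position += move.ValueRO.InputVector * move.ValueRO.Speed * dt;
        }
    }
}
```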
### Case 2: MonoBehaviour push-back
**Input:** "Just use MonoBehaviour for the player movement — it's simpler."
**Expected behavior:**
- Acknowledges the simplicity argument
- Explains the DOTS trade-off: more setup upfront, but the ECS/Burst approach provides the performance characteristics documented in the project's ADR or requirements
- Does NOT implement a MonoBehaviour version if the project has committed to DOTS
- If no commitment exists, flags the architecture decision to `lead-programmer` / `technical-director` for resolution
- Does not make the MonoBehaviour vs. DOTS decision unilaterally
### Case 3: Burst-incompatible managed memory
**Input:** "This Burst job accesses a `List<EnemyData>` to find the nearest enemy."
**Expected behavior:**
- Flags `List<T>` as a managed type that is incompatible with Burst compilation
- Does NOT approve the Burst job with managed memory access
- Provides the correct replacement: `NativeArray<EnemyData>`, `NativeList<EnemyData>`, or `NativeHashMap<>` depending on the use case
- Notes that `NativeArray` must be disposed explicitly or via `[DeallocateOnJobCompletion]`
- Produces the corrected job using unmanaged native containers
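A sketch of the corrected job using native containers; the data layout and job type are assumptions:

```csharp
// Illustrative Burst-safe replacement for the managed List<EnemyData>.
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

public struct EnemyData
{
    public float3 Position;
}

[BurstCompile]
public struct NearestEnemyJob : IJob
{
    [ReadOnly] public NativeArray<EnemyData> Enemies;  // unmanaged: Burst-compatible
    public float3 Origin;
    public NativeReference<int> NearestIndex;

    public void Execute()
    {
        float best = float.MaxValue;
        for (int i = 0; i < Enemies.Length; i++)
        {
            float d = math.distancesq(Origin, Enemies[i].Position);
            if (d < best) { best = d; NearestIndex.Value = i; }
        }
    }
}
// The caller allocates both containers with Allocator.TempJob
// and must Dispose() them after the job completes.
```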
### Case 4: Hybrid access — DOTS system needs MonoBehaviour data
**Input:** "The DOTS movement system needs to read the camera transform managed by a MonoBehaviour CameraController."
**Expected behavior:**
- Identifies this as a hybrid access scenario
- Provides the correct hybrid pattern: store the camera transform in a singleton `IComponentData` (updated from the MonoBehaviour side each frame via `EntityManager.SetComponentData`)
- Alternatively suggests the `CompanionComponent` / managed component approach
- Does NOT access the MonoBehaviour from inside a Burst job — flags that as unsafe
- Provides the bridge code on both the MonoBehaviour side (writing to ECS) and the DOTS system side (reading from ECS)
### Case 5: Context pass — performance targets
**Input:** Technical preferences from context: 60fps target, max 2ms CPU script budget per frame. Request: "Design the ECS chunk layout for 10,000 enemy entities."
**Expected behavior:**
- References the 2ms CPU budget explicitly in the design rationale
- Designs the `IComponentData` chunk layout for cache efficiency:
  - Groups components that are frequently queried together into the same archetype
- Separates rarely-used data into separate components to keep hot data compact
- Estimates entity iteration time against the 2ms budget
- Provides memory layout analysis (bytes per entity, entities per chunk at 16KB chunk size)
- Does NOT design a layout that will obviously exceed the stated 2ms budget without flagging it
---
## Protocol Compliance
- [ ] Stays within declared domain (ECS, Jobs, Burst, DOTS gameplay systems)
- [ ] Redirects MonoBehaviour-only gameplay to gameplay-programmer
- [ ] Returns structured output (IComponentData structs, ISystem implementations, IBaker authoring classes)
- [ ] Flags managed memory access in Burst jobs as a compile error and provides unmanaged alternatives
- [ ] Provides hybrid access patterns when DOTS systems need to interact with MonoBehaviour systems
- [ ] Designs chunk layouts against provided performance budgets
---
## Coverage Notes
- ECS conversion (Case 1) must include a unit test using the ECS test framework (`World`, `EntityManager`)
- Burst incompatibility (Case 3) is safety-critical — the agent must catch this before the code is written
- Chunk layout (Case 5) verifies the agent applies quantitative performance reasoning to architecture decisions

# Agent Test Spec: unity-shader-specialist
## Agent Summary
Domain: Unity Shader Graph, custom HLSL, VFX Graph, URP/HDRP pipeline customization, and post-processing effects.
Does NOT own: gameplay code, art style direction.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Shader Graph / HLSL / VFX Graph / URP / HDRP)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay code or art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an outline effect for characters using Shader Graph in URP."
**Expected behavior:**
- Produces a Shader Graph node setup description:
- Inverted hull method: Scale Normal → Vertex offset in vertex stage, Cull Front
- OR screen-space post-process outline using depth/normal edge detection
- Recommends the appropriate method for the render pipeline in use (inverted hull for URP compatibility, screen-space post-process for HDRP)
- Notes URP limitations: no geometry shader support (rules out geometry-shader outline approach)
- Does NOT produce HDRP-specific nodes without confirming the render pipeline
### Case 2: Out-of-domain redirect
**Input:** "Implement the character health bar UI in code."
**Expected behavior:**
- Does NOT produce UI implementation code
- Explicitly states that UI implementation belongs to `ui-programmer` (or `unity-ui-specialist`)
- Redirects the request appropriately
- May note that a shader-based fill effect for a health bar (e.g., a dissolve/fill gradient) is within its domain if the visual effect itself is shader-driven
### Case 3: HDRP custom pass for outline
**Input:** "We're on HDRP and want the outline as a post-process effect."
**Expected behavior:**
- Produces the HDRP `CustomPassVolume` pattern:
- C# class inheriting `CustomPass`
- `Execute()` method using `CoreUtils.SetRenderTarget()` and a full-screen shader blit
- Depth/normal buffer sampling for edge detection
- Notes that CustomPass requires HDRP package and does not work in URP
- Confirms the project is on HDRP before providing HDRP-specific code
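A skeleton of the HDRP pattern this case expects; the pass name and outline material are hypothetical:

```csharp
// Illustrative HDRP CustomPass skeleton; verify against the project's HDRP version.
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.HighDefinition;

class OutlinePass : CustomPass
{
    public Material outlineMaterial;  // full-screen edge-detection shader

    protected override void Execute(CustomPassContext ctx)
    {
        if (outlineMaterial == null)
            return;
        // Blit a full-screen quad that samples depth/normals for edge detection.
        CoreUtils.SetRenderTarget(ctx.cmd, ctx.cameraColorBuffer, ClearFlag.None);
        CoreUtils.DrawFullScreen(ctx.cmd, outlineMaterial);
    }
}
```

The pass is added to a `CustomPassVolume` component in the scene.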
### Case 4: VFX Graph performance — GPU event batching
**Input:** "The explosion VFX Graph has 10,000 particles per event and spawning 20 simultaneous explosions is causing GPU frame spikes."
**Expected behavior:**
- Identifies GPU particle spawn as the cost driver (200,000 simultaneous particles)
- Proposes GPU event batching: spawn events deferred over multiple frames, stagger initialization
- Recommends a particle budget cap per active explosion (e.g., 3,000 per explosion, queue excess)
- Notes the VFX Graph Event Batcher pattern and Output Event API for cross-frame distribution
- Does NOT change the gameplay event system — proposes a VFX-side budgeting solution
### Case 5: Context pass — render pipeline (URP or HDRP)
**Input:** Project context: URP render pipeline, Unity 2022.3. Request: "Add depth of field post-processing."
**Expected behavior:**
- Uses URP Volume framework: `DepthOfField` Volume Override component
- Does NOT use HDRP Volume components (e.g., HDRP's `DepthOfField` with different parameter names)
- Notes URP-specific DOF limitations vs HDRP (e.g., Bokeh quality differences)
- Produces C# Volume profile setup code compatible with the URP package version that ships with Unity 2022.3
---
## Protocol Compliance
- [ ] Stays within declared domain (Shader Graph, HLSL, VFX Graph, URP/HDRP customization)
- [ ] Redirects gameplay and UI code to appropriate agents
- [ ] Returns structured output (node graph descriptions, HLSL code, CustomPass patterns)
- [ ] Distinguishes between URP and HDRP approaches — never cross-contaminates pipeline-specific APIs
- [ ] Flags geometry shader approaches as URP-incompatible when relevant
- [ ] Produces VFX optimizations that do not change gameplay behavior
---
## Coverage Notes
- Outline effect (Case 1) should be paired with a visual screenshot test in `production/qa/evidence/`
- HDRP CustomPass (Case 3) confirms the agent produces the correct Unity pattern, not a generic post-process approach
- Pipeline separation (Case 5) verifies the agent never assumes the render pipeline without context

# Agent Test Spec: unity-specialist
## Agent Summary
Domain: Unity-specific architecture patterns, MonoBehaviour vs DOTS decisions, and subsystem selection (Addressables, New Input System, UI Toolkit, Cinemachine, etc.).
Does NOT own: language-specific deep dives (delegates to unity-dots-specialist, unity-ui-specialist, etc.).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Unity patterns / MonoBehaviour / subsystem decisions)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition acknowledges the sub-specialist routing table (DOTS, UI, Shader, Addressables)
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Should I use MonoBehaviour or ScriptableObject for storing enemy configuration data?"
**Expected behavior:**
- Produces a pattern decision tree covering:
- MonoBehaviour: for runtime behavior, needs to be attached to a GameObject, has Update() lifecycle
- ScriptableObject: for pure data/configuration, exists as an asset, shared across instances, no scene dependency
- Recommends ScriptableObject for enemy configuration data (stateless, reusable, designer-friendly)
- Notes that MonoBehaviour can reference the ScriptableObject for runtime use
- Provides a concrete example of what the ScriptableObject class definition looks like (does not produce full code — refers to engine-programmer or gameplay-programmer for implementation)
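The class-definition shape the case has in mind, sketched with hypothetical fields:

```csharp
// Illustrative ScriptableObject config asset plus the MonoBehaviour that uses it.
using UnityEngine;

[CreateAssetMenu(fileName = "EnemyConfig", menuName = "Config/Enemy")]
public class EnemyConfig : ScriptableObject
{
    public float maxHealth = 100f;
    public float moveSpeed = 3.5f;
    public GameObject projectilePrefab;
}

// The MonoBehaviour references the shared asset rather than duplicating the data:
public class Enemy : MonoBehaviour
{
    [SerializeField] private EnemyConfig config;
    private float _health;

    private void Awake() => _health = config.maxHealth;
}
```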
### Case 2: Wrong-engine redirect
**Input:** "Set up a Node scene tree with signals for this enemy system."
**Expected behavior:**
- Does NOT produce Godot Node/signal code
- Identifies this as a Godot pattern
- States that in Unity the equivalent is GameObject hierarchy + UnityEvent or C# events
- Maps the concepts: Godot Node → Unity GameObject with MonoBehaviour components, Godot Signal → C# event / UnityEvent
- Confirms the project is Unity-based before proceeding
### Case 3: Unity version API flag
**Input:** "Use the new Unity 6 GPU resident drawer for batch rendering."
**Expected behavior:**
- Identifies the Unity 6 feature (GPU Resident Drawer)
- Flags that this API may not be available in earlier Unity versions
- Asks for or checks the project's Unity version before providing implementation guidance
- Directs to verify against official Unity 6 documentation
- Does NOT assume the project is on Unity 6 without confirmation
### Case 4: DOTS vs. MonoBehaviour conflict
**Input:** "The combat system uses MonoBehaviour for state management, but we want to add a DOTS-based projectile system. Can they coexist?"
**Expected behavior:**
- Recognizes this as a hybrid architecture scenario
- Explains the hybrid approach: MonoBehaviour can interface with DOTS via SystemAPI, IComponentData, and managed components
- Notes the performance and complexity trade-offs of mixing the two patterns
- Recommends escalating the architecture decision to `lead-programmer` or `technical-director`
- Defers to `unity-dots-specialist` for the DOTS-side implementation details
### Case 5: Context pass — Unity version
**Input:** Project context provided: Unity 2023.3 LTS. Request: "Configure the new Input System for this project."
**Expected behavior:**
- Applies Unity 2023.3 LTS context: uses the New Input System (com.unity.inputsystem) package
- Does NOT produce legacy Input Manager code (`Input.GetKeyDown()`, `Input.GetAxis()`)
- Notes any 2023.3-specific Input System behaviors or package version constraints
- References the project version to confirm Burst/Jobs compatibility if the Input System interacts with DOTS
---
## Protocol Compliance
- [ ] Stays within declared domain (Unity architecture decisions, pattern selection, subsystem routing)
- [ ] Redirects Godot patterns to appropriate Godot specialists or flags them as wrong-engine
- [ ] Redirects DOTS implementation to unity-dots-specialist
- [ ] Redirects UI implementation to unity-ui-specialist
- [ ] Flags Unity version-gated APIs and requires version confirmation before suggesting them
- [ ] Returns structured pattern decision guides, not freeform opinions
---
## Coverage Notes
- MonoBehaviour vs. ScriptableObject (Case 1) should be documented as an ADR if it results in a project-level decision
- Version flag (Case 3) confirms the agent does not assume the latest Unity version without context
- DOTS hybrid (Case 4) verifies the agent escalates architecture conflicts rather than resolving them unilaterally

# Agent Test Spec: unity-ui-specialist
## Agent Summary
Domain: Unity UI Toolkit (UXML/USS), UGUI (Canvas), data binding, runtime UI performance, and UI input event handling.
Does NOT own: UX flow design (ux-designer), visual art style (art-director).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UI Toolkit / UGUI / Canvas / data binding)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow design or visual art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement an inventory UI screen using Unity UI Toolkit."
**Expected behavior:**
- Produces a UXML document defining the inventory panel structure (ListView, item templates, detail panel)
- Produces USS styles for the inventory layout and item states (default, hover, selected)
- Provides C# code binding the inventory data model to the UI via `INotifyValueChanged` or `IBindable`
- Uses `ListView` with `makeItem` / `bindItem` callbacks for the scrollable item list
- Does NOT produce the UX flow design — implements from a provided spec
### Case 2: Out-of-domain redirect
**Input:** "Design the UX flow for the inventory — what happens when the player equips vs. drops an item."
**Expected behavior:**
- Does NOT produce UX flow design
- Explicitly states that interaction flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- Notes it will implement whatever flow the ux-designer specifies
### Case 3: UI Toolkit data binding for dynamic list
**Input:** "The inventory list needs to update in real time as items are added or removed from the player's bag."
**Expected behavior:**
- Produces the `ListView` pattern with a bound `ObservableList<T>` or event-driven refresh approach
- Uses `ListView.Rebuild()` or `ListView.RefreshItems()` on the backing collection change event
- Notes the performance considerations for large lists (virtualization via `makeItem`/`bindItem` pattern)
- Does NOT use per-element `Q()`/`Query()` lookups to update individual elements as a list refresh strategy — flags that as a performance antipattern
### Case 4: Canvas performance — overdraw
**Input:** "The main menu canvas is causing GPU overdraw warnings; there are many overlapping panels."
**Expected behavior:**
- Identifies overdraw causes: multiple stacked canvases, full-screen overlay panels not culled when inactive
- Recommends:
- Separate canvases for world-space, screen-space-overlay, and screen-space-camera layers
- Disable/deactivate panels instead of setting alpha to 0 (invisible alpha-0 panels still draw)
- Canvas Group + alpha for fade effects, not individual Image alpha
- Notes UI Toolkit alternative if the project is in a migration position
### Case 5: Context pass — Unity version
**Input:** Project context: Unity 2022.3 LTS. Request: "Implement the settings panel with data binding."
**Expected behavior:**
- Uses UI Toolkit with manual binding (`RegisterValueChangedCallback`, explicit refresh on data change) as appropriate for 2022.3 LTS
- Notes that the UI Toolkit runtime data binding system ships in later releases (Unity 2023.2 / Unity 6) and is not available in 2022.3 LTS
- Does NOT use the Unity 6 enhanced binding API features, since they are not available in 2022.3
- Produces code compatible with the stated Unity version, with version-specific API notes
---
## Protocol Compliance
- [ ] Stays within declared domain (UI Toolkit, UGUI, data binding, UI performance)
- [ ] Redirects UX flow design to ux-designer
- [ ] Returns structured output (UXML, USS, C# binding code)
- [ ] Uses the correct Unity UI framework version for the project's Unity version
- [ ] Flags Canvas overdraw as a performance antipattern and provides specific remediation
- [ ] Does not use alpha-0 as a hide/show pattern — uses SetActive() or VisualElement.style.display
---
## Coverage Notes
- Inventory UI (Case 1) should have a manual walkthrough doc in `production/qa/evidence/`
- Dynamic list binding (Case 3) should have an integration test or automated interaction test
- Canvas overdraw (Case 4) verifies the agent knows the correct Unity UI performance patterns

# Agent Test Spec: ue-blueprint-specialist
## Agent Summary
- **Domain**: Blueprint architecture, the Blueprint/C++ boundary, Blueprint graph quality, Blueprint performance optimization, Blueprint Function Library design
- **Does NOT own**: C++ implementation (engine-programmer or gameplay-programmer), art assets or shaders, UI/UX flow design (ux-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers to unreal-specialist or lead-programmer for cross-domain rulings
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Blueprint architecture and optimization)
- [ ] `allowed-tools:` list matches the agent's role (Read for Blueprint project files; no server or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over C++ implementation decisions
---
## Test Cases
### Case 1: In-domain request — Blueprint graph performance review
**Input**: "Review our AI behavior Blueprint. It has tick-based logic running every frame that checks line-of-sight for 30 NPCs simultaneously."
**Expected behavior**:
- Identifies tick-heavy logic as a performance problem
- Recommends switching from EventTick to event-driven patterns (perception system events, timers, or polling on a reduced interval)
- Flags the per-NPC cost of simultaneous line-of-sight checks
- Suggests alternatives: AIPerception component events, staggered tick groups, or moving the system to C++ if Blueprint overhead is measured to be significant
- Output is structured: problem identified, impact estimated, alternatives listed
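The staggered alternative in the bullets above can be sketched engine-free: instead of all 30 NPCs running line-of-sight checks every tick, a round-robin scheduler gives each frame a fixed budget. This is a plain C++ stand-in, not Unreal API — the class and its parameters are illustrative.

```cpp
#include <cassert>
#include <vector>

// Round-robin scheduler: each frame, only `budget` NPCs run their
// expensive check; the cursor wraps so every NPC is visited in turn.
class StaggeredScheduler {
public:
    StaggeredScheduler(int npcCount, int budget)
        : count_(npcCount), budget_(budget) {}

    // Returns the NPC indices to update this frame and advances the cursor.
    std::vector<int> NextBatch() {
        std::vector<int> batch;
        for (int i = 0; i < budget_ && i < count_; ++i) {
            batch.push_back(cursor_);
            cursor_ = (cursor_ + 1) % count_;
        }
        return batch;
    }

private:
    int count_;
    int budget_;
    int cursor_ = 0;
};
```

With 30 NPCs and a budget of 5, each NPC is re-checked every 6 frames (about 100 ms at 60 fps) instead of every frame — usually imperceptible for AI perception, and a 6x cut in per-frame cost.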
### Case 2: Out-of-domain request — C++ implementation
**Input**: "Write the C++ implementation for this ability cooldown system."
**Expected behavior**:
- Does not produce C++ implementation code
- Provides the Blueprint equivalent of the cooldown logic (e.g., a timer via Set Timer by Event, or a Cooldown GameplayEffect if GAS is in use)
- States clearly: "C++ implementation is handled by engine-programmer or gameplay-programmer; I can show the Blueprint approach or describe the boundary where Blueprint calls into C++"
- Optionally notes when the cooldown complexity warrants a C++ backend
### Case 3: Domain boundary — unsafe raw pointer access in Blueprint
**Input**: "Our Blueprint calls GetOwner() and then immediately accesses a component on the result without checking if it's valid."
**Expected behavior**:
- Flags this as a runtime crash risk: GetOwner() can return null in some lifecycle states
- Provides the correct Blueprint pattern: IsValid() node before any property/component access
- Notes that Blueprint's null checks are not optional on Actor-derived references
- Does NOT silently fix the code without explaining why the original was unsafe
### Case 4: Blueprint graph complexity — readiness for Function Library refactor
**Input**: "Our main GameMode Blueprint has 600+ nodes in a single graph with duplicated damage calculation logic in 8 places."
**Expected behavior**:
- Diagnoses this as a maintainability and testability problem
- Recommends extracting duplicated logic into a Blueprint Function Library (BFL)
- Describes how to structure the BFL: pure functions for calculations, static calls from any Blueprint
- Notes that if the damage logic is performance-sensitive or shared with C++, it may be a candidate for migration to unreal-specialist review
- Output is a concrete refactor plan, not a vague recommendation
### Case 5: Context pass — Blueprint complexity budget
**Input context**: Project conventions specify a maximum of 100 nodes per Blueprint event graph before a mandatory Function Library extraction.
**Input**: "Here is our inventory Blueprint graph [150 nodes shown]. Is it ready to ship?"
**Expected behavior**:
- References the stated 150-node count against the 100-node budget from project conventions
- Flags the graph as exceeding the complexity threshold
- Does NOT approve it as-is
- Produces a list of candidate subgraphs for Function Library extraction to bring the main graph within budget
---
## Protocol Compliance
- [ ] Stays within declared domain (Blueprint architecture, performance, graph quality)
- [ ] Redirects C++ implementation requests to engine-programmer or gameplay-programmer
- [ ] Returns structured findings (problem/impact/alternatives format) rather than freeform opinions
- [ ] Enforces Blueprint safety patterns (null checks, IsValid) proactively
- [ ] References project conventions when evaluating graph complexity
---
## Coverage Notes
- Case 3 (null pointer safety) is a safety-critical test — this is a common source of shipping crashes
- Case 5 requires that project conventions include a stated node budget; if none is configured, the agent should note the absence and recommend setting one
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-gas-specialist
## Agent Summary
- **Domain**: Gameplay Ability System (GAS) — abilities (UGameplayAbility), gameplay effects (UGameplayEffect), attribute sets (UAttributeSet), gameplay tags, ability tasks (UAbilityTask), ability specs (FGameplayAbilitySpec), GAS prediction and latency compensation
- **Does NOT own**: UI display of ability state (ue-umg-specialist), net replication of GAS data beyond built-in GAS prediction (ue-replication-specialist), art or VFX for ability feedback (vfx-artist)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers cross-domain calls to the appropriate specialist
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references GAS, abilities, GameplayEffects, AttributeSets)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for GAS source files; no deployment or server tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UI implementation or low-level net serialization
---
## Test Cases
### Case 1: In-domain request — dash ability with cooldown
**Input**: "Implement a dash ability that moves the player forward 500 units and has a 1.5 second cooldown."
**Expected behavior**:
- Produces a GAS AbilitySpec structure or outline: UGameplayAbility subclass with ActivateAbility logic, an AbilityTask for movement (e.g., AbilityTask_ApplyRootMotionMoveToForce or custom root motion), and a UGameplayEffect for the cooldown
- Cooldown GameplayEffect uses Duration policy with the 1.5s duration and a GameplayTag to block re-activation
- Tags clearly named following a hierarchy convention (e.g., Ability.Dash, Cooldown.Ability.Dash)
- Output includes both the ability class outline and the GameplayEffect definition
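The tag-blocked activation rule described above reduces to a small invariant: the ability cannot activate while its cooldown tag is owned, and a successful activation applies that tag (the duration GameplayEffect would remove it after 1.5 s). A minimal engine-free sketch — `TagOwner` stands in for the AbilitySystemComponent's owned-tags container, not real GAS types:

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-in for an ability system component's owned-tags set.
// A real UGameplayAbility expresses this via ActivationBlockedTags.
struct TagOwner {
    std::set<std::string> ownedTags;
    bool HasTag(const std::string& tag) const {
        return ownedTags.count(tag) > 0;
    }
};

// Dash activates only if its cooldown tag is absent, and applies the
// tag on success; the cooldown GE's expiry would erase it later.
bool TryActivateDash(TagOwner& owner) {
    const std::string cooldownTag = "Cooldown.Ability.Dash";
    if (owner.HasTag(cooldownTag)) return false;  // blocked by cooldown
    owner.ownedTags.insert(cooldownTag);          // cooldown GE applied
    return true;
}
```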
### Case 2: Out-of-domain request — GAS state replication
**Input**: "How do I replicate the player's ability cooldown state to all clients so the UI updates correctly?"
**Expected behavior**:
- Clarifies that GAS has built-in replication for AbilitySpecs and GameplayEffects via the AbilitySystemComponent's replication mode
- Explains the three ASC replication modes (Full, Mixed, Minimal) and when to use each
- For custom replication needs beyond GAS built-ins, explicitly states: "For custom net serialization of GAS data, coordinate with ue-replication-specialist"
- Does NOT attempt to write custom replication code outside GAS's own systems without flagging the domain boundary
### Case 3: Domain boundary — incorrect GameplayTag hierarchy
**Input**: "We have an ability that applies a tag called 'Stunned' and another that checks for 'Status.Stunned'. They're not matching."
**Expected behavior**:
- Identifies the root cause: tag names must be exact or use hierarchical matching via TagContainer queries
- Flags the naming inconsistency: 'Stunned' is a root-level tag; 'Status.Stunned' is a child tag under 'Status' — these are different tags
- Recommends a project tag naming convention: all status effects under Status.*, all abilities under Ability.*
- Provides the fix: either rename the applied tag to 'Status.Stunned' or update the query to match 'Stunned'
- Notes where tag definitions should live (DefaultGameplayTags.ini or a DataTable)
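The root cause in this case comes down to hierarchical match semantics: a tag matches a query if they are equal, or if the query is a dotted-prefix parent of the tag. GAS implements this in `FGameplayTag::MatchesTag`; the engine-free logic can be sketched in a few lines:

```cpp
#include <cassert>
#include <string>

// Hierarchical tag match: "Status.Stunned" matches "Status" (parent)
// and "Status.Stunned" (exact), but NOT the unrelated root tag "Stunned".
bool MatchesTag(const std::string& tag, const std::string& query) {
    if (tag == query) return true;
    // Parent match: query must be a prefix ending at a '.' boundary.
    return tag.size() > query.size()
        && tag.compare(0, query.size(), query) == 0
        && tag[query.size()] == '.';
}
```

This is exactly why 'Stunned' and 'Status.Stunned' never match each other: neither is a prefix-parent of the other.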
### Case 4: Conflict — attribute set conflict between two abilities
**Input**: "Our Shield ability and our Armor ability both modify a 'DefenseValue' attribute. They're stacking in ways that aren't intended — after both are active, defense goes well above maximum."
**Expected behavior**:
- Identifies this as a GameplayEffect stacking and magnitude calculation problem
- Proposes a resolution using Execution Calculations (UGameplayEffectExecutionCalculation) or Modifier Aggregators to cap the combined result
- Alternatively recommends using Gameplay Effect Stacking policies (Aggregate, None) to prevent unintended additive stacking
- Produces a concrete resolution: either an Execution Calculation class outline or a change to the Modifier Op (Override instead of Additive for the cap)
- Does NOT propose removing one of the abilities as the solution
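The capped-aggregation fix is, at bottom, arithmetic: sum all modifiers in one place and clamp the final result, so application order stops mattering. A sketch of what an Execution Calculation's math would do, with illustrative values (not real GAS classes):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Aggregates DefenseValue modifiers and clamps the final result in one
// step, instead of letting Shield and Armor apply additively with no cap.
double AggregateDefense(double base, const std::vector<double>& modifiers,
                        double maxDefense) {
    double total = base;
    for (double m : modifiers) total += m;
    return std::min(total, maxDefense);  // single clamp at the end
}
```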
### Case 5: Context pass — designing against an existing attribute set
**Input context**: Project has an existing AttributeSet with attributes: Health, MaxHealth, Stamina, MaxStamina, Defense, AttackPower.
**Input**: "Design a Berserker ability that increases AttackPower by 50% when Health drops below 30%."
**Expected behavior**:
- Uses the existing Health, MaxHealth, and AttackPower attributes — does NOT invent new attributes
- Designs a Passive GameplayAbility (or triggered Effect) that fires on Health change, checks Health/MaxHealth ratio via a GameplayEffectExecutionCalculation or Attribute-Based magnitude
- Uses a Gameplay Cue or Gameplay Tag to track the Berserker active state
- References the actual attribute names from the provided AttributeSet (AttackPower, not "Damage" or "Strength")
---
## Protocol Compliance
- [ ] Stays within declared domain (GAS: abilities, effects, attributes, tags, ability tasks)
- [ ] Redirects custom replication requests to ue-replication-specialist with clear explanation of boundary
- [ ] Returns structured findings (ability outline + GameplayEffect definition) rather than vague descriptions
- [ ] Enforces tag hierarchy naming conventions proactively
- [ ] Uses only attributes and tags present in the provided context; does not invent new ones without noting it
---
## Coverage Notes
- Case 3 (tag hierarchy) is a frequent source of subtle bugs; test whenever tag naming conventions change
- Case 4 requires knowledge of GAS stacking policies — verify this case if the GAS integration depth changes
- Case 5 is the most important context-awareness test; failing it means the agent ignores project state
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-replication-specialist
## Agent Summary
- **Domain**: Property replication (UPROPERTY Replicated/ReplicatedUsing), RPCs (Server/Client/NetMulticast), client prediction and reconciliation, net relevancy and always-relevant settings, net serialization (FArchive/NetSerialize), bandwidth optimization and replication frequency tuning
- **Does NOT own**: Gameplay logic being replicated (gameplay-programmer), server infrastructure and hosting (devops-engineer), GAS-specific prediction (ue-gas-specialist handles GAS net prediction)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates security-relevant replication concerns to lead-programmer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references replication, RPCs, client prediction, bandwidth)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for C++ and Blueprint source files; no infrastructure or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over server infrastructure, game server architecture, or gameplay logic correctness
---
## Test Cases
### Case 1: In-domain request — replicated player health with client prediction
**Input**: "Set up replicated player health that clients can predict locally (e.g., when taking self-inflicted damage) and have corrected by the server."
**Expected behavior**:
- Produces a UPROPERTY(ReplicatedUsing=OnRep_Health) declaration in the appropriate Character or AttributeSet class
- Describes the OnRep_Health function: apply visual/audio feedback, reconcile predicted value with server-authoritative value
- Explains the client prediction pattern: local client applies tentative damage immediately, server authoritative value arrives via OnRep and corrects any discrepancy
- Notes that if GAS is in use, the built-in GAS prediction handles this — recommend coordinating with ue-gas-specialist
- Output is a concrete code structure (property declaration + OnRep outline), not a conceptual description only
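The predict-then-correct flow above reduces to two operations: the owning client applies tentative damage immediately, and the OnRep callback overwrites the predicted value with whatever the server says. An engine-free sketch (a real implementation hangs `OnRep_Health` off `UPROPERTY(ReplicatedUsing=OnRep_Health)`):

```cpp
#include <cassert>

// Minimal stand-in for client-predicted health.
struct PredictedHealth {
    float predicted = 100.f;  // what the local client displays

    // Client-side: apply damage tentatively, before server confirmation.
    void PredictDamage(float dmg) { predicted -= dmg; }

    // OnRep: the server-authoritative value arrives and wins
    // unconditionally, correcting any misprediction.
    void OnRepHealth(float serverValue) { predicted = serverValue; }
};
```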
### Case 2: Out-of-domain request — game server architecture
**Input**: "Design our game server infrastructure — how many dedicated servers we need, regional deployment, and matchmaking architecture."
**Expected behavior**:
- Does not produce server infrastructure architecture, hosting recommendations, or matchmaking design
- States clearly: "Server infrastructure and deployment architecture is owned by devops-engineer; I handle the Unreal replication layer within a running game session"
- Does not conflate in-game replication with server hosting concerns
### Case 3: Domain boundary — RPC without server authority validation
**Input**: "We have a Server RPC called ServerSpendCurrency that deducts in-game currency. The client calls it and the server just deducts without checking anything."
**Expected behavior**:
- Flags this as a critical security vulnerability: unvalidated server RPCs are exploitable by cheaters sending arbitrary RPC calls
- Provides the required fix: server-side validation before the deduct — check that the player actually has the currency, verify the transaction is valid, reject and log if not
- Uses the pattern: a `UFUNCTION(Server, Reliable, WithValidation)` declaration whose `_Validate` implementation rejects malformed calls, plus explicit server-side state validation before any mutation
- Notes this should be reviewed by lead-programmer given the economy implications
- Does NOT produce the "fixed" code without explaining why the original was dangerous
### Case 4: Bandwidth optimization — high-frequency movement replication
**Input**: "Our player movement is replicated using a Vector3 position every tick. With 32 players, we're exceeding our bandwidth budget."
**Expected behavior**:
- Identifies tick-rate replication of full-precision Vector3 as bandwidth-expensive
- Proposes quantized replication: use FVector_NetQuantize or FVector_NetQuantize100 instead of raw FVector to reduce bytes per update
- Recommends reducing replication frequency via SetNetUpdateFrequency() for non-owning clients
- Notes that Unreal's built-in Character Movement Component already has optimized movement replication — recommends using or extending it rather than rolling a custom system
- Produces a concrete bandwidth estimate comparison if possible, or explains the tradeoff
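The trade-off is worth making concrete with arithmetic. Assuming a raw FVector costs 12 bytes (3 × 32-bit floats) and a quantized vector roughly half that — assumptions for illustration, not measured Unreal wire sizes — the per-client inbound cost is a simple product:

```cpp
#include <cassert>

// Bytes/sec one client receives for position updates from `others`
// observed players, at a given per-vector size and update rate.
int PositionBandwidth(int bytesPerVector, int updatesPerSec, int others) {
    return bytesPerVector * updatesPerSec * others;
}
```

At 12 bytes, 60 Hz, and 31 other players that is about 21.8 KB/s for positions alone; halving the payload and the rate cuts it 4x.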
### Case 5: Context pass — designing within a network budget
**Input context**: Project network budget is 64 KB/s per player, with 32 players = 2 MB/s total server outbound. Current movement replication already uses 40 KB/s per player.
**Input**: "We want to add real-time inventory replication so all clients can see other players' equipment changes immediately."
**Expected behavior**:
- Acknowledges the existing 40 KB/s movement cost leaves only 24 KB/s for everything else per player
- Does NOT design a naive full-inventory replication approach (would exceed budget)
- Recommends a delta-only or event-driven approach: replicate only changed slots rather than the full inventory array
- Uses a `FFastArraySerializer`-based item array (delta serialization of changed entries) or per-slot `ReplicatedUsing` properties to trigger targeted updates
- Explicitly states the proposed approach's bandwidth estimate relative to the remaining 24 KB/s budget
---
## Protocol Compliance
- [ ] Stays within declared domain (property replication, RPCs, client prediction, bandwidth)
- [ ] Redirects server infrastructure requests to devops-engineer without producing infrastructure design
- [ ] Flags unvalidated server RPCs as security issues and recommends lead-programmer review
- [ ] Returns structured findings (property declarations, bandwidth estimates, optimization options) not freeform advice
- [ ] Uses project-provided bandwidth budget numbers when evaluating replication design choices
---
## Coverage Notes
- Case 3 (RPC security) is a shipping-critical test — unvalidated RPCs are a top-ten multiplayer exploit vector
- Case 5 is the most important context-awareness test; agent must use actual budget numbers, not generic advice
- Case 1 GAS branch: if GAS is configured, agent should detect it and defer to ue-gas-specialist for GAS-managed attributes
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: ue-umg-specialist
## Agent Summary
- **Domain**: UMG widget hierarchy design, data binding patterns, CommonUI input routing and action tags, widget styling (WidgetStyle assets), UI optimization (widget pooling, ListView, invalidation)
- **Does NOT own**: UX flow and screen navigation design (ux-designer), gameplay logic (gameplay-programmer), backend data sources (game code), server communication
- **Model tier**: Sonnet
- **Gate IDs**: None; defers UX flow decisions to ux-designer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UMG, widget hierarchy, CommonUI)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for UI assets and Blueprint files; no server or gameplay source tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow, navigation architecture, or gameplay data logic
---
## Test Cases
### Case 1: In-domain request — inventory widget with data binding
**Input**: "Create an inventory widget that shows a grid of item slots. Each slot should display item icon, quantity, and rarity color. It needs to update when the inventory changes."
**Expected behavior**:
- Produces a UMG widget structure: a parent WBP_Inventory containing a UniformGridPanel or TileView, with a child WBP_InventorySlot widget per item
- Describes data binding approach: either Event Dispatchers on an Inventory Component triggering a refresh, or a ListView with a UObject item data class implementing IUserObjectListEntry
- Specifies how rarity color is driven: a WidgetStyle asset or a data table lookup, not hardcoded color values
- Output includes the widget hierarchy, binding pattern, and the refresh trigger mechanism
### Case 2: Out-of-domain request — UX flow design
**Input**: "Design the full navigation flow for our inventory system — how the player opens it, transitions to character stats, and exits to the pause menu."
**Expected behavior**:
- Does not produce a navigation flow or screen transition architecture
- States clearly: "Navigation flow and screen transition design is owned by ux-designer; I can implement the UMG widget structure once the flow is defined"
- Does not make UX decisions (back button behavior, transition animations, modal vs. fullscreen) without a UX spec
### Case 3: Domain boundary — CommonUI input action mismatch
**Input**: "Our inventory widget isn't responding to the controller Back button. We're using CommonUI."
**Expected behavior**:
- Identifies the likely cause: the widget's Back input action tag does not match the project's registered CommonUI InputAction data asset
- Explains the CommonUI input routing model: widgets reference rows in the project's registered input action data table, and the owning CommonActivatableWidget routes input to them
- Provides the fix: verify that the widget's Back action tag matches the registered tag in the project's CommonUI input action data table
- Distinguishes this from a hardware input binding issue (which would be Enhanced Input territory)
### Case 4: Widget performance issue — many widget instances per frame
**Input**: "Our leaderboard widget creates 500 individual WBP_LeaderboardRow instances at once. The game hitches for 300ms when opening the leaderboard."
**Expected behavior**:
- Identifies the root cause: 500 widget instantiations in a single frame causes a construction hitch
- Recommends switching to ListView or TileView with virtualization — only visible rows are constructed
- Explains the IUserObjectListEntry interface requirement for ListView data objects
- If ListView is not appropriate, recommends pooling: pre-instantiate a fixed number of rows and recycle them with new data
- Output is a concrete recommendation with the specific UMG component to use, not a vague "optimize it"
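The pooling fallback in the last bullet can be sketched engine-free: pre-instantiate a fixed number of rows and rebind a sliding window of the 500-entry data set into them, so row construction never happens per item. `RowPool` is a plain C++ stand-in for recycled WBP_LeaderboardRow instances:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Fixed pool of "row widgets"; only `poolSize` rows ever exist, and
// scrolling rebinds them to a window of the full data set.
class RowPool {
public:
    explicit RowPool(int poolSize) : rows_(poolSize) {}

    // Bind the pool to the data window starting at `firstVisible`.
    // Rows past the end of the data are left empty rather than destroyed.
    void BindWindow(const std::vector<std::string>& data, int firstVisible) {
        for (size_t i = 0; i < rows_.size(); ++i) {
            size_t src = static_cast<size_t>(firstVisible) + i;
            rows_[i] = src < data.size() ? data[src] : "";
        }
    }

    const std::vector<std::string>& Rows() const { return rows_; }

private:
    std::vector<std::string> rows_;  // stands in for pooled row widgets
};
```

This is also the shape of what ListView's virtualization does internally, which is why the spec prefers ListView first and hand-rolled pooling only when ListView does not fit.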
### Case 5: Context pass — CommonUI setup already configured
**Input context**: Project uses CommonUI with the following registered InputAction tags: UI.Action.Confirm, UI.Action.Back, UI.Action.Pause, UI.Action.Secondary.
**Input**: "Add a 'Sort Inventory' button to the inventory widget that works with CommonUI."
**Expected behavior**:
- Uses UI.Action.Secondary (or recommends registering a new tag like UI.Action.Sort if Secondary is already allocated)
- Does NOT invent a new InputAction tag without noting that it must be registered in the CommonUI data table
- Does NOT use a non-CommonUI input binding approach (e.g., raw key press in Event Graph) when CommonUI is the established pattern
- References the provided tag list explicitly in the recommendation
---
## Protocol Compliance
- [ ] Stays within declared domain (UMG structure, data binding, CommonUI, widget performance)
- [ ] Redirects UX flow and navigation design requests to ux-designer
- [ ] Returns structured findings (widget hierarchy + binding pattern) rather than freeform opinions
- [ ] Uses existing CommonUI InputAction tags from context; does not invent new ones without flagging registration requirement
- [ ] Recommends virtualized lists (ListView/TileView) before widget pooling for large collections
---
## Coverage Notes
- Case 3 (CommonUI input routing) requires project to have CommonUI configured; test is skipped if project does not use CommonUI
- Case 4 (performance) is a high-impact failure mode — 300ms hitches are shipping-blocking; prioritize this test case
- Case 5 is the most important context-awareness test for UI pipeline consistency
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: unreal-specialist
## Agent Summary
- **Domain**: Unreal Engine patterns and architecture — Blueprint vs C++ decisions, UE subsystems (GAS, Enhanced Input, Niagara), UE project structure, plugin integration, and engine-level configuration
- **Does NOT own**: Art style and visual direction (art-director), server infrastructure and deployment (devops-engineer), UI/UX flow design (ux-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; defers gate verdicts to technical-director
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references Unreal Engine)
- [ ] `allowed-tools:` list matches the agent's role (Read, Write for UE project files; no deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority outside its declared domain (no art, no server infra)
---
## Test Cases
### Case 1: In-domain request — Blueprint vs C++ decision criteria
**Input**: "Should I implement our combo attack system in Blueprint or C++?"
**Expected behavior**:
- Provides structured decision criteria: complexity, reuse frequency, team skill, and performance requirements
- Recommends C++ for systems called every frame or shared across 5+ ability types
- Recommends Blueprint for designer-tunable values and one-off logic
- Does NOT render a final verdict without knowing project context — asks clarifying questions if context is absent
- Output is structured (criteria table or bullet list), not a freeform opinion
### Case 2: Out-of-domain request — Unity C# code
**Input**: "Write me a C# MonoBehaviour that handles player health and fires a Unity event on death."
**Expected behavior**:
- Does not produce Unity C# code
- States clearly: "This project uses Unreal Engine; the Unity equivalent would be an Actor Component in UE C++ or a Blueprint Actor Component"
- Optionally offers to provide the UE equivalent if requested
- Does not redirect to a Unity specialist (none exists in the framework)
### Case 3: Domain boundary — UE5.4 API requirement
**Input**: "I need to use the new Motion Matching API introduced in UE5.4."
**Expected behavior**:
- Flags that UE5.4 is a specific version with potentially limited LLM training coverage
- Recommends cross-referencing official Unreal docs or the project's engine-reference directory before trusting any API suggestions
- Provides best-effort API guidance with explicit uncertainty markers (e.g., "Verify this against UE5.4 release notes")
- Does NOT silently produce stale or incorrect API signatures without a caveat
### Case 4: Conflict — Blueprint spaghetti in a core system
**Input**: "Our replication logic is entirely in a deeply nested Blueprint event graph with 300+ nodes and no functions. It's becoming unmaintainable."
**Expected behavior**:
- Identifies this as a Blueprint architecture problem, not a minor style issue
- Recommends migrating core replication logic to C++ ActorComponent or GameplayAbility system
- Notes the coordination required: changes to replication architecture must involve lead-programmer
- Does NOT unilaterally declare "migrate to C++" without surfacing the scope of the refactor to the user
- Produces a concrete migration recommendation, not a vague suggestion
### Case 5: Context pass — version-appropriate API suggestions
**Input context**: Project engine-reference file states Unreal Engine 5.3.
**Input**: "How do I set up Enhanced Input actions for a new character?"
**Expected behavior**:
- Uses UE5.3-era Enhanced Input API (InputMappingContext, UEnhancedInputComponent::BindAction)
- Does NOT reference APIs introduced after UE5.3 without flagging them as potentially unavailable
- References the project's stated engine version in its response
- Provides concrete, version-anchored code or Blueprint node names
---
## Protocol Compliance
- [ ] Stays within declared domain (Unreal patterns, Blueprint/C++, UE subsystems)
- [ ] Redirects Unity or other-engine requests without producing wrong-engine code
- [ ] Returns structured findings (criteria tables, decision trees, migration plans) rather than freeform opinions
- [ ] Flags version uncertainty explicitly before producing API suggestions
- [ ] Coordinates with lead-programmer for architecture-scale refactors rather than deciding unilaterally
---
## Coverage Notes
- No automated runner exists for agent behavior tests — these are reviewed manually or via `/skill-test`
- Version-awareness (Case 3, Case 5) is the highest-risk failure mode for this agent; test regularly when engine version changes
- Case 4 integration with lead-programmer is a coordination test, not a technical correctness test
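Even without an automated runner, verdict-format assertions across these specs can be checked mechanically. A hedged sketch of a verdict-token validator; the gate names and vocabularies below are an illustrative subset, and the authoritative list lives in coordination-rules.md:

```python
import re

# Allowed verdict vocabulary per gate ID (illustrative subset only).
GATE_VOCAB = {
    "LP-FEASIBILITY": {"FEASIBLE", "CONCERNS", "INFEASIBLE"},
    "LP-CODE-REVIEW": {"APPROVED", "NEEDS CHANGES"},
    "ND-CONSISTENCY": {"CONSISTENT", "INCONSISTENT"},
}

def verdict_is_valid(line: str) -> bool:
    """True if the line is exactly '<GATE-ID>: <VERDICT>' with a verdict
    drawn from that gate's allowed vocabulary."""
    m = re.fullmatch(r"([A-Z]+(?:-[A-Z]+)+):\s+(.+)", line.strip())
    if not m:
        return False
    gate, verdict = m.group(1), m.group(2)
    return verdict in GATE_VOCAB.get(gate, set())
```

This catches both freeform-text verdicts and out-of-vocabulary tokens (e.g., `LP-FEASIBILITY: APPROVED`), which are the two failure modes the gate-verdict cases assert against.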
# Agent Test Spec: audio-director
## Agent Summary
**Domain owned:** Music direction and palette, sound design philosophy, audio implementation strategy, mix balance, audio aspects of phase gates.
**Does NOT own:** Visual design (art-director), code implementation (lead-programmer), narrative story content (narrative-director), UX interaction flows (ux-designer).
**Model tier:** Sonnet (individual system analysis — audio direction and spec review).
**Gate IDs handled:** AD-VISUAL (audio aspect of the phase gate; may be referenced as part of AD-PHASE-GATE in the audio dimension).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/audio-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references music direction, sound design, mix, audio implementation — not generic)
- [ ] `allowed-tools:` list is read-focused; no Bash unless audio asset pipeline checks are justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual design, code implementation, or narrative content
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** An audio specification document is submitted for the game's "Exploration" music layer. The spec defines a generative ambient system using layered stems that shift based on environmental density, designed to reinforce the pillar "lived-in world." The tone palette (sparse, organic, slightly melancholic) matches the established design pillars.
**Expected:** Returns `APPROVED` with rationale confirming the stem-based approach supports dynamic responsiveness and the tone palette aligns with the pillar vocabulary.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references the specific pillar ("lived-in world") and how the audio spec supports it
- [ ] Output stays within audio scope — does not comment on visual design of the environment or UI layout
- [ ] Verdict is clearly labeled with context (e.g., "Audio Spec Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks audio-director to evaluate whether the UI flow for the audio settings menu (the sequence of screens and options) is intuitive and well-organized.
**Expected:** Agent declines to evaluate UI interaction flow and redirects to ux-designer.
**Assertions:**
- [ ] Does not make any binding decision about UI flow or information architecture
- [ ] Explicitly names `ux-designer` as the correct handler
- [ ] May note audio-specific requirements for the settings menu (e.g., "must include separate master, music, and SFX sliders"), but defers flow and layout decisions to ux-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A music cue for the final boss encounter is submitted. The cue is an upbeat, major-key orchestral piece with fast tempo. The game pillars and narrative context for this encounter specify "dread, inevitability, and tragic sacrifice." The audio cue's emotional register directly contradicts the intended emotional beat.
**Expected:** Returns `NEEDS REVISION` with specific citation of the emotional mismatch: the cue's upbeat/major-key/fast-tempo characteristics versus the intended dread/inevitability/sacrifice emotional targets from the pillars and narrative context.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific musical characteristics that conflict with the emotional targets
- [ ] References the specific emotional targets from the game pillars or narrative context
- [ ] Provides actionable direction for revision (e.g., "shift to minor key, slower tempo, reduce ensemble density")
### Case 4: Conflict escalation — correct parent
**Scenario:** sound-designer proposes implementing audio occlusion using real-time raycast-based physics queries (technical approach). technical-artist argues this is too expensive and proposes a zone-based trigger system instead. Both agree the occlusion effect is desirable; the conflict is purely about implementation approach.
**Expected:** audio-director decides on the desired audio behavior (what occlusion should sound like and when it should activate), then defers the implementation approach decision to technical-artist or lead-programmer as the implementation experts. audio-director does not make the technical implementation choice.
**Assertions:**
- [ ] Defines the desired audio behavior clearly (what should the player hear and when)
- [ ] Explicitly defers the implementation approach (raycast vs. zone-trigger) to `lead-programmer` or `technical-artist`
- [ ] Does not unilaterally choose the technical implementation method
- [ ] Frames the handoff clearly: "audio-director owns what, technical lead owns how"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "emergent stories," "meaningful sacrifice," and "lived-in world." A sound design spec for ambient environmental audio is submitted.
**Expected:** Assessment evaluates the ambient audio spec against all three pillars specifically — how does the audio support (or undermine) each pillar? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the audio spec's contribution to each pillar explicitly
- [ ] Does not generate generic audio direction advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar is not supported by the current audio spec and flags it
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared audio domain
- [ ] Defers implementation approach decisions to technical leads
- [ ] Uses APPROVED / NEEDS REVISION inline rather than the gate ID prefix format used by director-tier agents, while still referencing the gate context in its verdict
- [ ] Does not make binding visual design, UX, narrative, or code implementation decisions
---
## Coverage Notes
- Mix balance review (relative levels between music, SFX, and dialogue) is not covered — a dedicated case should be added.
- Audio implementation strategy review (middleware choice, streaming approach) is not covered.
- Interaction between audio-director and the audio specialist agent (if one exists) for implementation delegation is not covered.
- Localization audio implications (VO recording direction, language-specific music timing) are not covered.
# Agent Test Spec: game-designer
## Agent Summary
**Domain owned:** Core loop design, progression systems, combat mechanics rules, economy design, player-facing rules and interactions.
**Does NOT own:** Code implementation (lead-programmer / gameplay-programmer), visual art (art-director), narrative lore and story (narrative-director — coordinates with), balance formula math (systems-designer — collaborates with).
**Model tier:** Sonnet (individual system design authoring and review).
**Gate IDs handled:** Design review verdicts on mechanic specs (no named gate ID prefix — uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/game-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references core loop, progression, combat rules, economy, player-facing design — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for GDDs and design docs; no Bash unless design tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over code implementation, visual art style, or standalone narrative lore decisions
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A mechanic spec for a "Stamina-Based Dodge" system is submitted for review. The spec defines: the player has a stamina pool (100 units), each dodge costs 25 stamina, stamina regenerates at 20 units/second when not dodging, and the dodge grants 0.3 seconds of invincibility. The core loop interaction is clearly described, rules are unambiguous, and edge cases (stamina at 0, dodge during regen) are addressed.
**Expected:** Returns `APPROVED` with rationale confirming the core loop clarity, unambiguous rules, and edge case coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale references specific design quality criteria (clear rules, edge case coverage, core loop coherence)
- [ ] Output stays within design scope — does not comment on how to implement it in code or what art assets it requires
- [ ] Verdict is clearly labeled with context (e.g., "Mechanic Spec Review: APPROVED")
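The numeric rules in this scenario are concrete enough to sanity-check directly, which is useful when constructing test fixtures for this case. A minimal sketch using the numbers from the spec (100-unit pool, 25 per dodge, 20/s regen); the tick-based integration scheme is an assumption for illustration:

```python
class StaminaDodge:
    """Stamina model from the Case 1 spec: 100 max, 25 per dodge,
    20 units/second regeneration while not dodging."""
    MAX, COST, REGEN = 100.0, 25.0, 20.0

    def __init__(self):
        self.stamina = self.MAX

    def try_dodge(self) -> bool:
        # Edge case from the spec: a dodge at insufficient stamina must fail.
        if self.stamina < self.COST:
            return False
        self.stamina -= self.COST
        return True

    def regen(self, dt: float):
        # Edge case from the spec: regeneration clamps at the pool maximum.
        self.stamina = min(self.MAX, self.stamina + self.REGEN * dt)
```

Running the model confirms the edge cases the spec claims to address: four consecutive dodges drain the pool exactly, a fifth fails, and 1.25 s of regen restores enough for one more.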
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks game-designer to write the in-world lore explanation for why the stamina system exists (e.g., the narrative reason characters have stamina limits in the game world).
**Expected:** Agent declines to write narrative/lore content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write narrative or lore content
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the design intent that the lore should support (e.g., "the stamina system should reinforce the physical realism theme"), but defers the writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A mechanic spec for "Environmental Hazard Damage" is submitted. The spec defines three hazard types (fire, acid, electricity) but does not specify what happens when a player is simultaneously affected by multiple hazard types, what happens when a hazard is applied during the invincibility window from a dodge, or what the damage frequency is (per-second, per-tick, on-enter).
**Expected:** Returns `NEEDS REVISION` with specific identification of the undefined edge cases: multi-hazard interaction, hazard-during-invincibility, and damage frequency specification.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale identifies the specific missing edge cases by name
- [ ] Does not reject the entire mechanic — identifies the specific gaps to fill
- [ ] Provides actionable guidance on what to define (not how to implement it)
### Case 4: Conflict escalation — correct parent
**Scenario:** systems-designer proposes a damage formula with 6 variables and complex scaling interactions, arguing it produces the best tuning granularity. game-designer believes the formula is too complex for players to intuit and wants a simpler 2-variable version.
**Expected:** game-designer owns the conceptual rule and player experience intention ("the damage should feel understandable to players"), but defers the formula granularity question to systems-designer. If the disagreement cannot be resolved between them (one wants complex, one wants simple), escalate to creative-director for a player experience ruling.
**Assertions:**
- [ ] Clearly states the player experience intention (intuitive damage, player agency)
- [ ] Defers formula granularity decisions to `systems-designer`
- [ ] Escalates unresolved disagreement to `creative-director` for player experience arbiter ruling
- [ ] Does not unilaterally impose a formula structure on systems-designer
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the game's three pillars: "player authorship," "consequence permanence," and "world responsiveness." A new mechanic spec for "permadeath with legacy bonuses" is submitted for review.
**Expected:** Assessment evaluates the mechanic against all three provided pillars — how does permadeath support player authorship, how do legacy bonuses express consequence permanence, and how does the world respond to a player's death? Uses the pillar vocabulary directly in the rationale.
**Assertions:**
- [ ] References all three provided pillars by name in the assessment
- [ ] Evaluates the mechanic's contribution to each pillar explicitly
- [ ] Does not generate generic game design advice — all feedback is tied to the provided pillar vocabulary
- [ ] Identifies if any pillar creates a tension with the mechanic and flags it with a specific concern
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared game design domain
- [ ] Escalates design-vs-formula conflicts to creative-director when unresolved
- [ ] Does not make binding code implementation, visual art, or standalone lore decisions
- [ ] Provides actionable design feedback, not implementation prescriptions
---
## Coverage Notes
- Economy design review (resource sinks, faucets, inflation prevention) is not covered — a dedicated case should be added.
- Progression system review (XP curves, unlock gates, player power trajectory) is not covered.
- Core loop validation across multiple interconnected systems (not just a single mechanic) is not covered — deferred to /review-all-gdds integration.
- Coordination protocol with systems-designer on formula ownership boundary could benefit from additional cases.
# Agent Test Spec: lead-programmer
## Agent Summary
**Domain owned:** Code architecture decisions, LP-FEASIBILITY gate, LP-CODE-REVIEW gate, coding standards enforcement, tech stack decisions within the approved engine.
**Does NOT own:** Game design decisions (game-designer), creative direction (creative-director), production scheduling (producer), visual art direction (art-director).
**Model tier:** Sonnet (implementation-level analysis of individual systems).
**Gate IDs handled:** LP-FEASIBILITY, LP-CODE-REVIEW.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/lead-programmer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references code architecture, feasibility, code review, coding standards — not generic)
- [ ] `allowed-tools:` list includes Read for source files; Bash may be included for static analysis or test runs; no write access outside `src/` without explicit delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over game design, creative direction, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new `CombatSystem` implementation is submitted for code review. The system uses dependency injection for all external references, has doc comments on all public APIs, follows the project's naming conventions, and includes unit tests for all public methods. Request is tagged LP-CODE-REVIEW.
**Expected:** Returns `LP-CODE-REVIEW: APPROVED` with rationale confirming dependency injection usage, doc comment coverage, naming convention compliance, and test coverage.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS CHANGES
- [ ] Verdict token is formatted as `LP-CODE-REVIEW: APPROVED`
- [ ] Rationale references specific coding standards criteria (DI, doc comments, naming, tests)
- [ ] Output stays within code quality scope — does not comment on whether the mechanic is fun or fits creative vision
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** Team member asks lead-programmer to review and approve the balance formula for player damage scaling across levels, checking whether the numbers "feel right."
**Expected:** Agent declines to evaluate design balance and redirects to systems-designer.
**Assertions:**
- [ ] Does not make any binding assessment of formula balance or game feel
- [ ] Explicitly names `systems-designer` as the correct handler
- [ ] May note code implementation concerns about the formula (e.g., integer overflow risk at max level), but defers all balance evaluation to systems-designer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A proposed pathfinding approach for enemy AI uses a brute-force nearest-neighbor search against all other entities every frame. With expected enemy counts of 200+, this is O(n²) per frame at 60fps. Request is tagged LP-FEASIBILITY.
**Expected:** Returns `LP-FEASIBILITY: INFEASIBLE` with specific citation of the O(n²) complexity, the entity count threshold, and the resulting per-frame cost against the target frame budget.
**Assertions:**
- [ ] Verdict is exactly one of FEASIBLE / CONCERNS / INFEASIBLE — not freeform text
- [ ] Verdict token is formatted as `LP-FEASIBILITY: INFEASIBLE`
- [ ] Rationale includes the specific algorithmic complexity and entity count numbers
- [ ] Suggests at least one alternative approach (e.g., spatial hashing, KD-tree) without mandating a choice
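The infeasibility claim in this scenario is easy to quantify, and a fixture for this case can carry the expected numbers. A back-of-envelope sketch; the function is illustrative, and real costs would come from profiling rather than check counts:

```python
def pairwise_checks_per_second(entities: int, fps: int = 60) -> int:
    """Brute-force nearest-neighbor: each entity scans every other
    entity each frame, i.e. O(n^2) distance checks per frame."""
    return entities * (entities - 1) * fps

# At the 200-enemy threshold from the scenario: 200 * 199 * 60 checks/s.
checks = pairwise_checks_per_second(200)
```

A correct LP-FEASIBILITY rationale should cite figures of this order (millions of checks per second) rather than a generic "this might be slow."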
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a mechanic where every NPC maintains a full simulation of needs, schedule, and memory (similar to a full life-sim AI). lead-programmer calculates this will exceed the frame budget by 3x at target NPC counts. game-designer insists the mechanic is core to the game vision.
**Expected:** lead-programmer states the specific frame budget violation with numbers, proposes alternative approaches (e.g., LOD-based simulation, simplified need model), but explicitly defers the "is this worth the cost or should the design change" decision to creative-director as the creative arbiter.
**Assertions:**
- [ ] States the specific frame budget violation (e.g., 3x over budget at N entities)
- [ ] Proposes at least one technically viable alternative
- [ ] Explicitly defers the design priority decision to `creative-director`
- [ ] Does not unilaterally cut or modify the mechanic design
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the project's frame budget: 16.67ms total per frame, with 4ms allocated to AI systems. A new AI behavior system is submitted that profiling estimates will consume 7ms per frame under normal conditions.
**Expected:** Assessment references the specific frame budget allocation from context (4ms AI budget), identifies the 7ms estimate as exceeding the allocation by 3ms, and returns CONCERNS or INFEASIBLE with those specific numbers cited.
**Assertions:**
- [ ] References the specific frame budget figures from the provided context (16.67ms total, 4ms AI allocation)
- [ ] Uses the specific 7ms estimate from the submission in the comparison
- [ ] Does not give generic "this might be slow" advice — cites concrete numbers
- [ ] Verdict rationale is traceable to the provided budget constraints
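The comparison this case demands is pure arithmetic on the context-block figures, which makes the "cites concrete numbers" assertion mechanically checkable. A minimal sketch using the stated budget (16.67 ms frame, 4 ms AI slice, 7 ms estimate); the 2x-budget cutoff between CONCERNS and INFEASIBLE is a hypothetical threshold for illustration, not a rule from coordination-rules.md:

```python
FRAME_BUDGET_MS = 16.67   # total frame budget from the gate context
AI_BUDGET_MS = 4.0        # slice allocated to AI systems
ESTIMATE_MS = 7.0         # profiled estimate for the submitted system

overrun_ms = ESTIMATE_MS - AI_BUDGET_MS  # the 3 ms figure the rationale must cite
# Hypothetical severity cutoff: more than double the allocation -> INFEASIBLE.
verdict = "INFEASIBLE" if ESTIMATE_MS > 2 * AI_BUDGET_MS else "CONCERNS"
```

With these inputs the sketch yields a 3 ms overrun and a CONCERNS-tier verdict, matching the expected outcome range for this case.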
---
## Protocol Compliance
- [ ] Returns LP-CODE-REVIEW verdicts using APPROVED / NEEDS CHANGES vocabulary only
- [ ] Returns LP-FEASIBILITY verdicts using FEASIBLE / CONCERNS / INFEASIBLE vocabulary only
- [ ] Stays within declared code architecture domain
- [ ] Defers design priority conflicts to creative-director
- [ ] Uses gate IDs in output (e.g., `LP-FEASIBILITY: INFEASIBLE`) not inline prose verdicts
- [ ] Does not make binding game design or creative direction decisions
---
## Coverage Notes
- Multi-file code review spanning several interdependent systems is not covered — deferred to integration tests.
- Tech debt assessment and prioritization are not covered here — deferred to /tech-debt skill integration.
- Coding standards document updates (adding a new forbidden pattern) are not covered.
- Interaction with qa-lead on what constitutes a testable unit (LP vs QL boundary) is not covered.
# Agent Test Spec: level-designer
## Agent Summary
**Domain owned:** Level layouts, encounter design, pacing and tension arc, environmental storytelling, spatial puzzles.
**Does NOT own:** Narrative dialogue (writer / narrative-director), visual art style (art-director), code implementation (lead-programmer / ai-programmer), enemy AI behavior logic (ai-programmer / gameplay-programmer).
**Model tier:** Sonnet (individual system analysis — level design review and encounter assessment).
**Gate IDs handled:** Level design review verdicts (uses APPROVED / REVISION NEEDED vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/level-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references level layout, encounter design, pacing, environmental storytelling — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for level design documents and GDDs; no Bash unless level tooling requires it
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative dialogue, AI behavior code, or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A level layout document for "The Flooded Tunnels" is submitted for review. The layout includes: a low-intensity exploration opening section, two mid-intensity encounters with visible escape routes, a tension-building narrow passage with environmental hazards, and a high-intensity final encounter room followed by a release/reward area. The pacing follows a classic tension-arc structure.
**Expected:** Returns `APPROVED` with rationale confirming the pacing follows the tension arc, encounters are varied in intensity, and spatial readability supports player navigation.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED
- [ ] Rationale references specific pacing arc elements (opening, escalation, climax, release)
- [ ] Output stays within level design scope — does not comment on visual art style or enemy AI code behavior
- [ ] Verdict is clearly labeled with context (e.g., "Level Design Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A team member asks level-designer to write the behavior tree code for an enemy patrol AI that navigates the level layout.
**Expected:** Agent declines to write AI behavior code and redirects to ai-programmer or gameplay-programmer.
**Assertions:**
- [ ] Does not write or specify code for AI behavior logic
- [ ] Explicitly names `ai-programmer` or `gameplay-programmer` as the correct handler
- [ ] May specify the desired patrol behavior from a level design perspective (e.g., "patrol should cover both chokepoints and create pressure in this zone"), but defers all code implementation to the programmer
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A level layout for "The Ancient Forge" is submitted. Section 3 of the level introduces a dramatically harder enemy encounter (elite enemy with new attack patterns) with no preceding tutorial moment, no environmental readability cues (no visible cover or safe zones), and no checkpoint nearby. Players are likely to die repeatedly with no clear signal of what to do differently.
**Expected:** Returns `REVISION NEEDED` with specific identification of the difficulty spike in section 3, the missing readability cue, and the absence of a nearby checkpoint to reduce frustration from repeated deaths.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / REVISION NEEDED — not freeform text
- [ ] Rationale identifies section 3 specifically as the location of the issue
- [ ] Identifies the three specific problems: difficulty spike, missing readability cue, missing checkpoint
- [ ] Provides actionable revision guidance (e.g., "add a visible safe zone, pre-encounter cue object, or reduce elite's health for first introduction")
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants higher encounter density throughout the level (more enemies in each room) to increase combat challenge. level-designer believes this density undermines the pacing arc by eliminating rest periods and making the level feel relentless without reward.
**Expected:** level-designer clearly articulates the pacing concern (eliminating rest periods removes the tension-release rhythm), acknowledges game-designer's challenge goal, and escalates to creative-director for a design arbiter ruling on whether challenge density or pacing rhythm takes precedence for this level.
**Assertions:**
- [ ] Articulates the specific pacing impact of increased encounter density
- [ ] Escalates to `creative-director` as the design arbiter
- [ ] Does not unilaterally override game-designer's challenge density request
- [ ] Frames the conflict clearly: "challenge density vs. pacing rhythm — which takes precedence here?"
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes game-feel notes specifying: "exploration sections should feel vast and lonely," "combat sections should feel urgent and claustrophobic," and "reward rooms should feel safe and visually distinct." A new level layout is submitted for review.
**Expected:** Assessment evaluates each section type (exploration, combat, reward) against the specific feel targets from the provided context. Uses the exact vocabulary from the feel notes ("vast and lonely," "urgent and claustrophobic," "safe and visually distinct") in the rationale.
**Assertions:**
- [ ] References all three feel targets from the provided context by their exact vocabulary
- [ ] Evaluates each relevant section of the submitted layout against its corresponding feel target
- [ ] Does not generate generic pacing advice — all feedback is tied to the provided feel targets
- [ ] Identifies any section where the layout conflicts with its assigned feel target
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / REVISION NEEDED vocabulary only
- [ ] Stays within declared level design domain
- [ ] Escalates challenge-density vs. pacing conflicts to creative-director
- [ ] Does not make binding narrative dialogue, AI code implementation, or visual art style decisions
- [ ] Provides actionable level design feedback with spatial specifics, not abstract design opinions
---
## Coverage Notes
- Environmental storytelling review (using spatial elements to convey narrative without dialogue) could benefit from a dedicated case.
- Spatial puzzle design review is not covered — a dedicated case should be added when puzzle mechanics are defined.
- Multi-level pacing review (arc across an entire act or world map) is not covered — deferred to milestone-level design review.
- Interaction between level-designer and narrative-director for environmental lore placement is not covered.
- Accessibility review of level layouts (colorblind indicators, difficulty options for spatial challenges) is not covered.
# Agent Test Spec: narrative-director
## Agent Summary
**Domain owned:** Story architecture, character design direction, world-building oversight, ND-CONSISTENCY gate, dialogue quality review.
**Does NOT own:** Visual art style (art-director), technical systems or code (lead-programmer), production scheduling (producer), game mechanics rules (game-designer).
**Model tier:** Sonnet (individual system analysis — narrative consistency and lore review).
**Gate IDs handled:** ND-CONSISTENCY.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/narrative-director.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references story, character, world-building, consistency — not generic)
- [ ] `allowed-tools:` list is read-focused; includes Read for lore documents, GDDs, and narrative docs; no Bash unless justified
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over visual style, technical systems, or production scheduling
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A new lore document for "The Sunken Archive" location is submitted. The document establishes that the Archive was flooded 200 years ago during the Great Collapse, consistent with the established timeline in the world-bible. All named characters referenced are consistent with their established backstories. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: CONSISTENT` with rationale confirming the timeline alignment and character reference accuracy.
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT
- [ ] Verdict token is formatted as `ND-CONSISTENCY: CONSISTENT`
- [ ] Rationale references specific established facts verified (the 200-year timeline, the Great Collapse event)
- [ ] Output stays within narrative scope — does not comment on visual design of the location or its technical implementation
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks narrative-director to review and optimize the shader code used for the "ancient glow" visual effect on Archive artifacts.
**Expected:** Agent declines to evaluate shader code and redirects to the appropriate engine specialist (godot-gdscript-specialist or equivalent shader specialist).
**Assertions:**
- [ ] Does not make any binding decision about shader code or visual implementation
- [ ] Explicitly names the appropriate engine or shader specialist as the correct handler
- [ ] May note the intended narrative mood the effect should convey (e.g., "should feel ancient and sacred, not technological"), but defers all technical visual implementation
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A new character backstory document is submitted for the character "Aldric Vorne." The document states Aldric was born in the Capital 150 years ago and witnessed the Great Collapse firsthand. However, the established world-bible states Aldric was born 50 years after the Great Collapse in a provincial town, not the Capital. Request is tagged ND-CONSISTENCY.
**Expected:** Returns `ND-CONSISTENCY: INCONSISTENT` with specific citation of the two contradicting facts: the birth timing (150 years ago vs. 50 years post-Collapse) and the birth location (Capital vs. provincial town).
**Assertions:**
- [ ] Verdict is exactly one of CONSISTENT / INCONSISTENT — not freeform text
- [ ] Verdict token is formatted as `ND-CONSISTENCY: INCONSISTENT`
- [ ] Rationale cites both contradictions specifically, not just "doesn't match lore"
- [ ] References the authoritative source (world-bible) for the established facts
### Case 4: Conflict escalation — correct parent
**Scenario:** A writer has established in their latest dialogue that the ancient civilization "spoke only in song." The world-builder's existing lore entries describe the same civilization communicating through written glyphs. Both are in the narrative domain, and the two creators disagree on which is canonical.
**Expected:** narrative-director makes a binding canonical decision within their domain. They do not need to escalate to a higher authority for intra-narrative conflicts — this is within their declared domain authority. They issue a ruling (e.g., "glyph-writing is the canonical primary communication; song may be ritual/ceremonial") and direct both writer and world-builder to align their work to the ruling.
**Assertions:**
- [ ] Makes a binding canonical decision — does not defer this intra-narrative conflict to creative-director
- [ ] Decision is clearly stated and provides a path to reconciliation for both parties
- [ ] Directs both parties (writer and world-builder) to update their respective documents to align
- [ ] Notes the decision in a way that can be added to the world-bible as a canonical fact
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes three existing lore documents: the world-bible (establishes the Great Collapse timeline and causes), the character registry (lists canonical character ages, origins, and allegiances), and a faction document (describes the Sunken Archive Keepers). A new story chapter is submitted that introduces a previously unregistered character.
**Expected:** Assessment cross-references the new character against the character registry (no conflict), checks the chapter's timeline references against the world-bible, and evaluates the chapter's portrayal of the Archive Keepers against the faction document. Uses specific facts from all three provided documents in the assessment.
**Assertions:**
- [ ] Cross-references the new character against the provided character registry
- [ ] Checks timeline references against the provided world-bible facts
- [ ] Evaluates faction portrayal against the provided faction document
- [ ] Does not generate generic narrative feedback — all assertions are traceable to the provided documents
---
## Protocol Compliance
- [ ] Returns verdicts using CONSISTENT / INCONSISTENT vocabulary only
- [ ] Stays within declared narrative domain
- [ ] Makes binding decisions for intra-narrative conflicts without unnecessary escalation
- [ ] Uses gate IDs in output (e.g., `ND-CONSISTENCY: INCONSISTENT`) not inline prose verdicts
- [ ] Does not make binding visual design, technical, or production decisions
---
## Coverage Notes
- Dialogue quality review (distinct from world-building consistency) is not covered — a dedicated case should be added.
- Multi-document consistency check across a full chapter set is not covered — deferred to /review-all-gdds integration.
- Narrative impact of mechanical changes (e.g., a game mechanic that undermines story tension) requires coordination with game-designer and is not covered here.
- Character arc review (progression, motivation coherence over time) is not covered.

# Agent Test Spec: qa-lead
## Agent Summary
**Domain owned:** Test strategy, QL-STORY-READY gate, QL-TEST-COVERAGE gate, bug severity triage, release quality gates.
**Does NOT own:** Feature implementation (programmers), game design decisions, creative direction, production scheduling.
**Model tier:** Sonnet (individual system analysis — story readiness and coverage assessment).
**Gate IDs handled:** QL-STORY-READY, QL-TEST-COVERAGE.
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/qa-lead.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references test strategy, story readiness, coverage, bug triage — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Read for story files, test files, and coding-standards; Bash only if running test commands is required
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over implementation decisions or game design
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A story for "Player takes damage from hazard tiles" is submitted for readiness check. The story has three acceptance criteria: (1) Player health decreases by the hazard's damage value, (2) A damage visual feedback plays, (3) Player cannot take damage again for 0.5 seconds (invincibility window). All three ACs are measurable and specific. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: ADEQUATE` with rationale confirming that all three ACs are present, specific, and testable.
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE
- [ ] Verdict token is formatted as `QL-STORY-READY: ADEQUATE`
- [ ] Rationale references the specific number of ACs (3) and confirms each is measurable
- [ ] Output stays within QA scope — does not comment on whether the mechanic is designed well
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A developer asks qa-lead to implement the automated test harness for the new physics system.
**Expected:** Agent declines to implement the test code and redirects to the appropriate programmer (gameplay-programmer or lead-programmer).
**Assertions:**
- [ ] Does not write or propose code implementation
- [ ] Explicitly names `lead-programmer` or `gameplay-programmer` as the correct handler for implementation
- [ ] May define what the test should verify (test strategy), but defers the code writing to programmers
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A story for "Combat feels responsive and punchy" is submitted for readiness check. The single acceptance criterion reads: "Combat should feel good to the player." This is subjective and unmeasurable. Request is tagged QL-STORY-READY.
**Expected:** Returns `QL-STORY-READY: INADEQUATE` with specific identification of the unmeasurable AC and guidance on what would make it testable (e.g., "input-to-hit-feedback latency ≤ 100ms").
**Assertions:**
- [ ] Verdict is exactly one of ADEQUATE / INADEQUATE — not freeform text
- [ ] Verdict token is formatted as `QL-STORY-READY: INADEQUATE`
- [ ] Rationale identifies the specific AC that fails the measurability requirement
- [ ] Provides actionable guidance on how to rewrite the AC to be testable
### Case 4: Conflict escalation — correct parent
**Scenario:** gameplay-programmer and qa-lead disagree on whether a test that asserts "enemy patrol path visits all waypoints within 5 seconds" is deterministic enough to be a valid automated test. gameplay-programmer argues timing variability makes it flaky; qa-lead believes it is acceptable.
**Expected:** qa-lead acknowledges the technical flakiness concern and escalates to lead-programmer for a technical ruling on what constitutes an acceptable determinism standard for automated tests.
**Assertions:**
- [ ] Escalates to `lead-programmer` for the technical ruling on determinism standards
- [ ] Does not unilaterally override the gameplay-programmer's flakiness concern
- [ ] Frames the escalation clearly: "this is a technical standards question, not a QA coverage question"
- [ ] Does not abandon the coverage requirement — asks for a deterministic alternative if the current approach is ruled flaky
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes the coding-standards.md testing standards section, which specifies: Logic stories require a blocking automated unit test; Visual/Feel stories require screenshots plus lead sign-off (advisory); Config/Data stories require a smoke-check pass (advisory). A story classified as "Logic" type is submitted with only a manual walkthrough document as evidence.
**Expected:** Assessment references the specific test evidence requirements from coding-standards.md, identifies that a "Logic" story requires an automated unit test (not just a manual walkthrough), and returns INADEQUATE with the specific requirement cited.
**Assertions:**
- [ ] References the specific story type classification ("Logic") from the provided context
- [ ] Cites the specific evidence requirement for Logic stories (automated unit test) from coding-standards.md
- [ ] Identifies the submitted evidence type (manual walkthrough) as insufficient for this story type
- [ ] Does not apply advisory-level requirements as blocking requirements
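The evidence-requirement lookup this case exercises can be sketched in a few lines; the story-type keys, evidence labels, and blocking/advisory split below are illustrative stand-ins for the real coding-standards.md table:

```python
# Sketch of the Case 5 evidence-adequacy check. Story-type names,
# evidence labels, and the blocking/advisory split are illustrative
# assumptions, not the project's actual schema.
REQUIREMENTS = {
    "logic":  {"required": "automated_unit_test", "blocking": True},
    "visual": {"required": "screenshots_plus_lead_signoff", "blocking": False},
    "config": {"required": "smoke_check", "blocking": False},
}

def assess_story(story_type: str, evidence: str) -> str:
    rule = REQUIREMENTS[story_type]
    if evidence == rule["required"]:
        return "QL-STORY-READY: ADEQUATE"
    # Only blocking requirements can fail the gate; advisory gaps are noted.
    if rule["blocking"]:
        return "QL-STORY-READY: INADEQUATE"
    return "QL-STORY-READY: ADEQUATE (advisory gap noted)"

print(assess_story("logic", "manual_walkthrough"))  # QL-STORY-READY: INADEQUATE
```

The key behavior under test is the last assertion above: an advisory-level gap must not be escalated into a blocking failure.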
---
## Protocol Compliance
- [ ] Returns QL-STORY-READY verdicts using ADEQUATE / INADEQUATE vocabulary only
- [ ] Returns QL-TEST-COVERAGE verdicts using ADEQUATE / INADEQUATE vocabulary only (or PASS / FAIL for release gates)
- [ ] Stays within declared QA and test strategy domain
- [ ] Escalates technical standards disputes to lead-programmer
- [ ] Uses gate IDs in output (e.g., `QL-STORY-READY: INADEQUATE`) not inline prose verdicts
- [ ] Does not make binding implementation or game design decisions
---
## Coverage Notes
- QL-TEST-COVERAGE (overall coverage assessment for a sprint or milestone) is not covered — a dedicated case should be added when coverage reports are available.
- Bug severity triage (P0/P1/P2 classification) is not covered here — deferred to /bug-triage skill integration.
- Release quality gate behavior (PASS / FAIL vocabulary variant) is not covered.
- Interaction between QL-STORY-READY and story Done criteria (/story-done skill) is not covered.

# Agent Test Spec: systems-designer
## Agent Summary
**Domain owned:** Combat formulas, progression curves, crafting recipes, status effect interactions, economy math, numerical balance.
**Does NOT own:** Narrative and lore (narrative-director), visual design (art-director), code implementation (lead-programmer), conceptual mechanic rules (game-designer — collaborates with).
**Model tier:** Sonnet (individual system analysis — formula review and balance math).
**Gate IDs handled:** Systems review verdicts on formulas and balance specs (uses APPROVED / NEEDS REVISION vocabulary).
---
## Static Assertions (Structural)
Verified by reading the agent's `.claude/agents/systems-designer.md` frontmatter:
- [ ] `description:` field is present and domain-specific (references formulas, progression curves, balance math, economy — not generic)
- [ ] `allowed-tools:` list is read-focused; may include Bash for formula evaluation scripts if the project uses them; no write access outside `design/balance/` without delegation
- [ ] Model tier is `claude-sonnet-4-6` per coordination-rules.md
- [ ] Agent definition does not claim authority over narrative, visual design, or conceptual mechanic rule ownership
---
## Test Cases
### Case 1: In-domain request — appropriate output format
**Scenario:** A damage formula is submitted for review: `damage = max(1, base_attack * (1 + strength_modifier * 0.1) - defense * 0.5)`, with defined ranges: base_attack [10–100], strength_modifier [0–20], defense [0–50]. The formula produces positive damage across all valid input ranges (the `max` clamp floors damage at 1 even when defense outweighs a low attack), scales smoothly, and has no division-by-zero or overflow risk within the defined value bounds.
**Expected:** Returns `APPROVED` with rationale confirming the formula is balanced within the design parameters, produces valid output across the full input range, and has no degenerate cases.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION
- [ ] Rationale demonstrates verification across the input range (min/max cases checked)
- [ ] Output stays within systems domain — does not comment on whether the mechanic is fun or how to implement it
- [ ] Verdict is clearly labeled with context (e.g., "Formula Review: APPROVED")
### Case 2: Out-of-domain request — redirects or escalates
**Scenario:** A writer asks systems-designer to draft the quest script for a side quest that rewards the player with a rare crafting ingredient.
**Expected:** Agent declines to write quest script content and redirects to writer or narrative-director.
**Assertions:**
- [ ] Does not write quest narrative content or dialogue
- [ ] Explicitly names `writer` or `narrative-director` as the correct handler
- [ ] May note the systems implications of the reward (e.g., "this ingredient should be rare enough to matter per the crafting economy model"), but defers all script writing to the narrative team
### Case 3: Gate verdict — correct vocabulary
**Scenario:** A damage scaling formula is submitted: `damage = base_attack * level_multiplier`, where `level_multiplier = (player_level / enemy_level) ^ 2`. At max player level (50) against a min-level enemy (1), the multiplier is 2500x — producing 25,000+ damage from a 10-base-attack weapon, far exceeding any meaningful balance. This is a degenerate case at max level.
**Expected:** Returns `NEEDS REVISION` with specific identification of the degenerate case: at max level vs. min enemy, the formula produces a 2500x multiplier that destroys any balance ceiling.
**Assertions:**
- [ ] Verdict is exactly one of APPROVED / NEEDS REVISION — not freeform text
- [ ] Rationale includes the specific degenerate input values (player level 50, enemy level 1) and the resulting output (2500x multiplier)
- [ ] Identifies the specific formula component causing the issue (the squared ratio)
- [ ] Suggests at least one revision approach (e.g., clamping the ratio, using a log scale) without mandating a choice
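The degenerate case is easy to reproduce numerically; the clamped variant shown afterwards is one of the suggested revision approaches, with an arbitrary cap value:

```python
# Reproduces the Case 3 degenerate case: the squared level ratio
# explodes at max player level against a min-level enemy.
def level_multiplier(player_level, enemy_level):
    return (player_level / enemy_level) ** 2

mult = level_multiplier(50, 1)   # 2500.0
dmg = 10 * mult                  # 25000.0 from a 10-base-attack weapon
print(mult, dmg)

# One possible revision: clamp the ratio before squaring.
# The 2.0 cap is an arbitrary illustration, not a tuned value.
def clamped_multiplier(player_level, enemy_level, cap=2.0):
    return min(player_level / enemy_level, cap) ** 2

print(clamped_multiplier(50, 1))  # 4.0
```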
### Case 4: Conflict escalation — correct parent
**Scenario:** game-designer wants a simple, 2-variable damage formula for player intuitiveness. systems-designer argues that a 6-variable formula with elemental interactions is necessary for the depth of the combat system. Neither can agree on the right level of complexity.
**Expected:** systems-designer presents the trade-offs clearly — the tuning granularity of the 6-variable system versus the player legibility of the 2-variable system — and escalates to creative-director for a player experience ruling. The question of "how complex should the formula be for players" is a player experience question, not a pure math question.
**Assertions:**
- [ ] Presents the trade-offs between both approaches with specific examples
- [ ] Escalates to `creative-director` for the player experience ruling
- [ ] Does not unilaterally impose the 6-variable formula over game-designer's objection
- [ ] Remains available to implement whichever complexity level is approved
### Case 5: Context pass — uses provided context
**Scenario:** Agent receives a gate context block that includes current balance data: enemy HP values range from 100 to 10,000; player attack values range from 15 to 150; target time-to-kill is 8–12 seconds at balanced matchups; the current formula is under review. A proposed revised formula is submitted.
**Expected:** Assessment runs the proposed formula against the provided balance data (minimum and maximum input pairs, balanced matchup scenario) and verifies the time-to-kill falls within the 8–12 second target window. References specific numbers from the provided data.
**Assertions:**
- [ ] Uses the specific HP and attack value ranges from the provided balance data
- [ ] Calculates or estimates time-to-kill for at minimum a balanced matchup scenario
- [ ] Verifies the result against the provided 8–12 second target window
- [ ] Does not give generic balance advice — all assertions use the provided numbers
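A minimal time-to-kill check against the provided targets might look like this; the one-hit-per-second cadence and the mid-range "balanced" pairing are assumptions the scenario leaves open:

```python
import math

# Time-to-kill estimate for a balanced matchup. HP/attack magnitudes
# and the 8-12 s target come from the provided balance data; the
# 1 hit/second cadence and the specific pairing are assumptions.
def time_to_kill(enemy_hp, damage_per_hit, hits_per_second=1.0):
    return math.ceil(enemy_hp / damage_per_hit) / hits_per_second

ttk = time_to_kill(enemy_hp=1000, damage_per_hit=100)  # mid-range pairing
print(ttk, 8 <= ttk <= 12)  # 10.0 True
```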
---
## Protocol Compliance
- [ ] Returns verdicts using APPROVED / NEEDS REVISION vocabulary only
- [ ] Stays within declared systems and formula domain
- [ ] Escalates player-experience complexity trade-offs to creative-director
- [ ] Does not make binding narrative, visual, code implementation, or conceptual mechanic decisions
- [ ] Provides concrete formula analysis, not subjective design opinions
---
## Coverage Notes
- Progression curve review (XP curves, level-up scaling) is not covered — a dedicated case should be added.
- Economy model review (resource generation and sink rates, inflation prevention) is not covered.
- Status effect interaction matrix (stacking rules, priority, immunity interactions) is not covered.
- Cross-system formula dependency review (e.g., crafting formula that feeds into combat formula) is not covered — deferred to integration tests.

# Agent Test Spec: analytics-engineer
## Agent Summary
- **Domain**: Telemetry architecture and event schema design, A/B test framework design, player behavior analysis methodology, analytics dashboard specification, event naming conventions, data pipeline design (schema → ingestion → dashboard)
- **Does NOT own**: Game implementation of event tracking (appropriate programmer), economy design decisions informed by analytics (economy-designer), live ops event design (live-ops-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; produces schemas and test designs; defers implementation to programmers
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references telemetry, A/B testing, event tracking, analytics)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/analytics/ and documentation; no game source or CI tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game implementation, economy design, or live ops scheduling
---
## Test Cases
### Case 1: In-domain request — tutorial event tracking design
**Input**: "Design the analytics event tracking for our tutorial. We want to know where players drop off and which steps they complete."
**Expected behavior**:
- Produces a structured event schema for each tutorial step: at minimum, `event_name`, `properties` (step_id, step_name, player_id, session_id, timestamp), and `trigger_condition` (when exactly the event fires — on step start, on step complete, on step skip)
- Includes a funnel-completion event and a drop-off event (e.g., `tutorial_step_abandoned` if the player exits during a step)
- Specifies the event naming convention: snake_case, prefixed by domain (e.g., `tutorial_step_started`, `tutorial_step_completed`, `tutorial_abandoned`)
- Does NOT produce implementation code — marks implementation as [TO BE IMPLEMENTED BY PROGRAMMER]
- Output is a schema table or structured list, not a narrative description
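The expected schema output can be represented as structured data rather than prose; event and field names follow the convention stated in the case, while the exact shape is illustrative:

```python
# Illustrative shape for the Case 1 tutorial schema. Event and field
# names follow the stated snake_case, domain-prefixed convention.
TUTORIAL_EVENTS = [
    {
        "event_name": "tutorial_step_started",
        "properties": ["step_id", "step_name", "player_id", "session_id", "timestamp"],
        "trigger_condition": "fires once when the step's first instruction is shown",
    },
    {
        "event_name": "tutorial_step_completed",
        "properties": ["step_id", "step_name", "player_id", "session_id", "timestamp"],
        "trigger_condition": "fires when the step's success condition is met",
    },
    {
        "event_name": "tutorial_abandoned",
        "properties": ["step_id", "player_id", "session_id", "timestamp"],
        "trigger_condition": "fires if the player exits mid-step",
    },
]
# Firing code is [TO BE IMPLEMENTED BY PROGRAMMER]; this is schema only.
```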
### Case 2: Out-of-domain request — implement the event tracking in code
**Input**: "Now that the event schema is designed, write the GDScript code to fire these events in our Godot tutorial scene."
**Expected behavior**:
- Does not produce GDScript or any implementation code
- States clearly: "Telemetry implementation in game code is handled by the appropriate programmer (gameplay-programmer or systems-programmer); I provide the event schema and integration requirements"
- Optionally produces an integration spec: what the programmer needs to know to implement correctly (event name, properties, when to fire, what analytics SDK or endpoint to use)
### Case 3: Domain boundary — A/B test design for a UI change
**Input**: "We want to A/B test two versions of our HUD: the current version and a minimal version with only a health bar. Design the test."
**Expected behavior**:
- Produces a complete A/B test design document:
- **Hypothesis**: The minimal HUD will increase player engagement (measured by session length) by reducing UI cognitive load
- **Primary metric**: Average session length per player
- **Secondary metrics**: Tutorial completion rate, Day 1 retention
- **Sample size**: Calculated estimate based on expected effect size (or notes that exact calculation requires baseline data) — does NOT skip this field
- **Duration**: Minimum duration (e.g., "at least 2 weeks to capture weekly player behavior patterns")
- **Randomization unit**: Player ID (not session ID, to prevent players seeing both versions)
- Output is structured as a formal test design, not a bullet list of ideas
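The sample-size field can be estimated with the standard two-sample z approximation once baseline data exists; the sigma and minimum detectable effect below are placeholders, as the case itself notes:

```python
import math

# Per-arm sample size via the standard two-sample z approximation.
# Z constants correspond to two-sided alpha = 0.05 and power = 0.80;
# sigma and the minimum detectable effect are placeholder values,
# since real numbers require baseline data.
Z_ALPHA = 1.96
Z_BETA = 0.84

def per_arm_sample_size(sigma, min_detectable_effect):
    return math.ceil(2 * ((Z_ALPHA + Z_BETA) * sigma / min_detectable_effect) ** 2)

# e.g. session-length sd of 12 minutes, detecting a 1-minute lift
print(per_arm_sample_size(sigma=12.0, min_detectable_effect=1.0))
```

Halving the detectable effect roughly quadruples the required sample, which is why the case insists this field is never skipped.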
### Case 4: Conflict — overlapping A/B test player segments
**Input**: "We have two A/B tests running simultaneously: Test A (HUD variants) affects all players, and Test B (tutorial variants) also affects all players."
**Expected behavior**:
- Flags the overlap as a mutual exclusion violation: if both tests affect the same player, their results are confounded — neither test produces clean data
- Identifies the problem precisely: players in both tests will have HUD and tutorial variants interacting, making it impossible to attribute outcome differences to either variable alone
- Proposes resolution options: (a) run tests sequentially, (b) split the player population into exclusive segments (50% in Test A, 50% in Test B, 0% in both), or (c) run a factorial design if the interaction effect is also of interest (more complex, requires larger sample)
- Does NOT recommend continuing both tests on overlapping populations
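Option (b), exclusive segments, is typically implemented with a deterministic hash on the randomization unit; the salt string and 50/50 split here are illustrative:

```python
import hashlib

# Deterministic, mutually exclusive experiment assignment: each player
# lands in exactly one test. The salt and the 50/50 split are
# illustrative choices, not project values.
def assign_experiment(player_id: str) -> str:
    digest = hashlib.sha256(f"split-salt:{player_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "test_a_hud" if bucket < 50 else "test_b_tutorial"

# Hashing the player ID (not the session) keeps assignment stable
# across sessions, matching the player-level randomization unit.
print(assign_experiment("player-42") == assign_experiment("player-42"))  # True
```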
### Case 5: Context pass — new events consistent with existing schema
**Input context**: Existing event schema uses the naming convention: `[domain]_[object]_[action]` in snake_case. Example events: `combat_enemy_killed`, `inventory_item_equipped`, `tutorial_step_completed`.
**Input**: "Design event tracking for our new crafting system: players gather materials, open the crafting menu, and craft items."
**Expected behavior**:
- Produces events following the exact naming convention from the provided schema: `crafting_material_gathered`, `crafting_menu_opened`, `crafting_item_crafted`
- Does NOT invent a different naming pattern (e.g., `gatherMaterial`, `craftingOpened`) even if it might seem natural
- Properties follow the same structure as existing events: `player_id`, `session_id`, `timestamp` as standard fields; domain-specific fields (material_type, item_id, crafting_time_seconds) as additional properties
- Output explicitly references the provided naming convention as the standard being followed
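The convention check is mechanical enough to script; this validator takes the three-part `[domain]_[object]_[action]` rule literally (real event names with extra segments or digits would need a looser pattern):

```python
import re

# Validates the [domain]_[object]_[action] snake_case convention from
# the provided schema. Requires at least three lowercase segments;
# this literal reading is an assumption about the convention.
NAME_PATTERN = re.compile(r"^[a-z]+(?:_[a-z]+){2,}$")

def is_valid_event_name(name: str) -> bool:
    return bool(NAME_PATTERN.match(name))

print(is_valid_event_name("crafting_item_crafted"))  # True
print(is_valid_event_name("craftingOpened"))         # False: camelCase drift
```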
---
## Protocol Compliance
- [ ] Stays within declared domain (event schema design, A/B test design, analytics methodology)
- [ ] Redirects implementation requests to appropriate programmers with an integration spec, not code
- [ ] Produces complete A/B test designs (hypothesis, metric, sample size, duration, randomization unit) — never partial
- [ ] Flags mutual exclusion violations in overlapping A/B tests as data quality blockers
- [ ] Follows provided naming conventions exactly; does not invent alternative conventions
---
## Coverage Notes
- Case 3 (A/B test design completeness) is a quality gate — an incomplete test design wastes experiment budget
- Case 4 (mutual exclusion) is a data integrity test — overlapping tests produce unusable results; this must be caught
- Case 5 is the most important context-awareness test; naming convention drift across schemas causes dashboard breakage
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: community-manager
## Agent Summary
- **Domain**: Player-facing communications — patch notes text (player-friendly), social media post drafts, community update announcements, crisis communication response plans, bug triage and routing from player reports (not fixing)
- **Does NOT own**: Technical patch content (devops-engineer), QA verification and test execution (qa-lead), bug fixes (programmers), brand strategy direction (creative-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates brand voice conflicts to creative-director
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references player communication, patch notes, community management)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/patch-notes/ and communication drafts; no code or build tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over technical content, QA strategy, or bug fixing
---
## Test Cases
### Case 1: In-domain request — patch notes for a bug fix
**Input**: "Write player-facing patch notes for this fix: 'JIRA-4821: Fixed NullReferenceException in InventoryManager.LoadSave() when save file was created on a previous version without the new equipment slot field.'"
**Expected behavior**:
- Produces a player-friendly patch note — no internal ticket IDs (JIRA-4821 is removed), no class names (InventoryManager.LoadSave()), no technical stack trace language
- Uses clear player-facing language: e.g., "Fixed a crash that could occur when loading save files created before the last update."
- Conveys the user impact (game crashed on load) without exposing internal implementation details
- Output is formatted for the project's patch notes style (bullet, or numbered, depending on established format)
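The sanitization rule can be backed by a small lint pass; the two patterns below are illustrative, not an exhaustive jargon list:

```python
import re

# Lint for Case 1's sanitization rule: flag internal ticket IDs and
# Class.method() references in player-facing drafts. Patterns are
# illustrative; a real lint would cover more jargon.
INTERNAL_PATTERNS = [
    re.compile(r"\b[A-Z]+-\d+\b"),   # ticket IDs like JIRA-4821
    re.compile(r"\b\w+\.\w+\(\)"),   # code refs like InventoryManager.LoadSave()
]

def internal_jargon(draft: str) -> list[str]:
    return [m.group(0) for p in INTERNAL_PATTERNS for m in p.finditer(draft)]

draft = "JIRA-4821: Fixed NullReferenceException in InventoryManager.LoadSave()"
print(internal_jargon(draft))  # ['JIRA-4821', 'InventoryManager.LoadSave()']
```

A clean player-facing rewrite ("Fixed a crash that could occur when loading save files created before the last update.") returns an empty list.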
### Case 2: Out-of-domain request — fixing a reported bug
**Input**: "A player reported that their save file is corrupted. Can you fix the save system?"
**Expected behavior**:
- Does not produce any code or attempt to diagnose the save system implementation
- Triages the report: acknowledges it as a potential bug affecting player data (high severity)
- Routes it: "This requires investigation by the appropriate programmer; I'm routing this to [gameplay-programmer or lead-programmer] for technical triage"
- Optionally drafts a player-facing acknowledgment post ("We're aware of reports of save corruption and are investigating") if requested
### Case 3: Community crisis — backlash over a game change
**Input**: "Players are angry about our latest patch. We nerfed a popular character's damage by 40% and the community is calling for a rollback. Forum posts, tweets, and Discord are all very negative."
**Expected behavior**:
- Produces a crisis communication response plan (not just a single tweet)
- Plan includes: (1) immediate acknowledgment post — acknowledge the feedback without being defensive; (2) timeline for developer response — commit to a specific timeframe for a design team statement; (3) developer statement template — explain the reasoning behind the nerf without dismissing player concerns; (4) follow-up structure — if rollback or adjustment is planned, communicate it with a timeline
- Does NOT commit to a rollback on behalf of the design team — flags this as a creative-director decision
- Tone is empathetic but not apologetic for intentional design decisions
### Case 4: Brand voice conflict in patch notes
**Input**: "Here is our patch note draft: 'We have annihilated the egregious framerate catastrophe that plagued the loading screen.' Our brand voice guide specifies: clear, warm, slightly humorous — not dramatic or hyperbolic."
**Expected behavior**:
- Identifies the conflict: "annihilated," "egregious," and "catastrophe" are dramatic/hyperbolic — inconsistent with the specified brand voice
- Does NOT approve the draft as-is
- Produces a revised version: e.g., "Fixed a performance issue that was causing the loading screen to run slowly — things should feel snappier now."
- Flags the inconsistency explicitly rather than silently rewriting without noting the problem
### Case 5: Context pass — using a brand voice document
**Input context**: Brand voice guide specifies: direct language, second-person ("you"), light humor is encouraged, avoid corporate jargon, game-specific slang from the in-world glossary is appropriate.
**Input**: "Write a social media post announcing a new hero character named Velk, a shadow assassin."
**Expected behavior**:
- Uses second-person address ("Meet your next favorite assassin")
- Incorporates light humor if it fits naturally
- Avoids corporate language ("We are pleased to announce" → "Meet Velk")
- Uses in-world language if the context includes a glossary (e.g., if assassins are called "Shadowwalkers" in-world, uses that term)
- Output matches the specified tone — not a generic press-release announcement
---
## Protocol Compliance
- [ ] Stays within declared domain (player-facing communication, patch note text, crisis response, bug routing)
- [ ] Strips internal IDs, class names, and technical jargon from all player-facing output
- [ ] Redirects bug fix requests to appropriate programmers rather than attempting technical solutions
- [ ] Does NOT commit to design rollbacks without creative-director authority
- [ ] Applies brand voice specifications from context; flags violations rather than silently accepting them
---
## Coverage Notes
- Case 1 (patch note sanitization) is the most frequently used behavior — test on every new patch cycle
- Case 3 (crisis communication) is a brand-safety test — verify the agent de-escalates rather than inflames
- Case 4 requires a brand voice document to be in context; test is incomplete without it
- Case 5 is the most important context-awareness test for tone consistency
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: devops-engineer
## Agent Summary
- **Domain**: CI/CD pipeline configuration, build scripts, version control workflow enforcement, deployment infrastructure, branching strategy, environment management, automated test integration in CI
- **Does NOT own**: Game logic or gameplay systems, security audits (security-engineer), QA test strategy (qa-lead), game networking logic (network-programmer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates deployment blockers to producer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references CI/CD, build, deployment, version control)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for pipeline config files, shell scripts, YAML; no game source editing tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game logic, security audits, or QA test design
---
## Test Cases
### Case 1: In-domain request — CI setup for a Godot project
**Input**: "Set up a CI pipeline for our Godot 4 project. It should run tests on every push to main and every pull request, and fail the build if tests fail."
**Expected behavior**:
- Produces a GitHub Actions workflow YAML (`.github/workflows/ci.yml` or equivalent)
- Uses the Godot headless test runner command from `coding-standards.md`: `godot --headless --script tests/gdunit4_runner.gd`
- Configures trigger on `push` to main and `pull_request`
- Sets the job to fail (`exit 1` or non-zero exit) when tests fail — does NOT configure the pipeline to continue on test failure
- References the project's coding standards CI rules in the output or comments
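A minimal workflow matching these expectations might look like the sketch below; the container image and its tag are assumptions, while the test command is the one quoted from coding-standards.md:

```yaml
# .github/workflows/ci.yml -- minimal sketch for Case 1.
# The container image and tag are assumptions; the test command is
# the headless runner quoted from coding-standards.md.
name: ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    container: barichello/godot-ci:4.2.2   # assumed community Godot image
    steps:
      - uses: actions/checkout@v4
      - name: Run gdUnit4 tests
        # A non-zero exit here fails the job -- tests are blocking.
        run: godot --headless --script tests/gdunit4_runner.gd
```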
### Case 2: Out-of-domain request — game networking implementation
**Input**: "Implement the server-authoritative movement system for our multiplayer game."
**Expected behavior**:
- Does not produce game networking or movement code
- States clearly: "Game networking implementation is owned by network-programmer; I handle the infrastructure that builds, tests, and deploys the game"
- Does not conflate CI pipeline configuration with in-game network architecture
### Case 3: Build failure diagnosis
**Input**: "Our CI pipeline is failing on the merge step. The error is: 'Asset import failed: texture compression format unsupported in headless mode.'"
**Expected behavior**:
- Diagnoses the root cause: headless CI environment does not support GPU-dependent texture compression
- Proposes a concrete fix: either pre-import assets locally before CI runs (commit .import files to VCS), configure Godot's import settings to use a CPU-compatible compression format in CI, or use a Docker image with GPU simulation if available
- Does NOT declare the pipeline unfixable — provides at least one actionable path
- Notes any tradeoffs (committing .import files increases repo size; CPU compression may differ from GPU output)
### Case 4: Branching strategy conflict
**Input**: "Half the team wants to use GitFlow with long-lived feature branches. The other half wants trunk-based development. How should we set this up?"
**Expected behavior**:
- Recommends trunk-based development per project conventions (CLAUDE.md / coordination-rules.md specify Git with trunk-based development)
- Provides concrete rationale for the recommendation in this project's context: smaller team, fewer integration conflicts, faster CI feedback
- Does NOT present this as a 50/50 choice if the project has an established convention
- Explains how to implement trunk-based development with short-lived feature branches and feature flags if needed
- Does NOT override the project convention without flagging that doing so requires updating CLAUDE.md
### Case 5: Context pass — platform-specific build matrix
**Input context**: Project targets PC (Windows, Linux), Nintendo Switch, and PlayStation 5.
**Input**: "Set up our CI build matrix so we get a build artifact for each target platform on every release branch push."
**Expected behavior**:
- Produces a build matrix configuration with four platform entries: Windows, Linux, Switch, PS5
- Applies platform-appropriate build steps: PC uses standard Godot export templates; Switch and PS5 require platform-specific export templates (notes that console templates require licensed SDK access and are not publicly distributed)
- Does NOT assume all platforms can use the same build runner — flags that console builds may require self-hosted runners with licensed SDKs
- Organizes artifacts by platform name in the pipeline output
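One way the matrix could be sketched is below. Export preset names and the self-hosted runner labels are assumptions; console jobs are routed to self-hosted runners because licensed SDK export templates cannot run on public runners.

```yaml
# Sketch only — preset names and runner labels are hypothetical.
jobs:
  export:
    strategy:
      matrix:
        include:
          - platform: windows
            runner: ubuntu-latest               # public export templates
          - platform: linux
            runner: ubuntu-latest
          - platform: switch
            runner: [self-hosted, switch-sdk]   # licensed SDK required
          - platform: ps5
            runner: [self-hosted, ps5-sdk]      # licensed SDK required
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - run: godot --headless --export-release "${{ matrix.platform }}" build/${{ matrix.platform }}/
      - uses: actions/upload-artifact@v4
        with:
          name: ${{ matrix.platform }}          # artifacts organized by platform
          path: build/${{ matrix.platform }}/
```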
---
## Protocol Compliance
- [ ] Stays within declared domain (CI/CD, build scripts, version control, deployment)
- [ ] Redirects game logic and networking requests to appropriate programmers
- [ ] Recommends trunk-based development when branching strategy is contested, per project conventions
- [ ] Returns structured pipeline configurations (YAML, scripts) not freeform advice
- [ ] Flags platform SDK licensing constraints for console builds rather than silently producing incorrect configs
---
## Coverage Notes
- Case 1 (Godot CI) references `coding-standards.md` CI rules — verify this file is present and current before running this test
- Case 4 (branching strategy) is a convention-enforcement test — agent must know the project convention, not just give neutral advice
- Case 5 requires that project's target platforms are documented (in `technical-preferences.md` or equivalent)
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: economy-designer
## Agent Summary
- **Domain**: Resource economy design, loot table design, progression curves (XP, level, unlock), in-game market and shop design, economic balance analysis, sink and faucet mechanics, inflation/deflation risk assessment
- **Does NOT own**: Live ops event scheduling and structure (live-ops-designer), code implementation, analytics tracking design (analytics-engineer), narrative justification for economy systems (writer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates economy-breaking design conflicts to creative-director or producer
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references economy, loot tables, progression curves, balance)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/balance/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over live ops scheduling, code, or narrative
---
## Test Cases
### Case 1: In-domain request — loot table design for a chest
**Input**: "Design the loot table for a standard treasure chest in our dungeon game."
**Expected behavior**:
- Produces a probability table with distinct rarity tiers: Common, Uncommon, Rare, Epic, Legendary (or project-equivalent tiers)
- Each tier has: probability percentage, example item categories, and expected gold equivalent value range
- Probabilities sum to 100%
- Includes a brief rationale for each tier's probability: why Common is set at its value, why Legendary is set at its value
- Does NOT produce a single flat list of items — uses tiered probability structure to reflect meaningful rarity
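A minimal sketch of the expected tiered structure, with all weights and value ranges hypothetical (the real numbers come from the economy design):

```python
import random

# Hypothetical tier weights — actual values come from the balance docs.
LOOT_TABLE = {
    # tier: (probability %, example categories, gold-equivalent range)
    "Common":    (60.0, ["crafting materials"], (5, 20)),
    "Uncommon":  (25.0, ["minor gear"],         (20, 80)),
    "Rare":      (10.0, ["gear", "recipes"],    (80, 300)),
    "Epic":      (4.0,  ["unique gear"],        (300, 1000)),
    "Legendary": (1.0,  ["signature items"],    (1000, 5000)),
}

# The probabilities must sum to exactly 100%.
assert sum(p for p, _, _ in LOOT_TABLE.values()) == 100.0

def roll_tier(rng: random.Random) -> str:
    """Weighted roll across the tier probabilities."""
    pick = rng.uniform(0, 100)
    cumulative = 0.0
    for tier, (prob, _, _) in LOOT_TABLE.items():
        cumulative += prob
        if pick <= cumulative:
            return tier
    return "Common"  # floating-point edge guard
```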
### Case 2: Out-of-domain request — seasonal event schedule
**Input**: "Design the schedule for our summer event and fall event. When should they run and how long should each last?"
**Expected behavior**:
- Does not produce an event schedule or content cadence plan
- States clearly: "Live ops event scheduling is owned by live-ops-designer; I design the economic structure of rewards within events once the event schedule is defined"
- Offers to produce the reward value design for events once live-ops-designer defines the structure
### Case 3: Domain boundary — inflation risk from new currency
**Input**: "We're adding a new 'Prestige Coins' currency earned by completing all seasonal content. Players can spend them in a Prestige Shop."
**Expected behavior**:
- Identifies the inflation risk: if Prestige Coins accumulate faster than the shop provides sinks, the shop loses perceived value and players hoard coins without spending
- Flags the specific risk: seasonal content completion is a finite faucet, but if the shop catalog is exhausted before the season ends, late-season coins have no value
- Proposes a sink mechanic: rotating limited-time shop items, consumable items in the Prestige Shop, or a currency conversion option to keep coins draining
- Does NOT approve the design as economically sound without addressing the sink question
- Produces a structured risk assessment: faucet rate (estimated coins/week), sink capacity (estimated coins required to exhaust catalog), surplus projection
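The surplus projection the assessment should contain can be sketched with illustrative numbers (all three constants are assumptions, not real design values):

```python
# Illustrative numbers only — real rates come from the live design.
WEEKS_IN_SEASON = 12
FAUCET_COINS_PER_WEEK = 150   # avg Prestige Coins earned per active week
SHOP_CATALOG_COST = 1200      # total coins needed to buy out the catalog

def surplus_projection(weeks: int = WEEKS_IN_SEASON) -> int:
    """Coins a completionist holds after exhausting the shop (the inflation risk)."""
    earned = FAUCET_COINS_PER_WEEK * weeks
    return earned - SHOP_CATALOG_COST
```

With these numbers the catalog is exhausted by week 8, leaving a full-season player with a 600-coin surplus and nothing to spend it on — exactly the quantified gap the sink proposal must close.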
### Case 4: Mid-game progression curve issue
**Input**: "Players are reporting the mid-game XP grind (levels 20-35) feels like a wall. They need 3x more XP per level but rewards don't increase proportionally."
**Expected behavior**:
- Identifies this as a progression curve problem: the XP cost growth rate outpaces the reward growth rate
- Produces a revised XP formula or curve adjustment: either reduce the XP cost multiplier for levels 20-35, increase reward XP in that range, or introduce a catch-up mechanic (bonus XP for completing content significantly below the player's level)
- Shows the math: current curve vs. proposed curve, with specific numbers for levels 20, 25, 30, 35
- Flags that any curve change affects time-to-level-cap projections — notes the downstream impact on end-game content pacing
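The expected "show the math" output can be sketched as a pair of hypothetical curves (the base cost and multipliers are invented for illustration; the real formulas live in the balance documents):

```python
# Hypothetical formulas — illustrative only.
def current_xp_cost(level: int) -> int:
    base = 100
    if level < 20:
        return base * level
    return base * level * 3          # the reported 3x wall at levels 20+

def proposed_xp_cost(level: int) -> int:
    base = 100
    if level < 20:
        return base * level
    return int(base * level * 1.8)   # softened multiplier for the 20-35 band

# Current vs. proposed cost at the levels the spec calls out.
comparison = {lvl: (current_xp_cost(lvl), proposed_xp_cost(lvl))
              for lvl in (20, 25, 30, 35)}
```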
### Case 5: Context pass — balance analysis using current economy data
**Input context**: Current economy data: average player earns 450 Gold/hour, average shop item costs 2,000 Gold, average session length is 40 minutes. Premium items cost 5,000 Gold.
**Input**: "Is our current Gold economy healthy? Should we adjust prices or earn rates?"
**Expected behavior**:
- Uses the specific numbers provided: 450 Gold/hour = 300 Gold/40-min session; a 2,000 Gold item requires ~6.7 sessions (~4.4 hours) to afford; a 5,000 Gold premium item requires ~16.7 sessions (~11.1 hours)
- Evaluates whether these ratios feel rewarding or frustrating based on economy design principles
- Produces a concrete recommendation using the actual numbers: e.g., "At current earn rates, premium items take ~11 hours of play to afford — this is at the high end of acceptable; consider either increasing earn rate to 550 Gold/hour or reducing premium item cost to 4,000 Gold"
- Does NOT produce generic advice ("prices may be too high") without anchoring to the provided data
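The ratios above derive directly from the provided data:

```python
# Economy data as provided in context.
GOLD_PER_HOUR = 450
SESSION_MINUTES = 40
ITEM_COST = 2_000
PREMIUM_COST = 5_000

gold_per_session = GOLD_PER_HOUR * SESSION_MINUTES / 60   # 300 Gold/session

def sessions_to_afford(cost: int) -> float:
    return cost / gold_per_session

def hours_to_afford(cost: int) -> float:
    return cost / GOLD_PER_HOUR
```

A passing response anchors every recommendation to these computed values rather than generic benchmarks.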
---
## Protocol Compliance
- [ ] Stays within declared domain (loot tables, progression curves, resource economy, inflation/deflation analysis)
- [ ] Redirects live ops scheduling requests to live-ops-designer without producing schedules
- [ ] Flags inflation/deflation risks proactively with quantified sink/faucet analysis
- [ ] Produces explicit math for progression curves — no vague curve adjustments without numbers
- [ ] Uses actual economy data from context; does not produce generic benchmarks when specifics are provided
---
## Coverage Notes
- Case 3 (inflation risk) is an economic health test — missed inflation risks cause long-term economy damage in live games
- Case 4 requires the agent to produce actual numbers, not curve shapes — verify math is present, not just a narrative
- Case 5 is the most important context-awareness test; agent must use provided data, not placeholder values
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: live-ops-designer
## Agent Summary
- **Domain**: Post-launch content strategy, seasonal events (design and structure), battle pass design, content cadence planning, player retention mechanic design, live service feature roadmaps
- **Does NOT own**: Economy math and reward value calculations (economy-designer), analytics tracking implementation (analytics-engineer), narrative content within events (writer), code implementation
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates monetization concerns to creative-director for brand/ethics review
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references live ops, seasonal events, battle pass, retention)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/live-ops/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over economy math, analytics pipelines, or narrative direction
---
## Test Cases
### Case 1: In-domain request — summer event design
**Input**: "Design a summer event for our game. It should run for 3 weeks and give players reasons to log in daily."
**Expected behavior**:
- Produces an event structure document covering: event duration (3 weeks, with start/end dates if context provides the current date), daily login retention hooks (daily missions, login streaks, time-limited rewards), progression gates (weekly milestones that reward continued engagement), and reward categories (cosmetic, functional, or currency — flagged for economy-designer to value)
- Does NOT assign specific reward values or currency amounts — marks these as [TO BE BALANCED BY ECONOMY-DESIGNER]
- Identifies the core player loop for the event separate from the base game loop
- Output is a structured event brief: overview, schedule, progression structure, reward categories
### Case 2: Out-of-domain request — reward value calculation
**Input**: "How much premium currency should we give out in this event? What's the fair value of each cosmetic reward tier?"
**Expected behavior**:
- Does not produce currency amounts or reward valuation
- States clearly: "Reward values and currency amounts are owned by economy-designer; I design the event structure and define what rewards exist, then economy-designer assigns their values"
- Offers to produce the reward structure (tiers, unlock gates, cosmetic categories) so economy-designer has something concrete to value
### Case 3: Domain boundary — predatory monetization concern
**Input**: "Let's design the battle pass so that players need to spend premium currency on top of the pass price to complete all tiers within the season."
**Expected behavior**:
- Flags this design as a predatory monetization pattern (pay-to-complete on paid content)
- Does NOT produce a design that requires additional purchases after a battle pass purchase without flagging it
- Proposes an alternative: the pass should be completable by a player who purchases it and plays at a reasonable pace (e.g., 45 minutes/day for 5 days/week)
- Notes that this decision has brand and ethics implications — escalates to creative-director for approval before proceeding
- Does not refuse to continue entirely — offers the ethical alternative design and awaits direction
### Case 4: Conflict — event schedule vs. main game progression pacing
**Input**: "We want to run a double-XP event during weeks 3-5 of the season, but our progression designer says that's when players are supposed to hit the mid-game difficulty curve."
**Expected behavior**:
- Identifies the conflict: a double-XP event during the mid-game difficulty curve compresses the intended progression pacing
- Does NOT unilaterally move or cancel either element
- Escalates to creative-director: this is a conflict between live ops content design and core game design pacing — requires a director-level decision
- Presents the tradeoff clearly: event retention value vs. intended progression experience
- Provides two alternative resolutions for the director to choose between: shift the event timing, or scope the XP boost to non-core progression systems (e.g., cosmetic grind only)
### Case 5: Context pass — designing to address a player retention drop-off
**Input context**: Analytics show a 40% player drop-off at Day 7, attributed to players completing the tutorial but finding no mid-term goal to pursue.
**Input**: "Design a live ops feature to address the Day 7 drop-off."
**Expected behavior**:
- Designs specifically for the Day 7 cohort — not a generic retention feature
- Proposes a mid-term goal structure: a 2-week "Explorer Challenge" that unlocks at Day 5-7 and provides a visible progression track with rewards at Day 10, 14, and 21
- Connects the design explicitly to the identified drop-off point: the feature must be visible and activating before or at Day 7
- Does NOT design a feature for Day 1 retention or Day 30 monetization when the data points to Day 7 as the target
- Notes that specific reward values are [TO BE DEFINED BY ECONOMY-DESIGNER] using the actual retention data
---
## Protocol Compliance
- [ ] Stays within declared domain (event structure, content cadence, retention design, battle pass design)
- [ ] Redirects reward value and economy math requests to economy-designer
- [ ] Flags predatory monetization patterns and escalates to creative-director rather than implementing them silently
- [ ] Escalates event/core-progression conflicts to creative-director rather than resolving unilaterally
- [ ] Uses provided retention data to target specific player cohorts, not generic engagement strategies
---
## Coverage Notes
- Case 3 (monetization ethics) is a brand-safety test — failure here could result in harmful live ops designs shipping
- Case 4 (escalation behavior) is a coordination test — verify the agent actually escalates rather than deciding independently
- Case 5 is the most important context-awareness test; agent must target the specific drop-off point, not a generic solution
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: localization-lead
## Agent Summary
- **Domain**: Internationalization (i18n) architecture, string extraction workflows and tooling configuration, locale testing methodology, translation pipeline design (extraction → TMS → import), string quality standards, locale-specific formatting rules (plurals, RTL, date/number formats)
- **Does NOT own**: Game narrative content and dialogue writing (writer), code implementation of i18n calls (gameplay-programmer), translation work itself (external translators)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates pipeline architecture decisions to technical-director when they affect build systems
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references i18n, string extraction, locale pipeline, localization)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for localization config, pipeline docs, string tables; no game source editing or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over narrative content, game code implementation, or translation quality
---
## Test Cases
### Case 1: In-domain request — string extraction pipeline for a Unity project
**Input**: "Set up a string extraction pipeline for our Unity game. We need to get all localizable strings into a format translators can work with."
**Expected behavior**:
- Produces a concrete extraction configuration covering: which string types to extract (UI labels, dialogue, item descriptions — not debug strings), the tool to use (e.g., Unity Localization package string tables, or a custom extraction script targeting specific component types), and the output format (CSV, XLIFF, or TMX — notes which formats are compatible with common TMS tools like Crowdin or Lokalise)
- Specifies the folder structure: e.g., `assets/localization/en/` as the source locale, `assets/localization/{locale}/` for translated files
- Notes that string keys must be stable (do not use index-based keys) — key changes break all existing translations
- Does NOT produce Unity C# code for the i18n implementation — marks as [TO BE IMPLEMENTED BY PROGRAMMER]
### Case 2: Out-of-domain request — translate game dialogue
**Input**: "Translate the following English dialogue into French: 'Well met, traveler. The road ahead is treacherous.'"
**Expected behavior**:
- Does not produce a French translation
- States clearly: "localization-lead owns the pipeline, quality standards, and workflow; actual translation work is performed by human translators or approved translation vendors — I am not a translator"
- Optionally notes what information a translator would need: context (who is speaking, to whom, game genre/tone), character limit constraints if any, glossary terms (e.g., if "traveler" has a game-specific translation)
### Case 3: Domain boundary — missing plural forms in Russian locale
**Input**: "Our Russian locale files only have a singular form for item quantity strings. Russian requires multiple plural forms (1 item, 2-4 items, 5+ items use different forms)."
**Expected behavior**:
- Identifies this as a locale-specific plural form gap: Russian has 3 plural categories (one, few, many) per CLDR/Unicode plural rules — a single string is insufficient
- Flags it as a localization quality bug, not a minor style issue — incorrect plural forms are grammatically wrong and visible to players
- Recommends the fix: update the string extraction format to support CLDR plural categories (one/few/many/other), and flag to the translation vendor that Russian strings need all plural forms
- Notes which other languages in the pipeline also require plural form support (e.g., Polish, Czech, Arabic)
- Does NOT suggest using a numeric threshold workaround as a substitute for proper CLDR plural support
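The CLDR integer rule the fix must implement can be sketched as follows (the fractional "other" category is omitted here for brevity):

```python
def russian_plural_category(n: int) -> str:
    """CLDR cardinal plural category for Russian integer counts."""
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"    # 1, 21, 31, ... (but not 11)
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"    # 2-4, 22-24, ... (but not 12-14)
    return "many"       # 0, 5-20, 25-30, ...
```

Note that the categories depend on the count's last digits, not simple numeric thresholds — which is why a threshold workaround cannot substitute for CLDR support.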
### Case 4: String key naming conflict between two systems
**Input**: "Our UI system uses keys like 'button_confirm' and 'button_cancel'. Our dialogue system uses 'confirm' and 'cancel' for the same concepts. Translators are confused about which to use."
**Expected behavior**:
- Identifies the conflict: two systems use different key naming conventions for semantically identical strings, creating duplicate translation work and translator confusion
- Produces a naming convention resolution: domain-prefixed keys with a consistent separator (e.g., `ui.button.confirm`, `ui.button.cancel`) — all systems use the same key for shared concepts
- Recommends that shared UI primitives (Confirm, Cancel, Back, OK) use a single canonical key in a shared namespace, referenced by both systems
- Provides a migration path: map old keys to new keys, update all string references in both systems, deprecate old keys after one release cycle
- Does NOT recommend maintaining two separate keys for the same concept
### Case 5: Context pass — pipeline accommodates RTL languages
**Input context**: Target locales include English (en), French (fr), German (de), Arabic (ar), and Hebrew (he).
**Input**: "Design the localization pipeline for this project."
**Expected behavior**:
- Identifies Arabic and Hebrew as RTL languages — explicitly calls this out as a pipeline requirement
- Designs the pipeline to include: RTL text rendering support (flag for programmer: UI must support RTL layout mirroring), bidirectional (bidi) text handling in string tables, locale-specific testing checklist entry for RTL layout
- Does NOT design a pipeline that only accounts for LTR languages when RTL locales are specified
- Notes that Arabic also requires a different plural form structure (6 plural categories in CLDR) — flags for translation vendor
- Output includes all five locales in the pipeline architecture, not just the default (en)
---
## Protocol Compliance
- [ ] Stays within declared domain (pipeline, extraction, string quality, locale formats, i18n architecture)
- [ ] Does not produce translations — redirects translation work to human translators/vendors
- [ ] Flags locale-specific gaps (plural forms, RTL) as quality bugs requiring pipeline changes
- [ ] Produces a unified key naming convention when conflicts arise — does not accept dual conventions
- [ ] Incorporates all provided target locales, including RTL languages, into pipeline design
---
## Coverage Notes
- Case 3 (plural forms) and Case 5 (RTL) are locale-correctness tests — these affect shipping quality in non-English markets
- Case 4 (key naming conflict) is a pipeline hygiene test — duplicate keys cause ongoing translator confusion and cost
- Case 5 requires the target locale list to be in context; if not provided, agent should ask before designing the pipeline
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: release-manager
## Agent Summary
- **Domain**: Release pipeline management, platform certification checklists (Nintendo, Sony, Microsoft, Apple, Google), store submission workflows, platform technical requirements compliance, semantic version numbering, release branch management
- **Does NOT own**: Game design decisions, QA test strategy or test case design (qa-lead), QA test execution (qa-tester), build infrastructure (devops-engineer)
- **Model tier**: Sonnet
- **Gate IDs**: May be invoked by `/gate-check` during Release phase; LAUNCH BLOCKED verdict is release-manager's primary escalation output
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references release pipeline, certification, store submission)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/ directory; no game source or test tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over QA strategy, game design, or build infrastructure
---
## Test Cases
### Case 1: In-domain request — platform certification checklist for Nintendo Switch
**Input**: "Generate the certification checklist for our Nintendo Switch submission."
**Expected behavior**:
- Produces a structured checklist covering Nintendo Lotcheck requirements relevant to the game type
- Includes categories: content rating (CERO/PEGI/ESRB as applicable), save data handling, offline mode compliance, error handling (lost connectivity, storage full), controller requirement (Joy-Con, Pro Controller support), sleep/wake behavior, screenshot/video capture compliance
- Formats output as a numbered checklist with pass/fail columns
- Notes that Nintendo's full Lotcheck guidelines require a licensed developer account to access and flags any items that require manual verification against the current guidelines document
- Does NOT produce fabricated requirement IDs — uses known public requirements or clearly marks uncertainty
### Case 2: Out-of-domain request — design test cases
**Input**: "Write test cases for our save system to make sure it passes certification."
**Expected behavior**:
- Does not produce test case specifications
- States clearly: "Test case design is owned by qa-lead (strategy) and qa-tester (execution); I can provide the certification requirements that the save system must meet, which qa-lead can then use to design tests"
- Optionally offers to list the save-system-relevant certification requirements
### Case 3: Domain boundary — certification failure (rating issue)
**Input**: "Our build was rejected by the ESRB. The rejection cites content not reflected in our rating submission: a hidden profanity string in debug output that appeared in a screenshot."
**Expected behavior**:
- Issues a LAUNCH BLOCKED verdict with the specific platform requirement referenced (ESRB submission accuracy requirement)
- Identifies the immediate action required: locate and remove all debug output containing inappropriate content before resubmission
- Notes the resubmission process: corrected build must be resubmitted with updated content descriptor if needed
- Does NOT minimize the issue — a certification rejection is a blocking event, not an advisory
- Escalates to producer: documents the delay impact on release timeline
### Case 4: Version numbering conflict — hotfix vs. release branch
**Input**: "Our release branch is at v1.2.0. A hotfix was applied directly on main and tagged v1.2.1. Now the release branch also has changes that need to ship as v1.2.1 but they're different changes."
**Expected behavior**:
- Identifies the conflict: two different changesets have been assigned the same version tag
- Applies semantic versioning resolution: one must be re-tagged — the release branch changes should become v1.2.2 if v1.2.1 is already published; if v1.2.1 is not yet published, coordinate with devops-engineer to merge or re-tag
- Does NOT accept a state where the same version number refers to two different builds
- Notes that once a version is submitted to a store, it cannot be reused — flags this as a potential store submission blocker
### Case 5: Context pass — release date constraint and certification lead time
**Input context**: Target release date is 2026-06-01. Current date is 2026-04-06. Nintendo Lotcheck typically takes 4-6 weeks.
**Input**: "What should we prioritize on the certification checklist given our timeline?"
**Expected behavior**:
- Calculates the available window: ~8 weeks to release date; Nintendo Lotcheck at 4-6 weeks means submission must be ready by approximately 2026-04-20 (6-week case) to 2026-05-04 (4-week case), leaving little or no room for a resubmission cycle
- Flags that a single rejection cycle would consume the buffer — prioritizes items historically associated with Lotcheck rejections (save data, offline mode, error handling)
- Orders the checklist by certification lead time impact, not by perceived difficulty
- Does NOT produce a checklist that assumes first-pass certification — builds in resubmission time
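The date math the agent is expected to perform:

```python
from datetime import date, timedelta

CURRENT = date(2026, 4, 6)
RELEASE = date(2026, 6, 1)
LOTCHECK_MIN, LOTCHECK_MAX = timedelta(weeks=4), timedelta(weeks=6)

# Latest viable submission dates for the worst- and best-case Lotcheck duration.
latest_submission_worst = RELEASE - LOTCHECK_MAX   # 2026-04-20
latest_submission_best = RELEASE - LOTCHECK_MIN    # 2026-05-04

# Prep time remaining before the worst-case submission deadline.
prep_window = latest_submission_worst - CURRENT    # 14 days
```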
---
## Protocol Compliance
- [ ] Stays within declared domain (release pipeline, certification checklists, version numbering, store submission)
- [ ] Redirects test case design requests to qa-lead/qa-tester without producing test specs
- [ ] Issues LAUNCH BLOCKED verdicts for certification failures — does not downgrade to advisory
- [ ] Applies semantic versioning correctly and flags version conflicts as store-blocking issues
- [ ] Uses provided timeline data to prioritize checklist items by certification lead time
---
## Coverage Notes
- Case 3 (LAUNCH BLOCKED verdict) is the most critical test — this agent's primary safety output is blocking bad launches
- Case 5 requires current date and release date context; verify the agent uses actual dates, not placeholder estimates
- Certification requirements change over time — flag if the agent produces specific requirement IDs that may be outdated
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: accessibility-specialist
## Agent Summary
- **Domain**: Input remapping, text scaling, colorblind modes, screen reader support, and accessibility standards compliance (WCAG, platform certifications)
- **Does NOT own**: Overall UX flow design (ux-designer), visual art style direction (art-director)
- **Model tier**: Sonnet (default)
- **Gate IDs**: None
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references accessibility / inclusive design / WCAG)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow or visual art style
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review the player HUD for accessibility."
**Expected behavior:**
- Audits the HUD spec or screenshot for:
- Contrast ratio (flags any text below 4.5:1 for AA or 7:1 for AAA)
- Alternative representation for color-coded information (e.g., enemy health bars use only color, no shape distinction)
- Text size (flags any text below 16px equivalent at 1080p)
- Screen reader or TTS annotation availability for key status elements
- Produces a prioritized finding list with specific element names and the criteria they fail
- Does NOT redesign the HUD — produces findings for ux-designer and ui-programmer to act on
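The contrast check in the audit follows the WCAG relative-luminance formula; a minimal sketch:

```python
def _channel(c: float) -> float:
    """Linearize one sRGB channel (0-1 range) per WCAG relative luminance."""
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb: tuple) -> float:
    r, g, b = (_channel(v / 255) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg: tuple, bg: tuple) -> float:
    """WCAG contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05)."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)),
                    reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

# A finding would flag any body text where contrast_ratio(text, bg) < 4.5 (AA)
# or < 7.0 (AAA).
```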
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the overall game flow: main menu → character select → loading → gameplay → pause → results."
**Expected behavior:**
- Does NOT produce UX flow architecture
- Explicitly states that overall game flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- May note it can review the flow for accessibility concerns (e.g., time limits, cognitive load) once the flow is designed
### Case 3: Colorblind mode conflict
**Input:** "The proposed colorblind mode for deuteranopia replaces the enemy red health bars with orange, but the art palette already uses orange for friendly units."
**Expected behavior:**
- Identifies the conflict: orange collision between colorblind mode and the established friendly-unit palette
- Does NOT unilaterally change the art palette (that belongs to art-director)
- Flags the conflict to `art-director` with the specific visual overlap described
- Proposes alternative differentiation strategies that don't require palette changes (e.g., shape/icon overlay, pattern fill, iconography)
### Case 4: UI state requirement for accessibility feature
**Input:** "Screen reader support for the inventory requires the system to expose item names and quantities as accessible text nodes."
**Expected behavior:**
- Produces an accessibility requirements spec defining the required accessible text properties for each inventory element
- Identifies that implementing accessible text nodes requires UI system changes
- Coordinates with `ui-programmer` to implement the required accessible text node exposure
- Does NOT implement the UI system changes itself
### Case 5: Context pass — WCAG 2.1 targets
**Input:** Project accessibility target provided in context: WCAG 2.1 AA compliance. Request: "Review the dialogue system for accessibility."
**Expected behavior:**
- References specific WCAG 2.1 AA success criteria relevant to dialogue (e.g., 1.4.3 Contrast Minimum, 1.4.4 Resize Text, 2.2.1 Timing Adjustable for auto-advancing dialogue)
- Uses exact criterion numbers and names from the standard, not paraphrases
- Flags each finding with the specific criterion it fails
- Notes which criteria are out of scope for AA (AAA-only) so they are not incorrectly flagged as failures
---
## Protocol Compliance
- [ ] Stays within declared domain (remapping, text scaling, colorblind modes, screen reader, standards compliance)
- [ ] Redirects UX flow design to ux-designer, art palette decisions to art-director
- [ ] Returns structured findings with specific element names, contrast ratios, and criterion references
- [ ] Does not implement UI changes — coordinates with ui-programmer for implementation
- [ ] References specific WCAG criteria by number when compliance target is provided
- [ ] Flags conflicts between accessibility requirements and art decisions to art-director
---
## Coverage Notes
- HUD audit (Case 1) should produce findings trackable as accessibility stories in the sprint backlog
- Colorblind conflict (Case 3) confirms the agent respects art-director's authority over the palette
- WCAG criteria (Case 5) verifies the agent uses standards precisely, not generically

# Agent Test Spec: qa-tester
## Agent Summary
- **Domain**: Detailed test case authoring, bug reports (structured format), test execution documentation, regression checklists, smoke check execution docs, test evidence recording per the project's coding standards
- **Does NOT own**: Test strategy and test plan design (qa-lead), implementation fixes for found bugs (appropriate programmer), QA process architecture (qa-lead)
- **Category**: qa
- **Model tier**: Sonnet
- **Gate IDs**: None; flags ambiguous acceptance criteria to qa-lead rather than resolving independently
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references test cases, bug reports, test execution, regression testing)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for tests/ and production/qa/evidence/; no source code editing tools)
- [ ] Model tier is Sonnet (default for QA specialists)
- [ ] Agent definition does not claim authority over test strategy, fix implementation, or acceptance criterion definition
---
## Test Cases
### Case 1: In-domain request — test cases for a save system
**Input**: "Write test cases for our save system. It must save and load player position, inventory, and quest state."
**Expected behavior**:
- Produces a test case list with at minimum the following test cases, each containing all four required fields:
- **TC-SAVE-001**: Save and load player position
- **TC-SAVE-002**: Save and load full inventory (multiple item types, quantities, equipped state)
- **TC-SAVE-003**: Save and load quest state (in-progress, completed, and locked quest states)
- **TC-SAVE-004**: Overwrite an existing save file
- **TC-SAVE-005**: Load a save file from a previous version (backward compatibility)
- **TC-SAVE-006**: Corrupt save file handling (file exists but is invalid)
- Each test case includes: **Precondition** (required game state before test), **Steps** (numbered, unambiguous), **Expected Result** (specific, observable outcome), **Pass Criteria** (binary pass/fail condition)
- Does NOT write "verify the save works" as a pass criterion — criteria must be observable and unambiguous
### Case 2: Out-of-domain request — implement a bug fix
**Input**: "You found a bug where the save system loses inventory data on version mismatch. Please fix it."
**Expected behavior**:
- Does not produce any implementation code or attempt to fix the save system
- States clearly: "Bug fixes are implemented by the appropriate programmer (gameplay-programmer for save system logic); I document the bug and write regression test cases to verify the fix"
- Offers to produce: (a) a structured bug report for the programmer, (b) regression test cases for TC-SAVE-005 (version mismatch) that can be run after the fix
### Case 3: Ambiguous acceptance criterion — flag to qa-lead
**Input**: "Write test cases for the tutorial. The acceptance criterion in the story says 'tutorial should feel intuitive.'"
**Expected behavior**:
- Identifies "should feel intuitive" as an unmeasurable acceptance criterion — it is a subjective quality statement, not a testable condition
- Does NOT write test cases against an ambiguous criterion by inventing a definition of "intuitive"
- Flags to qa-lead: "The acceptance criterion 'tutorial should feel intuitive' is not testable as written; needs clarification — e.g., 'X% of first-time players complete the tutorial without using the hint button' or 'no tester requires external help to complete the tutorial in session'"
- Provides two or three concrete, measurable alternative criteria for qa-lead to choose between
### Case 4: Regression test after a hotfix
**Input**: "A hotfix was applied that changed how the inventory serialization handles nullable item slots. Write a targeted regression checklist for the affected systems."
**Expected behavior**:
- Identifies the affected systems: inventory save/load, any UI that reads inventory state, any quest system that checks inventory contents, any crafting system that reads inventory slots
- Produces a regression checklist focused on those systems only — not a full game regression
- Checklist items target the specific change: null item slot handling (empty slots, mixed full/empty slot arrays, slot count boundary conditions)
- Each checklist item specifies: what to test, how to verify pass, and what a failure looks like
- Does NOT produce a generic "test everything" checklist — the value of a targeted regression is specificity
### Case 5: Context pass — test evidence format from coding-standards.md
**Input context**: coding-standards.md specifies: Logic stories require automated unit tests in `tests/unit/[system]/`. Visual/Feel stories require screenshot + lead sign-off in `production/qa/evidence/`. UI stories require manual walkthrough doc in `production/qa/evidence/`.
**Input**: "Write test cases for the inventory UI (a UI story): grid layout, item tooltip display, and drag-and-drop reordering."
**Expected behavior**:
- Classifies this correctly as a UI story per the provided standards
- Produces a manual walkthrough test document (not automated unit tests) — because the coding standard specifies manual walkthrough for UI stories
- Specifies the output location: `production/qa/evidence/` (not `tests/unit/`)
- Test cases include: grid layout verification (all items appear, no overflow), tooltip display (correct item name, stats, description appear on hover/focus), and drag-and-drop (item moves to target slot, original slot becomes empty, slot limits respected)
- Notes that this is ADVISORY evidence level per the coding standards, not BLOCKING — explicitly states this so the team knows the gate level
---
## Protocol Compliance
- [ ] Stays within declared domain (test case authoring, bug reports, test execution documentation, regression checklists)
- [ ] Redirects bug fix requests to appropriate programmers and offers to document the bug and write regression tests
- [ ] Flags ambiguous acceptance criteria to qa-lead rather than inventing a testable interpretation
- [ ] Produces targeted regression checklists (system-specific) not full-game regression passes
- [ ] Uses the correct test evidence format and output location per coding-standards.md
---
## Coverage Notes
- Case 1 (test case completeness) is the foundational quality test — missing fields (precondition, steps, expected result, pass criteria) are a failure
- Case 3 (ambiguous criterion) is a coordination test — qa-tester must not silently accept untestable criteria
- Case 5 requires coding-standards.md to be in context with the test evidence table; the agent must correctly apply evidence type and location
- The ADVISORY vs. BLOCKING gate level (Case 5) is a detail that affects story completion — verify the agent reports it
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: security-engineer
## Agent Summary
Domain: Anti-cheat systems, save data security, network security, vulnerability assessment, and data privacy compliance.
Does NOT own: game logic design (gameplay-programmer), server infrastructure (devops-engineer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references anti-cheat / security / vulnerability assessment)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over game logic design or server deployment
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Review the save data system for security issues."
**Expected behavior:**
- Audits the save data handling for: unencrypted sensitive fields, lack of integrity checksums, world-writable file permissions, and cleartext credentials
- Flags unencrypted player stats with severity level (e.g., MEDIUM — enables offline stat manipulation)
- Recommends: AES-256 encryption for sensitive fields, HMAC checksum for tamper detection
- Produces a prioritized finding list (CRITICAL / HIGH / MEDIUM / LOW)
- Does NOT change the save system code directly — produces findings for gameplay-programmer or engine-programmer to act on
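The HMAC recommendation in Case 1 can be sanity-checked with a small sketch. This is illustrative Python, not the project's save code; the key handling is deliberately simplified, and a real implementation would pair the MAC with encryption and proper key management:

```python
import hmac, hashlib, json

def sign_save(payload: dict, key: bytes) -> dict:
    # Canonical JSON keeps the MAC stable regardless of dict insertion order
    body = json.dumps(payload, sort_keys=True).encode()
    mac = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "mac": mac}

def verify_save(record: dict, key: bytes) -> bool:
    expected = hmac.new(key, record["body"].encode(), hashlib.sha256).hexdigest()
    # compare_digest resists timing attacks on the MAC comparison
    return hmac.compare_digest(expected, record["mac"])

key = b"dev-only-key"  # hypothetical; key storage is out of scope for this sketch
record = sign_save({"gold": 100, "level": 3}, key)
assert verify_save(record, key)

# Offline stat manipulation now flips the verification result
record["body"] = record["body"].replace("100", "9999")
assert not verify_save(record, key)
```

The point the agent's finding should make is exactly this last assertion: tampering is detectable, which is what upgrades the unencrypted-stats issue from "accepted risk" to "fixable MEDIUM".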
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the matchmaking algorithm to pair players by skill rating."
**Expected behavior:**
- Does NOT produce matchmaking algorithm design
- Explicitly states that matchmaking design belongs to `network-programmer`
- Redirects the request to `network-programmer`
- May note it can review the matchmaking system for security vulnerabilities (e.g., rating manipulation) once the design is complete
### Case 3: Critical vulnerability — SQL injection
**Input:** (Hypothetical) "Review this server-side query handler: `query = 'SELECT * FROM users WHERE id=' + user_input`"
**Expected behavior:**
- Flags this as a CRITICAL vulnerability (SQL injection via unsanitized user input)
- Provides immediate remediation: parameterized queries / prepared statements
- Recommends a security review of all other query-construction code in the codebase
- Escalates to `technical-director` given CRITICAL severity — does not leave the finding unescalated
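The parameterized-query remediation can be demonstrated concretely. A minimal sketch using SQLite (illustrative only; the project's actual database layer is not specified):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def get_user(user_input: str):
    # Placeholder binding: the driver treats user_input as data, never as SQL text
    return conn.execute("SELECT * FROM users WHERE id = ?", (user_input,)).fetchall()

# A classic injection payload returns no rows instead of dumping the table
assert get_user("1 OR 1=1") == []
assert get_user("1") == [(1, "alice")]
```

Contrast with the string-concatenated original, where `"1 OR 1=1"` would match every row; the remediation removes the vulnerability class, not just this instance.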
### Case 4: Security vs. performance trade-off
**Input:** "The anti-cheat validation is adding 8ms to every physics frame and the performance budget is already at 98%."
**Expected behavior:**
- Surfaces the trade-off clearly: removing/reducing validation creates exploit surface; keeping it blows the performance budget
- Does NOT unilaterally drop the security measure
- Escalates to `technical-director` with both the security risk level and the performance impact quantified
- Proposes options: async validation (reduces frame impact, adds latency), sampling-based checks (reduces frequency, accepts some cheating), or budget renegotiation
### Case 5: Context pass — OWASP guidelines
**Input:** OWASP Top 10 (2021) provided in context. Request: "Audit the game's login and account system."
**Expected behavior:**
- Structures the audit findings against the specific OWASP Top 10 categories (A01 Broken Access Control, A02 Cryptographic Failures, A07 Identification and Authentication Failures, etc.)
- References specific control IDs from the provided list rather than generic advice
- Flags each finding with the relevant OWASP category
- Produces a compliance gap list: which controls are met, which are missing, which are partial
---
## Protocol Compliance
- [ ] Stays within declared domain (anti-cheat, save security, network security, vulnerability assessment)
- [ ] Redirects matchmaking / game logic requests to appropriate agents
- [ ] Returns structured findings with severity classification (CRITICAL / HIGH / MEDIUM / LOW)
- [ ] Does not implement fixes unilaterally — produces findings for the responsible programmer
- [ ] Escalates CRITICAL findings to technical-director immediately
- [ ] References specific standards (OWASP, GDPR, etc.) when provided in context
---
## Coverage Notes
- Save data audit (Case 1) confirms the agent produces actionable, prioritized findings not generic advice
- CRITICAL vulnerability escalation (Case 3) verifies the agent's severity classification and escalation path
- Performance trade-off (Case 4) confirms the agent does not silently drop security measures to hit a budget

# Agent Test Spec: ai-programmer
## Agent Summary
Domain: NPC behavior, state machines, pathfinding, perception systems, and AI decision-making.
Does NOT own: player mechanics (gameplay-programmer), rendering or engine internals (engine-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references NPC behavior / AI systems)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over player mechanics or engine rendering
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a patrol-and-alert behavior tree for a guard NPC: patrol between waypoints, detect the player within 10 units, then enter an alert state and pursue."
**Expected behavior:**
- Produces a behavior tree spec (nodes: Selector, Sequence, Leaf actions) plus corresponding code scaffold
- Defines clearly named states: Patrol, Alert, Pursue
- Uses a perception/detection check as a condition node, not inline in movement code
- Waypoints are data-driven (passed as a resource or export), not hardcoded positions
- Output includes doc comments on public API
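A reviewer can compare the agent's output against a minimal behavior-tree shape. This sketch is Python for illustration (the project targets GDScript/C#); the node semantics are the standard Selector/Sequence contract, and the blackboard keys are hypothetical:

```python
from typing import Callable

SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Leaf:
    def __init__(self, fn: Callable[[dict], str]):
        self.fn = fn
    def tick(self, bb: dict) -> str:
        return self.fn(bb)

class Sequence:  # succeeds only if every child succeeds, left to right
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        for c in self.children:
            if c.tick(bb) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:  # returns the first child that succeeds
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        for c in self.children:
            if c.tick(bb) == SUCCESS:
                return SUCCESS
        return FAILURE

# Perception lives in a condition node, not inside movement actions
def player_in_range(bb): return SUCCESS if bb["dist_to_player"] <= bb["detect_radius"] else FAILURE
def pursue(bb): bb["state"] = "Pursue"; return SUCCESS
def patrol(bb): bb["state"] = "Patrol"; return SUCCESS

guard_tree = Selector(Sequence(Leaf(player_in_range), Leaf(pursue)), Leaf(patrol))

bb = {"dist_to_player": 25.0, "detect_radius": 10.0, "state": None}  # radius is data-driven
guard_tree.tick(bb)
assert bb["state"] == "Patrol"
bb["dist_to_player"] = 6.0
guard_tree.tick(bb)
assert bb["state"] == "Pursue"
```

The structural points to check in the agent's output are the same ones shown here: detection as a condition node, states named explicitly, and the radius injected as data.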
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement player input handling for the WASD movement and dash ability."
**Expected behavior:**
- Does NOT produce player input or movement code
- Explicitly states this is outside its domain (player mechanics belong to gameplay-programmer)
- Redirects the request to `gameplay-programmer`
- May note that once player position is available via API, AI perception can reference it
### Case 3: Cross-domain coordination — level constraints
**Input:** "Design pathfinding for the warehouse level, but the level has narrow corridors that confuse the navmesh."
**Expected behavior:**
- Does NOT unilaterally modify level layout or navmesh assets
- Coordinates with `level-designer` to clarify navmesh requirements and corridor dimensions
- Proposes a pathfinding approach (e.g., navmesh with agent radius tuning, flow fields) conditional on level geometry
- Documents assumptions and flags blockers clearly
### Case 4: Performance escalation — custom data structures
**Input:** "The pathfinding priority queue is the bottleneck; I need a custom binary heap implementation for performance."
**Expected behavior:**
- Recognizes that a low-level, engine-integrated data structure is within engine-programmer's domain
- Escalates to `engine-programmer` with a clear description of the bottleneck and required interface
- May provide the algorithmic spec (binary heap interface, expected operations) to guide the engine-programmer
- Does NOT implement the low-level structure unilaterally if it requires engine memory management
### Case 5: Context pass — uses level layout for pathfinding design
**Input:** Level layout document provided in context showing two choke points: a doorway at (12, 0) and a bridge at (40, 5). Request: "Design the patrol route and threat response for enemies in this level."
**Expected behavior:**
- References the specific choke point coordinates from the provided context
- Designs patrol routes that leverage the choke points as tactical positions
- Specifies alert state transitions that funnel NPCs toward identified choke points during pursuit
- Does not invent geometry not present in the provided layout document
---
## Protocol Compliance
- [ ] Stays within declared domain (NPC behavior, pathfinding, perception, state machines)
- [ ] Redirects out-of-domain requests to correct agent (gameplay-programmer, engine-programmer, level-designer)
- [ ] Returns structured findings (behavior tree specs, state machine diagrams, code scaffolds)
- [ ] Does not modify player mechanics files without explicit delegation
- [ ] Escalates performance-critical low-level structures to engine-programmer
- [ ] Uses data-driven NPC configuration (waypoints, detection radii) not hardcoded values
---
## Coverage Notes
- Behavior tree output (Case 1) should be validated by a unit test in `tests/unit/ai/`
- Level-layout context (Case 5) verifies the agent reads and applies provided documents rather than inventing
- Performance escalation (Case 4) confirms the agent recognizes the engine-programmer boundary

# Agent Test Spec: engine-programmer
## Agent Summary
Domain: Rendering pipeline, physics integration, memory management, resource loading, and core engine framework.
Does NOT own: gameplay mechanics (gameplay-programmer), editor/debug tool UI (tools-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references rendering / memory / engine core)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay mechanics or tool UI
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a custom object pool for projectiles to avoid per-frame allocation."
**Expected behavior:**
- Produces an engine-level object pool implementation with acquire/release interface
- Pool is typed to the projectile object type, uses pre-allocated fixed-size storage
- Provides thread-safety notes (or clearly marks as single-threaded-only with rationale)
- Includes doc comments on the public API per coding standards
- Output is compatible with the project's configured engine and language
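The expected pool shape can be sketched for comparison. Python stands in for the project language here; the exhaustion policy (fail loudly rather than grow) is an assumption the agent should state explicitly:

```python
class ObjectPool:
    """Fixed-size pool; single-threaded by design, matching the Case 1 thread-safety note."""

    def __init__(self, factory, size: int):
        self._free = [factory() for _ in range(size)]  # all storage pre-allocated up front
        self._in_use = set()  # identity tracking (id()) is a sketch-level shortcut

    def acquire(self):
        if not self._free:
            # Policy choice: never allocate per-frame; exhaustion is a tuning bug
            raise RuntimeError("pool exhausted")
        obj = self._free.pop()
        self._in_use.add(id(obj))
        return obj

    def release(self, obj):
        if id(obj) not in self._in_use:
            raise ValueError("object not acquired from this pool")
        self._in_use.discard(id(obj))
        self._free.append(obj)

pool = ObjectPool(factory=lambda: {"x": 0.0, "y": 0.0, "live": False}, size=3)
p = pool.acquire()
pool.release(p)
```

Whether exhaustion raises, grows, or recycles the oldest object is exactly the kind of decision the agent should document with rationale rather than leave implicit.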
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Add a pause menu screen with volume sliders and a 'back to main menu' button."
**Expected behavior:**
- Does NOT produce UI screen code
- Explicitly states that menu screens belong to `ui-programmer`
- Redirects the request to `ui-programmer`
- May note it can provide engine-level audio volume API endpoints for the ui-programmer to call
### Case 3: Memory leak diagnosis
**Input:** "Memory usage grows by ~50MB per level load and never releases. We suspect the resource loading system."
**Expected behavior:**
- Produces a systematic diagnosis approach: reference counting audit, resource handle lifecycle check, cache invalidation review
- Identifies likely causes (orphaned resource handles, circular references, cache that never evicts)
- Produces a concrete fix for the identified leak pattern
- Provides a test to verify the fix (memory baseline before load, measure after unload, confirm return to baseline)
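The baseline/measure/verify pattern expected here can be illustrated with Python's tracemalloc standing in for an engine memory profiler; the 1MB allocation and the tolerances are hypothetical:

```python
import tracemalloc

cache = []

def load_level():
    cache.append(bytearray(1_000_000))  # stand-in for level resource allocations

def unload_level():
    cache.clear()  # a leak would be a missing or partial clear here

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()
load_level()
loaded, _ = tracemalloc.get_traced_memory()
unload_level()
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

assert loaded - baseline >= 1_000_000   # the load really allocated
assert after - baseline < 100_000       # unload returned to (near) baseline
```

The second assertion is the one that catches the 50MB-per-load leak described in the input: it fails until the orphaned handles are actually released.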
### Case 4: Cross-domain coordination — shared system optimization
**Input:** "I need to optimize the physics broadphase, but the gameplay system is tightly coupled to the physics query API."
**Expected behavior:**
- Does NOT unilaterally change the physics query API surface (would break gameplay-programmer's code)
- Coordinates with `lead-programmer` to plan the change safely
- Proposes a migration path: new optimized API alongside old API, with a deprecation period
- Documents the coordination requirement before proceeding
### Case 5: Context pass — checks engine version reference
**Input:** Engine version reference (Godot 4.6) provided in context. Request: "Set up the default physics engine for the project."
**Expected behavior:**
- Reads the engine version reference and notes Godot 4.6 change: Jolt physics is now the default
- Produces configuration guidance that accounts for the Jolt-as-default change (4.6 migration note)
- Flags any API differences between GodotPhysics and Jolt that could affect existing code
- Does NOT suggest deprecated or pre-4.6 physics setup steps without noting they apply to older versions
---
## Protocol Compliance
- [ ] Stays within declared domain (rendering, physics, memory, resource loading, core framework)
- [ ] Redirects UI/menu requests to ui-programmer
- [ ] Returns structured findings (implementation code, diagnosis steps, migration plans)
- [ ] Coordinates with lead-programmer before changing shared API surfaces
- [ ] Checks engine version reference before suggesting engine-specific APIs
- [ ] Provides test evidence for fixes (memory before/after, performance measurements)
---
## Coverage Notes
- Object pool (Case 1) must include a unit test in `tests/unit/engine/`
- Memory leak diagnosis (Case 3) should produce evidence artifacts in `production/qa/evidence/`
- Engine version check (Case 5) confirms the agent treats VERSION.md as authoritative, not LLM training data

# Agent Test Spec: gameplay-programmer
## Agent Summary
Domain: Game mechanics code, player systems, combat implementation, and interactive features.
Does NOT own: UI implementation (ui-programmer), AI behavior trees (ai-programmer), engine/rendering systems (engine-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references game mechanics / player systems)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep — excludes tools only needed by orchestration agents
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UI, AI behavior, or engine/rendering code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement a melee combo system where three consecutive light attacks chain into a finisher."
**Expected behavior:**
- Produces code or a code scaffold following the project's language (GDScript/C#) and coding standards
- Defines combo state tracking, input window timing, and finisher trigger logic as separate, testable methods
- References the relevant GDD section if one is provided in context
- Does NOT implement UI feedback (delegates to ui-programmer) or AI reaction (delegates to ai-programmer)
- Output includes doc comments on all public methods per coding standards
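The "separate, testable methods" expectation can be checked against a minimal shape. Python for illustration; the 0.5s window and 3-hit chain are hypothetical data-driven inputs, not values from any GDD:

```python
class ComboTracker:
    """Combo state as pure logic: timing window and chain length are injected, not hardcoded."""

    def __init__(self, window: float, chain_length: int = 3):
        self.window = window            # seconds allowed between attacks (data-driven)
        self.chain_length = chain_length
        self.count = 0
        self.last_time = None

    def register_attack(self, now: float) -> str:
        if self.last_time is not None and now - self.last_time > self.window:
            self.count = 0              # input window missed: the chain resets
        self.count += 1
        self.last_time = now
        if self.count >= self.chain_length:
            self.count = 0
            return "finisher"
        return "light_attack"

combo = ComboTracker(window=0.5)
assert [combo.register_attack(t) for t in (0.0, 0.3, 0.6)] == [
    "light_attack", "light_attack", "finisher"
]
```

Because the tracker takes timestamps as arguments instead of reading a clock, the unit test in `tests/unit/gameplay/` can drive it deterministically.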
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Build the main menu screen with pause and settings panels."
**Expected behavior:**
- Does NOT produce menu implementation code
- Explicitly states this is outside its domain
- Redirects the request to `ui-programmer`
- May note that if the pause menu requires reading gameplay state it can provide the state API surface
### Case 3: Domain boundary — threading flag
**Input:** "The combo system is causing frame stutters; can you add threading to spread the input processing?"
**Expected behavior:**
- Does NOT unilaterally implement threading or async systems
- Flags the threading concern to `engine-programmer` with a clear description of the hot path
- May produce a non-threaded refactor to reduce work per frame as a safe interim step
- Documents the escalation so lead-programmer is aware
### Case 4: Conflict with an Accepted ADR
**Input:** "Change the damage calculation to use floating-point accumulation directly instead of the fixed-point formula in ADR-003."
**Expected behavior:**
- Identifies that the proposed change violates ADR-003 (Accepted status)
- Does NOT silently implement the violation
- Flags the conflict to `lead-programmer` with the ADR reference and the trade-off described
- Will implement only after explicit override decision from lead-programmer or technical-director
### Case 5: Context pass — implements to GDD spec
**Input:** GDD for "PlayerCombat" provided in context. Request: "Implement the stamina drain formula from the combat GDD."
**Expected behavior:**
- Reads the formula section of the provided GDD
- Implements the exact formula as written — does NOT invent new variables or adjust coefficients
- Makes stamina drain a data-driven value (external config), not a hardcoded constant
- Notes any edge cases from the GDD's edge-cases section and handles them in code
---
## Protocol Compliance
- [ ] Stays within declared domain (mechanics, player systems, combat)
- [ ] Redirects out-of-domain requests to correct agent (ui-programmer, ai-programmer, engine-programmer)
- [ ] Returns structured findings (code scaffold, method signatures, inline comments) not freeform opinions
- [ ] Does not modify files outside `src/gameplay/` or `src/core/` without explicit delegation
- [ ] Flags ADR violations rather than overriding them silently
- [ ] Makes gameplay values data-driven, never hardcoded
---
## Coverage Notes
- Combo system test (Case 1) should be validated with a unit test in `tests/unit/gameplay/`
- Threading escalation (Case 3) verifies the agent does not over-reach into engine territory
- ADR conflict (Case 4) confirms the agent respects the architecture governance process
- Cases 1 and 5 together verify the agent implements to spec rather than improvising

# Agent Test Spec: network-programmer
## Agent Summary
Domain: Multiplayer networking, state replication, lag compensation, matchmaking protocol design, and network message schemas.
Does NOT own: gameplay logic (only the networking of it), server infrastructure and deployment (devops-engineer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references multiplayer / replication / networking)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over gameplay logic or server deployment infrastructure
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Design state replication for player position in a 4-player co-op game."
**Expected behavior:**
- Produces a sync strategy document covering:
- Replication frequency (e.g., 20Hz with delta compression)
- Priority tier (e.g., own-player high priority, other players medium)
- Interpolation approach for remote players (e.g., linear interpolation with 100ms buffer)
- Bandwidth estimate per player per second
- Does NOT implement the player movement logic itself (defers to gameplay-programmer)
- Proposes dead-reckoning or prediction strategy to reduce visible lag
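The bandwidth-estimate bullet is verifiable arithmetic. A sketch (all sizes are illustrative assumptions; 28 bytes approximates an IPv4+UDP header, and a real engine adds its own framing on top):

```python
def bandwidth_per_player(hz: int, payload_bytes: int, header_bytes: int = 28) -> int:
    """Bytes per second for one replicated stream, including transport header overhead."""
    return hz * (payload_bytes + header_bytes)

# 20Hz position updates: 3 floats position + 1 float yaw = 16B payload (hypothetical)
per_stream = bandwidth_per_player(hz=20, payload_bytes=16)
# In 4-player co-op, each client receives streams for the 3 other players
per_client = 3 * per_stream
print(per_stream, per_client)  # 880 2640
```

Under 3KB/s per client leaves ample headroom, which is the kind of concrete number the strategy document should contain rather than "low bandwidth".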
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Deploy our game server to AWS EC2 and set up auto-scaling."
**Expected behavior:**
- Does NOT produce server deployment configuration, Terraform, or AWS setup scripts
- Explicitly states that server infrastructure belongs to `devops-engineer`
- Redirects the request to `devops-engineer`
- May note it can provide the network protocol spec the server needs to implement once infrastructure is set up
### Case 3: State divergence — rollback/reconciliation
**Input:** "Under high latency, clients are diverging from the authoritative server state for physics objects."
**Expected behavior:**
- Proposes a rollback-and-reconciliation approach (client-side prediction + server authoritative correction)
- Specifies the state snapshot format, reconciliation trigger threshold (e.g., >5 units position error), and correction interpolation speed
- Notes the input buffer pattern for deterministic replay
- Does NOT change the physics simulation itself — documents the interface contract for engine-programmer
### Case 4: Anti-cheat conflict
**Input:** "We want client-authoritative position for smooth movement, but anti-cheat requires server validation."
**Expected behavior:**
- Surfaces the direct conflict: client-authority is fast but exploitable; server-authority is secure but requires latency compensation
- Coordinates with `security-engineer` to agree on the validation boundary
- Proposes a compromise (server validates position within a tolerance band, flags outliers) rather than unilaterally deciding
- Documents the trade-off and escalates the final decision to `technical-director` if security-engineer and network-programmer cannot agree
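The tolerance-band compromise can be sketched to show what "validates within a band, flags outliers" means concretely. Python for illustration; the 2.0-unit tolerance and the flag-rather-than-ban policy are hypothetical:

```python
def validate_position(reported, predicted, tolerance: float = 2.0):
    """Server-side check: accept within the band, flag outliers for review.

    Tolerance is a tuning value agreed with security-engineer, not a constant.
    """
    dx = reported[0] - predicted[0]
    dy = reported[1] - predicted[1]
    error = (dx * dx + dy * dy) ** 0.5
    if error <= tolerance:
        return "accept", error
    # Flagged, not auto-corrected or banned: lag spikes and speed hacks need triage
    return "flag", error

assert validate_position((10.0, 5.0), (10.5, 5.0))[0] == "accept"
assert validate_position((10.0, 5.0), (20.0, 5.0))[0] == "flag"
```

The tolerance value is precisely the negotiation surface between the two agents: security-engineer wants it tight, network-programmer wants it loose enough to absorb jitter.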
### Case 5: Context pass — latency budget
**Input:** Technical preferences provided in context: target latency 80ms RTT for 95th percentile players. Request: "Design the input replication scheme for a fighting game."
**Expected behavior:**
- References the 80ms RTT budget explicitly in the design
- Selects replication approach calibrated to that budget (e.g., rollback netcode is preferred for fighting games at this latency)
- Specifies input delay frames calculated from the 80ms budget (e.g., 2 frames at 60fps = 33ms buffer)
- Flags that rollback netcode requires gameplay-programmer to implement deterministic simulation
---
## Protocol Compliance
- [ ] Stays within declared domain (replication, lag compensation, protocol design, matchmaking)
- [ ] Redirects server deployment to devops-engineer
- [ ] Returns structured findings (sync strategies, protocol specs, bandwidth estimates)
- [ ] Does not implement gameplay logic — only specifies the network contract for it
- [ ] Coordinates with security-engineer on anti-cheat boundaries
- [ ] Designs to explicit latency targets from provided context
---
## Coverage Notes
- Replication strategy (Case 1) should include a bandwidth calculation reviewable by technical-director
- Rollback/reconciliation (Case 3) must document the engine-programmer interface contract clearly
- Anti-cheat conflict (Case 4) confirms the agent escalates rather than unilaterally deciding security trade-offs

# Agent Test Spec: performance-analyst
## Agent Summary
Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations.
Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms."
**Expected behavior:**
- Identifies the frame as CPU-bound: CPU at 14ms dwarfs the GPU at 8ms and sits close to the 16.67ms (60fps) budget
- Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
- Flags draw calls (420) as a secondary concern where the budget caps them lower (e.g., 200 draw calls per technical-preferences.md)
- Produces a prioritized bottleneck report:
1. Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
2. Draw calls — 420, implement batching or LOD
3. Scripts — 3ms, profile hot paths
- Does NOT implement any of these optimizations
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the batching optimization to reduce draw calls from 420 to under 200."
**Expected behavior:**
- Does NOT produce implementation code for batching
- Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
- Redirects the implementation to `engine-programmer` with the recommendation context attached
- May produce a requirements brief for the optimization so engine-programmer has a clear target
### Case 3: Regression identification
**Input:** "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms."
**Expected behavior:**
- Proposes a bisection strategy to identify the offending commit range
- Requests or reviews the diff of commits in the window to narrow the likely cause
- Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
- Produces a regression report naming the probable commit, the affected system, and the measured delta
### Case 4: Recommendation vs. code quality trade-off
**Input:** "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers."
**Expected behavior:**
- Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
- Does NOT recommend the optimization without noting the code quality cost
- Escalates the trade-off to `lead-programmer` for a decision
- May propose a middle path (e.g., profile-guided inlining of only the hottest two or three methods) that preserves testability
### Case 5: Context pass — technical-preferences.md budget
**Input:** Technical preferences from context: Target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB. Request: "Review the current build profile."
**Expected behavior:**
- References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
- Compares current measurements against each threshold explicitly
- Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers
- Does NOT use different budget numbers than those provided in the context
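The three-way labeling can be sketched as a simple threshold check. The 90% "at risk" margin below is an assumption for illustration; the spec does not define where AT RISK begins.

```python
# Minimal sketch of WITHIN BUDGET / AT RISK / OVER BUDGET labeling.
# The 90% risk margin is an assumed convention, not from the spec.
def label_metric(value, budget, risk_margin=0.9):
    if value > budget:
        return "OVER BUDGET"
    if value >= budget * risk_margin:
        return "AT RISK"
    return "WITHIN BUDGET"

statuses = {
    "frame_time_ms": label_metric(14.0, 16.67),  # within budget
    "draw_calls": label_metric(420, 200),        # over budget
    "memory_mb": label_metric(300, 512),         # within budget
}
```

The important property for the test case is that the thresholds come from the provided context values (16.67, 200, 512), never from defaults the agent invents.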
---
## Protocol Compliance
- [ ] Stays within declared domain (profiling, analysis, recommendations — not implementation)
- [ ] Redirects optimization implementation to the correct programmer domain agent
- [ ] Returns structured findings (bottleneck report with severity, measured values, and recommended action owner)
- [ ] Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
- [ ] Applies budget thresholds from provided context rather than assumed defaults
- [ ] Labels all findings with a specific action owner (who should implement the fix)
---
## Coverage Notes
- Frame time analysis (Case 1) output should be structured as a report filed in `production/qa/evidence/`
- Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
- Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict

View File

@@ -0,0 +1,82 @@
# Agent Test Spec: prototyper
## Agent Summary
- **Domain**: Rapid throwaway prototypes in the `prototypes/` directory, concept validation experiments, mechanical feasibility tests. Standards intentionally relaxed for speed — prototypes are not production code.
- **Does NOT own**: Production source code in `src/` (gameplay-programmer), design documents (game-designer), production-grade architecture decisions (lead-programmer / technical-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; produces recommendation docs after prototype conclusion; does not participate in phase gates
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references rapid prototyping, prototypes/ directory, throwaway code)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write scoped to prototypes/ directory; no production src/ write access)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition explicitly states that prototype code is not production code and must not be copied to src/
---
## Test Cases
### Case 1: In-domain request — prototype a card-drawing mechanic
**Input**: "Prototype a card-drawing mechanic in 2 hours. The core question: does drawing 3 cards per turn with hand-size limit of 7 feel good? I need something to test in a playtest today."
**Expected behavior**:
- Produces a minimal working prototype written in the project's engine scripting language, scoped to `prototypes/card-draw-mechanic/`
- Code prioritizes speed over correctness: no unit tests, no doc comments required, global state is acceptable for a prototype
- Implements the minimal viable mechanic: a deck, a draw function (draw N cards), a hand container with a size limit, and a simple UI or debug print to verify state
- Does NOT implement production patterns (dependency injection, signals, data-driven config) unless they take less time than not using them
- Includes a `README.md` in the prototype folder: hypothesis being tested, how to run, what to observe in the playtest
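A throwaway prototype of the mechanic might look like the sketch below: deliberately minimal, global state and all, matching the relaxed standards this case expects. Names and the 40-card deck size are illustrative assumptions.

```python
# Throwaway-style sketch of the minimal card-draw mechanic.
# Deck size and names are assumptions; a real prototype would live
# in prototypes/card-draw-mechanic/ in the project's engine language.
import random

HAND_LIMIT = 7
deck = list(range(40))  # 40 placeholder card IDs
hand = []               # global state is fine in a prototype

def draw(n):
    """Draw up to n cards; overflow past the hand limit is discarded."""
    for _ in range(n):
        if not deck:
            break
        card = deck.pop()
        if len(hand) < HAND_LIMIT:
            hand.append(card)

random.shuffle(deck)
draw(3)  # turn 1
draw(3)  # turn 2
draw(3)  # turn 3 -- hand caps at 7 here
```

Note what is absent by design: no tests, no doc comments on internals, no data-driven config. That is the point of Case 1.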
### Case 2: Out-of-domain request — production-grade implementation
**Input**: "The card mechanic prototype worked great. Now write the production implementation of the card system for src/gameplay/cards/."
**Expected behavior**:
- Does not write production code to `src/`
- States clearly: "Prototyper produces throwaway code in prototypes/ to validate concepts; production implementation of validated mechanics is handled by gameplay-programmer"
- Offers to produce a transition document: what the prototype proved, what the production implementation should preserve (the mechanic), and what it should discard (the throwaway implementation patterns)
- Does NOT copy the prototype code into src/ or suggest it as a starting point without warning about its non-production quality
### Case 3: Prototype validates the mechanic — recommendation output
**Input**: "The card-draw prototype playtested well. Three sessions all enjoyed drawing 3 cards/turn with hand limit 7. No confusion observed. What's next?"
**Expected behavior**:
- Produces a prototype conclusion document in `prototypes/card-draw-mechanic/conclusion.md` (or equivalent)
- Document includes: hypothesis that was tested, playtest method (sessions, duration, observer notes), result verdict (VALIDATED), key findings (what worked, any minor issues observed), recommendation for production (specific mechanic parameters to preserve: 3 cards/turn, hand limit 7), and a flag to route the production implementation request to gameplay-programmer
- Does NOT begin writing production code
- Output is structured as a decision-ready recommendation, not a narrative summary
### Case 4: Prototype reveals the mechanic is unworkable — abandonment note
**Input**: "The prototype for the physics-based lock-picking mechanic is done. After 4 playtest sessions, all testers found it frustrating — too much precision required, not fun. One tester rage-quit."
**Expected behavior**:
- Produces a prototype abandonment note in `prototypes/lock-picking-physics/conclusion.md`
- Document includes: hypothesis that was tested, result verdict (ABANDONED), specific reasons (precision barrier too high, negative emotional response, rage-quit incident as evidence), and a recommendation for alternative approaches to explore (simplified key-tumbler mechanic, rhythm-based alternative, removal of the mechanic entirely)
- Does NOT recommend persisting with the prototype mechanic because of sunk cost
- Does NOT mark the result as inconclusive — after 4 sessions with consistent negative responses, abandonment is the correct verdict
### Case 5: Context pass — using the project's engine scripting language
**Input context**: Project uses Godot 4.6 with GDScript (configured in technical-preferences.md).
**Input**: "Prototype a basic grid movement system — player clicks a tile and the character moves to it."
**Expected behavior**:
- Produces the prototype in GDScript — not Python, C#, or pseudocode
- Uses Godot 4.6 node types appropriate for a grid: TileMap or a custom grid manager node, CharacterBody2D or Node2D for the player
- Does NOT apply production coding standards (no required test coverage, no doc comments, global state acceptable)
- Writes the output to `prototypes/grid-movement/` not to `src/`
- If a Godot 4.6 API is uncertain (given the LLM knowledge cutoff noted in VERSION.md), flags the specific API with a note to verify against the Godot 4.6 docs
---
## Protocol Compliance
- [ ] Stays within declared domain (prototypes/ directory only; throwaway code for concept validation)
- [ ] Redirects production implementation requests to gameplay-programmer with a transition document offer
- [ ] Produces structured conclusion documents (VALIDATED or ABANDONED verdict) after prototype evaluation
- [ ] Does not recommend preserving prototype code in production form without explicit warnings
- [ ] Uses the project's configured engine and scripting language; flags version uncertainty
---
## Coverage Notes
- Case 2 (production redirect) is critical — prototype code leaking into src/ is a common quality problem
- Case 4 (abandonment honesty) tests whether the agent avoids sunk-cost bias — prototypes that fail should be cleanly abandoned
- Case 5 requires that technical-preferences.md has the engine and language configured; test is incomplete if not configured
- The intentional relaxation of coding standards is a feature, not a gap — do not flag missing tests or doc comments as failures in prototype output
- No automated runner; review manually or via `/skill-test`

View File

@@ -0,0 +1,84 @@
# Agent Test Spec: sound-designer
## Agent Summary
Domain: SFX specs, audio events, mixing parameters, and sound category definitions.
Does NOT own: music composition direction (audio-director), code implementation of audio systems.
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references SFX / audio events / mixing)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep — does NOT include engine code execution tools
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over music direction or audio code implementation
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create an SFX spec for a sword swing attack."
**Expected behavior:**
- Produces a complete audio event spec including:
- Event name (e.g., `sfx_combat_sword_swing`)
- Variation count (minimum 3 to avoid repetition fatigue)
- Pitch range (e.g., ±8% randomization)
- Volume range and normalization target (e.g., -12 dBFS)
- Sound category (e.g., `combat_sfx`)
- Suggested layering notes (whoosh layer + impact transient)
- Output follows the project audio naming convention if one is established
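The spec fields listed above can be captured as structured data rather than prose. The field names below are illustrative assumptions, not a schema the agent is required to emit.

```python
# Hedged sketch of the audio event spec as structured data.
# Field names are illustrative, not a mandated schema.
sword_swing_spec = {
    "event_name": "sfx_combat_sword_swing",
    "variations": 3,             # minimum to avoid repetition fatigue
    "pitch_range_pct": (-8, 8),  # random pitch shift per trigger
    "normalization_dbfs": -12.0,
    "category": "combat_sfx",
    "layers": ["whoosh", "impact_transient"],
}

def is_valid_spec(spec):
    """Check the spec carries every required field and enough variations."""
    required = {"event_name", "variations", "pitch_range_pct",
                "normalization_dbfs", "category"}
    return required <= spec.keys() and spec["variations"] >= 3
```

A structured spec like this is also what makes the Case 4 naming check and the Case 5 style-guide check mechanically verifiable.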
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Compose a looping ambient music track for the forest level."
**Expected behavior:**
- Does NOT produce music composition direction or a music brief
- Explicitly states that music direction belongs to `audio-director`
- Redirects the request to `audio-director`
- May note it can provide an SFX ambience layer spec (wind, wildlife) to complement the music once the music direction is set
### Case 3: Dynamic parameter — falloff curve spec
**Input:** "The sword swing SFX needs distance falloff so it sounds different across the arena."
**Expected behavior:**
- Produces a spec for the dynamic parameter including:
- Parameter name (e.g., `distance` or `listener_distance`)
- Falloff curve type (e.g., logarithmic, linear, custom)
- Near/far distance thresholds with corresponding volume and high-frequency attenuation values
- Occlusion override behavior if applicable
- Does NOT write the audio engine integration code (defers to the appropriate programmer)
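The logarithmic falloff option can be sketched as a curve between the near and far thresholds. The threshold distances below are assumed values for illustration; the agent would specify project-appropriate ones.

```python
# Illustrative logarithmic distance-falloff curve.
# NEAR_M and FAR_M are assumed thresholds, not project values.
import math

NEAR_M, FAR_M = 2.0, 40.0  # full volume inside NEAR, silent past FAR

def falloff_gain(distance_m):
    """Return a 0.0-1.0 gain for the given listener distance."""
    if distance_m <= NEAR_M:
        return 1.0
    if distance_m >= FAR_M:
        return 0.0
    # Logarithmic roll-off between the two thresholds.
    t = math.log(distance_m / NEAR_M) / math.log(FAR_M / NEAR_M)
    return 1.0 - t
```

As the case requires, this stays a parameter spec: mapping the curve onto the audio engine's attenuation system is the programmer's job.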
### Case 4: Naming convention conflict
**Input:** "Add a new SFX event called `SWORD_HIT_1` for the melee system."
**Expected behavior:**
- Identifies that `SWORD_HIT_1` conflicts with the established event naming convention (snake_case with category prefix, e.g., `sfx_combat_sword_hit`)
- Does NOT silently register the non-conforming name
- Flags the conflict to `audio-director` with the proposed compliant alternative
- Will proceed with the corrected name once confirmed by audio-director
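The convention check in this case can be sketched as a pattern match that rejects the non-conforming name and proposes a compliant alternative. The regex encodes the convention described above and is an assumption about its exact shape.

```python
# Sketch of the snake_case, category-prefixed naming check.
# The exact pattern is an assumption about the project convention.
import re

EVENT_NAME_RE = re.compile(r"^sfx_[a-z]+(_[a-z0-9]+)+$")

def check_event_name(name):
    """Return (name, None) if compliant, else (None, suggested_name)."""
    if EVENT_NAME_RE.fullmatch(name):
        return name, None
    suggestion = "sfx_" + re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return None, suggestion

print(check_event_name("SWORD_HIT_1"))  # -> (None, 'sfx_sword_hit_1')
```

The suggestion is only a proposal; per the case, the agent flags it to audio-director rather than silently registering either name.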
### Case 5: Context pass — uses audio style guide
**Input:** Audio style guide provided in context specifying: "gritty, grounded, no reverb tails over 1.5s, reference: The Witcher 3 combat audio." Request: "Create SFX specs for the full melee combat suite."
**Expected behavior:**
- References the "gritty, grounded" tone descriptor in the spec rationale
- Caps all reverb tail specifications at 1.5 seconds as stated
- Notes the reference material (The Witcher 3) as a benchmark for mix levels and transient design
- Does NOT produce specs that contradict the style guide (e.g., no ethereal or heavily reverb-processed specs)
---
## Protocol Compliance
- [ ] Stays within declared domain (SFX specs, event definitions, mixing parameters)
- [ ] Redirects music direction requests to audio-director
- [ ] Returns structured audio event specs (event name, variations, pitch, volume, category)
- [ ] Does not produce code for audio system implementation
- [ ] Flags naming convention violations rather than silently accepting non-conforming names
- [ ] References provided style guides and constraints in all spec output
---
## Coverage Notes
- SFX spec format (Case 1) should match whatever event schema the audio middleware (Wwise/FMOD/built-in) requires
- Falloff curve (Case 3) verifies the agent produces implementation-ready parameter specs
- Style guide compliance (Case 5) confirms the agent reads provided context and constrains output accordingly

View File

@@ -0,0 +1,79 @@
# Agent Test Spec: technical-artist
## Agent Summary
Domain: Shaders, VFX, rendering optimization, art pipeline tools, and visual performance.
Does NOT own: art style decisions or color palette (art-director), gameplay code (gameplay-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references shaders / VFX / rendering)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over art style direction or gameplay logic
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create a dissolve effect shader for enemy death sequences."
**Expected behavior:**
- Produces shader code or a Shader Graph node spec appropriate to the configured engine (Godot shading language / Unity Shader Graph / Unreal Material Blueprint)
- Defines a `dissolve_amount` uniform (0.0 to 1.0) as the animation driver
- Uses a noise texture sample to determine the dissolve threshold
- Notes edge-lighting technique as an optional enhancement
- Output is engine-version-aware (checks version reference if post-cutoff APIs are needed)
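The per-fragment logic the shader performs can be sketched CPU-side. This is a conceptual sketch only; the real implementation belongs in the engine's shading language, and the edge width is an assumed value.

```python
# CPU-side sketch of the per-fragment dissolve test a shader performs.
# The edge_width default is an assumption for illustration.
def dissolve_fragment(noise_value, dissolve_amount, edge_width=0.05):
    """Classify a fragment as 'solid', 'edge' (edge-lit band), or 'gone'.

    noise_value: 0.0-1.0 sample from the noise texture
    dissolve_amount: 0.0 (intact) to 1.0 (fully dissolved)
    """
    if noise_value < dissolve_amount:
        return "gone"
    if noise_value < dissolve_amount + edge_width:
        return "edge"
    return "solid"
```

The "edge" band is where the optional edge-lighting enhancement would apply emissive color before the fragment disappears.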
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Define the art bible color palette: primary, secondary, and accent colors for the UI."
**Expected behavior:**
- Does NOT produce color palette decisions or art direction documents
- Explicitly states that art style decisions belong to `art-director`
- Redirects the request to `art-director`
- May note it can later implement a color-grading or palette LUT shader once the palette is decided
### Case 3: Performance warning — GPU particle count
**Input:** "The VFX system is triggering a GPU particle count warning at 50,000 particles in the explosion pool."
**Expected behavior:**
- Produces an optimization spec addressing the specific warning
- Proposes concrete strategies: particle budget caps per emitter, LOD-based particle reduction, GPU instancing, or switching to mesh-based VFX for distant effects
- Provides before/after GPU cost estimates where calculable
- Does NOT change gameplay behavior of the explosion (delegates any gameplay impact to gameplay-programmer)
### Case 4: Engine version compatibility
**Input:** "Use the new texture sampler API for the water shader."
**Expected behavior:**
- Checks the engine version reference (e.g., `docs/engine-reference/godot/VERSION.md`) before suggesting any API
- Flags if the requested API is post-cutoff (e.g., Godot 4.4+ texture type changes)
- Provides the correct syntax for the project's pinned engine version
- If uncertain about post-cutoff behavior, explicitly states the uncertainty and directs to verified docs
### Case 5: Context pass — uses performance budget
**Input:** Performance budget from `technical-preferences.md` provided in context: 2ms GPU frame budget, max 200 draw calls. Request: "Optimize the forest rendering system."
**Expected behavior:**
- References the specific 2ms GPU budget and 200 draw call limit from the provided context
- Proposes optimizations calibrated to those exact targets (e.g., "batching reduces draw calls from 340 to ~180, within the 200 limit")
- Does NOT propose optimizations that would exceed the stated budgets in other dimensions
- Produces a ranked list of optimizations by expected impact vs. implementation cost
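The ranking the case asks for can be sketched as impact per unit of implementation cost. The candidate optimizations and their numbers below are illustrative assumptions, not measured data.

```python
# Sketch of ranking optimizations by expected impact per unit cost.
# Candidate names, savings, and costs are illustrative assumptions.
candidates = [
    {"name": "static batching",        "gpu_ms_saved": 0.8, "cost_days": 2},
    {"name": "impostor LODs",          "gpu_ms_saved": 0.5, "cost_days": 5},
    {"name": "shadow cascade tuning",  "gpu_ms_saved": 0.3, "cost_days": 1},
]

ranked = sorted(candidates,
                key=lambda c: c["gpu_ms_saved"] / c["cost_days"],
                reverse=True)
# batching (0.40 ms/day) ranks above shadow tuning (0.30) and LODs (0.10)
```

Whatever the ranking heuristic, each entry should be checked against the context budgets (2ms GPU, 200 draw calls) before being recommended.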
---
## Protocol Compliance
- [ ] Stays within declared domain (shaders, VFX, rendering optimization, art pipeline)
- [ ] Redirects art style decisions to art-director
- [ ] Returns structured findings (shader code, optimization specs with metrics, node graphs)
- [ ] Does not modify gameplay code files without explicit delegation
- [ ] Checks engine version reference before suggesting post-cutoff APIs
- [ ] Quantifies performance changes against stated budgets
---
## Coverage Notes
- Dissolve shader (Case 1) should include a visual test reference in `production/qa/evidence/`
- Engine version check (Case 4) confirms the agent treats VERSION.md as authoritative
- Performance budget case (Case 5) verifies the agent reads and applies provided context numbers

View File

@@ -0,0 +1,79 @@
# Agent Test Spec: tools-programmer
## Agent Summary
Domain: Editor extensions, content authoring tools, debug utilities, and pipeline automation scripts.
Does NOT own: game code (gameplay-programmer, ui-programmer, etc.), engine core systems (engine-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references editor tools / pipeline / debug utilities)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over game source code or engine internals
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Create a custom editor tool for placing enemy patrol waypoints in the level."
**Expected behavior:**
- Produces an editor extension spec and code scaffold for the configured engine (e.g., Godot EditorPlugin, Unity Editor window, Unreal Detail Customization)
- Tool allows designer to click-place waypoints in the scene/viewport
- Waypoints are serialized as engine-native resource (not hardcoded) so level-designer can edit without code
- Includes undo/redo support per editor plugin best practices
- Does NOT modify the AI pathfinding runtime code (that belongs to ai-programmer)
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the enemy melee combo system in code."
**Expected behavior:**
- Does NOT produce gameplay mechanic code
- Explicitly states that combat system implementation belongs to `gameplay-programmer`
- Redirects the request to `gameplay-programmer`
- May note it can build a debug overlay tool to visualize combo state if useful during development
### Case 3: Runtime data access — coordination required
**Input:** "The waypoint editor tool needs to read game data at runtime to validate patrol routes against the AI budget."
**Expected behavior:**
- Identifies that runtime data access from an editor plugin requires a defined, safe interface to the game's runtime systems
- Coordinates with `engine-programmer` to establish a read-only data access pattern (e.g., a resource validation API)
- Does NOT directly read internal engine or game memory structures without an agreed interface
- Documents the required interface before implementing the tool
### Case 4: Engine version breakage
**Input:** "After the engine upgrade, the waypoint editor tool crashes on startup."
**Expected behavior:**
- Checks the engine version reference (`docs/engine-reference/`) for breaking changes in editor plugin APIs
- Identifies the specific API or signal that changed in the new version
- Produces a targeted fix for the breaking change
- Notes any other tools that may be affected by the same API change
### Case 5: Context pass — art pipeline requirements
**Input:** Art pipeline requirements provided in context: "All texture imports must set compression to VRAM Compressed, generate mipmaps, and tag with a LOD group." Request: "Build an asset import tool that enforces these settings."
**Expected behavior:**
- References all three requirements from the context: VRAM compression, mipmap generation, LOD group tagging
- Produces an import tool that validates and applies all three settings on import
- Adds a warning or error report for assets that fail to meet the specified settings
- Does NOT change the art pipeline requirements themselves (those belong to art-director / technical-artist)
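The validator the tool needs can be sketched as a check over importer metadata. The metadata field names below are assumptions standing in for whatever the engine's import system actually exposes.

```python
# Sketch of the import validator enforcing the three pipeline settings.
# Metadata field names are assumptions, not real importer keys.
REQUIRED = {"compression": "VRAM Compressed", "mipmaps": True}

def validate_import(meta):
    """Return a list of human-readable violations; empty means compliant."""
    errors = []
    for key, expected in REQUIRED.items():
        if meta.get(key) != expected:
            errors.append(f"{key}: expected {expected!r}, got {meta.get(key)!r}")
    if not meta.get("lod_group"):
        errors.append("lod_group: missing tag")
    return errors

bad = validate_import({"compression": "Lossless", "mipmaps": True})
# bad reports the wrong compression and the missing LOD group tag
```

Per the case, the tool reports or fixes violations against these settings; it never redefines the settings themselves.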
---
## Protocol Compliance
- [ ] Stays within declared domain (editor tools, pipeline scripts, debug utilities)
- [ ] Redirects game code requests to appropriate programmer agents
- [ ] Returns structured findings (tool specs, editor extension code, pipeline scripts)
- [ ] Coordinates with engine-programmer before accessing runtime data from editor context
- [ ] Checks engine version reference before using editor plugin APIs
- [ ] Builds tools to enforce requirements, does not author the requirements themselves
---
## Coverage Notes
- Waypoint editor tool (Case 1) should have a smoke test verifying it loads without errors in the editor
- Runtime data access (Case 3) confirms the agent respects the engine-programmer's ownership of core APIs
- Art pipeline context (Case 5) verifies the agent builds to match provided specs rather than inventing requirements

View File

@@ -0,0 +1,79 @@
# Agent Test Spec: ui-programmer
## Agent Summary
Domain: Menu screens, HUDs, inventory screens, dialogue boxes, UI framework code, and data binding.
Does NOT own: UX flow design (ux-designer), visual style direction (art-director / technical-artist).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references menus / HUDs / UI framework / data binding)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over UX flow design or visual art direction
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Implement the inventory screen from the UX spec in `design/ux/inventory-flow.md`."
**Expected behavior:**
- Reads the UX spec before producing any code
- Produces implementation using the project's configured UI framework (UI Toolkit, UGUI, UMG, or Godot Control nodes)
- Implements all states defined in the spec (default, hover, selected, empty-slot, locked-slot)
- Binds inventory data to UI elements via the project's data model, not hardcoded values
- Includes doc comments on public UI API per coding standards
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Design the inventory interaction flow — what happens when the player equips, drops, or combines items."
**Expected behavior:**
- Does NOT produce interaction flow design or user flow diagrams
- Explicitly states that UX flow design belongs to `ux-designer`
- Redirects the request to `ux-designer`
- Notes that once the flow spec is ready, it can implement it
### Case 3: Custom animation coordination
**Input:** "The item selection in the inventory needs a custom bounce animation when selected."
**Expected behavior:**
- Recognizes that defining the animation curve and feel is within technical-artist territory
- Does NOT invent animation parameters (timing, easing) without a spec
- Coordinates with `technical-artist` for an animation spec (duration, easing curve, overshoot amount)
- Once the spec is provided, produces the implementation binding the animation to the selection state
### Case 4: Ambiguous UX spec — flags back
**Input:** The UX spec states "show item details on selection" but does not define what happens when an empty slot is selected.
**Expected behavior:**
- Identifies the ambiguity in the spec (empty slot selection state is undefined)
- Does NOT make an arbitrary implementation decision for the undefined state
- Flags the ambiguity back to `ux-designer` with the specific question: "What should the detail panel show when an empty inventory slot is selected?"
- May propose two common options (hide panel / show placeholder) to help ux-designer decide quickly
### Case 5: Context pass — engine UI toolkit
**Input:** Engine context provided: project uses Godot 4.6 with Control node UI. Request: "Implement a scrollable item list for the inventory."
**Expected behavior:**
- Uses Godot's `ScrollContainer` + `VBoxContainer` + `ItemList` (or equivalent) pattern, not Canvas or UGUI
- Does NOT produce Unity UGUI or Unreal UMG code for a Godot project
- Checks the engine version reference (4.6) for any Control node API changes from 4.4/4.5 before using specific APIs
- Produces GDScript or C# code consistent with the project's configured language
---
## Protocol Compliance
- [ ] Stays within declared domain (menus, HUDs, UI framework, data binding)
- [ ] Redirects UX flow design to ux-designer
- [ ] Coordinates with technical-artist for animation specs before implementing animations
- [ ] Flags ambiguous UX specs back to ux-designer rather than making arbitrary implementation decisions
- [ ] Returns structured output (implementation code, data binding patterns, state machine for UI states)
- [ ] Uses the correct engine UI toolkit for the project — never cross-engine code
---
## Coverage Notes
- Inventory implementation (Case 1) should have a UI interaction test or manual walkthrough doc in `production/qa/evidence/`
- Animation coordination (Case 3) confirms the agent does not invent feel parameters without a spec
- Ambiguous spec (Case 4) verifies the agent routes spec gaps back to the authoring agent rather than guessing

View File

@@ -0,0 +1,79 @@
# Agent Test Spec: ux-designer
## Agent Summary
Domain: User experience flows, interaction design, information architecture, input handling design, and onboarding UX.
Does NOT own: visual art style (art-director), UI implementation code (ui-programmer).
Model tier: Sonnet (default).
No gate IDs assigned.
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references UX flows / interaction design / information architecture)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over visual art direction or UI implementation code
---
## Test Cases
### Case 1: In-domain request — appropriate output
**Input:** "Design the inventory management flow for a survival game."
**Expected behavior:**
- Produces a user flow diagram (states and transitions) for the inventory: open, browse, select item, sub-actions (equip/drop/combine), close
- Defines all interaction states (default, hover, selected, empty-slot, locked-slot)
- Specifies input mappings for each action (keyboard, gamepad if applicable)
- Notes cognitive load considerations (e.g., maximum items visible without scrolling)
- Does NOT produce visual design (colors, icons) or implementation code
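The states and transitions called for above can be expressed as a transition table, which is exactly the kind of structured, implementation-ready artifact ui-programmer would build against. State and action names are illustrative assumptions.

```python
# Sketch of the inventory flow as a state-transition table.
# State and action names are illustrative assumptions.
TRANSITIONS = {
    ("closed", "open_inventory"):   "browsing",
    ("browsing", "select_item"):    "item_selected",
    ("item_selected", "equip"):     "browsing",
    ("item_selected", "drop"):      "browsing",
    ("item_selected", "combine"):   "combining",
    ("combining", "confirm"):       "browsing",
    ("browsing", "close_inventory"): "closed",
}

def step(state, action):
    """Return the next state; invalid actions leave the state unchanged."""
    return TRANSITIONS.get((state, action), state)
```

An explicit table also makes spec gaps visible: any (state, action) pair with no entry is an undefined behavior the spec must resolve, which is the failure mode Case 4 of the ui-programmer spec tests.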
### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the inventory screen in GDScript with drag-and-drop support."
**Expected behavior:**
- Does NOT produce implementation code
- Explicitly states that UI code implementation belongs to `ui-programmer`
- Redirects the request to `ui-programmer`
- Notes that the UX flow spec should be provided to ui-programmer as the implementation reference
### Case 3: Flow depth conflict — simplification
**Input:** "The lead designer says the current 5-step crafting flow is too deep; maximum 3 steps allowed."
**Expected behavior:**
- Produces a revised 3-step flow that collapses the original 5-step sequence
- Shows clearly what was merged or removed and why each collapse is safe from a usability standpoint
- Does NOT simply remove steps without addressing the user's goal at each removed step
- Flags if the 3-step constraint makes any required use case impossible and proposes an alternative
### Case 4: Accessibility conflict
**Input:** "The onboarding flow uses a timed prompt (auto-advances after 3 seconds) to keep pace, but this conflicts with accessibility requirements for user-controlled timing."
**Expected behavior:**
- Identifies the conflict with WCAG 2.1 success criterion 2.2.1 (Timing Adjustable)
- Does NOT override the accessibility requirement to preserve pace
- Coordinates with `accessibility-specialist` to agree on a compliant solution
- Proposes alternatives: pause-on-hover, skip button, settings option to disable auto-advance
### Case 5: Context pass — player mental model research
**Input:** Playtest research provided in context: "Players consistently expected the 'Crafting' option to be inside the Inventory screen, not in a separate top-level menu." Request: "Redesign the navigation IA for crafting."
**Expected behavior:**
- References the specific player expectation from the research (crafting expected inside inventory)
- Restructures the information architecture to place crafting as a tab or panel within the inventory screen
- Does NOT produce a design that contradicts the stated player mental model without explicit justification
- Notes the research source in the rationale for the design decision
---
## Protocol Compliance
- [ ] Stays within declared domain (UX flows, interaction design, IA, onboarding)
- [ ] Redirects code implementation to ui-programmer, visual style to art-director
- [ ] Returns structured findings (state diagrams, flow steps, input mappings) not freeform opinions
- [ ] Coordinates with accessibility-specialist when flows have timing or cognitive load constraints
- [ ] Designs flows based on provided user research, not assumed behavior
- [ ] Documents rationale for flow decisions against user goals
---
## Coverage Notes
- Inventory flow (Case 1) should be written to `design/ux/` as a spec for ui-programmer to implement against
- Mental model case (Case 5) verifies the agent applies research evidence, not intuition
- Accessibility coordination (Case 4) confirms the agent does not override accessibility requirements for UX aesthetics

View File

@@ -0,0 +1,80 @@
# Agent Test Spec: world-builder
## Agent Summary
- **Domain**: World lore architecture — factions and their cultures/governments/motivations, world history, geography and ecology, cosmology and metaphysics, world rules (how magic works, what is and is not possible), internal consistency enforcement across the world document
- **Does NOT own**: Specific NPC or quest dialogue (writer), game mechanics rules derived from world rules (game-designer/systems-designer), narrative story structure and arc design (narrative-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates world rule/mechanic conflicts to narrative-director and game-designer jointly
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references world lore, factions, history, world rules, ecology)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/world/ documents; no game source, mechanic design, or dialogue files)
- [ ] Model tier is Sonnet (default for creative specialists)
- [ ] Agent definition does not claim authority over dialogue writing, mechanic design, or narrative arc structure
---
## Test Cases
### Case 1: In-domain request — faction culture and government design
**Input**: "Design the Ironveil Merchant Consortium — a powerful trading faction in our world. I need their culture, government structure, and internal motivations."
**Expected behavior**:
- Produces a faction profile document with: cultural values and norms, government structure (how decisions are made, who holds power, succession or appointment process), internal factions or tensions within the consortium, relationship to other factions (allies, rivals, neutral parties), and primary motivations (what they want and why)
- The faction is internally consistent: a merchant consortium's government is driven by economic logic, not feudal or religious logic, unless a deliberate hybrid is specified
- Output includes at least one internal tension or contradiction within the faction — factions without internal complexity are flat
- Formatted as a structured faction profile, not a narrative essay
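The expected profile shape can be sketched as a small data structure with a validation helper for the internal-tension requirement. The field names below mirror the expected-behavior list; the concrete schema is hypothetical, since the spec mandates the content, not an encoding.

```python
from dataclasses import dataclass

@dataclass
class FactionProfile:
    # Field names mirror the expected-behavior list above; this schema is
    # illustrative, not mandated by the spec.
    name: str
    cultural_values: list[str]
    government: dict                    # e.g. decision_making, power_holders, succession
    internal_tensions: list[str]        # at least one required; flat factions fail review
    external_relations: dict[str, str]  # other faction name -> "ally" / "rival" / "neutral"
    motivations: list[str]

def validate(profile: FactionProfile) -> list[str]:
    """Return review failures for the structural requirements Case 1 checks."""
    failures = []
    if not profile.internal_tensions:
        failures.append("no internal tension (faction is flat)")
    if not profile.motivations:
        failures.append("no primary motivations stated")
    return failures
```

A reviewer applying Case 1 would reject any profile where `validate` returns a non-empty list, regardless of how rich the prose is.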
### Case 2: Out-of-domain request — dialogue writing
**Input**: "Write the dialogue for an Ironveil Consortium merchant NPC that the player meets at the city gates."
**Expected behavior**:
- Does not produce NPC dialogue
- States clearly: "Dialogue writing is owned by writer; I provide the world and faction context that informs the dialogue, including the faction's culture, tone, and speaking style"
- Offers to produce the faction's speaking style notes and cultural context that writer would need to write consistent dialogue
### Case 3: New lore entry contradicts established history — conflict flagging
**Input**: "Add a lore entry stating the Ironveil Consortium was founded 50 years ago by a single merchant family." [Context includes existing lore: the Consortium has existed for 300 years and was founded as a collective by 12 rival trading houses.]
**Expected behavior**:
- Identifies the contradiction: existing lore states 300-year history and a founding coalition of 12 houses; the new entry claims 50 years and a single founding family
- Does NOT write the new entry as requested
- Flags the conflict: states both versions, identifies which is established and which is the proposed change
- Proposes resolution options: (a) the new entry is wrong and should be corrected; (b) the existing lore should be updated if the new version is the intended canon; (c) there is an in-world explanation (the current family claims founding credit despite the collective origin, making the claim a deliberate in-world unreliable account)
- Routes the resolution to narrative-director if no clear answer exists
### Case 4: World rule has gameplay implications — coordination with game-designer
**Input**: "I want to establish a world rule: magic users who cast spells near iron ore are weakened. Iron disrupts arcane energy."
**Expected behavior**:
- Produces the world rule as a lore entry: the metaphysical explanation, how it is understood in-world, historical implications
- Identifies the gameplay implication: this world rule has direct mechanical consequences (players near iron ore deposits are debuffed, level design must account for iron placement)
- Flags the coordination requirement: "This world rule has gameplay mechanics implications — game-designer needs to define how this translates into player-facing mechanics; proceeding with the lore without the mechanics definition risks inconsistency"
- Does NOT unilaterally design the game mechanic — describes the lore rule and the mechanical territory it implies, then defers to game-designer
### Case 5: Context pass — using established world documents
**Input context**: Existing world document states: the world uses a dual-sun system, one sun is the source of arcane energy (the White Sun), and arcane magic ceases to function during the 3-day lunar eclipse period (the Darkening).
**Input**: "Add a lore entry about the Mages' College and how they prepare for the Darkening."
**Expected behavior**:
- Uses the established dual-sun cosmology: references the White Sun as the source of arcane energy
- Uses the established Darkening event: 3-day eclipse, magic ceases
- Does NOT invent a different eclipse mechanism, duration, or name
- Produces a lore entry where the Mages' College's Darkening preparations are consistent with the established rules: they cannot cast during the Darkening, so preparations are practical (stockpiling non-magical supplies, scheduling, shutting down ongoing magical processes)
- Does not contradict any established fact from the context document
---
## Protocol Compliance
- [ ] Stays within declared domain (factions, world history, geography, ecology, world rules, cosmology)
- [ ] Redirects dialogue writing requests to writer with contextual faction notes
- [ ] Flags lore contradictions with both versions stated and resolution options offered — does not silently overwrite established lore
- [ ] Identifies gameplay implications of world rules and flags coordination with game-designer
- [ ] Uses all established world facts from context; does not invent alternatives to stated lore
---
## Coverage Notes
- Case 3 (contradiction detection) requires existing lore to be in context — this is the most important consistency test
- Case 4 (world rule/mechanic coordination) tests cross-domain awareness; verify the agent identifies the mechanic boundary without crossing it
- Case 5 is the most important context-awareness test; the agent must use established facts, not creative alternatives
- No automated runner; review manually or via `/skill-test`

# Agent Test Spec: writer
## Agent Summary
- **Domain**: In-game written content — NPC dialogue (including branching trees), lore codex entries, item and ability descriptions, environmental text (signs, books, notes), quest text, tutorial text, in-world written documents
- **Does NOT own**: Story architecture and narrative structure (narrative-director), world lore and world rules (world-builder), UX copy and UI labels (ux-designer), patch notes (community-manager)
- **Model tier**: Sonnet
- **Gate IDs**: None; flags lore inconsistencies to narrative-director rather than resolving them autonomously
---
## Static Assertions (Structural)
- [ ] `description:` field is present and domain-specific (references dialogue, lore entries, item descriptions, in-game text)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/ and assets/data/dialogue/; no code or world-building architecture files)
- [ ] Model tier is Sonnet (default for creative specialists)
- [ ] Agent definition does not claim authority over narrative structure, world rules, or UX copy direction
---
## Test Cases
### Case 1: In-domain request — NPC merchant dialogue
**Input**: "Write dialogue for Mira, a traveling merchant NPC. She sells general supplies. Players can ask her about her wares, the road ahead, and rumors."
**Expected behavior**:
- Produces a dialogue tree with at least three top-level conversation options: [Wares], [The Road Ahead], [Rumors]
- Each branch has a distinct conversational response in Mira's voice — not generic merchant filler
- Includes at least one response that has a follow-up branch (showing tree structure, not just flat responses)
- Mira's voice is consistent across branches: if she's warm and chatty in one branch, she's not brusque in another without reason
- Output is formatted as a structured dialogue tree: node label, NPC line, player options, next node
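The "node label, NPC line, player options, next node" format can be sketched as a minimal tree, with a walk that confirms no branch is orphaned. The dict encoding and the sample lines are assumptions; the spec requires the four elements, not this representation.

```python
# Minimal dialogue-tree sketch matching the node format above: node label,
# NPC line, player options (each pointing at a next node). The dict shape and
# sample lines are illustrative assumptions, not spec-mandated output.
TREE = {
    "root": {
        "npc_line": "Fine wares today! What will it be?",
        "options": [
            ("Show me your wares.", "wares"),
            ("What's the road ahead like?", "road"),
            ("Heard any rumors?", "rumors"),
        ],
    },
    "wares": {
        "npc_line": "Rope, lamp oil, salted pork. Coin well spent, all of it.",
        # Follow-up branch: demonstrates tree structure, not flat responses.
        "options": [("Anything special?", "wares_special"), ("Back.", "root")],
    },
    "wares_special": {
        "npc_line": "One silver compass. Found it, did not steal it.",
        "options": [("Back.", "root")],
    },
    "road": {
        "npc_line": "Muddy past the ford. Wolves, some nights.",
        "options": [("Back.", "root")],
    },
    "rumors": {
        "npc_line": "They say the northern caravans saw a dragon. A live one!",
        "options": [("Back.", "root")],
    },
}

def reachable(tree: dict, start: str = "root") -> set[str]:
    """Walk the tree to confirm every node is reachable (no orphan branches)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(target for _, target in tree[node]["options"])
    return seen
```

A Case 1 review would check that the root offers at least three options, that at least one branch has a follow-up node, and that `reachable(TREE) == set(TREE)`.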
### Case 2: Out-of-domain request — world history design
**Input**: "Design the history of the world — when the first kingdom was founded, what the great wars were, and why magic was banned."
**Expected behavior**:
- Does not produce world history, lore architecture, or world rules
- States clearly: "World history, lore, and world rules are owned by world-builder; once the history is established, I can write in-game texts, books, and dialogue that reference those events"
- Does not produce even partial world history as a "placeholder"
### Case 3: Dialogue contradicts established lore — flag to narrative-director
**Input**: "Write Mira's dialogue line where she mentions that dragons have been extinct for 200 years." [Context includes existing lore: dragons are alive and revered in the northern provinces, not extinct.]
**Expected behavior**:
- Identifies the contradiction: established lore states dragons are alive and revered; dialogue stating they're extinct directly conflicts
- Does NOT write the requested line as given
- Flags the inconsistency to narrative-director: "Mira's dialogue as requested contradicts established lore (dragons are alive per world-builder's document); requires narrative-director resolution before I can write this line"
- Offers an alternative: a line that references dragons in a way consistent with the established lore (e.g., Mira expresses awe about a dragon sighting in the north)
### Case 4: Item description references an undesigned mechanic
**Input**: "Write a description for the 'Berserker's Chalice' — a consumable that triggers the Berserker state when drunk."
**Expected behavior**:
- Identifies the dependency gap: "Berserker state" is not defined in any provided game design document
- Flags the missing dependency: "This description references a 'Berserker state' mechanic that has no GDD entry — I cannot write accurate flavor text for a mechanic whose rules are undefined, as the description may create incorrect player expectations"
- Does NOT write a description that invents mechanic details (duration, effects) that may conflict with the eventual design
- Offers two paths: (a) write a vague, non-mechanical description that creates no false expectations, flagged as temporary; (b) wait for game-designer to define the Berserker state first
### Case 5: Context pass — character voice guide
**Input context**: Character voice guide for Mira: She speaks in short, energetic sentences. Uses merchant slang ("a fine bargain," "coin well spent"). Drops pronouns occasionally ("Good wares, these."). Never uses contractions — always "I will" not "I'll". Warm but slightly mercenary.
**Input**: "Write Mira's response when a player asks if she has healing potions."
**Expected behavior**:
- Short, energetic sentences — no long monologues
- Uses merchant slang: "a fine bargain," "coin well spent," or similar
- Drops pronouns where natural: "Fine stock, these potions."
- No contractions: "I will" not "I'll," "do not" not "don't"
- Warm tone with a mercenary undertone: she's happy to help because you're a paying customer
- Does NOT produce dialogue that violates any voice guide rule — check each rule explicitly
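Rule-by-rule checking, as opposed to holistic review, can be partially mechanized. The sketch below covers the rules that lend themselves to string checks (no contractions, slang presence, short sentences); tone and pronoun-dropping still need human judgment. The phrase list is drawn from the voice guide above, and the thresholds are assumptions.

```python
# Partial, mechanizable checks for Mira's voice guide. Tone and
# pronoun-dropping are not checkable this way and need human review.
import re

# Conservative: also flags possessives like "merchant's"; acceptable for a
# guide that bans apostrophe contractions outright in dialogue lines.
CONTRACTIONS = re.compile(r"\b\w+'(ll|s|re|ve|d|t|m)\b", re.IGNORECASE)
MERCHANT_SLANG = ("a fine bargain", "coin well spent", "fine stock")

def check_voice(line: str) -> list[str]:
    """Return violations of the mechanically checkable voice-guide rules."""
    violations = []
    if CONTRACTIONS.search(line):
        violations.append("uses a contraction (guide requires 'I will', not 'I'll')")
    if not any(phrase in line.lower() for phrase in MERCHANT_SLANG):
        violations.append("no merchant slang detected (soft check)")
    # Short-sentence rule: flag sentences over ~12 words as possible monologue.
    for sentence in re.split(r"[.!?]+", line):
        if len(sentence.split()) > 12:
            violations.append("long sentence; guide calls for short, energetic lines")
    return violations
```

An empty return does not prove compliance; it only clears the rules a script can see, leaving the reviewer to verify warmth and the mercenary undertone.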
---
## Protocol Compliance
- [ ] Stays within declared domain (dialogue, lore entries, item descriptions, in-game text)
- [ ] Redirects world history and world rule requests to world-builder without producing unauthorized lore
- [ ] Flags lore contradictions to narrative-director rather than silently writing inconsistent content
- [ ] Identifies mechanic dependency gaps before writing item descriptions that could create false player expectations
- [ ] Applies all rules from a provided character voice guide — no partial compliance
---
## Coverage Notes
- Case 3 (lore contradiction detection) requires that existing lore is in the conversation context — test is only valid when context is provided
- Case 4 (dependency gap) tests whether the agent writes descriptions that could set wrong player expectations — a subtle but important quality issue
- Case 5 is the most important context-awareness test; voice guide compliance must be checked rule-by-rule, not holistically
- No automated runner; review manually or via `/skill-test`