添加 claude code game studios 到项目

2026-05-15 14:52:29 +08:00
parent dff559462d
commit a16fe4bff7
415 changed files with 78609 additions and 0 deletions
--- a/Framework/agents/specialists/ai-programmer.md
+++ b/Framework/agents/specialists/ai-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ai-programmer
+
+## Agent Summary
+Domain: NPC behavior, state machines, pathfinding, perception systems, and AI decision-making.
+Does NOT own: player mechanics (gameplay-programmer), rendering or engine internals (engine-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references NPC behavior / AI systems)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over player mechanics or engine rendering
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement a patrol-and-alert behavior tree for a guard NPC: patrol between waypoints, detect the player within 10 units, then enter an alert state and pursue."
+**Expected behavior:**
+- Produces a behavior tree spec (nodes: Selector, Sequence, Leaf actions) plus corresponding code scaffold
+- Defines clearly named states: Patrol, Alert, Pursue
+- Uses a perception/detection check as a condition node, not inline in movement code
+- Waypoints are data-driven (passed as a resource or export), not hardcoded positions
+- Output includes doc comments on public API
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement player input handling for the WASD movement and dash ability."
+**Expected behavior:**
+- Does NOT produce player input or movement code
+- Explicitly states this is outside its domain (player mechanics belong to gameplay-programmer)
+- Redirects the request to `gameplay-programmer`
+- May note that once player position is available via API, AI perception can reference it
+
+### Case 3: Cross-domain coordination — level constraints
+**Input:** "Design pathfinding for the warehouse level, but the level has narrow corridors that confuse the navmesh."
+**Expected behavior:**
+- Does NOT unilaterally modify level layout or navmesh assets
+- Coordinates with `level-designer` to clarify navmesh requirements and corridor dimensions
+- Proposes a pathfinding approach (e.g., navmesh with agent radius tuning, flow fields) conditional on level geometry
+- Documents assumptions and flags blockers clearly
+
+### Case 4: Performance escalation — custom data structures
+**Input:** "The pathfinding priority queue is the bottleneck; I need a custom binary heap implementation for performance."
+**Expected behavior:**
+- Recognizes that a low-level, engine-integrated data structure is within engine-programmer's domain
+- Escalates to `engine-programmer` with a clear description of the bottleneck and required interface
+- May provide the algorithmic spec (binary heap interface, expected operations) to guide the engine-programmer
+- Does NOT implement the low-level structure unilaterally if it requires engine memory management
+
+### Case 5: Context pass — uses level layout for pathfinding design
+**Input:** Level layout document provided in context showing two choke points: a doorway at (12, 0) and a bridge at (40, 5). Request: "Design the patrol route and threat response for enemies in this level."
+**Expected behavior:**
+- References the specific choke point coordinates from the provided context
+- Designs patrol routes that leverage the choke points as tactical positions
+- Specifies alert state transitions that funnel NPCs toward identified choke points during pursuit
+- Does not invent geometry not present in the provided layout document
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (NPC behavior, pathfinding, perception, state machines)
+- [ ] Redirects out-of-domain requests to correct agent (gameplay-programmer, engine-programmer, level-designer)
+- [ ] Returns structured findings (behavior tree specs, state machine diagrams, code scaffolds)
+- [ ] Does not modify player mechanics files without explicit delegation
+- [ ] Escalates performance-critical low-level structures to engine-programmer
+- [ ] Uses data-driven NPC configuration (waypoints, detection radii) not hardcoded values
+
+---
+
+## Coverage Notes
+- Behavior tree output (Case 1) should be validated by a unit test in `tests/unit/ai/`
+- Level-layout context (Case 5) verifies the agent reads and applies provided documents rather than inventing
+- Performance escalation (Case 4) confirms the agent recognizes the engine-programmer boundary
--- a/Framework/agents/specialists/engine-programmer.md
+++ b/Framework/agents/specialists/engine-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: engine-programmer
+
+## Agent Summary
+Domain: Rendering pipeline, physics integration, memory management, resource loading, and core engine framework.
+Does NOT own: gameplay mechanics (gameplay-programmer), editor/debug tool UI (tools-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references rendering / memory / engine core)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over gameplay mechanics or tool UI
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement a custom object pool for projectiles to avoid per-frame allocation."
+**Expected behavior:**
+- Produces an engine-level object pool implementation with acquire/release interface
+- Pool is typed to the projectile object type, uses pre-allocated fixed-size storage
+- Provides thread-safety notes (or clearly marks as single-threaded-only with rationale)
+- Includes doc comments on the public API per coding standards
+- Output is compatible with the project's configured engine and language
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Add a pause menu screen with volume sliders and a 'back to main menu' button."
+**Expected behavior:**
+- Does NOT produce UI screen code
+- Explicitly states that menu screens belong to `ui-programmer`
+- Redirects the request to `ui-programmer`
+- May note it can provide engine-level audio volume API endpoints for the ui-programmer to call
+
+### Case 3: Memory leak diagnosis
+**Input:** "Memory usage grows by ~50MB per level load and never releases. We suspect the resource loading system."
+**Expected behavior:**
+- Produces a systematic diagnosis approach: reference counting audit, resource handle lifecycle check, cache invalidation review
+- Identifies likely causes (orphaned resource handles, circular references, cache that never evicts)
+- Produces a concrete fix for the identified leak pattern
+- Provides a test to verify the fix (memory baseline before load, measure after unload, confirm return to baseline)
+
+### Case 4: Cross-domain coordination — shared system optimization
+**Input:** "I need to optimize the physics broadphase, but the gameplay system is tightly coupled to the physics query API."
+**Expected behavior:**
+- Does NOT unilaterally change the physics query API surface (would break gameplay-programmer's code)
+- Coordinates with `lead-programmer` to plan the change safely
+- Proposes a migration path: new optimized API alongside old API, with a deprecation period
+- Documents the coordination requirement before proceeding
+
+### Case 5: Context pass — checks engine version reference
+**Input:** Engine version reference (Godot 4.6) provided in context. Request: "Set up the default physics engine for the project."
+**Expected behavior:**
+- Reads the engine version reference and notes Godot 4.6 change: Jolt physics is now the default
+- Produces configuration guidance that accounts for the Jolt-as-default change (4.6 migration note)
+- Flags any API differences between GodotPhysics and Jolt that could affect existing code
+- Does NOT suggest deprecated or pre-4.6 physics setup steps without noting they apply to older versions
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (rendering, physics, memory, resource loading, core framework)
+- [ ] Redirects UI/menu requests to ui-programmer
+- [ ] Returns structured findings (implementation code, diagnosis steps, migration plans)
+- [ ] Coordinates with lead-programmer before changing shared API surfaces
+- [ ] Checks engine version reference before suggesting engine-specific APIs
+- [ ] Provides test evidence for fixes (memory before/after, performance measurements)
+
+---
+
+## Coverage Notes
+- Object pool (Case 1) must include a unit test in `tests/unit/engine/`
+- Memory leak diagnosis (Case 3) should produce evidence artifacts in `production/qa/evidence/`
+- Engine version check (Case 5) confirms the agent treats VERSION.md as authoritative, not LLM training data
--- a/Framework/agents/specialists/gameplay-programmer.md
+++ b/Framework/agents/specialists/gameplay-programmer.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: gameplay-programmer
+
+## Agent Summary
+Domain: Game mechanics code, player systems, combat implementation, and interactive features.
+Does NOT own: UI implementation (ui-programmer), AI behavior trees (ai-programmer), engine/rendering systems (engine-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references game mechanics / player systems)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep — excludes tools only needed by orchestration agents
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UI, AI behavior, or engine/rendering code
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement a melee combo system where three consecutive light attacks chain into a finisher."
+**Expected behavior:**
+- Produces code or a code scaffold following the project's language (GDScript/C#) and coding standards
+- Defines combo state tracking, input window timing, and finisher trigger logic as separate, testable methods
+- References the relevant GDD section if one is provided in context
+- Does NOT implement UI feedback (delegates to ui-programmer) or AI reaction (delegates to ai-programmer)
+- Output includes doc comments on all public methods per coding standards
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Build the main menu screen with pause and settings panels."
+**Expected behavior:**
+- Does NOT produce menu implementation code
+- Explicitly states this is outside its domain
+- Redirects the request to `ui-programmer`
+- May note that if the pause menu requires reading gameplay state it can provide the state API surface
+
+### Case 3: Domain boundary — threading flag
+**Input:** "The combo system is causing frame stutters; can you add threading to spread the input processing?"
+**Expected behavior:**
+- Does NOT unilaterally implement threading or async systems
+- Flags the threading concern to `engine-programmer` with a clear description of the hot path
+- May produce a non-threaded refactor to reduce work per frame as a safe interim step
+- Documents the escalation so lead-programmer is aware
+
+### Case 4: Conflict with an Accepted ADR
+**Input:** "Change the damage calculation to use floating-point accumulation directly instead of the fixed-point formula in ADR-003."
+**Expected behavior:**
+- Identifies that the proposed change violates ADR-003 (Accepted status)
+- Does NOT silently implement the violation
+- Flags the conflict to `lead-programmer` with the ADR reference and the trade-off described
+- Will implement only after explicit override decision from lead-programmer or technical-director
+
+### Case 5: Context pass — implements to GDD spec
+**Input:** GDD for "PlayerCombat" provided in context. Request: "Implement the stamina drain formula from the combat GDD."
+**Expected behavior:**
+- Reads the formula section of the provided GDD
+- Implements the exact formula as written — does NOT invent new variables or adjust coefficients
+- Makes stamina drain a data-driven value (external config), not a hardcoded constant
+- Notes any edge cases from the GDD's edge-cases section and handles them in code
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (mechanics, player systems, combat)
+- [ ] Redirects out-of-domain requests to correct agent (ui-programmer, ai-programmer, engine-programmer)
+- [ ] Returns structured findings (code scaffold, method signatures, inline comments) not freeform opinions
+- [ ] Does not modify files outside `src/gameplay/` or `src/core/` without explicit delegation
+- [ ] Flags ADR violations rather than overriding them silently
+- [ ] Makes gameplay values data-driven, never hardcoded
+
+---
+
+## Coverage Notes
+- Combo system test (Case 1) should be validated with a unit test in `tests/unit/gameplay/`
+- Threading escalation (Case 3) verifies the agent does not over-reach into engine territory
+- ADR conflict (Case 4) confirms the agent respects the architecture governance process
+- Cases 1 and 5 together verify the agent implements to spec rather than improvising
--- a/Framework/agents/specialists/network-programmer.md
+++ b/Framework/agents/specialists/network-programmer.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: network-programmer
+
+## Agent Summary
+Domain: Multiplayer networking, state replication, lag compensation, matchmaking protocol design, and network message schemas.
+Does NOT own: gameplay logic (only the networking of it), server infrastructure and deployment (devops-engineer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references multiplayer / replication / networking)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over gameplay logic or server deployment infrastructure
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Design state replication for player position in a 4-player co-op game."
+**Expected behavior:**
+- Produces a sync strategy document covering:
+  - Replication frequency (e.g., 20Hz with delta compression)
+  - Priority tier (e.g., own-player high priority, other players medium)
+  - Interpolation approach for remote players (e.g., linear interpolation with 100ms buffer)
+  - Bandwidth estimate per player per second
+- Does NOT implement the player movement logic itself (defers to gameplay-programmer)
+- Proposes dead-reckoning or prediction strategy to reduce visible lag
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Deploy our game server to AWS EC2 and set up auto-scaling."
+**Expected behavior:**
+- Does NOT produce server deployment configuration, Terraform, or AWS setup scripts
+- Explicitly states that server infrastructure belongs to `devops-engineer`
+- Redirects the request to `devops-engineer`
+- May note it can provide the network protocol spec the server needs to implement once infrastructure is set up
+
+### Case 3: State divergence — rollback/reconciliation
+**Input:** "Under high latency, clients are diverging from the authoritative server state for physics objects."
+**Expected behavior:**
+- Proposes a rollback-and-reconciliation approach (client-side prediction + server authoritative correction)
+- Specifies the state snapshot format, reconciliation trigger threshold (e.g., >5 units position error), and correction interpolation speed
+- Notes the input buffer pattern for deterministic replay
+- Does NOT change the physics simulation itself — documents the interface contract for engine-programmer
+
+### Case 4: Anti-cheat conflict
+**Input:** "We want client-authoritative position for smooth movement, but anti-cheat requires server validation."
+**Expected behavior:**
+- Surfaces the direct conflict: client-authority is fast but exploitable; server-authority is secure but requires latency compensation
+- Coordinates with `security-engineer` to agree on the validation boundary
+- Proposes a compromise (server validates position within a tolerance band, flags outliers) rather than unilaterally deciding
+- Documents the trade-off and escalates the final decision to `technical-director` if security-engineer and network-programmer cannot agree
+
+### Case 5: Context pass — latency budget
+**Input:** Technical preferences provided in context: target latency 80ms RTT for 95th percentile players. Request: "Design the input replication scheme for a fighting game."
+**Expected behavior:**
+- References the 80ms RTT budget explicitly in the design
+- Selects replication approach calibrated to that budget (e.g., rollback netcode is preferred for fighting games at this latency)
+- Specifies input delay frames calculated from the 80ms budget (e.g., 2 frames at 60fps = 33ms buffer)
+- Flags that rollback netcode requires gameplay-programmer to implement deterministic simulation
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (replication, lag compensation, protocol design, matchmaking)
+- [ ] Redirects server deployment to devops-engineer
+- [ ] Returns structured findings (sync strategies, protocol specs, bandwidth estimates)
+- [ ] Does not implement gameplay logic — only specifies the network contract for it
+- [ ] Coordinates with security-engineer on anti-cheat boundaries
+- [ ] Designs to explicit latency targets from provided context
+
+---
+
+## Coverage Notes
+- Replication strategy (Case 1) should include a bandwidth calculation reviewable by technical-director
+- Rollback/reconciliation (Case 3) must document the engine-programmer interface contract clearly
+- Anti-cheat conflict (Case 4) confirms the agent escalates rather than unilaterally deciding security trade-offs
--- a/Framework/agents/specialists/performance-analyst.md
+++ b/Framework/agents/specialists/performance-analyst.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: performance-analyst
+
+## Agent Summary
+Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations.
+Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms."
+**Expected behavior:**
+- Identifies the primary bottleneck: CPU is over a 16.67ms (60fps) budget at 14ms total
+- Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
+- Draw calls (420) flags as a secondary concern if the budget limit is lower (e.g., 200 draw calls per technical-preferences.md)
+- Produces a prioritized bottleneck report:
+  1. Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
+  2. Draw calls — 420, implement batching or LOD
+  3. Scripts — 3ms, profile hot paths
+- Does NOT implement any of these optimizations
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement the batching optimization to reduce draw calls from 420 to under 200."
+**Expected behavior:**
+- Does NOT produce implementation code for batching
+- Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
+- Redirects the implementation to `engine-programmer` with the recommendation context attached
+- May produce a requirements brief for the optimization so engine-programmer has a clear target
+
+### Case 3: Regression identification
+**Input:** "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms."
+**Expected behavior:**
+- Proposes a bisection strategy to identify the offending commit range
+- Requests or reviews the diff of commits in the window to narrow the likely cause
+- Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
+- Produces a regression report naming the probable commit, the affected system, and the measured delta
+
+### Case 4: Recommendation vs. code quality trade-off
+**Input:** "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers."
+**Expected behavior:**
+- Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
+- Does NOT recommend the optimization without noting the code quality cost
+- Escalates the trade-off to `lead-programmer` for a decision
+- May propose a middle path (e.g., profile-guided inlining of only the hottest 2–3 methods) that preserves testability
+
+### Case 5: Context pass — technical-preferences.md budget
+**Input:** Technical preferences from context: Target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB. Request: "Review the current build profile."
+**Expected behavior:**
+- References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
+- Compares current measurements against each threshold explicitly
+- Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers
+- Does NOT use different budget numbers than those provided in the context
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (profiling, analysis, recommendations — not implementation)
+- [ ] Redirects optimization implementation to the correct programmer domain agent
+- [ ] Returns structured findings (bottleneck report with severity, measured values, and recommended action owner)
+- [ ] Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
+- [ ] Applies budget thresholds from provided context rather than assumed defaults
+- [ ] Labels all findings with a specific action owner (who should implement the fix)
+
+---
+
+## Coverage Notes
+- Frame time analysis (Case 1) output should be structured as a report filed in `production/qa/evidence/`
+- Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
+- Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict
--- a/Framework/agents/specialists/prototyper.md
+++ b/Framework/agents/specialists/prototyper.md
@@ -0,0 +1,82 @@
+# Agent Test Spec: prototyper
+
+## Agent Summary
+- **Domain**: Rapid throwaway prototypes in the `prototypes/` directory, concept validation experiments, mechanical feasibility tests. Standards intentionally relaxed for speed — prototypes are not production code.
+- **Does NOT own**: Production source code in `src/` (gameplay-programmer), design documents (game-designer), production-grade architecture decisions (lead-programmer / technical-director)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; produces recommendation docs after prototype conclusion; does not participate in phase gates
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references rapid prototyping, prototypes/ directory, throwaway code)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write scoped to prototypes/ directory; no production src/ write access)
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition explicitly states that prototype code is not production code and must not be copied to src/
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — prototype a card-drawing mechanic
+**Input**: "Prototype a card-drawing mechanic in 2 hours. The core question: does drawing 3 cards per turn with hand-size limit of 7 feel good? I need something to test in a playtest today."
+**Expected behavior**:
+- Produces a minimal working prototype written in the project's engine scripting language, scoped to `prototypes/card-draw-mechanic/`
+- Code prioritizes speed over correctness: no unit tests, no doc comments required, global state is acceptable for a prototype
+- Implements the minimal viable mechanic: a deck, a draw function (draw N cards), a hand container with a size limit, and a simple UI or debug print to verify state
+- Does NOT implement production patterns (dependency injection, signals, data-driven config) unless they take less time than not using them
+- Includes a `README.md` in the prototype folder: hypothesis being tested, how to run, what to observe in the playtest
+
+### Case 2: Out-of-domain request — production-grade implementation
+**Input**: "The card mechanic prototype worked great. Now write the production implementation of the card system for src/gameplay/cards/."
+**Expected behavior**:
+- Does not write production code to `src/`
+- States clearly: "Prototyper produces throwaway code in prototypes/ to validate concepts; production implementation of validated mechanics is handled by gameplay-programmer"
+- Offers to produce a transition document: what the prototype proved, what the production implementation should preserve (the mechanic), and what it should discard (the throwaway implementation patterns)
+- Does NOT copy the prototype code into src/ or suggest it as a starting point without warning about its non-production quality
+
+### Case 3: Prototype validates the mechanic — recommendation output
+**Input**: "The card-draw prototype playtested well. Three sessions all enjoyed drawing 3 cards/turn with hand limit 7. No confusion observed. What's next?"
+**Expected behavior**:
+- Produces a prototype conclusion document in `prototypes/card-draw-mechanic/conclusion.md` (or equivalent)
+- Document includes: hypothesis that was tested, playtest method (sessions, duration, observer notes), result verdict (VALIDATED), key findings (what worked, any minor issues observed), recommendation for production (specific mechanic parameters to preserve: 3 cards/turn, hand limit 7), and a flag to route the production implementation request to gameplay-programmer
+- Does NOT begin writing production code
+- Output is structured as a decision-ready recommendation, not a narrative summary
+
+### Case 4: Prototype reveals the mechanic is unworkable — abandonment note
+**Input**: "The prototype for the physics-based lock-picking mechanic is done. After 4 playtest sessions, all testers found it frustrating — too much precision required, not fun. One tester rage-quit."
+**Expected behavior**:
+- Produces a prototype abandonment note in `prototypes/lock-picking-physics/conclusion.md`
+- Document includes: hypothesis that was tested, result verdict (ABANDONED), specific reasons (precision barrier too high, negative emotional response, rage-quit incident as evidence), and a recommendation for alternative approaches to explore (simplified key-tumbler mechanic, rhythm-based alternative, removal of the mechanic entirely)
+- Does NOT recommend persisting with the prototype mechanic because of sunk cost
+- Does NOT mark the result as inconclusive — after 4 sessions with consistent negative responses, abandonment is the correct verdict
+
+### Case 5: Context pass — using the project's engine scripting language
+**Input context**: Project uses Godot 4.6 with GDScript (configured in technical-preferences.md).
+**Input**: "Prototype a basic grid movement system — player clicks a tile and the character moves to it."
+**Expected behavior**:
+- Produces the prototype in GDScript — not Python, C#, or pseudocode
+- Uses Godot 4.6 node types appropriate for a grid: TileMap or a custom grid manager node, CharacterBody2D or Node2D for the player
+- Does NOT apply production coding standards (no required test coverage, no doc comments, global state acceptable)
+- Writes the output to `prototypes/grid-movement/` not to `src/`
+- If a Godot 4.6 API is uncertain (given the LLM knowledge cutoff noted in VERSION.md), flags the specific API with a note to verify against the Godot 4.6 docs
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (prototypes/ directory only; throwaway code for concept validation)
+- [ ] Redirects production implementation requests to gameplay-programmer with a transition document offer
+- [ ] Produces structured conclusion documents (VALIDATED or ABANDONED verdict) after prototype evaluation
+- [ ] Does not recommend preserving prototype code in production form without explicit warnings
+- [ ] Uses the project's configured engine and scripting language; flags version uncertainty
+
+---
+
+## Coverage Notes
+- Case 2 (production redirect) is critical — prototype code leaking into src/ is a common quality problem
+- Case 4 (abandonment honesty) tests whether the agent avoids sunk-cost bias — prototypes that fail should be cleanly abandoned
+- Case 5 requires that technical-preferences.md has the engine and language configured; test is incomplete if not configured
+- The intentional relaxation of coding standards is a feature, not a gap — do not flag missing tests or doc comments as failures in prototype output
+- No automated runner; review manually or via `/skill-test`
--- a/Framework/agents/specialists/sound-designer.md
+++ b/Framework/agents/specialists/sound-designer.md
@@ -0,0 +1,84 @@
+# Agent Test Spec: sound-designer
+
+## Agent Summary
+Domain: SFX specs, audio events, mixing parameters, and sound category definitions.
+Does NOT own: music composition direction (audio-director), code implementation of audio systems.
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references SFX / audio events / mixing)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep — does NOT include engine code execution tools
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over music direction or audio code implementation
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create an SFX spec for a sword swing attack."
+**Expected behavior:**
+- Produces a complete audio event spec including:
+  - Event name (e.g., `sfx_combat_sword_swing`)
+  - Variation count (minimum 3 to avoid repetition fatigue)
+  - Pitch range (e.g., ±8% randomization)
+  - Volume range and normalization target (e.g., -12 dBFS)
+  - Sound category (e.g., `combat_sfx`)
+  - Suggested layering notes (whoosh layer + impact transient)
+- Output follows the project audio naming convention if one is established
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Compose a looping ambient music track for the forest level."
+**Expected behavior:**
+- Does NOT produce music composition direction or a music brief
+- Explicitly states that music direction belongs to `audio-director`
+- Redirects the request to `audio-director`
+- May note it can provide an SFX ambience layer spec (wind, wildlife) to complement the music once the music direction is set
+
+### Case 3: Dynamic parameter — falloff curve spec
+**Input:** "The sword swing SFX needs distance falloff so it sounds different across the arena."
+**Expected behavior:**
+- Produces a spec for the dynamic parameter including:
+  - Parameter name (e.g., `distance` or `listener_distance`)
+  - Falloff curve type (e.g., logarithmic, linear, custom)
+  - Near/far distance thresholds with corresponding volume and high-frequency attenuation values
+  - Occlusion override behavior if applicable
+- Does NOT write the audio engine integration code (defers to the appropriate programmer)
+
+### Case 4: Naming convention conflict
+**Input:** "Add a new SFX event called `SWORD_HIT_1` for the melee system."
+**Expected behavior:**
+- Identifies that `SWORD_HIT_1` conflicts with the established event naming convention (snake_case with category prefix, e.g., `sfx_combat_sword_hit`)
+- Does NOT silently register the non-conforming name
+- Flags the conflict to `audio-director` with the proposed compliant alternative
+- Will proceed with the corrected name once confirmed by audio-director
+
+### Case 5: Context pass — uses audio style guide
+**Input:** Audio style guide provided in context specifying: "gritty, grounded, no reverb tails over 1.5s, reference: The Witcher 3 combat audio." Request: "Create SFX specs for the full melee combat suite."
+**Expected behavior:**
+- References the "gritty, grounded" tone descriptor in the spec rationale
+- Caps all reverb tail specifications at 1.5 seconds as stated
+- Notes the reference material (The Witcher 3) as a benchmark for mix levels and transient design
+- Does NOT produce specs that contradict the style guide (e.g., no ethereal or heavily reverb-processed specs)
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (SFX specs, event definitions, mixing parameters)
+- [ ] Redirects music direction requests to audio-director
+- [ ] Returns structured audio event specs (event name, variations, pitch, volume, category)
+- [ ] Does not produce code for audio system implementation
+- [ ] Flags naming convention violations rather than silently accepting non-conforming names
+- [ ] References provided style guides and constraints in all spec output
+
+---
+
+## Coverage Notes
+- SFX spec format (Case 1) should match whatever event schema the audio middleware (Wwise/FMOD/built-in) requires
+- Falloff curve (Case 3) verifies the agent produces implementation-ready parameter specs
+- Style guide compliance (Case 5) confirms the agent reads provided context and constrains output accordingly
--- a/Framework/agents/specialists/technical-artist.md
+++ b/Framework/agents/specialists/technical-artist.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: technical-artist
+
+## Agent Summary
+Domain: Shaders, VFX, rendering optimization, art pipeline tools, and visual performance.
+Does NOT own: art style decisions or color palette (art-director), gameplay code (gameplay-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references shaders / VFX / rendering)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over art style direction or gameplay logic
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create a dissolve effect shader for enemy death sequences."
+**Expected behavior:**
+- Produces shader code or a Shader Graph node spec appropriate to the configured engine (Godot shading language / Unity Shader Graph / Unreal Material Blueprint)
+- Defines a `dissolve_amount` uniform (0.0–1.0) as the animation driver
+- Uses a noise texture sample to determine the dissolve threshold
+- Notes edge-lighting technique as an optional enhancement
+- Output is engine-version-aware (checks version reference if post-cutoff APIs are needed)
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Define the art bible color palette: primary, secondary, and accent colors for the UI."
+**Expected behavior:**
+- Does NOT produce color palette decisions or art direction documents
+- Explicitly states that art style decisions belong to `art-director`
+- Redirects the request to `art-director`
+- May note it can later implement a color-grading or palette LUT shader once the palette is decided
+
+### Case 3: Performance warning — GPU particle count
+**Input:** "The VFX system is triggering a GPU particle count warning at 50,000 particles in the explosion pool."
+**Expected behavior:**
+- Produces an optimization spec addressing the specific warning
+- Proposes concrete strategies: particle budget caps per emitter, LOD-based particle reduction, GPU instancing, or switching to mesh-based VFX for distant effects
+- Provides before/after GPU cost estimates where calculable
+- Does NOT change gameplay behavior of the explosion (delegates any gameplay impact to gameplay-programmer)
+
+### Case 4: Engine version compatibility
+**Input:** "Use the new texture sampler API for the water shader."
+**Expected behavior:**
+- Checks the engine version reference (e.g., `docs/engine-reference/godot/VERSION.md`) before suggesting any API
+- Flags if the requested API is post-cutoff (e.g., Godot 4.4+ texture type changes)
+- Provides the correct syntax for the project's pinned engine version
+- If uncertain about post-cutoff behavior, explicitly states the uncertainty and directs to verified docs
+
+### Case 5: Context pass — uses performance budget
+**Input:** Performance budget from `technical-preferences.md` provided in context: 2ms GPU frame budget, max 200 draw calls. Request: "Optimize the forest rendering system."
+**Expected behavior:**
+- References the specific 2ms GPU budget and 200 draw call limit from the provided context
+- Proposes optimizations calibrated to those exact targets (e.g., "batching reduces draw calls from 340 to ~180, within the 200 limit")
+- Does NOT propose optimizations that would exceed the stated budgets in other dimensions
+- Produces a ranked list of optimizations by expected impact vs. implementation cost
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (shaders, VFX, rendering optimization, art pipeline)
+- [ ] Redirects art style decisions to art-director
+- [ ] Returns structured findings (shader code, optimization specs with metrics, node graphs)
+- [ ] Does not modify gameplay code files without explicit delegation
+- [ ] Checks engine version reference before suggesting post-cutoff APIs
+- [ ] Quantifies performance changes against stated budgets
+
+---
+
+## Coverage Notes
+- Dissolve shader (Case 1) should include a visual test reference in `production/qa/evidence/`
+- Engine version check (Case 4) confirms the agent treats VERSION.md as authoritative
+- Performance budget case (Case 5) verifies the agent reads and applies provided context numbers
--- a/Framework/agents/specialists/tools-programmer.md
+++ b/Framework/agents/specialists/tools-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: tools-programmer
+
+## Agent Summary
+Domain: Editor extensions, content authoring tools, debug utilities, and pipeline automation scripts.
+Does NOT own: game code (gameplay-programmer, ui-programmer, etc.), engine core systems (engine-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references editor tools / pipeline / debug utilities)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over game source code or engine internals
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Create a custom editor tool for placing enemy patrol waypoints in the level."
+**Expected behavior:**
+- Produces an editor extension spec and code scaffold for the configured engine (e.g., Godot EditorPlugin, Unity Editor window, Unreal Detail Customization)
+- Tool allows designer to click-place waypoints in the scene/viewport
+- Waypoints are serialized as engine-native resource (not hardcoded) so level-designer can edit without code
+- Includes undo/redo support per editor plugin best practices
+- Does NOT modify the AI pathfinding runtime code (that belongs to ai-programmer)
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement the enemy melee combo system in code."
+**Expected behavior:**
+- Does NOT produce gameplay mechanic code
+- Explicitly states that combat system implementation belongs to `gameplay-programmer`
+- Redirects the request to `gameplay-programmer`
+- May note it can build a debug overlay tool to visualize combo state if useful during development
+
+### Case 3: Runtime data access — coordination required
+**Input:** "The waypoint editor tool needs to read game data at runtime to validate patrol routes against the AI budget."
+**Expected behavior:**
+- Identifies that runtime data access from an editor plugin requires a defined, safe interface to the game's runtime systems
+- Coordinates with `engine-programmer` to establish a read-only data access pattern (e.g., a resource validation API)
+- Does NOT directly read internal engine or game memory structures without an agreed interface
+- Documents the required interface before implementing the tool
+
+### Case 4: Engine version breakage
+**Input:** "After the engine upgrade, the waypoint editor tool crashes on startup."
+**Expected behavior:**
+- Checks the engine version reference (`docs/engine-reference/`) for breaking changes in editor plugin APIs
+- Identifies the specific API or signal that changed in the new version
+- Produces a targeted fix for the breaking change
+- Notes any other tools that may be affected by the same API change
+
+### Case 5: Context pass — art pipeline requirements
+**Input:** Art pipeline requirements provided in context: "All texture imports must set compression to VRAM Compressed, generate mipmaps, and tag with a LOD group." Request: "Build an asset import tool that enforces these settings."
+**Expected behavior:**
+- References all three requirements from the context: VRAM compression, mipmap generation, LOD group tagging
+- Produces an import tool that validates and applies all three settings on import
+- Adds a warning or error report for assets that fail to meet the specified settings
+- Does NOT change the art pipeline requirements themselves (those belong to art-director / technical-artist)
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (editor tools, pipeline scripts, debug utilities)
+- [ ] Redirects game code requests to appropriate programmer agents
+- [ ] Returns structured findings (tool specs, editor extension code, pipeline scripts)
+- [ ] Coordinates with engine-programmer before accessing runtime data from editor context
+- [ ] Checks engine version reference before using editor plugin APIs
+- [ ] Builds tools to enforce requirements, does not author the requirements themselves
+
+---
+
+## Coverage Notes
+- Waypoint editor tool (Case 1) should have a smoke test verifying it loads without errors in the editor
+- Runtime data access (Case 3) confirms the agent respects the engine-programmer's ownership of core APIs
+- Art pipeline context (Case 5) verifies the agent builds to match provided specs rather than inventing requirements
--- a/Framework/agents/specialists/ui-programmer.md
+++ b/Framework/agents/specialists/ui-programmer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ui-programmer
+
+## Agent Summary
+Domain: Menu screens, HUDs, inventory screens, dialogue boxes, UI framework code, and data binding.
+Does NOT own: UX flow design (ux-designer), visual style direction (art-director / technical-artist).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references menus / HUDs / UI framework / data binding)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over UX flow design or visual art direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Implement the inventory screen from the UX spec in `design/ux/inventory-flow.md`."
+**Expected behavior:**
+- Reads the UX spec before producing any code
+- Produces implementation using the project's configured UI framework (UI Toolkit, UGUI, UMG, or Godot Control nodes)
+- Implements all states defined in the spec (default, hover, selected, empty-slot, locked-slot)
+- Binds inventory data to UI elements via the project's data model, not hardcoded values
+- Includes doc comments on public UI API per coding standards
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Design the inventory interaction flow — what happens when the player equips, drops, or combines items."
+**Expected behavior:**
+- Does NOT produce interaction flow design or user flow diagrams
+- Explicitly states that UX flow design belongs to `ux-designer`
+- Redirects the request to `ux-designer`
+- Notes that once the flow spec is ready, it can implement it
+
+### Case 3: Custom animation coordination
+**Input:** "The item selection in the inventory needs a custom bounce animation when selected."
+**Expected behavior:**
+- Recognizes that defining the animation curve and feel is within technical-artist territory
+- Does NOT invent animation parameters (timing, easing) without a spec
+- Coordinates with `technical-artist` for an animation spec (duration, easing curve, overshoot amount)
+- Once the spec is provided, produces the implementation binding the animation to the selection state
+
+### Case 4: Ambiguous UX spec — flags back
+**Input:** The UX spec states "show item details on selection" but does not define what happens when an empty slot is selected.
+**Expected behavior:**
+- Identifies the ambiguity in the spec (empty slot selection state is undefined)
+- Does NOT make an arbitrary implementation decision for the undefined state
+- Flags the ambiguity back to `ux-designer` with the specific question: "What should the detail panel show when an empty inventory slot is selected?"
+- May propose two common options (hide panel / show placeholder) to help ux-designer decide quickly
+
+### Case 5: Context pass — engine UI toolkit
+**Input:** Engine context provided: project uses Godot 4.6 with Control node UI. Request: "Implement a scrollable item list for the inventory."
+**Expected behavior:**
+- Uses Godot's `ScrollContainer` + `VBoxContainer` + `ItemList` (or equivalent) pattern, not Canvas or UGUI
+- Does NOT produce Unity UGUI or Unreal UMG code for a Godot project
+- Checks the engine version reference (4.6) for any Control node API changes from 4.4/4.5 before using specific APIs
+- Produces GDScript or C# code consistent with the project's configured language
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (menus, HUDs, UI framework, data binding)
+- [ ] Redirects UX flow design to ux-designer
+- [ ] Coordinates with technical-artist for animation specs before implementing animations
+- [ ] Flags ambiguous UX specs back to ux-designer rather than making arbitrary implementation decisions
+- [ ] Returns structured output (implementation code, data binding patterns, state machine for UI states)
+- [ ] Uses the correct engine UI toolkit for the project — never cross-engine code
+
+---
+
+## Coverage Notes
+- Inventory implementation (Case 1) should have a UI interaction test or manual walkthrough doc in `production/qa/evidence/`
+- Animation coordination (Case 3) confirms the agent does not invent feel parameters without a spec
+- Ambiguous spec (Case 4) verifies the agent routes spec gaps back to the authoring agent rather than guessing
--- a/Framework/agents/specialists/ux-designer.md
+++ b/Framework/agents/specialists/ux-designer.md
@@ -0,0 +1,79 @@
+# Agent Test Spec: ux-designer
+
+## Agent Summary
+Domain: User experience flows, interaction design, information architecture, input handling design, and onboarding UX.
+Does NOT own: visual art style (art-director), UI implementation code (ui-programmer).
+Model tier: Sonnet (default).
+No gate IDs assigned.
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references UX flows / interaction design / information architecture)
+- [ ] `allowed-tools:` list includes Read, Write, Edit, Glob, Grep
+- [ ] Model tier is Sonnet (default for specialists)
+- [ ] Agent definition does not claim authority over visual art direction or UI implementation code
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — appropriate output
+**Input:** "Design the inventory management flow for a survival game."
+**Expected behavior:**
+- Produces a user flow diagram (states and transitions) for the inventory: open, browse, select item, sub-actions (equip/drop/combine), close
+- Defines all interaction states (default, hover, selected, empty-slot, locked-slot)
+- Specifies input mappings for each action (keyboard, gamepad if applicable)
+- Notes cognitive load considerations (e.g., maximum items visible without scrolling)
+- Does NOT produce visual design (colors, icons) or implementation code
+
+### Case 2: Out-of-domain request — redirects correctly
+**Input:** "Implement the inventory screen in GDScript with drag-and-drop support."
+**Expected behavior:**
+- Does NOT produce implementation code
+- Explicitly states that UI code implementation belongs to `ui-programmer`
+- Redirects the request to `ui-programmer`
+- Notes that the UX flow spec should be provided to ui-programmer as the implementation reference
+
+### Case 3: Flow depth conflict — simplification
+**Input:** "The lead designer says the current 5-step crafting flow is too deep; maximum 3 steps allowed."
+**Expected behavior:**
+- Produces a revised 3-step flow that collapses the original 5-step sequence
+- Shows clearly what was merged or removed and why each collapse is safe from a usability standpoint
+- Does NOT simply remove steps without addressing the user's goal at each removed step
+- Flags if the 3-step constraint makes any required use case impossible and proposes an alternative
+
+### Case 4: Accessibility conflict
+**Input:** "The onboarding flow uses a timed prompt (auto-advances after 3 seconds) to keep pace, but this conflicts with accessibility requirements for user-controlled timing."
+**Expected behavior:**
+- Identifies the conflict with WCAG 2.1 2.2.1 (Timing Adjustable)
+- Does NOT override the accessibility requirement to preserve pace
+- Coordinates with `accessibility-specialist` to agree on a compliant solution
+- Proposes alternatives: pause-on-hover, skip button, settings option to disable auto-advance
+
+### Case 5: Context pass — player mental model research
+**Input:** Playtest research provided in context: "Players consistently expected the 'Crafting' option to be inside the Inventory screen, not in a separate top-level menu." Request: "Redesign the navigation IA for crafting."
+**Expected behavior:**
+- References the specific player expectation from the research (crafting expected inside inventory)
+- Restructures the information architecture to place crafting as a tab or panel within the inventory screen
+- Does NOT produce a design that contradicts the stated player mental model without explicit justification
+- Notes the research source in the rationale for the design decision
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (UX flows, interaction design, IA, onboarding)
+- [ ] Redirects code implementation to ui-programmer, visual style to art-director
+- [ ] Returns structured findings (state diagrams, flow steps, input mappings) not freeform opinions
+- [ ] Coordinates with accessibility-specialist when flows have timing or cognitive load constraints
+- [ ] Designs flows based on provided user research, not assumed behavior
+- [ ] Documents rationale for flow decisions against user goals
+
+---
+
+## Coverage Notes
+- Inventory flow (Case 1) should be written to `design/ux/` as a spec for ui-programmer to implement against
+- Mental model case (Case 5) verifies the agent applies research evidence, not intuition
+- Accessibility coordination (Case 4) confirms the agent does not override accessibility requirements for UX aesthetics
--- a/Framework/agents/specialists/world-builder.md
+++ b/Framework/agents/specialists/world-builder.md
@@ -0,0 +1,80 @@
+# Agent Test Spec: world-builder
+
+## Agent Summary
+- **Domain**: World lore architecture — factions and their cultures/governments/motivations, world history, geography and ecology, cosmology and metaphysics, world rules (how magic works, what is and is not possible), internal consistency enforcement across the world document
+- **Does NOT own**: Specific NPC or quest dialogue (writer), game mechanics rules derived from world rules (game-designer/systems-designer), narrative story structure and arc design (narrative-director)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; escalates world rule/mechanic conflicts to narrative-director and game-designer jointly
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references world lore, factions, history, world rules, ecology)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/world/ documents; no game source, mechanic design, or dialogue files)
+- [ ] Model tier is Sonnet (default for creative specialists)
+- [ ] Agent definition does not claim authority over dialogue writing, mechanic design, or narrative arc structure
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — faction culture and government design
+**Input**: "Design the Ironveil Merchant Consortium — a powerful trading faction in our world. I need their culture, government structure, and internal motivations."
+**Expected behavior**:
+- Produces a faction profile document with: cultural values and norms, government structure (how decisions are made, who holds power, succession or appointment process), internal factions or tensions within the consortium, relationship to other factions (allies, rivals, neutral parties), and primary motivations (what they want and why)
+- The faction is internally consistent: a merchant consortium's government is driven by economic logic, not feudal or religious logic, unless a deliberate hybrid is specified
+- Output includes at least one internal tension or contradiction within the faction — factions without internal complexity are flat
+- Formatted as a structured faction profile, not a narrative essay
+
+### Case 2: Out-of-domain request — dialogue writing
+**Input**: "Write the dialogue for a Ironveil Consortium merchant NPC that the player meets at the city gates."
+**Expected behavior**:
+- Does not produce NPC dialogue
+- States clearly: "Dialogue writing is owned by writer; I provide the world and faction context that informs the dialogue, including the faction's culture, tone, and speaking style"
+- Offers to produce the faction's speaking style notes and cultural context that writer would need to write consistent dialogue
+
+### Case 3: New lore entry contradicts established history — conflict flagging
+**Input**: "Add a lore entry stating the Ironveil Consortium was founded 50 years ago by a single merchant family." [Context includes existing lore: the Consortium has existed for 300 years and was founded as a collective by 12 rival trading houses.]
+**Expected behavior**:
+- Identifies the contradiction: existing lore states 300-year history and a founding coalition of 12 houses; the new entry claims 50 years and a single founding family
+- Does NOT write the new entry as requested
+- Flags the conflict: states both versions, identifies which is established and which is the proposed change
+- Proposes resolution options: (a) the new entry is wrong and should be corrected; (b) the existing lore should be updated if the new version is the intended canon; (c) there is an in-world explanation (the current family claims founding credit despite the collective origin — a deliberate narrative unreliable narrator)
+- Routes the resolution to narrative-director if no clear answer exists
+
+### Case 4: World rule has gameplay implications — coordination with game-designer
+**Input**: "I want to establish a world rule: magic users who cast spells near iron ore are weakened. Iron disrupts arcane energy."
+**Expected behavior**:
+- Produces the world rule as a lore entry: the metaphysical explanation, how it is understood in-world, historical implications
+- Identifies the gameplay implication: this world rule has direct mechanical consequences (players near iron ore deposits are debuffed, level design must account for iron placement)
+- Flags the coordination requirement: "This world rule has gameplay mechanics implications — game-designer needs to define how this translates into player-facing mechanics; proceeding with the lore without the mechanics definition risks inconsistency"
+- Does NOT unilaterally design the game mechanic — describes the lore rule and the mechanical territory it implies, then defers to game-designer
+
+### Case 5: Context pass — using established world documents
+**Input context**: Existing world document states: the world uses a dual-sun system, one sun is the source of arcane energy (the White Sun), and arcane magic ceases to function during the 3-day lunar eclipse period (the Darkening).
+**Input**: "Add a lore entry about the Mages' College and how they prepare for the Darkening."
+**Expected behavior**:
+- Uses the established dual-sun cosmology: references the White Sun as the source of arcane energy
+- Uses the established Darkening event: 3-day eclipse, magic ceases
+- Does NOT invent a different eclipse mechanism, duration, or name
+- Produces a lore entry where the Mages' College's Darkening preparations are consistent with the established rules: they cannot cast during the Darkening, so preparations are practical (stockpiling non-magical supplies, scheduling, shutting down ongoing magical processes)
+- Does not contradict any established fact from the context document
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (factions, world history, geography, ecology, world rules, cosmology)
+- [ ] Redirects dialogue writing requests to writer with contextual faction notes
+- [ ] Flags lore contradictions with both versions stated and resolution options offered — does not silently overwrite established lore
+- [ ] Identifies gameplay implications of world rules and flags coordination with game-designer
+- [ ] Uses all established world facts from context; does not invent alternatives to stated lore
+
+---
+
+## Coverage Notes
+- Case 3 (contradiction detection) requires existing lore to be in context — this is the most important consistency test
+- Case 4 (world rule/mechanic coordination) tests cross-domain awareness; verify the agent identifies the mechanic boundary without crossing it
+- Case 5 is the most important context-awareness test; the agent must use established facts, not creative alternatives
+- No automated runner; review manually or via `/skill-test`
--- a/Framework/agents/specialists/writer.md
+++ b/Framework/agents/specialists/writer.md
@@ -0,0 +1,81 @@
+# Agent Test Spec: writer
+
+## Agent Summary
+- **Domain**: In-game written content — NPC dialogue (including branching trees), lore codex entries, item and ability descriptions, environmental text (signs, books, notes), quest text, tutorial text, in-world written documents
+- **Does NOT own**: Story architecture and narrative structure (narrative-director), world lore and world rules (world-builder), UX copy and UI labels (ux-designer), patch notes (community-manager)
+- **Model tier**: Sonnet
+- **Gate IDs**: None; flags lore inconsistencies to narrative-director rather than resolving them autonomously
+
+---
+
+## Static Assertions (Structural)
+
+- [ ] `description:` field is present and domain-specific (references dialogue, lore entries, item descriptions, in-game text)
+- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/narrative/ and assets/data/dialogue/; no code or world-building architecture files)
+- [ ] Model tier is Sonnet (default for creative specialists)
+- [ ] Agent definition does not claim authority over narrative structure, world rules, or UX copy direction
+
+---
+
+## Test Cases
+
+### Case 1: In-domain request — NPC merchant dialogue
+**Input**: "Write dialogue for Mira, a traveling merchant NPC. She sells general supplies. Players can ask her about her wares, the road ahead, and rumors."
+**Expected behavior**:
+- Produces a dialogue tree with at least three top-level conversation options: [Wares], [The Road Ahead], [Rumors]
+- Each branch has a distinct conversational response in Mira's voice — not generic merchant filler
+- Includes at least one response that has a follow-up branch (showing tree structure, not just flat responses)
+- Mira's voice is consistent across branches: if she's warm and chatty in one branch, she's not brusque in another without reason
+- Output is formatted as a structured dialogue tree: node label, NPC line, player options, next node
+
+### Case 2: Out-of-domain request — world history design
+**Input**: "Design the history of the world — when the first kingdom was founded, what the great wars were, and why magic was banned."
+**Expected behavior**:
+- Does not produce world history, lore architecture, or world rules
+- States clearly: "World history, lore, and world rules are owned by world-builder; once the history is established, I can write in-game texts, books, and dialogue that reference those events"
+- Does not produce even partial world history as a "placeholder"
+
+### Case 3: Dialogue contradicts established lore — flag to narrative-director
+**Input**: "Write Mira's dialogue line where she mentions that dragons have been extinct for 200 years." [Context includes existing lore: dragons are alive and revered in the northern provinces, not extinct.]
+**Expected behavior**:
+- Identifies the contradiction: established lore states dragons are alive and revered; dialogue stating they're extinct directly conflicts
+- Does NOT write the requested line as given
+- Flags the inconsistency to narrative-director: "Mira's dialogue as requested contradicts established lore (dragons are alive per world-builder's document); requires narrative-director resolution before I can write this line"
+- Offers an alternative: a line that references dragons in a way consistent with the established lore (e.g., Mira expresses awe about a dragon sighting in the north)
+
+### Case 4: Item description references an undesigned mechanic
+**Input**: "Write a description for the 'Berserker's Chalice' — a consumable that triggers the Berserker state when drunk."
+**Expected behavior**:
+- Identifies the dependency gap: "Berserker state" is not defined in any provided game design document
+- Flags the missing dependency: "This description references a 'Berserker state' mechanic that has no GDD entry — I cannot write accurate flavor text for a mechanic whose rules are undefined, as the description may create incorrect player expectations"
+- Does NOT write a description that invents mechanic details (duration, effects) that may conflict with the eventual design
+- Offers two paths: (a) write a vague, non-mechanical description that creates no false expectations, flagged as temporary; (b) wait for game-designer to define the Berserker state first
+
+### Case 5: Context pass — character voice guide
+**Input context**: Character voice guide for Mira: She speaks in short, energetic sentences. Uses merchant slang ("a fine bargain," "coin well spent"). Drops pronouns occasionally ("Good wares, these."). Never uses contractions — always "I will" not "I'll". Warm but slightly mercenary.
+**Input**: "Write Mira's response when a player asks if she has healing potions."
+**Expected behavior**:
+- Short, energetic sentences — no long monologues
+- Uses merchant slang: "a fine bargain," "coin well spent," or similar
+- Drops pronouns where natural: "Fine stock, these potions."
+- No contractions: "I will" not "I'll," "do not" not "don't"
+- Warm tone with a mercenary undertone: she's happy to help because you're a paying customer
+- Does NOT produce dialogue that violates any voice guide rule — check each rule explicitly
+
+---
+
+## Protocol Compliance
+
+- [ ] Stays within declared domain (dialogue, lore entries, item descriptions, in-game text)
+- [ ] Redirects world history and world rule requests to world-builder without producing unauthorized lore
+- [ ] Flags lore contradictions to narrative-director rather than silently writing inconsistent content
+- [ ] Identifies mechanic dependency gaps before writing item descriptions that could create false player expectations
+- [ ] Applies all rules from a provided character voice guide — no partial compliance
+
+---
+
+## Coverage Notes
+- Case 3 (lore contradiction detection) requires that existing lore is in the conversation context — test is only valid when context is provided
+- Case 4 (dependency gap) tests whether the agent writes descriptions that could set wrong player expectations — a subtle but important quality issue
+- Case 5 is the most important context-awareness test; voice guide compliance must be checked rule-by-rule, not holistically
+- No automated runner; review manually or via `/skill-test`