Add claude code game studios to the project

Author: panw
Date: 2026-05-15 14:52:29 +08:00
Parent: dff559462d
Commit: a16fe4bff7
415 changed files with 78609 additions and 0 deletions

# Skill Test Spec: /asset-audit
## Skill Summary
`/asset-audit` audits the `assets/` directory for naming convention compliance,
missing metadata, and format/size issues. It reads asset files against the
conventions and budgets defined in `technical-preferences.md`. No director gates
are invoked. The skill does not write without user approval. Verdicts: COMPLIANT,
WARNINGS, or NON-COMPLIANT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLIANT, WARNINGS, NON-COMPLIANT
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after audit results)
---
## Director Gate Checks
None. Asset auditing is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All assets follow naming conventions
**Fixture:**
- `technical-preferences.md` specifies naming convention: `snake_case`, e.g., `enemy_grunt_idle.png`
- `assets/art/characters/` contains: `enemy_grunt_idle.png`, `enemy_sniper_run.png`
- `assets/audio/sfx/` contains: `sfx_jump_land.ogg`, `sfx_item_pickup.ogg`
- All files are within size budget (textures ≤2MB, audio ≤500KB)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads naming conventions and size budgets from `technical-preferences.md`
2. Skill scans `assets/` recursively
3. All files match `snake_case` convention; all within budget
4. Audit table shows all rows PASS
5. Verdict is COMPLIANT
**Assertions:**
- [ ] Audit covers both art and audio asset directories
- [ ] Each file is checked against naming convention and size budget
- [ ] All rows show PASS when compliant
- [ ] Verdict is COMPLIANT
- [ ] No files are written
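The per-file checks in this case can be sketched as a pure function over already-scanned file metadata. This is a minimal illustration, not the skill's implementation: the budget values come from the fixture above, and the WARNINGS tier is omitted for brevity.

```python
import re

# Budgets assumed from technical-preferences.md in the fixture (bytes).
BUDGETS = {".png": 2 * 1024 * 1024, ".ogg": 500 * 1024}
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")

def audit_entry(stem: str, suffix: str, size_bytes: int) -> list[str]:
    """Return failure labels for one asset file (empty list = PASS)."""
    failures = []
    if not SNAKE_CASE.match(stem):
        failures.append("NAMING")
    budget = BUDGETS.get(suffix)
    if budget is not None and size_bytes > budget:
        failures.append("SIZE")
    return failures

def audit_verdict(all_failures: list[list[str]]) -> str:
    """COMPLIANT only when every audited row passed every check."""
    return "NON-COMPLIANT" if any(all_failures) else "COMPLIANT"
```

For example, `audit_entry("enemy_grunt_idle", ".png", 1024)` passes both checks, while a CamelCase stem would fail the naming check.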
---
### Case 2: Non-Compliant — Textures exceed size budget
**Fixture:**
- `assets/art/environment/` contains 5 texture files
- 3 texture files are 4MB each (budget: ≤2MB)
- 2 texture files are within budget
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads size budget from `technical-preferences.md` (2MB for textures)
2. Skill scans `assets/art/environment/` — finds 3 oversized textures
3. Audit table lists each oversized file with actual size and budget
4. Verdict is NON-COMPLIANT
5. Skill recommends compression or resolution reduction for flagged files
**Assertions:**
- [ ] All 3 oversized files are listed by name with actual size and budget size
- [ ] Verdict is NON-COMPLIANT when any file exceeds its budget
- [ ] Optimization recommendation is given for oversized files
- [ ] Within-budget files are also listed (showing PASS) for completeness
---
### Case 3: Format Issue — Audio in wrong format
**Fixture:**
- `technical-preferences.md` specifies audio format: OGG
- `assets/audio/music/theme_main.wav` exists (WAV format)
- `assets/audio/sfx/sfx_footstep.ogg` exists (correct OGG format)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads audio format requirement: OGG
2. Skill scans `assets/audio/` — finds `theme_main.wav` in wrong format
3. Audit table flags `theme_main.wav` as FORMAT ISSUE (expected OGG, found WAV)
4. `sfx_footstep.ogg` shows PASS
5. Verdict is WARNINGS (format issues are correctable)
**Assertions:**
- [ ] `theme_main.wav` is flagged as FORMAT ISSUE with expected and actual format noted
- [ ] Verdict is WARNINGS (not NON-COMPLIANT) for format issues, which are correctable
- [ ] Correct-format assets are shown as PASS
- [ ] Skill does not modify or convert any asset files
---
### Case 4: Missing Asset — Asset referenced by GDD but absent from assets/
**Fixture:**
- `design/gdd/enemies.md` references `enemy_boss_idle.png`
- `assets/art/characters/boss/` directory is empty — file does not exist
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads GDD references to find expected assets (cross-references with `/content-audit` scope)
2. Skill scans `assets/art/characters/boss/` — file not found
3. Audit table flags `enemy_boss_idle.png` as MISSING ASSET
4. Verdict is NON-COMPLIANT (missing critical art asset)
**Assertions:**
- [ ] Skill checks GDD references to identify expected assets
- [ ] Missing assets are flagged as MISSING ASSET with the GDD reference noted
- [ ] Verdict is NON-COMPLIANT when critical assets are missing
- [ ] Skill does not create or add placeholder assets
---
### Case 5: Gate Compliance — No gate; technical-artist may be consulted separately
**Fixture:**
- 2 files have naming convention violations (CamelCase instead of snake_case)
- `review-mode.txt` contains `full`
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill scans assets and finds 2 naming violations
2. No director gate is invoked regardless of review mode
3. Verdict is WARNINGS
4. Output notes: "Consider having a Technical Artist review naming conventions"
5. Skill presents findings; offers optional audit report write
6. If user opts in: "May I write to `production/qa/asset-audit-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Technical artist consultation is suggested (not mandated)
- [ ] Findings table is presented before any write prompt
- [ ] Optional audit report write asks "May I write" before writing
---
## Protocol Compliance
- [ ] Reads `technical-preferences.md` for naming conventions, formats, and size budgets
- [ ] Scans `assets/` directory recursively
- [ ] Audit table shows file name, check type, expected value, actual value, and result
- [ ] Does not modify any asset files
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLIANT, WARNINGS, NON-COMPLIANT
---
## Coverage Notes
- Metadata checks (e.g., missing texture import settings in Godot `.import` files)
are not explicitly tested here; they follow the same FORMAT ISSUE flagging pattern.
- The interaction between `/asset-audit` and `/content-audit` (both check GDD
references vs. assets) is intentional overlap; `/asset-audit` focuses on
compliance while `/content-audit` focuses on completeness.

# Skill Test Spec: /balance-check
## Skill Summary
`/balance-check` reads balance data files (JSON or YAML in `assets/data/`) and
checks each value against the design formulas defined in GDDs under `design/gdd/`.
It produces a findings table with columns: Value → Formula → Deviation → Severity.
No director gates are invoked (read-only analysis). The skill may optionally write
a balance report but asks "May I write" before doing so. Verdicts: BALANCED,
CONCERNS, or OUT OF BALANCE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: BALANCED, CONCERNS, OUT OF BALANCE
- [ ] Contains "May I write" language (optional report write)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Balance check is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All balance values within formula tolerances
**Fixture:**
- `assets/data/combat-balance.json` exists with 6 stat values
- `design/gdd/combat-system.md` contains formulas for all 6 stats with ±10% tolerance
- All 6 values fall within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all balance data files in `assets/data/`
2. Skill reads GDD formulas from `design/gdd/`
3. Skill computes deviation for each value against its formula
4. All deviations are within ±10% tolerance
5. Skill outputs findings table with all rows showing PASS
6. Verdict is BALANCED
**Assertions:**
- [ ] Findings table is shown for all checked values
- [ ] Each row shows: stat name, formula target, actual value, deviation percentage
- [ ] All rows show PASS or equivalent when within tolerance
- [ ] Verdict is BALANCED
- [ ] No files are written without user approval
---
### Case 2: Out of Balance — Player damage 40% above formula target
**Fixture:**
- `assets/data/combat-balance.json` has `player_damage_base: 140`
- `design/gdd/combat-system.md` formula specifies `player_damage_base = 100` (±10%)
- All other stats are within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads combat-balance.json and computes deviation for `player_damage_base`
2. Deviation is +40% — far outside ±10% tolerance
3. Skill flags this row as severity HIGH in the findings table
4. Verdict is OUT OF BALANCE
5. Skill surfaces the HIGH severity item prominently before the table
**Assertions:**
- [ ] `player_damage_base` row shows deviation of +40%
- [ ] Severity is HIGH for deviations exceeding tolerance by more than 2×
- [ ] Verdict is OUT OF BALANCE when any stat has HIGH severity deviation
- [ ] The HIGH severity item is called out explicitly, not buried in table rows
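The deviation and severity math implied by this case can be sketched as follows. The ±10% tolerance and the "more than 2× tolerance = HIGH" rule are taken from the assertions above; everything else is illustrative.

```python
def deviation_pct(actual: float, target: float) -> float:
    """Signed deviation of the actual value from the formula target, in percent."""
    return (actual - target) / target * 100.0

def severity(actual: float, target: float, tolerance_pct: float = 10.0) -> str:
    """PASS within tolerance; HIGH beyond twice the tolerance; MEDIUM between."""
    dev = abs(deviation_pct(actual, target))
    if dev <= tolerance_pct:
        return "PASS"
    if dev > 2 * tolerance_pct:
        return "HIGH"
    return "MEDIUM"
```

With the fixture values, `deviation_pct(140, 100)` is +40%, which exceeds twice the ±10% tolerance and classifies as HIGH; the 15% deviation from Case 5 would land in the middle band.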
---
### Case 3: No GDD Formulas — Cannot validate, guidance given
**Fixture:**
- `assets/data/economy-balance.yaml` exists with 10 stat values
- No GDD in `design/gdd/` contains formula definitions for economy stats
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads balance data files
2. Skill searches GDDs for formula definitions — finds none for economy stats
3. Skill outputs: "Cannot validate economy stats — no formulas defined. Run /design-system first."
4. No findings table is generated for the economy stats
5. Verdict is CONCERNS (data exists but cannot be validated)
**Assertions:**
- [ ] Skill does not fabricate formula targets when none exist in GDDs
- [ ] Output explicitly names the missing formula source
- [ ] Output recommends running `/design-system` to define formulas
- [ ] Verdict is CONCERNS (not BALANCED, since validation was impossible)
---
### Case 4: Orphan Reference — Balance file references an undefined stat
**Fixture:**
- `assets/data/combat-balance.json` contains a stat `legacy_armor_mult: 1.5`
- `design/gdd/combat-system.md` has no formula for `legacy_armor_mult`
- All other stats have formula definitions and pass validation
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all stats from combat-balance.json
2. Skill cannot find a formula for `legacy_armor_mult` in any GDD
3. Skill flags `legacy_armor_mult` as ORPHAN REFERENCE in the findings table
4. Other stats are evaluated normally; those within tolerance show PASS
5. Verdict is CONCERNS (orphan reference prevents full validation)
**Assertions:**
- [ ] `legacy_armor_mult` appears in findings table with status ORPHAN REFERENCE
- [ ] Orphan references are distinguished from formula deviations in the table
- [ ] Verdict is CONCERNS when any orphan references are found
- [ ] Skill does not skip orphan stats silently
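Orphan detection reduces to a set-membership check between the data file's stats and the formulas found in GDDs. A minimal sketch, with hypothetical function names and status labels drawn from the assertions above:

```python
def classify_stats(data: dict[str, float], formulas: dict[str, str]) -> dict[str, str]:
    """Mark each balance stat ORPHAN REFERENCE when no GDD formula covers it."""
    return {stat: ("CHECKED" if stat in formulas else "ORPHAN REFERENCE")
            for stat in data}

def balance_verdict(statuses: dict[str, str]) -> str:
    """Any orphan prevents full validation, so the verdict degrades to CONCERNS."""
    return "CONCERNS" if "ORPHAN REFERENCE" in statuses.values() else "BALANCED"
```

In the fixture, `legacy_armor_mult` has no formula entry, so it is classified as an orphan rather than silently skipped, and the overall verdict becomes CONCERNS.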
---
### Case 5: Gate Compliance — Read-only; no gate; optional report requires approval
**Fixture:**
- Balance data and GDD formulas exist; 1 stat has CONCERNS-level deviation (15% above target)
- `review-mode.txt` contains `full`
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads data and GDDs; generates findings table
2. Verdict is CONCERNS (one stat slightly out of range)
3. No director gate is invoked
4. Skill presents findings table to user
5. Skill offers to write an optional balance report
6. If user says yes: skill asks "May I write to `production/qa/balance-report-[date].md`?"
7. If user says no: skill ends without writing
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Findings table is presented without writing anything automatically
- [ ] Optional report write is offered but not forced
- [ ] "May I write" prompt appears only if user opts in to the report
---
## Protocol Compliance
- [ ] Reads both balance data files and GDD formulas before analysis
- [ ] Findings table shows Value, Formula, Deviation, and Severity columns
- [ ] Does not write any files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: BALANCED, CONCERNS, OUT OF BALANCE
---
## Coverage Notes
- The case where `assets/data/` is entirely empty is not tested; behavior
follows the CONCERNS pattern with a message that no data files were found.
- Tolerance thresholds (±10%, ±20%) are implementation details of the skill;
the tests verify that deviations are detected and classified, not the
exact threshold values.

# Skill Test Spec: /code-review
## Skill Summary
`/code-review` performs an architectural code review of source files in `src/`,
checking coding standards from `CLAUDE.md` (doc comments on public APIs,
dependency injection over singletons, data-driven values, testability). Findings
are advisory. No director gates are invoked. No code edits are made. Verdicts:
APPROVED, CONCERNS, or NEEDS CHANGES.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, CONCERNS, NEEDS CHANGES
- [ ] Does NOT require "May I write" language (read-only; findings are advisory output)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Code review is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Source file follows all coding standards
**Fixture:**
- `src/gameplay/health_component.gd` exists with:
- All public methods have doc comments (`##` notation)
- No singletons used; dependencies injected via constructor
- No hardcoded values; all constants reference `assets/data/`
- ADR reference in file header: `# ADR: docs/architecture/adr-004-health.md`
- Referenced ADR has `Status: Accepted`
**Input:** `/code-review src/gameplay/health_component.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill checks all coding standards: doc comments, DI, data-driven, ADR status
3. All checks pass
4. Skill outputs findings summary with all checks PASS
5. Verdict is APPROVED
**Assertions:**
- [ ] Each coding standard check is listed in the output
- [ ] All checks show PASS when standards are met
- [ ] Skill reads referenced ADR to confirm its status
- [ ] Verdict is APPROVED
- [ ] No edits are made to any file
---
### Case 2: Needs Changes — Missing doc comment and singleton usage
**Fixture:**
- `src/ui/inventory_ui.gd` has:
- 2 public methods without doc comments
- Uses `GameManager.instance` (singleton pattern)
- All other standards met
**Input:** `/code-review src/ui/inventory_ui.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill detects: 2 missing doc comments on public methods
3. Skill detects: singleton usage at specific lines (e.g., line 42, line 87)
4. Findings list the exact method names and line numbers
5. Verdict is NEEDS CHANGES
**Assertions:**
- [ ] Missing doc comments are listed with method names
- [ ] Singleton usage is flagged with file and line number
- [ ] Verdict is NEEDS CHANGES when BLOCKING-level standard violations exist
- [ ] Skill does not edit the file — findings are for the developer to act on
- [ ] Output suggests replacing singleton with dependency injection
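The two detections in this case can be approximated with line-level pattern matching. This is a rough sketch only: it assumes GDScript's `##` doc-comment convention, treats underscore-prefixed functions as private, and matches the `GameManager.instance` access pattern from the fixture.

```python
import re

def review_gdscript(source: str) -> list[tuple[int, str]]:
    """Flag public funcs lacking a preceding '##' doc comment and '.instance' access.

    Returns (line_number, finding) pairs; line numbers are 1-based.
    """
    findings = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Public functions (no leading underscore) must have a doc comment above.
        if stripped.startswith("func ") and not stripped.startswith("func _"):
            prev = lines[i - 1].strip() if i > 0 else ""
            if not prev.startswith("##"):
                findings.append((i + 1, "missing doc comment"))
        # Singleton-style access, e.g. GameManager.instance.
        if re.search(r"\b\w+\.instance\b", line):
            findings.append((i + 1, "singleton usage"))
    return findings
```

The findings carry line numbers so the developer can act on them; the skill itself never edits the file.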
---
### Case 3: Architecture Risk — ADR reference is Proposed, not Accepted
**Fixture:**
- `src/core/save_system.gd` has a header comment: `# ADR: docs/architecture/adr-010-save.md`
- `adr-010-save.md` exists but has `Status: Proposed`
- Code itself follows all other coding standards
**Input:** `/code-review src/core/save_system.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill reads referenced ADR — finds `Status: Proposed`
3. Skill flags this as ARCHITECTURE RISK (code is implementing an unaccepted ADR)
4. Other coding standard checks pass
5. Verdict is CONCERNS (risk flag is advisory, not a hard NEEDS CHANGES)
**Assertions:**
- [ ] Skill reads referenced ADR file to check its status
- [ ] ARCHITECTURE RISK is flagged when ADR status is Proposed
- [ ] Verdict is CONCERNS (not NEEDS CHANGES) for ADR risk — advisory severity
- [ ] Output recommends resolving the ADR before the code goes to production
---
### Case 4: Edge Case — No source files found at specified path
**Fixture:**
- User calls `/code-review src/networking/`
- `src/networking/` directory does not exist
**Input:** `/code-review src/networking/`
**Expected behavior:**
1. Skill attempts to read files in `src/networking/`
2. Directory or files not found
3. Skill outputs an error: "No source files found at `src/networking/`"
4. Skill suggests checking `src/` for valid directories
5. No verdict is emitted (nothing was reviewed)
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the error message
- [ ] Output suggests checking `src/` for valid file paths
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; LP may be consulted separately
**Fixture:**
- Source file follows most standards but has 1 CONCERNS-level finding (a magic number)
- `review-mode.txt` contains `full`
**Input:** `/code-review src/gameplay/loot_system.gd`
**Expected behavior:**
1. Skill reads and reviews the source file
2. No director gate is invoked (code review findings are advisory)
3. Skill presents findings with the CONCERNS verdict
4. Output notes: "Consider requesting a Lead Programmer review for architecture concerns"
5. Skill does not invoke any agent automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] LP consultation is suggested (not mandated) in the output
- [ ] No code edits are made
- [ ] Verdict is CONCERNS for advisory-level findings
---
## Protocol Compliance
- [ ] Reads source file(s) and coding standards before reviewing
- [ ] Lists each coding standard check in findings output
- [ ] Does not edit any source files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: APPROVED, CONCERNS, NEEDS CHANGES
---
## Coverage Notes
- Batch review of all files in a directory is not explicitly tested; behavior
is assumed to apply the same checks file by file and aggregate the verdict.
- Test coverage checks (verifying corresponding test files exist) are a stretch
goal not tested here; that is primarily the domain of `/test-evidence-review`.

# Skill Test Spec: /consistency-check
## Skill Summary
`/consistency-check` scans all GDDs in `design/gdd/` and checks for internal
conflicts across documents. It produces a structured findings table with columns:
System A vs System B, Conflict Type, Severity (HIGH / MEDIUM / LOW). Conflict
types include: formula mismatch, competing ownership, stale reference, and
dependency gap.
The skill is read-only during analysis. It has no director gates. An optional
consistency report can be written to `design/consistency-report-[date].md` if the
user requests it, but the skill asks "May I write" before doing so.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] Does NOT require "May I write" language during analysis (read-only scan)
- [ ] Has a next-step handoff at the end
- [ ] Documents that report writing is optional and requires approval
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. Consistency
checking is a mechanical scan; no creative or technical director review is
required as part of the scan itself.
---
## Test Cases
### Case 1: Happy Path — 4 GDDs with no conflicts
**Fixture:**
- `design/gdd/` contains exactly 4 system GDDs
- All GDDs have consistent formulas (no overlapping variables with different values)
- No two GDDs claim ownership of the same game entity or mechanic
- All dependency references point to GDDs that exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all 4 GDDs in `design/gdd/`
2. Runs cross-GDD consistency checks (formulas, ownership, references)
3. No conflicts found
4. Outputs structured findings table showing 0 issues
5. Verdict: CONSISTENT
**Assertions:**
- [ ] All 4 GDDs are read before producing output
- [ ] Findings table is present (even if empty — shows "No conflicts found")
- [ ] Verdict is CONSISTENT when no conflicts exist
- [ ] Skill does NOT write any files without user approval
- [ ] Next-step handoff is present
---
### Case 2: Failure Path — Two GDDs with conflicting damage formulas
**Fixture:**
- GDD-A defines damage formula: `damage = attack * 1.5`
- GDD-B defines damage formula: `damage = attack * 2.0` for the same entity type
- Both GDDs refer to the same "attack" variable
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and detects the formula mismatch
2. Findings table includes an entry: GDD-A vs GDD-B | Formula Mismatch | HIGH
3. Specific conflicting formulas are shown (not just "formula conflict exists")
4. Verdict: CONFLICTS FOUND
**Assertions:**
- [ ] Verdict is CONFLICTS FOUND (not CONSISTENT)
- [ ] Conflict entry names both GDD filenames
- [ ] Conflict type is "Formula Mismatch"
- [ ] Severity is HIGH for a direct formula contradiction
- [ ] Both conflicting formulas are shown in the findings table
- [ ] Skill does NOT auto-resolve the conflict
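Formula-mismatch detection amounts to grouping variable definitions by name across GDDs and comparing pairwise. A minimal sketch, assuming formulas have already been extracted into a per-GDD mapping (the extraction step itself is out of scope here):

```python
from collections import defaultdict
from itertools import combinations

def find_formula_mismatches(gdds: dict[str, dict[str, str]]) -> list[tuple]:
    """Report variables that two GDDs define with different formulas.

    gdds maps a GDD filename to its {variable: formula} definitions.
    Returns (gdd_a, gdd_b, variable, formula_a, formula_b) tuples.
    """
    seen = defaultdict(list)  # variable -> [(gdd, formula), ...]
    for gdd, formulas in gdds.items():
        for var, expr in formulas.items():
            seen[var].append((gdd, expr))
    conflicts = []
    for var, defs in seen.items():
        for (ga, fa), (gb, fb) in combinations(defs, 2):
            if fa != fb:
                conflicts.append((ga, gb, var, fa, fb))
    return conflicts
```

Each conflict tuple names both GDDs and shows both formulas, matching the findings-table requirements; resolving the conflict is left to the user.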
---
### Case 3: Partial Path — GDD references a system with no GDD
**Fixture:**
- GDD-A's Dependencies section lists "system-B" as a dependency
- No GDD for system-B exists in `design/gdd/`
- All other GDDs are consistent
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and checks dependency references
2. GDD-A's reference to "system-B" cannot be resolved — no GDD exists for it
3. Findings table includes: GDD-A vs (missing) | Dependency Gap | MEDIUM
4. Verdict: DEPENDENCY GAP (not CONSISTENT, not CONFLICTS FOUND)
**Assertions:**
- [ ] Verdict is DEPENDENCY GAP (distinct from CONSISTENT and CONFLICTS FOUND)
- [ ] Findings entry names GDD-A and the missing system-B
- [ ] Severity is MEDIUM for an unresolved dependency reference
- [ ] Skill suggests running `/design-system system-B` to create the missing GDD
---
### Case 4: Edge Case — No GDDs found
**Fixture:**
- `design/gdd/` directory is empty or does not exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill attempts to read files in `design/gdd/`
2. No GDD files found
3. Skill outputs an error: "No GDDs found in `design/gdd/`. Run `/design-system` to create GDDs first."
4. No findings table is produced
5. No verdict is issued
**Assertions:**
- [ ] Skill outputs a clear error message when no GDDs are found
- [ ] No verdict is produced (CONSISTENT / CONFLICTS FOUND / DEPENDENCY GAP)
- [ ] Skill recommends the correct next action (`/design-system`)
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — No gate spawned; no review-mode.txt read
**Fixture:**
- `design/gdd/` contains ≥2 GDDs
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and runs the consistency scan
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Findings table and verdict are produced normally
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] Review mode has no effect on this skill's behavior
---
## Protocol Compliance
- [ ] Reads all GDDs before producing the findings table
- [ ] Findings table shown in full before any write ask (if report is requested)
- [ ] Verdict is one of exactly: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] No director gates — no review-mode.txt read
- [ ] Report writing (if requested) gated by "May I write" approval
- [ ] Ends with next-step handoff appropriate to verdict
---
## Coverage Notes
- This skill checks for structural consistency between GDDs. Deep design theory
analysis (pillar drift, dominant strategies) is handled by `/review-all-gdds`.
- Formula conflict detection relies on consistent formula notation across GDDs —
informal descriptions of the same mechanic may not be detected.
- The conflict severity rubric (HIGH / MEDIUM / LOW) is defined in the skill body
and not re-enumerated here.

# Skill Test Spec: /content-audit
## Skill Summary
`/content-audit` reads GDDs in `design/gdd/` and checks whether all content
items specified there (enemies, items, levels, etc.) are accounted for in
`assets/`. It produces a gap table: Content Type → Specified Count → Found Count
→ Missing Items. No director gates are invoked. The skill does not write without
user approval. Verdicts: COMPLETE, GAPS FOUND, or MISSING CRITICAL CONTENT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
- [ ] Does NOT require "May I write" language (read-only output; write is optional report)
- [ ] Has a next-step handoff (what to do after gap table is reviewed)
---
## Director Gate Checks
None. Content audit is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All specified content present
**Fixture:**
- `design/gdd/enemies.md` specifies 4 enemy types: Grunt, Sniper, Tank, Boss
- `assets/art/characters/` contains folders: `grunt/`, `sniper/`, `tank/`, `boss/`
- `design/gdd/items.md` specifies 3 item types; all 3 found in `assets/data/items/`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs in `design/gdd/`
2. Skill scans `assets/` for each specified content item
3. All 4 enemy types and 3 item types are found
4. Gap table shows: all rows have Found Count = Specified Count, no missing items
5. Verdict is COMPLETE
**Assertions:**
- [ ] Gap table covers all content types found in GDDs
- [ ] Each row shows Specified Count and Found Count
- [ ] No missing items when counts match
- [ ] Verdict is COMPLETE
- [ ] No files are written
---
### Case 2: Gaps Found — Enemy type missing from assets
**Fixture:**
- `design/gdd/enemies.md` specifies 3 enemy types: Grunt, Sniper, Boss
- `assets/art/characters/` contains: `grunt/`, `sniper/` only (Boss folder missing)
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD — finds 3 enemy types specified
2. Skill scans `assets/art/characters/` — finds only 2
3. Gap table row for enemies: Specified 3, Found 2, Missing: Boss
4. Verdict is GAPS FOUND
**Assertions:**
- [ ] Gap table row identifies "Boss" as the missing item by name
- [ ] Specified Count (3) and Found Count (2) are both shown
- [ ] Verdict is GAPS FOUND when any content item is missing
- [ ] Skill does not assume the asset will be added later — it flags it now
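The gap-table arithmetic in this case is a set difference between specified and found item names. A sketch, with hypothetical function names and a dict standing in for one table row:

```python
def gap_row(content_type: str, specified: set[str], found: set[str]) -> dict:
    """One gap-table row: Content Type, Specified Count, Found Count, Missing Items."""
    missing = sorted(specified - found)
    return {
        "type": content_type,
        "specified": len(specified),
        "found": len(specified & found),
        "missing": missing,
    }

def gap_verdict(rows: list[dict]) -> str:
    """Any missing item anywhere downgrades the verdict to GAPS FOUND."""
    return "GAPS FOUND" if any(r["missing"] for r in rows) else "COMPLETE"
```

With the fixture, the enemies row reports Specified 3, Found 2, Missing ["boss"], and the overall verdict is GAPS FOUND.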
---
### Case 3: No GDD Content Specs Found — Guidance given
**Fixture:**
- `design/gdd/` contains only `core-loop.md` which has no content inventory section
- No other GDDs exist with content specifications
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs — finds no content inventory sections
2. Skill outputs: "No content specifications found in GDDs — run /design-system first to define content lists"
3. No gap table is produced
4. Verdict is GAPS FOUND (cannot confirm completeness without specs)
**Assertions:**
- [ ] Skill does not produce a gap table when no GDD content specs exist
- [ ] Output recommends running `/design-system`
- [ ] Verdict reflects inability to confirm completeness
---
### Case 4: Edge Case — Asset in wrong format for target platform
**Fixture:**
- `design/gdd/audio.md` specifies audio assets as OGG format
- `assets/audio/sfx/jump.wav` exists (WAV format, not OGG)
- `assets/audio/sfx/land.ogg` exists (correct format)
- `technical-preferences.md` specifies audio format: OGG
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD audio spec and technical preferences for format requirements
2. Skill finds `jump.wav` — present but in wrong format
3. Gap table row for audio: Specified 2, Found 2 (by name), but `jump.wav` flagged as FORMAT ISSUE
4. Verdict is GAPS FOUND (format compliance is part of content completeness)
**Assertions:**
- [ ] Skill checks asset format against GDD or technical preferences when format is specified
- [ ] `jump.wav` is flagged as FORMAT ISSUE with expected format (OGG) noted
- [ ] Format issues are distinct from missing content in the gap table
- [ ] Verdict is GAPS FOUND when format issues exist
---
### Case 5: Gate Compliance — Read-only; no gate; gap table for human review
**Fixture:**
- GDDs specify 10 content items; 9 are found in assets; 1 is missing
- `review-mode.txt` contains `full`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDDs and scans assets; produces gap table
2. No director gate is invoked regardless of review mode
3. Skill presents gap table to user as read-only output
4. Verdict is GAPS FOUND
5. Skill offers to write an audit report but does not write automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Gap table is presented without auto-writing any file
- [ ] Optional report write is offered but not forced
- [ ] Skill does not modify any asset files
---
## Protocol Compliance
- [ ] Reads GDDs and asset directory before producing gap table
- [ ] Gap table shows Content Type, Specified Count, Found Count, Missing Items
- [ ] Does not write files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
---
## Coverage Notes
- MISSING CRITICAL CONTENT verdict (vs. GAPS FOUND) is triggered when the
missing item is tagged as critical in the GDD; this is not explicitly tested
but follows the same detection path.
- The case where `assets/` directory does not exist is not tested; the skill
would produce a MISSING CRITICAL CONTENT verdict for all specified items.

# Skill Test Spec: /estimate
## Skill Summary
`/estimate` estimates task or story effort using a relative-size scale (S / M /
L / XL) based on story complexity, acceptance criteria count, and historical
sprint velocity from past sprint files. Estimates are advisory and are never
written automatically. No director gates are invoked. Verdicts are effort ranges,
not pass/fail — every run produces an estimate.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains size labels: S, M, L, XL (the "verdict" equivalents for this skill)
- [ ] Does NOT require "May I write" language (advisory output only)
- [ ] Has a next-step handoff (how to use the estimate in sprint planning)
---
## Director Gate Checks
None. Estimation is an advisory informational skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Clear story with known tech stack
**Fixture:**
- `production/epics/combat/story-hitbox-detection.md` exists with:
- 4 clear Acceptance Criteria
- ADR reference (Accepted status)
- No "unknown" or "TBD" language in story body
- `production/sprints/sprint-003.md` through `sprint-005.md` exist with velocity data
- Tech stack is GDScript (well-understood by team per sprint history)
**Input:** `/estimate production/epics/combat/story-hitbox-detection.md`
**Expected behavior:**
1. Skill reads the story file — assesses clarity, AC count, tech stack
2. Skill reads sprint history to determine average velocity
3. Skill outputs estimate: M (1-2 days) with reasoning
4. No files are written
**Assertions:**
- [ ] Estimate is M for a clear, well-scoped story with known tech
- [ ] Reasoning references AC count, tech stack familiarity, and velocity data
- [ ] Estimate is presented as a range (e.g., "1-2 days"), not a single point
- [ ] No files are written
---
### Case 2: High Uncertainty — Unknown system, no ADR yet
**Fixture:**
- `production/epics/online/story-lobby-matchmaking.md` exists with:
- 2 vague Acceptance Criteria (using "should" and "TBD")
- No ADR reference — matchmaking architecture not yet decided
- References new subsystem ("online/matchmaking") with no existing source files
**Input:** `/estimate production/epics/online/story-lobby-matchmaking.md`
**Expected behavior:**
1. Skill reads story — finds vague AC, no ADR, no existing source
2. Skill flags multiple uncertainty factors
3. Estimate is L-XL with an explicit risk note: "Estimate range is wide due to architectural unknowns"
4. Skill recommends creating an ADR before development begins
**Assertions:**
- [ ] Estimate is L or XL (not S or M) when significant unknowns exist
- [ ] Risk note explains the specific unknowns driving the wide range
- [ ] Output recommends resolving architectural questions first
- [ ] No files are written
---
### Case 3: No Sprint Velocity Data — Conservative defaults used
**Fixture:**
- Story file exists and is well-defined
- `production/sprints/` is empty — no historical sprints
**Input:** `/estimate production/epics/core/story-save-load.md`
**Expected behavior:**
1. Skill reads story — assesses complexity
2. Skill attempts to read sprint velocity data — finds none
3. Skill notes: "No sprint history found — using conservative defaults for velocity"
4. Estimate is produced using default assumptions (e.g., 1 story point = 1 day)
5. No files are written
**Assertions:**
- [ ] Skill does not error when no sprint history exists
- [ ] Output explicitly notes that conservative defaults are being used
- [ ] Estimate is still produced (not blocked by missing velocity)
- [ ] Conservative defaults produce a higher (not lower) estimate range
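
The conservative-default fallback in this case can be sketched as follows. The size-to-day bands and the 1.5x widening factor are illustrative assumptions, not values defined by the skill; the point is only that missing velocity data widens the estimate upward, never downward.

```python
# Hypothetical sketch of the Case 3 fallback. DEFAULT_BANDS and the
# conservative factor are assumed values for illustration only.
DEFAULT_BANDS = {"S": (1, 2), "M": (3, 5), "L": (6, 10), "XL": (11, 20)}

def estimate_range(size, velocity_history=None, conservative_factor=1.5):
    low, high = DEFAULT_BANDS[size]
    if not velocity_history:
        # No sprint data: keep the low bound, stretch the high bound so
        # the default estimate is higher, never lower.
        return (low, round(high * conservative_factor))
    return (low, high)

print(estimate_range("M"))             # widened: no history available
print(estimate_range("M", [8, 9, 7]))  # base band: history present
```
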
---
### Case 4: Multiple Stories — Each estimated individually plus sprint total
**Fixture:**
- User provides a sprint file: `production/sprints/sprint-007.md` with 4 stories
- Sprint history exists (3 previous sprints)
**Input:** `/estimate production/sprints/sprint-007.md`
**Expected behavior:**
1. Skill reads sprint file — identifies 4 stories
2. Skill estimates each story individually: S, M, M, L
3. Skill computes sprint total: approximately 6-8 story points
4. Skill presents per-story estimates followed by sprint total
5. No files are written
**Assertions:**
- [ ] Each story receives its own estimate label
- [ ] Sprint total is presented after individual estimates
- [ ] Total is a sum range derived from individual ranges
- [ ] Skill handles sprint files (not just single story files) as input
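
The "sum range derived from individual ranges" assertion amounts to interval addition. A minimal sketch, assuming illustrative day bands that do not reproduce the exact totals in the case above:

```python
# Hypothetical interval addition for Case 4: per-story ranges summed
# into a sprint total range. Band values are assumptions.
BANDS = {"S": (1, 2), "M": (3, 5), "L": (6, 10), "XL": (11, 20)}

def sprint_total(sizes):
    lows, highs = zip(*(BANDS[s] for s in sizes))
    return (sum(lows), sum(highs))

print(sprint_total(["S", "M", "M", "L"]))  # the four stories in Case 4
```
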
---
### Case 5: Gate Compliance — No gate; estimates are informational
**Fixture:**
- Story file exists with medium complexity
- `review-mode.txt` contains `full`
**Input:** `/estimate production/epics/core/story-item-pickup.md`
**Expected behavior:**
1. Skill reads story and sprint history; computes estimate
2. No director gate is invoked in any review mode
3. Estimate is presented as advisory output only
4. Skill notes: "Use this estimate in /sprint-plan when selecting stories for the next sprint"
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output is purely informational — no approval or write prompt
- [ ] Next-step recommendation references `/sprint-plan`
- [ ] Estimate does not change based on review mode
---
## Protocol Compliance
- [ ] Reads story file before estimating
- [ ] Reads sprint velocity history when available
- [ ] Produces effort range (S/M/L/XL), not a single number
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Always produces an estimate (never blocked by missing data; uses defaults instead)
---
## Coverage Notes
- The skill does not produce PASS/FAIL verdicts; the "verdict" here is the
effort range itself. Test assertions focus on the accuracy of the range
and the quality of the reasoning, not a binary outcome.
- Team-specific velocity calibration (what "M" means for this team) is an
implementation detail not tested here; it is configured via sprint history.

# Skill Test Spec: /perf-profile
## Skill Summary
`/perf-profile` is a structured performance profiling workflow that identifies
bottlenecks and recommends optimizations. If profiler data or performance logs
are provided, it analyzes them directly. If not, it guides the user through a
manual profiling checklist. No director gates are invoked. The skill asks
"May I write to `production/qa/perf-[date].md`?" before persisting a report.
Verdicts: WITHIN BUDGET, CONCERNS, or OVER BUDGET.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: WITHIN BUDGET, CONCERNS, OVER BUDGET
- [ ] Contains "May I write" language (skill writes perf report)
- [ ] Has a next-step handoff (what to do after performance findings are reviewed)
---
## Director Gate Checks
None. Performance profiling is an advisory analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Frame data provided, draw call spike found
**Fixture:**
- User provides `production/qa/profiler-export-2026-03-15.json` with frame time data
- Data shows: average frame time 14ms (within 16.6ms budget), but frames 42-48 spike to 28ms
- Spike correlates with a scene with 450 draw calls (budget: 200)
**Input:** `/perf-profile production/qa/profiler-export-2026-03-15.json`
**Expected behavior:**
1. Skill reads profiler data
2. Skill identifies average frame time is within budget
3. Skill identifies draw call spike on frames 42-48 (450 calls vs 200 budget)
4. Verdict is CONCERNS (average OK, but spikes indicate an issue)
5. Skill recommends batching or culling for the identified scene
6. Skill asks "May I write to `production/qa/perf-2026-04-06.md`?"
**Assertions:**
- [ ] Spike frames are identified by frame number
- [ ] Draw call count and budget are compared explicitly
- [ ] Verdict is CONCERNS when spikes exceed budget even if average is OK
- [ ] At least one specific optimization recommendation is given
- [ ] "May I write" prompt appears before writing report
---
### Case 2: No Profiler Data — Manual checklist output
**Fixture:**
- User runs `/perf-profile` with no arguments
- No profiler data files exist in `production/qa/`
**Input:** `/perf-profile`
**Expected behavior:**
1. Skill finds no profiler data
2. Skill outputs a manual profiling checklist for the user to work through:
- Enable Godot profiler or target engine's profiler
- Record a 60-second play session
- Export frame time data
- Note any dropped frames or hitches
3. Skill asks user to provide data once collected before running analysis
**Assertions:**
- [ ] Skill does not crash or emit a verdict when no data is provided
- [ ] Manual profiling checklist is output (actionable steps, not just an error)
- [ ] No verdict is emitted (there is nothing to assess yet)
- [ ] No files are written
---
### Case 3: Over Budget — Frame budget exceeded for target platform
**Fixture:**
- Profiler data shows consistent 22ms frame times (target: 16.6ms for 60fps)
- All frames exceed budget; no single spike — systemic issue
- `technical-preferences.md` specifies target platform: PC, 60fps
**Input:** `/perf-profile production/qa/profiler-export-2026-03-20.json`
**Expected behavior:**
1. Skill reads profiler data and technical preferences for performance budget
2. All frames are over the 16.6ms budget
3. Verdict is OVER BUDGET
4. Skill outputs a prioritized optimization list (e.g., LOD system, shader complexity, physics tick rate)
5. Skill asks "May I write" before writing report
**Assertions:**
- [ ] Verdict is OVER BUDGET when all or most frames exceed budget
- [ ] Target frame budget is read from `technical-preferences.md` (not hardcoded)
- [ ] Optimization priority list is provided, not just the raw verdict
- [ ] "May I write" prompt appears before report write
---
### Case 4: Previous Perf Report Exists — Delta comparison
**Fixture:**
- `production/qa/perf-2026-03-28.md` exists with prior results (avg 15ms, max 19ms)
- New profiler export shows: avg 13ms, max 17ms
- Both reports are for the same scene
**Input:** `/perf-profile production/qa/profiler-export-2026-04-05.json`
**Expected behavior:**
1. Skill reads new profiler data
2. Skill detects prior report for the same scene
3. Skill computes deltas: avg improved 2ms, max improved 2ms
4. Skill presents regression check: no regressions detected
5. Verdict is WITHIN BUDGET; report notes improvement since last profile
**Assertions:**
- [ ] Skill checks `production/qa/` for prior perf reports before writing
- [ ] Delta comparison is shown (prior vs. current for key metrics)
- [ ] Verdict is WITHIN BUDGET when current metrics are within budget
- [ ] Improvement trend is noted positively in the report
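
The delta comparison in this case can be sketched as below. The metric keys are assumptions for illustration; positive deltas (more milliseconds than the prior report) count as regressions.

```python
# Hypothetical delta computation for Case 4's regression check.
def perf_delta(prior, current):
    deltas = {k: round(current[k] - prior[k], 2) for k in prior}
    regressions = [k for k, d in deltas.items() if d > 0]
    return deltas, regressions

deltas, regressions = perf_delta(
    {"avg_ms": 15.0, "max_ms": 19.0},  # prior report values
    {"avg_ms": 13.0, "max_ms": 17.0},  # current export values
)
print(deltas, regressions)
```
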
---
### Case 5: Gate Compliance — No gate; performance-analyst separate
**Fixture:**
- Profiler data shows CONCERNS-level findings (some spikes)
- `review-mode.txt` contains `full`
**Input:** `/perf-profile production/qa/profiler-export-2026-04-01.json`
**Expected behavior:**
1. Skill analyzes profiler data; verdict is CONCERNS
2. No director gate is invoked regardless of review mode
3. Output notes: "For in-depth analysis, consider running `/perf-profile` with the performance-analyst agent"
4. Skill asks "May I write" and writes report on user approval
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Performance-analyst consultation is suggested (not mandated)
- [ ] "May I write" prompt appears before report write
- [ ] Verdict is CONCERNS for spike-based findings
---
## Protocol Compliance
- [ ] Reads profiler data when provided; outputs checklist when not
- [ ] Reads `technical-preferences.md` for target platform frame budget
- [ ] Checks for prior perf reports to enable delta comparison
- [ ] Always asks "May I write" before writing report
- [ ] No director gates are invoked
- [ ] Verdict is one of: WITHIN BUDGET, CONCERNS, OVER BUDGET
---
## Coverage Notes
- Platform-specific profiling workflows (console, mobile) are not tested here;
the checklist output in Case 2 would be platform-specific in practice.
- The delta comparison in Case 4 assumes reports cover the same scene; cross-scene
comparisons are not explicitly handled.

# Skill Test Spec: /scope-check
## Skill Summary
`/scope-check` is a Haiku-tier read-only skill that analyzes a feature, sprint,
or story for scope creep risk. It reads sprint and story files and compares them
against the active milestone goals. It is designed for fast, low-cost checks
before or during planning. No director gates are invoked. No files are written.
Verdicts: ON SCOPE, CONCERNS, or SCOPE CREEP DETECTED.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do based on verdict)
---
## Director Gate Checks
None. Scope check is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Sprint stories align with milestone goals
**Fixture:**
- `production/milestones/milestone-03.md` lists 3 goals: combat system, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories, all tagged to one of the 3 goals
- `production/session-state/active.md` references milestone-03 as the active milestone
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active milestone goals from milestone-03
2. Skill reads sprint-006 stories and checks each against milestone goals
3. All 5 stories map to one of the 3 goals
4. Skill outputs a mapping table: story → milestone goal
5. Verdict is ON SCOPE
**Assertions:**
- [ ] Each story is mapped to a milestone goal in the output
- [ ] Verdict is ON SCOPE when all stories map to milestone goals
- [ ] No files are written
- [ ] Skill does not modify sprint or milestone files
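
The story-to-goal mapping and verdict logic above can be sketched as follows. The goal-tag set and the story record shape are assumptions; the real skill parses markdown files and would normalize tag casing.

```python
# Hypothetical sketch of Case 1's mapping: each story's goal tag is
# checked against the active milestone's goal list.
MILESTONE_GOALS = {"combat system", "enemy ai", "level loading"}

def map_stories(stories):
    mapping = {s["title"]: (s["goal"] if s["goal"] in MILESTONE_GOALS else None)
               for s in stories}
    unmapped = [title for title, goal in mapping.items() if goal is None]
    verdict = "ON SCOPE" if not unmapped else "SCOPE CREEP DETECTED"
    return mapping, verdict

stories = [
    {"title": "Parry timing window", "goal": "combat system"},
    {"title": "Grunt patrol AI", "goal": "enemy ai"},
]
mapping, verdict = map_stories(stories)
print(verdict)
```
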
---
### Case 2: Scope Creep Detected — Stories introducing systems not in milestone
**Fixture:**
- `production/milestones/milestone-03.md` goals: combat, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories:
- 3 stories map to milestone goals
- 2 stories reference "online leaderboard" and "achievement system" (not in milestone-03)
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone goals and sprint stories
2. Skill identifies 2 stories with no matching milestone goal
3. Skill names the out-of-scope stories: "Online Leaderboard Feature", "Achievement System Setup"
4. Verdict is SCOPE CREEP DETECTED
**Assertions:**
- [ ] Out-of-scope stories are named explicitly in the output
- [ ] Verdict is SCOPE CREEP DETECTED when any story has no milestone goal match
- [ ] Skill does not automatically remove the stories — findings are advisory
- [ ] Output recommends deferring the out-of-scope stories to a later milestone
---
### Case 3: No Milestone Defined — CONCERNS; scope cannot be validated
**Fixture:**
- `production/session-state/active.md` has no milestone reference
- `production/milestones/` directory exists but is empty
- `production/sprints/sprint-006.md` has 4 stories
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active.md — finds no milestone reference
2. Skill checks `production/milestones/` — no milestone files found
3. Skill outputs: "No active milestone defined — scope cannot be validated"
4. Verdict is CONCERNS
**Assertions:**
- [ ] Skill does not error when no milestone is defined
- [ ] Output explicitly states that scope validation requires a milestone reference
- [ ] Verdict is CONCERNS (not ON SCOPE or SCOPE CREEP DETECTED without data)
- [ ] Output suggests running `/milestone-review` or creating a milestone
---
### Case 4: Single Story Check — Evaluated against its parent epic
**Fixture:**
- User targets a single story: `production/epics/combat/story-parry-timing.md`
- Story references parent epic: `epic-combat.md`
- `production/epics/combat/epic-combat.md` has scope: "melee combat mechanics"
- Story title: "Implement parry timing window" — matches epic scope
**Input:** `/scope-check production/epics/combat/story-parry-timing.md`
**Expected behavior:**
1. Skill reads the specified story file
2. Skill reads the parent epic to get scope definition
3. Skill evaluates story against epic scope — "parry timing" matches "melee combat"
4. Verdict is ON SCOPE
**Assertions:**
- [ ] Single-file argument is accepted (story path, not sprint)
- [ ] Skill reads the parent epic referenced in the story file
- [ ] Story is evaluated against epic scope (not milestone scope) in single-story mode
- [ ] Verdict is ON SCOPE when story matches epic scope
---
### Case 5: Gate Compliance — No gate; PR may be consulted separately
**Fixture:**
- Sprint has 2 SCOPE CREEP stories and 3 ON SCOPE stories
- `review-mode.txt` contains `full`
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone and sprint; identifies 2 scope creep items
2. No director gate is invoked regardless of review mode
3. Skill presents findings with SCOPE CREEP DETECTED verdict
4. Output notes: "Consider raising scope concerns with the Producer before sprint begins"
5. Skill ends without writing any files
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Producer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is SCOPE CREEP DETECTED
---
## Protocol Compliance
- [ ] Reads milestone goals and sprint/story files before analysis
- [ ] Maps each story to a milestone goal (or flags as unmapped)
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Runs on Haiku model tier (fast, low-cost)
- [ ] Verdict is one of: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
---
## Coverage Notes
- The case where the sprint file itself does not exist is not tested; the
skill would output a CONCERNS verdict with a message about missing sprint data.
- Partial scope overlap (story touches a milestone goal but also introduces
new scope) is not explicitly tested; implementation may classify this as
CONCERNS rather than SCOPE CREEP DETECTED.

# Skill Test Spec: /security-audit
## Skill Summary
`/security-audit` audits the game for security risks including save data
integrity, network communication, anti-cheat exposure, and data privacy. It
reads source files in `src/` for security patterns and checks whether sensitive
data is handled correctly. No director gates are invoked. The skill does not
write files (findings report only). Verdicts: SECURE, CONCERNS, or
VULNERABILITIES FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: SECURE, CONCERNS, VULNERABILITIES FOUND
- [ ] Does NOT require "May I write" language (read-only; findings report only)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Security audit is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Save data encrypted, no hardcoded credentials
**Fixture:**
- `src/core/save_system.gd` uses `Crypto` class to encrypt save data before writing
- No hardcoded API keys, passwords, or credentials in any `src/` file
- No version numbers or internal build IDs exposed in client-facing output
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` for security patterns: encryption usage, hardcoded credentials, exposed internals
2. All checks pass: save data encrypted, no credentials found, no exposed internals
3. Findings report shows all checks PASS
4. Verdict is SECURE
**Assertions:**
- [ ] Skill checks save data handling for encryption usage
- [ ] Skill scans for hardcoded credentials (API keys, passwords, tokens)
- [ ] Skill checks for version/build numbers exposed to players
- [ ] All checks shown in findings report
- [ ] Verdict is SECURE when all checks pass
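
The hardcoded-credential check can be sketched as a pattern scan. The single regex below is an assumption that only illustrates the shape of the check; real detection would cover many more patterns and file types.

```python
# Hypothetical credential scan for the Case 1 check.
import re

CREDENTIAL_RE = re.compile(
    r'(?i)(api[_-]?key|password|secret|token)\s*[:=]\s*["\'][^"\']+["\']'
)

def scan_for_credentials(source_text):
    """Return 1-based line numbers containing credential-like assignments."""
    return [lineno for lineno, line in enumerate(source_text.splitlines(), 1)
            if CREDENTIAL_RE.search(line)]

clean = 'var speed = 300\nvar jump_force = 550\n'
leaky = 'var api_key = "sk-123456"\n'
print(scan_for_credentials(clean), scan_for_credentials(leaky))
```
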
---
### Case 2: Vulnerabilities Found — Unencrypted save data and exposed version
**Fixture:**
- `src/core/save_system.gd` writes save data as plain JSON (no encryption)
- `src/ui/debug_overlay.gd` contains: `label.text = "Build: " + ProjectSettings.get("application/config/version")`
(exposes internal build version to player)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` — finds unencrypted save write in `save_system.gd`
2. Skill finds exposed version string in `debug_overlay.gd`
3. Both findings are flagged as VULNERABILITIES
4. Verdict is VULNERABILITIES FOUND
5. Skill provides remediation recommendations for each vulnerability
**Assertions:**
- [ ] Unencrypted save data is flagged as a vulnerability with file and approximate line
- [ ] Exposed version string is flagged as a vulnerability
- [ ] Remediation suggestion is given for each vulnerability
- [ ] Verdict is VULNERABILITIES FOUND when any vulnerability is detected
- [ ] No files are written or modified
---
### Case 3: Online Features Without Authentication — CONCERNS
**Fixture:**
- `src/networking/lobby.gd` exists with functions: `join_lobby()`, `send_chat()`
- No authentication check is found before `send_chat()` — players can call it without being verified
- Game has online multiplayer features (inferred from file presence)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/networking/` — detects online feature code
2. Skill checks for authentication guard before network calls — finds none on `send_chat()`
3. Flags: "Online feature without authentication check — CONCERNS"
4. Verdict is CONCERNS (not VULNERABILITIES FOUND, as this is a missing control, not an exploit)
**Assertions:**
- [ ] Skill detects online features by scanning for networking source files
- [ ] Missing authentication checks before network operations are flagged
- [ ] Verdict is CONCERNS (advisory severity) for missing authentication guards
- [ ] Output recommends adding authentication before network calls
---
### Case 4: Edge Case — No Source Files to Analyze
**Fixture:**
- `src/` directory does not exist or is completely empty
**Input:** `/security-audit`
**Expected behavior:**
1. Skill attempts to scan `src/` — no files found
2. Skill outputs an error: "No source files found in `src/` — nothing to audit"
3. No findings report is generated
4. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when `src/` is empty or absent
- [ ] Output clearly states that no source files were found
- [ ] No verdict is emitted (there is nothing to assess)
- [ ] Skill suggests verifying the `src/` directory path
---
### Case 5: Gate Compliance — No gate; security-engineer invoked separately
**Fixture:**
- Source files exist; 1 CONCERNS-level finding detected (debug logging enabled in release build)
- `review-mode.txt` contains `full`
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans source; finds debug logging active in release path
2. No director gate is invoked regardless of review mode
3. Verdict is CONCERNS
4. Output notes: "For formal security review, consider engaging a security-engineer agent"
5. Findings are presented as a read-only report; no files written
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Security-engineer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is CONCERNS for advisory-level security findings
---
## Protocol Compliance
- [ ] Reads source files in `src/` before auditing
- [ ] Checks save data encryption, hardcoded credentials, exposed internals, auth guards
- [ ] Provides remediation recommendations for each finding
- [ ] Does not write any files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: SECURE, CONCERNS, VULNERABILITIES FOUND
---
## Coverage Notes
- Anti-cheat analysis (client-side value validation, server authority) is not
explicitly tested here; it follows the CONCERNS or VULNERABILITIES pattern
depending on severity.
- Data privacy compliance (GDPR, COPPA) is out of scope for this spec; those
require legal review beyond code scanning.

# Skill Test Spec: /tech-debt
## Skill Summary
`/tech-debt` tracks, categorizes, and prioritizes technical debt across the
codebase. It reads `docs/tech-debt-register.md` for the existing debt register
and scans source files in `src/` for inline `TODO` and `FIXME` comments. It
merges and sorts items by severity. No director gates are invoked. The skill
asks "May I write to `docs/tech-debt-register.md`?" before updating. Verdicts:
REGISTER UPDATED or NO NEW DEBT FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: REGISTER UPDATED, NO NEW DEBT FOUND
- [ ] Contains "May I write" language (skill writes to debt register)
- [ ] Has a next-step handoff (what to do after register is updated)
---
## Director Gate Checks
None. Tech debt tracking is an internal codebase analysis skill; no gates are
invoked.
---
## Test Cases
### Case 1: Happy Path — Inline TODOs plus existing register items merged
**Fixture:**
- `docs/tech-debt-register.md` exists with 2 items (LOW and MEDIUM severity)
- `src/gameplay/combat.gd` has 2 `# TODO` comments and 1 `# FIXME` comment
- `src/ui/hud.gd` has 0 inline debt comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads `docs/tech-debt-register.md` — finds 2 existing items
2. Skill scans `src/` — finds 3 inline comments (2 TODOs, 1 FIXME)
3. Skill checks whether inline comments already exist in the register (deduplication)
4. Skill presents combined list sorted by severity (FIXME before TODO by default)
5. Skill asks "May I write to `docs/tech-debt-register.md`?"
6. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] Inline comments are found by scanning `src/` recursively
- [ ] Existing register items are not duplicated
- [ ] Combined list is sorted by severity
- [ ] "May I write" prompt appears before any write
- [ ] Verdict is REGISTER UPDATED
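
The scan-merge-deduplicate-sort pipeline above can be sketched as follows. The comment regex, the dedup key (file plus comment text), and the severity ordering are assumptions for illustration.

```python
# Hypothetical sketch of Case 1: collect inline TODO/FIXME comments,
# dedupe against existing register items, sort by severity.
import re

COMMENT_RE = re.compile(r'#\s*(TODO|FIXME)\b[:\s]*(.*)')
SEVERITY = {"FIXME": 0, "TODO": 1}  # lower sorts first

def collect_debt(files, register):
    seen = {(item["file"], item["text"]) for item in register}
    items = list(register)
    for path, text in files.items():
        for line in text.splitlines():
            m = COMMENT_RE.search(line)
            if m and (path, m.group(2).strip()) not in seen:
                items.append({"file": path, "kind": m.group(1),
                              "text": m.group(2).strip()})
    return sorted(items, key=lambda i: SEVERITY.get(i.get("kind", "TODO"), 1))

register = [{"file": "src/ui/hud.gd", "kind": "TODO", "text": "extract widget"}]
files = {"src/gameplay/combat.gd": "# FIXME: hitbox off by one\n# TODO: cache query\n"}
print([i["kind"] for i in collect_debt(files, register)])
```
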
---
### Case 2: Register Doesn't Exist — Offered to create it
**Fixture:**
- `docs/tech-debt-register.md` does NOT exist
- `src/` contains 4 inline TODO/FIXME comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill attempts to read `docs/tech-debt-register.md` — not found
2. Skill informs user: "No tech-debt-register.md found"
3. Skill offers to create the register with the inline items it found
4. Skill asks "May I write to `docs/tech-debt-register.md`?" (create)
5. User approves; register created with 4 items; verdict REGISTER UPDATED
**Assertions:**
- [ ] Skill does not crash when register file is absent
- [ ] User is offered register creation (not silently skipping)
- [ ] "May I write" prompt reflects file creation (not update)
- [ ] Verdict is REGISTER UPDATED after creation
---
### Case 3: Resolved Item Detected — Marked resolved in register
**Fixture:**
- `docs/tech-debt-register.md` has 3 items; one references `src/gameplay/legacy_input.gd`
- `src/gameplay/legacy_input.gd` has been deleted (refactored away)
- The referenced TODO comment no longer exists in source
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads register — finds 3 items
2. Skill scans `src/` — does not find the source location referenced by item 2
3. Skill flags item 2 as RESOLVED (source is gone)
4. Skill presents the resolved item to user for confirmation
5. On approval, register is updated with item 2 marked `Status: Resolved`
**Assertions:**
- [ ] Skill checks whether each register item's source reference still exists
- [ ] Missing source locations result in items being flagged as RESOLVED
- [ ] User confirms before resolved items are written
- [ ] RESOLVED items are kept in the register (not deleted) for audit history
---
### Case 4: Edge Case — CRITICAL debt item surfaces prominently
**Fixture:**
- `src/core/network_sync.gd` has a comment: `# FIXME(CRITICAL): race condition in sync buffer — can corrupt save data`
- `docs/tech-debt-register.md` exists with 5 lower-severity items
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and finds the CRITICAL-tagged FIXME
2. Skill presents the CRITICAL item at the top of the output — before the full table
3. Skill asks user to acknowledge the critical item before proceeding
4. After acknowledgment, skill presents full debt table and asks to write
5. Register is updated with CRITICAL item at top; verdict REGISTER UPDATED
**Assertions:**
- [ ] CRITICAL items appear at the top of the output, not buried in the table
- [ ] Skill surfaces CRITICAL items before asking to write
- [ ] User acknowledgment of the CRITICAL item is requested
- [ ] CRITICAL severity is preserved in the written register entry
---
### Case 5: Gate Compliance — No gate; register updated only with approval
**Fixture:**
- Inline scan finds 2 new TODOs; register has 3 existing items
- `review-mode.txt` contains `full`
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and reads register; compiles combined debt list
2. No director gate is invoked regardless of review mode
3. Skill presents sorted debt table to user
4. Skill asks "May I write to `docs/tech-debt-register.md`?"
5. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Debt table is presented before any write prompt
- [ ] "May I write" prompt appears before file update
- [ ] Write only occurs with explicit user approval
---
## Protocol Compliance
- [ ] Reads `docs/tech-debt-register.md` and scans `src/` before compiling
- [ ] Deduplicates inline comments against existing register items
- [ ] Sorts combined list by severity
- [ ] Always asks "May I write" before updating register
- [ ] No director gates are invoked
- [ ] Verdict is REGISTER UPDATED or NO NEW DEBT FOUND
---
## Coverage Notes
- The case where `src/` is empty or absent is not tested; behavior follows
the NO NEW DEBT FOUND path for the inline scan, but register items would
still be read and presented.
- TODO comments without severity tags are treated as LOW severity by default;
this classification detail is an implementation concern, not tested here.

# Skill Test Spec: /test-evidence-review
## Skill Summary
`/test-evidence-review` performs a quality review of test files in `tests/`,
checking test naming conventions, determinism, isolation, and absence of
hardcoded magic numbers — all against the project's test standards defined in
`coding-standards.md`. Findings may be flagged for qa-lead review. No director
gates are invoked. The skill does not write without user approval. Verdicts:
PASS, WARNINGS, or FAIL.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, WARNINGS, FAIL
- [ ] Does NOT require "May I write" language (read-only; write is optional flagging report)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Test evidence review is an advisory quality skill; QL-TEST-COVERAGE gate
is a separate skill invocation and is NOT triggered here.
---
## Test Cases
### Case 1: Happy Path — Tests follow all standards
**Fixture:**
- `tests/unit/combat/health_system_take_damage_test.gd` exists with:
- Naming: `test_health_system_take_damage_reduces_health()` (follows `test_[system]_[scenario]_[expected]`)
- Arrange/Act/Assert structure present
- No `sleep()`, `await` with time values, or random seeds
- No calls to external APIs or file I/O
- No inline magic numbers (uses constants from `tests/unit/combat/fixtures/`)
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reads test standards from `coding-standards.md`
2. Skill reads the test file; checks all 5 standards
3. All checks pass: naming, structure, determinism, isolation, no hardcoded data
4. Verdict is PASS
**Assertions:**
- [ ] Each of the 5 test standards is checked and reported
- [ ] All checks show PASS when standards are met
- [ ] Verdict is PASS
- [ ] No files are written
---
### Case 2: Fail — Timing dependency detected
**Fixture:**
- `tests/unit/ui/hud_update_test.gd` contains:
```gdscript
await get_tree().create_timer(1.0).timeout
assert_eq(label.text, "Ready")
```
- Real-time wait of 1 second used instead of mock or signal-based assertion
**Input:** `/test-evidence-review tests/unit/ui/hud_update_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects real-time wait (`create_timer(1.0)`) — non-deterministic timing dependency
3. Skill flags this as a FAIL-level finding
4. Verdict is FAIL
5. Skill recommends replacing the timer with a signal-based assertion or mock
**Assertions:**
- [ ] Real-time wait usage is detected as a non-deterministic timing dependency
- [ ] Finding is classified as FAIL severity (blocking — violates determinism standard)
- [ ] Verdict is FAIL
- [ ] Remediation suggestion references signal-based or mock-based approach
- [ ] Skill does not edit the test file
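The timing-dependency detection in this case can be sketched as a line-level pattern scan. The patterns below (`create_timer(`, `OS.delay_msec(`) are illustrative assumptions about what the reviewer flags, not the skill's actual rule set.

```python
import re

# Illustrative patterns for non-deterministic timing in GDScript tests.
# The exact rule set the skill applies is an assumption for this sketch.
TIMING_PATTERNS = [
    re.compile(r"create_timer\(\s*\d"),  # real-time waits: create_timer(1.0)
    re.compile(r"\bOS\.delay_msec\("),   # blocking delays
]

def find_timing_dependencies(source: str) -> list[int]:
    """Return 1-based line numbers containing a timing-dependency pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in TIMING_PATTERNS):
            hits.append(lineno)
    return hits

# The fixture's flaky HUD test: the create_timer wait is on line 2.
test_source = """func test_hud_update():
    await get_tree().create_timer(1.0).timeout
    assert_eq(label.text, "Ready")
"""
print(find_timing_dependencies(test_source))  # [2]
```

A real reviewer would also need to exclude legitimate uses (e.g., a timer inside a helper that is itself mocked), which this sketch does not attempt.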
---
### Case 3: Fail — Test calls external API directly
**Fixture:**
- `tests/unit/networking/auth_test.gd` contains:
```gdscript
var result = HTTPRequest.new().request("https://api.example.com/auth")
```
- Direct HTTP call to external API without a mock
**Input:** `/test-evidence-review tests/unit/networking/auth_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects direct external API call (HTTPRequest to live URL)
3. Skill flags this as a FAIL-level finding — violates isolation standard
4. Verdict is FAIL
5. Skill recommends injecting a mock HTTP client
**Assertions:**
- [ ] Direct external API call is detected and flagged
- [ ] Finding is classified as FAIL severity (violates isolation standard)
- [ ] Verdict is FAIL
- [ ] Remediation references dependency injection with a mock HTTP client
- [ ] Skill does not modify the test file
---
### Case 4: Edge Case — No Test Files Found
**Fixture:**
- User calls `/test-evidence-review tests/unit/audio/`
- `tests/unit/audio/` directory does not exist
**Input:** `/test-evidence-review tests/unit/audio/`
**Expected behavior:**
1. Skill attempts to read files in `tests/unit/audio/` — not found
2. Skill outputs: "No test files found at `tests/unit/audio/` — run `/test-setup` to scaffold test directories"
3. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the message
- [ ] Output recommends `/test-setup` for scaffolding
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; QL-TEST-COVERAGE is a separate skill
**Fixture:**
- Test file has 1 WARNINGS-level finding (magic number in a non-boundary test)
- `review-mode.txt` contains `full`
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reviews tests; finds 1 WARNINGS-level finding
2. No director gate is invoked (QL-TEST-COVERAGE is invoked separately, not here)
3. Verdict is WARNINGS
4. Output notes: "For full test coverage gate, run `/gate-check` which invokes QL-TEST-COVERAGE"
5. Skill offers optional report write; asks "May I write" if user opts in
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Output distinguishes this skill from the QL-TEST-COVERAGE gate invocation
- [ ] Optional report requires "May I write" before writing
- [ ] Verdict is WARNINGS for advisory-level test quality issues
---
## Protocol Compliance
- [ ] Reads `coding-standards.md` test standards before reviewing test files
- [ ] Checks naming, Arrange/Act/Assert structure, determinism, isolation, no hardcoded data
- [ ] Does not edit any test files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: PASS, WARNINGS, FAIL
---
## Coverage Notes
- Batch review of all test files in `tests/` is not explicitly tested; the skill
  is assumed to apply the same checks file by file and aggregate the verdicts.
- The QL-TEST-COVERAGE director gate (which checks test coverage percentage) is
a separate concern and is intentionally NOT invoked by this skill.
# Skill Test Spec: /test-flakiness
## Skill Summary
`/test-flakiness` detects non-deterministic tests by analyzing test history logs
(if available) or scanning test source code for common flakiness patterns (random
numbers without seeds, real-time waits, external I/O). No director gates are
invoked. The skill does not write without user approval. Verdicts: NO FLAKINESS,
SUSPECT TESTS FOUND, or CONFIRMED FLAKY.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after flakiness findings)
---
## Director Gate Checks
None. Flakiness detection is an advisory quality skill for the QA lead; no gates
are invoked.
---
## Test Cases
### Case 1: Happy Path — Clean test history, no flakiness
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- All tests pass consistently across all 10 runs (100% pass rate per test)
- No test has a failure pattern
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs from `production/qa/test-history/`
2. Skill computes per-test pass rate across 10 runs
3. All tests pass all 10 runs — no inconsistency detected
4. Verdict is NO FLAKINESS
**Assertions:**
- [ ] Skill reads test history logs when available
- [ ] Per-test pass rate is computed across all available runs
- [ ] Verdict is NO FLAKINESS when all tests pass consistently
- [ ] No files are written
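The per-test pass-rate computation in step 2 can be sketched as below. The history format (each run as a list of `(test_name, passed)` pairs) is an assumption for illustration; real history logs would need parsing first.

```python
from collections import defaultdict

def pass_rates(runs: list[list[tuple[str, bool]]]) -> dict[str, float]:
    """Compute per-test pass rate across all runs.

    `runs` is a list of test runs; each run is a list of
    (test_name, passed) pairs. The format is illustrative.
    """
    passes: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for run in runs:
        for name, passed in run:
            totals[name] += 1
            if passed:
                passes[name] += 1
    return {name: passes[name] / totals[name] for name in totals}

# Ten clean runs: every test passes every time, so every rate is 1.0.
history = [[("test_a", True), ("test_b", True)] for _ in range(10)]
print(pass_rates(history))  # {'test_a': 1.0, 'test_b': 1.0}
```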
---
### Case 2: Suspect Tests Found — Test fails intermittently in history
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- `test_combat_damage_applies_crit_multiplier` passes 7 times, fails 3 times
- Failure messages differ (sometimes timeout, sometimes wrong value)
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs — computes pass rates
2. `test_combat_damage_applies_crit_multiplier` has 70% pass rate (threshold: 95%)
3. Skill flags it as SUSPECT with pass rate (7/10) and failure pattern noted
4. Verdict is SUSPECT TESTS FOUND
5. Skill recommends investigating the test for timing or state dependencies
**Assertions:**
- [ ] Tests below the pass-rate threshold are flagged by name
- [ ] Pass rate (fraction and percentage) is shown for each suspect test
- [ ] Failure pattern (e.g., inconsistent error messages) is noted if detectable
- [ ] Verdict is SUSPECT TESTS FOUND
- [ ] Skill recommends investigation steps
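Classification against the pass-rate threshold might look like the sketch below; the 95% value is the suggested threshold from the Coverage Notes, not a verified implementation detail.

```python
SUSPECT_THRESHOLD = 0.95  # suggested value; the implementation may differ

def classify_suspects(rates: dict[str, float]) -> list[tuple[str, float]]:
    """Return (test_name, pass_rate) pairs for tests below the threshold."""
    return sorted(
        (name, rate) for name, rate in rates.items() if rate < SUSPECT_THRESHOLD
    )

# Rates matching this case's fixture: one test at 7/10, one stable.
rates = {
    "test_combat_damage_applies_crit_multiplier": 0.7,
    "test_health_regen": 1.0,
}
print(classify_suspects(rates))
# [('test_combat_damage_applies_crit_multiplier', 0.7)]
```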
---
### Case 3: Source Pattern — Random number used without seed
**Fixture:**
- No test history logs exist
- `tests/unit/loot/loot_drop_test.gd` contains:
```gdscript
var roll = randf() # unseeded random — non-deterministic
assert_gt(roll, 0.5, "Loot should drop above 50%")
```
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill finds no test history logs
2. Skill falls back to source code analysis
3. Skill detects `randf()` call without a preceding `seed()` call
4. Skill flags the test as FLAKINESS RISK (source pattern, not confirmed)
5. Verdict is SUSPECT TESTS FOUND (pattern detected, not confirmed by history)
6. Skill recommends seeding random before the call or mocking the random function
**Assertions:**
- [ ] Source code analysis is used as fallback when no history logs exist
- [ ] Unseeded random number usage is detected as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND (not CONFIRMED FLAKY — no history to confirm)
- [ ] Remediation recommends seeding or mocking
---
### Case 4: No Test History — Source-only analysis with common patterns
**Fixture:**
- `production/qa/test-history/` does not exist
- `tests/` contains 15 test files
- Scan finds 2 tests using `OS.get_ticks_msec()` for timing assertions
- No other flakiness patterns found
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill checks for test history — not found
2. Skill notes: "No test history available — analyzing source code for flakiness patterns only"
3. Skill scans all test files for known patterns: unseeded random, real-time waits, system clock usage
4. Finds 2 tests using `OS.get_ticks_msec()` — flags as FLAKINESS RISK
5. Verdict is SUSPECT TESTS FOUND
**Assertions:**
- [ ] Skill notes clearly that source-only analysis is being performed (no history)
- [ ] Common flakiness patterns are scanned: random, time-based assertions, external I/O
- [ ] `OS.get_ticks_msec()` usage for assertions is flagged as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND when source patterns are found
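The source-only fallback scan described in Cases 3 and 4 can be sketched as a pattern dictionary. The patterns are illustrative assumptions about what counts as a flakiness risk; a fuller check would, for example, skip `randf()` calls preceded by a `seed()` call rather than flagging every use.

```python
import re

# Illustrative flakiness-risk patterns for GDScript test sources.
FLAKY_PATTERNS = {
    "unseeded random": re.compile(r"\brandf?i?\("),   # randf(), randi(), rand()
    "real-time wait": re.compile(r"create_timer\("),
    "system clock": re.compile(r"OS\.get_ticks_msec\("),
}

def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, risk_label) pairs for each matched pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in FLAKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings

# A source combining the Case 3 and Case 4 fixtures.
source = """var roll = randf()
var start = OS.get_ticks_msec()
"""
print(scan_source(source))
# [(1, 'unseeded random'), (2, 'system clock')]
```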
---
### Case 5: Gate Compliance — No gate; flakiness report is advisory
**Fixture:**
- Test history shows 1 CONFIRMED FLAKY test (fails 6 out of 10 runs)
- `review-mode.txt` contains `full`
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill analyzes test history; identifies 1 confirmed flaky test
2. No director gate is invoked regardless of review mode
3. Verdict is CONFIRMED FLAKY
4. Skill presents findings and offers optional written report
5. If user opts in: "May I write to `production/qa/flakiness-report-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] CONFIRMED FLAKY verdict requires history-based evidence (not just source patterns)
- [ ] Optional report requires "May I write" before writing
- [ ] Flakiness report is advisory for qa-lead; skill does not auto-disable tests
---
## Protocol Compliance
- [ ] Reads test history logs when available; falls back to source analysis when not
- [ ] Notes clearly which analysis mode is being used (history vs. source-only)
- [ ] Flakiness threshold (e.g., 95% pass rate) is used for SUSPECT classification
- [ ] CONFIRMED FLAKY requires history evidence; SUSPECT covers source patterns only
- [ ] Does not disable or modify any test files
- [ ] No director gates are invoked
- [ ] Verdict is one of: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
---
## Coverage Notes
- The pass-rate threshold for SUSPECT classification (95% suggested above) is an
implementation detail; the tests verify that intermittent failures are flagged,
not the exact threshold value.
- Tests that fail due to environment issues (missing assets, wrong platform) are
not flakiness — the skill distinguishes environment failures from non-determinism
in the test itself; this distinction is not explicitly tested here.