Add claude code game studios to the project

panw
2026-05-15 14:52:29 +08:00
parent dff559462d
commit a16fe4bff7
415 changed files with 78609 additions and 0 deletions

@@ -0,0 +1,170 @@
# Skill Test Spec: /asset-audit
## Skill Summary
`/asset-audit` audits the `assets/` directory for naming convention compliance,
missing metadata, and format/size issues. It reads asset files against the
conventions and budgets defined in `technical-preferences.md`. No director gates
are invoked. The skill does not write without user approval. Verdicts: COMPLIANT,
WARNINGS, or NON-COMPLIANT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLIANT, WARNINGS, NON-COMPLIANT
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after audit results)
---
## Director Gate Checks
None. Asset auditing is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All assets follow naming conventions
**Fixture:**
- `technical-preferences.md` specifies naming convention: `snake_case`, e.g., `enemy_grunt_idle.png`
- `assets/art/characters/` contains: `enemy_grunt_idle.png`, `enemy_sniper_run.png`
- `assets/audio/sfx/` contains: `sfx_jump_land.ogg`, `sfx_item_pickup.ogg`
- All files are within size budget (textures ≤2MB, audio ≤500KB)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads naming conventions and size budgets from `technical-preferences.md`
2. Skill scans `assets/` recursively
3. All files match `snake_case` convention; all within budget
4. Audit table shows all rows PASS
5. Verdict is COMPLIANT
**Assertions:**
- [ ] Audit covers both art and audio asset directories
- [ ] Each file is checked against naming convention and size budget
- [ ] All rows show PASS when compliant
- [ ] Verdict is COMPLIANT
- [ ] No files are written
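
The per-file checks in this case can be sketched roughly as follows. This is a minimal illustration, assuming that `snake_case` is verified with a regular expression on the file stem and that size budgets are keyed by file extension. `SIZE_BUDGETS`, `AuditRow`, and `audit_file` are hypothetical names, and the budget values are copied from the fixture; the skill itself is specified to read conventions and budgets from `technical-preferences.md`. The row fields follow the audit table columns listed under Protocol Compliance.

```python
import os
import re
from dataclasses import dataclass

# Assumed budgets, copied from the Case 1 fixture (textures <= 2MB, audio <= 500KB).
SIZE_BUDGETS = {".png": 2 * 1024 * 1024, ".ogg": 500 * 1024}

# Lowercase alphanumeric words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


@dataclass
class AuditRow:
    file_name: str
    check: str
    expected: str
    actual: str
    result: str


def audit_file(file_name: str, size_bytes: int) -> list[AuditRow]:
    """Return one audit-table row per check applied to a single asset."""
    stem, ext = os.path.splitext(file_name)
    rows = [
        AuditRow(file_name, "naming", "snake_case", stem,
                 "PASS" if SNAKE_CASE.match(stem) else "FAIL")
    ]
    budget = SIZE_BUDGETS.get(ext.lower())
    if budget is not None:  # extensions without a known budget get no size row
        rows.append(
            AuditRow(file_name, "size", f"<= {budget} bytes", f"{size_bytes} bytes",
                     "PASS" if size_bytes <= budget else "FAIL")
        )
    return rows
```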
---
### Case 2: Non-Compliant — Textures exceed size budget
**Fixture:**
- `assets/art/environment/` contains 5 texture files
- 3 texture files are 4MB each (budget: ≤2MB)
- 2 texture files are within budget
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads size budget from `technical-preferences.md` (2MB for textures)
2. Skill scans `assets/art/environment/` — finds 3 oversized textures
3. Audit table lists each oversized file with actual size and budget
4. Verdict is NON-COMPLIANT
5. Skill recommends compression or resolution reduction for flagged files
**Assertions:**
- [ ] All 3 oversized files are listed by name with actual size and budget size
- [ ] Verdict is NON-COMPLIANT when any file exceeds its budget
- [ ] Optimization recommendation is given for oversized files
- [ ] Within-budget files are also listed (showing PASS) for completeness
---
### Case 3: Format Issue — Audio in wrong format
**Fixture:**
- `technical-preferences.md` specifies audio format: OGG
- `assets/audio/music/theme_main.wav` exists (WAV format)
- `assets/audio/sfx/sfx_footstep.ogg` exists (correct OGG format)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads audio format requirement: OGG
2. Skill scans `assets/audio/` — finds `theme_main.wav` in wrong format
3. Audit table flags `theme_main.wav` as FORMAT ISSUE (expected OGG, found WAV)
4. `sfx_footstep.ogg` shows PASS
5. Verdict is WARNINGS (format issues are correctable)
**Assertions:**
- [ ] `theme_main.wav` is flagged as FORMAT ISSUE with expected and actual format noted
- [ ] Verdict is WARNINGS (not NON-COMPLIANT) for format issues, which are correctable
- [ ] Correct-format assets are shown as PASS
- [ ] Skill does not modify or convert any asset files
---
### Case 4: Missing Asset — Asset referenced by GDD but absent from assets/
**Fixture:**
- `design/gdd/enemies.md` references `enemy_boss_idle.png`
- `assets/art/characters/boss/` directory is empty — file does not exist
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads GDD references to find expected assets (cross-references with `/content-audit` scope)
2. Skill scans `assets/art/characters/boss/` — file not found
3. Audit table flags `enemy_boss_idle.png` as MISSING ASSET
4. Verdict is NON-COMPLIANT (missing critical art asset)
**Assertions:**
- [ ] Skill checks GDD references to identify expected assets
- [ ] Missing assets are flagged as MISSING ASSET with the GDD reference noted
- [ ] Verdict is NON-COMPLIANT when critical assets are missing
- [ ] Skill does not create or add placeholder assets
---
### Case 5: Gate Compliance — No gate; technical-artist may be consulted separately
**Fixture:**
- 2 files have naming convention violations (CamelCase instead of snake_case)
- `review-mode.txt` contains `full`
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill scans assets and finds 2 naming violations
2. No director gate is invoked regardless of review mode
3. Verdict is WARNINGS
4. Output notes: "Consider having a Technical Artist review naming conventions"
5. Skill presents findings; offers optional audit report write
6. If user opts in: "May I write to `production/qa/asset-audit-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Technical artist consultation is suggested (not mandated)
- [ ] Findings table is presented before any write prompt
- [ ] Optional audit report write asks "May I write" before writing
---
## Protocol Compliance
- [ ] Reads `technical-preferences.md` for naming conventions, formats, and size budgets
- [ ] Scans `assets/` directory recursively
- [ ] Audit table shows file name, check type, expected value, actual value, and result
- [ ] Does not modify any asset files
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLIANT, WARNINGS, NON-COMPLIANT
---
## Coverage Notes
- Metadata checks (e.g., missing texture import settings in Godot `.import` files)
are not explicitly tested here; they follow the same FORMAT ISSUE flagging pattern.
- The interaction between `/asset-audit` and `/content-audit` (both check GDD
references vs. assets) is intentional overlap; `/asset-audit` focuses on
compliance while `/content-audit` focuses on completeness.

@@ -0,0 +1,172 @@
# Skill Test Spec: /balance-check
## Skill Summary
`/balance-check` reads balance data files (JSON or YAML in `assets/data/`) and
checks each value against the design formulas defined in GDDs under `design/gdd/`.
It produces a findings table with columns: Value → Formula → Deviation → Severity.
No director gates are invoked (read-only analysis). The skill may optionally write
a balance report but asks "May I write" before doing so. Verdicts: BALANCED,
CONCERNS, or OUT OF BALANCE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: BALANCED, CONCERNS, OUT OF BALANCE
- [ ] Contains "May I write" language (optional report write)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Balance check is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All balance values within formula tolerances
**Fixture:**
- `assets/data/combat-balance.json` exists with 6 stat values
- `design/gdd/combat-system.md` contains formulas for all 6 stats with ±10% tolerance
- All 6 values fall within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all balance data files in `assets/data/`
2. Skill reads GDD formulas from `design/gdd/`
3. Skill computes deviation for each value against its formula
4. All deviations are within ±10% tolerance
5. Skill outputs findings table with all rows showing PASS
6. Verdict is BALANCED
**Assertions:**
- [ ] Findings table is shown for all checked values
- [ ] Each row shows: stat name, formula target, actual value, deviation percentage
- [ ] All rows show PASS or equivalent when within tolerance
- [ ] Verdict is BALANCED
- [ ] No files are written without user approval
---
### Case 2: Out of Balance — Player damage 40% above formula target
**Fixture:**
- `assets/data/combat-balance.json` has `player_damage_base: 140`
- `design/gdd/combat-system.md` formula specifies `player_damage_base = 100` (±10%)
- All other stats are within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads combat-balance.json and computes deviation for `player_damage_base`
2. Deviation is +40% — far outside ±10% tolerance
3. Skill flags this row as severity HIGH in the findings table
4. Verdict is OUT OF BALANCE
5. Skill surfaces the HIGH severity item prominently before the table
**Assertions:**
- [ ] `player_damage_base` row shows deviation of +40%
- [ ] Severity is HIGH for deviations exceeding tolerance by more than 2×
- [ ] Verdict is OUT OF BALANCE when any stat has HIGH severity deviation
- [ ] The HIGH severity item is called out explicitly, not buried in table rows
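
The deviation and severity logic in this case can be sketched as below. This is one possible reading of the assertions, not the skill's actual thresholds: it assumes deviation is a signed percentage of the formula target, and that "exceeding tolerance by more than 2×" means the absolute deviation is greater than twice the tolerance. The function names and the intermediate `CONCERNS` band are assumptions for illustration.

```python
def deviation_pct(actual: float, target: float) -> float:
    """Signed deviation of actual from target, as a percentage of target."""
    return (actual - target) * 100.0 / target


def classify(actual: float, target: float, tolerance_pct: float = 10.0) -> tuple[float, str]:
    """Return (deviation %, severity) for one stat."""
    deviation = deviation_pct(actual, target)
    magnitude = abs(deviation)
    if magnitude <= tolerance_pct:
        severity = "PASS"
    elif magnitude > 2 * tolerance_pct:
        severity = "HIGH"  # assumed reading of "more than 2x tolerance"
    else:
        severity = "CONCERNS"  # outside tolerance, but not by more than 2x
    return deviation, severity


# Case 2 fixture: player_damage_base is 140 against a target of 100 (±10%).
print(classify(140, 100))
```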
---
### Case 3: No GDD Formulas — Cannot validate, guidance given
**Fixture:**
- `assets/data/economy-balance.yaml` exists with 10 stat values
- No GDD in `design/gdd/` contains formula definitions for economy stats
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads balance data files
2. Skill searches GDDs for formula definitions — finds none for economy stats
3. Skill outputs: "Cannot validate economy stats — no formulas defined. Run /design-system first."
4. No findings table is generated for the economy stats
5. Verdict is CONCERNS (data exists but cannot be validated)
**Assertions:**
- [ ] Skill does not fabricate formula targets when none exist in GDDs
- [ ] Output explicitly names the missing formula source
- [ ] Output recommends running `/design-system` to define formulas
- [ ] Verdict is CONCERNS (not BALANCED, since validation was impossible)
---
### Case 4: Orphan Reference — Balance file references an undefined stat
**Fixture:**
- `assets/data/combat-balance.json` contains a stat `legacy_armor_mult: 1.5`
- `design/gdd/combat-system.md` has no formula for `legacy_armor_mult`
- All other stats have formula definitions and pass validation
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all stats from combat-balance.json
2. Skill cannot find a formula for `legacy_armor_mult` in any GDD
3. Skill flags `legacy_armor_mult` as ORPHAN REFERENCE in the findings table
4. Other stats are evaluated normally; those within tolerance show PASS
5. Verdict is CONCERNS (orphan reference prevents full validation)
**Assertions:**
- [ ] `legacy_armor_mult` appears in findings table with status ORPHAN REFERENCE
- [ ] Orphan references are distinguished from formula deviations in the table
- [ ] Verdict is CONCERNS when any orphan references are found
- [ ] Skill does not skip orphan stats silently
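
Orphan detection amounts to a lookup of each stat name against the set of formula names found in the GDDs. The sketch below assumes both sides have already been parsed into dictionaries keyed by stat name; `find_orphans` is a hypothetical helper, not part of the skill.

```python
def find_orphans(stats: dict[str, float], formula_targets: dict[str, float]) -> list[str]:
    """Return stat names that appear in the balance data but have no GDD formula."""
    return [name for name in stats if name not in formula_targets]


# Case 4 fixture: legacy_armor_mult has no formula in combat-system.md.
stats = {"player_damage_base": 100, "legacy_armor_mult": 1.5}
formula_targets = {"player_damage_base": 100}
orphans = find_orphans(stats, formula_targets)
```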
---
### Case 5: Gate Compliance — Read-only; no gate; optional report requires approval
**Fixture:**
- Balance data and GDD formulas exist; 1 stat has CONCERNS-level deviation (15% above target)
- `review-mode.txt` contains `full`
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads data and GDDs; generates findings table
2. Verdict is CONCERNS (one stat slightly out of range)
3. No director gate is invoked
4. Skill presents findings table to user
5. Skill offers to write an optional balance report
6. If user says yes: skill asks "May I write to `production/qa/balance-report-[date].md`?"
7. If user says no: skill ends without writing
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Findings table is presented without writing anything automatically
- [ ] Optional report write is offered but not forced
- [ ] "May I write" prompt appears only if user opts in to the report
---
## Protocol Compliance
- [ ] Reads both balance data files and GDD formulas before analysis
- [ ] Findings table shows Value, Formula, Deviation, and Severity columns
- [ ] Does not write any files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: BALANCED, CONCERNS, OUT OF BALANCE
---
## Coverage Notes
- The case where `assets/data/` is entirely empty is not tested; behavior
follows the CONCERNS pattern with a message that no data files were found.
- Tolerance thresholds (±10%, ±20%) are implementation details of the skill;
the tests verify that deviations are detected and classified, not the
exact threshold values.

@@ -0,0 +1,172 @@
# Skill Test Spec: /code-review
## Skill Summary
`/code-review` performs an architectural code review of source files in `src/`,
checking coding standards from `CLAUDE.md` (doc comments on public APIs,
dependency injection over singletons, data-driven values, testability). Findings
are advisory. No director gates are invoked. No code edits are made. Verdicts:
APPROVED, CONCERNS, or NEEDS CHANGES.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, CONCERNS, NEEDS CHANGES
- [ ] Does NOT require "May I write" language (read-only; findings are advisory output)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Code review is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Source file follows all coding standards
**Fixture:**
- `src/gameplay/health_component.gd` exists with:
  - All public methods have doc comments (`##` notation)
  - No singletons used; dependencies injected via constructor
  - No hardcoded values; all constants reference `assets/data/`
  - ADR reference in file header: `# ADR: docs/architecture/adr-004-health.md`
- Referenced ADR has `Status: Accepted`
**Input:** `/code-review src/gameplay/health_component.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill checks all coding standards: doc comments, DI, data-driven, ADR status
3. All checks pass
4. Skill outputs findings summary with all checks PASS
5. Verdict is APPROVED
**Assertions:**
- [ ] Each coding standard check is listed in the output
- [ ] All checks show PASS when standards are met
- [ ] Skill reads referenced ADR to confirm its status
- [ ] Verdict is APPROVED
- [ ] No edits are made to any file
---
### Case 2: Needs Changes — Missing doc comment and singleton usage
**Fixture:**
- `src/ui/inventory_ui.gd` has:
  - 2 public methods without doc comments
  - Uses `GameManager.instance` (singleton pattern)
  - All other standards met
**Input:** `/code-review src/ui/inventory_ui.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill detects: 2 missing doc comments on public methods
3. Skill detects: singleton usage at specific lines (e.g., line 42, line 87)
4. Findings list the exact method names and line numbers
5. Verdict is NEEDS CHANGES
**Assertions:**
- [ ] Missing doc comments are listed with method names
- [ ] Singleton usage is flagged with file and line number
- [ ] Verdict is NEEDS CHANGES when BLOCKING-level standard violations exist
- [ ] Skill does not edit the file — findings are for the developer to act on
- [ ] Output suggests replacing singleton with dependency injection
---
### Case 3: Architecture Risk — ADR reference is Proposed, not Accepted
**Fixture:**
- `src/core/save_system.gd` has a header comment: `# ADR: docs/architecture/adr-010-save.md`
- `adr-010-save.md` exists but has `Status: Proposed`
- Code itself follows all other coding standards
**Input:** `/code-review src/core/save_system.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill reads referenced ADR — finds `Status: Proposed`
3. Skill flags this as ARCHITECTURE RISK (code is implementing an unaccepted ADR)
4. Other coding standard checks pass
5. Verdict is CONCERNS (risk flag is advisory, not a hard NEEDS CHANGES)
**Assertions:**
- [ ] Skill reads referenced ADR file to check its status
- [ ] ARCHITECTURE RISK is flagged when ADR status is Proposed
- [ ] Verdict is CONCERNS (not NEEDS CHANGES) for ADR risk — advisory severity
- [ ] Output recommends resolving the ADR before the code goes to production
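
The ADR check in this case can be sketched as two small parsing steps: extract the ADR path from the file header comment, then read the `Status:` line of the referenced ADR. The regular expressions and function names below are assumptions based only on the fixture text (`# ADR: ...` and `Status: Proposed`). Treating a missing status line as a risk is likewise an assumption, since the spec does not cover that case.

```python
import re
from typing import Optional

# Matches a header comment such as "# ADR: docs/architecture/adr-010-save.md".
ADR_HEADER = re.compile(r"^#\s*ADR:\s*(\S+)", re.MULTILINE)
# Matches a line such as "Status: Proposed" in the ADR file.
ADR_STATUS = re.compile(r"^Status:\s*(\w+)", re.MULTILINE)


def adr_reference(source_text: str) -> Optional[str]:
    """Return the ADR path named in the source file header, if any."""
    match = ADR_HEADER.search(source_text)
    return match.group(1) if match else None


def adr_finding(adr_text: str) -> str:
    """Return PASS for an Accepted ADR, ARCHITECTURE RISK otherwise."""
    match = ADR_STATUS.search(adr_text)
    status = match.group(1) if match else "Unknown"  # assumed: no status counts as risk
    return "PASS" if status == "Accepted" else "ARCHITECTURE RISK"
```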
---
### Case 4: Edge Case — No source files found at specified path
**Fixture:**
- User calls `/code-review src/networking/`
- `src/networking/` directory does not exist
**Input:** `/code-review src/networking/`
**Expected behavior:**
1. Skill attempts to read files in `src/networking/`
2. Directory or files not found
3. Skill outputs an error: "No source files found at `src/networking/`"
4. Skill suggests checking `src/` for valid directories
5. No verdict is emitted (nothing was reviewed)
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the error message
- [ ] Output suggests checking `src/` for valid file paths
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; LP may be consulted separately
**Fixture:**
- Source file follows most standards but has 1 CONCERNS-level finding (a magic number)
- `review-mode.txt` contains `full`
**Input:** `/code-review src/gameplay/loot_system.gd`
**Expected behavior:**
1. Skill reads and reviews the source file
2. No director gate is invoked (code review findings are advisory)
3. Skill presents findings with the CONCERNS verdict
4. Output notes: "Consider requesting a Lead Programmer review for architecture concerns"
5. Skill does not invoke any agent automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] LP consultation is suggested (not mandated) in the output
- [ ] No code edits are made
- [ ] Verdict is CONCERNS for advisory-level findings
---
## Protocol Compliance
- [ ] Reads source file(s) and coding standards before reviewing
- [ ] Lists each coding standard check in findings output
- [ ] Does not edit any source files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: APPROVED, CONCERNS, NEEDS CHANGES
---
## Coverage Notes
- Batch review of all files in a directory is not explicitly tested; behavior
is assumed to apply the same checks file by file and aggregate the verdict.
- Test coverage checks (verifying corresponding test files exist) are a stretch
goal not tested here; that is primarily the domain of `/test-evidence-review`.

@@ -0,0 +1,176 @@
# Skill Test Spec: /consistency-check
## Skill Summary
`/consistency-check` scans all GDDs in `design/gdd/` and checks for internal
conflicts across documents. It produces a structured findings table with columns:
System A vs System B, Conflict Type, Severity (HIGH / MEDIUM / LOW). Conflict
types include: formula mismatch, competing ownership, stale reference, and
dependency gap.
The skill is read-only during analysis. It has no director gates. An optional
consistency report can be written to `design/consistency-report-[date].md` if the
user requests it, but the skill asks "May I write" before doing so.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] Does NOT require "May I write" language during analysis (read-only scan)
- [ ] Has a next-step handoff at the end
- [ ] Documents that report writing is optional and requires approval
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. Consistency
checking is a mechanical scan; no creative or technical director review is
required as part of the scan itself.
---
## Test Cases
### Case 1: Happy Path — 4 GDDs with no conflicts
**Fixture:**
- `design/gdd/` contains exactly 4 system GDDs
- All GDDs have consistent formulas (no overlapping variables with different values)
- No two GDDs claim ownership of the same game entity or mechanic
- All dependency references point to GDDs that exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all 4 GDDs in `design/gdd/`
2. Runs cross-GDD consistency checks (formulas, ownership, references)
3. No conflicts found
4. Outputs structured findings table showing 0 issues
5. Verdict: CONSISTENT
**Assertions:**
- [ ] All 4 GDDs are read before producing output
- [ ] Findings table is present (even if empty — shows "No conflicts found")
- [ ] Verdict is CONSISTENT when no conflicts exist
- [ ] Skill does NOT write any files without user approval
- [ ] Next-step handoff is present
---
### Case 2: Failure Path — Two GDDs with conflicting damage formulas
**Fixture:**
- GDD-A defines damage formula: `damage = attack * 1.5`
- GDD-B defines damage formula: `damage = attack * 2.0` for the same entity type
- Both GDDs refer to the same "attack" variable
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and detects the formula mismatch
2. Findings table includes an entry: GDD-A vs GDD-B | Formula Mismatch | HIGH
3. Specific conflicting formulas are shown (not just "formula conflict exists")
4. Verdict: CONFLICTS FOUND
**Assertions:**
- [ ] Verdict is CONFLICTS FOUND (not CONSISTENT)
- [ ] Conflict entry names both GDD filenames
- [ ] Conflict type is "Formula Mismatch"
- [ ] Severity is HIGH for a direct formula contradiction
- [ ] Both conflicting formulas are shown in the findings table
- [ ] Skill does NOT auto-resolve the conflict
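
A formula mismatch scan of this kind can be sketched as a pairwise comparison of the formulas each GDD defines for the same variable. The sketch assumes formulas have already been extracted into a mapping of GDD filename to `{variable: formula}`; the extraction step, the function names, and the whitespace-only normalization are illustrative assumptions. As the Coverage Notes point out, informal descriptions of the same mechanic would not be caught by a comparison like this.

```python
from itertools import combinations


def normalize(formula: str) -> str:
    """Strip all whitespace so 'attack*1.5' and 'attack * 1.5' compare equal."""
    return "".join(formula.split())


def find_formula_mismatches(gdd_formulas: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Compare every pair of GDDs and report variables defined differently."""
    findings = []
    for gdd_a, gdd_b in combinations(sorted(gdd_formulas), 2):
        shared = gdd_formulas[gdd_a].keys() & gdd_formulas[gdd_b].keys()
        for variable in sorted(shared):
            formula_a = gdd_formulas[gdd_a][variable]
            formula_b = gdd_formulas[gdd_b][variable]
            if normalize(formula_a) != normalize(formula_b):
                findings.append({
                    "system_a": gdd_a,
                    "system_b": gdd_b,
                    "conflict_type": "Formula Mismatch",
                    "severity": "HIGH",
                    # Both formulas are reported; nothing is auto-resolved.
                    "detail": f"{variable}: '{formula_a}' vs '{formula_b}'",
                })
    return findings
```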
---
### Case 3: Partial Path — GDD references a system with no GDD
**Fixture:**
- GDD-A's Dependencies section lists "system-B" as a dependency
- No GDD for system-B exists in `design/gdd/`
- All other GDDs are consistent
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and checks dependency references
2. GDD-A's reference to "system-B" cannot be resolved — no GDD exists for it
3. Findings table includes: GDD-A vs (missing) | Dependency Gap | MEDIUM
4. Verdict: DEPENDENCY GAP (not CONSISTENT, not CONFLICTS FOUND)
**Assertions:**
- [ ] Verdict is DEPENDENCY GAP (distinct from CONSISTENT and CONFLICTS FOUND)
- [ ] Findings entry names GDD-A and the missing system-B
- [ ] Severity is MEDIUM for an unresolved dependency reference
- [ ] Skill suggests running `/design-system system-B` to create the missing GDD
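
Dependency gap detection reduces to checking each listed dependency against the set of systems that have a GDD. The sketch below assumes each GDD's Dependencies section has been parsed into a list of system names; `find_dependency_gaps` and the dictionary keys are hypothetical.

```python
def find_dependency_gaps(dependencies: dict[str, list[str]]) -> list[dict[str, str]]:
    """Report dependencies that point at systems with no GDD of their own."""
    known_systems = set(dependencies)
    findings = []
    for gdd in sorted(dependencies):
        for dependency in dependencies[gdd]:
            if dependency not in known_systems:
                findings.append({
                    "system_a": gdd,
                    "system_b": f"{dependency} (missing)",
                    "conflict_type": "Dependency Gap",
                    "severity": "MEDIUM",
                })
    return findings
```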
---
### Case 4: Edge Case — No GDDs found
**Fixture:**
- `design/gdd/` directory is empty or does not exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill attempts to read files in `design/gdd/`
2. No GDD files found
3. Skill outputs an error: "No GDDs found in `design/gdd/`. Run `/design-system` to create GDDs first."
4. No findings table is produced
5. No verdict is issued
**Assertions:**
- [ ] Skill outputs a clear error message when no GDDs are found
- [ ] No verdict is produced (CONSISTENT / CONFLICTS FOUND / DEPENDENCY GAP)
- [ ] Skill recommends the correct next action (`/design-system`)
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — No gate spawned; no review-mode.txt read
**Fixture:**
- `design/gdd/` contains ≥2 GDDs
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and runs the consistency scan
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Findings table and verdict are produced normally
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] Review mode has no effect on this skill's behavior
---
## Protocol Compliance
- [ ] Reads all GDDs before producing the findings table
- [ ] Findings table shown in full before any write ask (if report is requested)
- [ ] Verdict is one of exactly: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] No director gates — no review-mode.txt read
- [ ] Report writing (if requested) gated by "May I write" approval
- [ ] Ends with next-step handoff appropriate to verdict
---
## Coverage Notes
- This skill checks for structural consistency between GDDs. Deep design theory
analysis (pillar drift, dominant strategies) is handled by `/review-all-gdds`.
- Formula conflict detection relies on consistent formula notation across GDDs —
informal descriptions of the same mechanic may not be detected.
- The conflict severity rubric (HIGH / MEDIUM / LOW) is defined in the skill body
and not re-enumerated here.

@@ -0,0 +1,164 @@
# Skill Test Spec: /content-audit
## Skill Summary
`/content-audit` reads GDDs in `design/gdd/` and checks whether all content
items specified there (enemies, items, levels, etc.) are accounted for in
`assets/`. It produces a gap table: Content Type → Specified Count → Found Count
→ Missing Items. No director gates are invoked. The skill does not write without
user approval. Verdicts: COMPLETE, GAPS FOUND, or MISSING CRITICAL CONTENT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
- [ ] Does NOT require "May I write" language (read-only output; write is optional report)
- [ ] Has a next-step handoff (what to do after gap table is reviewed)
---
## Director Gate Checks
None. Content audit is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All specified content present
**Fixture:**
- `design/gdd/enemies.md` specifies 4 enemy types: Grunt, Sniper, Tank, Boss
- `assets/art/characters/` contains folders: `grunt/`, `sniper/`, `tank/`, `boss/`
- `design/gdd/items.md` specifies 3 item types; all 3 found in `assets/data/items/`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs in `design/gdd/`
2. Skill scans `assets/` for each specified content item
3. All 4 enemy types and 3 item types are found
4. Gap table shows: all rows have Found Count = Specified Count, no missing items
5. Verdict is COMPLETE
**Assertions:**
- [ ] Gap table covers all content types found in GDDs
- [ ] Each row shows Specified Count and Found Count
- [ ] No missing items when counts match
- [ ] Verdict is COMPLETE
- [ ] No files are written
---
### Case 2: Gaps Found — Enemy type missing from assets
**Fixture:**
- `design/gdd/enemies.md` specifies 3 enemy types: Grunt, Sniper, Boss
- `assets/art/characters/` contains: `grunt/`, `sniper/` only (Boss folder missing)
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD — finds 3 enemy types specified
2. Skill scans `assets/art/characters/` — finds only 2
3. Gap table row for enemies: Specified 3, Found 2, Missing: Boss
4. Verdict is GAPS FOUND
**Assertions:**
- [ ] Gap table row identifies "Boss" as the missing item by name
- [ ] Specified Count (3) and Found Count (2) are both shown
- [ ] Verdict is GAPS FOUND when any content item is missing
- [ ] Skill does not assume the asset will be added later — it flags it now
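
One gap table row can be sketched as a comparison of the names specified in a GDD against the names found under `assets/`. The case-insensitive match between GDD names ("Boss") and folder names (`boss/`) is an assumption drawn from the fixtures, and `gap_row` and `verdict_for` are hypothetical helpers. The MISSING CRITICAL CONTENT verdict is not modeled here.

```python
def gap_row(content_type: str, specified: list[str], found: list[str]) -> dict:
    """Build one gap-table row from specified and found item names."""
    found_lower = {name.lower() for name in found}
    missing = [name for name in specified if name.lower() not in found_lower]
    return {
        "Content Type": content_type,
        "Specified Count": len(specified),
        "Found Count": len(specified) - len(missing),
        "Missing Items": missing,
    }


def verdict_for(rows: list[dict]) -> str:
    """COMPLETE only when no row lists a missing item."""
    return "GAPS FOUND" if any(row["Missing Items"] for row in rows) else "COMPLETE"


# Case 2 fixture: the Boss folder is absent from assets/art/characters/.
row = gap_row("enemies", ["Grunt", "Sniper", "Boss"], ["grunt", "sniper"])
```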
---
### Case 3: No GDD Content Specs Found — Guidance given
**Fixture:**
- `design/gdd/` contains only `core-loop.md` which has no content inventory section
- No other GDDs exist with content specifications
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs — finds no content inventory sections
2. Skill outputs: "No content specifications found in GDDs — run /design-system first to define content lists"
3. No gap table is produced
4. Verdict is GAPS FOUND (cannot confirm completeness without specs)
**Assertions:**
- [ ] Skill does not produce a gap table when no GDD content specs exist
- [ ] Output recommends running `/design-system`
- [ ] Verdict reflects inability to confirm completeness
---
### Case 4: Edge Case — Asset in wrong format for target platform
**Fixture:**
- `design/gdd/audio.md` specifies audio assets as OGG format
- `assets/audio/sfx/jump.wav` exists (WAV format, not OGG)
- `assets/audio/sfx/land.ogg` exists (correct format)
- `technical-preferences.md` specifies audio format: OGG
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD audio spec and technical preferences for format requirements
2. Skill finds `jump.wav` — present but in wrong format
3. Gap table row for audio: Specified 2, Found 2 (by name), but `jump.wav` flagged as FORMAT ISSUE
4. Verdict is GAPS FOUND (format compliance is part of content completeness)
**Assertions:**
- [ ] Skill checks asset format against GDD or technical preferences when format is specified
- [ ] `jump.wav` is flagged as FORMAT ISSUE with expected format (OGG) noted
- [ ] Format issues are distinct from missing content in the gap table
- [ ] Verdict is GAPS FOUND when format issues exist
---
### Case 5: Gate Compliance — Read-only; no gate; gap table for human review
**Fixture:**
- GDDs specify 10 content items; 9 are found in assets; 1 is missing
- `review-mode.txt` contains `full`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDDs and scans assets; produces gap table
2. No director gate is invoked regardless of review mode
3. Skill presents gap table to user as read-only output
4. Verdict is GAPS FOUND
5. Skill offers to write an audit report but does not write automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Gap table is presented without auto-writing any file
- [ ] Optional report write is offered but not forced
- [ ] Skill does not modify any asset files
---
## Protocol Compliance
- [ ] Reads GDDs and asset directory before producing gap table
- [ ] Gap table shows Content Type, Specified Count, Found Count, Missing Items
- [ ] Does not write files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
---
## Coverage Notes
- MISSING CRITICAL CONTENT verdict (vs. GAPS FOUND) is triggered when the
missing item is tagged as critical in the GDD; this is not explicitly tested
but follows the same detection path.
- The case where `assets/` directory does not exist is not tested; the skill
would produce a MISSING CRITICAL CONTENT verdict for all specified items.

@@ -0,0 +1,168 @@
# Skill Test Spec: /estimate
## Skill Summary
`/estimate` estimates task or story effort using a relative-size scale (S / M /
L / XL) based on story complexity, acceptance criteria count, and historical
sprint velocity from past sprint files. Estimates are advisory and are never
written automatically. No director gates are invoked. Verdicts are effort ranges,
not pass/fail — every run produces an estimate.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains size labels: S, M, L, XL (the "verdict" equivalents for this skill)
- [ ] Does NOT require "May I write" language (advisory output only)
- [ ] Has a next-step handoff (how to use the estimate in sprint planning)
---
## Director Gate Checks
None. Estimation is an advisory informational skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Clear story with known tech stack
**Fixture:**
- `production/epics/combat/story-hitbox-detection.md` exists with:
  - 4 clear Acceptance Criteria
  - ADR reference (Accepted status)
  - No "unknown" or "TBD" language in story body
- `production/sprints/sprint-003.md` through `sprint-005.md` exist with velocity data
- Tech stack is GDScript (well-understood by team per sprint history)
**Input:** `/estimate production/epics/combat/story-hitbox-detection.md`
**Expected behavior:**
1. Skill reads the story file — assesses clarity, AC count, tech stack
2. Skill reads sprint history to determine average velocity
3. Skill outputs estimate: M (1-2 days) with reasoning
4. No files are written
**Assertions:**
- [ ] Estimate is M for a clear, well-scoped story with known tech
- [ ] Reasoning references AC count, tech stack familiarity, and velocity data
- [ ] Estimate is presented as a range (e.g., "1-2 days"), not a single point
- [ ] No files are written
---
### Case 2: High Uncertainty — Unknown system, no ADR yet
**Fixture:**
- `production/epics/online/story-lobby-matchmaking.md` exists with:
- 2 vague Acceptance Criteria (using "should" and "TBD")
- No ADR reference — matchmaking architecture not yet decided
- References new subsystem ("online/matchmaking") with no existing source files
**Input:** `/estimate production/epics/online/story-lobby-matchmaking.md`
**Expected behavior:**
1. Skill reads story — finds vague AC, no ADR, no existing source
2. Skill flags multiple uncertainty factors
3. Estimate is L-XL with an explicit risk note: "Estimate range is wide due to architectural unknowns"
4. Skill recommends creating an ADR before development begins
**Assertions:**
- [ ] Estimate is L or XL (not S or M) when significant unknowns exist
- [ ] Risk note explains the specific unknowns driving the wide range
- [ ] Output recommends resolving architectural questions first
- [ ] No files are written
---
### Case 3: No Sprint Velocity Data — Conservative defaults used
**Fixture:**
- Story file exists and is well-defined
- `production/sprints/` is empty — no historical sprints
**Input:** `/estimate production/epics/core/story-save-load.md`
**Expected behavior:**
1. Skill reads story — assesses complexity
2. Skill attempts to read sprint velocity data — finds none
3. Skill notes: "No sprint history found — using conservative defaults for velocity"
4. Estimate is produced using default assumptions (e.g., 1 story point = 1 day)
5. No files are written
**Assertions:**
- [ ] Skill does not error when no sprint history exists
- [ ] Output explicitly notes that conservative defaults are being used
- [ ] Estimate is still produced (not blocked by missing velocity)
- [ ] Conservative defaults produce a higher (not lower) estimate range
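
The fallback described in this case can be sketched as follows. This is a minimal illustration, not the skill's actual logic: `days_range`, `DEFAULT_POINTS_PER_DAY`, and `CONSERVATIVE_PADDING` are hypothetical names, and the 1.5 padding factor is an assumption chosen only to show how missing history could widen the upper bound rather than lower it.

```python
from statistics import mean

DEFAULT_POINTS_PER_DAY = 1.0  # the spec's example default: 1 story point = 1 day
CONSERVATIVE_PADDING = 1.5    # hypothetical multiplier, not taken from the spec


def days_range(points_low, points_high, past_velocities):
    """Convert a story-point range into a day range.

    past_velocities: story points completed per day in earlier sprints.
    Returns (low_days, high_days, used_defaults).
    """
    if past_velocities:
        rate = mean(past_velocities)
        return points_low / rate, points_high / rate, False
    # No sprint history: use the default rate and pad the upper bound,
    # so the estimate gets higher rather than lower.
    rate = DEFAULT_POINTS_PER_DAY
    return points_low / rate, points_high / rate * CONSERVATIVE_PADDING, True
```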
---
### Case 4: Multiple Stories — Each estimated individually plus sprint total
**Fixture:**
- User provides a sprint file: `production/sprints/sprint-007.md` with 4 stories
- Sprint history exists (3 previous sprints)
**Input:** `/estimate production/sprints/sprint-007.md`
**Expected behavior:**
1. Skill reads sprint file — identifies 4 stories
2. Skill estimates each story individually: S, M, M, L
3. Skill computes sprint total: approximately 6-8 story points
4. Skill presents per-story estimates followed by sprint total
5. No files are written
**Assertions:**
- [ ] Each story receives its own estimate label
- [ ] Sprint total is presented after individual estimates
- [ ] Total is a sum range derived from individual ranges
- [ ] Skill handles sprint files (not just single story files) as input
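
A sum range derived from individual ranges could look like the sketch below. The point ranges in `SIZE_POINTS` are placeholder values, not the team's calibration (which the spec leaves to sprint history), so the totals here are illustrative only.

```python
# Hypothetical story-point ranges per size label; a real team would
# calibrate these from its own sprint history.
SIZE_POINTS = {"S": (1, 1), "M": (1, 2), "L": (2, 4), "XL": (4, 8)}


def sprint_total(labels):
    """Sum the low and high bounds of each story's range separately."""
    low = sum(SIZE_POINTS[label][0] for label in labels)
    high = sum(SIZE_POINTS[label][1] for label in labels)
    return low, high


# Four stories estimated S, M, M, L under the placeholder ranges above.
total = sprint_total(["S", "M", "M", "L"])
```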
---
### Case 5: Gate Compliance — No gate; estimates are informational
**Fixture:**
- Story file exists with medium complexity
- `review-mode.txt` contains `full`
**Input:** `/estimate production/epics/core/story-item-pickup.md`
**Expected behavior:**
1. Skill reads story and sprint history; computes estimate
2. No director gate is invoked in any review mode
3. Estimate is presented as advisory output only
4. Skill notes: "Use this estimate in /sprint-plan when selecting stories for the next sprint"
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output is purely informational — no approval or write prompt
- [ ] Next-step recommendation references `/sprint-plan`
- [ ] Estimate does not change based on review mode
---
## Protocol Compliance
- [ ] Reads story file before estimating
- [ ] Reads sprint velocity history when available
- [ ] Produces effort range (S/M/L/XL), not a single number
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Always produces an estimate (never blocked by missing data; uses defaults instead)
---
## Coverage Notes
- The skill does not produce PASS/FAIL verdicts; the "verdict" here is the
effort range itself. Test assertions focus on the accuracy of the range
and the quality of the reasoning, not a binary outcome.
- Team-specific velocity calibration (what "M" means for this team) is an
implementation detail not tested here; it is configured via sprint history.


@@ -0,0 +1,171 @@
# Skill Test Spec: /perf-profile
## Skill Summary
`/perf-profile` is a structured performance profiling workflow that identifies
bottlenecks and recommends optimizations. If profiler data or performance logs
are provided, it analyzes them directly. If not, it guides the user through a
manual profiling checklist. No director gates are invoked. The skill asks
"May I write to `production/qa/perf-[date].md`?" before persisting a report.
Verdicts: WITHIN BUDGET, CONCERNS, or OVER BUDGET.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: WITHIN BUDGET, CONCERNS, OVER BUDGET
- [ ] Contains "May I write" language (skill writes perf report)
- [ ] Has a next-step handoff (what to do after performance findings are reviewed)
---
## Director Gate Checks
None. Performance profiling is an advisory analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Frame data provided, draw call spike found
**Fixture:**
- User provides `production/qa/profiler-export-2026-03-15.json` with frame time data
- Data shows: average frame time 14ms (within 16.6ms budget), but frames 42-48 spike to 28ms
- Spike correlates with a scene with 450 draw calls (budget: 200)
**Input:** `/perf-profile production/qa/profiler-export-2026-03-15.json`
**Expected behavior:**
1. Skill reads profiler data
2. Skill identifies average frame time is within budget
3. Skill identifies draw call spike on frames 42-48 (450 calls vs 200 budget)
4. Verdict is CONCERNS (average OK, but spikes indicate an issue)
5. Skill recommends batching or culling for the identified scene
6. Skill asks "May I write to `production/qa/perf-2026-04-06.md`?"
**Assertions:**
- [ ] Spike frames are identified by frame number
- [ ] Draw call count and budget are compared explicitly
- [ ] Verdict is CONCERNS when spikes exceed budget even if average is OK
- [ ] At least one specific optimization recommendation is given
- [ ] "May I write" prompt appears before writing report
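
One way to separate the three verdicts from raw frame times is sketched below. The spec does not define what counts as a spike, so `spike_factor` (a frame exceeding the budget by 50%) and the "more than half of all frames" rule for OVER BUDGET are assumptions made for illustration; `frame_verdict` is a hypothetical helper, not part of the skill.

```python
def frame_verdict(frame_ms, budget_ms, spike_factor=1.5):
    """Return (verdict, spike_frame_indices) for a list of frame times in ms.

    spike_factor is a hypothetical threshold: a frame counts as a spike
    when it exceeds budget_ms * spike_factor.
    """
    if not frame_ms:
        # With no data there is nothing to assess, so no verdict is produced.
        raise ValueError("no frame data provided")
    over = [i for i, ms in enumerate(frame_ms) if ms > budget_ms]
    spikes = [i for i, ms in enumerate(frame_ms) if ms > budget_ms * spike_factor]
    if len(over) > len(frame_ms) / 2:
        return "OVER BUDGET", spikes  # systemic: most frames miss the budget
    if spikes:
        return "CONCERNS", spikes     # isolated spikes, rest of the run is fine
    return "WITHIN BUDGET", spikes
```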
---
### Case 2: No Profiler Data — Manual checklist output
**Fixture:**
- User runs `/perf-profile` with no arguments
- No profiler data files exist in `production/qa/`
**Input:** `/perf-profile`
**Expected behavior:**
1. Skill finds no profiler data
2. Skill outputs a manual profiling checklist for the user to work through:
- Enable the Godot profiler (or the target engine's profiler)
- Record a 60-second play session
- Export frame time data
- Note any dropped frames or hitches
3. Skill asks user to provide data once collected before running analysis
**Assertions:**
- [ ] Skill does not crash when no data is provided
- [ ] Manual profiling checklist is output (actionable steps, not just an error)
- [ ] No verdict is emitted (there is nothing to assess yet)
- [ ] No files are written
---
### Case 3: Over Budget — Frame budget exceeded for target platform
**Fixture:**
- Profiler data shows consistent 22ms frame times (target: 16.6ms for 60fps)
- All frames exceed budget; no single spike — systemic issue
- `technical-preferences.md` specifies target platform: PC, 60fps
**Input:** `/perf-profile production/qa/profiler-export-2026-03-20.json`
**Expected behavior:**
1. Skill reads profiler data and technical preferences for performance budget
2. All frames are over the 16.6ms budget
3. Verdict is OVER BUDGET
4. Skill outputs a prioritized optimization list (e.g., LOD system, shader complexity, physics tick rate)
5. Skill asks "May I write" before writing report
**Assertions:**
- [ ] Verdict is OVER BUDGET when all or most frames exceed budget
- [ ] Target frame budget is read from `technical-preferences.md` (not hardcoded)
- [ ] Optimization priority list is provided, not just the raw verdict
- [ ] "May I write" prompt appears before report write
---
### Case 4: Previous Perf Report Exists — Delta comparison
**Fixture:**
- `production/qa/perf-2026-03-28.md` exists with prior results (avg 15ms, max 19ms)
- New profiler export shows: avg 13ms, max 17ms
- Both reports are for the same scene
**Input:** `/perf-profile production/qa/profiler-export-2026-04-05.json`
**Expected behavior:**
1. Skill reads new profiler data
2. Skill detects prior report for the same scene
3. Skill computes deltas: avg improved 2ms, max improved 2ms
4. Skill presents regression check: no regressions detected
5. Verdict is WITHIN BUDGET; report notes improvement since last profile
**Assertions:**
- [ ] Skill checks `production/qa/` for prior perf reports before writing
- [ ] Delta comparison is shown (prior vs. current for key metrics)
- [ ] Verdict is WITHIN BUDGET when current metrics are within budget
- [ ] Improvement trend is noted positively in the report
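
The delta computation is simple arithmetic: current minus prior for each shared metric, with a positive delta meaning a regression because lower frame times are better. A minimal sketch, assuming both reports have already been parsed into dictionaries (the key names `avg_ms` and `max_ms` are hypothetical):

```python
def metric_deltas(prior, current):
    """Return (deltas, regressions) for metrics where lower is better."""
    # Only compare metrics present in both reports.
    deltas = {key: current[key] - prior[key] for key in prior if key in current}
    # A positive delta means the metric got worse since the last report.
    regressions = sorted(key for key, delta in deltas.items() if delta > 0)
    return deltas, regressions


# Values from this case's fixture: avg 15ms -> 13ms, max 19ms -> 17ms.
deltas, regressions = metric_deltas(
    {"avg_ms": 15, "max_ms": 19},
    {"avg_ms": 13, "max_ms": 17},
)
```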
---
### Case 5: Gate Compliance — No gate; performance-analyst separate
**Fixture:**
- Profiler data shows CONCERNS-level findings (some spikes)
- `review-mode.txt` contains `full`
**Input:** `/perf-profile production/qa/profiler-export-2026-04-01.json`
**Expected behavior:**
1. Skill analyzes profiler data; verdict is CONCERNS
2. No director gate is invoked regardless of review mode
3. Output notes: "For in-depth analysis, consider running `/perf-profile` with the performance-analyst agent"
4. Skill asks "May I write" and writes report on user approval
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Performance-analyst consultation is suggested (not mandated)
- [ ] "May I write" prompt appears before report write
- [ ] Verdict is CONCERNS for spike-based findings
---
## Protocol Compliance
- [ ] Reads profiler data when provided; outputs checklist when not
- [ ] Reads `technical-preferences.md` for target platform frame budget
- [ ] Checks for prior perf reports to enable delta comparison
- [ ] Always asks "May I write" before writing report
- [ ] No director gates are invoked
- [ ] Verdict is one of: WITHIN BUDGET, CONCERNS, OVER BUDGET
---
## Coverage Notes
- Platform-specific profiling workflows (console, mobile) are not tested here;
the checklist output in Case 2 would be platform-specific in practice.
- The delta comparison in Case 4 assumes reports cover the same scene; cross-scene
comparisons are not explicitly handled.


@@ -0,0 +1,168 @@
# Skill Test Spec: /scope-check
## Skill Summary
`/scope-check` is a Haiku-tier read-only skill that analyzes a feature, sprint,
or story for scope creep risk. It reads sprint and story files and compares them
against the active milestone goals. It is designed for fast, low-cost checks
before or during planning. No director gates are invoked. No files are written.
Verdicts: ON SCOPE, CONCERNS, or SCOPE CREEP DETECTED.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do based on verdict)
---
## Director Gate Checks
None. Scope check is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Sprint stories align with milestone goals
**Fixture:**
- `production/milestones/milestone-03.md` lists 3 goals: combat system, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories, all tagged to one of the 3 goals
- `production/session-state/active.md` references milestone-03 as the active milestone
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active milestone goals from milestone-03
2. Skill reads sprint-006 stories and checks each against milestone goals
3. All 5 stories map to one of the 3 goals
4. Skill outputs a mapping table: story → milestone goal
5. Verdict is ON SCOPE
**Assertions:**
- [ ] Each story is mapped to a milestone goal in the output
- [ ] Verdict is ON SCOPE when all stories map to milestone goals
- [ ] No files are written
- [ ] Skill does not modify sprint or milestone files
---
### Case 2: Scope Creep Detected — Stories introducing systems not in milestone
**Fixture:**
- `production/milestones/milestone-03.md` goals: combat, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories:
- 3 stories map to milestone goals
- 2 stories reference "online leaderboard" and "achievement system" (not in milestone-03)
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone goals and sprint stories
2. Skill identifies 2 stories with no matching milestone goal
3. Skill names the out-of-scope stories: "Online Leaderboard Feature", "Achievement System Setup"
4. Verdict is SCOPE CREEP DETECTED
**Assertions:**
- [ ] Out-of-scope stories are named explicitly in the output
- [ ] Verdict is SCOPE CREEP DETECTED when any story has no milestone goal match
- [ ] Skill does not automatically remove the stories — findings are advisory
- [ ] Output recommends deferring the out-of-scope stories to a later milestone
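
The mapping step behind Cases 1 and 2 can be sketched as a lookup of each story's goal tag against the milestone goals. This assumes stories carry an explicit goal tag, which the fixtures imply but do not specify in detail; `scope_check` and its data shapes are hypothetical. The empty-milestone branch mirrors the CONCERNS outcome of Case 3.

```python
def scope_check(stories, milestone_goals):
    """Map stories to milestone goals and derive a verdict.

    stories: dict of story title -> goal tag (or None if untagged).
    milestone_goals: collection of goal names, or None/empty when no
    milestone is active.
    Returns (verdict, mapping, out_of_scope_titles).
    """
    if not milestone_goals:
        # Scope cannot be validated without a milestone.
        return "CONCERNS", {}, []
    goals = set(milestone_goals)
    mapping = {title: goal for title, goal in stories.items() if goal in goals}
    unmapped = [title for title in stories if title not in mapping]
    verdict = "SCOPE CREEP DETECTED" if unmapped else "ON SCOPE"
    return verdict, mapping, unmapped
```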
---
### Case 3: No Milestone Defined — CONCERNS; scope cannot be validated
**Fixture:**
- `production/session-state/active.md` has no milestone reference
- `production/milestones/` directory exists but is empty
- `production/sprints/sprint-006.md` has 4 stories
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active.md — finds no milestone reference
2. Skill checks `production/milestones/` — no milestone files found
3. Skill outputs: "No active milestone defined — scope cannot be validated"
4. Verdict is CONCERNS
**Assertions:**
- [ ] Skill does not error when no milestone is defined
- [ ] Output explicitly states that scope validation requires a milestone reference
- [ ] Verdict is CONCERNS (not ON SCOPE or SCOPE CREEP DETECTED without data)
- [ ] Output suggests running `/milestone-review` or creating a milestone
---
### Case 4: Single Story Check — Evaluated against its parent epic
**Fixture:**
- User targets a single story: `production/epics/combat/story-parry-timing.md`
- Story references parent epic: `epic-combat.md`
- `production/epics/combat/epic-combat.md` has scope: "melee combat mechanics"
- Story title: "Implement parry timing window" — matches epic scope
**Input:** `/scope-check production/epics/combat/story-parry-timing.md`
**Expected behavior:**
1. Skill reads the specified story file
2. Skill reads the parent epic to get scope definition
3. Skill evaluates story against epic scope — "parry timing" matches "melee combat"
4. Verdict is ON SCOPE
**Assertions:**
- [ ] Single-file argument is accepted (story path, not sprint)
- [ ] Skill reads the parent epic referenced in the story file
- [ ] Story is evaluated against epic scope (not milestone scope) in single-story mode
- [ ] Verdict is ON SCOPE when story matches epic scope
---
### Case 5: Gate Compliance — No gate; Producer may be consulted separately
**Fixture:**
- Sprint has 2 SCOPE CREEP stories and 3 ON SCOPE stories
- `review-mode.txt` contains `full`
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone and sprint; identifies 2 scope creep items
2. No director gate is invoked regardless of review mode
3. Skill presents findings with SCOPE CREEP DETECTED verdict
4. Output notes: "Consider raising scope concerns with the Producer before sprint begins"
5. Skill ends without writing any files
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Producer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is SCOPE CREEP DETECTED
---
## Protocol Compliance
- [ ] Reads milestone goals and sprint/story files before analysis
- [ ] Maps each story to a milestone goal (or flags as unmapped)
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Runs on Haiku model tier (fast, low-cost)
- [ ] Verdict is one of: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
---
## Coverage Notes
- The case where the sprint file itself does not exist is not tested; the
skill would output a CONCERNS verdict with a message about missing sprint data.
- Partial scope overlap (story touches a milestone goal but also introduces
new scope) is not explicitly tested; implementation may classify this as
CONCERNS rather than SCOPE CREEP DETECTED.


@@ -0,0 +1,167 @@
# Skill Test Spec: /security-audit
## Skill Summary
`/security-audit` audits the game for security risks including save data
integrity, network communication, anti-cheat exposure, and data privacy. It
reads source files in `src/` for security patterns and checks whether sensitive
data is handled correctly. No director gates are invoked. The skill does not
write files (findings report only). Verdicts: SECURE, CONCERNS, or
VULNERABILITIES FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: SECURE, CONCERNS, VULNERABILITIES FOUND
- [ ] Does NOT require "May I write" language (read-only; findings report only)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Security audit is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Save data encrypted, no hardcoded credentials
**Fixture:**
- `src/core/save_system.gd` uses `Crypto` class to encrypt save data before writing
- No hardcoded API keys, passwords, or credentials in any `src/` file
- No version numbers or internal build IDs exposed in client-facing output
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` for security patterns: encryption usage, hardcoded credentials, exposed internals
2. All checks pass: save data encrypted, no credentials found, no exposed internals
3. Findings report shows all checks PASS
4. Verdict is SECURE
**Assertions:**
- [ ] Skill checks save data handling for encryption usage
- [ ] Skill scans for hardcoded credentials (API keys, passwords, tokens)
- [ ] Skill checks for version/build numbers exposed to players
- [ ] All checks shown in findings report
- [ ] Verdict is SECURE when all checks pass
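
A hardcoded-credential scan of the kind asserted above could be approximated with a line-by-line regular expression. The pattern below is an assumption for illustration and is far from exhaustive: it only catches a credential-like name assigned a quoted literal on the same line, and it would miss typed declarations such as `var password: String = "..."`.

```python
import re

# Hypothetical pattern: a credential-like identifier assigned a quoted
# string literal. A real audit would need a broader, tuned rule set.
CREDENTIAL_PATTERN = re.compile(
    r"""(?i)\b(api_key|password|secret|token)\b\s*[:=]\s*["'][^"']+["']"""
)


def find_hardcoded_credentials(path, source):
    """Return (path, line_number, line_text) for each suspicious line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CREDENTIAL_PATTERN.search(line):
            findings.append((path, lineno, line.strip()))
    return findings
```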
---
### Case 2: Vulnerabilities Found — Unencrypted save data and exposed version
**Fixture:**
- `src/core/save_system.gd` writes save data as plain JSON (no encryption)
- `src/ui/debug_overlay.gd` contains: `label.text = "Build: " + ProjectSettings.get("application/config/version")`
(exposes internal build version to player)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` — finds unencrypted save write in `save_system.gd`
2. Skill finds exposed version string in `debug_overlay.gd`
3. Both findings are flagged as VULNERABILITIES
4. Verdict is VULNERABILITIES FOUND
5. Skill provides remediation recommendations for each vulnerability
**Assertions:**
- [ ] Unencrypted save data is flagged as a vulnerability with file and approximate line
- [ ] Exposed version string is flagged as a vulnerability
- [ ] Remediation suggestion is given for each vulnerability
- [ ] Verdict is VULNERABILITIES FOUND when any vulnerability is detected
- [ ] No files are written or modified
---
### Case 3: Online Features Without Authentication — CONCERNS
**Fixture:**
- `src/networking/lobby.gd` exists with functions: `join_lobby()`, `send_chat()`
- No authentication check is found before `send_chat()` — players can call it without being verified
- Game has online multiplayer features (inferred from file presence)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/networking/` — detects online feature code
2. Skill checks for authentication guard before network calls — finds none on `send_chat()`
3. Flags: "Online feature without authentication check — CONCERNS"
4. Verdict is CONCERNS (not VULNERABILITIES FOUND, as this is a missing control, not an exploit)
**Assertions:**
- [ ] Skill detects online features by scanning for networking source files
- [ ] Missing authentication checks before network operations are flagged
- [ ] Verdict is CONCERNS (advisory severity) for missing authentication guards
- [ ] Output recommends adding authentication before network calls
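
Across Cases 1 to 3 the verdict follows the most severe finding: any vulnerability gives VULNERABILITIES FOUND, otherwise any concern gives CONCERNS, otherwise SECURE. A sketch of that aggregation, using hypothetical severity labels (`PASS`, `CONCERN`, `VULNERABILITY`) that the spec does not name:

```python
SEVERITY_RANK = {"PASS": 0, "CONCERN": 1, "VULNERABILITY": 2}
VERDICTS = ["SECURE", "CONCERNS", "VULNERABILITIES FOUND"]


def audit_verdict(findings):
    """findings: list of (check_name, severity) tuples.

    The caller is assumed to skip this step entirely when no source
    files were scanned, since that case emits no verdict.
    """
    worst = max((SEVERITY_RANK[severity] for _, severity in findings), default=0)
    return VERDICTS[worst]
```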
---
### Case 4: Edge Case — No Source Files to Analyze
**Fixture:**
- `src/` directory does not exist or is completely empty
**Input:** `/security-audit`
**Expected behavior:**
1. Skill attempts to scan `src/` — no files found
2. Skill outputs an error: "No source files found in `src/` — nothing to audit"
3. No findings report is generated
4. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when `src/` is empty or absent
- [ ] Output clearly states that no source files were found
- [ ] No verdict is emitted (there is nothing to assess)
- [ ] Skill suggests verifying the `src/` directory path
---
### Case 5: Gate Compliance — No gate; security-engineer invoked separately
**Fixture:**
- Source files exist; 1 CONCERNS-level finding detected (debug logging enabled in release build)
- `review-mode.txt` contains `full`
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans source; finds debug logging active in release path
2. No director gate is invoked regardless of review mode
3. Verdict is CONCERNS
4. Output notes: "For formal security review, consider engaging a security-engineer agent"
5. Findings are presented as a read-only report; no files written
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Security-engineer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is CONCERNS for advisory-level security findings
---
## Protocol Compliance
- [ ] Reads source files in `src/` before auditing
- [ ] Checks save data encryption, hardcoded credentials, exposed internals, auth guards
- [ ] Provides remediation recommendations for each finding
- [ ] Does not write any files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: SECURE, CONCERNS, VULNERABILITIES FOUND
---
## Coverage Notes
- Anti-cheat analysis (client-side value validation, server authority) is not
explicitly tested here; it follows the CONCERNS or VULNERABILITIES pattern
depending on severity.
- Data privacy compliance (GDPR, COPPA) is out of scope for this spec; those
require legal review beyond code scanning.


@@ -0,0 +1,171 @@
# Skill Test Spec: /tech-debt
## Skill Summary
`/tech-debt` tracks, categorizes, and prioritizes technical debt across the
codebase. It reads `docs/tech-debt-register.md` for the existing debt register
and scans source files in `src/` for inline `TODO` and `FIXME` comments. It
merges and sorts items by severity. No director gates are invoked. The skill
asks "May I write to `docs/tech-debt-register.md`?" before updating. Verdicts:
REGISTER UPDATED or NO NEW DEBT FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: REGISTER UPDATED, NO NEW DEBT FOUND
- [ ] Contains "May I write" language (skill writes to debt register)
- [ ] Has a next-step handoff (what to do after register is updated)
---
## Director Gate Checks
None. Tech debt tracking is an internal codebase analysis skill; no gates are
invoked.
---
## Test Cases
### Case 1: Happy Path — Inline TODOs plus existing register items merged
**Fixture:**
- `docs/tech-debt-register.md` exists with 2 items (LOW and MEDIUM severity)
- `src/gameplay/combat.gd` has 2 `# TODO` comments and 1 `# FIXME` comment
- `src/ui/hud.gd` has 0 inline debt comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads `docs/tech-debt-register.md` — finds 2 existing items
2. Skill scans `src/` — finds 3 inline comments (2 TODOs, 1 FIXME)
3. Skill checks whether inline comments already exist in the register (deduplication)
4. Skill presents combined list sorted by severity (FIXME before TODO by default)
5. Skill asks "May I write to `docs/tech-debt-register.md`?"
6. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] Inline comments are found by scanning `src/` recursively
- [ ] Existing register items are not duplicated
- [ ] Combined list is sorted by severity
- [ ] "May I write" prompt appears before any write
- [ ] Verdict is REGISTER UPDATED
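
The scan, deduplicate, and sort steps can be sketched as below. Several details are assumptions: the comment pattern, the use of `(path, text)` as the deduplication key, and the ordering rule (CRITICAL-tagged items first, then FIXME before TODO). The spec only states that FIXME sorts before TODO by default and that CRITICAL items surface at the top.

```python
import re

# Matches "# TODO: text", "# FIXME text", and "# FIXME(CRITICAL): text".
DEBT_PATTERN = re.compile(r"#\s*(TODO|FIXME)(?:\((\w+)\))?:?\s*(.*)")
KIND_ORDER = {"FIXME": 0, "TODO": 1}


def scan_debt(path, source, known=()):
    """Collect inline debt comments from one file's text.

    known: (path, text) pairs already present in the register; matching
    comments are skipped so they are not duplicated.
    """
    items = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DEBT_PATTERN.search(line)
        if not match:
            continue
        kind, tag, text = match.groups()
        text = text.strip()
        if (path, text) in known:
            continue
        items.append(
            {"path": path, "line": lineno, "kind": kind, "tag": tag, "text": text}
        )
    # CRITICAL-tagged items first, then FIXME before TODO, then by line.
    items.sort(
        key=lambda item: (item["tag"] != "CRITICAL", KIND_ORDER[item["kind"]], item["line"])
    )
    return items
```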
---
### Case 2: Register Doesn't Exist — Offered to create it
**Fixture:**
- `docs/tech-debt-register.md` does NOT exist
- `src/` contains 4 inline TODO/FIXME comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill attempts to read `docs/tech-debt-register.md` — not found
2. Skill informs user: "No tech-debt-register.md found"
3. Skill offers to create the register with the inline items it found
4. Skill asks "May I write to `docs/tech-debt-register.md`?" (create)
5. User approves; register created with 4 items; verdict REGISTER UPDATED
**Assertions:**
- [ ] Skill does not crash when register file is absent
- [ ] User is offered register creation (not silently skipping)
- [ ] "May I write" prompt reflects file creation (not update)
- [ ] Verdict is REGISTER UPDATED after creation
---
### Case 3: Resolved Item Detected — Marked resolved in register
**Fixture:**
- `docs/tech-debt-register.md` has 3 items; one references `src/gameplay/legacy_input.gd`
- `src/gameplay/legacy_input.gd` has been deleted (refactored away)
- The referenced TODO comment no longer exists in source
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads register — finds 3 items
2. Skill scans `src/` — does not find the source location referenced by item 2
3. Skill flags item 2 as RESOLVED (source is gone)
4. Skill presents the resolved item to user for confirmation
5. On approval, register is updated with item 2 marked `Status: Resolved`
**Assertions:**
- [ ] Skill checks whether each register item's source reference still exists
- [ ] Missing source locations result in items being flagged as RESOLVED
- [ ] User confirms before resolved items are written
- [ ] RESOLVED items are kept in the register (not deleted) for audit history
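
Resolution detection amounts to checking each register item's source reference against the files found during the scan. A minimal sketch, assuming register items are dictionaries with a hypothetical `source` key and that the set of existing source paths has already been collected; flagged items are kept rather than removed, matching the audit-history requirement above.

```python
def mark_resolved(register_items, existing_sources):
    """Return copies of register items, flagging those whose source is gone.

    register_items: list of dicts, each with a "source" path.
    existing_sources: set of source paths found during the scan.
    """
    updated = []
    for item in register_items:
        entry = dict(item)  # copy so the caller's data is not mutated
        if entry["source"] not in existing_sources:
            entry["status"] = "Resolved"
        updated.append(entry)
    return updated
```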
---
### Case 4: Edge Case — CRITICAL debt item surfaces prominently
**Fixture:**
- `src/core/network_sync.gd` has a comment: `# FIXME(CRITICAL): race condition in sync buffer — can corrupt save data`
- `docs/tech-debt-register.md` exists with 5 lower-severity items
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and finds the CRITICAL-tagged FIXME
2. Skill presents the CRITICAL item at the top of the output — before the full table
3. Skill asks user to acknowledge the critical item before proceeding
4. After acknowledgment, skill presents full debt table and asks to write
5. Register is updated with CRITICAL item at top; verdict REGISTER UPDATED
**Assertions:**
- [ ] CRITICAL items appear at the top of the output, not buried in the table
- [ ] Skill surfaces CRITICAL items before asking to write
- [ ] User acknowledgment of the CRITICAL item is requested
- [ ] CRITICAL severity is preserved in the written register entry
---
### Case 5: Gate Compliance — No gate; register updated only with approval
**Fixture:**
- Inline scan finds 2 new TODOs; register has 3 existing items
- `review-mode.txt` contains `full`
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and reads register; compiles combined debt list
2. No director gate is invoked regardless of review mode
3. Skill presents sorted debt table to user
4. Skill asks "May I write to `docs/tech-debt-register.md`?"
5. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Debt table is presented before any write prompt
- [ ] "May I write" prompt appears before file update
- [ ] Write only occurs with explicit user approval
---
## Protocol Compliance
- [ ] Reads `docs/tech-debt-register.md` and scans `src/` before compiling
- [ ] Deduplicates inline comments against existing register items
- [ ] Sorts combined list by severity
- [ ] Always asks "May I write" before updating register
- [ ] No director gates are invoked
- [ ] Verdict is REGISTER UPDATED or NO NEW DEBT FOUND
---
## Coverage Notes
- The case where `src/` is empty or absent is not tested; behavior follows
the NO NEW DEBT FOUND path for the inline scan, but register items would
still be read and presented.
- TODO comments without severity tags are treated as LOW severity by default;
this classification detail is an implementation concern, not tested here.


@@ -0,0 +1,175 @@
# Skill Test Spec: /test-evidence-review
## Skill Summary
`/test-evidence-review` performs a quality review of test files in `tests/`,
checking test naming conventions, determinism, isolation, and absence of
hardcoded magic numbers — all against the project's test standards defined in
`coding-standards.md`. Findings may be flagged for qa-lead review. No director
gates are invoked. The skill does not write without user approval. Verdicts:
PASS, WARNINGS, or FAIL.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, WARNINGS, FAIL
- [ ] Does NOT require "May I write" language (read-only; write is optional flagging report)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Test evidence review is an advisory quality skill; QL-TEST-COVERAGE gate
is a separate skill invocation and is NOT triggered here.
---
## Test Cases
### Case 1: Happy Path — Tests follow all standards
**Fixture:**
- `tests/unit/combat/health_system_take_damage_test.gd` exists with:
- Naming: `test_health_system_take_damage_reduces_health()` (follows `test_[system]_[scenario]_[expected]`)
- Arrange/Act/Assert structure present
- No `sleep()`, `await` with time values, or random seeds
- No calls to external APIs or file I/O
- No inline magic numbers (uses constants from `tests/unit/combat/fixtures/`)
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reads test standards from `coding-standards.md`
2. Skill reads the test file; checks all 5 standards
3. All checks pass: naming, structure, determinism, isolation, no hardcoded data
4. Verdict is PASS
**Assertions:**
- [ ] Each of the 5 test standards is checked and reported
- [ ] All checks show PASS when standards are met
- [ ] Verdict is PASS
- [ ] No files are written
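
The naming check can be approximated by counting snake_case segments in each test function name. This sketch assumes the `test_[system]_[scenario]_[expected]` convention implies at least three segments after `test_`; a regular expression cannot tell which segment plays which role, so it is a structural check only.

```python
import re

# At least three lowercase snake_case segments after the "test_" prefix.
TEST_NAME = re.compile(
    r"^[ \t]*func\s+(test_[a-z0-9]+(?:_[a-z0-9]+){2,})\s*\(", re.MULTILINE
)
# Any function whose name starts with "test".
ANY_TEST = re.compile(r"^[ \t]*func\s+(test\w*)\s*\(", re.MULTILINE)


def naming_violations(source):
    """Return test function names that do not fit the naming convention."""
    compliant = set(TEST_NAME.findall(source))
    return [name for name in ANY_TEST.findall(source) if name not in compliant]
```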
---
### Case 2: Fail — Timing dependency detected
**Fixture:**
- `tests/unit/ui/hud_update_test.gd` contains:
```gdscript
await get_tree().create_timer(1.0).timeout
assert_eq(label.text, "Ready")
```
- Real-time wait of 1 second used instead of mock or signal-based assertion
**Input:** `/test-evidence-review tests/unit/ui/hud_update_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects real-time wait (`create_timer(1.0)`) — non-deterministic timing dependency
3. Skill flags this as a FAIL-level finding
4. Verdict is FAIL
5. Skill recommends replacing the timer with a signal-based assertion or mock
**Assertions:**
- [ ] Real-time wait usage is detected as a non-deterministic timing dependency
- [ ] Finding is classified as FAIL severity (blocking — violates determinism standard)
- [ ] Verdict is FAIL
- [ ] Remediation suggestion references signal-based or mock-based approach
- [ ] Skill does not edit the test file
---
### Case 3: Fail — Test calls external API directly
**Fixture:**
- `tests/unit/networking/auth_test.gd` contains:
```gdscript
var result = HTTPRequest.new().request("https://api.example.com/auth")
```
- Direct HTTP call to external API without a mock
**Input:** `/test-evidence-review tests/unit/networking/auth_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects direct external API call (HTTPRequest to live URL)
3. Skill flags this as a FAIL-level finding — violates isolation standard
4. Verdict is FAIL
5. Skill recommends injecting a mock HTTP client
**Assertions:**
- [ ] Direct external API call is detected and flagged
- [ ] Finding is classified as FAIL severity (violates isolation standard)
- [ ] Verdict is FAIL
- [ ] Remediation references dependency injection with a mock HTTP client
- [ ] Skill does not modify the test file
---
### Case 4: Edge Case — No Test Files Found
**Fixture:**
- User calls `/test-evidence-review tests/unit/audio/`
- `tests/unit/audio/` directory does not exist
**Input:** `/test-evidence-review tests/unit/audio/`
**Expected behavior:**
1. Skill attempts to read files in `tests/unit/audio/` — not found
2. Skill outputs: "No test files found at `tests/unit/audio/` — run `/test-setup` to scaffold test directories"
3. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the message
- [ ] Output recommends `/test-setup` for scaffolding
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; QL-TEST-COVERAGE is a separate skill
**Fixture:**
- Test file has 1 WARNINGS-level finding (magic number in a non-boundary test)
- `review-mode.txt` contains `full`
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reviews tests; finds 1 WARNINGS-level finding
2. No director gate is invoked (QL-TEST-COVERAGE is invoked separately, not here)
3. Verdict is WARNINGS
4. Output notes: "For full test coverage gate, run `/gate-check` which invokes QL-TEST-COVERAGE"
5. Skill offers optional report write; asks "May I write" if user opts in
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Output distinguishes this skill from the QL-TEST-COVERAGE gate invocation
- [ ] Optional report requires "May I write" before writing
- [ ] Verdict is WARNINGS for advisory-level test quality issues
---
## Protocol Compliance
- [ ] Reads `coding-standards.md` test standards before reviewing test files
- [ ] Checks naming, Arrange/Act/Assert structure, determinism, isolation, no hardcoded data
- [ ] Does not edit any test files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: PASS, WARNINGS, FAIL
---
## Coverage Notes
- Batch review of all test files in `tests/` is not explicitly tested; behavior
is assumed to apply the same checks file by file and aggregate the verdict.
- The QL-TEST-COVERAGE director gate (which checks test coverage percentage) is
a separate concern and is intentionally NOT invoked by this skill.
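
The first coverage note assumes that a batch review aggregates per-file results into one verdict. One plausible reading, sketched here under the assumption that the most severe per-file verdict wins, is shown below. The severity ordering and the name `aggregate_verdict` are hypothetical; the spec does not define them.

```python
from typing import Optional

# Assumed severity order: FAIL outranks WARNINGS, which outranks PASS.
SEVERITY_RANK = {"PASS": 0, "WARNINGS": 1, "FAIL": 2}


def aggregate_verdict(file_verdicts: dict[str, str]) -> Optional[str]:
    """Return the most severe verdict, or None when nothing was reviewed."""
    if not file_verdicts:
        # Mirrors Case 4: no test files means no verdict is emitted.
        return None
    return max(file_verdicts.values(), key=SEVERITY_RANK.__getitem__)
```

Returning `None` for an empty input keeps the "no verdict when there is nothing to review" behavior separate from a genuine PASS.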

@@ -0,0 +1,177 @@
# Skill Test Spec: /test-flakiness
## Skill Summary
`/test-flakiness` detects non-deterministic tests by analyzing test history logs
(if available) or scanning test source code for common flakiness patterns (random
numbers without seeds, real-time waits, external I/O). No director gates are
invoked. The skill does not write without user approval. Verdicts: NO FLAKINESS,
SUSPECT TESTS FOUND, or CONFIRMED FLAKY.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after flakiness findings)
---
## Director Gate Checks
None. Flakiness detection is an advisory quality skill for the QA lead; no gates
are invoked.
---
## Test Cases
### Case 1: Happy Path — Clean test history, no flakiness
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- All tests pass consistently across all 10 runs (100% pass rate per test)
- No test has a failure pattern
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs from `production/qa/test-history/`
2. Skill computes per-test pass rate across 10 runs
3. All tests pass all 10 runs — no inconsistency detected
4. Verdict is NO FLAKINESS
**Assertions:**
- [ ] Skill reads test history logs when available
- [ ] Per-test pass rate is computed across all available runs
- [ ] Verdict is NO FLAKINESS when all tests pass consistently
- [ ] No files are written
---
### Case 2: Suspect Tests Found — Test fails intermittently in history
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- `test_combat_damage_applies_crit_multiplier` passes 7 times, fails 3 times
- Failure messages differ (sometimes timeout, sometimes wrong value)
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs — computes pass rates
2. `test_combat_damage_applies_crit_multiplier` has 70% pass rate (threshold: 95%)
3. Skill flags it as SUSPECT with pass rate (7/10) and failure pattern noted
4. Verdict is SUSPECT TESTS FOUND
5. Skill recommends investigating the test for timing or state dependencies
**Assertions:**
- [ ] Tests below the pass-rate threshold are flagged by name
- [ ] Pass rate (fraction and percentage) is shown for each suspect test
- [ ] Failure pattern (e.g., inconsistent error messages) is noted if detectable
- [ ] Verdict is SUSPECT TESTS FOUND
- [ ] Skill recommends investigation steps
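
The pass-rate check in this case can be sketched as follows. The log format is an assumption (each run is a mapping from test name to a pass/fail boolean), as are the function names; the 95% threshold is the value suggested in the expected behavior and is treated as tunable.

```python
from collections import defaultdict

PASS_RATE_THRESHOLD = 0.95  # suggested in this spec; not a fixed requirement


def pass_counts(runs: list[dict[str, bool]]) -> dict[str, tuple[int, int]]:
    """Map each test name to (passes, number of runs it appeared in)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for run in runs:
        for test_name, passed in run.items():
            counts[test_name][0] += int(passed)
            counts[test_name][1] += 1
    return {name: (p, t) for name, (p, t) in counts.items()}


def suspect_tests(runs: list[dict[str, bool]],
                  threshold: float = PASS_RATE_THRESHOLD) -> list[str]:
    """Return one report line per test whose pass rate is below the threshold."""
    report = []
    for name, (passed, total) in sorted(pass_counts(runs).items()):
        rate = passed / total
        if rate < threshold:
            # Show both the fraction and the percentage, as the assertions ask.
            report.append(f"{name}: {passed}/{total} ({rate:.0%})")
    return report
```

With the fixture above (7 passes, 3 failures), the flagged entry would carry both `7/10` and `70%`.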
---
### Case 3: Source Pattern — Random number used without seed
**Fixture:**
- No test history logs exist
- `tests/unit/loot/loot_drop_test.gd` contains:
```gdscript
var roll = randf() # unseeded random — non-deterministic
assert_gt(roll, 0.5, "Loot should drop above 50%")
```
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill finds no test history logs
2. Skill falls back to source code analysis
3. Skill detects `randf()` call without a preceding `seed()` call
4. Skill flags the test as FLAKINESS RISK (source pattern, not confirmed)
5. Verdict is SUSPECT TESTS FOUND (pattern detected, not confirmed by history)
6. Skill recommends seeding random before the call or mocking the random function
**Assertions:**
- [ ] Source code analysis is used as fallback when no history logs exist
- [ ] Unseeded random number usage is detected as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND (not CONFIRMED FLAKY — no history to confirm)
- [ ] Remediation recommends seeding or mocking
---
### Case 4: No Test History — Source-only analysis with common patterns
**Fixture:**
- `production/qa/test-history/` does not exist
- `tests/` contains 15 test files
- Scan finds 2 tests using `OS.get_ticks_msec()` for timing assertions
- No other flakiness patterns found
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill checks for test history — not found
2. Skill notes: "No test history available — analyzing source code for flakiness patterns only"
3. Skill scans all test files for known patterns: unseeded random, real-time waits, system clock usage, and external I/O
4. Finds 2 tests using `OS.get_ticks_msec()` — flags as FLAKINESS RISK
5. Verdict is SUSPECT TESTS FOUND
**Assertions:**
- [ ] Skill notes clearly that source-only analysis is being performed (no history)
- [ ] Common flakiness patterns are scanned: unseeded random, real-time waits, system clock usage, external I/O
- [ ] `OS.get_ticks_msec()` usage for assertions is flagged as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND when source patterns are found
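
A source-only scan for the patterns named in Cases 3 and 4 might look like the sketch below. It is a rough line-based heuristic, not the skill's actual detector: the pattern table, labels, and function name are hypothetical, and it only covers the three calls mentioned in these fixtures (`randf()`, `create_timer()`, `OS.get_ticks_msec()`).

```python
import re

# Illustrative pattern table; labels and regexes are assumptions.
FLAKINESS_PATTERNS = {
    "unseeded random": re.compile(r"\brandf\s*\("),
    "real-time wait": re.compile(r"\bcreate_timer\s*\("),
    "system clock": re.compile(r"\bOS\.get_ticks_msec\s*\("),
}
SEED_CALL = re.compile(r"\bseed\s*\(")


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for each risky line in a test file."""
    findings = []
    seeded = False
    for number, line in enumerate(source.splitlines(), start=1):
        # Naive comment stripping: also cuts '#' inside string literals.
        code = line.split("#", 1)[0]
        if SEED_CALL.search(code):
            seeded = True  # a preceding seed() call makes randf() repeatable
        for label, pattern in FLAKINESS_PATTERNS.items():
            if label == "unseeded random" and seeded:
                continue
            if pattern.search(code):
                findings.append((number, label))
    return findings
```

Findings from a scan like this would support SUSPECT TESTS FOUND at most, since there is no run history to confirm actual intermittent failures.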
---
### Case 5: Gate Compliance — No gate; flakiness report is advisory
**Fixture:**
- Test history shows 1 CONFIRMED FLAKY test (fails 6 out of 10 runs)
- `review-mode.txt` contains `full`
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill analyzes test history; identifies 1 confirmed flaky test
2. No director gate is invoked regardless of review mode
3. Verdict is CONFIRMED FLAKY
4. Skill presents findings and offers optional written report
5. If user opts in: "May I write to `production/qa/flakiness-report-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] CONFIRMED FLAKY verdict requires history-based evidence (not just source patterns)
- [ ] Optional report requires "May I write" before writing
- [ ] Flakiness report is advisory for qa-lead; skill does not auto-disable tests
---
## Protocol Compliance
- [ ] Reads test history logs when available; falls back to source analysis when not
- [ ] Notes clearly which analysis mode is being used (history vs. source-only)
- [ ] Flakiness threshold (e.g., 95% pass rate) is used for SUSPECT classification
- [ ] CONFIRMED FLAKY requires history evidence; source patterns alone yield at most SUSPECT TESTS FOUND
- [ ] Does not disable or modify any test files
- [ ] No director gates are invoked
- [ ] Verdict is one of: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
---
## Coverage Notes
- The pass-rate threshold for SUSPECT classification (95% suggested above) is an
implementation detail; the tests verify that intermittent failures are flagged,
not the exact threshold value.
- Tests that fail due to environment issues (missing assets, wrong platform) are
not flaky. The skill is expected to distinguish environment failures from
non-determinism in the test itself, but that distinction is not explicitly
tested here.

@@ -0,0 +1,197 @@
# Skill Test Spec: /architecture-decision
## Skill Summary
`/architecture-decision` guides the user through section-by-section authoring of
a new Architecture Decision Record (ADR). Required sections are: Status, Context,
Decision, Consequences, Alternatives, and Related ADRs. The skill also stamps the
engine version reference from `docs/engine-reference/` into the ADR for traceability.
In `full` review mode, TD-ADR (technical-director) and LP-FEASIBILITY
(lead-programmer) gate agents spawn after the draft is complete. If both gates
return APPROVED, the ADR status is set to Accepted. In `lean` or `solo` mode,
both gates are skipped and the ADR is written with Status: Proposed. The skill
asks "May I write" per section during authoring. ADRs are written to
`docs/architecture/adr-NNN-[name].md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: ACCEPTED, PROPOSED, CONCERNS
- [ ] Contains "May I write" collaborative protocol language (per-section approval)
- [ ] Has a next-step handoff at the end
- [ ] Documents gate behavior: TD-ADR + LP-FEASIBILITY in full mode; skipped in lean/solo
- [ ] Documents that ADR status is Accepted (full, gates approve) or Proposed (otherwise)
- [ ] Mentions engine version stamp from `docs/engine-reference/`
---
## Director Gate Checks
In `full` mode: TD-ADR (technical-director) and LP-FEASIBILITY (lead-programmer)
spawn after the ADR draft is complete. If both return APPROVED, ADR Status is set
to Accepted. If either returns CONCERNS or FAIL, ADR stays Proposed.
In `lean` mode: both gates are skipped. ADR is written with Status: Proposed.
Output notes: "TD-ADR skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
In `solo` mode: both gates are skipped. ADR is written with Status: Proposed.
Output notes each skipped gate by name with the solo mode label.
---
## Test Cases
### Case 1: Happy Path — New ADR for rendering approach, full mode, gates approve
**Fixture:**
- `docs/architecture/` exists with no existing ADR for rendering
- `docs/engine-reference/[engine]/VERSION.md` exists
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/architecture-decision rendering-approach`
**Expected behavior:**
1. Skill guides user through each required section (Status, Context, Decision, Consequences, Alternatives, Related ADRs)
2. Engine version is stamped into the ADR from `docs/engine-reference/`
3. For each section: draft shown, "May I write this section?" asked, approved
4. After all sections: TD-ADR and LP-FEASIBILITY gates spawn in parallel
5. Both gates return APPROVED
6. ADR Status is set to Accepted
7. Skill writes `docs/architecture/adr-NNN-rendering-approach.md`
8. `docs/architecture/tr-registry.yaml` is updated if new TR-IDs are defined
**Assertions:**
- [ ] All 6 required sections are authored and written
- [ ] Engine version reference is stamped in the ADR
- [ ] TD-ADR and LP-FEASIBILITY spawn in parallel (not sequentially)
- [ ] ADR Status is Accepted when both gates return APPROVED in full mode
- [ ] "May I write" is asked per section during authoring
- [ ] File is written to `docs/architecture/adr-NNN-[name].md`
---
### Case 2: Failure Path — TD-ADR returns CONCERNS
**Fixture:**
- ADR draft is complete (all sections filled)
- `production/session-state/review-mode.txt` contains `full`
- TD-ADR gate returns CONCERNS: "The decision does not address [specific concern]"
**Input:** `/architecture-decision [topic]`
**Expected behavior:**
1. TD-ADR gate spawns and returns CONCERNS with specific feedback
2. Skill surfaces the concerns to the user
3. ADR Status remains Proposed (not Accepted)
4. User is asked: revise the decision to address concerns, or accept as Proposed
5. ADR is written with Status: Proposed if concerns are not resolved
**Assertions:**
- [ ] TD-ADR concerns are shown to the user verbatim
- [ ] ADR Status is Proposed (not Accepted) when TD-ADR returns CONCERNS
- [ ] Skill does NOT set Status: Accepted while CONCERNS are unresolved
- [ ] User is given the option to revise and re-run the gate
---
### Case 3: Lean Mode — Both gates skipped; ADR written as Proposed
**Fixture:**
- `production/session-state/review-mode.txt` contains `lean`
- ADR draft is authored for a new technical decision
**Input:** `/architecture-decision [topic]`
**Expected behavior:**
1. Skill guides user through all 6 sections
2. After draft is complete: both TD-ADR and LP-FEASIBILITY are skipped
3. Output notes: "TD-ADR skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode"
4. ADR is written with Status: Proposed (not Accepted, since gates did not approve)
5. "May I write" is still asked before the final file write
**Assertions:**
- [ ] Both gate skip notes appear in output
- [ ] ADR Status is Proposed (not Accepted) in lean mode
- [ ] "May I write" is still asked before writing the file
- [ ] Skill writes the ADR after user approval
---
### Case 4: Edge Case — ADR already exists for this topic
**Fixture:**
- `docs/architecture/` contains an existing ADR covering the same topic
- The existing ADR has Status: Accepted
**Input:** `/architecture-decision [same-topic]`
**Expected behavior:**
1. Skill detects an existing ADR covering the same topic
2. Skill asks: "An ADR for [topic] already exists ([filename]). Update it, or create a new superseding ADR?"
3. User selects update or supersede
4. Skill does NOT silently create a duplicate ADR
**Assertions:**
- [ ] Skill detects the existing ADR before authoring begins
- [ ] User is offered update or supersede options — no silent duplicate
- [ ] If update: skill opens the existing ADR for section-by-section revision
- [ ] If supersede: new ADR references the superseded one in Related ADRs section
---
### Case 5: Director Gate — Status set correctly based on mode and gate outcome
**Fixture:**
- ADR draft is complete
- Three scenarios: (a) full mode, both gates APPROVED; (b) full mode, one gate CONCERNS; (c) lean or solo mode, gates skipped
**Full mode, both APPROVED:**
- ADR Status is set to Accepted
**Assertions (both approved):**
- [ ] ADR frontmatter/header shows `Status: Accepted`
- [ ] Both TD-ADR and LP-FEASIBILITY appear as APPROVED in output
**Full mode, one gate returns CONCERNS:**
- ADR Status stays Proposed
**Assertions (CONCERNS):**
- [ ] ADR frontmatter/header shows `Status: Proposed`
- [ ] Concerns are listed in output
- [ ] Skill does NOT set Status: Accepted when any gate returns CONCERNS
**Lean/solo mode:**
- ADR Status is always Proposed regardless of content quality
**Assertions (lean/solo):**
- [ ] ADR Status is Proposed in lean mode
- [ ] ADR Status is Proposed in solo mode
- [ ] No gate verdicts appear in lean or solo mode (only the skip notes)
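
The status rule exercised by this case reduces to a small decision function. The sketch below assumes gate results arrive as a mapping from gate ID to verdict string; the function name `adr_status` and that data shape are illustrative, not taken from the skill.

```python
REQUIRED_GATES = ("TD-ADR", "LP-FEASIBILITY")


def adr_status(review_mode: str, gate_verdicts: dict[str, str]) -> str:
    """Return 'Accepted' only in full mode with both gates APPROVED."""
    if review_mode != "full":
        # lean and solo skip both gates, so the ADR stays Proposed.
        return "Proposed"
    approved = all(gate_verdicts.get(gate) == "APPROVED" for gate in REQUIRED_GATES)
    return "Accepted" if approved else "Proposed"
```

A missing verdict is treated the same as a non-approval, so an ADR cannot become Accepted if one gate never ran.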
---
## Protocol Compliance
- [ ] All 6 required sections authored before gate review
- [ ] Engine version stamped in ADR from `docs/engine-reference/`
- [ ] "May I write" asked per section during authoring
- [ ] TD-ADR and LP-FEASIBILITY spawn in parallel in full mode
- [ ] Skipped gates noted by name and mode in lean/solo output
- [ ] ADR Status: Accepted only when full mode AND both gates APPROVED
- [ ] Ends with next-step handoff: `/architecture-review` or `/create-control-manifest`
---
## Coverage Notes
- ADR numbering (auto-incrementing NNN) is not independently fixture-tested —
the skill reads existing ADR filenames to assign the next number.
- Related ADRs section linking (supersedes / related-to) is tested structurally
via Case 4 but not all link types are individually verified.
- The TR-registry update (when new TR-IDs are defined in the ADR) is part of the
write phase — tested implicitly via Case 1.
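
Since ADR numbering is not fixture-tested, a minimal sketch of the "read existing filenames, assign the next number" behavior may help when writing such a test. It assumes the `adr-NNN-[name].md` pattern from this spec and three-digit zero padding; the helper names are hypothetical.

```python
import re

# Matches filenames such as adr-007-save-format.md (pattern from this spec).
ADR_FILENAME = re.compile(r"adr-(\d+)-.+\.md")


def next_adr_number(existing_filenames: list[str]) -> int:
    """Return one more than the highest ADR number found, or 1 if none exist."""
    numbers = [
        int(match.group(1))
        for name in existing_filenames
        if (match := ADR_FILENAME.fullmatch(name))
    ]
    return max(numbers, default=0) + 1


def adr_filename(number: int, slug: str) -> str:
    """Build a filename; three-digit padding is an assumption based on 'NNN'."""
    return f"adr-{number:03d}-{slug}.md"
```

Using the maximum rather than the file count avoids reusing a number when an earlier ADR has been deleted or renamed.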

@@ -0,0 +1,185 @@
# Skill Test Spec: /art-bible
## Skill Summary
`/art-bible` is a guided, section-by-section art bible authoring skill. It
produces a comprehensive visual direction document covering: Visual Style overview,
Color Palette, Typography, Character Design Rules, Environment Style, and UI
Visual Language. The skill follows the skeleton-first pattern: creates the file
with all section headers immediately, then fills each section through discussion
and writes each to disk after user approval.
In `full` review mode, the AD-ART-BIBLE director gate (art director) runs after
the draft is complete and before any section is written. In `lean` and `solo`
modes, AD-ART-BIBLE is skipped and only user approval is required. The verdict
is COMPLETE when all sections are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language per section
- [ ] Documents the AD-ART-BIBLE director gate and its mode behavior
- [ ] Has a next-step handoff (e.g., `/asset-spec` or `/design-system`)
---
## Director Gate Checks
| Gate ID      | Trigger condition       | Mode guard                |
|--------------|-------------------------|---------------------------|
| AD-ART-BIBLE | After draft is complete | full only (not lean/solo) |
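
The mode guard in the table can be read as a lookup from gate ID to the review modes in which it runs. The following sketch assumes the review mode is the single word stored in `review-mode.txt`; `GATE_MODES`, `parse_review_mode`, and `gate_runs` are hypothetical names used only for illustration.

```python
VALID_MODES = {"full", "lean", "solo"}

# Mode guards as listed in the table; extend with other gates as needed.
GATE_MODES = {
    "AD-ART-BIBLE": {"full"},
}


def parse_review_mode(file_text: str) -> str:
    """Normalize the contents of review-mode.txt and validate the value."""
    mode = file_text.strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown review mode: {mode!r}")
    return mode


def gate_runs(gate_id: str, review_mode: str) -> bool:
    """Return True when the gate should be invoked in this review mode."""
    return review_mode in GATE_MODES.get(gate_id, set())
```

An unknown gate ID never runs in this sketch; a real implementation might prefer to raise an error instead.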
---
## Test Cases
### Case 1: Happy Path — Full mode, art bible drafted, AD-ART-BIBLE approves
**Fixture:**
- No existing `design/art-bible.md`
- `production/session-state/review-mode.txt` contains `full`
- `design/gdd/game-concept.md` exists with visual tone described
**Input:** `/art-bible`
**Expected behavior:**
1. Skill creates skeleton `design/art-bible.md` with all section headers
2. Skill discusses and drafts each section with user collaboration
3. After all sections are drafted, AD-ART-BIBLE gate is invoked (art director review)
4. AD-ART-BIBLE returns APPROVED
5. Skill asks "May I write section [N] to `design/art-bible.md`?" per section
6. All sections written after approval; verdict is COMPLETE
**Assertions:**
- [ ] Skeleton file is created first (before any section content is written)
- [ ] AD-ART-BIBLE gate is invoked in full mode after draft is complete
- [ ] Gate approval precedes the "May I write" section asks
- [ ] All sections are present in the final file
- [ ] Verdict is COMPLETE
---
### Case 2: AD-ART-BIBLE Returns CONCERNS — Section revised before writing
**Fixture:**
- Art bible draft complete
- `production/session-state/review-mode.txt` contains `full`
- AD-ART-BIBLE gate returns CONCERNS: "Color palette clashes with the dark
atmospheric tone described in the game concept"
**Input:** `/art-bible`
**Expected behavior:**
1. AD-ART-BIBLE gate returns CONCERNS with specific feedback about palette
2. Skill surfaces feedback to user: "Art director has concerns about the color palette"
3. Skill returns to the Color Palette section for revision
4. User and skill revise the palette to align with game concept tone
5. AD-ART-BIBLE is not re-invoked (user decides to proceed after revision)
6. Revised section is written after "May I write" approval; verdict is COMPLETE
**Assertions:**
- [ ] CONCERNS are shown to user before any section is written
- [ ] Skill returns to the affected section for revision (not all sections)
- [ ] Revised content (not original) is written to file
- [ ] Verdict is COMPLETE after revision and approval
---
### Case 3: Lean Mode — AD-ART-BIBLE Skipped, Written With User Approval Only
**Fixture:**
- No existing art bible
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/art-bible`
**Expected behavior:**
1. Skill reads review mode — determines `lean`
2. Skill drafts all sections with user collaboration
3. AD-ART-BIBLE gate is skipped: output notes "[AD-ART-BIBLE] skipped — lean mode"
4. Skill asks user for direct approval of each section
5. Sections are written after user confirmation; verdict is COMPLETE
**Assertions:**
- [ ] AD-ART-BIBLE gate is NOT invoked in lean mode
- [ ] Skip is explicitly noted: "[AD-ART-BIBLE] skipped — lean mode"
- [ ] User approval is still required per section (gate skip ≠ approval skip)
- [ ] Verdict is COMPLETE
---
### Case 4: Existing Art Bible — Retrofit Mode
**Fixture:**
- `design/art-bible.md` already exists with all sections populated
- User wants to update the Character Design Rules section
**Input:** `/art-bible`
**Expected behavior:**
1. Skill reads existing art bible and detects all sections populated
2. Skill offers retrofit: "Art bible exists — which section would you like to update?"
3. User selects Character Design Rules
4. Skill drafts updated content; in full mode, AD-ART-BIBLE is invoked for the
revised section before writing
5. Skill asks "May I write Character Design Rules to `design/art-bible.md`?"
6. Only that section is updated; other sections preserved; verdict is COMPLETE
**Assertions:**
- [ ] Existing art bible is detected and retrofit is offered
- [ ] Only the selected section is updated
- [ ] In full mode: AD-ART-BIBLE gate runs even for single-section retrofit
- [ ] Other sections are preserved
- [ ] Verdict is COMPLETE
---
### Case 5: Solo Mode — AD-ART-BIBLE Skipped, Noted in Output
**Fixture:**
- No existing art bible
- `production/session-state/review-mode.txt` contains `solo`
**Input:** `/art-bible`
**Expected behavior:**
1. Skill reads review mode — determines `solo`
2. Art bible is drafted and written with only user approval
3. AD-ART-BIBLE gate is skipped: output notes "[AD-ART-BIBLE] skipped — solo mode"
4. No director agents are spawned
5. Verdict is COMPLETE
**Assertions:**
- [ ] AD-ART-BIBLE gate is NOT invoked in solo mode
- [ ] Skip is explicitly noted with "solo mode" label
- [ ] No director agents of any kind are spawned
- [ ] Verdict is COMPLETE
---
## Protocol Compliance
- [ ] Creates skeleton file immediately with all section headers
- [ ] Discusses and drafts one section at a time
- [ ] AD-ART-BIBLE gate runs in full mode after all sections are drafted
- [ ] AD-ART-BIBLE is skipped in lean and solo modes — noted by name
- [ ] Asks "May I write section [N]" per section
- [ ] Verdict is COMPLETE when all sections are written
---
## Coverage Notes
- The case where AD-ART-BIBLE returns REJECT (not just CONCERNS) is not
separately tested; the skill would block writing and ask the user how to
proceed (revise or override).
- The Typography section is listed as a required art bible section but its
specific content requirements are not assertion-tested here.
- The art bible feeds into `/asset-spec` — this relationship is noted in the
handoff but not tested as part of this skill's spec.

@@ -0,0 +1,187 @@
# Skill Test Spec: /create-architecture
## Skill Summary
`/create-architecture` guides the user through section-by-section authoring of a
technical architecture document. It uses a skeleton-first approach — the file is
created with all required section headers before any content is filled. Each
section is discussed, drafted, and written individually after user approval. If an
architecture document already exists, the skill offers retrofit mode to update
specific sections.
In `full` review mode, TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
(lead-programmer) spawn after the complete draft is finished. In `lean` or `solo`
mode, both gates are skipped. The skill writes to `docs/architecture/architecture.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Contains "May I write" collaborative protocol language (per-section approval)
- [ ] Has a next-step handoff at the end (`/architecture-review` or `/create-control-manifest`)
- [ ] Documents skeleton-first approach
- [ ] Documents gate behavior: TD-ARCHITECTURE + LP-FEASIBILITY in full mode; skipped in lean/solo
- [ ] Documents retrofit mode for existing architecture documents
---
## Director Gate Checks
In `full` mode: TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
(lead-programmer) spawn in parallel after all sections are drafted and before
any final approval write.
In `lean` mode: both gates are skipped. Output notes:
"TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
In `solo` mode: both gates are skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — New architecture doc, skeleton-first, full mode gates approve
**Fixture:**
- No existing `docs/architecture/architecture.md`
- `docs/architecture/` contains Accepted ADRs for reference
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/create-architecture`
**Expected behavior:**
1. Skill creates skeleton `docs/architecture/architecture.md` with all required section headers
2. For each section: drafts content, shows draft, asks "May I write [section]?", writes after approval
3. After all sections are drafted: TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel
4. Both gates return APPROVED
5. Skill asks a final "May I confirm architecture is complete?"
6. Session state is updated
**Assertions:**
- [ ] Skeleton file is created with all section headers before any content is written
- [ ] "May I write [section]?" asked per section during authoring
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel (not sequentially)
- [ ] Both gates complete before the final completion confirmation
- [ ] Verdict is APPROVED when both gates return APPROVED
- [ ] Next-step handoff to `/architecture-review` or `/create-control-manifest` is present
---
### Case 2: Failure Path — TD-ARCHITECTURE returns MAJOR REVISION
**Fixture:**
- Architecture doc is fully drafted (all sections)
- `production/session-state/review-mode.txt` contains `full`
- TD-ARCHITECTURE gate returns MAJOR REVISION: "[specific structural issue]"
**Input:** `/create-architecture`
**Expected behavior:**
1. All sections are drafted and written
2. TD-ARCHITECTURE gate runs and returns MAJOR REVISION with specific feedback
3. Skill surfaces the feedback to the user
4. Architecture is NOT marked as finalized
5. User is asked: revise the flagged sections, or accept the document as a draft
**Assertions:**
- [ ] Architecture is NOT marked finalized when TD-ARCHITECTURE returns MAJOR REVISION
- [ ] Gate feedback is shown to the user with specific issue descriptions
- [ ] User is given the option to revise specific sections
- [ ] Skill does NOT auto-finalize despite MAJOR REVISION feedback
---
### Case 3: Lean Mode — Both gates skipped; architecture written with user approval only
**Fixture:**
- No existing architecture doc
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/create-architecture`
**Expected behavior:**
1. Skeleton file is created
2. All sections are authored and written per-section with user approval
3. After completion: TD-ARCHITECTURE and LP-FEASIBILITY are skipped
4. Output notes: "TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode"
5. Architecture is considered complete based on user approval alone
**Assertions:**
- [ ] Both gate skip notes appear in output
- [ ] Architecture document is written with only user approval in lean mode
- [ ] Skill does NOT block completion because gates were skipped
- [ ] Next-step handoff is still present
---
### Case 4: Retrofit Mode — Existing architecture doc, user updates a section
**Fixture:**
- `docs/architecture/architecture.md` already exists with all sections populated
**Input:** `/create-architecture`
**Expected behavior:**
1. Skill detects existing architecture doc and reads its current content
2. Skill offers retrofit mode: "Architecture doc already exists. Which section would you like to update?"
3. User selects a section
4. Skill authors only that section, asks "May I write [section]?"
5. Only the selected section is updated — other sections unchanged
**Assertions:**
- [ ] Skill detects and reads the existing architecture doc before offering retrofit
- [ ] User is asked which section to update — not asked to rewrite the whole document
- [ ] Only the selected section is updated
- [ ] Other sections are not modified during a retrofit session
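
The "only the selected section is updated" requirement can be illustrated with a section-replacement helper. This is a sketch under the assumption that sections are level-2 Markdown headings (`## Name`); the actual heading level used by the architecture document is not stated in this spec, and `replace_section` is a hypothetical name.

```python
import re


def replace_section(document: str, heading: str, new_body: str) -> str:
    """Replace the body under '## heading', leaving other sections untouched."""
    pattern = re.compile(
        rf"(^## {re.escape(heading)}[ \t]*\n)(.*?)(?=^## |\Z)",
        flags=re.MULTILINE | re.DOTALL,
    )
    if not pattern.search(document):
        raise KeyError(f"section not found: {heading}")
    # A function replacement avoids backslash handling in the new body.
    return pattern.sub(
        lambda m: m.group(1) + new_body.rstrip("\n") + "\n\n",
        document,
        count=1,
    )
```

A test for this case could compare every other section byte for byte before and after the retrofit to confirm nothing else changed.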
---
### Case 5: Director Gate — Architecture references a Proposed ADR; flagged as risk
**Fixture:**
- Architecture doc is being authored
- One section references or depends on an ADR that has `Status: Proposed`
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/create-architecture`
**Expected behavior:**
1. Skill authors all sections
2. During authoring, skill detects a reference to a Proposed ADR
3. Skill flags: "Note: [section] references ADR-NNN which is Proposed — this is a risk until the ADR is accepted"
4. Risk flag is embedded in the relevant section's content
5. TD-ARCHITECTURE and LP-FEASIBILITY still run — they are informed of the Proposed ADR risk
**Assertions:**
- [ ] Proposed ADR reference is detected and flagged during section authoring
- [ ] Risk note is embedded in the architecture document section
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY still spawn (the risk does not block the gates)
- [ ] Risk flag names the specific ADR number and title
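
Detecting a reference to a Proposed ADR could be sketched as a text scan plus a status lookup. The `ADR-NNN` reference format comes from the expected flag message; the index structure (number string mapped to title and status) and the function name are assumptions made for this example.

```python
import re

ADR_REFERENCE = re.compile(r"\bADR-(\d+)\b")


def proposed_adr_references(section_text: str,
                            adr_index: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (adr_id, title) for each referenced ADR whose status is Proposed.

    adr_index maps an ADR number string (e.g. "004") to (title, status).
    References to ADRs missing from the index are ignored in this sketch.
    """
    risks = []
    for number in sorted(set(ADR_REFERENCE.findall(section_text))):
        if number in adr_index:
            title, status = adr_index[number]
            if status == "Proposed":
                risks.append((f"ADR-{number}", title))
    return risks
```

Returning both the ID and the title gives the caller what it needs to satisfy the assertion that the risk flag names the specific ADR number and title.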
---
## Protocol Compliance
- [ ] Skeleton file created with all section headers before any content is written
- [ ] "May I write [section]?" asked per section during authoring
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel in full mode
- [ ] Skipped gates noted by name and mode in lean/solo output
- [ ] Proposed ADR references flagged as risks in the document
- [ ] Ends with next-step handoff: `/architecture-review` or `/create-control-manifest`
---
## Coverage Notes
- The required section list for architecture documents is defined in the skill
body and in the `/architecture-review` skill — not re-enumerated here.
- Engine version stamping in the architecture doc (parallel to ADR stamping)
is part of the authoring workflow — tested implicitly via Case 1.
- The retrofit mode for updating multiple sections in one session follows the
same per-section approval pattern — not independently tested for multi-section
retrofits.

@@ -0,0 +1,192 @@
# Skill Test Spec: /design-system
## Skill Summary
`/design-system` guides the user through section-by-section authoring of a Game
Design Document (GDD) for a single game system. All 8 required sections must be
authored: Overview, Player Fantasy, Detailed Rules, Formulas, Edge Cases,
Dependencies, Tuning Knobs, and Acceptance Criteria. The skill uses a
skeleton-first approach — it creates the GDD file with all 8 section headers
before filling any content — and writes each section individually after approval.
The CD-GDD-ALIGN gate (creative-director) runs in both `full` AND `lean` modes.
It is only skipped in `solo` mode. If an existing GDD file is found, the skill
offers a retrofit mode to update specific sections rather than rewriting the whole
document.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION
- [ ] Contains "May I write" collaborative protocol language (per-section approval)
- [ ] Has a next-step handoff at the end
- [ ] Documents skeleton-first approach (file created with headers before content)
- [ ] Documents CD-GDD-ALIGN gate: active in full AND lean mode; skipped in solo only
- [ ] Documents retrofit mode for existing GDD files
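
A rough sketch of how structural assertions like these might be checked. The spec does not describe how `/skill-test static` works internally, so everything here is an assumption: that the skill file starts with frontmatter between two `---` lines, that a "phase heading" is any level-2 Markdown heading, and that keyword checks are plain substring matches. `static_check` is a hypothetical helper name.

```python
import re

REQUIRED_FIELDS = ["name", "description", "argument-hint",
                   "user-invocable", "allowed-tools"]


def static_check(skill_text: str, verdict_keywords: list[str]) -> list[str]:
    """Return a list of failed structural assertions (empty means all pass)."""
    failures = []

    # Assumed layout: frontmatter sits between the first two '---' lines.
    match = re.match(r"^---\n(.*?)\n---\n", skill_text, re.DOTALL)
    frontmatter = match.group(1) if match else ""
    keys = {line.split(":", 1)[0].strip()
            for line in frontmatter.splitlines() if ":" in line}
    for field in REQUIRED_FIELDS:
        if field not in keys:
            failures.append(f"missing frontmatter field: {field}")

    body = skill_text[match.end():] if match else skill_text
    # Assumption: a "phase heading" is any level-2 markdown heading.
    if len(re.findall(r"^## ", body, re.MULTILINE)) < 2:
        failures.append("fewer than 2 phase headings")

    for keyword in verdict_keywords:
        if keyword not in body:
            failures.append(f"missing verdict keyword: {keyword}")

    if "May I write" not in body:
        failures.append('missing "May I write" language')
    return failures
```

For `/design-system` the keyword list would be `["APPROVED", "NEEDS REVISION", "MAJOR REVISION"]`.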
---
## Director Gate Checks
In `full` mode: CD-GDD-ALIGN (creative-director) gate runs after each section is
drafted, before writing. If MAJOR REVISION is returned, the section must be
rewritten before proceeding.
In `lean` mode: CD-GDD-ALIGN still runs (this gate is NOT skipped in lean mode —
it runs in both full and lean). Only solo mode skips it.
In `solo` mode: CD-GDD-ALIGN is skipped. Output notes:
"CD-GDD-ALIGN skipped — solo mode". Sections are written with only user approval.
---
## Test Cases
### Case 1: Happy Path — New GDD, skeleton-first, CD-GDD-ALIGN in lean mode
**Fixture:**
- No existing GDD for the target system in `design/gdd/`
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Skill creates skeleton file `design/gdd/[system-name].md` with all 8 section headers (empty bodies)
2. For each section: discusses with user, drafts content, shows draft
3. CD-GDD-ALIGN gate runs on each section draft (lean mode — gate is active)
4. Gate returns APPROVED for each section
5. "May I write [section]?" asked after gate approval
6. Section written to file after user approval
7. Process repeats for all 8 sections
**Assertions:**
- [ ] Skeleton file is created with all 8 section headers before any content is written
- [ ] CD-GDD-ALIGN runs on each section in lean mode (not skipped)
- [ ] "May I write" is asked per section (not once for all sections)
- [ ] Each section is written individually after gate + user approval
- [ ] All 8 sections are present in the final GDD file
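
The skeleton-first step can be pictured as follows. This is a minimal sketch: the section names come from the Skill Summary, while the heading level, the title line, and the `build_skeleton` helper are assumptions rather than the skill's real template.

```python
GDD_SECTIONS = [
    "Overview", "Player Fantasy", "Detailed Rules", "Formulas",
    "Edge Cases", "Dependencies", "Tuning Knobs", "Acceptance Criteria",
]


def build_skeleton(system_name: str) -> str:
    """Return GDD skeleton text: a title plus 8 empty section headers."""
    lines = [f"# {system_name}", ""]
    for section in GDD_SECTIONS:
        # Assumed heading level; bodies are intentionally left empty.
        lines += [f"## {section}", ""]
    return "\n".join(lines)
```

Writing this text to `design/gdd/[system-name].md` before any drafting is what lets the assertions distinguish "header present, body empty" from "section missing".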
---
### Case 2: Retrofit Mode — Existing GDD, update specific section
**Fixture:**
- `design/gdd/[system-name].md` already exists with all 8 sections populated
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Skill detects existing GDD file and reads its current content
2. Skill offers retrofit mode: "GDD already exists. Which section would you like to update?"
3. User selects a specific section (e.g., Formulas)
4. Skill authors only that section, runs CD-GDD-ALIGN, asks "May I write?"
5. Only the selected section is updated — other sections are not modified
**Assertions:**
- [ ] Skill detects and reads existing GDD before offering retrofit mode
- [ ] User is asked which section to update — not asked to rewrite the whole document
- [ ] Only the selected section is rewritten — others remain unchanged
- [ ] CD-GDD-ALIGN still runs on the updated section
- [ ] "May I write" is asked before updating the section
---
### Case 3: Director Gate — CD-GDD-ALIGN returns MAJOR REVISION
**Fixture:**
- New GDD being authored
- `production/session-state/review-mode.txt` contains `lean`
- CD-GDD-ALIGN gate returns MAJOR REVISION on the Player Fantasy section
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Player Fantasy section is drafted
2. CD-GDD-ALIGN gate runs and returns MAJOR REVISION with specific feedback
3. Skill surfaces the feedback to the user
4. Section is NOT written to file while MAJOR REVISION is unresolved
5. User rewrites the section in collaboration with the skill
6. CD-GDD-ALIGN runs again on the revised section
7. If revised section passes, "May I write?" is asked and section is written
**Assertions:**
- [ ] Section is NOT written when CD-GDD-ALIGN returns MAJOR REVISION
- [ ] Gate feedback is shown to the user before requesting revision
- [ ] CD-GDD-ALIGN runs again after the section is revised
- [ ] Skill does NOT auto-proceed to the next section while MAJOR REVISION is unresolved
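
One way to read the blocking rule in Case 3 is as a loop over successive drafts. The sketch below is hypothetical: `author_section`, `gate`, and `user_approves` are illustrative names, and it treats only MAJOR REVISION as blocking because that is the only blocking verdict this case describes.

```python
from typing import Callable, Iterable, Optional


def author_section(drafts: Iterable[str],
                   gate: Callable[[str], str],
                   user_approves: Callable[[str], bool]) -> Optional[str]:
    """Return the draft that should be written, or None if nothing is written.

    Each draft is sent to the gate. A MAJOR REVISION verdict blocks the write
    and the next (revised) draft is tried instead.
    """
    for draft in drafts:
        verdict = gate(draft)
        if verdict == "MAJOR REVISION":
            continue  # feedback goes to the user; nothing is written yet
        # Gate passed: the "May I write?" question still has to be answered.
        return draft if user_approves(draft) else None
    return None  # still unresolved, so the section stays unwritten
```

The gate is called again for every revised draft, and the function never returns a draft that received MAJOR REVISION.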
---
### Case 4: Solo Mode — CD-GDD-ALIGN skipped; sections written with user approval only
**Fixture:**
- New GDD being authored
- `production/session-state/review-mode.txt` contains `solo`
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Skeleton file is created with 8 section headers
2. For each section: drafted, shown to user
3. CD-GDD-ALIGN is skipped — noted per section: "CD-GDD-ALIGN skipped — solo mode"
4. "May I write [section]?" asked after user reviews draft
5. Section written after user approval
6. No gate review at any stage
**Assertions:**
- [ ] "CD-GDD-ALIGN skipped — solo mode" noted for each section
- [ ] Sections are written after user approval alone (no gate required)
- [ ] Skill does NOT spawn any CD-GDD-ALIGN gate in solo mode
- [ ] Full GDD is written with only user approval in solo mode
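
The mode rule exercised by Cases 1, 3, and 4 reduces to a single predicate. A minimal sketch, assuming the mode string is read from `production/session-state/review-mode.txt`; the helper name and the decision to reject unknown modes are assumptions.

```python
def cd_gdd_align_runs(review_mode: str) -> bool:
    """CD-GDD-ALIGN is active in full and lean modes and skipped only in solo."""
    mode = review_mode.strip().lower()  # tolerate a trailing newline in the file
    if mode not in {"full", "lean", "solo"}:
        # Assumption: unknown modes are rejected rather than silently defaulted.
        raise ValueError(f"unknown review mode: {review_mode!r}")
    return mode != "solo"
```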
---
### Case 5: Director Gate — Empty sections not written to file
**Fixture:**
- GDD authoring in progress
- User and skill discuss one section but do not produce any approved content
(e.g., discussion ends without a decision, or user says "skip for now")
**Input:** `/design-system [system-name]`
**Expected behavior:**
1. Section discussion produces no approved content
2. Skill does NOT write an empty or placeholder body to the section
3. The section header remains in the skeleton file but the body stays empty
4. Skill moves to the next section without writing the empty one
5. At the end, incomplete sections are listed and user is reminded to return to them
**Assertions:**
- [ ] Empty or unapproved sections are NOT written to the file
- [ ] Skeleton section header remains (preserves structure)
- [ ] Skill tracks and lists incomplete sections at the end of the session
- [ ] Skill does NOT write "TBD" or placeholder content without user approval
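
Tracking incomplete sections at the end of a session could be done by scanning the skeleton file for headers with empty bodies. This is a sketch under the assumption that each GDD section is a level-2 Markdown heading; `incomplete_sections` is a hypothetical helper.

```python
def incomplete_sections(gdd_text: str) -> list[str]:
    """List level-2 sections whose body is empty or whitespace only."""
    empty, current, has_body = [], None, False
    for line in gdd_text.splitlines():
        if line.startswith("## "):
            # Close out the previous section before starting a new one.
            if current is not None and not has_body:
                empty.append(current)
            current, has_body = line[3:].strip(), False
        elif current is not None and line.strip():
            has_body = True
    if current is not None and not has_body:
        empty.append(current)
    return empty
```

Because the headers stay in the file, the returned list is exactly the set of sections the user should be reminded to revisit.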
---
## Protocol Compliance
- [ ] Skeleton file created with all 8 headers before any content is written
- [ ] CD-GDD-ALIGN runs in both full AND lean mode (not just full)
- [ ] CD-GDD-ALIGN skipped only in solo mode — noted per section
- [ ] "May I write [section]?" asked per section (not once for the whole document)
- [ ] MAJOR REVISION from CD-GDD-ALIGN blocks section write until resolved
- [ ] Only approved, non-empty sections are written to the file
- [ ] Ends with next-step handoff: `/review-all-gdds` or `/map-systems next`
---
## Coverage Notes
- The 8 required sections are validated against the project's design document
standards defined in `CLAUDE.md` — not re-enumerated here.
- The skill's internal section-ordering logic (which section to author first) is
not independently tested — the order follows the standard GDD template.
- Pillar alignment checking within CD-GDD-ALIGN is evaluated holistically by
the gate agent — specific pillar checks are not fixture-tested here.

@@ -0,0 +1,176 @@
# Skill Test Spec: /quick-design
## Skill Summary
`/quick-design` produces a lightweight design spec for features too small to
warrant a full 8-section GDD. The target scope is under 4 hours of design time
for a single-system feature. Instead of the full 8-section GDD format, the
quick-design spec uses a streamlined 3-section format: Overview, Rules, and
Acceptance Criteria.
The skill has no director gates — adding gate overhead would defeat the purpose
of a lightweight design tool. The skill asks "May I write" before writing the
design note to `design/quick-notes/[name].md`. If the feature scope is too large
for a quick-design, the skill redirects to `/design-system` instead.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CREATED, BLOCKED, REDIRECTED
- [ ] Contains "May I write" collaborative protocol language (for quick-note file)
- [ ] Has a next-step handoff at the end
- [ ] Explicitly notes: no director gates (lightweight skill by design)
- [ ] Mentions scope check: redirects to `/design-system` if scope exceeds sub-4h threshold
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. The lightweight
nature of quick-design means director gate overhead is intentionally absent.
Full GDD review is not needed for sub-4-hour single-system features.
---
## Test Cases
### Case 1: Happy Path — Small UI change produces a 3-section spec
**Fixture:**
- No existing quick-note for the target feature
- Feature is clearly scoped: a single UI element change with no cross-system impact
**Input:** `/quick-design [feature-name]`
**Expected behavior:**
1. Skill asks scoping questions: what system, what change, what is the acceptance signal
2. Skill determines scope is within the sub-4h threshold
3. Skill drafts a 3-section spec: Overview, Rules, Acceptance Criteria
4. Draft is shown to user
5. "May I write `design/quick-notes/[name].md`?" is asked
6. File is written after approval
**Assertions:**
- [ ] Spec contains exactly 3 sections: Overview, Rules, Acceptance Criteria
- [ ] Draft is shown to user before "May I write" ask
- [ ] "May I write `design/quick-notes/[name].md`?" is asked before writing
- [ ] File is written to the correct path: `design/quick-notes/[name].md`
- [ ] Verdict is CREATED after successful write
---
### Case 2: Failure Path — Scope check fails; redirected to /design-system
**Fixture:**
- Feature described spans multiple systems or would take more than 4 hours of design time
(e.g., "redesign the entire combat system" or "new progression mechanic affecting all classes")
**Input:** `/quick-design [large-feature]`
**Expected behavior:**
1. Skill asks scoping questions
2. Skill determines scope exceeds the sub-4h / single-system threshold
3. Skill outputs: "This feature is too large for a quick-design. Use `/design-system [name]` for a full GDD."
4. Skill does NOT write a quick-note file
5. Verdict is REDIRECTED
**Assertions:**
- [ ] Skill detects the scope excess and stops before drafting
- [ ] Message explicitly names `/design-system` as the correct alternative
- [ ] No quick-note file is written
- [ ] Verdict is REDIRECTED (not CREATED or BLOCKED)
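
The scope check in Cases 1 and 2 can be sketched as a threshold test. The two thresholds (under 4 hours, single system) come from this spec; the numeric inputs, the `scope_verdict` name, and the `"PROCEED"` label are assumptions, since the Coverage Notes describe the real check as a judgment call.

```python
def scope_verdict(estimated_design_hours: float, systems_touched: int) -> str:
    """Return "PROCEED" for quick-design scope, else "REDIRECTED"."""
    # Thresholds taken from the spec: under 4 hours and a single system.
    if estimated_design_hours < 4 and systems_touched <= 1:
        return "PROCEED"
    # Too large: the skill should point the user at /design-system instead.
    return "REDIRECTED"
```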
---
### Case 3: Edge Case — File already exists; offered to update
**Fixture:**
- `design/quick-notes/[name].md` already exists from a previous session
**Input:** `/quick-design [name]`
**Expected behavior:**
1. Skill detects existing quick-note file and reads its current content
2. Skill asks: "[name].md already exists. Update it, or create a new version?"
3. User selects update
4. Skill shows the existing spec and asks which section to revise
5. Updated spec is shown, "May I write?" asked, file updated after approval
**Assertions:**
- [ ] Skill detects and reads the existing file before offering to update
- [ ] User is offered update or create-new options — not auto-overwritten
- [ ] Only the revised section is updated (or the whole spec if user chooses full rewrite)
- [ ] "May I write" is asked before overwriting the existing file
---
### Case 4: Edge Case — No argument provided
**Fixture:**
- `design/quick-notes/` directory may or may not exist
**Input:** `/quick-design` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Skill outputs a usage error: "No feature name specified. Usage: /quick-design [feature-name]"
3. Skill provides an example: `/quick-design pause-menu-settings`
4. No file is created
**Assertions:**
- [ ] Skill outputs a usage error when no argument is given
- [ ] A usage example is shown with the correct format
- [ ] No quick-note file is written
- [ ] Skill does NOT silently pick a feature name or default to any action
---
### Case 5: Director Gate — No gate spawned; explicitly noted for sub-4h features
**Fixture:**
- Feature is within scope for quick-design
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/quick-design [feature-name]`
**Expected behavior:**
1. Skill asks scoping questions and determines scope is within threshold
2. Skill does NOT read `production/session-state/review-mode.txt`
3. Skill does NOT spawn any director gate agent
4. Spec is drafted, "May I write" asked, file written after approval
5. Output explicitly notes: "No director gate review — quick-design is for sub-4h features"
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains a note explaining why no gate review is needed
- [ ] Review mode has no effect on this skill's behavior
- [ ] Full GDD review path (`/design-system`) is mentioned as the alternative for larger features
---
## Protocol Compliance
- [ ] Scope check runs before drafting (redirects to `/design-system` if scope too large)
- [ ] 3-section format used (Overview, Rules, Acceptance Criteria) — NOT the 8-section GDD format
- [ ] Draft shown to user before "May I write" ask
- [ ] "May I write `design/quick-notes/[name].md`?" asked before writing
- [ ] No director gates are spawned and `review-mode.txt` is not read
- [ ] Ends with next-step handoff (e.g., proceed to implementation or `/dev-story`)
---
## Coverage Notes
- The scope threshold heuristic (sub-4h, single-system) is a judgment call —
the skill's internal check is the authoritative definition and is not
independently tested by counting hours.
- The `design/quick-notes/` directory is created automatically if it does not
exist — this filesystem behavior is not independently tested here.
- Integration with the story pipeline (can a quick-design generate a story
directly?) is out of scope for this spec — quick-designs are standalone.

@@ -0,0 +1,176 @@
# Skill Test Spec: /ux-design
## Skill Summary
`/ux-design` is a guided, section-by-section UX spec authoring skill. It produces
user flow diagrams (described textually), interaction state definitions, wireframe
descriptions, and accessibility notes for a specified screen or HUD element. The
skill follows the skeleton-first pattern: it creates the file with all section
headers immediately, then fills each section through discussion and writes each
section to disk after user approval.
The skill has no inline director gates — `/ux-review` is the separate review step.
Each section requires a "May I write section [N] to [filepath]?" ask. If a UX spec
already exists for the named screen, the skill offers to retrofit individual sections
rather than replace the whole spec. Verdict is COMPLETE when all sections are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language per section
- [ ] Has a next-step handoff (e.g., `/ux-review` to validate the completed spec)
---
## Director Gate Checks
None. `/ux-design` has no inline director gates. `/ux-review` is the separate
review skill invoked after this skill completes.
---
## Test Cases
### Case 1: Happy Path — New HUD spec, all sections authored and written
**Fixture:**
- No existing HUD UX spec in `design/ux/`
- Engine and rendering preferences configured
**Input:** `/ux-design hud`
**Expected behavior:**
1. Skill creates a skeleton file `design/ux/hud.md` with all section headers
2. Skill discusses and drafts each section: User Flows, Interaction States
(normal/hover/focus/disabled/error), Wireframe Description, Accessibility Notes
3. After each section is drafted and user confirms, skill asks "May I write
section [N] to `design/ux/hud.md`?"
4. Each section is written in sequence after approval
5. After all sections are written, verdict is COMPLETE
6. Skill suggests running `/ux-review` as the next step
**Assertions:**
- [ ] Skeleton file is created first (with empty section bodies)
- [ ] "May I write section [N]" is asked per section (not once at the end)
- [ ] All required sections are present: User Flows, Interaction States,
Wireframe Description, Accessibility Notes
- [ ] Handoff to `/ux-review` is at the end
- [ ] Verdict is COMPLETE
---
### Case 2: Existing UX Spec — Retrofit: user picks section to update
**Fixture:**
- `design/ux/hud.md` already exists with all sections populated
- User wants to update only the Accessibility Notes section
**Input:** `/ux-design hud`
**Expected behavior:**
1. Skill reads existing `design/ux/hud.md` and detects all sections are populated
2. Skill reports: "UX spec already exists for HUD — offering to retrofit"
3. Skill lists all sections and asks which to update
4. User selects Accessibility Notes
5. Skill drafts updated accessibility content and asks "May I write section
Accessibility Notes to `design/ux/hud.md`?"
6. Only that section is updated; other sections are preserved; verdict is COMPLETE
**Assertions:**
- [ ] Existing spec is detected and retrofit is offered
- [ ] User selects which section(s) to update
- [ ] Only the selected section is updated — other sections unchanged
- [ ] "May I write" is asked for the updated section
- [ ] Verdict is COMPLETE
---
### Case 3: Dependency Gap — Spec references a system with no design doc
**Fixture:**
- User is authoring a UX spec for the inventory screen
- `design/gdd/inventory.md` does not exist
**Input:** `/ux-design inventory-screen`
**Expected behavior:**
1. Skill begins authoring the inventory screen UX spec
2. During the User Flows section, skill attempts to reference inventory system rules
3. Skill detects: "No GDD found for inventory system — UX spec has a DEPENDENCY GAP"
4. The dependency gap is flagged in the spec (noted inline: "DEPENDENCY GAP: inventory GDD")
5. Skill continues authoring with placeholder notes for the missing rules
6. Verdict is COMPLETE with advisory note about the dependency gap
**Assertions:**
- [ ] DEPENDENCY GAP label appears in the spec for the missing system doc
- [ ] Skill does NOT block on the missing GDD — it continues with placeholders
- [ ] Dependency gap is also noted in the skill output (not just in the file)
- [ ] Handoff suggests both `/ux-review` and writing the missing GDD
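
Detecting the dependency gap in Case 3 amounts to a file-existence check. The sketch below assumes each referenced system maps to `design/gdd/<system>.md`, following the fixture path; the `dependency_gaps` helper and its signature are hypothetical.

```python
from pathlib import Path


def dependency_gaps(project_root: Path, systems: list[str]) -> list[str]:
    """Return inline gap labels for systems that have no GDD file."""
    gaps = []
    for system in systems:
        if not (project_root / "design" / "gdd" / f"{system}.md").is_file():
            # Label format follows the inline note quoted in the case.
            gaps.append(f"DEPENDENCY GAP: {system} GDD")
    return gaps
```

The function reports gaps instead of raising, in line with the assertion that the skill continues with placeholders rather than blocking.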
---
### Case 4: No Argument Provided — Usage error
**Fixture:**
- No argument provided with the skill invocation
**Input:** `/ux-design`
**Expected behavior:**
1. Skill detects no screen name or argument provided
2. Skill outputs a usage error: "Screen name required. Usage: `/ux-design [screen-name]`"
3. Skill provides examples: `/ux-design hud`, `/ux-design main-menu`, `/ux-design inventory`
4. No file is created; no "May I write" is asked
**Assertions:**
- [ ] Usage error is clearly stated
- [ ] Example invocations are provided
- [ ] No file is created
- [ ] Skill does not attempt to proceed without an argument
---
### Case 5: Director Gate Check — No gate; ux-review is the separate review skill
**Fixture:**
- New screen spec with argument provided
**Input:** `/ux-design settings-menu`
**Expected behavior:**
1. Skill authors all sections of the settings menu UX spec
2. No director agents are spawned
3. No gate IDs appear in output during authoring
**Assertions:**
- [ ] No director gate is invoked during ux-design
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Creates skeleton file with all section headers before discussing content
- [ ] Discusses and drafts one section at a time
- [ ] Asks "May I write section [N]" after each section is approved
- [ ] Detects existing spec and offers retrofit path
- [ ] Ends with handoff to `/ux-review`
- [ ] Verdict is COMPLETE when all sections are written
---
## Coverage Notes
- Interaction state enumeration (normal/hover/focus/disabled/error) is a core
requirement of each spec; the `/ux-review` skill checks for completeness.
- Wireframe descriptions are text-only (no images); image references may be
added manually by a designer after the fact.
- Responsive layout concerns (different screen sizes) are noted as optional
content and not assertion-tested here.

@@ -0,0 +1,176 @@
# Skill Test Spec: /ux-review
## Skill Summary
`/ux-review` validates an existing UX spec or HUD design document against
accessibility and interaction standards. It checks for required sections
(User Flows, Interaction States, Wireframe Description, Accessibility Notes),
completeness of interaction state definitions (normal, hover, focus, disabled, error),
accessibility compliance (keyboard navigation, color contrast notes, screen
reader considerations), and consistency with the art bible or design system
if those documents exist.
The skill is read-only — it produces no file writes. Verdicts: APPROVED
(all checks pass), NEEDS REVISION (fixable issues found), or MAJOR REVISION
NEEDED (structural or accessibility failures). No director gates apply —
`/ux-review` IS the review gate for UX specs.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff (e.g., back to `/ux-design` for revision, or proceed to implementation)
---
## Director Gate Checks
None. `/ux-review` is itself the review gate for UX specs. No additional director
gates are invoked within this skill.
---
## Test Cases
### Case 1: Happy Path — Complete UX spec with all required sections, APPROVED
**Fixture:**
- `design/ux/hud.md` exists with all required sections populated:
  - User Flows: complete player flow diagrams
  - Interaction States: normal, hover, focus, disabled, error all defined
  - Wireframe Description: layout described
  - Accessibility Notes: keyboard nav, contrast ratios, screen reader notes
**Input:** `/ux-review hud`
**Expected behavior:**
1. Skill reads `design/ux/hud.md`
2. Skill checks all 4 required sections — all present and non-empty
3. Skill checks interaction states — all 5 states defined
4. Skill checks accessibility notes — keyboard, contrast, and screen reader covered
5. Skill outputs: checklist of all passed checks
6. Verdict is APPROVED
**Assertions:**
- [ ] All 4 required sections are checked
- [ ] All 5 interaction states are verified present
- [ ] Verdict is APPROVED
- [ ] No files are written
---
### Case 2: Missing Accessibility Section — NEEDS REVISION
**Fixture:**
- `design/ux/hud.md` exists but the Accessibility Notes section is empty
- All other sections are fully populated
**Input:** `/ux-review hud`
**Expected behavior:**
1. Skill reads the file and checks all sections
2. Accessibility Notes section is empty — check fails
3. Skill outputs: "NEEDS REVISION — Accessibility Notes section is empty"
4. Skill lists specific items to add: keyboard navigation, color contrast ratios,
screen reader labels
5. Verdict is NEEDS REVISION
6. Handoff suggests returning to `/ux-design hud` to fill in the section
**Assertions:**
- [ ] NEEDS REVISION verdict is returned (not APPROVED or MAJOR REVISION NEEDED)
- [ ] Specific missing content items are listed
- [ ] Handoff points back to `/ux-design hud` for revision
- [ ] No files are written
---
### Case 3: Interaction States Incomplete — NEEDS REVISION
**Fixture:**
- `design/ux/settings-menu.md` exists
- Interaction States section only defines: normal and hover
- Missing: focus, disabled, error states
**Input:** `/ux-review settings-menu`
**Expected behavior:**
1. Skill reads the file and checks interaction states
2. Only 2 of 5 required states are defined
3. Skill reports: "NEEDS REVISION — Interaction states incomplete: missing focus, disabled, error"
4. Verdict is NEEDS REVISION with specific missing states named
**Assertions:**
- [ ] NEEDS REVISION verdict returned
- [ ] All 3 missing states are named explicitly in the output
- [ ] Skill does not return MAJOR REVISION NEEDED for a fixable gap
- [ ] Handoff suggests returning to `/ux-design settings-menu`
---
### Case 4: File Not Found — Error with remediation
**Fixture:**
- `design/ux/inventory-screen.md` does not exist
**Input:** `/ux-review inventory-screen`
**Expected behavior:**
1. Skill attempts to read `design/ux/inventory-screen.md` — file not found
2. Skill outputs: "UX spec not found: design/ux/inventory-screen.md"
3. Skill suggests running `/ux-design inventory-screen` to create the spec first
4. No review is performed; no verdict is issued
**Assertions:**
- [ ] Error message names the missing file with full path
- [ ] `/ux-design inventory-screen` is suggested as the remediation
- [ ] No review checklist is produced
- [ ] No verdict is issued (error state, not APPROVED/NEEDS REVISION)
---
### Case 5: Director Gate Check — No gate; ux-review is itself the review
**Fixture:**
- Valid UX spec file
**Input:** `/ux-review hud`
**Expected behavior:**
1. Skill performs the review and issues a verdict
2. No additional director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED — no gate verdict
---
## Protocol Compliance
- [ ] Checks all 4 required sections (User Flows, Interaction States, Wireframe,
Accessibility Notes)
- [ ] Checks all 5 interaction states (normal, hover, focus, disabled, error)
- [ ] Checks accessibility coverage (keyboard nav, contrast, screen reader)
- [ ] Does not write any files
- [ ] Issues specific, actionable feedback when verdict is not APPROVED
- [ ] Ends with next-step handoff to `/ux-design` for revision or implementation
---
## Coverage Notes
- MAJOR REVISION NEEDED is triggered when structural sections are entirely
absent (not just empty) or when fundamental interaction flows are missing
entirely; not tested with a separate fixture here.
- Art bible / design system consistency check (color palette alignment) is
mentioned as a capability but not separately fixture-tested.
- The case where an existing spec was written for a now-renamed screen is
not tested; the skill would review the file by path regardless of the name.
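
The distinction drawn in the first note above (absent sections versus empty ones) can be sketched as a small verdict function. It is an assumption-laden illustration: `review_verdict` is hypothetical, the input is taken to be an already parsed mapping of section titles to bodies, and the interaction-state and accessibility checks are left out.

```python
from typing import Optional

REQUIRED_SECTIONS = ["User Flows", "Interaction States",
                     "Wireframe Description", "Accessibility Notes"]


def review_verdict(sections: dict[str, Optional[str]]) -> str:
    """Map section presence to a verdict.

    `sections` maps a section title to its body; a missing key or None means
    the section is absent from the file, "" means present but empty.
    """
    bodies = [sections.get(name) for name in REQUIRED_SECTIONS]
    if any(body is None for body in bodies):
        return "MAJOR REVISION NEEDED"  # structural section entirely absent
    if any(not body.strip() for body in bodies):
        return "NEEDS REVISION"  # section exists but has no content
    return "APPROVED"
```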

@@ -0,0 +1,200 @@
# Skill Test Spec: /gate-check
## Skill Summary
`/gate-check` validates whether the project is ready to advance to the next
development phase. It checks for required artifacts, runs quality checks, asks
the user about unverifiable items, and produces a PASS/CONCERNS/FAIL verdict.
On PASS with user confirmation, it writes the new stage name to
`production/stage.txt`. It governs all 6 phase transitions and is the most
critical gate-keeping skill in the pipeline.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (numbered Phase N or ## sections)
- [ ] Contains verdict keywords: PASS, CONCERNS, FAIL
- [ ] Contains "May I write" collaborative protocol language
- [ ] Has a next-step handoff at the end (Follow-Up Actions section)
---
## Test Cases
### Case 1: Happy Path — All Concept artifacts present, advancing to Systems Design
**Fixture:**
- `design/gdd/game-concept.md` exists and has content, including all required sections
- `design/gdd/game-pillars.md` exists (or pillars defined within concept doc)
- No systems index yet (which is correct for this stage)
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill reads `design/gdd/game-concept.md` and verifies it has content
2. Skill checks for game pillars (in concept or separate file)
3. Skill checks quality items (core loop described, target audience identified)
4. Skill outputs structured checklist with all items marked
5. Skill presents PASS/CONCERNS/FAIL verdict
6. If PASS: skill asks "May I update `production/stage.txt` to 'Systems Design'?"
**Assertions:**
- [ ] Skill uses Glob or Read to verify `design/gdd/game-concept.md` exists before marking it checked
- [ ] Output includes a "Required Artifacts" section with check status per item
- [ ] Output includes a "Quality Checks" section with check status per item
- [ ] Output includes a "Verdict" line with one of PASS / CONCERNS / FAIL
- [ ] Skill asks about unverifiable quality items (e.g., "Has this been reviewed?") rather than assuming PASS
- [ ] Skill asks "May I write" before updating `production/stage.txt`
- [ ] Skill does NOT write `production/stage.txt` without explicit user confirmation
---
### Case 2: Failure Path — Missing required artifacts for Concept → Systems Design
**Fixture:**
- `design/gdd/game-concept.md` does NOT exist
- No game pillars document exists
- `design/gdd/` directory is empty or absent
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill attempts to read `design/gdd/game-concept.md` — file not found
2. Skill marks the required artifact as missing
3. Skill outputs FAIL verdict
4. Skill lists blocker: "No game concept document found"
5. Skill suggests remediation: run `/brainstorm` to create one
**Assertions:**
- [ ] Verdict is FAIL (not PASS or CONCERNS) when required artifacts are missing
- [ ] Output explicitly names `design/gdd/game-concept.md` as missing
- [ ] Output includes a "Blockers" section with at least 1 item
- [ ] Output recommends `/brainstorm` as the remediation action
- [ ] Skill does NOT write `production/stage.txt` when verdict is FAIL
---
### Case 3: No Argument — Auto-detect current stage
**Fixture:**
- `production/stage.txt` contains `Concept`
- `design/gdd/game-concept.md` exists with content
- No systems index yet
**Input:** `/gate-check` (no argument)
**Expected behavior:**
1. Skill reads `production/stage.txt` to determine current stage
2. Skill determines the next gate is Concept → Systems Design
3. Skill proceeds with the Systems Design gate checks
4. Output clearly states which transition is being validated
**Assertions:**
- [ ] Skill reads `production/stage.txt` (or uses project-stage-detect heuristics) to determine current stage
- [ ] Output header names both current and target phases (e.g., "Gate Check: Concept → Systems Design")
- [ ] Skill does not ask the user which gate to check if current stage is determinable
---
### Case 4: Edge Case — Manual check items flagged correctly
**Fixture:**
- All required artifacts for Concept → Systems Design are present
- No playtest or review record exists (quality checks cannot be auto-verified)
**Input:** `/gate-check systems-design`
**Expected behavior:**
1. Skill verifies all artifact files exist
2. Skill encounters quality check: "Game concept reviewed (not MAJOR REVISION NEEDED)"
3. Since no review record exists, skill marks item as MANUAL CHECK NEEDED
4. Skill asks the user: "Has the game concept been reviewed for design quality?"
5. Skill waits for user input before finalizing verdict
**Assertions:**
- [ ] Items that cannot be auto-verified are marked `[?] MANUAL CHECK NEEDED` rather than assumed PASS
- [ ] Skill asks the user about at least one unverifiable quality item
- [ ] Skill does not mark unverifiable items as PASS by default
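The rule in Case 4 (never assume PASS for an item that cannot be verified) can be sketched as a three-way classification. This is a minimal illustration, not the skill's implementation: the `classify` and `manual_items` helpers are hypothetical, and representing an unverifiable check as `None` is an assumption.

```python
def classify(auto_result):
    """Map an automated check result to a report label.

    auto_result is True (verified), False (verified failing), or None
    when the check cannot be verified automatically (assumed encoding).
    """
    if auto_result is None:
        # Unverifiable items are flagged, never defaulted to PASS.
        return "[?] MANUAL CHECK NEEDED"
    return "PASS" if auto_result else "FAIL"


def manual_items(checks):
    """Return the labels of checks that need a question to the user."""
    return [label for label, result in checks.items() if result is None]


checks = {
    "design/gdd/game-concept.md exists": True,
    "Game concept reviewed (not MAJOR REVISION NEEDED)": None,
}
report = {label: classify(result) for label, result in checks.items()}
```

Under these assumptions, the verdict would only be finalized once every label returned by `manual_items` has been answered by the user.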
---
### Case 5: Director Gate — lean vs full vs solo mode
**Fixture:**
- `production/session-state/review-mode.txt` exists (or equivalent state file)
- All required artifacts for the target gate are present
- `design/gdd/game-concept.md` exists
**Case 5a — full mode:**
- `review-mode.txt` contains `full`
**Input:** `/gate-check systems-design` (with full mode active)
**Expected behavior:**
1. Skill reads review mode — determines `full`
2. Skill spawns all 4 PHASE-GATE director prompts in parallel:
- CD-PHASE-GATE (creative-director)
- TD-PHASE-GATE (technical-director)
- PR-PHASE-GATE (producer)
- AD-PHASE-GATE (art-director)
3. If one director returns CONCERNS → overall gate verdict is at minimum CONCERNS
4. All 4 verdicts are collected before producing final output
**Assertions (5a):**
- [ ] Skill reads review-mode before deciding which directors to spawn
- [ ] All 4 PHASE-GATE director prompts are spawned (not just 1 or 2)
- [ ] Directors are spawned in parallel (simultaneous, not sequential)
- [ ] A CONCERNS verdict from any one director propagates to overall verdict
- [ ] Verdict is NOT auto-PASS if any director returns CONCERNS or REJECT
**Case 5b — solo mode:**
- `review-mode.txt` contains `solo`
**Input:** `/gate-check systems-design` (with solo mode active)
**Expected behavior:**
1. Skill reads review mode — determines `solo`
2. Each director is noted as skipped: "[CD-PHASE-GATE] skipped — Solo mode"
3. Gate verdict is derived from artifact/quality checks only
4. No director gates spawn
**Assertions (5b):**
- [ ] No director gates are spawned in solo mode
- [ ] Each skipped gate is explicitly noted in output: "[GATE-ID] skipped — Solo mode"
- [ ] Verdict is based on artifact and quality checks only
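The full-mode and solo-mode rules in Case 5 can be sketched as a small aggregation function. This is illustrative only: the `Verdict` enum, the `combine_verdicts` helper, and the mapping of director replies (`APPROVED`, `CONCERNS`, `REJECT`) onto gate verdicts are assumptions, not names taken from the skill.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Ordered by severity so that max() picks the worst verdict.
    PASS = 0
    CONCERNS = 1
    FAIL = 2


# Assumed mapping from a director's reply to a gate verdict.
DIRECTOR_TO_VERDICT = {
    "APPROVED": Verdict.PASS,
    "CONCERNS": Verdict.CONCERNS,
    "REJECT": Verdict.FAIL,
}

PHASE_GATES = ("CD-PHASE-GATE", "TD-PHASE-GATE", "PR-PHASE-GATE", "AD-PHASE-GATE")


def combine_verdicts(mode, artifact_verdict, director_replies):
    """Return (overall verdict, list of skipped gate IDs)."""
    if mode == "solo":
        # Solo mode: no directors spawn; artifact/quality checks decide.
        return artifact_verdict, list(PHASE_GATES)
    missing = [g for g in PHASE_GATES if g not in director_replies]
    if missing:
        # All four verdicts must be collected before producing output.
        raise ValueError(f"missing director verdicts: {missing}")
    director_verdicts = [DIRECTOR_TO_VERDICT[director_replies[g]] for g in PHASE_GATES]
    return max([artifact_verdict] + director_verdicts), []
```

Taking the maximum over a severity-ordered enum is one simple way to guarantee that a single CONCERNS reply can never be averaged away into a PASS.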
**Note on Case 3:**
The Case 3 assertion "Skill does not ask the user which gate to check if current
stage is determinable" is correct as written. However, the skill DOES use
AskUserQuestion to confirm the auto-detected transition before running the full
checks. This is a confirmation step, not a gate selection, so Case 3 assertions
should not treat it as a failure.
---
## Protocol Compliance
- [ ] Uses "May I write" before updating `production/stage.txt`
- [ ] Presents the full checklist report before asking for write approval
- [ ] Ends with a "Follow-Up Actions" section listing next steps per verdict
- [ ] Never advances the stage without explicit user confirmation
- [ ] Never auto-creates `production/stage.txt` if it doesn't exist without asking
---
## Coverage Notes
- The Production → Polish and Polish → Release gates are not covered here
because they require complex multi-artifact setups (sprint plans, playtest
data, QA sign-off); these are deferred to dedicated follow-up specs.
- The "CONCERNS" verdict path (minor gaps, not blocking) is not explicitly
tested here; it falls between Case 1 and Case 2 and follows the same pattern.
- The Vertical Slice validation block (Pre-Production → Production gate) is not
covered because it requires a playable build context that cannot be expressed
as a document fixture.


@@ -0,0 +1,175 @@
# Skill Test Spec: /create-control-manifest
## Skill Summary
`/create-control-manifest` reads all Accepted ADRs from `docs/architecture/` and
generates a control manifest — a summary document that captures all architectural
constraints, required patterns, and forbidden patterns in one place. The manifest
is the reference document that story authors use when writing story files, ensuring
stories inherit the correct architectural rules without having to read all ADRs
individually.
The skill only includes Accepted ADRs; Proposed ADRs are excluded and noted. It
has no director gates. The skill asks "May I write" before writing
`docs/architecture/control-manifest.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CREATED, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (for control-manifest.md)
- [ ] Has a next-step handoff at the end (`/create-epics` or `/create-stories`)
- [ ] Documents that only Accepted ADRs are included (not Proposed)
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. The control
manifest is a mechanical extraction from Accepted ADRs; no creative or technical
review gate is needed.
---
## Test Cases
### Case 1: Happy Path — 4 Accepted ADRs create a correct manifest
**Fixture:**
- `docs/architecture/` contains 4 ADR files, all with `Status: Accepted`
- Each ADR has a "Required Patterns" and/or "Forbidden Patterns" section
- No existing `docs/architecture/control-manifest.md`
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads all ADR files in `docs/architecture/`
2. Extracts Required Patterns, Forbidden Patterns, and key constraints from each
3. Drafts the manifest with correct section structure
4. Shows the draft manifest to the user
5. Asks "May I write `docs/architecture/control-manifest.md`?"
6. Writes the manifest after approval
**Assertions:**
- [ ] All 4 Accepted ADRs are represented in the manifest
- [ ] Manifest includes distinct sections for Required Patterns and Forbidden Patterns
- [ ] Manifest includes the source ADR number for each constraint
- [ ] "May I write" is asked before writing
- [ ] Skill does NOT write without approval
- [ ] Verdict is CREATED after writing
---
### Case 2: Failure Path — No ADRs found
**Fixture:**
- `docs/architecture/` directory exists but contains no ADR files
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads `docs/architecture/` and finds no ADR files
2. Skill outputs: "No ADRs found. Run `/architecture-decision` to create ADRs before generating the control manifest."
3. Skill exits without creating any file
4. Verdict is BLOCKED
**Assertions:**
- [ ] Skill outputs a clear error when no ADRs are found
- [ ] No control manifest file is written
- [ ] Skill recommends `/architecture-decision` as the next action
- [ ] Verdict is BLOCKED (not an error crash)
---
### Case 3: Mixed ADR Statuses — Only Accepted ADRs included
**Fixture:**
- `docs/architecture/` contains 3 Accepted ADRs and 2 Proposed ADRs
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads all ADR files and filters by Status: Accepted
2. Manifest is drafted from the 3 Accepted ADRs only
3. Output notes: "2 Proposed ADRs were excluded: [adr-NNN-name, adr-NNN-name]"
4. User sees which ADRs were excluded before approving the write
5. Asks "May I write `docs/architecture/control-manifest.md`?"
**Assertions:**
- [ ] Only the 3 Accepted ADRs appear in the manifest content
- [ ] Excluded Proposed ADRs are listed by name in the output
- [ ] User sees the exclusion list before approving the write
- [ ] Skill does NOT silently omit Proposed ADRs without noting them
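The filtering step in Case 3 can be sketched as follows. This is a rough sketch under stated assumptions: it assumes each ADR carries a plain `Status: <value>` line, as in the fixtures, and the `partition_adrs` and `exclusion_note` helpers and the ADR names in the usage example are hypothetical.

```python
import re

# Matches a line such as "Status: Accepted" anywhere in the ADR text.
STATUS_RE = re.compile(r"^Status:\s*(\w+)", re.MULTILINE)


def partition_adrs(adrs):
    """Split ADRs into accepted names and a {name: status} map of the rest.

    adrs maps an ADR name to the full text of that ADR file.
    """
    accepted = []
    excluded = {}
    for name in sorted(adrs):
        match = STATUS_RE.search(adrs[name])
        status = match.group(1) if match else "Unknown"
        if status == "Accepted":
            accepted.append(name)
        else:
            excluded[name] = status
    return accepted, excluded


def exclusion_note(excluded):
    """Build the note shown to the user before the write approval."""
    proposed = [name for name, status in excluded.items() if status == "Proposed"]
    if not proposed:
        return ""
    return f"{len(proposed)} Proposed ADRs were excluded: [{', '.join(proposed)}]"


adrs = {
    "adr-001-input": "# ADR 1\nStatus: Accepted\n",
    "adr-002-physics": "# ADR 2\nStatus: Accepted\n",
    "adr-003-audio": "# ADR 3\nStatus: Accepted\n",
    "adr-004-save": "# ADR 4\nStatus: Proposed\n",
    "adr-005-ui": "# ADR 5\nStatus: Proposed\n",
}
accepted, excluded = partition_adrs(adrs)
```

Returning the excluded ADRs alongside the accepted ones keeps the exclusion visible, which is what the "not silently omitted" assertion asks for.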
---
### Case 4: Edge Case — Manifest already exists
**Fixture:**
- `docs/architecture/control-manifest.md` already exists (version 1, dated last week)
- `docs/architecture/` contains Accepted ADRs (some new since last manifest)
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill detects existing manifest and reads its version number / date
2. Skill offers to regenerate: "control-manifest.md already exists (v1, [date]). Regenerate with current ADRs?"
3. If user confirms: skill drafts updated manifest, increments version number
4. Asks "May I write `docs/architecture/control-manifest.md`?" (overwrite)
5. Writes updated manifest after approval
**Assertions:**
- [ ] Skill reads and reports the existing manifest version before offering to regenerate
- [ ] User is offered a regenerate/skip choice — not auto-overwritten
- [ ] Updated manifest has an incremented version number
- [ ] "May I write" is asked before overwriting the existing file
---
### Case 5: Director Gate — No gate spawned; no review-mode.txt read
**Fixture:**
- 4 Accepted ADRs exist
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/create-control-manifest`
**Expected behavior:**
1. Skill reads ADRs and drafts manifest
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Skill proceeds directly to "May I write" after drafting
5. Review mode setting has no effect on this skill's behavior
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] The manifest is generated from ADRs alone, with no external gate review
---
## Protocol Compliance
- [ ] Reads all ADR files before drafting manifest
- [ ] Only Accepted ADRs included — Proposed ones noted as excluded
- [ ] Manifest draft shown to user before "May I write" ask
- [ ] "May I write `docs/architecture/control-manifest.md`?" asked before writing
- [ ] No director gates — no review-mode.txt read
- [ ] Ends with next-step handoff: `/create-epics` or `/create-stories`
---
## Coverage Notes
- The exact section structure of the generated manifest (constraint tables, pattern
lists) is defined by the skill body and not re-enumerated in test assertions.
- The `version` field incrementing logic (v1 → v2) is tested via Case 4, but the
exact version numbering format is not fixture-locked.
- ADR parsing (extracting Required/Forbidden Patterns) depends on consistent ADR
structure — tested implicitly via Case 1's fixture.


@@ -0,0 +1,190 @@
# Skill Test Spec: /create-epics
## Skill Summary
`/create-epics` reads all approved GDDs and translates them into EPIC.md files,
one per system. Epics are organized by layer (Foundation → Core → Feature →
Presentation) and processed in priority order within each layer. Each EPIC.md
includes scope, governing ADRs, GDD requirements, engine risk level, and a
Definition of Done. The skill asks "May I write" before creating each EPIC file.
In `full` review mode, a PR-EPIC gate (producer) runs after drafting epics and
before writing any files. In `lean` or `solo` mode, PR-EPIC is skipped and noted.
Epics are written to `production/epics/[layer]/EPIC-[name].md`.
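The layer ordering and output path described in this summary can be sketched as below. Treat it as an illustration under assumptions: the dictionary shape of a system, the convention that a lower `priority` number is processed first, and lowercasing the layer name in the path are all hypothetical choices, since the spec only gives the `[layer]` and `[name]` placeholders.

```python
LAYER_ORDER = ["Foundation", "Core", "Feature", "Presentation"]


def processing_order(systems):
    """Sort systems by layer first, then by priority within each layer."""
    return sorted(
        systems,
        key=lambda s: (LAYER_ORDER.index(s["layer"]), s["priority"]),
    )


def epic_path(layer, name):
    """Fill in the production/epics/[layer]/EPIC-[name].md pattern."""
    # Lowercasing the layer directory is an assumption for illustration.
    return f"production/epics/{layer.lower()}/EPIC-{name}.md"


systems = [
    {"name": "hud", "layer": "Presentation", "priority": 1},
    {"name": "combat", "layer": "Core", "priority": 2},
    {"name": "input", "layer": "Foundation", "priority": 1},
    {"name": "movement", "layer": "Core", "priority": 1},
]
ordered_names = [s["name"] for s in processing_order(systems)]
```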
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CREATED, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (per-epic approval)
- [ ] Has a next-step handoff at the end (`/create-stories`)
- [ ] Documents PR-EPIC gate behavior: runs in full mode; skipped in lean/solo
---
## Director Gate Checks
In `full` mode: PR-EPIC (producer) gate runs after epics are drafted and before
any epic file is written. If PR-EPIC returns CONCERNS, epics are revised before
the "May I write" ask.
In `lean` mode: PR-EPIC is skipped. Output notes: "PR-EPIC skipped — lean mode".
In `solo` mode: PR-EPIC is skipped. Output notes: "PR-EPIC skipped — solo mode".
---
## Test Cases
### Case 1: Happy Path — Two approved GDDs create two EPIC files
**Fixture:**
- `design/gdd/systems-index.md` exists with 2 systems listed
- Both systems have approved GDDs in `design/gdd/`
- `docs/architecture/architecture.md` exists with matching modules
- At least one Accepted ADR exists for each system
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/create-epics`
**Expected behavior:**
1. Skill reads systems index and both GDDs
2. Drafts 2 EPIC definitions (layer, GDD path, ADRs, requirements, engine risk)
3. PR-EPIC gate is skipped (lean mode) — noted in output
4. For each epic: asks "May I write `production/epics/[layer]/EPIC-[name].md`?"
5. After approval: writes both EPIC files
6. Creates or updates `production/epics/index.md`
**Assertions:**
- [ ] Epic summary is shown before any write ask
- [ ] "May I write" is asked per-epic (not once for all epics together)
- [ ] Each EPIC.md contains: layer, GDD path, governing ADRs, requirements table, Definition of Done
- [ ] PR-EPIC skip is noted in output
- [ ] `production/epics/index.md` is updated after writing
- [ ] Skill does NOT write EPIC files without per-epic approval
---
### Case 2: Failure Path — No approved GDDs found
**Fixture:**
- `design/gdd/systems-index.md` exists
- No GDDs in `design/gdd/` have approved status (all are Draft or In Progress)
**Input:** `/create-epics`
**Expected behavior:**
1. Skill reads systems index and attempts to find approved GDDs
2. No approved GDDs found
3. Skill outputs: "No approved GDDs to convert. GDDs must be Approved before creating epics."
4. Skill suggests running `/design-system` and completing GDD approval first
5. Skill exits without creating any EPIC files
**Assertions:**
- [ ] Skill stops cleanly with a clear message when no approved GDDs exist
- [ ] No EPIC files are written
- [ ] Skill recommends the correct next action
- [ ] Verdict is BLOCKED
---
### Case 3: Director Gate — Full mode spawns PR-EPIC before writing
**Fixture (full mode):**
- 2 approved GDDs exist
- `production/session-state/review-mode.txt` contains `full`
**Full mode expected behavior:**
1. Skill drafts both epics
2. PR-EPIC gate spawns and reviews the epic drafts
3. If PR-EPIC returns APPROVED: "May I write" ask proceeds normally
4. Epic files are written after approval
**Assertions (full mode):**
- [ ] PR-EPIC gate appears in output as an active gate
- [ ] PR-EPIC runs before any "May I write" ask
- [ ] Epic files are NOT written before PR-EPIC completes
**Fixture (lean mode):**
- Same GDDs
- `production/session-state/review-mode.txt` contains `lean`
**Lean mode expected behavior:**
1. Epics are drafted
2. PR-EPIC is skipped — noted in output
3. "May I write" ask proceeds directly
**Assertions (lean mode):**
- [ ] "PR-EPIC skipped — lean mode" appears in output
- [ ] Skill proceeds to "May I write" without waiting for PR-EPIC
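The mode-dependent gate decision in Case 3 can be sketched as a pure function of the review mode. This is a hedged sketch: `parse_review_mode` and `plan_gate` are hypothetical helpers, and falling back to `lean` when the file content is unrecognized is an assumption the spec does not state.

```python
VALID_MODES = ("full", "lean", "solo")


def parse_review_mode(text, default="lean"):
    """Normalize the content of review-mode.txt to a known mode."""
    mode = text.strip().lower()
    # Fallback for unknown content is an assumption, not spec behavior.
    return mode if mode in VALID_MODES else default


def plan_gate(gate_id, mode):
    """Decide whether a gate runs, or produce its skip note."""
    if mode == "full":
        return ("run", gate_id)
    # \u2014 is the dash character used in the spec's expected skip note.
    return ("skip", f"{gate_id} skipped \u2014 {mode} mode")


lean_plan = plan_gate("PR-EPIC", parse_review_mode("lean\n"))
full_plan = plan_gate("PR-EPIC", parse_review_mode("full"))
```

Keeping the skip note next to the decision makes it hard to skip a gate without also recording that it was skipped.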
---
### Case 4: Edge Case — Epic already exists for a GDD
**Fixture:**
- `production/epics/[layer]/EPIC-[name].md` already exists for one of the approved GDDs
- The other GDD has no existing EPIC file
**Input:** `/create-epics`
**Expected behavior:**
1. Skill detects the existing EPIC file for the first system
2. Skill offers to update rather than overwrite: "EPIC-[name].md already exists. Update it, or skip?"
3. For the second system (no existing file): proceeds normally with "May I write"
**Assertions:**
- [ ] Skill detects existing EPIC files before writing
- [ ] User is offered "update" or "skip" options — not auto-overwritten
- [ ] The new system's EPIC is created normally without conflict
---
### Case 5: Director Gate — PR-EPIC returns CONCERNS
**Fixture:**
- 2 approved GDDs exist
- `production/session-state/review-mode.txt` contains `full`
- PR-EPIC gate returns CONCERNS (e.g., scope of one epic is too large)
**Input:** `/create-epics`
**Expected behavior:**
1. PR-EPIC gate spawns and returns CONCERNS with specific feedback
2. Skill surfaces the concerns to the user before any write ask
3. User is given options: revise epics, accept concerns and proceed, or stop
4. If user revises: updated epic drafts are shown before the "May I write" ask
5. Skill does NOT write epics while CONCERNS are unaddressed
**Assertions:**
- [ ] CONCERNS from PR-EPIC are shown to the user before writing
- [ ] Skill does NOT auto-write epics when CONCERNS are returned
- [ ] User is given a clear choice to revise, proceed, or stop
- [ ] Revised epic drafts are re-shown after revision before final approval
---
## Protocol Compliance
- [ ] Epic drafts shown to user before any "May I write" ask
- [ ] "May I write" asked per-epic, not once for the entire batch
- [ ] PR-EPIC gate (if active) runs before write asks — not after
- [ ] Skipped gates noted by name and mode in output
- [ ] EPIC.md content sourced only from GDDs, ADRs, and architecture docs — nothing invented
- [ ] Ends with next-step handoff: `/create-stories [epic-slug]` per created epic
---
## Coverage Notes
- Processing of Core, Feature, and Presentation layers follows the same per-epic
pattern as Foundation — layer-specific ordering is not independently tested.
- Engine risk level assignment (LOW/MEDIUM/HIGH) from governing ADRs is
validated implicitly via Case 1's fixture structure.
- The `layer: [name]` and `[system-name]` argument modes follow the same approval
pattern as the default (all systems) mode.


@@ -0,0 +1,191 @@
# Skill Test Spec: /create-stories
## Skill Summary
`/create-stories` breaks a single epic into developer-ready story files. It reads
the EPIC.md, the corresponding GDD, governing ADRs, the control manifest, and the
TR registry. Each story gets structured frontmatter including: Title, Epic, Layer,
Priority, Status, TR-ID, ADR references, Acceptance Criteria, and Definition of
Done. Stories are classified by type (Logic / Integration / Visual/Feel / UI /
Config/Data) which determines the required test evidence path.
In `full` review mode, a QL-STORY-READY check runs per story after creation. In
`lean` or `solo` mode, QL-STORY-READY is skipped. The skill asks "May I write"
before writing each story file. Stories are written to
`production/epics/[layer]/story-[name].md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED, NEEDS WORK
- [ ] Contains "May I write" collaborative protocol language (per-story approval)
- [ ] Has a next-step handoff at the end (`/story-readiness`, `/dev-story`)
- [ ] Documents story Status: Blocked when governing ADR is Proposed
- [ ] Documents QL-STORY-READY gate: active in full mode, skipped in lean/solo
---
## Director Gate Checks
In `full` mode: QL-STORY-READY check runs per story after creation. Stories that
fail the check are noted as NEEDS WORK before the "May I write" ask.
In `lean` mode: QL-STORY-READY is skipped. Output notes:
"QL-STORY-READY skipped — lean mode" per story.
In `solo` mode: QL-STORY-READY is skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Epic with 3 stories, all ADRs Accepted
**Fixture:**
- `production/epics/[layer]/EPIC-[name].md` exists with 3 GDD requirements
- Corresponding GDD exists with matching acceptance criteria
- All governing ADRs have `Status: Accepted`
- `docs/architecture/control-manifest.md` exists
- `docs/architecture/tr-registry.yaml` has TR-IDs for all 3 requirements
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/create-stories [epic-name]`
**Expected behavior:**
1. Skill reads EPIC.md, GDD, governing ADRs, control manifest, and TR registry
2. Classifies each requirement into a story type (Logic / Integration / Visual/Feel / UI / Config/Data)
3. Drafts 3 story files with correct frontmatter schema
4. QL-STORY-READY is skipped (lean mode) — noted in output
5. Asks "May I write" before writing each story file
6. Writes all 3 story files after approval
**Assertions:**
- [ ] Each story's frontmatter contains: Title, Epic, Layer, Priority, Status, TR-ID, ADR reference, Acceptance Criteria, DoD
- [ ] Story types are correctly classified (at least one Logic type in fixture)
- [ ] "May I write" is asked per story (not once for the entire batch)
- [ ] QL-STORY-READY skip is noted in output
- [ ] All 3 story files are written with correct naming: `story-[name].md`
- [ ] Skill does NOT start implementation
---
### Case 2: Failure Path — No epic file found
**Fixture:**
- The epic path provided does not exist in `production/epics/`
**Input:** `/create-stories nonexistent-epic`
**Expected behavior:**
1. Skill attempts to read the EPIC.md file
2. File not found
3. Skill outputs a clear error with the path it searched
4. Skill suggests checking `production/epics/` or running `/create-epics` first
5. No story files are created
**Assertions:**
- [ ] Skill outputs a clear error naming the missing file path
- [ ] No story files are written
- [ ] Skill recommends the correct next action (`/create-epics`)
- [ ] Skill does NOT create stories without a valid EPIC.md
---
### Case 3: Blocked Story — ADR is Proposed
**Fixture:**
- EPIC.md exists with 2 requirements
- Requirement 1 is covered by an Accepted ADR
- Requirement 2 is covered by an ADR with `Status: Proposed`
**Input:** `/create-stories [epic-name]`
**Expected behavior:**
1. Skill reads the ADR for Requirement 2 and finds Status: Proposed
2. Story for Requirement 2 is drafted with `Status: Blocked`
3. Blocking note references the specific ADR ("BLOCKED: ADR-NNN is Proposed") and recommends `/architecture-decision`
4. Story for Requirement 1 is drafted normally with `Status: Ready`
5. Both stories are shown in the draft — user asked "May I write" for both
**Assertions:**
- [ ] Story 2 has `Status: Blocked` in its frontmatter
- [ ] Blocking note names the specific ADR number and recommends `/architecture-decision`
- [ ] Story 1 has `Status: Ready` — blocked status does not affect non-blocked stories
- [ ] Blocked status is shown in the draft preview before writing
- [ ] Both story files are written (blocked stories are still written — just flagged)
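The blocking rule in Case 3 can be sketched as a function from governing ADR statuses to a story status. This is a minimal sketch: the `story_status` helper and the ADR identifiers in the usage lines are hypothetical, and only the `Ready` and `Blocked` statuses named in this case are modeled.

```python
def story_status(adr_statuses):
    """Return (status, blocking notes) for one story.

    adr_statuses maps each governing ADR id to its status string.
    """
    proposed = sorted(adr for adr, status in adr_statuses.items() if status == "Proposed")
    if proposed:
        # The story is still drafted and written, only flagged as blocked.
        return "Blocked", [f"BLOCKED: {adr} is Proposed" for adr in proposed]
    return "Ready", []


story_1 = story_status({"ADR-001": "Accepted"})
story_2 = story_status({"ADR-002": "Proposed"})
```

Because each story is evaluated independently, a Proposed ADR on one requirement does not change the status of stories governed only by Accepted ADRs.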
---
### Case 4: Edge Case — No argument provided
**Fixture:**
- `production/epics/` directory exists with ≥2 epic subdirectories
**Input:** `/create-stories` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs a usage error: "No epic specified. Usage: /create-stories [epic-name]"
3. Skill lists available epics from `production/epics/`
4. No story files are created
**Assertions:**
- [ ] Skill outputs a usage error when no argument is given
- [ ] Skill lists available epics to help the user choose
- [ ] No story files are written
- [ ] Skill does NOT silently pick an epic without user input
---
### Case 5: Director Gate — Full mode runs QL-STORY-READY; stories failing noted as NEEDS WORK
**Fixture:**
- EPIC.md exists with 2 requirements
- Both governing ADRs are Accepted
- `production/session-state/review-mode.txt` contains `full`
- QL-STORY-READY check finds one story has ambiguous acceptance criteria
**Input:** `/create-stories [epic-name]`
**Expected behavior:**
1. Both stories are drafted
2. QL-STORY-READY check runs for each story
3. Story 1 passes QL-STORY-READY
4. Story 2 fails QL-STORY-READY — noted as NEEDS WORK with specific feedback
5. Both stories are shown to user with pass/fail status before "May I write"
6. User can proceed (story written as-is with NEEDS WORK note) or revise first
**Assertions:**
- [ ] QL-STORY-READY results appear per story in the output
- [ ] Story 2 is flagged as NEEDS WORK with the specific failing criteria
- [ ] Story 1 shows as passing QL-STORY-READY
- [ ] User is given the choice to proceed or revise before writing
- [ ] Skill does NOT auto-block writing of stories that fail QL-STORY-READY without user input
---
## Protocol Compliance
- [ ] All context (EPIC, GDD, ADRs, manifest, TR registry) loaded before drafting stories
- [ ] Story drafts shown in full before any "May I write" ask
- [ ] "May I write" asked per story (not once for the entire batch)
- [ ] Blocked stories flagged before write approval — not discovered after writing
- [ ] TR-IDs reference the registry — requirement text is not embedded inline in story files
- [ ] Control manifest rules quoted per-story from the manifest, not invented
- [ ] Ends with next-step handoff: `/story-readiness` → `/dev-story`
---
## Coverage Notes
- Integration story test evidence (playtest doc alternative) follows the same
approval pattern as Logic stories — not independently fixture-tested.
- Story ordering (foundational first, UI last) is validated implicitly via
Case 1's multi-story fixture.
- The story sizing rule (splitting large requirement groups) is not tested here
— it is addressed in the `/create-stories` skill's internal logic.


@@ -0,0 +1,205 @@
# Skill Test Spec: /dev-story
## Skill Summary
`/dev-story` reads a story file, loads all required context (referenced ADR,
TR-ID from the registry, control manifest, engine preferences), implements the
story, verifies that all acceptance criteria are met, and marks the story
Complete. The skill routes implementation to the correct specialist agent based
on the engine and file type — it does not write source code directly.
In `full` review mode, an LP-CODE-REVIEW gate runs before marking the story
Complete. In `lean` or `solo` mode, LP-CODE-REVIEW is skipped and the story is
marked Complete after the user confirms all criteria are met. The skill asks
"May I write" before updating story status and before writing code files.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED, IN PROGRESS, NEEDS CHANGES
- [ ] Contains "May I write" collaborative protocol language (story status + code files)
- [ ] Has a next-step handoff at the end (`/story-done`)
- [ ] Documents LP-CODE-REVIEW gate: active in full mode, skipped in lean/solo
- [ ] Notes that implementation is delegated to specialist agents (not done directly)
---
## Director Gate Checks
In `full` mode: LP-CODE-REVIEW gate runs after implementation is complete and all
criteria are verified, before marking the story Complete.
In `lean` mode: LP-CODE-REVIEW is skipped. Output notes:
"LP-CODE-REVIEW skipped — lean mode". Story is marked Complete after user confirms.
In `solo` mode: LP-CODE-REVIEW is skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Story implemented and marked Complete (full mode)
**Fixture:**
- A story file exists at `production/epics/[layer]/story-[name].md` with:
- `Status: Ready`
- A TR-ID referencing a registered requirement
- At least 2 Given-When-Then acceptance criteria
- A test evidence path
- Referenced ADR has `Status: Accepted`
- `docs/architecture/control-manifest.md` exists
- `.claude/docs/technical-preferences.md` has engine and language configured
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/dev-story production/epics/[layer]/story-[name].md`
**Expected behavior:**
1. Skill reads the story file and all referenced context
2. Skill verifies the ADR is Accepted (no block)
3. Skill routes implementation to the correct specialist agent
4. All acceptance criteria are verified as met
5. LP-CODE-REVIEW gate spawns and returns APPROVED
6. Skill asks "May I update story status to Complete?"
7. Story status is updated to Complete
**Assertions:**
- [ ] Skill reads story before spawning any agent
- [ ] ADR status is checked before implementation begins
- [ ] Implementation is delegated to a specialist agent (not done inline)
- [ ] All acceptance criteria are confirmed before LP-CODE-REVIEW
- [ ] LP-CODE-REVIEW appears in output as a completed gate
- [ ] Story status is updated to Complete only after gate approval and user consent
- [ ] Test file is written as part of implementation (not deferred)
---
### Case 2: Failure Path — Referenced ADR is Proposed
**Fixture:**
- A story file exists with `Status: Ready`
- The story's TR-ID points to a requirement covered by an ADR with `Status: Proposed`
**Input:** `/dev-story production/epics/[layer]/story-[name].md`
**Expected behavior:**
1. Skill reads the story file
2. Skill resolves the TR-ID and reads the governing ADR
3. ADR status is Proposed — skill outputs a BLOCKED message
4. Skill names the specific ADR blocking the story
5. Skill recommends running `/architecture-decision` to advance the ADR
6. Implementation does NOT begin
**Assertions:**
- [ ] Skill does NOT begin implementation with a Proposed ADR
- [ ] BLOCKED message names the specific ADR number and title
- [ ] Skill recommends `/architecture-decision` as the next action
- [ ] Story status remains unchanged (not set to In Progress or Complete)
---
### Case 3: Ambiguous Acceptance Criteria — Skill asks for clarification
**Fixture:**
- A story file exists with `Status: Ready`
- Referenced ADR is Accepted
- One acceptance criterion is ambiguous (not Given-When-Then; uses subjective language like "feels responsive")
**Input:** `/dev-story production/epics/[layer]/story-[name].md`
**Expected behavior:**
1. Skill reads the story and identifies the ambiguous criterion
2. Before routing to the specialist, skill asks the user to clarify the criterion
3. User provides a concrete, testable restatement
4. Skill proceeds with implementation using the clarified criterion
5. Skill does NOT guess at the intended behavior
**Assertions:**
- [ ] Skill surfaces the ambiguous criterion before implementation starts
- [ ] Skill asks for user clarification (not auto-interpretation)
- [ ] Implementation begins only after clarification is provided
- [ ] Clarified criterion is used in the test (not the original vague version)
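One possible way to surface ambiguous criteria, as Case 3 requires, is a simple structural check for the Given-When-Then form. This is only a heuristic sketch: the `ambiguous_criteria` helper and the regular expression are assumptions, and a real check would likely also need to catch subjective wording inside otherwise well-formed criteria.

```python
import re

# A criterion counts as structured if it contains given ... when ... then in order.
GWT_RE = re.compile(r"\bgiven\b.+?\bwhen\b.+?\bthen\b", re.IGNORECASE | re.DOTALL)


def ambiguous_criteria(criteria):
    """Return the criteria that do not follow the Given-When-Then form."""
    return [c for c in criteria if not GWT_RE.search(c)]


criteria = [
    "Given the player is grounded, when jump is pressed, then the character leaves the ground",
    "Movement feels responsive",
]
to_clarify = ambiguous_criteria(criteria)
```

Any criterion returned here would be raised with the user before implementation is routed to a specialist agent.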
---
### Case 4: Edge Case — No argument; reads from session state
**Fixture:**
- No argument is provided
- `production/session-state/active.md` references an active story file
- That story file exists with `Status: In Progress`
**Input:** `/dev-story` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Skill reads `production/session-state/active.md`
3. Skill finds the active story reference
4. Skill confirms with user: "Continuing work on [story title] — is that correct?"
5. After confirmation, skill proceeds with that story
**Assertions:**
- [ ] Skill reads session state when no argument is provided
- [ ] Skill confirms the active story with the user before proceeding
- [ ] Skill does NOT silently assume the active story without confirmation
- [ ] If session state has no active story, skill asks which story to implement
---
### Case 5: Director Gate — LP-CODE-REVIEW returns NEEDS CHANGES; lean mode skips gate
**Fixture (full mode):**
- Story is implemented and all criteria appear met
- `production/session-state/review-mode.txt` contains `full`
- LP-CODE-REVIEW gate returns NEEDS CHANGES with specific feedback
**Full mode expected behavior:**
1. LP-CODE-REVIEW gate spawns after implementation
2. Gate returns NEEDS CHANGES with 2 specific issues
3. Story status remains In Progress — NOT marked Complete
4. User is shown the gate feedback and asked how to proceed
**Assertions (full mode):**
- [ ] Story is NOT marked Complete when LP-CODE-REVIEW returns NEEDS CHANGES
- [ ] Gate feedback is shown to the user verbatim
- [ ] Story status stays In Progress until issues are resolved and gate passes
**Fixture (lean mode):**
- Same story, `production/session-state/review-mode.txt` contains `lean`
**Lean mode expected behavior:**
1. Implementation completes
2. LP-CODE-REVIEW gate is skipped — noted in output
3. User is asked to confirm all criteria are met
4. Story is marked Complete after user confirmation
**Assertions (lean mode):**
- [ ] "LP-CODE-REVIEW skipped — lean mode" appears in output
- [ ] Story is marked Complete after user confirms criteria (no gate required)
- [ ] Skill does NOT block on a gate that is skipped
---
## Protocol Compliance
- [ ] Does NOT write source code directly — delegates to specialist agents
- [ ] Reads all context (story, TR-ID, ADR, manifest, engine prefs) before implementation
- [ ] "May I write" asked before updating story status and before writing code files
- [ ] Skipped gates noted by name and mode in output
- [ ] Updates `production/session-state/active.md` after story completion
- [ ] Ends with next-step handoff: `/story-done`
---
## Coverage Notes
- Engine routing logic (Godot vs Unity vs Unreal) is not tested per engine:
the routing pattern is the same for every engine, and engine selection is a configuration value.
- Visual/Feel and UI story types (no automated test required) have different
evidence requirements and are not covered in these cases.
- Integration story type follows the same pattern as Logic but with a different
evidence path — not independently fixture-tested.

@@ -0,0 +1,196 @@
# Skill Test Spec: /map-systems
## Skill Summary
`/map-systems` decomposes a game concept into a systems index. It reads the
approved game concept and pillars, enumerates both explicit and implicit systems,
maps dependencies between systems, assigns priority tiers (MVP / Vertical Slice /
Alpha / Full Vision), and organizes systems into a layered design order
(Foundation → Core → Feature → Presentation). The output is written to
`design/systems-index.md` after user approval.
This skill is required between game concept approval and per-system GDD creation;
it is a mandatory gate in the pipeline. In `full` review mode, CD-SYSTEMS
(creative-director) and TD-SYSTEM-BOUNDARY (technical-director) spawn in parallel
after the decomposition is drafted. In `lean` or `solo` mode, both gates are
skipped.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (for systems-index.md)
- [ ] Has a next-step handoff at the end (`/design-system`)
- [ ] Documents gate behavior: CD-SYSTEMS + TD-SYSTEM-BOUNDARY in parallel in full mode
---
## Director Gate Checks
In `full` mode: CD-SYSTEMS (creative-director) and TD-SYSTEM-BOUNDARY
(technical-director) spawn in parallel after the systems decomposition is drafted
and before `design/systems-index.md` is written.
In `lean` mode: both gates are skipped. Output notes:
"CD-SYSTEMS skipped — lean mode" and "TD-SYSTEM-BOUNDARY skipped — lean mode".
In `solo` mode: both gates are skipped with equivalent notes.
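
The mode rules above can be expressed as a small planning function. This is a sketch under stated assumptions: `plan_gates` is a hypothetical helper, the solo note is assumed to mirror the lean wording, and rejecting an unknown mode is an assumption because the spec does not say what happens in that case.

```python
from typing import List, Tuple

MAP_SYSTEMS_GATES = ["CD-SYSTEMS", "TD-SYSTEM-BOUNDARY"]


def plan_gates(review_mode: str, gates: List[str]) -> Tuple[List[str], List[str]]:
    """Return (gates to spawn, skip notes) for the given review mode."""
    mode = review_mode.strip().lower()
    if mode == "full":
        return list(gates), []
    if mode in ("lean", "solo"):
        # \u2014 is the dash used in the spec's skip note.
        return [], [f"{gate} skipped \u2014 {mode} mode" for gate in gates]
    # Assumption: the spec does not define behavior for other values.
    raise ValueError(f"unknown review mode: {review_mode!r}")
```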
---
## Test Cases
### Case 1: Happy Path — Game concept exists, 5-8 systems identified
**Fixture:**
- `design/gdd/game-concept.md` exists with Core Mechanics and MVP Definition sections
- `design/gdd/game-pillars.md` exists with ≥1 pillar defined
- No `design/systems-index.md` exists yet
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/map-systems`
**Expected behavior:**
1. Skill reads game-concept.md and game-pillars.md
2. Identifies 5-8 systems (explicit + implicit)
3. Maps dependencies between systems and assigns layers
4. CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel and return APPROVED
5. Asks "May I write `design/systems-index.md`?"
6. Writes systems-index.md after approval
7. Updates `production/session-state/active.md`
**Assertions:**
- [ ] Between 5 and 8 systems are identified (not fewer, not more without explanation)
- [ ] CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel (not sequentially)
- [ ] Both gates complete before the "May I write" ask
- [ ] "May I write `design/systems-index.md`?" is asked before writing
- [ ] systems-index.md is NOT written without approval
- [ ] Session state is updated after writing
- [ ] Verdict is COMPLETE
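
One way a test harness might check the "in parallel (not sequentially)" assertion is to log gate start and end events and verify that every gate started before any gate finished. The sketch below uses `asyncio` purely for illustration; `run_gate`, the log format, and the stubbed APPROVED result are hypothetical and do not describe how the skill actually spawns agents.

```python
import asyncio
from typing import List


async def run_gate(name: str, log: List[str]) -> str:
    """Stand-in for a director gate: records when it starts and ends."""
    log.append(f"start:{name}")
    await asyncio.sleep(0)  # yield control, standing in for real review work
    log.append(f"end:{name}")
    return "APPROVED"


async def run_gates_in_parallel(names: List[str], log: List[str]) -> List[str]:
    """Spawn all gates concurrently and collect their verdicts in order."""
    return await asyncio.gather(*(run_gate(name, log) for name in names))


def spawned_in_parallel(log: List[str]) -> bool:
    """True when every gate started before any gate finished."""
    first_end = next(
        (i for i, event in enumerate(log) if event.startswith("end:")), len(log)
    )
    starts = [i for i, event in enumerate(log) if event.startswith("start:")]
    return all(i < first_end for i in starts)
```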
---
### Case 2: Failure Path — No game concept found
**Fixture:**
- `design/gdd/game-concept.md` does NOT exist
- `design/gdd/` directory may be empty or absent
**Input:** `/map-systems`
**Expected behavior:**
1. Skill attempts to read `design/gdd/game-concept.md`
2. File not found
3. Skill outputs: "No game concept found. Run `/brainstorm` to create one, then return to `/map-systems`."
4. Skill exits without creating systems-index.md
**Assertions:**
- [ ] Skill outputs a clear error naming the missing file path
- [ ] Skill recommends `/brainstorm` as the next action
- [ ] No systems-index.md is created
- [ ] Verdict is BLOCKED
---
### Case 3: Director Gate — CD-SYSTEMS returns CONCERNS (missing core system)
**Fixture:**
- Game concept exists
- `production/session-state/review-mode.txt` contains `full`
- CD-SYSTEMS gate returns CONCERNS: "The [core-system] is implied by the concept but not identified"
**Input:** `/map-systems`
**Expected behavior:**
1. Systems are drafted (5-8 initial systems identified)
2. CD-SYSTEMS gate returns CONCERNS naming the missing core system
3. TD-SYSTEM-BOUNDARY returns APPROVED
4. Skill surfaces CD-SYSTEMS concerns to user
5. User is asked: revise systems list to add the missing system, or proceed as-is
6. If revised: updated systems list shown before "May I write" ask
**Assertions:**
- [ ] CD-SYSTEMS concerns are shown to the user before writing
- [ ] Skill does NOT auto-write systems-index.md while CONCERNS are unresolved
- [ ] User is given the option to revise or proceed
- [ ] Revised systems list is re-shown after revision before final "May I write"
---
### Case 4: Edge Case — systems-index.md already exists
**Fixture:**
- `design/gdd/game-concept.md` exists
- `design/systems-index.md` already exists with N systems
**Input:** `/map-systems`
**Expected behavior:**
1. Skill reads the existing systems-index.md and presents its current state
2. Skill asks: "systems-index.md already exists with [N] systems. Update with new systems, or review and revise priorities?"
3. User chooses an action
4. Skill does NOT silently overwrite the existing index
**Assertions:**
- [ ] Skill detects and reads the existing systems-index.md before proceeding
- [ ] User is offered update/review options — not auto-overwritten
- [ ] Existing system count is presented to the user
- [ ] Skill does NOT proceed with a full re-decomposition without user choosing to do so
---
### Case 5: Director Gate — Lean mode and solo mode both skip gates, noted
**Fixture (lean mode):**
- Game concept exists
- `production/session-state/review-mode.txt` contains `lean`
**Lean mode expected behavior:**
1. Systems are decomposed and drafted
2. Both CD-SYSTEMS and TD-SYSTEM-BOUNDARY are skipped
3. Output notes: "CD-SYSTEMS skipped — lean mode" and "TD-SYSTEM-BOUNDARY skipped — lean mode"
4. "May I write" ask proceeds directly
**Assertions (lean mode):**
- [ ] Both gate skip notes appear in output
- [ ] Skill proceeds to "May I write" without gate approval
- [ ] systems-index.md is written after user approval
**Fixture (solo mode):**
- Same game concept, `production/session-state/review-mode.txt` contains `solo`
**Solo mode expected behavior:**
1. Same decomposition workflow
2. Both gates skipped — noted in output with "solo mode"
3. "May I write" ask proceeds
**Assertions (solo mode):**
- [ ] Both skip notes appear with "solo mode" label
- [ ] Behavior is otherwise identical to lean mode for this skill
---
## Protocol Compliance
- [ ] Reads game-concept.md and game-pillars.md before any decomposition
- [ ] "May I write `design/systems-index.md`?" asked before writing
- [ ] systems-index.md is NOT written without user approval
- [ ] CD-SYSTEMS and TD-SYSTEM-BOUNDARY spawn in parallel in full mode
- [ ] Skipped gates noted by name and mode in lean/solo output
- [ ] Ends with next-step handoff: `/design-system [next-system]`
---
## Coverage Notes
- Circular dependency detection (System A depends on System B which depends on A)
is part of the dependency mapping phase — not independently fixture-tested here.
- Priority tier assignment (MVP heuristics) is evaluated as part of the Case 1
collaborative workflow rather than independently.
- The `next` argument mode (handing off the highest-priority undesigned system to
`/design-system`) is not tested here — it is a post-index-creation convenience.
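
Circular dependency detection, mentioned in the first note above, is commonly done with a depth-first search that tracks which systems are on the current path. The following is a minimal sketch of that general technique, not the skill's own logic; `find_cycle` and the system names used with it are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a path, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    color = {system: WHITE for system in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: a cycle closes here.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for system in list(deps):
        if color[system] == WHITE:
            cycle = visit(system)
            if cycle:
                return cycle
    return None
```

For example, with `{"Combat": ["Inventory"], "Inventory": ["Combat"]}` the function reports the path from Combat through Inventory and back to Combat.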

@@ -0,0 +1,175 @@
# Skill Test Spec: /propagate-design-change
## Skill Summary
`/propagate-design-change` handles GDD revision cascades. When a GDD is updated,
the skill traces all downstream artifacts that reference it: ADRs, TR-registry
entries, stories, and epics. It produces a structured impact report showing what
needs to change and why. The skill does NOT automatically apply changes — it
proposes edits for each affected artifact and asks "May I write" per artifact
before making any modification.
The skill is read-only during analysis and write-gated per artifact during the
update phase. It has no director gates — the analysis itself is mechanical
tracing, not a creative review.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED, NO IMPACT
- [ ] Contains "May I write" collaborative protocol language (per-artifact approval)
- [ ] Has a next-step handoff at the end
- [ ] Documents that changes are proposed, not applied automatically
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents during analysis.
The impact report is a mechanical tracing operation; no creative or technical
director review is required at the analysis stage.
---
## Test Cases
### Case 1: Happy Path — GDD revision affects 2 stories and 1 epic
**Fixture:**
- `design/gdd/[system].md` exists and has been recently revised (git diff shows changes)
- `production/epics/[layer]/EPIC-[system].md` references this GDD
- 2 story files reference TR-IDs from this GDD
- The changed GDD section affects the acceptance criteria of both stories
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill reads the revised GDD and identifies what changed (git diff or content comparison)
2. Skill scans ADRs, TR-registry, epics, and stories for references to this GDD
3. Skill produces an impact report: 1 epic affected, 2 stories affected
4. Skill shows the proposed change for each artifact
5. For each artifact: asks "May I update [filepath]?" separately
6. Applies changes only after per-artifact approval
**Assertions:**
- [ ] Impact report identifies all 3 affected artifacts (1 epic + 2 stories)
- [ ] Each affected artifact's proposed change is shown before asking to write
- [ ] "May I write" is asked per artifact (not once for all artifacts)
- [ ] Skill does NOT apply any changes without per-artifact approval
- [ ] Verdict is COMPLETE after all approved changes are applied
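
The trace-then-approve flow in this case can be sketched as two small functions. This is an illustration under assumptions: matching references by plain substring, the `find_affected` and `apply_with_approval` names, and the `ask` callback are all hypothetical rather than taken from the skill.

```python
from typing import Callable, Dict, List


def find_affected(gdd_path: str, tr_ids: List[str], artifacts: Dict[str, str]) -> List[str]:
    """Return artifact paths whose text mentions the GDD path or one of its TR-IDs."""
    needles = [gdd_path, *tr_ids]
    return sorted(
        path for path, text in artifacts.items()
        if any(needle in text for needle in needles)
    )


def apply_with_approval(affected: List[str], ask: Callable[[str], bool]) -> List[str]:
    """Ask once per artifact; return only the paths the user approved."""
    return [path for path in affected if ask(f"May I update {path}?")]
```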
---
### Case 2: No Impact — Changed GDD has no downstream references
**Fixture:**
- `design/gdd/[system].md` exists and has been revised
- No ADRs, stories, or epics reference this GDD's TR-IDs or GDD path
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill reads the revised GDD
2. Skill scans all ADRs, stories, and epics for references
3. No references found
4. Skill outputs: "No downstream impact found for [system].md — no artifacts reference this GDD."
5. No write operations are performed
**Assertions:**
- [ ] Skill outputs the "No downstream impact found" message
- [ ] Verdict is NO IMPACT
- [ ] No "May I write" asks are issued (nothing to update)
- [ ] Skill does NOT error or crash when no references are found
---
### Case 3: In-Progress Story Warning — Referenced story is currently being developed
**Fixture:**
- A story referencing this GDD has `Status: In Progress`
- The developer has already started implementing this story
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill identifies the In Progress story as an affected artifact
2. Skill outputs an elevated warning: "CAUTION: [story-file] is currently In Progress — a developer may be working on this. Coordinate before updating."
3. The warning appears in the impact report before the "May I write" ask for that story
4. User can still approve or skip the update for that story
**Assertions:**
- [ ] In Progress story is flagged with an elevated warning (distinct from regular affected-artifact entries)
- [ ] Warning appears before the "May I write" ask for that story
- [ ] Skill still offers to update the story — the warning does not block the option
- [ ] Other (non-In-Progress) artifacts are not affected by this warning
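
The ordering requirement in this case (warning first, then the approval ask, and only for the In Progress story) might look like the sketch below. The `approval_prompts` helper is hypothetical, and it assumes the story file carries a literal `Status: In Progress` line.

```python
import re
from typing import Dict, List

# Assumption: the story records its status on a line of its own.
IN_PROGRESS = re.compile(r"^Status:\s*In Progress\s*$", re.MULTILINE)


def approval_prompts(stories: Dict[str, str]) -> List[str]:
    """Build the prompts in order, placing a CAUTION before In Progress stories."""
    prompts: List[str] = []
    for path in sorted(stories):
        if IN_PROGRESS.search(stories[path]):
            prompts.append(
                f"CAUTION: {path} is currently In Progress \u2014 a developer "
                "may be working on this. Coordinate before updating."
            )
        # The warning never removes the option to update the story.
        prompts.append(f"May I update {path}?")
    return prompts
```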
---
### Case 4: Edge Case — No argument provided
**Fixture:**
- Multiple GDDs exist in `design/gdd/`
**Input:** `/propagate-design-change` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Skill outputs a usage error: "No GDD specified. Usage: /propagate-design-change design/gdd/[system].md"
3. Skill lists recently modified GDDs as suggestions (git log)
4. No analysis is performed
**Assertions:**
- [ ] Skill outputs a usage error when no argument is given
- [ ] Usage example is shown with the correct path format
- [ ] No impact analysis is performed without a target GDD
- [ ] Skill does NOT silently pick a GDD without user input
---
### Case 5: Director Gate — No gate spawned regardless of review mode
**Fixture:**
- A GDD has been revised with downstream references
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/propagate-design-change design/gdd/[system].md`
**Expected behavior:**
1. Skill reads the GDD and traces downstream references
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Impact report is produced and per-artifact approval proceeds normally
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] Review mode has no effect on this skill's behavior
---
## Protocol Compliance
- [ ] Reads revised GDD and all potentially affected artifacts before producing impact report
- [ ] Impact report shown in full before any "May I write" ask
- [ ] "May I write" asked per artifact — never for the entire set at once
- [ ] In Progress stories flagged with elevated warning before their approval ask
- [ ] No director gates — no review-mode.txt read
- [ ] Ends with next-step handoff appropriate to verdict (COMPLETE or NO IMPACT)
---
## Coverage Notes
- ADR impact (when a GDD change requires an ADR update or new ADR) follows the
same per-artifact approval pattern as story/epic updates — not independently
fixture-tested.
- TR-registry impact (when changed GDD requires new or updated TR-IDs) is part
of the analysis phase but not independently fixture-tested.
- The git diff comparison method (detecting what changed in the GDD) is a runtime
concern — fixtures use pre-arranged content differences.

@@ -0,0 +1,209 @@
# Skill Test Spec: /story-done
## Skill Summary
`/story-done` closes the loop between design and implementation. Run at the
end of implementing a story, it reads the story file and verifies each
acceptance criterion against the implementation. It checks for GDD and ADR
deviations, prompts a code review, updates the story status to `Complete`,
logs any tech debt, and surfaces the next ready story from the sprint. It
produces a COMPLETE / COMPLETE WITH NOTES / BLOCKED verdict and writes to
the story file and optionally to `docs/tech-debt-register.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥5 phase headings (complex skill warranting `context: fork` if applicable)
- [ ] Contains verdict keywords: COMPLETE, COMPLETE WITH NOTES, BLOCKED
- [ ] Contains "May I write" collaborative protocol language (writes to story file and tech-debt register)
- [ ] Has a next-step handoff (surfaces next story from sprint)
---
## Test Cases
### Case 1: Happy Path — All acceptance criteria met, no deviations
**Fixture:**
- Story file at `production/epics/core/story-light-pickup.md` with:
- 3 acceptance criteria, all implemented as described
- `TR-ID: TR-light-001` referencing a GDD requirement
- `ADR: docs/architecture/adr-003-inventory.md` (Accepted)
- `Status: In Progress`
- Implementation files listed in story exist in `src/`
- GDD requirement text at TR-light-001 matches how the feature was implemented
- ADR guidance was followed (no deviations)
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the story file and extracts all key fields
2. Skill reads the GDD requirement fresh from `tr-registry.yaml` (not from story's quoted text)
3. Skill reads the referenced ADR to understand implementation constraints
4. Skill evaluates each acceptance criterion (auto where possible, manual prompt where not)
5. Skill checks for GDD requirement deviations
6. Skill checks for ADR guideline deviations
7. Skill prompts user: "Please provide the code review outcome for this story"
8. Skill presents COMPLETE verdict
9. Skill asks "May I update story Status to Complete and add Completion Notes?"
10. If yes: skill updates the story file
11. Skill surfaces the next `Ready for Dev` story from the sprint
**Assertions:**
- [ ] Skill reads `docs/architecture/tr-registry.yaml` for TR-ID requirement text (not just story)
- [ ] Skill reads the referenced ADR file (not just the story reference)
- [ ] Each acceptance criterion is listed with VERIFIED / DEFERRED / FAILED status
- [ ] Skill prompts the user for code review outcome (does not skip this step)
- [ ] Verdict is COMPLETE when all criteria are verified and no deviations exist
- [ ] Skill asks "May I write" before updating the story file
- [ ] Skill does NOT auto-update story status without user confirmation
- [ ] After completion, skill surfaces the next ready story from `production/sprints/`
---
### Case 2: Deferred Path — Acceptance criterion cannot be verified
**Fixture:**
- Story file has an acceptance criterion: "Player sees correct animation on pickup"
- No automated test for this criterion exists
- Manual verification has not been performed
- All other criteria are met
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill processes all acceptance criteria
2. Reaches the animation criterion — cannot auto-verify
3. Skill asks the user: "Acceptance criterion 'Player sees correct animation on
pickup' cannot be auto-verified. Has this been manually tested?"
4. If user says No: criterion is marked DEFERRED, verdict becomes COMPLETE WITH NOTES
5. Skill records the deferred criterion in completion notes
6. Asks "May I write updated story with deferred criterion noted?"
**Assertions:**
- [ ] Skill asks the user about unverifiable criteria rather than assuming PASS
- [ ] Deferred criteria result in COMPLETE WITH NOTES (not COMPLETE or BLOCKED)
- [ ] The deferred criterion is explicitly named in the completion notes
- [ ] Skill still asks "May I write" before updating the story file
---
### Case 3: Blocked Path — GDD deviation detected
**Fixture:**
- Story TR-ID points to requirement: "Player can carry max 3 light sources"
- Implementation in `src/` uses a variable `MAX_CARRIED_LIGHTS = 5`
- This is a deliberate deviation from the GDD
**Input:** `/story-done production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the GDD requirement text (max 3)
2. Skill detects discrepancy between requirement and implementation value (5)
3. Skill flags this as a GDD deviation and asks the user to classify it:
- INTENTIONAL: document the deviation and reason
- ERROR: implementation must be fixed before story can be marked Complete
- OUT OF SCOPE: requirement changed and GDD needs updating
4. If INTENTIONAL: skill records deviation in completion notes, verdict is COMPLETE WITH NOTES
5. If ERROR: verdict is BLOCKED until implementation is corrected
**Assertions:**
- [ ] Skill detects the mismatch between GDD requirement and implementation value
- [ ] Skill asks the user to classify the deviation (does not assume either way)
- [ ] INTENTIONAL deviation → COMPLETE WITH NOTES (not BLOCKED)
- [ ] ERROR deviation → BLOCKED verdict until fixed
- [ ] Detected deviations are recorded in completion notes or tech debt register
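
The verdict rules exercised by Cases 1 to 3 can be summarized in one function. This is a sketch, not the skill's logic: treating a FAILED criterion as BLOCKED is an assumption, and OUT OF SCOPE is rejected because the cases above do not state which verdict it produces.

```python
from typing import Iterable


def story_done_verdict(criteria: Iterable[str], deviations: Iterable[str]) -> str:
    """Combine criterion statuses and deviation classifications into a verdict."""
    criteria = list(criteria)
    deviations = list(deviations)
    unhandled = [d for d in deviations if d not in ("INTENTIONAL", "ERROR")]
    if unhandled:
        # The spec gives no verdict for other classifications (e.g. OUT OF SCOPE).
        raise ValueError(f"no verdict defined for: {unhandled}")
    # Assumption: a FAILED criterion blocks completion, like an ERROR deviation.
    if "FAILED" in criteria or "ERROR" in deviations:
        return "BLOCKED"
    if "DEFERRED" in criteria or "INTENTIONAL" in deviations:
        return "COMPLETE WITH NOTES"
    return "COMPLETE"
```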
---
### Case 4: Edge Case — No argument, auto-detect current story
**Fixture:**
- `production/session-state/active.md` contains a reference to
`production/epics/core/story-oxygen-drain.md` as the active story
- That story file exists with `Status: In Progress`
**Input:** `/story-done` (no argument)
**Expected behavior:**
1. Skill reads `production/session-state/active.md`
2. Skill finds the active story reference
3. Skill reads that story file and proceeds normally
4. Output confirms which story was auto-detected
**Assertions:**
- [ ] Skill reads `production/session-state/active.md` when no argument is given
- [ ] Skill identifies and confirms the auto-detected story before proceeding
- [ ] If no story is found in session state, skill asks the user to provide a path
---
### Case 5: Director Gate — LP-CODE-REVIEW behavior across review modes
**Fixture:**
- Story file at `production/epics/core/story-light-pickup.md`
- All acceptance criteria verified, no GDD deviations
- `production/session-state/review-mode.txt` exists
**Case 5a — full mode:**
- `review-mode.txt` contains `full`
**Input:** `/story-done production/epics/core/story-light-pickup.md` (full mode)
**Expected behavior:**
1. Skill reads review mode — determines `full`
2. After implementation verification, skill invokes LP-CODE-REVIEW gate
3. Lead programmer reviews the implementation
4. If LP verdict is NEEDS CHANGES → story cannot be marked Complete
5. If LP verdict is APPROVED → skill proceeds to mark story Complete
**Assertions (5a):**
- [ ] Skill reads review mode before deciding whether to invoke LP-CODE-REVIEW
- [ ] LP-CODE-REVIEW gate is invoked in full mode after implementation check
- [ ] An LP NEEDS CHANGES verdict prevents story from being marked Complete
- [ ] Gate result is noted in output: "Gate: LP-CODE-REVIEW — [result]"
- [ ] Skill still asks "May I write" before updating story status even if LP approved
**Case 5b — lean or solo mode:**
- `review-mode.txt` contains `lean` or `solo`
**Expected behavior:**
1. Skill reads review mode — determines `lean` or `solo`
2. LP-CODE-REVIEW gate is SKIPPED
3. Output notes the skip: "[LP-CODE-REVIEW] skipped — Lean/Solo mode"
4. Story completion proceeds based on acceptance criteria check only
**Assertions (5b):**
- [ ] LP-CODE-REVIEW gate does NOT spawn in lean or solo mode
- [ ] Skip is explicitly noted in output
- [ ] Skill still requires "May I write" approval before marking story Complete
---
## Protocol Compliance
- [ ] Uses "May I write" before updating the story file
- [ ] Uses "May I write" before adding entries to `docs/tech-debt-register.md`
- [ ] Presents complete findings (criteria check, deviation check) before asking approval
- [ ] Ends by surfacing the next ready story from the sprint plan
- [ ] Does not mark a story Complete if any criterion is FAILED or any deviation is classified as ERROR
- [ ] Does not skip the code review prompt
---
## Coverage Notes
- The full 8-phase flow of the skill is exercised across Cases 1-3; not all
edge cases within each phase are covered.
- Tech debt logging (deferred items written to `docs/tech-debt-register.md`)
is mentioned in Case 2 but is not the primary assertion focus; dedicated
coverage is deferred.
- The `sprint-status.yaml` update (Phase 7 in the skill) is implied by Case 1
but is not the primary assertion; it is assumed to follow the same "May I write" pattern.
- Stories with multiple TR-IDs or multiple ADRs are not explicitly tested.

@@ -0,0 +1,195 @@
# Skill Test Spec: /story-readiness
## Skill Summary
`/story-readiness` validates that a story file is ready for a developer to
pick up and implement. It checks four dimensions: Design (embedded GDD
requirements), Architecture (ADR references and status), Scope (clear
boundaries and DoD), and Definition of Done (testable criteria). It produces
a READY / NEEDS WORK / BLOCKED verdict. It is a read-only skill and runs
before any developer picks up a story.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered check sections
- [ ] Contains verdict keywords: READY, NEEDS WORK, BLOCKED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do after verdict)
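
A static check along these lines could be sketched as follows. This is not the `/skill-test static` implementation: the `static_check` helper is hypothetical, frontmatter is assumed to be a leading block delimited by `---` lines with `key: value` entries, and a "phase heading" is assumed to be a level 2 or 3 heading that starts with the word Phase.

```python
import re
from typing import List

REQUIRED_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]


def static_check(skill_text: str, verdict_keywords: List[str], min_phases: int = 2) -> List[str]:
    """Return a list of failure messages; an empty list means all checks pass."""
    failures: List[str] = []
    # Assumed frontmatter shape: a leading block delimited by '---' lines.
    match = re.match(r"\A---\n(.*?)\n---\n", skill_text, re.DOTALL)
    frontmatter = match.group(1) if match else ""
    keys = {line.split(":", 1)[0].strip() for line in frontmatter.splitlines() if ":" in line}
    for field in REQUIRED_FIELDS:
        if field not in keys:
            failures.append(f"missing frontmatter field: {field}")
    # Assumed heading convention: '## Phase ...' or '### Phase ...'.
    phases = re.findall(r"^#{2,3} Phase\b", skill_text, re.MULTILINE)
    if len(phases) < min_phases:
        failures.append(f"found {len(phases)} phase headings, need {min_phases}")
    for keyword in verdict_keywords:
        if keyword not in skill_text:
            failures.append(f"missing verdict keyword: {keyword}")
    return failures
```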
---
## Test Cases
### Case 1: Happy Path — Fully ready story
**Fixture:**
- Story file exists at `production/epics/core/story-light-pickup.md`
- Story contains:
- `TR-ID: TR-light-001` (GDD requirement reference)
- `ADR: docs/architecture/adr-003-inventory.md`
- Referenced ADR exists and has status `Accepted`
- Referenced TR-ID exists in `docs/architecture/tr-registry.yaml`
- Story has `## Acceptance Criteria` with ≥3 testable items
- Story has `## Definition of Done` section
- Story has `Status: Ready for Dev`
- Manifest version in story header matches current `docs/architecture/control-manifest.md`
**Input:** `/story-readiness production/epics/core/story-light-pickup.md`
**Expected behavior:**
1. Skill reads the story file
2. Skill reads the referenced ADR — verifies status is `Accepted`
3. Skill reads `docs/architecture/tr-registry.yaml` — verifies TR-ID exists
4. Skill reads `docs/architecture/control-manifest.md` — verifies manifest version matches
5. Skill evaluates all 4 dimensions (Design, Architecture, Scope, DoD)
6. Skill outputs READY verdict with all checks passing
**Assertions:**
- [ ] Skill reads the referenced ADR file (not just the story)
- [ ] Skill verifies ADR status is `Accepted` (not `Proposed`)
- [ ] Skill reads `tr-registry.yaml` to verify TR-ID exists
- [ ] Output includes check results for all 4 dimensions
- [ ] Verdict is READY when all checks pass
- [ ] Skill does not write any files
---
### Case 2: Blocked Path — Referenced ADR is Proposed (not Accepted)
**Fixture:**
- Story file exists with `ADR: docs/architecture/adr-005-light-system.md`
- `adr-005-light-system.md` exists but has `Status: Proposed`
- All other story content is otherwise complete
**Input:** `/story-readiness production/epics/core/story-light-system.md`
**Expected behavior:**
1. Skill reads the story
2. Skill reads `adr-005-light-system.md` — finds `Status: Proposed`
3. Skill flags this as a BLOCKING issue (cannot implement against unaccepted ADR)
4. Skill outputs BLOCKED verdict
5. Skill recommends: accept or reject the ADR before picking up the story
**Assertions:**
- [ ] Verdict is BLOCKED (not NEEDS WORK or READY) when ADR is Proposed
- [ ] Output explicitly names the Proposed ADR as the blocker
- [ ] Output recommends resolving ADR status before proceeding
- [ ] Skill does not output READY regardless of other checks passing
---
### Case 3: Needs Work — Missing Acceptance Criteria
**Fixture:**
- Story file exists but has no `## Acceptance Criteria` section
- ADR reference exists and is `Accepted`
- TR-ID exists in registry
- Manifest version matches
**Input:** `/story-readiness production/epics/core/story-oxygen-drain.md`
**Expected behavior:**
1. Skill reads the story
2. Skill finds no Acceptance Criteria section
3. Skill flags this as a NEEDS WORK issue (story is incomplete, not blocked)
4. Skill outputs NEEDS WORK verdict
5. Skill names the missing section and suggests adding measurable criteria
**Assertions:**
- [ ] Verdict is NEEDS WORK (not BLOCKED or READY) when Acceptance Criteria section is absent
- [ ] Output identifies the missing Acceptance Criteria section specifically
- [ ] Output suggests adding testable/measurable criteria
- [ ] Skill distinguishes NEEDS WORK (fixable without external dependencies) from BLOCKED (requires outside action)
---
### Case 4: Edge Case — Stale manifest version
**Fixture:**
- Story file has `Manifest Version: 2026-01-15` in its header
- `docs/architecture/control-manifest.md` has `Manifest Version: 2026-03-10`
- Versions do not match (story was created before manifest was updated)
**Input:** `/story-readiness production/epics/core/story-mirror-rotation.md`
**Expected behavior:**
1. Skill reads the story and extracts manifest version `2026-01-15`
2. Skill reads control manifest header and extracts current version `2026-03-10`
3. Skill detects version mismatch
4. Skill flags this as an ADVISORY issue (not blocking, but worth noting)
5. Verdict is NEEDS WORK with manifest staleness noted
**Assertions:**
- [ ] Skill reads `docs/architecture/control-manifest.md` to get current version
- [ ] Skill compares story's embedded manifest version against current manifest version
- [ ] Stale manifest version results in NEEDS WORK (not BLOCKED, not READY)
- [ ] Output explains that the story's embedded guidance may be outdated
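
Cases 1 to 4 together imply a three-level verdict rule, which the sketch below makes explicit. It is an illustration under assumptions: the `StoryFacts` fields and finding strings are hypothetical, and only the three conditions tested above are modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StoryFacts:
    adr_status: str                # e.g. "Accepted" or "Proposed"
    has_acceptance_criteria: bool
    story_manifest_version: str    # version embedded in the story header
    current_manifest_version: str  # version in control-manifest.md


def readiness_verdict(facts: StoryFacts) -> Tuple[str, List[str]]:
    """Return (verdict, findings) for the conditions covered by Cases 1-4."""
    findings: List[str] = []
    blocked = False
    if facts.adr_status != "Accepted":
        findings.append(f"BLOCKING: referenced ADR is {facts.adr_status}")
        blocked = True
    if not facts.has_acceptance_criteria:
        findings.append("NEEDS WORK: Acceptance Criteria section is missing")
    if facts.story_manifest_version != facts.current_manifest_version:
        findings.append("ADVISORY: story manifest version is stale")
    if blocked:
        return "BLOCKED", findings
    # Any remaining finding, including an advisory one, yields NEEDS WORK.
    return ("NEEDS WORK" if findings else "READY"), findings
```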
---
### Case 5: Director Gate — QL-STORY-READY behavior across review modes
**Fixture:**
- Story file exists and is READY (all 4 dimensions pass, ADR Accepted, criteria present)
- `production/session-state/review-mode.txt` exists
**Case 5a — full mode:**
- `review-mode.txt` contains `full`
**Input:** `/story-readiness production/epics/core/story-light-pickup.md` (full mode)
**Expected behavior:**
1. Skill reads review mode — determines `full`
2. After completing its own 4-dimension check, skill invokes QL-STORY-READY gate
3. QA lead reviews the story for readiness
4. If QA lead verdict is INADEQUATE → story verdict is BLOCKED regardless of 4-dimension result
5. If QA lead verdict is ADEQUATE → verdict proceeds normally
**Assertions (5a):**
- [ ] Skill reads review mode before deciding whether to invoke QL-STORY-READY
- [ ] QL-STORY-READY gate is invoked in full mode after the 4-dimension check completes
- [ ] A QA lead INADEQUATE verdict overrides a READY 4-dimension result → final verdict BLOCKED
- [ ] Gate invocation is noted in output: "Gate: QL-STORY-READY — [result]"
**Case 5b — lean or solo mode:**
- `review-mode.txt` contains `lean` or `solo`
**Expected behavior:**
1. Skill reads review mode — determines `lean` or `solo`
2. QL-STORY-READY gate is SKIPPED
3. Output notes the skip: "[QL-STORY-READY] skipped — Lean/Solo mode"
4. Verdict is based on 4-dimension check only
**Assertions (5b):**
- [ ] QL-STORY-READY gate does NOT spawn in lean or solo mode
- [ ] Skip is explicitly noted in output
- [ ] Verdict is based on 4-dimension check alone
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents complete check results before verdict
- [ ] Does not ask for approval (no file writes)
- [ ] Ends with recommended next step (fix issues or proceed to implementation)
- [ ] Distinguishes three verdict levels clearly (READY vs NEEDS WORK vs BLOCKED)
---
## Coverage Notes
- Case where TR-ID is missing from the registry entirely is not explicitly
tested here; it follows the same NEEDS WORK pattern as Case 3.
- The "no argument" path (skill auto-detecting the current story) is not
tested because it depends on `production/session-state/active.md` content,
which is hard to fixture reliably.
- Stories with multiple ADR references are not tested; behavior is assumed to
be additive (all ADRs must be Accepted for READY verdict).

@@ -0,0 +1,192 @@
# Skill Test Spec: /architecture-review
## Skill Summary
`/architecture-review` is an Opus-tier skill that validates a technical architecture
document against the project's 8 required architecture sections and checks that it
is internally consistent, non-contradictory with existing ADRs, and correctly
targeting the pinned engine version. It produces a verdict of APPROVED /
NEEDS REVISION / MAJOR REVISION NEEDED.
In `full` review mode, the skill spawns two director gate agents in parallel:
TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY (lead-programmer). In
`lean` or `solo` mode, both gates are skipped and noted. The skill is read-only —
no files are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff at the end
- [ ] Documents gate behavior: TD-ARCHITECTURE + LP-FEASIBILITY in full mode; skipped in lean/solo
---
## Director Gate Checks
In `full` mode: TD-ARCHITECTURE (technical-director) and LP-FEASIBILITY
(lead-programmer) are spawned in parallel after the skill reads the architecture doc.
In `lean` mode: both gates are skipped. Output notes:
"TD-ARCHITECTURE skipped — lean mode" and "LP-FEASIBILITY skipped — lean mode".
In `solo` mode: both gates are skipped with equivalent notes.
---
## Test Cases
### Case 1: Happy Path — Complete architecture doc in full mode
**Fixture:**
- `docs/architecture/architecture.md` exists with all 8 required sections populated
- All sections reference the correct engine version from `docs/engine-reference/`
- No contradictions with existing Accepted ADRs in `docs/architecture/`
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/architecture-review docs/architecture/architecture.md`
**Expected behavior:**
1. Skill reads the architecture document
2. Skill reads existing ADRs for cross-reference
3. Skill reads engine version reference
4. TD-ARCHITECTURE and LP-FEASIBILITY gate agents spawn in parallel
5. Both gates return APPROVED
6. Skill outputs section-by-section completeness check (8/8 sections present)
7. Verdict: APPROVED
**Assertions:**
- [ ] All 8 required sections are checked and reported
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel (not sequentially)
- [ ] Verdict is APPROVED when all sections are present and no conflicts exist
- [ ] Skill does NOT write any files
- [ ] Next-step handoff to `/create-control-manifest` or `/create-epics` is present
---
### Case 2: Failure Path — Missing required sections
**Fixture:**
- `docs/architecture/architecture.md` exists but is missing at least 2 required sections
(e.g., no data model section, no error handling section)
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/architecture-review docs/architecture/architecture.md`
**Expected behavior:**
1. Skill reads the document and identifies missing sections
2. Section completeness shows fewer than 8/8 sections present
3. Missing sections are listed by name with specific remediation guidance
4. Verdict: MAJOR REVISION NEEDED (≥2 missing sections)
**Assertions:**
- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) for ≥2 missing sections
- [ ] Each missing section is named explicitly in the output
- [ ] Remediation guidance is specific (what to add, not just "add missing sections")
- [ ] Skill does NOT pass a document missing required sections
---
### Case 3: Partial Path — Architecture contradicts an existing ADR
**Fixture:**
- `docs/architecture/architecture.md` exists with all 8 sections present
- One Accepted ADR in `docs/architecture/` establishes a constraint that the architecture doc contradicts
(e.g., ADR-001 mandates ECS pattern; architecture.md describes a different pattern for the same system)
**Input:** `/architecture-review docs/architecture/architecture.md`
**Expected behavior:**
1. Skill reads the architecture doc and all existing ADRs
2. Conflict is detected between the architecture doc and the named ADR
3. Conflict entry names: the ADR number/title, the contradicting sections, and impact
4. Verdict: NEEDS REVISION (conflict exists but structure is otherwise sound)
**Assertions:**
- [ ] Verdict is NEEDS REVISION (not MAJOR REVISION NEEDED for a single contradiction)
- [ ] The specific ADR number and title are named in the conflict entry
- [ ] The contradicting sections in both documents are identified
- [ ] Skill does NOT auto-resolve the contradiction
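
Cases 1 through 3 together imply a verdict ladder, which might be expressed roughly as follows. The function name and inputs are hypothetical. Two branches are assumptions rather than spec: the cases above do not say what happens with exactly one missing section or with several ADR conflicts, and this sketch maps both to NEEDS REVISION.

```python
from typing import Sequence


def architecture_verdict(
    missing_sections: Sequence[str],
    adr_conflicts: Sequence[str],
) -> str:
    """Hypothetical mapping from structural findings to a verdict."""
    if len(missing_sections) >= 2:
        # Case 2: two or more missing sections
        return "MAJOR REVISION NEEDED"
    if missing_sections or adr_conflicts:
        # Case 3: a contradiction with an Accepted ADR.
        # Assumption: a single missing section lands here as well.
        return "NEEDS REVISION"
    # Case 1: all sections present, no conflicts
    return "APPROVED"
```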
---
### Case 4: Edge Case — File not found
**Fixture:**
- The path provided does not exist in the project
**Input:** `/architecture-review docs/architecture/nonexistent.md`
**Expected behavior:**
1. Skill attempts to read the file
2. File not found
3. Skill outputs a clear error naming the missing file
4. Skill suggests checking `docs/architecture/` or running `/create-architecture`
5. Skill does NOT produce a verdict
**Assertions:**
- [ ] Skill outputs a clear error when the file is not found
- [ ] No verdict is produced (APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED)
- [ ] Skill suggests a corrective action
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — Full mode spawns both gates; solo mode skips both
**Fixture (full mode):**
- `docs/architecture/architecture.md` exists with all 8 sections
- `production/session-state/review-mode.txt` contains `full`
**Full mode expected behavior:**
1. TD-ARCHITECTURE gate spawns
2. LP-FEASIBILITY gate spawns in parallel with TD-ARCHITECTURE
3. Both gates complete before verdict is issued
**Assertions (full mode):**
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY both appear in the output as completed gates
- [ ] Both gates spawn in parallel (not one after the other)
- [ ] Verdict reflects gate feedback
**Fixture (solo mode):**
- Same architecture doc
- `production/session-state/review-mode.txt` contains `solo`
**Solo mode expected behavior:**
1. Skill reads the architecture doc
2. Gates are NOT spawned
3. Output notes: "TD-ARCHITECTURE skipped — solo mode" and "LP-FEASIBILITY skipped — solo mode"
4. Verdict is based on structural checks only
**Assertions (solo mode):**
- [ ] Neither TD-ARCHITECTURE nor LP-FEASIBILITY appears as an active gate
- [ ] Both skipped gates are noted in the output
- [ ] Verdict is still produced based on the structural check alone
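
To make the "parallel, not sequential" requirement concrete, here is a minimal sketch using `asyncio.gather`. It is not how the skill spawns agents: `run_gate` is a stand-in that always returns APPROVED, and the `SKIPPED` marker is an assumed representation of the skip notes described above.

```python
import asyncio
from typing import Dict, Tuple

GATES = ("TD-ARCHITECTURE", "LP-FEASIBILITY")


async def run_gate(gate_id: str) -> Tuple[str, str]:
    """Stand-in for a gate agent; a real gate would do review work here."""
    await asyncio.sleep(0)
    return gate_id, "APPROVED"


async def run_gates(review_mode: str) -> Dict[str, str]:
    if review_mode.strip().lower() != "full":
        # lean / solo: nothing is spawned, each gate is recorded as skipped
        return {gate: "SKIPPED" for gate in GATES}
    # Both coroutines are scheduled together and awaited as a group
    results = await asyncio.gather(*(run_gate(gate) for gate in GATES))
    return dict(results)


def gate_results(review_mode: str) -> Dict[str, str]:
    """Synchronous wrapper for callers outside an event loop."""
    return asyncio.run(run_gates(review_mode))
```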
---
## Protocol Compliance
- [ ] Does NOT write any files (read-only skill)
- [ ] Presents section completeness check before issuing verdict
- [ ] TD-ARCHITECTURE and LP-FEASIBILITY spawn in parallel in full mode
- [ ] Skipped gates are noted by name and mode in lean/solo output
- [ ] Verdict is one of exactly: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Ends with next-step handoff appropriate to verdict
---
## Coverage Notes
- The 8 required architecture sections are project-specific; tests use the
section list defined in the skill body — not re-enumerated here.
- Engine version compatibility checking (cross-referencing `docs/engine-reference/`)
is part of Case 1's happy path but not independently fixture-tested.
- RTM (requirement traceability matrix) mode is a separate concern covered by
the `/architecture-review` skill's own `rtm` argument mode, not tested here.


@@ -0,0 +1,170 @@
# Skill Test Spec: /design-review
## Skill Summary
`/design-review` reads a game design document (GDD) and evaluates it against
the project's 8-section design standard (Overview, Player Fantasy, Detailed
Rules, Formulas, Edge Cases, Dependencies, Tuning Knobs, Acceptance Criteria).
It checks for internal consistency, implementability, and cross-system
conflicts. It produces a verdict of APPROVED, NEEDS REVISION, or MAJOR
REVISION NEEDED. It is a read-only skill (no file writes) and runs as a
`context: fork` subagent.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered steps
- [ ] Contains verdict keywords: APPROVED, NEEDS REVISION, MAJOR REVISION NEEDED
- [ ] Does NOT require "May I write" language (read-only skill — `allowed-tools` excludes Write/Edit)
- [ ] Output format is documented (review template shown in skill body)
---
## Test Cases
### Case 1: Happy Path — Complete GDD, all 8 sections present
**Fixture:**
- `design/gdd/light-manipulation.md` exists (use `_fixtures/minimal-game-concept.md`
as a stand-in — represents a complete document with all required content)
- All 8 required sections are populated with substantive content
- Formulas section contains at least one formula with defined variables
- Acceptance Criteria section contains at least 3 testable criteria
**Input:** `/design-review design/gdd/light-manipulation.md`
**Expected behavior:**
1. Skill reads the target document in full
2. Skill reads CLAUDE.md for project context and standards
3. Skill evaluates all 8 required sections (present/absent check)
4. Skill checks internal consistency (formulas match described behavior)
5. Skill checks implementability (rules are precise enough to code)
6. Skill outputs structured review with section-by-section status
7. Skill outputs APPROVED verdict
**Assertions:**
- [ ] Skill reads the target file before producing any output
- [ ] Output includes a "Completeness" section showing X/8 sections present
- [ ] Output includes an "Internal Consistency" section
- [ ] Output includes an "Implementability" section
- [ ] Output ends with a verdict line: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED
- [ ] APPROVED verdict is given when all 8 sections are present and consistent
---
### Case 2: Failure Path — Incomplete GDD (4/8 sections)
**Fixture:**
- `design/gdd/light-manipulation.md` exists using content from
`tests/skills/_fixtures/incomplete-gdd.md` (4 of 8 sections populated;
Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria are missing)
**Input:** `/design-review design/gdd/light-manipulation.md`
**Expected behavior:**
1. Skill reads the document
2. Skill identifies 4 missing sections
3. Skill outputs "Completeness: 4/8 sections present"
4. Skill lists specifically which 4 sections are missing
5. Skill outputs MAJOR REVISION NEEDED verdict (not APPROVED or NEEDS REVISION)
**Assertions:**
- [ ] Output shows "4/8" in the completeness section (not a higher number)
- [ ] Output explicitly names each missing section (Formulas, Edge Cases, Tuning Knobs, Acceptance Criteria)
- [ ] Verdict is MAJOR REVISION NEEDED (not APPROVED or NEEDS REVISION) when ≥3 sections are missing
- [ ] Output does not suggest the document is implementation-ready
- [ ] Skill does not write any files (read-only enforcement)
---
### Case 3: Partial Path — 7/8 sections, minor inconsistency
**Fixture:**
- GDD has all sections except Formulas
- The described behavior mentions numeric values but no formulas are defined
- Acceptance Criteria exist but are vague ("feels good" rather than measurable)
**Input:** `/design-review design/gdd/[document].md`
**Expected behavior:**
1. Skill identifies missing Formulas section
2. Skill flags vague acceptance criteria as an implementability issue
3. Skill outputs NEEDS REVISION verdict (not APPROVED, not MAJOR REVISION NEEDED)
4. Skill provides specific remediation notes for each issue
**Assertions:**
- [ ] Verdict is NEEDS REVISION (not APPROVED, not MAJOR REVISION NEEDED) for 7/8 with issues
- [ ] Output identifies the missing Formulas section specifically
- [ ] Output flags the vague acceptance criteria as an implementability gap
- [ ] Each flagged issue has a specific, actionable remediation note
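
The completeness check and the verdict thresholds from Cases 1 through 3 could be sketched as below. The section names come from the Skill Summary; everything else is an assumption made for illustration: that each required section appears as a Markdown heading with exactly that title, that two missing sections map to NEEDS REVISION (the cases only cover one and three or more), and the helper names themselves.

```python
import re
from typing import List

REQUIRED_SECTIONS = [
    "Overview", "Player Fantasy", "Detailed Rules", "Formulas",
    "Edge Cases", "Dependencies", "Tuning Knobs", "Acceptance Criteria",
]

# Matches ATX headings such as "## Formulas" and captures the title
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(gdd_text: str) -> List[str]:
    """List required sections with no matching heading (assumed layout)."""
    found = {title.lower() for title in HEADING.findall(gdd_text)}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in found]


def completeness_line(missing: List[str]) -> str:
    present = len(REQUIRED_SECTIONS) - len(missing)
    return f"Completeness: {present}/{len(REQUIRED_SECTIONS)} sections present"


def design_verdict(missing: List[str], other_issues: int = 0) -> str:
    if len(missing) >= 3:
        return "MAJOR REVISION NEEDED"
    if missing or other_issues:
        # Assumption: one or two missing sections, or any flagged issue
        return "NEEDS REVISION"
    return "APPROVED"
```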
---
### Case 4: Edge Case — File not found
**Fixture:**
- The path provided does not exist in the project
**Input:** `/design-review design/gdd/nonexistent.md`
**Expected behavior:**
1. Skill attempts to read the file
2. File not found
3. Skill outputs an error message naming the missing file
4. Skill suggests checking the path or listing files in `design/gdd/`
5. Skill does NOT produce a verdict
**Assertions:**
- [ ] Skill outputs a clear error when the file is not found
- [ ] Skill does NOT output APPROVED, NEEDS REVISION, or MAJOR REVISION NEEDED when file is missing
- [ ] Skill suggests a corrective action (check path, list available GDDs)
---
### Case 5: Director Gate — no gate spawned regardless of review mode
**Fixture:**
- `design/gdd/light-manipulation.md` exists with all 8 sections
- `production/session-state/review-mode.txt` exists with `full` (the mode in which director gates normally spawn)
**Input:** `/design-review design/gdd/light-manipulation.md` (with full review mode active)
**Expected behavior:**
1. Skill reads the GDD document
2. Skill does NOT read `review-mode.txt` — this skill has no director gates
3. Skill produces the review output normally
4. No director gate agents are spawned at any point
5. Verdict is APPROVED (all 8 sections present in fixture)
**Assertions:**
- [ ] Skill does NOT spawn any director gate agent (CD-, TD-, PR-, AD- prefixed agents)
- [ ] Skill does NOT read `review-mode.txt` or equivalent mode file
- [ ] The `--review` flag or `full` mode state has NO effect on whether directors spawn
- [ ] Output does not contain any "Gate: [GATE-ID]" entries
- [ ] Skill IS the review — it does not delegate the review to a director
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents complete findings before any verdict
- [ ] Does not ask for approval before producing output (no writes to approve)
- [ ] Ends with recommended next step (e.g., fix issues and re-run, or proceed to `/map-systems`)
---
## Coverage Notes
- Cross-system consistency checking (Case 3 in the skill's own phase list) is
not directly tested here because it requires multiple GDD files to compare;
this is covered by the `/review-all-gdds` spec instead.
- The skill's `context: fork` behavior (running as a subagent) is not tested
at the spec level — this is a runtime behavior verified manually.
- Performance and edge cases involving very large GDD files are not in scope.


@@ -0,0 +1,178 @@
# Skill Test Spec: /review-all-gdds
## Skill Summary
`/review-all-gdds` is an Opus-tier skill that performs a holistic cross-GDD review
across all files in `design/gdd/`. It runs two complementary review phases in
parallel: Phase 1 checks for consistency (contradictions, formula mismatches,
stale references, competing ownership), and Phase 2 checks design theory (dominant
strategies, pillar drift, cognitive overload, economic imbalance). Because the two
phases are independent, they are spawned simultaneously to save time. The skill
produces a CONSISTENT / MINOR ISSUES / MAJOR ISSUES verdict and is read-only — no
files are written without explicit user approval.
The skill is itself the holistic review gate in the pipeline. It is invoked after
individual GDDs are complete and before architecture work begins. It does NOT spawn
any director gate agents (it IS the director-level review).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥5 phase headings (complex multi-phase skill)
- [ ] Contains verdict keywords: CONSISTENT, MINOR ISSUES, MAJOR ISSUES
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff at the end
- [ ] Documents parallel phase spawning (Phase 1 and Phase 2 are independent)
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. It IS the holistic
review; delegating to a director gate would create a circular dependency.
---
## Test Cases
### Case 1: Happy Path — Clean GDD set with no conflicts
**Fixture:**
- `design/gdd/` contains ≥3 system GDDs
- All GDDs are internally consistent: no formula contradictions, no competing ownership, no stale references
- All GDDs align with the pillars defined in `design/gdd/game-pillars.md`
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Skill reads all GDD files in `design/gdd/`
2. Phase 1 (consistency scan) and Phase 2 (design theory check) spawn in parallel
3. Phase 1 finds no contradictions, no formula mismatches, no ownership conflicts
4. Phase 2 finds no pillar drift, no dominant strategies, no cognitive overload
5. Skill outputs a structured findings table with 0 blocking issues
6. Verdict: CONSISTENT
**Assertions:**
- [ ] Both review phases are spawned in parallel (not sequentially)
- [ ] Output includes a findings table (even if empty — shows "No issues found")
- [ ] Verdict is CONSISTENT when no conflicts are found
- [ ] Skill does NOT write any files without user approval
- [ ] Next-step handoff to `/architecture-review` or `/create-architecture` is present
---
### Case 2: Failure Path — Conflicting rules between two GDDs
**Fixture:**
- GDD-A defines a floor value (e.g. "minimum [output] is [N]")
- GDD-B states a mechanic that bypasses that floor (e.g. "[mechanic] can reduce [output] to 0")
- The two GDDs are otherwise complete and valid
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Phase 1 (consistency scan) detects the contradiction between GDD-A and GDD-B
2. Conflict is reported with: both filenames, the specific conflicting rules, and severity HIGH
3. Verdict: MAJOR ISSUES
4. Handoff instructs user to resolve the conflict and re-run before proceeding
**Assertions:**
- [ ] Verdict is MAJOR ISSUES (not CONSISTENT or MINOR ISSUES)
- [ ] Both GDD filenames are named in the conflict entry
- [ ] The specific contradicting rules are quoted or described (not vague "conflict found")
- [ ] Issue is classified as severity HIGH (blocking)
- [ ] Skill does NOT auto-resolve the conflict
---
### Case 3: Partial Path — Single GDD with orphaned dependency reference
**Fixture:**
- GDD-A lists a dependency in its Dependencies section pointing to "system-B"
- No GDD for system-B exists in `design/gdd/`
- All other GDDs are consistent
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Phase 1 detects the orphaned dependency reference in GDD-A
2. Issue is reported as: DEPENDENCY GAP — GDD-A references system-B which has no GDD
3. No other conflicts found
4. Verdict: MINOR ISSUES (dependency gap is advisory, not blocking by itself)
**Assertions:**
- [ ] Verdict is MINOR ISSUES (not MAJOR ISSUES for a single orphaned reference)
- [ ] The specific GDD filename and the missing dependency name are reported
- [ ] Skill suggests running `/design-system system-B` to resolve the gap
- [ ] Skill does NOT skip or silently ignore the missing dependency
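
The orphaned-reference detection in Case 3, together with the severity-to-verdict mapping implied by Cases 1 through 3, might look like this in outline. The input shape (a mapping from each GDD to the systems it lists under Dependencies), the `ADVISORY` label, and the function names are all assumptions for the sake of the sketch.

```python
from typing import Dict, List, Sequence, Tuple


def find_dependency_gaps(
    dependencies: Dict[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Return (gdd, missing dependency) pairs where no GDD exists for it."""
    known = set(dependencies)
    return [
        (gdd, dep)
        for gdd, deps in sorted(dependencies.items())
        for dep in deps
        if dep not in known
    ]


def overall_verdict(severities: Sequence[str]) -> str:
    """Hypothetical roll-up: any HIGH finding blocks, anything else is minor."""
    if "HIGH" in severities:
        return "MAJOR ISSUES"
    if severities:
        return "MINOR ISSUES"
    return "CONSISTENT"
```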
---
### Case 4: Edge Case — No GDD files found
**Fixture:**
- `design/gdd/` directory is empty or does not exist
- No GDD files are present
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Skill attempts to read files in `design/gdd/`
2. No files found — skill outputs an error with guidance
3. Skill recommends running `/brainstorm` and `/design-system` before re-running
4. Skill does NOT produce a verdict (CONSISTENT / MINOR ISSUES / MAJOR ISSUES)
**Assertions:**
- [ ] Skill outputs a clear error message when no GDDs are found
- [ ] No verdict is produced when the directory is empty
- [ ] Skill recommends the correct next action (`/brainstorm` or `/design-system`)
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — No gate spawned regardless of review mode
**Fixture:**
- `design/gdd/` contains ≥2 consistent system GDDs
- `production/session-state/review-mode.txt` exists with content `full`
**Input:** `/review-all-gdds`
**Expected behavior:**
1. Skill reads all GDDs and runs the two review phases
2. Skill does NOT read `review-mode.txt`
3. Skill does NOT spawn any director gate agent (CD-, TD-, PR-, AD- prefixed)
4. Skill completes and outputs its verdict normally
5. Review mode setting has no effect on this skill's behavior
**Assertions:**
- [ ] No director gate agents are spawned at any point
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output does not contain any "Gate: [GATE-ID]" or "skipped" gate entries
- [ ] The skill produces a verdict regardless of review mode
- [ ] R4 metric: gate count for this skill = 0 in all modes
---
## Protocol Compliance
- [ ] Phase 1 (consistency) and Phase 2 (design theory) spawned in parallel — not sequentially
- [ ] Does NOT write any files without "May I write" approval
- [ ] Findings table shown before any write ask
- [ ] Verdict is one of exactly: CONSISTENT, MINOR ISSUES, MAJOR ISSUES
- [ ] Ends with appropriate handoff: MAJOR ISSUES → fix and re-run; MINOR ISSUES → may proceed with awareness; CONSISTENT → `/create-architecture`
---
## Coverage Notes
- Economic balance analysis (source/sink loops) requires cross-GDD resource data — covered
structurally by Case 2 (the conflict detection pattern is the same).
- The design theory (Phase 2) checks, including dominant strategy detection and
cognitive overload, are not individually fixture-tested; they follow the same
pattern as the consistency checks and are validated via the pillar drift case structure.
- The `since-last-review` scoping mode is not tested here — it is a runtime concern.


@@ -0,0 +1,169 @@
# Skill Test Spec: /changelog
## Skill Summary
`/changelog` is a Haiku-tier skill that auto-generates a developer-facing
changelog by reading git commit history and closed sprint stories since the
last release tag. It organizes entries into features, fixes, and known issues.
No director gates are used. The skill asks "May I write to `docs/CHANGELOG.md`?"
before persisting. Verdict is always COMPLETE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language (skill writes changelog)
- [ ] Has a next-step handoff (e.g., run /patch-notes for player-facing version)
---
## Director Gate Checks
None. Changelog generation is a fast compilation task; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Multiple sprints since last release tag
**Fixture:**
- Git history has a tag `v0.3.0` three sprints ago
- Since that tag: 12 commits across sprints 006, 007, 008
- Sprint story files reference task IDs matching commit messages
- `docs/CHANGELOG.md` does not yet exist
**Input:** `/changelog`
**Expected behavior:**
1. Skill reads git log since `v0.3.0` tag
2. Skill reads sprint stories to cross-reference task IDs
3. Skill compiles entries into Features, Fixes, and Known Issues sections
4. Skill presents draft to user
5. Skill asks "May I write to `docs/CHANGELOG.md`?"
6. User approves; file written; verdict COMPLETE
**Assertions:**
- [ ] Changelog covers commits since the most recent git tag
- [ ] Entries are organized into Features / Fixes / Known Issues sections
- [ ] Sprint story references are used to enrich commit descriptions
- [ ] "May I write" prompt appears before file write
- [ ] Verdict is COMPLETE after write
---
### Case 2: No Git Tags Found — All commits used, version baseline noted
**Fixture:**
- Git repository has commits but no tags exist
- 20 commits in history across 3 sprints
**Input:** `/changelog`
**Expected behavior:**
1. Skill checks for git tags — finds none
2. Skill uses all commits in history as the baseline
3. Skill notes in the output: "No version tag found — using full commit history; version baseline is unset"
4. Skill still compiles organized changelog from available commits
5. Skill asks "May I write" and writes on approval
**Assertions:**
- [ ] Skill does not error when no git tags exist
- [ ] Output explicitly notes that no version baseline was found
- [ ] Full commit history is used as the source
- [ ] Changelog is still organized into sections despite missing tag
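
One way the tag-or-full-history choice from Cases 1 and 2 could translate into a `git log` invocation is sketched below. `git_log_args` is a hypothetical helper and the output format is an arbitrary choice; it assumes the caller has already looked up the latest tag (for example with `git describe --tags --abbrev=0`) and passes `None` when no tag exists.

```python
from typing import List, Optional


def git_log_args(last_tag: Optional[str]) -> List[str]:
    """Build a git log command: date and subject, tab separated."""
    args = ["git", "log", "--date=short", "--pretty=format:%ad%x09%s"]
    if last_tag:
        # Only commits reachable from HEAD but not from the tag
        args.append(f"{last_tag}..HEAD")
    # With no tag, the full history is used as the baseline
    return args
```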
---
### Case 3: Commit Messages Without Task IDs — Grouped by date with note
**Fixture:**
- Git log since last tag has 8 commits
- 5 commits have no task ID in the message (e.g., "fix typo", "tweak values")
- 3 commits reference task IDs matching sprint stories
**Input:** `/changelog`
**Expected behavior:**
1. Skill reads commits and sprint stories
2. 3 commits are matched to sprint stories and placed in appropriate sections
3. 5 untagged commits are grouped by date under a "Misc" or "Other Changes" section
4. Output notes: "5 commits without task IDs — grouped by date"
5. Skill writes changelog on approval
**Assertions:**
- [ ] Commits with task IDs are placed in appropriate sections (Features or Fixes)
- [ ] Commits without task IDs are grouped separately with a note
- [ ] Output flags the number of commits missing task references
- [ ] No commits are silently dropped from the changelog
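
The split described in Case 3 can be sketched as a partition that keeps every commit in one of two buckets. The task ID pattern here (`PROJ-12` style) is purely an assumption, since the spec does not define the ID format, and `partition_commits` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed ID shape: uppercase prefix, hyphen, number (e.g. PROJ-12)
TASK_ID = re.compile(r"\b[A-Z]+-\d+\b")

Commit = Tuple[str, str]  # (date, message)


def partition_commits(
    commits: List[Commit],
) -> Tuple[List[Commit], Dict[str, List[str]]]:
    """Separate commits carrying a task ID from those that need date grouping."""
    linked: List[Commit] = []
    by_date: Dict[str, List[str]] = defaultdict(list)
    for date, message in commits:
        if TASK_ID.search(message):
            linked.append((date, message))
        else:
            by_date[date].append(message)
    # Every input commit ends up in exactly one bucket, so none are dropped
    return linked, dict(by_date)
```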
---
### Case 4: Existing CHANGELOG.md — New section prepended, old entries preserved
**Fixture:**
- `docs/CHANGELOG.md` already exists with sections for `v0.2.0` and `v0.3.0`
- New commits exist since `v0.3.0` tag
**Input:** `/changelog`
**Expected behavior:**
1. Skill detects that `docs/CHANGELOG.md` already exists
2. Skill compiles new entries for the period since `v0.3.0`
3. Skill presents draft with new section prepended above existing content
4. Skill asks "May I write to `docs/CHANGELOG.md`?" (confirming prepend strategy)
5. User approves; new content is prepended, old entries intact; verdict COMPLETE
**Assertions:**
- [ ] Skill reads existing changelog before writing to detect prior content
- [ ] New section is prepended (not appended or overwriting) existing entries
- [ ] Old changelog entries for v0.2.0 and v0.3.0 are preserved in the written file
- [ ] "May I write" prompt reflects the prepend operation
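
The prepend strategy in Case 4 amounts to placing the new section ahead of the untouched existing text. A minimal sketch, assuming the changelog is handled as a single string and that sections are separated by one blank line (`prepend_section` is a hypothetical helper):

```python
def prepend_section(existing: str, new_section: str) -> str:
    """Return changelog text with new_section placed above existing content."""
    new_block = new_section.rstrip("\n") + "\n"
    if not existing.strip():
        # No prior changelog: the new section is the whole file
        return new_block
    # Existing entries are kept verbatim below the new section
    return new_block + "\n" + existing.lstrip("\n")
```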
---
### Case 5: Gate Compliance — No gate; read-then-write with approval
**Fixture:**
- Git history has commits since last tag
- `review-mode.txt` contains `full`
**Input:** `/changelog`
**Expected behavior:**
1. Skill compiles changelog in full mode
2. No director gate is invoked (changelog generation is compilation, not a delivery gate)
3. Skill runs on Haiku model — fast compilation
4. Skill asks user for approval and writes file on confirmation
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output does not reference any gate result
- [ ] Skill proceeds directly from compilation to "May I write" prompt
- [ ] Verdict is COMPLETE
---
## Protocol Compliance
- [ ] Reads git log and sprint story files before compiling
- [ ] Always asks "May I write" before writing changelog
- [ ] No director gates are invoked
- [ ] Verdict is always COMPLETE
- [ ] Runs on Haiku model tier (fast, low-cost)
---
## Coverage Notes
- The case where git is not initialized in the repository is not tested;
behavior would depend on git command failure handling.
- Merge commits vs. squash commits are not explicitly differentiated in
these tests; implementation detail of the git log parsing phase.
- The `/patch-notes` skill should be run after `/changelog` for player-facing
output; that handoff is verified in the patch-notes spec.


@@ -0,0 +1,171 @@
# Skill Test Spec: /milestone-review
## Skill Summary
`/milestone-review` generates a comprehensive review of a completed milestone:
what shipped, velocity metrics, deferred items, risks surfaced, and retrospective
seeds. In full mode the PR-MILESTONE director gate runs after the review is
compiled (producer reviews scope delivery). In lean and solo modes the gate is
skipped. The skill asks "May I write to `production/milestones/review-milestone-N.md`?"
before persisting. Verdicts: MILESTONE COMPLETE or MILESTONE INCOMPLETE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: MILESTONE COMPLETE, MILESTONE INCOMPLETE
- [ ] Contains "May I write" language (skill writes review document)
- [ ] Has a next-step handoff (what to do after review is written)
---
## Director Gate Checks
| Gate ID      | Trigger condition              | Mode guard                |
|--------------|--------------------------------|---------------------------|
| PR-MILESTONE | After review document compiled | full only (not lean/solo) |
---
## Test Cases
### Case 1: Happy Path — Nearly complete milestone with one deferred story
**Fixture:**
- `production/milestones/milestone-03.md` exists with 8 stories
- 7 stories have `Status: Complete`
- 1 story has `Status: Deferred` (deferred to milestone-04)
- `review-mode.txt` contains `full`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill reads `milestone-03.md` and all referenced sprint files
2. Skill compiles: 7 shipped, 1 deferred; velocity; no blockers
3. Skill presents review draft to user
4. PR-MILESTONE gate invoked; producer approves
5. Skill asks "May I write to `production/milestones/review-milestone-03.md`?"
6. User approves; file is written; verdict MILESTONE COMPLETE
**Assertions:**
- [ ] Deferred story is noted in the review with its target milestone
- [ ] Verdict is MILESTONE COMPLETE despite the one deferred story
- [ ] PR-MILESTONE gate is invoked after draft compilation in full mode
- [ ] Skill asks "May I write" before writing review file
- [ ] Review document path matches `production/milestones/review-milestone-03.md`
---
### Case 2: Blocked Milestone — Multiple blocked stories
**Fixture:**
- `production/milestones/milestone-03.md` exists with 5 stories
- 2 stories have `Status: Complete`
- 3 stories have `Status: Blocked` (named blockers listed in each story)
- `review-mode.txt` contains `full`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill reads milestone and sprint files
2. Skill finds 3 blocked stories; compiles blocker details
3. Verdict is MILESTONE INCOMPLETE
4. PR-MILESTONE gate runs; producer notes the unresolved blockers
5. Review is written with blocker list on approval
**Assertions:**
- [ ] Verdict is MILESTONE INCOMPLETE when any stories are Blocked
- [ ] Each blocked story's name and blocker reason is listed in the review
- [ ] PR-MILESTONE gate is still invoked in full mode even for INCOMPLETE verdict
- [ ] "May I write" prompt still appears before file write
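
Cases 1 and 2 suggest a simple rule for the verdict: Complete and Deferred stories count toward completion, while Blocked stories prevent it. The sketch below assumes each story records its state on a plain `Status: <value>` line, treats any other status as not done, and follows the Coverage Notes in returning MILESTONE INCOMPLETE for a milestone with no stories. The helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed story format: a line of the form "Status: Complete"
STATUS_LINE = re.compile(
    r"^[ \t]*Status:[ \t]*(.+?)[ \t]*$", re.MULTILINE | re.IGNORECASE
)


def tally_statuses(milestone_text: str) -> Counter:
    """Count stories by status, normalising case (e.g. BLOCKED -> Blocked)."""
    return Counter(s.capitalize() for s in STATUS_LINE.findall(milestone_text))


def milestone_verdict(tally: Counter) -> str:
    total = sum(tally.values())
    if total == 0:
        return "MILESTONE INCOMPLETE"
    # Deferred stories do not block completion (Case 1)
    settled = tally["Complete"] + tally["Deferred"]
    return "MILESTONE COMPLETE" if settled == total else "MILESTONE INCOMPLETE"
```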
---
### Case 3: Full Mode — PR-MILESTONE returns CONCERNS
**Fixture:**
- Milestone-03 has 6 complete stories but 2 were not in the original scope (added mid-sprint)
- `review-mode.txt` contains `full`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill compiles review; notes 2 out-of-scope stories shipped
2. PR-MILESTONE gate invoked; producer returns CONCERNS about scope drift
3. Skill surfaces the CONCERNS to the user and adds a "scope drift" note to the review
4. User approves revised review; file written as MILESTONE COMPLETE with caveat
**Assertions:**
- [ ] CONCERNS from PR-MILESTONE gate are shown to user before write
- [ ] Scope drift is explicitly noted in the written review document
- [ ] Verdict is MILESTONE COMPLETE (stories shipped) with CONCERNS annotation
- [ ] Skill does not suppress gate feedback
---
### Case 4: Edge Case — No milestone file found for specified milestone
**Fixture:**
- User calls `/milestone-review milestone-07`
- `production/milestones/milestone-07.md` does NOT exist
**Input:** `/milestone-review milestone-07`
**Expected behavior:**
1. Skill attempts to read `production/milestones/milestone-07.md`
2. File not found; skill outputs an error message
3. Skill suggests checking available milestones in `production/milestones/`
4. No gate is invoked; no file is written
**Assertions:**
- [ ] Skill does not crash when milestone file is absent
- [ ] Output names the expected file path in the error message
- [ ] Output suggests checking `production/milestones/` for valid milestone names
- [ ] Verdict is BLOCKED (cannot review a non-existent milestone)
---
### Case 5: Lean/Solo Mode — PR-MILESTONE gate skipped
**Fixture:**
- `production/milestones/milestone-03.md` exists with 5 complete stories
- `review-mode.txt` contains `solo`
**Input:** `/milestone-review milestone-03`
**Expected behavior:**
1. Skill reads review mode — determines `solo`
2. Skill compiles review draft
3. PR-MILESTONE gate is skipped; output notes "[PR-MILESTONE] skipped — Solo mode"
4. Skill asks user for direct approval of the review
5. User approves; review file is written; verdict MILESTONE COMPLETE
**Assertions:**
- [ ] PR-MILESTONE gate is NOT invoked in solo (or lean) mode
- [ ] Skip is explicitly noted in skill output
- [ ] User direct approval is still required before write
- [ ] Verdict is MILESTONE COMPLETE after successful write
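
The mode guard exercised by Cases 2, 3, and 5 can be sketched as a small decision function. This is a minimal illustration, not the skill's actual implementation: the name `gate_decision` and the handling of unknown modes are assumptions, and only the skip-message wording comes from the spec.

```python
from typing import Optional, Tuple

GATED_MODES = {"full"}         # director gate runs
SKIP_MODES = {"lean", "solo"}  # gate skipped; user approval is still required


def gate_decision(gate_id: str, review_mode: str) -> Tuple[bool, Optional[str]]:
    """Return (invoke_gate, skip_note); skip_note is None when the gate runs."""
    mode = review_mode.strip().lower()
    if mode in GATED_MODES:
        return True, None
    if mode in SKIP_MODES:
        # \u2014 is the dash used in the spec's skip message.
        return False, f"[{gate_id}] skipped \u2014 {mode.capitalize()} mode"
    # Assumption: unknown modes are rejected; the spec does not cover them.
    raise ValueError(f"unknown review mode: {review_mode!r}")
```

A skipped gate only changes who reviews the draft; the "May I write" prompt still has to appear before the write in every mode.
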
---
## Protocol Compliance
- [ ] Shows compiled review draft before invoking PR-MILESTONE or asking to write
- [ ] Always asks "May I write" before writing review document
- [ ] PR-MILESTONE gate only runs in full mode
- [ ] Skip message appears in lean and solo output
- [ ] Verdict is MILESTONE COMPLETE or MILESTONE INCOMPLETE (BLOCKED only when the milestone file is missing, as in Case 4), stated clearly
---
## Coverage Notes
- The case where the milestone has zero stories is not tested; it follows the
MILESTONE INCOMPLETE pattern with a note suggesting the milestone may not
have been planned.
- Velocity calculation specifics (story points vs. story count) are not
verified here; they are implementation details of the review compilation phase.

@@ -0,0 +1,170 @@
# Skill Test Spec: /patch-notes
## Skill Summary
`/patch-notes` is a Haiku-tier skill that generates player-facing patch notes
from existing changelog content, stripping internal task IDs and technical
jargon in favor of plain language. It filters entries to only those relevant
to players (visible features and bug fixes; internal refactors are excluded).
No director gates are used. The skill asks "May I write to
`docs/patch-notes-vX.X.md`?" before persisting. Verdict is COMPLETE once the
notes are written, or BLOCKED when no changelog exists (Case 2).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language (skill writes patch notes file)
- [ ] Has a next-step handoff (e.g., share with community manager)
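
As an illustration of the first static assertion, the check below looks for the five required frontmatter keys. It is a sketch under assumptions: it treats frontmatter as a `---`-delimited block of top-level `key: value` lines at the top of the skill file, and the function name is hypothetical. How `/skill-test static` actually performs this check is not stated here.

```python
from typing import List

REQUIRED = ("name", "description", "argument-hint", "user-invocable", "allowed-tools")


def missing_frontmatter_fields(skill_md: str) -> List[str]:
    """Return the required keys that are absent from the leading frontmatter."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(REQUIRED)  # no frontmatter block at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        # Only top-level keys count; indented or comment lines are ignored.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return [name for name in REQUIRED if name not in keys]
```

An empty result means the assertion passes; keys that appear only in the body of the file are not counted.
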
---
## Director Gate Checks
None. Patch notes generation is a fast compilation task; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Changelog filtered to player-facing entries
**Fixture:**
- `docs/CHANGELOG.md` exists with 5 entries:
- "Add dual-wield melee system" (Features — player-facing)
- "Fix crash on level transition" (Fixes — player-facing)
- "Add enemy patrol AI" (Features — player-facing)
- "Refactor input handler to use event bus" (Fixes — internal only)
- "Update dependency: Godot 4.6" (internal only)
- Version is `v0.4.0`
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill reads `docs/CHANGELOG.md`
2. Skill filters to 3 player-facing entries; excludes 2 internal entries
3. Skill rewrites entries in plain language (no task IDs, no tech jargon)
4. Skill presents draft to user
5. Skill asks "May I write to `docs/patch-notes-v0.4.0.md`?"
6. User approves; file written; verdict COMPLETE
**Assertions:**
- [ ] Only 3 entries appear in the patch notes (2 internal entries excluded)
- [ ] Entries are written in plain language without internal task IDs
- [ ] File path matches `docs/patch-notes-v0.4.0.md`
- [ ] "May I write" prompt appears before file write
- [ ] Verdict is COMPLETE after write
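
The filtering step in this case can be sketched as follows. The sketch assumes the player-facing decision has already been made for each entry, and it uses a hypothetical task-ID shape (`GAME-142`), since the spec does not define what internal IDs look like. Rewriting into plain language is left to the skill.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical task-ID shape such as "GAME-142"; the spec does not define one.
TASK_ID = re.compile(r"\b[A-Z]{2,}-\d+\b[:\s]*")


@dataclass
class ChangelogEntry:
    text: str
    player_facing: bool  # assumed to be decided upstream by the skill


def draft_patch_notes(entries: List[ChangelogEntry]) -> List[str]:
    """Keep player-facing entries and strip task IDs from their text."""
    return [TASK_ID.sub("", e.text).strip() for e in entries if e.player_facing]
```

With the five fixture entries above, the two internal entries drop out and three lines remain, which is what the first assertion checks.
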
---
### Case 2: No Changelog Found — Directed to run /changelog first
**Fixture:**
- `docs/CHANGELOG.md` does NOT exist
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill attempts to read `docs/CHANGELOG.md` — not found
2. Skill outputs: "No changelog found — run /changelog first to generate one"
3. No patch notes are generated; no file is written
**Assertions:**
- [ ] Skill does not crash when changelog is absent
- [ ] Output explicitly directs user to run `/changelog`
- [ ] No "May I write" prompt appears (nothing to write)
- [ ] Verdict is BLOCKED (dependency not met)
---
### Case 3: Tone Guidance from Design Folder — Incorporated into output
**Fixture:**
- `docs/CHANGELOG.md` exists with player-facing entries
- `design/community/tone-guide.md` exists with guidance: "upbeat, encouraging tone; avoid passive voice"
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill reads changelog
2. Skill detects tone guide at `design/community/tone-guide.md`
3. Skill applies tone guidance when rewriting entries in plain language
4. Patch notes use upbeat, active-voice phrasing
5. Skill presents draft, asks to write, writes on approval
**Assertions:**
- [ ] Skill checks `design/` for a community or tone guidance file
- [ ] Tone guide content influences phrasing of patch note entries
- [ ] Output reflects active voice and upbeat tone where applicable
- [ ] Skill notes that tone guidance was applied
---
### Case 4: Patch Note Template Exists — Used instead of generated structure
**Fixture:**
- `.claude/docs/templates/patch-notes-template.md` exists with a structured header format
- `docs/CHANGELOG.md` exists with player-facing entries
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill reads changelog and detects template exists
2. Skill populates the template with player-facing entries
3. Template header/footer structure is preserved in the output
4. Skill asks "May I write" and writes on approval
**Assertions:**
- [ ] Skill checks for a patch notes template before generating from scratch
- [ ] Template structure is used when found (not overridden by default format)
- [ ] Player-facing entries are inserted into the correct template section
- [ ] Output note confirms template was used
---
### Case 5: Gate Compliance — No gate; community-manager is separate
**Fixture:**
- `docs/CHANGELOG.md` exists with player-facing entries
- `review-mode.txt` contains `full`
**Input:** `/patch-notes v0.4.0`
**Expected behavior:**
1. Skill compiles patch notes in full mode
2. No director gate is invoked (community review is a separate, manual step)
3. Skill runs on Haiku model — fast compilation
4. Skill notes in output: "Consider sharing draft with community manager before publishing"
5. Skill asks user for approval and writes on confirmation
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output suggests (but does not require) community manager review
- [ ] Skill proceeds directly from compilation to "May I write" prompt
- [ ] Verdict is COMPLETE
---
## Protocol Compliance
- [ ] Reads `docs/CHANGELOG.md` before generating patch notes
- [ ] Filters entries to player-facing items only
- [ ] Rewrites entries in plain language without internal IDs
- [ ] Always asks "May I write" before writing patch notes file
- [ ] No director gates are invoked
- [ ] Runs on Haiku model tier (fast, low-cost)
---
## Coverage Notes
- The case where all changelog entries are internal (zero player-facing items)
is not tested; behavior is an empty patch notes draft with a warning.
- Version number parsing from the changelog header is an implementation detail
not verified here.
- The community manager consultation noted in Case 5 is advisory; a separate
skill or manual review handles that step.

@@ -0,0 +1,169 @@
# Skill Test Spec: /retrospective
## Skill Summary
`/retrospective` generates a structured sprint or milestone retrospective
covering three categories: what went well, what didn't, and action items.
It reads sprint files and session logs to compile observations, then produces
a retrospective document. No director gates are used — retrospectives are
team self-reflection artifacts. The skill asks "May I write to
`production/retrospectives/retro-sprint-NNN.md`?" before persisting.
Verdict is always COMPLETE (retrospective is structured output, not a pass/fail
assessment).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" language (skill writes retrospective document)
- [ ] Has a next-step handoff (what to do after retrospective is written)
---
## Director Gate Checks
None. Retrospectives are team self-reflection documents; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Sprint with mixed outcomes
**Fixture:**
- `production/sprints/sprint-005.md` exists with 6 stories (4 Complete, 1 Blocked, 1 Deferred)
- `production/session-logs/` contains log entries for the sprint period
- No prior retrospective exists for sprint-005
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill reads sprint-005 and session logs
2. Skill compiles three retrospective categories: went well (4 stories shipped),
didn't (1 blocked, 1 deferred), and action items (address blocker root cause)
3. Skill presents retrospective draft to user
4. Skill asks "May I write to `production/retrospectives/retro-sprint-005.md`?"
5. User approves; file is written; verdict COMPLETE
**Assertions:**
- [ ] Retrospective contains all three categories (went well / didn't / actions)
- [ ] Blocked and deferred stories appear in the "what didn't" section
- [ ] At least one action item is generated from the blocked story
- [ ] Skill asks "May I write" before writing file
- [ ] Verdict is COMPLETE after successful write
---
### Case 2: No Sprint Data — Manual input fallback
**Fixture:**
- User calls `/retrospective sprint-009`
- `production/sprints/sprint-009.md` does NOT exist
- No session logs reference sprint-009
**Input:** `/retrospective sprint-009`
**Expected behavior:**
1. Skill attempts to read sprint-009 — not found
2. Skill informs user that no sprint data was found for sprint-009
3. Skill prompts user to provide retrospective input manually (went well, didn't, actions)
4. User provides input; skill formats it into the retrospective structure
5. Skill asks "May I write" and writes the document on approval
**Assertions:**
- [ ] Skill does not crash or produce an empty document when sprint file is absent
- [ ] User is prompted to provide manual input
- [ ] Manual input is formatted into the three-category structure
- [ ] "May I write" prompt still appears before file write
---
### Case 3: Prior Retrospective Exists — Offer to append or replace
**Fixture:**
- `production/retrospectives/retro-sprint-005.md` already exists with content
- User re-runs `/retrospective sprint-005` after changes
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill detects that `retro-sprint-005.md` already exists
2. Skill presents user with choice: append new observations or replace existing file
3. User selects "replace"; skill compiles fresh retrospective
4. Skill asks "May I write to `production/retrospectives/retro-sprint-005.md`?" (confirming overwrite)
5. File is overwritten; verdict COMPLETE
**Assertions:**
- [ ] Skill checks for existing retrospective file before compiling
- [ ] User is offered append or replace choice — not silently overwritten
- [ ] "May I write" prompt reflects the overwrite scenario
- [ ] Verdict is COMPLETE after write regardless of append vs. replace
---
### Case 4: Edge Case — Unresolved action items from previous retrospective
**Fixture:**
- `production/retrospectives/retro-sprint-004.md` exists with 2 action items marked `[ ]` (not done)
- User runs `/retrospective sprint-005`
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill reads the most recent prior retrospective (retro-sprint-004)
2. Skill detects 2 unchecked action items from sprint-004
3. Skill includes a "Carry-over from Sprint 004" section in the new retrospective
4. The unresolved items are listed with a note that they were not followed up
**Assertions:**
- [ ] Skill reads the most recent prior retrospective to check for open action items
- [ ] Unresolved action items appear in the new retrospective under a carry-over section
- [ ] Carry-over items are distinct from newly generated action items
- [ ] Output notes that these items were not followed up in the previous sprint
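
The carry-over detection can be sketched with a small parser for unchecked task-list items. This assumes action items are stored as Markdown task-list lines (`- [ ] ...`), consistent with the `[ ]` marker in the fixture. The `##` heading level and the "(not followed up)" suffix are illustrative choices, not taken from the skill.

```python
import re
from typing import List

# Matches "- [ ] text" or "* [ ] text"; checked items ("[x]") do not match.
UNCHECKED = re.compile(r"^\s*[-*]\s+\[ \]\s+(.*\S)\s*$")


def open_action_items(retro_markdown: str) -> List[str]:
    """Return the text of every unchecked action item in a retrospective."""
    found = []
    for line in retro_markdown.splitlines():
        match = UNCHECKED.match(line)
        if match:
            found.append(match.group(1))
    return found


def carry_over_section(prior_sprint: str, prior_retro: str) -> str:
    """Render a carry-over section, or an empty string if nothing is open."""
    items = open_action_items(prior_retro)
    if not items:
        return ""
    lines = [f"## Carry-over from Sprint {prior_sprint}", ""]
    lines += [f"- [ ] {item} (not followed up)" for item in items]
    return "\n".join(lines)
```

Keeping carry-over items in their own section is what lets the third assertion distinguish them from newly generated action items.
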
---
### Case 5: Gate Compliance — No gate invoked in any mode
**Fixture:**
- `production/sprints/sprint-005.md` exists with complete stories
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/retrospective sprint-005`
**Expected behavior:**
1. Skill compiles retrospective in full mode
2. No director gate is invoked (retrospectives are team self-reflection, not delivery gates)
3. Skill asks user for approval and writes file on confirmation
4. Verdict is COMPLETE
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output does not contain any gate invocation or gate result notation
- [ ] Skill proceeds directly from compilation to "May I write" prompt
- [ ] Review mode file content is irrelevant to this skill's behavior
---
## Protocol Compliance
- [ ] Always shows retrospective draft before asking to write
- [ ] Always asks "May I write" before writing retrospective file
- [ ] No director gates are invoked
- [ ] Verdict is always COMPLETE (not a pass/fail skill)
- [ ] Checks prior retrospective for unresolved action items
---
## Coverage Notes
- Milestone retrospectives (as opposed to sprint retrospectives) follow the
same pattern but read milestone files instead of sprint files; not
separately tested here.
- The case where session logs are empty is similar to Case 2 (no data);
the skill falls back to manual input in both situations.

@@ -0,0 +1,177 @@
# Skill Test Spec: /sprint-plan
## Skill Summary
`/sprint-plan` reads the current milestone file and backlog stories, then
generates a new numbered sprint with stories prioritized by implementation layer
and priority score. In full mode the PR-SPRINT director gate runs after the
sprint draft is compiled (producer reviews the plan). In lean and solo modes
the gate is skipped. The skill asks "May I write to `production/sprints/sprint-NNN.md`?"
before persisting. Verdicts: COMPLETE (sprint generated and written) or
BLOCKED (cannot proceed due to missing data or gate failure).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" language (skill writes sprint file)
- [ ] Has a next-step handoff (what to do after sprint is written)
---
## Director Gate Checks
| Gate ID | Trigger condition | Mode guard |
|-----------|--------------------------|--------------------|
| PR-SPRINT | After sprint draft built | full only (not lean/solo) |
---
## Test Cases
### Case 1: Happy Path — Backlog with stories generates sprint
**Fixture:**
- `production/milestones/milestone-02.md` exists with capacity `10 story points`
- Backlog contains 5 unstarted stories across 2 epics, mixed priorities
- `production/session-state/review-mode.txt` contains `full`
- Next sprint number is `003` (sprints 001 and 002 already exist)
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads current milestone to obtain capacity and goals
2. Skill reads all unstarted stories from backlog; sorts by layer + priority
3. Skill drafts sprint-003 with stories fitting within capacity
4. Skill presents draft to user before invoking gate
5. Skill invokes PR-SPRINT gate (full mode); producer approves
6. Skill asks "May I write to `production/sprints/sprint-003.md`?"
7. User approves; file is written
**Assertions:**
- [ ] Stories are sorted by implementation layer before priority
- [ ] Sprint draft is shown before any write or gate invocation
- [ ] PR-SPRINT gate is invoked in full mode after draft is ready
- [ ] Skill asks "May I write" before writing the sprint file
- [ ] Written file path matches `production/sprints/sprint-003.md`
- [ ] Verdict is COMPLETE after successful write
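
Steps 2 and 3, plus the sprint numbering in the fixture, can be sketched as below. Several details are assumptions made for illustration: lower `layer` values are treated as more foundational, higher `priority` values as more important, and the fill is a simple greedy pass that skips any story that no longer fits. The spec only fixes the ordering (layer before priority) and the capacity limit.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Story:
    name: str
    layer: int     # assumption: lower = more foundational
    priority: int  # assumption: higher = more important
    points: int


def draft_sprint(stories: List[Story], capacity: int) -> List[Story]:
    """Order by layer, then priority, and take stories while they fit."""
    ordered = sorted(stories, key=lambda s: (s.layer, -s.priority))
    picked, used = [], 0
    for story in ordered:
        if used + story.points <= capacity:
            picked.append(story)
            used += story.points
    return picked


def next_sprint_id(existing_files: List[str]) -> str:
    """Return the next zero-padded sprint number, e.g. '003'."""
    numbers = []
    for name in existing_files:
        match = re.fullmatch(r"sprint-(\d{3})\.md", name)
        if match:
            numbers.append(int(match.group(1)))
    return f"{max(numbers, default=0) + 1:03d}"
```

With `sprint-001.md` and `sprint-002.md` present, the next ID is `003`, matching the path asserted above.
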
---
### Case 2: Blocked Path — Backlog is empty
**Fixture:**
- `production/milestones/milestone-02.md` exists
- No unstarted stories exist in any epic backlog
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads backlog — finds no unstarted stories
2. Skill outputs "No unstarted stories in backlog"
3. Skill suggests running `/create-stories` to populate the backlog
4. No gate is invoked; no file is written
**Assertions:**
- [ ] Verdict is BLOCKED
- [ ] Output contains "No unstarted stories" or equivalent message
- [ ] Output recommends `/create-stories`
- [ ] PR-SPRINT gate is NOT invoked
- [ ] No write tool is called
---
### Case 3: Gate returns CONCERNS — Sprint overloaded, revised before write
**Fixture:**
- Backlog has 8 stories totalling 16 points; milestone capacity is 10 points
- `review-mode.txt` contains `full`
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill drafts sprint with all 8 stories (over capacity)
2. PR-SPRINT gate runs; producer returns CONCERNS: sprint is overloaded
3. Skill presents concern to user and asks which stories to defer
4. User selects 3 stories to defer; sprint is revised to 5 stories / 10 points
5. Skill asks "May I write" with revised sprint; writes on approval
**Assertions:**
- [ ] CONCERNS from PR-SPRINT gate are surfaced to the user before any write
- [ ] Skill allows sprint to be revised after gate feedback
- [ ] Revised sprint (not original) is written to file
- [ ] Verdict is COMPLETE after revision and write
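
The arithmetic behind this case is small enough to sketch directly. The helper names are hypothetical, and the uniform 2-point stories in the usage note are an assumption chosen to reproduce the fixture totals (8 stories, 16 points, capacity 10).

```python
from typing import Dict, Iterable


def over_capacity(points: Dict[str, int], capacity: int) -> int:
    """Points above capacity; 0 when the sprint fits."""
    return max(0, sum(points.values()) - capacity)


def defer(points: Dict[str, int], deferred: Iterable[str]) -> Dict[str, int]:
    """Return the sprint with the user-selected stories removed."""
    dropped = set(deferred)
    return {name: p for name, p in points.items() if name not in dropped}
```

If each of the 8 stories is worth 2 points, the draft is 6 points over capacity. Deferring 3 stories leaves 5 stories and 10 points, which is the revised sprint that gets written.
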
---
### Case 4: Lean Mode — PR-SPRINT gate skipped
**Fixture:**
- Backlog has 4 stories; milestone capacity is 8 points
- `review-mode.txt` contains `lean`
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads review mode — determines `lean`
2. Skill drafts sprint and presents it to user
3. PR-SPRINT gate is skipped; output notes "[PR-SPRINT] skipped — Lean mode"
4. Skill asks user for direct approval of the sprint
5. User approves; sprint file is written
**Assertions:**
- [ ] PR-SPRINT gate is NOT invoked in lean mode
- [ ] Skip is explicitly noted in output
- [ ] User approval is still required before write (gate skip ≠ approval skip)
- [ ] Verdict is COMPLETE after write
---
### Case 5: Edge Case — Previous sprint still has open stories
**Fixture:**
- `production/sprints/sprint-002.md` exists with 2 stories still `Status: In Progress`
- Backlog has 5 new unstarted stories
- `review-mode.txt` contains `full`
**Input:** `/sprint-plan`
**Expected behavior:**
1. Skill reads sprint-002 and detects 2 open (in-progress) stories
2. Skill flags: "Sprint 002 has 2 open stories — confirm carry-over before planning sprint 003"
3. Skill presents user with choice: carry stories over, defer them, or cancel
4. User confirms carry-over; carried stories are prepended to new sprint with `[CARRY]` tag
5. Sprint draft is built; PR-SPRINT gate runs; sprint is written on approval
**Assertions:**
- [ ] Skill checks the most recent sprint file for open stories
- [ ] User is asked to confirm carry-over before sprint planning continues
- [ ] Carried stories appear in the new sprint draft with a distinguishing label
- [ ] Skill does not silently ignore open stories from the previous sprint
---
## Protocol Compliance
- [ ] Shows draft sprint before invoking PR-SPRINT gate or asking to write
- [ ] Always asks "May I write" before writing sprint file
- [ ] PR-SPRINT gate only runs in full mode
- [ ] Skip message appears in lean and solo mode output
- [ ] Verdict is clearly stated at the end of the skill output
---
## Coverage Notes
- The case where no milestone file exists is not explicitly tested; behavior
follows the BLOCKED pattern with a suggestion to run `/gate-check` for
milestone progression.
- Solo mode behavior is equivalent to lean (gate skipped, user approval
required) and is not separately tested.
- Parallel story selection algorithms are not tested here; those are unit
concerns for the sprint-plan subagent.

@@ -0,0 +1,167 @@
# Skill Test Spec: /sprint-status
## Skill Summary
`/sprint-status` is a Haiku-tier read-only skill that reads the current active
sprint file and the session state to produce a concise sprint health summary.
It reports story counts by status (Complete / In Progress / Blocked / Not Started)
and emits one of three sprint-health verdicts: ON TRACK, AT RISK, or BLOCKED.
It never writes files and does not invoke any director gates. It is designed for
fast, low-cost status checks during a session.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings or numbered check sections
- [ ] Contains verdict keywords: ON TRACK, AT RISK, BLOCKED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do based on the verdict)
---
## Director Gate Checks
None. `/sprint-status` is a read-only reporting skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Mixed sprint, AT RISK with named blocker
**Fixture:**
- `production/sprints/sprint-004.md` exists (active sprint, linked in `active.md`)
- Sprint contains 6 stories:
- 3 with `Status: Complete`
- 2 with `Status: In Progress`
- 1 with `Status: Blocked` (blocker: "Waiting on physics ADR acceptance")
- Sprint end date is 2 days away
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads `production/session-state/active.md` to find active sprint reference
2. Skill reads `production/sprints/sprint-004.md`
3. Skill counts stories by status: 3 Complete, 2 In Progress, 1 Blocked
4. Skill detects a Blocked story and the approaching deadline
5. Skill outputs AT RISK verdict with the blocker named explicitly
**Assertions:**
- [ ] Output includes story count breakdown by status
- [ ] Output names the specific blocked story and its blocker reason
- [ ] Verdict is AT RISK (not BLOCKED, not ON TRACK) when any story is Blocked
- [ ] Skill does not write any files
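
The count-by-status step can be sketched with a regular expression over the sprint file. This assumes each story carries a literal `Status: <value>` line, as in the fixture. The verdict helper models only the rule asserted in this case (any Blocked story means AT RISK); the conditions for a BLOCKED verdict are not defined in these cases, so the sketch leaves them out.

```python
import re
from collections import Counter

STATUS = re.compile(r"Status:\s*(Complete|In Progress|Blocked|Not Started)")


def count_statuses(sprint_markdown: str) -> Counter:
    """Count stories per status from 'Status: ...' lines."""
    return Counter(STATUS.findall(sprint_markdown))


def verdict_from_counts(counts: Counter) -> str:
    # Only the Case 1 rule is modelled: any Blocked story puts the sprint at risk.
    return "AT RISK" if counts["Blocked"] > 0 else "ON TRACK"
```

For the fixture above this gives 3 Complete, 2 In Progress, and 1 Blocked, and therefore AT RISK.
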
---
### Case 2: All Stories Complete — Sprint COMPLETE verdict
**Fixture:**
- `production/sprints/sprint-004.md` exists
- All 5 stories have `Status: Complete`
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads sprint file — all stories are Complete
2. Skill outputs ON TRACK verdict or SPRINT COMPLETE label
3. Skill suggests running `/milestone-review` or `/sprint-plan` as next steps
**Assertions:**
- [ ] Verdict is ON TRACK or SPRINT COMPLETE when all stories are Complete
- [ ] Output notes that the sprint is fully done
- [ ] Next-step suggestion references `/milestone-review` or `/sprint-plan`
- [ ] No files are written
---
### Case 3: No Active Sprint File — Guidance to run /sprint-plan
**Fixture:**
- `production/session-state/active.md` does not reference an active sprint
- `production/sprints/` directory is empty or absent
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads `active.md` — finds no active sprint reference
2. Skill checks `production/sprints/` — finds no files
3. Skill outputs an informational message: no active sprint detected
4. Skill suggests running `/sprint-plan` to create one
**Assertions:**
- [ ] Skill does not error or crash when no sprint file exists
- [ ] Output clearly states no active sprint was found
- [ ] Output recommends `/sprint-plan` as the next action
- [ ] No verdict keyword is emitted (no sprint to assess)
---
### Case 4: Edge Case — Stale In Progress Story (flagged)
**Fixture:**
- `production/sprints/sprint-004.md` exists
- One story has `Status: In Progress` with a note in `active.md`:
`Last updated: 2026-03-30` (more than 2 days before today's session date)
- No stories are Blocked
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads sprint file and session state
2. Skill detects the story has been In Progress for >2 days without update
3. Skill flags the story as "stale" in the output
4. Verdict is AT RISK (stale in-progress stories indicate a hidden blocker)
**Assertions:**
- [ ] Skill compares story "last updated" metadata against session date
- [ ] Stale In Progress story is flagged by name in the output
- [ ] Verdict is AT RISK, not ON TRACK, when a stale story is detected
- [ ] Output does not conflate "stale" with "Blocked" — the label is distinct
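
The staleness comparison can be sketched as a date difference against the session date. The input shape (a mapping from In Progress story name to its last-updated date) and the story names in the usage note are assumptions. The 2-day threshold comes from the fixture.

```python
from datetime import date
from typing import Dict, List


def stale_stories(
    last_updated: Dict[str, date],
    today: date,
    max_age_days: int = 2,
) -> List[str]:
    """Return In Progress stories not updated for more than max_age_days."""
    return [
        name
        for name, updated in last_updated.items()
        if (today - updated).days > max_age_days
    ]
```

For example, with a session date of 2026-04-03, a story last updated on 2026-03-30 is 4 days old and is flagged, while one updated on 2026-04-02 is not. Flagged stories should be reported under a "stale" label that is separate from Blocked.
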
---
### Case 5: Gate Compliance — Read-only; no gate invocation
**Fixture:**
- `production/sprints/sprint-004.md` exists with 4 stories (2 Complete, 2 In Progress)
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/sprint-status`
**Expected behavior:**
1. Skill reads sprint and produces status summary
2. Skill does NOT invoke any director gate regardless of review mode
3. Output is a plain status report with ON TRACK, AT RISK, or BLOCKED verdict
4. Skill does not prompt for user approval or ask to write any file
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Output does not contain any "May I write" prompt
- [ ] Skill completes and returns a verdict without user interaction
- [ ] Review mode file is ignored (or confirmed irrelevant) by this skill
---
## Protocol Compliance
- [ ] Does NOT use Write or Edit tools (read-only skill)
- [ ] Presents story count breakdown before emitting verdict
- [ ] Does not ask for approval
- [ ] Ends with a recommended next step based on verdict
- [ ] Runs on Haiku model tier (fast, low-cost)
---
## Coverage Notes
- The case where multiple sprints are active simultaneously is not tested;
the skill reads whichever sprint `active.md` references.
- Partial sprint completion percentages are not explicitly verified; the
count-by-status output implies them.
- The `solo` mode review-mode variant is not separately tested; gate
behavior in Case 5 applies to all modes equally.

@@ -0,0 +1,210 @@
# Skill Test Spec: /team-audio
## Skill Summary
Orchestrates the audio team through a four-step pipeline: audio direction
(audio-director) → sound design + accessibility review in parallel (sound-designer
+ accessibility-specialist) → technical implementation + engine validation in
parallel (technical-artist + primary engine specialist) → code integration
(gameplay-programmer). Reads relevant GDDs, the sound bible (if present), and
existing audio asset lists before spawning agents. Compiles all outputs into an
audio design document saved to `design/gdd/audio-[feature].md`. Uses
`AskUserQuestion` at each step transition. Verdict is COMPLETE when the audio
design document is produced. Skips the engine specialist spawn gracefully when no
engine is configured.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 step/phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/dev-story`, `/asset-audit`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at step transitions before proceeding
- [ ] Step 2 explicitly spawns sound-designer and accessibility-specialist in parallel
- [ ] Step 3 explicitly spawns technical-artist and engine specialist in parallel (when engine is configured)
- [ ] Skill reads `design/gdd/sound-bible.md` during context gathering if it exists
- [ ] Output document is saved to `design/gdd/audio-[feature].md`
---
## Test Cases
### Case 1: Happy Path — All steps complete, audio design document saved
**Fixture:**
- GDD for the target feature exists at `design/gdd/combat.md`
- Sound bible exists at `design/gdd/sound-bible.md`
- Existing audio assets are listed in `assets/audio/`
- Engine is configured in `.claude/docs/technical-preferences.md`
- No accessibility gaps exist in the planned audio event list
**Input:** `/team-audio combat`
**Expected behavior:**
1. Context gathering: orchestrator reads `design/gdd/combat.md`, `design/gdd/sound-bible.md`, and `assets/audio/` asset list before spawning any agent
2. Step 1: audio-director is spawned; defines sonic identity, emotional tone, adaptive music direction, mix targets, and adaptive audio rules for combat
3. `AskUserQuestion` presents audio direction; user approves before Step 2 begins
4. Step 2: sound-designer and accessibility-specialist are spawned in parallel; sound-designer produces SFX specifications, audio event list with trigger conditions, and mixing groups; accessibility-specialist identifies critical gameplay audio events and specifies visual fallback and subtitle requirements
5. `AskUserQuestion` presents SFX spec and accessibility requirements; user approves before Step 3 begins
6. Step 3: technical-artist and primary engine specialist are spawned in parallel; technical-artist designs bus structure, middleware integration, memory budgets, and streaming strategy; engine specialist validates that the integration approach is idiomatic for the configured engine
7. `AskUserQuestion` presents technical plan; user approves before Step 4 begins
8. Step 4: gameplay-programmer is spawned; wires up audio events to gameplay triggers, implements adaptive music, sets up occlusion zones, writes unit tests for audio event triggers
9. Orchestrator compiles all outputs into a single audio design document
10. Sub-agent asks "May I write the audio design document to `design/gdd/audio-combat.md`?" before writing
11. Summary output lists: audio event count, estimated asset count, implementation tasks, and any open questions
12. Verdict: COMPLETE
**Assertions:**
- [ ] Sound bible is read during context gathering (before Step 1) when it exists
- [ ] audio-director is spawned before sound-designer or accessibility-specialist
- [ ] `AskUserQuestion` appears after Step 1 output and before Step 2 launch
- [ ] sound-designer and accessibility-specialist Task calls are issued simultaneously in Step 2
- [ ] technical-artist and engine specialist Task calls are issued simultaneously in Step 3
- [ ] gameplay-programmer is not launched until Step 3 `AskUserQuestion` is approved
- [ ] Audio design document is written to `design/gdd/audio-combat.md` (not another path)
- [ ] Summary includes audio event count and estimated asset count
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is COMPLETE after document delivery
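
The ordering constraints in this case (sequential steps, parallel spawns within Steps 2 and 3, approval between steps) can be sketched with `asyncio`. Everything here is a stand-in: `spawn` is a hypothetical placeholder for a sub-agent Task call, `approve` stands in for `AskUserQuestion`, and returning BLOCKED on a declined transition is an assumption rather than documented behavior.

```python
import asyncio
from typing import Callable, List, Optional, Tuple


async def spawn(agent: str, log: List[str]) -> str:
    """Hypothetical stand-in for one sub-agent Task call."""
    log.append(f"start:{agent}")
    await asyncio.sleep(0)  # yield so sibling agents in the same step can start
    log.append(f"done:{agent}")
    return f"{agent} output"


async def run_pipeline(
    approve: Callable[[int], bool],
    log: List[str],
    engine_specialist: Optional[str] = None,
) -> Tuple[str, List[str]]:
    steps = [
        ["audio-director"],
        ["sound-designer", "accessibility-specialist"],
        # The engine specialist is skipped when no engine is configured.
        ["technical-artist"] + ([engine_specialist] if engine_specialist else []),
        ["gameplay-programmer"],
    ]
    outputs: List[str] = []
    for number, agents in enumerate(steps, start=1):
        # Agents within one step are launched together.
        outputs += await asyncio.gather(*(spawn(a, log) for a in agents))
        # Stand-in for AskUserQuestion at each step transition.
        if number < len(steps) and not approve(number):
            return "BLOCKED", outputs  # assumption: a declined transition blocks
    return "COMPLETE", outputs
```

Running it with `asyncio.run(run_pipeline(lambda step: True, log, "engine-specialist"))` records both Step 2 agents starting before either finishes, and the gameplay-programmer starting only after Step 3 is done.
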
---
### Case 2: Accessibility Gap — Critical gameplay audio event has no visual fallback
**Fixture:**
- GDD for the target feature exists
- Step 1 is complete and Step 2 is in progress
- sound-designer's audio event list includes "EnemyNearbyAlert" — a spatial audio cue that warns the player an enemy is approaching from off-screen
- accessibility-specialist reviews the event list and finds "EnemyNearbyAlert" has no visual fallback (no on-screen indicator, no subtitle, no controller rumble specified)
**Input:** `/team-audio stealth` (Step 2 scenario)
**Expected behavior:**
1. Steps 1-2 proceed; accessibility-specialist and sound-designer are spawned in parallel
2. accessibility-specialist returns its review with a BLOCKING concern: "`EnemyNearbyAlert` is a critical gameplay audio event (warns player of off-screen threat) with no visual fallback — hearing-impaired players cannot detect this threat. This is a BLOCKING accessibility gap."
3. Orchestrator surfaces the concern immediately in conversation before presenting `AskUserQuestion`
4. `AskUserQuestion` presents the accessibility concern as a BLOCKING issue with options:
- Add a visual indicator for EnemyNearbyAlert (e.g., directional arrow on HUD) and continue
- Add controller haptic feedback as the fallback and continue
- Stop here and resolve all accessibility gaps before proceeding to Step 3
5. Step 3 (technical-artist + engine specialist) is not launched until the user resolves or explicitly accepts the gap
6. The accessibility gap is included in the final audio design document under "Open Accessibility Issues" if unresolved
**Assertions:**
- [ ] Accessibility gap is labeled BLOCKING (not advisory) in the report
- [ ] The specific event name ("EnemyNearbyAlert") and the nature of the gap are stated
- [ ] `AskUserQuestion` surfaces the gap before Step 3 is launched
- [ ] At least one resolution option is offered (add visual fallback, add haptic fallback)
- [ ] Step 3 is not launched while the gap is unresolved without explicit user authorization
- [ ] If the gap is carried forward unresolved, it is documented in the audio design doc as an open issue
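
The gap check in this case can be illustrated with a minimal sketch. It assumes each audio event carries a `critical` flag and a list of non-audio fallbacks; the `AudioEvent` class, its field names, and the `AmbientWind` and `DoorUnlock` events are hypothetical and are not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class AudioEvent:
    name: str
    critical: bool  # True when gameplay depends on the player hearing it
    fallbacks: list = field(default_factory=list)  # e.g. ["hud_arrow", "haptic"]


def blocking_gaps(events):
    """Return names of critical events that have no non-audio fallback."""
    return [e.name for e in events if e.critical and not e.fallbacks]


def open_accessibility_issues(events, accepted_by_user):
    """Gaps the user chose to carry forward, for the doc's open-issues section."""
    return [name for name in blocking_gaps(events) if name in accepted_by_user]


events = [
    AudioEvent("EnemyNearbyAlert", critical=True),
    AudioEvent("AmbientWind", critical=False),                        # hypothetical
    AudioEvent("DoorUnlock", critical=True, fallbacks=["hud_icon"]),  # hypothetical
]
print(blocking_gaps(events))
```

Only a critical event with an empty fallback list is treated as BLOCKING here; how the real accessibility-specialist decides criticality is outside this sketch.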
---
### Case 3: No Argument — Usage guidance or design doc inference
**Fixture:**
- Any project state
**Input:** `/team-audio` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-audio [feature or area]` — specify the feature or area to design audio for (e.g., `combat`, `main menu`, `forest biome`, `boss encounter`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with argument examples
- [ ] Skill does NOT attempt to infer a feature from existing design docs without user direction
- [ ] No `AskUserQuestion` is used — output is direct guidance
---
### Case 4: Missing Sound Bible — Skill notes the gap and proceeds without it
**Fixture:**
- GDD for the target feature exists at `design/gdd/main-menu.md`
- `design/gdd/sound-bible.md` does NOT exist
- Engine is configured; other context files are present
**Input:** `/team-audio main menu`
**Expected behavior:**
1. Context gathering: orchestrator reads `design/gdd/main-menu.md` and checks for `design/gdd/sound-bible.md`
2. Sound bible is not found; orchestrator notes the gap in conversation: "Note: `design/gdd/sound-bible.md` not found — audio direction will proceed without a project-wide sonic identity reference. Consider creating a sound bible if this is an ongoing project."
3. Pipeline proceeds normally through all four steps without the sound bible as input
4. audio-director in Step 1 is informed that no sound bible exists and must establish sonic identity from the feature GDD alone
5. The missing sound bible is mentioned in the final summary as a recommended next step
**Assertions:**
- [ ] Orchestrator checks for the sound bible during context gathering (before Step 1)
- [ ] Missing sound bible is noted explicitly in conversation — not silently ignored
- [ ] Pipeline does NOT halt due to the missing sound bible
- [ ] audio-director is notified that no sound bible exists in its prompt context
- [ ] Summary or Next Steps section recommends creating a sound bible
- [ ] Verdict is still COMPLETE if all other steps succeed
---
### Case 5: Engine Not Configured — Engine specialist step skipped gracefully
**Fixture:**
- Engine is NOT configured in `.claude/docs/technical-preferences.md` (shows `[TO BE CONFIGURED]`)
- GDD for the target feature exists
- Sound bible may or may not exist
**Input:** `/team-audio boss encounter`
**Expected behavior:**
1. Context gathering: orchestrator reads `.claude/docs/technical-preferences.md` and detects no engine is configured
2. Steps 1-2 proceed normally (audio-director, sound-designer, accessibility-specialist)
3. Step 3: technical-artist is spawned normally; engine specialist spawn is SKIPPED
4. Orchestrator notes in conversation: "Engine specialist not spawned — no engine configured in technical-preferences.md. Engine integration validation will be deferred until an engine is selected."
5. Step 4: gameplay-programmer proceeds with a note that engine-specific audio integration patterns could not be validated
6. The engine specialist gap is included in the audio design document under "Deferred Validation"
7. Verdict: COMPLETE (skip is graceful, not a blocker)
**Assertions:**
- [ ] Engine specialist is NOT spawned when no engine is configured
- [ ] Skill does NOT error out due to the missing engine configuration
- [ ] The skip is explicitly noted in conversation — not silently omitted
- [ ] technical-artist is still spawned in Step 3 (skip applies only to the engine specialist)
- [ ] gameplay-programmer proceeds in Step 4 with the deferred validation noted
- [ ] Deferred engine validation is recorded in the audio design document
- [ ] Verdict is COMPLETE (engine not configured is a known graceful case)
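
A minimal sketch of the graceful skip, assuming the only signal is the `[TO BE CONFIGURED]` placeholder in the preferences text. The function names and the `engine-specialist` label are hypothetical; the real skill may detect the unconfigured state differently.

```python
PLACEHOLDER = "[TO BE CONFIGURED]"


def engine_configured(preferences_text):
    """Assumption: the placeholder appears only while no engine is chosen."""
    return PLACEHOLDER not in preferences_text


def step3_agents(preferences_text):
    """Return (agents to spawn in Step 3, notes to surface in conversation)."""
    agents = ["technical-artist"]  # always spawned
    notes = []
    if engine_configured(preferences_text):
        agents.append("engine-specialist")
    else:
        # The skip is reported, never silent.
        notes.append(
            "Engine specialist not spawned: no engine configured "
            "in technical-preferences.md."
        )
    return agents, notes


agents, notes = step3_agents("Engine: [TO BE CONFIGURED]")
print(agents, notes)
```

Returning the note alongside the agent list keeps the "explicitly noted in conversation" assertion easy to check in a test harness.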
---
## Protocol Compliance
- [ ] Context gathering (GDDs, sound bible, asset list) runs before any agent is spawned
- [ ] `AskUserQuestion` is used after every step output before the next step launches
- [ ] Parallel spawning: Step 2 (sound-designer + accessibility-specialist) and Step 3 (technical-artist + engine specialist) issue all Task calls before waiting for results
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Audio design document path follows the pattern `design/gdd/audio-[feature].md`
- [ ] Verdict is exactly COMPLETE or BLOCKED — no other verdict values used
- [ ] Next Steps handoff references `/dev-story` and `/asset-audit`
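
Two of the checks above (document path pattern and allowed verdict values) lend themselves to a small validator. This is a sketch only: the lowercase, hyphen-separated shape of `[feature]` is an assumption inferred from `audio-combat.md`, and the helper names are hypothetical.

```python
import re

# Assumed slug shape: lowercase letters/digits separated by single hyphens.
AUDIO_DOC = re.compile(r"design/gdd/audio-[a-z0-9]+(?:-[a-z0-9]+)*\.md")
VERDICTS = {"COMPLETE", "BLOCKED"}


def valid_audio_doc_path(path):
    """True when the path follows design/gdd/audio-[feature].md."""
    return AUDIO_DOC.fullmatch(path) is not None


def valid_verdict(verdict):
    """True only for the two verdict values this spec allows."""
    return verdict in VERDICTS


print(valid_audio_doc_path("design/gdd/audio-combat.md"))
print(valid_verdict("NEEDS WORK"))
```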
---
## Coverage Notes
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Step 4 (gameplay-programmer) happy-path behavior is validated implicitly by Case 1.
Failure modes for this step follow the standard Error Recovery Protocol.
- The accessibility-specialist's subtitle and caption requirements (beyond visual fallbacks)
are validated implicitly by Case 1. Case 2 focuses on the more severe case where a
critical gameplay event has no fallback at all.
- Engine specialist validation logic (idiomatic integration, version-specific changes) is
tested only for the configured and unconfigured states. The specific content of the
engine specialist's output is out of scope for this behavioral spec.


@@ -0,0 +1,180 @@
# Skill Test Spec: /team-combat
## Skill Summary
Orchestrates the full combat team pipeline end-to-end for a single combat feature.
Coordinates game-designer, gameplay-programmer, ai-programmer, technical-artist,
sound-designer, the primary engine specialist, and qa-tester through six structured
phases: Design → Architecture (with engine specialist validation) → Implementation
(parallel) → Integration → Validation → Sign-off. Uses `AskUserQuestion` at each
phase transition. Delegates all file writes to sub-agents. Produces a summary report
with verdict COMPLETE / NEEDS WORK / BLOCKED and handoffs to `/code-review`,
`/balance-check`, and `/team-polish`.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (Phase 1 through Phase 6 are all present)
- [ ] Contains verdict keywords: COMPLETE, NEEDS WORK, BLOCKED
- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents, orchestrator does not write files directly
- [ ] Has a next-step handoff at the end (references `/code-review`, `/balance-check`, `/team-polish`)
- [ ] Error Recovery Protocol section is present with all four recovery steps
- [ ] Uses `AskUserQuestion` at phase transitions for user approval before proceeding
- [ ] Phase 3 is explicitly marked as parallel (gameplay-programmer, ai-programmer, technical-artist, sound-designer)
- [ ] Phase 2 includes spawning the primary engine specialist (read from `.claude/docs/technical-preferences.md`)
- [ ] Team Composition lists all seven roles (game-designer, gameplay-programmer, ai-programmer, technical-artist, sound-designer, engine specialist, qa-tester)
---
## Test Cases
### Case 1: Happy Path — All agents succeed, full pipeline runs to completion
**Fixture:**
- `design/gdd/game-concept.md` exists and is populated
- Engine is configured in `.claude/docs/technical-preferences.md` (Engine Specialists section filled)
- No existing GDD for the requested combat feature
**Input:** `/team-combat parry and riposte system`
**Expected behavior:**
1. Phase 1 — game-designer spawned; produces `design/gdd/parry-riposte.md` covering all 8 required sections (overview, player fantasy, rules, formulas, edge cases, dependencies, tuning knobs, acceptance criteria); asks user to approve design doc
2. Phase 2 — gameplay-programmer + ai-programmer spawned; produce architecture sketch with class structure, interfaces, and file list; then primary engine specialist is spawned to validate idioms; engine specialist output incorporated; `AskUserQuestion` presented with architecture options before Phase 3 begins
3. Phase 3 — gameplay-programmer, ai-programmer, technical-artist, sound-designer spawned in parallel; all four return outputs before Phase 4 begins
4. Phase 4 — integration wires together all Phase 3 outputs; tuning knobs verified as data-driven; `AskUserQuestion` confirms integration before Phase 5
5. Phase 5 — qa-tester spawned; writes test cases from acceptance criteria; verifies edge cases; performance impact checked against budget
6. Phase 6 — summary report produced: design COMPLETE, all team members COMPLETE, test cases listed, verdict: COMPLETE
7. Next steps listed: `/code-review`, `/balance-check`, `/team-polish`
**Assertions:**
- [ ] `AskUserQuestion` called at each phase gate (at minimum before Phase 3 and before Phase 5)
- [ ] Phase 3 agents launched simultaneously — no sequential dependency between gameplay-programmer, ai-programmer, technical-artist, sound-designer
- [ ] Engine specialist runs in Phase 2 before Phase 3 begins (output incorporated into architecture)
- [ ] All file writes delegated to sub-agents (orchestrator never calls Write/Edit directly)
- [ ] Verdict COMPLETE present in final report
- [ ] Next steps include `/code-review`, `/balance-check`, `/team-polish`
- [ ] Design doc covers all 8 required GDD sections
---
### Case 2: Blocked Agent — One subagent returns BLOCKED mid-pipeline
**Fixture:**
- `design/gdd/parry-riposte.md` exists (Phase 1 already complete)
- ai-programmer agent returns BLOCKED because the AI system architecture ADR has not been accepted (ADR status is Proposed)
**Input:** `/team-combat parry and riposte system`
**Expected behavior:**
1. Phase 1 — design doc found; game-designer confirms it is valid; phase approved
2. Phase 2 — gameplay-programmer completes architecture sketch; ai-programmer returns BLOCKED: "ADR for AI behavior system is Proposed — cannot implement until ADR is Accepted"
3. Error Recovery Protocol triggered: "ai-programmer: BLOCKED — AI behavior ADR is Proposed"
4. `AskUserQuestion` presented with options: (a) Skip ai-programmer and note the gap; (b) Retry with narrower scope; (c) Stop here and run `/architecture-decision` first
5. If user chooses (a): Phase 3 proceeds with gameplay-programmer, technical-artist, sound-designer only; ai-programmer gap noted in partial report
6. Final report produced: partial implementation documented, ai-programmer section marked BLOCKED, overall verdict: BLOCKED
**Assertions:**
- [ ] BLOCKED status is surfaced before any dependent phase continues
- [ ] `AskUserQuestion` offers at minimum three options: skip / retry / stop
- [ ] Partial report produced — completed agents' work is not discarded
- [ ] Overall verdict is BLOCKED (not COMPLETE) when any agent is unresolved
- [ ] Blocked reason references the ADR and suggests `/architecture-decision`
- [ ] Orchestrator does not silently proceed past the blocked dependency
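
The verdict and partial-report rules in this case can be sketched as follows. The precedence of BLOCKED over NEEDS WORK is an assumption (the spec only states that any unresolved agent makes the verdict BLOCKED), and both function names are hypothetical.

```python
def overall_verdict(statuses):
    """Aggregate per-agent statuses into one pipeline verdict."""
    values = set(statuses.values())
    if "BLOCKED" in values:
        return "BLOCKED"
    if "NEEDS WORK" in values:  # assumed to rank below BLOCKED
        return "NEEDS WORK"
    return "COMPLETE"


def partial_report(statuses, outputs):
    """Keep the work of every agent that completed; nothing is discarded."""
    return {
        agent: outputs[agent]
        for agent, status in statuses.items()
        if status == "COMPLETE" and agent in outputs
    }


statuses = {
    "gameplay-programmer": "COMPLETE",
    "ai-programmer": "BLOCKED",
    "technical-artist": "COMPLETE",
}
outputs = {
    "gameplay-programmer": "architecture sketch",
    "technical-artist": "VFX notes",
}
print(overall_verdict(statuses), partial_report(statuses, outputs))
```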
---
### Case 3: No Argument — Clear usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-combat` (no argument)
**Expected behavior:**
1. Skill detects no argument provided
2. Outputs usage message explaining the required argument (combat feature description)
3. Provides an example invocation: `/team-combat [combat feature description]`
4. Skill exits without spawning any subagents
**Assertions:**
- [ ] Skill does NOT spawn any subagents when no argument is given
- [ ] Usage message includes the argument-hint format from frontmatter
- [ ] Usage message includes at least one example of a valid invocation
- [ ] No file reads beyond what is needed to detect the missing argument
- [ ] Verdict is NOT shown (pipeline never runs)
---
### Case 4: Parallel Phase Validation — Phase 3 agents run simultaneously
**Fixture:**
- `design/gdd/parry-riposte.md` exists and is complete
- Architecture sketch has been approved
- Engine specialist has validated architecture
**Input:** `/team-combat parry and riposte system` (resuming from Phase 2 complete)
**Expected behavior:**
1. Phase 3 begins after architecture approval
2. All four Task calls — gameplay-programmer, ai-programmer, technical-artist, sound-designer — are issued before any result is awaited
3. Skill waits for all four agents to complete before proceeding to Phase 4
4. If any single agent completes early, skill does not begin Phase 4 until all four have returned
**Assertions:**
- [ ] Four Task calls issued in a single batch (no sequential waiting between them)
- [ ] Phase 4 does not begin until all four Phase 3 agents have returned results
- [ ] Skill does not pass one Phase 3 agent's output as input to another Phase 3 agent (they are independent)
- [ ] All four Phase 3 agent results referenced in the Phase 4 integration step
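
The "issue all calls, then wait for all results" behavior can be modeled with `asyncio.gather`. This is an analogy under stated assumptions, not the skill's actual Task mechanism: `run_agent` is a hypothetical stand-in for a Task call, and the log exists only to make the ordering observable.

```python
import asyncio

PHASE3_AGENTS = [
    "gameplay-programmer",
    "ai-programmer",
    "technical-artist",
    "sound-designer",
]


async def run_agent(name, log):
    """Stand-in for a Task call; a real agent would do its work here."""
    log.append(f"start:{name}")
    await asyncio.sleep(0)  # yield so the other agents can start
    log.append(f"done:{name}")
    return f"{name}: output"


async def phase3(log):
    # gather() schedules all four coroutines up front and returns
    # only after every one of them has finished.
    results = await asyncio.gather(*(run_agent(n, log) for n in PHASE3_AGENTS))
    log.append("phase4")  # Phase 4 starts only after all four returned
    return list(results)


log = []
results = asyncio.run(phase3(log))
print(log)
```

In the log, all four `start:` entries precede any `done:` entry, and `phase4` comes last, which mirrors the two core assertions of this case.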
---
### Case 5: Architecture Phase Engine Routing — Engine specialist receives correct context
**Fixture:**
- `.claude/docs/technical-preferences.md` has Engine Specialists section populated (e.g., Primary: godot-specialist)
- Architecture sketch produced by gameplay-programmer is available
- Engine version pinned in `docs/engine-reference/godot/VERSION.md`
**Input:** `/team-combat parry and riposte system`
**Expected behavior:**
1. Phase 2 — gameplay-programmer produces architecture sketch
2. Skill reads `.claude/docs/technical-preferences.md` Engine Specialists section to identify the primary engine specialist agent type
3. Engine specialist is spawned with: the architecture sketch, the GDD path, the engine version from `VERSION.md`, and explicit instructions to check for deprecated APIs
4. Engine specialist output (idiom notes, deprecated API warnings, native system recommendations) is returned to orchestrator
5. Orchestrator incorporates engine notes into the architecture before presenting Phase 2 results to user
6. `AskUserQuestion` includes engine specialist's notes alongside the architecture sketch
**Assertions:**
- [ ] Engine specialist agent type is read from `.claude/docs/technical-preferences.md` — not hardcoded
- [ ] Engine specialist prompt includes the architecture sketch and GDD path
- [ ] Engine specialist checks for deprecated APIs against the pinned engine version
- [ ] Engine specialist output is incorporated before Phase 3 begins (not skipped or appended separately)
- [ ] If no engine is configured, engine specialist step is skipped and a note is added to the report
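
A rough sketch of the routing step follows. It assumes the Engine Specialists section contains a line of the form `Primary: <agent-type>`, as in the fixture example; the actual file layout, the prompt wording, and both function names are hypothetical.

```python
import re
from typing import Optional


def primary_specialist(preferences_text: str) -> Optional[str]:
    """Read the agent type from the preferences text instead of hardcoding it."""
    match = re.search(r"Primary:\s*(\S+)", preferences_text)
    if match is None or match.group(1).startswith("["):
        return None  # missing, or still a bracketed placeholder
    return match.group(1)


def specialist_prompt(sketch: str, gdd_path: str, engine_version: str) -> str:
    """Bundle the context the engine specialist is expected to receive."""
    return "\n".join([
        f"Architecture sketch:\n{sketch}",
        f"GDD: {gdd_path}",
        f"Pinned engine version: {engine_version}",
        "Check the sketch for deprecated APIs against the pinned version.",
    ])


prefs = "## Engine Specialists\n- Primary: godot-specialist\n"
agent = primary_specialist(prefs)
prompt = specialist_prompt("ParryController -> RiposteWindow",
                           "design/gdd/parry-riposte.md", "X.Y")
print(agent)
```

A `None` result corresponds to the skip path in the last assertion: no specialist is spawned and a note is added to the report.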
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each phase transition — user approves before pipeline advances
- [ ] All file writes delegated to sub-agents via Task — orchestrator does not call Write or Edit directly
- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
- [ ] Phase 3 agents launched in parallel per skill spec
- [ ] Partial report always produced even when agents are BLOCKED
- [ ] Verdict is one of COMPLETE / NEEDS WORK / BLOCKED
- [ ] Next steps present at end of output: `/code-review`, `/balance-check`, `/team-polish`
---
## Coverage Notes
- The NEEDS WORK verdict path (qa-tester finds failures in Phase 5) is not separately tested
here; it follows the same error recovery and partial report protocol as Case 2.
- "Retry with narrower scope" error recovery option is listed in assertions but its full
recursive behavior (splitting via `/create-stories`) is covered by the `/create-stories` spec.
- Phase 4 integration logic (wiring gameplay, AI, VFX, audio) is validated implicitly by
the Happy Path case; a dedicated integration test would require fixture code files.
- Engine specialist unavailable (no engine configured) is partially covered in Case 5
assertions — a dedicated fixture for unconfigured engine state would strengthen coverage.


@@ -0,0 +1,209 @@
# Skill Test Spec: /team-level
## Skill Summary
Orchestrates the full level design team for a single level or area. Coordinates
narrative-director, world-builder, level-designer, systems-designer, art-director,
accessibility-specialist, and qa-tester through five sequential steps with one
parallel phase (Step 4). Compiles all team outputs into a single level design
document saved to `design/levels/[level-name].md`. Uses `AskUserQuestion` at each
step transition. Delegates all file writes to sub-agents. Produces a summary report
with verdict COMPLETE / BLOCKED and handoffs to `/design-review`, `/dev-story`,
`/qa-plan`.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase/step headings (Step 1 through Step 5 are all present)
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents, orchestrator does not write files directly
- [ ] Has a next-step handoff at the end (references `/design-review`, `/dev-story`, `/qa-plan`)
- [ ] Error Recovery Protocol section is present with all four recovery steps
- [ ] Uses `AskUserQuestion` at step transitions for user approval before proceeding
- [ ] Step 4 is explicitly marked as parallel (art-director and accessibility-specialist run simultaneously)
- [ ] Context gathering reads: `design/gdd/game-concept.md`, `design/gdd/game-pillars.md`, `design/levels/`, `design/narrative/`, and relevant world-building docs
- [ ] Team Composition lists all seven roles (narrative-director, world-builder, level-designer, systems-designer, art-director, accessibility-specialist, qa-tester)
- [ ] accessibility-specialist output includes severity ratings (BLOCKING / RECOMMENDED / NICE TO HAVE)
- [ ] Final level design document saved to `design/levels/[level-name].md`
---
## Test Cases
### Case 1: Happy Path — All team members produce outputs, document compiled and saved
**Fixture:**
- `design/gdd/game-concept.md` exists and is populated
- `design/gdd/game-pillars.md` exists
- `design/levels/` directory exists (may contain other level docs)
- `design/narrative/` directory exists with relevant narrative docs
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Context gathering — orchestrator reads game-concept.md, game-pillars.md, existing level docs in `design/levels/`, narrative docs in `design/narrative/`, and world-building docs for the forest region
2. Step 1 — narrative-director spawned: defines narrative purpose, key characters, dialogue triggers, emotional arc; world-builder spawned: provides lore context, environmental storytelling opportunities, world rules; `AskUserQuestion` confirms Step 1 outputs before Step 2
3. Step 2 — level-designer spawned: designs spatial layout (critical path, optional paths, secrets), pacing curve, encounters, puzzles, entry/exit points and connections to adjacent areas; `AskUserQuestion` confirms layout before Step 3
4. Step 3 — systems-designer spawned: specifies enemy compositions, loot tables, difficulty balance, area-specific mechanics, resource distribution; `AskUserQuestion` confirms systems before Step 4
5. Step 4 — art-director and accessibility-specialist spawned in parallel; art-director: visual theme, color palette, lighting, asset list, VFX needs; accessibility-specialist: navigation clarity, colorblind safety, cognitive load check — each concern rated BLOCKING / RECOMMENDED / NICE TO HAVE; `AskUserQuestion` presents both outputs before Step 5
6. Step 5 — qa-tester spawned: test cases for critical path, boundary/edge cases (sequence breaks, softlocks), playtest checklist, acceptance criteria
7. Orchestrator compiles all team outputs into level design document format; sub-agent asked "May I write to `design/levels/forest-dungeon.md`?"; file saved
8. Summary report: area overview, encounter count, estimated asset list, narrative beats, cross-team dependencies, verdict: COMPLETE
9. Next steps listed: `/design-review design/levels/forest-dungeon.md`, `/dev-story`, `/qa-plan`
**Assertions:**
- [ ] All five sources read during context gathering before any agent is spawned
- [ ] narrative-director and world-builder both spawned in Step 1 (may be sequential or parallel — both must complete before Step 2)
- [ ] `AskUserQuestion` called at each step gate (minimum: after Step 1, Step 2, Step 3, Step 4)
- [ ] Step 4 agents (art-director, accessibility-specialist) launched simultaneously
- [ ] All file writes delegated to sub-agents — orchestrator does not write directly
- [ ] Level doc saved to `design/levels/forest-dungeon.md` (slugified from argument)
- [ ] Verdict COMPLETE in final summary report
- [ ] Next steps include `/design-review`, `/dev-story`, `/qa-plan`
- [ ] Summary report includes: area overview, encounter count, estimated asset list, narrative beats
---
### Case 2: Blocked Agent (world-builder) — Partial report produced with gap noted
**Fixture:**
- `design/gdd/game-concept.md` exists
- World-building docs for the forest region do NOT exist
- world-builder agent returns BLOCKED: "No world-building docs found for the forest region — cannot provide lore context"
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Context gathering completes; missing world-building docs noted
2. Step 1 — narrative-director completes successfully; world-builder spawned and returns BLOCKED
3. Error Recovery Protocol triggered: "world-builder: BLOCKED — no world-building docs for forest region"
4. `AskUserQuestion` presented with options:
- (a) Skip world-builder and note the lore gap in the level doc
- (b) Retry with narrower scope (world-builder focuses only on what can be inferred from game-concept.md)
- (c) Stop here and create world-building docs first
5. If user chooses (a): pipeline continues with Steps 2-5 using narrative-director context only; level doc compiled with a clearly marked gap section: "World-building context: NOT PROVIDED — see open dependency"
6. Final report produced: partial outputs documented, world-builder section marked BLOCKED, overall verdict: BLOCKED
**Assertions:**
- [ ] BLOCKED status is surfaced immediately when world-builder fails; Step 2 does not begin without user input
- [ ] `AskUserQuestion` offers at minimum three options (skip / retry / stop)
- [ ] Partial report produced — narrative-director's completed work is not discarded
- [ ] Level doc (if compiled) contains an explicit gap notation for the missing world-building context
- [ ] Overall verdict is BLOCKED (not COMPLETE) when world-builder remains unresolved
- [ ] Skill does NOT silently fabricate lore content to fill the gap
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-level` (no argument)
**Expected behavior:**
1. Skill detects no argument provided
2. Outputs usage message explaining the required argument (level name or area to design)
3. Provides example invocations: `/team-level tutorial`, `/team-level forest dungeon`, `/team-level final boss arena`
4. Skill exits without reading any project files or spawning any subagents
**Assertions:**
- [ ] Skill does NOT spawn any subagents when no argument is given
- [ ] Usage message includes the argument-hint format from frontmatter
- [ ] At least one example of a valid invocation is shown
- [ ] No GDD or level files read before failing
- [ ] Verdict is NOT shown (pipeline never starts)
---
### Case 4: Accessibility Review Gate — Blocking concern surfaces before sign-off
**Fixture:**
- Steps 1-3 complete successfully
- `design/accessibility-requirements.md` committed tier: Enhanced
- accessibility-specialist (Step 4, parallel) flags a BLOCKING concern: the critical path through the forest dungeon requires players to distinguish between two environmental hazards (toxic pools vs. shallow water) using color alone — no shape, icon, or audio cue differentiates them
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Steps 1-3 complete; Step 4 parallel phase begins
2. accessibility-specialist returns: BLOCKING concern — "Critical path hazard distinction relies on color only (toxic pools vs. shallow water). Shape, icon, or audio cue required per Enhanced accessibility tier."
3. art-director returns Step 4 output (complete)
4. Skill presents both Step 4 results via `AskUserQuestion` — BLOCKING concern highlighted prominently
5. `AskUserQuestion` offers:
- (a) Return to level-designer + art-director to redesign hazard visual/audio language before Step 5
- (b) Document as a known accessibility gap and proceed to Step 5 with the concern logged
6. Skill does NOT silently proceed past the BLOCKING concern
7. If user chooses (a): level-designer and art-director revision spawned; re-run Step 4 accessibility check
8. Final report includes BLOCKING concern and its resolution status regardless of user choice
**Assertions:**
- [ ] BLOCKING accessibility concern is not treated as advisory — it is surfaced as a blocker
- [ ] `AskUserQuestion` presents the specific concern text (not just "accessibility issue found")
- [ ] Step 5 (qa-tester) does NOT begin without user acknowledging the BLOCKING concern
- [ ] Revision path offered: level-designer + art-director can be sent back before proceeding
- [ ] Final report includes the accessibility concern and its resolution status
- [ ] art-director's completed output is NOT discarded when accessibility-specialist blocks
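
The severity handling in this case can be sketched as a sort plus a gate. The `(severity, text)` tuple shape and the function names are assumptions made for illustration; the RECOMMENDED concern text is hypothetical.

```python
SEVERITY_ORDER = {"BLOCKING": 0, "RECOMMENDED": 1, "NICE TO HAVE": 2}


def sort_concerns(concerns):
    """Order (severity, text) pairs so BLOCKING items are presented first."""
    return sorted(concerns, key=lambda c: SEVERITY_ORDER[c[0]])


def may_start_step5(concerns, acknowledged):
    """qa-tester may start only if nothing blocks or the user acknowledged it."""
    has_blocking = any(severity == "BLOCKING" for severity, _ in concerns)
    return acknowledged or not has_blocking


concerns = [
    ("RECOMMENDED", "Add landmark signage at the fork"),  # hypothetical
    ("BLOCKING", "Hazard distinction relies on color only"),
]
print(sort_concerns(concerns)[0][0])
```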
---
### Case 5: Circular Level Reference — Adjacent area dependency flagged
**Fixture:**
- Steps 1-3 in progress
- level-designer (Step 2) produces a layout that specifies entry/exit points connecting to "the crystal caves" (an adjacent area)
- `design/levels/crystal-caves.md` does NOT exist — the crystal caves area has not been designed yet
**Input:** `/team-level forest dungeon`
**Expected behavior:**
1. Step 2 — level-designer produces layout including: "West exit connects to crystal-caves entry point A"
2. Orchestrator (or level-designer subagent) checks `design/levels/` for `crystal-caves.md`; file not found
3. Dependency gap surfaced: "Level references crystal-caves as an adjacent area but `design/levels/crystal-caves.md` does not exist"
4. `AskUserQuestion` presented with options:
- (a) Proceed with a placeholder reference — note the dependency in the level doc as UNRESOLVED
- (b) Pause and run `/team-level crystal caves` first to establish that area
5. Skill does NOT invent crystal caves content to satisfy the reference
6. If user chooses (a): level doc compiled with the west exit marked "→ crystal-caves (UNRESOLVED — area not yet designed)"; flagged in the open dependencies section of the summary report
7. Final report includes open cross-level dependencies section
**Assertions:**
- [ ] Skill detects the missing adjacent area by checking `design/levels/` — does not assume it will be created later
- [ ] Skill does NOT fabricate crystal caves content (lore, layout, connections) to resolve the reference
- [ ] `AskUserQuestion` offers a "design crystal caves first" option referencing `/team-level`
- [ ] If user proceeds with placeholder, level doc explicitly marks the west exit as UNRESOLVED
- [ ] Summary report includes an open cross-level dependencies section listing unresolved references
- [ ] Circular or forward references do not cause the skill to loop or crash
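
The adjacent-area check amounts to comparing referenced slugs against the documents that actually exist. A minimal sketch, assuming references are already available as slugs and the directory listing is passed in as a set of file names (both assumptions; `unresolved_references` and `old-mill` are hypothetical):

```python
def unresolved_references(adjacent_slugs, existing_docs):
    """Return referenced areas that have no level doc yet, in input order."""
    seen = set()
    missing = []
    for slug in adjacent_slugs:
        if slug in seen:
            continue  # a repeated or circular reference is checked once
        seen.add(slug)
        if f"{slug}.md" not in existing_docs:
            missing.append(slug)
    return missing


existing = {"forest-dungeon.md", "old-mill.md"}  # old-mill is hypothetical
refs = ["crystal-caves", "old-mill", "crystal-caves"]
print(unresolved_references(refs, existing))
```

In practice the set could be built with something like `{p.name for p in pathlib.Path("design/levels").glob("*.md")}`. The function only reports gaps; it never creates content for the missing area.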
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each step transition — user approves before pipeline advances
- [ ] All file writes delegated to sub-agents via Task — orchestrator does not call Write or Edit directly
- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
- [ ] Step 4 agents (art-director, accessibility-specialist) launched in parallel per skill spec
- [ ] Partial report always produced even when agents are BLOCKED
- [ ] Accessibility BLOCKING concerns surface before sign-off and require explicit user acknowledgment
- [ ] Verdict is one of COMPLETE / BLOCKED
- [ ] Next steps present at end: `/design-review`, `/dev-story`, `/qa-plan`
---
## Coverage Notes
- narrative-director and world-builder in Step 1 may be sequential or parallel — the skill spec
spawns both but does not mandate simultaneous launch; coverage of parallel Step 1 would require
an explicit timing assertion fixture.
- The "Retry with narrower scope" option in the blocked world-builder case (Case 2) is not
tested in depth; its full path is analogous to the blocked-agent pattern covered in
Case 2 and in other team-* specs.
- systems-designer (Step 3) block scenarios are not separately tested; the same Error Recovery
Protocol applies and the pattern is validated by Case 2.
- Step 4 parallel ordering (art-director completing before or after accessibility-specialist)
does not affect outcomes — both must return before Step 5 regardless of order.
- The level doc slug convention (argument → filename) is implicitly tested by Case 1
(`forest dungeon` → `forest-dungeon.md`); multi-word slugification edge cases (special
characters, very long names) are not covered.
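
For reference, one plausible slug rule consistent with the `forest dungeon` example is sketched below. The handling of punctuation and repeated separators is an assumption, since the spec does not cover those cases, and `level_doc_path` is a hypothetical helper.

```python
import re


def level_doc_path(argument):
    """Lowercase the argument and join alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", argument.lower()).strip("-")
    return f"design/levels/{slug}.md"


print(level_doc_path("forest dungeon"))
```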


@@ -0,0 +1,178 @@
# Skill Test Spec: /team-live-ops
## Skill Summary
Orchestrates the live-ops team through a 7-phase planning pipeline to produce a
season or event plan. Coordinates live-ops-designer, economy-designer,
analytics-engineer, community-manager, narrative-director, and writer. Phases 3
and 4 (economy design and analytics) run simultaneously. Ends with a consolidated
season plan requiring user approval before handoff to production.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" language in the File Write Protocol section (delegated to sub-agents)
- [ ] Has a File Write Protocol section stating that the orchestrator does not write files directly
- [ ] Has a next-step handoff at the end referencing `/design-review`, `/sprint-plan`, and `/team-release`
- [ ] Uses `AskUserQuestion` at phase transitions to capture user approval before proceeding
- [ ] States explicitly that Phases 3 and 4 can run simultaneously (parallel spawning)
- [ ] Error recovery section present (or implied through BLOCKED handling)
- [ ] Output documents section specifies paths under `design/live-ops/seasons/`
---
## Test Cases
### Case 1: Happy Path — All 7 phases complete, season plan produced
**Fixture:**
- `design/live-ops/economy-rules.md` exists with current economy configuration
- `design/live-ops/ethics-policy.md` exists with the project ethics policy
- Game concept document exists at its standard path
- No existing season documents for the new season name being planned
**Input:** `/team-live-ops "Season 2: The Frozen Wastes"`
**Expected behavior:**
1. Phase 1: Spawns `live-ops-designer` via Task; receives season brief with scope, content list, and retention mechanic; presents to user
2. AskUserQuestion: user approves Phase 1 output before Phase 2 begins
3. Phase 2: Spawns `narrative-director` via Task; reads the Phase 1 season brief; produces narrative framing document (theme, story hook, lore connections); presents to user
4. Phase 3 and 4 (parallel): Spawns `economy-designer` and `analytics-engineer` simultaneously via two Task calls before waiting for either result; economy-designer reads `design/live-ops/economy-rules.md`
5. Phase 5: Spawns `narrative-director` and `writer` in parallel to produce in-game narrative text and player-facing copy; both read Phase 2 narrative framing doc
6. Phase 6: Spawns `community-manager` via Task; reads season brief, economy design, and narrative framing; produces communication calendar with draft copy
7. Phase 7: Collects all phase outputs; presents consolidated season plan summary including economy health check, analytics readiness, ethics review, and open questions
8. AskUserQuestion: user approves the full season plan
9. Sub-agents ask "May I write to [path]?" for each output document (`design/live-ops/seasons/S2_The_Frozen_Wastes.md`, `...analytics.md`, and `...comms.md`) before writing
10. Verdict: COMPLETE — season plan produced and handed off for production
**Assertions:**
- [ ] All 7 phases execute in order; Phase 3 and 4 are issued as parallel Task calls
- [ ] Phase 7 consolidated summary includes all six sections (season brief, narrative framing, economy design, analytics plan, content inventory, communication calendar)
- [ ] Ethics review section in Phase 7 explicitly references `design/live-ops/ethics-policy.md`
- [ ] Three output documents written to `design/live-ops/seasons/` with correct naming convention
- [ ] File writes are delegated to sub-agents — orchestrator does not write directly
- [ ] Verdict: COMPLETE appears in final output
- [ ] Next steps reference `/design-review`, `/sprint-plan`, and `/team-release`
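
The "correct naming convention" assertion could be checked with a helper like the one below. It is only a guess at the convention, inferred from the single `S2_The_Frozen_Wastes.md` example in this case. `season_doc_name` is a hypothetical function, and the names of the analytics and comms documents are left out because the spec elides them.

```python
import re


def season_doc_name(argument: str) -> str:
    """Hypothetical helper: derive the season plan filename from the argument."""
    # Assumption: arguments look like "Season <number>: <title>".
    match = re.fullmatch(r"Season (\d+): (.+)", argument.strip())
    if match is None:
        raise ValueError(f"unrecognized season argument: {argument!r}")
    number, title = match.groups()
    # Assumption: words in the title are joined with underscores.
    title_part = "_".join(title.split())
    return f"S{number}_{title_part}.md"


print(season_doc_name("Season 2: The Frozen Wastes"))
```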
---
### Case 2: Ethics Violation Found — Reward element violates ethics policy
**Fixture:**
- All standard live-ops fixtures present (economy-rules.md, ethics-policy.md)
- `design/live-ops/ethics-policy.md` explicitly prohibits loot boxes targeting players under 18
- economy-designer (Phase 3) proposes a "Mystery Chest" mechanic with randomized premium rewards and no pity timer
**Input:** `/team-live-ops "Season 3: Shadow Tournament"`
**Expected behavior:**
1. Phases 1–4 proceed normally; economy-designer proposes Mystery Chest mechanic
2. Phase 7: Orchestrator reviews Phase 3 output against ethics policy; identifies Mystery Chest as a violation of the "no untransparent random premium rewards" rule in the ethics policy
3. Ethics review section of the Phase 7 summary flags the violation explicitly: "ETHICS FLAG: Mystery Chest mechanic in Phase 3 economy design violates [policy rule]. Approval is blocked until this is resolved."
4. AskUserQuestion presented with resolution options before season plan approval is offered
5. Skill does NOT issue a COMPLETE verdict or write output documents until the ethics violation is resolved or explicitly waived by the user
**Assertions:**
- [ ] Phase 7 ethics review section explicitly names the violating element and the policy rule it breaks
- [ ] Skill does not auto-approve the season plan when an ethics violation is present
- [ ] AskUserQuestion is used to surface the violation and offer resolution options (revise economy design, override with documented rationale, cancel)
- [ ] Output documents are NOT written while the violation is unresolved
- [ ] If user chooses to revise: skill re-spawns economy-designer to produce a corrected design before returning to Phase 7 review
- [ ] Verdict: COMPLETE is only issued after the ethics flag is cleared
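
The gating rule in this case can be summarized as a small predicate. This is a sketch under stated assumptions, not the skill's actual logic: `EthicsFlag` and `may_issue_complete` are hypothetical names, and "resolved" stands for either a revised design or an explicit user waiver.

```python
from dataclasses import dataclass


@dataclass
class EthicsFlag:
    """Hypothetical record of one ethics violation found in Phase 7."""
    element: str            # e.g. "Mystery Chest"
    rule: str               # the policy rule it breaks
    resolved: bool = False  # True once revised or explicitly waived by the user


def may_issue_complete(flags: list[EthicsFlag], user_approved: bool) -> bool:
    """COMPLETE is allowed only when no flag is open and the user approved the plan."""
    if any(not flag.resolved for flag in flags):
        return False
    return user_approved
```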
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-live-ops` (no argument)
**Expected behavior:**
1. Phase 1: No argument detected
2. Outputs: "Usage: `/team-live-ops [season name or event description]` — Provide the name or description of the season or live event to plan."
3. Skill exits immediately without spawning any subagents
**Assertions:**
- [ ] Skill does NOT guess a season name or fabricate a scope
- [ ] Error message includes the correct usage format with the argument-hint
- [ ] No Task calls are issued before the argument check fails
- [ ] No files are read or written
---
### Case 4: Parallel Phase Validation — Phases 3 and 4 run simultaneously
**Fixture:**
- All standard live-ops fixtures present
- Phase 1 (season brief) and Phase 2 (narrative framing) already approved
- Phase 3 (economy-designer) and Phase 4 (analytics-engineer) inputs are independent of each other
**Input:** `/team-live-ops "Season 1: The First Thaw"` (observed at Phase 3/4 transition)
**Expected behavior:**
1. After Phase 2 is approved by the user, the orchestrator issues both Task calls (economy-designer and analytics-engineer) before awaiting either result
2. Both agents receive the season brief as context; analytics-engineer does NOT wait for economy-designer output to begin
3. Economy-designer output and analytics-engineer output are collected together before Phase 5 begins
4. If one of the two parallel agents blocks, the other continues; a partial result is reported
**Assertions:**
- [ ] Both Task calls for Phase 3 and Phase 4 are issued before either result is awaited — they are not sequential
- [ ] Analytics-engineer prompt does NOT include economy-designer output as a required input (the inputs are independent)
- [ ] If economy-designer blocks but analytics-engineer succeeds, analytics output is preserved and the block is surfaced via AskUserQuestion
- [ ] Phase 5 does not begin until BOTH Phase 3 and Phase 4 results are collected
- [ ] Skill documentation explicitly states "Phases 3 and 4 can run simultaneously"
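
The "issued before either result is awaited" requirement can be illustrated with `asyncio`. This is an analogy only: the real skill spawns agents through Task calls, and `run_agent` here is a hypothetical stand-in that records when each agent starts and finishes.

```python
import asyncio


async def run_agent(name: str, log: list[str]) -> str:
    """Hypothetical stand-in for a spawned sub-agent."""
    log.append(f"start:{name}")
    await asyncio.sleep(0)  # yield control, standing in for real agent work
    log.append(f"done:{name}")
    return f"{name} output"


async def phases_3_and_4(log: list[str]) -> list[str]:
    # Both tasks are created before either one is awaited.
    economy = asyncio.create_task(run_agent("economy-designer", log))
    analytics = asyncio.create_task(run_agent("analytics-engineer", log))
    # Phase 5 would start only after both results are collected here.
    return await asyncio.gather(economy, analytics)


if __name__ == "__main__":
    events: list[str] = []
    print(asyncio.run(phases_3_and_4(events)))
    print(events)
```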
---
### Case 5: Missing Ethics Policy — `design/live-ops/ethics-policy.md` does not exist
**Fixture:**
- `design/live-ops/economy-rules.md` exists
- `design/live-ops/ethics-policy.md` does NOT exist
- All other fixtures are present
**Input:** `/team-live-ops "Season 4: Desert Heat"`
**Expected behavior:**
1. Phases 1–4 proceed; economy-designer and analytics-engineer are given the ethics policy path but it is absent
2. Phase 7: Orchestrator attempts to run ethics review; detects that `design/live-ops/ethics-policy.md` is missing
3. Phase 7 summary includes a gap flag: "ETHICS REVIEW SKIPPED: `design/live-ops/ethics-policy.md` not found. Economy design was not reviewed against an ethics policy. Recommend creating one before production begins."
4. Skill still completes the season plan and reaches COMPLETE verdict, but the gap is prominently flagged in the output and in the season design document
5. Next steps include a recommendation to create the ethics policy document
**Assertions:**
- [ ] Skill does NOT error out when the ethics policy file is missing
- [ ] Skill does NOT fabricate ethics policy rules in the absence of the file
- [ ] Phase 7 summary explicitly notes that ethics review was skipped and why
- [ ] Verdict: COMPLETE is still reachable despite the missing file
- [ ] Gap flag appears in the season design output document (not just in conversation)
- [ ] Next steps recommend creating `design/live-ops/ethics-policy.md`
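
The missing-file behavior in this case amounts to checking for the policy and emitting a gap flag instead of failing. A minimal sketch, assuming the project root is passed in as a path; `ethics_review_note` is a hypothetical helper, while the message text is taken from the expected behavior above.

```python
from pathlib import Path
from typing import Optional

ETHICS_POLICY = "design/live-ops/ethics-policy.md"


def ethics_review_note(project_root: Path) -> Optional[str]:
    """Return the gap flag when the policy file is absent, otherwise None."""
    if (project_root / ETHICS_POLICY).is_file():
        return None  # policy exists, so the normal ethics review can run
    # No rules are invented here; the gap is reported and the plan continues.
    return (
        f"ETHICS REVIEW SKIPPED: `{ETHICS_POLICY}` not found. "
        "Economy design was not reviewed against an ethics policy. "
        "Recommend creating one before production begins."
    )
```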
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at every phase transition — user approves before the next phase begins
- [ ] Phases 3 and 4 are always spawned in parallel, not sequentially
- [ ] File Write Protocol: orchestrator never calls Write/Edit directly — all writes are delegated to sub-agents
- [ ] Each output document gets its own "May I write to [path]?" ask from the relevant sub-agent
- [ ] Ethics review in Phase 7 always references the ethics policy file path explicitly
- [ ] Error recovery: any BLOCKED agent is surfaced immediately with AskUserQuestion options (skip / retry / stop)
- [ ] Partial reports are produced if any phase blocks — work is never discarded
- [ ] Verdict: COMPLETE only after user approves the consolidated season plan; BLOCKED if any unresolved ethics violation exists
- [ ] Next steps always include `/design-review`, `/sprint-plan`, and `/team-release`
---
## Coverage Notes
- Phase 5 parallel spawning (narrative-director + writer) follows the same pattern as Phases 3/4 but is not separately tested here — it uses the same parallel Task protocol validated in Case 4.
- The "economy-rules.md absent" edge case is not separately tested — it would surface as a BLOCKED result from economy-designer and follow the standard error recovery path tested implicitly in Case 4.
- The full content writing pipeline (Phase 5 output validation) is validated implicitly by the Case 1 happy path consolidated summary check.
- Community manager communication calendar format (pre-launch, launch day, mid-season, final week) is validated implicitly by Case 1; no separate edge case is needed.

View File

@@ -0,0 +1,209 @@
# Skill Test Spec: /team-narrative
## Skill Summary
Orchestrates the narrative team through a five-phase pipeline: narrative direction
(narrative-director) → world foundation + dialogue drafting (world-builder and writer
in parallel) → level narrative integration (level-designer) → consistency review
(narrative-director) → polish + localization compliance (writer, localization-lead,
and world-builder in parallel). Uses `AskUserQuestion` at each phase transition to
present proposals as selectable options. Produces a narrative summary report and
delivers narrative documents via subagents that each enforce the "May I write?"
protocol. Verdict is COMPLETE when all phases succeed, or BLOCKED when a dependency
is unresolved.

---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/design-review`, `/localize extract`, `/dev-story`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at phase transitions before proceeding
- [ ] Phase 2 explicitly spawns world-builder and writer in parallel
- [ ] Phase 5 explicitly spawns writer, localization-lead, and world-builder in parallel
---
## Test Cases
### Case 1: Happy Path — All five phases complete, narrative doc delivered
**Fixture:**
- A game concept and GDD exist for the target feature (e.g., `design/gdd/faction-intro.md`)
- Character voice profiles exist (e.g., `design/narrative/characters/`)
- Existing lore entries exist for cross-reference (e.g., `design/narrative/lore/`)
- No lore contradictions exist between existing entries and the new content
**Input:** `/team-narrative faction introduction cutscene for the Ironveil faction`
**Expected behavior:**
1. Phase 1: narrative-director is spawned; outputs a narrative brief defining the story beat, characters involved, emotional tone, and lore dependencies
2. `AskUserQuestion` presents the narrative brief; user approves before Phase 2 begins
3. Phase 2: world-builder and writer are spawned in parallel; world-builder produces lore entries for the Ironveil faction; writer drafts dialogue lines using character voice profiles
4. `AskUserQuestion` presents world foundation and dialogue drafts; user approves before Phase 3 begins
5. Phase 3: level-designer is spawned; produces environmental storytelling layout, trigger placement, and pacing plan
6. `AskUserQuestion` presents level narrative plan; user approves before Phase 4 begins
7. Phase 4: narrative-director reviews all dialogue against voice profiles, verifies lore consistency, confirms pacing; approves or flags issues
8. `AskUserQuestion` presents review results; user approves before Phase 5 begins
9. Phase 5: writer, localization-lead, and world-builder are spawned in parallel; writer performs final self-review; localization-lead validates i18n compliance; world-builder finalizes canon levels
10. Final summary report is presented; subagent asks "May I write the narrative document to [path]?" before writing
11. Verdict: COMPLETE
**Assertions:**
- [ ] narrative-director is spawned in Phase 1 before any other agents
- [ ] `AskUserQuestion` appears after Phase 1 output and before Phase 2 launch
- [ ] world-builder and writer Task calls are issued simultaneously in Phase 2 (not sequentially)
- [ ] level-designer is not launched until Phase 2 `AskUserQuestion` is approved
- [ ] narrative-director is re-spawned in Phase 4 for consistency review
- [ ] Phase 5 spawns all three agents (writer, localization-lead, world-builder) simultaneously
- [ ] Summary report includes: narrative brief status, lore entries created/updated, dialogue lines written, level narrative integration points, consistency review results
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is COMPLETE after delivery
---
### Case 2: Lore Contradiction Found — world-builder finds conflict before writer proceeds
**Fixture:**
- Existing lore entry at `design/narrative/lore/ironveil-history.md` states the Ironveil faction was founded 200 years ago
- The new narrative brief (from Phase 1) states the Ironveil were founded 50 years ago
- The writer has been spawned in parallel with the world-builder in Phase 2
**Input:** `/team-narrative ironveil faction introduction cutscene`
**Expected behavior:**
1. Phases 1–2 begin normally
2. Phase 2 world-builder detects a factual contradiction between the narrative brief and existing lore: founding date conflict
3. world-builder returns BLOCKED with reason: "Lore contradiction found — founding date conflicts with `design/narrative/lore/ironveil-history.md`"
4. Orchestrator surfaces the contradiction immediately: "world-builder: BLOCKED — Lore contradiction: founding date in narrative brief (50 years ago) conflicts with existing canon (200 years ago in `ironveil-history.md`)"
5. Orchestrator assesses dependency: the writer's dialogue depends on canon lore — the writer's draft cannot be finalized without resolving the contradiction
6. `AskUserQuestion` presents options:
   - Revise the narrative brief to match existing canon (200 years ago)
   - Update the existing lore entry to reflect the new canon (50 years ago)
   - Stop here and resolve the contradiction in the lore docs first
7. Writer output is preserved but flagged as pending canon resolution — work is not discarded
8. Orchestrator does NOT proceed to Phase 3 until the contradiction is resolved or user explicitly chooses to skip
**Assertions:**
- [ ] Contradiction is surfaced before Phase 3 begins
- [ ] Orchestrator does not silently resolve the contradiction by picking one version
- [ ] `AskUserQuestion` presents at least 3 options including "stop and resolve first"
- [ ] Writer's draft output is preserved in the partial report, not discarded
- [ ] Phase 3 (level-designer) is not launched until the user resolves the contradiction
- [ ] Verdict is BLOCKED (not COMPLETE) if the user stops to resolve the contradiction
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-narrative` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-narrative [narrative content description]` — describe the story content, scene, or narrative area to work on (e.g., `boss encounter cutscene`, `faction intro dialogue`, `tutorial narrative`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with an argument example
- [ ] Skill does NOT attempt to guess or infer a narrative topic from project files
- [ ] No `AskUserQuestion` is used — output is direct guidance
---
### Case 4: Localization Compliance — localization-lead flags a non-translatable string
**Fixture:**
- Phases 1–4 complete successfully
- Phase 5 begins; writer and world-builder complete without issues
- localization-lead finds a dialogue line that uses a hardcoded formatted date string (e.g., `"On March 12th, Year 3"`) that cannot survive locale-specific translation without a locale-aware formatter
**Input:** `/team-narrative ironveil faction introduction cutscene` (Phase 5 scenario)
**Expected behavior:**
1. Phase 5 spawns writer, localization-lead, and world-builder in parallel
2. localization-lead completes its review and flags: "String key `dialogue.ironveil.intro.003` contains a hardcoded date format (`March 12th, Year 3`) that will not localize correctly — requires a locale-aware date placeholder"
3. Orchestrator surfaces the localization blocker in the summary report
4. The localization issue is labeled as BLOCKING in the final report (not advisory)
5. `AskUserQuestion` presents options:
   - Fix the string now (writer revises the line)
   - Note the gap and deliver the narrative doc with the issue flagged
   - Stop and resolve before finalizing
6. If the user chooses to proceed with the issue flagged, verdict is COMPLETE with noted localization debt; if user stops, verdict is BLOCKED
**Assertions:**
- [ ] localization-lead is spawned in Phase 5 simultaneously with writer and world-builder
- [ ] Hardcoded date format is identified as a localization blocker (not silently passed)
- [ ] The specific string key and reason are included in the issue report
- [ ] `AskUserQuestion` offers the option to fix now vs. flag and proceed
- [ ] Verdict notes the localization debt if the user proceeds without fixing
- [ ] Skill does NOT automatically rewrite the offending line without user approval
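
A fixture for this case needs at least one string that a localization check would flag. The sketch below shows one way such a check might look. It is not how localization-lead works; it is a simple regex heuristic for English month-and-day dates, and `find_hardcoded_dates` is a hypothetical name. The string key comes from the expected behavior above.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches text such as "March 12th" or "June 3" (English month names only).
HARDCODED_DATE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}}(?:st|nd|rd|th)?\b")


def find_hardcoded_dates(strings: dict[str, str]) -> list[tuple[str, str]]:
    """Return (string key, matched date text) for every flagged dialogue line."""
    hits = []
    for key, text in strings.items():
        match = HARDCODED_DATE.search(text)
        if match:
            hits.append((key, match.group(0)))
    return hits


print(find_hardcoded_dates({"dialogue.ironveil.intro.003": "On March 12th, Year 3"}))
```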
---
### Case 5: Writer Blocked — Missing character voice profiles
**Fixture:**
- Phase 1 narrative-director produces a narrative brief referencing two characters: Commander Varek and Advisor Selene
- No character voice profiles exist in `design/narrative/characters/` for either character
- Phase 2 begins; world-builder proceeds normally
**Input:** `/team-narrative ironveil surrender negotiation scene`
**Expected behavior:**
1. Phase 1 completes; narrative brief lists Commander Varek and Advisor Selene as characters
2. Phase 2: writer is spawned in parallel with world-builder
3. writer returns BLOCKED: "Cannot produce dialogue — no voice profiles found for Commander Varek or Advisor Selene in `design/narrative/characters/`. Voice profiles required to match character tone and speech patterns."
4. Orchestrator surfaces the blocker immediately: "writer: BLOCKED — Missing prerequisite: character voice profiles for Commander Varek and Advisor Selene"
5. world-builder output is preserved; partial report is produced with lore entries
6. `AskUserQuestion` presents options:
   - Create voice profiles first (redirects to the narrative-director or design workflow)
   - Provide minimal voice direction inline and retry the writer with that context
   - Stop here and create voice profiles before proceeding
7. Orchestrator does NOT proceed to Phase 3 (level-designer) without writer output
**Assertions:**
- [ ] Writer block is surfaced before Phase 3 begins
- [ ] world-builder's completed lore output is preserved in the partial report
- [ ] Missing prerequisite (voice profiles) is named specifically (character names and expected file path)
- [ ] `AskUserQuestion` offers at least one option to resolve the missing prerequisite
- [ ] Orchestrator does not fabricate voice profiles or invent character voices
- [ ] Phase 3 is not launched while writer is BLOCKED without explicit user authorization
---
## Protocol Compliance
- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
- [ ] Parallel spawning: Phase 2 (world-builder + writer) and Phase 5 (writer + localization-lead + world-builder) issue all Task calls before waiting for results
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Verdict is exactly COMPLETE or BLOCKED — no other verdict values used
- [ ] Next Steps handoff references `/design-review`, `/localize extract`, and `/dev-story`
---
## Coverage Notes
- Phase 3 (level-designer) and Phase 4 (narrative-director review) happy-path behavior are
validated implicitly by Case 1. Separate edge cases are not needed for these phases as
their failure modes follow the standard Error Recovery Protocol.
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Localization concerns that are advisory (e.g., German/Finnish +30% expansion warnings)
vs. blocking (hardcoded formats) are distinguished in Case 4; advisory-only scenarios
follow the same pattern but do not change the verdict.
- The writer's "all lines under 120 characters" and "string keys not raw strings" checks
in Phase 5 are covered implicitly by Case 4's localization compliance scenario.

View File

@@ -0,0 +1,218 @@
# Skill Test Spec: /team-polish
## Skill Summary
Orchestrates the polish team through a six-phase pipeline: performance assessment
(performance-analyst) → optimization (performance-analyst, optionally with
engine-programmer when engine-level root causes are found) → visual polish
(technical-artist, parallel with Phase 2) → audio polish (sound-designer, parallel
with Phase 2) → hardening (qa-tester) → sign-off (orchestrator collects all results
and issues READY FOR RELEASE or NEEDS MORE WORK). Uses `AskUserQuestion` at each
phase transition. Engine-programmer is spawned conditionally only when Phase 1
identifies engine-level root causes. Verdict is READY FOR RELEASE or NEEDS MORE WORK.

---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: READY FOR RELEASE, NEEDS MORE WORK
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/release-checklist`, `/sprint-plan update`, `/gate-check`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at phase transitions before proceeding
- [ ] Phase 3 (visual polish) and Phase 4 (audio polish) are explicitly run in parallel with Phase 2
- [ ] engine-programmer is conditionally spawned in Phase 2 only when Phase 1 identifies engine-level root causes
- [ ] Phase 6 sign-off compares metrics against budgets before issuing verdict
---
## Test Cases
### Case 1: Happy Path — Full pipeline completes, READY FOR RELEASE verdict
**Fixture:**
- Feature exists and is functionally complete (e.g., `combat` system)
- Performance budgets are defined in technical-preferences.md (e.g., target 60fps, 16ms frame budget)
- No frame budget violations exist before polishing begins
- No audio events are missing; VFX assets are complete
- No regressions are introduced by polish changes
**Input:** `/team-polish combat`
**Expected behavior:**
1. Phase 1: performance-analyst is spawned; profiles the combat system, measures frame budget, checks memory usage; output: performance report showing all metrics within budget, no violations
2. `AskUserQuestion` presents performance report; user approves before Phases 2, 3, and 4 begin
3. Phase 2: performance-analyst applies minor optimizations (e.g., draw call batching); no engine-programmer needed (no engine-level root causes identified)
4. Phases 3 and 4 are launched in parallel alongside Phase 2:
   - Phase 3: technical-artist reviews VFX for quality, optimizes particle systems, adds screen shake and visual juice
   - Phase 4: sound-designer reviews audio events for completeness, checks mix levels, adds ambient audio layers
5. All three parallel phases complete; `AskUserQuestion` presents results; user approves before Phase 5 begins
6. Phase 5: qa-tester runs edge case tests, soak tests, stress tests, and regression tests; all pass
7. `AskUserQuestion` presents test results; user approves before Phase 6
8. Phase 6: orchestrator collects all results; compares before/after performance metrics against budgets; all metrics pass
9. Subagent asks "May I write the polish report to `production/qa/evidence/polish-combat-[date].md`?" before writing
10. Verdict: READY FOR RELEASE
**Assertions:**
- [ ] performance-analyst is spawned first in Phase 1 before any other agents
- [ ] `AskUserQuestion` appears after Phase 1 output and before Phases 2/3/4 launch
- [ ] Phases 3 and 4 Task calls are issued at the same time as Phase 2 (not after Phase 2 completes)
- [ ] engine-programmer is NOT spawned when Phase 1 finds no engine-level root causes
- [ ] qa-tester (Phase 5) is not launched until the parallel phases complete and user approves
- [ ] Phase 6 verdict is based on comparison of metrics against defined budgets
- [ ] Summary report includes: before/after performance metrics, visual polish changes, audio polish changes, test results
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is READY FOR RELEASE
---
### Case 2: Performance Blocker — Frame budget violation cannot be fully resolved
**Fixture:**
- Feature being polished: `particle-storm` VFX system
- Phase 1 identifies a frame budget violation: particle-storm costs 12ms on target hardware (budget is 6ms for this system)
- Phase 2 performance-analyst applies optimizations reducing cost to 9ms — still over the 6ms budget
- Phase 2 cannot fully resolve the violation without a fundamental design change
**Input:** `/team-polish particle-storm`
**Expected behavior:**
1. Phase 1: performance-analyst identifies the 12ms frame cost vs. 6ms budget; reports "FRAME BUDGET VIOLATION: particle-storm costs 12ms, budget is 6ms"
2. `AskUserQuestion` presents the violation; user chooses to proceed with optimization attempt
3. Phase 2: performance-analyst applies optimizations; achieves 9ms — reduced but still over budget; reports "Optimization reduced cost to 9ms (was 12ms) — 3ms over budget. No further gains achievable without design changes."
4. Phases 3 and 4 run in parallel with Phase 2 (visual and audio polish)
5. Phase 5: qa-tester runs regression and edge case tests; all pass
6. Phase 6: orchestrator collects results; frame budget violation (9ms vs 6ms budget) remains unresolved
7. Verdict: NEEDS MORE WORK
8. Report lists the specific unresolved issue: "particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms — requires design scope reduction or budget renegotiation"
9. Next Steps: schedule the remaining issue in `/sprint-plan update`; re-run `/team-polish` after fix
**Assertions:**
- [ ] Frame budget violation is flagged in Phase 1 with specific numbers (actual vs. budget)
- [ ] Phase 2 reports the post-optimization metric explicitly (9ms achieved, 3ms still over)
- [ ] Verdict is NEEDS MORE WORK (not READY FOR RELEASE) when a budget violation remains
- [ ] The specific unresolved issue is listed by name with the remaining gap quantified
- [ ] Next Steps references `/sprint-plan update` for scheduling the remaining fix
- [ ] Phases 3 and 4 still run (polish work is not abandoned due to a Phase 2 partial resolution)
- [ ] Phase 5 qa-tester still runs (regression testing is independent of the performance outcome)
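
The Phase 6 comparison in this case is simple arithmetic: 9ms measured against a 6ms budget leaves a 3ms gap, so the verdict is NEEDS MORE WORK. A minimal sketch of that comparison, with `budget_verdict` as a hypothetical helper and the issue wording modeled on the report line above:

```python
def budget_verdict(
    measured_ms: dict[str, float], budget_ms: dict[str, float]
) -> tuple[str, list[str]]:
    """Compare post-optimization frame costs against budgets, per system."""
    issues = []
    for system, cost in measured_ms.items():
        budget = budget_ms[system]
        if cost > budget:
            issues.append(
                f"{system} frame cost ({cost:g}ms) exceeds budget "
                f"({budget:g}ms) by {cost - budget:g}ms"
            )
    verdict = "NEEDS MORE WORK" if issues else "READY FOR RELEASE"
    return verdict, issues


print(budget_verdict({"particle-storm": 9.0}, {"particle-storm": 6.0}))
```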
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-polish` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-polish [feature or area]` — specify the feature or area to polish (e.g., `combat`, `main menu`, `inventory system`, `level-1`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with argument examples
- [ ] Skill does NOT attempt to guess a feature from project files
- [ ] No `AskUserQuestion` is used — output is direct guidance
---
### Case 4: Engine-Level Bottleneck — engine-programmer spawned conditionally in Phase 2
**Fixture:**
- Feature being polished: `open-world` environment streaming
- Phase 1 identifies a performance bottleneck with a root cause in the rendering pipeline: "draw call overhead is caused by the engine's scene tree traversal in the spatial indexer — this is an engine-level issue, not a game code issue"
- Performance budgets are defined; the rendering overhead exceeds target frame budget
**Input:** `/team-polish open-world`
**Expected behavior:**
1. Phase 1: performance-analyst profiles the environment; identifies frame budget violation; root cause analysis points to engine-level rendering pipeline (spatial indexer traversal overhead)
2. Phase 1 output explicitly classifies the root cause as engine-level
3. `AskUserQuestion` presents the performance report including the engine-level root cause; user approves before Phase 2
4. Phase 2: performance-analyst is spawned for game-code-level optimizations AND engine-programmer is spawned in parallel for the engine-level rendering fix
5. Phases 3 and 4 also run in parallel with Phase 2 (visual and audio polish)
6. engine-programmer addresses the spatial indexer traversal; provides profiler validation showing the fix reduces overhead
7. Phase 5: qa-tester runs regression tests including tests for the engine-level fix
8. Phase 6: orchestrator collects all results; if metrics are now within budget, verdict is READY FOR RELEASE; if not, NEEDS MORE WORK
**Assertions:**
- [ ] engine-programmer is NOT spawned in Phase 2 unless Phase 1 explicitly identifies an engine-level root cause
- [ ] engine-programmer is spawned in Phase 2 when Phase 1 identifies an engine-level root cause
- [ ] engine-programmer and performance-analyst Task calls in Phase 2 are issued simultaneously (not sequentially)
- [ ] Phases 3 and 4 also run in parallel with Phase 2 (not deferred until Phase 2 completes)
- [ ] engine-programmer's output includes profiler validation of the fix
- [ ] qa-tester in Phase 5 runs regression tests that cover the engine-level change
- [ ] Verdict correctly reflects whether all metrics including the engine fix now meet budgets
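
The conditional spawn rule reduces to one branch on the Phase 1 classification. The sketch below assumes Phase 1 yields a boolean for "engine-level root cause found". Both function names are hypothetical. The agent names come from the spec.

```python
def phase_2_agents(engine_level_root_cause: bool) -> list[str]:
    """Agents spawned for Phase 2 (optimization)."""
    agents = ["performance-analyst"]
    if engine_level_root_cause:
        # Only added when Phase 1 explicitly reports an engine-level cause.
        agents.append("engine-programmer")
    return agents


def parallel_batch_after_phase_1(engine_level_root_cause: bool) -> list[str]:
    """All agents launched together once the user approves the Phase 1 report."""
    # Phases 3 and 4 run alongside Phase 2 rather than after it.
    return phase_2_agents(engine_level_root_cause) + ["technical-artist", "sound-designer"]


print(parallel_batch_after_phase_1(True))
```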
---
### Case 5: Regression Found — Polish change broke an existing feature
**Fixture:**
- Feature being polished: `inventory-ui`
- Phases 1–4 complete successfully; performance and polish changes are applied
- Phase 5: qa-tester runs regression tests and finds that a shader optimization applied in Phase 3 broke the item highlight glow effect on hover — an existing feature that was working before the polish pass
**Input:** `/team-polish inventory-ui` (Phase 5 scenario)
**Expected behavior:**
1. Phases 1–4 complete; polish changes include a shader optimization from technical-artist
2. Phase 5: qa-tester runs regression tests and detects "Item highlight glow on hover no longer renders — regression introduced by shader optimization in Phase 3"
3. qa-tester returns test results with the regression noted
4. Orchestrator surfaces the regression immediately: "qa-tester: REGRESSION FOUND — `item-highlight-hover` glow broken by Phase 3 shader optimization"
5. Subagent files a bug report asking "May I write the bug report to `production/qa/evidence/bug-polish-inventory-ui-[date].md`?" before writing
6. Bug report is written after approval; it includes: the broken behavior, the polish change that caused it, reproduction steps, and severity
7. `AskUserQuestion` presents the regression with options:
- Revert the shader optimization and find an alternative approach
- Fix the shader optimization to preserve the glow effect
- Accept the regression and schedule a fix in the next sprint
8. Verdict: NEEDS MORE WORK (regression present regardless of user's chosen resolution path, unless fix is applied within the current session)
**Assertions:**
- [ ] Regression is surfaced before Phase 6 sign-off
- [ ] The specific broken behavior and the responsible change are both named in the report
- [ ] Subagent asks "May I write the bug report to [path]?" before filing
- [ ] Bug report includes: broken behavior, causal change, reproduction steps, severity
- [ ] `AskUserQuestion` offers options including revert, fix in place, and schedule later
- [ ] Verdict is NEEDS MORE WORK when a regression is present and unresolved
- [ ] Verdict may become READY FOR RELEASE only if the regression is fixed within the current polish session and qa-tester re-runs to confirm
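
The two-valued verdict rule asserted in this case can be expressed as a small function. This is a minimal sketch, not the skill's implementation: `polish_verdict` and its parameters are hypothetical names, and it assumes that a regression fixed and re-confirmed by qa-tester in the same session is no longer counted as unresolved.

```python
def polish_verdict(metrics_within_budget: bool, unresolved_regressions: int) -> str:
    """Return one of the two verdict strings the spec allows for /team-polish."""
    # READY FOR RELEASE needs both conditions: budgets met and no open regression.
    if metrics_within_budget and unresolved_regressions == 0:
        return "READY FOR RELEASE"
    # Anything else, including a regression the user chose to defer, stays NEEDS MORE WORK.
    return "NEEDS MORE WORK"
```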
---
## Protocol Compliance
- [ ] Phase 1 (assessment) must complete before any other phase begins
- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
- [ ] Phases 3 and 4 are always launched in parallel with Phase 2 (not deferred)
- [ ] engine-programmer is only spawned when Phase 1 explicitly identifies engine-level root causes
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Verdict is exactly READY FOR RELEASE or NEEDS MORE WORK — no other verdict values used
- [ ] NEEDS MORE WORK verdict always lists specific remaining issues with severity
- [ ] Next Steps handoff references `/release-checklist` (on success) and `/sprint-plan update` + `/gate-check` (on failure)
---
## Coverage Notes
- The tools-programmer optional agent (for content pipeline tool verification) is not
separately tested — it follows the same conditional spawn pattern as engine-programmer
and is invoked only when content authoring tools are involved in the polished area.
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Phase 6 sign-off logic (collecting and comparing all metrics) is validated implicitly
by Cases 1 and 2. The distinction between READY FOR RELEASE and NEEDS MORE WORK is
exercised in both directions across these cases.
- Soak testing and stress testing (Phase 5) are validated implicitly by Case 1's
qa-tester output. Case 5 focuses on the regression detection aspect of Phase 5.
- The "minimum spec hardware" test path in Phase 5 is not separately tested — it follows
the same qa-tester delegation pattern when the hardware is available.


@@ -0,0 +1,204 @@
# Skill Test Spec: /team-qa
## Skill Summary
Orchestrates the QA team through a 7-phase structured testing cycle. Coordinates
qa-lead (strategy, test plan, sign-off report) and qa-tester (test case writing,
bug report writing). Covers scope detection, story classification, QA plan
generation, smoke check gate, test case writing, manual QA execution with bug
filing, and a final sign-off report with an APPROVED / APPROVED WITH CONDITIONS /
NOT APPROVED verdict. Parallel qa-tester spawning is used in Phase 5 for
independent stories.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains verdict keywords for sign-off report: APPROVED, APPROVED WITH CONDITIONS, NOT APPROVED
- [ ] Contains "May I write" language for both the QA plan and the sign-off report
- [ ] Has an Error Recovery Protocol section
- [ ] Uses `AskUserQuestion` at phase transitions to capture user approval before proceeding
- [ ] Phase 4 (smoke check) is a hard gate: FAIL stops the cycle
- [ ] Bug reports are written to `production/qa/bugs/` with `BUG-[NNN]-[short-slug].md` naming
- [ ] Next-step guidance differs by verdict (APPROVED / APPROVED WITH CONDITIONS / NOT APPROVED)
- [ ] Independent qa-tester tasks in Phase 5 are spawned in parallel
---
## Test Cases
### Case 1: Happy Path — All stories pass manual QA, APPROVED verdict
**Fixture:**
- `production/sprints/sprint-03/` exists with 4 story files
- Stories are a mix of types: 1 Logic, 1 Integration, 2 Visual/Feel
- All stories have acceptance criteria populated
- `tests/smoke/` contains a smoke test list; all items are verifiable
- No existing bugs in `production/qa/bugs/`
**Input:** `/team-qa sprint-03`
**Expected behavior:**
1. Phase 1: Reads all story files in `production/sprints/sprint-03/`; reads `production/stage.txt`; reports "Found 4 stories. Current stage: [stage]. Ready to begin QA strategy?"
2. Phase 2: Spawns `qa-lead` via Task; produces strategy table classifying all 4 stories; no blockers flagged; presents to user; AskUserQuestion: user selects "Looks good — proceed to test plan"
3. Phase 3: Produces QA plan document; asks "May I write the QA plan to `production/qa/qa-plan-sprint-03-[date].md`?"; writes after approval
4. Phase 4: Spawns `qa-lead` via Task; reviews `tests/smoke/`; returns PASS; reports "Smoke check passed. Proceeding to test case writing."
5. Phase 5: Spawns `qa-tester` via Task for each Visual/Feel and Integration story (2–3 stories); run in parallel; presents test cases grouped by story; AskUserQuestion per group; user approves
6. Phase 6: Walks through each approved story; user marks all as PASS; result summary: "Stories PASS: 4, FAIL: 0, BLOCKED: 0"
7. Phase 7: Spawns `qa-lead` via Task to produce sign-off report; report shows all stories PASS; no bugs filed; Verdict: APPROVED; asks "May I write this QA sign-off report to `production/qa/qa-signoff-sprint-03-[date].md`?"; writes after approval
8. Verdict: COMPLETE — QA cycle finished
**Assertions:**
- [ ] Phase 1 correctly counts and reports 4 stories with current stage
- [ ] Strategy table in Phase 2 classifies all 4 stories with correct types
- [ ] QA plan written only after "May I write?" approval
- [ ] Smoke check PASS allows pipeline to continue without user intervention
- [ ] Phase 5 qa-tester tasks for independent stories are issued in parallel
- [ ] Sign-off report includes Test Coverage Summary table and Verdict: APPROVED
- [ ] Sign-off report written only after "May I write?" approval
- [ ] Verdict: COMPLETE appears in final output
- [ ] Next step: "Run `/gate-check` to validate advancement."
---
### Case 2: Smoke Check Fail — QA cycle stops at Phase 4
**Fixture:**
- `production/sprints/sprint-04/` exists with 3 story files
- `tests/smoke/` exists with 5 smoke test items; 2 items cannot be verified (e.g., build is unstable, core navigation broken)
**Input:** `/team-qa sprint-04`
**Expected behavior:**
1. Phases 1–3 complete normally; QA plan is written
2. Phase 4: Spawns `qa-lead` via Task; smoke check returns FAIL; two specific failures are identified
3. Skill reports: "Smoke check failed. QA cannot begin until these issues are resolved: [list of 2 failures]. Fix them and re-run `/smoke-check`, or re-run `/team-qa` once resolved."
4. Skill stops immediately after Phase 4 — no Phase 5, 6, or 7 is executed
5. No sign-off report is produced; no "May I write?" for a sign-off is issued
**Assertions:**
- [ ] Smoke check FAIL causes the pipeline to halt at Phase 4 — Phases 5, 6, 7 are NOT executed
- [ ] Failure list is shown to the user explicitly (not summarized vaguely)
- [ ] Skill recommends `/smoke-check` and `/team-qa` re-run as remediation steps
- [ ] No QA sign-off report is written or offered
- [ ] Skill does NOT produce a COMPLETE verdict
- [ ] Any QA plan already written in Phase 3 is preserved (not deleted)
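
The hard gate in this case can be pictured as an early return. The sketch below is illustrative only: `run_qa_cycle` and the keys of its result are hypothetical. The verdict on a failed smoke check is left as `None` because this case says only that COMPLETE must not be produced, not which value replaces it.

```python
def run_qa_cycle(smoke_passed: bool) -> dict:
    """Model which phases run, depending on the Phase 4 smoke check result."""
    executed = [1, 2, 3, 4]  # Phases 1-3 always precede the smoke check in Phase 4
    if not smoke_passed:
        # Hard gate: stop here, offer no sign-off report, keep the Phase 3 QA plan on disk.
        return {"phases": executed, "verdict": None, "signoff_offered": False}
    executed += [5, 6, 7]
    return {"phases": executed, "verdict": "COMPLETE", "signoff_offered": True}
```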
---
### Case 3: Bug Found — Visual/Feel story fails manual QA, bug report filed
**Fixture:**
- `production/sprints/sprint-05/` exists with 2 story files: 1 Logic (passes automated tests), 1 Visual/Feel
- `tests/smoke/` smoke check passes
- The Visual/Feel story's animation timing is visibly wrong (acceptance criterion not met)
- `production/qa/bugs/` directory exists (empty or with existing bugs)
**Input:** `/team-qa sprint-05`
**Expected behavior:**
1. Phases 1–5 complete normally; test cases are written for the Visual/Feel story
2. Phase 6: User marks Visual/Feel story as FAIL; AskUserQuestion collects failure description: "Animation plays at 2x speed — jitter visible on every loop"
3. Phase 6: Spawns `qa-tester` via Task to write a formal bug report; bug report written to `production/qa/bugs/BUG-001-animation-speed-jitter.md` (or next increment if bugs exist); report includes severity field
4. Result summary: "Stories PASS: 1, FAIL: 1 — bugs filed: BUG-001"
5. Phase 7: Spawns `qa-lead` to produce sign-off report; Bugs Found table lists BUG-001 with severity and status Open; Verdict: NOT APPROVED (S1/S2 bug open, or FAIL without documented workaround)
6. Sign-off report write is offered; writes after approval
7. Next step: "Resolve S1/S2 bugs and re-run `/team-qa` or targeted manual QA before advancing."
**Assertions:**
- [ ] FAIL result in Phase 6 triggers AskUserQuestion to collect the failure description before the bug report is written
- [ ] `qa-tester` is spawned via Task to write the bug report — orchestrator does not write it directly
- [ ] Bug report follows naming convention: `BUG-[NNN]-[short-slug].md` in `production/qa/bugs/`
- [ ] Bug report NNN is incremented correctly from existing bugs in the directory
- [ ] Phase 7 sign-off report Bugs Found table includes the bug ID, story name, severity, and status
- [ ] Verdict in sign-off report is NOT APPROVED
- [ ] Next step explicitly mentions re-running `/team-qa`
- [ ] Verdict: COMPLETE is still issued by the orchestrator (the QA cycle finished — the verdict is NOT APPROVED, but the skill completed its pipeline)
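
The `BUG-[NNN]-[short-slug].md` convention and the "next increment" rule can be sketched as below. The function name `next_bug_filename` is hypothetical, and two details are assumptions rather than statements from the spec: the number is taken as the highest existing number plus one (gaps are not reused), and slugs are lowercase letters, digits, and hyphens.

```python
import re

# Assumed shape of an existing bug report filename, e.g. BUG-001-animation-speed-jitter.md
BUG_NAME = re.compile(r"^BUG-(\d{3,})-[a-z0-9-]+\.md$")


def next_bug_filename(existing: list[str], slug: str) -> str:
    """Build the next bug report filename from the names already in production/qa/bugs/."""
    # Files that do not match the convention are ignored rather than treated as errors.
    numbers = [int(m.group(1)) for name in existing if (m := BUG_NAME.match(name))]
    next_number = max(numbers, default=0) + 1
    return f"BUG-{next_number:03d}-{slug}.md"
```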
---
### Case 4: No Argument — Skill infers active sprint or asks user
**Fixture (variant A — state files present):**
- `production/session-state/active.md` exists and contains a reference to `sprint-06`
- `production/sprint-status.yaml` exists and identifies `sprint-06` as active
**Fixture (variant B — state files absent):**
- `production/session-state/active.md` does NOT exist
- `production/sprint-status.yaml` does NOT exist
**Input:** `/team-qa` (no argument)
**Expected behavior (variant A):**
1. Phase 1: No argument provided; reads `production/session-state/active.md`; reads `production/sprint-status.yaml`
2. Detects `sprint-06` as the active sprint from both sources
3. Proceeds as if `/team-qa sprint-06` was the input; reports "No sprint argument provided — inferred sprint-06 from session state. Found [N] stories."
**Expected behavior (variant B):**
1. Phase 1: No argument provided; attempts to read `production/session-state/active.md` — file missing; attempts to read `production/sprint-status.yaml` — file missing
2. Cannot infer sprint; uses AskUserQuestion: "Which sprint or feature should QA cover?" with options to type a sprint identifier or cancel
**Assertions:**
- [ ] Skill does NOT default to a hardcoded sprint name when no argument is provided
- [ ] Skill reads both `production/session-state/active.md` AND `production/sprint-status.yaml` before asking the user (variant A)
- [ ] When both state files are absent, skill uses AskUserQuestion rather than guessing (variant B)
- [ ] Inferred sprint is reported to the user before proceeding (variant A transparency)
- [ ] Skill does NOT error out when state files are missing — it falls back to asking (variant B)
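
The two variants above amount to "read both state files if they exist, otherwise fall back to asking". A rough sketch follows, assuming (this is not stated in the spec) that a sprint identifier can be recognized in either file by the pattern `sprint-<digits>`. The name `infer_active_sprint` is hypothetical; a `None` result means the caller should use AskUserQuestion.

```python
import re
from pathlib import Path
from typing import Optional

SPRINT_ID = re.compile(r"sprint-\d+")  # assumed identifier format


def infer_active_sprint(project_root: Path) -> Optional[str]:
    """Return the active sprint named in the state files, or None if it cannot be inferred."""
    candidates = [
        project_root / "production" / "session-state" / "active.md",
        project_root / "production" / "sprint-status.yaml",
    ]
    found = set()
    for path in candidates:
        # Missing files are skipped silently; absence must not raise an error.
        if path.is_file():
            found.update(SPRINT_ID.findall(path.read_text(encoding="utf-8")))
    # Only a single unambiguous identifier counts; otherwise the user has to be asked.
    return found.pop() if len(found) == 1 else None
```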
---
### Case 5: Mixed Results — Some PASS, one FAIL with S1 bug, one BLOCKED
**Fixture:**
- `production/sprints/sprint-07/` exists with 4 story files
- Smoke check passes
- Story A (Logic): automated test passes — PASS
- Story B (UI): manual QA — PASS WITH NOTES (minor text overflow)
- Story C (Visual/Feel): manual QA — FAIL; tester identifies S1 crash on ability activation
- Story D (Integration): cannot test — BLOCKED (dependency system not yet implemented)
**Input:** `/team-qa sprint-07`
**Expected behavior:**
1. Phases 1–5 proceed; Phase 5 test cases cover stories B, C, D
2. Phase 6: Story A is recorded as PASS implicitly (automated test passed); the user marks Story B: PASS WITH NOTES; Story C: FAIL; Story D: BLOCKED
3. After Story C FAIL: qa-tester spawned to write bug report `BUG-001-crash-ability-activation.md` with S1 severity
4. Result summary presented: "Stories PASS: 1, PASS WITH NOTES: 1, FAIL: 1 — bugs filed: BUG-001 (S1), BLOCKED: 1"
5. Phase 7: qa-lead produces sign-off report covering all 4 stories; BUG-001 listed as S1/Open; Story D listed as BLOCKED; Verdict: NOT APPROVED
6. Sign-off report written after "May I write?" approval
7. Next step: "Resolve S1/S2 bugs and re-run `/team-qa` or targeted manual QA before advancing."
**Assertions:**
- [ ] All 4 stories appear in the Phase 7 sign-off report Test Coverage Summary table — none are silently omitted
- [ ] Story D (BLOCKED) is listed in the report with a BLOCKED status, not silently dropped
- [ ] S1 bug causes Verdict: NOT APPROVED regardless of the other stories passing
- [ ] PASS WITH NOTES stories do not downgrade to FAIL — they are tracked separately
- [ ] BUG-001 severity is listed as S1 in the Bugs Found table
- [ ] Partial results are preserved — the sign-off report is still produced even with failures and blocks
- [ ] Verdict: COMPLETE is issued by the orchestrator (pipeline completed); sign-off verdict is NOT APPROVED
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at Phase 2 (strategy review), Phase 5 (test case approval per group), and Phase 6 (per-story manual QA result)
- [ ] Phase 4 smoke check is a hard gate: FAIL halts the pipeline at Phase 4 with no exceptions
- [ ] "May I write?" asked separately for QA plan (Phase 3) and sign-off report (Phase 7)
- [ ] Bug reports are always written by `qa-tester` via Task — orchestrator does not write directly
- [ ] Phase 5 qa-tester tasks for independent stories are issued in parallel where possible
- [ ] Error recovery: any BLOCKED agent is surfaced immediately with AskUserQuestion options
- [ ] Partial report always produced — no work is discarded because one story failed or blocked
- [ ] Sign-off verdict rules are strictly applied: any S1/S2 bug open = NOT APPROVED; no exceptions
- [ ] Orchestrator-level Verdict: COMPLETE is distinct from the sign-off report's APPROVED/NOT APPROVED verdict
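
The sign-off verdict rules referenced in this spec can be written as a small decision function. This is a sketch under assumptions, not the skill's actual table: `signoff_verdict` is a hypothetical name, workaround tracking for FAIL results is omitted, and BLOCKED stories are not modelled because this section does not say how they affect the verdict on their own.

```python
def signoff_verdict(story_results: list[str], open_bug_severities: list[str]) -> str:
    """Map story results and open bug severities to a sign-off verdict."""
    # Any open S1 or S2 bug means NOT APPROVED, with no exceptions.
    if any(severity in {"S1", "S2"} for severity in open_bug_severities):
        return "NOT APPROVED"
    # Simplification: every FAIL is treated as lacking a documented workaround.
    if "FAIL" in story_results:
        return "NOT APPROVED"
    # Remaining open bugs (S3/S4) or PASS WITH NOTES results attach conditions.
    if open_bug_severities or "PASS WITH NOTES" in story_results:
        return "APPROVED WITH CONDITIONS"
    return "APPROVED"
```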
---
## Coverage Notes
- The "APPROVED WITH CONDITIONS" verdict path (S3/S4 bugs, PASS WITH NOTES) is covered implicitly by Case 5's PASS WITH NOTES story (Story B) — if no S1/S2 bugs existed, that case would produce APPROVED WITH CONDITIONS. A dedicated case is not required as the verdict logic is table-driven.
- The `feature: [system-name]` argument form is not separately tested — it follows the same Phase 1 logic as the sprint form, using glob instead of directory read. The no-argument inference path (Case 4) provides sufficient coverage of the detection logic.
- Logic stories with passing automated tests do not need manual QA — this is validated implicitly by Case 5 (Story A) where the Logic story receives no manual QA phase.
- Parallel qa-tester spawning in Phase 5 is validated implicitly by Case 1 (multiple Visual/Feel stories issued simultaneously); no dedicated parallelism case is required beyond the Static Assertions check.


@@ -0,0 +1,215 @@
# Skill Test Spec: /team-release
## Skill Summary
Orchestrates the release team through a 7-phase pipeline from release candidate to
deployment and post-release monitoring. Coordinates release-manager, qa-lead,
devops-engineer, producer, security-engineer (optional, required for online/
multiplayer), network-programmer (optional, required for multiplayer),
analytics-engineer, and community-manager. Phase 3 agents run in parallel. Ends
with a go/no-go decision; deployment (Phase 6) is skipped if the producer calls
NO-GO. Closes with a post-release monitoring plan.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" language in the File Write Protocol section (delegated to sub-agents)
- [ ] Has a File Write Protocol section stating that the orchestrator does not write files directly
- [ ] Has an Error Recovery Protocol section with four recovery steps (surface / assess / offer options / partial report)
- [ ] Has a next-step handoff referencing post-release monitoring, `/retrospective`, and `production/stage.txt`
- [ ] Uses `AskUserQuestion` at phase transitions requiring user approval before proceeding
- [ ] Phase 3 agents (qa-lead, devops-engineer, and optionally security-engineer, network-programmer) are explicitly stated to run in parallel
- [ ] Phase 6 (Deployment) is conditional on a GO decision from Phase 5
- [ ] security-engineer is described as conditional on online features / player data — not always spawned
---
## Test Cases
### Case 1: Happy Path (Single-Player) — All phases complete, version deployed
**Fixture:**
- `production/stage.txt` exists and contains a Production-or-later stage
- Milestone acceptance criteria are all met (producer can confirm)
- No online features, no multiplayer, no player data collection
- All CI builds are clean on the current branch
- No open S1/S2 bugs
- `production/sprints/` contains the completed sprint stories for this milestone
**Input:** `/team-release v1.0.0`
**Expected behavior:**
1. Phase 1: Spawns `producer` via Task; confirms all milestone acceptance criteria met; identifies any deferred scope; produces release authorization; presents to user; AskUserQuestion: user approves before Phase 2
2. Phase 2: Spawns `release-manager` via Task; cuts release branch from agreed commit; bumps version numbers; invokes `/release-checklist`; freezes branch; output: branch name and checklist; AskUserQuestion: user approves before Phase 3
3. Phase 3 (parallel): Issues Task calls simultaneously for `qa-lead` (regression suite, critical path sign-off) and `devops-engineer` (build artifacts, CI verification); security-engineer is NOT spawned (no online features); network-programmer is NOT spawned (no multiplayer); both complete successfully
4. Phase 4: Verifies that all localization strings are translated; `analytics-engineer` verifies telemetry fires correctly on the release build; performance benchmarks pass; sign-off produced
5. Phase 5: Spawns `producer` via Task; collects sign-offs from qa-lead, release-manager, devops-engineer; no open blocking issues; producer declares GO; AskUserQuestion: user sees GO decision and confirms deployment
6. Phase 6: Spawns `release-manager` + `devops-engineer` (parallel); tags release in version control; invokes `/changelog`; deploys to staging; smoke test passes; deploys to production; simultaneously spawns `community-manager` to finalize patch notes via `/patch-notes v1.0.0` and prepare launch announcement
7. Phase 7: release-manager generates release report; producer updates milestone tracking; qa-lead begins monitoring for regressions; community-manager publishes communication; analytics-engineer confirms live dashboards healthy
8. Verdict: COMPLETE — release executed and deployed
**Assertions:**
- [ ] Phase 3 qa-lead and devops-engineer Task calls are issued simultaneously, not sequentially
- [ ] security-engineer is NOT spawned when the game has no online features, multiplayer, or player data
- [ ] Phase 5 producer collects sign-offs from all required parties before declaring GO
- [ ] Phase 6 deployment only begins after GO decision is confirmed by the user
- [ ] `/changelog` is invoked by release-manager in Phase 6 (not written directly)
- [ ] `/patch-notes v1.0.0` is invoked by community-manager in Phase 6
- [ ] Phase 7 monitoring plan includes a 48-hour post-release monitoring commitment
- [ ] Next steps recommend updating `production/stage.txt` to `Live` after successful deployment
- [ ] Verdict: COMPLETE appears in the final output
---
### Case 2: Go/No-Go: NO — S1 bug found in Phase 3, deployment skipped
**Fixture:**
- Release candidate branch exists for v0.9.0
- qa-lead discovers a previously unreported S1 crash in the main menu during Phase 3 regression testing
- devops-engineer build is clean and artifacts are ready
- producer is aware of the S1 bug
**Input:** `/team-release v0.9.0`
**Expected behavior:**
1. Phases 1–2 complete normally; release candidate is cut
2. Phase 3 (parallel): devops-engineer returns clean build sign-off; qa-lead returns with an S1 bug identified and regression suite failing; qa-lead declares quality gate: NOT PASSED
3. Orchestrator surfaces the qa-lead result immediately: "QA-LEAD: S1 bug found — [crash description]. Quality gate: NOT PASSED."
4. Phase 4 proceeds cautiously or is paused (AskUserQuestion: continue to Phase 4 or skip to Phase 5 for go/no-go?)
5. Phase 5: Spawns `producer` via Task; producer receives qa-lead's NOT PASSED verdict; no S1 sign-off available; producer declares NO-GO with rationale: "S1 bug [ID] is open and unresolved. Releasing is not safe."
6. AskUserQuestion: user is presented with the NO-GO decision and the S1 bug details; options: fix the bug and re-run, defer the release, or override (with documented rationale)
7. Phase 6 (Deployment) is SKIPPED entirely — no branch tagging, no deploy to staging, no deploy to production
8. community-manager is NOT spawned in Phase 6 (no deployment to announce)
9. Skill ends with a partial report summarizing what was completed (Phases 1–5) and what was skipped (Phase 6) and why
10. Verdict: BLOCKED — release not deployed
**Assertions:**
- [ ] qa-lead S1 bug finding is surfaced to the user immediately after Phase 3 completes — not suppressed until Phase 5
- [ ] producer's NO-GO decision explicitly references the S1 bug and the quality gate result
- [ ] Phase 6 Deployment is completely skipped when producer declares NO-GO
- [ ] community-manager is NOT spawned for patch notes or launch announcement on NO-GO
- [ ] The partial report clearly states which phases completed and which were skipped, with reasons
- [ ] Verdict: BLOCKED (not COMPLETE) when deployment is skipped due to NO-GO
- [ ] AskUserQuestion offers the user resolution options (fix and re-run / defer / override with rationale)
- [ ] Override path (if chosen) requires user to provide a documented rationale before proceeding to Phase 6
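
The Phase 6 gating described in this case can be sketched as two small checks. The names `may_deploy` and `orchestrator_verdict` are hypothetical, and the decision strings `"GO"` and `"NO-GO"` are assumed encodings of the producer's call. The sketch treats an override as valid only when a non-empty written rationale accompanies it.

```python
from typing import Optional


def may_deploy(producer_decision: str, user_confirmed: bool,
               override_rationale: Optional[str] = None) -> bool:
    """Decide whether Phase 6 (Deployment) may start."""
    if producer_decision == "GO":
        # Even on GO, deployment waits for the user's confirmation.
        return user_confirmed
    # NO-GO: only an explicit override with a documented rationale unlocks Phase 6.
    return bool(override_rationale and override_rationale.strip())


def orchestrator_verdict(deployed: bool) -> str:
    """COMPLETE only when deployment happened; otherwise BLOCKED."""
    return "COMPLETE" if deployed else "BLOCKED"
```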
---
### Case 3: Security Audit for Online Game — security-engineer is spawned in Phase 3
**Fixture:**
- Game has multiplayer features and stores player account data
- Release candidate exists for v2.1.0
- qa-lead and devops-engineer both return clean sign-offs
- security-engineer audit is required per team composition rules
**Input:** `/team-release v2.1.0`
**Expected behavior:**
1. Phases 1–2 complete normally
2. Phase 3 (parallel): Orchestrator detects that the game has online/multiplayer features and player data; issues Task calls simultaneously for `qa-lead`, `devops-engineer`, AND `security-engineer`; also spawns `network-programmer` for netcode stability sign-off
3. security-engineer conducts pre-release security audit: reviews authentication flows, anti-cheat presence, data privacy compliance; returns sign-off
4. network-programmer verifies lag compensation, reconnect handling, and bandwidth under load; returns sign-off
5. All four Phase 3 agents complete; their results are collected before Phase 4 begins
6. Phase 5: producer collects sign-offs from all four Phase 3 agents (qa-lead, devops-engineer, security-engineer, network-programmer) before making the go/no-go call
7. Remaining phases proceed normally to COMPLETE
**Assertions:**
- [ ] security-engineer IS spawned in Phase 3 when the game has online features, multiplayer, or player data — this is not skipped
- [ ] network-programmer IS spawned in Phase 3 when the game has multiplayer
- [ ] All four Phase 3 Task calls (qa-lead, devops-engineer, security-engineer, network-programmer) are issued simultaneously
- [ ] security-engineer audit covers authentication, anti-cheat, and data privacy compliance
- [ ] Phase 5 producer sign-off collection includes security-engineer (four parties, not two)
- [ ] Phase 6 deployment does not begin until security-engineer has signed off
- [ ] Skill does NOT treat security-engineer as optional for a game with player data
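
The team composition rule exercised by Cases 1 and 3 can be summarized in a few lines. This is a minimal sketch with a hypothetical function name and hypothetical boolean flags; how the real skill detects online features, multiplayer, or player data is not specified here.

```python
def release_phase3_agents(online_features: bool, multiplayer: bool,
                          player_data: bool) -> list[str]:
    """Return the agents to spawn in parallel in Phase 3 of /team-release."""
    agents = ["qa-lead", "devops-engineer"]  # always spawned
    # security-engineer is required as soon as any online-facing risk exists.
    if online_features or multiplayer or player_data:
        agents.append("security-engineer")
    # network-programmer is required only for multiplayer.
    if multiplayer:
        agents.append("network-programmer")
    return agents
```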
---
### Case 4: Localization Miss — Untranslated strings block the ship
**Fixture:**
- Release candidate exists for v1.2.0
- Phase 3 (qa-lead, devops-engineer) complete with clean sign-offs
- Phase 4: localization verification detects 47 untranslated strings in the French locale (a supported language in the game's localization scope)
- localization-lead is available as a delegatable agent
**Input:** `/team-release v1.2.0`
**Expected behavior:**
1. Phases 1–3 complete with clean sign-offs
2. Phase 4: Localization verification step detects untranslated strings; identifies 47 strings in French locale; localization-lead (if available) is spawned to assess the severity
3. Orchestrator surfaces: "LOCALIZATION MISS: 47 untranslated strings found in French locale. Localization sign-off is required before shipping."
4. AskUserQuestion: options presented — (a) Fix translations and re-run Phase 4, (b) Remove French locale from this release, (c) Ship as-is with a known issues note
5. If user selects (a): Phase 4 is re-run after translations are provided; skill waits for localization sign-off
6. Phase 5 go/no-go does NOT proceed while localization sign-off is outstanding
7. Ship is blocked (Phase 6 not entered) until localization issue is resolved or explicitly waived
**Assertions:**
- [ ] Localization verification in Phase 4 detects untranslated strings and counts them (not just "some strings missing")
- [ ] Untranslated strings for a supported locale block the pipeline before Phase 5
- [ ] AskUserQuestion is used to offer the user resolution choices — the skill does not auto-waive
- [ ] Phase 5 go/no-go is NOT called while localization sign-off is pending
- [ ] If user chooses to re-run Phase 4: the skill does not require restarting from Phase 1
- [ ] If user explicitly waives (ships as-is): the waiver is documented in the release report (Phase 7) as a known issue
- [ ] Skill does NOT fabricate translated strings to unblock itself
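
Counting untranslated strings, rather than reporting "some strings missing", might look like the sketch below. It assumes, purely for illustration, that string tables are available as key-to-text dictionaries and that a key counts as untranslated when it is missing from the locale or maps to an empty value. `untranslated_keys` is a hypothetical helper, not part of the skill.

```python
def untranslated_keys(source: dict[str, str], locale: dict[str, str]) -> list[str]:
    """List source keys that have no usable translation in the given locale."""
    # A missing key and a blank value are both treated as untranslated.
    return sorted(key for key in source if not locale.get(key, "").strip())


source_table = {"menu.play": "Play", "menu.quit": "Quit", "hud.health": "Health"}
french_table = {"menu.play": "Jouer", "menu.quit": ""}

missing = untranslated_keys(source_table, french_table)
report = f"LOCALIZATION MISS: {len(missing)} untranslated strings found in French locale."
```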
---
### Case 5: No Argument — Skill infers version or asks
**Fixture (variant A — milestone data present):**
- `production/milestones/` exists with a milestone file; most recent milestone is "v1.1.0 — Gold"
- `production/session-state/active.md` references a version or milestone
**Fixture (variant B — no discoverable version):**
- `production/milestones/` does not exist
- `production/session-state/active.md` does not reference a version
- No git tags are present from which to infer a version
**Input:** `/team-release` (no argument)
**Expected behavior (variant A):**
1. Phase 1: No argument provided; reads `production/session-state/active.md`; reads most recent milestone file in `production/milestones/`
2. Infers v1.1.0 as the target version; reports "No version argument provided — inferred v1.1.0 from milestone data. Proceeding."
3. Confirms with AskUserQuestion before beginning Phase 1 proper: "Releasing v1.1.0. Is this correct?"
4. Proceeds as if `/team-release v1.1.0` was the input
**Expected behavior (variant B):**
1. Phase 1: No argument provided; reads available state files — no version discoverable
2. Uses AskUserQuestion: "What version number should be released? (e.g., v1.0.0)"
3. Waits for user input before proceeding
**Assertions:**
- [ ] Skill does NOT default to a hardcoded version string when no argument is provided
- [ ] Skill reads `production/session-state/active.md` and milestone files before asking (variant A)
- [ ] Inferred version is confirmed with the user via AskUserQuestion before proceeding (variant A)
- [ ] When no version is discoverable, AskUserQuestion is used — skill does not guess (variant B)
- [ ] Skill does NOT error out when milestone files are absent — it falls back to asking (variant B)
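
Version inference for the no-argument case can be sketched with a regular expression over whatever text the state and milestone files contain. The pattern below assumes a `vMAJOR.MINOR.PATCH` format, which matches the examples in this spec but is otherwise an assumption. `infer_version` is a hypothetical name; a `None` result means the skill should ask the user, and an inferred value should still be confirmed before proceeding.

```python
import re
from typing import Optional

VERSION = re.compile(r"\bv\d+\.\d+\.\d+\b")  # assumed version format, e.g. v1.1.0


def infer_version(texts: list[str]) -> Optional[str]:
    """Return the single version mentioned across the given texts, or None if unclear."""
    found = {match for text in texts for match in VERSION.findall(text)}
    # Zero or several distinct versions are both ambiguous, so no guess is made.
    return found.pop() if len(found) == 1 else None
```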
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each phase transition gate (post-Phase 1, post-Phase 2, post-Phase 3/4 if issues, post-Phase 5 go/no-go)
- [ ] Phase 3 agents are always issued as parallel Task calls — qa-lead and devops-engineer are never sequential
- [ ] security-engineer is conditionally spawned based on game features — never silently skipped when features are present
- [ ] File Write Protocol: orchestrator never calls Write/Edit directly — all writes are delegated to sub-agents or sub-skills
- [ ] Phase 6 Deployment is strictly conditional on a GO verdict from Phase 5 — never auto-triggered
- [ ] Error recovery: any BLOCKED agent is surfaced immediately before continuing to dependent phases
- [ ] Partial reports are always produced if any phase fails or the pipeline is halted (Case 2)
- [ ] Verdict: COMPLETE only when deployment completes; BLOCKED when go/no-go is NO or a hard blocker is unresolved
- [ ] Next steps always include 48-hour post-release monitoring, `/retrospective` recommendation, and `production/stage.txt` update to `Live`
---
## Coverage Notes
- Phase 7 post-release actions (release report, milestone tracking, community publishing, dashboard monitoring) are validated implicitly by Case 1. No separate edge case is required as Phase 7 is non-gated and does not have a blocking failure mode.
- The "devops-engineer build fails" path is not separately tested — it would surface as a BLOCKED result in Phase 3 and follow the standard error recovery protocol (surface → assess → AskUserQuestion options). This is validated structurally by the Static Assertions error recovery check.
- The parallel Phase 4 path (localization + performance + analytics simultaneously with Phase 3) is a documented option in the skill ("can run in parallel with Phase 3 if resources available"). Case 4 tests Phase 4 as a sequential gate; the parallel variant is left to the skill's implementation judgment.
- The `network-programmer` sign-off path for multiplayer is validated as part of Case 3 rather than a separate case, as it follows the same parallel-spawn pattern as security-engineer.
- The "override NO-GO with documented rationale" path in Case 2 is referenced but not exhaustively tested — it is an escape hatch that the skill must support, and its existence is validated by the AskUserQuestion options assertion in Case 2.


@@ -0,0 +1,201 @@
# Skill Test Spec: /team-ui
## Skill Summary
Orchestrates the UI team through the full UX pipeline for a single UI feature.
Coordinates ux-designer, ui-programmer, art-director, the engine UI specialist,
and accessibility-specialist through five structured phases: Context Gathering +
UX Spec (Phase 1a/1b) → UX Review Gate (Phase 1c) → Visual Design (Phase 2) →
Implementation (Phase 3) → Review in parallel (Phase 4) → Polish (Phase 5).
Uses `AskUserQuestion` at each phase transition. Delegates all file writes to
sub-agents and sub-skills (`/ux-design`, `ui-programmer`). Produces a summary report
with verdict COMPLETE / BLOCKED and handoffs to `/ux-review`, `/code-review`,
`/team-polish`.
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings (Phase 1a through Phase 5 are all present)
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" or "File Write Protocol" — writes delegated to sub-agents and sub-skills, orchestrator does not write files directly
- [ ] Has a next-step handoff at the end (references `/ux-review`, `/code-review`, `/team-polish`)
- [ ] Error Recovery Protocol section is present with all four recovery steps
- [ ] Uses `AskUserQuestion` at phase transitions for user approval before proceeding
- [ ] Phase 4 is explicitly marked as parallel (ux-designer, art-director, accessibility-specialist)
- [ ] UX Review Gate (Phase 1c) is defined as a blocking gate — skill must not proceed to Phase 2 without APPROVED verdict
- [ ] Team Composition lists all five roles (ux-designer, ui-programmer, art-director, engine UI specialist, accessibility-specialist)
- [ ] References the interaction pattern library (`design/ux/interaction-patterns.md`) — ui-programmer must use existing patterns
- [ ] Phase 1a reads `design/accessibility-requirements.md` before design begins
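
The structural checks above can be approximated with plain text matching. The following is a minimal sketch, not the actual `/skill-test static` implementation: the function name `check_static`, the assumption that frontmatter sits between the first two `---` lines, and the assumption that phase headings start with the word "Phase" are all hypothetical.

```python
import re

# Frontmatter fields required by the static assertions in this spec.
REQUIRED_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]


def check_static(skill_text: str, verdicts: list[str]) -> dict[str, bool]:
    """Run a few structural checks against the raw text of a skill file (hypothetical helper)."""
    # Assumption: frontmatter is the block between the first two '---' lines.
    match = re.match(r"^---\n(.*?)\n---\n", skill_text, re.DOTALL)
    frontmatter = match.group(1) if match else ""
    keys = {
        line.split(":", 1)[0].strip()
        for line in frontmatter.splitlines()
        if ":" in line
    }
    # Assumption: phase headings look like '## Phase 1a: ...'.
    phase_headings = re.findall(r"^#+\s+Phase\s+\w+", skill_text, re.MULTILINE)
    return {
        "frontmatter": all(field in keys for field in REQUIRED_FIELDS),
        "phase_headings": len(phase_headings) >= 2,
        "verdicts": all(verdict in skill_text for verdict in verdicts),
    }
```

A caller would pass the skill file's text and the verdict keywords for that skill, for example `check_static(text, ["COMPLETE", "BLOCKED"])` for `/team-ui`.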
---
## Test Cases
### Case 1: Happy Path — Full pipeline from UX spec through polish succeeds
**Fixture:**
- `design/gdd/game-concept.md` exists with platform targets and intended audience
- `design/player-journey.md` exists
- `design/ux/interaction-patterns.md` exists with relevant patterns
- `design/accessibility-requirements.md` exists with committed tier (e.g., Enhanced)
- Engine UI specialist configured in `.claude/docs/technical-preferences.md`
**Input:** `/team-ui inventory screen`
**Expected behavior:**
1. Phase 1a — orchestrator reads game-concept.md, player-journey.md, relevant GDD UI sections, interaction-patterns.md, accessibility-requirements.md; summarizes a brief for the ux-designer
2. Phase 1b — `/ux-design inventory-screen` invoked (or ux-designer spawned directly); produces `design/ux/inventory-screen.md` using `ux-spec.md` template; `AskUserQuestion` confirms spec before review
3. Phase 1c — `/ux-review design/ux/inventory-screen.md` invoked; returns APPROVED; gate passed, proceed to Phase 2
4. Phase 2 — art-director spawned; reviews full UX spec (not only wireframes); applies visual treatment; verifies color contrast; produces visual design spec with asset manifest; `AskUserQuestion` confirms before Phase 3
5. Phase 3 — engine UI specialist spawned first (read from technical-preferences.md); produces implementation notes for ui-programmer; ui-programmer spawned with UX spec + visual spec + engine notes; implementation produced; interaction-patterns.md updated if new patterns introduced
6. Phase 4 — ux-designer, art-director, accessibility-specialist spawned in parallel; all three return results before Phase 5
7. Phase 5 — review feedback addressed; animations verified skippable; UI sounds confirmed through audio event system; interaction-patterns.md final check; verdict: COMPLETE
8. Summary report: UX spec APPROVED, visual design COMPLETE, implementation COMPLETE, accessibility COMPLIANT, all input methods supported, pattern library updated, verdict: COMPLETE
**Assertions:**
- [ ] Phase 1a reads all five sources before briefing ux-designer
- [ ] UX Review Gate checked before Phase 2 — Phase 2 does NOT begin until APPROVED
- [ ] Art-director in Phase 2 reviews full spec, not just wireframe images
- [ ] Engine UI specialist spawned before ui-programmer in Phase 3
- [ ] Phase 4 agents launched simultaneously (ux-designer, art-director, accessibility-specialist)
- [ ] All file writes delegated to sub-agents and sub-skills
- [ ] Verdict COMPLETE in final summary report
- [ ] Next steps include `/ux-review`, `/code-review`, `/team-polish`
---
### Case 2: UX Review Gate — Spec fails review; skill halts before implementation
**Fixture:**
- `design/ux/inventory-screen.md` produced by Phase 1b
- `/ux-review` returns verdict NEEDS REVISION with specific concerns flagged (e.g., gamepad navigation flow incomplete, contrast ratio below minimum)
**Input:** `/team-ui inventory screen`
**Expected behavior:**
1. Phase 1a + 1b complete — UX spec produced
2. Phase 1c — `/ux-review design/ux/inventory-screen.md` returns NEEDS REVISION
3. Skill does NOT advance to Phase 2
4. `AskUserQuestion` presented with the specific flagged concerns and options:
- (a) Return to ux-designer to address the issues and re-review
- (b) Accept the risk and proceed to Phase 2 anyway (conscious decision)
5. If user chooses (a): ux-designer revises spec, `/ux-review` re-run; loop continues until APPROVED or user overrides
6. If user chooses (b): skill proceeds with an explicit NEEDS REVISION note in the final report
7. Skill does NOT silently proceed past the gate
**Assertions:**
- [ ] Phase 2 does NOT begin while UX review verdict is NEEDS REVISION
- [ ] `AskUserQuestion` presents the specific flagged concerns before offering options
- [ ] User must make a conscious choice to override — skill does not assume override
- [ ] If user accepts risk, NEEDS REVISION concern is documented in the final report
- [ ] Revision-and-re-review loop is offered (not just a one-shot failure)
- [ ] Skill does NOT discard the produced UX spec on review failure
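
The gate behavior in Case 2 amounts to a loop that exits only on APPROVED or on an explicit user override. The sketch below illustrates that control flow under stated assumptions: `run_ux_review_gate` and its three callbacks are hypothetical stand-ins for `/ux-review`, `AskUserQuestion`, and the ux-designer revision step, and the choice strings are invented for illustration.

```python
from typing import Callable


def run_ux_review_gate(
    review: Callable[[], str],
    ask_user: Callable[[str], str],
    revise: Callable[[], None],
) -> dict:
    """Loop until the spec is APPROVED or the user explicitly overrides (hypothetical helper)."""
    while True:
        verdict = review()
        if verdict == "APPROVED":
            return {"proceed": True, "overridden": False}
        # The gate never assumes an override; the user must choose it.
        choice = ask_user(verdict)
        if choice == "override":
            # The caller is expected to record the NEEDS REVISION note in the final report.
            return {"proceed": True, "overridden": True}
        # Any other choice is treated as "revise and re-review".
        revise()
```

The `overridden` flag is what lets the final report carry the NEEDS REVISION note when the user accepts the risk.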
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-ui` (no argument)
**Expected behavior:**
1. Skill detects no argument provided
2. Outputs usage message explaining the required argument (UI feature description)
3. Shows the argument-hint format `/team-ui [UI feature description]` and at least one example invocation (e.g., `/team-ui inventory screen`)
4. Skill exits without spawning any subagents or reading any project files
**Assertions:**
- [ ] Skill does NOT spawn any subagents when no argument is given
- [ ] Usage message includes the argument-hint format from frontmatter
- [ ] At least one example of a valid invocation is shown
- [ ] No UX spec files or GDDs are read before the skill exits
- [ ] Verdict is NOT shown (pipeline never starts)
---
### Case 4: Accessibility Parallel Review — Phase 4 runs three streams simultaneously
**Fixture:**
- `design/ux/inventory-screen.md` exists (APPROVED)
- Visual design spec complete
- Implementation complete
- `design/accessibility-requirements.md` committed tier: Enhanced
**Input:** `/team-ui inventory screen` (resuming from Phase 3 complete)
**Expected behavior:**
1. Phase 4 begins after implementation is confirmed complete
2. Three Task calls issued simultaneously: ux-designer, art-director, accessibility-specialist
3. Each stream operates independently:
- ux-designer: verifies implementation matches wireframes, tests keyboard-only and gamepad-only navigation, checks accessibility features function
- art-director: verifies visual consistency with art bible at minimum and maximum supported resolutions
- accessibility-specialist: audits against the Enhanced accessibility tier in `design/accessibility-requirements.md`; any violation flagged as a blocker
4. Skill waits for all three results before proceeding to Phase 5
5. `AskUserQuestion` presents all three review results before Phase 5 begins
**Assertions:**
- [ ] All three Task calls issued before any result is awaited (parallel, not sequential)
- [ ] Phase 5 does NOT begin until all three Phase 4 agents have returned
- [ ] Accessibility-specialist explicitly reads `design/accessibility-requirements.md` for the committed tier
- [ ] Accessibility violations flagged as BLOCKING (not merely advisory)
- [ ] `AskUserQuestion` shows all three review streams' results together before Phase 5 approval
- [ ] No Phase 4 agent's output is used as input for another Phase 4 agent
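
The "issued before any result is awaited" requirement is the usual fan-out/fan-in pattern. As an illustration only, here is a sketch using Python's `asyncio.gather`; `run_reviewer` is a hypothetical stand-in for a Task call, and the real skill's spawning mechanism is not shown in this spec.

```python
import asyncio


async def run_reviewer(role: str) -> tuple[str, str]:
    """Hypothetical stand-in for one Phase 4 Task call."""
    await asyncio.sleep(0)  # yield control, as a real reviewer would while working
    return role, "reviewed"


async def run_phase_4() -> dict[str, str]:
    roles = ["ux-designer", "art-director", "accessibility-specialist"]
    # gather() schedules all three reviewers before any result is awaited,
    # and returns only once every one of them has finished.
    results = await asyncio.gather(*(run_reviewer(role) for role in roles))
    return dict(results)
```

Because each reviewer receives only its own role as input, no Phase 4 result can feed another Phase 4 agent, which matches the last assertion above.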
---
### Case 5: Missing Interaction Pattern Library — Skill notes the gap rather than inventing patterns
**Fixture:**
- `design/ux/interaction-patterns.md` does NOT exist
- All other required files present
**Input:** `/team-ui settings menu`
**Expected behavior:**
1. Phase 1a — orchestrator attempts to read `design/ux/interaction-patterns.md`; file not found
2. Skill surfaces the gap: "interaction-patterns.md does not exist — no existing patterns to reuse"
3. `AskUserQuestion` presented with options:
- (a) Run `/ux-design patterns` first to establish the pattern library, then continue
- (b) Proceed without the pattern library — ux-designer will document new patterns as they are created
4. Skill does NOT invent or assume patterns from other sources
5. If user chooses (b): ui-programmer is explicitly instructed to treat all patterns created as new and to add each to a new `design/ux/interaction-patterns.md` at completion
6. Final report notes that interaction-patterns.md was created (or is still absent if user skipped)
**Assertions:**
- [ ] Skill does NOT silently ignore the missing pattern library
- [ ] Skill does NOT invent patterns by guessing from the feature name or GDD alone
- [ ] `AskUserQuestion` offers a "create pattern library first" option (referencing `/ux-design patterns`)
- [ ] If user proceeds without the library, ui-programmer is told to treat all patterns as new
- [ ] Final report documents pattern library status (created / absent / updated)
- [ ] Skill does NOT fail entirely — the gap is noted and user is given a choice
---
## Protocol Compliance
- [ ] `AskUserQuestion` used at each phase transition — user approves before pipeline advances
- [ ] UX Review Gate (Phase 1c) is blocking — Phase 2 cannot begin without APPROVED or explicit user override
- [ ] All file writes delegated to sub-agents and sub-skills — orchestrator does not call Write or Edit directly
- [ ] Phase 4 agents launched in parallel per skill spec
- [ ] Error Recovery Protocol followed: surface → assess → offer options → partial report
- [ ] Partial report always produced even when agents are BLOCKED
- [ ] Verdict is one of COMPLETE / BLOCKED
- [ ] Next steps present at end: `/ux-review`, `/code-review`, `/team-polish`
---
## Coverage Notes
- The HUD-specific path (`/ux-design hud` + `hud-design.md` template + visual budget check in Phase 5)
is not separately tested here; it shares the same phase structure but uses different templates.
- The "Update in place" path for interaction-patterns.md (new pattern added during implementation)
is exercised implicitly in Case 1 Step 5 — a dedicated fixture with a known new pattern would
strengthen coverage.
- Engine UI specialist unavailable (no engine configured): skill spec states "skip if no engine
configured"; Case 1 covers only the configured-engine path, so the skip path has no dedicated fixture.
- The NEEDS REVISION accept-the-risk override (Case 2 option b) requires the override to be
explicitly documented in the report; this is asserted, but its downstream effects are not tested.

@@ -0,0 +1,214 @@
# Skill Test Spec: /adopt
## Skill Summary
`/adopt` audits an existing project's artifacts — GDDs, ADRs, stories, infrastructure
files, and `technical-preferences.md` — for format compliance with the template's
skill pipeline. It classifies every gap by severity (BLOCKING / HIGH / MEDIUM / LOW),
composes a numbered, ordered migration plan, and writes it to `docs/adoption-plan-[date].md`
after explicit user approval via `AskUserQuestion`.
This skill is distinct from `/project-stage-detect` (which checks what exists).
`/adopt` checks whether what exists will actually work with the template's skills.
No director gates apply. The skill does NOT invoke any director agents.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains severity tier keywords: BLOCKING, HIGH, MEDIUM, LOW
- [ ] Contains "May I write" or `AskUserQuestion` language before writing the adoption plan
- [ ] Has a next-step handoff at the end (e.g., offering to fix the highest-priority gap immediately)
---
## Director Gate Checks
None. `/adopt` is a brownfield audit utility. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — All GDDs compliant, no gaps, COMPLIANT
**Fixture:**
- `design/gdd/` contains 3 GDD files; each has all 8 required sections with content
- `docs/architecture/adr-0001.md` exists with `## Status`, `## Engine Compatibility`,
and all other required sections
- `production/stage.txt` exists
- `docs/architecture/tr-registry.yaml` and `docs/architecture/control-manifest.md` exist
- Engine configured in `technical-preferences.md`
**Input:** `/adopt`
**Expected behavior:**
1. Skill emits "Scanning project artifacts..." then reads all artifacts silently
2. Reports detected phase, GDD count, ADR count, story count
3. Phase 2 audit: all 3 GDDs have all 8 sections, Status field present and valid
4. ADR audit: all required sections present
5. Infrastructure audit: all critical files exist
6. Phase 3: zero BLOCKING, zero HIGH, zero MEDIUM, zero LOW gaps
7. Summary reports: "No blocking gaps — this project is template-compatible"
8. Uses `AskUserQuestion` to ask about writing the plan; user selects write
9. Adoption plan is written to `docs/adoption-plan-[date].md`
10. Phase 7 offers next action: no blocking gaps, offers options for next steps
**Assertions:**
- [ ] Skill reads silently before presenting any results
- [ ] "Scanning project artifacts..." appears before the silent read phase
- [ ] Gap counts show 0 BLOCKING, 0 HIGH, and 0 MEDIUM (only LOW gaps, if any, may remain)
- [ ] `AskUserQuestion` is used before writing the adoption plan
- [ ] Adoption plan file is written to `docs/adoption-plan-[date].md`
- [ ] Phase 7 offers a specific next action (not just a list)
---
### Case 2: Non-Compliant Documents — GDDs missing sections, NEEDS MIGRATION
**Fixture:**
- `design/gdd/` contains 2 GDD files:
- `combat.md` — missing `## Acceptance Criteria` and `## Formulas` sections
- `movement.md` — all 8 sections present
- One ADR (`adr-0001.md`) is missing `## Status` section
- `docs/architecture/tr-registry.yaml` does not exist
**Input:** `/adopt`
**Expected behavior:**
1. Skill scans all artifacts
2. Phase 2 audit finds:
- `combat.md`: 2 missing sections (Acceptance Criteria, Formulas)
- `adr-0001.md`: missing `## Status` — BLOCKING impact
- `tr-registry.yaml`: missing — HIGH impact
3. Phase 3 classifies:
- BLOCKING: `adr-0001.md` missing `## Status` (story-readiness silently passes)
- HIGH: `tr-registry.yaml` missing; `combat.md` missing Acceptance Criteria (can't generate stories)
- MEDIUM: `combat.md` missing Formulas
4. Phase 4 builds ordered migration plan:
- Step 1 (BLOCKING): Add `## Status` to `adr-0001.md` — command: `/architecture-decision retrofit`
- Step 2 (HIGH): Run `/architecture-review` to bootstrap tr-registry.yaml
- Step 3 (HIGH): Add Acceptance Criteria to `combat.md` — command: `/design-system retrofit`
- Step 4 (MEDIUM): Add Formulas to `combat.md`
5. Gap Preview shows BLOCKING items as bullets (actual file names), HIGH/MEDIUM as counts
6. `AskUserQuestion` asks to write the plan; writes after approval
7. Phase 7 offers to fix the highest-priority gap (ADR Status) immediately
**Assertions:**
- [ ] BLOCKING gaps are listed as explicit file-name bullets in the Gap Preview
- [ ] HIGH and MEDIUM shown as counts in Gap Preview
- [ ] Migration plan items are in BLOCKING-first order
- [ ] Each plan item includes the fix command or manual steps
- [ ] `AskUserQuestion` is used before writing
- [ ] Phase 7 offers to immediately retrofit the first BLOCKING item
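
The ordering and Gap Preview rules in Case 2 can be expressed as a sort over a severity rank plus a small summary. This is a sketch under assumptions: the gap dictionaries with `file` and `severity` keys, and the names `order_plan` and `gap_preview`, are hypothetical and not taken from the skill.

```python
# Rank used to order the migration plan: BLOCKING first, LOW last.
SEVERITY_ORDER = {"BLOCKING": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def order_plan(gaps: list[dict]) -> list[dict]:
    """Order gaps by severity; sorted() is stable, so ties keep their input order."""
    return sorted(gaps, key=lambda gap: SEVERITY_ORDER[gap["severity"]])


def gap_preview(gaps: list[dict]) -> dict:
    """BLOCKING gaps are listed by file name; other tiers are shown as counts."""
    blocking = [gap["file"] for gap in gaps if gap["severity"] == "BLOCKING"]
    counts = {
        tier: sum(1 for gap in gaps if gap["severity"] == tier)
        for tier in ("HIGH", "MEDIUM", "LOW")
    }
    return {"blocking": blocking, "counts": counts}
```

With the Case 2 fixture, the preview would list `adr-0001.md` under BLOCKING and report two HIGH gaps and one MEDIUM gap.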
---
### Case 3: Mixed State — Some docs compliant, some not, partial report
**Fixture:**
- 4 GDD files: 2 fully compliant, 2 with gaps (one missing Tuning Knobs, one missing Edge Cases)
- ADRs: 3 files — 2 compliant, 1 missing `## ADR Dependencies`
- Stories: 5 files — 3 have TR-ID references, 2 do not
- Infrastructure: all critical files present; `technical-preferences.md` fully configured
**Input:** `/adopt`
**Expected behavior:**
1. Skill audits all artifact types
2. Audit summary shows totals: "4 GDDs (2 fully compliant, 2 with gaps); 3 ADRs
(2 fully compliant, 1 with gaps); 5 stories (3 with TR-IDs, 2 without)"
3. Gap classification:
- No BLOCKING gaps
- HIGH: 1 ADR missing `## ADR Dependencies`
- MEDIUM: 2 GDDs with missing sections; 2 stories missing TR-IDs
- LOW: none
4. Migration plan lists HIGH gap first, then MEDIUM gaps in order
5. Note included: "Existing stories continue to work — do not regenerate stories
that are in progress or done"
6. `AskUserQuestion` to write plan; writes after approval
**Assertions:**
- [ ] Per-artifact compliance tallies are shown (N compliant, M with gaps)
- [ ] Existing story compatibility note is included in the plan
- [ ] When there are no BLOCKING gaps, the migration plan contains no BLOCKING section
- [ ] HIGH gap precedes MEDIUM gaps in plan ordering
- [ ] `AskUserQuestion` is used before writing
---
### Case 4: No Artifacts Found — Fresh project, guidance to run /start
**Fixture:**
- Repository has no files in `design/gdd/`, `docs/architecture/`, `production/epics/`
- `production/stage.txt` does not exist
- `src/` directory does not exist or has fewer than 10 files
- No game-concept.md, no systems-index.md
**Input:** `/adopt`
**Expected behavior:**
1. Phase 1 existence check finds no artifacts
2. Skill infers "Fresh" — no brownfield work to migrate
3. Uses `AskUserQuestion`:
- "This looks like a fresh project — no existing artifacts found. `/adopt` is for
projects with work to migrate. What would you like to do?"
- Options: "Run `/start`", "My artifacts are in a non-standard location", "Cancel"
4. Skill stops — does not proceed to audit regardless of user selection
**Assertions:**
- [ ] `AskUserQuestion` is used (not a plain text message) when no artifacts are found
- [ ] `/start` is presented as a named option
- [ ] Skill stops after the question — no audit phases run
- [ ] No adoption plan file is written
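
The fixture conditions for a "Fresh" project suggest a simple existence check. The sketch below is one possible reading of those conditions, not the skill's actual Phase 1 logic: `looks_fresh` is a hypothetical name, and treating "fewer than 10 files in `src/`" as a recursive file count is an assumption.

```python
from pathlib import Path

# Directories that hold brownfield artifacts, taken from the Case 4 fixture.
ARTIFACT_DIRS = ["design/gdd", "docs/architecture", "production/epics"]


def looks_fresh(root: Path) -> bool:
    """Return True when no migratable artifacts are found (hypothetical heuristic)."""
    for rel in ARTIFACT_DIRS:
        directory = root / rel
        if directory.is_dir() and any(p.is_file() for p in directory.rglob("*")):
            return False
    if (root / "production" / "stage.txt").exists():
        return False
    src = root / "src"
    # Assumption: count files recursively; a missing src/ counts as zero files.
    src_files = sum(1 for p in src.rglob("*") if p.is_file()) if src.is_dir() else 0
    return src_files < 10
```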
---
### Case 5: Director Gate Check — No gate; adopt is a utility audit skill
**Fixture:**
- Project with a mix of compliant and non-compliant GDDs
**Input:** `/adopt`
**Expected behavior:**
1. Skill completes full audit and produces migration plan
2. No director agents are spawned at any point
3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in output
4. No `/gate-check` is invoked during the skill run
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Skill reaches plan-writing or cancellation without any gate verdict
---
## Protocol Compliance
- [ ] Emits "Scanning project artifacts..." before silent read phase
- [ ] Reads all artifacts silently before presenting any results
- [ ] Shows Adoption Audit Summary and Gap Preview before asking to write
- [ ] Uses `AskUserQuestion` before writing the adoption plan file
- [ ] Adoption plan written to `docs/adoption-plan-[date].md` — not to any other path
- [ ] Migration plan items ordered: BLOCKING first, HIGH second, MEDIUM third, LOW last
- [ ] Phase 7 always offers a single specific next action (not a generic list)
- [ ] Never regenerates existing artifacts — only fills gaps in what exists
- [ ] Does not invoke director gates at any point
---
## Coverage Notes
- The `gdds`, `adrs`, `stories`, and `infra` argument modes narrow the audit scope;
each follows the same pattern as the full audit but limited to that artifact type.
Not separately fixture-tested here.
- The systems-index.md parenthetical status value check (BLOCKING) is a special case
that triggers an immediate fix offer before writing the plan; not separately tested.
- The review-mode.txt prompt (Phase 6b) runs after plan writing if `production/review-mode.txt`
does not exist; not separately tested here.

@@ -0,0 +1,179 @@
# Skill Test Spec: /asset-spec
## Skill Summary
`/asset-spec` generates per-asset visual specification documents from design
requirements. It reads the relevant GDD, art bible, and design system to produce
a structured asset spec sheet that defines: dimensions, animation states (if
applicable), color palette reference, style notes, technical constraints
(format, file size budget), and deliverable checklist.
Spec sheets are written to `assets/specs/[asset-name]-spec.md` after a "May I write"
ask. If a spec already exists, the skill offers to update it. When multiple assets
are requested in a single invocation, a "May I write" ask is made per asset. No
director gates apply. The verdict is COMPLETE when all requested specs are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language (per asset)
- [ ] Has a next-step handoff (e.g., assign to an artist, or `/asset-audit` later)
---
## Director Gate Checks
None. `/asset-spec` is a design documentation utility. Technical artists may
review specs separately but this is not a gate within this skill.
---
## Test Cases
### Case 1: Happy Path — Enemy sprite spec with full GDD and art bible
**Fixture:**
- `design/gdd/enemies.md` exists with enemy variants defined
- `design/art-bible.md` exists with color palette and style notes
- No existing asset spec for "goblin-enemy"
**Input:** `/asset-spec goblin-enemy`
**Expected behavior:**
1. Skill reads enemies GDD and art bible
2. Skill generates a spec for the goblin enemy sprite:
- Dimensions: taken explicitly from the GDD, or inferred from engine defaults when the GDD does not specify them
- Animation states: idle, walk, attack, hurt, death
- Color palette reference: links to art-bible palette section
- Style notes: from art bible character design rules
- Technical constraints: format (PNG), size budget
- Deliverable checklist
3. Skill asks "May I write to `assets/specs/goblin-enemy-spec.md`?"
4. File written on approval; verdict is COMPLETE
**Assertions:**
- [ ] All 6 spec components are present (dimensions, animations, palette, style, tech, checklist)
- [ ] Color palette reference links to art bible (not duplicated)
- [ ] Animation states are drawn from GDD (not invented)
- [ ] "May I write" is asked with the correct path
- [ ] Verdict is COMPLETE
---
### Case 2: No Art Bible Found — Spec with Placeholder Style Notes, Dependency Flagged
**Fixture:**
- `design/gdd/player.md` exists
- `design/art-bible.md` does NOT exist
**Input:** `/asset-spec player-sprite`
**Expected behavior:**
1. Skill reads player GDD but cannot find the art bible
2. Skill generates spec with placeholder style notes: "DEPENDENCY GAP: art bible
not found — style notes are placeholders"
3. Color palette section uses: "TBD — see art bible when created"
4. Skill asks "May I write to `assets/specs/player-sprite-spec.md`?"
5. File written with placeholders and dependency flag; verdict is COMPLETE with advisory
**Assertions:**
- [ ] DEPENDENCY GAP is flagged for the missing art bible
- [ ] Spec is still generated (not blocked)
- [ ] Style notes contain placeholder markers, not invented styles
- [ ] Verdict is COMPLETE with advisory note
---
### Case 3: Asset Spec Already Exists — Offers to Update
**Fixture:**
- `assets/specs/goblin-enemy-spec.md` already exists
- GDD has been updated since the spec was written (new attack animation added)
**Input:** `/asset-spec goblin-enemy`
**Expected behavior:**
1. Skill detects existing spec file
2. Skill reports: "Asset spec already exists for goblin-enemy — checking for updates"
3. Skill diffs GDD against existing spec and identifies: new "charge-attack" animation
state added in GDD but not in spec
4. Skill presents the diff: "1 new animation state found — offering to update spec"
5. Skill asks "May I update `assets/specs/goblin-enemy-spec.md`?" (not overwrite)
6. Spec is updated; verdict is COMPLETE
**Assertions:**
- [ ] Existing spec is detected and "update" path is offered
- [ ] Diff between GDD and existing spec is shown
- [ ] "May I update" language is used (not "May I write")
- [ ] Existing spec content is preserved; only the diff is applied
- [ ] Verdict is COMPLETE
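
The diff step in Case 3 reduces to finding states that the GDD lists and the existing spec does not. A minimal sketch, assuming both documents have already been parsed into lists of state names (the parsing itself and the function name `new_animation_states` are hypothetical):

```python
def new_animation_states(gdd_states: list[str], spec_states: list[str]) -> list[str]:
    """Return states present in the GDD but missing from the spec, in GDD order."""
    known = set(spec_states)
    return [state for state in gdd_states if state not in known]
```

Only the returned states would be added to the spec; everything already in the file is left untouched, which is what the "preserved" assertion above requires.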
---
### Case 4: Multiple Assets Requested — May-I-Write Per Asset
**Fixture:**
- GDD and art bible exist
- User requests specs for 3 assets: goblin-enemy, orc-enemy, treasure-chest
**Input:** `/asset-spec goblin-enemy orc-enemy treasure-chest`
**Expected behavior:**
1. Skill generates all 3 specs in sequence
2. For each asset, skill shows the draft and asks "May I write to
`assets/specs/[name]-spec.md`?" individually
3. User can approve all 3 or skip individual assets
4. All approved specs are written; verdict is COMPLETE
**Assertions:**
- [ ] "May I write" is asked 3 times (once per asset), not once for all
- [ ] User can decline one asset without blocking the others
- [ ] A spec file is written for each approved asset (all 3 when none are declined)
- [ ] Verdict is COMPLETE when all approved specs are written
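
The per-asset confirmation in Case 4 can be pictured as one prompt per path, with declined assets simply skipped. This is an illustrative sketch only: `confirm_writes` and the `approve` callback are hypothetical, and no file is actually written here.

```python
from typing import Callable


def spec_path(asset_name: str) -> str:
    """Build the spec path using the pattern given in the Skill Summary."""
    return f"assets/specs/{asset_name}-spec.md"


def confirm_writes(assets: list[str], approve: Callable[[str], bool]) -> list[str]:
    """Ask once per asset and return the paths the user approved (hypothetical helper)."""
    approved_paths = []
    for name in assets:
        path = spec_path(name)
        # One question per asset; declining one does not block the others.
        if approve(f"May I write to `{path}`?"):
            approved_paths.append(path)
    return approved_paths
```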
---
### Case 5: Director Gate Check — No gate; asset-spec is a design utility
**Fixture:**
- GDD and art bible exist
**Input:** `/asset-spec goblin-enemy`
**Expected behavior:**
1. Skill generates and writes the asset spec
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads GDD, art bible, and design system before generating spec
- [ ] Includes all 6 spec components (dimensions, animations, palette, style, tech, checklist)
- [ ] Flags missing dependencies (art bible, GDD) with DEPENDENCY GAP notes
- [ ] Asks "May I write" (or "May I update") per asset
- [ ] Handles multiple assets with individual write confirmations
- [ ] Verdict is COMPLETE when all approved specs are written
---
## Coverage Notes
- Audio asset specs (sound effects, music) follow the same structure with
different fields (duration, sample rate, looping) and are not separately tested.
- UI asset specs (icons, button states) follow the same flow with interaction
state requirements aligned to the UX spec.
- The case where GDD is also missing (neither GDD nor art bible exists) is not
separately tested; spec would be generated with both dependency gaps flagged.

@@ -0,0 +1,189 @@
# Skill Test Spec: /brainstorm
## Skill Summary
`/brainstorm` facilitates guided game concept ideation. It presents 2-4 concept
options with pros/cons, lets the user choose and refine a concept, and produces
a structured `design/gdd/game-concept.md` document. The skill is collaborative —
it asks questions before proposing options and iterates until the user approves
a concept direction.
In `full` review mode, four director gates spawn in parallel after the concept
is drafted: CD-PILLARS (creative-director), AD-CONCEPT-VISUAL (art-director),
TD-FEASIBILITY (technical-director), and PR-SCOPE (producer). In `lean` mode,
all 4 inline gates are skipped (lean mode only runs PHASE-GATEs, and brainstorm
has none). In `solo` mode, all gates are skipped. The skill asks "May I write"
before writing `design/gdd/game-concept.md`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, REJECTED, CONCERNS
- [ ] Contains "May I write" collaborative protocol language (for game-concept.md)
- [ ] Has a next-step handoff at the end (`/map-systems`)
- [ ] Documents 4 director gates in full mode: CD-PILLARS, AD-CONCEPT-VISUAL, TD-FEASIBILITY, PR-SCOPE
- [ ] Documents that all 4 gates are skipped in lean and solo modes
---
## Director Gate Checks
In `full` mode: CD-PILLARS, AD-CONCEPT-VISUAL, TD-FEASIBILITY, and PR-SCOPE
spawn in parallel after the concept draft is approved by the user.
In `lean` mode: all 4 inline gates are skipped (brainstorm has no PHASE-GATEs,
so lean mode skips everything). Output notes all 4 as: "[GATE-ID] skipped — lean mode".
In `solo` mode: all 4 gates are skipped. Output notes all 4 as: "[GATE-ID] skipped — solo mode".
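
The three review modes map onto either a list of gates to spawn or a list of skip notes. The sketch below shows that mapping under assumptions: `gate_plan` is a hypothetical name, the mode string is assumed to come from `review-mode.txt` (possibly with a trailing newline), and raising on an unknown mode is an invented choice that this spec does not define.

```python
GATES = ["CD-PILLARS", "AD-CONCEPT-VISUAL", "TD-FEASIBILITY", "PR-SCOPE"]


def gate_plan(review_mode: str) -> dict:
    """Decide which gates to spawn and which skip notes to print (hypothetical helper)."""
    mode = review_mode.strip()
    if mode == "full":
        return {"spawn": list(GATES), "notes": []}
    if mode in ("lean", "solo"):
        # \u2014 is the dash used in the skip-note wording quoted above.
        notes = [f"{gate} skipped \u2014 {mode} mode" for gate in GATES]
        return {"spawn": [], "notes": notes}
    # Assumption: an unrecognized mode is an error; the spec does not say.
    raise ValueError(f"unknown review mode: {mode!r}")
```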
---
## Test Cases
### Case 1: Happy Path — Full mode, 3 concepts, user picks one, all 4 directors approve
**Fixture:**
- No existing `design/gdd/game-concept.md`
- `production/session-state/review-mode.txt` contains `full`
**Input:** `/brainstorm`
**Expected behavior:**
1. Skill asks the user questions about genre, scope, and target feeling
2. Skill presents 3 concept options with pros/cons each
3. User selects one concept
4. Skill elaborates the chosen concept into a structured draft
5. All 4 director gates spawn in parallel: CD-PILLARS, AD-CONCEPT-VISUAL, TD-FEASIBILITY, PR-SCOPE
6. All 4 return APPROVED
7. Skill asks "May I write `design/gdd/game-concept.md`?"
8. Concept written after approval
**Assertions:**
- [ ] Exactly 3 concept options are presented (within the skill's 2-4 range)
- [ ] All 4 director gates spawn in parallel (not sequentially)
- [ ] All 4 gates complete before the "May I write" ask
- [ ] "May I write `design/gdd/game-concept.md`?" is asked before writing
- [ ] Concept file is NOT written without user approval
- [ ] Next-step handoff to `/map-systems` is present
---
### Case 2: Failure Path — CD-PILLARS returns REJECT
**Fixture:**
- Concept draft is complete
- `production/session-state/review-mode.txt` contains `full`
- CD-PILLARS gate returns REJECT: "The concept has no identifiable creative pillar"
**Input:** `/brainstorm`
**Expected behavior:**
1. CD-PILLARS gate returns REJECT with specific feedback
2. Skill surfaces the rejection to the user
3. Concept is NOT written to file
4. User is asked: rethink the concept direction, or override the rejection
5. If rethinking: skill returns to the concept options phase
**Assertions:**
- [ ] Concept is NOT written when CD-PILLARS returns REJECT
- [ ] Rejection feedback is shown to the user verbatim
- [ ] User is given the option to rethink or override
- [ ] Skill returns to concept ideation phase if user chooses to rethink
---
### Case 3: Lean Mode — All 4 gates skipped; concept written after user confirms
**Fixture:**
- No existing game concept
- `production/session-state/review-mode.txt` contains `lean`
**Input:** `/brainstorm`
**Expected behavior:**
1. Concept options are presented and user selects one
2. Concept is elaborated into a structured draft
3. All 4 director gates are skipped — each noted: "[GATE-ID] skipped — lean mode"
4. Skill asks user to confirm the concept is ready to write
5. "May I write `design/gdd/game-concept.md`?" asked after confirmation
6. Concept written after approval
**Assertions:**
- [ ] All 4 gate skip notes appear: "CD-PILLARS skipped — lean mode", "AD-CONCEPT-VISUAL skipped — lean mode", "TD-FEASIBILITY skipped — lean mode", "PR-SCOPE skipped — lean mode"
- [ ] Concept is written after user confirmation only (no director approval needed in lean)
- [ ] "May I write" is still asked before writing
---
### Case 4: Solo Mode — All gates skipped; concept written with only user approval
**Fixture:**
- No existing game concept
- `production/session-state/review-mode.txt` contains `solo`
**Input:** `/brainstorm`
**Expected behavior:**
1. Concept options are presented and user selects one
2. Concept draft is shown to user
3. All 4 director gates are skipped — each noted with "solo mode"
4. "May I write `design/gdd/game-concept.md`?" asked
5. Concept written after user approval
**Assertions:**
- [ ] All 4 skip notes appear with "solo mode" label
- [ ] No director agents are spawned
- [ ] Concept is written with only user approval
- [ ] Behavior is otherwise equivalent to lean mode for this skill
---
### Case 5: Director Gate — PR-SCOPE returns CONCERNS (scope too large)
**Fixture:**
- Concept draft is complete
- `production/session-state/review-mode.txt` contains `full`
- PR-SCOPE gate returns CONCERNS: "The concept scope would require 18+ months for a solo developer"
**Input:** `/brainstorm`
**Expected behavior:**
1. PR-SCOPE gate returns CONCERNS with specific scope feedback
2. Skill surfaces the scope concerns to the user
3. Scope concerns are documented in the concept draft before writing
4. User is asked: reduce scope, accept concerns and document them, or rethink
5. If concerns are accepted: concept is written with a "Scope Risk" note embedded
**Assertions:**
- [ ] PR-SCOPE concerns are shown to the user before the "May I write" ask
- [ ] Skill does NOT write concept without surfacing scope concerns
- [ ] If user accepts: scope concerns are documented in the concept file
- [ ] Skill does NOT auto-reject a concept due to PR-SCOPE CONCERNS (user decides)
---
## Protocol Compliance
- [ ] Presents 2-4 concept options with pros/cons before user commits
- [ ] User confirms concept direction before director gates are invoked
- [ ] All 4 director gates spawn in parallel in full mode
- [ ] All 4 gates skipped in lean AND solo mode — each noted by name
- [ ] "May I write `design/gdd/game-concept.md`?" asked before writing
- [ ] Ends with next-step handoff: `/map-systems`
---
## Coverage Notes
- AD-CONCEPT-VISUAL gate (art director feasibility) is grouped with the other
3 gates in the parallel spawn — not independently fixture-tested.
- The iterative concept refinement loop (user rejects all options, skill
generates new ones) is not fixture-tested — it follows the same pattern as
the option selection phase.
- The game-concept.md document structure (required sections) is defined in the
skill body and not re-enumerated in test assertions.


@@ -0,0 +1,174 @@
# Skill Test Spec: /bug-report
## Skill Summary
`/bug-report` creates a structured bug report document from a user description.
It produces a report with the following required fields: Title, Repro Steps,
Expected Behavior, Actual Behavior, Severity (CRITICAL/HIGH/MEDIUM/LOW), Affected
System(s), and Build/Version. If the user's initial description is missing any
required field, the skill asks follow-up questions to fill the gaps before
producing the draft.
The skill checks for possible duplicate reports (by comparing against existing files
in `production/bugs/`) and offers to link rather than create a new report. Each
report is written to `production/bugs/bug-[date]-[slug].md` after a "May I write"
ask. No director gates are used — bug reporting is an operational utility.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before writing the report
- [ ] Has a next-step handoff (e.g., `/bug-triage` to reprioritize, `/hotfix` for critical)
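
The structural assertions above could be checked roughly as follows. This is a sketch under stated assumptions, not how `/skill-test static` is actually implemented: it assumes the skill file opens with a `---`-delimited frontmatter block and that a "phase heading" is any Markdown heading containing the word "Phase". `static_check` is a hypothetical helper name.

```python
import re

FRONTMATTER_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]


def static_check(skill_text: str, verdicts: list[str]) -> dict[str, bool]:
    """Run the structural assertions against the raw text of a skill file."""
    # Assumption: frontmatter is a leading block delimited by '---' lines.
    match = re.match(r"\A---\n(.*?)\n---\n", skill_text, flags=re.DOTALL)
    frontmatter = match.group(1) if match else ""
    keys = {line.split(":", 1)[0].strip() for line in frontmatter.splitlines() if ":" in line}
    body = skill_text[match.end():] if match else skill_text
    # Assumption: a phase heading is any Markdown heading containing "Phase".
    phase_headings = re.findall(r"^#{1,6} .*\bPhase\b.*$", body, flags=re.MULTILINE)
    return {
        "frontmatter": all(field in keys for field in FRONTMATTER_FIELDS),
        "phases": len(phase_headings) >= 2,
        "verdicts": all(verdict in body for verdict in verdicts),
        "may_i_write": "May I write" in body,
    }
```

For `/bug-report`, all four results should be true when called with `verdicts=["COMPLETE"]`. For a read-only skill such as `/bug-triage`, `may_i_write` is expected to be false.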
---
## Director Gate Checks
None. `/bug-report` is an operational documentation skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — User describes a crash, full report produced
**Fixture:**
- `production/bugs/` directory exists and is empty
- No similar existing reports
**Input:** `/bug-report` (user describes: "Game crashes when player enters the boss arena")
**Expected behavior:**
1. Skill extracts: Title = "Game crashes when entering boss arena"
2. Skill recognizes crash reports as CRITICAL severity
3. Skill confirms repro steps, expected (no crash), actual (crash), affected system
(arena/boss), and build version with the user
4. Skill drafts the full structured report
5. Skill asks "May I write to `production/bugs/bug-2026-04-06-game-crashes-boss-arena.md`?"
6. File is written on approval; verdict is COMPLETE
**Assertions:**
- [ ] All 7 required fields are present in the report
- [ ] Severity is CRITICAL for a crash report
- [ ] Filename follows the `bug-[date]-[slug].md` convention
- [ ] "May I write" is asked with the full file path
- [ ] Verdict is COMPLETE
---
### Case 2: Minimal Input — Skill asks follow-up questions for missing fields
**Fixture:**
- User provides: "Sometimes the audio cuts out"
- No existing reports
**Input:** `/bug-report`
**Expected behavior:**
1. Skill identifies missing required fields: repro steps, expected vs. actual,
severity, affected system, build
2. Skill asks targeted follow-up questions for each missing field (one at a time
or in a structured prompt)
3. User provides answers
4. Skill compiles complete report from answers
5. Skill asks "May I write?" and writes on approval
**Assertions:**
- [ ] At least 3 follow-up questions are asked to fill missing fields
- [ ] Each required field is filled before the report is finalized
- [ ] Report is not written until all required fields are present
- [ ] Verdict is COMPLETE after all fields are filled and file is written
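
The gap detection in this case can be illustrated with a short sketch. The field names come from the Skill Summary; representing the draft as a dictionary and the helper name `missing_fields` are assumptions made for illustration.

```python
# The 7 required fields listed in the Skill Summary.
BUG_REPORT_FIELDS = [
    "Title", "Repro Steps", "Expected Behavior", "Actual Behavior",
    "Severity", "Affected System(s)", "Build/Version",
]


def missing_fields(report: dict[str, str]) -> list[str]:
    """Return required fields that are absent or blank, in spec order."""
    return [field for field in BUG_REPORT_FIELDS if not report.get(field, "").strip()]
```

With only a title extracted from "Sometimes the audio cuts out", the helper returns the six remaining fields, each of which would need a follow-up question. A Build/Version of "unknown" counts as filled, matching the Coverage Notes.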
---
### Case 3: Possible Duplicate — Offers to link rather than create new
**Fixture:**
- `production/bugs/bug-2026-03-20-audio-cut-out.md` already exists with
similar title and MEDIUM severity
**Input:** `/bug-report` (user describes: "Audio randomly stops working")
**Expected behavior:**
1. Skill scans existing reports and finds the similar audio bug
2. Skill reports: "A similar bug report exists: bug-2026-03-20-audio-cut-out.md"
3. Skill presents options: link as duplicate (add note to existing), create new anyway
4. If user chooses link: skill adds a cross-reference note to the existing file
(asks "May I update the existing report?")
5. If user chooses create new: normal report creation proceeds
**Assertions:**
- [ ] Existing similar report is surfaced before creating a new one
- [ ] User is given the choice (not forced to link or create)
- [ ] If linking: "May I update" is asked before modifying the existing file
- [ ] Verdict is COMPLETE in either path
---
### Case 4: Multi-System Bug — Report created with multiple system tags
**Fixture:**
- No existing reports
**Input:** `/bug-report` (user describes: "After finishing a level, the save system
freezes and the UI doesn't show the completion screen")
**Expected behavior:**
1. Skill identifies 2 affected systems from the description: Save System and UI
2. Report is drafted with both systems listed under Affected System(s)
3. Severity is assessed (likely HIGH — data loss risk from save freeze)
4. Skill asks "May I write" with the appropriate filename
5. Report is written with both systems tagged; verdict is COMPLETE
**Assertions:**
- [ ] Both affected systems are listed in the report
- [ ] Single report is created (not one per system)
- [ ] Severity reflects the most impactful component (save freeze → HIGH or CRITICAL)
- [ ] Verdict is COMPLETE
---
### Case 5: Director Gate Check — No gate; bug reporting is operational
**Fixture:**
- Any bug description provided
**Input:** `/bug-report`
**Expected behavior:**
1. Skill creates and writes the bug report
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Skill reaches COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Collects all 7 required fields before drafting the report
- [ ] Asks follow-up questions for any missing required fields
- [ ] Checks for similar existing reports before creating a new one
- [ ] Asks "May I write to `production/bugs/bug-[date]-[slug].md`?" before writing
- [ ] Verdict is COMPLETE when the report file is written
---
## Coverage Notes
- The case where the user provides a severity that seems too low for the
described impact (e.g., LOW for a crash) is not tested; the skill may suggest
a higher severity but ultimately respects user input.
- Build/version field is required but may be "unknown" if the user doesn't know —
this is accepted as a valid value and not tested separately.
- Report slug generation (sanitizing the title into a filename) is an
implementation detail not assertion-tested here.
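
For reference, one way the `bug-[date]-[slug].md` convention might be produced is sketched below. The slug rules here (lowercase, alphanumeric words, at most six words, `untitled` fallback) are assumptions. The skill may also drop filler words, as the Case 1 filename suggests, which this sketch does not do.

```python
import re
from datetime import date


def bug_report_path(title: str, reported: date, max_words: int = 6) -> str:
    """Build a report path following the bug-[date]-[slug].md convention."""
    # Assumption: the slug is the first few lowercase alphanumeric words of the title.
    words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
    slug = "-".join(words) or "untitled"  # fallback name is hypothetical
    return f"production/bugs/bug-{reported.isoformat()}-{slug}.md"
```

Calling `bug_report_path("Audio cut out", date(2026, 3, 20))` yields the filename used in the Case 3 fixture.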


@@ -0,0 +1,174 @@
# Skill Test Spec: /bug-triage
## Skill Summary
`/bug-triage` reads all open bug reports in `production/bugs/` and produces a
prioritized triage table sorted by severity (CRITICAL → HIGH → MEDIUM → LOW).
It runs on the Haiku model (read-only, formatting/sorting task) and writes no
files; the triage output is conversational. The skill flags bugs missing
reproduction steps and identifies possible duplicates by comparing titles and
affected systems.
The verdict is always TRIAGED — the skill is advisory and informational. No
director gates apply. The output is intended to help a producer or QA lead
prioritize which bugs to address next.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: TRIAGED
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff (e.g., `/bug-report` to create new reports, `/hotfix` for critical bugs)
---
## Director Gate Checks
None. `/bug-triage` is a read-only advisory skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — 5 bugs of varying severity, sorted table produced
**Fixture:**
- `production/bugs/` contains 5 bug report files:
  - `bug-2026-03-10-audio-crash.md` (CRITICAL)
  - `bug-2026-03-12-score-overflow.md` (HIGH)
  - `bug-2026-03-14-ui-overlap.md` (MEDIUM)
  - `bug-2026-03-15-typo-tutorial.md` (LOW)
  - `bug-2026-03-16-vfx-flicker.md` (HIGH)
**Input:** `/bug-triage`
**Expected behavior:**
1. Skill reads all 5 bug report files
2. Skill extracts severity, title, system, and repro status from each
3. Skill produces a triage table sorted: CRITICAL first, then HIGH, MEDIUM, LOW
4. Within the same severity, bugs are ordered by date (oldest first)
5. Verdict is TRIAGED
**Assertions:**
- [ ] Triage table has exactly 5 rows
- [ ] CRITICAL bug appears before both HIGH bugs
- [ ] HIGH bugs appear before MEDIUM and LOW bugs
- [ ] Verdict is TRIAGED
- [ ] No files are written
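
The expected ordering can be expressed as a two-part sort key. This is a sketch that assumes each report has already been parsed into a dictionary with `filename` and `severity` keys. It relies on the ISO date in the filename sorting chronologically, and it places unrecognized severities last, as the Coverage Notes describe.

```python
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def triage_sort(bugs: list[dict]) -> list[dict]:
    """Sort by severity, then by filename (ISO dates sort oldest first)."""
    def sort_key(bug: dict) -> tuple[int, str]:
        # Missing or unrecognized severities rank after LOW.
        rank = SEVERITY_ORDER.get(bug.get("severity", ""), len(SEVERITY_ORDER))
        return (rank, bug["filename"])

    return sorted(bugs, key=sort_key)
```

Applied to the five fixture files, the result is the CRITICAL audio crash, then the two HIGH bugs (score overflow before VFX flicker), then MEDIUM, then LOW.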
---
### Case 2: No Bug Reports Found — Guidance to run /bug-report
**Fixture:**
- `production/bugs/` directory exists but is empty (or does not exist)
**Input:** `/bug-triage`
**Expected behavior:**
1. Skill scans `production/bugs/` and finds no reports
2. Skill outputs: "No open bug reports found in production/bugs/"
3. Skill suggests running `/bug-report` to create a bug report
4. No triage table is produced
**Assertions:**
- [ ] Output explicitly states no bugs were found
- [ ] `/bug-report` is suggested as the next step
- [ ] Skill does not error out — it handles empty directory gracefully
- [ ] Verdict is TRIAGED (with "no bugs found" context)
---
### Case 3: Bug Missing Reproduction Steps — Flagged as NEEDS REPRO INFO
**Fixture:**
- `production/bugs/` contains 3 bug reports; one has an empty "Repro Steps" section
**Input:** `/bug-triage`
**Expected behavior:**
1. Skill reads all 3 reports
2. Skill detects the report with no repro steps
3. That bug appears in the triage table with a `NEEDS REPRO INFO` tag
4. Other bugs are triaged normally
5. Verdict is TRIAGED
**Assertions:**
- [ ] `NEEDS REPRO INFO` tag appears next to the bug missing repro steps
- [ ] The flagged bug is still included in the table (not excluded)
- [ ] Other bugs are unaffected
- [ ] Verdict is TRIAGED
---
### Case 4: Possible Duplicate Bugs — Flagged in triage output
**Fixture:**
- `production/bugs/` contains 2 bug reports with similar titles:
  - `bug-2026-03-18-player-fall-through-floor.md`
  - `bug-2026-03-20-player-clips-through-floor.md`
- Both affect the "Physics" system with identical severity
**Input:** `/bug-triage`
**Expected behavior:**
1. Skill reads both reports and detects similar title + same system + same severity
2. Both bugs are included in the triage table
3. Each is tagged with `POSSIBLE DUPLICATE` and cross-references the other report
4. No bugs are merged or deleted — flagging is advisory
5. Verdict is TRIAGED
**Assertions:**
- [ ] Both bugs appear in the table (not merged)
- [ ] Both are tagged `POSSIBLE DUPLICATE`
- [ ] Each cross-references the other (by filename or title)
- [ ] Verdict is TRIAGED
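
A rough version of the duplicate heuristic (title similarity plus same system) is sketched below. The exact matching logic lives in the skill body; word-overlap (Jaccard) similarity on the filename slug and the 0.5 threshold are assumptions chosen for illustration.

```python
from itertools import combinations


def title_tokens(filename: str) -> set[str]:
    """Return the slug words after the 'bug-YYYY-MM-DD-' prefix."""
    stem = filename.removesuffix(".md")
    return set(stem.split("-")[4:])


def possible_duplicates(bugs: list[dict], threshold: float = 0.5) -> list[tuple[str, str]]:
    """Return filename pairs that share a system and have similar titles."""
    pairs = []
    for a, b in combinations(bugs, 2):
        if a["system"] != b["system"]:
            continue
        tokens_a, tokens_b = title_tokens(a["filename"]), title_tokens(b["filename"])
        union = tokens_a | tokens_b
        # Jaccard similarity: shared words divided by all distinct words.
        similarity = len(tokens_a & tokens_b) / len(union) if union else 0.0
        if similarity >= threshold:
            pairs.append((a["filename"], b["filename"]))
    return pairs
```

For the two fixture files, three of five distinct words are shared (similarity 0.6), so the pair is flagged. Each returned pair gives the cross-reference needed for the `POSSIBLE DUPLICATE` tags.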
---
### Case 5: Director Gate Check — No gate; triage is advisory
**Fixture:**
- `production/bugs/` contains any number of reports
**Input:** `/bug-triage`
**Expected behavior:**
1. Skill produces the triage table
2. No director agents are spawned
3. No gate IDs appear in output
4. No write tool is called
**Assertions:**
- [ ] No director gate is invoked
- [ ] No write tool is called
- [ ] No gate skip messages appear
- [ ] Verdict is TRIAGED without any gate check
---
## Protocol Compliance
- [ ] Reads all files in `production/bugs/` before generating the table
- [ ] Sorts by severity (CRITICAL → HIGH → MEDIUM → LOW)
- [ ] Flags bugs missing repro steps
- [ ] Flags possible duplicates by title/system similarity
- [ ] Does not write any files
- [ ] Verdict is TRIAGED in all cases (even empty)
---
## Coverage Notes
- The case where a bug report is malformed (missing severity field entirely)
is not fixture-tested; skill would flag it as `UNKNOWN SEVERITY` and sort it
last in the table.
- Status transitions (marking bugs as resolved) are outside this skill's scope —
bug-triage is read-only.
- The duplicate detection heuristic (title similarity + same system) is
approximate; exact matching logic is defined in the skill body.


@@ -0,0 +1,175 @@
# Skill Test Spec: /day-one-patch
## Skill Summary
`/day-one-patch` prepares a day-one patch plan for issues that are known at
launch but deferred from the v1.0 release. It reads open bug reports in
`production/bugs/`, deferred acceptance criteria from story files (stories
marked `Status: Done` but with noted deferred ACs), and produces a prioritized
patch plan with estimated fix timelines per issue.
The patch plan is written to `production/releases/day-one-patch.md` after a
"May I write" ask. If a P0 (critical post-ship) issue is discovered, the skill
directs the user to run `/hotfix` before the patch. No director gates apply.
The verdict is always COMPLETE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before writing the plan
- [ ] Has a next-step handoff (e.g., `/hotfix` for P0 issues, `/release-checklist` for follow-up)
---
## Director Gate Checks
None. `/day-one-patch` is a release planning utility. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — 3 Known Issues, Patch Plan With Fix Estimates
**Fixture:**
- `production/bugs/` contains 3 open bugs with severities: 1 MEDIUM, 2 LOW
- No deferred ACs in sprint stories
- All bugs have repro steps and system identifications
**Input:** `/day-one-patch`
**Expected behavior:**
1. Skill reads all 3 open bugs
2. Skill assigns fix effort estimates: MEDIUM bug = 1-2 days, LOW bugs = 4 hours each
3. Skill produces a patch plan prioritizing MEDIUM bug first
4. Plan includes: priority order, estimated timeline, responsible system, fix description
5. Skill asks "May I write to `production/releases/day-one-patch.md`?"
6. File written; verdict is COMPLETE
**Assertions:**
- [ ] All 3 bugs appear in the plan
- [ ] Bugs are prioritized by severity (MEDIUM before LOW)
- [ ] Fix estimates are provided per issue
- [ ] "May I write" is asked before writing
- [ ] Verdict is COMPLETE
---
### Case 2: Critical Issue Discovered Post-Ship — P0, Triggers /hotfix Guidance
**Fixture:**
- A CRITICAL severity bug is found in `production/bugs/` after the v1.0 release
- The bug causes data loss for all save files
**Input:** `/day-one-patch`
**Expected behavior:**
1. Skill reads bugs and identifies the CRITICAL severity issue
2. Skill escalates: "P0 ISSUE DETECTED — data loss bug requires immediate hotfix
before patch planning can proceed"
3. Skill does NOT include the P0 issue in the patch plan timeline
4. Skill explicitly directs: "Run `/hotfix` to resolve this issue first"
5. After P0 guidance is issued: plan for remaining lower-severity bugs is still
generated and written; verdict is COMPLETE
**Assertions:**
- [ ] P0 escalation message appears prominently before the patch plan
- [ ] `/hotfix` is explicitly directed for the P0 issue
- [ ] P0 issue is NOT scheduled in the patch plan timeline (it needs immediate action)
- [ ] Non-P0 issues are still planned; verdict is COMPLETE
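
The split between escalated P0 issues and planned work might look like the sketch below. It assumes bugs are dictionaries with `title` and `severity` keys. The MEDIUM and LOW estimates are the ones from Case 1, the HIGH entry is a placeholder assumption, and the escalation text paraphrases the expected message rather than quoting it.

```python
SEVERITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
# MEDIUM and LOW values come from Case 1; the HIGH value is a placeholder assumption.
ESTIMATES = {"HIGH": "unestimated", "MEDIUM": "1-2 days", "LOW": "4 hours"}


def build_patch_plan(bugs: list[dict]) -> tuple[list[str], list[dict]]:
    """Return (P0 escalation messages, prioritized plan of non-P0 bugs)."""
    p0 = [bug for bug in bugs if bug["severity"] == "CRITICAL"]
    rest = sorted(
        (bug for bug in bugs if bug["severity"] != "CRITICAL"),
        key=lambda bug: SEVERITY_RANK.get(bug["severity"], len(SEVERITY_RANK)),
    )
    escalations = [
        f"P0 ISSUE DETECTED: {bug['title']}. Run /hotfix to resolve this issue first."
        for bug in p0
    ]
    plan = [
        {
            "title": bug["title"],
            "severity": bug["severity"],
            "estimate": ESTIMATES.get(bug["severity"], "unestimated"),
        }
        for bug in rest
    ]
    return escalations, plan
```

CRITICAL bugs never appear in the returned plan, which mirrors the assertion that P0 issues are not scheduled in the patch timeline.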
---
### Case 3: Deferred AC From Story-Done — Pulled Into Patch Plan Automatically
**Fixture:**
- `production/sprints/sprint-008.md` has a story with `Status: Done` and a note:
"DEFERRED AC: Gamepad vibration on damage — deferred to post-launch patch"
- No open bugs for the same system
**Input:** `/day-one-patch`
**Expected behavior:**
1. Skill reads sprint stories and detects the deferred AC note
2. Deferred AC is automatically included in the patch plan as a work item
3. Plan entry: "Deferred from sprint-008: Gamepad vibration on damage"
4. Fix estimate is assigned; patch plan written after "May I write" approval
5. Verdict is COMPLETE
**Assertions:**
- [ ] Deferred ACs from story files are automatically pulled into the plan
- [ ] Deferred items are labeled by their source story (sprint-008)
- [ ] Deferred AC gets a fix estimate like bug entries
- [ ] Verdict is COMPLETE
---
### Case 4: No Known Issues — Empty Plan With Template Note
**Fixture:**
- `production/bugs/` is empty
- No stories have deferred ACs
**Input:** `/day-one-patch`
**Expected behavior:**
1. Skill reads bugs — none found
2. Skill reads story deferred ACs — none found
3. Skill produces an empty patch plan with a note: "No known issues at launch"
4. Template structure is preserved (headers intact) for future use
5. Skill asks "May I write to `production/releases/day-one-patch.md`?"
6. File written; verdict is COMPLETE
**Assertions:**
- [ ] "No known issues at launch" note appears in the written file
- [ ] Template headers are present in the empty plan
- [ ] Skill does NOT error out when there are no issues to plan
- [ ] Verdict is COMPLETE
---
### Case 5: Director Gate Check — No gate; day-one-patch is a planning utility
**Fixture:**
- Known issues present in production/bugs/
**Input:** `/day-one-patch`
**Expected behavior:**
1. Skill generates and writes the patch plan
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads open bugs from `production/bugs/` before generating the plan
- [ ] Scans story files for deferred AC notes
- [ ] Escalates CRITICAL (P0) bugs with explicit `/hotfix` guidance
- [ ] Produces an empty plan with note when no issues exist (not an error)
- [ ] Asks "May I write to `production/releases/day-one-patch.md`?" before writing
- [ ] Verdict is COMPLETE in all paths
---
## Coverage Notes
- The case where multiple CRITICAL bugs exist is handled the same as Case 2;
all P0 issues are escalated together.
- Timeline estimation for the patch (e.g., "patch available in 3 days")
requires manual QA and build time estimates; the skill uses rough estimates
based on severity, not actual team velocity.
- The player-facing patch notes are produced by a separate skill
(`/patch-notes`), invoked after the patch plan is executed.


@@ -0,0 +1,172 @@
# Skill Test Spec: /help
## Skill Summary
`/help` analyzes what has been done and what comes next in the project workflow.
It runs on the Haiku model (read-only, formatting task) and reads `production/stage.txt`,
the active sprint file, and recent session state to produce a concise situational
guidance summary. The skill optionally accepts a context query (e.g., `/help testing`)
to surface relevant skills for a specific topic.
The output is always informational — no files are written and no director gates
are invoked. The verdict is always HELP COMPLETE. The skill serves as a workflow
navigator, suggesting 2-3 next skills based on the current project state.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: HELP COMPLETE
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff (suggests 2-3 relevant skills based on state)
---
## Director Gate Checks
None. `/help` is a read-only navigation skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Production stage with active sprint
**Fixture:**
- `production/stage.txt` contains `Production`
- `production/sprints/sprint-004.md` exists with in-progress stories
- `production/session-state/active.md` has a recent checkpoint
**Input:** `/help`
**Expected behavior:**
1. Skill reads stage.txt and active sprint
2. Skill identifies current sprint number and in-progress story count
3. Skill outputs: current stage, sprint summary, and 3 suggested next skills
(e.g., `/sprint-status`, `/dev-story`, `/story-done`)
4. Suggestions are ranked by relevance to current sprint state
5. Verdict is HELP COMPLETE
**Assertions:**
- [ ] Current stage is shown (Production)
- [ ] Active sprint number and story count are mentioned
- [ ] Only 2-3 next-skill suggestions are given (not a list of all skills)
- [ ] Suggestions are appropriate for Production stage
- [ ] Verdict is HELP COMPLETE
- [ ] No files are written
---
### Case 2: Concept Stage — Shows concept-to-systems-design workflow path
**Fixture:**
- `production/stage.txt` contains `Concept`
- No sprint files, no GDD files
- `technical-preferences.md` is configured (engine selected)
**Input:** `/help`
**Expected behavior:**
1. Skill reads stage.txt — detects Concept stage
2. Skill outputs the Concept-stage workflow: brainstorm → map-systems → design-system
3. Suggested skills are: `/brainstorm`, `/map-systems` (if concept exists)
4. Current progress is noted: "Engine configured, concept not yet created"
**Assertions:**
- [ ] Stage is identified as Concept
- [ ] Workflow path shows the expected sequence for this stage
- [ ] Suggestions do not include Production-stage skills (e.g., `/dev-story`)
- [ ] Verdict is HELP COMPLETE
---
### Case 3: No stage.txt — Shows full workflow overview
**Fixture:**
- No `production/stage.txt`
- No sprint files
- `technical-preferences.md` has placeholders
**Input:** `/help`
**Expected behavior:**
1. Skill cannot determine stage from stage.txt
2. Skill runs project-stage-detect logic to infer stage from artifacts
3. If stage cannot be inferred: outputs the full workflow overview from
Concept through Release as a reference map
4. Primary suggestion is `/start` to begin configuration
**Assertions:**
- [ ] Skill does not crash when stage.txt is absent
- [ ] Full workflow overview is shown when stage cannot be determined
- [ ] `/start` or `/project-stage-detect` is a top suggestion
- [ ] Verdict is HELP COMPLETE
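
The stage-based suggestions in Cases 1 through 3 can be summarized as a lookup with a fallback. This is only a sketch: the table covers just the two stages shown in this spec, and the mapping structure and function name are assumptions, not the skill's internal logic.

```python
from typing import Optional

# Suggestions taken from the expected behavior of Cases 1 and 2.
STAGE_SUGGESTIONS = {
    "Production": ["/sprint-status", "/dev-story", "/story-done"],
    "Concept": ["/brainstorm", "/map-systems"],
}


def suggest_next_skills(stage: Optional[str], limit: int = 3) -> list[str]:
    """Return at most `limit` suggestions, falling back when the stage is unknown."""
    if stage is None or stage not in STAGE_SUGGESTIONS:
        # Case 3: stage.txt is absent or the stage cannot be determined.
        return ["/start", "/project-stage-detect"][:limit]
    return STAGE_SUGGESTIONS[stage][:limit]
```

Passing `None` models the missing `stage.txt` fixture and returns the fallback pair rather than raising an error.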
---
### Case 4: Context Query — User asks for help with testing
**Fixture:**
- `production/stage.txt` contains `Production`
- Active sprint has a story with `Status: In Review`
**Input:** `/help testing`
**Expected behavior:**
1. Skill reads context query: "testing"
2. Skill surfaces skills relevant to testing: `/qa-plan`, `/smoke-check`,
`/regression-suite`, `/test-setup`, `/test-evidence-review`
3. Output is focused on testing workflow, not general sprint navigation
4. Currently in-review story is highlighted as a testing candidate
**Assertions:**
- [ ] Context query is acknowledged in output ("Help topic: testing")
- [ ] At least 3 testing-relevant skills are listed
- [ ] General sprint skills (e.g., `/sprint-plan`) are not the primary suggestions
- [ ] Verdict is HELP COMPLETE
---
### Case 5: Director Gate Check — No gate; help is read-only navigation
**Fixture:**
- Any project state
**Input:** `/help`
**Expected behavior:**
1. Skill produces workflow guidance summary
2. No director agents are spawned
3. No gate IDs appear in output
4. No write tool is called
**Assertions:**
- [ ] No director gate is invoked
- [ ] No write tool is called
- [ ] No gate skip messages appear
- [ ] Verdict is HELP COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads stage, sprint, and session state before generating suggestions
- [ ] Suggestions are specific to the current project state (not generic)
- [ ] Context query (if provided) narrows the suggestion set
- [ ] Does not write any files
- [ ] Verdict is HELP COMPLETE in all cases
---
## Coverage Notes
- The case where the active sprint is complete (all stories Done) is not
separately tested; the skill would suggest `/sprint-plan` for the next sprint.
- The `/help` skill does not validate whether suggested skills are available —
it assumes standard skill catalog availability.
- Stage detection fallback (when stage.txt is absent) delegates to the same
logic as `/project-stage-detect` and is not re-tested here in detail.


@@ -0,0 +1,173 @@
# Skill Test Spec: /hotfix
## Skill Summary
`/hotfix` manages an emergency fix workflow: it creates a hotfix branch from
main, applies a targeted fix to the identified file(s), runs `/smoke-check` to
validate the fix doesn't introduce regressions, and prompts the user to confirm
merge back to main. Each code change requires a "May I write to [filepath]?" ask.
Git operations (branch creation, merge) are presented as Bash commands for user
confirmation before execution.
The skill is time-sensitive — director review is optional post-hoc, not a
blocking gate. Verdicts: HOTFIX COMPLETE (fix applied, smoke check passed, merged)
or HOTFIX BLOCKED (fix introduced regression or user declined).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: HOTFIX COMPLETE, HOTFIX BLOCKED
- [ ] Contains "May I write" language for code changes
- [ ] Has a next-step handoff (e.g., `/bug-report` to document the issue, or version bump)
---
## Director Gate Checks
None. Hotfixes are time-critical. Director review may follow separately as a
post-hoc step. No gate is invoked within this skill.
---
## Test Cases
### Case 1: Happy Path — Critical crash bug fixed, smoke check passes
**Fixture:**
- `main` branch is clean
- Bug is identified in `src/gameplay/arena.gd` (crash on boss arena entry)
- Repro steps are provided by user
**Input:** `/hotfix` (user describes the crash and affected file)
**Expected behavior:**
1. Skill proposes creating a hotfix branch: `hotfix/boss-arena-crash`
2. User confirms; Bash command for branch creation is shown and confirmed
3. Skill identifies the fix location in `arena.gd` and drafts the change
4. Skill asks "May I write to `src/gameplay/arena.gd`?" and applies fix on approval
5. Skill runs `/smoke-check` — PASS
6. Skill presents the merge command and asks user to confirm merge to `main`
7. User confirms; merge executes; verdict is HOTFIX COMPLETE
**Assertions:**
- [ ] Hotfix branch is created before any code changes
- [ ] "May I write" is asked before modifying any source file
- [ ] `/smoke-check` runs after the fix is applied
- [ ] Merge requires explicit user confirmation (not automatic)
- [ ] Verdict is HOTFIX COMPLETE after successful merge
---
### Case 2: Smoke Check Fails — HOTFIX BLOCKED
**Fixture:**
- Fix has been applied to `src/gameplay/arena.gd`
- `/smoke-check` returns FAIL: "Player health clamping regression detected"
**Input:** `/hotfix`
**Expected behavior:**
1. Skill applies the fix and runs `/smoke-check`
2. Smoke check returns FAIL with specific regression identified
3. Skill reports: "HOTFIX BLOCKED — smoke check failed: [regression detail]"
4. Skill presents options: attempt revised fix, revert changes, or merge with
known regression (user acknowledges risk)
5. No automatic merge occurs when smoke check fails
**Assertions:**
- [ ] Verdict is HOTFIX BLOCKED
- [ ] Smoke check failure is shown verbatim to user
- [ ] Merge is NOT performed automatically when smoke check fails
- [ ] User is given explicit options for how to proceed
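
The verdict rule implied by Cases 1 and 2 is small enough to sketch directly. The function name and return shape are assumptions. This sketch treats any smoke check failure as HOTFIX BLOCKED and does not model the verdict after a user chooses to merge with a known regression, which this spec leaves open.

```python
# Options listed in the expected behavior of Case 2.
BLOCKED_OPTIONS = [
    "attempt revised fix",
    "revert changes",
    "merge with known regression (user acknowledges risk)",
]


def hotfix_verdict(smoke_check_passed: bool, merge_confirmed: bool) -> tuple[str, list[str]]:
    """Return the verdict and any follow-up options to present to the user."""
    if not smoke_check_passed:
        # Never merge automatically after a failed smoke check.
        return "HOTFIX BLOCKED", BLOCKED_OPTIONS
    if not merge_confirmed:
        return "HOTFIX BLOCKED", []  # user declined the merge
    return "HOTFIX COMPLETE", []
```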
---
### Case 3: Fix to Already-Released Build — Version tag noted, patch bump prompted
**Fixture:**
- Latest git tag is `v1.2.0`
- Hotfix targets a bug in the v1.2.0 release
**Input:** `/hotfix`
**Expected behavior:**
1. Skill detects that the current HEAD is a tagged release (v1.2.0)
2. Skill notes: "Hotfix targeting tagged release v1.2.0"
3. After smoke check passes, skill prompts: "Should version be bumped to v1.2.1?"
4. If user confirms version bump: skill asks "May I write to VERSION or equivalent?"
5. After version update and merge: verdict is HOTFIX COMPLETE with version noted
**Assertions:**
- [ ] Version tag context is detected and surfaced to user
- [ ] Patch version bump is suggested (not required) after the smoke check passes
- [ ] Version bump requires its own "May I write" confirmation
- [ ] Verdict is HOTFIX COMPLETE
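
The suggested bump from `v1.2.0` to `v1.2.1` follows the usual patch-increment rule. A minimal sketch, assuming tags always have the form `vMAJOR.MINOR.PATCH` (`bump_patch` is a hypothetical helper name):

```python
import re


def bump_patch(tag: str) -> str:
    """Increment the patch component of a vMAJOR.MINOR.PATCH tag."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unsupported tag format: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"v{major}.{minor}.{patch + 1}"
```

Tags with pre-release suffixes or without the `v` prefix are rejected rather than guessed at, so the skill would need to ask the user in those cases.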
---
### Case 4: No Repro Steps — Skill Asks Before Applying Fix
**Fixture:**
- User invokes `/hotfix` with a vague description: "something is broken on level 3"
- No repro steps provided
**Input:** `/hotfix` (vague description)
**Expected behavior:**
1. Skill detects insufficient information to identify the fix location
2. Skill asks: "Please provide reproduction steps and the affected file or system"
3. Skill does NOT create a branch or modify any file until repro steps are provided
4. After user provides repro steps: normal hotfix flow begins
**Assertions:**
- [ ] No branch is created without repro steps
- [ ] No code changes are made without a clearly identified fix location
- [ ] Repro step request is specific (not a generic "please provide more info")
- [ ] Normal hotfix flow resumes after user provides repro steps
---
### Case 5: Director Gate Check — No gate; hotfixes are time-critical
**Fixture:**
- Critical bug with repro steps identified
**Input:** `/hotfix`
**Expected behavior:**
1. Skill completes the hotfix workflow
2. No director agents are spawned during execution
3. No gate IDs appear in output
4. Post-hoc director review (if needed) is a manual follow-up, not invoked here
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is HOTFIX COMPLETE or HOTFIX BLOCKED — no gate verdict
---
## Protocol Compliance
- [ ] Creates hotfix branch before making any code changes
- [ ] Asks "May I write" before modifying any source files
- [ ] Runs `/smoke-check` after applying the fix
- [ ] Requires explicit user confirmation before merging
- [ ] HOTFIX BLOCKED when smoke check fails — no automatic merge
- [ ] Verdict is HOTFIX COMPLETE or HOTFIX BLOCKED
---
## Coverage Notes
- The case where multiple files need to be modified for one fix follows the same
"May I write" per-file pattern and is not separately tested.
- The post-hotfix steps (create bug report, update changelog) are suggested in
the handoff but not tested as part of this skill's execution.
- Conflict resolution during the merge (if main has diverged) is not tested;
the skill would surface the conflict and ask the user to resolve it manually.


@@ -0,0 +1,180 @@
# Skill Test Spec: /launch-checklist
## Skill Summary
`/launch-checklist` generates and evaluates a complete launch readiness checklist
covering: legal compliance (EULA, privacy policy, ESRB/PEGI ratings), platform
certification status, store page completeness (screenshots, description, metadata),
build validation (version tag, reproducible build), analytics and crash reporting
configuration, and first-run experience verification.
The skill produces a checklist report written to `production/launch/launch-checklist-[date].md`
after a "May I write" ask. If a previous launch checklist exists, it compares the
new results against the old to highlight newly resolved and newly blocked items. No
director gates apply — `/team-release` orchestrates the full release pipeline. Verdicts:
LAUNCH READY, LAUNCH BLOCKED, or CONCERNS.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: LAUNCH READY, LAUNCH BLOCKED, CONCERNS
- [ ] Contains "May I write" collaborative protocol language before writing the checklist
- [ ] Has a next-step handoff (e.g., `/team-release` or `/day-one-patch`)
---
## Director Gate Checks
None. `/launch-checklist` is a readiness audit utility. The full release pipeline
is managed by `/team-release`.
---
## Test Cases
### Case 1: Happy Path — All Checklist Items Verified, LAUNCH READY
**Fixture:**
- Legal docs present: EULA, privacy policy in `production/legal/`
- Platform certification: marked as submitted and approved in production notes
- Store page assets: screenshots, description, metadata all present in `production/store/`
- Build: version tag `v1.0.0` exists, reproducible build confirmed
- Crash reporting: configured in `technical-preferences.md`
**Input:** `/launch-checklist`
**Expected behavior:**
1. Skill checks all checklist categories
2. All items pass their verification checks
3. Skill produces checklist report with all items marked PASS
4. Skill asks "May I write to `production/launch/launch-checklist-2026-04-06.md`?"
5. Report written on approval; verdict is LAUNCH READY
**Assertions:**
- [ ] All checklist categories are checked (legal, platform, store, build, analytics, UX)
- [ ] All items appear in the report with PASS markers
- [ ] Verdict is LAUNCH READY
- [ ] "May I write" is asked with the correct dated filename
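
The dated filename in this case can be derived mechanically. The sketch below is only an illustration of that naming rule, assuming an ISO `YYYY-MM-DD` date as in the example path; `launch_report_path` and `write_prompt` are hypothetical helper names, not part of the skill.

```python
from datetime import date


def launch_report_path(today: date) -> str:
    """Build the report path from the run date (ISO format is an assumption)."""
    return f"production/launch/launch-checklist-{today.isoformat()}.md"


def write_prompt(path: str) -> str:
    """Format the approval question the skill is expected to ask before writing."""
    return f"May I write to `{path}`?"


if __name__ == "__main__":
    print(write_prompt(launch_report_path(date(2026, 4, 6))))
```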
---
### Case 2: Platform Certification Not Submitted — LAUNCH BLOCKED
**Fixture:**
- All other checklist items pass
- Platform certification section: "not submitted" (no submission record found)
**Input:** `/launch-checklist`
**Expected behavior:**
1. Skill checks all items
2. Platform certification check fails: no submission record
3. Skill reports: "LAUNCH BLOCKED — Platform certification not submitted"
4. Specific platform(s) missing certification are named
5. Verdict is LAUNCH BLOCKED
**Assertions:**
- [ ] Verdict is LAUNCH BLOCKED (not CONCERNS)
- [ ] Platform certification is identified as the blocking item
- [ ] Missing platform names are specified
- [ ] All other passing items are still shown in the report
---
### Case 3: Manual Check Required — CONCERNS Verdict
**Fixture:**
- All critical checklist items pass
- First-run experience item: "MANUAL CHECK NEEDED — human must play the first 5
minutes and verify tutorial completion flow"
- Store screenshots item: "MANUAL CHECK NEEDED — art team must verify screenshot
quality matches current build"
**Input:** `/launch-checklist`
**Expected behavior:**
1. Skill checks all items
2. 2 items are flagged as requiring human verification
3. Skill reports: "CONCERNS — 2 items require manual verification before launch"
4. Both items are listed with instructions for what to manually verify
5. Verdict is CONCERNS (not LAUNCH BLOCKED, since these are advisory)
**Assertions:**
- [ ] Verdict is CONCERNS (not LAUNCH READY or LAUNCH BLOCKED)
- [ ] Both manual check items are listed with verification instructions
- [ ] Skill does not auto-block on MANUAL CHECK items
---
### Case 4: Previous Checklist Exists — Delta Comparison
**Fixture:**
- `production/launch/launch-checklist-2026-03-25.md` exists with previous results:
  - 2 items were BLOCKED (platform cert, crash reporting)
  - 1 item had a MANUAL CHECK
- New checklist: platform cert is now PASS, crash reporting is now PASS,
manual check still open; 1 new item flagged (EULA last updated date)
**Input:** `/launch-checklist`
**Expected behavior:**
1. Skill finds the previous checklist and loads it for comparison
2. Skill produces the new checklist and compares:
   - Newly resolved: "Platform cert — was BLOCKED, now PASS"
   - Newly resolved: "Crash reporting — was BLOCKED, now PASS"
   - Still open: manual check (unchanged)
   - New issue: EULA last updated date (not in previous checklist)
3. Delta is shown prominently in the report
4. Verdict is CONCERNS (manual check + new EULA question)
**Assertions:**
- [ ] Delta section shows newly resolved items
- [ ] Delta section shows new issues (not present in previous checklist)
- [ ] Still-open items from the previous checklist are noted as persistent
- [ ] Verdict reflects the current state (not the previous state)
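
The delta comparison in this case can be sketched as a comparison of two item-to-status mappings. This is a minimal illustration, assuming each checklist has already been parsed into a dict; the function name, the status strings, and the bucket names are hypothetical.

```python
def checklist_delta(previous: dict, current: dict) -> dict:
    """Classify current items relative to the previous checklist.

    Both arguments map an item name to a status string; "PASS" means resolved.
    """
    delta = {"newly_resolved": [], "still_open": [], "new_issues": []}
    for item, status in current.items():
        old = previous.get(item)
        if status == "PASS":
            # Only report a resolution if the item was previously not passing.
            if old is not None and old != "PASS":
                delta["newly_resolved"].append(item)
        elif old is None or old == "PASS":
            # Absent before, or passing before: this is a new problem.
            delta["new_issues"].append(item)
        else:
            delta["still_open"].append(item)
    return delta


if __name__ == "__main__":
    before = {"platform cert": "BLOCKED", "crash reporting": "BLOCKED",
              "first-run experience": "MANUAL CHECK"}
    after = {"platform cert": "PASS", "crash reporting": "PASS",
             "first-run experience": "MANUAL CHECK", "EULA last updated date": "FLAGGED"}
    print(checklist_delta(before, after))
```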
---
### Case 5: Director Gate Check — No gate; launch-checklist is an audit utility
**Fixture:**
- All checklist dependencies present
**Input:** `/launch-checklist`
**Expected behavior:**
1. Skill runs the full checklist and writes the report
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is LAUNCH READY, LAUNCH BLOCKED, or CONCERNS — no gate verdict
---
## Protocol Compliance
- [ ] Checks all required categories (legal, platform, store, build, analytics, UX)
- [ ] LAUNCH BLOCKED for hard failures (uncompleted certifications, missing legal docs)
- [ ] CONCERNS for advisory items requiring manual verification
- [ ] Compares against previous checklist when one exists
- [ ] Asks "May I write" before creating the checklist report
- [ ] Verdict is LAUNCH READY, LAUNCH BLOCKED, or CONCERNS
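
The verdict rules listed above imply a precedence: any hard failure blocks, otherwise any manual check raises concerns, otherwise the launch is ready. The following is a sketch of that precedence only; the `ItemStatus` enum and `launch_verdict` function are hypothetical names chosen for illustration.

```python
from enum import Enum


class ItemStatus(Enum):
    PASS = "PASS"
    BLOCKED = "BLOCKED"
    MANUAL_CHECK = "MANUAL CHECK NEEDED"


def launch_verdict(statuses) -> str:
    """Return the overall verdict; BLOCKED outranks MANUAL_CHECK, which outranks PASS."""
    statuses = list(statuses)
    if any(s is ItemStatus.BLOCKED for s in statuses):
        return "LAUNCH BLOCKED"
    if any(s is ItemStatus.MANUAL_CHECK for s in statuses):
        return "CONCERNS"
    return "LAUNCH READY"


if __name__ == "__main__":
    print(launch_verdict([ItemStatus.PASS, ItemStatus.MANUAL_CHECK]))
```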
---
## Coverage Notes
- Region-specific compliance (GDPR data handling, COPPA for under-13 audiences)
is checked but the specific requirements are not enumerated in test assertions.
- The store page completeness check (screenshots, description) relies on the
presence of files in `production/store/`; it cannot verify visual quality.
- Build reproducibility check validates the presence of a version tag and build
configuration but does not execute the build process.


@@ -0,0 +1,176 @@
# Skill Test Spec: /localize
## Skill Summary
`/localize` manages the full localization pipeline: it extracts all player-facing
strings from source files, manages translation files in `assets/localization/`,
and validates completeness across all locale files. For new languages, it creates
a locale file skeleton with all current strings as keys and empty values. For
existing locale files, it produces a diff showing additions, removals, and
changed keys.
Translation files are written to `assets/localization/[locale-code].csv` (or
engine-appropriate format) after a "May I write" ask. No director gates apply.
Verdicts: LOCALIZATION COMPLETE (all locales are complete) or GAPS FOUND (at
least one locale is missing string keys).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: LOCALIZATION COMPLETE, GAPS FOUND
- [ ] Contains "May I write" collaborative protocol language before writing locale files
- [ ] Has a next-step handoff (e.g., send locale skeletons to translators)
---
## Director Gate Checks
None. `/localize` is a pipeline utility. No director gates apply. Localization
lead agent may review separately but is not invoked within this skill.
---
## Test Cases
### Case 1: New Language — String Extraction and Locale Skeleton Created
**Fixture:**
- Source code in `src/` contains player-facing strings (UI text, tutorial messages)
- Existing locale: `assets/localization/en.csv`
- No French locale exists
**Input:** `/localize fr`
**Expected behavior:**
1. Skill extracts all player-facing strings from source files
2. Skill uses the matching entries in `en.csv` as the reference set of keys
3. Skill generates `fr.csv` skeleton with all string keys and empty values
4. Skill asks "May I write to `assets/localization/fr.csv`?"
5. File written on approval; verdict is GAPS FOUND (the file exists, but every value is empty)
6. Skill notes: "fr.csv created — send to translator to fill values"
**Assertions:**
- [ ] All string keys from `en.csv` are present in `fr.csv`
- [ ] All values in `fr.csv` are empty (not copied from English)
- [ ] "May I write" is asked before creating the file
- [ ] Verdict is GAPS FOUND (file is created but untranslated)
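
The skeleton creation in this case can be sketched as follows. This is an illustration under stated assumptions: the locale file is a two-column `key,value` CSV without a header row, which the spec does not confirm, and `build_locale_skeleton` is a hypothetical helper name.

```python
import csv
import io


def build_locale_skeleton(reference_csv_text: str) -> str:
    """Copy every key from the reference locale and leave each value empty."""
    reader = csv.reader(io.StringIO(reference_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Keep only the key; the English value is deliberately not copied.
        writer.writerow([row[0], ""])
    return out.getvalue()


if __name__ == "__main__":
    en = 'menu.start,Start Game\nmenu.quit,"Quit, really?"\n'
    print(build_locale_skeleton(en))
```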
---
### Case 2: Existing Locale Diff — Additions, Removals, and Changes Listed
**Fixture:**
- `assets/localization/fr.csv` exists with 20 string keys translated
- Source code has changed: 3 new strings added, 1 string removed, 2 strings
with changed English source text
**Input:** `/localize fr`
**Expected behavior:**
1. Skill extracts current strings from source
2. Skill diffs against existing `fr.csv`
3. Skill produces diff report:
   - 3 new keys (need translation — listed as empty in fr.csv)
   - 1 removed key (marked as obsolete — suggest removal)
   - 2 changed keys (English source changed — French may need update, flagged)
4. Skill asks "May I update `assets/localization/fr.csv`?"
5. File updated with new empty keys added, obsolete keys marked; verdict is GAPS FOUND
**Assertions:**
- [ ] New keys appear as empty in the updated file (not auto-translated)
- [ ] Removed keys are flagged as obsolete (not silently deleted)
- [ ] Changed source strings are flagged for translator review
- [ ] Verdict is GAPS FOUND (new empty keys exist)
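
The three diff categories in this case can be sketched with plain dictionaries. The sketch assumes the skill has access to both the previous and the current English source text per key, which is one possible way to detect changed strings and is not stated in the spec; `diff_locale` is a hypothetical name.

```python
def diff_locale(previous_source: dict, current_source: dict, locale: dict) -> dict:
    """Compare current source strings against an existing locale file.

    previous_source / current_source: key -> English text (before and after the change).
    locale: key -> translated text currently in the locale file.
    """
    new_keys = sorted(k for k in current_source if k not in locale)
    obsolete_keys = sorted(k for k in locale if k not in current_source)
    changed_keys = sorted(
        k for k in current_source
        if k in locale
        and k in previous_source
        and previous_source[k] != current_source[k]
    )
    return {"new": new_keys, "obsolete": obsolete_keys, "changed": changed_keys}


if __name__ == "__main__":
    old_en = {"hud.health": "Health", "hud.ammo": "Ammo", "menu.load": "Load"}
    new_en = {"hud.health": "Health", "hud.ammo": "Ammunition", "menu.save": "Save"}
    fr = {"hud.health": "Sante", "hud.ammo": "Munitions", "menu.load": "Charger"}
    print(diff_locale(old_en, new_en, fr))
```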
---
### Case 3: String Missing in One Locale — GAPS FOUND With Missing Key List
**Fixture:**
- 3 locale files exist: `en.csv`, `fr.csv`, `de.csv`
- `de.csv` is missing 4 keys that exist in both `en.csv` and `fr.csv`
**Input:** `/localize`
**Expected behavior:**
1. Skill reads all 3 locale files and cross-references keys
2. `de.csv` is missing 4 keys
3. Skill produces GAPS FOUND report listing the 4 missing keys by locale:
"de.csv missing: [key1], [key2], [key3], [key4]"
4. Skill offers to add the missing keys as empty values to `de.csv`
5. After approval: file updated; verdict remains GAPS FOUND (values still empty)
**Assertions:**
- [ ] Missing keys are listed explicitly (not just a count)
- [ ] Missing keys are attributed to the specific locale file
- [ ] Verdict is GAPS FOUND (not LOCALIZATION COMPLETE)
- [ ] Missing keys are added as empty (not auto-translated from English)
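
The cross-reference in this case can be sketched as a set difference per locale. This is a minimal illustration, assuming each locale file is already loaded as a key-to-value dict; both function names are hypothetical.

```python
def missing_keys_by_locale(locales: dict) -> dict:
    """Map each locale file name to the sorted keys it lacks (complete locales are omitted)."""
    all_keys = set()
    for entries in locales.values():
        all_keys.update(entries)
    report = {}
    for name, entries in locales.items():
        missing = sorted(all_keys - set(entries))
        if missing:
            report[name] = missing
    return report


def localization_verdict(locales: dict) -> str:
    """GAPS FOUND if any key is missing or any value is empty."""
    if missing_keys_by_locale(locales):
        return "GAPS FOUND"
    if any(value == "" for entries in locales.values() for value in entries.values()):
        return "GAPS FOUND"
    return "LOCALIZATION COMPLETE"


if __name__ == "__main__":
    files = {
        "en.csv": {"k1": "One", "k2": "Two"},
        "fr.csv": {"k1": "Un", "k2": "Deux"},
        "de.csv": {"k1": "Eins"},
    }
    for name, keys in missing_keys_by_locale(files).items():
        print(f"{name} missing: {', '.join(keys)}")
```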
---
### Case 4: Translation File Has Syntax Error — Error With Line Reference
**Fixture:**
- `assets/localization/fr.csv` has a malformed line at line 47
(missing quote closure)
**Input:** `/localize fr`
**Expected behavior:**
1. Skill reads `fr.csv` and encounters a parse error at line 47
2. Skill outputs: "Parse error in fr.csv at line 47: [error detail]"
3. Skill cannot diff or validate the file until the error is fixed
4. Skill does NOT attempt to overwrite or auto-fix the malformed file
5. Skill suggests fixing the file manually and re-running `/localize`
**Assertions:**
- [ ] Error message includes line number (line 47)
- [ ] Error detail describes the nature of the parse error
- [ ] Skill does NOT overwrite or modify the malformed file
- [ ] Manual fix + re-run is suggested as remediation
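
One way to report a line number for an unclosed quote is a per-line quote-parity check, sketched below. It is a deliberate simplification that assumes no CSV field spans multiple lines; a general CSV parser would typically only notice the problem further down the file. `find_unbalanced_quote_line` and `parse_error_message` are hypothetical names.

```python
from typing import Optional


def find_unbalanced_quote_line(csv_text: str) -> Optional[int]:
    """Return the 1-based number of the first line with an odd count of double quotes.

    Escaped quotes ("") come in pairs, so they do not affect parity.
    """
    for line_no, line in enumerate(csv_text.splitlines(), start=1):
        if line.count('"') % 2 == 1:
            return line_no
    return None


def parse_error_message(filename: str, csv_text: str) -> Optional[str]:
    """Format the error line, or return None when every line is balanced."""
    line_no = find_unbalanced_quote_line(csv_text)
    if line_no is None:
        return None
    return f"Parse error in {filename} at line {line_no}: unclosed quote"


if __name__ == "__main__":
    sample = 'a,"ok"\nb,"broken\nc,"fine"\n'
    print(parse_error_message("fr.csv", sample))
```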
---
### Case 5: Director Gate Check — No gate; localization is a pipeline utility
**Fixture:**
- Source code with player-facing strings
**Input:** `/localize fr`
**Expected behavior:**
1. Skill extracts strings and manages locale files
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is LOCALIZATION COMPLETE or GAPS FOUND — no gate verdict
---
## Protocol Compliance
- [ ] Extracts strings from source before operating on locale files
- [ ] Creates new locale files with all keys as empty values (not auto-translated)
- [ ] Diffs existing locale files against current source strings
- [ ] Flags missing keys by locale and by key name
- [ ] Asks "May I write" before creating or updating any locale file
- [ ] Verdict is LOCALIZATION COMPLETE (all locales fully translated) or GAPS FOUND
---
## Coverage Notes
- LOCALIZATION COMPLETE is only achievable when all locale files have all keys
with non-empty values; new-language skeleton creation always results in GAPS FOUND.
- Engine-specific locale formats (Godot `.translation`, Unity `.po` files) are
handled by the skill body; `.csv` is used as the canonical format in tests.
- The case where source strings change at a very high rate (continuous integration
of new UI text) is not tested separately; the diff logic from Case 2 is expected to apply unchanged.


@@ -0,0 +1,179 @@
# Skill Test Spec: /onboard
## Skill Summary
`/onboard` generates a contextual project onboarding summary tailored for a new
team member. It reads CLAUDE.md, `technical-preferences.md`, the active sprint
file, recent git commits, and `production/stage.txt` to produce a structured
orientation document. The skill runs on the Haiku model (read-only, formatting
task) and produces no file writes — all output is conversational.
The skill optionally accepts a role argument (e.g., `/onboard artist`) to tailor
the summary to a specific discipline. When the project is in an early stage or
unconfigured, the output adapts to reflect what little is known. The verdict is
always ONBOARDING COMPLETE — the skill is purely informational.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: ONBOARDING COMPLETE
- [ ] Does NOT contain "May I write" language (skill is read-only)
- [ ] Has a next-step handoff suggesting a relevant follow-on skill
---
## Director Gate Checks
None. `/onboard` is a read-only orientation skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Configured project in Production stage with active sprint
**Fixture:**
- `production/stage.txt` contains `Production`
- `technical-preferences.md` has engine, language, and specialists populated
- `production/sprints/sprint-005.md` exists with stories in progress
- Git log contains 5 recent commits
**Input:** `/onboard`
**Expected behavior:**
1. Skill reads stage.txt, technical-preferences.md, active sprint, and git log
2. Skill produces an onboarding summary with sections: Project Overview, Tech Stack,
Current Stage, Active Sprint Summary, Recent Activity
3. Summary is formatted for readability (headers, bullet points)
4. Next-step suggestions are appropriate for Production stage (e.g., `/sprint-status`,
`/dev-story`)
5. Verdict ONBOARDING COMPLETE is stated
**Assertions:**
- [ ] Output includes current stage name from stage.txt
- [ ] Output includes engine and language from technical-preferences.md
- [ ] Active sprint stories are summarized (not just the sprint file name)
- [ ] Recent commit context is present
- [ ] Verdict is ONBOARDING COMPLETE
- [ ] No files are written
---
### Case 2: Fresh Project — No engine, no sprint, suggests /start
**Fixture:**
- `technical-preferences.md` contains only placeholders (`[TO BE CONFIGURED]`)
- No `production/stage.txt`
- No sprint files
- No CLAUDE.md overrides beyond defaults
**Input:** `/onboard`
**Expected behavior:**
1. Skill reads all config files and detects unconfigured state
2. Skill produces a minimal summary: "This project has not been configured yet"
3. Output explains the onboarding workflow: `/start` → `/setup-engine` → `/brainstorm`
4. Skill suggests running `/start` as the immediate next step
5. Verdict is ONBOARDING COMPLETE (informational, not a failure)
**Assertions:**
- [ ] Output explicitly mentions the project is not yet configured
- [ ] `/start` is recommended as the next step
- [ ] Skill does NOT error out — it gracefully handles an empty project state
- [ ] Verdict is still ONBOARDING COMPLETE
---
### Case 3: No CLAUDE.md Found — Error with remediation
**Fixture:**
- `CLAUDE.md` file does not exist (deleted or never created)
- All other files may or may not exist
**Input:** `/onboard`
**Expected behavior:**
1. Skill attempts to read CLAUDE.md and fails
2. Skill outputs an error: "CLAUDE.md not found — cannot generate onboarding summary"
3. Skill provides remediation: "Run `/start` to initialize the project configuration"
4. No partial summary is generated
**Assertions:**
- [ ] Error message clearly identifies the missing file as CLAUDE.md
- [ ] Remediation step (`/start`) is explicitly named
- [ ] Skill does NOT produce a partial output when the root config is missing
- [ ] Verdict is ONBOARDING COMPLETE (with error context, not a crash)
---
### Case 4: Role-Specific Onboarding — User specifies "artist" role
**Fixture:**
- Fully configured project in Production stage
- `art-bible.md` exists in `design/`
- Active sprint has visual story types (animation, VFX)
**Input:** `/onboard artist`
**Expected behavior:**
1. Skill reads all standard files plus any art-relevant docs (art bible, asset specs)
2. Summary is tailored to the artist role: art bible overview, asset pipeline,
current visual stories in the active sprint
3. Technical architecture details (code structure, ADRs) are de-emphasized
4. Specialist agents for art/audio are highlighted in the summary
5. Verdict is ONBOARDING COMPLETE
**Assertions:**
- [ ] Role argument is acknowledged in the output ("Onboarding for: Artist")
- [ ] Art bible summary is included if the file exists
- [ ] Current visual stories from the active sprint are shown
- [ ] Technical implementation details are not the primary focus
- [ ] Verdict is ONBOARDING COMPLETE
---
### Case 5: Director Gate Check — No gate; onboard is read-only orientation
**Fixture:**
- Any configured project state
**Input:** `/onboard`
**Expected behavior:**
1. Skill completes the full onboarding summary
2. No director agents are spawned at any point
3. No gate IDs appear in the output
4. No "May I write" prompts appear
**Assertions:**
- [ ] No director gate is invoked
- [ ] No write tool is called
- [ ] No gate skip messages appear
- [ ] Verdict is ONBOARDING COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads all source files before generating output (no hallucinated project state)
- [ ] Adapts output to project stage (Production ≠ Concept)
- [ ] Respects role argument when provided
- [ ] Does not write any files
- [ ] Ends with ONBOARDING COMPLETE verdict in all paths
---
## Coverage Notes
- The case where `technical-preferences.md` is missing entirely (as opposed to
having placeholders) is not separately tested; behavior follows the graceful
error pattern of Case 3.
- Git history reading is assumed available; offline/no-git scenarios are not
tested here.
- Discipline roles beyond "artist" (e.g., programmer, designer, producer) follow
the same tailoring pattern as Case 4 and are not separately tested.


@@ -0,0 +1,178 @@
# Skill Test Spec: /playtest-report
## Skill Summary
`/playtest-report` generates a structured playtest report from session notes or
user input. The report is organized into four sections: Feel/Accessibility,
Bugs Observed, Design Feedback, and Next Steps. When multiple testers participated,
the skill aggregates feedback and distinguishes majority opinions from minority
ones. The skill links to existing bug reports when a reported bug matches a file
in `production/bugs/`.
Reports are written to `production/qa/playtest-[date].md` after a "May I write"
ask. No director gates apply here — the CD-PLAYTEST director gate (if needed) is
a separate invocation. The verdict is COMPLETE when the report is written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before writing the report
- [ ] Has a next-step handoff (e.g., `/bug-report` for new issues found, `/design-review` for feedback)
---
## Director Gate Checks
None. `/playtest-report` is a documentation utility. The CD-PLAYTEST gate is a
separate invocation and not part of this skill.
---
## Test Cases
### Case 1: Happy Path — User provides playtest notes, structured report produced
**Fixture:**
- User provides typed playtest notes from a single session
- Notes cover: game feel, one bug (framerate drop), and a design concern
(tutorial too long)
- `production/bugs/` exists but is empty (bug not yet reported)
**Input:** `/playtest-report` (user pastes session notes)
**Expected behavior:**
1. Skill reads the provided notes and structures them into the 4-section template
2. Feel/Accessibility: extracts feel observations
3. Bugs: notes the framerate drop with available repro details
4. Design Feedback: notes the tutorial length concern
5. Next Steps: suggests `/bug-report` for the framerate issue and `/design-review`
for the tutorial feedback
6. Skill asks "May I write to `production/qa/playtest-2026-04-06.md`?"
7. Report is written on approval; verdict is COMPLETE
**Assertions:**
- [ ] All 4 sections are present in the report
- [ ] Bug is listed in the Bugs section (not the Design Feedback section)
- [ ] Next Steps are appropriate (bug report for the framerate drop, design review for the tutorial feedback)
- [ ] "May I write" is asked before writing
- [ ] Verdict is COMPLETE
---
### Case 2: Empty Input — Guided prompting through each section
**Fixture:**
- No notes provided by user at invocation
**Input:** `/playtest-report`
**Expected behavior:**
1. Skill detects empty input
2. Skill prompts through each section:
   a. "Describe the overall feel and any accessibility observations"
   b. "Were any bugs observed? Describe them"
   c. "What design feedback did testers provide?"
3. User answers each prompt
4. Skill compiles report from answers and asks "May I write"
5. Report written on approval; verdict is COMPLETE
**Assertions:**
- [ ] At least 3 guiding questions are asked (one per main section)
- [ ] Report is not created until all sections have input (or user explicitly skips one)
- [ ] Verdict is COMPLETE after file is written
---
### Case 3: Multiple Testers — Aggregated feedback with majority/minority notes
**Fixture:**
- User provides notes from 3 testers
- 2/3 testers found the controls "intuitive"
- 1/3 testers found the UI font too small
- All 3 noted the same bug (player stuck on ledge)
**Input:** `/playtest-report` (3-tester session)
**Expected behavior:**
1. Skill identifies 3 distinct tester perspectives in the input
2. Control intuitiveness → noted as "Majority (2/3): controls intuitive"
3. Font size → noted as "Minority (1/3): UI font size concern"
4. Stuck-on-ledge bug → noted as "All testers: player stuck on ledge (confirmed)"
5. Skill generates aggregated report with majority/minority labels
6. Report written after "May I write" approval; verdict is COMPLETE
**Assertions:**
- [ ] Majority opinion (2/3) is labeled as majority
- [ ] Minority opinion (1/3) is labeled as minority
- [ ] Unanimously reported bug is noted as confirmed by all testers
- [ ] Verdict is COMPLETE
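
The labeling in this case can be sketched as a small counting rule. The "Split" label for an exact tie is an assumption added for completeness, since the spec does not say how ties are handled; `label_opinion` is a hypothetical name.

```python
def label_opinion(count: int, total: int) -> str:
    """Label how many of `total` testers shared an observation."""
    if total <= 0 or not 0 < count <= total:
        raise ValueError("count must be between 1 and total")
    if count == total and total > 1:
        return f"All testers ({count}/{total})"
    if count * 2 > total:
        return f"Majority ({count}/{total})"
    if count * 2 == total:
        return f"Split ({count}/{total})"  # tie handling is an assumption
    return f"Minority ({count}/{total})"


if __name__ == "__main__":
    print(f"{label_opinion(2, 3)}: controls intuitive")
    print(f"{label_opinion(1, 3)}: UI font size concern")
    print(f"{label_opinion(3, 3)}: player stuck on ledge (confirmed)")
```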
---
### Case 4: Bug Matches Existing Report — Links to existing file
**Fixture:**
- `production/bugs/bug-2026-03-30-player-stuck-ledge.md` exists
- User's playtest notes describe "player gets stuck on ledges near walls"
**Input:** `/playtest-report`
**Expected behavior:**
1. Skill structures the report and identifies the stuck-on-ledge bug
2. Skill scans `production/bugs/` and finds `bug-2026-03-30-player-stuck-ledge.md`
3. In the Bugs section, the report includes: "See existing report:
production/bugs/bug-2026-03-30-player-stuck-ledge.md"
4. Skill does NOT suggest creating a new bug report for this issue
5. Report written; verdict is COMPLETE
**Assertions:**
- [ ] Existing bug report is found and linked in the playtest report
- [ ] `/bug-report` is NOT suggested for the already-reported issue
- [ ] Cross-reference to existing file appears in the Bugs section
- [ ] Verdict is COMPLETE
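
The spec does not say how a described bug is matched to an existing file. One crude possibility, shown here purely as a sketch, is keyword overlap between the note and the filename slug; the threshold of two shared words and both function names are assumptions.

```python
import re
from typing import Iterable, Optional


def slug_words(filename: str) -> set:
    """Extract descriptive words from a name like bug-2026-03-30-player-stuck-ledge.md."""
    stem = filename.rsplit(".", 1)[0]
    return {w for w in stem.split("-") if w.isalpha() and w != "bug"}


def find_matching_bug(note: str, bug_files: Iterable[str], min_overlap: int = 2) -> Optional[str]:
    """Return the bug file whose slug shares the most words with the note, if any."""
    words = set(re.findall(r"[a-z]+", note.lower()))
    # Very rough singularization so that "ledges" can match "ledge".
    words |= {w[:-1] for w in words if w.endswith("s")}
    best_name, best_score = None, 0
    for name in bug_files:
        score = len(slug_words(name) & words)
        if score >= min_overlap and score > best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    existing = ["bug-2026-03-30-player-stuck-ledge.md"]
    print(find_matching_bug("player gets stuck on ledges near walls", existing))
```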
---
### Case 5: Director Gate Check — No gate; CD-PLAYTEST is a separate invocation
**Fixture:**
- Playtest notes provided
**Input:** `/playtest-report`
**Expected behavior:**
1. Skill generates and writes the playtest report
2. No director agents are spawned (CD-PLAYTEST is not invoked here)
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No CD-PLAYTEST gate skip message appears
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Structures output into all 4 sections (Feel, Bugs, Design Feedback, Next Steps)
- [ ] Labels majority vs. minority opinions when multiple testers are involved
- [ ] Cross-references existing bug reports when bugs match
- [ ] Asks "May I write to `production/qa/playtest-[date].md`?" before writing
- [ ] Verdict is COMPLETE when report is written
---
## Coverage Notes
- The CD-PLAYTEST director gate (creative director reviews playtest insights
for design implications) is a separate invocation and is not tested here.
- Video recording or screenshot attachments are not tested; the report is a
text-only document.
- The case where a tester's identity is unknown (anonymous feedback) follows
the same aggregation pattern as Case 3 without tester labels.


@@ -0,0 +1,183 @@
# Skill Test Spec: /project-stage-detect
## Skill Summary
`/project-stage-detect` automatically analyzes project artifacts to determine
the current development stage. It runs on the Haiku model (read-only) and
examines `production/stage.txt` (if present), design documents in `design/`,
source code in `src/`, sprint and milestone files in `production/`, and the
presence of engine configuration to classify the project into one of seven
stages: Concept, Systems Design, Technical Setup, Pre-Production, Production,
Polish, or Release.
The skill is advisory — it never writes `stage.txt`. That file is only updated
when `/gate-check` passes and the user confirms advancement. The skill reports
its confidence level (HIGH if stage.txt was read directly, MEDIUM if inferred
from artifacts, LOW if conflicting signals were found).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains all seven stage names: Concept, Systems Design, Technical Setup, Pre-Production, Production, Polish, Release
- [ ] Does NOT contain "May I write" language (skill is detection-only)
- [ ] Has a next-step handoff (e.g., `/gate-check` to formally advance stage)
---
## Director Gate Checks
None. `/project-stage-detect` is a read-only detection utility. No director
gates apply.
---
## Test Cases
### Case 1: stage.txt Exists — Reads directly and cross-checks artifacts
**Fixture:**
- `production/stage.txt` contains `Production`
- `design/gdd/` has 4 GDD files
- `src/` has source code files
- `production/sprints/sprint-002.md` exists
**Input:** `/project-stage-detect`
**Expected behavior:**
1. Skill reads `production/stage.txt` — detects stage `Production`
2. Skill cross-checks artifacts: GDDs present, source code present, sprint present
3. Artifacts are consistent with Production stage
4. Skill reports: Stage = Production, Confidence = HIGH (from stage.txt, confirmed by artifacts)
5. Next step: continue with `/sprint-plan` or `/dev-story`
**Assertions:**
- [ ] Detected stage is Production
- [ ] Confidence is reported as HIGH when stage.txt is present
- [ ] Cross-check result (consistent vs. discrepant) is noted
- [ ] No files are written
- [ ] Verdict clearly states the detected stage
---
### Case 2: No stage.txt but GDDs and Epics Exist — Infers Production
**Fixture:**
- No `production/stage.txt`
- `design/gdd/` has 3 GDD files
- `production/epics/` has 2 epic files
- `src/` has source code files
- `production/sprints/sprint-001.md` exists
**Input:** `/project-stage-detect`
**Expected behavior:**
1. Skill finds no stage.txt — switches to artifact inference mode
2. Skill finds GDDs (Systems Design complete), epics (Pre-Production complete),
source code and sprints (Production active)
3. Skill infers: Stage = Production
4. Confidence is MEDIUM (inferred from artifacts, not from stage.txt)
5. Skill recommends running `/gate-check` to formalize and write stage.txt
**Assertions:**
- [ ] Inferred stage is Production
- [ ] Confidence is MEDIUM (not HIGH, since stage.txt is absent)
- [ ] Recommendation to run `/gate-check` is present
- [ ] No stage.txt is written by this skill
---
### Case 3: No stage.txt, No Docs, No Source — Infers Concept
**Fixture:**
- No `production/stage.txt`
- `design/` directory exists but is empty
- `src/` exists but contains no code files
- `technical-preferences.md` has placeholders only
**Input:** `/project-stage-detect`
**Expected behavior:**
1. Skill finds no stage.txt
2. Artifact scan: no GDDs, no source, no epics, no sprints, engine unconfigured
3. Skill infers: Stage = Concept
4. Confidence is MEDIUM
5. Skill suggests `/start` to begin the onboarding workflow
**Assertions:**
- [ ] Inferred stage is Concept
- [ ] Output lists the artifacts that were checked (and found absent)
- [ ] `/start` is suggested as the next step
- [ ] No files are written
---
### Case 4: Discrepancy — stage.txt says Production but no source code
**Fixture:**
- `production/stage.txt` contains `Production`
- `design/gdd/` has GDD files
- `src/` directory exists but contains no source code files
- No sprint files exist
**Input:** `/project-stage-detect`
**Expected behavior:**
1. Skill reads stage.txt — detects `Production`
2. Cross-check finds: no source code, no sprints — inconsistent with Production
3. Skill flags discrepancy: "stage.txt says Production but no source code or sprints found"
4. Skill reports detected stage as Production (honoring stage.txt) but
confidence drops to LOW due to artifact mismatch
5. Skill suggests reviewing stage.txt manually or running `/gate-check`
**Assertions:**
- [ ] Discrepancy is flagged explicitly in the output
- [ ] Confidence is LOW when artifacts contradict stage.txt
- [ ] stage.txt value is not silently overridden
- [ ] User is advised to verify the discrepancy manually
---
### Case 5: Director Gate Check — No gate; detection is advisory
**Fixture:**
- Any project state with or without stage.txt
**Input:** `/project-stage-detect`
**Expected behavior:**
1. Skill completes full stage detection
2. No director agents are spawned at any point
3. No gate IDs appear in output
4. No write tool is called
**Assertions:**
- [ ] No director gate is invoked
- [ ] No write tool is called
- [ ] Detection output is purely advisory
- [ ] Verdict names the detected stage without triggering any gate
---
## Protocol Compliance
- [ ] Reads stage.txt if present; falls back to artifact inference if absent
- [ ] Always reports a confidence level (HIGH / MEDIUM / LOW)
- [ ] Cross-checks stage.txt against artifacts and flags discrepancies
- [ ] Does not write stage.txt (that is `/gate-check`'s responsibility)
- [ ] Ends with a next-step recommendation appropriate to the detected stage
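
The confidence rules above can be sketched independently of how the stage is inferred from artifacts. The sketch assumes the artifact inference has already produced a stage name; `report_stage` is a hypothetical helper, and the inferred stage used for the discrepancy example is a placeholder.

```python
from typing import Optional, Tuple


def report_stage(stage_txt: Optional[str], inferred_stage: str) -> Tuple[str, str]:
    """Return (reported stage, confidence).

    stage_txt is the content of production/stage.txt, or None when the file is absent.
    inferred_stage is whatever the artifact scan concluded.
    """
    if stage_txt is None:
        # No stage.txt: rely on inference alone.
        return inferred_stage, "MEDIUM"
    if stage_txt == inferred_stage:
        # stage.txt confirmed by artifacts.
        return stage_txt, "HIGH"
    # Artifacts contradict stage.txt: honor the file but lower confidence.
    return stage_txt, "LOW"


if __name__ == "__main__":
    print(report_stage("Production", "Production"))
    print(report_stage(None, "Concept"))
    print(report_stage("Production", "Systems Design"))
```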
---
## Coverage Notes
- The Technical Setup stage (engine configured, no GDDs yet) and Pre-Production
stage (GDDs complete, no epics yet) follow the same artifact-inference pattern
as Cases 2 and 3 and are not separately fixture-tested.
- The Polish and Release stages are not fixture-tested here; they follow the
same high-confidence (stage.txt present) or inference logic.
- Confidence levels are advisory — the skill does not gate any actions on them.


@@ -0,0 +1,178 @@
# Skill Test Spec: /prototype
## Skill Summary
`/prototype` manages a rapid prototyping workflow for validating a game mechanic
before committing to full production implementation. Prototypes are created in
`prototypes/[mechanic-name]/` and are intentionally disposable — coding standards
are relaxed (no ADR required, acceptance criteria can be minimal, hardcoded values acceptable).
After implementation, the skill produces a findings document summarizing what
was learned and recommending next steps.
The skill asks "May I write to `prototypes/[name]/`?" before creating files. If a
prototype already exists, the skill offers to extend, replace, or archive. No
director gates apply. Verdicts: PROTOTYPE COMPLETE (prototype built and findings
documented) or PROTOTYPE ABANDONED (mechanic found to be unworkable).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PROTOTYPE COMPLETE, PROTOTYPE ABANDONED
- [ ] Contains "May I write" language before creating prototype files
- [ ] Has a next-step handoff (e.g., `/design-system` to formalize, or archive)
---
## Director Gate Checks
None. Prototypes are throwaway validation artifacts. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Mechanic concept prototyped, findings documented
**Fixture:**
- `prototypes/` directory exists
- No existing prototype for "grapple-hook"
**Input:** `/prototype grapple-hook`
**Expected behavior:**
1. Skill asks "May I write to `prototypes/grapple-hook/`?"
2. After approval: creates `prototypes/grapple-hook/` directory and basic
implementation skeleton (main scene, player controller extension)
3. Skill implements a minimal grapple hook mechanic (intentionally rough — no
polish, hardcoded values acceptable)
4. Skill produces `prototypes/grapple-hook/findings.md` with:
- What was tested
- What worked
- What didn't work
- Recommendation (proceed / abandon / revise concept)
5. Verdict is PROTOTYPE COMPLETE
**Assertions:**
- [ ] "May I write to `prototypes/grapple-hook/`?" is asked before any files are created
- [ ] Implementation is isolated to `prototypes/` (not `src/`)
- [ ] `findings.md` is created with at minimum: tested/worked/didn't-work/recommendation
- [ ] Verdict is PROTOTYPE COMPLETE
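The `findings.md` assertion above could be checked mechanically. The sketch below assumes the four parts appear as Markdown headings titled as in step 4; the spec does not fix the heading text, so `REQUIRED_HEADINGS` and `missing_findings_sections` are illustrative names, not part of the skill.

```python
import re

# Assumed heading titles, taken from the four bullets in step 4 of Case 1.
REQUIRED_HEADINGS = (
    "What was tested",
    "What worked",
    "What didn't work",
    "Recommendation",
)


def missing_findings_sections(markdown_text: str) -> list[str]:
    """Return the required headings that do not appear in findings.md."""
    # Collect heading titles of any level, normalised to lowercase.
    found = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown_text, re.MULTILINE
        )
    }
    return [h for h in REQUIRED_HEADINGS if h.lower() not in found]
```

An empty return value means all four parts are present; anything else names the sections a fixture run failed to produce.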
---
### Case 2: Prototype Already Exists — Offers Extend, Replace, or Archive
**Fixture:**
- `prototypes/grapple-hook/` already exists from a previous prototype session
- It contains a basic implementation and a findings.md
**Input:** `/prototype grapple-hook`
**Expected behavior:**
1. Skill detects existing `prototypes/grapple-hook/` directory
2. Skill reports: "Prototype already exists for grapple-hook"
3. Skill presents 3 options:
- Extend: add new features to the existing prototype
- Replace: start fresh (asks "May I replace `prototypes/grapple-hook/`?")
- Archive: move to `prototypes/archive/grapple-hook/` and start fresh
4. User selects; skill proceeds accordingly
**Assertions:**
- [ ] Existing prototype is detected and reported
- [ ] Exactly 3 options are presented (extend, replace, archive)
- [ ] Replace path includes a "May I replace" confirmation
- [ ] Archive path moves (not deletes) the existing prototype
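The "moves (not deletes)" assertion can be illustrated with a small helper. This is a sketch under assumptions: `archive_prototype` is a hypothetical name, and refusing to overwrite an earlier archive is a choice made here, since the spec does not say what should happen in that case.

```python
import shutil
from pathlib import Path


def archive_prototype(prototypes_root: Path, name: str) -> Path:
    """Move prototypes/<name>/ to prototypes/archive/<name>/ and return the new path."""
    source = prototypes_root / name
    target = prototypes_root / "archive" / name
    if not source.is_dir():
        raise FileNotFoundError(f"No prototype directory at {source}")
    if target.exists():
        # Assumption: never clobber an earlier archive; the spec is silent here.
        raise FileExistsError(f"Archive already exists at {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    # A move keeps every file, including findings.md, available for reference.
    shutil.move(str(source), str(target))
    return target
```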
---
### Case 3: Prototype Validates Mechanic — Recommends Proceeding to Production
**Fixture:**
- Prototype implementation complete
- Findings: grapple hook mechanic is fun and technically feasible
**Input:** `/prototype grapple-hook` (prototype session complete)
**Expected behavior:**
1. After prototype is built and tested, findings are summarized
2. Recommendation in findings.md: "Mechanic validated — recommend proceeding
to `/design-system` for full specification"
3. Skill handoff message explicitly suggests `/design-system grapple-hook`
4. Verdict is PROTOTYPE COMPLETE
**Assertions:**
- [ ] `findings.md` contains an explicit recommendation
- [ ] Recommendation references `/design-system` when mechanic is validated
- [ ] Handoff message echoes the recommendation
- [ ] Verdict is PROTOTYPE COMPLETE (not PROTOTYPE ABANDONED)
---
### Case 4: Prototype Reveals Mechanic is Unworkable — PROTOTYPE ABANDONED
**Fixture:**
- Prototype implemented for "procedural-dialogue"
- After testing: the mechanic creates incoherent dialogue trees and is
frustrating to play
**Input:** `/prototype procedural-dialogue`
**Expected behavior:**
1. Prototype is built
2. Findings document the failure: incoherent output, player confusion, technical complexity
3. Recommendation in findings.md: "Mechanic not viable — abandoning"
4. `findings.md` documents the specific reasons the mechanic failed
5. Skill suggests alternatives in the handoff (e.g., curated dialogue instead)
6. Verdict is PROTOTYPE ABANDONED
**Assertions:**
- [ ] Verdict is PROTOTYPE ABANDONED (not PROTOTYPE COMPLETE)
- [ ] `findings.md` documents specific failure reasons (not vague)
- [ ] Alternative approaches are suggested in the handoff
- [ ] Prototype files are retained (not deleted) for reference
---
### Case 5: Director Gate Check — No gate; prototypes are validation artifacts
**Fixture:**
- Mechanic concept provided
**Input:** `/prototype wall-jump`
**Expected behavior:**
1. Skill creates and documents the prototype
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is PROTOTYPE COMPLETE or PROTOTYPE ABANDONED — no gate verdict
---
## Protocol Compliance
- [ ] Asks "May I write to `prototypes/[name]/`?" before creating any files
- [ ] Creates all files under `prototypes/` (not `src/`)
- [ ] Produces `findings.md` with tested/worked/didn't-work/recommendation
- [ ] Notes that production coding standards are intentionally relaxed
- [ ] Offers extend/replace/archive when prototype already exists
- [ ] Verdict is PROTOTYPE COMPLETE or PROTOTYPE ABANDONED
---
## Coverage Notes
- Prototype implementation quality (code style) is intentionally not tested —
prototypes are throwaway artifacts and quality standards do not apply.
- The archiving mechanism is mentioned in Case 2 but the archive format is
not assertion-tested in detail.
- Engine-specific prototype scaffolding (GDScript scenes vs. C# MonoBehaviour)
follows the same flow with engine-appropriate file types.


@@ -0,0 +1,175 @@
# Skill Test Spec: /qa-plan
## Skill Summary
`/qa-plan` generates a structured QA test plan for a feature or sprint milestone.
It reads story files for the specified sprint, extracts acceptance criteria from
each story, cross-references test standards from `coding-standards.md` to assign
the appropriate test type (unit, integration, visual, UI, or config/data), and
produces a prioritized QA plan document.
The skill asks "May I write to `production/qa/qa-plan-sprint-NNN.md`?" before
persisting the output. If an existing test plan for the same sprint is found, the
skill offers to update rather than replace. The verdict is COMPLETE when the plan
is written. No director gates are used — gate-level story readiness is handled by
`/story-readiness`.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before writing the plan
- [ ] Has a next-step handoff (e.g., `/smoke-check` or `/story-readiness`)
---
## Director Gate Checks
None. `/qa-plan` is a planning utility. Story readiness gates are separate.
---
## Test Cases
### Case 1: Happy Path — Sprint with 4 stories generates full test plan
**Fixture:**
- `production/sprints/sprint-003.md` lists 4 stories with defined acceptance criteria
- Stories span types: 1 logic (formula), 1 integration, 1 visual, 1 UI
- `coding-standards.md` is present with test evidence table
**Input:** `/qa-plan sprint-003`
**Expected behavior:**
1. Skill reads sprint-003.md and identifies 4 stories
2. Skill reads each story's acceptance criteria
3. Skill assigns test types per coding-standards.md table:
- Logic story → Unit test (BLOCKING)
- Integration story → Integration test (BLOCKING)
- Visual story → Screenshot + lead sign-off (ADVISORY)
- UI story → Manual walkthrough doc (ADVISORY)
4. Skill drafts QA plan with story-by-story test type breakdown
5. Skill asks "May I write to `production/qa/qa-plan-sprint-003.md`?"
6. File is written on approval; verdict is COMPLETE
**Assertions:**
- [ ] All 4 stories are included in the plan
- [ ] Test type is assigned per coding-standards.md (not guessed)
- [ ] Gate level (BLOCKING vs ADVISORY) is noted for each story
- [ ] "May I write" is asked with the correct file path
- [ ] Verdict is COMPLETE
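The assignment in step 3 amounts to a table lookup. The sketch below hardcodes the four rows exercised in this case purely for illustration; the real skill is expected to read them from `coding-standards.md`, and the names `Assignment`, `TEST_EVIDENCE`, and `assign_test` are hypothetical.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    evidence: str
    gate_level: str  # "BLOCKING" or "ADVISORY"


# Mirrors the four rows listed in step 3 of Case 1. Hardcoded here only for
# illustration; the skill reads this table from coding-standards.md.
TEST_EVIDENCE = {
    "logic": Assignment("Unit test", "BLOCKING"),
    "integration": Assignment("Integration test", "BLOCKING"),
    "visual": Assignment("Screenshot + lead sign-off", "ADVISORY"),
    "ui": Assignment("Manual walkthrough doc", "ADVISORY"),
}


def assign_test(story_type: str) -> Assignment:
    """Look up the test evidence and gate level for a story type."""
    try:
        return TEST_EVIDENCE[story_type.lower()]
    except KeyError:
        # Fail loudly rather than guess, matching the "not guessed" assertion.
        raise ValueError(f"No test evidence row for story type {story_type!r}") from None
```

Raising on an unknown story type keeps the plan from silently inventing a test type that the standards file does not define.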
---
### Case 2: Story With No Acceptance Criteria — Flagged as UNTESTABLE
**Fixture:**
- `production/sprints/sprint-004.md` lists 3 stories; one story has empty
acceptance criteria section
**Input:** `/qa-plan sprint-004`
**Expected behavior:**
1. Skill reads all 3 stories
2. Skill detects the story with no AC
3. Story is flagged as `UNTESTABLE — Acceptance Criteria required` in the plan
4. Other 2 stories receive normal test type assignments
5. Plan is written with the UNTESTABLE story flagged; verdict is COMPLETE
**Assertions:**
- [ ] UNTESTABLE label appears for the story with no AC
- [ ] Plan is not blocked — the other stories are still planned
- [ ] Output suggests adding AC to the flagged story (next step)
- [ ] Verdict is COMPLETE (the plan is still generated)
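A minimal sketch of the flag-but-do-not-block behavior, assuming each story has already been parsed into a list of acceptance-criteria lines. The function name and the `PLANNED` status are illustrative; only the `UNTESTABLE` label comes from the spec.

```python
def plan_entries(stories: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (story, status) pairs; stories with no usable AC are flagged, not skipped."""
    entries = []
    for story, criteria in stories.items():
        # Blank or whitespace-only lines do not count as acceptance criteria.
        has_ac = any(line.strip() for line in criteria)
        entries.append((story, "PLANNED" if has_ac else "UNTESTABLE"))
    return entries
```

Every story appears in the result, so the plan can still be written and reach COMPLETE with the gap visible.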
---
### Case 3: Existing Test Plan Found — Offers update rather than replace
**Fixture:**
- `production/qa/qa-plan-sprint-003.md` already exists from a previous run
- Sprint-003 has 2 new stories added since the last plan
**Input:** `/qa-plan sprint-003`
**Expected behavior:**
1. Skill reads sprint-003.md and detects 2 stories not in the existing plan
2. Skill reports: "Existing QA plan found for sprint-003 — offering to update"
3. Skill presents the 2 new stories and their proposed test assignments
4. Skill asks "May I update `production/qa/qa-plan-sprint-003.md`?" (not overwrite)
5. Updated plan is written on approval
**Assertions:**
- [ ] Skill detects the existing plan file
- [ ] "update" language is used (not "overwrite")
- [ ] Only new stories are proposed for addition — existing entries preserved
- [ ] Verdict is COMPLETE
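The update path reduces to finding stories that the existing plan does not mention yet. A sketch, assuming stories are identified by a stable ID string (the spec does not say how stories are keyed); `stories_to_add` is a hypothetical helper.

```python
def stories_to_add(sprint_stories: list[str], planned_stories: list[str]) -> list[str]:
    """Return sprint stories missing from the existing plan, in sprint order."""
    already_planned = set(planned_stories)
    return [story for story in sprint_stories if story not in already_planned]
```

Because only the missing stories are returned, existing plan entries are never regenerated or reordered.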
---
### Case 4: No Stories Found for Sprint — Error with guidance
**Fixture:**
- `production/sprints/sprint-007.md` does not exist
- No other sprint file matches sprint-007
**Input:** `/qa-plan sprint-007`
**Expected behavior:**
1. Skill attempts to read sprint-007.md — file not found
2. Skill outputs: "No sprint file found for sprint-007"
3. Skill suggests running `/sprint-plan` to create the sprint first
4. No plan is written; no "May I write" is asked
**Assertions:**
- [ ] Error message names the missing sprint file
- [ ] `/sprint-plan` is suggested as the remediation step
- [ ] No write tool is called
- [ ] Verdict is not COMPLETE (error state)
---
### Case 5: Director Gate Check — No gate; QA planning is a utility
**Fixture:**
- Sprint with valid stories and AC
**Input:** `/qa-plan sprint-003`
**Expected behavior:**
1. Skill generates and writes QA plan
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Skill reaches COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads coding-standards.md test evidence table before assigning test types
- [ ] Assigns BLOCKING or ADVISORY gate level per story type
- [ ] Flags stories with no AC as UNTESTABLE (does not silently skip them)
- [ ] Detects existing plan and offers update path
- [ ] Asks "May I write" before creating or updating the plan file
- [ ] Verdict is COMPLETE when plan is written
---
## Coverage Notes
- The case where `coding-standards.md` is missing (skill cannot assign test types)
is not fixture-tested; behavior would follow the BLOCKED pattern with a note
to restore the standards file.
- Multi-sprint planning (spanning 2 sprints) is not tested; the skill is designed
for one sprint at a time.
- Config/data story type (balance tuning → smoke check) follows the same
assignment pattern as other types in Case 1 and is not separately tested.


@@ -0,0 +1,172 @@
# Skill Test Spec: /regression-suite
## Skill Summary
`/regression-suite` maps test coverage to GDD requirements: it reads the
acceptance criteria from story files in the current sprint (or a specified epic),
then scans `tests/` for corresponding test files and checks whether each AC has
a matching assertion. It produces a coverage report identifying which ACs are
fully covered, partially covered, or untested, and which test files have no
matching AC (orphan tests).
The skill may write a coverage report to `production/qa/` after a "May I write"
ask. No director gates apply. Verdicts: FULL COVERAGE (all ACs have tests),
GAPS FOUND (some ACs are untested), or CRITICAL GAPS (a critical-priority AC
has no test).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: FULL COVERAGE, GAPS FOUND, CRITICAL GAPS
- [ ] Contains "May I write" language (skill may write coverage report)
- [ ] Has a next-step handoff (e.g., `/test-setup` if framework missing, `/qa-plan` if plan missing)
---
## Director Gate Checks
None. `/regression-suite` is a QA analysis utility. No director gates apply.
---
## Test Cases
### Case 1: Full Coverage — All ACs in sprint have corresponding tests
**Fixture:**
- `production/sprints/sprint-004.md` lists 3 stories with 2 ACs each (6 total)
- `tests/unit/` and `tests/integration/` contain test files that match all 6 ACs
(by system name and scenario description)
**Input:** `/regression-suite sprint-004`
**Expected behavior:**
1. Skill reads all 6 ACs from sprint-004 stories
2. Skill scans test files and matches each AC to at least one test assertion
3. All 6 ACs have coverage
4. Skill produces coverage report: "6/6 ACs covered"
5. Skill asks "May I write to `production/qa/regression-sprint-004.md`?"
6. File is written on approval; verdict is FULL COVERAGE
**Assertions:**
- [ ] All 6 ACs appear in the coverage report
- [ ] Each AC is marked as covered with the matching test file referenced
- [ ] Verdict is FULL COVERAGE
- [ ] "May I write" is asked before writing the report
---
### Case 2: Gaps Found — 3 ACs have no tests
**Fixture:**
- Sprint has 5 stories with 8 total ACs
- Tests exist for 5 of the 8 ACs; 3 ACs have no corresponding test file or assertion
**Input:** `/regression-suite`
**Expected behavior:**
1. Skill reads all 8 ACs
2. Skill scans tests — 5 matched, 3 unmatched
3. Coverage report lists the 3 untested ACs by story and AC text
4. Skill asks "May I write to `production/qa/regression-[sprint]-[date].md`?"
5. Report is written; verdict is GAPS FOUND
**Assertions:**
- [ ] The 3 untested ACs are listed by name in the report
- [ ] Matched ACs are also shown (not only the gaps)
- [ ] Verdict is GAPS FOUND (not FULL COVERAGE)
- [ ] Report is written after "May I write" approval
---
### Case 3: Critical AC Untested — CRITICAL GAPS verdict, flagged prominently
**Fixture:**
- Sprint has 4 stories; one story is Priority: Critical with 2 ACs
- One of the critical-priority ACs has no test
**Input:** `/regression-suite`
**Expected behavior:**
1. Skill reads all stories and ACs, noting which stories are critical priority
2. Skill scans tests — the critical AC has no match
3. Report prominently flags: "CRITICAL GAP: [AC text] — no test found (Critical priority story)"
4. Skill recommends blocking story completion until test is added
5. Verdict is CRITICAL GAPS
**Assertions:**
- [ ] Verdict is CRITICAL GAPS (not GAPS FOUND)
- [ ] Critical priority AC is flagged more prominently than normal gaps
- [ ] Recommendation to block story completion is included
- [ ] Non-critical gaps (if any) are also listed
---
### Case 4: Orphan Tests — Test file has no matching AC
**Fixture:**
- `tests/unit/save_system_test.gd` exists with assertions for scenarios
not present in any current story's AC list
- Current sprint stories do not reference save system
**Input:** `/regression-suite`
**Expected behavior:**
1. Skill scans tests and cross-references ACs
2. `save_system_test.gd` assertions do not match any current AC
3. Test file is flagged as ORPHAN TEST in the coverage report
4. Report notes: "Orphan tests may belong to a past or future sprint, or AC was renamed"
5. Verdict is FULL COVERAGE or GAPS FOUND depending on overall AC coverage
(orphan tests are advisory and do not affect the verdict)
**Assertions:**
- [ ] Orphan test is flagged in the report
- [ ] Orphan flag includes the filename and suggestion (past sprint / renamed AC)
- [ ] Orphan tests do not cause a GAPS FOUND verdict on their own
- [ ] Overall verdict reflects AC coverage only
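Orphan detection can be sketched as a keyword-overlap check. The actual matching logic lives in the skill body and is not shown in this spec, so everything below is an assumption: the token rule, the `min_overlap` threshold, and the function names are illustrative only.

```python
import re


def keywords(text: str) -> set[str]:
    """Lowercase word tokens of length >= 4, a crude stand-in for scenario keywords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}


def find_orphan_tests(
    tests: dict[str, list[str]], acs: list[str], min_overlap: int = 2
) -> list[str]:
    """Return test files whose assertion descriptions match no current AC."""
    ac_keywords = [keywords(ac) for ac in acs]
    orphans = []
    for path, descriptions in tests.items():
        matched = any(
            len(keywords(desc) & ac_kw) >= min_overlap
            for desc in descriptions
            for ac_kw in ac_keywords
        )
        if not matched:
            orphans.append(path)
    return orphans
```

The result feeds the report only; it is deliberately kept out of the verdict calculation.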
---
### Case 5: Director Gate Check — No gate; regression-suite is a QA utility
**Fixture:**
- Sprint with stories and test files
**Input:** `/regression-suite`
**Expected behavior:**
1. Skill produces coverage report and writes it
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is FULL COVERAGE, GAPS FOUND, or CRITICAL GAPS — no gate verdict
---
## Protocol Compliance
- [ ] Reads story ACs from sprint files before scanning tests
- [ ] Matches ACs to tests by system name and scenario (not file name alone)
- [ ] Flags critical-priority untested ACs as CRITICAL GAPS
- [ ] Flags orphan tests (exist in tests/ but no AC matches)
- [ ] Asks "May I write" before persisting the coverage report
- [ ] Verdict is FULL COVERAGE, GAPS FOUND, or CRITICAL GAPS
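The three verdicts follow a simple precedence, which the sketch below makes explicit. It assumes each AC has been reduced to two booleans, `covered` and `critical`; how "partially covered" ACs map to a verdict is not stated in this spec and is not modeled here.

```python
def coverage_verdict(acs: list[dict]) -> str:
    """Each AC dict has 'covered' (bool) and 'critical' (bool) keys."""
    untested = [ac for ac in acs if not ac["covered"]]
    # A single untested critical-priority AC outranks any number of normal gaps.
    if any(ac["critical"] for ac in untested):
        return "CRITICAL GAPS"
    if untested:
        return "GAPS FOUND"
    return "FULL COVERAGE"
```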
---
## Coverage Notes
- The heuristic for matching an AC to a test (by system name + scenario keywords)
is approximate; exact matching logic is defined in the skill body.
- Integration test coverage is mapped the same way as unit test coverage; no
distinction in verdicts is made between the two.
- This skill does not run the tests — it maps AC text to test assertions. Test
execution is handled by the CI pipeline.


@@ -0,0 +1,177 @@
# Skill Test Spec: /release-checklist
## Skill Summary
`/release-checklist` generates an internal release readiness checklist covering:
sprint story completion, open bug severity, QA sign-off status, build stability,
and changelog readiness. It is an internal gate — not a platform/store checklist
(that is `/launch-checklist`). When a previous release checklist exists, it shows
a delta of resolved and newly introduced issues.
The skill writes its checklist report to `production/releases/release-checklist-[date].md`
after a "May I write" ask. No director gates apply — `/gate-check` handles
formal phase gate logic. Verdicts: RELEASE READY, RELEASE BLOCKED, or CONCERNS.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: RELEASE READY, RELEASE BLOCKED, CONCERNS
- [ ] Contains "May I write" collaborative protocol language before writing the report
- [ ] Has a next-step handoff (e.g., `/launch-checklist` for external or `/gate-check` for phase)
---
## Director Gate Checks
None. `/release-checklist` is an internal audit utility. Formal phase advancement
is managed by `/gate-check`.
---
## Test Cases
### Case 1: Happy Path — All Sprint Stories Complete, QA Passed, RELEASE READY
**Fixture:**
- `production/sprints/sprint-008.md` — all stories are `Status: Done`
- No open bugs with severity HIGH or CRITICAL in `production/bugs/`
- `production/qa/qa-plan-sprint-008.md` has QA sign-off annotation
- Changelog entry for this version exists
- `production/stage.txt` contains `Polish`
**Input:** `/release-checklist`
**Expected behavior:**
1. Skill reads sprint-008: all stories Done
2. Skill reads bugs: no HIGH or CRITICAL open bugs
3. Skill confirms QA plan has sign-off
4. Skill confirms changelog entry exists
5. All checks pass; skill asks "May I write to
`production/releases/release-checklist-2026-04-06.md`?"
6. Report written; verdict is RELEASE READY
**Assertions:**
- [ ] All 4 check categories are evaluated (stories, bugs, QA, changelog)
- [ ] All items appear with PASS markers
- [ ] Verdict is RELEASE READY
- [ ] "May I write" is asked before writing
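The report path in step 5 follows the `release-checklist-[date].md` pattern from the Skill Summary. A sketch, assuming the date is rendered in ISO 8601 form as both example filenames in this spec suggest; `checklist_path` is a hypothetical helper.

```python
from datetime import date


def checklist_path(day: date) -> str:
    """Build the report path for a given release-checklist date (ISO 8601 assumed)."""
    return f"production/releases/release-checklist-{day.isoformat()}.md"
```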
---
### Case 2: Open HIGH Severity Bugs — RELEASE BLOCKED
**Fixture:**
- All sprint stories are Done
- `production/bugs/` contains 2 open bugs with severity HIGH
**Input:** `/release-checklist`
**Expected behavior:**
1. Skill reads sprint — stories complete
2. Skill reads bugs — 2 HIGH severity bugs open
3. Skill reports: "RELEASE BLOCKED — 2 open HIGH severity bugs must be resolved"
4. Both bug filenames are listed in the report
5. Verdict is RELEASE BLOCKED
**Assertions:**
- [ ] Verdict is RELEASE BLOCKED (not CONCERNS)
- [ ] Both bug filenames are listed explicitly
- [ ] Skill makes clear HIGH severity bugs are blocking (not advisory)
---
### Case 3: Changelog Not Generated — CONCERNS
**Fixture:**
- All stories Done, no HIGH/CRITICAL bugs
- No changelog entry found for the current version/sprint
**Input:** `/release-checklist`
**Expected behavior:**
1. Skill checks all items
2. Changelog check fails: no changelog entry found
3. Skill reports: "CONCERNS — Changelog not generated for this release"
4. Skill suggests running `/changelog` to generate it
5. Verdict is CONCERNS (advisory — not a hard block)
**Assertions:**
- [ ] Verdict is CONCERNS (not RELEASE BLOCKED — changelog is advisory)
- [ ] `/changelog` is suggested as the remediation
- [ ] Other passing checks are shown in the report
- [ ] Missing changelog is described as advisory, not blocking
---
### Case 4: Previous Release Checklist Exists — Delta From Last Release
**Fixture:**
- `production/releases/release-checklist-2026-03-20.md` exists
- Previous: 1 story was incomplete, 1 HIGH bug open
- Current: all stories Done, HIGH bug resolved, but 1 new MEDIUM bug has appeared
**Input:** `/release-checklist`
**Expected behavior:**
1. Skill finds the previous checklist and loads it
2. New checklist is generated and compared:
- Newly resolved: "Story [X] — was open, now Done"
- Newly resolved: "HIGH bug [filename] — was open, now closed"
- New item: "1 MEDIUM bug appeared (advisory)"
3. Delta section shows all changes prominently
4. Verdict is CONCERNS (MEDIUM bug is advisory, not blocking)
**Assertions:**
- [ ] Delta section appears in the report with resolved and new items
- [ ] Newly resolved items from the previous checklist are noted
- [ ] New items not present in the previous checklist are highlighted
- [ ] Verdict reflects current state (not previous state)
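The delta is a set difference in both directions. The sketch assumes each open item (incomplete story or open bug) can be reduced to a stable identifier string; the identifiers in the usage example are made up for illustration.

```python
def checklist_delta(previous_open: set[str], current_open: set[str]) -> dict[str, list[str]]:
    """Compare open items between the previous and current checklist."""
    return {
        # Open last time, no longer open now.
        "resolved": sorted(previous_open - current_open),
        # Not open last time, open now.
        "new": sorted(current_open - previous_open),
    }
```

For the fixture in this case, calling it with `{"story-x", "bug-high-001"}` and `{"bug-medium-007"}` would list two resolved items and one new item. The verdict is still computed from the current set alone.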
---
### Case 5: Director Gate Check — No gate; release-checklist is an internal audit
**Fixture:**
- Active sprint with stories and bug reports
**Input:** `/release-checklist`
**Expected behavior:**
1. Skill runs the full checklist and writes the report
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is RELEASE READY, RELEASE BLOCKED, or CONCERNS — no gate verdict
---
## Protocol Compliance
- [ ] Checks sprint story completion status
- [ ] Checks open bug severity (CRITICAL/HIGH = BLOCKED; MEDIUM/LOW = CONCERNS)
- [ ] Checks QA plan sign-off status
- [ ] Checks changelog existence
- [ ] Compares against previous checklist when one exists
- [ ] Asks "May I write" before writing the report
- [ ] Verdict is RELEASE READY, RELEASE BLOCKED, or CONCERNS
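The verdict rules spread across Cases 1 to 4 and the Coverage Notes can be summarized in one function. This is a sketch, not the skill's implementation: it covers story completion, bug severity, and changelog presence only, because the spec does not state how a missing QA sign-off or a failed build maps to a verdict.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def release_verdict(
    all_stories_done: bool, open_bug_severities: list[str], changelog_exists: bool
) -> str:
    """Combine the checks into RELEASE READY, CONCERNS, or RELEASE BLOCKED."""
    severities = {s.upper() for s in open_bug_severities}
    # Hard blocks: incomplete stories or any open CRITICAL/HIGH bug.
    if not all_stories_done or severities & BLOCKING_SEVERITIES:
        return "RELEASE BLOCKED"
    # Advisory issues: remaining MEDIUM/LOW bugs or a missing changelog.
    if severities or not changelog_exists:
        return "CONCERNS"
    return "RELEASE READY"
```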
---
## Coverage Notes
- Build stability verification (no failed CI runs) is listed as a check category
but relies on external CI system state; the skill notes this as a MANUAL CHECK
if CI integration is not configured.
- CRITICAL bugs always result in RELEASE BLOCKED regardless of other items;
this is equivalent to the HIGH severity case in Case 2.
- Stories with `Status: In Review` (not Done) are treated as incomplete
and result in RELEASE BLOCKED; this edge case follows the same pattern
as the HIGH bug case.


@@ -0,0 +1,180 @@
# Skill Test Spec: /reverse-document
## Skill Summary
`/reverse-document` generates design or architecture documentation from existing
source code. It reads the specified source file(s), infers design intent from
class structure, method names, constants, and comments, and produces either a
GDD skeleton (for gameplay systems) or an architecture overview (for technical
systems). The output is a best-effort inference — magic numbers and undocumented
logic may result in a PARTIAL verdict.
The skill asks "May I write to [inferred path]?" before creating the document.
No director gates apply. Verdicts: COMPLETE (clean inference) or PARTIAL (some
fields are ambiguous and need human review).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, PARTIAL
- [ ] Contains "May I write" collaborative protocol language before writing the doc
- [ ] Has a next-step handoff (e.g., `/design-review` to validate the generated doc)
---
## Director Gate Checks
None. `/reverse-document` is a documentation utility. No director gates apply.
---
## Test Cases
### Case 1: Well-Structured Source — Accurate design doc skeleton produced
**Fixture:**
- `src/gameplay/health_system.gd` exists with:
- `@export var max_health: int = 100`
- `func take_damage(amount: int)` with clamping logic
- `signal health_changed(new_value: int)`
- Docstrings on all public methods
**Input:** `/reverse-document src/gameplay/health_system.gd`
**Expected behavior:**
1. Skill reads the source file and identifies the health system
2. Skill infers design intent: max health, take_damage behavior, health signal
3. Skill produces GDD skeleton for health system with 8 required sections:
Overview, Player Fantasy, Detailed Rules, Formulas, Edge Cases, Dependencies,
Tuning Knobs, Acceptance Criteria
4. Formulas section includes the inferred clamping formula
5. Tuning Knobs notes `max_health = 100` as a configurable value
6. Skill asks "May I write to `design/gdd/health-system.md`?"
7. File written; verdict is COMPLETE
**Assertions:**
- [ ] All 8 required GDD sections are present in the output
- [ ] `max_health = 100` appears as a Tuning Knob
- [ ] Clamping formula is captured in the Formulas section
- [ ] "May I write" is asked with the inferred path
- [ ] Verdict is COMPLETE
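For reference, the kind of clamping formula the Formulas section would be expected to capture can be written out as below. The fixture only says `take_damage` has "clamping logic", so the exact bounds are an assumption: health is clamped to `[0, max_health]`. The sketch is in Python rather than GDScript and is not the fixture's source.

```python
def take_damage(current_health: int, amount: int, max_health: int = 100) -> int:
    """Return the new health after damage, clamped to the range [0, max_health]."""
    # Assumed formula: new_health = clamp(current_health - amount, 0, max_health)
    return max(0, min(max_health, current_health - amount))
```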
---
### Case 2: Ambiguous Source — Magic Numbers, PARTIAL Verdict
**Fixture:**
- `src/gameplay/enemy_ai.gd` exists with:
- Inline magic numbers: `if distance < 150:`, `speed = 3.5`
- No comments or docstrings
- Complex state machine logic that is not self-explanatory
**Input:** `/reverse-document src/gameplay/enemy_ai.gd`
**Expected behavior:**
1. Skill reads the file and detects magic numbers with no context
2. Skill produces a GDD skeleton with notes: "AMBIGUOUS VALUE: 150 (unknown units —
is this pixels, world units, or tiles?)"
3. Skill marks the Formulas and Tuning Knobs sections as requiring human review
4. Skill asks "May I write to `design/gdd/enemy-ai.md`?" with PARTIAL advisory
5. File written with PARTIAL markers; verdict is PARTIAL
**Assertions:**
- [ ] AMBIGUOUS VALUE annotations appear for magic numbers
- [ ] Sections needing human review are marked explicitly
- [ ] Verdict is PARTIAL (not COMPLETE)
- [ ] File is still written — PARTIAL is not a blocking failure
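One way to surface candidates for AMBIGUOUS VALUE annotations is a scan for bare numeric literals. The sketch below is a rough heuristic under stated assumptions: it treats everything after `#` as a comment (so a `#` inside a string literal is mishandled), ignores `0` and `1` by default, and does not distinguish exported or named constants from inline values. `find_magic_numbers` is a hypothetical name.

```python
import re

# A number not glued to an identifier or to another numeric fragment.
NUMERIC_LITERAL = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?![\w.])")


def find_magic_numbers(source: str, ignore=("0", "1")) -> list[tuple[int, str]]:
    """Return (line_number, literal) pairs for numeric literals outside comments."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop trailing GDScript comments
        for match in NUMERIC_LITERAL.finditer(code):
            if match.group() not in ignore:
                hits.append((line_no, match.group()))
    return hits
```

Each hit is only a prompt for human review; deciding whether `150` means pixels, world units, or tiles still requires context the source file does not provide.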
---
### Case 3: Multiple Interdependent Files — Cross-System Overview Produced
**Fixture:**
- User provides 2 source files: `combat_system.gd` and `damage_resolver.gd`
- The files reference each other (combat calls damage_resolver)
**Input:** `/reverse-document src/gameplay/combat_system.gd src/gameplay/damage_resolver.gd`
**Expected behavior:**
1. Skill reads both files and detects the dependency relationship
2. Skill produces a cross-system architecture overview (not individual GDDs)
3. Overview describes: Combat System → Damage Resolver interaction, shared
interfaces, data flow between the two
4. Skill asks "May I write to `docs/architecture/combat-damage-overview.md`?"
5. Overview written after approval; verdict is COMPLETE (or PARTIAL if ambiguous)
**Assertions:**
- [ ] Both files are analyzed together (not as two separate docs)
- [ ] Cross-system dependency is documented in the output
- [ ] Output file is written to `docs/architecture/` (not `design/gdd/`)
- [ ] Verdict is COMPLETE or PARTIAL
---
### Case 4: Source File Not Found — Error
**Fixture:**
- `src/gameplay/inventory_system.gd` does not exist
**Input:** `/reverse-document src/gameplay/inventory_system.gd`
**Expected behavior:**
1. Skill attempts to read the specified file — not found
2. Skill outputs: "Source file not found: src/gameplay/inventory_system.gd"
3. Skill suggests checking the path or running `/map-systems` to identify
the correct source file
4. No document is created
**Assertions:**
- [ ] Error message names the missing file with the full path
- [ ] Alternative suggestion (check path or `/map-systems`) is provided
- [ ] No write tool is called
- [ ] No verdict is issued (error state)
---
### Case 5: Director Gate Check — No gate; reverse-document is a utility
**Fixture:**
- Well-structured source file exists
**Input:** `/reverse-document src/gameplay/health_system.gd`
**Expected behavior:**
1. Skill generates and writes the design doc
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE or PARTIAL — no gate verdict involved
---
## Protocol Compliance
- [ ] Reads source file(s) before generating any content
- [ ] Produces all 8 required GDD sections when target is a gameplay system
- [ ] Annotates ambiguous values with AMBIGUOUS VALUE markers
- [ ] Produces cross-system overview (not individual GDDs) for multiple files
- [ ] Asks "May I write" before creating any output file
- [ ] Verdict is COMPLETE (clean inference) or PARTIAL (ambiguous fields)
---
## Coverage Notes
- Architecture overview format (for technical/infrastructure systems) differs
from GDD format; the inferred output type is determined by the nature of the
source file (gameplay logic → GDD; engine/infra code → architecture doc).
- The case where a source file is readable but contains only auto-generated
boilerplate with no meaningful logic is not tested; skill would likely produce
a near-empty skeleton with a PARTIAL verdict.
- C# and Blueprint source files follow the same inference pattern as GDScript;
language-specific differences are handled in the skill body.


@@ -0,0 +1,182 @@
# Skill Test Spec: /setup-engine
## Skill Summary
`/setup-engine` configures the project's engine, language, rendering backend,
physics engine, specialist agent assignments, and naming conventions by
populating `technical-preferences.md`. It accepts an optional engine argument
(e.g., `/setup-engine godot`) to skip the engine-selection step. For each
section of `technical-preferences.md`, the skill presents a draft and asks
"May I write to `technical-preferences.md`?" before updating.
The skill also populates the specialist routing table (file extension → agent
mappings) based on the chosen engine. It has no director gates — configuration
is a technical utility task. The verdict is always COMPLETE when the file is
fully written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before updating technical-preferences.md
- [ ] Has a next-step handoff (e.g., `/brainstorm` or `/start` depending on flow)
---
## Director Gate Checks
None. `/setup-engine` is a technical configuration skill. No director gates apply.
---
## Test Cases
### Case 1: Godot 4 + GDScript — Full engine configuration
**Fixture:**
- `technical-preferences.md` contains only placeholders
- Engine argument provided: `godot`
**Input:** `/setup-engine godot`
**Expected behavior:**
1. Skill skips engine-selection step (argument provided)
2. Skill presents language options for Godot: GDScript or C#
3. User selects GDScript
4. Skill drafts all engine sections: engine/language/rendering/physics fields,
naming conventions (snake_case for GDScript), specialist assignments
(godot-specialist, gdscript-specialist, godot-shader-specialist, etc.)
5. Skill populates the routing table: `.gd` → gdscript-specialist, `.gdshader` →
godot-shader-specialist, `.tscn` → godot-specialist
6. Skill asks "May I write to `technical-preferences.md`?"
7. File is written after approval; verdict is COMPLETE
**Assertions:**
- [ ] Engine field is set to Godot 4 (not a placeholder)
- [ ] Language field is set to GDScript
- [ ] Naming conventions are GDScript-appropriate (snake_case)
- [ ] Routing table includes `.gd`, `.gdshader`, and `.tscn` entries
- [ ] Specialists are assigned (not placeholders)
- [ ] "May I write" is asked before writing
- [ ] Verdict is COMPLETE
---
### Case 2: Unity + C# — Unity-specific configuration
**Fixture:**
- `technical-preferences.md` contains only placeholders
- Engine argument provided: `unity`
**Input:** `/setup-engine unity`
**Expected behavior:**
1. Skill sets engine to Unity, language to C#
2. Naming conventions are C#-appropriate (PascalCase for classes, camelCase for fields)
3. Specialist assignments reference unity-specialist, csharp-specialist
4. Routing table: `.cs` → csharp-specialist, `.asmdef` → unity-specialist,
`.unity` (scene) → unity-specialist
5. Skill asks "May I write to `technical-preferences.md`?" and writes on approval
**Assertions:**
- [ ] Engine field is set to Unity (not Godot or Unreal)
- [ ] Language field is set to C#
- [ ] Naming conventions reflect C# conventions
- [ ] Routing table includes `.cs` and `.unity` entries
- [ ] Verdict is COMPLETE
---
### Case 3: Unreal + Blueprint — Unreal-specific configuration
**Fixture:**
- `technical-preferences.md` contains only placeholders
- Engine argument provided: `unreal`
**Input:** `/setup-engine unreal`
**Expected behavior:**
1. Skill sets engine to Unreal Engine 5, primary language to Blueprint (Visual Scripting)
2. Specialist assignments reference unreal-specialist, blueprint-specialist
3. Routing table: `.uasset` → blueprint-specialist or unreal-specialist,
`.umap` → unreal-specialist
4. Performance budgets are pre-set with Unreal defaults (e.g., higher draw call budget)
5. Skill asks "May I write" and writes on approval; verdict is COMPLETE
**Assertions:**
- [ ] Engine field is set to Unreal Engine 5
- [ ] Routing table includes `.uasset` and `.umap` entries
- [ ] Blueprint specialist is assigned
- [ ] Verdict is COMPLETE
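
The routing tables in Cases 1 through 3 can be pictured as a plain extension-to-agent lookup. The following is a minimal Python sketch, not the skill's actual implementation: the `ROUTING` layout and the `route` helper are hypothetical names, and only the extension/agent pairs come from the cases above. Case 3 allows `.uasset` to go to either blueprint-specialist or unreal-specialist; the sketch arbitrarily picks blueprint-specialist.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical layout; only the extension -> agent pairs are taken from Cases 1-3.
ROUTING = {
    "godot": {
        ".gd": "gdscript-specialist",
        ".gdshader": "godot-shader-specialist",
        ".tscn": "godot-specialist",
    },
    "unity": {
        ".cs": "csharp-specialist",
        ".asmdef": "unity-specialist",
        ".unity": "unity-specialist",
    },
    "unreal": {
        ".uasset": "blueprint-specialist",  # Case 3 also allows unreal-specialist
        ".umap": "unreal-specialist",
    },
}


def route(engine: str, file_path: str) -> Optional[str]:
    """Return the agent for a file, or None when no mapping exists."""
    suffix = PurePosixPath(file_path).suffix.lower()
    return ROUTING.get(engine, {}).get(suffix)
```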
---
### Case 4: Engine Already Configured — Offers to reconfigure specific sections
**Fixture:**
- `technical-preferences.md` has engine set to Godot 4 with all fields populated
- No engine argument provided
**Input:** `/setup-engine`
**Expected behavior:**
1. Skill reads `technical-preferences.md` and detects fully configured engine (Godot 4)
2. Skill reports: "Engine already configured as Godot 4 + GDScript"
3. Skill presents options: reconfigure all, or reconfigure a specific section only
(Engine/Language, Naming Conventions, Specialists, Performance Budgets)
4. User selects "Reconfigure Performance Budgets only"
5. Only the performance budget section is updated; all other fields unchanged
6. Skill asks "May I write to `technical-preferences.md`?" and writes on approval
**Assertions:**
- [ ] Skill does NOT overwrite all fields when only a section update was requested
- [ ] User is offered section-specific reconfiguration
- [ ] Only the selected section is modified in the written file
- [ ] Verdict is COMPLETE
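
The "only the selected section is modified" assertion in Case 4 amounts to a copy-and-replace on one key. This is a rough sketch under the assumption that the parsed `technical-preferences.md` can be held as a dict of sections; the function name, section keys, and budget values are all hypothetical and are not taken from the skill.

```python
import copy


def reconfigure_section(config: dict, section: str, new_values: dict) -> dict:
    """Return a copy of config in which only `section` is replaced."""
    if section not in config:
        raise KeyError(f"unknown section: {section}")
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    updated[section] = dict(new_values)
    return updated


# Hypothetical parsed config for the Case 4 fixture (values are placeholders).
current = {
    "engine": {"name": "Godot 4", "language": "GDScript"},
    "naming": {"style": "snake_case"},
    "performance_budgets": {"draw_calls": 1000},
}
proposed = reconfigure_section(current, "performance_budgets", {"draw_calls": 1500})
```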
---
### Case 5: Director Gate Check — No gate; setup-engine is a utility skill
**Fixture:**
- Fresh project with no engine configured
**Input:** `/setup-engine godot`
**Expected behavior:**
1. Skill completes full engine configuration
2. No director agents are spawned at any point
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Presents draft configuration before asking to write
- [ ] Asks "May I write to `technical-preferences.md`?" before writing
- [ ] Respects engine argument when provided (skips selection step)
- [ ] Detects existing config and offers partial reconfigure
- [ ] Routing table is populated for all key file types for the chosen engine
- [ ] Verdict is COMPLETE after file is written
---
## Coverage Notes
- Godot 4 + C# (instead of GDScript) follows the same flow as Case 1 with
different naming conventions and the godot-csharp-specialist assignment.
This variant is not separately tested.
- The engine-version-specific guidance (e.g., Godot 4.6 knowledge gap warning
from VERSION.md) is surfaced by the skill but not assertion-tested here.
- Performance budget defaults per engine are noted as engine-specific but
exact default values are not assertion-tested.


@@ -0,0 +1,185 @@
# Skill Test Spec: /skill-improve
## Skill Summary
`/skill-improve` runs an automated test-fix-retest improvement loop on a skill
file. It invokes `/skill-test static` (and optionally `/skill-test category`) to
establish a baseline score, diagnoses the failing checks, proposes targeted fixes
to the SKILL.md file, asks "May I write the improvements to [skill path]?", applies
the fixes, and re-runs the tests to confirm improvement.
If the proposed fix makes the target skill worse (a regression), the fix is reverted
(with user confirmation) rather than kept. If the target skill already has 0 failures,
`/skill-improve` exits immediately without making changes. No director gates apply. Verdicts:
IMPROVED (score went up), NO CHANGE (no improvements possible or user declined), or
REVERTED (fix was applied but caused regression and was reverted).
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: IMPROVED, NO CHANGE, REVERTED
- [ ] Contains "May I write" collaborative protocol language before applying fixes
- [ ] Has a next-step handoff (e.g., run `/skill-test spec` to validate behavioral compliance)
---
## Director Gate Checks
None. `/skill-improve` is a meta-utility skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Skill With 2 Static Failures, Both Fixed, IMPROVED
**Fixture:**
- `.claude/skills/some-skill/SKILL.md` has 2 static failures:
- Check 4: no "May I write" language despite having Write in allowed-tools
- Check 5: no next-step handoff at the end
**Input:** `/skill-improve some-skill`
**Expected behavior:**
1. Skill runs `/skill-test static some-skill` — baseline: 5/7 checks pass
2. Skill diagnoses the 2 failing checks (4 and 5)
3. Skill proposes fixes:
- Add "May I write" language to the appropriate phase
- Add a next-step handoff section at the end
4. Skill asks "May I write improvements to `.claude/skills/some-skill/SKILL.md`?"
5. Fixes applied; `/skill-test static some-skill` re-run — now 7/7 checks pass
6. Verdict is IMPROVED (5→7)
**Assertions:**
- [ ] Baseline score is established before any changes (5/7)
- [ ] Both failing checks are diagnosed and addressed in the proposed fix
- [ ] "May I write" is asked before applying the fix
- [ ] Re-test confirms improvement (7/7)
- [ ] Verdict is IMPROVED with before/after score shown
---
### Case 2: Fix Causes Regression — Score Comparison Shows Regression, REVERTED
**Fixture:**
- `.claude/skills/some-skill/SKILL.md` has 1 static failure (missing handoff)
- Proposed fix inadvertently removes the verdict keywords section
(introducing a new failure)
**Input:** `/skill-improve some-skill`
**Expected behavior:**
1. Baseline: 6/7 checks pass (1 failure: missing handoff)
2. Skill proposes fix and asks "May I write improvements?"
3. Fix is applied; re-test runs
4. Re-test result: 5/7 (the handoff check still fails, and removing the verdict keywords adds a second failure)
5. Skill detects regression: score went DOWN
6. Skill asks user: "Fix caused a regression (6→5). May I revert the changes?"
7. User confirms; changes are reverted; verdict is REVERTED
**Assertions:**
- [ ] Re-test score is compared to baseline before finalizing
- [ ] Regression is detected when score decreases
- [ ] User is asked to confirm revert (not automatic)
- [ ] File is reverted on user confirmation
- [ ] Verdict is REVERTED
---
### Case 3: Skill With Category Assignment — Baseline Captures Both Scores
**Fixture:**
- `.claude/skills/gate-check/SKILL.md` is a gate skill with 1 static failure
and 2 category (G-criteria) failures
- `tests/skills/quality-rubric.md` has Gate Skills section
**Input:** `/skill-improve gate-check`
**Expected behavior:**
1. Skill runs both static and category tests for the baseline:
- Static: 6/7 checks pass
- Category: 3/5 G-criteria pass
2. Combined baseline: 9/12
3. Skill diagnoses all 3 failures and proposes fixes
4. "May I write improvements to `.claude/skills/gate-check/SKILL.md`?"
5. Fixes applied; both test types re-run
6. Re-test: static 7/7, category 5/5 = 12/12
7. Verdict is IMPROVED (9→12)
**Assertions:**
- [ ] Both static and category scores are captured in the baseline
- [ ] Combined score is used for comparison (not just one type)
- [ ] All 3 failures are addressed in the proposed fix
- [ ] Re-test confirms improvement in both score types
- [ ] Verdict is IMPROVED with combined before/after
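
The combined score in Case 3 is a sum of passed checks over a sum of total checks. A small sketch of that arithmetic follows; the `Score` class is a hypothetical helper, and only the numbers (6/7, 3/5, 7/7, 5/5) come from the case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    passed: int
    total: int

    def combine(self, other: "Score") -> "Score":
        """Add two scores component-wise (static + category)."""
        return Score(self.passed + other.passed, self.total + other.total)

    def __str__(self) -> str:
        return f"{self.passed}/{self.total}"


baseline = Score(6, 7).combine(Score(3, 5))  # static + category before the fix
retest = Score(7, 7).combine(Score(5, 5))    # static + category after the fix
print(f"{baseline} -> {retest}")
```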
---
### Case 4: Skill Already Perfect — No Improvements Needed
**Fixture:**
- `.claude/skills/brainstorm/SKILL.md` has no static failures
- Category score is also 5/5 (if applicable)
**Input:** `/skill-improve brainstorm`
**Expected behavior:**
1. Skill runs `/skill-test static brainstorm` — 7/7 checks pass
2. If category applies: 5/5 criteria pass
3. Skill outputs: "No improvements needed — brainstorm is fully compliant"
4. Skill exits without proposing any changes
5. No "May I write" is asked; no files are modified
6. Verdict is NO CHANGE
**Assertions:**
- [ ] Skill exits immediately after confirming 0 failures
- [ ] "No improvements needed" message is shown
- [ ] No changes are proposed
- [ ] No "May I write" is asked
- [ ] Verdict is NO CHANGE
---
### Case 5: Director Gate Check — No gate; skill-improve is a meta utility
**Fixture:**
- Skill with at least 1 static failure
**Input:** `/skill-improve some-skill`
**Expected behavior:**
1. Skill runs the test-fix-retest loop
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is IMPROVED, NO CHANGE, or REVERTED — no gate verdict
---
## Protocol Compliance
- [ ] Always establishes a baseline score before proposing any changes
- [ ] Shows before/after score comparison in the output
- [ ] Asks "May I write" before applying any fix
- [ ] Detects regressions by comparing re-test score to baseline
- [ ] Asks for user confirmation before reverting (not automatic)
- [ ] Ends with IMPROVED, NO CHANGE, or REVERTED verdict
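
The verdict rules above reduce to a comparison between the baseline and the re-test score. The sketch below is one possible reading, not the skill's own logic: the function name and parameters are hypothetical, treating an unchanged score as NO CHANGE is an assumption, and the spec does not say what happens when the user declines a revert, so that path raises instead of guessing.

```python
from typing import Optional


def improvement_verdict(baseline: int, retest: Optional[int], reverted: bool = False) -> str:
    """Map baseline and re-test pass counts to a /skill-improve verdict.

    retest is None when no fix was applied (0 failures, or the user declined).
    """
    if retest is None:
        return "NO CHANGE"
    if retest > baseline:
        return "IMPROVED"
    if retest < baseline:
        if reverted:
            return "REVERTED"
        # Not defined by the spec: regression kept because the user declined the revert.
        raise ValueError("verdict for an unreverted regression is not specified")
    return "NO CHANGE"  # assumption: an equal score counts as no change
```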
---
## Coverage Notes
- The improvement loop is designed to run only one fix-retest cycle per
invocation; running multiple iterations requires re-invoking `/skill-improve`.
- Behavioral compliance (spec-mode test results) is not included in the
improvement loop — only structural (static) and category scores are automated.
- The case where the skill file cannot be read (permissions error or missing file)
is not tested; this would result in an error before the baseline is established.


@@ -0,0 +1,188 @@
# Skill Test Spec: /skill-test
## Skill Summary
`/skill-test` validates skill files for structural correctness, behavioral
compliance, and category-rubric scoring. It operates in three primary modes:
- **static**: Checks a single skill file for structural requirements
(frontmatter fields, phase headings, verdict keywords, "May I write" language,
next-step handoff) without needing a fixture. Produces a per-check PASS/FAIL
table.
- **spec**: Reads a test spec file from `tests/skills/` and evaluates the skill
against each test case assertion, producing a case-by-case verdict.
- **audit**: Produces a coverage table of all skills in `.claude/skills/` and
all agents in `.claude/agents/`, showing which have spec files and which do not.
An additional **category** mode reads the quality rubric for a skill category
(e.g., gate skills) and scores the skill against rubric criteria. The verdict
system differs by mode.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdicts: COMPLIANT, NON-COMPLIANT, WARNINGS (static mode); PASS, FAIL, PARTIAL (spec mode); COMPLETE (audit mode)
- [ ] Does NOT contain "May I write" language (skill is read-only in all modes)
- [ ] Has a next-step handoff (e.g., `/skill-improve` to fix issues found)
---
## Director Gate Checks
None. `/skill-test` is a meta-utility skill. No director gates apply.
---
## Test Cases
### Case 1: Static Mode — Well-formed skill, all 7 checks pass, COMPLIANT
**Fixture:**
- `.claude/skills/brainstorm/SKILL.md` exists and is well-formed:
- Has all required frontmatter fields
- Has ≥2 phase headings
- Has verdict keywords
- Has "May I write" language
- Has a next-step handoff
- Documents director gates
- Documents gate mode behavior (lean/solo skips)
**Input:** `/skill-test static brainstorm`
**Expected behavior:**
1. Skill reads `.claude/skills/brainstorm/SKILL.md`
2. Skill runs all 7 structural checks
3. All 7 checks pass
4. Skill outputs a PASS/FAIL table with all 7 checks marked PASS
5. Verdict is COMPLIANT
**Assertions:**
- [ ] Exactly 7 structural checks are reported
- [ ] All 7 are marked PASS
- [ ] Verdict is COMPLIANT
- [ ] No files are written
---
### Case 2: Static Mode — Skill Missing "May I Write" Despite Write Tool in allowed-tools
**Fixture:**
- `.claude/skills/some-skill/SKILL.md` has `Write` in `allowed-tools` frontmatter
- The skill body has no "May I write" or "May I update" language
**Input:** `/skill-test static some-skill`
**Expected behavior:**
1. Skill reads `some-skill/SKILL.md`
2. Check 4 (collaborative write protocol) fails: `Write` in allowed-tools but no
"May I write" language found
3. All other checks may pass
4. Verdict is NON-COMPLIANT with Check 4 as the failing assertion
5. Output lists Check 4 as FAIL with explanation
**Assertions:**
- [ ] Check 4 is marked FAIL
- [ ] Explanation identifies the specific mismatch (Write tool without "May I write" language)
- [ ] Verdict is NON-COMPLIANT
- [ ] Other passing checks are shown (not only the failure)
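
Check 4 in Case 2 is a mismatch test: `Write` is allowed, but the body never asks for permission. A minimal sketch of that check is shown below. It assumes the frontmatter sits between the first two `---` markers and that `allowed-tools` is a single comma-separated line; both are assumptions about the file layout, and `check_write_protocol` is a hypothetical name. Passing skills that do not list `Write` is also an assumption.

```python
import re


def check_write_protocol(skill_md: str) -> bool:
    """Return True when the collaborative write check passes for SKILL.md text."""
    frontmatter, body = "", skill_md
    if skill_md.startswith("---"):
        parts = skill_md.split("---", 2)
        if len(parts) == 3:
            frontmatter, body = parts[1], parts[2]

    tools_line = re.search(r"^allowed-tools:\s*(.*)$", frontmatter, re.MULTILINE)
    tools = re.findall(r"[A-Za-z]+", tools_line.group(1)) if tools_line else []

    if "Write" not in tools:
        return True  # assumption: read-only skills need no "May I write" language
    # Case 2 accepts either "May I write" or "May I update" wording.
    return re.search(r"May I (write|update)", body, re.IGNORECASE) is not None
```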
---
### Case 3: Spec Mode — gate-check Skill Evaluated Against Spec
**Fixture:**
- `tests/skills/gate-check.md` exists with 5 test cases
- `.claude/skills/gate-check/SKILL.md` exists
**Input:** `/skill-test spec gate-check`
**Expected behavior:**
1. Skill reads both the skill file and the spec file
2. Skill evaluates each of the 5 test case assertions against the skill's behavior
3. For each case: PASS if skill behavior matches spec assertions, FAIL if not
4. Skill produces a case-by-case result table
5. Overall verdict: PASS (all 5), PARTIAL (some), or FAIL (majority failing)
**Assertions:**
- [ ] All 5 test cases from the spec are evaluated
- [ ] Each case has an individual PASS/FAIL result
- [ ] Overall verdict is PASS, PARTIAL, or FAIL based on case results
- [ ] No files are written
---
### Case 4: Audit Mode — Coverage Table of All Skills and Agents
**Fixture:**
- `.claude/skills/` contains 72+ skill directories
- `.claude/agents/` contains 49+ agent files
- `tests/skills/` contains spec files for a subset of skills
**Input:** `/skill-test audit`
**Expected behavior:**
1. Skill enumerates all skills in `.claude/skills/` and all agents in `.claude/agents/`
2. Skill checks `tests/skills/` for a corresponding spec file for each
3. Skill produces a coverage table:
- Each skill/agent listed
- "Has Spec" column: YES or NO
- Summary: "X of Y skills have specs; A of B agents have specs"
4. Verdict is COMPLETE
**Assertions:**
- [ ] All skill directories are enumerated (not just a sample)
- [ ] "Has Spec" column is accurate for each entry
- [ ] Summary counts are correct
- [ ] Verdict is COMPLETE
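
The coverage table in Case 4 can be approximated with a directory walk. The sketch below covers skills only and assumes a spec for skill `<name>` lives at `tests/skills/<name>.md`, which matches the `tests/skills/gate-check.md` path in Case 3 but is otherwise an assumption; how agent specs are located is not stated in this spec, so agents are left out. Function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def spec_coverage(skills_dir: Path, specs_dir: Path) -> Dict[str, bool]:
    """Map each skill directory name to whether `<name>.md` exists in specs_dir."""
    return {
        skill.name: (specs_dir / f"{skill.name}.md").is_file()
        for skill in sorted(skills_dir.iterdir())
        if skill.is_dir()
    }


def coverage_summary(coverage: Dict[str, bool], kind: str = "skills") -> str:
    """Format the summary line, e.g. '1 of 2 skills have specs'."""
    return f"{sum(coverage.values())} of {len(coverage)} {kind} have specs"
```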
---
### Case 5: Category Mode — Gate Skill Evaluated Against Quality Rubric
**Fixture:**
- `tests/skills/quality-rubric.md` exists with a "Gate Skills" section defining
criteria G1-G5 (e.g., G1: has mode guard, G2: has verdict table, etc.)
- `.claude/skills/gate-check/SKILL.md` is a gate skill
**Input:** `/skill-test category gate-check`
**Expected behavior:**
1. Skill reads `quality-rubric.md` and identifies the Gate Skills section
2. Skill evaluates `gate-check/SKILL.md` against criteria G1-G5
3. Each criterion is scored: PASS, PARTIAL, or FAIL
4. Overall category score is computed (e.g., 4/5 criteria pass)
5. Verdict is COMPLIANT (all pass), WARNINGS (some partial), or NON-COMPLIANT (failures)
**Assertions:**
- [ ] All gate criteria (G1-G5) from quality-rubric.md are evaluated
- [ ] Each criterion has an individual score
- [ ] Overall verdict reflects the score distribution
- [ ] No files are written
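
The category verdict in Case 5 follows from the per-criterion scores. A short sketch is given below, assuming that any FAIL outranks any PARTIAL; the spec lists the three outcomes but does not state the precedence, and `category_verdict` is a hypothetical name.

```python
from typing import Dict


def category_verdict(scores: Dict[str, str]) -> str:
    """Reduce per-criterion PASS/PARTIAL/FAIL scores to a category verdict."""
    results = set(scores.values())
    if "FAIL" in results:
        return "NON-COMPLIANT"
    if "PARTIAL" in results:
        return "WARNINGS"
    return "COMPLIANT"


example = {"G1": "PASS", "G2": "PASS", "G3": "PARTIAL", "G4": "PASS", "G5": "PASS"}
print(category_verdict(example))
```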
---
## Protocol Compliance
- [ ] Static mode checks exactly 7 structural assertions
- [ ] Spec mode evaluates each test case from the spec file individually
- [ ] Audit mode covers all skills AND agents (not just one category)
- [ ] Category mode reads quality-rubric.md to get criteria (not hardcoded)
- [ ] Does not write any files in any mode
- [ ] Suggests `/skill-improve` as the next step when issues are found
---
## Coverage Notes
- The skill-test skill is self-referential (it can test itself). The static
mode case for skill-test's own SKILL.md is not separately fixture-tested to
avoid infinite recursion in test design.
- The specific 7 structural checks are defined in the skill body; only Check 4
(May I write) is individually tested here because it has the most nuanced logic.
- Audit mode counts are approximate — the exact number of skills and agents will
change as the system grows; assertions use "all" rather than fixed counts.


@@ -0,0 +1,193 @@
# Skill Test Spec: /smoke-check
## Skill Summary
`/smoke-check` is the gate between implementation and QA hand-off. It detects the
test environment, runs the automated test suite (via Bash), scans test coverage
against sprint stories, and uses `AskUserQuestion` to batch-verify manual smoke
checks with the developer. It writes a report to `production/qa/smoke-[date].md`
after explicit user approval.
Verdicts: PASS (tests pass, all smoke checks pass, no missing test evidence),
PASS WITH WARNINGS (tests pass or NOT RUN, all critical checks pass, but advisory
gaps exist such as missing test coverage), or FAIL (any automated test failure or
any Batch 1/Batch 2 smoke check returns FAIL).
No director gates apply. The skill does NOT invoke any director agents.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, PASS WITH WARNINGS, FAIL
- [ ] Contains "May I write" collaborative protocol language before writing the report
- [ ] Has a next-step handoff (e.g., `/bug-report` on FAIL, QA hand-off guidance on PASS)
---
## Director Gate Checks
None. `/smoke-check` is a pre-QA utility skill. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Automated tests pass, manual items confirmed, PASS
**Fixture:**
- `tests/` directory exists with a GDUnit4 runner script
- Engine detected as Godot from `technical-preferences.md`
- `production/qa/qa-plan-sprint-005.md` exists
- Automated test runner reports 12 tests, 12 passing, 0 failing
- Developer confirms all Batch 1 and Batch 2 smoke checks as PASS
- All sprint stories have matching test files (no MISSING coverage)
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill detects test directory and engine, notes QA plan found
2. Runs `godot --headless --script tests/gdunit4_runner.gd` via Bash
3. Parses output: 12/12 passing
4. Scans test coverage — all stories COVERED or EXPECTED
5. Uses `AskUserQuestion` for Batch 1 (core stability) and Batch 2 (sprint mechanics)
6. Developer selects PASS for all items
7. Report assembled: automated tests PASS, all smoke checks PASS, no MISSING coverage
8. Asks "May I write this smoke check report to `production/qa/smoke-[date].md`?"
9. Writes report after approval
10. Delivers verdict: PASS
**Assertions:**
- [ ] Automated test runner is invoked via Bash
- [ ] `AskUserQuestion` is used for manual smoke check batches
- [ ] "May I write" is asked before writing the report file
- [ ] Report is written to `production/qa/smoke-[date].md`
- [ ] Verdict is PASS
---
### Case 2: Failure Path — Automated test fails, FAIL verdict
**Fixture:**
- `tests/` directory exists, engine is Godot
- Automated test runner reports 10 tests run: 8 passing, 2 failing
- Failing tests: `test_health_clamp_at_zero`, `test_damage_calculation_negative`
- QA plan exists
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill runs automated tests via Bash
2. Parses output — 2 failures detected
3. Records failing test names
4. Proceeds through manual smoke check batches
5. Report shows automated tests as FAIL with failing test names listed
6. Asks to write report; writes after approval
7. Delivers FAIL verdict with message: "The smoke check failed. Do not hand off to
QA until these failures are resolved." Lists failing tests and suggests fixing
then re-running `/smoke-check`
**Assertions:**
- [ ] Failing test names are listed in the report
- [ ] Verdict is FAIL
- [ ] Post-verdict message directs developer to fix failures before QA hand-off
- [ ] `/smoke-check` re-run is suggested after fixing
---
### Case 3: Manual Confirmation — AskUserQuestion used, PASS WITH WARNINGS
**Fixture:**
- `tests/` directory exists, engine is Godot
- Automated test runner reports all tests passing (8/8)
- One Logic story has no matching test file (MISSING coverage)
- Developer confirms all Batch 1 and Batch 2 smoke checks as PASS
**Input:** `/smoke-check`
**Expected behavior:**
1. Automated tests PASS
2. Coverage scan finds 1 MISSING entry for a Logic story
3. `AskUserQuestion` is used for Batch 1 and Batch 2 — developer confirms all PASS
4. Report shows: automated tests PASS, manual checks all PASS, 1 MISSING coverage entry
5. Verdict is PASS WITH WARNINGS — build ready for QA, but MISSING entry must be
resolved before `/story-done` closes the affected story
6. Asks to write report; writes after approval
**Assertions:**
- [ ] `AskUserQuestion` is used for manual smoke check batches (not inline text prompts)
- [ ] MISSING test coverage entry appears in the report
- [ ] Verdict is PASS WITH WARNINGS (not PASS, not FAIL)
- [ ] Advisory note explains MISSING entry must be resolved before `/story-done`
- [ ] Report file is written to `production/qa/smoke-[date].md`
---
### Case 4: No Test Directory — Skill stops with guidance
**Fixture:**
- `tests/` directory does not exist
- Engine is configured as Godot
**Input:** `/smoke-check`
**Expected behavior:**
1. Phase 1 checks for `tests/` directory — not found
2. Skill outputs: "No test directory found at `tests/`. Run `/test-setup` to
scaffold the testing infrastructure, or create the directory manually if
tests live elsewhere."
3. Skill stops — no automated tests run, no manual smoke checks, no report written
**Assertions:**
- [ ] Error message references the missing `tests/` directory
- [ ] `/test-setup` is suggested as the remediation step
- [ ] Skill stops after this message (no further phases run)
- [ ] No report file is written
---
### Case 5: Director Gate Check — No gate; smoke-check is a QA pre-check utility
**Fixture:**
- Valid test setup, automated tests pass, manual smoke checks confirmed
**Input:** `/smoke-check`
**Expected behavior:**
1. Skill runs all phases and produces a PASS or PASS WITH WARNINGS verdict
2. No director agents are spawned at any point
3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in output
4. No `/gate-check` is invoked
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is PASS, PASS WITH WARNINGS, or FAIL — no gate verdict involved
---
## Protocol Compliance
- [ ] Uses `AskUserQuestion` for all manual smoke check batches (Batch 1, Batch 2, Batch 3)
- [ ] Runs automated tests via Bash before asking any manual questions
- [ ] Asks "May I write" before creating the report file — never writes without approval
- [ ] Verdict vocabulary is strictly PASS / PASS WITH WARNINGS / FAIL — no other verdicts
- [ ] FAIL is triggered by automated test failures or Batch 1/Batch 2 FAIL responses
- [ ] PASS WITH WARNINGS is triggered when MISSING test coverage exists but no critical failures
- [ ] NOT RUN (engine binary unavailable) is recorded as a warning, not a FAIL
- [ ] Does not invoke director gates at any point
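
The verdict rules in this checklist can be written as a small decision function. This is a sketch of the rules as stated here, with hypothetical names and parameters; Batch 3 results are left out because the spec only ties FAIL to Batch 1 and Batch 2.

```python
from typing import List


def smoke_verdict(automated: str, batch_results: List[str], missing_coverage: int) -> str:
    """Derive the /smoke-check verdict.

    automated: "PASS", "FAIL", or "NOT RUN" (engine binary unavailable).
    batch_results: developer answers for Batch 1 and Batch 2 items.
    missing_coverage: number of stories with MISSING test coverage.
    """
    if automated == "FAIL" or "FAIL" in batch_results:
        return "FAIL"
    if automated == "NOT RUN" or missing_coverage > 0:
        return "PASS WITH WARNINGS"
    return "PASS"
```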
---
## Coverage Notes
- The `quick` argument (skips Phase 3 coverage scan and Batch 3) is not separately
fixture-tested; it follows the same pattern as Case 1 with a coverage-skip note in output.
- The `--platform` argument adds platform-specific AskUserQuestion batches and a
per-platform verdict table; not separately tested here.
- The case where the engine binary is not on PATH (NOT RUN) follows the PASS WITH
WARNINGS pattern and is covered by the protocol compliance assertions above.


@@ -0,0 +1,178 @@
# Skill Test Spec: /soak-test
## Skill Summary
`/soak-test` generates a structured soak test protocol — an extended runtime
test plan designed to surface memory leaks, performance drift, and stability
issues that only appear under sustained gameplay. The skill produces a document
specifying the test duration, system under test, monitoring checkpoints (e.g.,
memory sample every 30 minutes), pass/fail thresholds, and conditions for early
termination.
The skill asks "May I write to `production/qa/soak-[slug]-[date].md`?" before
persisting. If a previous soak test for the same system exists, the skill offers
to extend the duration or add new conditions. No director gates apply. The verdict
is COMPLETE when the soak test protocol is written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before writing the protocol
- [ ] Has a next-step handoff (e.g., `/regression-suite` or `/release-checklist`)
---
## Director Gate Checks
None. `/soak-test` is a QA planning utility. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Online gameplay feature, 2-hour soak protocol
**Fixture:**
- User specifies: system = "online multiplayer lobby", duration = "2 hours"
- `technical-preferences.md` has engine configured
**Input:** `/soak-test online-lobby 2h`
**Expected behavior:**
1. Skill generates a 2-hour soak test protocol for the online lobby system
2. Protocol includes: monitoring checkpoints every 30 minutes, metrics to track
(memory usage, connection count, packet loss), pass thresholds, early termination
conditions (crash or >20% memory growth)
3. Networking-specific checks are included (session drop rate, reconnect handling)
4. Skill asks "May I write to `production/qa/soak-online-lobby-2026-04-06.md`?"
5. File is written on approval; verdict is COMPLETE
**Assertions:**
- [ ] Protocol duration matches the requested 2 hours
- [ ] Monitoring checkpoints are at reasonable intervals (e.g., every 30 minutes)
- [ ] Network-specific checks are included (not just generic memory checks)
- [ ] "May I write" is asked with the correct file path
- [ ] Verdict is COMPLETE
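
The checkpoint schedule and the early-termination rule from Case 1 are simple to compute. The sketch below assumes durations are written like `2h` or `30m`, as in the inputs shown in this spec; the function names are hypothetical, and the 30-minute interval and 20% growth limit are the example values from the case, not confirmed defaults of the skill.

```python
import re
from typing import List


def parse_duration_minutes(text: str) -> int:
    """Parse strings such as '2h' or '30m' into minutes."""
    match = re.fullmatch(r"(\d+)([hm])", text.strip().lower())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "h" else value


def checkpoint_minutes(duration_min: int, interval_min: int = 30) -> List[int]:
    """Checkpoint offsets in minutes, always including the end of the run."""
    points = list(range(interval_min, duration_min + 1, interval_min))
    if not points or points[-1] != duration_min:
        points.append(duration_min)
    return points


def should_terminate_early(baseline_mb: float, current_mb: float, max_growth: float = 0.20) -> bool:
    """True when memory has grown by more than max_growth over the baseline."""
    return current_mb > baseline_mb * (1 + max_growth)
```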
---
### Case 2: No Target Defined — Prompts for system, duration, and conditions
**Fixture:**
- No arguments provided
- No soak test config in session state
**Input:** `/soak-test`
**Expected behavior:**
1. Skill detects no target system or duration specified
2. Skill asks: "What system or feature should be soak-tested?"
3. After the user names a system, skill asks: "What duration? (e.g., 1h, 4h, 8h)"
4. After the user gives a duration, skill asks for specific conditions or
uses defaults (normal gameplay loop, default player count)
5. Skill generates protocol from collected inputs and asks "May I write"
**Assertions:**
- [ ] At least 2 follow-up questions are asked (system + duration)
- [ ] Default conditions are applied when user doesn't specify custom ones
- [ ] Protocol is not generated until system and duration are known
- [ ] Verdict is COMPLETE after file is written
---
### Case 3: Previous Soak Test Exists — Offers to extend or add conditions
**Fixture:**
- `production/qa/soak-online-lobby-2026-03-15.md` exists with a 1-hour protocol
- User wants to extend to 4 hours with new memory threshold conditions
**Input:** `/soak-test online-lobby 4h`
**Expected behavior:**
1. Skill finds existing soak test for online-lobby
2. Skill reports: "Previous soak test found: soak-online-lobby-2026-03-15.md (1h)"
3. Skill presents options: create new protocol (4h standalone), or extend the
existing protocol to 4h and add new conditions
4. User selects extend; existing checkpoints are preserved, new ones added
5. Skill asks "May I write to `production/qa/soak-online-lobby-2026-04-06.md`?"
(new file, not overwriting old one)
**Assertions:**
- [ ] Existing soak test is surfaced and referenced
- [ ] User is offered extend vs. new options
- [ ] New file is created (old file is not overwritten)
- [ ] Extended protocol includes both old and new checkpoints
- [ ] Verdict is COMPLETE
---
### Case 4: Mobile Target Platform — Memory-specific checkpoints added
**Fixture:**
- `technical-preferences.md` specifies target platform: Mobile
- User requests soak test for "gameplay session" at 30 minutes
**Input:** `/soak-test gameplay 30m`
**Expected behavior:**
1. Skill reads `technical-preferences.md` and detects mobile target platform
2. Soak test protocol includes mobile-specific memory checkpoints:
- Check heap memory growth vs. device baseline
- Check texture memory at checkpoint intervals
- Add warning threshold at 300MB (mobile ceiling)
3. Protocol also includes thermal/battery drain advisory notes
4. Skill asks "May I write?" and writes on approval; verdict is COMPLETE
**Assertions:**
- [ ] Mobile platform is detected from technical-preferences.md
- [ ] Memory checkpoints include mobile-appropriate thresholds (not desktop)
- [ ] Thermal/battery notes are present in the protocol
- [ ] Verdict is COMPLETE
---
### Case 5: Director Gate Check — No gate; soak-test is a planning utility
**Fixture:**
- Valid system and duration provided
**Input:** `/soak-test combat 1h`
**Expected behavior:**
1. Skill generates and writes the soak test protocol
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Skill reaches COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Collects system, duration, and conditions before generating protocol
- [ ] Includes monitoring checkpoints at regular intervals
- [ ] Includes pass/fail thresholds and early termination conditions
- [ ] Adapts checkpoints to target platform (mobile vs. desktop)
- [ ] Asks "May I write" before creating the protocol file
- [ ] Verdict is COMPLETE when file is written
---
## Coverage Notes
- Soak tests for specific engine subsystems (rendering pipeline, physics
simulation) follow the same protocol structure and are not separately tested.
- The case where the user provides a duration shorter than the minimum useful
soak period (e.g., 5 minutes) is not tested; the skill would note this is
too short for meaningful results.
- Automated execution of the soak test protocol is outside this skill's scope —
this skill generates the plan, not the runner.

@@ -0,0 +1,173 @@
# Skill Test Spec: /start
## Skill Summary
`/start` is the first-time onboarding skill for new projects. It guides the
user through naming the project, choosing a game engine, and setting up the
initial directory structure. It creates stub configuration files (CLAUDE.md,
technical-preferences.md) and then routes to `/setup-engine` with the chosen
engine as an argument. Each file or directory created is gated behind a
"May I write" ask, following the collaborative protocol.
The skill detects whether a project is already configured and whether a
partial setup exists, offering to resume or restart as appropriate. It has
no director gates — it is a utility setup skill that runs before any agent
hierarchy exists.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, BLOCKED
- [ ] Contains "May I write" collaborative protocol language for each config file
- [ ] Has a next-step handoff at the end (routes to `/setup-engine`)
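
A rough idea of what such structural checks involve, as a Python sketch. How `/skill-test static` actually works is not described here, so this is an assumption-laden illustration: it assumes frontmatter is a leading block delimited by `---` lines and that a phase heading is any markdown heading containing the word "Phase". The function name `static_check` is hypothetical.

```python
import re

REQUIRED_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]


def static_check(skill_text: str, verdicts: list[str]) -> list[str]:
    """Return the list of failed structural checks (empty means all pass)."""
    failures = []

    # Assumption: frontmatter is a leading block delimited by '---' lines.
    match = re.match(r"^---\n(.*?)\n---\n", skill_text, re.DOTALL)
    frontmatter = match.group(1) if match else ""
    keys = {
        line.split(":", 1)[0].strip()
        for line in frontmatter.splitlines()
        if ":" in line
    }
    for field in REQUIRED_FIELDS:
        if field not in keys:
            failures.append(f"missing frontmatter field: {field}")

    # Assumption: a phase heading is a markdown heading containing "Phase".
    phase_headings = re.findall(r"^#+ .*Phase.*$", skill_text, re.MULTILINE)
    if len(phase_headings) < 2:
        failures.append("fewer than 2 phase headings")

    for verdict in verdicts:
        if verdict not in skill_text:
            failures.append(f"missing verdict keyword: {verdict}")

    if "May I write" not in skill_text:
        failures.append('missing "May I write" language')

    return failures
```

For `/start`, the call would pass `["COMPLETE", "BLOCKED"]` as the verdict list; other skills pass their own keywords.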
---
## Director Gate Checks
None. `/start` is a utility setup skill. No director agents exist yet at the
point this skill runs.
---
## Test Cases
### Case 1: Happy Path — Fresh repo, no engine, full onboarding flow
**Fixture:**
- Empty repository: no CLAUDE.md overrides, no `production/stage.txt`, no
`technical-preferences.md` content beyond placeholders
- No existing design docs or source code
**Input:** `/start`
**Expected behavior:**
1. Skill detects no existing configuration and begins fresh onboarding
2. Skill asks for project name
3. Skill presents 3 engine options: Godot 4, Unity, Unreal Engine 5
4. User selects an engine
5. Skill asks "May I write the initial directory structure?"
6. Skill creates all directories defined in `directory-structure.md`
7. Skill asks "May I write CLAUDE.md stub?" and writes it on approval
8. Skill routes to `/setup-engine [chosen-engine]` to complete technical config
**Assertions:**
- [ ] Project name is captured before any file is written
- [ ] Exactly 3 engine options are presented
- [ ] "May I write" is asked for each config file individually
- [ ] No file is written without explicit user approval
- [ ] Handoff to `/setup-engine` occurs at the end with the chosen engine argument
- [ ] Verdict is COMPLETE after all files are written and handoff is issued
---
### Case 2: Already Configured — Detects existing config, offers to skip or reconfigure
**Fixture:**
- `technical-preferences.md` has engine already set (not placeholder)
- `production/stage.txt` exists with `Concept`
**Input:** `/start`
**Expected behavior:**
1. Skill reads `technical-preferences.md` and detects configured engine
2. Skill reports: "This project is already configured with [engine]"
3. Skill presents options: skip (exit), reconfigure engine, or reconfigure specific sections
4. If user selects skip: skill exits cleanly with a summary of current config
5. If user selects reconfigure: skill proceeds to the engine-selection step
**Assertions:**
- [ ] Skill does NOT overwrite existing config without user choosing reconfigure
- [ ] Detected engine name is shown to the user in the status message
- [ ] User is offered a skip option and at least one reconfigure option
- [ ] Verdict is COMPLETE whether user skips or reconfigures
---
### Case 3: Engine Choice — User picks Godot 4, routes to /setup-engine godot
**Fixture:**
- Fresh repo — no existing configuration
**Input:** `/start`
**Expected behavior:**
1. Skill presents engine options and user selects Godot 4
2. Skill writes initial stubs (directory structure, CLAUDE.md) after approval
3. Skill explicitly routes to `/setup-engine godot` as the next step
4. Handoff message clearly names the engine and the next skill invocation
**Assertions:**
- [ ] Handoff command is `/setup-engine godot` (not generic `/setup-engine`)
- [ ] Handoff is issued after all initial stubs are written, not before
- [ ] Engine choice is echoed back to user before writing begins
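
The handoff assertion can be sketched in Python as a lookup from the chosen engine to the command argument. Only the `godot` argument is confirmed by this case; the `unity` and `unreal` strings and the function name `handoff_command` are assumptions made for illustration.

```python
# Only the "godot" argument is confirmed by this spec; the other two
# argument strings are hypothetical placeholders.
ENGINE_ARGS = {
    "Godot 4": "godot",
    "Unity": "unity",              # assumed
    "Unreal Engine 5": "unreal",   # assumed
}


def handoff_command(engine_choice: str) -> str:
    """Build the next-step command for the chosen engine."""
    try:
        return f"/setup-engine {ENGINE_ARGS[engine_choice]}"
    except KeyError:
        raise ValueError(f"unsupported engine: {engine_choice!r}") from None
```

Raising on an unknown engine keeps the sketch from ever emitting the generic `/setup-engine` with no argument, which the first assertion rules out.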
---
### Case 4: Interrupted Setup — Partial config detected, offers resume or restart
**Fixture:**
- Directory structure exists (was created) but `technical-preferences.md` is
still all placeholders (engine was never chosen — setup was interrupted)
- No `production/stage.txt`
**Input:** `/start`
**Expected behavior:**
1. Skill detects partial state: directories exist but engine is unconfigured
2. Skill reports: "A partial setup was detected — directories exist but engine is not configured"
3. Skill offers: resume from engine selection, or restart from scratch
4. If resume: skill skips directory creation, proceeds to engine choice
5. If restart: skill asks "May I overwrite existing structure?" before proceeding
**Assertions:**
- [ ] Partial state is correctly identified (directories present, engine absent)
- [ ] User is offered resume vs. restart choice — not forced into one path
- [ ] Resume path skips re-creating directories (no redundant "May I write" for structure)
- [ ] Restart path asks for permission to overwrite before touching any files
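
Cases 1, 2, and 4 together describe three starting states. A minimal Python sketch of that classification follows; the enum and function names are hypothetical, and the two boolean inputs stand in for whatever detection the skill actually performs on the directory tree and `technical-preferences.md`.

```python
from enum import Enum


class SetupState(Enum):
    FRESH = "fresh"
    PARTIAL = "partial"
    CONFIGURED = "configured"


def classify_setup(directories_exist: bool, engine_configured: bool) -> SetupState:
    """Classify the project from two facts the skill is expected to detect."""
    if engine_configured:
        return SetupState.CONFIGURED
    if directories_exist:
        return SetupState.PARTIAL
    return SetupState.FRESH


def offered_options(state: SetupState) -> list[str]:
    """Choices the spec expects the user to see for each state."""
    if state is SetupState.CONFIGURED:
        return ["skip", "reconfigure engine", "reconfigure specific sections"]
    if state is SetupState.PARTIAL:
        return ["resume from engine selection", "restart from scratch"]
    return []  # fresh repo: onboarding starts without a choice prompt
```

The fixture in this case corresponds to `classify_setup(True, False)`, which lands in the partial state and offers both paths.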
---
### Case 5: Director Gate Check — No gate; start is a utility setup skill
**Fixture:**
- Any fixture
**Input:** `/start`
**Expected behavior:**
1. Skill completes full onboarding flow
2. No director agents are spawned at any point
3. No gate IDs (CD-*, TD-*, AD-*, PR-*) appear in the output
**Assertions:**
- [ ] No director gate is invoked during the skill execution
- [ ] No gate skip messages appear (gates are absent, not suppressed)
- [ ] Skill reaches COMPLETE without any gate verdict
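
One way to check the "no gate IDs" assertion mechanically is a pattern scan over the skill output. The Python sketch below uses the four prefixes listed in this case; the shape of the suffix (uppercase letters, digits, hyphens) is an assumption, and `TD-EXAMPLE` in the usage note is a made-up ID.

```python
import re

# Gate ID prefixes listed in this case: CD-*, TD-*, AD-*, PR-*.
# Assumption: the suffix consists of uppercase letters, digits, or hyphens.
GATE_ID = re.compile(r"\b(?:CD|TD|AD|PR)-[A-Z0-9][A-Z0-9-]*")


def find_gate_ids(output: str) -> list[str]:
    """Return every gate-like ID found in the skill output."""
    return GATE_ID.findall(output)
```

An empty result for the full `/start` transcript would satisfy the first and third assertions; a line such as "Gate TD-EXAMPLE skipped" would be flagged.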
---
## Protocol Compliance
- [ ] Asks for project name before any file is written
- [ ] Presents engine options as a structured choice (not free text)
- [ ] Asks "May I write" separately for directory structure and for CLAUDE.md stub
- [ ] Ends with a handoff to `/setup-engine` with the engine name as argument
- [ ] Verdict is clearly stated (COMPLETE or BLOCKED) at end of output
---
## Coverage Notes
- The case where the user rejects all engine options and provides a custom
engine name is not tested — the skill is designed for the three supported
engines only.
- Git initialization (if any) is not tested here; that is an infrastructure
concern outside the skill boundary.
- Solo vs. lean mode behavior is not applicable — this skill has no gates and
mode selection is irrelevant.

@@ -0,0 +1,175 @@
# Skill Test Spec: /test-helpers
## Skill Summary
`/test-helpers` generates engine-specific test helper utilities for the project's
test suite. Helpers include factory functions (for creating test entities with
known state), fixture loaders, assertion helpers, and mock stubs for external
dependencies. Generated helpers follow the naming and structure conventions in
`coding-standards.md` and are written to `tests/helpers/`.
Each helper file is gated behind a "May I write" ask. If a helper file already
exists, the skill offers to extend it rather than replace. No director gates
apply. The verdict is COMPLETE when helper files are written.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before writing helpers
- [ ] Has a next-step handoff (e.g., write a test using the generated helper)
---
## Director Gate Checks
None. `/test-helpers` is a scaffolding utility. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Player factory helper generated for Godot/GDScript
**Fixture:**
- `technical-preferences.md` has engine Godot 4, language GDScript
- `tests/` directory exists (test-setup has been run)
- `design/gdd/player.md` exists with defined player properties
- No existing helpers in `tests/helpers/`
**Input:** `/test-helpers player-factory`
**Expected behavior:**
1. Skill reads engine (Godot 4 / GDScript) and player GDD for property context
2. Skill generates a deterministic `PlayerFactory` helper in GDScript:
- `create_player(health: int = 100, speed: float = 200.0)` function
- Returns a player node pre-configured to a known state
- Uses dependency injection (no singletons)
3. Skill asks "May I write to `tests/helpers/player_factory.gd`?"
4. File is written on approval; verdict is COMPLETE
**Assertions:**
- [ ] Generated helper is in GDScript (not C# or Blueprint)
- [ ] Factory function parameters use defaults matching GDD values
- [ ] Helper uses dependency injection (no Autoload/singleton references)
- [ ] Filename follows snake_case convention for GDScript
- [ ] Verdict is COMPLETE
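
The filename assertion implies a mapping from the helper argument to a snake_case file under `tests/helpers/`. A small Python sketch of that mapping, with the hypothetical function name `helper_path` and a default `.gd` extension that only fits the Godot/GDScript fixture:

```python
import re
from pathlib import PurePosixPath

HELPERS_DIR = PurePosixPath("tests/helpers")


def helper_path(helper_name: str, extension: str = ".gd") -> PurePosixPath:
    """Map a helper argument such as 'player-factory' to its target file."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    snake = re.sub(r"[^a-z0-9]+", "_", helper_name.lower()).strip("_")
    if not snake:
        raise ValueError("helper name must contain letters or digits")
    return HELPERS_DIR / f"{snake}{extension}"
```

With the input from this case, `helper_path("player-factory")` gives the same path named in the "May I write" prompt.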
---
### Case 2: No Test Setup Exists — Redirects to /test-setup
**Fixture:**
- `tests/` directory does not exist
**Input:** `/test-helpers player-factory`
**Expected behavior:**
1. Skill checks for `tests/` directory — not found
2. Skill reports: "Test directory not found — test framework must be set up first"
3. Skill suggests running `/test-setup` before generating helpers
4. No helper file is created
**Assertions:**
- [ ] Error message identifies the missing tests/ directory
- [ ] `/test-setup` is suggested as the prerequisite step
- [ ] No write tool is called
- [ ] Verdict is not COMPLETE (blocked state)
---
### Case 3: Helper Already Exists — Offers to extend rather than replace
**Fixture:**
- `tests/helpers/player_factory.gd` already exists with a `create_player()` function
- User requests a new `create_enemy()` function be added to the factory
**Input:** `/test-helpers enemy-factory`
**Expected behavior:**
1. Skill finds an existing `player_factory.gd` and checks if it's the right file
to extend (or if a separate `enemy_factory.gd` should be created)
2. Skill presents options: add `create_enemy()` to existing factory or create
`tests/helpers/enemy_factory.gd`
3. User selects extend; skill drafts the `create_enemy()` function
4. Skill asks "May I extend `tests/helpers/player_factory.gd`?"
5. Function is added on approval; verdict is COMPLETE
**Assertions:**
- [ ] Existing helper is detected and surfaced
- [ ] User is given extend vs. new file choice
- [ ] "May I extend" language is used (not "May I write" for replacement)
- [ ] Existing `create_player()` is preserved in the extended file
- [ ] Verdict is COMPLETE
---
### Case 4: System Has No GDD — Notes missing design context in helper
**Fixture:**
- `technical-preferences.md` has Godot 4 / GDScript
- `tests/` exists
- User requests a helper for the "inventory system" but no `design/gdd/inventory.md` exists
**Input:** `/test-helpers inventory-factory`
**Expected behavior:**
1. Skill looks for `design/gdd/inventory.md` — not found
2. Skill notes: "No GDD found for inventory — generating helper with placeholder defaults"
3. Skill generates an `inventory_factory.gd` with generic placeholder values
(item_count = 0, max_capacity = 20) and a comment: "# TODO: align defaults
with inventory GDD when written"
4. Skill asks "May I write to `tests/helpers/inventory_factory.gd`?"
5. File is written on approval; verdict is COMPLETE with an advisory note
**Assertions:**
- [ ] Skill proceeds without GDD (does not block)
- [ ] Generated helper has placeholder defaults with TODO comment
- [ ] Missing GDD is noted in the output (advisory warning)
- [ ] Verdict is COMPLETE
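
The non-blocking fallback in this case can be sketched in Python as a lookup that returns either the GDD path or an advisory plus placeholder defaults. The function name, the returned dictionary shape, and the exact advisory wording are assumptions; the `design/gdd/` location and the two placeholder values come from the case itself.

```python
from pathlib import Path

# Placeholder defaults taken from this case; used only when no GDD exists.
PLACEHOLDER_DEFAULTS = {"inventory": {"item_count": 0, "max_capacity": 20}}


def resolve_design_context(system: str, project_root: Path) -> dict:
    """Return the GDD path if present, otherwise an advisory and placeholders."""
    gdd = project_root / "design" / "gdd" / f"{system}.md"
    if gdd.is_file():
        return {"gdd": gdd, "advisory": None, "defaults": None, "todo": None}
    return {
        "gdd": None,
        "advisory": f"No GDD found for {system}; generating helper with placeholder defaults",
        "defaults": PLACEHOLDER_DEFAULTS.get(system, {}),
        "todo": f"# TODO: align defaults with {system} GDD when written",
    }
```

The missing-GDD branch never raises, which mirrors the first assertion: the skill proceeds and surfaces the gap as a note rather than blocking.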
---
### Case 5: Director Gate Check — No gate; test-helpers is a scaffolding utility
**Fixture:**
- Engine configured, tests/ exists
**Input:** `/test-helpers player-factory`
**Expected behavior:**
1. Skill generates and writes the helper file
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads engine before generating any helper (helpers are engine-specific)
- [ ] Reads GDD for default values when available
- [ ] Notes missing GDD context rather than blocking
- [ ] Detects existing helper files and offers extend rather than replace
- [ ] Asks "May I write" (or "May I extend") before any file operation
- [ ] Verdict is COMPLETE when helper is written
---
## Coverage Notes
- Mock/stub helper generation (for dependencies like save systems or audio buses)
follows the same pattern as factory helpers and is not separately tested.
- Unity C# helper generation (using NSubstitute or custom mocks) follows the
same logic as Case 1 with language-appropriate output.
- The case where the requested helper type is not recognized is not tested;
the skill would ask the user to clarify the helper type.

@@ -0,0 +1,173 @@
# Skill Test Spec: /test-setup
## Skill Summary
`/test-setup` scaffolds the test framework for the project based on the
configured engine. It creates the `tests/` directory structure defined in
`coding-standards.md` (unit/, integration/, performance/, playtest/) and
generates the appropriate test runner configuration for the detected engine:
a GdUnit4 config for Godot, a Unity Test Runner asmdef for Unity, or a headless
runner for Unreal Engine.
Each file or directory created is gated behind a "May I write" ask. If the test
framework already exists, the skill verifies the configuration rather than
reinitializing. No director gates apply. The verdict is COMPLETE when the
scaffold is in place.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keyword: COMPLETE
- [ ] Contains "May I write" collaborative protocol language before creating files
- [ ] Has a next-step handoff (e.g., `/test-helpers` to generate helper utilities)
---
## Director Gate Checks
None. `/test-setup` is a scaffolding utility. No director gates apply.
---
## Test Cases
### Case 1: Happy Path — Godot project, scaffolds GdUnit4 test structure
**Fixture:**
- `technical-preferences.md` has engine set to Godot 4, language GDScript
- `tests/` directory does not exist yet
**Input:** `/test-setup`
**Expected behavior:**
1. Skill reads engine from `technical-preferences.md` → Godot 4 + GDScript
2. Skill drafts the test directory structure: tests/unit/, tests/integration/,
tests/performance/, tests/playtest/, and a GdUnit4 runner config file
3. Skill asks "May I write the tests/ directory structure?"
4. Directories and the GdUnit4 runner script are created on approval
5. Skill confirms the runner script matches the CI command in coding-standards.md:
`godot --headless --script tests/gdunit4_runner.gd`
6. Verdict is COMPLETE
**Assertions:**
- [ ] All 4 subdirectories (unit/, integration/, performance/, playtest/) are created
- [ ] GdUnit4 runner config is generated
- [ ] Runner script path matches coding-standards.md CI command
- [ ] "May I write" is asked before creating any files
- [ ] Verdict is COMPLETE
---
### Case 2: Unity Project — Scaffolds Unity Test Runner with asmdef
**Fixture:**
- `technical-preferences.md` has engine set to Unity, language C#
- `tests/` directory does not exist
**Input:** `/test-setup`
**Expected behavior:**
1. Skill reads engine → Unity + C#
2. Skill drafts a `Tests/` directory following Unity conventions (capitalized)
3. Skill drafts `Tests/Tests.asmdef` and `Tests/Editor/EditorTests.asmdef`
4. Draft configures both EditMode and PlayMode test runner modes
5. Skill asks "May I write the Tests/ directory structure?" and writes on approval
6. Verdict is COMPLETE
**Assertions:**
- [ ] Unity-specific `Tests/` structure is created (not the Godot structure)
- [ ] `.asmdef` files are generated
- [ ] EditMode and PlayMode runner config is present
- [ ] Verdict is COMPLETE
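
Cases 1 and 2 differ mainly in which paths the skill proposes. The Python sketch below records those paths as data, using only names that appear in this spec; the dictionary, the function name, and the decision to raise for other engines are illustrative assumptions (Unreal is left out because its file layout is not specified here).

```python
# Paths named in Cases 1 and 2 of this spec, keyed by engine.
SCAFFOLD_PLANS = {
    "Godot 4": [
        "tests/unit/",
        "tests/integration/",
        "tests/performance/",
        "tests/playtest/",
        "tests/gdunit4_runner.gd",
    ],
    "Unity": [
        "Tests/",
        "Tests/Tests.asmdef",
        "Tests/Editor/EditorTests.asmdef",
    ],
}


def planned_paths(engine: str) -> list[str]:
    """Return the paths to propose in the 'May I write' prompt for an engine."""
    if engine not in SCAFFOLD_PLANS:
        raise ValueError(f"no scaffold plan recorded for engine: {engine!r}")
    return list(SCAFFOLD_PLANS[engine])
```

Keeping the plan as a list makes the first assertion easy to check: a Unity plan should contain no lowercase `tests/` entries.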
---
### Case 3: Test Framework Already Exists — Verifies config, not re-initialized
**Fixture:**
- `tests/unit/`, `tests/integration/` exist
- GdUnit4 runner script exists (Godot project)
**Input:** `/test-setup`
**Expected behavior:**
1. Skill detects existing tests/ structure
2. Skill reports: "Test framework already exists — verifying configuration"
3. Skill checks: runner script path, directory completeness, CI command alignment
4. If all checks pass: reports "Configuration verified — no changes needed"
5. If checks fail (e.g., missing tests/performance/): reports specific gap and
asks "May I add the missing directories?"
**Assertions:**
- [ ] Skill does NOT reinitialize when framework exists
- [ ] Verification checks are performed on existing structure
- [ ] Only missing parts trigger a "May I write" ask
- [ ] Verdict is COMPLETE whether everything was OK or gaps were fixed
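
The directory-completeness part of the verification step can be sketched in a few lines of Python. The four subdirectory names come from the Skill Summary; the function name `missing_test_dirs` is hypothetical, and the runner-script and CI-command checks are left out of this sketch.

```python
from pathlib import Path

# The four subdirectories this spec expects under tests/.
REQUIRED_SUBDIRS = ("unit", "integration", "performance", "playtest")


def missing_test_dirs(tests_root: Path) -> list[str]:
    """List required subdirectories that are absent under tests_root."""
    return [name for name in REQUIRED_SUBDIRS if not (tests_root / name).is_dir()]
```

For the fixture in this case, where only `tests/unit/` and `tests/integration/` exist, the result would name `performance` and `playtest`, and only those would trigger a permission prompt.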
---
### Case 4: No Engine Configured — Redirects to /setup-engine
**Fixture:**
- `technical-preferences.md` contains only placeholders (engine not set)
**Input:** `/test-setup`
**Expected behavior:**
1. Skill reads `technical-preferences.md` and finds engine placeholder
2. Skill reports: "Engine not configured — cannot scaffold engine-specific test framework"
3. Skill suggests running `/setup-engine` first
4. No directories or files are created
**Assertions:**
- [ ] Error message explicitly states engine is not configured
- [ ] `/setup-engine` is suggested as the next step
- [ ] No write tool is called
- [ ] Verdict is not COMPLETE (blocked state)
---
### Case 5: Director Gate Check — No gate; test-setup is a scaffolding utility
**Fixture:**
- Engine configured, tests/ does not exist
**Input:** `/test-setup`
**Expected behavior:**
1. Skill scaffolds and writes all test framework files
2. No director agents are spawned
3. No gate IDs appear in output
**Assertions:**
- [ ] No director gate is invoked
- [ ] No gate skip messages appear
- [ ] Verdict is COMPLETE without any gate check
---
## Protocol Compliance
- [ ] Reads engine from `technical-preferences.md` before generating any scaffold
- [ ] Generates engine-appropriate test runner config (not generic)
- [ ] Creates all 4 subdirectories from coding-standards.md
- [ ] Asks "May I write" before creating files
- [ ] Detects existing framework and offers verification (not reinitialization)
- [ ] Verdict is COMPLETE when scaffold is in place
---
## Coverage Notes
- Unreal Engine test scaffolding (headless runner with `-nullrhi`) follows the
same pattern as Cases 1 and 2 and is not separately fixture-tested.
- CI integration file generation (e.g., `.github/workflows/test.yml`) is
referenced but not assertion-tested here — it may be a separate skill concern.
- The case where tests/ exists but is from a different engine (e.g., Unity tests
in a now-Godot project) is not tested; the skill would detect the mismatch
and offer to reconcile.