Add claude code game studios to the project

Author: panw
Date: 2026-05-15 14:52:29 +08:00
Parent: dff559462d
Commit: a16fe4bff7
415 changed files with 78609 additions and 0 deletions

# Skill Test Spec: /asset-audit
## Skill Summary
`/asset-audit` audits the `assets/` directory for naming convention compliance,
missing metadata, and format/size issues. It reads asset files against the
conventions and budgets defined in `technical-preferences.md`. No director gates
are invoked. The skill does not write without user approval. Verdicts: COMPLIANT,
WARNINGS, or NON-COMPLIANT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLIANT, WARNINGS, NON-COMPLIANT
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after audit results)
---
## Director Gate Checks
None. Asset auditing is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All assets follow naming conventions
**Fixture:**
- `technical-preferences.md` specifies naming convention: `snake_case`, e.g., `enemy_grunt_idle.png`
- `assets/art/characters/` contains: `enemy_grunt_idle.png`, `enemy_sniper_run.png`
- `assets/audio/sfx/` contains: `sfx_jump_land.ogg`, `sfx_item_pickup.ogg`
- All files are within size budget (textures ≤2MB, audio ≤500KB)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads naming conventions and size budgets from `technical-preferences.md`
2. Skill scans `assets/` recursively
3. All files match `snake_case` convention; all within budget
4. Audit table shows all rows PASS
5. Verdict is COMPLIANT
**Assertions:**
- [ ] Audit covers both art and audio asset directories
- [ ] Each file is checked against naming convention and size budget
- [ ] All rows show PASS when compliant
- [ ] Verdict is COMPLIANT
- [ ] No files are written
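The per-file checks in this case can be sketched as a pure function over already-scanned file metadata. This is a minimal illustration, not the skill's implementation: the budget values come from the fixture above, and the WARNINGS tier is omitted for brevity.

```python
import re

# Budgets assumed from technical-preferences.md in the fixture (bytes).
BUDGETS = {".png": 2 * 1024 * 1024, ".ogg": 500 * 1024}
SNAKE_CASE = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")

def audit_entry(stem: str, suffix: str, size_bytes: int) -> list[str]:
    """Return failure labels for one asset file (empty list = PASS)."""
    failures = []
    if not SNAKE_CASE.match(stem):
        failures.append("NAMING")
    budget = BUDGETS.get(suffix)
    if budget is not None and size_bytes > budget:
        failures.append("SIZE")
    return failures

def audit_verdict(all_failures: list[list[str]]) -> str:
    """COMPLIANT only when every audited row passed every check."""
    return "NON-COMPLIANT" if any(all_failures) else "COMPLIANT"
```

For example, `audit_entry("enemy_grunt_idle", ".png", 1024)` passes both checks, while a CamelCase stem would fail the naming check.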
---
### Case 2: Non-Compliant — Textures exceed size budget
**Fixture:**
- `assets/art/environment/` contains 5 texture files
- 3 texture files are 4MB each (budget: ≤2MB)
- 2 texture files are within budget
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads size budget from `technical-preferences.md` (2MB for textures)
2. Skill scans `assets/art/environment/` — finds 3 oversized textures
3. Audit table lists each oversized file with actual size and budget
4. Verdict is NON-COMPLIANT
5. Skill recommends compression or resolution reduction for flagged files
**Assertions:**
- [ ] All 3 oversized files are listed by name with actual size and budget size
- [ ] Verdict is NON-COMPLIANT when any file exceeds its budget
- [ ] Optimization recommendation is given for oversized files
- [ ] Within-budget files are also listed (showing PASS) for completeness
---
### Case 3: Format Issue — Audio in wrong format
**Fixture:**
- `technical-preferences.md` specifies audio format: OGG
- `assets/audio/music/theme_main.wav` exists (WAV format)
- `assets/audio/sfx/sfx_footstep.ogg` exists (correct OGG format)
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads audio format requirement: OGG
2. Skill scans `assets/audio/` — finds `theme_main.wav` in wrong format
3. Audit table flags `theme_main.wav` as FORMAT ISSUE (expected OGG, found WAV)
4. `sfx_footstep.ogg` shows PASS
5. Verdict is WARNINGS (format issues are correctable)
**Assertions:**
- [ ] `theme_main.wav` is flagged as FORMAT ISSUE with expected and actual format noted
- [ ] Verdict is WARNINGS (not NON-COMPLIANT) for format issues, which are correctable
- [ ] Correct-format assets are shown as PASS
- [ ] Skill does not modify or convert any asset files
---
### Case 4: Missing Asset — Asset referenced by GDD but absent from assets/
**Fixture:**
- `design/gdd/enemies.md` references `enemy_boss_idle.png`
- `assets/art/characters/boss/` directory is empty — file does not exist
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill reads GDD references to find expected assets (cross-references with `/content-audit` scope)
2. Skill scans `assets/art/characters/boss/` — file not found
3. Audit table flags `enemy_boss_idle.png` as MISSING ASSET
4. Verdict is NON-COMPLIANT (missing critical art asset)
**Assertions:**
- [ ] Skill checks GDD references to identify expected assets
- [ ] Missing assets are flagged as MISSING ASSET with the GDD reference noted
- [ ] Verdict is NON-COMPLIANT when critical assets are missing
- [ ] Skill does not create or add placeholder assets
---
### Case 5: Gate Compliance — No gate; technical-artist may be consulted separately
**Fixture:**
- 2 files have naming convention violations (CamelCase instead of snake_case)
- `review-mode.txt` contains `full`
**Input:** `/asset-audit`
**Expected behavior:**
1. Skill scans assets and finds 2 naming violations
2. No director gate is invoked regardless of review mode
3. Verdict is WARNINGS
4. Output notes: "Consider having a Technical Artist review naming conventions"
5. Skill presents findings; offers optional audit report write
6. If user opts in: "May I write to `production/qa/asset-audit-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Technical artist consultation is suggested (not mandated)
- [ ] Findings table is presented before any write prompt
- [ ] Optional audit report write asks "May I write" before writing
---
## Protocol Compliance
- [ ] Reads `technical-preferences.md` for naming conventions, formats, and size budgets
- [ ] Scans `assets/` directory recursively
- [ ] Audit table shows file name, check type, expected value, actual value, and result
- [ ] Does not modify any asset files
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLIANT, WARNINGS, NON-COMPLIANT
---
## Coverage Notes
- Metadata checks (e.g., missing texture import settings in Godot `.import` files)
are not explicitly tested here; they follow the same FORMAT ISSUE flagging pattern.
- The interaction between `/asset-audit` and `/content-audit` (both check GDD
references vs. assets) is intentional overlap; `/asset-audit` focuses on
compliance while `/content-audit` focuses on completeness.

# Skill Test Spec: /balance-check
## Skill Summary
`/balance-check` reads balance data files (JSON or YAML in `assets/data/`) and
checks each value against the design formulas defined in GDDs under `design/gdd/`.
It produces a findings table with columns: Value → Formula → Deviation → Severity.
No director gates are invoked (read-only analysis). The skill may optionally write
a balance report but asks "May I write" before doing so. Verdicts: BALANCED,
CONCERNS, or OUT OF BALANCE.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: BALANCED, CONCERNS, OUT OF BALANCE
- [ ] Contains "May I write" language (optional report write)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Balance check is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All balance values within formula tolerances
**Fixture:**
- `assets/data/combat-balance.json` exists with 6 stat values
- `design/gdd/combat-system.md` contains formulas for all 6 stats with ±10% tolerance
- All 6 values fall within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all balance data files in `assets/data/`
2. Skill reads GDD formulas from `design/gdd/`
3. Skill computes deviation for each value against its formula
4. All deviations are within ±10% tolerance
5. Skill outputs findings table with all rows showing PASS
6. Verdict is BALANCED
**Assertions:**
- [ ] Findings table is shown for all checked values
- [ ] Each row shows: stat name, formula target, actual value, deviation percentage
- [ ] All rows show PASS or equivalent when within tolerance
- [ ] Verdict is BALANCED
- [ ] No files are written without user approval
---
### Case 2: Out of Balance — Player damage 40% above formula target
**Fixture:**
- `assets/data/combat-balance.json` has `player_damage_base: 140`
- `design/gdd/combat-system.md` formula specifies `player_damage_base = 100` (±10%)
- All other stats are within tolerance
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads combat-balance.json and computes deviation for `player_damage_base`
2. Deviation is +40% — far outside ±10% tolerance
3. Skill flags this row as severity HIGH in the findings table
4. Verdict is OUT OF BALANCE
5. Skill surfaces the HIGH severity item prominently before the table
**Assertions:**
- [ ] `player_damage_base` row shows deviation of +40%
- [ ] Severity is HIGH for deviations exceeding tolerance by more than 2×
- [ ] Verdict is OUT OF BALANCE when any stat has HIGH severity deviation
- [ ] The HIGH severity item is called out explicitly, not buried in table rows
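The deviation and severity math implied by this case can be sketched as follows. The ±10% tolerance and the "more than 2× tolerance = HIGH" rule are taken from the assertions above; everything else is illustrative.

```python
def deviation_pct(actual: float, target: float) -> float:
    """Signed deviation of the actual value from the formula target, in percent."""
    return (actual - target) / target * 100.0

def severity(actual: float, target: float, tolerance_pct: float = 10.0) -> str:
    """PASS within tolerance; HIGH beyond twice the tolerance; MEDIUM between."""
    dev = abs(deviation_pct(actual, target))
    if dev <= tolerance_pct:
        return "PASS"
    if dev > 2 * tolerance_pct:
        return "HIGH"
    return "MEDIUM"
```

With the fixture values, `deviation_pct(140, 100)` is +40%, which exceeds twice the ±10% tolerance and classifies as HIGH; the 15% deviation from Case 5 would land in the middle band.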
---
### Case 3: No GDD Formulas — Cannot validate, guidance given
**Fixture:**
- `assets/data/economy-balance.yaml` exists with 10 stat values
- No GDD in `design/gdd/` contains formula definitions for economy stats
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads balance data files
2. Skill searches GDDs for formula definitions — finds none for economy stats
3. Skill outputs: "Cannot validate economy stats — no formulas defined. Run /design-system first."
4. No findings table is generated for the economy stats
5. Verdict is CONCERNS (data exists but cannot be validated)
**Assertions:**
- [ ] Skill does not fabricate formula targets when none exist in GDDs
- [ ] Output explicitly names the missing formula source
- [ ] Output recommends running `/design-system` to define formulas
- [ ] Verdict is CONCERNS (not BALANCED, since validation was impossible)
---
### Case 4: Orphan Reference — Balance file references an undefined stat
**Fixture:**
- `assets/data/combat-balance.json` contains a stat `legacy_armor_mult: 1.5`
- `design/gdd/combat-system.md` has no formula for `legacy_armor_mult`
- All other stats have formula definitions and pass validation
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads all stats from combat-balance.json
2. Skill cannot find a formula for `legacy_armor_mult` in any GDD
3. Skill flags `legacy_armor_mult` as ORPHAN REFERENCE in the findings table
4. Other stats are evaluated normally; those within tolerance show PASS
5. Verdict is CONCERNS (orphan reference prevents full validation)
**Assertions:**
- [ ] `legacy_armor_mult` appears in findings table with status ORPHAN REFERENCE
- [ ] Orphan references are distinguished from formula deviations in the table
- [ ] Verdict is CONCERNS when any orphan references are found
- [ ] Skill does not skip orphan stats silently
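Orphan detection reduces to a set-membership check between the data file's stats and the formulas found in GDDs. A minimal sketch, with hypothetical function names and status labels drawn from the assertions above:

```python
def classify_stats(data: dict[str, float], formulas: dict[str, str]) -> dict[str, str]:
    """Mark each balance stat ORPHAN REFERENCE when no GDD formula covers it."""
    return {stat: ("CHECKED" if stat in formulas else "ORPHAN REFERENCE")
            for stat in data}

def balance_verdict(statuses: dict[str, str]) -> str:
    """Any orphan prevents full validation, so the verdict degrades to CONCERNS."""
    return "CONCERNS" if "ORPHAN REFERENCE" in statuses.values() else "BALANCED"
```

In the fixture, `legacy_armor_mult` has no formula entry, so it is classified as an orphan rather than silently skipped, and the overall verdict becomes CONCERNS.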
---
### Case 5: Gate Compliance — Read-only; no gate; optional report requires approval
**Fixture:**
- Balance data and GDD formulas exist; 1 stat has CONCERNS-level deviation (15% above target)
- `review-mode.txt` contains `full`
**Input:** `/balance-check`
**Expected behavior:**
1. Skill reads data and GDDs; generates findings table
2. Verdict is CONCERNS (one stat slightly out of range)
3. No director gate is invoked
4. Skill presents findings table to user
5. Skill offers to write an optional balance report
6. If user says yes: skill asks "May I write to `production/qa/balance-report-[date].md`?"
7. If user says no: skill ends without writing
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Findings table is presented without writing anything automatically
- [ ] Optional report write is offered but not forced
- [ ] "May I write" prompt appears only if user opts in to the report
---
## Protocol Compliance
- [ ] Reads both balance data files and GDD formulas before analysis
- [ ] Findings table shows Value, Formula, Deviation, and Severity columns
- [ ] Does not write any files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: BALANCED, CONCERNS, OUT OF BALANCE
---
## Coverage Notes
- The case where `assets/data/` is entirely empty is not tested; behavior
follows the CONCERNS pattern with a message that no data files were found.
- Tolerance thresholds (±10%, ±20%) are implementation details of the skill;
the tests verify that deviations are detected and classified, not the
exact threshold values.

# Skill Test Spec: /code-review
## Skill Summary
`/code-review` performs an architectural code review of source files in `src/`,
checking coding standards from `CLAUDE.md` (doc comments on public APIs,
dependency injection over singletons, data-driven values, testability). Findings
are advisory. No director gates are invoked. No code edits are made. Verdicts:
APPROVED, CONCERNS, or NEEDS CHANGES.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: APPROVED, CONCERNS, NEEDS CHANGES
- [ ] Does NOT require "May I write" language (read-only; findings are advisory output)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Code review is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Source file follows all coding standards
**Fixture:**
- `src/gameplay/health_component.gd` exists with:
- All public methods have doc comments (`##` notation)
- No singletons used; dependencies injected via constructor
- No hardcoded values; all constants reference `assets/data/`
- ADR reference in file header: `# ADR: docs/architecture/adr-004-health.md`
- Referenced ADR has `Status: Accepted`
**Input:** `/code-review src/gameplay/health_component.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill checks all coding standards: doc comments, DI, data-driven, ADR status
3. All checks pass
4. Skill outputs findings summary with all checks PASS
5. Verdict is APPROVED
**Assertions:**
- [ ] Each coding standard check is listed in the output
- [ ] All checks show PASS when standards are met
- [ ] Skill reads referenced ADR to confirm its status
- [ ] Verdict is APPROVED
- [ ] No edits are made to any file
---
### Case 2: Needs Changes — Missing doc comment and singleton usage
**Fixture:**
- `src/ui/inventory_ui.gd` has:
- 2 public methods without doc comments
- Uses `GameManager.instance` (singleton pattern)
- All other standards met
**Input:** `/code-review src/ui/inventory_ui.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill detects: 2 missing doc comments on public methods
3. Skill detects: singleton usage at specific lines (e.g., line 42, line 87)
4. Findings list the exact method names and line numbers
5. Verdict is NEEDS CHANGES
**Assertions:**
- [ ] Missing doc comments are listed with method names
- [ ] Singleton usage is flagged with file and line number
- [ ] Verdict is NEEDS CHANGES when BLOCKING-level standard violations exist
- [ ] Skill does not edit the file — findings are for the developer to act on
- [ ] Output suggests replacing singleton with dependency injection
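The two detections in this case can be approximated with line-level pattern matching. This is a rough sketch only: it assumes GDScript's `##` doc-comment convention, treats underscore-prefixed functions as private, and matches the `GameManager.instance` access pattern from the fixture.

```python
import re

def review_gdscript(source: str) -> list[tuple[int, str]]:
    """Flag public funcs lacking a preceding '##' doc comment and '.instance' access.

    Returns (line_number, finding) pairs; line numbers are 1-based.
    """
    findings = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Public functions (no leading underscore) must have a doc comment above.
        if stripped.startswith("func ") and not stripped.startswith("func _"):
            prev = lines[i - 1].strip() if i > 0 else ""
            if not prev.startswith("##"):
                findings.append((i + 1, "missing doc comment"))
        # Singleton-style access, e.g. GameManager.instance.
        if re.search(r"\b\w+\.instance\b", line):
            findings.append((i + 1, "singleton usage"))
    return findings
```

The findings carry line numbers so the developer can act on them; the skill itself never edits the file.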
---
### Case 3: Architecture Risk — ADR reference is Proposed, not Accepted
**Fixture:**
- `src/core/save_system.gd` has a header comment: `# ADR: docs/architecture/adr-010-save.md`
- `adr-010-save.md` exists but has `Status: Proposed`
- Code itself follows all other coding standards
**Input:** `/code-review src/core/save_system.gd`
**Expected behavior:**
1. Skill reads the source file
2. Skill reads referenced ADR — finds `Status: Proposed`
3. Skill flags this as ARCHITECTURE RISK (code is implementing an unaccepted ADR)
4. Other coding standard checks pass
5. Verdict is CONCERNS (risk flag is advisory, not a hard NEEDS CHANGES)
**Assertions:**
- [ ] Skill reads referenced ADR file to check its status
- [ ] ARCHITECTURE RISK is flagged when ADR status is Proposed
- [ ] Verdict is CONCERNS (not NEEDS CHANGES) for ADR risk — advisory severity
- [ ] Output recommends resolving the ADR before the code goes to production
---
### Case 4: Edge Case — No source files found at specified path
**Fixture:**
- User calls `/code-review src/networking/`
- `src/networking/` directory does not exist
**Input:** `/code-review src/networking/`
**Expected behavior:**
1. Skill attempts to read files in `src/networking/`
2. Directory or files not found
3. Skill outputs an error: "No source files found at `src/networking/`"
4. Skill suggests checking `src/` for valid directories
5. No verdict is emitted (nothing was reviewed)
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the error message
- [ ] Output suggests checking `src/` for valid file paths
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; LP may be consulted separately
**Fixture:**
- Source file follows most standards but has 1 CONCERNS-level finding (a magic number)
- `review-mode.txt` contains `full`
**Input:** `/code-review src/gameplay/loot_system.gd`
**Expected behavior:**
1. Skill reads and reviews the source file
2. No director gate is invoked (code review findings are advisory)
3. Skill presents findings with the CONCERNS verdict
4. Output notes: "Consider requesting a Lead Programmer review for architecture concerns"
5. Skill does not invoke any agent automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] LP consultation is suggested (not mandated) in the output
- [ ] No code edits are made
- [ ] Verdict is CONCERNS for advisory-level findings
---
## Protocol Compliance
- [ ] Reads source file(s) and coding standards before reviewing
- [ ] Lists each coding standard check in findings output
- [ ] Does not edit any source files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: APPROVED, CONCERNS, NEEDS CHANGES
---
## Coverage Notes
- Batch review of all files in a directory is not explicitly tested; behavior
is assumed to apply the same checks file by file and aggregate the verdict.
- Test coverage checks (verifying corresponding test files exist) are a stretch
goal not tested here; that is primarily the domain of `/test-evidence-review`.

# Skill Test Spec: /consistency-check
## Skill Summary
`/consistency-check` scans all GDDs in `design/gdd/` and checks for internal
conflicts across documents. It produces a structured findings table with columns:
System A vs System B, Conflict Type, Severity (HIGH / MEDIUM / LOW). Conflict
types include: formula mismatch, competing ownership, stale reference, and
dependency gap.
The skill is read-only during analysis. It has no director gates. An optional
consistency report can be written to `design/consistency-report-[date].md` if the
user requests it, but the skill asks "May I write" before doing so.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] Does NOT require "May I write" language during analysis (read-only scan)
- [ ] Has a next-step handoff at the end
- [ ] Documents that report writing is optional and requires approval
---
## Director Gate Checks
No director gates — this skill spawns no director gate agents. Consistency
checking is a mechanical scan; no creative or technical director review is
required as part of the scan itself.
---
## Test Cases
### Case 1: Happy Path — 4 GDDs with no conflicts
**Fixture:**
- `design/gdd/` contains exactly 4 system GDDs
- All GDDs have consistent formulas (no overlapping variables with different values)
- No two GDDs claim ownership of the same game entity or mechanic
- All dependency references point to GDDs that exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all 4 GDDs in `design/gdd/`
2. Runs cross-GDD consistency checks (formulas, ownership, references)
3. No conflicts found
4. Outputs structured findings table showing 0 issues
5. Verdict: CONSISTENT
**Assertions:**
- [ ] All 4 GDDs are read before producing output
- [ ] Findings table is present (even if empty — shows "No conflicts found")
- [ ] Verdict is CONSISTENT when no conflicts exist
- [ ] Skill does NOT write any files without user approval
- [ ] Next-step handoff is present
---
### Case 2: Failure Path — Two GDDs with conflicting damage formulas
**Fixture:**
- GDD-A defines damage formula: `damage = attack * 1.5`
- GDD-B defines damage formula: `damage = attack * 2.0` for the same entity type
- Both GDDs refer to the same "attack" variable
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and detects the formula mismatch
2. Findings table includes an entry: GDD-A vs GDD-B | Formula Mismatch | HIGH
3. Specific conflicting formulas are shown (not just "formula conflict exists")
4. Verdict: CONFLICTS FOUND
**Assertions:**
- [ ] Verdict is CONFLICTS FOUND (not CONSISTENT)
- [ ] Conflict entry names both GDD filenames
- [ ] Conflict type is "Formula Mismatch"
- [ ] Severity is HIGH for a direct formula contradiction
- [ ] Both conflicting formulas are shown in the findings table
- [ ] Skill does NOT auto-resolve the conflict
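Formula-mismatch detection amounts to grouping variable definitions by name across GDDs and comparing pairwise. A minimal sketch, assuming formulas have already been extracted into a per-GDD mapping (the extraction step itself is out of scope here):

```python
from collections import defaultdict
from itertools import combinations

def find_formula_mismatches(gdds: dict[str, dict[str, str]]) -> list[tuple]:
    """Report variables that two GDDs define with different formulas.

    gdds maps a GDD filename to its {variable: formula} definitions.
    Returns (gdd_a, gdd_b, variable, formula_a, formula_b) tuples.
    """
    seen = defaultdict(list)  # variable -> [(gdd, formula), ...]
    for gdd, formulas in gdds.items():
        for var, expr in formulas.items():
            seen[var].append((gdd, expr))
    conflicts = []
    for var, defs in seen.items():
        for (ga, fa), (gb, fb) in combinations(defs, 2):
            if fa != fb:
                conflicts.append((ga, gb, var, fa, fb))
    return conflicts
```

Each conflict tuple names both GDDs and shows both formulas, matching the findings-table requirements; resolving the conflict is left to the user.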
---
### Case 3: Partial Path — GDD references a system with no GDD
**Fixture:**
- GDD-A's Dependencies section lists "system-B" as a dependency
- No GDD for system-B exists in `design/gdd/`
- All other GDDs are consistent
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and checks dependency references
2. GDD-A's reference to "system-B" cannot be resolved — no GDD exists for it
3. Findings table includes: GDD-A vs (missing) | Dependency Gap | MEDIUM
4. Verdict: DEPENDENCY GAP (not CONSISTENT, not CONFLICTS FOUND)
**Assertions:**
- [ ] Verdict is DEPENDENCY GAP (distinct from CONSISTENT and CONFLICTS FOUND)
- [ ] Findings entry names GDD-A and the missing system-B
- [ ] Severity is MEDIUM for an unresolved dependency reference
- [ ] Skill suggests running `/design-system system-B` to create the missing GDD
---
### Case 4: Edge Case — No GDDs found
**Fixture:**
- `design/gdd/` directory is empty or does not exist
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill attempts to read files in `design/gdd/`
2. No GDD files found
3. Skill outputs an error: "No GDDs found in `design/gdd/`. Run `/design-system` to create GDDs first."
4. No findings table is produced
5. No verdict is issued
**Assertions:**
- [ ] Skill outputs a clear error message when no GDDs are found
- [ ] No verdict is produced (CONSISTENT / CONFLICTS FOUND / DEPENDENCY GAP)
- [ ] Skill recommends the correct next action (`/design-system`)
- [ ] Skill does NOT crash or produce a partial report
---
### Case 5: Director Gate — No gate spawned; no review-mode.txt read
**Fixture:**
- `design/gdd/` contains ≥2 GDDs
- `production/session-state/review-mode.txt` exists with `full`
**Input:** `/consistency-check`
**Expected behavior:**
1. Skill reads all GDDs and runs the consistency scan
2. Skill does NOT read `production/session-state/review-mode.txt`
3. No director gate agents are spawned at any point
4. Findings table and verdict are produced normally
**Assertions:**
- [ ] No director gate agents are spawned (no CD-, TD-, PR-, AD- prefixed gates)
- [ ] Skill does NOT read `production/session-state/review-mode.txt`
- [ ] Output contains no "Gate: [GATE-ID]" or gate-skipped entries
- [ ] Review mode has no effect on this skill's behavior
---
## Protocol Compliance
- [ ] Reads all GDDs before producing the findings table
- [ ] Findings table shown in full before any write ask (if report is requested)
- [ ] Verdict is one of exactly: CONSISTENT, CONFLICTS FOUND, DEPENDENCY GAP
- [ ] No director gates — no review-mode.txt read
- [ ] Report writing (if requested) gated by "May I write" approval
- [ ] Ends with next-step handoff appropriate to verdict
---
## Coverage Notes
- This skill checks for structural consistency between GDDs. Deep design theory
analysis (pillar drift, dominant strategies) is handled by `/review-all-gdds`.
- Formula conflict detection relies on consistent formula notation across GDDs —
informal descriptions of the same mechanic may not be detected.
- The conflict severity rubric (HIGH / MEDIUM / LOW) is defined in the skill body
and not re-enumerated here.

# Skill Test Spec: /content-audit
## Skill Summary
`/content-audit` reads GDDs in `design/gdd/` and checks whether all content
items specified there (enemies, items, levels, etc.) are accounted for in
`assets/`. It produces a gap table: Content Type → Specified Count → Found Count
→ Missing Items. No director gates are invoked. The skill does not write without
user approval. Verdicts: COMPLETE, GAPS FOUND, or MISSING CRITICAL CONTENT.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
- [ ] Does NOT require "May I write" language (read-only output; write is optional report)
- [ ] Has a next-step handoff (what to do after gap table is reviewed)
---
## Director Gate Checks
None. Content audit is a read-only analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — All specified content present
**Fixture:**
- `design/gdd/enemies.md` specifies 4 enemy types: Grunt, Sniper, Tank, Boss
- `assets/art/characters/` contains folders: `grunt/`, `sniper/`, `tank/`, `boss/`
- `design/gdd/items.md` specifies 3 item types; all 3 found in `assets/data/items/`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs in `design/gdd/`
2. Skill scans `assets/` for each specified content item
3. All 4 enemy types and 3 item types are found
4. Gap table shows: all rows have Found Count = Specified Count, no missing items
5. Verdict is COMPLETE
**Assertions:**
- [ ] Gap table covers all content types found in GDDs
- [ ] Each row shows Specified Count and Found Count
- [ ] No missing items when counts match
- [ ] Verdict is COMPLETE
- [ ] No files are written
---
### Case 2: Gaps Found — Enemy type missing from assets
**Fixture:**
- `design/gdd/enemies.md` specifies 3 enemy types: Grunt, Sniper, Boss
- `assets/art/characters/` contains: `grunt/`, `sniper/` only (Boss folder missing)
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD — finds 3 enemy types specified
2. Skill scans `assets/art/characters/` — finds only 2
3. Gap table row for enemies: Specified 3, Found 2, Missing: Boss
4. Verdict is GAPS FOUND
**Assertions:**
- [ ] Gap table row identifies "Boss" as the missing item by name
- [ ] Specified Count (3) and Found Count (2) are both shown
- [ ] Verdict is GAPS FOUND when any content item is missing
- [ ] Skill does not assume the asset will be added later — it flags it now
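The gap-table arithmetic in this case is a set difference between specified and found item names. A sketch, with hypothetical function names and a dict standing in for one table row:

```python
def gap_row(content_type: str, specified: set[str], found: set[str]) -> dict:
    """One gap-table row: Content Type, Specified Count, Found Count, Missing Items."""
    missing = sorted(specified - found)
    return {
        "type": content_type,
        "specified": len(specified),
        "found": len(specified & found),
        "missing": missing,
    }

def gap_verdict(rows: list[dict]) -> str:
    """Any missing item anywhere downgrades the verdict to GAPS FOUND."""
    return "GAPS FOUND" if any(r["missing"] for r in rows) else "COMPLETE"
```

With the fixture, the enemies row reports Specified 3, Found 2, Missing ["boss"], and the overall verdict is GAPS FOUND.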
---
### Case 3: No GDD Content Specs Found — Guidance given
**Fixture:**
- `design/gdd/` contains only `core-loop.md` which has no content inventory section
- No other GDDs exist with content specifications
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads all GDDs — finds no content inventory sections
2. Skill outputs: "No content specifications found in GDDs — run /design-system first to define content lists"
3. No gap table is produced
4. Verdict is GAPS FOUND (cannot confirm completeness without specs)
**Assertions:**
- [ ] Skill does not produce a gap table when no GDD content specs exist
- [ ] Output recommends running `/design-system`
- [ ] Verdict reflects inability to confirm completeness
---
### Case 4: Edge Case — Asset in wrong format for target platform
**Fixture:**
- `design/gdd/audio.md` specifies audio assets as OGG format
- `assets/audio/sfx/jump.wav` exists (WAV format, not OGG)
- `assets/audio/sfx/land.ogg` exists (correct format)
- `technical-preferences.md` specifies audio format: OGG
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDD audio spec and technical preferences for format requirements
2. Skill finds `jump.wav` — present but in wrong format
3. Gap table row for audio: Specified 2, Found 2 (by name), but `jump.wav` flagged as FORMAT ISSUE
4. Verdict is GAPS FOUND (format compliance is part of content completeness)
**Assertions:**
- [ ] Skill checks asset format against GDD or technical preferences when format is specified
- [ ] `jump.wav` is flagged as FORMAT ISSUE with expected format (OGG) noted
- [ ] Format issues are distinct from missing content in the gap table
- [ ] Verdict is GAPS FOUND when format issues exist
---
### Case 5: Gate Compliance — Read-only; no gate; gap table for human review
**Fixture:**
- GDDs specify 10 content items; 9 are found in assets; 1 is missing
- `review-mode.txt` contains `full`
**Input:** `/content-audit`
**Expected behavior:**
1. Skill reads GDDs and scans assets; produces gap table
2. No director gate is invoked regardless of review mode
3. Skill presents gap table to user as read-only output
4. Verdict is GAPS FOUND
5. Skill offers to write an audit report but does not write automatically
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Gap table is presented without auto-writing any file
- [ ] Optional report write is offered but not forced
- [ ] Skill does not modify any asset files
---
## Protocol Compliance
- [ ] Reads GDDs and asset directory before producing gap table
- [ ] Gap table shows Content Type, Specified Count, Found Count, Missing Items
- [ ] Does not write files without explicit user approval
- [ ] No director gates are invoked
- [ ] Verdict is one of: COMPLETE, GAPS FOUND, MISSING CRITICAL CONTENT
---
## Coverage Notes
- MISSING CRITICAL CONTENT verdict (vs. GAPS FOUND) is triggered when the
missing item is tagged as critical in the GDD; this is not explicitly tested
but follows the same detection path.
- The case where `assets/` directory does not exist is not tested; the skill
would produce a MISSING CRITICAL CONTENT verdict for all specified items.

# Skill Test Spec: /estimate
## Skill Summary
`/estimate` estimates task or story effort using a relative-size scale (S / M /
L / XL) based on story complexity, acceptance criteria count, and historical
sprint velocity from past sprint files. Estimates are advisory and are never
written automatically. No director gates are invoked. Verdicts are effort ranges,
not pass/fail — every run produces an estimate.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains size labels: S, M, L, XL (the "verdict" equivalents for this skill)
- [ ] Does NOT require "May I write" language (advisory output only)
- [ ] Has a next-step handoff (how to use the estimate in sprint planning)
---
## Director Gate Checks
None. Estimation is an advisory informational skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Clear story with known tech stack
**Fixture:**
- `production/epics/combat/story-hitbox-detection.md` exists with:
- 4 clear Acceptance Criteria
- ADR reference (Accepted status)
- No "unknown" or "TBD" language in story body
- `production/sprints/sprint-003.md` through `sprint-005.md` exist with velocity data
- Tech stack is GDScript (well-understood by team per sprint history)
**Input:** `/estimate production/epics/combat/story-hitbox-detection.md`
**Expected behavior:**
1. Skill reads the story file — assesses clarity, AC count, tech stack
2. Skill reads sprint history to determine average velocity
3. Skill outputs estimate: M (1-2 days) with reasoning
4. No files are written
**Assertions:**
- [ ] Estimate is M for a clear, well-scoped story with known tech
- [ ] Reasoning references AC count, tech stack familiarity, and velocity data
- [ ] Estimate is presented as a range (e.g., "1-2 days"), not a single point
- [ ] No files are written
---
### Case 2: High Uncertainty — Unknown system, no ADR yet
**Fixture:**
- `production/epics/online/story-lobby-matchmaking.md` exists with:
- 2 vague Acceptance Criteria (using "should" and "TBD")
- No ADR reference — matchmaking architecture not yet decided
- References new subsystem ("online/matchmaking") with no existing source files
**Input:** `/estimate production/epics/online/story-lobby-matchmaking.md`
**Expected behavior:**
1. Skill reads story — finds vague AC, no ADR, no existing source
2. Skill flags multiple uncertainty factors
3. Estimate is L-XL with an explicit risk note: "Estimate range is wide due to architectural unknowns"
4. Skill recommends creating an ADR before development begins
**Assertions:**
- [ ] Estimate is L or XL (not S or M) when significant unknowns exist
- [ ] Risk note explains the specific unknowns driving the wide range
- [ ] Output recommends resolving architectural questions first
- [ ] No files are written
---
### Case 3: No Sprint Velocity Data — Conservative defaults used
**Fixture:**
- Story file exists and is well-defined
- `production/sprints/` is empty — no historical sprints
**Input:** `/estimate production/epics/core/story-save-load.md`
**Expected behavior:**
1. Skill reads story — assesses complexity
2. Skill attempts to read sprint velocity data — finds none
3. Skill notes: "No sprint history found — using conservative defaults for velocity"
4. Estimate is produced using default assumptions (e.g., 1 story point = 1 day)
5. No files are written
**Assertions:**
- [ ] Skill does not error when no sprint history exists
- [ ] Output explicitly notes that conservative defaults are being used
- [ ] Estimate is still produced (not blocked by missing velocity)
- [ ] Conservative defaults produce a higher (not lower) estimate range
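
The conservative-default fallback in this case can be sketched as follows. The size-to-day bands and the 1.5x widening factor are illustrative assumptions, not values defined by the skill; the point is only that missing velocity data widens the estimate upward, never downward.

```python
# Hypothetical sketch of the Case 3 fallback. DEFAULT_BANDS and the
# conservative factor are assumed values for illustration only.
DEFAULT_BANDS = {"S": (1, 2), "M": (3, 5), "L": (6, 10), "XL": (11, 20)}

def estimate_range(size, velocity_history=None, conservative_factor=1.5):
    low, high = DEFAULT_BANDS[size]
    if not velocity_history:
        # No sprint data: keep the low bound, stretch the high bound so
        # the default estimate is higher, never lower.
        return (low, round(high * conservative_factor))
    return (low, high)

print(estimate_range("M"))             # widened: no history available
print(estimate_range("M", [8, 9, 7]))  # base band: history present
```
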
---
### Case 4: Multiple Stories — Each estimated individually plus sprint total
**Fixture:**
- User provides a sprint file: `production/sprints/sprint-007.md` with 4 stories
- Sprint history exists (3 previous sprints)
**Input:** `/estimate production/sprints/sprint-007.md`
**Expected behavior:**
1. Skill reads sprint file — identifies 4 stories
2. Skill estimates each story individually: S, M, M, L
3. Skill computes sprint total: approximately 6-8 story points
4. Skill presents per-story estimates followed by sprint total
5. No files are written
**Assertions:**
- [ ] Each story receives its own estimate label
- [ ] Sprint total is presented after individual estimates
- [ ] Total is a sum range derived from individual ranges
- [ ] Skill handles sprint files (not just single story files) as input
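
The "sum range derived from individual ranges" assertion amounts to interval addition. A minimal sketch, assuming illustrative day bands that do not reproduce the exact totals in the case above:

```python
# Hypothetical interval addition for Case 4: per-story ranges summed
# into a sprint total range. Band values are assumptions.
BANDS = {"S": (1, 2), "M": (3, 5), "L": (6, 10), "XL": (11, 20)}

def sprint_total(sizes):
    lows, highs = zip(*(BANDS[s] for s in sizes))
    return (sum(lows), sum(highs))

print(sprint_total(["S", "M", "M", "L"]))  # the four stories in Case 4
```
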
---
### Case 5: Gate Compliance — No gate; estimates are informational
**Fixture:**
- Story file exists with medium complexity
- `review-mode.txt` contains `full`
**Input:** `/estimate production/epics/core/story-item-pickup.md`
**Expected behavior:**
1. Skill reads story and sprint history; computes estimate
2. No director gate is invoked in any review mode
3. Estimate is presented as advisory output only
4. Skill notes: "Use this estimate in /sprint-plan when selecting stories for the next sprint"
**Assertions:**
- [ ] No director gate is invoked regardless of review mode
- [ ] Output is purely informational — no approval or write prompt
- [ ] Next-step recommendation references `/sprint-plan`
- [ ] Estimate does not change based on review mode
---
## Protocol Compliance
- [ ] Reads story file before estimating
- [ ] Reads sprint velocity history when available
- [ ] Produces effort range (S/M/L/XL), not a single number
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Always produces an estimate (never blocked by missing data; uses defaults instead)
---
## Coverage Notes
- The skill does not produce PASS/FAIL verdicts; the "verdict" here is the
effort range itself. Test assertions focus on the accuracy of the range
and the quality of the reasoning, not a binary outcome.
- Team-specific velocity calibration (what "M" means for this team) is an
implementation detail not tested here; it is configured via sprint history.

# Skill Test Spec: /perf-profile
## Skill Summary
`/perf-profile` is a structured performance profiling workflow that identifies
bottlenecks and recommends optimizations. If profiler data or performance logs
are provided, it analyzes them directly. If not, it guides the user through a
manual profiling checklist. No director gates are invoked. The skill asks
"May I write to `production/qa/perf-[date].md`?" before persisting a report.
Verdicts: WITHIN BUDGET, CONCERNS, or OVER BUDGET.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: WITHIN BUDGET, CONCERNS, OVER BUDGET
- [ ] Contains "May I write" language (skill writes perf report)
- [ ] Has a next-step handoff (what to do after performance findings are reviewed)
---
## Director Gate Checks
None. Performance profiling is an advisory analysis skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Frame data provided, draw call spike found
**Fixture:**
- User provides `production/qa/profiler-export-2026-03-15.json` with frame time data
- Data shows: average frame time 14ms (within 16.6ms budget), but frames 42-48 spike to 28ms
- Spike correlates with a scene with 450 draw calls (budget: 200)
**Input:** `/perf-profile production/qa/profiler-export-2026-03-15.json`
**Expected behavior:**
1. Skill reads profiler data
2. Skill identifies average frame time is within budget
3. Skill identifies draw call spike on frames 42-48 (450 calls vs 200 budget)
4. Verdict is CONCERNS (average OK, but spikes indicate an issue)
5. Skill recommends batching or culling for the identified scene
6. Skill asks "May I write to `production/qa/perf-2026-04-06.md`?"
**Assertions:**
- [ ] Spike frames are identified by frame number
- [ ] Draw call count and budget are compared explicitly
- [ ] Verdict is CONCERNS when spikes exceed budget even if average is OK
- [ ] At least one specific optimization recommendation is given
- [ ] "May I write" prompt appears before writing report
---
### Case 2: No Profiler Data — Manual checklist output
**Fixture:**
- User runs `/perf-profile` with no arguments
- No profiler data files exist in `production/qa/`
**Input:** `/perf-profile`
**Expected behavior:**
1. Skill finds no profiler data
2. Skill outputs a manual profiling checklist for the user to work through:
- Enable Godot profiler or target engine's profiler
- Record a 60-second play session
- Export frame time data
- Note any dropped frames or hitches
3. Skill asks user to provide data once collected before running analysis
**Assertions:**
- [ ] Skill does not crash or emit a verdict when no data is provided
- [ ] Manual profiling checklist is output (actionable steps, not just an error)
- [ ] No verdict is emitted (there is nothing to assess yet)
- [ ] No files are written
---
### Case 3: Over Budget — Frame budget exceeded for target platform
**Fixture:**
- Profiler data shows consistent 22ms frame times (target: 16.6ms for 60fps)
- All frames exceed budget; no single spike — systemic issue
- `technical-preferences.md` specifies target platform: PC, 60fps
**Input:** `/perf-profile production/qa/profiler-export-2026-03-20.json`
**Expected behavior:**
1. Skill reads profiler data and technical preferences for performance budget
2. All frames are over the 16.6ms budget
3. Verdict is OVER BUDGET
4. Skill outputs a prioritized optimization list (e.g., LOD system, shader complexity, physics tick rate)
5. Skill asks "May I write" before writing report
**Assertions:**
- [ ] Verdict is OVER BUDGET when all or most frames exceed budget
- [ ] Target frame budget is read from `technical-preferences.md` (not hardcoded)
- [ ] Optimization priority list is provided, not just the raw verdict
- [ ] "May I write" prompt appears before report write
---
### Case 4: Previous Perf Report Exists — Delta comparison
**Fixture:**
- `production/qa/perf-2026-03-28.md` exists with prior results (avg 15ms, max 19ms)
- New profiler export shows: avg 13ms, max 17ms
- Both reports are for the same scene
**Input:** `/perf-profile production/qa/profiler-export-2026-04-05.json`
**Expected behavior:**
1. Skill reads new profiler data
2. Skill detects prior report for the same scene
3. Skill computes deltas: avg improved 2ms, max improved 2ms
4. Skill presents regression check: no regressions detected
5. Verdict is WITHIN BUDGET; report notes improvement since last profile
**Assertions:**
- [ ] Skill checks `production/qa/` for prior perf reports before writing
- [ ] Delta comparison is shown (prior vs. current for key metrics)
- [ ] Verdict is WITHIN BUDGET when current metrics are within budget
- [ ] Improvement trend is noted positively in the report
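
The delta comparison in this case can be sketched as below. The metric keys are assumptions for illustration; positive deltas (more milliseconds than the prior report) count as regressions.

```python
# Hypothetical delta computation for Case 4's regression check.
def perf_delta(prior, current):
    deltas = {k: round(current[k] - prior[k], 2) for k in prior}
    regressions = [k for k, d in deltas.items() if d > 0]
    return deltas, regressions

deltas, regressions = perf_delta(
    {"avg_ms": 15.0, "max_ms": 19.0},  # prior report values
    {"avg_ms": 13.0, "max_ms": 17.0},  # current export values
)
print(deltas, regressions)
```
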
---
### Case 5: Gate Compliance — No gate; performance-analyst separate
**Fixture:**
- Profiler data shows CONCERNS-level findings (some spikes)
- `review-mode.txt` contains `full`
**Input:** `/perf-profile production/qa/profiler-export-2026-04-01.json`
**Expected behavior:**
1. Skill analyzes profiler data; verdict is CONCERNS
2. No director gate is invoked regardless of review mode
3. Output notes: "For in-depth analysis, consider running `/perf-profile` with the performance-analyst agent"
4. Skill asks "May I write" and writes report on user approval
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Performance-analyst consultation is suggested (not mandated)
- [ ] "May I write" prompt appears before report write
- [ ] Verdict is CONCERNS for spike-based findings
---
## Protocol Compliance
- [ ] Reads profiler data when provided; outputs checklist when not
- [ ] Reads `technical-preferences.md` for target platform frame budget
- [ ] Checks for prior perf reports to enable delta comparison
- [ ] Always asks "May I write" before writing report
- [ ] No director gates are invoked
- [ ] Verdict is one of: WITHIN BUDGET, CONCERNS, OVER BUDGET
---
## Coverage Notes
- Platform-specific profiling workflows (console, mobile) are not tested here;
the checklist output in Case 2 would be platform-specific in practice.
- The delta comparison in Case 4 assumes reports cover the same scene; cross-scene
comparisons are not explicitly handled.

# Skill Test Spec: /scope-check
## Skill Summary
`/scope-check` is a Haiku-tier read-only skill that analyzes a feature, sprint,
or story for scope creep risk. It reads sprint and story files and compares them
against the active milestone goals. It is designed for fast, low-cost checks
before or during planning. No director gates are invoked. No files are written.
Verdicts: ON SCOPE, CONCERNS, or SCOPE CREEP DETECTED.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
- [ ] Does NOT require "May I write" language (read-only skill)
- [ ] Has a next-step handoff (what to do based on verdict)
---
## Director Gate Checks
None. Scope check is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Sprint stories align with milestone goals
**Fixture:**
- `production/milestones/milestone-03.md` lists 3 goals: combat system, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories, all tagged to one of the 3 goals
- `production/session-state/active.md` references milestone-03 as the active milestone
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active milestone goals from milestone-03
2. Skill reads sprint-006 stories and checks each against milestone goals
3. All 5 stories map to one of the 3 goals
4. Skill outputs a mapping table: story → milestone goal
5. Verdict is ON SCOPE
**Assertions:**
- [ ] Each story is mapped to a milestone goal in the output
- [ ] Verdict is ON SCOPE when all stories map to milestone goals
- [ ] No files are written
- [ ] Skill does not modify sprint or milestone files
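
The story-to-goal mapping and verdict logic above can be sketched as follows. The goal-tag set and the story record shape are assumptions; the real skill parses markdown files and would normalize tag casing.

```python
# Hypothetical sketch of Case 1's mapping: each story's goal tag is
# checked against the active milestone's goal list.
MILESTONE_GOALS = {"combat system", "enemy ai", "level loading"}

def map_stories(stories):
    mapping = {s["title"]: (s["goal"] if s["goal"] in MILESTONE_GOALS else None)
               for s in stories}
    unmapped = [title for title, goal in mapping.items() if goal is None]
    verdict = "ON SCOPE" if not unmapped else "SCOPE CREEP DETECTED"
    return mapping, verdict

stories = [
    {"title": "Parry timing window", "goal": "combat system"},
    {"title": "Grunt patrol AI", "goal": "enemy ai"},
]
mapping, verdict = map_stories(stories)
print(verdict)
```
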
---
### Case 2: Scope Creep Detected — Stories introducing systems not in milestone
**Fixture:**
- `production/milestones/milestone-03.md` goals: combat, enemy AI, level loading
- `production/sprints/sprint-006.md` contains 5 stories:
- 3 stories map to milestone goals
- 2 stories reference "online leaderboard" and "achievement system" (not in milestone-03)
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone goals and sprint stories
2. Skill identifies 2 stories with no matching milestone goal
3. Skill names the out-of-scope stories: "Online Leaderboard Feature", "Achievement System Setup"
4. Verdict is SCOPE CREEP DETECTED
**Assertions:**
- [ ] Out-of-scope stories are named explicitly in the output
- [ ] Verdict is SCOPE CREEP DETECTED when any story has no milestone goal match
- [ ] Skill does not automatically remove the stories — findings are advisory
- [ ] Output recommends deferring the out-of-scope stories to a later milestone
---
### Case 3: No Milestone Defined — CONCERNS; scope cannot be validated
**Fixture:**
- `production/session-state/active.md` has no milestone reference
- `production/milestones/` directory exists but is empty
- `production/sprints/sprint-006.md` has 4 stories
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads active.md — finds no milestone reference
2. Skill checks `production/milestones/` — no milestone files found
3. Skill outputs: "No active milestone defined — scope cannot be validated"
4. Verdict is CONCERNS
**Assertions:**
- [ ] Skill does not error when no milestone is defined
- [ ] Output explicitly states that scope validation requires a milestone reference
- [ ] Verdict is CONCERNS (not ON SCOPE or SCOPE CREEP DETECTED without data)
- [ ] Output suggests running `/milestone-review` or creating a milestone
---
### Case 4: Single Story Check — Evaluated against its parent epic
**Fixture:**
- User targets a single story: `production/epics/combat/story-parry-timing.md`
- Story references parent epic: `epic-combat.md`
- `production/epics/combat/epic-combat.md` has scope: "melee combat mechanics"
- Story title: "Implement parry timing window" — matches epic scope
**Input:** `/scope-check production/epics/combat/story-parry-timing.md`
**Expected behavior:**
1. Skill reads the specified story file
2. Skill reads the parent epic to get scope definition
3. Skill evaluates story against epic scope — "parry timing" matches "melee combat"
4. Verdict is ON SCOPE
**Assertions:**
- [ ] Single-file argument is accepted (story path, not sprint)
- [ ] Skill reads the parent epic referenced in the story file
- [ ] Story is evaluated against epic scope (not milestone scope) in single-story mode
- [ ] Verdict is ON SCOPE when story matches epic scope
---
### Case 5: Gate Compliance — No gate; PR may be consulted separately
**Fixture:**
- Sprint has 2 SCOPE CREEP stories and 3 ON SCOPE stories
- `review-mode.txt` contains `full`
**Input:** `/scope-check`
**Expected behavior:**
1. Skill reads milestone and sprint; identifies 2 scope creep items
2. No director gate is invoked regardless of review mode
3. Skill presents findings with SCOPE CREEP DETECTED verdict
4. Output notes: "Consider raising scope concerns with the Producer before sprint begins"
5. Skill ends without writing any files
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Producer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is SCOPE CREEP DETECTED
---
## Protocol Compliance
- [ ] Reads milestone goals and sprint/story files before analysis
- [ ] Maps each story to a milestone goal (or flags as unmapped)
- [ ] Does not write any files
- [ ] No director gates are invoked
- [ ] Runs on Haiku model tier (fast, low-cost)
- [ ] Verdict is one of: ON SCOPE, CONCERNS, SCOPE CREEP DETECTED
---
## Coverage Notes
- The case where the sprint file itself does not exist is not tested; the
skill would output a CONCERNS verdict with a message about missing sprint data.
- Partial scope overlap (story touches a milestone goal but also introduces
new scope) is not explicitly tested; implementation may classify this as
CONCERNS rather than SCOPE CREEP DETECTED.

# Skill Test Spec: /security-audit
## Skill Summary
`/security-audit` audits the game for security risks including save data
integrity, network communication, anti-cheat exposure, and data privacy. It
reads source files in `src/` for security patterns and checks whether sensitive
data is handled correctly. No director gates are invoked. The skill does not
write files (findings report only). Verdicts: SECURE, CONCERNS, or
VULNERABILITIES FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: SECURE, CONCERNS, VULNERABILITIES FOUND
- [ ] Does NOT require "May I write" language (read-only; findings report only)
- [ ] Has a next-step handoff (what to do with findings)
---
## Director Gate Checks
None. Security audit is a read-only advisory skill; no gates are invoked.
---
## Test Cases
### Case 1: Happy Path — Save data encrypted, no hardcoded credentials
**Fixture:**
- `src/core/save_system.gd` uses `Crypto` class to encrypt save data before writing
- No hardcoded API keys, passwords, or credentials in any `src/` file
- No version numbers or internal build IDs exposed in client-facing output
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` for security patterns: encryption usage, hardcoded credentials, exposed internals
2. All checks pass: save data encrypted, no credentials found, no exposed internals
3. Findings report shows all checks PASS
4. Verdict is SECURE
**Assertions:**
- [ ] Skill checks save data handling for encryption usage
- [ ] Skill scans for hardcoded credentials (API keys, passwords, tokens)
- [ ] Skill checks for version/build numbers exposed to players
- [ ] All checks shown in findings report
- [ ] Verdict is SECURE when all checks pass
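
The hardcoded-credential check can be sketched as a pattern scan. The single regex below is an assumption that only illustrates the shape of the check; real detection would cover many more patterns and file types.

```python
# Hypothetical credential scan for the Case 1 check.
import re

CREDENTIAL_RE = re.compile(
    r'(?i)(api[_-]?key|password|secret|token)\s*[:=]\s*["\'][^"\']+["\']'
)

def scan_for_credentials(source_text):
    """Return 1-based line numbers containing credential-like assignments."""
    return [lineno for lineno, line in enumerate(source_text.splitlines(), 1)
            if CREDENTIAL_RE.search(line)]

clean = 'var speed = 300\nvar jump_force = 550\n'
leaky = 'var api_key = "sk-123456"\n'
print(scan_for_credentials(clean), scan_for_credentials(leaky))
```
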
---
### Case 2: Vulnerabilities Found — Unencrypted save data and exposed version
**Fixture:**
- `src/core/save_system.gd` writes save data as plain JSON (no encryption)
- `src/ui/debug_overlay.gd` contains: `label.text = "Build: " + ProjectSettings.get("application/config/version")`
(exposes internal build version to player)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/` — finds unencrypted save write in `save_system.gd`
2. Skill finds exposed version string in `debug_overlay.gd`
3. Both findings are flagged as VULNERABILITIES
4. Verdict is VULNERABILITIES FOUND
5. Skill provides remediation recommendations for each vulnerability
**Assertions:**
- [ ] Unencrypted save data is flagged as a vulnerability with file and approximate line
- [ ] Exposed version string is flagged as a vulnerability
- [ ] Remediation suggestion is given for each vulnerability
- [ ] Verdict is VULNERABILITIES FOUND when any vulnerability is detected
- [ ] No files are written or modified
---
### Case 3: Online Features Without Authentication — CONCERNS
**Fixture:**
- `src/networking/lobby.gd` exists with functions: `join_lobby()`, `send_chat()`
- No authentication check is found before `send_chat()` — players can call it without being verified
- Game has online multiplayer features (inferred from file presence)
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans `src/networking/` — detects online feature code
2. Skill checks for authentication guard before network calls — finds none on `send_chat()`
3. Flags: "Online feature without authentication check — CONCERNS"
4. Verdict is CONCERNS (not VULNERABILITIES FOUND, as this is a missing control, not an exploit)
**Assertions:**
- [ ] Skill detects online features by scanning for networking source files
- [ ] Missing authentication checks before network operations are flagged
- [ ] Verdict is CONCERNS (advisory severity) for missing authentication guards
- [ ] Output recommends adding authentication before network calls
---
### Case 4: Edge Case — No Source Files to Analyze
**Fixture:**
- `src/` directory does not exist or is completely empty
**Input:** `/security-audit`
**Expected behavior:**
1. Skill attempts to scan `src/` — no files found
2. Skill outputs an error: "No source files found in `src/` — nothing to audit"
3. No findings report is generated
4. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when `src/` is empty or absent
- [ ] Output clearly states that no source files were found
- [ ] No verdict is emitted (there is nothing to assess)
- [ ] Skill suggests verifying the `src/` directory path
---
### Case 5: Gate Compliance — No gate; security-engineer invoked separately
**Fixture:**
- Source files exist; 1 CONCERNS-level finding detected (debug logging enabled in release build)
- `review-mode.txt` contains `full`
**Input:** `/security-audit`
**Expected behavior:**
1. Skill scans source; finds debug logging active in release path
2. No director gate is invoked regardless of review mode
3. Verdict is CONCERNS
4. Output notes: "For formal security review, consider engaging a security-engineer agent"
5. Findings are presented as a read-only report; no files written
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Security-engineer consultation is suggested (not mandated)
- [ ] No files are written
- [ ] Verdict is CONCERNS for advisory-level security findings
---
## Protocol Compliance
- [ ] Reads source files in `src/` before auditing
- [ ] Checks save data encryption, hardcoded credentials, exposed internals, auth guards
- [ ] Provides remediation recommendations for each finding
- [ ] Does not write any files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: SECURE, CONCERNS, VULNERABILITIES FOUND
---
## Coverage Notes
- Anti-cheat analysis (client-side value validation, server authority) is not
explicitly tested here; it follows the CONCERNS or VULNERABILITIES pattern
depending on severity.
- Data privacy compliance (GDPR, COPPA) is out of scope for this spec; those
require legal review beyond code scanning.

# Skill Test Spec: /tech-debt
## Skill Summary
`/tech-debt` tracks, categorizes, and prioritizes technical debt across the
codebase. It reads `docs/tech-debt-register.md` for the existing debt register
and scans source files in `src/` for inline `TODO` and `FIXME` comments. It
merges and sorts items by severity. No director gates are invoked. The skill
asks "May I write to `docs/tech-debt-register.md`?" before updating. Verdicts:
REGISTER UPDATED or NO NEW DEBT FOUND.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: REGISTER UPDATED, NO NEW DEBT FOUND
- [ ] Contains "May I write" language (skill writes to debt register)
- [ ] Has a next-step handoff (what to do after register is updated)
---
## Director Gate Checks
None. Tech debt tracking is an internal codebase analysis skill; no gates are
invoked.
---
## Test Cases
### Case 1: Happy Path — Inline TODOs plus existing register items merged
**Fixture:**
- `docs/tech-debt-register.md` exists with 2 items (LOW and MEDIUM severity)
- `src/gameplay/combat.gd` has 2 `# TODO` comments and 1 `# FIXME` comment
- `src/ui/hud.gd` has 0 inline debt comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads `docs/tech-debt-register.md` — finds 2 existing items
2. Skill scans `src/` — finds 3 inline comments (2 TODOs, 1 FIXME)
3. Skill checks whether inline comments already exist in the register (deduplication)
4. Skill presents combined list sorted by severity (FIXME before TODO by default)
5. Skill asks "May I write to `docs/tech-debt-register.md`?"
6. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] Inline comments are found by scanning `src/` recursively
- [ ] Existing register items are not duplicated
- [ ] Combined list is sorted by severity
- [ ] "May I write" prompt appears before any write
- [ ] Verdict is REGISTER UPDATED
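
The scan-merge-deduplicate-sort pipeline above can be sketched as follows. The comment regex, the dedup key (file plus comment text), and the severity ordering are assumptions for illustration.

```python
# Hypothetical sketch of Case 1: collect inline TODO/FIXME comments,
# dedupe against existing register items, sort by severity.
import re

COMMENT_RE = re.compile(r'#\s*(TODO|FIXME)\b[:\s]*(.*)')
SEVERITY = {"FIXME": 0, "TODO": 1}  # lower sorts first

def collect_debt(files, register):
    seen = {(item["file"], item["text"]) for item in register}
    items = list(register)
    for path, text in files.items():
        for line in text.splitlines():
            m = COMMENT_RE.search(line)
            if m and (path, m.group(2).strip()) not in seen:
                items.append({"file": path, "kind": m.group(1),
                              "text": m.group(2).strip()})
    return sorted(items, key=lambda i: SEVERITY.get(i.get("kind", "TODO"), 1))

register = [{"file": "src/ui/hud.gd", "kind": "TODO", "text": "extract widget"}]
files = {"src/gameplay/combat.gd": "# FIXME: hitbox off by one\n# TODO: cache query\n"}
print([i["kind"] for i in collect_debt(files, register)])
```
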
---
### Case 2: Register Doesn't Exist — Offered to create it
**Fixture:**
- `docs/tech-debt-register.md` does NOT exist
- `src/` contains 4 inline TODO/FIXME comments
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill attempts to read `docs/tech-debt-register.md` — not found
2. Skill informs user: "No tech-debt-register.md found"
3. Skill offers to create the register with the inline items it found
4. Skill asks "May I write to `docs/tech-debt-register.md`?" (create)
5. User approves; register created with 4 items; verdict REGISTER UPDATED
**Assertions:**
- [ ] Skill does not crash when register file is absent
- [ ] User is offered register creation (not silently skipping)
- [ ] "May I write" prompt reflects file creation (not update)
- [ ] Verdict is REGISTER UPDATED after creation
---
### Case 3: Resolved Item Detected — Marked resolved in register
**Fixture:**
- `docs/tech-debt-register.md` has 3 items; one references `src/gameplay/legacy_input.gd`
- `src/gameplay/legacy_input.gd` has been deleted (refactored away)
- The referenced TODO comment no longer exists in source
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill reads register — finds 3 items
2. Skill scans `src/` — does not find the source location referenced by item 2
3. Skill flags item 2 as RESOLVED (source is gone)
4. Skill presents the resolved item to user for confirmation
5. On approval, register is updated with item 2 marked `Status: Resolved`
**Assertions:**
- [ ] Skill checks whether each register item's source reference still exists
- [ ] Missing source locations result in items being flagged as RESOLVED
- [ ] User confirms before resolved items are written
- [ ] RESOLVED items are kept in the register (not deleted) for audit history
---
### Case 4: Edge Case — CRITICAL debt item surfaces prominently
**Fixture:**
- `src/core/network_sync.gd` has a comment: `# FIXME(CRITICAL): race condition in sync buffer — can corrupt save data`
- `docs/tech-debt-register.md` exists with 5 lower-severity items
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and finds the CRITICAL-tagged FIXME
2. Skill presents the CRITICAL item at the top of the output — before the full table
3. Skill asks user to acknowledge the critical item before proceeding
4. After acknowledgment, skill presents full debt table and asks to write
5. Register is updated with CRITICAL item at top; verdict REGISTER UPDATED
**Assertions:**
- [ ] CRITICAL items appear at the top of the output, not buried in the table
- [ ] Skill surfaces CRITICAL items before asking to write
- [ ] User acknowledgment of the CRITICAL item is requested
- [ ] CRITICAL severity is preserved in the written register entry
---
### Case 5: Gate Compliance — No gate; register updated only with approval
**Fixture:**
- Inline scan finds 2 new TODOs; register has 3 existing items
- `review-mode.txt` contains `full`
**Input:** `/tech-debt`
**Expected behavior:**
1. Skill scans source and reads register; compiles combined debt list
2. No director gate is invoked regardless of review mode
3. Skill presents sorted debt table to user
4. Skill asks "May I write to `docs/tech-debt-register.md`?"
5. User approves; register updated; verdict REGISTER UPDATED
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Debt table is presented before any write prompt
- [ ] "May I write" prompt appears before file update
- [ ] Write only occurs with explicit user approval
---
## Protocol Compliance
- [ ] Reads `docs/tech-debt-register.md` and scans `src/` before compiling
- [ ] Deduplicates inline comments against existing register items
- [ ] Sorts combined list by severity
- [ ] Always asks "May I write" before updating register
- [ ] No director gates are invoked
- [ ] Verdict is REGISTER UPDATED or NO NEW DEBT FOUND
---
## Coverage Notes
- The case where `src/` is empty or absent is not tested; behavior follows
the NO NEW DEBT FOUND path for the inline scan, but register items would
still be read and presented.
- TODO comments without severity tags are treated as LOW severity by default;
this classification detail is an implementation concern, not tested here.

# Skill Test Spec: /test-evidence-review
## Skill Summary
`/test-evidence-review` performs a quality review of test files in `tests/`,
checking test naming conventions, determinism, isolation, and absence of
hardcoded magic numbers — all against the project's test standards defined in
`coding-standards.md`. Findings may be flagged for qa-lead review. No director
gates are invoked. The skill does not write without user approval. Verdicts:
PASS, WARNINGS, or FAIL.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: PASS, WARNINGS, FAIL
- [ ] Does NOT require "May I write" language (read-only; write is optional flagging report)
- [ ] Has a next-step handoff (what to do after findings are reviewed)
---
## Director Gate Checks
None. Test evidence review is an advisory quality skill; QL-TEST-COVERAGE gate
is a separate skill invocation and is NOT triggered here.
---
## Test Cases
### Case 1: Happy Path — Tests follow all standards
**Fixture:**
- `tests/unit/combat/health_system_take_damage_test.gd` exists with:
- Naming: `test_health_system_take_damage_reduces_health()` (follows `test_[system]_[scenario]_[expected]`)
- Arrange/Act/Assert structure present
- No `sleep()`, `await` with time values, or random seeds
- No calls to external APIs or file I/O
- No inline magic numbers (uses constants from `tests/unit/combat/fixtures/`)
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reads test standards from `coding-standards.md`
2. Skill reads the test file; checks all 5 standards
3. All checks pass: naming, structure, determinism, isolation, no hardcoded data
4. Verdict is PASS
**Assertions:**
- [ ] Each of the 5 test standards is checked and reported
- [ ] All checks show PASS when standards are met
- [ ] Verdict is PASS
- [ ] No files are written
---
### Case 2: Fail — Timing dependency detected
**Fixture:**
- `tests/unit/ui/hud_update_test.gd` contains:
```gdscript
await get_tree().create_timer(1.0).timeout
assert_eq(label.text, "Ready")
```
- Real-time wait of 1 second used instead of mock or signal-based assertion
**Input:** `/test-evidence-review tests/unit/ui/hud_update_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects real-time wait (`create_timer(1.0)`) — non-deterministic timing dependency
3. Skill flags this as a FAIL-level finding
4. Verdict is FAIL
5. Skill recommends replacing the timer with a signal-based assertion or mock
**Assertions:**
- [ ] Real-time wait usage is detected as a non-deterministic timing dependency
- [ ] Finding is classified as FAIL severity (blocking — violates determinism standard)
- [ ] Verdict is FAIL
- [ ] Remediation suggestion references signal-based or mock-based approach
- [ ] Skill does not edit the test file
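The timing-dependency detection in this case can be sketched as a line-level pattern scan. The patterns below (`create_timer(`, `OS.delay_msec(`) are illustrative assumptions about what the reviewer flags, not the skill's actual rule set.

```python
import re

# Illustrative patterns for non-deterministic timing in GDScript tests.
# The exact rule set the skill applies is an assumption for this sketch.
TIMING_PATTERNS = [
    re.compile(r"create_timer\(\s*\d"),  # real-time waits: create_timer(1.0)
    re.compile(r"\bOS\.delay_msec\("),   # blocking delays
]

def find_timing_dependencies(source: str) -> list[int]:
    """Return 1-based line numbers containing a timing-dependency pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in TIMING_PATTERNS):
            hits.append(lineno)
    return hits

# The fixture's flaky HUD test: the create_timer wait is on line 2.
test_source = """func test_hud_update():
    await get_tree().create_timer(1.0).timeout
    assert_eq(label.text, "Ready")
"""
print(find_timing_dependencies(test_source))  # [2]
```

A real reviewer would also need to exclude legitimate uses (e.g., a timer inside a helper that is itself mocked), which this sketch does not attempt.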
---
### Case 3: Fail — Test calls external API directly
**Fixture:**
- `tests/unit/networking/auth_test.gd` contains:
```gdscript
var result = HTTPRequest.new().request("https://api.example.com/auth")
```
- Direct HTTP call to external API without a mock
**Input:** `/test-evidence-review tests/unit/networking/auth_test.gd`
**Expected behavior:**
1. Skill reads the test file
2. Skill detects direct external API call (HTTPRequest to live URL)
3. Skill flags this as a FAIL-level finding — violates isolation standard
4. Verdict is FAIL
5. Skill recommends injecting a mock HTTP client
**Assertions:**
- [ ] Direct external API call is detected and flagged
- [ ] Finding is classified as FAIL severity (violates isolation standard)
- [ ] Verdict is FAIL
- [ ] Remediation references dependency injection with a mock HTTP client
- [ ] Skill does not modify the test file
---
### Case 4: Edge Case — No Test Files Found
**Fixture:**
- User calls `/test-evidence-review tests/unit/audio/`
- `tests/unit/audio/` directory does not exist
**Input:** `/test-evidence-review tests/unit/audio/`
**Expected behavior:**
1. Skill attempts to read files in `tests/unit/audio/` — not found
2. Skill outputs: "No test files found at `tests/unit/audio/` — run `/test-setup` to scaffold test directories"
3. No verdict is emitted
**Assertions:**
- [ ] Skill does not crash when path does not exist
- [ ] Output names the attempted path in the message
- [ ] Output recommends `/test-setup` for scaffolding
- [ ] No verdict is emitted when there is nothing to review
---
### Case 5: Gate Compliance — No gate; QL-TEST-COVERAGE is a separate skill
**Fixture:**
- Test file has 1 WARNINGS-level finding (magic number in a non-boundary test)
- `review-mode.txt` contains `full`
**Input:** `/test-evidence-review tests/unit/combat/`
**Expected behavior:**
1. Skill reviews tests; finds 1 WARNINGS-level finding
2. No director gate is invoked (QL-TEST-COVERAGE is invoked separately, not here)
3. Verdict is WARNINGS
4. Output notes: "For full test coverage gate, run `/gate-check` which invokes QL-TEST-COVERAGE"
5. Skill offers optional report write; asks "May I write" if user opts in
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] Output distinguishes this skill from the QL-TEST-COVERAGE gate invocation
- [ ] Optional report requires "May I write" before writing
- [ ] Verdict is WARNINGS for advisory-level test quality issues
---
## Protocol Compliance
- [ ] Reads `coding-standards.md` test standards before reviewing test files
- [ ] Checks naming, Arrange/Act/Assert structure, determinism, isolation, no hardcoded data
- [ ] Does not edit any test files (read-only skill)
- [ ] No director gates are invoked
- [ ] Verdict is one of: PASS, WARNINGS, FAIL
---
## Coverage Notes
- Batch review of all test files in `tests/` is not explicitly tested; the skill
  is assumed to apply the same checks file by file and aggregate the verdicts.
- The QL-TEST-COVERAGE director gate (which checks test coverage percentage) is
a separate concern and is intentionally NOT invoked by this skill.
# Skill Test Spec: /test-flakiness
## Skill Summary
`/test-flakiness` detects non-deterministic tests by analyzing test history logs
(if available) or scanning test source code for common flakiness patterns (random
numbers without seeds, real-time waits, external I/O). No director gates are
invoked. The skill does not write without user approval. Verdicts: NO FLAKINESS,
SUSPECT TESTS FOUND, or CONFIRMED FLAKY.
---
## Static Assertions (Structural)
Verified automatically by `/skill-test static` — no fixture needed.
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
- [ ] Does NOT require "May I write" language (read-only; optional report requires approval)
- [ ] Has a next-step handoff (what to do after flakiness findings)
---
## Director Gate Checks
None. Flakiness detection is an advisory quality skill for the QA lead; no gates
are invoked.
---
## Test Cases
### Case 1: Happy Path — Clean test history, no flakiness
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- All tests pass consistently across all 10 runs (100% pass rate per test)
- No test has a failure pattern
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs from `production/qa/test-history/`
2. Skill computes per-test pass rate across 10 runs
3. All tests pass all 10 runs — no inconsistency detected
4. Verdict is NO FLAKINESS
**Assertions:**
- [ ] Skill reads test history logs when available
- [ ] Per-test pass rate is computed across all available runs
- [ ] Verdict is NO FLAKINESS when all tests pass consistently
- [ ] No files are written
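The per-test pass-rate computation in step 2 can be sketched as below. The history format (each run as a list of `(test_name, passed)` pairs) is an assumption for illustration; real history logs would need parsing first.

```python
from collections import defaultdict

def pass_rates(runs: list[list[tuple[str, bool]]]) -> dict[str, float]:
    """Compute per-test pass rate across all runs.

    `runs` is a list of test runs; each run is a list of
    (test_name, passed) pairs. The format is illustrative.
    """
    passes: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for run in runs:
        for name, passed in run:
            totals[name] += 1
            if passed:
                passes[name] += 1
    return {name: passes[name] / totals[name] for name in totals}

# Ten clean runs: every test passes every time, so every rate is 1.0.
history = [[("test_a", True), ("test_b", True)] for _ in range(10)]
print(pass_rates(history))  # {'test_a': 1.0, 'test_b': 1.0}
```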
---
### Case 2: Suspect Tests Found — Test fails intermittently in history
**Fixture:**
- `production/qa/test-history/` contains logs for 10 test runs
- `test_combat_damage_applies_crit_multiplier` passes 7 times, fails 3 times
- Failure messages differ (sometimes timeout, sometimes wrong value)
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill reads test history logs — computes pass rates
2. `test_combat_damage_applies_crit_multiplier` has 70% pass rate (threshold: 95%)
3. Skill flags it as SUSPECT with pass rate (7/10) and failure pattern noted
4. Verdict is SUSPECT TESTS FOUND
5. Skill recommends investigating the test for timing or state dependencies
**Assertions:**
- [ ] Tests below the pass-rate threshold are flagged by name
- [ ] Pass rate (fraction and percentage) is shown for each suspect test
- [ ] Failure pattern (e.g., inconsistent error messages) is noted if detectable
- [ ] Verdict is SUSPECT TESTS FOUND
- [ ] Skill recommends investigation steps
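Classification against the pass-rate threshold might look like the sketch below; the 95% value is the suggested threshold from the Coverage Notes, not a verified implementation detail.

```python
SUSPECT_THRESHOLD = 0.95  # suggested value; the implementation may differ

def classify_suspects(rates: dict[str, float]) -> list[tuple[str, float]]:
    """Return (test_name, pass_rate) pairs for tests below the threshold."""
    return sorted(
        (name, rate) for name, rate in rates.items() if rate < SUSPECT_THRESHOLD
    )

# Rates matching this case's fixture: one test at 7/10, one stable.
rates = {
    "test_combat_damage_applies_crit_multiplier": 0.7,
    "test_health_regen": 1.0,
}
print(classify_suspects(rates))
# [('test_combat_damage_applies_crit_multiplier', 0.7)]
```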
---
### Case 3: Source Pattern — Random number used without seed
**Fixture:**
- No test history logs exist
- `tests/unit/loot/loot_drop_test.gd` contains:
```gdscript
var roll = randf() # unseeded random — non-deterministic
assert_gt(roll, 0.5, "Loot should drop above 50%")
```
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill finds no test history logs
2. Skill falls back to source code analysis
3. Skill detects `randf()` call without a preceding `seed()` call
4. Skill flags the test as FLAKINESS RISK (source pattern, not confirmed)
5. Verdict is SUSPECT TESTS FOUND (pattern detected, not confirmed by history)
6. Skill recommends seeding random before the call or mocking the random function
**Assertions:**
- [ ] Source code analysis is used as fallback when no history logs exist
- [ ] Unseeded random number usage is detected as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND (not CONFIRMED FLAKY — no history to confirm)
- [ ] Remediation recommends seeding or mocking
---
### Case 4: No Test History — Source-only analysis with common patterns
**Fixture:**
- `production/qa/test-history/` does not exist
- `tests/` contains 15 test files
- Scan finds 2 tests using `OS.get_ticks_msec()` for timing assertions
- No other flakiness patterns found
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill checks for test history — not found
2. Skill notes: "No test history available — analyzing source code for flakiness patterns only"
3. Skill scans all test files for known patterns: unseeded random, real-time waits, system clock usage
4. Finds 2 tests using `OS.get_ticks_msec()` — flags as FLAKINESS RISK
5. Verdict is SUSPECT TESTS FOUND
**Assertions:**
- [ ] Skill notes clearly that source-only analysis is being performed (no history)
- [ ] Common flakiness patterns are scanned: random, time-based assertions, external I/O
- [ ] `OS.get_ticks_msec()` usage for assertions is flagged as a flakiness risk
- [ ] Verdict is SUSPECT TESTS FOUND when source patterns are found
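The source-only fallback scan described in Cases 3 and 4 can be sketched as a pattern dictionary. The patterns are illustrative assumptions about what counts as a flakiness risk; a fuller check would, for example, skip `randf()` calls preceded by a `seed()` call rather than flagging every use.

```python
import re

# Illustrative flakiness-risk patterns for GDScript test sources.
FLAKY_PATTERNS = {
    "unseeded random": re.compile(r"\brandf?i?\("),   # randf(), randi(), rand()
    "real-time wait": re.compile(r"create_timer\("),
    "system clock": re.compile(r"OS\.get_ticks_msec\("),
}

def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, risk_label) pairs for each matched pattern."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in FLAKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings

# A source combining the Case 3 and Case 4 fixtures.
source = """var roll = randf()
var start = OS.get_ticks_msec()
"""
print(scan_source(source))
# [(1, 'unseeded random'), (2, 'system clock')]
```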
---
### Case 5: Gate Compliance — No gate; flakiness report is advisory
**Fixture:**
- Test history shows 1 CONFIRMED FLAKY test (fails 6 out of 10 runs)
- `review-mode.txt` contains `full`
**Input:** `/test-flakiness`
**Expected behavior:**
1. Skill analyzes test history; identifies 1 confirmed flaky test
2. No director gate is invoked regardless of review mode
3. Verdict is CONFIRMED FLAKY
4. Skill presents findings and offers optional written report
5. If user opts in: "May I write to `production/qa/flakiness-report-[date].md`?"
**Assertions:**
- [ ] No director gate is invoked in any review mode
- [ ] CONFIRMED FLAKY verdict requires history-based evidence (not just source patterns)
- [ ] Optional report requires "May I write" before writing
- [ ] Flakiness report is advisory for qa-lead; skill does not auto-disable tests
---
## Protocol Compliance
- [ ] Reads test history logs when available; falls back to source analysis when not
- [ ] Notes clearly which analysis mode is being used (history vs. source-only)
- [ ] Flakiness threshold (e.g., 95% pass rate) is used for SUSPECT classification
- [ ] CONFIRMED FLAKY requires history evidence; SUSPECT covers source patterns only
- [ ] Does not disable or modify any test files
- [ ] No director gates are invoked
- [ ] Verdict is one of: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
---
## Coverage Notes
- The pass-rate threshold for SUSPECT classification (95% suggested above) is an
implementation detail; the tests verify that intermittent failures are flagged,
not the exact threshold value.
- Tests that fail due to environment issues (missing assets, wrong platform) are
not flakiness — the skill distinguishes environment failures from non-determinism
in the test itself; this distinction is not explicitly tested here.