# Skill Test Spec: /team-polish

## Skill Summary

Orchestrates the polish team through a six-phase pipeline: performance assessment
(performance-analyst) → optimization (performance-analyst, joined by
engine-programmer only when Phase 1 identifies engine-level root causes) → visual
polish (technical-artist, parallel with Phase 2) → audio polish (sound-designer,
parallel with Phase 2) → hardening (qa-tester) → sign-off (orchestrator collects all
results and issues the verdict: READY FOR RELEASE or NEEDS MORE WORK). Uses
`AskUserQuestion` at each phase transition.

---

## Static Assertions (Structural)

- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: READY FOR RELEASE, NEEDS MORE WORK
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/release-checklist`, `/sprint-plan update`, `/gate-check`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at phase transitions before proceeding
- [ ] Phase 3 (visual polish) and Phase 4 (audio polish) are explicitly run in parallel with Phase 2
- [ ] engine-programmer is conditionally spawned in Phase 2 only when Phase 1 identifies engine-level root causes
- [ ] Phase 6 sign-off compares metrics against budgets before issuing verdict
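
Several of the structural checks above could be automated. The following is a minimal sketch, assuming the skill file is Markdown with a `---`-delimited frontmatter block of simple `key: value` lines and phase headings of the form `## Phase N`. The helper `check_static` and its return shape are hypothetical and not part of the skill or any existing harness.

```python
import re

REQUIRED_FIELDS = ["name", "description", "argument-hint", "user-invocable", "allowed-tools"]
VERDICTS = ["READY FOR RELEASE", "NEEDS MORE WORK"]


def check_static(skill_text: str) -> list[str]:
    """Return the failed structural assertions (an empty list means all passed)."""
    failures = []

    # Assumption: frontmatter is the block between the first two '---' lines.
    match = re.match(r"^---\n(.*?)\n---\n", skill_text, flags=re.DOTALL)
    frontmatter = match.group(1) if match else ""
    keys = {line.split(":", 1)[0].strip() for line in frontmatter.splitlines() if ":" in line}
    for field in REQUIRED_FIELDS:
        if field not in keys:
            failures.append(f"missing frontmatter field: {field}")

    # Assumption: phase headings look like '## Phase 1 ...' at any heading level.
    phase_headings = re.findall(r"^#+\s*Phase\s+\d+", skill_text, flags=re.MULTILINE)
    if len(phase_headings) < 2:
        failures.append("fewer than 2 phase headings")

    for verdict in VERDICTS:
        if verdict not in skill_text:
            failures.append(f"missing verdict keyword: {verdict}")

    return failures
```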

---

## Test Cases

### Case 1: Happy Path — Full pipeline completes, READY FOR RELEASE verdict

**Fixture:**
- Feature exists and is functionally complete (e.g., `combat` system)
- Performance budgets are defined in `technical-preferences.md` (e.g., target 60fps, 16ms frame budget)
- No frame budget violations exist before polishing begins
- No audio events are missing; VFX assets are complete
- No regressions are introduced by polish changes

**Input:** `/team-polish combat`

**Expected behavior:**
1. Phase 1: performance-analyst is spawned; profiles the combat system, measures frame budget, checks memory usage; output: performance report showing all metrics within budget, no violations
2. `AskUserQuestion` presents performance report; user approves before Phases 2, 3, and 4 begin
3. Phase 2: performance-analyst applies minor optimizations (e.g., draw call batching); no engine-programmer needed (no engine-level root causes identified)
4. Phases 3 and 4 are launched in parallel alongside Phase 2:
   - Phase 3: technical-artist reviews VFX for quality, optimizes particle systems, adds screen shake and visual juice
   - Phase 4: sound-designer reviews audio events for completeness, checks mix levels, adds ambient audio layers
5. All three parallel phases complete; `AskUserQuestion` presents results; user approves before Phase 5 begins
6. Phase 5: qa-tester runs edge case tests, soak tests, stress tests, and regression tests; all pass
7. `AskUserQuestion` presents test results; user approves before Phase 6
8. Phase 6: orchestrator collects all results; compares before/after performance metrics against budgets; all metrics pass
9. Sub-agent asks "May I write the polish report to `production/qa/evidence/polish-combat-[date].md`?" before writing
10. Verdict: READY FOR RELEASE

**Assertions:**
- [ ] performance-analyst is spawned first in Phase 1 before any other agents
- [ ] `AskUserQuestion` appears after Phase 1 output and before Phases 2/3/4 launch
- [ ] Task calls for Phases 3 and 4 are issued at the same time as the Phase 2 Task call (not after Phase 2 completes)
- [ ] engine-programmer is NOT spawned when Phase 1 finds no engine-level root causes
- [ ] qa-tester (Phase 5) is not launched until the parallel phases complete and user approves
- [ ] Phase 6 verdict is based on comparison of metrics against defined budgets
- [ ] Summary report includes: before/after performance metrics, visual polish changes, audio polish changes, test results
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is READY FOR RELEASE
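
The ordering assertions above can be checked against a recorded trace of the run. This is a minimal sketch, assuming a hypothetical trace format in which each entry is one orchestrator message: either the string `"ASK"` for an `AskUserQuestion` call, or a set of agent names whose Task calls were issued together. Neither the trace format nor `check_happy_path_order` comes from the skill itself.

```python
ASK = "ASK"  # hypothetical marker for an AskUserQuestion call


def check_happy_path_order(trace: list) -> list[str]:
    """Return the violated ordering assertions for a Case 1 style run."""
    failures = []
    batches = [entry for entry in trace if entry != ASK]

    # Phase 1: performance-analyst alone, before any other agent.
    if not batches or batches[0] != {"performance-analyst"}:
        failures.append("performance-analyst was not spawned alone first")

    # A user gate must sit directly after the Phase 1 output.
    if len(trace) < 2 or trace[1] != ASK:
        failures.append("no AskUserQuestion between Phase 1 and Phases 2/3/4")

    # Phases 2, 3, and 4 must be issued in the same batch.
    parallel = {"performance-analyst", "technical-artist", "sound-designer"}
    if len(batches) < 2 or not parallel <= batches[1]:
        failures.append("Phases 3 and 4 were not issued together with Phase 2")
    elif "engine-programmer" in batches[1]:
        failures.append("engine-programmer spawned without an engine-level root cause")

    # qa-tester must come after the parallel batch and after a user gate.
    if {"qa-tester"} in batches:
        qa_index = trace.index({"qa-tester"})
        if trace[qa_index - 1] != ASK or batches.index({"qa-tester"}) < 2:
            failures.append("qa-tester launched before parallel phases were approved")
    else:
        failures.append("qa-tester never launched")

    return failures
```

Representing each message as a set makes "issued at the same time" a subset test rather than a timing comparison, which keeps the check deterministic.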

---

### Case 2: Performance Blocker — Frame budget violation cannot be fully resolved

**Fixture:**
- Feature being polished: `particle-storm` VFX system
- Phase 1 identifies a frame budget violation: particle-storm costs 12ms on target hardware (budget is 6ms for this system)
- Phase 2 performance-analyst applies optimizations reducing cost to 9ms — still over the 6ms budget
- Phase 2 cannot fully resolve the violation without a fundamental design change

**Input:** `/team-polish particle-storm`

**Expected behavior:**
1. Phase 1: performance-analyst identifies the 12ms frame cost vs. 6ms budget; reports "FRAME BUDGET VIOLATION: particle-storm costs 12ms, budget is 6ms"
2. `AskUserQuestion` presents the violation; user chooses to proceed with optimization attempt
3. Phase 2: performance-analyst applies optimizations; achieves 9ms — reduced but still over budget; reports "Optimization reduced cost to 9ms (was 12ms) — 3ms over budget. No further gains achievable without design changes."
4. Phases 3 and 4 run in parallel with Phase 2 (visual and audio polish)
5. Phase 5: qa-tester runs regression and edge case tests; all pass
6. Phase 6: orchestrator collects results; frame budget violation (9ms vs 6ms budget) remains unresolved
7. Verdict: NEEDS MORE WORK
8. Report lists the specific unresolved issue: "particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms — requires design scope reduction or budget renegotiation"
9. Next Steps: schedule the remaining issue in `/sprint-plan update`; re-run `/team-polish` after fix

**Assertions:**
- [ ] Frame budget violation is flagged in Phase 1 with specific numbers (actual vs. budget)
- [ ] Phase 2 reports the post-optimization metric explicitly (9ms achieved, 3ms still over)
- [ ] Verdict is NEEDS MORE WORK (not READY FOR RELEASE) when a budget violation remains
- [ ] The specific unresolved issue is listed by name with the remaining gap quantified
- [ ] Next Steps references `/sprint-plan update` for scheduling the remaining fix
- [ ] Phases 3 and 4 still run (polish work is not abandoned when Phase 2 only partially resolves the violation)
- [ ] Phase 5 qa-tester still runs (regression testing is independent of the performance outcome)
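
The sign-off arithmetic in this case is simple enough to state as code. The following is a minimal sketch, assuming metrics and budgets are plain dictionaries of per-system frame cost in milliseconds. `sign_off` is a hypothetical helper for illustration, not the orchestrator's actual logic.

```python
def sign_off(metrics_ms: dict[str, float], budgets_ms: dict[str, float]) -> tuple[str, list[str]]:
    """Compare post-optimization metrics against budgets and derive the verdict."""
    issues = []
    for system, budget in budgets_ms.items():
        actual = metrics_ms[system]
        if actual > budget:
            gap = actual - budget
            issues.append(
                f"{system} frame cost ({actual:g}ms) exceeds budget ({budget:g}ms) by {gap:g}ms"
            )
    # Any remaining violation forces NEEDS MORE WORK.
    verdict = "NEEDS MORE WORK" if issues else "READY FOR RELEASE"
    return verdict, issues


# Case 2 numbers: 12ms before optimization, 9ms after, 6ms budget.
verdict, issues = sign_off({"particle-storm": 9.0}, {"particle-storm": 6.0})
print(verdict)    # NEEDS MORE WORK
print(issues[0])  # particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms
```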

---

### Case 3: No Argument — Usage guidance shown

**Fixture:**
- Any project state

**Input:** `/team-polish` (no argument)

**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-polish [feature or area]` — specify the feature or area to polish (e.g., `combat`, `main menu`, `inventory system`, `level-1`)"
3. Skill exits without spawning any agents

**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with argument examples
- [ ] Skill does NOT attempt to guess a feature from project files
- [ ] No `AskUserQuestion` is used — output is direct guidance
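
The guard this case exercises can be sketched as follows. This is an illustration under assumptions: `handle_invocation`, the `USAGE` wording, and the returned dictionary are hypothetical stand-ins for whatever the skill actually does with its argument.

```python
from typing import Optional

USAGE = (
    "Usage: /team-polish [feature or area]. Specify the feature or area to polish "
    "(e.g., combat, main menu, inventory system, level-1)"
)


def handle_invocation(argument: Optional[str]) -> dict:
    """Describe what the skill should do for a given raw argument string."""
    if argument is None or not argument.strip():
        # No target given: print guidance, spawn nothing, ask nothing, guess nothing.
        return {"output": USAGE, "agents_spawned": [], "questions_asked": 0}
    # With a target, the pipeline starts at Phase 1 with performance-analyst.
    return {"output": None, "agents_spawned": ["performance-analyst"], "questions_asked": 0}
```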

---

### Case 4: Engine-Level Bottleneck — engine-programmer spawned conditionally in Phase 2

**Fixture:**
- Feature being polished: `open-world` environment streaming
- Phase 1 identifies a performance bottleneck with a root cause in the rendering pipeline: "draw call overhead is caused by the engine's scene tree traversal in the spatial indexer — this is an engine-level issue, not a game code issue"
- Performance budgets are defined; the rendering overhead exceeds target frame budget

**Input:** `/team-polish open-world`

**Expected behavior:**
1. Phase 1: performance-analyst profiles the environment; identifies frame budget violation; root cause analysis points to engine-level rendering pipeline (spatial indexer traversal overhead)
2. Phase 1 output explicitly classifies the root cause as engine-level
3. `AskUserQuestion` presents the performance report including the engine-level root cause; user approves before Phase 2
4. Phase 2: performance-analyst is spawned for game-code-level optimizations AND engine-programmer is spawned in parallel for the engine-level rendering fix
5. Phases 3 and 4 also run in parallel with Phase 2 (visual and audio polish)
6. engine-programmer addresses the spatial indexer traversal; provides profiler validation showing the fix reduces overhead
7. Phase 5: qa-tester runs regression tests including tests for the engine-level fix
8. Phase 6: orchestrator collects all results; if metrics are now within budget, verdict is READY FOR RELEASE; if not, NEEDS MORE WORK

**Assertions:**
- [ ] engine-programmer is NOT spawned in Phase 2 unless Phase 1 explicitly identifies an engine-level root cause
- [ ] engine-programmer is spawned in Phase 2 when Phase 1 identifies an engine-level root cause
- [ ] engine-programmer and performance-analyst Task calls in Phase 2 are issued simultaneously (not sequentially)
- [ ] Phases 3 and 4 also run in parallel with Phase 2 (not deferred until Phase 2 completes)
- [ ] engine-programmer's output includes profiler validation of the fix
- [ ] qa-tester in Phase 5 runs regression tests that cover the engine-level change
- [ ] Verdict correctly reflects whether all metrics, including those affected by the engine-level fix, now meet budgets
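
The conditional spawn rule can be expressed as a small planning function. This is a minimal sketch, assuming Phase 1 tags each root cause with a `level` of `"engine"` or `"game-code"`; that tagging scheme and `plan_phase_2_batch` are hypothetical and only illustrate the rule the assertions describe.

```python
def plan_phase_2_batch(root_causes: list[dict]) -> set[str]:
    """Return the agents whose Task calls go out together when Phase 2 starts."""
    # Phases 2, 3, and 4 always launch in the same batch.
    agents = {"performance-analyst", "technical-artist", "sound-designer"}
    # Assumption: each root cause carries a 'level' key set by performance-analyst.
    if any(cause.get("level") == "engine" for cause in root_causes):
        agents.add("engine-programmer")
    return agents


# Case 4: the spatial indexer traversal is classified as engine-level.
batch = plan_phase_2_batch([{"name": "spatial indexer traversal", "level": "engine"}])
print("engine-programmer" in batch)  # True
```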

---

### Case 5: Regression Found — Polish change broke an existing feature

**Fixture:**
- Feature being polished: `inventory-ui`
- Phases 1–4 complete successfully; performance and polish changes are applied
- Phase 5: qa-tester runs regression tests and finds that a shader optimization applied in Phase 3 broke the item highlight glow effect on hover — an existing feature that was working before the polish pass

**Input:** `/team-polish inventory-ui` (Phase 5 scenario)

**Expected behavior:**
1. Phases 1–4 complete; polish changes include a shader optimization from technical-artist
2. Phase 5: qa-tester runs regression tests and detects "Item highlight glow on hover no longer renders — regression introduced by shader optimization in Phase 3"
3. qa-tester returns test results with the regression noted
4. Orchestrator surfaces the regression immediately: "qa-tester: REGRESSION FOUND — `item-highlight-hover` glow broken by Phase 3 shader optimization"
5. Sub-agent asks "May I write the bug report to `production/qa/evidence/bug-polish-inventory-ui-[date].md`?" before filing the bug report
6. Bug report is written after approval; it includes: the broken behavior, the polish change that caused it, reproduction steps, and severity
7. `AskUserQuestion` presents the regression with options:
   - Revert the shader optimization and find an alternative approach
   - Fix the shader optimization to preserve the glow effect
   - Accept the regression and schedule a fix in the next sprint
8. Verdict: NEEDS MORE WORK, whichever resolution path the user chooses, unless the fix is applied within the current session and a qa-tester re-run confirms it

**Assertions:**
- [ ] Regression is surfaced before Phase 6 sign-off
- [ ] The specific broken behavior and the responsible change are both named in the report
- [ ] Sub-agent asks "May I write the bug report to [path]?" before filing
- [ ] Bug report includes: broken behavior, causal change, reproduction steps, severity
- [ ] `AskUserQuestion` offers options including revert, fix in place, and schedule later
- [ ] Verdict is NEEDS MORE WORK when a regression is present and unresolved
- [ ] Verdict may become READY FOR RELEASE only if the regression is fixed within the current polish session and qa-tester re-runs to confirm
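
The last two assertions describe a gate that depends on both the fix and the re-run. A minimal sketch of that rule follows, assuming a hypothetical `Regression` record; the field names are illustrative and do not reflect the real bug report format.

```python
from dataclasses import dataclass


@dataclass
class Regression:
    name: str
    caused_by: str
    fixed_in_session: bool = False  # a fix was applied during this polish session
    retest_passed: bool = False     # qa-tester re-ran and confirmed the fix


def verdict_with_regressions(budgets_met: bool, regressions: list[Regression]) -> str:
    """A regression stays unresolved until it is fixed AND the re-run confirms it."""
    unresolved = [r for r in regressions if not (r.fixed_in_session and r.retest_passed)]
    if budgets_met and not unresolved:
        return "READY FOR RELEASE"
    return "NEEDS MORE WORK"


glow = Regression("item-highlight-hover", caused_by="Phase 3 shader optimization")
print(verdict_with_regressions(True, [glow]))  # NEEDS MORE WORK
```

Note that a fix without a confirming re-run still yields NEEDS MORE WORK under this rule.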

---

## Protocol Compliance

- [ ] Phase 1 (assessment) must complete before any other phase begins
- [ ] `AskUserQuestion` is used at every phase transition (after Phase 1, after the parallel Phases 2–4 complete, and after Phase 5) before the next phase launches
- [ ] Phases 3 and 4 are always launched in parallel with Phase 2 (not deferred)
- [ ] engine-programmer is only spawned when Phase 1 explicitly identifies engine-level root causes
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Verdict is exactly READY FOR RELEASE or NEEDS MORE WORK — no other verdict values used
- [ ] NEEDS MORE WORK verdict always lists specific remaining issues with severity
- [ ] Next Steps handoff references `/release-checklist` (on success) and `/sprint-plan update` + `/gate-check` (on failure)
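
The verdict and handoff rules above amount to a closed lookup table. A minimal sketch follows, assuming the two verdict strings are the only legal values; `next_steps` is a hypothetical helper name.

```python
NEXT_STEPS = {
    "READY FOR RELEASE": ["/release-checklist"],
    "NEEDS MORE WORK": ["/sprint-plan update", "/gate-check"],
}


def next_steps(verdict: str) -> list[str]:
    """Map a verdict to its handoff commands, rejecting any other verdict value."""
    if verdict not in NEXT_STEPS:
        raise ValueError(f"unexpected verdict value: {verdict!r}")
    return NEXT_STEPS[verdict]
```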

---

## Coverage Notes

- The tools-programmer optional agent (for content pipeline tool verification) is not
  separately tested — it follows the same conditional spawn pattern as engine-programmer
  and is invoked only when content authoring tools are involved in the polished area.
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
  Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
  + partial-report pattern validated in Cases 2 and 5.
- Phase 6 sign-off logic (collecting and comparing all metrics) is validated implicitly
  by Cases 1 and 2. The distinction between READY FOR RELEASE and NEEDS MORE WORK is
  exercised in both directions across these cases.
- Soak testing and stress testing (Phase 5) are validated implicitly by Case 1's
  qa-tester output. Case 5 focuses on the regression detection aspect of Phase 5.
- The "minimum spec hardware" test path in Phase 5 is not separately tested — it follows
  the same qa-tester delegation pattern when the hardware is available.