# Skill Test Spec: /team-polish
## Skill Summary
Orchestrates the polish team through a six-phase pipeline: performance assessment
(performance-analyst) → optimization (performance-analyst, optionally joined by
engine-programmer when engine-level root causes are found) → visual polish
(technical-artist, parallel with Phase 2) → audio polish (sound-designer, parallel
with Phase 2) → hardening (qa-tester) → sign-off (orchestrator collects all results
and issues a verdict of READY FOR RELEASE or NEEDS MORE WORK). Uses
`AskUserQuestion` at each phase transition. engine-programmer is spawned only when
Phase 1 identifies engine-level root causes.
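
For orientation, the pipeline above can be expressed as data. This is a hypothetical sketch, not the skill's actual representation (the skill itself is markdown): the dict layout and the `parallel_with` / `conditional_agents` keys are assumptions for illustration, while the phase and agent names come from the summary.

```python
# Hypothetical sketch of the six-phase pipeline as data.
# Keys and structure are illustrative assumptions; names mirror the summary.
PHASES = {
    1: {"name": "performance assessment", "agents": ["performance-analyst"]},
    2: {"name": "optimization",
        "agents": ["performance-analyst"],
        # spawned only when Phase 1 finds engine-level root causes
        "conditional_agents": ["engine-programmer"]},
    3: {"name": "visual polish", "agents": ["technical-artist"], "parallel_with": 2},
    4: {"name": "audio polish", "agents": ["sound-designer"], "parallel_with": 2},
    5: {"name": "hardening", "agents": ["qa-tester"]},
    6: {"name": "sign-off", "agents": ["orchestrator"]},
}

# The only two verdict values the skill may emit.
VERDICTS = {"READY FOR RELEASE", "NEEDS MORE WORK"}
```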
---
## Static Assertions (Structural)
- [ ] Has required frontmatter fields: `name`, `description`, `argument-hint`, `user-invocable`, `allowed-tools`
- [ ] Has ≥2 phase headings
- [ ] Contains verdict keywords: READY FOR RELEASE, NEEDS MORE WORK
- [ ] Contains "File Write Protocol" section
- [ ] File writes are delegated to sub-agents — orchestrator does not write files directly
- [ ] Sub-agents enforce "May I write to [path]?" before any write
- [ ] Has a next-step handoff at the end (references `/release-checklist`, `/sprint-plan update`, `/gate-check`)
- [ ] Error Recovery Protocol section is present
- [ ] `AskUserQuestion` is used at phase transitions before proceeding
- [ ] Phase 3 (visual polish) and Phase 4 (audio polish) are explicitly run in parallel with Phase 2
- [ ] engine-programmer is conditionally spawned in Phase 2 only when Phase 1 identifies engine-level root causes
- [ ] Phase 6 sign-off compares metrics against budgets before issuing verdict
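
A few of the structural assertions above can be checked mechanically against the raw skill markdown. A minimal sketch, assuming naive string handling (a real checker would parse the frontmatter with a YAML parser); the field list and keywords come from the assertions above:

```python
import re

def check_static_assertions(skill_text: str) -> dict:
    """Run a subset of the structural checks against raw skill markdown."""
    required_fields = ["name", "description", "argument-hint",
                       "user-invocable", "allowed-tools"]
    # Naive frontmatter extraction: the text between the first pair of --- fences.
    frontmatter = skill_text.split("---")[1] if skill_text.startswith("---") else ""
    return {
        "frontmatter_fields": all(f"{f}:" in frontmatter for f in required_fields),
        "phase_headings": len(re.findall(r"^#+ Phase \d", skill_text, re.M)) >= 2,
        "verdict_keywords": ("READY FOR RELEASE" in skill_text
                             and "NEEDS MORE WORK" in skill_text),
        "file_write_protocol": "File Write Protocol" in skill_text,
        "ask_user_question": "AskUserQuestion" in skill_text,
    }
```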
---
## Test Cases
### Case 1: Happy Path — Full pipeline completes, READY FOR RELEASE verdict
**Fixture:**
- Feature exists and is functionally complete (e.g., `combat` system)
- Performance budgets are defined in technical-preferences.md (e.g., target 60fps, 16ms frame budget)
- No frame budget violations exist before polishing begins
- No audio events are missing; VFX assets are complete
- No regressions are introduced by polish changes
**Input:** `/team-polish combat`
**Expected behavior:**
1. Phase 1: performance-analyst is spawned; profiles the combat system, measures frame budget, checks memory usage; output: performance report showing all metrics within budget, no violations
2. `AskUserQuestion` presents performance report; user approves before Phases 2, 3, and 4 begin
3. Phase 2: performance-analyst applies minor optimizations (e.g., draw call batching); no engine-programmer needed (no engine-level root causes identified)
4. Phases 3 and 4 are launched in parallel alongside Phase 2:
- Phase 3: technical-artist reviews VFX for quality, optimizes particle systems, adds screen shake and visual juice
- Phase 4: sound-designer reviews audio events for completeness, checks mix levels, adds ambient audio layers
5. All three parallel phases complete; `AskUserQuestion` presents results; user approves before Phase 5 begins
6. Phase 5: qa-tester runs edge case tests, soak tests, stress tests, and regression tests; all pass
7. `AskUserQuestion` presents test results; user approves before Phase 6
8. Phase 6: orchestrator collects all results; compares before/after performance metrics against budgets; all metrics pass
9. Sub-agent asks "May I write the polish report to `production/qa/evidence/polish-combat-[date].md`?" before writing
10. Verdict: READY FOR RELEASE
**Assertions:**
- [ ] performance-analyst is spawned first in Phase 1 before any other agents
- [ ] `AskUserQuestion` appears after Phase 1 output and before Phases 2/3/4 launch
- [ ] Phases 3 and 4 Task calls are issued at the same time as Phase 2 (not after Phase 2 completes)
- [ ] engine-programmer is NOT spawned when Phase 1 finds no engine-level root causes
- [ ] qa-tester (Phase 5) is not launched until the parallel phases complete and user approves
- [ ] Phase 6 verdict is based on comparison of metrics against defined budgets
- [ ] Summary report includes: before/after performance metrics, visual polish changes, audio polish changes, test results
- [ ] No files are written by the orchestrator directly
- [ ] Verdict is READY FOR RELEASE
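
The parallelism assertion (Phases 3 and 4 launched alongside Phase 2, not after it) can be checked against an ordered transcript of events. A sketch under the assumption that the test harness records `(action, phase)` tuples; that event vocabulary is hypothetical:

```python
def phases_launched_in_parallel(events: list) -> bool:
    """True if the Phase 3 and 4 spawns are issued before Phase 2 completes."""
    def index(action, phase):
        try:
            return events.index((action, phase))
        except ValueError:
            return len(events)  # never happened: treat as "after everything"
    phase2_done = index("complete", 2)
    return index("spawn", 3) < phase2_done and index("spawn", 4) < phase2_done
```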
---
### Case 2: Performance Blocker — Frame budget violation cannot be fully resolved
**Fixture:**
- Feature being polished: `particle-storm` VFX system
- Phase 1 identifies a frame budget violation: particle-storm costs 12ms on target hardware (budget is 6ms for this system)
- Phase 2 performance-analyst applies optimizations reducing cost to 9ms — still over the 6ms budget
- Phase 2 cannot fully resolve the violation without a fundamental design change
**Input:** `/team-polish particle-storm`
**Expected behavior:**
1. Phase 1: performance-analyst identifies the 12ms frame cost vs. 6ms budget; reports "FRAME BUDGET VIOLATION: particle-storm costs 12ms, budget is 6ms"
2. `AskUserQuestion` presents the violation; user chooses to proceed with optimization attempt
3. Phase 2: performance-analyst applies optimizations; achieves 9ms — reduced but still over budget; reports "Optimization reduced cost to 9ms (was 12ms) — 3ms over budget. No further gains achievable without design changes."
4. Phases 3 and 4 run in parallel with Phase 2 (visual and audio polish)
5. Phase 5: qa-tester runs regression and edge case tests; all pass
6. Phase 6: orchestrator collects results; frame budget violation (9ms vs 6ms budget) remains unresolved
7. Verdict: NEEDS MORE WORK
8. Report lists the specific unresolved issue: "particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms — requires design scope reduction or budget renegotiation"
9. Next Steps: schedule the remaining issue in `/sprint-plan update`; re-run `/team-polish` after fix
**Assertions:**
- [ ] Frame budget violation is flagged in Phase 1 with specific numbers (actual vs. budget)
- [ ] Phase 2 reports the post-optimization metric explicitly (9ms achieved, 3ms still over)
- [ ] Verdict is NEEDS MORE WORK (not READY FOR RELEASE) when a budget violation remains
- [ ] The specific unresolved issue is listed by name with the remaining gap quantified
- [ ] Next Steps references `/sprint-plan update` for scheduling the remaining fix
- [ ] Phases 3 and 4 still run (polish work is not abandoned due to a Phase 2 partial resolution)
- [ ] Phase 5 qa-tester still runs (regression testing is independent of the performance outcome)
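
The sign-off rule exercised by Cases 1 and 2 reduces to a budget comparison: any metric over budget, or any unresolved regression, forces NEEDS MORE WORK. A minimal sketch; the metric names and millisecond units are illustrative assumptions:

```python
def polish_verdict(metrics: dict, budgets: dict, regressions: int = 0) -> str:
    """Phase 6 rule: READY FOR RELEASE only when every measured metric is
    within its budget and no regressions remain."""
    over_budget = {name: cost for name, cost in metrics.items()
                   if cost > budgets.get(name, float("inf"))}
    return "NEEDS MORE WORK" if over_budget or regressions else "READY FOR RELEASE"
```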
---
### Case 3: No Argument — Usage guidance shown
**Fixture:**
- Any project state
**Input:** `/team-polish` (no argument)
**Expected behavior:**
1. Skill detects no argument is provided
2. Outputs usage guidance: e.g., "Usage: `/team-polish [feature or area]` — specify the feature or area to polish (e.g., `combat`, `main menu`, `inventory system`, `level-1`)"
3. Skill exits without spawning any agents
**Assertions:**
- [ ] Skill does NOT spawn any agents when no argument is provided
- [ ] Usage message includes the correct invocation format with argument examples
- [ ] Skill does NOT attempt to guess a feature from project files
- [ ] No `AskUserQuestion` is used — output is direct guidance
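
The no-argument guard is simple enough to sketch directly. The return shape `(guidance, agents_to_spawn)` is an assumption for illustration; the usage text mirrors the expected behavior above:

```python
USAGE = ("Usage: `/team-polish [feature or area]` -- specify the feature or area "
         "to polish (e.g., `combat`, `main menu`, `inventory system`, `level-1`)")

def handle_invocation(argument=None):
    """Case 3 guard: with no argument, emit usage guidance and spawn nothing."""
    if not argument or not argument.strip():
        return USAGE, []  # exit without spawning any agents
    return None, ["performance-analyst"]  # Phase 1 begins with the analyst
```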
---
### Case 4: Engine-Level Bottleneck — engine-programmer spawned conditionally in Phase 2
**Fixture:**
- Feature being polished: `open-world` environment streaming
- Phase 1 identifies a performance bottleneck with a root cause in the rendering pipeline: "draw call overhead is caused by the engine's scene tree traversal in the spatial indexer — this is an engine-level issue, not a game code issue"
- Performance budgets are defined; the rendering overhead exceeds target frame budget
**Input:** `/team-polish open-world`
**Expected behavior:**
1. Phase 1: performance-analyst profiles the environment; identifies frame budget violation; root cause analysis points to engine-level rendering pipeline (spatial indexer traversal overhead)
2. Phase 1 output explicitly classifies the root cause as engine-level
3. `AskUserQuestion` presents the performance report including the engine-level root cause; user approves before Phase 2
4. Phase 2: performance-analyst is spawned for game-code-level optimizations AND engine-programmer is spawned in parallel for the engine-level rendering fix
5. Phases 3 and 4 also run in parallel with Phase 2 (visual and audio polish)
6. engine-programmer addresses the spatial indexer traversal; provides profiler validation showing the fix reduces overhead
7. Phase 5: qa-tester runs regression tests including tests for the engine-level fix
8. Phase 6: orchestrator collects all results; if metrics are now within budget, verdict is READY FOR RELEASE; if not, NEEDS MORE WORK
**Assertions:**
- [ ] engine-programmer is NOT spawned in Phase 2 unless Phase 1 explicitly identifies an engine-level root cause
- [ ] engine-programmer is spawned in Phase 2 when Phase 1 identifies an engine-level root cause
- [ ] engine-programmer and performance-analyst Task calls in Phase 2 are issued simultaneously (not sequentially)
- [ ] Phases 3 and 4 also run in parallel with Phase 2 (not deferred until Phase 2 completes)
- [ ] engine-programmer's output includes profiler validation of the fix
- [ ] qa-tester in Phase 5 runs regression tests that cover the engine-level change
- [ ] Verdict correctly reflects whether all metrics including the engine fix now meet budgets
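
The conditional-spawn rule that Cases 1 and 4 test from both directions can be sketched as a single predicate. The report key `engine_level_root_cause` is a hypothetical field name, not something the skill defines:

```python
def phase2_agents(phase1_report: dict) -> list:
    """engine-programmer joins Phase 2 only when Phase 1 classified a root
    cause as engine-level; both Task calls are then issued together."""
    agents = ["performance-analyst"]  # always present in Phase 2
    if phase1_report.get("engine_level_root_cause"):
        agents.append("engine-programmer")
    return agents
```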
---
### Case 5: Regression Found — Polish change broke an existing feature
**Fixture:**
- Feature being polished: `inventory-ui`
- Phases 1–4 complete successfully; performance and polish changes are applied
- Phase 5: qa-tester runs regression tests and finds that a shader optimization applied in Phase 3 broke the item highlight glow effect on hover — an existing feature that was working before the polish pass
**Input:** `/team-polish inventory-ui` (Phase 5 scenario)
**Expected behavior:**
1. Phases 1–4 complete; polish changes include a shader optimization from technical-artist
2. Phase 5: qa-tester runs regression tests and detects "Item highlight glow on hover no longer renders — regression introduced by shader optimization in Phase 3"
3. qa-tester returns test results with the regression noted
4. Orchestrator surfaces the regression immediately: "qa-tester: REGRESSION FOUND — `item-highlight-hover` glow broken by Phase 3 shader optimization"
5. Sub-agent files a bug report, asking "May I write the bug report to `production/qa/evidence/bug-polish-inventory-ui-[date].md`?" before writing
6. Bug report is written after approval; it includes: the broken behavior, the polish change that caused it, reproduction steps, and severity
7. `AskUserQuestion` presents the regression with options:
- Revert the shader optimization and find an alternative approach
- Fix the shader optimization to preserve the glow effect
- Accept the regression and schedule a fix in the next sprint
8. Verdict: NEEDS MORE WORK (the regression stands regardless of the user's chosen resolution path, unless a fix is applied and verified within the current session)
**Assertions:**
- [ ] Regression is surfaced before Phase 6 sign-off
- [ ] The specific broken behavior and the responsible change are both named in the report
- [ ] Sub-agent asks "May I write the bug report to [path]?" before filing
- [ ] Bug report includes: broken behavior, causal change, reproduction steps, severity
- [ ] `AskUserQuestion` offers options including revert, fix in place, and schedule later
- [ ] Verdict is NEEDS MORE WORK when a regression is present and unresolved
- [ ] Verdict may become READY FOR RELEASE only if the regression is fixed within the current polish session and qa-tester re-runs to confirm
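
The two verdict assertions above combine into one gate: a regression blocks READY FOR RELEASE unless it was both fixed and re-confirmed by qa-tester within the session. A sketch with hypothetical names (the regression identifiers are illustrative):

```python
def regression_gate(regressions: list, fixed_and_retested: set) -> str:
    """Case 5 rule: any regression not fixed AND re-confirmed by qa-tester
    within the current polish session forces NEEDS MORE WORK."""
    unresolved = [r for r in regressions if r not in fixed_and_retested]
    return "NEEDS MORE WORK" if unresolved else "READY FOR RELEASE"
```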
---
## Protocol Compliance
- [ ] Phase 1 (assessment) must complete before any other phase begins
- [ ] `AskUserQuestion` is used after every phase output before the next phase launches
- [ ] Phases 3 and 4 are always launched in parallel with Phase 2 (not deferred)
- [ ] engine-programmer is only spawned when Phase 1 explicitly identifies engine-level root causes
- [ ] No files are written by the orchestrator directly — all writes are delegated to sub-agents
- [ ] Each sub-agent enforces the "May I write to [path]?" protocol before any write
- [ ] BLOCKED status from any agent is surfaced immediately — not silently skipped
- [ ] A partial report is always produced when some agents complete and others block
- [ ] Verdict is exactly READY FOR RELEASE or NEEDS MORE WORK — no other verdict values used
- [ ] NEEDS MORE WORK verdict always lists specific remaining issues with severity
- [ ] Next Steps handoff references `/release-checklist` (on success) and `/sprint-plan update` + `/gate-check` (on failure)
---
## Coverage Notes
- The tools-programmer optional agent (for content pipeline tool verification) is not
separately tested — it follows the same conditional spawn pattern as engine-programmer
and is invoked only when content authoring tools are involved in the polished area.
- The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error
Recovery Protocol are not separately tested — they follow the same `AskUserQuestion`
+ partial-report pattern validated in Cases 2 and 5.
- Phase 6 sign-off logic (collecting and comparing all metrics) is validated implicitly
by Cases 1 and 2. The distinction between READY FOR RELEASE and NEEDS MORE WORK is
exercised in both directions across these cases.
- Soak testing and stress testing (Phase 5) are validated implicitly by Case 1's
qa-tester output. Case 5 focuses on the regression detection aspect of Phase 5.
- The "minimum spec hardware" test path in Phase 5 is not separately tested — it follows
the same qa-tester delegation pattern when the hardware is available.