pixelheros/CCGS Skill Testing Framework/skills/team/team-polish.md
2026-05-15 14:52:29 +08:00

Skill Test Spec: /team-polish

Skill Summary

Orchestrates the polish team through a six-phase pipeline: Phase 1 performance assessment (performance-analyst) → Phase 2 optimization (performance-analyst, joined by engine-programmer only when Phase 1 identifies engine-level root causes) → Phase 3 visual polish (technical-artist) and Phase 4 audio polish (sound-designer), both run in parallel with Phase 2 → Phase 5 hardening (qa-tester) → Phase 6 sign-off, in which the orchestrator collects all results and issues the verdict. AskUserQuestion is used at each phase transition. The verdict is exactly READY FOR RELEASE or NEEDS MORE WORK.
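The phase graph described above can be sketched as data. This is a hypothetical structure for illustration: the agent and phase names come from the summary, but the field names (`conditional_agents`, `parallel_with`) are assumptions, not framework API.

```python
# Hypothetical sketch of the six-phase pipeline; field names are illustrative.
PIPELINE = [
    {"phase": 1, "name": "performance assessment", "agents": ["performance-analyst"]},
    {"phase": 2, "name": "optimization", "agents": ["performance-analyst"],
     # engine-programmer joins only when Phase 1 finds engine-level root causes
     "conditional_agents": ["engine-programmer"],
     "parallel_with": [3, 4]},
    {"phase": 3, "name": "visual polish", "agents": ["technical-artist"], "parallel_with": [2, 4]},
    {"phase": 4, "name": "audio polish", "agents": ["sound-designer"], "parallel_with": [2, 3]},
    {"phase": 5, "name": "hardening", "agents": ["qa-tester"]},
    {"phase": 6, "name": "sign-off", "agents": ["orchestrator"]},
]

# The only two verdict values the skill may emit.
VERDICTS = {"READY FOR RELEASE", "NEEDS MORE WORK"}
```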


Static Assertions (Structural)

  • Has required frontmatter fields: name, description, argument-hint, user-invocable, allowed-tools
  • Has ≥2 phase headings
  • Contains verdict keywords: READY FOR RELEASE, NEEDS MORE WORK
  • Contains "File Write Protocol" section
  • File writes are delegated to sub-agents — orchestrator does not write files directly
  • Sub-agents enforce "May I write to [path]?" before any write
  • Has a next-step handoff at the end (references /release-checklist, /sprint-plan update, /gate-check)
  • Error Recovery Protocol section is present
  • AskUserQuestion is used at phase transitions before proceeding
  • Phase 3 (visual polish) and Phase 4 (audio polish) are explicitly run in parallel with Phase 2
  • engine-programmer is conditionally spawned in Phase 2 only when Phase 1 identifies engine-level root causes
  • Phase 6 sign-off compares metrics against budgets before issuing verdict
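A minimal checker for a few of these structural assertions might look like the following sketch. The frontmatter parsing and the phase-heading regex are assumptions about the skill file's layout, not part of the framework:

```python
import re

REQUIRED_FRONTMATTER = {"name", "description", "argument-hint", "user-invocable", "allowed-tools"}
VERDICT_KEYWORDS = ("READY FOR RELEASE", "NEEDS MORE WORK")

def check_static(skill_text: str) -> list[str]:
    """Return a list of failed structural assertions (empty list means pass)."""
    failures = []
    # Assume YAML-style frontmatter delimited by --- lines at the top of the file.
    m = re.match(r"---\n(.*?)\n---", skill_text, re.DOTALL)
    fields = set()
    if m:
        fields = {line.split(":", 1)[0].strip()
                  for line in m.group(1).splitlines() if ":" in line}
    for field in REQUIRED_FRONTMATTER - fields:
        failures.append(f"missing frontmatter field: {field}")
    # Assume phase headings are markdown headings containing the word "Phase".
    if len(re.findall(r"(?m)^#+ .*Phase", skill_text)) < 2:
        failures.append("fewer than 2 phase headings")
    for kw in VERDICT_KEYWORDS:
        if kw not in skill_text:
            failures.append(f"missing verdict keyword: {kw}")
    if "File Write Protocol" not in skill_text:
        failures.append('missing "File Write Protocol" section')
    return failures
```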

Test Cases

Case 1: Happy Path — Full pipeline completes, READY FOR RELEASE verdict

Fixture:

  • Feature exists and is functionally complete (e.g., combat system)
  • Performance budgets are defined in technical-preferences.md (e.g., target 60fps, 16ms frame budget)
  • No frame budget violations exist before polishing begins
  • No audio events are missing; VFX assets are complete
  • No regressions are introduced by polish changes

Input: /team-polish combat

Expected behavior:

  1. Phase 1: performance-analyst is spawned; profiles the combat system, measures frame budget, checks memory usage; output: performance report showing all metrics within budget, no violations
  2. AskUserQuestion presents performance report; user approves before Phases 2, 3, and 4 begin
  3. Phase 2: performance-analyst applies minor optimizations (e.g., draw call batching); no engine-programmer needed (no engine-level root causes identified)
  4. Phases 3 and 4 are launched in parallel alongside Phase 2:
    • Phase 3: technical-artist reviews VFX for quality, optimizes particle systems, adds screen shake and visual juice
    • Phase 4: sound-designer reviews audio events for completeness, checks mix levels, adds ambient audio layers
  5. All three parallel phases complete; AskUserQuestion presents results; user approves before Phase 5 begins
  6. Phase 5: qa-tester runs edge case tests, soak tests, stress tests, and regression tests; all pass
  7. AskUserQuestion presents test results; user approves before Phase 6
  8. Phase 6: orchestrator collects all results; compares before/after performance metrics against budgets; all metrics pass
  9. Sub-agent asks "May I write the polish report to production/qa/evidence/polish-combat-[date].md?" before writing
  10. Verdict: READY FOR RELEASE

Assertions:

  • performance-analyst is spawned first in Phase 1 before any other agents
  • AskUserQuestion appears after Phase 1 output and before Phases 2/3/4 launch
  • Phases 3 and 4 Task calls are issued at the same time as Phase 2 (not after Phase 2 completes)
  • engine-programmer is NOT spawned when Phase 1 finds no engine-level root causes
  • qa-tester (Phase 5) is not launched until the parallel phases complete and user approves
  • Phase 6 verdict is based on comparison of metrics against defined budgets
  • Summary report includes: before/after performance metrics, visual polish changes, audio polish changes, test results
  • No files are written by the orchestrator directly
  • Verdict is READY FOR RELEASE
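The parallelism assertions above could be validated against an ordered event log along these lines. Event names such as `task:phase3` are hypothetical; a real harness would derive them from the transcript:

```python
def phases_parallel(events: list[str]) -> bool:
    """Check that Phase 3 and 4 Task calls are issued before Phase 2 completes.

    `events` is an ordered log like ["task:phase2", "task:phase3", ...].
    """
    def idx(name: str) -> int:
        # Events that never occur sort after everything else.
        return events.index(name) if name in events else len(events)

    launch_3, launch_4 = idx("task:phase3"), idx("task:phase4")
    done_2 = idx("done:phase2")
    return launch_3 < done_2 and launch_4 < done_2
```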

Case 2: Performance Blocker — Frame budget violation cannot be fully resolved

Fixture:

  • Feature being polished: particle-storm VFX system
  • Phase 1 identifies a frame budget violation: particle-storm costs 12ms on target hardware (budget is 6ms for this system)
  • Phase 2 performance-analyst applies optimizations reducing cost to 9ms — still over the 6ms budget
  • Phase 2 cannot fully resolve the violation without a fundamental design change

Input: /team-polish particle-storm

Expected behavior:

  1. Phase 1: performance-analyst identifies the 12ms frame cost vs. 6ms budget; reports "FRAME BUDGET VIOLATION: particle-storm costs 12ms, budget is 6ms"
  2. AskUserQuestion presents the violation; user chooses to proceed with optimization attempt
  3. Phase 2: performance-analyst applies optimizations; achieves 9ms — reduced but still over budget; reports "Optimization reduced cost to 9ms (was 12ms) — 3ms over budget. No further gains achievable without design changes."
  4. Phases 3 and 4 run in parallel with Phase 2 (visual and audio polish)
  5. Phase 5: qa-tester runs regression and edge case tests; all pass
  6. Phase 6: orchestrator collects results; frame budget violation (9ms vs 6ms budget) remains unresolved
  7. Verdict: NEEDS MORE WORK
  8. Report lists the specific unresolved issue: "particle-storm frame cost (9ms) exceeds budget (6ms) by 3ms — requires design scope reduction or budget renegotiation"
  9. Next Steps: schedule the remaining issue in /sprint-plan update; re-run /team-polish after fix

Assertions:

  • Frame budget violation is flagged in Phase 1 with specific numbers (actual vs. budget)
  • Phase 2 reports the post-optimization metric explicitly (9ms achieved, 3ms still over)
  • Verdict is NEEDS MORE WORK (not READY FOR RELEASE) when a budget violation remains
  • The specific unresolved issue is listed by name with the remaining gap quantified
  • Next Steps references /sprint-plan update for scheduling the remaining fix
  • Phases 3 and 4 still run (polish work is not abandoned due to a Phase 2 partial resolution)
  • Phase 5 qa-tester still runs (regression testing is independent of the performance outcome)
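The budget comparison at sign-off reduces to a simple rule: any metric over its budget forces NEEDS MORE WORK, with the remaining gap quantified. A sketch, assuming metrics and budgets arrive as name-to-milliseconds maps (a hypothetical representation):

```python
def sign_off(metrics: dict[str, float], budgets: dict[str, float]) -> tuple[str, list[str]]:
    """Compare measured frame costs (ms) against budgets; any overage blocks release."""
    issues = [
        f"{name}: {cost}ms exceeds budget ({budgets[name]}ms) by {cost - budgets[name]:g}ms"
        for name, cost in metrics.items()
        if name in budgets and cost > budgets[name]
    ]
    verdict = "NEEDS MORE WORK" if issues else "READY FOR RELEASE"
    return verdict, issues
```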

Case 3: No Argument — Usage guidance shown

Fixture:

  • Any project state

Input: /team-polish (no argument)

Expected behavior:

  1. Skill detects no argument is provided
  2. Outputs usage guidance: e.g., "Usage: /team-polish [feature or area] — specify the feature or area to polish (e.g., combat, main menu, inventory system, level-1)"
  3. Skill exits without spawning any agents

Assertions:

  • Skill does NOT spawn any agents when no argument is provided
  • Usage message includes the correct invocation format with argument examples
  • Skill does NOT attempt to guess a feature from project files
  • No AskUserQuestion is used — output is direct guidance

Case 4: Engine-Level Bottleneck — engine-programmer spawned conditionally in Phase 2

Fixture:

  • Feature being polished: open-world environment streaming
  • Phase 1 identifies a performance bottleneck with a root cause in the rendering pipeline: "draw call overhead is caused by the engine's scene tree traversal in the spatial indexer — this is an engine-level issue, not a game code issue"
  • Performance budgets are defined; the rendering overhead exceeds target frame budget

Input: /team-polish open-world

Expected behavior:

  1. Phase 1: performance-analyst profiles the environment; identifies frame budget violation; root cause analysis points to engine-level rendering pipeline (spatial indexer traversal overhead)
  2. Phase 1 output explicitly classifies the root cause as engine-level
  3. AskUserQuestion presents the performance report including the engine-level root cause; user approves before Phase 2
  4. Phase 2: performance-analyst is spawned for game-code-level optimizations AND engine-programmer is spawned in parallel for the engine-level rendering fix
  5. Phases 3 and 4 also run in parallel with Phase 2 (visual and audio polish)
  6. engine-programmer addresses the spatial indexer traversal; provides profiler validation showing the fix reduces overhead
  7. Phase 5: qa-tester runs regression tests including tests for the engine-level fix
  8. Phase 6: orchestrator collects all results; if metrics are now within budget, verdict is READY FOR RELEASE; if not, NEEDS MORE WORK

Assertions:

  • engine-programmer is NOT spawned in Phase 2 unless Phase 1 explicitly identifies an engine-level root cause
  • engine-programmer is spawned in Phase 2 when Phase 1 identifies an engine-level root cause
  • engine-programmer and performance-analyst Task calls in Phase 2 are issued simultaneously (not sequentially)
  • Phases 3 and 4 also run in parallel with Phase 2 (not deferred until Phase 2 completes)
  • engine-programmer's output includes profiler validation of the fix
  • qa-tester in Phase 5 runs regression tests that cover the engine-level change
  • Verdict correctly reflects whether all metrics including the engine fix now meet budgets
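The conditional spawn rule is a single branch on the Phase 1 report. A sketch, assuming the report carries a `root_cause_level` field (a hypothetical name, not a framework field):

```python
def phase2_agents(phase1_report: dict) -> list[str]:
    """Spawn engine-programmer only when Phase 1 flags an engine-level root cause."""
    agents = ["performance-analyst"]
    if phase1_report.get("root_cause_level") == "engine":
        agents.append("engine-programmer")
    return agents
```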

Case 5: Regression Found — Polish change broke an existing feature

Fixture:

  • Feature being polished: inventory-ui
  • Phases 1–4 complete successfully; performance and polish changes are applied
  • Phase 5: qa-tester runs regression tests and finds that a shader optimization applied in Phase 3 broke the item highlight glow effect on hover — an existing feature that was working before the polish pass

Input: /team-polish inventory-ui (Phase 5 scenario)

Expected behavior:

  1. Phases 1–4 complete; polish changes include a shader optimization from technical-artist
  2. Phase 5: qa-tester runs regression tests and detects "Item highlight glow on hover no longer renders — regression introduced by shader optimization in Phase 3"
  3. qa-tester returns test results with the regression noted
  4. Orchestrator surfaces the regression immediately: "qa-tester: REGRESSION FOUND — item-highlight-hover glow broken by Phase 3 shader optimization"
  5. Sub-agent asks "May I write the bug report to production/qa/evidence/bug-polish-inventory-ui-[date].md?" before writing
  6. Bug report is written after approval; it includes: the broken behavior, the polish change that caused it, reproduction steps, and severity
  7. AskUserQuestion presents the regression with options:
    • Revert the shader optimization and find an alternative approach
    • Fix the shader optimization to preserve the glow effect
    • Accept the regression and schedule a fix in the next sprint
  8. Verdict: NEEDS MORE WORK (regression present regardless of user's chosen resolution path, unless fix is applied within the current session)

Assertions:

  • Regression is surfaced before Phase 6 sign-off
  • The specific broken behavior and the responsible change are both named in the report
  • Sub-agent asks "May I write the bug report to [path]?" before filing
  • Bug report includes: broken behavior, causal change, reproduction steps, severity
  • AskUserQuestion offers options including revert, fix in place, and schedule later
  • Verdict is NEEDS MORE WORK when a regression is present and unresolved
  • Verdict may become READY FOR RELEASE only if the regression is fixed within the current polish session and qa-tester re-runs to confirm
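The regression gating above can be sketched as a hypothetical helper (the framework's actual sign-off logic may differ): an unresolved regression blocks release exactly like a budget miss.

```python
def final_verdict(metrics_within_budget: bool, unresolved_regressions: list[str]) -> str:
    """Any unresolved regression or budget miss forces NEEDS MORE WORK."""
    if metrics_within_budget and not unresolved_regressions:
        return "READY FOR RELEASE"
    return "NEEDS MORE WORK"
```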

Protocol Compliance

  • Phase 1 (assessment) must complete before any other phase begins
  • AskUserQuestion is used after every phase output before the next phase launches
  • Phases 3 and 4 are always launched in parallel with Phase 2 (not deferred)
  • engine-programmer is only spawned when Phase 1 explicitly identifies engine-level root causes
  • No files are written by the orchestrator directly — all writes are delegated to sub-agents
  • Each sub-agent enforces the "May I write to [path]?" protocol before any write
  • BLOCKED status from any agent is surfaced immediately — not silently skipped
  • A partial report is always produced when some agents complete and others block
  • Verdict is exactly READY FOR RELEASE or NEEDS MORE WORK — no other verdict values used
  • NEEDS MORE WORK verdict always lists specific remaining issues with severity
  • Next Steps handoff references /release-checklist (on success) and /sprint-plan update + /gate-check (on failure)

Coverage Notes

  • The tools-programmer optional agent (for content pipeline tool verification) is not separately tested — it follows the same conditional spawn pattern as engine-programmer and is invoked only when content authoring tools are involved in the polished area.
  • The "Retry with narrower scope" and "Skip this agent" resolution paths from the Error Recovery Protocol are not separately tested — they follow the same AskUserQuestion → partial-report pattern validated in Cases 2 and 5.
  • Phase 6 sign-off logic (collecting and comparing all metrics) is validated implicitly by Cases 1 and 2. The distinction between READY FOR RELEASE and NEEDS MORE WORK is exercised in both directions across these cases.
  • Soak testing and stress testing (Phase 5) are validated implicitly by Case 1's qa-tester output. Case 5 focuses on the regression detection aspect of Phase 5.
  • The "minimum spec hardware" test path in Phase 5 is not separately tested — it follows the same qa-tester delegation pattern when the hardware is available.