Files
pixelheros/CCGS Skill Testing Framework/agents/directors/producer.md
2026-05-15 14:52:29 +08:00

6.0 KiB

Agent Test Spec: producer

Agent Summary

Domain owned: Scope management, sprint planning validation, milestone tracking, epic prioritization, production phase gate. Does NOT own: Game design decisions (creative-director / game-designer), technical architecture (technical-director), creative direction. Model tier: Opus (multi-document synthesis, high-stakes phase gate verdicts). Gate IDs handled: PR-SCOPE, PR-SPRINT, PR-MILESTONE, PR-EPIC, PR-PHASE-GATE.


Static Assertions (Structural)

Verified by reading the agent's .claude/agents/producer.md frontmatter:

  • description: field is present and domain-specific (references scope, sprint, milestone, production — not generic)
  • allowed-tools: list is primarily read-focused; Bash only if sprint/milestone files require parsing
  • Model tier is claude-opus-4-6 per coordination-rules.md (directors with gate synthesis = Opus)
  • Agent definition does not claim authority over design decisions or technical architecture

Test Cases

Case 1: In-domain request — appropriate output format

Scenario: A sprint plan is submitted for Sprint 7. The plan includes 12 story points across 4 team members over 2 weeks. Historical velocity from the last 3 sprints averages 11.5 points. Request is tagged PR-SPRINT. Expected: Returns PR-SPRINT: REALISTIC with rationale noting the plan is within one standard deviation of historical velocity and capacity appears matched. Assertions:

  • Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC
  • Verdict token is formatted as PR-SPRINT: REALISTIC
  • Rationale references the specific story point count and historical velocity figures
  • Output stays within production scope — does not comment on whether the stories are well-designed or technically sound

Case 2: Out-of-domain request — redirects or escalates

Scenario: Team member asks producer to evaluate whether the game's "weight-based inventory" mechanic feels fun and engaging. Expected: Agent declines to evaluate game feel and redirects to game-designer or creative-director. Assertions:

  • Does not make any binding assessment of the mechanic's design quality
  • Explicitly names game-designer or creative-director as the correct handler
  • May note if the mechanic's scope has production implications (e.g., dependencies on other systems), but defers all design evaluation

Case 3: Gate verdict — correct vocabulary

Scenario: A new feature proposal adds three new systems (crafting, weather, and faction reputation) to a milestone that was scoped for two systems only. None of these additions appear in the current milestone plan. Request is tagged PR-SCOPE. Expected: Returns PR-SCOPE: CONCERNS with specific identification of the three unplanned systems and their absence from the milestone scope document. Assertions:

  • Verdict is exactly one of REALISTIC / CONCERNS / UNREALISTIC — not freeform text
  • Verdict token is formatted as PR-SCOPE: CONCERNS
  • Rationale names the three specific systems being added out of scope
  • Does not evaluate whether the systems are good design — only whether they fit the plan

Case 4: Conflict escalation — correct parent

Scenario: game-designer wants to add a late-breaking mechanic (dynamic weather affecting all gameplay systems) that technical-director warns will require 3 additional sprints. game-designer and technical-director are in disagreement about whether to proceed. Expected: Producer does not take a side on whether the mechanic is worth adding (design decision) or feasible (technical decision). Producer quantifies the production impact (3 sprints of delay, milestone slip risk), presents the trade-off to the user, and follows coordination-rules.md conflict resolution: escalate to the shared parent (in this case, surface the conflict for user decision since creative-director and technical-director are both top-tier). Assertions:

  • Quantifies the production impact in concrete terms (sprint count, milestone date slip)
  • Does not make a binding design or technical decision
  • Surfaces the conflict to the user with the scope implications clearly stated
  • References coordination-rules.md conflict resolution protocol (escalate to shared parent or user)

Case 5: Context pass — uses provided context

Scenario: Agent receives a gate context block that includes the current milestone deadline (8 weeks away) and velocity data from the last 4 sprints (8, 10, 9, 11 points). A sprint plan is submitted with 14 story points. Expected: Assessment uses the provided velocity data to project whether 14 points is achievable, and references the 8-week milestone window to assess whether the current sprint's scope leaves adequate buffer. Assertions:

  • Uses the specific velocity figures from the provided context (not generic estimates)
  • References the 8-week deadline in the capacity assessment
  • Calculates or estimates remaining sprint count within the milestone window
  • Does not give generic scope advice disconnected from the supplied deadline and velocity data

Protocol Compliance

  • Returns verdicts using REALISTIC / CONCERNS / UNREALISTIC vocabulary only
  • Stays within declared production domain
  • Escalates design/technical conflicts by quantifying scope impact and presenting to user
  • Uses gate IDs in output (e.g., PR-SPRINT: REALISTIC) not inline prose verdicts
  • Does not make binding game design or technical architecture decisions

Coverage Notes

  • PR-EPIC (epic-level prioritization) is not covered — a dedicated case should be added when the /create-epics skill produces structured epic documents.
  • PR-MILESTONE (milestone health review) is not covered — deferred to integration with /milestone-review skill.
  • PR-PHASE-GATE (full production phase advancement) involving synthesis of multiple sub-gate results is deferred.
  • Multi-sprint burn-down and velocity trend analysis are not covered here.