Agent Test Spec: performance-analyst

Agent Summary

Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations. Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain). Model tier: Sonnet (default). No gate IDs assigned.


Static Assertions (Structural)

  • description: field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
  • allowed-tools: list includes Read, Write, Edit, Bash, Glob, Grep
  • Model tier is Sonnet (default for specialists)
  • Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only

Test Cases

Case 1: In-domain request — appropriate output

Input: "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms." Expected behavior:

  • Identifies the primary bottleneck: CPU, whose 14ms total dominates GPU (8ms) and is approaching the 16.67ms (60fps) frame budget
  • Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
  • Flags draw calls (420) as a secondary concern where a lower limit applies (e.g., the 200-draw-call cap in technical-preferences.md)
  • Produces a prioritized bottleneck report, the arithmetic for which is sketched after this list:
    1. Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
    2. Draw calls — 420, implement batching or LOD
    3. Scripts — 3ms, profile hot paths
  • Does NOT implement any of these optimizations
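
A minimal Python sketch of the arithmetic this report should reflect. The budget constants mirror the technical-preferences.md values cited elsewhere in this spec, and the function name is illustrative rather than part of the agent definition:

    # Illustrative only: derive the Case 1 priorities from the raw measurements.
    FRAME_BUDGET_MS = 1000 / 60          # 16.67ms budget for a 60fps target
    DRAW_CALL_LIMIT = 200                # per technical-preferences.md

    def bottleneck_report(cpu_ms, contributors, draw_calls):
        """Rank CPU contributors by share of CPU time; flag draw-call overage."""
        lines = []
        for name, ms in sorted(contributors.items(), key=lambda kv: kv[1], reverse=True):
            lines.append(f"{name}: {ms}ms ({ms / cpu_ms:.0%} of CPU time)")
        if draw_calls > DRAW_CALL_LIMIT:
            lines.append(f"draw calls: {draw_calls} (limit {DRAW_CALL_LIMIT})")
        lines.append(f"CPU headroom: {FRAME_BUDGET_MS - cpu_ms:.2f}ms of {FRAME_BUDGET_MS:.2f}ms")
        return lines

    # Case 1 input: physics ranks first at 6ms, i.e. 43% of the 14ms CPU time.
    print("\n".join(bottleneck_report(14, {"physics": 6, "scripts": 3}, 420)))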

Case 2: Out-of-domain request — redirects correctly

Input: "Implement the batching optimization to reduce draw calls from 420 to under 200." Expected behavior:

  • Does NOT produce implementation code for batching
  • Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
  • Redirects the implementation to engine-programmer with the recommendation context attached
  • May produce a requirements brief for the optimization so engine-programmer has a clear target (a hypothetical brief shape follows this list)
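
A hypothetical shape for that requirements brief, in Python. Every field name here is an assumption; the spec only requires that the target and its supporting context travel to engine-programmer together:

    # Hypothetical brief structure; these field names are not mandated by the spec.
    requirements_brief = {
        "action_owner": "engine-programmer",
        "finding": "420 draw calls exceed the 200 limit in technical-preferences.md",
        "target": "reduce draw calls to under 200",
        "candidate_approaches": ["batching", "LOD"],
        "evidence": "production/qa/evidence/",  # reports are filed here per Coverage Notes
    }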

Case 3: Regression identification

Input: "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms." Expected behavior:

  • Proposes a bisection strategy to identify the offending commit range (a minimal search sketch follows this list)
  • Requests or reviews the diff of commits in the window to narrow the likely cause
  • Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
  • Produces a regression report naming the probable commit, the affected system, and the measured delta
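
One way to sketch that bisection in Python. Here measure_frame_time_ms is a hypothetical hook (check out the commit, run the profiling harness); in practice git bisect performs the same search over the real repository:

    # Binary-search an oldest-to-newest commit list for the first regressing commit.
    # Assumes the newest commit is known bad (18ms) and a threshold chosen between
    # the old 10ms baseline and the regressed 18ms, e.g. 12ms.
    def first_bad_commit(commits, measure_frame_time_ms, threshold_ms=12.0):
        lo, hi = 0, len(commits) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if measure_frame_time_ms(commits[mid]) > threshold_ms:
                hi = mid                 # regression is at mid or earlier
            else:
                lo = mid + 1             # regression is after mid
        return commits[lo]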

Case 4: Recommendation vs. code quality trade-off

Input: "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers." Expected behavior:

  • Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
  • Does NOT recommend the optimization without noting the code quality cost
  • Escalates the trade-off to lead-programmer for a decision
  • May propose a middle path (e.g., profile-guided inlining of only the two or three hottest methods) that preserves testability

Case 5: Context pass — technical-preferences.md budget

Input: Technical preferences from context: target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB.
Request: "Review the current build profile."
Expected behavior:

  • References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
  • Compares current measurements against each threshold explicitly
  • Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers (a labeling sketch follows this list)
  • Does NOT use different budget numbers than those provided in the context
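
A labeling sketch in Python. The budgets are exactly the context-provided values; the 10% AT RISK margin is an assumption made here for illustration only:

    # Budgets come from the provided context, never from assumed defaults.
    BUDGETS = {"frame_ms": 16.67, "draw_calls": 200, "memory_mb": 512}

    def label(metric, measured, at_risk_margin=0.10):   # margin is illustrative
        budget = BUDGETS[metric]
        if measured > budget:
            return "OVER BUDGET"
        if measured > budget * (1 - at_risk_margin):
            return "AT RISK"
        return "WITHIN BUDGET"

    for metric, measured in {"frame_ms": 16.1, "draw_calls": 420, "memory_mb": 300}.items():
        print(f"{metric}: {measured} vs {BUDGETS[metric]} -> {label(metric, measured)}")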

Protocol Compliance

  • Stays within declared domain (profiling, analysis, recommendations — not implementation)
  • Redirects optimization implementation to the correct programmer domain agent
  • Returns structured findings (bottleneck report with severity, measured values, and recommended action owner; a hypothetical finding schema follows this list)
  • Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
  • Applies budget thresholds from provided context rather than assumed defaults
  • Labels all findings with a specific action owner (who should implement the fix)
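
A hypothetical schema for one structured finding, in Python. The spec fixes the required contents (severity, measured value against threshold, and a named action owner) but not this exact representation:

    from dataclasses import dataclass

    @dataclass
    class Finding:
        system: str           # e.g. "rendering"
        severity: str         # e.g. "OVER BUDGET"
        measured: float
        threshold: float
        recommendation: str
        action_owner: str     # who implements the fix, e.g. "engine-programmer"

    finding = Finding("rendering", "OVER BUDGET", 420, 200,
                      "implement batching or LOD", "engine-programmer")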

Coverage Notes

  • Frame time analysis (Case 1) output should be structured as a report filed in production/qa/evidence/
  • Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
  • Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict