Agent Test Spec: performance-analyst
Agent Summary
Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations. Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain). Model tier: Sonnet (default). No gate IDs assigned.
Static Assertions (Structural)
- `description:` field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
- `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- Model tier is Sonnet (default for specialists)
- Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only (a pytest sketch of these checks follows this list)
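A minimal pytest sketch of these structural checks, assuming the agent definition is a Markdown file with YAML frontmatter; the path `agents/performance-analyst.md` and the exact frontmatter keys are assumptions, not confirmed by this spec:

```python
# Sketch only: assumes the agent definition is Markdown with YAML frontmatter.
# The path and exact frontmatter keys are assumptions, not confirmed here.
import yaml  # PyYAML

AGENT_PATH = "agents/performance-analyst.md"  # hypothetical location

def load_frontmatter(path: str) -> dict:
    """Parse the YAML block between the leading '---' fences."""
    text = open(path, encoding="utf-8").read()
    _, frontmatter, _ = text.split("---", 2)
    return yaml.safe_load(frontmatter)

def test_static_assertions():
    fm = load_frontmatter(AGENT_PATH)
    desc = fm["description"].lower()
    assert any(k in desc for k in ("profiling", "bottleneck", "performance"))
    for tool in ("Read", "Write", "Edit", "Bash", "Glob", "Grep"):
        assert tool in fm["allowed-tools"]
    # Sonnet is the default tier, so an absent key also passes
    assert fm.get("model", "sonnet").lower() == "sonnet"
```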
Test Cases
Case 1: In-domain request — appropriate output
Input: "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms." Expected behavior:
- Identifies the primary bottleneck: CPU, at 14ms total, dominates the frame and is approaching the 16.67ms (60fps) budget
- Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
- Flags draw calls (420) as a secondary concern against the applicable budget (e.g., the 200-draw-call limit per technical-preferences.md)
- Produces a prioritized bottleneck report (see the sketch after this case):
  - Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
  - Draw calls — 420, implement batching or LOD
  - Scripts — 3ms, profile hot paths
- Does NOT implement any of these optimizations
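A hedged sketch of how the Case 1 report could be derived and prioritized; the budget constants follow technical-preferences.md as cited above, while the function and metric names are illustrative:

```python
# Illustrative only: derives the prioritized report expected in Case 1.
# Budget values follow technical-preferences.md as cited above.
FRAME_BUDGET_MS = 16.67   # 60fps frame budget
DRAW_CALL_BUDGET = 200

metrics = {"cpu_ms": 14.0, "gpu_ms": 8.0, "physics_ms": 6.0,
           "draw_calls": 420, "scripts_ms": 3.0}

def bottleneck_report(m: dict) -> list[tuple[str, float, str]]:
    findings = []
    physics_share = m["physics_ms"] / m["cpu_ms"]  # 6 / 14 ≈ 43%
    findings.append(("physics", m["physics_ms"],
                     f"{physics_share:.0%} of CPU time; reduce sim frequency"))
    if m["draw_calls"] > DRAW_CALL_BUDGET:
        findings.append(("draw_calls", m["draw_calls"],
                         "over the 200 budget; implement batching or LOD"))
    findings.append(("scripts", m["scripts_ms"], "profile hot paths"))
    return findings  # ordered highest priority first

for system, value, action in bottleneck_report(metrics):
    print(f"{system}: {value} -> {action}")
```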
Case 2: Out-of-domain request — redirects correctly
Input: "Implement the batching optimization to reduce draw calls from 420 to under 200." Expected behavior:
- Does NOT produce implementation code for batching
- Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
- Redirects the implementation to `engine-programmer` with the recommendation context attached
- May produce a requirements brief for the optimization so `engine-programmer` has a clear target (one possible shape is sketched below)
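One possible shape for the redirect payload; the `RedirectBrief` schema and its field names are assumptions for illustration, not a mandated format:

```python
# Hypothetical handoff payload; the schema is illustrative, not normative.
from dataclasses import dataclass, field

@dataclass
class RedirectBrief:
    owner: str                 # agent that should implement the fix
    finding: str               # measured problem being handed off
    target: str                # acceptance criterion for the implementer
    context: list[str] = field(default_factory=list)

brief = RedirectBrief(
    owner="engine-programmer",
    finding="420 draw calls measured; budget is 200",
    target="reduce draw calls to under 200 via batching",
    context=["bottleneck report from performance-analyst"],
)
```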
Case 3: Regression identification
Input: "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms." Expected behavior:
- Proposes a bisection strategy to identify the offending commit range (see the sketch after this case)
- Requests or reviews the diff of commits in the window to narrow the likely cause
- Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
- Produces a regression report naming the probable commit, the affected system, and the measured delta
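A minimal sketch of the bisection predicate, assuming the repository can be walked with `git bisect run` and that a hypothetical `run_benchmark.sh` prints mean frame time in milliseconds:

```python
# Sketch of a `git bisect run` predicate: exit 0 marks a commit good, 1 bad.
# run_benchmark.sh is a hypothetical hook that prints mean frame time in ms.
import subprocess
import sys

GOOD_MS, BAD_MS = 10.0, 18.0            # measurements from the report
THRESHOLD_MS = (GOOD_MS + BAD_MS) / 2   # midpoint separates good from bad

def measure_frame_time() -> float:
    out = subprocess.run(["./run_benchmark.sh"], capture_output=True, text=True)
    return float(out.stdout.strip())

if __name__ == "__main__":
    sys.exit(0 if measure_frame_time() < THRESHOLD_MS else 1)
```

Saved as, say, `bisect_check.py`, it would be invoked with `git bisect start`, `git bisect bad HEAD`, `git bisect good <last-known-good>`, then `git bisect run python bisect_check.py`, which walks the commit window automatically.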
Case 4: Recommendation vs. code quality trade-off
Input: "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers." Expected behavior:
- Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
- Does NOT recommend the optimization without noting the code quality cost
- Escalates the trade-off to `lead-programmer` for a decision
- May propose a middle path (e.g., profile-guided inlining of only the hottest 2–3 methods) that preserves testability (a profiling sketch follows this case)
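A sketch of that middle path using only the standard library's `cProfile` and `pstats` to find the hottest methods; the `game_tick` workload is a placeholder for the real script code:

```python
# Profile-guided selection of the hottest functions (standard library only).
import cProfile
import pstats

def game_tick():  # placeholder for the real script workload
    sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    game_tick()
profiler.disable()

# Rank by total time spent inside each function; the top 2-3 entries are the
# only inlining candidates worth considering, per the middle path above.
pstats.Stats(profiler).sort_stats("tottime").print_stats(3)
```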
Case 5: Context pass — technical-preferences.md budget
Input: Technical preferences from context: Target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB. Request: "Review the current build profile."
Expected behavior:
- References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
- Compares current measurements against each threshold explicitly
- Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers (a labeling sketch follows this case)
- Does NOT use different budget numbers than those provided in the context
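A sketch of the three-way labeling, using the budget values from the provided context; the 90% AT RISK cutoff is an assumption this spec does not define:

```python
# Labels measurements against context-provided budgets.
# AT_RISK_FRACTION is an assumed convention, not defined by the spec.
AT_RISK_FRACTION = 0.9

BUDGETS = {"frame_ms": 16.67, "draw_calls": 200, "memory_mb": 512}

def label(metric: str, value: float) -> str:
    budget = BUDGETS[metric]  # always the context value, never a default
    if value > budget:
        return "OVER BUDGET"
    if value >= AT_RISK_FRACTION * budget:
        return "AT RISK"
    return "WITHIN BUDGET"

measured = {"frame_ms": 14.0, "draw_calls": 420, "memory_mb": 300}
for metric, value in measured.items():
    print(f"{metric}: {value} vs {BUDGETS[metric]} -> {label(metric, value)}")
```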
Protocol Compliance
- Stays within declared domain (profiling, analysis, recommendations — not implementation)
- Redirects optimization implementation to the correct programmer domain agent
- Returns structured findings (bottleneck report with severity, measured values, and recommended action owner; see the sketch after this list)
- Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
- Applies budget thresholds from provided context rather than assumed defaults
- Labels all findings with a specific action owner (who should implement the fix)
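One way to make the structured-findings requirement concrete; the `Finding` record and its field names are illustrative, not normative:

```python
# Hypothetical finding record matching the protocol items above.
from dataclasses import dataclass

@dataclass
class Finding:
    system: str           # e.g. "physics"
    measured: str         # e.g. "6ms (43% of CPU time)"
    severity: str         # e.g. "primary" or "secondary"
    recommendation: str   # analysis only; implementation is redirected
    action_owner: str     # who should implement the fix, e.g. "engine-programmer"
```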
Coverage Notes
- Frame time analysis (Case 1) output should be structured as a report filed in `production/qa/evidence/`
- Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
- Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict