# Agent Test Spec: performance-analyst

## Agent Summary
Domain: Profiling, bottleneck identification, performance metrics tracking, and optimization recommendations.
Does NOT own: implementing optimizations (belongs to the appropriate programmer for that domain).
Model tier: Sonnet (default).
No gate IDs assigned.

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references profiling / bottleneck analysis / performance metrics)
- [ ] `allowed-tools:` list includes Read, Write, Edit, Bash, Glob, Grep
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over implementing any optimization — explicitly identifies itself as analysis/recommendation only

---

## Test Cases

### Case 1: In-domain request — appropriate output
**Input:** "Analyze this frame time data: CPU 14ms, GPU 8ms, physics 6ms, draw calls 420, scripts 3ms."
**Expected behavior:**
- Identifies the primary bottleneck: CPU is over a 16.67ms (60fps) budget at 14ms total
- Breaks down contributors: physics (6ms, 43% of CPU time) is the top culprit
- Draw calls (420) flags as a secondary concern if the budget limit is lower (e.g., 200 draw calls per technical-preferences.md)
- Produces a prioritized bottleneck report:
  1. Physics — 6ms, reduce simulation frequency or switch broadphase algorithm
  2. Draw calls — 420, implement batching or LOD
  3. Scripts — 3ms, profile hot paths
- Does NOT implement any of these optimizations

### Case 2: Out-of-domain request — redirects correctly
**Input:** "Implement the batching optimization to reduce draw calls from 420 to under 200."
**Expected behavior:**
- Does NOT produce implementation code for batching
- Explicitly states that implementing optimizations belongs to the appropriate programmer (engine-programmer for rendering batching)
- Redirects the implementation to `engine-programmer` with the recommendation context attached
- May produce a requirements brief for the optimization so engine-programmer has a clear target

### Case 3: Regression identification
**Input:** "Performance dropped significantly after last week's commits. Frame time went from 10ms to 18ms."
**Expected behavior:**
- Proposes a bisection strategy to identify the offending commit range
- Requests or reviews the diff of commits in the window to narrow the likely cause
- Identifies affected systems based on what changed (e.g., if physics code was modified, points to physics as the primary suspect)
- Produces a regression report naming the probable commit, the affected system, and the measured delta

### Case 4: Recommendation vs. code quality trade-off
**Input:** "The fastest optimization for the script bottleneck would be to inline all calls and remove abstraction layers."
**Expected behavior:**
- Surfaces the trade-off: inlining improves performance but reduces testability and violates the coding standard requiring unit-testable public methods
- Does NOT recommend the optimization without noting the code quality cost
- Escalates the trade-off to `lead-programmer` for a decision
- May propose a middle path (e.g., profile-guided inlining of only the hottest 2–3 methods) that preserves testability

### Case 5: Context pass — technical-preferences.md budget
**Input:** Technical preferences from context: Target 60fps, frame budget 16.67ms, draw calls max 200, memory ceiling 512MB. Request: "Review the current build profile."
**Expected behavior:**
- References the specific values from the provided context: 16.67ms, 200 draw calls, 512MB
- Compares current measurements against each threshold explicitly
- Labels each metric as WITHIN BUDGET / AT RISK / OVER BUDGET based on the provided numbers
- Does NOT use different budget numbers than those provided in the context

---

## Protocol Compliance

- [ ] Stays within declared domain (profiling, analysis, recommendations — not implementation)
- [ ] Redirects optimization implementation to the correct programmer domain agent
- [ ] Returns structured findings (bottleneck report with severity, measured values, and recommended action owner)
- [ ] Escalates code-quality trade-offs to lead-programmer rather than deciding unilaterally
- [ ] Applies budget thresholds from provided context rather than assumed defaults
- [ ] Labels all findings with a specific action owner (who should implement the fix)

---

## Coverage Notes
- Frame time analysis (Case 1) output should be structured as a report filed in `production/qa/evidence/`
- Regression case (Case 3) confirms the agent investigates cause, not just measures symptoms
- Code quality trade-off (Case 4) verifies the agent does not recommend optimizations that violate coding standards without flagging the conflict