250 lines
14 KiB
Markdown
250 lines
14 KiB
Markdown
# Skill Quality Rubric
|
||
|
||
Used by `/skill-test category [name|all]` to evaluate skills beyond structural compliance.
|
||
Each category defines 4–5 binary PASS/FAIL metrics specific to the skill's job.
|
||
|
||
A metric is PASS when the skill's written instructions clearly satisfy the criterion.
|
||
A metric is FAIL when the instructions are absent, ambiguous, or contradictory.
|
||
A metric is WARN when the instructions partially address the criterion.
|
||
|
||
---
|
||
|
||
## Skill Categories
|
||
|
||
### `gate`
|
||
|
||
**Skills**: gate-check
|
||
|
||
Gate skills control phase transitions. They must enforce correctness without
|
||
auto-advancing stage and must respect the three review modes.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **G1 — Review mode read** | Skill reads `production/session-state/review-mode.txt` (or equivalent) before deciding which directors to spawn |
|
||
| **G2 — Full mode: all 4 directors spawn** | In `full` mode, all 4 Tier-1 directors (CD, TD, PR, AD) PHASE-GATE prompts are invoked in parallel |
|
||
| **G3 — Lean mode: PHASE-GATE only** | In `lean` mode, only `*-PHASE-GATE` gates run; inline gates (CD-PILLARS, TD-ARCHITECTURE, etc.) are skipped |
|
||
| **G4 — Solo mode: no directors** | In `solo` mode, no director gates spawn; each is noted as "skipped — Solo mode" |
|
||
| **G5 — No auto-advance** | Skill never writes `production/stage.txt` without explicit user confirmation via "May I write" |
|
||
|
||
---
|
||
|
||
### `review`
|
||
|
||
**Skills**: design-review, architecture-review, review-all-gdds
|
||
|
||
Review skills read documents and produce structured verdicts. They are primarily
|
||
read-only and must not trigger director gates during the analysis phase.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **R1 — Read-only enforcement** | Skill does not modify the reviewed document without explicit user approval; any write operations (review logs, index updates) are gated behind "May I write" |
|
||
| **R2 — 8-section check** | Skill evaluates all 8 required GDD sections (or equivalent architectural sections) explicitly |
|
||
| **R3 — Correct verdict vocabulary** | Verdict is exactly one of: APPROVED / NEEDS REVISION / MAJOR REVISION NEEDED (design) or PASS / CONCERNS / FAIL (architecture) |
|
||
| **R4 — No director gates during analysis** | Skill does not spawn director gates during its analysis phases; post-analysis director review (as in architecture-review) is acceptable when the skill's scope and stakes warrant it |
|
||
| **R5 — Structured findings** | Output contains a per-section status table or checklist before the final verdict |
|
||
|
||
> **Exceptions:**
|
||
> - `design-review`: Has `Write, Edit` in allowed-tools to support an optional "Revise now" path (all writes gated behind user approval) and to write review logs. R1 is satisfied because the reviewed document is never silently modified.
|
||
> - `architecture-review`: Spawns TD-ARCHITECTURE and LP-FEASIBILITY gates after its analysis is complete. This is intentional — architecture review is high-stakes and benefits from director sign-off. R4 is satisfied because the gates run post-analysis, not during it.
|
||
|
||
---
|
||
|
||
### `authoring`
|
||
|
||
**Skills**: design-system, quick-design, architecture-decision, ux-design, ux-review, art-bible, create-architecture
|
||
|
||
Authoring skills create or update design documents collaboratively. Full GDD/UX
|
||
authoring skills use a section-by-section cycle; lightweight authoring skills use
|
||
a single-draft pattern appropriate to their smaller scope.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **A1 — Section-by-section cycle** | Full authoring skills (design-system, ux-design, art-bible) author one section at a time, presenting content for approval before proceeding to the next. Lightweight skills (quick-design, architecture-decision, create-architecture) may draft the complete document then ask for approval — single-draft is acceptable for documents under ~4 hours of implementation scope. |
|
||
| **A2 — May-I-write per section** | Full authoring skills ask "May I write this to [filepath]?" before each section write. Lightweight skills ask once for the complete document. |
|
||
| **A3 — Retrofit mode** | Skill detects if the target file already exists and offers to update specific sections rather than overwriting the whole document. Lightweight skills (quick-design) that always create new files are exempt. |
|
||
| **A4 — Director gate at correct tier** | If a director gate is defined for this skill (e.g., CD-GDD-ALIGN, TD-ADR), it runs at the correct mode threshold (full/lean) — NOT in solo |
|
||
| **A5 — Skeleton-first** | Full authoring skills create a file skeleton with all section headers before filling content, to preserve progress on session interruption. Lightweight skills are exempt. |
|
||
|
||
> **Full authoring skills** (must pass all 5 metrics): `design-system`, `ux-design`, `art-bible`
|
||
> **Lightweight authoring skills** (A1, A2, A5 use single-draft pattern; A3 exempt for new-file-only skills): `quick-design`, `architecture-decision`, `create-architecture`
|
||
> **Review-mode skill** (evaluated against review metrics): `ux-review`
|
||
|
||
---
|
||
|
||
### `readiness`
|
||
|
||
**Skills**: story-readiness, story-done
|
||
|
||
Readiness skills validate stories before or after implementation. They must produce
|
||
multi-dimensional verdicts and integrate correctly with director gate mode.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **RD1 — Multi-dimensional check** | Skill checks ≥3 independent dimensions (e.g., Design, Architecture, Scope, DoD) and reports each separately |
|
||
| **RD2 — Three verdict levels** | Verdict hierarchy is clearly defined: READY/COMPLETE > NEEDS WORK/COMPLETE WITH NOTES > BLOCKED |
|
||
| **RD3 — BLOCKED requires external action** | BLOCKED verdict is reserved for issues that cannot be fixed by the story author alone (e.g., Proposed ADR, unresolvable dependency) |
|
||
| **RD4 — Director gate at correct mode** | QL-STORY-READY or LP-CODE-REVIEW gate spawns in `full` mode, skips in `lean`/`solo` with a noted skip message |
|
||
| **RD5 — Next-story handoff** | After completion, skill surfaces the next READY story from the active sprint |
|
||
|
||
---
|
||
|
||
### `pipeline`
|
||
|
||
**Skills**: create-epics, create-stories, dev-story, create-control-manifest, propagate-design-change, map-systems
|
||
|
||
Pipeline skills produce artifacts that other skills consume. They must write files
|
||
with correct schema, respect layer/priority ordering, and gate before writing.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **P1 — Correct output schema** | Each produced file follows the project template (EPIC.md, story frontmatter, etc.); skill references the template path |
|
||
| **P2 — Layer/priority ordering** | Skills that produce epics or stories respect layer ordering (core → extended → meta) and priority fields |
|
||
| **P3 — May-I-write before each artifact** | Skill asks "May I write [artifact]?" before creating each output file, not batch-approving all files at once |
|
||
| **P4 — Director gate at correct tier** | In-scope gates (PR-EPIC, QL-STORY-READY, LP-CODE-REVIEW, etc.) run in `full`, skip in `lean`/`solo` with noted skip |
|
||
| **P5 — Reads before writes** | Skill reads the relevant GDD/ADR/manifest before producing artifacts to ensure alignment |
|
||
|
||
---
|
||
|
||
### `analysis`
|
||
|
||
**Skills**: consistency-check, balance-check, content-audit, code-review, tech-debt,
|
||
scope-check, estimate, perf-profile, asset-audit, security-audit, test-evidence-review, test-flakiness
|
||
|
||
Analysis skills scan the project and surface findings. They are read-only during
|
||
analysis and must ask before recommending any file writes.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **AN1 — Read-only scan** | Analysis phase uses only Read/Glob/Grep tools; no Write or Edit during the scan itself |
|
||
| **AN2 — Structured findings table** | Output includes a findings table or checklist (not prose only) with severity/priority per finding |
|
||
| **AN3 — No auto-write** | Any suggested file writes (e.g., tech-debt register, fix patches) are gated behind "May I write" |
|
||
| **AN4 — No director gates during analysis** | Analysis skills do not spawn director gates; they produce findings for human review |
|
||
|
||
---
|
||
|
||
### `team`
|
||
|
||
**Skills**: team-combat, team-narrative, team-audio, team-level, team-ui, team-qa,
|
||
team-release, team-polish, team-live-ops
|
||
|
||
Team skills orchestrate multiple specialist agents for a department. They must
|
||
spawn the right agents, run independent ones in parallel, and surface blocks immediately.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **T1 — Named agent list** | Skill explicitly names which agents it spawns and in what order |
|
||
| **T2 — Parallel where independent** | Agents whose inputs don't depend on each other are spawned in parallel (single message, multiple Task calls) |
|
||
| **T3 — BLOCKED surfacing** | If any spawned agent returns BLOCKED or fails, skill surfaces it immediately and halts dependent work — never silently skips |
|
||
| **T4 — Collect all verdicts before proceeding** | Dependent phases wait for all parallel agents to complete before proceeding |
|
||
| **T5 — Usage error on no argument** | If required argument (e.g., feature name) is missing, skill outputs usage hint and stops without spawning agents |
|
||
|
||
---
|
||
|
||
### `sprint`
|
||
|
||
**Skills**: sprint-plan, sprint-status, milestone-review, retrospective, changelog, patch-notes
|
||
|
||
Sprint skills read production state and produce reports or planning artifacts.
|
||
They have a PR-SPRINT or PR-MILESTONE gate at specific mode thresholds.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **SP1 — Reads sprint/milestone state** | Skill reads `production/sprints/` or `production/milestones/` before producing output |
|
||
| **SP2 — Correct sprint gate** | PR-SPRINT (for planning) or PR-MILESTONE (for milestone review) gate runs in `full` mode, skips in `lean`/`solo` |
|
||
| **SP3 — Structured output** | Output uses a consistent structure (velocity table, risk list, action items) rather than free prose |
|
||
| **SP4 — No auto-commit** | Skill never writes sprint files or milestone records without "May I write" |
|
||
|
||
---
|
||
|
||
### `utility`
|
||
|
||
**Skills**: start, help, brainstorm, onboard, adopt, hotfix, prototype, localize,
|
||
launch-checklist, release-checklist, smoke-check, soak-test, test-setup, test-helpers,
|
||
regression-suite, qa-plan, bug-triage, bug-report, playtest-report, asset-spec,
|
||
reverse-document, project-stage-detect, setup-engine, skill-test, skill-improve,
|
||
day-one-patch, and any other skills not in categories above
|
||
|
||
Utility skills pass the 7 standard static checks. If they happen to spawn director
|
||
gates, the gate mode logic must also be correct.
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **U1 — Passes all 7 static checks** | `/skill-test static [name]` returns COMPLIANT with 0 FAILs |
|
||
| **U2 — Gate mode correct (if applicable)** | If the skill spawns any director gate, it reads review-mode and applies full/lean/solo logic correctly |
|
||
|
||
---
|
||
|
||
## Agent Categories
|
||
|
||
Used to validate agent spec files in `tests/agents/`.
|
||
|
||
### `director`
|
||
|
||
**Agents**: creative-director, technical-director, art-director, producer
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **D1 — Correct verdict vocabulary** | Returns APPROVE / CONCERNS / REJECT (or domain equivalent: REALISTIC/CONCERNS/UNREALISTIC for producer) |
|
||
| **D2 — Domain boundary respected** | Does not make binding decisions outside its declared domain |
|
||
| **D3 — Conflict escalation** | When two departments conflict, escalates to correct parent (creative-director or technical-director) rather than unilaterally deciding |
|
||
| **D4 — Opus model tier** | Agent is assigned Opus model per coordination-rules.md |
|
||
|
||
### `lead`
|
||
|
||
**Agents**: lead-programmer, qa-lead, narrative-director, audio-director, game-designer,
|
||
systems-designer, level-designer
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **L1 — Domain verdict** | Returns a domain-specific verdict (e.g., FEASIBLE/INFEASIBLE for lead-programmer, PASS/FAIL for qa-lead) |
|
||
| **L2 — Escalates to shared parent** | Out-of-domain conflicts escalate to creative-director (design) or technical-director (tech) |
|
||
| **L3 — Sonnet model tier** | Agent is assigned Sonnet model (default) per coordination-rules.md |
|
||
|
||
### `specialist`
|
||
|
||
**Agents**: gameplay-programmer, ai-programmer, technical-artist, sound-designer,
|
||
engine-programmer, tools-programmer, network-programmer, security-engineer,
|
||
accessibility-specialist, ux-designer, ui-programmer, performance-analyst, prototyper,
|
||
qa-tester, writer, world-builder
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **S1 — Stays in domain** | Explicitly scopes itself to its declared domain; defers out-of-domain requests |
|
||
| **S2 — No binding cross-domain decisions** | Does not unilaterally decide matters owned by another specialist |
|
||
| **S3 — Defers correctly** | Out-of-domain requests are redirected to the correct agent, not refused silently |
|
||
|
||
### `engine`
|
||
|
||
**Agents**: godot-specialist, godot-gdscript-specialist, godot-csharp-specialist,
|
||
godot-shader-specialist, godot-gdextension-specialist, unity-specialist, unity-ui-specialist,
|
||
unity-shader-specialist, unity-dots-specialist, unity-addressables-specialist,
|
||
unreal-specialist, ue-blueprint-specialist, ue-gas-specialist, ue-umg-specialist,
|
||
ue-replication-specialist
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **E1 — Version-aware** | References engine version from `docs/engine-reference/` before suggesting API calls; flags post-cutoff risk |
|
||
| **E2 — File routing** | Routes file types to the correct sub-specialist (e.g., `.gdshader` → godot-shader-specialist, not godot-gdscript-specialist) |
|
||
| **E3 — Engine-specific patterns** | Enforces engine-specific idioms (e.g., GDScript static typing, C# attribute exports, Blueprint function libraries) |
|
||
|
||
### `qa`
|
||
|
||
**Agents**: qa-tester, qa-lead, security-engineer, accessibility-specialist
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **Q1 — Produces artifacts not code** | Primary output is test cases, bug reports, or coverage gaps — not implementation code |
|
||
| **Q2 — Evidence format** | Test cases follow the project's test evidence format (unit/integration/visual/UI per coding-standards.md) |
|
||
| **Q3 — No scope creep** | Does not propose new features; flags gaps for humans to decide |
|
||
|
||
### `operations`
|
||
|
||
**Agents**: devops-engineer, release-manager, live-ops-designer, community-manager,
|
||
analytics-engineer, economy-designer, localization-lead
|
||
|
||
| Metric | PASS criteria |
|
||
|---|---|
|
||
| **O1 — Domain ownership clear** | Agent description clearly states what it owns (pipeline, releases, economy, etc.) |
|
||
| **O2 — Defers implementation** | Does not write game logic or engine code; delegates to appropriate specialist |
|
||
| **O3 — Toolset matches role** | `allowed-tools` in frontmatter matches the operational (not coding) nature of the role |
|