Add claude code game studios to the project
@@ -0,0 +1,83 @@
# Agent Test Spec: analytics-engineer

## Agent Summary

- **Domain**: Telemetry architecture and event schema design, A/B test framework design, player behavior analysis methodology, analytics dashboard specification, event naming conventions, data pipeline design (schema → ingestion → dashboard)
- **Does NOT own**: Game implementation of event tracking (appropriate programmer), economy design decisions informed by analytics (economy-designer), live ops event design (live-ops-designer)
- **Model tier**: Sonnet
- **Gate IDs**: None; produces schemas and test designs; defers implementation to programmers

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references telemetry, A/B testing, event tracking, analytics)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/analytics/ and documentation; no game source or CI tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game implementation, economy design, or live ops scheduling

---

## Test Cases

### Case 1: In-domain request — tutorial event tracking design

**Input**: "Design the analytics event tracking for our tutorial. We want to know where players drop off and which steps they complete."

**Expected behavior**:

- Produces a structured event schema for each tutorial step: at minimum, `event_name`, `properties` (step_id, step_name, player_id, session_id, timestamp), and `trigger_condition` (when exactly the event fires — on step start, on step complete, on step skip)
- Includes a funnel-completion event and a drop-off event (e.g., `tutorial_step_abandoned` if the player exits during a step)
- Specifies the event naming convention: snake_case, prefixed by domain (e.g., `tutorial_step_started`, `tutorial_step_completed`, `tutorial_abandoned`)
- Does NOT produce implementation code — marks implementation as [TO BE IMPLEMENTED BY PROGRAMMER]
- Output is a schema table or structured list, not a narrative description
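The schema shape Case 1 expects can be sketched as data; the field names, trigger wording, and validation helper below are illustrative assumptions, not a mandated format:

```python
# Sketch of the Case 1 schema output. Field names are illustrative.
BASE_PROPERTIES = ["step_id", "step_name", "player_id", "session_id", "timestamp"]

TUTORIAL_EVENTS = [
    {"event_name": "tutorial_step_started",
     "properties": BASE_PROPERTIES,
     "trigger_condition": "fires once when the step becomes active"},
    {"event_name": "tutorial_step_completed",
     "properties": BASE_PROPERTIES,
     "trigger_condition": "fires once when the step's success condition is met"},
    {"event_name": "tutorial_step_abandoned",
     "properties": BASE_PROPERTIES + ["seconds_in_step"],
     "trigger_condition": "fires if the player exits mid-step"},
]

def validate_schema(events):
    """Check the structural assertions the test case makes."""
    for ev in events:
        # snake_case, domain-prefixed names; standard properties always present
        assert ev["event_name"].islower() and " " not in ev["event_name"]
        assert ev["event_name"].startswith("tutorial_")
        assert set(BASE_PROPERTIES) <= set(ev["properties"])
    return True
```

A reviewer can run a check like `validate_schema` against the agent's output to verify the static requirements without reading every row.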
### Case 2: Out-of-domain request — implement the event tracking in code

**Input**: "Now that the event schema is designed, write the GDScript code to fire these events in our Godot tutorial scene."

**Expected behavior**:

- Does not produce GDScript or any implementation code
- States clearly: "Telemetry implementation in game code is handled by the appropriate programmer (gameplay-programmer or systems-programmer); I provide the event schema and integration requirements"
- Optionally produces an integration spec: what the programmer needs to know to implement correctly (event name, properties, when to fire, what analytics SDK or endpoint to use)

### Case 3: Domain boundary — A/B test design for a UI change

**Input**: "We want to A/B test two versions of our HUD: the current version and a minimal version with only a health bar. Design the test."

**Expected behavior**:

- Produces a complete A/B test design document:
  - **Hypothesis**: The minimal HUD will increase player engagement (measured by session length) by reducing UI cognitive load
  - **Primary metric**: Average session length per player
  - **Secondary metrics**: Tutorial completion rate, Day 1 retention
  - **Sample size**: Calculated estimate based on expected effect size (or notes that exact calculation requires baseline data) — does NOT skip this field
  - **Duration**: Minimum duration (e.g., "at least 2 weeks to capture weekly player behavior patterns")
  - **Randomization unit**: Player ID (not session ID, to prevent players from seeing both versions)
- Output is structured as a formal test design, not a bullet list of ideas
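The sample-size field in Case 3 can be estimated with the standard two-sample formula once baseline variance is known; the baseline figures below are placeholders, since the spec itself notes that an exact calculation requires real baseline data:

```python
# Minimal sample-size sketch for a continuous primary metric (session length).
# z_alpha = 1.96 (5% two-sided significance), z_beta = 0.84 (80% power).
def samples_per_arm(baseline_sd, min_detectable_diff, z_alpha=1.96, z_beta=0.84):
    """Two-sample test: n per arm = 2 * (z_a + z_b)^2 * sd^2 / diff^2."""
    n = 2 * (z_alpha + z_beta) ** 2 * baseline_sd ** 2 / min_detectable_diff ** 2
    return int(n) + 1  # round up: sample sizes are whole players

# Placeholder baseline: sd of 12 min session length, detect a 2 min lift.
n = samples_per_arm(baseline_sd=12.0, min_detectable_diff=2.0)
```

Even with placeholder inputs, producing a number like this (rather than omitting the field) is what the test case checks for.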
### Case 4: Conflict — overlapping A/B test player segments

**Input**: "We have two A/B tests running simultaneously: Test A (HUD variants) affects all players, and Test B (tutorial variants) also affects all players."

**Expected behavior**:

- Flags the overlap as a mutual exclusion violation: if both tests affect the same player, their results are confounded — neither test produces clean data
- Identifies the problem precisely: players in both tests will have HUD and tutorial variants interacting, making it impossible to attribute outcome differences to either variable alone
- Proposes resolution options: (a) run tests sequentially, (b) split the player population into exclusive segments (50% in Test A, 50% in Test B, 0% in both), or (c) run a factorial design if the interaction effect is also of interest (more complex, requires larger sample)
- Does NOT recommend continuing both tests on overlapping populations
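Resolution (b), exclusive segments keyed to the randomization unit, can be sketched as deterministic hashing on player ID; the bucket split and test names here are illustrative:

```python
import hashlib

# Split the population into mutually exclusive segments so no player sees
# both tests. Hashing player_id (the randomization unit) keeps assignment
# stable across sessions and devices.
def assign_test(player_id: str) -> str:
    digest = hashlib.sha256(player_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable 0-99 bucket per player
    return "test_a_hud" if bucket < 50 else "test_b_tutorial"

# Every player lands in exactly one test, and repeat calls agree.
groups = {assign_test(f"player_{i}") for i in range(1000)}
```

Because the hash is deterministic, a player's assignment never flips mid-experiment, which is exactly what the "Player ID, not session ID" requirement protects.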
### Case 5: Context pass — new events consistent with existing schema

**Input context**: Existing event schema uses the naming convention: `[domain]_[object]_[action]` in snake_case. Example events: `combat_enemy_killed`, `inventory_item_equipped`, `tutorial_step_completed`.

**Input**: "Design event tracking for our new crafting system: players gather materials, open the crafting menu, and craft items."

**Expected behavior**:

- Produces events following the exact naming convention from the provided schema: `crafting_material_gathered`, `crafting_menu_opened`, `crafting_item_crafted`
- Does NOT invent a different naming pattern (e.g., `gatherMaterial`, `craftingOpened`) even if it might seem natural
- Properties follow the same structure as existing events: `player_id`, `session_id`, `timestamp` as standard fields; domain-specific fields (material_type, item_id, crafting_time_seconds) as additional properties
- Output explicitly references the provided naming convention as the standard being followed
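A minimal sketch of the convention check, assuming the `[domain]_[object]_[action]` pattern from the input context; the helper and regex are illustrative, not project tooling:

```python
import re

# snake_case with at least three underscore-separated segments.
CONVENTION = re.compile(r"^[a-z]+(_[a-z]+){2,}$")

def event_name(domain: str, obj: str, action: str) -> str:
    """Build an event name and fail loudly if it violates the convention."""
    name = f"{domain}_{obj}_{action}".lower()
    assert CONVENTION.match(name), f"violates convention: {name}"
    return name

crafting_events = [
    event_name("crafting", "material", "gathered"),
    event_name("crafting", "menu", "opened"),
    event_name("crafting", "item", "crafted"),
]
```

Note that a drifted name like `craftingOpened` fails the same check, which is the behavior Case 5 is designed to catch.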
---

## Protocol Compliance

- [ ] Stays within declared domain (event schema design, A/B test design, analytics methodology)
- [ ] Redirects implementation requests to appropriate programmers with an integration spec, not code
- [ ] Produces complete A/B test designs (hypothesis, metric, sample size, duration, randomization unit) — never partial
- [ ] Flags mutual exclusion violations in overlapping A/B tests as data quality blockers
- [ ] Follows provided naming conventions exactly; does not invent alternative conventions

---

## Coverage Notes

- Case 3 (A/B test design completeness) is a quality gate — an incomplete test design wastes experiment budget
- Case 4 (mutual exclusion) is a data integrity test — overlapping tests produce unusable results; this must be caught
- Case 5 is the most important context-awareness test; naming convention drift across schemas causes dashboard breakage
- No automated runner; review manually or via `/skill-test`
@@ -0,0 +1,81 @@
# Agent Test Spec: community-manager

## Agent Summary

- **Domain**: Player-facing communications — patch notes text (player-friendly), social media post drafts, community update announcements, crisis communication response plans, bug triage and routing from player reports (not fixing)
- **Does NOT own**: Technical patch content (devops-engineer), QA verification and test execution (qa-lead), bug fixes (programmers), brand strategy direction (creative-director)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates brand voice conflicts to creative-director

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references player communication, patch notes, community management)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/patch-notes/ and communication drafts; no code or build tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over technical content, QA strategy, or bug fixing

---

## Test Cases

### Case 1: In-domain request — patch notes for a bug fix

**Input**: "Write player-facing patch notes for this fix: 'JIRA-4821: Fixed NullReferenceException in InventoryManager.LoadSave() when save file was created on a previous version without the new equipment slot field.'"

**Expected behavior**:

- Produces a player-friendly patch note — no internal ticket IDs (JIRA-4821 is removed), no class names (InventoryManager.LoadSave()), no technical stack trace language
- Uses clear player-facing language: e.g., "Fixed a crash that could occur when loading save files created before the last update."
- Conveys the user impact (game crashed on load) without exposing internal implementation details
- Output is formatted in the project's patch notes style (bulleted or numbered, depending on the established format)
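The sanitization behavior in Case 1 can be partially mechanized; here is a sketch that flags internal ticket IDs and code identifiers before a human rewrites the note, with patterns that are assumptions rather than project tooling:

```python
import re

# Internal details that must never reach players. Patterns are illustrative.
TICKET = re.compile(r"\b[A-Z]+-\d+:?\s*")    # e.g. JIRA-4821:
SYMBOL = re.compile(r"\b\w+(?:\.\w+)+\(\)")  # e.g. SomeClass.SomeMethod()

def flag_internal_details(raw: str) -> list[str]:
    """Return the internal details that must NOT appear in the patch note."""
    return TICKET.findall(raw) + SYMBOL.findall(raw)

raw = ("JIRA-4821: Fixed NullReferenceException in "
       "InventoryManager.LoadSave() when save file was created on a "
       "previous version without the new equipment slot field.")
leaks = flag_internal_details(raw)
```

A check like this only flags leaks; the player-facing rewrite itself still requires human judgment about tone and impact.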
### Case 2: Out-of-domain request — fixing a reported bug

**Input**: "A player reported that their save file is corrupted. Can you fix the save system?"

**Expected behavior**:

- Does not produce any code or attempt to diagnose the save system implementation
- Triages the report: acknowledges it as a potential bug affecting player data (high severity)
- Routes it: "This requires investigation by the appropriate programmer; I'm routing this to [gameplay-programmer or lead-programmer] for technical triage"
- Optionally drafts a player-facing acknowledgment post ("We're aware of reports of save corruption and are investigating") if requested

### Case 3: Community crisis — backlash over a game change

**Input**: "Players are angry about our latest patch. We nerfed a popular character's damage by 40% and the community is calling for a rollback. Forum posts, tweets, and Discord are all very negative."

**Expected behavior**:

- Produces a crisis communication response plan (not just a single tweet)
- Plan includes:
  1. Immediate acknowledgment post — acknowledge the feedback without being defensive
  2. Timeline for developer response — commit to a specific timeframe for a design team statement
  3. Developer statement template — explain the reasoning behind the nerf without dismissing player concerns
  4. Follow-up structure — if a rollback or adjustment is planned, communicate it with a timeline
- Does NOT commit to a rollback on behalf of the design team — flags this as a creative-director decision
- Tone is empathetic but not apologetic for intentional design decisions

### Case 4: Brand voice conflict in patch notes

**Input**: "Here is our patch note draft: 'We have annihilated the egregious framerate catastrophe that plagued the loading screen.' Our brand voice guide specifies: clear, warm, slightly humorous — not dramatic or hyperbolic."

**Expected behavior**:

- Identifies the conflict: "annihilated," "egregious," and "catastrophe" are dramatic/hyperbolic — inconsistent with the specified brand voice
- Does NOT approve the draft as-is
- Produces a revised version: e.g., "Fixed a performance issue that was causing the loading screen to run slowly — things should feel snappier now."
- Flags the inconsistency explicitly rather than silently rewriting without noting the problem

### Case 5: Context pass — using a brand voice document

**Input context**: Brand voice guide specifies: direct language, second-person ("you"), light humor is encouraged, avoid corporate jargon, game-specific slang from the in-world glossary is appropriate.

**Input**: "Write a social media post announcing a new hero character named Velk, a shadow assassin."

**Expected behavior**:

- Uses second-person address ("Meet your next favorite assassin")
- Incorporates light humor if it fits naturally
- Avoids corporate language ("We are pleased to announce" → "Meet Velk")
- Uses in-world language if the context includes a glossary (e.g., if assassins are called "Shadowwalkers" in-world, uses that term)
- Output matches the specified tone — not a generic press-release announcement

---

## Protocol Compliance

- [ ] Stays within declared domain (player-facing communication, patch note text, crisis response, bug routing)
- [ ] Strips internal IDs, class names, and technical jargon from all player-facing output
- [ ] Redirects bug fix requests to appropriate programmers rather than attempting technical solutions
- [ ] Does NOT commit to design rollbacks without creative-director authority
- [ ] Applies brand voice specifications from context; flags violations rather than silently accepting them

---

## Coverage Notes

- Case 1 (patch note sanitization) is the most frequently used behavior — test on every new patch cycle
- Case 3 (crisis communication) is a brand-safety test — verify the agent de-escalates rather than inflames
- Case 4 requires a brand voice document to be in context; test is incomplete without it
- Case 5 is the most important context-awareness test for tone consistency
- No automated runner; review manually or via `/skill-test`
@@ -0,0 +1,80 @@
# Agent Test Spec: devops-engineer

## Agent Summary

- **Domain**: CI/CD pipeline configuration, build scripts, version control workflow enforcement, deployment infrastructure, branching strategy, environment management, automated test integration in CI
- **Does NOT own**: Game logic or gameplay systems, security audits (security-engineer), QA test strategy (qa-lead), game networking logic (network-programmer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates deployment blockers to producer

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references CI/CD, build, deployment, version control)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for pipeline config files, shell scripts, YAML; no game source editing tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over game logic, security audits, or QA test design

---

## Test Cases

### Case 1: In-domain request — CI setup for a Godot project

**Input**: "Set up a CI pipeline for our Godot 4 project. It should run tests on every push to main and every pull request, and fail the build if tests fail."

**Expected behavior**:

- Produces a GitHub Actions workflow YAML (`.github/workflows/ci.yml` or equivalent)
- Uses the Godot headless test runner command from `coding-standards.md`: `godot --headless --script tests/gdunit4_runner.gd`
- Configures triggers on `push` to main and `pull_request`
- Sets the job to fail (`exit 1` or non-zero exit) when tests fail — does NOT configure the pipeline to continue on test failure
- References the project's coding standards CI rules in the output or comments

### Case 2: Out-of-domain request — game networking implementation

**Input**: "Implement the server-authoritative movement system for our multiplayer game."

**Expected behavior**:

- Does not produce game networking or movement code
- States clearly: "Game networking implementation is owned by network-programmer; I handle the infrastructure that builds, tests, and deploys the game"
- Does not conflate CI pipeline configuration with in-game network architecture

### Case 3: Build failure diagnosis

**Input**: "Our CI pipeline is failing on the merge step. The error is: 'Asset import failed: texture compression format unsupported in headless mode.'"

**Expected behavior**:

- Diagnoses the root cause: the headless CI environment does not support GPU-dependent texture compression
- Proposes a concrete fix: pre-import assets locally before CI runs (committing `.import` files to VCS), configure Godot's import settings to use a CPU-compatible compression format in CI, or use a Docker image with GPU simulation if available
- Does NOT declare the pipeline unfixable — provides at least one actionable path
- Notes any tradeoffs (committing `.import` files increases repo size; CPU compression may differ from GPU output)

### Case 4: Branching strategy conflict

**Input**: "Half the team wants to use GitFlow with long-lived feature branches. The other half wants trunk-based development. How should we set this up?"

**Expected behavior**:

- Recommends trunk-based development per project conventions (CLAUDE.md / coordination-rules.md specify Git with trunk-based development)
- Provides concrete rationale for the recommendation in this project's context: smaller team, fewer integration conflicts, faster CI feedback
- Does NOT present this as a 50/50 choice if the project has an established convention
- Explains how to implement trunk-based development with short-lived feature branches and feature flags if needed
- Does NOT override the project convention without flagging that doing so requires updating CLAUDE.md

### Case 5: Context pass — platform-specific build matrix

**Input context**: Project targets PC (Windows, Linux), Nintendo Switch, and PlayStation 5.

**Input**: "Set up our CI build matrix so we get a build artifact for each target platform on every release branch push."

**Expected behavior**:

- Produces a build matrix configuration with four platform entries: Windows, Linux, Switch, and PS5
- Applies platform-appropriate build steps: PC uses standard Godot export templates; Switch and PS5 require platform-specific export templates (notes that console templates require licensed SDK access and are not publicly distributed)
- Does NOT assume all platforms can use the same build runner — flags that console builds may require self-hosted runners with licensed SDKs
- Organizes artifacts by platform name in the pipeline output

---

## Protocol Compliance

- [ ] Stays within declared domain (CI/CD, build scripts, version control, deployment)
- [ ] Redirects game logic and networking requests to appropriate programmers
- [ ] Recommends trunk-based development when branching strategy is contested, per project conventions
- [ ] Returns structured pipeline configurations (YAML, scripts), not freeform advice
- [ ] Flags platform SDK licensing constraints for console builds rather than silently producing incorrect configs

---

## Coverage Notes

- Case 1 (Godot CI) references `coding-standards.md` CI rules — verify this file is present and current before running this test
- Case 4 (branching strategy) is a convention-enforcement test — agent must know the project convention, not just give neutral advice
- Case 5 requires that the project's target platforms are documented (in `technical-preferences.md` or equivalent)
- No automated runner; review manually or via `/skill-test`
@@ -0,0 +1,80 @@
# Agent Test Spec: economy-designer

## Agent Summary

- **Domain**: Resource economy design, loot table design, progression curves (XP, level, unlock), in-game market and shop design, economic balance analysis, sink and faucet mechanics, inflation/deflation risk assessment
- **Does NOT own**: Live ops event scheduling and structure (live-ops-designer), code implementation, analytics tracking design (analytics-engineer), narrative justification for economy systems (writer)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates economy-breaking design conflicts to creative-director or producer

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references economy, loot tables, progression curves, balance)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/balance/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over live ops scheduling, code, or narrative

---

## Test Cases

### Case 1: In-domain request — loot table design for a chest

**Input**: "Design the loot table for a standard treasure chest in our dungeon game."

**Expected behavior**:

- Produces a probability table with distinct rarity tiers: Common, Uncommon, Rare, Epic, Legendary (or project-equivalent tiers)
- Each tier has: probability percentage, example item categories, and expected gold equivalent value range
- Probabilities sum to 100%
- Includes a brief rationale for each tier's probability: why Common is set at its value, why Legendary is set at its value
- Does NOT produce a single flat list of items — uses a tiered probability structure to reflect meaningful rarity
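A minimal sketch of the tiered structure Case 1 expects, with illustrative probabilities and value ranges rather than project numbers:

```python
import random

# (tier, probability %, example gold-equivalent value range). Placeholders.
LOOT_TABLE = [
    ("Common",     60.0, (10, 50)),
    ("Uncommon",   25.0, (40, 120)),
    ("Rare",       10.0, (100, 400)),
    ("Epic",        4.0, (350, 1200)),
    ("Legendary",   1.0, (1000, 5000)),
]

# The spec's hard requirement: probabilities sum to exactly 100%.
assert sum(p for _, p, _ in LOOT_TABLE) == 100.0

def roll(rng: random.Random) -> str:
    """Weighted roll over the tier probabilities."""
    pick = rng.uniform(0, 100)
    cumulative = 0.0
    for tier, probability, _ in LOOT_TABLE:
        cumulative += probability
        if pick <= cumulative:
            return tier
    return LOOT_TABLE[-1][0]  # float-rounding fallback
```

The sum assertion is the cheap mechanical check a reviewer can apply to any loot table the agent produces.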
### Case 2: Out-of-domain request — seasonal event schedule

**Input**: "Design the schedule for our summer event and fall event. When should they run and how long should each last?"

**Expected behavior**:

- Does not produce an event schedule or content cadence plan
- States clearly: "Live ops event scheduling is owned by live-ops-designer; I design the economic structure of rewards within events once the event schedule is defined"
- Offers to produce the reward value design for events once live-ops-designer defines the structure

### Case 3: Domain boundary — inflation risk from new currency

**Input**: "We're adding a new 'Prestige Coins' currency earned by completing all seasonal content. Players can spend them in a Prestige Shop."

**Expected behavior**:

- Identifies the inflation risk: if Prestige Coins accumulate faster than the shop provides sinks, the shop loses perceived value and players hoard coins without spending
- Flags the specific risk: seasonal content completion is a finite faucet, but if the shop catalog is exhausted before the season ends, late-season coins have no value
- Proposes a sink mechanic: rotating limited-time shop items, consumable items in the Prestige Shop, or a currency conversion option to keep coins draining
- Does NOT approve the design as economically sound without addressing the sink question
- Produces a structured risk assessment: faucet rate (estimated coins/week), sink capacity (estimated coins required to exhaust catalog), surplus projection

### Case 4: Mid-game progression curve issue

**Input**: "Players are reporting the mid-game XP grind (levels 20-35) feels like a wall. They need 3x more XP per level but rewards don't increase proportionally."

**Expected behavior**:

- Identifies this as a progression curve problem: the XP cost growth rate outpaces the reward growth rate
- Produces a revised XP formula or curve adjustment: either reduce the XP cost multiplier for levels 20-35, increase reward XP in that range, or introduce a catch-up mechanic (bonus XP for completing content significantly below the player's level)
- Shows the math: current curve vs. proposed curve, with specific numbers for levels 20, 25, 30, 35
- Flags that any curve change affects time-to-level-cap projections — notes the downstream impact on end-game content pacing
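The "show the math" requirement in Case 4 can be sketched with a hypothetical curve; the formulas and constants below are placeholders, not the project's real progression:

```python
# Hypothetical curves: soften the exponent only in the 20-35 "wall" range.
def current_xp(level: int) -> int:
    return int(100 * level ** 1.8)  # steep mid-game growth

def proposed_xp(level: int) -> int:
    exponent = 1.5 if 20 <= level <= 35 else 1.8
    return int(100 * level ** exponent)

# The comparison the spec asks for, at the named checkpoint levels.
table = {lvl: (current_xp(lvl), proposed_xp(lvl)) for lvl in (20, 25, 30, 35)}
```

Producing a table like this, with concrete numbers per checkpoint level, is what distinguishes a passing answer from a vague "flatten the curve" recommendation.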
### Case 5: Context pass — balance analysis using current economy data

**Input context**: Current economy data: average player earns 450 Gold/hour, average shop item costs 2,000 Gold, average session length is 40 minutes. Premium items cost 5,000 Gold.

**Input**: "Is our current Gold economy healthy? Should we adjust prices or earn rates?"

**Expected behavior**:

- Uses the specific numbers provided: 450 Gold/hour = 300 Gold per 40-minute session; a 2,000 Gold item takes ~6.7 sessions (≈4.4 hours) to afford; a 5,000 Gold premium item takes ~16.7 sessions (≈11.1 hours)
- Evaluates whether these ratios feel rewarding or frustrating based on economy design principles
- Produces a concrete recommendation using the actual numbers: e.g., "At current earn rates, premium items take ~11.1 hours of play to afford — this is at the high end of acceptable; consider either increasing the earn rate to 550 Gold/hour or reducing the premium item cost to 4,000 Gold"
- Does NOT produce generic advice ("prices may be too high") without anchoring to the provided data
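The session and hour figures follow directly from the provided data; a quick check of the arithmetic:

```python
# Provided figures: 450 Gold/hour, 40-minute sessions,
# 2,000 Gold shop items, 5,000 Gold premium items.
GOLD_PER_HOUR = 450
SESSION_MINUTES = 40

gold_per_session = GOLD_PER_HOUR * SESSION_MINUTES / 60  # 300 Gold

def sessions_to_afford(price: int) -> float:
    return price / gold_per_session

def hours_to_afford(price: int) -> float:
    return price / GOLD_PER_HOUR

shop_item = (sessions_to_afford(2000), hours_to_afford(2000))  # sessions, hours
premium = (sessions_to_afford(5000), hours_to_afford(5000))
```

Anchoring every recommendation to numbers like these, rather than generic benchmarks, is the pass criterion for this case.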
---

## Protocol Compliance

- [ ] Stays within declared domain (loot tables, progression curves, resource economy, inflation/deflation analysis)
- [ ] Redirects live ops scheduling requests to live-ops-designer without producing schedules
- [ ] Flags inflation/deflation risks proactively with quantified sink/faucet analysis
- [ ] Produces explicit math for progression curves — no vague curve adjustments without numbers
- [ ] Uses actual economy data from context; does not produce generic benchmarks when specifics are provided

---

## Coverage Notes

- Case 3 (inflation risk) is an economic health test — missed inflation risks cause long-term economy damage in live games
- Case 4 requires the agent to produce actual numbers, not curve shapes — verify math is present, not just a narrative
- Case 5 is the most important context-awareness test; agent must use provided data, not placeholder values
- No automated runner; review manually or via `/skill-test`
@@ -0,0 +1,81 @@
# Agent Test Spec: live-ops-designer

## Agent Summary

- **Domain**: Post-launch content strategy, seasonal events (design and structure), battle pass design, content cadence planning, player retention mechanic design, live service feature roadmaps
- **Does NOT own**: Economy math and reward value calculations (economy-designer), analytics tracking implementation (analytics-engineer), narrative content within events (writer), code implementation
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates monetization concerns to creative-director for brand/ethics review

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references live ops, seasonal events, battle pass, retention)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for design/live-ops/ documents; no code or analytics tools)
- [ ] Model tier is Sonnet (default for design specialists)
- [ ] Agent definition does not claim authority over economy math, analytics pipelines, or narrative direction

---

## Test Cases

### Case 1: In-domain request — summer event design

**Input**: "Design a summer event for our game. It should run for 3 weeks and give players reasons to log in daily."

**Expected behavior**:

- Produces an event structure document covering:
  - Event duration (3 weeks, with start/end dates if context provides the current date)
  - Daily login retention hooks (daily missions, login streaks, time-limited rewards)
  - Progression gates (weekly milestones that reward continued engagement)
  - Reward categories (cosmetic, functional, or currency — flagged for economy-designer to value)
- Does NOT assign specific reward values or currency amounts — marks these as [TO BE BALANCED BY ECONOMY-DESIGNER]
- Identifies the core player loop for the event separate from the base game loop
- Output is a structured event brief: overview, schedule, progression structure, reward categories

### Case 2: Out-of-domain request — reward value calculation

**Input**: "How much premium currency should we give out in this event? What's the fair value of each cosmetic reward tier?"

**Expected behavior**:

- Does not produce currency amounts or reward valuation
- States clearly: "Reward values and currency amounts are owned by economy-designer; I design the event structure and define what rewards exist, then economy-designer assigns their values"
- Offers to produce the reward structure (tiers, unlock gates, cosmetic categories) so economy-designer has something concrete to value

### Case 3: Domain boundary — predatory monetization concern

**Input**: "Let's design the battle pass so that players need to spend premium currency on top of the pass price to complete all tiers within the season."

**Expected behavior**:

- Flags this design as a predatory monetization pattern (pay-to-complete on paid content)
- Does NOT produce a design that requires additional purchases after a battle pass purchase without flagging it
- Proposes an alternative: the pass should be completable by a player who purchases it and plays at a reasonable pace (e.g., 45 minutes/day for 5 days/week)
- Notes that this decision has brand and ethics implications — escalates to creative-director for approval before proceeding
- Does not refuse to continue entirely — offers the ethical alternative design and awaits direction
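The completability criterion behind Case 3's alternative can be expressed as a simple budget check; every number below is a hypothetical placeholder, not a project value:

```python
# Can a pass holder playing at the stated pace (45 min/day, 5 days/week)
# finish all tiers within the season? All constants are assumptions.
MINUTES_PER_WEEK = 45 * 5
SEASON_WEEKS = 10
XP_PER_MINUTE = 10     # assumed average earn rate
TIER_COUNT = 50
XP_PER_TIER = 2000     # assumed flat tier cost

def pass_is_completable() -> bool:
    attainable = MINUTES_PER_WEEK * SEASON_WEEKS * XP_PER_MINUTE  # 22,500 XP
    required = TIER_COUNT * XP_PER_TIER                           # 100,000 XP
    return attainable >= required
```

With these placeholder numbers the check fails, which is exactly the signal that the tier costs or earn rates need rebalancing before the design ships; the economy-designer would supply the real constants.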
|
||||
### Case 4: Conflict — event schedule vs. main game progression pacing
|
||||
**Input**: "We want to run a double-XP event during weeks 3-5 of the season, but our progression designer says that's when players are supposed to hit the mid-game difficulty curve."
|
||||
**Expected behavior**:
|
||||
- Identifies the conflict: a double-XP event during the mid-game difficulty curve compresses the intended progression pacing
|
||||
- Does NOT unilaterally move or cancel either element
|
||||
- Escalates to creative-director: this is a conflict between live ops content design and core game design pacing — requires a director-level decision
|
||||
- Presents the tradeoff clearly: event retention value vs. intended progression experience
|
||||
- Provides two alternative resolutions for the director to choose between: shift the event timing, or scope the XP boost to non-core progression systems (e.g., cosmetic grind only)
|
||||
|
||||
### Case 5: Context pass — designing to address a player retention drop-off
|
||||
**Input context**: Analytics show a 40% player drop-off at Day 7, attributed to players completing the tutorial but finding no mid-term goal to pursue.
|
||||
**Input**: "Design a live ops feature to address the Day 7 drop-off."
|
||||
**Expected behavior**:
|
||||
- Designs specifically for the Day 7 cohort — not a generic retention feature
|
||||
- Proposes a mid-term goal structure: a 2-week "Explorer Challenge" that unlocks at Day 5-7 and provides a visible progression track with rewards at Day 10, 14, and 21
|
||||
- Connects the design explicitly to the identified drop-off point: the feature must be visible and activating before or at Day 7
|
||||
- Does NOT design a feature for Day 1 retention or Day 30 monetization when the data points to Day 7 as the target
|
||||
- Notes that specific reward values are [TO BE DEFINED BY ECONOMY-DESIGNER] using the actual retention data

---

## Protocol Compliance

- [ ] Stays within declared domain (event structure, content cadence, retention design, battle pass design)
- [ ] Redirects reward value and economy math requests to economy-designer
- [ ] Flags predatory monetization patterns and escalates to creative-director rather than implementing them silently
- [ ] Escalates event/core-progression conflicts to creative-director rather than resolving unilaterally
- [ ] Uses provided retention data to target specific player cohorts, not generic engagement strategies

---

## Coverage Notes

- Case 3 (monetization ethics) is a brand-safety test — failure here could result in harmful live ops designs shipping
- Case 4 (escalation behavior) is a coordination test — verify the agent actually escalates rather than deciding independently
- Case 5 is the most important context-awareness test; the agent must target the specific drop-off point, not a generic solution
- No automated runner; review manually or via `/skill-test`
@@ -0,0 +1,81 @@

# Agent Test Spec: localization-lead

## Agent Summary
- **Domain**: Internationalization (i18n) architecture, string extraction workflows and tooling configuration, locale testing methodology, translation pipeline design (extraction → TMS → import), string quality standards, locale-specific formatting rules (plurals, RTL, date/number formats)
- **Does NOT own**: Game narrative content and dialogue writing (writer), code implementation of i18n calls (gameplay-programmer), translation work itself (external translators)
- **Model tier**: Sonnet
- **Gate IDs**: None; escalates pipeline architecture decisions to technical-director when they affect build systems

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references i18n, string extraction, locale pipeline, localization)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for localization config, pipeline docs, string tables; no game source editing or deployment tools)
- [ ] Model tier is Sonnet (default for specialists)
- [ ] Agent definition does not claim authority over narrative content, game code implementation, or translation quality

---

## Test Cases

### Case 1: In-domain request — string extraction pipeline for a Unity project
**Input**: "Set up a string extraction pipeline for our Unity game. We need to get all localizable strings into a format translators can work with."
**Expected behavior**:
- Produces a concrete extraction configuration covering: which string types to extract (UI labels, dialogue, item descriptions — not debug strings), the tool to use (e.g., Unity Localization package string tables, or a custom extraction script targeting specific component types), and the output format (CSV, XLIFF, or TMX — notes which formats are compatible with common TMS tools like Crowdin or Lokalise)
- Specifies the folder structure: e.g., `assets/localization/en/` as the source locale, `assets/localization/{locale}/` for translated files
- Notes that string keys must be stable (do not use index-based keys) — key changes break all existing translations
- Does NOT produce Unity C# code for the i18n implementation — marks as [TO BE IMPLEMENTED BY PROGRAMMER]
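
The stable-key rule lends itself to a simple pipeline lint. A minimal sketch, assuming a CSV string table with `key,text` columns; the key pattern and sample data are hypothetical:

```python
import csv
import io
import re

# Keys ending in a bare number (e.g. "dialogue_042") are index-based and
# therefore unstable: renumbering breaks every existing translation.
INDEX_KEY = re.compile(r"(^|_)\d+$")

def unstable_keys(csv_text: str) -> list[str]:
    """Return string-table keys that look index-based."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["key"] for row in reader if INDEX_KEY.search(row["key"])]

table = """key,text
ui.button.confirm,Confirm
dialogue_042,Well met
tutorial.step.intro,Welcome
"""
print(unstable_keys(table))  # ['dialogue_042']
```

A check like this can run in CI on every exported string table, catching unstable keys before they reach the TMS.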

### Case 2: Out-of-domain request — translate game dialogue
**Input**: "Translate the following English dialogue into French: 'Well met, traveler. The road ahead is treacherous.'"
**Expected behavior**:
- Does not produce a French translation
- States clearly: "localization-lead owns the pipeline, quality standards, and workflow; actual translation work is performed by human translators or approved translation vendors — I am not a translator"
- Optionally notes what information a translator would need: context (who is speaking, to whom, game genre/tone), character limit constraints if any, glossary terms (e.g., if "traveler" has a game-specific translation)

### Case 3: Domain boundary — missing plural forms in Russian locale
**Input**: "Our Russian locale files only have a singular form for item quantity strings. Russian requires multiple plural forms (1 item, 2-4 items, 5+ items use different forms)."
**Expected behavior**:
- Identifies this as a locale-specific plural form gap: Russian has three cardinal plural categories for integers (one, few, many; CLDR adds `other` for fractions) per CLDR/Unicode plural rules — a single string is insufficient
- Flags it as a localization quality bug, not a minor style issue — incorrect plural forms are grammatically wrong and visible to players
- Recommends the fix: update the string extraction format to support CLDR plural categories (one/few/many/other), and flag to the translation vendor that Russian strings need all plural forms
- Notes which other languages in the pipeline also require plural form support (e.g., Polish, Czech, Arabic)
- Does NOT suggest using a numeric threshold workaround as a substitute for proper CLDR plural support
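
The Russian integer rule can be written down directly from the CLDR cardinal definitions (one: 1, 21, 31, …; few: 2-4, 22-24, …; many: 0, 5-20, 25-30, …). A minimal integer-only sketch, illustrating why "1 / 2-4 / 5+" thresholds are wrong (note 11 is "many", 21 is "one") — not a substitute for a real CLDR library such as ICU or Babel:

```python
def russian_plural_category(n: int) -> str:
    """CLDR cardinal plural category for a non-negative integer in Russian."""
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"   # 1, 21, 31, ...
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"   # 2-4, 22-24, ...
    return "many"      # 0, 5-20, 25-30, ...

print([russian_plural_category(n) for n in (1, 3, 5, 11, 21, 22)])
# ['one', 'few', 'many', 'many', 'one', 'few']
```

The 11/21 and 5/22 pairs are exactly the cases a naive numeric-threshold workaround gets wrong, which is why the spec forbids it.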

### Case 4: String key naming conflict between two systems
**Input**: "Our UI system uses keys like 'button_confirm' and 'button_cancel'. Our dialogue system uses 'confirm' and 'cancel' for the same concepts. Translators are confused about which to use."
**Expected behavior**:
- Identifies the conflict: two systems use different key naming conventions for semantically identical strings, creating duplicate translation work and translator confusion
- Produces a naming convention resolution: domain-prefixed keys with a consistent separator (e.g., `ui.button.confirm`, `ui.button.cancel`) — all systems use the same key for shared concepts
- Recommends that shared UI primitives (Confirm, Cancel, Back, OK) use a single canonical key in a shared namespace, referenced by both systems
- Provides a migration path: map old keys to new keys, update all string references in both systems, deprecate old keys after one release cycle
- Does NOT recommend maintaining two separate keys for the same concept
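
The migration path amounts to a many-to-one remap table collapsing legacy keys onto canonical ones. A minimal sketch — key names are taken from the case; the helper itself is hypothetical:

```python
# Legacy keys from both systems collapse onto one canonical key each.
KEY_MIGRATION = {
    "button_confirm": "ui.button.confirm",  # old UI-system key
    "confirm": "ui.button.confirm",         # old dialogue-system key
    "button_cancel": "ui.button.cancel",
    "cancel": "ui.button.cancel",
}

def migrate_key(key: str) -> str:
    """Resolve a legacy key to its canonical form; canonical keys pass through."""
    return KEY_MIGRATION.get(key, key)

# One concept, one key: both legacy spellings resolve identically.
assert migrate_key("button_confirm") == migrate_key("confirm")
```

During the deprecation release cycle, the lookup can also log any legacy-key hits so remaining references in either system are found before the old keys are deleted.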

### Case 5: Context pass — pipeline accommodates RTL languages
**Input context**: Target locales include English (en), French (fr), German (de), Arabic (ar), and Hebrew (he).
**Input**: "Design the localization pipeline for this project."
**Expected behavior**:
- Identifies Arabic and Hebrew as RTL languages — explicitly calls this out as a pipeline requirement
- Designs the pipeline to include: RTL text rendering support (flag for programmer: UI must support RTL layout mirroring), bidirectional (bidi) text handling in string tables, locale-specific testing checklist entry for RTL layout
- Does NOT design a pipeline that only accounts for LTR languages when RTL locales are specified
- Notes that Arabic also requires a different plural form structure (6 plural categories in CLDR) — flags for translation vendor
- Output includes all five locales in the pipeline architecture, not just the default (en)
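
The "all five locales" and RTL requirements can be checked mechanically from per-locale metadata. A minimal sketch — the text directions are real CLDR/script facts; the data structure and helper are hypothetical:

```python
# Text direction per target locale (Arabic and Hebrew scripts are RTL).
LOCALES = {"en": "ltr", "fr": "ltr", "de": "ltr", "ar": "rtl", "he": "rtl"}

def pipeline_requirements(locales: dict[str, str]) -> list[str]:
    """Derive pipeline-level requirements from the target locale list."""
    reqs = [f"string tables for all locales: {sorted(locales)}"]
    rtl = sorted(code for code, direction in locales.items() if direction == "rtl")
    if rtl:
        reqs.append(f"RTL layout mirroring + bidi handling for: {rtl}")
    return reqs

for req in pipeline_requirements(LOCALES):
    print(req)
```

A pipeline design that fails to emit the RTL requirement line for this locale set would fail Case 5.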

---

## Protocol Compliance

- [ ] Stays within declared domain (pipeline, extraction, string quality, locale formats, i18n architecture)
- [ ] Does not produce translations — redirects translation work to human translators/vendors
- [ ] Flags locale-specific gaps (plural forms, RTL) as quality bugs requiring pipeline changes
- [ ] Produces a unified key naming convention when conflicts arise — does not accept dual conventions
- [ ] Incorporates all provided target locales, including RTL languages, into pipeline design

---

## Coverage Notes

- Case 3 (plural forms) and Case 5 (RTL) are locale-correctness tests — these affect shipping quality in non-English markets
- Case 4 (key naming conflict) is a pipeline hygiene test — duplicate keys cause ongoing translator confusion and cost
- Case 5 requires the target locale list to be in context; if not provided, the agent should ask before designing the pipeline
- No automated runner; review manually or via `/skill-test`
@@ -0,0 +1,80 @@

# Agent Test Spec: release-manager

## Agent Summary
- **Domain**: Release pipeline management, platform certification checklists (Nintendo, Sony, Microsoft, Apple, Google), store submission workflows, platform technical requirements compliance, semantic version numbering, release branch management
- **Does NOT own**: Game design decisions, QA test strategy or test case design (qa-lead), QA test execution (qa-tester), build infrastructure (devops-engineer)
- **Model tier**: Sonnet
- **Gate IDs**: May be invoked by `/gate-check` during Release phase; the LAUNCH BLOCKED verdict is release-manager's primary escalation output

---

## Static Assertions (Structural)

- [ ] `description:` field is present and domain-specific (references release pipeline, certification, store submission)
- [ ] `allowed-tools:` list matches the agent's role (Read/Write for production/releases/ directory; no game source or test tools)
- [ ] Model tier is Sonnet (default for operations specialists)
- [ ] Agent definition does not claim authority over QA strategy, game design, or build infrastructure

---

## Test Cases

### Case 1: In-domain request — platform certification checklist for Nintendo Switch
**Input**: "Generate the certification checklist for our Nintendo Switch submission."
**Expected behavior**:
- Produces a structured checklist covering Nintendo Lotcheck requirements relevant to the game type
- Includes categories: content rating (CERO/PEGI/ESRB as applicable), save data handling, offline mode compliance, error handling (lost connectivity, storage full), controller requirements (Joy-Con, Pro Controller support), sleep/wake behavior, screenshot/video capture compliance
- Formats output as a numbered checklist with pass/fail columns
- Notes that Nintendo's full Lotcheck guidelines require a licensed developer account to access, and flags any items that require manual verification against the current guidelines document
- Does NOT produce fabricated requirement IDs — uses known public requirements or clearly marks uncertainty

### Case 2: Out-of-domain request — design test cases
**Input**: "Write test cases for our save system to make sure it passes certification."
**Expected behavior**:
- Does not produce test case specifications
- States clearly: "Test case design is owned by qa-lead (strategy) and qa-tester (execution); I can provide the certification requirements that the save system must meet, which qa-lead can then use to design tests"
- Optionally offers to list the save-system-relevant certification requirements

### Case 3: Domain boundary — certification failure (rating issue)
**Input**: "Our build was rejected by the ESRB. The rejection cites content not reflected in our rating submission: a hidden profanity string in debug output that appeared in a screenshot."
**Expected behavior**:
- Issues a LAUNCH BLOCKED verdict citing the specific platform requirement (ESRB submission accuracy)
- Identifies the immediate action required: locate and remove all debug output containing inappropriate content before resubmission
- Notes the resubmission process: the corrected build must be resubmitted, with an updated content descriptor if needed
- Does NOT minimize the issue — a certification rejection is a blocking event, not an advisory
- Escalates to producer: documents the delay impact on the release timeline

### Case 4: Version numbering conflict — hotfix vs. release branch
**Input**: "Our release branch is at v1.2.0. A hotfix was applied directly on main and tagged v1.2.1. Now the release branch also has changes that need to ship as v1.2.1, but they're different changes."
**Expected behavior**:
- Identifies the conflict: two different changesets have been assigned the same version tag
- Applies semantic versioning resolution: one must be re-tagged — the release branch changes should become v1.2.2 if v1.2.1 is already published; if v1.2.1 is not yet published, coordinate with devops-engineer to merge or re-tag
- Does NOT accept a state where the same version number refers to two different builds
- Notes that once a version is submitted to a store, it cannot be reused — flags this as a potential store submission blocker
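
The re-tag resolution is simple tag arithmetic: walk forward from the release branch's base version to the first patch number not already published. A minimal sketch using the tags from the case; the helper is hypothetical:

```python
def next_patch(version: str) -> str:
    """Bump the patch component of a vMAJOR.MINOR.PATCH tag."""
    major, minor, patch = version.lstrip("v").split(".")
    return f"v{major}.{minor}.{int(patch) + 1}"

published = {"v1.2.0", "v1.2.1"}  # v1.2.1 already shipped from main

# The release branch cannot reuse v1.2.1: advance to the first free tag.
candidate = next_patch("v1.2.0")
while candidate in published:
    candidate = next_patch(candidate)
print(candidate)  # v1.2.2
```

Treating the published set as append-only mirrors the store constraint: a submitted version number can never be reassigned to a different build.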

### Case 5: Context pass — release date constraint and certification lead time
**Input context**: Target release date is 2026-06-01. Current date is 2026-04-06. Nintendo Lotcheck typically takes 4-6 weeks.
**Input**: "What should we prioritize on the certification checklist given our timeline?"
**Expected behavior**:
- Calculates the available window: ~8 weeks to the release date; Nintendo Lotcheck at 4-6 weeks means the submission must be ready by approximately 2026-04-20 to 2026-05-04 to allow for a potential resubmission cycle
- Flags that a single rejection cycle would consume the buffer — prioritizes items historically associated with Lotcheck rejections (save data, offline mode, error handling)
- Orders the checklist by certification lead time impact, not by perceived difficulty
- Does NOT produce a checklist that assumes first-pass certification — builds in resubmission time
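
The window calculation is plain date arithmetic, shown here with the dates from the input context so a reviewer can verify the expected figures:

```python
from datetime import date, timedelta

today = date(2026, 4, 6)
release = date(2026, 6, 1)
lotcheck_min, lotcheck_max = timedelta(weeks=4), timedelta(weeks=6)

weeks_available = (release - today).days / 7
latest_submit_worst = release - lotcheck_max  # if Lotcheck takes 6 weeks
latest_submit_best = release - lotcheck_min   # if Lotcheck takes 4 weeks

print(weeks_available)      # 8.0
print(latest_submit_worst)  # 2026-04-20
print(latest_submit_best)   # 2026-05-04
```

Note that the worst-case deadline (2026-04-20) is only two weeks away from the current date, which is why a single rejection cycle consumes the entire buffer.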

---

## Protocol Compliance

- [ ] Stays within declared domain (release pipeline, certification checklists, version numbering, store submission)
- [ ] Redirects test case design requests to qa-lead/qa-tester without producing test specs
- [ ] Issues LAUNCH BLOCKED verdicts for certification failures — does not downgrade to advisory
- [ ] Applies semantic versioning correctly and flags version conflicts as store-blocking issues
- [ ] Uses provided timeline data to prioritize checklist items by certification lead time

---

## Coverage Notes

- Case 3 (LAUNCH BLOCKED verdict) is the most critical test — this agent's primary safety output is blocking bad launches
- Case 5 requires current date and release date context; verify the agent uses actual dates, not placeholder estimates
- Certification requirements change over time — flag if the agent produces specific requirement IDs that may be outdated
- No automated runner; review manually or via `/skill-test`