---
name: test-flakiness
description: "Detect non-deterministic (flaky) tests by reading CI run logs or test result history. Aggregates pass rates per test, identifies intermittent failures, recommends quarantine or fix, and maintains a flaky test registry. Best run during Polish phase or after multiple CI runs."
argument-hint: "[ci-log-path | scan | registry]"
user-invocable: true
allowed-tools: Read, Glob, Grep, Write, Edit, Bash
model: sonnet
---

# Test Flakiness Detection

A flaky test is one that sometimes passes and sometimes fails without any code
change. Flaky tests are worse than no tests in some ways — they train the team
to ignore red CI runs, masking genuine failures. This skill identifies them,
explains likely causes, and recommends whether to quarantine or fix each one.

**Output:** Updated `tests/regression-suite.md` quarantine section + optional
`production/qa/flakiness-report-[date].md`

**When to run:**
- Polish phase (tests have had many runs; the statistical signal is reliable)
- When developers start dismissing CI failures as "probably flaky"
- After `/regression-suite` identifies quarantined tests that need diagnosis

---

## 1. Parse Arguments

**Modes:**
- `/test-flakiness [ci-log-path]` — analyse a specific CI run log file
- `/test-flakiness scan` — scan all available CI logs in `.github/` or
  standard log output directories
- `/test-flakiness registry` — read the existing `tests/regression-suite.md`
  quarantine section and provide remediation guidance for already-known flaky tests
- No argument — auto-detect: run `scan` if CI logs are accessible, else
  `registry`

---

## 2. Locate CI Log Data

### Option A — GitHub Actions (preferred)

Check for test result artifacts:

```bash
ls -t .github/ 2>/dev/null
ls -t test-results/ 2>/dev/null
```

For Godot projects: GdUnit4 outputs XML results compatible with the JUnit format.
Check `test-results/` for `.xml` files.

For Unity projects: the game-ci test runner outputs NUnit XML to `test-results/`
by default.

For Unreal projects: automation logs go to `Saved/Logs/`. Grep for
`Result: Success` and `Result: Fail` patterns.

### Option B — Local log files

If a path argument is provided, read that file directly.

### Option C — No log data available

If no logs are found:

> "No CI log data found. To detect flaky tests, this skill needs test result
> history from multiple runs. Options:
> 1. Run the test suite at least 3 times and collect the output logs
> 2. Check CI pipeline output and save a log to `test-results/`
> 3. Run `/test-flakiness registry` to review tests already flagged as flaky
>    in `tests/regression-suite.md`"

Stop and ask the user which option to pursue.

---
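
The Option A scan above can be sketched in Python (an illustrative helper, not part of the skill's tooling; `find_result_files` is a hypothetical name, and the default directories simply mirror the paths mentioned above):

```python
from pathlib import Path

def find_result_files(roots=("test-results", ".github")):
    """Collect candidate XML result files under likely directories, newest first.
    Mirrors `ls -t`: sort by modification time, descending."""
    candidates = []
    for root in roots:
        base = Path(root)
        if base.is_dir():  # silently skip directories that don't exist
            candidates.extend(base.rglob("*.xml"))
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

Silently skipping missing directories keeps auto-detect mode from erroring on projects that have no CI artifacts yet.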

## 3. Parse Test Results

For each CI log or result file found, parse:

**JUnit XML format** (GdUnit4 / Unity):
- Grep for `<testcase name=` to get test names
- Grep for `<failure` or `<error` to identify failures
- Parse the `classname` and `name` attributes for full test identifiers

**Plain text logs**:
- Grep for pass/fail patterns:
  - Godot: `PASSED` / `FAILED` adjacent to test names
  - Unreal: `Result: Success` / `Result: Fail`
  - Unity: `Test passed` / `Test failed`

Build a table: `test_id → [run1_result, run2_result, run3_result, ...]`

---
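
The JUnit branch of this step can be sketched as follows (a minimal illustration; `parse_junit_runs` is a hypothetical name, and real result files would be read from disk rather than passed as strings):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def parse_junit_runs(xml_payloads):
    """Build the `test_id -> [run1_result, run2_result, ...]` table from one
    JUnit XML payload per CI run. A testcase counts as FAIL if it contains a
    <failure> or <error> child element."""
    history = defaultdict(list)
    for payload in xml_payloads:
        root = ET.fromstring(payload)
        for case in root.iter("testcase"):
            test_id = f"{case.get('classname', '?')}.{case.get('name', '?')}"
            failed = case.find("failure") is not None or case.find("error") is not None
            history[test_id].append("FAIL" if failed else "PASS")
    return dict(history)

# The flaky signature: the same test flips outcome between runs.
run1 = '<testsuite><testcase classname="player" name="test_jump"/></testsuite>'
run2 = ('<testsuite><testcase classname="player" name="test_jump">'
        '<failure message="timeout"/></testcase></testsuite>')
print(parse_junit_runs([run1, run2]))  # {'player.test_jump': ['PASS', 'FAIL']}
```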

## 4. Identify Flaky Tests

A test is **flaky** if it appears in the result history with both PASS and
FAIL outcomes across runs with no code changes between them.

Flakiness thresholds:
- **High flakiness**: fails in >25% of runs — quarantine immediately
- **Moderate flakiness**: fails in 5–25% of runs — investigate and fix soon
- **Low/suspected flakiness**: fails in 1–5% of runs — monitor; may be a
  genuinely rare failure

For each flaky test, classify the likely cause:

### Cause classification

| Cause | Symptoms | Fix direction |
|-------|----------|---------------|
| **Timing / async** | Fails after awaiting signals or timers; pass rate correlates with system load | Add explicit await/synchronisation; avoid time-based delays |
| **Order dependency** | Fails when run after specific other tests; passes in isolation | Add proper setup/teardown; ensure test isolation |
| **Random seed** | Fails intermittently with no pattern; involves RNG | Pass an explicit seed; don't use `randf()` in tests |
| **Resource leak** | Fails more often later in a test run | Fix cleanup in teardown; check orphan nodes (Godot) or object disposal (Unity) |
| **External state** | Fails when a file, scene, or global exists from a prior test | Isolate the test from the file system; use in-memory mocks |
| **Floating point** | Fails on exact comparisons like `== 0.5` | Use epsilon comparison (`is_equal_approx`, `Assert.AreApproximatelyEqual`) |
| **Scene/prefab load race** | Fails when scenes are not yet ready | Await one frame after instantiation; use `await get_tree().process_frame` |

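The floating-point row is easy to demonstrate in isolation: accumulated binary floats rarely land exactly on round decimal values, which is why exact comparisons flake. Python's `math.isclose` stands in here for `is_equal_approx`:

```python
import math

# Summing 0.1 ten times does not yield exactly 1.0 in binary floating point.
total = sum(0.1 for _ in range(10))

print(total == 1.0)                            # False: exact compare flakes
print(math.isclose(total, 1.0, rel_tol=1e-9))  # True: epsilon compare is stable
```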
Use Grep to check the test file for timing calls, randf, global state access,
|
||||
or equality comparisons on floats to narrow down the cause.
|
||||
|
||||
---
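
The thresholds above can be expressed as a small classifier (a hypothetical sketch, assuming the `test_id → [results]` table built in step 3):

```python
def classify_flakiness(results):
    """Bucket one test's result history per the thresholds above.
    `results` is a list like ["PASS", "FAIL", "PASS", ...]."""
    if "PASS" not in results or "FAIL" not in results:
        # Consistent outcome across all runs: stable, or consistently
        # broken, but not flaky either way.
        return "not flaky"
    fail_rate = results.count("FAIL") / len(results)
    if fail_rate > 0.25:
        return "high: quarantine immediately"
    if fail_rate >= 0.05:
        return "moderate: investigate and fix soon"
    return "low/suspected: monitor"

print(classify_flakiness(["PASS", "FAIL", "PASS", "FAIL"]))  # high: quarantine immediately
print(classify_flakiness(["FAIL"] + ["PASS"] * 9))           # moderate: investigate and fix soon
```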

## 5. Recommend Action

For each flaky test:

**Quarantine (High flakiness):**
> "Quarantine this test immediately. Disable it in CI by adding a
> `@pytest.mark.skip` / `[Ignore]` / `GdUnitSkip` annotation. Log it in
> the `tests/regression-suite.md` quarantine section. The test is now opt-in only.
> Fix the root cause before removing the quarantine."

**Investigate and fix soon (Moderate):**
> "This test is intermittently unreliable. The root cause appears to be [cause].
> Suggested fix: [specific fix based on the cause classification]. Do not quarantine
> yet — fix the test directly."

**Monitor (Low/suspected):**
> "This test shows suspected flakiness. Collect more run data before
> quarantining. Note it as 'suspected' in the regression suite."

---

## 6. Generate Reports

### In-conversation summary

```
## Flakiness Detection Results

**Runs analysed**: [N]
**Tests tracked**: [N]

### Flaky Tests Found

| Test | System | Fail Rate | Likely Cause | Recommendation |
|------|--------|-----------|--------------|----------------|
| [test_name] | [system] | [N]% | Timing | Quarantine + fix async |
| [test_name] | [system] | [N]% | Float comparison | Fix: use epsilon compare |
| [test_name] | [system] | [N]% | Order dependency | Investigate teardown |

### Clean Tests (no flakiness detected)

[N] tests ran across [N] runs with consistent results — no flakiness detected.

### Data Limitations

[Note if fewer than 5 runs were available — fewer runs mean lower statistical confidence]
```

---

## 7. Update Regression Suite + Optional Report File

Ask: "May I update the quarantine section of `tests/regression-suite.md`
with the flaky tests found?"

If yes: use `Edit` to append entries to the Quarantined Tests table.
Never remove existing quarantine entries — only add new ones.

Ask (separately): "May I write a full flakiness report to
`production/qa/flakiness-report-[date].md`?"

The full report includes per-test analysis with cause details and
engine-specific fix snippets.

After writing:

- For each quarantined test: "Add the engine-specific skip annotation to
  disable this test in CI. Re-enable it after the root cause is fixed."
- For fix-eligible tests: "The fix for [test] is straightforward —
  change the equality comparison on line [N] to use `is_equal_approx`."
- Summary: "Once all quarantine annotations are applied, CI should run green.
  Schedule fix work for the [N] quarantined tests before the release gate."

---

## Collaborative Protocol

- **Never delete test files** — quarantine means annotate + list, not remove
- **Statistical confidence matters** — with fewer than 3 runs, flag findings as
  "suspected", not "confirmed"; ask if more run data is available
- **Fix is always the goal** — quarantine is temporary; surface the fix
  direction even when recommending quarantine
- **Ask before writing** — both the regression-suite update and the report
  file require explicit approval. On write: Verdict: **COMPLETE** — flakiness
  report written. On decline: Verdict: **BLOCKED** — user declined write.
- **Flakiness in CI is a team problem** — surface the list and recommended
  actions clearly; do not silently quarantine without the team knowing