
panw a16fe4bff7 Add claude code game studios to the project

2026-05-15 14:52:29 +08:00

6.5 KiB


Skill Test Spec: /test-flakiness

Skill Summary

/test-flakiness detects non-deterministic tests by analyzing test history logs when they are available, or otherwise by scanning test source code for common flakiness patterns (unseeded random numbers, real-time waits, external I/O). No director gates are invoked. The skill is read-only and writes nothing without user approval. Verdicts: NO FLAKINESS, SUSPECT TESTS FOUND, or CONFIRMED FLAKY.

Static Assertions (Structural)

Verified automatically by /skill-test static — no fixture needed.

Has required frontmatter fields: name, description, argument-hint, user-invocable, allowed-tools
Has ≥2 phase headings
Contains verdict keywords: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
Does NOT require "May I write" language (the skill is read-only by default; only the optional report asks for approval before writing)
Has a next-step handoff (what to do after flakiness findings)

Director Gate Checks

None. Flakiness detection is an advisory quality skill for the QA lead; no gates are invoked.

Test Cases

Case 1: Happy Path — Clean test history, no flakiness

Fixture:

production/qa/test-history/ contains logs for 10 test runs
All tests pass consistently across all 10 runs (100% pass rate per test)
No test has any recorded failure

Input: /test-flakiness

Expected behavior:

Skill reads test history logs from production/qa/test-history/
Skill computes per-test pass rate across 10 runs
All tests pass all 10 runs — no inconsistency detected
Verdict is NO FLAKINESS

Assertions:

Skill reads test history logs when available
Per-test pass rate is computed across all available runs
Verdict is NO FLAKINESS when all tests pass consistently
No files are written
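The per-test pass-rate computation in Case 1 can be sketched as follows. This is a minimal illustration, assuming a hypothetical log shape in which each run is a dict mapping test name to "pass" or "fail"; the real files in production/qa/test-history/ may be structured differently, and `pass_rates` is an illustrative name, not part of the skill.

```python
from collections import defaultdict


def pass_rates(runs):
    """Return {test_name: (passes, runs_seen)} over every run that includes the test."""
    passes = defaultdict(int)
    seen = defaultdict(int)
    for run in runs:  # each run: {test_name: "pass" | "fail"} (hypothetical format)
        for test, outcome in run.items():
            seen[test] += 1
            if outcome == "pass":
                passes[test] += 1
    return {test: (passes[test], seen[test]) for test in seen}


runs = [{"test_a": "pass", "test_b": "pass"} for _ in range(10)]
rates = pass_rates(runs)
print(rates)  # {'test_a': (10, 10), 'test_b': (10, 10)}
print("NO FLAKINESS" if all(p == n for p, n in rates.values()) else "inconsistent")
```

Counting only the runs in which a test appears keeps newly added tests from being penalized for runs that predate them.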

Case 2: Suspect Tests Found — Test fails intermittently in history

Fixture:

production/qa/test-history/ contains logs for 10 test runs
test_combat_damage_applies_crit_multiplier passes 7 times, fails 3 times
Failure messages differ between runs (sometimes a timeout, sometimes a wrong value)

Input: /test-flakiness

Expected behavior:

Skill reads test history logs — computes pass rates
test_combat_damage_applies_crit_multiplier has a 70% pass rate, below the 95% threshold
Skill flags it as SUSPECT, reporting the pass rate (7/10, 70%) and the inconsistent failure messages
Verdict is SUSPECT TESTS FOUND
Skill recommends investigating the test for timing or state dependencies

Assertions:

Tests below the pass-rate threshold are flagged by name
Pass rate (fraction and percentage) is shown for each suspect test
Failure pattern (e.g., inconsistent error messages) is noted if detectable
Verdict is SUSPECT TESTS FOUND
Skill recommends investigation steps
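A sketch of the Case 2 flagging step, assuming each test's history is a list of (outcome, message) pairs. The 95% threshold is the example value from this spec; `flag_suspects` and the sample failure messages are hypothetical.

```python
SUSPECT_THRESHOLD = 0.95  # example threshold from this spec


def flag_suspects(history, threshold=SUSPECT_THRESHOLD):
    """Return one report line per test whose pass rate is below the threshold."""
    lines = []
    for test, results in sorted(history.items()):
        total = len(results)
        passed = sum(1 for outcome, _ in results if outcome == "pass")
        if total == 0 or passed / total >= threshold:
            continue
        # Distinct failure messages; more than one suggests a non-deterministic cause.
        messages = sorted({msg for outcome, msg in results if outcome == "fail"})
        line = f"SUSPECT {test}: {passed}/{total} ({passed / total:.0%})"
        if len(messages) > 1:
            line += " - inconsistent failures: " + "; ".join(messages)
        lines.append(line)
    return lines


history = {  # hypothetical sample data mirroring the Case 2 fixture
    "test_combat_damage_applies_crit_multiplier": [("pass", "")] * 7
    + [("fail", "timeout"), ("fail", "expected 150, got 100"), ("fail", "timeout")],
    "test_inventory_add_item": [("pass", "")] * 10,
}
for line in flag_suspects(history):
    print(line)
```

Treating more than one distinct failure message as a pattern is only one simple heuristic; the spec requires just that a pattern be noted when detectable.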

Case 3: Source Pattern — Random number used without seed

Fixture:

No test history logs exist

tests/unit/loot/loot_drop_test.gd contains:

var roll = randf()  # unseeded random — non-deterministic
assert_gt(roll, 0.5, "Loot should drop above 50%")

Input: /test-flakiness

Expected behavior:

Skill finds no test history logs
Skill falls back to source code analysis
Skill detects randf() call without a preceding seed() call
Skill flags the test as FLAKINESS RISK (source pattern, not confirmed)
Verdict is SUSPECT TESTS FOUND (pattern detected, not confirmed by history)
Skill recommends seeding the random number generator before the call or mocking the random source

Assertions:

Source code analysis is used as fallback when no history logs exist
Unseeded random number usage is detected as a flakiness risk
Verdict is SUSPECT TESTS FOUND (not CONFIRMED FLAKY — no history to confirm)
Remediation recommends seeding or mocking
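The Case 3 check could be approximated with a line-based scan of the GDScript source. This sketch assumes a per-file rule (a bare global `seed(...)` call anywhere earlier in the file counts as seeding) and skips calls on a `RandomNumberGenerator` instance; both are simplifications, not the skill's documented behavior. `randomize()` is deliberately not treated as seeding, because it picks a time-based seed.

```python
import re

# Heuristic (assumed): bare global random calls, not preceded by "." or a word character,
# so rng.randf() on a RandomNumberGenerator instance is skipped.
RANDOM_CALL = re.compile(
    r"(?<![\w.])(?:randf|randi|randf_range|randi_range|randfn|rand_range)\s*\("
)
SEED_CALL = re.compile(r"(?<![\w.])seed\s*\(")


def find_unseeded_random(source):
    """Return (line_number, stripped_line) for random calls with no earlier seed() call."""
    findings = []
    seeded = False
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive comment strip; '#' inside strings is not handled
        if SEED_CALL.search(code):
            seeded = True
        if not seeded and RANDOM_CALL.search(code):
            findings.append((number, line.strip()))
    return findings


loot_test = '''
func test_loot_drop():
    var roll = randf()  # unseeded random
    assert_gt(roll, 0.5, "Loot should drop above 50%")
'''
print(find_unseeded_random(loot_test))  # [(3, 'var roll = randf()  # unseeded random')]
```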

Case 4: No Test History — Source-only analysis with common patterns

Fixture:

production/qa/test-history/ does not exist
tests/ contains 15 test files
Two tests use OS.get_ticks_msec() for timing assertions
No other flakiness patterns are present

Input: /test-flakiness

Expected behavior:

Skill checks for test history — not found
Skill notes: "No test history available — analyzing source code for flakiness patterns only"
Skill scans all test files for known patterns: unseeded random, real-time waits, system clock usage
Finds 2 tests using OS.get_ticks_msec() — flags as FLAKINESS RISK
Verdict is SUSPECT TESTS FOUND

Assertions:

Skill notes clearly that source-only analysis is being performed (no history)
Common flakiness patterns are scanned: unseeded random, real-time waits and clock-based assertions, external I/O
OS.get_ticks_msec() usage for assertions is flagged as a flakiness risk
Verdict is SUSPECT TESTS FOUND when source patterns are found
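For Case 4, a broader source-only scan might use a table of regular expressions. The pattern list below is an assumption covering the categories named in this spec, with both the Godot 3 (`OS.*`) and Godot 4 (`Time.*`) clock functions; `scan_source` and `scan_tests` are hypothetical helpers. This sketch flags every clock call, not only those used inside assertions, so its hits would still need review.

```python
import re
from pathlib import Path

# Assumed pattern table; extend as needed.
PATTERNS = {
    "system clock": re.compile(
        r"\b(?:OS\.get_ticks_msec|Time\.get_ticks_msec|OS\.get_unix_time"
        r"|Time\.get_unix_time_from_system)\s*\("
    ),
    "real-time wait": re.compile(r"\b(?:create_timer|OS\.delay_msec)\s*\("),
    "external I/O": re.compile(r"\b(?:FileAccess\.open|HTTPRequest)\b"),
}


def scan_source(source):
    """Return (line_number, category) pairs for risky calls in one file's source."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive comment strip; '#' inside strings is not handled
        for category, pattern in PATTERNS.items():
            if pattern.search(code):
                hits.append((number, category))
    return hits


def scan_tests(root="tests"):
    """Scan every .gd file under root and return {path: hits} for files with hits."""
    results = {}
    for path in sorted(Path(root).rglob("*.gd")):
        hits = scan_source(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results


sample = "var start = OS.get_ticks_msec()\nassert_lt(OS.get_ticks_msec() - start, 100)\n"
print(scan_source(sample))  # [(1, 'system clock'), (2, 'system clock')]
```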

Case 5: Gate Compliance — No gate; flakiness report is advisory

Fixture:

Test history shows 1 test that fails 6 out of 10 runs
review-mode.txt contains "full"

Input: /test-flakiness

Expected behavior:

Skill analyzes test history; identifies 1 confirmed flaky test
No director gate is invoked regardless of review mode
Verdict is CONFIRMED FLAKY
Skill presents findings and offers optional written report
If user opts in: "May I write to production/qa/flakiness-report-[date].md?"

Assertions:

No director gate is invoked in any review mode
CONFIRMED FLAKY verdict requires history-based evidence (not just source patterns)
Optional report requires "May I write" before writing
Flakiness report is advisory for qa-lead; skill does not auto-disable tests
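The opt-in report from Case 5 could be gated as sketched below. The path pattern comes from this spec; rendering `[date]` as an ISO date and the `ask` callback that stands in for the interactive prompt are assumptions, and `maybe_write_report` is a hypothetical name.

```python
from datetime import date
from pathlib import Path


def maybe_write_report(findings, ask, qa_dir="production/qa", today=None):
    """Ask before writing; write a Markdown report only on an explicit yes.

    `ask` is a hypothetical callback that shows the prompt and returns True or False.
    Returns the written path, or None if the user declined.
    """
    today = today or date.today()
    # ISO date format is an assumption; the spec only says [date].
    path = Path(qa_dir) / f"flakiness-report-{today.isoformat()}.md"
    if not ask(f"May I write to {path}?"):
        return None  # advisory only: nothing is written and no test is disabled
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = ["# Flakiness Report", ""] + [f"- {item}" for item in findings]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


declined = maybe_write_report(
    ["CONFIRMED FLAKY: test_example (4/10, 40%)"],  # hypothetical finding
    ask=lambda prompt: print(prompt) or False,
    today=date(2026, 5, 15),
)
print(declined)  # None
```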

Protocol Compliance

Reads test history logs when available; falls back to source analysis when not
Notes clearly which analysis mode is being used (history vs. source-only)
Flakiness threshold (e.g., 95% pass rate) is used for SUSPECT classification
CONFIRMED FLAKY requires history evidence; source patterns alone can yield at most SUSPECT TESTS FOUND
Does not disable or modify any test files
No director gates are invoked
Verdict is one of: NO FLAKINESS, SUSPECT TESTS FOUND, CONFIRMED FLAKY
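The Protocol Compliance rules can be summarized as a mode check plus a verdict function. The 95% value is the example threshold from this spec; `CONFIRM_THRESHOLD = 0.50` is purely hypothetical, since the spec does not say where SUSPECT ends and CONFIRMED FLAKY begins. It is chosen only so that Case 2 (70% pass rate) stays SUSPECT while Case 5 (40%) becomes CONFIRMED FLAKY. Function names are illustrative.

```python
from pathlib import Path

SUSPECT_THRESHOLD = 0.95  # example threshold from this spec
CONFIRM_THRESHOLD = 0.50  # HYPOTHETICAL cutoff; the spec does not define one


def analysis_mode(history_dir="production/qa/test-history"):
    """Return "history" if run logs exist, else "source-only" (an empty dir counts as none)."""
    path = Path(history_dir)
    if path.is_dir() and any(path.iterdir()):
        return "history"
    return "source-only"


def overall_verdict(pass_rates, source_findings):
    """Combine history pass rates ({test: fraction}) and source findings into one verdict.

    Only history evidence can produce CONFIRMED FLAKY; source patterns cap at SUSPECT.
    """
    rates = pass_rates.values()
    if any(rate < CONFIRM_THRESHOLD for rate in rates):
        return "CONFIRMED FLAKY"
    if any(rate < SUSPECT_THRESHOLD for rate in rates) or source_findings:
        return "SUSPECT TESTS FOUND"
    return "NO FLAKINESS"


print(analysis_mode("does/not/exist"))                 # source-only
print(overall_verdict({"test_crit": 0.7}, []))         # SUSPECT TESTS FOUND
print(overall_verdict({"test_flaky": 0.4}, []))        # CONFIRMED FLAKY
print(overall_verdict({}, [(3, "unseeded random")]))   # SUSPECT TESTS FOUND
```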

Coverage Notes

The pass-rate threshold for SUSPECT classification (95% suggested above) is an implementation detail; the tests verify that intermittent failures are flagged, not the exact threshold value.
Tests that fail because of environment issues (missing assets, wrong platform) are not flaky. The skill is expected to distinguish environment failures from non-determinism in the test itself, but this distinction is not explicitly tested here.
