Add claude code game studios to the project
135
.claude/docs/templates/incident-response.md
vendored
Normal file
@@ -0,0 +1,135 @@
# Incident Response: [Incident Title]

**Severity**: [S1-Critical / S2-Major / S3-Moderate / S4-Minor]
**Status**: [Active / Mitigated / Resolved / Post-Mortem Complete]
**Detected**: [Date Time UTC]
**Resolved**: [Date Time UTC or ONGOING]
**Duration**: [Total time from detection to resolution]
**Incident Commander**: [Name/Role]

---

## Impact Summary

[2-3 sentences describing what players experienced. Write from the player
perspective, not the technical perspective.]

- **Players affected**: [estimated count or percentage]
- **Platforms affected**: [PC / Console / Mobile / All]
- **Regions affected**: [All / specific regions]
- **Revenue impact**: [estimated if applicable]

---

## Timeline

| Time (UTC) | Event | Action Taken |
| ---- | ---- | ---- |
| [HH:MM] | Incident detected via [monitoring/player report/etc.] | Incident commander assigned |
| [HH:MM] | Root cause identified | [Brief description of cause] |
| [HH:MM] | Mitigation deployed | [What was done] |
| [HH:MM] | Service restored / Fix confirmed | Monitoring for recurrence |
| [HH:MM] | All-clear declared | Post-mortem scheduled |

---

## Root Cause

### What Happened
[Technical description of the root cause. Be specific about the chain of events
that led to the incident.]

### Why It Happened
[Systemic cause: why did existing processes, tests, or safeguards fail to
prevent this? This is more important than the technical cause.]

### Contributing Factors
- [Factor 1, e.g., "Insufficient load testing for the new matchmaking system"]
- [Factor 2, e.g., "Monitoring alert threshold was set too high"]
- [Factor 3]

---

## Mitigation and Resolution

### Immediate Actions (during incident)
1. [Action taken to stop the bleeding]
2. [Action taken to restore service]
3. [Action taken to verify resolution]

### Follow-Up Actions (after resolution)
1. [Permanent fix if immediate action was a workaround]
2. [Additional testing or monitoring added]
3. [Process changes to prevent recurrence]

---

## Player Communication

### Initial Acknowledgment
*Sent: [Time] via [channel]*
> [Exact text of the first public message acknowledging the issue]

### Status Updates
*Sent: [Time] via [channel]*
> [Text of each subsequent update]

### Resolution Notice
*Sent: [Time] via [channel]*
> [Text announcing the fix and any compensation]

### Compensation (if applicable)
- **What**: [description of compensation, e.g., "500 premium currency + 24-hour XP boost"]
- **Who**: [all players / affected players only / players who logged in during incident]
- **When**: [delivery date and method]
- **Rationale**: [why this compensation is appropriate for the impact]

---

## Prevention

### What We Are Changing

| Action Item | Owner | Deadline | Status |
| ---- | ---- | ---- | ---- |
| [Specific preventive measure] | [Role] | [Date] | [TODO/Done] |
| [Add monitoring for X] | [Role] | [Date] | [TODO/Done] |
| [Add test coverage for Y] | [Role] | [Date] | [TODO/Done] |
| [Update runbook for Z] | [Role] | [Date] | [TODO/Done] |

### Process Improvements
- [Process change to prevent similar incidents]
- [Monitoring/alerting improvement]
- [Testing improvement]

---

## Lessons Learned

### What Went Well
- [Positive aspect of incident response, e.g., "Detection was fast due to
monitoring alerts"]
- [Positive aspect]

### What Went Poorly
- [Problem with response, e.g., "Took 20 minutes to identify the correct
on-call person"]
- [Problem]

### Where We Got Lucky
- [Factor that reduced impact by chance rather than design; these are hidden
risks to address]

---

## Sign-Offs

- [ ] Technical Director: Root cause accurate, prevention plan sufficient
- [ ] QA Lead: Test coverage gaps addressed
- [ ] Producer: Timeline and communication reviewed
- [ ] Community Manager: Player communication reviewed

---

*This document is filed in `production/hotfixes/` and linked from the
release notes for the fix version.*