Documentation
Incident Response & Management

howtheysre

by upgundecha

9.4K stars
822 forks
232 watchers
Updated 8 months ago
About

A comprehensive curated repository compiling Site Reliability Engineering (SRE) best practices, tools, techniques, and culture from leading tech organizations worldwide.

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE).

Primary Use Case

This repository serves as a centralized knowledge base for engineers, SRE practitioners, and tech leaders seeking to understand and adopt proven SRE methodologies and incident response strategies. It is ideal for teams aiming to improve reliability, automate incident management, and build resilient engineering cultures by learning from publicly shared experiences of top organizations.

Key Features
  • Curated collection of SRE best practices and tools
  • Extensive resources on incident response and post-mortem processes
  • Insights into building and hiring SRE teams
  • Coverage of monitoring, alerting, and observability techniques
  • Includes automation and chaos engineering strategies
  • Compilation of real-world case studies and blog posts from leading organizations
  • Focus on SRE culture and DevOps integration
  • Resources on testing in production and platform engineering
Security Frameworks
Reconnaissance
Resource Development
Detection
Response
Recovery
Usage Insights
  • Leverage the repository to build tailored incident response playbooks integrating SRE best practices.
  • Use the automation and chaos engineering resources to simulate realistic attack scenarios for purple team exercises.
  • Incorporate SRE monitoring and alerting techniques to enhance detection capabilities in blue team operations.
  • Adopt the cultural and hiring insights to strengthen cross-team collaboration between security and reliability engineers.
  • Utilize documented post-mortem processes to improve continuous learning and resilience after security incidents.

Security Profile
Red Team: 40%
Blue Team: 70%
Purple Team: 80%
Details
License: Creative Commons Zero v1.0 Universal
Language: JavaScript
Open Issues: 12
Topics
site-reliability-engineering
sre
chaos-engineering
dev-ops
devops
monitoring
observability
alerting
incident-response
incident-management