Most companies work hard to avoid costly failures, but in complex systems a better approach is to embrace and learn from them. Through chaos engineering, you can proactively hunt for evidence of system weaknesses before they trigger a crisis. This practical book shows software developers and system administrators how to plan and run successful chaos engineering experiments.

System weaknesses go beyond your infrastructure, platforms, and applications to include policies, practices, playbooks, and people. Author Russ Miles explains why, when, and how to test systems, processes, and team responses using simulated failures on Game Days. You'll also learn how to work toward continuous chaos through automation with features you can share across your team and organization.

Learn to think like a chaos engineer
Build a hypothesis backlog to determine what could go wrong in your system
Develop your hypotheses into chaos engineering experiment Game Days
Write, run, and learn from automated chaos experiments using the open source Chaos Toolkit
Turn chaos experiments into tests to confirm that you've overcome the weaknesses you discovered
Observe and control your automated chaos experiments while they are running
Learning Chaos Engineering : Discovering and Overcoming System Weaknesses Through Experimentation