Lessons in building resilient systems at Amazon and Meta | Zuodong Xiang | Conf42 IM 2024

Published: 01 January 1970
Channel: Conf42

Read the abstract ➤ https://www.conf42.com/Incident_Manag...
Other sessions at this event ➤ https://www.conf42.com/im2024
Support our mission ➤ https://www.conf42.com/support
Join Discord ➤ / discord

Chapters
0:00 Introduction: What Can Possibly Go Wrong?
0:58 Real-Life Scenario: Flood of Traffic
3:01 Real-Life Scenario: Retry Storm
5:10 Real-Life Scenario: Plan B Went Poorly
6:24 Real-Life Scenario: Bad Commit
8:21 Real-Life Scenario: Lack of Sufficient Ownership
9:21 Real-Life Scenario: Script Errors
10:20 Prevention Strategies: Defensive Coding Practices
11:09 Logging and Error Handling Best Practices
12:35 Setting Effective Alerts
15:04 Mitigation Strategies for Alerts
15:46 Preparing for High Velocity Events
17:27 Conducting a Self Review
19:42 Conclusion and Takeaways