Published On Apr 18, 2024
SREcon24 Americas - Automating Disaster Recovery: The Ultimate Reliability Challenge
Ricard Bejarano, Cisco Systems Inc.
Here's how I explain my job to non-techies: if a meteor struck our servers, it's on my team to fix it. But what if it did? Realistically, what would happen if a meteor struck your datacenter?
Here's the story of a vision to fully automate disaster recovery away, how I pushed back on it, claiming it was impossible, and how we still executed on it to great success.
Ours is also a case study on why looking at these wide-surface problems through a sociotechnical lens will set you up for success in places you could never have anticipated.
So if a metaphorical meteor hit our datacenter, we would just press our metaphorical big red button.
View the full SREcon24 Americas program at https://www.usenix.org/conference/sre...