SREcon24 Americas - Automating Disaster Recovery: The Ultimate Reliability Challenge
USENIX

Published on Apr 18, 2024

Ricard Bejarano, Cisco Systems Inc.

Here's how I explain my job to non-techies: if a meteor struck our servers, it's on my team to fix it. But what if it did? Realistically, what would happen if a meteor struck your datacenter?

Here's the story of a vision to fully automate disaster recovery away, how I pushed back on it, claiming it was impossible, and how we still executed on it to great success.

Ours is also a case study on why looking at these wide-surface problems through the sociotechnical lens will set you up for success in places you could never have anticipated.

So if a metaphorical meteor hit our datacenter, we would just press our metaphorical big red button.

View the full SREcon24 Americas program at https://www.usenix.org/conference/sre...