Engineering for Resilience
Description
In this episode of the Data & AI Podcast, hosts Steve Bryen, CTO at Mesh-AI, and Eric Papaluca, Principal Consultant at Mesh-AI, are joined by Adrian Hornsby, CEO and founder of Resilium Labs, focused on helping customers improve system resilience. Together, they explore what it really takes to build resilient systems and cultures that can adapt, recover, and thrive in the face of disruption.
From the hard lesson of accidentally deleting a production database to leading Chaos Engineering at Amazon, Adrian shares decades of experience turning incidents into learning opportunities. He explains why resilience isn’t something you buy but something you build, and why cultural foundations are just as critical as technical tools in achieving operational excellence.
This episode dives deep into:
- Chaos vs. resilience engineering - and why the distinction matters
- How game days, postmortems, and psychological safety drive true learning
- The prevention paradox: why good resilience often looks like “nothing happening”
- How AI creates both new opportunities and new risks in incident response
- Why building resilience means transferring knowledge, not just deploying tools
Whether you’re scaling distributed systems, leading incident response, or building a culture of reliability, this is a must-listen for anyone serious about resilience in the age of AI.
“Good resilience is invisible — its output is non-events.” – Adrian Hornsby
Hosted on Acast. See acast.com/privacy for more information.






