#722: The Frugal Architect w/Werner Vogels: How Warner Bros. Discovery keeps streaming seamless
Digest
In episode 722 of the AWS podcast, Werner Vogels talks with Tom Leaman, VP of SRE at Warner Bros. Discovery (WBD), about how WBD builds reliable, cost-effective streaming platforms on AWS. Leaman describes his role and explains why observability and operational intelligence are essential to managing hundreds of microservices across nine AWS regions. WBD catalogs its cloud resources with a standardized operational metadata schema, which improves visibility into system health, security, and cost, and in turn affects customer experience. Cost optimization is central, with "cost per subscriber" serving as a key metric, and the discussion distinguishes frugality from short-sighted cost-cutting. The episode stresses aligning technical decisions with business goals, particularly during post-merger integration, where WBD adopted a "best of both" approach. A "closed door" philosophy guides irreversible architectural decisions, especially those concerning databases. Finally, WBD's "celebration of error" approach to incident management puts learning and improvement ahead of blame, which encourages knowledge sharing and process improvements.
Outlines

Introduction and Site Reliability Engineering at WBD
Introduction to AWS podcast episode 722, featuring Werner Vogels and Tom Leaman, VP of SRE at Warner Bros. Discovery (WBD). The episode focuses on WBD's approach to building reliable and cost-effective streaming platforms. Leaman describes his role and the importance of observability and operational intelligence in managing hundreds of microservices across nine AWS regions.

Operational Metadata, Customer Experience, and Cost Optimization
Discussion of WBD's operational metadata schema and how it improves visibility into system health, security, and cost. The schema ties into WBD's cost optimization strategy, which uses "cost per subscriber" as a key metric and balances frugality against maintaining customer experience.

Business Alignment, Merger Integration, and Architectural Decisions
Emphasizes aligning technical decisions with business goals, particularly during the integration of two organizations after the merger. WBD used a "best of both" approach, and shared operational metadata played a role in bringing the platforms together. The "closed door" philosophy for irreversible architectural decisions is also explained.

Incident Management and Conclusion
WBD's "celebration of error" approach to incident management is detailed, focusing on learning from incidents and improving systems, processes, and knowledge sharing.
Keywords
Operational Metadata Schema
A standardized system for cataloging and organizing cloud resources, improving visibility into system health, security, and cost. Enables efficient resource management and streamlined operations.
Cost Per Subscriber
A key performance indicator (KPI) for the cost efficiency of a streaming service: platform cost divided by the number of subscribers. It weighs cost growth against subscriber growth, giving a business-focused metric for cost optimization.
Celebration of Error
A positive approach to incident management that emphasizes learning and improvement from errors. Focuses on shared learnings, process improvements, and knowledge transfer.
Frugal Architecture
Designing and building systems that are cost-effective and efficient without compromising reliability or customer experience. Prioritizes long-term value and avoids short-sighted cost-cutting.
Site Reliability Engineering (SRE)
The discipline of ensuring the reliability and scalability of systems, particularly in large-scale cloud environments.
Microservices
An architectural style that structures an application as a collection of loosely coupled, independently deployable services.
AWS
Amazon Web Services, a comprehensive cloud platform providing various services for computing, storage, databases, and more.
Cloud Cost Optimization
Strategies and techniques for reducing cloud computing expenses while maintaining performance and reliability.
Merger Integration
The process of combining two or more organizations' IT systems and infrastructure after a merger or acquisition.
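The "Operational Metadata Schema" entry above describes a standardized way of cataloging cloud resources. The following is a minimal sketch of that idea, not WBD's actual schema: the required keys (service, team, environment, cost_center), the Resource type, and the example ARNs are all hypothetical, since the episode summary does not list the real fields.

```python
from dataclasses import dataclass

# Hypothetical required keys; the real schema fields are not given in the source.
REQUIRED_KEYS = ("service", "team", "environment", "cost_center")


@dataclass(frozen=True)
class Resource:
    arn: str    # placeholder identifier, not a real AWS ARN
    tags: dict  # operational metadata attached to the resource


def missing_metadata(resource):
    """Return the required keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_KEYS if not resource.tags.get(key)]


def group_by(resources, key):
    """Group resource identifiers by one metadata key (e.g. owner or cost center)."""
    groups = {}
    for resource in resources:
        groups.setdefault(resource.tags.get(key, "untagged"), []).append(resource.arn)
    return groups


resources = [
    Resource("arn:aws:example:1", {"service": "playback", "team": "video",
                                   "environment": "prod", "cost_center": "cc-1"}),
    Resource("arn:aws:example:2", {"service": "search", "team": "discovery"}),
]

for resource in resources:
    print(resource.arn, "missing:", missing_metadata(resource))
print(group_by(resources, "team"))
```

Once every resource carries the same keys, questions such as "who owns this?" or "which cost center pays for it?" reduce to a lookup or a grouping, which is the kind of visibility the schema is described as providing.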
Q&A
How does WBD's operational metadata schema improve efficiency and cost management?
The schema provides a standardized way to track and manage cloud resources, improving visibility into resource utilization, security vulnerabilities, and cost allocation. This allows for better resource optimization and proactive cost management.
What is the "celebration of error" approach to incident management, and what are its benefits?
It focuses on learning from incidents to improve systems, processes, and knowledge sharing. It shifts the focus from blame to understanding and improvement, leading to a more positive and proactive approach to reliability.
How does WBD balance frugality with the need for reliable and scalable systems?
WBD uses metrics like "cost per subscriber" to track efficiency, prioritizing long-term cost optimization without compromising customer experience. They also employ a "closed door" philosophy for irreversible architectural decisions.
How did WBD successfully integrate two large organizations after a merger?
A "best of both" approach was adopted, with engineers from both organizations collaborating to create a new platform. The existing operational metadata schema from Discovery+ played a key role in standardizing the new system.
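The "cost per subscriber" metric mentioned in the answers above can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name and the monthly figures are invented for illustration, and the source does not say which costs WBD includes in the numerator.

```python
def cost_per_subscriber(total_cost, subscribers):
    """Divide total platform cost by subscriber count for one period."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    return total_cost / subscribers


# Made-up figures: (period, total cost in dollars, subscriber count).
months = [
    ("month 1", 1_000_000.0, 500_000),
    ("month 2", 1_100_000.0, 600_000),
]

for period, cost, subs in months:
    print(f"{period}: {cost_per_subscriber(cost, subs):.2f} per subscriber")
```

In this invented example, total cost rises by 10% while subscribers rise by 20%, so the cost per subscriber falls. That is why the metric is a better signal than the absolute bill: spending more is acceptable as long as it grows more slowly than the subscriber base.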
Show Notes
Learn More: http://thefrugalarchitect.com/architects/tom-leaman-warner-bros-discovery.html




