Digital Library, Books and Resources Hub
The SRE resource library
SRE is a mindset, a set of engineering practices, and a job function. Here you will find articles, videos, and guides to help you implement SRE principles and run reliable production systems.
Start your journey by exploring
Machine Learning in Production
Continue your journey by reading
Efficient Machine Learning Inference
Extend your journey by watching
Machine Learning at Scale
-
Building Secure & Reliable Systems
Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best practices to help your organization design scalable and reliable systems that are fundamentally secure.
Edited by: Heather Adkins, Betsy Beyer, Paul Blankinship, Ana Oprea, Piotr Lewandowski, Adam Stubblefield
-
The Site Reliability Workbook
The Site Reliability Workbook is the hands-on companion to the bestselling Site Reliability Engineering book and uses concrete examples to show how to put SRE principles and practices to work. This book contains practical examples from Google’s experiences and case studies from Google’s Cloud Platform customers. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Edited by: Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara and Stephen Thorne
-
Site Reliability Engineering
Members of the SRE team explain how their engagement with the entire software lifecycle has enabled Google to build, deploy, monitor, and maintain some of the largest software systems in the world.
Edited by: Betsy Beyer, Chris Jones, Jennifer Petoff and Niall Richard Murphy
Begin by reading
Implementing SLOs
Dig deeper by exploring
Alerting on SLOs
Build your skills with
Art of SLOs
Learn the basics by reading
Introducing Non-Abstract Large System Design
Develop fundamentals by exploring
SRE Classroom: Distributed ImageServer
Build advanced skills with this video workshop
How to Design a Distributed System