AI in SRE: Where and how Google is deploying agentic AI to improve operations
Since the discipline's inception over 20 years ago, Google has used Site Reliability Engineering (SRE) to keep services like Search, Gmail, Maps, YouTube and Google Cloud reliable and highly available, adhering to the principles and practices of a reliability-first mindset. Recently, though, the emergence of AI has driven multiple step-changes in system complexity. Interactions between components …