Amazon DevOps Guru Released
May 05, 2021

Amazon Web Services announced the general availability of Amazon DevOps Guru, a fully managed operations service that uses machine learning to make it easier for developers to improve application availability by automatically detecting operational issues and recommending specific actions for remediation.

Informed by years of Amazon.com and AWS operational excellence, Amazon DevOps Guru applies machine learning to automatically analyze data such as application metrics, logs, events, and traces for behaviors that deviate from normal operating patterns. When Amazon DevOps Guru identifies anomalous application behavior that could cause outages or service disruptions, it alerts developers with issue details to help them quickly understand the potential impact and likely causes of the issue, along with specific recommendations for remediation.

Developers can use remediation suggestions from Amazon DevOps Guru to reduce time to resolution when issues arise and improve application availability—all with no manual setup or machine learning expertise required. There are no upfront costs or commitments with Amazon DevOps Guru, and customers pay only for the data Amazon DevOps Guru analyzes.

Amazon DevOps Guru’s machine learning models leverage over 20 years of operational expertise in building, scaling, and maintaining highly available applications for Amazon.com. This gives Amazon DevOps Guru the ability to automatically detect operational issues (e.g. missing or misconfigured alarms, early warning of resource exhaustion, and config changes that could lead to outages), provide context on resources involved and related events, and recommend remediation actions. With just a few clicks in the Amazon DevOps Guru console, historical application and infrastructure metrics such as latency, error rates, and request rates for resources are automatically ingested from a user’s AWS applications and analyzed to establish normal operating bounds. Amazon DevOps Guru then uses a pre-trained machine learning model to identify deviations from this established baseline (e.g. under-provisioned compute capacity, database I/O utilization, and memory leaks). As Amazon DevOps Guru analyzes system and application data to detect anomalies, it groups the findings into operational insights that include anomalous metrics, visualizations of application behavior over time, and recommendations on actions for remediation, all viewable in the Amazon DevOps Guru console.
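The idea of establishing normal operating bounds from historical metrics and then flagging deviations can be illustrated with a very small sketch. This is not how Amazon DevOps Guru works internally: the announcement only says it uses a pre-trained machine learning model. The example below swaps in a simple statistical rule (mean plus or minus three standard deviations), and the function names and sample latency values are hypothetical.

```python
from statistics import mean, stdev


def normal_bounds(history, k=3.0):
    """Return (low, high) bounds as mean +/- k sample standard deviations.

    Illustrative stand-in for a learned baseline; not the service's model.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def find_anomalies(history, recent, k=3.0):
    """Return (index, value) pairs from `recent` that fall outside the bounds."""
    low, high = normal_bounds(history, k)
    return [(i, v) for i, v in enumerate(recent) if v < low or v > high]


# Hypothetical latency samples in milliseconds.
latency_history_ms = [100, 102, 98, 101, 99, 100, 103, 97]
recent_latency_ms = [101, 99, 250, 100]

print(find_anomalies(latency_history_ms, recent_latency_ms))  # [(2, 250)]
```

With these sample values the baseline is 94 ms to 106 ms, so only the 250 ms reading is reported. A production system would also need to account for seasonality and trends, which a fixed band like this does not.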

Amazon DevOps Guru also correlates and groups related application and infrastructure metrics (e.g. web application latency spikes, running out of disk space, and bad code deployments) to reduce redundant alarms and help focus users on high-severity issues. Customers can review configuration change histories and deployment events, along with system and user activity, and see a prioritized list of likely causes for an operational issue on a dashboard in the Amazon DevOps Guru console.

To help customers resolve issues quickly, Amazon DevOps Guru provides intelligent recommendations with remediation steps and integrates with AWS Systems Manager for runbook and collaboration tooling, giving customers the ability to more effectively maintain applications and manage infrastructure for their deployments. For example, when an analytics application using Amazon Relational Database Service (RDS) begins to exhibit degraded latencies, Amazon DevOps Guru will detect the change by automatically analyzing the relevant metrics across the application stack, identify the underlying root cause (e.g. increased number of concurrent compute instances writing to RDS), and provide a recommendation to resolve the issue (e.g. increase the provisioned RDS capacity and IOPS storage to handle the higher load).

“Customers continue to ask AWS for more services that enable them to take advantage of our decades of operational excellence in improving application availability running Amazon.com,” said Swami Sivasubramanian, VP, Amazon Machine Learning, AWS. “With Amazon DevOps Guru, we have taken that expertise and built specialized machine learning models to detect, troubleshoot, and prevent operational issues long before they impact customers and without dealing with cold starts each time an issue arises. Amazon DevOps Guru immediately provides customers the benefits of operational best practices we have learned running Amazon.com, and we designed Amazon DevOps Guru to be so simple that turning it on would be an easy choice for every AWS customer.”

With a few clicks in the AWS Management Console, customers can enable Amazon DevOps Guru to begin analyzing account and application activity within minutes to provide operational insights. Amazon DevOps Guru gives customers a single-console experience to visualize their operational data by summarizing relevant data across multiple sources (e.g. AWS CloudTrail, Amazon CloudWatch, AWS Config, AWS CloudFormation, AWS X-Ray) and reduces the need to switch between multiple tools.

Customers can also view correlated operational events and contextual data for operational insights within the Amazon DevOps Guru console and receive alerts via Amazon SNS.

Additionally, Amazon DevOps Guru supports API endpoints through the AWS SDK, making it easy for AWS Partner Network (APN) Partners and customers to integrate Amazon DevOps Guru into their existing solutions for ticketing, paging, and automatic notification of engineers for high-severity issues. PagerDuty and Atlassian are among the AWS Partners that have integrated Amazon DevOps Guru into their operations monitoring and incident management platforms, and customers who use their solutions can now benefit from operational insights provided by Amazon DevOps Guru.
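As a rough illustration of what such an SDK integration might look like, the sketch below collects ongoing high-severity insights so they could be handed to a ticketing or paging system. It assumes the boto3 `devops-guru` client and its `list_insights` operation; the request and response field names (`StatusFilter`, `ReactiveInsights`, `Severity`, `NextToken`) reflect our understanding of that API and should be checked against the current reference. The helper name `high_severity_insights` is hypothetical.

```python
def high_severity_insights(client):
    """Collect (id, name) for ongoing reactive insights with HIGH severity.

    `client` is expected to behave like boto3.client("devops-guru").
    Field names are assumptions based on the DevOps Guru API reference.
    """
    found = []
    kwargs = {"StatusFilter": {"Ongoing": {"Type": "REACTIVE"}}}
    while True:
        page = client.list_insights(**kwargs)
        for insight in page.get("ReactiveInsights", []):
            if insight.get("Severity") == "HIGH":
                found.append((insight["Id"], insight["Name"]))
        token = page.get("NextToken")
        if not token:
            return found
        # Request the next page of results.
        kwargs["NextToken"] = token


# Usage (requires boto3, AWS credentials, and a supported region):
#   import boto3
#   client = boto3.client("devops-guru")
#   for insight_id, name in high_severity_insights(client):
#       print(insight_id, name)  # hand off to a ticketing or paging tool here
```

Passing the client in as a parameter keeps the filtering logic separate from credential handling, which also makes it straightforward to exercise with a stub.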

Amazon DevOps Guru is available in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm), with availability in additional regions planned in the coming months.

Together with Amazon CodeGuru—a developer tool powered by machine learning that provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code—Amazon DevOps Guru provides customers the automated benefits of machine learning for their operational data so that developers can more easily improve application availability and reliability.
