Amazon DevOps Guru Released
May 05, 2021

Amazon Web Services announced the general availability of Amazon DevOps Guru, a fully managed operations service that uses machine learning to make it easier for developers to improve application availability by automatically detecting operational issues and recommending specific actions for remediation.

Informed by years of Amazon.com and AWS operational excellence, Amazon DevOps Guru applies machine learning to automatically analyze data like application metrics, logs, events, and traces for behaviors that deviate from normal operating patterns. When Amazon DevOps Guru identifies anomalous application behavior that could cause potential outages or service disruptions, it alerts developers with issue details to help them quickly understand the potential impact and likely causes of the issue, with specific recommendations for remediation.

Developers can use remediation suggestions from Amazon DevOps Guru to reduce time to resolution when issues arise and improve application availability—all with no manual setup or machine learning expertise required. There are no upfront costs or commitments with Amazon DevOps Guru, and customers pay only for the data Amazon DevOps Guru analyzes.

Amazon DevOps Guru’s machine learning models leverage over 20 years of operational expertise in building, scaling, and maintaining highly available applications for Amazon.com. This gives Amazon DevOps Guru the ability to automatically detect operational issues (e.g. missing or misconfigured alarms, early warning of resource exhaustion, config changes that could lead to outages), provide context on the resources involved and related events, and recommend remediation actions. With just a few clicks in the Amazon DevOps Guru console, historical application and infrastructure metrics like latency, error rates, and request rates for resources are automatically ingested from a user’s AWS applications and analyzed to establish normal operating bounds. Amazon DevOps Guru then uses a pre-trained machine learning model to identify deviations from this established baseline (e.g. under-provisioned compute capacity, database I/O over-utilization, memory leaks). As Amazon DevOps Guru analyzes system and application data to detect anomalies, it also groups this data into operational insights that include anomalous metrics, visualizations of application behavior over time, and recommendations on actions for remediation, all easily viewable in the Amazon DevOps Guru console.
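To make the idea of "normal operating bounds" concrete, here is a minimal sketch of baseline-and-deviation detection. It is not how Amazon DevOps Guru is implemented: the service uses pre-trained machine learning models, while this example substitutes a simple mean and standard deviation band. The function name `find_anomalies`, the `threshold` parameter, and the sample latency values are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(history, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that fall outside the baseline band.

    The band is mean +/- threshold * sample standard deviation of `history`.
    This is an illustrative stand-in, not the service's actual model.
    """
    mu = mean(history)
    sigma = stdev(history)
    lower = mu - threshold * sigma
    upper = mu + threshold * sigma
    return [(i, v) for i, v in enumerate(recent) if not lower <= v <= upper]


# Hypothetical latency samples in milliseconds.
latency_history_ms = [100, 102, 98, 101, 99, 103, 97, 100]
latency_recent_ms = [101, 99, 250, 102]

anomalies = find_anomalies(latency_history_ms, latency_recent_ms)
print(anomalies)  # [(2, 250)]
```

With the sample data, the baseline is 100 ms with a standard deviation of 2 ms, so the band is 94 to 106 ms and only the 250 ms reading is flagged.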

Amazon DevOps Guru also correlates and groups related application and infrastructure metrics (e.g. web application latency spikes, running out of disk space, bad code deployments, etc.) to reduce redundant alarms and help focus users on high-severity issues. Customers can see configuration change histories and deployment events, along with system and user activity, to generate a prioritized list of likely causes for an operational issue via a dashboard in the Amazon DevOps Guru console.
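One simple way to picture how grouping reduces redundant alarms is to bundle anomalies that occur close together in time. The sketch below assumes a purely time-based rule with a hypothetical five-minute window; the announcement does not say how Amazon DevOps Guru actually correlates metrics, and the `Anomaly` type, `group_anomalies` function, and sample events are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Anomaly:
    resource: str
    metric: str
    timestamp: int  # seconds since an arbitrary start


def group_anomalies(anomalies, window_seconds=300):
    """Group anomalies whose timestamps are within `window_seconds` of the
    previous anomaly in the same group (a hypothetical correlation rule)."""
    groups = []
    for anomaly in sorted(anomalies, key=lambda a: a.timestamp):
        if groups and anomaly.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(anomaly)
        else:
            groups.append([anomaly])
    return groups


# Hypothetical events: three close together, one much later.
events = [
    Anomaly("web-app", "latency", 0),
    Anomaly("database", "disk_space", 120),
    Anomaly("web-app", "error_rate", 200),
    Anomaly("queue", "backlog", 4000),
]

groups = group_anomalies(events)
print([len(g) for g in groups])  # [3, 1]
```

Four raw anomalies collapse into two groups here, so an operator would see two items to investigate instead of four separate alarms.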

To help customers resolve issues quickly, Amazon DevOps Guru provides intelligent recommendations with remediation steps and integrates with AWS Systems Manager for runbook and collaboration tooling, giving customers the ability to more effectively maintain applications and manage infrastructure for their deployments. For example, when an analytics application using Amazon Relational Database Service (RDS) begins to exhibit degraded latencies, Amazon DevOps Guru will detect the change by automatically analyzing the relevant metrics across the application stack, identify the underlying root cause (e.g. increased number of concurrent compute instances writing to RDS), and provide a recommendation to resolve the issue (e.g. increase the provisioned RDS capacity and IOPS storage to handle the higher load).

“Customers continue to ask AWS for more services that enable them to take advantage of our decades of operational excellence in improving application availability running Amazon.com,” said Swami Sivasubramanian, VP, Amazon Machine Learning, AWS. “With Amazon DevOps Guru, we have taken that expertise and built specialized machine learning models to detect, troubleshoot, and prevent operational issues long before they impact customers and without dealing with cold starts each time an issue arises. Amazon DevOps Guru immediately provides customers the benefits of operational best practices we have learned running Amazon.com, and we designed Amazon DevOps Guru to be so simple that turning it on would be an easy choice for every AWS customer.”

With a few clicks in the AWS Management Console, customers can enable Amazon DevOps Guru to begin analyzing account and application activity within minutes to provide operational insights. Amazon DevOps Guru gives customers a single-console experience to visualize their operational data by summarizing relevant data across multiple sources (e.g. AWS CloudTrail, Amazon CloudWatch, AWS Config, AWS CloudFormation, AWS X-Ray) and reduces the need to switch between multiple tools.

Customers can also view correlated operational events and contextual data for operational insights within the Amazon DevOps Guru console and receive alerts via Amazon SNS.

Additionally, Amazon DevOps Guru supports API endpoints through the AWS SDK, making it easy for Amazon Partner Network Partners and customers to integrate Amazon DevOps Guru into their existing solutions for ticketing, paging, and automatic notification of engineers for high-severity issues. PagerDuty and Atlassian are among the AWS Partners that have integrated Amazon DevOps Guru into their operations monitoring and incident management platforms, and customers who use their solutions can now benefit from operational insights provided by Amazon DevOps Guru.
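As a rough illustration of such an integration, the sketch below pages through ongoing reactive insights and keeps the high-severity ones for a ticketing or paging system. It assumes a client object that behaves like the boto3 `devops-guru` client, whose `list_insights` operation, to the best of my understanding, accepts a `StatusFilter` and returns `ReactiveInsights` plus an optional `NextToken`. Check the current SDK reference before relying on these field names. The helper function names are hypothetical.

```python
def list_ongoing_reactive_insights(client):
    """Collect all ongoing reactive insights, following pagination tokens.

    `client` is assumed to expose list_insights(**kwargs) in the shape of
    the boto3 "devops-guru" client; the response keys are assumptions.
    """
    insights = []
    kwargs = {"StatusFilter": {"Ongoing": {"Type": "REACTIVE"}}}
    while True:
        page = client.list_insights(**kwargs)
        insights.extend(page.get("ReactiveInsights", []))
        token = page.get("NextToken")
        if not token:
            return insights
        kwargs["NextToken"] = token


def high_severity(insights):
    """Keep only insights whose Severity field is HIGH."""
    return [insight for insight in insights if insight.get("Severity") == "HIGH"]
```

In a real account, the client would typically be created with `boto3.client("devops-guru")` and the result of `high_severity(list_ongoing_reactive_insights(client))` forwarded to the paging or ticketing tool. Taking the client as a parameter keeps the helpers easy to test with a stub.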

Amazon DevOps Guru is available in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm), with availability in additional regions planned for the coming months.

Together with Amazon CodeGuru—a developer tool powered by machine learning that provides intelligent recommendations for improving code quality and identifying an application’s most expensive lines of code—Amazon DevOps Guru provides customers the automated benefits of machine learning for their operational data so that developers can more easily improve application availability and reliability.
