Catching the Cloud Drift

October 13, 2022

Benjamin Brial
Cycloid

Cloud computing has become the cornerstone for businesses looking to scale infrastructure up or down quickly according to their needs with a lower total cost of ownership. But far from simplifying the IT executive's life, the expanded use of the cloud is introducing a whole new level of concept and complexity.

Research has found that the average enterprise uses 2.6 public clouds and 2.7 private clouds. Meanwhile, the average business is using 110 different cloud applications(link is external) in 2021, up from 80 in 2020. Digital transformation has exacerbated the problem, with organization's of all sizes now faced with an abundance of technology ‘choice' which is actually only serving to hinder cloud transformations.

It gets even more challenging when IT people need to communicate crucial aspects of an organization's cloud infrastructure to non-technical decision makers. And as more and more workloads are migrated to different clouds, the unique requirements of different processes, and the effects of configuration "drift" start to become clearer.

Put simply, cloud infrastructure is becoming harder to control manually.

The question is, how can organizations start to establish visibility into what are becoming more complex, harder to track environments, and how, with cloud configurations changing all the time, can teams develop repeatable processes that will enable them to take back control of their infrastructure and catch the drift before it gets out of hand?

The Rise of Automation and Infrastructure As Code

First, a brief history lesson. Those of a certain vintage might remember the halcyon days when you had to buy and maintain your own servers and machines. We evolved from this era of computing around 2005 with the widespread adoption of something called virtualization, a way of running multiple virtual machines on a single physical server.

Virtualization not only created infrastructure that was more efficient and easier to manage, it also allowed for the development of new technologies, such as cloud computing — something which has revolutionised the way businesses operate. But alongside all the benefits of the cloud — the flexibility, scalability, and cost efficiencies — organizations soon found themselves encountering scaling problems.

This is because provisioning, deploying and managing applications to the hybrid-cloud is costly, time-consuming and complex. For one thing, manually deploying instances of an application to multiple clouds with different configurations is prone to human error. Scale too quickly and you might end up missing key configurations and have to start all over again. Fail to configure an instance correctly and it could prove too costly to ever fix.

These problems necessitated the development of Infrastructure as Code (or IaC). With IaC, it is possible to provision and manage infrastructure using code instead of manual processes, allowing for greater speed, agility, and repeatability when provisioning infrastructure, enabling IT teams to automate cloud infrastructure deployment and processes further than ever before.

With IaC, you write a script that will automatically handle infrastructure tasks for you, saving not only time but also reducing the potential for human error. Simple, right? Problem solved! Well yes and no …

Managing IaC at Scale

Expectations often have a habit of not aligning with reality. If you've started using IaC to manage your infrastructure, you're already on your way to making cloud provisioning processes more manageable. But there's a second piece to the infrastructure lifecycle: how do you know what resources are not yet managed by IaC in your cloud? And of the managed resources, do they remain the same in the cloud as when you defined them in code?

Changes to cloud workloads happen all the time. Increasing the amount of workloads running in the cloud means an increasing number of people and authenticated services interacting with infrastructures, across several cloud environments. As IaC becomes more widely adopted and IaC codebases become larger, it becomes more and more difficult to manually track if configuration changes are being accounted for, which is why policies are required to keep on top of everything.

Misconfigurations are, of course, rife across cloud computing environments, but most organizations are not prepared to manually address the issue. If there are differences between the IaC configuration and the actual infrastructure, this is when cloud or configuration drift can occur.

Drift is when your IaC state, or configurations in code, differ from your cloud state, or configurations of running resources. For example, if you use Terraform — one of the leading IaC tools that allows users to define and manage IaC — to provision a database without encryption turned on and a site reliability engineer goes in and adds encryption using the cloud console, then the Terraform state file is no longer in sync with the actual cloud infrastructure. The code, in turn, becomes less useful and harder to manage.

Now imagine this volume of code, at scale, across multiple clouds. Tricky. So what can be done?

Prescriptive vs Declarative: Establishing an IaC baseline

The issue lies in the approach.

Today, the majority of organizations are operating in what could be described as a prescriptive way, building environments where processes are still manual, even with the introduction of IaC. People are still needed to connect to platforms to patch, enhance, and configure infrastructure. The issue, as we've seen, is managing the delta between the current automation of a platform and the rest of its life cycle.

A shift is needed to catch the drift. A shift towards a declarative approach that removes people from the equation completely so that what is versioned is what is in production. This removes one of the biggest concerns for organizations, which is how to react to problems when they occur and how to reproduce infrastructure. Establishing an IaC baseline of an environment that is deemed to be the known state means that what is versioned becomes the constant source of trust and the desired state of an organization's infrastructure.

Once this has been established it opens doors to bigger and better automation and improved all-round experience for developers who no longer need to worry about taking time away from developing to configure infrastructure and troubleshoot issues. Retro-engineering IaC in a way that enables organizations to define what infrastructure needs to look like in a declarative way would enable organizations to start taking back control of their IaC and create visibility into drift long before it gets out of hand.

What often first started in many organizations as just one cloud instance has now spiralled, ploomed, morphed and mushroomed into a hydra's head of cloud applications that often lacks coherence or defined procedures. The only two certainties are that cloud applications are here to stay and that companies that fail to manage their cloud portfolio are going to face some difficult times. The 100-plus cloud applications used by businesses today require active and planned management that eliminates human intervention and provides a consistent approach to the processes involved. The good news is through IaC the tools are at hand to do the job.

Benjamin Brial is the Founder of Cycloid

Industry News

Check Point to Acquire Veriti to Transform Threat Exposure Management and Reduce Organizations' Cyber Attack Surface

May 27, 2025

AI-fueled attacks and hyperconnected IT environments have made threat exposure one of the most urgent cybersecurity challenges facing enterprises today. In response, Check Point® Software Technologies Ltd.(link is external) announced a definitive agreement to acquire Veriti Cybersecurity, the first fully automated, multi-vendor pre-emptive threat exposure and mitigation platform.

LambdaTest Introduces Automation MCP Server

May 27, 2025

LambdaTest announced the launch of its Automation MCP Server, a solution designed to simplify and accelerate the process of triaging test failures.

Next-Gen Security Operations Center Capabilities Added to DefectDojo Pro

May 27, 2025

DefectDojo announced the launch of their next-gen Security Operations Center (SOC) capabilities for DefectDojo Pro, which provides both SOC and AppSec professionals a unified platform for noise reduction and prioritization of SOC alerts and AppSec findings.

Check Point Software Technologies Named One of America's Best Cybersecurity Companies by Newsweek and Statista

May 22, 2025

Check Point® Software Technologies Ltd.(link is external) has been recognized on Newsweek’s 2025 list of America’s Best Cybersecurity Companies(link is external).

Red Hat Introduces AI-Powered Management and Extends Container-Native Reach for Red Hat Enterprise Linux

May 22, 2025

Red Hat announced enhanced features to manage Red Hat Enterprise Linux.

StackHawk Raises $12 Million in Strategic Funding

May 22, 2025

StackHawk has taken on $12 Million in additional funding from Sapphire and Costanoa Ventures to help security teams keep up with the pace of AI-driven development.

Red Hat Introduces Cloud-Optimized Red Hat Enterprise Linux

May 21, 2025

Red Hat announced jointly-engineered, integrated and supported images for Red Hat Enterprise Linux across Amazon Web Services (AWS), Google Cloud and Microsoft Azure.

Komodor Integrates with Internal Developer Portals

May 21, 2025

Komodor announced the integration of the Komodor platform with Internal Developer Portals (IDPs), starting with built-in support for Backstage and Port.

Operant Launches Woodpecker

May 21, 2025

Operant AI announced Woodpecker, an open-source, automated red teaming engine, that will make advanced security testing accessible to organizations of all sizes.

Shopify Summer '25 Edition Released

May 21, 2025

As part of Summer '25 Edition, Shopify is rolling out new tools and features designed specifically for developers.

Lenses.io Releases Suite of AI Agents

May 21, 2025

Lenses.io announced the release of a suite of AI agents that can radically improve developer productivity.

Google Announces New Developer Tools

May 20, 2025

Google unveiled a significant wave of advancements designed to supercharge how developers build and scale AI applications – from early-stage experimentation right through to large-scale deployment.

Red Hat Advanced Developer Suite Introduced

May 20, 2025

Red Hat announced Red Hat Advanced Developer Suite, a new addition to Red Hat OpenShift, the hybrid cloud application platform powered by Kubernetes, designed to improve developer productivity and application security with enhancements to speed the adoption of Red Hat AI technologies.

Perforce Intelligence Introduced

May 20, 2025

Perforce Software announced Perforce Intelligence, a blueprint to embed AI across its product lines and connect its AI with platforms and tools across the DevOps lifecycle.

CloudBees Unify Introduced

May 20, 2025

CloudBees announced CloudBees Unify, a strategic leap forward in how enterprises manage software delivery at scale, shifting from offering standalone DevOps tools to delivering a comprehensive, modular solution for today’s most complex, hybrid software environments.

DEVOPSdigest

The Rise of Automation and Infrastructure As Code

Managing IaC at Scale

Prescriptive vs Declarative: Establishing an IaC baseline

Industry News

On-Demand Webinars

Analyst Reports

White Papers

Media Partners

The Latest

Hot Topics

The Rise of Automation and Infrastructure As Code

Managing IaC at Scale

Prescriptive vs Declarative: Establishing an IaC baseline

Related Links

Industry News

Search form

On-Demand Webinars

Analyst Reports

White Papers

Media Partners

User login

The Latest

Hot Topics