I've seen time and again that one of the biggest gaps in cloud computing security comes from misconfiguration of cloud services. These services are extremely powerful and flexible, and with that power and flexibility comes the potential for getting something wrong with your use case. A public S3 bucket on AWS is a great thing for openly available content, but get that switch flipped the wrong way for PII data and you can end up in the news — and not in a good way. Gartner has recognized the issue and has stated that they forecast that, by 2020, 80 percent or more of cloud security incidents will be due to misconfiguration. As Dave Linthicum of Infoworld recently said regarding Fugue's new survey, Cloud Infrastructure Security and Compliance Report:
"The complexities of cloud computing, and the chance of human error, will bite you in the butt. So don't skimp on security planning before deployment nor on security validation after deployment."
Don't skimp on security planning before deployment nor on security validation after deployment
Dave is talking about a survey of cloud professionals we sponsored to better understand how they are dealing with misconfiguration of cloud resources. We had more than 300 respondents, representing a cross section of IT Operations, CISOs, and other security-focused executives, and application developers.
Most of the respondents have between 20 and 100 workloads running in the cloud, with each workload containing 100-500 cloud resources. These are sizable footprints — anywhere from 2,000 to 50,000 cloud resources — and far more than any human can manage on their own.
We found that most of the participants are using some form of automation for configuration, but still have to manually monitor for misconfiguration after deployment and are using substantial resources to do so. Additionally, they are overwhelmingly concerned that what they are doing isn't enough to prevent serious security incidents. The survey bears this out:
■ 94 percent of respondents are either highly or somewhat concerned that their organization is at risk of a major security incident.
■ 97 percent think that policy compliance is very or somewhat important for their workloads.
■ A majority of the participants reported between 50 and 1,000 misconfiguration events every day.
These are expert users employing the best practices for cloud automation tooling, and things are still in bad shape. Perhaps most worrying is the fact that these misconfiguration events are ongoing and not concentrated on provisioning. This means that even if you provision correctly, as your footprint grows so will your exposure to misconfiguration. That won't scale.
Making matter worse, monitoring and alerting tools are not able to distinguish critical misconfigurations from either false positives or unimportant issues. It can often take hours — or longer — to identify a critical misconfiguration. Then it takes a significant amount of operator time to remediate it. Once the remediation takes place, most respondents told us that it takes yet another manual operation to report on the action taken.
Clearly, as our cloud footprints grow, our exposure due to misconfiguration also grows over time, and we cannot scale teams of operators to deal with the massive inflow of alerts that must be interpreted and fixes that must be applied. The cost of the current approach itself is punishing, with each incident taking upwards of an hour of operator time (and sometimes a lot more) and there are typically 50-1,000 incidents a day. There has to be a better way.
IT Operations and Security professionals are caught between a rock and a hard place with cloud misconfiguration. They can limit their organization's cloud agility by clamping down hard on which configurations are allowed, or they can live with a great deal of exposure, uncertainty and operational costs. This is a bad set of choices, as the very nature of cloud APIs both create the issue and offer a much more elegant approach to solving it than we had in the data center.
We can know what is agreed upon as a valid and safe configuration for a given application. With the right tools, we can perform static analyses of that configuration to provide insight into whether it is compliant with policy to prevent misconfiguration without taking away agility. Once deployed, we can autonomically revert any misconfiguration to the known-good state. This is a solvable problem, and much of the work to solve it has been done. We need to change how we address the problem organizationally to leverage the cloud in order to have a scalable, secure cloud footprint. Our survey results confirm that what we are doing now isn't working. Not even close.