Day 2 Operations in the Public Cloud

July 7, 2020

By John Valentine, Kovarus Cloud Practice Manager

An often-overlooked component of building out a cloud strategy is how our operations teams will actually manage the cloud environment once it’s up and running. This is one of the major topics we discuss at Kovarus as customers progress along the path of cloud adoption. The following image may look familiar, as I dove deep into it in a previous four-part blog series. The point for today is that determining how we run our day-2 operations should be part of the conversation during the Cloud Strategy and Cloud Landing Zone phases.

Let’s walk through some of the things we need to consider when we start looking at day-2 operations in the cloud, starting with native versus third-party tooling. Because we focus primarily on helping customers build out hybrid cloud environments, we tend to recommend third-party tooling over native cloud solutions. The reason is consistency of management. This may matter less for a born-in-the-cloud organization with several cloud engineers on staff, but for most teams, that skillset is hard to find and expensive. Let’s look at some of the data points around complexity and its impact, as well as the benefits of consistent tooling across our environment (data provided by ESG):

  • 66% of businesses say IT is more complex compared to two years ago.
  • 54% of respondents indicated public cloud management is more difficult.
  • 47% of respondents indicated cloud management is more difficult than other cloud computing tasks.

This shows us that when IT shops are trying to manage not only their on-premises environment but also one or more public cloud environments, the pain is real. When organizations standardize the way they manage and use resources across their environments, the payoff is substantial. Let’s again look at some data around this (data provided by ESG):

  • Reduce costs by 19% on average.
  • Reduce the number of security breaches, application outages, or other events affecting their public cloud-resident data by 30% on average.
  • Shorten the calendar time needed to migrate a cloud workload from one cloud to another, or back on-premises, by 35% on average.
  • Free up an average of 70.5 person-hours per week (or nearly 2 full-time equivalents) in infrastructure management time.
  • Improve developer experience and performance: 96% believe it will be easier for developers to push code to production, with 56% saying they would expect at least daily code pushes.
  • Reduce the frequency of problematic cloud projects, shrinking the frequency of budget overages and timeline overruns by 28% and 38% respectively.
  • Increase their pace of innovation (74% reported), ultimately resulting in five incremental products/services launched annually.

So, when we standardize our tooling between our on-premises environments and our various clouds, we see substantial benefit. Now let’s look at what we need to consider as we build out our cloud strategy. At Kovarus, we call this a Cloud Landing Zone. It is not intended to be an architecture in the sense of how the account structure, VPCs, connectivity, etc. are laid out; it is a checklist of things to consider as we plan our architecture. Kovarus helps customers build Landing Zone architectures as well, but that’s a conversation for a different day. The following graphic shows each area of focus with a vendor logo next to it. Don’t focus too much on the logos; they are just a few of the vendors that Kovarus feels are best-of-breed in their industry. The real focus should be on having some solution for each of these areas. Let’s break each of these down in more detail.

Cloud Service Provider

This may seem obvious, but we need to have our cloud provider selected, as that may determine some of the tooling we will use to manage our cloud environment. There’s no right or wrong choice when selecting a cloud provider; it depends on the needs of the business, existing relationships and which services are important for your organization. The correct choice may be two or three clouds, depending on the need.

Cost Management and Governance

This area is less about consistency and more about visibility. In our data center, we know how much we’ve spent because we bought all of our hardware through a structured procurement process. We negotiated the purchase with the vendor or our partner of choice, signed the PO and received the exact thing (hopefully) that we ordered. This isn’t the way cloud is procured, and we can quickly lose sight of the charges we accrue with our cloud provider. That’s why tools like CloudHealth or others are an absolute necessity when we begin adopting the cloud. All the cloud vendors have native tooling, but it focuses more on what you’re spending than on where to cut costs. CloudHealth and others focus on where you can cut costs, who is spending how much money and on what, and where we stray from recommended best practices.
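
To make the "who is spending how much, and on what" question concrete, here is a minimal Python sketch. It is not how CloudHealth or any cloud provider's billing tool works; the line items, the `owner` tag and the "every resource must have an owner" policy are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical billing line items. A real export would come from the
# cloud provider's billing data or a cost management tool.
LINE_ITEMS = [
    {"service": "compute", "cost": 412.50, "tags": {"owner": "web-team"}},
    {"service": "storage", "cost": 96.20, "tags": {"owner": "web-team"}},
    {"service": "compute", "cost": 780.00, "tags": {"owner": "data-team"}},
    {"service": "database", "cost": 310.75, "tags": {}},  # missing owner tag
]


def spend_by_owner(items):
    """Sum cost per owner tag; untagged items land under 'UNTAGGED'."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get("owner", "UNTAGGED")
        totals[owner] += item["cost"]
    return dict(totals)


def untagged_items(items):
    """Return items that break the assumed 'every resource has an owner' policy."""
    return [item for item in items if "owner" not in item["tags"]]


if __name__ == "__main__":
    for owner, total in sorted(spend_by_owner(LINE_ITEMS).items()):
        print(f"{owner}: ${total:,.2f}")
    print(f"Untagged line items: {len(untagged_items(LINE_ITEMS))}")
```

Even a toy report like this shows why tagging discipline matters: spend that cannot be attributed to a team is spend nobody is accountable for.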

Monitoring and Log Management

This is where we can start driving consistency across our environments. Most of us have some sort of existing infrastructure monitoring and log management solutions that we have accrued over the years and that work well. These may be SolarWinds, Splunk, the ELK stack, Grafana or others. The problem with these traditional infrastructure monitoring solutions is that they don’t really understand cloud-native services, nor do they dive deep into the thing we actually care about: our application performance. It’s also challenging to correlate events when we have multiple tools and dashboards to look at to solve a problem, so we often miss key indicators, which leads to a very reactive remediation approach.

This is where Application Performance Monitoring (APM) solutions and AIOps tools become incredibly valuable. Solutions like Dynatrace and Datadog are adept at monitoring and providing historical log data in the cloud. They were built for the cloud and focus on the applications we are actually trying to keep performing well. The added benefit is that these tools also work very well in our on-premises environment, so we get the consistency we need.

To further improve our mean time to repair (MTTR), we can layer AIOps solutions like Moogsoft on top of our APM solution (and any legacy solutions) to provide proactive resolution guidance and reduce a ton of noise in our environment.
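
As a rough illustration of what "reducing noise" means, the Python sketch below groups alerts for the same service that arrive close together into a single incident. This is a deliberately naive time-window rule, not a description of how Moogsoft or any other AIOps product correlates events; the `Alert` fields, the sample alerts and the five-minute window are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: int  # seconds since an arbitrary start time
    service: str
    message: str


def correlate(alerts, window_seconds=300):
    """Group alerts per service when each arrives within window_seconds
    of the previous alert in that service's open incident."""
    incidents = []
    open_incident = {}  # service name -> index of its latest incident
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = open_incident.get(alert.service)
        if idx is not None and alert.timestamp - incidents[idx][-1].timestamp <= window_seconds:
            incidents[idx].append(alert)
        else:
            incidents.append([alert])
            open_incident[alert.service] = len(incidents) - 1
    return incidents


# Hypothetical alerts from several monitoring tools.
SAMPLE_ALERTS = [
    Alert(0, "checkout", "high latency"),
    Alert(60, "checkout", "5xx rate up"),
    Alert(90, "db", "connection pool exhausted"),
    Alert(120, "checkout", "pod restart"),
    Alert(4000, "checkout", "high latency"),
]

if __name__ == "__main__":
    grouped = correlate(SAMPLE_ALERTS)
    print(f"{len(SAMPLE_ALERTS)} alerts grouped into {len(grouped)} incidents")
```

Real AIOps tools go much further (topology, text similarity, learned patterns), but the goal is the same: fewer, more meaningful things for an operator to look at.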

Connectivity and Security

Networking is incredibly important when we have a hybrid-cloud environment. If we start using a lot of the native networking and security solutions provided by the cloud vendors, we create two different silos of management. This may not be an issue at a small scale, but as we grow the number of accounts, environments or cloud providers, management overhead grows drastically. This is where consistency is important. If we use Palo Alto Networks firewalls today, why not use the same firewalls in our cloud environment? The management tools are the same, any automation templates we use will call the same APIs, and our overall security posture and management approach are greatly improved. There is also the added benefit of having a more feature-rich solution than the native tooling.

Another important consideration is how we approach secrets management across both our public and private cloud environments. Solutions like HashiCorp Vault simplify this process and require no additional operational knowledge between our two environments.
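
A minimal Python sketch of that idea: the same few lines read a secret whether Vault sits next to an on-premises workload or a cloud one, and only the address changes. It assumes Vault's KV version 2 secrets engine mounted at `secret`; the address, token and secret path used with it would be placeholders supplied by your own environment.

```python
import json
import urllib.request


def build_secret_request(vault_addr, token, path, mount="secret"):
    """Build a read request for Vault's KV version 2 HTTP API."""
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{path}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token})


def parse_secret(response_body):
    """KV v2 responses nest the key/value pairs under data.data."""
    return json.loads(response_body)["data"]["data"]


def read_secret(vault_addr, token, path):
    """Fetch a secret; the calling code is identical in every environment."""
    request = build_secret_request(vault_addr, token, path)
    with urllib.request.urlopen(request) as response:
        return parse_secret(response.read())
```

An application would call something like `read_secret(addr, token, "app/db")` with an address and token injected by its environment, so nothing in the code needs to know which cloud it is running in.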

Data Protection and Disaster Recovery

Every cloud provider has its own set of tooling for data protection, and all have the capability (to some degree or another) to also protect on-premises workloads. That said, most traditional IT organizations have existing data protection tools that work very well, are proven and are already integrated into existing workflows and processes. Why change our tooling just because we changed where our infrastructure lives? Almost every enterprise backup provider has a virtual, cloud-ready offering. Some solutions are much more feature-rich than others. Take Rubrik, for example, which has a clean and polished hybrid-cloud offering that not only backs up workloads from our private cloud, but also protects cloud-native workloads very well.

Provisioning

In my opinion, this is the most important component in driving consistency across our private and public cloud environments. Not only do we at Kovarus recommend automation everywhere, but so do AWS, Azure and GCP. If we do something more than once, we should automate it, whether that’s provisioning a simple Azure VM or a fully automated HashiCorp Terraform Landing Zone. The problem is that every cloud vendor offers its own infrastructure-as-code solution: AWS has CloudFormation, Azure has Azure Resource Manager and Google has Google Cloud Deployment Manager. None of these Infrastructure as Code (IaC) tools work in any other cloud, nor do they help us provision infrastructure in our private cloud. Using these tools requires separate skillsets for each cloud provider and eliminates the ability to reuse code between environments.

To maintain consistency, we should use tools that work everywhere. Terraform is our go-to IaC solution for this. It works in every cloud (including private), integrates with most configuration management tools like Puppet or Red Hat Ansible, and the enterprise version incorporates a plethora of features that both infrastructure admins and developers find valuable.
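
To show what "one tool across clouds" can look like, here is a small Python sketch that emits a Terraform configuration in Terraform's JSON syntax (a file such as `main.tf.json`). It describes one AWS resource and one Azure resource that share the same tags. The resource names, tag values, region and the AMI ID are placeholders, and the provider configuration blocks are left out for brevity, so treat it as an outline rather than a working deployment.

```python
import json

# Tags shared by every resource, whichever cloud it lands in (example values).
COMMON_TAGS = {"environment": "dev", "managed_by": "terraform"}

CONFIG = {
    "resource": {
        "aws_instance": {
            "web": {
                "ami": "ami-00000000000000000",  # placeholder, not a real image ID
                "instance_type": "t3.micro",
                "tags": COMMON_TAGS,
            }
        },
        "azurerm_resource_group": {
            "web": {
                "name": "rg-web-dev",
                "location": "westus2",
                "tags": COMMON_TAGS,
            }
        },
    }
}


def render(config):
    """Serialize the configuration in Terraform's JSON syntax."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Save the output as main.tf.json and add provider configuration
    # before running the usual terraform init / plan / apply workflow.
    print(render(CONFIG))
```

The point is less the specific resources than the workflow: one syntax, one plan/apply cycle and shared conventions such as tags, regardless of where the infrastructure lives.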

Configuration, Compliance and Patch Management

The last piece is a solution that is easily portable between clouds and that assists in the configuration, patching and governance of our workloads. Puppet, Ansible, Chef, etc. all work in this case and can be combined with Terraform, making it simple to provision and configure a full environment and its associated infrastructure. These tools are used in our private cloud environments all the time, so why not use the same tooling with our cloud providers?
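
One common way to combine the two is to feed Terraform's outputs into the configuration management tool. The Python sketch below turns the JSON printed by `terraform output -json` into a simple INI-style Ansible inventory. The output name `web_ips`, the group name `web` and the sample addresses are assumptions for illustration; your own configuration would define its own outputs.

```python
import json

# Sample of what `terraform output -json` prints for a hypothetical
# list output named "web_ips".
SAMPLE_OUTPUT = """
{
  "web_ips": {
    "sensitive": false,
    "type": ["list", "string"],
    "value": ["10.0.1.10", "10.0.1.11"]
  }
}
"""


def inventory_from_outputs(outputs_json, output_name="web_ips", group="web"):
    """Build an INI-style Ansible inventory group from a Terraform list output."""
    outputs = json.loads(outputs_json)
    hosts = outputs[output_name]["value"]
    return "\n".join([f"[{group}]", *hosts]) + "\n"


if __name__ == "__main__":
    print(inventory_from_outputs(SAMPLE_OUTPUT), end="")
```

Because the inventory is generated from whatever Terraform provisioned, the same playbooks can then configure and patch hosts in a private cloud or a public one.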

We covered a lot of ground, but the key takeaway should be that without consistency across our environments, we are introducing a lot of complexity that leads to many issues. Kovarus focuses primarily on helping customers adopt cloud in a consistent and simple way.

For more information or help, please reach out to me or your local Kovarus Account Executive.


Looking to learn more about modernizing and automating IT? We created the Kovarus Proven Solutions Center (KPSC) to let you see what’s possible and learn how we can help you succeed. To learn more about the KPSC go to the KPSC page.

Also, follow Kovarus on LinkedIn for technology updates from our experts along with updates on Kovarus news and events.