Cloud Wars: Infrastructure Abstraction

September 3, 2019


By Steve Kaplan, Kovarus, SDDC & Cloud Management

One of the best parts of my job is being part of the team that maps out and develops new capabilities in our Kovarus Proven Solutions Center, also known as the KPSC. Over the last month or so, that has included starting to build out multi- / hybrid-cloud use cases utilizing VMware’s Cloud Automation Services (CAS), a product and platform that is best described as the next generation of cloud management and provisioning.

To say VMware has taken a lot of the feedback on the current release of vRealize Automation (vRA) to heart would be an understatement! One of the most frustrating things for me about how vRA 7.x (and earlier) releases worked was the lack of instrumentation to figure out why provisioning operations failed, particularly in the allocation process that determines where a request gets placed. In this regard, CAS moves the ball forward tremendously, as you'll see below.

With that said, let’s get right into looking at how to do some basic setup and validation for provisioning operations before you even have to author or provision a blueprint within Cloud Assembly.

Getting Your Infrastructure Ready

First things first, let's log in to the Cloud Services Console. Today we're focusing on Cloud Assembly, so that's where we'll go!

Once you're in Cloud Assembly, the first thing to configure is cloud accounts, which are the infrastructure endpoints that CAS will consume. As of this writing, the following cloud account types are supported:

  • Amazon Web Services (AWS)
  • Microsoft Azure
  • Google Cloud Platform (GCP)
  • VMware Cloud on AWS (VMC)
  • vCenter
  • NSX-V
  • NSX-T

If you take a look at the previous screenshot, we've currently defined AWS, Azure, a few vCenter servers, and an NSX-V Manager (more are coming, friends!). Platforms that integrate with CAS or provide consumable capabilities, such as Puppet Enterprise, vRealize Orchestrator, or GitHub, are defined under Integrations, directly below the cloud accounts section.

If you notice the blue box drawn under the vCenter named "NSX-V Environment," the values captured there are capability tags, which let you define placement logic for deploying infrastructure components such as virtual machines. How you define and assign capability tags will in many ways govern what your provisioning operations look like. Capability tags are key:value pairs, so you can get creative defining criteria for the use cases relevant to your environment! They are also inherited: if you set a capability tag on a cloud account, all discovered resources inherit it. This is most relevant for compute resources such as vSphere clusters or availability zones in AWS.
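
To make the inheritance rule concrete, here is a minimal Python sketch. It is not CAS code or the CAS API: the classes, the `parse_tag` helper, and the sample values (`sdntype:nsxv`, `Cluster-01`) are hypothetical, and it assumes that a tag set directly on a compute resource overrides an inherited tag with the same key.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a 'key:value' capability tag at the first colon."""
    key, sep, value = tag.partition(":")
    if not sep or not key:
        raise ValueError(f"expected key:value, got {tag!r}")
    return key, value


@dataclass
class CloudAccount:
    name: str
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class ComputeResource:
    name: str
    account: CloudAccount
    tags: dict[str, str] = field(default_factory=dict)

    def effective_tags(self) -> dict[str, str]:
        # Start from the account's tags, then apply the resource's own tags.
        # Assumption: a tag set on the resource wins over an inherited one.
        merged = dict(self.account.tags)
        merged.update(self.tags)
        return merged


# Hypothetical account and cluster; "sdntype:nsxv" and "Cluster-01" are made up.
account = CloudAccount(
    "NSX-V Environment",
    dict(parse_tag(t) for t in ["cloudtype:private", "sdntype:nsxv"]),
)
cluster = ComputeResource("Cluster-01", account, {"category": "demo"})
print(cluster.effective_tags())
# {'cloudtype': 'private', 'sdntype': 'nsxv', 'category': 'demo'}
```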

For our purposes, our cloud accounts only have one or two tags defined:

  • cloudtype: This is the tag we use to define whether a resource is a public or private cloud. As the screenshot shows, every endpoint has this tag defined, and it will be very relevant later.
  • sdntype: This is mostly useful for private- / hybrid-cloud use cases where I want to specify, as part of a provisioning request, that something needs to reside on NSX-V, NSX-T, or possibly even Cisco ACI. You'll notice that all of the vCenter endpoints have this tag applied, whereas AWS and Azure do not.
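
Because both tags are plain key:value pairs, a request's requirements can be expressed as constraints and checked against an endpoint's tags. The Python sketch below is a simplified, hypothetical model rather than CAS's own constraint syntax: the `Constraint` class, the hard/soft split (hard constraints must match, soft ones only express a preference), and the `sdntype:nsxv` value are assumptions for illustration. One consequence worth noticing: an endpoint with no `sdntype` tag at all, like AWS or Azure here, can never satisfy a hard `sdntype` constraint.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    key: str
    value: str
    hard: bool = True  # hard: must match; soft: only a preference (assumed model)


def meets_hard(tags: dict[str, str], constraints: list[Constraint]) -> bool:
    """True if every hard constraint's key:value is present in the tags."""
    return all(tags.get(c.key) == c.value for c in constraints if c.hard)


def soft_matches(tags: dict[str, str], constraints: list[Constraint]) -> int:
    """Count how many soft constraints the tags happen to satisfy."""
    return sum(1 for c in constraints if not c.hard and tags.get(c.key) == c.value)


# Hypothetical endpoint tags modelled on the two tags described above.
aws = {"cloudtype": "public"}
vcenter = {"cloudtype": "private", "sdntype": "nsxv"}

request = [Constraint("cloudtype", "private"), Constraint("sdntype", "nsxv", hard=False)]
print(meets_hard(aws, request), meets_hard(vcenter, request))  # False True
print(soft_matches(vcenter, request))  # 1
```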

After you're done setting up your cloud accounts, next up are cloud zones! If you're familiar with reservations in vRA, cloud zones roughly translate to them, but they are not implemented the same way. Whereas vRA requires you to configure reservations and map them to business groups, think of a cloud zone as a consumption policy for a particular cloud account that can be applied to any project within CAS without configuring the same thing over and over. (Projects are the construct that effectively replaces business groups.) This change alone will remove much of the operational overhead associated with resource management, and I can't stress enough how much of an improvement this one thing is for real-life, day-to-day operational work!
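
To illustrate the contrast with per-business-group reservations, here is a hypothetical Python data model (not the CAS API): each cloud zone is defined once and then referenced from several projects, each of which attaches its own priority to it. The `CloudZone` and `Project` classes, the project names, and the priority values are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CloudZone:
    name: str
    cloud_account: str
    tags: frozenset[str] = frozenset()


@dataclass
class Project:
    name: str
    # (cloud zone, priority) pairs; the zone object itself is shared, not copied.
    zones: list[tuple[CloudZone, int]] = field(default_factory=list)

    def add_zone(self, zone: CloudZone, priority: int) -> None:
        self.zones.append((zone, priority))


# Each cloud zone is defined once...
us_west2 = CloudZone("AWS US-WEST2", "AWS", frozenset({"env:dev"}))
nsxv = CloudZone("NSX-V Environment", "vCenter")

# ...and then attached to as many projects as needed (project names are made up).
dev = Project("Dev")
dev.add_zone(us_west2, priority=0)
dev.add_zone(nsxv, priority=1)

ops = Project("Ops")
ops.add_zone(nsxv, priority=0)

print([z.name for z, _ in dev.zones])  # ['AWS US-WEST2', 'NSX-V Environment']
```

Changing the `nsxv` zone in one place would affect every project that references it, which is the operational saving the reservation model lacked.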

Looking at the highlighted cloud zone for Amazon's US-WEST2 region, I've defined two tags (in addition to the one defined for the overall cloud zone!!): 'env:dev' and 'priority'. For the purposes of this post, we'll focus on the env tag.

While you can (and probably should) define capability tags on the cloud zone, you can also set tags on individual compute resources within the cloud account to allow filtering on various criteria. Note that capability tags on an individual compute resource are not specific to a cloud zone; they are set on the compute resource itself, so they are reusable across multiple cloud zones.

The previous screenshot shows US-WEST2, where west2a, west2b, and west2c carry a capability tag of 'category:demo', whereas west2d carries 'category:kpsc'. This particular tag isn't central to this post, but it's important to understand where these tags can be set, since you can use capability tags to dynamically include compute resources in a cloud zone.
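
Dynamic inclusion boils down to a tag filter over discovered compute. The sketch below is a hypothetical Python model, not how CAS implements it; it reuses the US-WEST2 tags from the screenshot and assumes a resource joins a zone only when it carries every key:value in the zone's filter. Because the tags live on the compute resources, two zones with different filters can draw from the same inventory.

```python
from __future__ import annotations

# Hypothetical inventory mirroring the US-WEST2 example: each availability
# zone carries its own capability tags.
COMPUTE = {
    "us-west-2a": {"category": "demo"},
    "us-west-2b": {"category": "demo"},
    "us-west-2c": {"category": "demo"},
    "us-west-2d": {"category": "kpsc"},
}


def dynamic_members(compute: dict[str, dict[str, str]], filter_tags: dict[str, str]) -> list[str]:
    """Return compute resources whose tags contain every key:value in filter_tags."""
    return sorted(
        name
        for name, tags in compute.items()
        if all(tags.get(k) == v for k, v in filter_tags.items())
    )


demo_zone = dynamic_members(COMPUTE, {"category": "demo"})
kpsc_zone = dynamic_members(COMPUTE, {"category": "kpsc"})
print(demo_zone)  # ['us-west-2a', 'us-west-2b', 'us-west-2c']
print(kpsc_zone)  # ['us-west-2d']
```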

Next up on the setup train, we define flavor mappings. In the simplest terms, think of flavor mappings as the mechanism for defining instance sizing characteristics and providing "t-shirt sizes" to your blueprints in a more consistent, abstracted way. This is a significant improvement over vRA 7.x: there, we could easily define sizing characteristics for vSphere requests using component profiles, but public-cloud sizing had to be defined within each individual blueprint. Consolidating this into a single policy point, regardless of cloud, is a huge win. Huge!

The important thing to point out here is the difference between vSphere and public-cloud endpoints: vSphere accounts require setting CPU and memory values, whereas other cloud types use native instance sizes. This ensures that, regardless of cloud type, we have a consistent understanding of what a particular size means for every defined cloud.
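
A flavor mapping can be pictured as a lookup table from a t-shirt size to a per-account/region value. In this hypothetical Python model (not the CAS API), the "small" flavor, the account/region labels, and the chosen sizes (`t2.small`, `Standard_B1ms`, 1 vCPU with 2048 MB) are illustrative assumptions; the point is only that public clouds resolve to a native instance size while vSphere resolves to explicit CPU and memory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VSphereSize:
    cpu: int
    memory_mb: int


# Hypothetical "small" flavor: public clouds map to a native instance size,
# vSphere maps to explicit CPU and memory values.
FLAVORS: dict[str, dict[str, str | VSphereSize]] = {
    "small": {
        "AWS / us-west-2": "t2.small",
        "Azure / westus2": "Standard_B1ms",
        "vCenter / NSX-V Environment": VSphereSize(cpu=1, memory_mb=2048),
    },
}


def resolve_flavor(flavor: str, account_region: str) -> str | VSphereSize | None:
    """Return what a t-shirt size means for one account/region, or None if unmapped."""
    return FLAVORS.get(flavor, {}).get(account_region)


print(resolve_flavor("small", "AWS / us-west-2"))  # t2.small
print(resolve_flavor("small", "vCenter / NSX-V Environment"))
```

A `None` result is the situation a simulation would flag: the blueprint asks for a size that has no meaning in that account/region.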

Once we've gotten through defining flavor mappings, the next thing to define is image mappings. Image mappings provide a mechanism for abstracting cloud-specific images behind a common name that describes what the image is. If you look at the previous image, I've defined AMIs within each Amazon region, templates for all defined Azure regions, and a vSphere template for the defined vSphere cloud account. We'll see how this becomes important in the examples below! One thing to note for vSphere accounts is that you cannot define a customization specification here (which I find odd and plan to raise), but you can define scripting code in the 'CloudConfig' section, which gets invoked using cloud-init. For those comfortable with AWS and Azure, this shouldn't be a foreign concept, but for the vSphere-only faithful, it will need review and will likely require updates to existing vSphere templates to support it. Note that you can still define a customization specification within individual blueprints if you're defining vSphere resources only, but I'm certainly using my experiments in CAS to get a lot more comfortable with what can be done in this arena. I'm still on the fence about this direction and what it means for those who've heavily leveraged software components in vRA 7.x, but that's a conversation for another time!

One thing to keep in mind about both flavor and image mappings is that they are defined per cloud account and region, not as a universal "All AWS is this way" or "I always want all vSphere endpoints to use this size" rule; you have to set them actively for each account. The distinction is that while you can set only a single flavor per mapping for a cloud account, you can set multiple images for a single cloud account in each image mapping.

The reason for this distinction is that you can define constraints via capability tags on images, which gives you flexibility if you maintain multiple versions of an image for whatever reason. Off the top of my head, I could see organizations that have to maintain both a PCI-compliant and a non-compliant image wanting to define a constraint on the PCI template so it is used only by blueprints tied to the PCI environment. You could also have a "beta" image that includes the latest patches and updates from information security that still need to be vetted; you don't want to release it widely yet, and you don't necessarily want to duplicate all of your blueprints. Add a constraint so that the beta image is used only when the beta tag is added to the deployment request. This can ultimately streamline a lot of operations and lead to a much more manageable end state.
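
The PCI and beta scenarios can be modelled as one image mapping entry that holds several images, each with optional constraints. This is a hypothetical Python sketch, not CAS's selection algorithm: the image IDs and the tags `compliance:pci` and `release:beta` are made up, and the tie-breaking rule (the qualifying image with the most matched constraints wins, so an unconstrained image acts as the default) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImageEntry:
    image: str
    constraints: frozenset[str] = frozenset()  # key:value tags the request must carry


# Hypothetical "CentOS 7" mapping for one account/region: a standard image,
# a PCI image, and a beta image awaiting vetting.
CENTOS7_US_WEST2 = [
    ImageEntry("ami-standard"),
    ImageEntry("ami-pci", frozenset({"compliance:pci"})),
    ImageEntry("ami-beta", frozenset({"release:beta"})),
]


def select_image(entries: list[ImageEntry], request_tags: set[str]) -> str | None:
    """Pick the most specific image whose constraints all appear in the request.

    Assumption: among qualifying images, the one with the most constraints wins;
    ties go to the first entry listed.
    """
    eligible = [e for e in entries if e.constraints <= request_tags]
    if not eligible:
        return None
    return max(eligible, key=lambda e: len(e.constraints)).image


print(select_image(CENTOS7_US_WEST2, set()))  # ami-standard
print(select_image(CENTOS7_US_WEST2, {"release:beta"}))  # ami-beta
```

The same blueprint serves both audiences; only the tags on the request change.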

Multi-Cloud Provisioning

Now that I've highlighted the essential CAS constructs used in this context, let's simulate a deployment.

When it comes to operational improvements and transparency in how decisions are made, CAS has taken a massive step forward. Placement decisions are far easier to follow, and we can simulate a few different request types to determine whether things are properly configured before ever authoring a blueprint. For anybody who has spent any amount of time trying to figure out how and why vRA makes the decisions it makes, this is a breath of fresh air.

For the purposes of this blog post, I'm initiating the simulation from the projects landing page, but it can be run from anywhere the simulation option is available.
Let's simulate a request that should land on a public cloud provider (specifically AWS US-WEST2!).

Landing on Public Cloud

As I noted previously, simulation runs a synthetic provisioning operation to validate how the most common use cases will behave. When simulating, you can pick any of the four request types in the previous image. Just set the project you want to validate against (since this governs the available cloud zones!!), add the flavor and image mappings to use along with any capability tags, and hit the simulate button. Once it completes, you should get a green check mark for success or a red X indicating a failure. In either case, you can review the output and determine what happened by clicking on see details. Note that simulating carries no charge, since no provisioning occurs, so it's a nice way to make sure the basics are in place before authoring blueprints that will actually provision resources and incur costs.

You can review how I defined everything in this first request from the screenshot, so let’s take a look at the output and see how the sausage gets made!

To make it easier to read (and to avoid a massive graphic), I minimized the specific outputs of the cloud accounts; the details that follow give a more granular look at the individual accounts. The left-most account, marked in green (US-WEST2), is where this request would have been provisioned if this weren't a simulated transaction.

This first view compares the two AWS regions. The error shown for the US-EAST1 region indicates that WEST2 was picked because 1) the priority for that region was higher, and 2) US-EAST1 did not match the request's constraints.

The more important piece of that allocation decision is the first one: the priority. Priorities are defined on a per-project basis and determine how much weight each cloud zone gets from an allocation standpoint. As for the second, there are some other things going on with what we're doing in US-EAST1, so a constraint mismatch there wasn't a surprise. This was the expected outcome.
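
The two-step logic described here (filter out zones that fail their constraints, then let priority decide) can be sketched in a few lines of Python. This is a simplified, hypothetical model rather than the CAS placement engine; the `Candidate` class and the priority values are assumptions, and it assumes a lower number means a higher priority.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    zone: str
    priority: int          # priority assigned to this zone in the project
    constraints_met: bool  # outcome of the hard-constraint check


def pick_zone(candidates: list[Candidate]) -> str | None:
    """Among zones that pass their hard constraints, return the highest-priority one.

    Assumption: a lower number means a higher priority; ties go to the first listed.
    """
    eligible = [c for c in candidates if c.constraints_met]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.priority).zone


# Hypothetical values loosely modelled on the AWS comparison above.
candidates = [
    Candidate("AWS US-EAST1", priority=1, constraints_met=False),
    Candidate("AWS US-WEST2", priority=0, constraints_met=True),
]
print(pick_zone(candidates))  # AWS US-WEST2
```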

Flipping over to the other clouds, on-premises vSphere and Azure, we can see that the required hard constraints were not met. Fortunately, CAS makes it very easy to see which constraints were not met for each of those cloud zones within the project.
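
The kind of per-zone explanation the simulation output gives can be approximated by diffing required tags against each zone's tags. The Python sketch below is hypothetical: the zone names and tag values are made up for illustration and do not come from the screenshots.

```python
from __future__ import annotations


def unmet_constraints(zone_tags: dict[str, str], required: dict[str, str]) -> list[str]:
    """List the required key:value tags a cloud zone does not carry."""
    return [f"{k}:{v}" for k, v in required.items() if zone_tags.get(k) != v]


# Hypothetical zone tags; the request asks for a public cloud in the dev environment.
required = {"cloudtype": "public", "env": "dev"}
zones = {
    "AWS US-WEST2": {"cloudtype": "public", "env": "dev"},
    "Azure West US 2": {"cloudtype": "public"},
    "vSphere NSX-V Environment": {"cloudtype": "private", "sdntype": "nsxv"},
}

report = {name: unmet_constraints(tags, required) for name, tags in zones.items()}
for name, missing in report.items():
    print(name, "OK" if not missing else "unmet: " + ", ".join(missing))
```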

Landing on Private Cloud

Now let's run the same request again, but this time flip the cloud type so the request should land on premises in a vSphere cluster! The only change I've made to this simulated request is the value of the cloud type capability tag.

As you can see from the output for this simulation, we ended up in the NSX-V vCenter, which is the expected result.

Taking a look at how the vSphere endpoint was selected, the reasoning is clear: the NSX-T site does not have an image defined in the 'CentOS 7' image mapping, which quickly explains why it wasn't a viable target for this request.

On the public-cloud side, let's look at an Azure site and an AWS site. In both cases, the cloud type value was the deciding problem, since both had matching flavor and image mappings defined.


That's the lesson for the day! I'm not sure how many conversations I've had with customers and folks in the wider VMware community about how vRA does allocation and why things go where they do, but it's definitely a high two-digit number. Without a deep understanding of how vRA works, you can easily get lost down a rabbit hole trying to sort out how these allocation decisions get made.

As somebody who spent a lot of time and personal capital looking for answers on how this worked in vRA 6.x / 7.x, I'm very heartened to see these enhancements, which really do make it easier to manage, operate, and troubleshoot in a much more holistic and transparent way.

Looking to learn more about modernizing and automating IT? We created the Kovarus Proven Solutions Center (KPSC) to let you see what’s possible and learn how we can help you succeed. To learn more about the KPSC go to the KPSC page.

Also, follow Kovarus on LinkedIn for technology updates from our experts along with updates on Kovarus news and events.