VCAP-CID Study Notes: Objective 2.4

Welcome to the VCAP-CID Study Notes. This is Objective 2.4 in the VCAP-CID blueprint, version 2.8. The rest of the sections/objectives can be found here.

Bold items have higher importance, and copied text is in italics. Please note that this post is one of the larger ones in this series.

Knowledge

  • Identify constraints of vSphere cluster sizing
    • Each vSphere cluster can have at most 32 hosts, but Provider vDCs can be elastic, so organizations can use multiple clusters.
    • This is from the ESXi 5.1 Configuration Maximums:

[Image: VCAP CID 2-4-1 – ESXi 5.1 configuration maximums]

  • Identify constraints of Provider/Organization Virtual Datacenter relationships
    • You can only create 32 resource pools (organization vDCs) for the same organization, but an organization can have more than one organization vDC (resource pool) backed by a single Provider vDC.
    • Elastic mode allows organization vDCs to span multiple clusters (each of which is a resource pool).
    • This piece of advice is always good to know, but it might be too strict if you have a good estimate of the growth of the environment or use elastic Provider vDCs:
      • As the number of hosts backing a provider virtual data center approaches the halfway mark of cluster limits, implement controls to preserve headroom and avoid reaching the cluster limits. For example, restrict the creation of additional tenants for this virtual data center and add hosts to accommodate increased resource demand for the existing tenants.
  • Identify capabilities of allocation models
  • Explain vSphere/vCloud storage features, functionality, and constraints.
    • vSphere storage features include (among others):
      • Storage IO control
        • Supported in a vCloud environment; this is a feature of the vSphere layer and doesn’t place a constraint on a vCloud design.
        • I will not explain these vSphere features at length, as I assume people know how they work and how they might impact a design.
      • Storage DRS (which leverages Storage vMotion)
        • Supported in vCloud Director 5.1
      • Storage Clusters
        • Supported in vCloud Director 5.1. You can add a Storage Cluster in the vCloud Director administration pages.
        • I recommend setting the same Storage Policy (Profile in 5.1) for each Storage Cluster
        • Each Cluster can only contain 32 Datastores, but a Storage Policy (Profile in 5.1) can include multiple datastores from multiple Storage Clusters.
          • So VMs for the same Organization could reside in two different Datastore Clusters.
      • vSphere Flash Read Cache (vSphere 5.5)
      • vSphere Profile Driven Storage
      • All these features are supported in vCloud environments; all but vFRC are supported in 5.1, and vFRC is supported in vCloud Director 5.5.
      • Please note that only 64 ESXi hosts can access the same Datastore at any given time, so a large environment might run into that constraint.
      • If you are not familiar with vSphere storage features, please make sure to catch up on that subject.
    • vCloud storage features include:
      • The only real storage features used in vCloud Director are:
        • Thin-provisioning
        • Fast-provisioning
        • Snapshots
          • A vSphere feature, but it’s capped in vCloud Director at one snapshot per VM (a sketch of the REST call follows this list).
          • Other capabilities include:
            • One snapshot per virtual machine is permitted.
            • NIC settings are marked read-only after a snapshot is taken.
            • Editing of NIC settings is disabled through the API after a snapshot is taken.
            • To take a snapshot, the user must have the vAPP_Clone user right.
            • Snapshot storage allocation is added to Chargeback.
            • vCloud Director performs a storage quota check.
            • REST API support is provided to perform snapshots.
            • Virtual machine memory can be included in the snapshot.
            • Full clone is forced on copy or move of the virtual machine, resulting in the deletion of the snapshot (shadow VMDK).
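
Since the REST API supports snapshots, here is a minimal Python sketch of taking one. This is an illustration only: the cell URL, VM href, and token are placeholders, and you should verify the createSnapshot action and media type against the vCloud API reference for your version.

    import requests

    VCD = "https://vcd.example.com"                 # hypothetical cell URL
    VM_HREF = VCD + "/api/vApp/vm-…"                # hypothetical VM href
    TOKEN = "…session token from the login call…"   # placeholder

    # memory="true" matches the note above that VM memory can be included
    body = """<?xml version="1.0" encoding="UTF-8"?>
    <CreateSnapshotParams xmlns="http://www.vmware.com/vcloud/v1.5"
        name="pre-change" memory="true" quiesce="true">
      <Description>Snapshot before maintenance</Description>
    </CreateSnapshotParams>"""

    resp = requests.post(
        VM_HREF + "/action/createSnapshot",
        headers={
            "x-vcloud-authorization": TOKEN,
            "Accept": "application/*+xml;version=5.1",
            "Content-Type":
                "application/vnd.vmware.vcloud.createSnapshotParams+xml",
        },
        data=body,
        verify=False,  # lab only; use proper certificate validation otherwise
    )
    resp.raise_for_status()  # the response is a Task; poll it to completion
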
  • Explain the relationship between allocation models, vCloud workloads and resource groups.
    • First, I want to recommend that everyone read this document:
    • You should know VMware changed how CPU allocation works in Allocation Pools between 5.1 RTM (810718) and 5.1.2 (1068441).
      • In 810718 (5.1 RTM)
        • When configuring an organization vDC, neither the limit nor the reservation of the created RP was set at creation.
        • Say you have 20 GHz capacity and a 50% guarantee, and 1 vCPU is 2 GHz. When one VM is powered on, the RP limit is set to 2 GHz and the reservation to 1 GHz. So you only have 10 vCPUs available in your environment before you hit the 20 GHz cap and further VMs can’t be powered on.
        • And when using a lower number for the vCPU, let’s say 400 MHz, you get VMs that are initially limited in available CPU, as the RP limit is incremented in 400 MHz steps: the first VM has 400 MHz, two VMs have 800 MHz, three VMs have 1200 MHz, and so on.
      • In 868405 (5.1.1)
        • When configuring an organization vDC, the limit of the created RP was set at creation; so at 20 GHz capacity and a 50% guarantee, the resource pool would have a 20 GHz limit. No reservation was set on the RP; that was done when a VM was powered on.
        • 20 GHz capacity and 50% guarantee, with 1 vCPU at 2 GHz: one VM is powered on and the RP reservation is set to 1 GHz. You still only have 10 vCPUs available.
        • But now, if you used lower vCPU numbers, you could create VMs as you wanted: 400 MHz per vCPU would allow 50 vCPUs in a 20 GHz capacity RP (the sketch at the end of this section reproduces this arithmetic).
        • The first VMs created now have the whole CPU capacity to use, so they are not as constrained.
        • But this means that if the Allocation Pool was elastic (spanning multiple clusters), each RP in each cluster would have the limit set to the initial capacity, allowing organizations to use more resources than were initially configured.
        • Massimo Re Ferre’ has a great post on what changed between 5.1 and 5.1.1: http://it20.info/2012/10/vcloud-director-5-1-1-changes-in-resource-entitlements/
      • In 1068441 (5.1.2), the behavior changed once more; the Allocation Pool description below reflects how it works from 5.1.2 onward.
    • Allocation Pool (as it works in 5.1.3)
      • What kind of resource pool does this pool create?
        • A sub-resource pool under the Provider vDC resource pool. The pool is configured with the CPU capacity as a limit, leaving the CPU reservation unchanged. The memory limit and reservation are also unchanged (with Expandable Reservation and Unlimited selected).
      • What happens when a virtual machine is turned on?
        • When a VM is turned on, the sub-resource pool’s memory limit is left unchanged (Expandable Reservation and Unlimited remain selected), and its memory reservation is increased by the VM’s configured memory size times the memory percentage guarantee for that organization vDC.
          • Please note that even though the limit is not set on the resource pool, vCloud Director will not power on VMs that would break the memory capacity configured for the pool.
        • The CPU reservation is increased by the number of vCPUs configured for the virtual machine times the vCPU frequency specified at the organization vDC level times the CPU percentage guarantee set at the organization vDC level. The virtual machine itself is reconfigured with its memory and CPU reservations set to zero and is then placed.
      • Does this allocation model have any special features?
        • Elasticity: Can span multiple Provider Resource pools.
    • Pay-as-you-Go
      • What kind of resource pool does this pool create?
        • A sub-resource pool is created with zero reservation and an unlimited limit.
      • What happens when a virtual machine is turned on?
        • When a VM is turned on, the VM’s memory limit is set to its configured memory size, and its memory reservation is set to the configured memory size times the memory percentage guarantee for that organization vDC. The resource pool reservation is also increased by that same reservation amount plus the VM overhead.
        • The VM’s CPU limit is set to the number of vCPUs it is configured with times the vCPU frequency specified at the organization vDC level, and its CPU reservation is set to that limit times the CPU percentage guarantee set at the organization vDC level. The resource pool reservation is also increased by the same amount.
      • Does this allocation model have any special features?
        • No resources are reserved ahead of time, so VMs might fail to power on if there aren’t enough resources available.
    • Reservation Pool
      • What kind of resource pool does this pool create?
        • A sub-resource pool is created with the limit and reservation configured at the organization vDC level.
      • What happens when a virtual machine is turned on?
        • Reservation and limit are not modified when a VM is powered on. The organization can change these settings at a per-VM level with this allocation model.
      • Does this allocation model have any special features?
        • Cannot be elastic across multiple Provider resource pools.
        • Creation will fail if the resources in the Provider resource pool are insufficient.
        • Users can set shares, limits, and reservations on virtual machines.
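
To make the arithmetic in the build comparison and the model descriptions above concrete, here is a small Python sketch. It assumes the 5.1.2+ behavior described in these notes, and the numbers come from the worked examples in the text.

    # Back-of-the-envelope math for the allocation models described above,
    # assuming the 5.1.2+ behavior. All numbers are illustrative.

    def allocation_pool(capacity_ghz, cpu_guarantee, vcpu_ghz, vm_vcpus):
        """Sub-RP CPU limit = capacity; reservation grows per powered-on VM."""
        limit = capacity_ghz
        reservation = sum(n * vcpu_ghz * cpu_guarantee for n in vm_vcpus)
        max_vcpus = capacity_ghz / vcpu_ghz  # vCPUs that fit under the limit
        return limit, reservation, max_vcpus

    # The worked example above: 20 GHz capacity, 50% guarantee, 2 GHz per vCPU
    print(allocation_pool(20.0, 0.50, 2.0, vm_vcpus=[1]))  # (20.0, 1.0, 10.0)
    # Dropping the vCPU speed to 0.4 GHz leaves room for 50 vCPUs instead:
    print(allocation_pool(20.0, 0.50, 0.4, vm_vcpus=[1]))  # (20.0, 0.2, 50.0)

    def payg_vm(vcpus, vcpu_ghz, mem_gb, cpu_guarantee, mem_guarantee):
        """Per-VM limits/reservations under Pay-as-you-Go (RP is unlimited)."""
        cpu_limit = vcpus * vcpu_ghz
        cpu_reservation = cpu_limit * cpu_guarantee
        mem_reservation = mem_gb * mem_guarantee
        return cpu_limit, cpu_reservation, mem_gb, mem_reservation

    # A hypothetical 2-vCPU, 8 GB VM with a 1 GHz vCPU and 20% guarantees:
    print(payg_vm(2, 1.0, 8, cpu_guarantee=0.20, mem_guarantee=0.20))
    # -> (2.0, 0.4, 8, 1.6)
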

 

Skills and Abilities

  • Given a set of vCloud workloads, determine appropriate DRS/HA configuration for resource clusters.
    • These are the available HA configurations:
      • Host failures tolerated
      • % of Cluster resources reserved
      • Specify Failover Host
    • This really depends on the allocation model that will be used with the cluster; let’s say a whole cluster is using the same allocation model (for simplicity’s sake).
      • Reservation Pool
        • The resource pools will have reservations and limits. The VM reservations and shares will be controlled by the users of that organization, so I recommend using the %-based HA mode. That makes HA take the reservation of each VM into account when calculating the current failover capacity.
        • If you used Host Failures Tolerated at its default settings, it would use the VM with the largest CPU and memory reservations as the slot size. You would need to know how many resources the users are going to reserve to manually set slot sizes with advanced settings, so it’s not a very flexible option.
      • Allocation Pool
        • The resource pools will have reservations and limits based on the configuration of the organization vDC. The VMs themselves will not carry reservations, so the slot size will be the default of 32 MHz for CPU and 0 MB plus overhead for memory.
        • Allocation Pool VMs can also vary greatly in size, and again you would need a very good idea of the sizes of the VMs that will be running there to use advanced settings for Host Failures Tolerated.
        • %-based HA is probably the better choice of the two.
      • Pay-as-you-Go
        • In PAYG the VMs have a CPU reservation derived from the configured vCPU speed, while the memory reservation depends on the % guaranteed setting chosen at organization vDC creation. So you have a very predictable CPU slot size, but the memory slot size will depend on the size of the largest VM in the cluster.
        • So if you have large VMs with, say, 32 GB of memory and any percentage guaranteed, let’s say 20% (the default), the slot size could be 1 GHz and 6.4 GB. That is not a good slot size, as most VMs will be much smaller.
        • One way of making Host Failures Tolerated acceptable is to use a 0% memory guarantee, so every VM must fight for the memory in the resource pool.
        • But it’s best to use % of Cluster Resources, since it is the de facto standard in most HA clusters and delivers the most flexibility of the three options.
          • It takes all available resources in the cluster and adds them up, then subtracts the percentage configured in the HA settings. HA then adds up the reserved resources in use (powered-on VMs only), using a 32 MHz default for each VM’s CPU, and reservation plus overhead for memory. This value is then used to calculate the current failover capacity (a sketch of this arithmetic follows this section).
          • It’s best to let the experts explain this, as they wrote a book on it: I recommend reading the vSphere 5.1 Clustering Deepdive book.
          • Or read their blog on HA. This blog post from Duncan Epping explains %-based HA very well: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
    • To use affinity/anti-affinity rules, you will need PowerCLI or other automation to make sure certain vApps are deployed to different hosts (or the same host).
    • DRS configuration
      • First I must mention that you HAVE to enable DRS on the clusters backing a vCloud Provider vDC, as that is the only way resource pools can be created.
      • And it’s a best practice to use a different Provider vDC for each allocation model:
        • a Pay-as-you-Go cluster, an Allocation Pool cluster, and a Reservation Pool cluster.
        • But let’s be realistic: that will not be the case for many installations. Not everybody has the budget to create three different clusters. You can create sub-resource pools for each model, but it complicates DRS scheduling immensely, as Frank Denneman explains in this blog post:
      • DRS moves VMs between ESXi hosts in a cluster based on resource usage. That is a very simplified description, as DRS uses many different metrics and algorithms to calculate if and when a VM should be moved.
      • I’m not going to explain in detail how DRS works as you can read up on that in various books and documentation.
      • As you might expect, using different resource pools can affect DRS calculations. You might have a Reservation Pool using half of the resources and Allocation Pool or PAYG for the rest.
      • When configuring DRS, it’s best in most cases to set the mode to fully automated and just let DRS do its thing.
      • Read this to get a good idea of how DRS works, and of course, if you really want to know more, you should pick up Duncan’s and Frank’s book.
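
Here is a rough Python sketch of the “% of Cluster Resources” arithmetic described above. The inputs are hypothetical, and real HA also uses the per-VM overhead values reported by the hosts, so treat this as an approximation of the mechanism rather than HA’s exact algorithm.

    def failover_capacity(total_cpu_mhz, total_mem_mb, reserved_pct,
                          vm_cpu_res_mhz, vm_mem_res_mb, vm_overhead_mb=100):
        """Current CPU/memory failover capacity under % of Cluster Resources."""
        used_cpu = sum(max(r, 32) for r in vm_cpu_res_mhz)  # 32 MHz default/VM
        used_mem = sum(m + vm_overhead_mb for m in vm_mem_res_mb)
        cpu_free_pct = 100 * (1 - used_cpu / total_cpu_mhz)
        mem_free_pct = 100 * (1 - used_mem / total_mem_mb)
        admission_ok = min(cpu_free_pct, mem_free_pct) >= reserved_pct
        return cpu_free_pct, mem_free_pct, admission_ok

    # A hypothetical 40 GHz / 256 GB cluster with 25% reserved and ten
    # powered-on VMs, each with no CPU reservation and 1 GB memory reserved:
    print(failover_capacity(40000, 262144, 25, [0] * 10, [1024] * 10))
    # -> (99.2, ~95.7, True)
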
  • Given a set of vCloud workloads, determine appropriate host logical design.
    • So this means creating a logical design for the hosts that are a part of the Resource cluster.
    • Here you define the configuration of the ESXi hosts for the resource clusters; it depends on the number of projected workloads, their resource usage, and the availability requirements (a small sizing sketch follows this section).
      • Chapter 4.4 in the vCAT Architecting a VMware vCloud document describes this process.
    • This table from a VMware Partner SET document is a good overview of what you would want as part of a host logical design:

[Image: VCAP CID 2-4-2 – host logical design overview table]
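
In the spirit of the vCAT chapter 4.4 sizing exercise, here is a hypothetical back-of-the-envelope host-count calculation in Python. Every input value is illustrative; a real design would repeat this for CPU and take the larger of the two results.

    import math

    def hosts_needed(vm_count, avg_vm_mem_gb, overcommit, host_mem_gb,
                     ha_spare_hosts=1, max_hosts=32):
        """Memory-based host count for a resource cluster, plus HA headroom."""
        demand_gb = vm_count * avg_vm_mem_gb / overcommit
        hosts = math.ceil(demand_gb / host_mem_gb) + ha_spare_hosts
        if hosts > max_hosts:
            raise ValueError("exceeds the 32-host cluster maximum; split clusters")
        return hosts

    # 500 projected VMs, 4 GB average, 1.5:1 memory overcommit, 256 GB hosts, N+1:
    print(hosts_needed(500, 4, 1.5, 256))  # -> 7
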

  • Given a set of vCloud workloads, determine appropriate vCloud logical network design.
    • This section will require details on networking and networking security configuration for the ESXi hosts supporting the resource clusters.
      • These include details on MTU sizes when using VCD-NI or VXLAN (see the MTU sketch after this list).
      • Increasing the number of ports on a vDS from 128 to 4096 to allow vCD to dynamically create port groups.
      • An overview of vSS and vDS usage on the ESXi hosts for each cluster (if different)
      • vSS configuration: # of ports, # network adapters, Security settings, Port Groups with VLANs and security settings.
      • vDS configuration: # of ports, # network adapters, Security settings, Port Groups with VLANs and security settings and Bindings
    • Chapter 4.4 in the vCAT Architecting a VMware vCloud document helps with sizing the environment to be able to create the logical design.
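
As a quick aid for the MTU point above, this sketch checks the transport network MTU against the encapsulation overhead. The overhead figures (roughly 24 bytes for VCD-NI and 50 bytes for VXLAN) are the commonly cited ones; verify them against the current VMware documentation.

    # Commonly cited encapsulation overheads in bytes; treat as assumptions.
    OVERHEAD = {"vcd-ni": 24, "vxlan": 50}

    def required_transport_mtu(guest_mtu=1500, encap="vxlan"):
        """Minimum transport MTU so a full-size guest frame fits after encap."""
        return guest_mtu + OVERHEAD[encap]

    for encap in OVERHEAD:
        print(encap, "needs transport MTU >=", required_transport_mtu(encap=encap))
    # vxlan needs >= 1550; 1600 is the usual recommendation for headroom
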
  • Given a set of vCloud workloads, determine appropriate vCloud logical storage design.
    • Chapter 4.4 in the vCAT Architecting a VMware vCloud document helps with sizing the environment to be able to create the logical design.
    • Here you need to state how much storage the projected workloads need and whether different tiers of storage will be offered. These tiers need to be explained.
    • If available, create a list of VMs with their configured disk sizes, memory sizes, safety margins, and the average number and size of snapshots. I know this is almost impossible for public clouds, but it’s better to base a decision on something tangible rather than “oh, because I always used 2 TB datastores”.
      • Let’s say the storage vendor in use only wants 36 VMs per datastore (I can’t imagine why).
      • You have an estimate of the number of VMs that will be deployed or migrated into the environment and their disk sizes.
      • From those numbers you can size the datastores (the sketch after this list turns such inputs into a datastore count and size).
    • If you will be using Datastore Clusters, Storage DRS should be quite proficient at moving workloads around, so you mainly need to make sure you have enough resources in the cluster. The 32-datastores-per-cluster restriction and the shadow VMs created by Fast Provisioning might affect that design, though. Plan ahead for projected growth as well.
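
Here is that datastore sizing exercise as a Python sketch, using the 36-VMs-per-datastore cap from the example above; the VM count, average disk size, and margins are all hypothetical inputs.

    import math

    def datastore_design(vm_count, avg_disk_gb, safety_margin=0.20,
                         snapshot_headroom=0.10, vms_per_ds=36,
                         max_ds_per_cluster=32):
        """Datastore count and size from a per-datastore VM cap and estimates."""
        ds_count = math.ceil(vm_count / vms_per_ds)
        per_vm_gb = avg_disk_gb * (1 + safety_margin + snapshot_headroom)
        ds_size_gb = math.ceil(vms_per_ds * per_vm_gb)
        ds_clusters = math.ceil(ds_count / max_ds_per_cluster)  # 32-DS limit
        return ds_count, ds_size_gb, ds_clusters

    # 900 estimated VMs at 60 GB average disk, 20% margin, 10% snapshot room:
    print(datastore_design(900, 60))  # -> (25, 2808, 1)
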
  • Given a set of vCloud workloads, determine appropriate vCloud logical design.
    • vCloud logical design is an overview of the whole environment. The vCAT Architecting a VMware vCloud document has a great picture that shows just that:

[Image: VCAP CID 2-4-3 – vCloud logical design overview]
