
VCAP-CID Study Notes: Objective 2.5

Welcome to the VCAP-CID Study Notes. This is Objective 2.5 in the VCAP-CID blueprint Guide 2.8. The rest of the sections/objectives can be found here.

Bold items have higher importance, and copied text is in italics.

Knowledge

  • Compare and contrast vCloud allocation models.
  • Identify storage constraints for an Organization Virtual Datacenter (VDC).
    • When configuring storage for an Organization Virtual Datacenter, you can set a storage limit in GB. This is the storage used by VMs and catalog items in the organization vDC.
    • When using Fast Provisioning, there are constraints regarding the use of shadow VMs and their linked clones.
      • When a VM is created with Fast Provisioning, a shadow VM is created first; the VM is then created as a linked clone of that shadow VM.
      • This table (from the vCAT Architecting a VMware vCloud document) shows how linked clone placement behaves:

[Image: VCAP CID 2-5-1 – linked clone placement table]


Skills and Abilities

  • Determine applicable resource pool, CPU and memory reservations/limits for a vCloud logical design.
    • The allocation models apply most of this configuration automatically, and it should not be changed with the vSphere Client.
    • You could, however, create sub-resource pools for each allocation model.
  • Determine the impact of allocation model performance to a vCloud logical design.
    • This is based on the allocation model used:
      • Reservation Pool
      • Allocation Pool
        • Since the Allocation Pool is based on resource pool reservations, the same concepts apply as for Reservation Pools. The only difference is that users can't change limits, reservations, and shares at the VM level.
        • Here you don't have to worry as much, as the VMs will use the CPU/memory capacity configured, with a percentage of it reserved when VMs are powered on.
      • Pay-as-you-Go
        • A default resource pool with no configuration is created, but the VMs have limits based on the configured vCPU speed and reservations based on the percentage setting. A VM with 2 vCPUs would therefore have a limit of twice the vCPU speed (see the sketch after this list).
        • CPU limits at the VM level can result in high CPU ready times. In esxtop, %RDY shows ready time, and %MLMTD shows the percentage of time the VM was ready to run but wasn't scheduled because that would violate the CPU limit setting; %MLMTD should be 0%.
        • Memory reservation at the VM level is covered in this blog post from Frank Denneman: http://frankdenneman.nl/2009/12/08/impact-of-memory-reservation/
        • Here the performance penalty is mostly determined by the vCPU speed configured at the creation of the organization vDC.
          • Too small and you'll have a lot of slow VMs. You could fix this by adding more vCPUs, thereby increasing the limit, but then you might end up with VMs with too many vCPUs (creating a vCPU scheduling war).
          • If the vCPU speed is set to the MHz rating of the actual hosts, all the VMs will basically work as if they have no limit. But then you will need to give the organization a huge quota to be able to power on all its VMs, or leave the quota unlimited.
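
To make the Pay-as-you-Go math concrete, here is a minimal sketch in plain Python. The helper function and the numbers are illustrative assumptions, not values from the blueprint; vCloud Director applies these rules itself.

```python
# Sketch of the Pay-as-you-Go per-VM settings (illustrative only).

def payg_vm_settings(n_vcpus, vcpu_speed_mhz, cpu_guarantee,
                     vm_memory_mb, mem_guarantee):
    """Derive the limit/reservation vCloud Director would set on a PAYG VM."""
    cpu_limit = n_vcpus * vcpu_speed_mhz         # limit scales with vCPU count
    cpu_reservation = cpu_limit * cpu_guarantee  # % guarantee of that limit
    mem_limit = vm_memory_mb                     # capped at configured memory
    mem_reservation = vm_memory_mb * mem_guarantee
    return cpu_limit, cpu_reservation, mem_limit, mem_reservation

# A 2 vCPU VM with a 1000 MHz vCPU speed and a 20% guarantee:
print(payg_vm_settings(2, 1000, 0.20, 4096, 0.20))
# -> (2000, 400.0, 4096, 819.2): the CPU limit is double the vCPU speed.
```
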
  • Determine the impact to a given billing policy based on a selected allocation model.
    • There is no need for me to write about this, as the subject has been covered in this excellent post by Eiad Al-Aqqad.
  • Given service level requirements, determine appropriate allocation model(s).
    • First we need to point out that service levels can be different based on what they should cover:
      • Availability – Are the systems running? Based on the uptime of the systems in question.
      • Backups – RTO and RPO.
      • Serviceability – Initial response and initial resolution times.
      • Performance SLA – A certain amount of performance is needed: 15K disks, SSD disks, etc.
      • Compliance – Logging, ensuring the infrastructure complies with standards (PCI-DSS, etc.).
      • Operations – Time within which users can be added (if manual).
      • Billing – How long billing information is kept; depends on local law.
    • Each allocation model has its caveats regarding service levels:
      • Reservation Pool
        • DRS affinity rules cannot be set by users in the default vCloud UI; they would need to be spliced in, perhaps as part of a custom UI (Objective 2.4 has a link to a great example: http://vniklas.djungeln.se/2012/06/21/vm-affinity-when-using-vcloud-director-and-vapps/).
        • If the service level concerns the amount of resources available, a Reservation Pool has all of its resources reserved.
        • The availability SLA is the same for each cluster of ESXi hosts and does not differ between allocation models.
      • Allocation Pool
        • If you are using elastic mode, your VMs might be running in two separate DRS clusters, so you will need to keep that in mind.
        • If the service level concerns the amount of resources available, an Allocation Pool has part of its resources reserved.
      • Pay-as-you-Go
        • If the service level concerns the amount of resources available, PAYG has part of its resources reserved, but it is the most likely to have performance problems regarding CPU.
  • Given customer requirements, determine an appropriate storage provisioning model (thin/fast).
    • Thin provisioning is just that: VM disks only consume the space actually written, so they use less space on the datastores. If space efficiency is a requirement, it's a good fit.
    • Fast Provisioning, as I mentioned before, uses linked clone technology to create clones that read from a single shadow VM. Each VM reads from that master disk but writes to its own delta disk. This is great for creating multiple VMs at once, as they don't need a full copy of the template's disk (see the sketch below).
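
As a rough illustration of why fast provisioning saves space, here is a sketch with assumed numbers (the template size, VM count, and delta growth are all made up for the example):

```python
# Full clones vs. fast-provisioned linked clones (illustrative numbers).

template_disk_gb = 40    # size of the template's base disk
n_vms = 20               # clones to deploy
delta_gb_per_vm = 4      # estimated writes landing in each delta disk

full_clones_gb = n_vms * template_disk_gb
linked_clones_gb = template_disk_gb + n_vms * delta_gb_per_vm  # shadow VM + deltas

print(f"Full clones:   {full_clones_gb} GB")    # 800 GB
print(f"Linked clones: {linked_clones_gb} GB")  # 120 GB
```
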
  • Given a desired customer performance level, determine a resource allocation configuration.
    • Resource allocation configuration means the allocation models and how they are configured.
      • Reservation Pool
        • The customer gets resources reserved and can control how those resources are divided between workloads in that pool.
      • Allocation Pool
        • The customer gets part of the resources reserved, with the ability to burst up to a certain amount.
      • Pay-as-you-Go
        • The customer gets per-VM reservations and limited vCPU power. Great for test/dev or dynamic workloads (meaning created and then deleted after a short period of time).
    • Performance levels can also mean using different performance tiers (perhaps with different levels of service).
      • You can create different Provider vDCs with different CPU speeds and HA configurations, and perhaps add an SSD caching solution, to create different tiers.
      • You can also offer storage tiers based on different kinds of spinning disks, e.g. 10K, 15K, and 7.2K RPM, depending on the storage array and protocol used.
    • As an example (see the sketch after this list):
      • 3 tiers of Provider vDCs
        • Gold: Reservation Pool – Intel E7 processors – high-speed memory – HA at N+2 – SSD caching enabled.
        • Silver: Allocation Pool – Intel E5 processors – high-speed memory – HA at N+1.
        • Bronze: PAYG – Intel E5 processors – high-speed memory – HA turned off.
      • Storage
        • Gold: 15K HDD + SSD caching in the storage array
        • Silver: 10K HDD
        • Bronze: 7.2K HDD
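
A minimal sketch of how the tier catalog above could be captured as data; the structure and field names are my own illustration, not anything vCloud Director defines:

```python
# Illustrative tier catalog for the example above.
TIERS = {
    "Gold":   {"model": "Reservation Pool", "cpu": "Intel E7", "ha": "N+2",
               "storage": "15K HDD + SSD cache in the array"},
    "Silver": {"model": "Allocation Pool",  "cpu": "Intel E5", "ha": "N+1",
               "storage": "10K HDD"},
    "Bronze": {"model": "Pay-as-you-Go",    "cpu": "Intel E5", "ha": "off",
               "storage": "7.2K HDD"},
}

for name, spec in TIERS.items():
    print(f"{name}: {spec['model']}, HA {spec['ha']}, storage {spec['storage']}")
```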

VCAP-CID Study Notes: Objective 2.4

Welcome to the VCAP-CID Study Notes. This is Objective 2.4 in the VCAP-CID blueprint Guide 2.8. The rest of the sections/objectives can be found here.

Bold items have higher importance, and copied text is in italics. Please note that this post is one of the larger ones in this series.

Knowledge

  • Identify constraints of vSphere cluster sizing
    • Each vSphere cluster can have at most 32 hosts. But Provider vDCs can be elastic, so organizations can use multiple clusters.
    • This is from ESXi 5.1 Maximums:

[Image: VCAP CID 2-4-1 – ESXi 5.1 configuration maximums]

  • Identify constraints of Provider/Organization Virtual Datacenter relationships
    • You can only create 32 resource pools (organization vDCs) for the same organization. An organization can have more than one organization vDC backed by a single Provider vDC.
    • Elastic mode allows organizations to use multiple clusters (each cluster being a resource pool).
    • This piece of advice is always good to know, but it might be too strict if you have a good estimate of the growth of the environment or use elastic Provider vDCs:
      • As the number of hosts backing a provider virtual data center approaches the halfway mark of cluster limits, implement controls to preserve headroom and avoid reaching the cluster limits. For example, restrict the creation of additional tenants for this virtual data center and add hosts to accommodate increased resource demand for the existing tenants.
  • Identify capabilities of allocation models
  • Explain vSphere/vCloud storage features, functionality, and constraints.
    • vSphere storage features include (among others):
      • Storage IO control
        • Supported in a vCloud environment; this is a feature of the vSphere layer and doesn't impose a constraint on a vCloud design.
        • I will not explain these vSphere features at length, as I assume people know how they work and how they might impact a design.
      • Storage DRS (Storage vMotion)
        • Supported in vCloud Director 5.1
      • Storage Clusters
        • Supported in vCloud Director 5.1. You can add a storage cluster in the vCloud Director administrative page.
        • I recommend setting the same Storage Policy (Profile in 5.1) for each storage cluster.
        • Each storage cluster can only contain 32 datastores, but a Storage Policy (Profile in 5.1) can include datastores from multiple storage clusters.
          • So VMs for the same organization could reside in two different datastore clusters.
      • vSphere Flash Read Cache (vSphere 5.5)
      • vSphere Profile Driven Storage
      • All of these features are supported in vCloud environments; the exception is vFRC in 5.1, which is supported from vCloud Director 5.5.
      • Please note that only 64 ESXi hosts can access the same datastore at any given time, so in a large environment you might run into that constraint.
      • If you are not familiar with vSphere storage features, make sure to catch up on that subject.
    • vCloud storage features include:
      • The only real storage features used in vCloud Director are
        • Thin-provisioning
        • Fast-provisioning
        • Snapshots
          • A vSphere feature, but it is capped in vCloud Director to only one snapshot per VM.
          • Other capabilities include:
            • One snapshot per virtual machine is permitted.
            • NIC settings are marked read-only after a snapshot is taken.
            • Editing of NIC settings is disabled through the API after a snapshot is taken.
            • To take a snapshot, the user must have the vAPP_Clone user right.
            • Snapshot storage allocation is added to Chargeback.
            • vCloud Director performs a storage quota check.
            • REST API support is provided to perform snapshots.
            • Virtual machine memory can be included in the snapshot.
            • Full clone is forced on copy or move of the virtual machine, resulting in the deletion of the snapshot (shadow VMDK).
  • Explain the relationship between allocation models, vCloud workloads and resource groups.
    • First I want to recommend everyone read this document:
    • You should know VMware changed how CPU allocation works in Allocation Pools between 5.1 RTM (810718) and 5.1.2 (1068441).
      • In 810718 (5.1 RTM)
        • When configuring an organization vDC, neither the limit nor the reservation of the resource pool was set at creation.
        • Take 20 GHz capacity and a 50% guarantee, with 1 vCPU at 2 GHz. When one VM is powered on, the RP limit is set to 2 GHz and the reservation to 1 GHz. So you only have 10 vCPUs available in your environment before you hit the 20 GHz cap and VMs can no longer be powered on.
        • And when using a lower number for the vCPU, let's say 400 MHz, you get VMs that are CPU-constrained at first, as the RP limit is incremented in 400 MHz steps: the first VM has 400 MHz, two VMs have 800 MHz, three VMs have 1200 MHz, and so on.
      • In 868405 (5.1.1)
        • When configuring an organization vDC, the limit of the resource pool was set at creation, so at 20 GHz capacity and a 50% guarantee the resource pool would have a 20 GHz limit. No reservation was set on the RP; that happened when a VM was powered on.
        • With 20 GHz capacity, a 50% guarantee, and 1 vCPU at 2 GHz: one VM is powered on and the RP reservation is set to 1 GHz. You still only have 10 vCPUs available.
        • But now, if you used lower vCPU numbers, you could create VMs as you wanted: 400 MHz per vCPU would allow 50 vCPUs in a 20 GHz capacity RP.
        • The first VMs created now have the whole CPU capacity to use, so they are not as constrained.
        • But this means that if the Allocation Pool was elastic (spanning multiple clusters), each RP in each cluster would have its limit set to the initial capacity, allowing organizations to use more resources than initially configured.
        • Massimo Re Ferre' has a great post on what changed between 5.1 and 5.1.1: http://it20.info/2012/10/vcloud-director-5-1-1-changes-in-resource-entitlements/
      • In 1068441 (5.1.2)
    • Allocation Pool (as it works in 5.1.3)
      • What kind of resource pool does this pool create?
        • A sub-resource pool under the Provider vDC resource pool. The pool is configured with the configured CPU capacity as its limit, leaving the CPU reservation unchanged. The memory limit and reservation are also unchanged (with Expandable and Unlimited selected as well).
      • What happens when a virtual machine is turned on?
        • When a VM is turned on, the sub-resource pool's memory limit is left unchanged (Expandable Reservation and Unlimited selected), and its reservation is increased by the VM's configured memory size times the percentage guarantee for that organization vDC.
          • Please note that even though the limit is not set on the resource pool, vCloud Director will not power on VMs that would break the configured memory capacity for the pool.
        • The CPU reservation is increased by the number of vCPUs configured for the virtual machine, times the vCPU speed specified at the organization vDC level, times the CPU percentage guarantee set at the organization vDC level. The virtual machine is reconfigured to set its memory and CPU reservations to zero, and is then placed (see the sketch after this list).
      • Does this allocation model have any special features?
        • Elasticity: Can span multiple Provider Resource pools.
    • Pay-as-you-Go
      • What kind of resource pool does this pool create?
        • A sub-resource pool is created with zero reservation and an unlimited limit.
      • What happens when a virtual machine is turned on?
        • When a VM is turned on, the VM's memory limit is increased by the VM's configured memory size, and its reservation is increased by the VM's configured memory size times the percentage guarantee for that organization vDC. The resource pool reservation is also increased by the same amount plus the VM overhead.
        • The CPU limit on the VM is increased by the number of vCPUs the virtual machine is configured with times the vCPU frequency specified at the organization vDC level, and the CPU reservation is increased by the number of vCPUs times that vCPU frequency times the CPU percentage guarantee set at the organization vDC level. The resource pool reservation is also increased by the same amount.
      • Does this allocation model have any special features?
        • No resources are reserved ahead of time, so VMs might fail to power on if there aren't enough resources.
    • Reservation Pool
      • What kind of resource pool does this pool create?
        • A sub-resource pool is created with the limit and reservation configured at the organization vDC level.
      • What happens when a virtual machine is turned on?
        • Reservation and limit are not modified. The organization can change these settings at the per-VM level with this allocation model.
      • Does this allocation model have any special features?
        • Cannot be elastic across multiple Provider resource pools.
        • Creation will fail if the resources in the Provider resource pool are insufficient.
        • Users can set shares, limits, and reservations on virtual machines.
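
To tie the Allocation Pool behaviour together, here is a minimal sketch in plain Python of the power-on bookkeeping, reproducing the 20 GHz / 50% / 2 GHz example from above. The function and the VM sizes are illustrative; vCloud Director does this itself.

```python
# Sketch of Allocation Pool sub-resource-pool increments on VM power-on.

def power_on(rp, n_vcpus, vcpu_speed_mhz, cpu_guarantee,
             vm_memory_mb, mem_guarantee):
    """Grow the sub-resource pool reservations for one powered-on VM."""
    rp["cpu_reservation_mhz"] += n_vcpus * vcpu_speed_mhz * cpu_guarantee
    rp["mem_reservation_mb"] += vm_memory_mb * mem_guarantee
    # The VM itself is reconfigured with zero CPU/memory reservation.

# 20 GHz capacity, 50% guarantee, 2 GHz per vCPU, 4 GB VMs (50% guarantee):
rp = {"cpu_limit_mhz": 20000, "cpu_reservation_mhz": 0, "mem_reservation_mb": 0}
vcpus = 0
while (vcpus + 1) * 2000 <= rp["cpu_limit_mhz"]:  # power-on check vs. capacity
    power_on(rp, 1, 2000, 0.5, 4096, 0.5)
    vcpus += 1

print(vcpus, rp["cpu_reservation_mhz"])  # 10 vCPUs fit, 10000 MHz reserved
```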


Skills and Abilities

  • Given a set of vCloud workloads, determine appropriate DRS/HA configuration for resource clusters.
    • These are the available HA configurations:
      • Host failures tolerated
      • % of Cluster resources reserved
      • Specify Failover Host
    • This really depends on the allocation model used with the cluster; let's say a whole cluster uses the same allocation model (for simplicity's sake).
      • Reservation Pool
        • The resource pools will have reservations and limits. The VM reservations and shares are controlled by the users of that organization, so I recommend using the percentage-based HA mode. That makes HA take the reservation of each VM into account when calculating the current failover capacity.
        • If you used "Host failures tolerated" at its default setting, it would use the VM with the most reserved CPU and memory as the slot size. You would need to know how many resources the users are going to reserve to be able to manually set slot sizes with advanced settings, so it's not a very flexible option.
      • Allocation Pool
        • The resource pools will have reservations and limits based on the configuration of the organization vDC. The VMs will not be configured with a limit, so the slot size will be the default of 32 MHz for CPU and 0 MB plus overhead for memory.
        • Allocation Pool VMs can also vary greatly in size, and again you would need a good (no, a great) idea of the sizes of the VMs that will run there to be able to use advanced settings for "Host failures tolerated".
        • A percentage-based HA policy is probably the better choice of the two.
      • Pay-as-you-Go
        • In PAYG, the VMs have a CPU limit set to the configured vCPU speed, but the memory reservation depends on the percentage-guaranteed setting chosen at the creation of the organization vDC. So you will have a very predictable CPU limit (and slot size), but the memory slot size will depend on the size of the largest VM in the cluster.
        • So if you have large VMs with, say, 32 GB of memory and any percentage guaranteed, say 20% (the default), the slot size will be 1 GHz and 6.4 GB. That is not a good slot size, as most VMs will be much smaller.
        • One way of making "Host failures tolerated" acceptable is to use a 0% guarantee, so every VM must fight for the memory in the resource pool.
        • But it is best to use "% of cluster resources", since it is the de facto standard in most HA clusters and delivers the most flexibility of the three options.
          • It takes all the available resources in a cluster and adds them up, then subtracts the percentage configured in the HA settings. HA then adds up all reserved resources in use (powered-on VMs only), using a 32 MHz default for each VM's CPU; memory is reservation plus overhead. This value is then used to calculate the failover capacity (see the sketch after this list).
          • It's best to let the experts explain this, as they wrote a book on it, so I recommend reading the vSphere 5.1 Clustering Deepdive book.
          • Or read their blogs on HA. This post by Duncan Epping explains percentage-based HA very well: http://www.yellow-bricks.com/vmware-high-availability-deepdiv/
    • To be able to use (anti-)affinity rules, you will need PowerCLI or other automation to make sure certain vApps are deployed to different/the same hosts.
    • DRS configuration
      • First, I must mention that you HAVE to enable DRS on the vCloud provider clusters, as that is the only way to be able to create resource pools.
      • And it is a best practice to use a different Provider vDC for each allocation model:
        • a Pay-as-you-Go cluster, an Allocation Pool cluster, and a Reservation Pool cluster.
        • But let's be realistic: that will not be the case for many installations. Not everybody has the budget to create three different clusters. You can create sub-resource pools for each model, but that complicates DRS scheduling immensely, as Frank Denneman explains in this blog post:
      • DRS moves VMs between ESXi hosts in a cluster based on the resources used in the cluster. That is a very simplified description, as DRS uses many different metrics and algorithms to calculate if and when a VM should be moved.
      • I'm not going to explain in detail how DRS works, as you can read up on that in various books and documentation.
      • As you might expect, using different resource pools can affect DRS calculations. You might have a Reservation Pool using half of the resources and an Allocation Pool or PAYG for the rest.
      • When configuring DRS settings, it is best in most cases to set the mode to automatic and just let DRS do its thing.
      • Read up on how DRS works, and of course if you really want to know more, pick up Duncan's and Frank's book.
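
A minimal sketch, in plain Python with illustrative numbers, of the "% of cluster resources" arithmetic described above:

```python
# Sketch of percentage-based HA admission control (illustrative only).

def failover_capacity_pct(total_cpu_mhz, total_mem_mb,
                          vm_cpu_res_mhz, vm_mem_res_mb, mem_overhead_mb):
    """Return (CPU %, memory %) of cluster capacity still unreserved."""
    # Per-VM CPU: the reservation, or the 32 MHz default if none is set.
    used_cpu = sum(max(r, 32) for r in vm_cpu_res_mhz)
    # Per-VM memory: reservation plus overhead.
    used_mem = sum(r + mem_overhead_mb for r in vm_mem_res_mb)
    cpu_pct = (total_cpu_mhz - used_cpu) / total_cpu_mhz * 100
    mem_pct = (total_mem_mb - used_mem) / total_mem_mb * 100
    return cpu_pct, mem_pct

# 4 hosts x 20 GHz / 128 GB, ten powered-on VMs with no CPU reservations
# and 2 GB memory reservations (overhead assumed to be 200 MB per VM):
cpu_pct, mem_pct = failover_capacity_pct(80000, 4 * 131072,
                                         [0] * 10, [2048] * 10, 200)
reserved_pct = 25  # the percentage configured in the HA settings
print(cpu_pct > reserved_pct and mem_pct > reserved_pct)  # True: power-on allowed
```
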
  • Given a set of vCloud workloads, determine appropriate host logical design.
    • This means creating a logical design for the hosts that are part of the resource cluster.
    • Here you have the configuration of the ESXi hosts for the resource clusters; it depends on the projected number of workloads, their resource usage, and the availability requirements.
      • Chapter 4.4 in the vCAT Architecting a VMware vCloud document describes this process.
    • This table from a VMware Partner SET document is a good overview of what you would want as part of a host logical design:

[Image: VCAP CID 2-4-2 – host logical design overview table]

  • Given a set of vCloud workloads, determine appropriate vCloud logical network design.
    • This section requires details on the networking and network security configuration of the ESXi hosts supporting the resource clusters.
      • This includes details on MTU sizes when using VCD-NI or VXLAN.
      • Increasing the number of ports on a vDS from 128 to 4096 to allow vCD to dynamically create port groups.
      • An overview of vSS and vDS usage on the ESXi hosts for each cluster (if they differ).
      • vSS configuration: number of ports, number of network adapters, security settings, and port groups with VLANs and security settings.
      • vDS configuration: number of ports, number of network adapters, security settings, port groups with VLANs and security settings, and bindings.
    • Chapter 4.4 in the vCAT Architecting a VMware vCloud document helps with sizing the environment to be able to create the logical design.
  • Given a set of vCloud workloads, determine appropriate vCloud logical storage design.
    • Chapter 4.4 in the vCAT Architecting a VMware vCloud document helps with sizing the environment to be able to create the logical design.
    • Here you need to state how much storage is needed by the projected workloads, and whether different tiers of storage will be offered. These tiers need to be explained.
    • If available, create a list of VMs with their configured disk sizes, memory sizes, safety margins, and the average number and size of snapshots. I know this is almost impossible for public clouds, but it's better to base a decision on something tangible rather than "oh, because I always used 2 TB datastores".
      • Let's say the storage vendor used only wants 36 VMs per datastore (I couldn't imagine why).
      • You have created an estimate of the number of VMs that will be deployed/migrated into the environment and their disk sizes.
      • From those numbers you can size the datastores (see the sketch below).
    • If you will be using datastore clusters, they should be quite proficient at moving workloads around, so you only need to make sure you have enough resources in the cluster. The 32-datastores-per-cluster restriction and the shadow VMs for fast provisioning might affect that design, though. Plan ahead for projected growth as well.
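
A quick sizing sketch in plain Python; the VM counts, sizes, and margins are assumptions for illustration only:

```python
# Rough datastore sizing from a workload estimate (illustrative numbers).
import math

n_vms = 300              # projected VMs
avg_disk_gb = 60         # average configured disk per VM
avg_mem_gb = 8           # swap file space roughly equals VM memory
snapshot_margin = 0.10   # average snapshot overhead
safety_margin = 0.20     # free-space headroom
vms_per_datastore = 36   # the vendor guidance from the example above

per_vm_gb = (avg_disk_gb + avg_mem_gb) * (1 + snapshot_margin)
datastore_gb = math.ceil(vms_per_datastore * per_vm_gb * (1 + safety_margin))
n_datastores = math.ceil(n_vms / vms_per_datastore)

print(f"{n_datastores} datastores of roughly {datastore_gb} GB each")
```
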
  • Given a set of vCloud workloads, determine appropriate vCloud logical design.
    • The vCloud logical design is an overview of the whole environment. The vCAT Architecting a VMware vCloud document has a great picture that shows just that:

[Image: VCAP CID 2-4-3 – vCloud logical design overview]

VCAP-CID Study Notes: Objective 2.3

Welcome to the VCAP-CID Study Notes. This is Objective 2.3 in the VCAP-CID blueprint Guide 2.8. The rest of the sections/objectives can be found here.

Bold items have higher importance, and copied text is in italics.

Knowledge

  • Identify management components required for a vCloud design
    • From the vCAT Architecting a VMware vCloud document:
      • Core management cluster components include the following:
      • vCenter Server or VMware vCenter Server Appliance™
      • vCenter Server database
      • vCloud Director cells
        • (plus NFS storage for a multi-cell environment)
      • vCloud Director database
      • vCloud Networking and Security Manager (one per resource group vCenter Server)
      • vCenter Chargeback Manager
      • vCenter Chargeback database
      • VMware vCenter Update Manager™
      • vCenter Orchestrator
      • VMware vCloud Networking and Security Edge gateway appliances deployed by vCloud Director through vCloud Networking and Security Manager as needed, residing in the resource groups, not in the management cluster.
    • And more from the same document:
      • The following management cluster components are optional:
      • VMware vCenter Server Heartbeat™
      • vCloud Automation Center
      • vCloud Connector
      • VMware vFabric RabbitMQ™
      • vFabric Application Director
      • VMware vFabric Hyperic® HQ
      • VMware vSphere Management Assistant
      • vCenter Operations Manager
      • vCenter Configuration Manager
      • vCenter Infrastructure Navigator
      • vCenter Site Recovery Manager
      • Databases for optional components
      • Optional components are not required by the service definition but are highly recommended to increase the operational efficiency of the solution.
        The management cluster can also include virtual machines or have access to servers that provide infrastructure services such as directory (LDAP), timekeeping (NTP), networking (DNS, DHCP), logging (syslog), and security.
  • Identify management component availability requirements.
    • Most of the vCloud components have High Availability features built in.

[Image: VCAP CID 2-3-1 – vCloud component availability options]

      • vCenter Server or VMware vCenter Server Appliance™
        • vCenter Heartbeat – it's EOA (end of availability) but will be replaced by availability features in future vCenter releases.
      • vCenter Server database
        • SQL cluster (mirroring or AlwaysOn etc.) or Oracle RAC.
      • vCloud Director cells
        • You can cluster vCloud Director cells, but you will need NFS storage as a common location for vApp uploads.
      • NFS storage
        • Either a VM running NFS or even a hardware NAS.
      • vCloud Director database
        • SQL cluster (mirroring or AlwaysOn etc.) or Oracle RAC.
      • vCloud Networking and Security Manager (one per resource group vCenter Server)
        • One of the few components that can only be covered by vSphere HA.
      • vCenter Chargeback Manager
        • See the picture
      • vCenter Chargeback database
        • See the picture.
      • VMware vCenter Update Manager™
        • There is no real need for availability here; it is used when patching ESXi hosts.
      • vCenter Orchestrator
        • It has a cluster mode; this KB article says it all: 2079967.
      • VMware vCloud Networking and Security Edge gateway appliances deployed by vCloud Director through vCloud Networking and Security Manager as needed, residing in the resource groups, not in the management cluster.

Skills and Abilities

  • Design a management cluster given defined availability constraints.
    • I think I covered the constraints pretty well in the last bullet.
    • Please make sure to include all vSphere-related features, HA and FT.
    • Other than those, these "best practices" are from the vCAT Architecting a VMware vCloud document:
      • NFS storage for the vCloud Director cells should be at least 250 GB.
      • Use three ESXi hosts for a management cluster.
      • Use HA with a percentage-based reservation. N+2 is also an option for even higher availability.
      • Network component and path redundancy.
      • Configure redundancy at the host (connector), switch, and storage array levels.
  • Size a management cluster based on required management components for a given vCloud design.
    • There is a nice table in the vCAT Architecting a VMware vCloud document:

[Image: VCAP CID 2-3-2 – management cluster sizing table]

  • Design a management cluster that meets the needs of a given resource configuration.
    • Resource configuration is based on the size of the resource cluster and the number of VMs (or at least I think that's what they are referring to).
    • So the vCloud maximums are something you need to be aware of:
      Knowledge Base Article 2036392.
    • But there is a slight chance they want you to take the sizing of the management components and use that to design a management cluster (how many hosts, CPU, memory, network, storage). Either way, it's best to know about both 🙂 A rough sizing sketch follows below.
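
As a back-of-the-envelope illustration, management cluster sizing might look like the sketch below. All component figures and the host size are assumptions, not numbers from vCAT, and it assumes a 1:1 vCPU-to-core ratio for simplicity:

```python
# Rough management cluster host count (illustrative figures only).
import math

# Assumed per-component (vCPU, RAM GB) requirements:
components = {
    "vCenter Server": (4, 16), "vCenter DB": (4, 16),
    "vCloud Director cell 1": (2, 8), "vCloud Director cell 2": (2, 8),
    "vCloud Director DB": (4, 16), "vCNS Manager": (2, 8),
    "Chargeback Manager + DB": (4, 12), "vCenter Orchestrator": (2, 4),
}

need_vcpu = sum(c[0] for c in components.values())
need_ram = sum(c[1] for c in components.values())

host_cores, host_ram_gb = 16, 96  # assumed host size
ha_spare_hosts = 2                # N+2, as recommended above

hosts = max(math.ceil(need_vcpu / host_cores),
            math.ceil(need_ram / host_ram_gb)) + ha_spare_hosts
print(f"{need_vcpu} vCPU / {need_ram} GB RAM -> {hosts} hosts (including N+2)")
```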