Objective 4.3 Topics:
- Analyze and Resolve DRS/HA faults
- Troubleshoot DRS/HA Configuration Issues
- Troubleshoot Virtual SAN/HA Interoperability
- Resolve vMotion and Storage vMotion Issues
- Troubleshoot VMware Fault Tolerance
Analyze and Resolve DRS/HA Faults
DRS faults indicate the reasons that prevent DRS actions.
DRS Faults
Virtual Machine is Pinned – Occurs when DRS can’t move a VM because DRS is disabled on that VM.
Virtual Machine Not Compatible with ANY Host – Fault occurs when DRS can’t find a host that can run the VM. This might mean that there are not enough physical compute resources or disk available to satisfy the VM’s requirements.
VM/VM DRS Rule Violated When Moving to Another Host – Fault occurs when more than one virtual machine running on the same host share an affinity rule with each other, so they can’t be moved to another host.
Host Incompatible with Virtual Machine – Fault occurs when a VM can’t be migrated to or run on the destination host. There are three common reasons. First, the destination host might not have access to the network or storage the VM needs to maintain connectivity. Second, the CPUs might differ so much between the hosts that vMotion is not supported. Third, a required VM/Host DRS rule might be in place that tells DRS to never run this particular VM on that host.
Host has Virtual Machine That Violates VM/VM DRS Rules – Occurs when powering on or moving a VM would violate a VM/VM DRS rule. The VM can still be powered on or moved manually; vCenter just will not do it automatically.
Host has Insufficient Capacity for Virtual Machine – Occurs when the host does not have enough CPU or memory capacity to satisfy the VM’s requirements.
Host in Incorrect State – Occurs when a host is entering maintenance or standby state when it’s needed for DRS.
Host has Insufficient Number of Physical CPUs for a Virtual Machine – Occurs when the VM has more virtual CPUs than the host has physical CPUs (hyperthreads).
Host has Insufficient Capacity for Each Virtual Machine CPU – Occurs when the host does not have enough CPU capacity to run each of the VM’s virtual CPUs.
No Active Host in Cluster – Occurs when moving a virtual machine to a cluster that doesn’t have any hosts that are connected and out of maintenance mode.
Insufficient Resources – Occurs when an operation conflicts with a resource configuration policy. Example: trying to power on a 4GB VM in a resource pool that has a 2GB limit and is not expandable.
Insufficient Resources to Satisfy Configured Fail-over Level for HA – Occurs when a DRS operation would violate, or can’t meet, the CPU or memory resources that HA reserves for failover (Admission Control). This can occur when a host is trying to enter maintenance mode or standby mode, or when powering on a VM would violate the fail-over level.
No Compatible Hard Affinity Host – No host available to satisfy a mandatory VM/Host DRS affinity/anti-affinity rule.
No Compatible Soft Affinity Host – No host available to satisfy a preferred VM/Host DRS affinity/anti-affinity rule.
Soft Rule Violation Correction Disallowed – DRS migration threshold is set to mandatory only. This does not allow the generation of DRS actions to correct a non-mandatory VM/Host DRS affinity rule.
Soft Rules Violation Correction Impact – Correcting the non-mandatory VM/Host DRS affinity rules does not occur because it impacts performance.
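Several of the faults above come down to simple comparisons between what the VM asks for and what the host can offer. Here is a minimal Python sketch of that idea. The `Host` and `VM` records and their field names are hypothetical and are not vSphere API objects, and real DRS weighs far more than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical, simplified view of a host (not a vSphere API object).
    logical_cpus: int            # physical CPUs (hyperthreads)
    free_cpu_mhz: int
    free_memory_mb: int
    entering_maintenance_or_standby: bool = False


@dataclass
class VM:
    # Hypothetical, simplified view of a VM.
    vcpus: int
    cpu_demand_mhz: int
    memory_mb: int
    drs_enabled: bool = True


def placement_faults(vm: VM, host: Host) -> List[str]:
    """Return the fault names from the list above that would block this placement."""
    faults = []
    if not vm.drs_enabled:
        faults.append("Virtual Machine is Pinned")
    if host.entering_maintenance_or_standby:
        faults.append("Host in Incorrect State")
    if vm.vcpus > host.logical_cpus:
        faults.append("Host has Insufficient Number of Physical CPUs for a Virtual Machine")
    if vm.cpu_demand_mhz > host.free_cpu_mhz or vm.memory_mb > host.free_memory_mb:
        faults.append("Host has Insufficient Capacity for Virtual Machine")
    return faults
```

Calling `placement_faults(VM(vcpus=16, cpu_demand_mhz=2000, memory_mb=4096), Host(logical_cpus=8, free_cpu_mhz=4000, free_memory_mb=8192))` would report only the physical CPU fault, since the VM asks for more vCPUs than the host has hyperthreads.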
Troubleshoot DRS/HA Configuration Issues
Troubleshooting Steps:
- Check for known issues. vSphere isn’t perfect but it’s close. 😉
- Ensure that HA is properly configured.
- Verify that network connectivity is working between the hosts and vCenter.
- Verify that the hosts are properly connected to vCenter.
- Verify the datastore used for heart-beating is accessible by all the hosts in the cluster.
- Verify that all the configuration files of the FDM agent were pushed successfully to the hosts.
- FDM configuration file location on the hosts: /etc/opt/vmware/fdm
- Increase the FDM log output to verbose.
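The network connectivity checks in the steps above can be roughly automated. The sketch below only tests whether a TCP port accepts connections, using the Python standard library. Which port to probe is an assumption on my part: 443 (host management) and 8182 (the HA agent) are the ones I would try first, but verify them against the port list for your vSphere version. The host names in the usage note are placeholders.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_hosts(hosts, port, timeout=3.0):
    """Return the hosts from the list that did not accept a TCP connection."""
    return [h for h in hosts if not can_reach(h, port, timeout)]
```

For example, `unreachable_hosts(["esx01.example.com", "esx02.example.com"], 8182)` returns whichever of the two placeholder hosts did not answer. A successful TCP connect only shows that the port is open, not that the agent behind it is healthy, so treat this as a first pass before reading the logs.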
Troubleshoot vSAN and HA Interoperability
To enable both Virtual SAN and vSphere HA on a cluster, Virtual SAN must be enabled first, followed by vSphere HA. You cannot enable Virtual SAN if vSphere HA is already enabled. To disable Virtual SAN on a cluster that also has vSphere HA enabled, first disable vSphere HA. Only then can Virtual SAN be disabled.
Changing the vSphere HA Network – If both vSAN and HA are enabled on a cluster and the HA network changes, a manual HA reconfiguration is required so that the hosts learn about the change.
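The enable/disable ordering is easy to get backwards, so here is a toy Python model of it. The `Cluster` class is purely illustrative and does not talk to vCenter; it only encodes the rule that Virtual SAN can be switched on or off while vSphere HA is off.

```python
class Cluster:
    """Toy model of the enable/disable ordering for Virtual SAN and vSphere HA."""

    def __init__(self):
        self.vsan_enabled = False
        self.ha_enabled = False

    def set_vsan(self, enabled: bool) -> None:
        # Virtual SAN can only be enabled or disabled while vSphere HA is off.
        if self.ha_enabled:
            raise RuntimeError("Disable vSphere HA before changing Virtual SAN")
        self.vsan_enabled = enabled

    def set_ha(self, enabled: bool) -> None:
        # In this model HA can be toggled at any time.
        self.ha_enabled = enabled
```

The working order is `set_vsan(True)` then `set_ha(True)` to build the cluster, and `set_ha(False)` then `set_vsan(False)` to tear it down. Any other order raises the error.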
Resolve vMotion and Storage vMotion Issues
Storage DRS is Disabled on a Virtual Disk – Occurs when the datastore cluster has Storage DRS enabled but some VMDKs in the cluster have Storage DRS disabled.
Reasons for SDRS to be disabled on a virtual disk:
- The VM’s swap file location is host-local (the swap file is stored in a specified datastore on the host).
- A specific location is set for the VM’s .vmx swap file.
- Storage vMotion is disabled
- The main disk of the VM is protected by HA and relocating will cause loss of HA protection
- The disk is a CD/ISO file
- The disk is an independent disk
- VM has system files on separate datastores.
- VM has hidden disks, such as disks in previous snapshots
- VM is a template
- VM has Fault tolerance enabled
- VM is sharing files between disks
Datastore Can’t Enter Maintenance Mode
One or more disks can’t be migrated with Storage vMotion.
- Storage vMotion is disabled on a disk
- Storage DRS rules are preventing Storage DRS from making migration recommendations for a disk
How to fix?
- If SDRS is disabled, determine why it was disabled, then enable it.
- Remove or disable any rules that are preventing SDRS from migrating the disks.
- Set the SDRS advanced option IgnoreAffinityRulesForMaintenance to 1.
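To make the effect of these fixes concrete, here is a small Python sketch that lists which disks would keep a datastore out of maintenance mode. The `Disk` record and the `ignore_affinity_rules` flag are hypothetical stand-ins: the flag models setting IgnoreAffinityRulesForMaintenance to 1, and nothing here calls the real Storage DRS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    # Hypothetical record describing one virtual disk on the datastore.
    name: str
    sdrs_enabled: bool = True
    blocked_by_rule: bool = False   # a Storage DRS rule prevents the migration


def maintenance_blockers(disks: List[Disk], ignore_affinity_rules: bool = False) -> List[str]:
    """List the disks that would stop the datastore from entering maintenance mode.

    ignore_affinity_rules=True models IgnoreAffinityRulesForMaintenance = 1.
    """
    blockers = []
    for disk in disks:
        if not disk.sdrs_enabled:
            blockers.append(f"{disk.name}: Storage DRS disabled")
        elif disk.blocked_by_rule and not ignore_affinity_rules:
            blockers.append(f"{disk.name}: blocked by Storage DRS rule")
    return blockers
```

Note that the advanced option only helps with rule-based blockers. A disk with Storage DRS disabled still shows up in the list either way.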
Storage DRS Cannot Operate on a Datastore
- Datastore is shared across multiple datacenters
- Datastore is connected to an unsupported host
- Datastore is connected to a host that is not running Storage I/O Control
How to fix?
- Datastore must be visible in only one datacenter
- Verify all hosts connected to the datastore cluster are running ESXi 5.0+
- Verify all hosts connected to the datastore cluster have Storage I/O Control enabled.
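These three checks can be expressed as one small function. This is a sketch under stated assumptions: the inventory is passed in as plain Python data, and the dictionary keys (`name`, `esxi_version`, `sioc_enabled`) are names I made up for illustration rather than fields of any VMware SDK.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '5.5' or '5.5.0' into a tuple so versions compare numerically."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))   # pad short versions such as '5' to (5, 0, 0)
    return tuple(parts)


def sdrs_prerequisite_problems(datacenters: List[str], hosts: List[dict]) -> List[str]:
    """Return the reasons Storage DRS could not operate on a datastore.

    datacenters: names of the datacenters in which the datastore is visible.
    hosts: dicts with the hypothetical keys 'name', 'esxi_version', 'sioc_enabled'.
    """
    problems = []
    if len(set(datacenters)) > 1:
        problems.append("datastore is visible in more than one datacenter")
    for host in hosts:
        if parse_version(host["esxi_version"]) < (5, 0, 0):
            problems.append(f"{host['name']}: ESXi older than 5.0")
        if not host["sioc_enabled"]:
            problems.append(f"{host['name']}: Storage I/O Control not enabled")
    return problems
```

An empty list means none of the three conditions was hit. Versions are compared as tuples of integers rather than strings so that something like 10.0 would not sort below 5.0.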
Storage DRS Generates Faults During VM Creation
- Review or remove any rule violations and retry the creation of the VM.
Applying Storage DRS Recommendations Fails
- The Thin Provisioning Threshold Crossed alarm is triggered for target datastore, which means the datastore is running out of space.
- Target datastore might be in maintenance mode
How to fix?
- Rectify the issue that caused the Thin Provisioning Threshold Crossed alarm
- Remove target datastore from maintenance mode
Troubleshoot VMware Fault Tolerance
Hardware Virtualization Not Enabled
- Enable it in the BIOS of the host (if compatible)
Compatible Hosts Not Available for Secondary VM
- Add more hosts to the cluster
- Verify hosts aren’t in maintenance mode
- Verify available resources including disk space
- Verify HV is enabled on all the hosts in the cluster
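The checklist above amounts to filtering the cluster for hosts that could take the Secondary VM. Below is a minimal Python sketch of that filter. The dictionary keys are hypothetical, the rule that the Secondary VM needs a different host from the Primary is my own addition based on how FT works, and disk space is left out to keep the example short.

```python
from typing import List


def ft_secondary_candidates(hosts: List[dict], primary_host: str, needed_memory_mb: int) -> List[str]:
    """Return the names of hosts that could plausibly run the Secondary VM.

    Each host is a dict with the hypothetical keys 'name', 'in_maintenance',
    'hv_enabled' and 'free_memory_mb'.
    """
    return [
        h["name"]
        for h in hosts
        if h["name"] != primary_host              # Secondary runs on a different host
        and not h["in_maintenance"]               # host must not be in maintenance mode
        and h["hv_enabled"]                       # hardware virtualization enabled in BIOS
        and h["free_memory_mb"] >= needed_memory_mb
    ]
```

If the function returns an empty list, the fixes are the ones listed above: add hosts, take hosts out of maintenance mode, free up resources, or enable HV.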
Secondary VM causing poor performance on Primary VM
- Verify the secondary VM isn’t running on an over-committed host
- Change the secondary VM’s datastore if disk contention is the issue
- Move the secondary VM to another host
- Apply resource limits to the other VMs on the host running the secondary VM, or investigate what those VMs are doing.
Increased Network Latency Observed in FT VMs
- Verify there is sufficient bandwidth between hosts.
- Verify the network link is functioning optimally.
Some Hosts are Overloaded with FT VMs
DRS does not load balance Fault Tolerance-enabled VMs unless they are running in legacy mode.
- Manually move FT VMs to redistribute the workload across other hosts.
Turning on FT for Powered-On VMs Fails
- Check to see if there’s available memory to perform the operation.
- Move the VM to a host that has available resources.
FT VMs not Placed or Evacuated by DRS
- Check the VM(s) for a VM override that disables DRS.