ESX / ESXi Slow Boot – UWConflictRetries
Are some of your ESX / ESXi hosts taking a long time to boot? On some of our ESXi 4.0 Update 1 hosts we are seeing boot times of over 10 minutes. Watching the boot process, we found that the issue is storage-related: the ESXi DCUI appears to hang at “Loading module multiextent”.
Here is why:
During the boot process an ESX(i) host rescans its accessible LUNs. If you are using MSCS clusters with Raw Device Mappings (RDMs), you will likely experience a lengthy delay during this scan. The delay is caused by a timeout condition during the rescan of the RDM LUNs: those LUNs are reserved by the active MSCS node, so the booting host runs into SCSI reservation conflicts and keeps retrying. In my experience you will see the slow boot issue on every host that is zoned to see the MSCS RDM LUN(s). For example, if you have a 15-host DRS/HA cluster and are using MSCS with RDMs within that cluster, all 15 hosts will boot slowly. This happens because all hosts in a DRS/HA cluster should have access to all common datastores.
You can mitigate this issue (though not resolve it) by applying a parameter change recommended in VMware internal KB 1016106. On ESX(i) 4 Update 1 hosts the Scsi.UWConflictRetries parameter has a default value of 1000, and this high retry count increases the time spent enumerating LUNs and VMFS volumes.
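To get a feel for why the retry count matters, here is a rough back-of-the-envelope model in Python. It is not measured data and not how ESX(i) actually schedules retries: it simply assumes, for illustration, that each RDM LUN can consume up to Scsi.UWConflictRetries retries during the boot-time scan and that each retry costs a fixed, hypothetical number of seconds. The only numbers taken from this post are the default (1000) and the recommended value (80).

```python
# Rough illustrative model, NOT measured data.
# Assumption: each MSCS RDM LUN can use up to `retries` reservation-conflict
# retries during the boot-time scan, each costing a hypothetical fixed time.


def worst_case_retry_seconds(rdm_luns: int, retries: int, seconds_per_retry: float) -> float:
    """Upper bound on time spent retrying reservation conflicts across all RDM LUNs."""
    return rdm_luns * retries * seconds_per_retry


def retry_reduction(default: int = 1000, tuned: int = 80) -> float:
    """Fraction of the retry budget removed by lowering Scsi.UWConflictRetries."""
    return 1 - tuned / default


if __name__ == "__main__":
    # Hypothetical inputs: 4 RDM LUNs, 0.05 s per retry (illustrative only).
    before = worst_case_retry_seconds(4, 1000, 0.05)
    after = worst_case_retry_seconds(4, 80, 0.05)
    print(f"default: {before:.0f} s, tuned: {after:.0f} s, reduction: {retry_reduction():.0%}")
```

Whatever the real per-retry cost is, going from 1000 to 80 shrinks the retry budget by 92%, which is why the change shortens the boot without removing the underlying reservation conflicts.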
To shorten the boot process, lower this value to 80 by following the steps below:
Select the host and click Configuration -> Advanced Settings.
In the Advanced Settings dialog, select SCSI.
Change the Scsi.UWConflictRetries value to 80 (the default is 1000).
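If you have many hosts, the same change can also be made from the console with esxcfg-advcfg (-g reads an advanced option, -s sets it; check the syntax against your version's documentation). The Python sketch below is a hypothetical helper, not a VMware tool: given an inventory of host names and their current Scsi.UWConflictRetries values (made up here), it lists the hosts that still need the change and the command to run on each.

```python
from __future__ import annotations

# Hypothetical planning helper; host names and values are illustrative only.
TARGET = 80
OPTION = "/Scsi/UWConflictRetries"


def hosts_needing_change(current: dict[str, int], target: int = TARGET) -> list[str]:
    """Return, sorted, the hosts whose current value differs from the target."""
    return sorted(host for host, value in current.items() if value != target)


def get_command() -> str:
    # esxcfg-advcfg -g <option> prints the current value of an advanced option
    return f"esxcfg-advcfg -g {OPTION}"


def set_command(target: int = TARGET) -> str:
    # esxcfg-advcfg -s <value> <option> sets an advanced option
    return f"esxcfg-advcfg -s {target} {OPTION}"


if __name__ == "__main__":
    inventory = {"esx01": 1000, "esx02": 80, "esx03": 1000}
    for host in hosts_needing_change(inventory):
        print(f"{host}: {set_command()}")
```

Running it against the sample inventory flags esx01 and esx03, since esx02 is already at 80.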
As an alternative, you might consider creating a DRS/HA cluster dedicated to MSCS virtual machines and masking the RDM LUNs from all ESX(i) hosts that are not part of that dedicated cluster.
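Since every host zoned to see an MSCS RDM LUN is exposed to the slow boot, a quick sanity check after masking is to confirm that no host outside the dedicated cluster can still see one of those LUNs. Below is a minimal Python sketch of that check, assuming you have already exported a host-to-visible-LUNs inventory from your environment; the host names and LUN IDs are hypothetical.

```python
from __future__ import annotations


def hosts_still_exposed(
    visible_luns: dict[str, set[str]],
    mscs_hosts: set[str],
    rdm_luns: set[str],
) -> list[str]:
    """Return hosts outside the dedicated MSCS cluster that can still see an MSCS RDM LUN.

    After masking is complete this list should be empty.
    """
    return sorted(
        host
        for host, luns in visible_luns.items()
        if host not in mscs_hosts and luns & rdm_luns  # non-empty intersection
    )


if __name__ == "__main__":
    # Hypothetical inventory: host name -> LUN IDs that host can see.
    inventory = {
        "mscs-esx01": {"lun-vmfs1", "lun-rdm-quorum", "lun-rdm-data"},
        "mscs-esx02": {"lun-vmfs1", "lun-rdm-quorum", "lun-rdm-data"},
        "esx10": {"lun-vmfs1", "lun-rdm-quorum"},  # not masked yet
        "esx11": {"lun-vmfs1"},
    }
    exposed = hosts_still_exposed(
        inventory,
        mscs_hosts={"mscs-esx01", "mscs-esx02"},
        rdm_luns={"lun-rdm-quorum", "lun-rdm-data"},
    )
    print(exposed)  # ['esx10']
```

Hosts inside the dedicated cluster are expected to see the RDM LUNs and will still boot slowly (the Scsi.UWConflictRetries change can be applied there); the point of the check is to keep everyone else out of scope.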