Hi team!
Here's a case in which leaving a failed LUN mapped for a long time after a disk scan causes the ESXi host to panic.
Symptom
1. UltraPath is successfully installed on the ESXi host.
2. A failed LUN is mapped to the host from the array.
3. The disk scanning command esxcfg-rescan -A is executed on the host.
4. If the failed LUN stays mapped for a long time, there is a low probability that the host hits a purple screen of death (PSOD).
The following figure shows the stack information.

Fault Diagnosis
1. After analyzing the dump information, VMware engineers replied that the PSOD is triggered by a bug in ESXi itself. The reply is as follows:
This is a bug in ESX code. We have a similar bug#1365517 logged with ESXi-6.0. However, as the race condition is an extremely rare case (more than 40 000 times retry to reproduce in your case) and reproduced only with torture testing, so it will not be considered to get fixed for 2015 release. It is currently planned to get fixed in 2016 release.
2. VMware confirmed that the retry operation performed after scsiDev fails to register with UltraPath is not the cause of the problem.
The reply is as follows:
--SCSIDeviceIteratorNext() is a utility function which moves the iterator forward to the next ScsiDevice. The reference count of the previous current device (if any) is decremented, and the reference count of the new current device (if any) is incremented. Retrying of device register not an issue. Usually, any PSA device layer issued I/Os you need to have a handle open or a ref on the device and there are functions which get invoked periodically (like SCSIDeviceTimeoutHandlerFn()) and uses SCSIDeviceIteratorNext().
So what I am saying is the retry operation you are trying in your MPP is ok and the issue is in ESXi code.
There is a bug reported on the same but as it is a rare case(as mentioned in my previous comment) it is marked to be taken up in future releases.
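The reference-count behavior described in the reply can be pictured with a small sketch. This is only an illustration of the idea (release the reference on the previous device, take one on the new device), not ESXi code. The names Device, DeviceIterator, and refcount are hypothetical and are not part of any VMware API.

```python
class Device:
    """Hypothetical stand-in for a SCSI device with a reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0


class DeviceIterator:
    """Hypothetical iterator that holds a reference only on the current device."""

    def __init__(self, devices):
        self._devices = list(devices)
        self._index = -1
        self.current = None

    def next(self):
        # Drop the reference held on the previous current device, if any.
        if self.current is not None:
            self.current.refcount -= 1
        self._index += 1
        if self._index < len(self._devices):
            # Take a reference on the new current device.
            self.current = self._devices[self._index]
            self.current.refcount += 1
        else:
            # Past the end: no current device, no reference held.
            self.current = None
        return self.current


devices = [Device("lun0"), Device("lun1")]
it = DeviceIterator(devices)
first = it.next()   # lun0 is now referenced
second = it.next()  # lun0 released, lun1 referenced
end = it.next()     # lun1 released, iteration finished
```

In this model every device ends with the same count it started with once the iteration finishes, which is why a retry that walks the list again is harmless by itself. The race that VMware describes lives inside ESXi and is not modeled here.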
Solution
Delete the mapping of the failed LUN, recover the LUN, and then remap it to the host.
When a LUN fails, you can find the event information about the failed LUN on the array.
This is my solution. How about yours? Go ahead and share it with us!

