Hi team!
Here's a case in which a single-controller alarm on a storage array could not be cleared after the host was restarted.
Symptom
UltraPath V100R006 is installed on a Solaris host.
The storage array is connected to the host over optical fiber.
A path disconnection causes the storage array to generate a single-controller alarm.
The host is restarted while the single-controller alarm is still active.
After the connection between the storage array and the host is restored, the single-controller alarm persists on the storage array and cannot be cleared.
Fault Diagnosis
1. When the user-mode component of UltraPath for Solaris detects the single-controller alarm, it sends the alarm to the kernel. The kernel adds the alarm to a cyclic list and keeps retrying to send it to the storage array.
2. After the path is restored, the user mode sends an alarm clearance command to the kernel.
The command is also added to a cyclic list and retried repeatedly.
In addition, the alarm is moved from the current alarm file to the historical alarm file.
3. Because the path disconnects intermittently and frequently, path selection fails and the UltraPath kernel cannot send the alarm clearance command to the storage array.
4. When the host is restarted, the retry queue maintained in the kernel is cleared.
The user mode no longer holds the previous alarm information,
so the single-controller alarm clearance command is never sent to the storage array again.
As a result, the single-controller alarm cannot be cleared.
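To make steps 1 to 4 concrete, here is a minimal Python model of the sequence. It is a hypothetical simulation, not UltraPath code: the class names, the `raise`/`clear` commands, the `path_ok` flag, and the alarm identifier are all assumptions made for illustration. It shows that the clearance command exists only in the kernel's in-memory retry list, so a restart before that command is delivered leaves the alarm active on the array.

```python
from collections import deque
from dataclasses import dataclass, field

ALARM = "single-controller"  # hypothetical alarm identifier


@dataclass
class StorageArray:
    active_alarms: set = field(default_factory=set)

    def receive(self, command: str, alarm_id: str) -> None:
        if command == "raise":
            self.active_alarms.add(alarm_id)
        elif command == "clear":
            self.active_alarms.discard(alarm_id)


@dataclass
class KernelModule:
    # Hypothetical in-memory stand-in for the kernel's cyclic retry list.
    # It is not persisted, so it does not survive a host restart.
    retry_queue: deque = field(default_factory=deque)

    def enqueue(self, command: str, alarm_id: str) -> None:
        self.retry_queue.append((command, alarm_id))

    def retry_pass(self, array: StorageArray, path_ok: bool) -> None:
        # Walk the list once: deliver each command if a path can be selected,
        # otherwise rotate it to the back so it is retried on the next pass.
        for _ in range(len(self.retry_queue)):
            command, alarm_id = self.retry_queue.popleft()
            if path_ok:
                array.receive(command, alarm_id)
            else:
                self.retry_queue.append((command, alarm_id))


@dataclass
class UserMode:
    current_alarms: set = field(default_factory=set)
    historical_alarms: set = field(default_factory=set)

    def report_alarm(self, kernel: KernelModule, alarm_id: str) -> None:
        self.current_alarms.add(alarm_id)
        kernel.enqueue("raise", alarm_id)

    def path_restored(self, kernel: KernelModule, alarm_id: str) -> None:
        # The clearance is handed to the kernel and the alarm is moved to
        # history, so user mode no longer tracks it as a current alarm.
        kernel.enqueue("clear", alarm_id)
        self.current_alarms.discard(alarm_id)
        self.historical_alarms.add(alarm_id)


def simulate(restart_before_delivery: bool) -> set:
    array, kernel, user = StorageArray(), KernelModule(), UserMode()
    user.report_alarm(kernel, ALARM)         # step 1: alarm queued in the kernel
    kernel.retry_pass(array, path_ok=True)   # alarm reaches the array (assumed surviving path)
    user.path_restored(kernel, ALARM)        # step 2: clearance queued, alarm moved to history
    kernel.retry_pass(array, path_ok=False)  # step 3: path selection fails, clearance stays queued
    if restart_before_delivery:
        kernel = KernelModule()              # step 4: restart discards the in-memory retry queue
    kernel.retry_pass(array, path_ok=True)   # path is stable again
    return array.active_alarms


print(simulate(restart_before_delivery=False))  # set()
print(simulate(restart_before_delivery=True))   # {'single-controller'}
```

Without the restart, the queued clearance is delivered as soon as a path can be selected again. With the restart, nothing is left in the kernel or in user mode to send, which is why the alarm has to be cleared manually as described below.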
Solution
This problem is fixed in UltraPath V100R008.
If the problem occurs, clear the alarm manually as follows.
For storage arrays managed by ISM, log in to ISM to clear the alarm.
1. Log in to ISM and locate the alarm information in the upper right corner of the interface.
2. Click Critical. The critical alarm list is displayed.
3. Select the alarm to clear, identifying it by the host name and initiator WWN in the event description, and click Clear.
For storage arrays managed by DeviceManager, log in to DeviceManager to clear the alarm.
1. Log in to DeviceManager and locate the alarm information in the upper right corner of the interface.
2. Click Critical. The critical alarm list is displayed.
3. Select the alarm to clear, identifying it by the host name and initiator WWN in the event description, and click Clear.
Hope this helps you solve similar cases.