Hello, everyone!
Today, I'd like to share a case with you.
Product Model: SmartAX MA5603T.
Problem Description
The control board of the SmartAX MA5603T has shut down abnormally.
Problem Analysis
The active SCUH board reset and entered the standby state, and the standby board took over as the active board. Because the reset board then failed to be activated, the system ran without redundancy protection.
1. Query the last logs of the system. They confirm that the SCUH control board was reset by the kernel watchdog after the watchdog went unserviced for a long time.

2. The watchdog was not cleared because a task had been waiting a long time for the spin lock.

3. The chip fault detection task could not obtain the spin lock because the lock was held by another task, the one in which the CPU sends protocol packets.
A spin lock is a locking mechanism used to protect shared resources. Spin locks are similar to mutex locks in that both provide exclusive use of a resource. Whether it is a mutex or a spin lock, there can be at most one holder at any one time, that is, at most one execution unit can hold the lock at any one time. The two differ in how a waiting caller is scheduled. With a mutex, if the resource is occupied, the applicant enters the sleep state. A spin lock does not put the caller to sleep. If the spin lock is already held by another execution unit, the caller loops in place, repeatedly checking whether the holder has released the lock. This is where the name "spin" comes from.
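The busy-waiting behavior described above can be sketched in a few lines of Python. This is only an illustration and not the device's code: the class name `SpinLock`, the `spins` counter, and the `demo` function are made up for this example, and `threading.Lock` is used purely as an atomic flag.

```python
import threading
import time


class SpinLock:
    """Toy spin lock: a waiting caller loops instead of sleeping."""

    def __init__(self):
        # threading.Lock serves only as an atomic test-and-set flag here.
        self._flag = threading.Lock()
        self.spins = 0  # number of failed attempts, for illustration only

    def acquire(self):
        # Non-blocking attempt in a loop: this is the "spin".
        while not self._flag.acquire(blocking=False):
            self.spins += 1

    def release(self):
        self._flag.release()


def demo():
    lock = SpinLock()
    result = {}
    lock.acquire()  # the main thread is the current holder

    def waiter():
        lock.acquire()  # busy-waits until the holder releases the lock
        result["acquired"] = True
        lock.release()

    t = threading.Thread(target=waiter)
    t.start()
    while lock.spins == 0:  # wait until the waiter has visibly spun
        time.sleep(0.001)
    lock.release()
    t.join()
    return lock.spins, result.get("acquired", False)


if __name__ == "__main__":
    spins, acquired = demo()
    print("waiter spun", spins, "times; acquired:", acquired)
```

The waiter stays runnable and keeps using the CPU the whole time it waits, which is harmless only if the holder is guaranteed to run and release the lock soon.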
The spin lock is used to protect the shared VLAN software table, which can be accessed by multiple tasks. The CPU accesses the VLAN software table when sending protocol packets. At that point, the sending task obtains the spin lock to prevent other tasks from accessing the table at the same time. If an interrupt occurs during the access to the VLAN software table, the interrupt service routine is executed. After the interrupt service routine finishes, the system reschedules and selects the higher-priority task that detects forwarding chip faults. That task also accesses the VLAN software table, so it also tries to obtain the spin lock. Because the task that sends protocol packets has not released the spin lock and can no longer run, the fault detection task keeps waiting. Finally, the watchdog times out and the board is reset. The following figure shows the process.

Root Cause
The spin lock is used improperly: the case where the lock holder is interrupted is not handled. As a result, task A (the CPU sending protocol packets) is interrupted after it obtains the spin lock. After the interrupt is processed, task B (detecting forwarding chip faults), which has a higher priority, is scheduled. Task B also tries to obtain the spin lock. Because task A has not released the lock, task B waits forever.
Solution Description
A lock that does not spin is used instead of the spin lock to protect the shared resource. This issue is resolved by hot patch R18C10SPH210.
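To make the root cause and the fix easier to follow, here is a toy simulation of a single CPU with strict priority scheduling. It is a sketch under simplifying assumptions and not the patch's actual implementation: the function `simulate`, the tick counts, and the `watchdog_limit` value are all invented for illustration.

```python
def simulate(lock_kind, watchdog_limit=10):
    """Toy single-CPU scheduler that always runs the highest-priority task.

    lock_kind: "spin"  -> a waiting task stays runnable and burns CPU
               "sleep" -> a waiting task is descheduled until the lock is free
    Returns "watchdog reset" or "completed".
    """
    owner = None  # current holder of the VLAN-table lock
    # Task A: low priority, sends protocol packets, needs 3 ticks in the lock.
    # Task B: high priority, chip fault detection, becomes ready at tick 1
    #         (after the interrupt), needs 1 tick in the lock.
    tasks = {
        "A": {"prio": 1, "ready_at": 0, "work": 3, "done": False},
        "B": {"prio": 2, "ready_at": 1, "work": 1, "done": False},
    }
    stalled = 0  # consecutive ticks without useful work
    tick = 0
    while not all(t["done"] for t in tasks.values()):
        runnable = [
            name
            for name, t in tasks.items()
            if not t["done"]
            and t["ready_at"] <= tick
            # With a sleeping lock, a task blocked on the lock is not runnable.
            and not (lock_kind == "sleep" and owner not in (None, name))
        ]
        if not runnable:
            tick += 1
            continue
        name = max(runnable, key=lambda n: tasks[n]["prio"])
        task = tasks[name]
        if owner in (None, name):
            owner = name  # acquire the lock, or keep holding it
            task["work"] -= 1  # one tick of work on the shared table
            stalled = 0
            if task["work"] == 0:
                task["done"] = True
                owner = None  # release the lock
        else:
            stalled += 1  # spinning: the CPU is busy but nothing progresses
            if stalled >= watchdog_limit:
                return "watchdog reset"
        tick += 1
    return "completed"


if __name__ == "__main__":
    print("spin lock: ", simulate("spin"))
    print("sleep lock:", simulate("sleep"))
```

With `"spin"`, task B takes the CPU at tick 1 and spins while task A never gets to run and release the lock, so the simulated watchdog fires. With `"sleep"`, task B is descheduled, task A finishes and releases the lock, and both tasks complete.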
Feel free to leave a message below.
Let's learn together.
Thank you!





