Good day!
This post explains how a site outage resulted from a low input voltage on a service board (Line Processing Unit) of a metro router (NE40E-X3). Please see the details below.
ISSUE DESCRIPTION
One afternoon, we received a notification from the Network Operations Center (NOC) that 113 sites in one region were down for the 2G/3G/4G technologies. Customers could neither make voice calls nor browse. This was critical because revenue was heavily impacted.
HANDLING PROCESS
After the NOC reported this incident, the following steps were taken:
STEP 1 : The Network Operations Center immediately created an incident ticket specifying the issue and the area of impact, and assigned it to the Back Office Datacom team for investigation and resolution. The incident was marked as Severity 1 because of the area of impact and the effect on customer experience.
STEP 2 : The Back Office team started by checking the recent logs on the equipment serving the affected area, using the display logbuffer command. The log output below shows the time of occurrence.


Equipment Type: NE40E-X3
Equipment Software Version: (NE40E&80E V600R008C10SPC300)
STEP 3 : Since the logs showed an input voltage of 0.00V (see the screenshot above), we checked the voltage with the display voltage command and ran the display logbuffer command on other nodes in the same environment. This was done purely for comparison.

The above screenshot shows that the board of this node had a cold reset around the same period but was later registered, meaning that service on the board recovered.
STEP 4 : The logs in step 3 (first screenshot) showed that the input voltage to this board was 0.00V. This prompted us to test the input voltage at the router's Power Entry Modules (PEMs) with a multimeter (voltage function). We confirmed that it was 0.00V for one PEM and normal (-48.3V) for the other. The diagram below shows why the input power issue on one PEM affected only one service board and not the other. This is due to the architecture of the power supply system of the NE40E-X3.

The Power Supply System works in a 1 + 1 backup mode.
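The multimeter check in step 4 can be written down as a small sketch. The two readings are the ones quoted above. The PEM labels and the acceptable voltage window are assumptions made only for illustration, so the real limits should be taken from the hardware guide.

```python
# Assumed acceptable window for a nominal -48 V DC feed (illustrative values,
# not taken from this post; confirm against the hardware guide).
NOMINAL_MIN_V = -72.0
NOMINAL_MAX_V = -38.0


def classify_feed(voltage_v: float) -> str:
    """Return 'ok' if the reading lies inside the assumed window, else 'fault'."""
    return "ok" if NOMINAL_MIN_V <= voltage_v <= NOMINAL_MAX_V else "fault"


# Readings from step 4. The labels PEM-A / PEM-B are hypothetical; the post
# does not say which physical module read 0.00 V.
readings = {"PEM-A": 0.00, "PEM-B": -48.3}

status = {name: classify_feed(volts) for name, volts in readings.items()}

for name, state in status.items():
    print(f"{name}: {readings[name]:.2f} V -> {state}")
```

With these readings, the module at 0.00V is flagged as a fault and the one at -48.3V passes.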
ROOT CAUSE
The loss of input power caused the board to reset. This took down the forwarding plane, so all services on this board were impacted.
SOLUTION
We rectified the power input issue on this box, after which the board registered and services returned to normal.
ADVICE
This situation happened because the power module of the NE40E-X3 does not function in Active / Active mode. If enough budget is available, it is advisable to go with a power system architecture that is fully redundant. In this architecture, if one Power Entry Module has an issue, the other fully powers the node and gives the engineer time to repair or replace the faulty one. The architecture is shown in the screenshot below:
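The difference between the two arrangements can also be illustrated with a toy model. This is only a sketch: the board names, PEM labels, and feed mappings are hypothetical and are not read from the NE40E-X3 documentation. They simply encode "each board depends on one PEM" versus "every board is fed by both PEMs".

```python
def powered_boards(feed_map, healthy_pems):
    """Return the boards that still have at least one healthy PEM feeding them."""
    healthy = set(healthy_pems)
    return sorted(board for board, pems in feed_map.items() if healthy & set(pems))


# Hypothetical layout matching the incident: each board depends on a single PEM.
partitioned = {"LPU1": ["PEM-A"], "LPU2": ["PEM-B"]}

# Hypothetical fully redundant layout: both PEMs feed every board.
fully_redundant = {"LPU1": ["PEM-A", "PEM-B"], "LPU2": ["PEM-A", "PEM-B"]}

# Simulate PEM-A losing its input, leaving only PEM-B healthy.
healthy = ["PEM-B"]

print(powered_boards(partitioned, healthy))      # ['LPU2']
print(powered_boards(fully_redundant, healthy))  # ['LPU1', 'LPU2']
```

In the first layout one board goes dark when its PEM fails, which is what happened here. In the second layout both boards stay up while the faulty module is repaired or replaced.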



