Problem Description
The Internet Gateway Router (IGW) in the network setup connects to the following equipment to provide internet service to the end customer:
- Provider Edge (PE) routers, which act as the Service Gateway, mostly through iBGP connections
- The upstream provider through an eBGP connection
Through our Network Operations Center (NOC), we noticed fluctuations in the internet connectivity graph. Customers also complained of a poor internet experience during the impact period.
Handling Process
STEP 1: The Network Operations Center notices fluctuations on the internet graph and immediately creates an incident ticket, which is assigned to the Back Office team for investigation and resolution.
STEP 2: The Back Office team logs on to the Internet Gateway Router, since it is the convergence point for internet connectivity, and issues the "display logbuffer" command to check the recent logs on the equipment. The log below shows the different times of occurrence.

The logs show that the BFD session is flapping on the interface linking the Internet Gateway Router and the Provider Edge router.
As a consequence, the IS-IS adjacency configured between the two router interfaces also flaps.
STEP 3: A deeper check of the diagnostic logs is conducted to understand why the BFD session is flapping.
Each BFD flap coincides with a soft reset of the TM chip on the LPU. When the TM chip resets, packet forwarding is affected. This is shown in the screenshot below.
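The correlation described in this step can also be checked programmatically once the event times have been extracted from the logs. The following is only a minimal sketch: the function name, the 5-second window and the timestamps are assumptions made for illustration and are not taken from the device output.

```python
from datetime import datetime, timedelta

def flaps_near_resets(bfd_flaps, tm_resets, window=timedelta(seconds=5)):
    """Pair each BFD flap with the first TM chip reset within `window`.

    Returns a list of (flap, reset) tuples; reset is None when no
    reset falls inside the window around that flap.
    """
    pairs = []
    for flap in bfd_flaps:
        match = next((r for r in tm_resets if abs(flap - r) <= window), None)
        pairs.append((flap, match))
    return pairs

# Hypothetical timestamps, for illustration only.
bfd_flaps = [datetime(2024, 1, 1, 10, 0, 3), datetime(2024, 1, 1, 11, 30, 1)]
tm_resets = [datetime(2024, 1, 1, 10, 0, 1), datetime(2024, 1, 1, 11, 29, 59)]

pairs = flaps_near_resets(bfd_flaps, tm_resets)
all_matched = all(reset is not None for _, reset in pairs)
```

If every flap has a matching reset, the TM chip reset is a strong candidate for the trigger. A flap without a match would suggest a second, unrelated cause.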

STEP 4: The next step is to check the patch release note for the running patch version (V800R011SPH032) or a higher version and verify the conditions that cause the TM chip to reset.
The release note for a higher version, V800R011SPH036, clearly states that the reset depends on the following conditions:
- Type of board running on the node
- Presence of blackhole routes in the configuration of the device.
An extract of the document is shown in the screenshot below.

STEP 5: Check the patch version running on the IGW.

The output above shows that the IGW is running a patch version lower than the one indicated in the patch release note, so the running patch is also affected by the conditions that cause the TM chip reset.
STEP 6: Next, we check the board types on the device where the flapping occurs. Slot 1 (CR57LPUF50C), slot 2 (CR57LPUF50C) and slot 3 (CR57LPUF120A) match the board condition listed in the release note.
This is checked using the "display device slot_ID" command.
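The version comparison in this step can be sketched as follows. This is an illustration only: it assumes that patch names follow the `V<nnn>R<nnn>SPH<nnn>` pattern seen in this case and that, within the same base version, a larger SPH number means a later patch. The function names are hypothetical.

```python
import re

# Matches names such as "V800R011SPH032": base version, then the patch number.
PATCH_RE = re.compile(r"^(V\d+R\d+)SPH(\d+)$")

def parse_patch(name):
    """Split a patch name into (base_version, patch_number)."""
    m = PATCH_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised patch name: {name!r}")
    return m.group(1), int(m.group(2))

def is_older_patch(running, reference):
    """Return True if `running` is an earlier patch than `reference`."""
    run_base, run_num = parse_patch(running)
    ref_base, ref_num = parse_patch(reference)
    if run_base != ref_base:
        # Patch numbers are only comparable within one base version.
        raise ValueError("patches belong to different base versions")
    return run_num < ref_num

older = is_older_patch("V800R011SPH032", "V800R011SPH036")  # True
```

Comparing the numeric suffix rather than the raw strings avoids mistakes when patch numbers have different widths.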

STEP 7: We check the second condition, the presence of blackhole routes, by inspecting the configuration with the "display current-configuration" command. Some blackhole routes are present.
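On a large configuration, this check can be scripted against a saved copy of the command output. The sketch below assumes that blackhole routes appear as static routes whose outgoing interface is NULL0. The function name is hypothetical and the sample configuration uses documentation prefixes, not the routes found on the IGW.

```python
def find_blackhole_routes(config_text):
    """Return the static-route lines that point to the NULL0 interface."""
    routes = []
    for line in config_text.splitlines():
        tokens = line.split()
        is_static = tokens[:2] == ["ip", "route-static"]
        if is_static and any(t.upper() == "NULL0" for t in tokens):
            routes.append(line.strip())
    return routes

# Hypothetical configuration extract, for illustration only.
sample_config = """
ip route-static 192.0.2.0 255.255.255.0 NULL0
ip route-static 198.51.100.0 255.255.255.0 203.0.113.1
"""

blackholes = find_blackhole_routes(sample_config)
```

Here `blackholes` contains only the first route, because the second one has a regular next hop.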

Root Cause
The issue is caused by the combination of the following:
- Patch version running on the device
- Boards used on the node
- Presence of blackhole route configuration on the device.
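The three factors above can be combined into a single exposure check, for example to screen other nodes. The sketch below rests on several assumptions: the affected board set contains only the board types identified in this case (the release note remains the authoritative list), any patch below the fixed one (SPH058) is treated as exposed, and the blackhole route count is a hypothetical value. All names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class NodeFacts:
    patch_number: int          # numeric SPH suffix of the running patch
    boards: dict               # slot number -> board type
    blackhole_routes: int      # number of blackhole routes configured

def exposure_report(facts, fixed_patch_number, affected_boards):
    """Evaluate each of the three conditions separately."""
    slots = [s for s, b in facts.boards.items() if b in affected_boards]
    return {
        "patch_below_fix": facts.patch_number < fixed_patch_number,
        "affected_slots": sorted(slots),
        "has_blackhole_routes": facts.blackhole_routes > 0,
    }

def is_exposed(report):
    """The node is exposed only when all three conditions hold."""
    return (report["patch_below_fix"]
            and bool(report["affected_slots"])
            and report["has_blackhole_routes"])

# Board types from this case; the route count is hypothetical.
igw = NodeFacts(
    patch_number=32,
    boards={1: "CR57LPUF50C", 2: "CR57LPUF50C", 3: "CR57LPUF120A"},
    blackhole_routes=2,
)
report = exposure_report(igw, fixed_patch_number=58,
                         affected_boards={"CR57LPUF50C", "CR57LPUF120A"})
exposed = is_exposed(report)
```

Keeping the three conditions separate in the report shows which one would need to change to remove the exposure on a given node.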
Solution
Upgrade the patch on the node from V800R011SPH032 to V800R011SPH058, which resolves this issue.
Because the issue has already occurred, the affected LPU boards also need to be reset.

