Keywords:
OptiX OSN 9800, OptiX OSN 9600, cross-connect board, soft failure, PCIE
Summary:
The following issues may occur when the cross-connect boards and system control boards of OptiX OSN 9800/9600 products are running:
- Soft failure of the control logic on the cross-connect board
- Failure to report alarms or perform switchovers in case of a PCIE abnormality
- Alarm POWER_FAIL reported after the battery on the system control board is exhausted
These issues may interrupt services during normal operation of live-network equipment, during SNCP switchovers, and during MPN switchovers of the cross-connect boards, and will result in a large number of faulty cross-connect boards and system control boards. To prevent these issues, proactively install the latest patches for the corresponding mainstream versions.
[Problem Description]
The following table lists the major issues. For details about other issues, see the version release notes.
Board: U2UXCS on OptiX OSN 9800/9600 U32/U64
Version: V100R001C20SPH372 and earlier versions
Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported and no switchover is performed when the cross-connect board is faulty.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
When a chip is faulty, a HARD_BAD alarm is reported and a cross-connect board switchover is performed normally.
The parameters of the HARD_BAD alarm (two SD5805 chips involved) on the U2UXCS board are as follows:
- HARD_BAD, 0x0f,0x05,0x00,0xff,0xff,0xff
- HARD_BAD, 0x0f,0x05,0x01,0xff,0xff,0xff
Issue 3: high failure rate of the PCI bus on the cross-connect board
When a PCI bus is faulty, a HARD_ERR alarm is reported but no switchover is triggered.
The parameters of the HARD_ERR alarm (two PCI buses involved) on the U2UXCS board are as follows:
- HARD_ERR, 0x15,0x08,0x04,0xff,0xff,0xff
- HARD_ERR, 0x15,0x08,0x05,0xff,0xff,0xff
Issue 4: abnormal SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
Boards: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Versions: V100R002C10SPC200 and V100R002C10SPC310
Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported and no switchover is performed when the cross-connect board is faulty.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
When a chip is faulty, a HARD_BAD alarm is reported and a cross-connect board switchover is performed normally.
The parameters of the HARD_BAD alarm (two SD5805 chips involved) on the U2UXCS board are as follows:
- HARD_BAD, 0x0f,0x05,0x00,0xff,0xff,0xff
- HARD_BAD, 0x0f,0x05,0x01,0xff,0xff,0xff
Issue 3: high failure rate of the PCI bus on the cross-connect board
When a PCI bus is faulty, a HARD_BAD alarm is reported but no switchover is triggered.
The parameters of the HARD_BAD alarm (two PCI buses involved) on the U2UXCS board are as follows:
- HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff
- HARD_BAD, 0x15,0x08,0x05,0xff,0xff,0xff
The parameters of the HARD_BAD alarm (one PCI bus involved) on the S1UXCS board are as follows:
- HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff
Boards: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Versions: V100R002C10SPC300 and V100R003C10SPC200
Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported and no switchover is performed when the cross-connect board is faulty.
Issue 3: high failure rate of the PCI bus on the cross-connect board
When a PCI bus is faulty, a HARD_BAD alarm is reported and a cross-connect board switchover is performed normally.
The parameters of the HARD_BAD alarm (two PCI buses involved) on the U2UXCS board are as follows:
- HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff
- HARD_BAD, 0x15,0x08,0x05,0xff,0xff,0xff
The parameters of the HARD_BAD alarm (one PCI bus involved) on the S1UXCS board are as follows:
- HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff
Board: TN52SCC on OptiX OSN 9800/9600 platform subrack
Version: all versions
Issue 5: alarm POWER_FAIL with parameter 0x05 reported on the TN52SCC board
The preceding table lists only the major issues that may occur in mainstream versions. The issues in non-mainstream versions are not described here. For details, see [TN-R-201703] Notice on Rectification for Upgrading the Non-mainstream Versions of OptiX OSN 9800&9600.
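When screening alarm records for the cross-connect board issues, the parameter signatures listed above can be matched mechanically. The Python sketch below assumes each alarm is available as a text line in the same "NAME, 0x..,0x.." form used in this notice (an assumption; the actual NMS export format may differ), and the names KNOWN_SIGNATURES, parse_alarm, and classify are hypothetical. It covers only the signatures listed in the table: Issues 1 and 4 raise no alarm and cannot be detected this way, and the issue number alone does not tell you the NE version, so check the version before choosing a recovery measure.

```python
from typing import Optional, Tuple

# Hypothetical lookup built from the alarm parameters listed in the table above.
# Key: (alarm name, parameter tuple); value: issue number in this notice.
KNOWN_SIGNATURES = {
    ("HARD_BAD", (0x0F, 0x05, 0x00, 0xFF, 0xFF, 0xFF)): 2,  # SD5805 chip
    ("HARD_BAD", (0x0F, 0x05, 0x01, 0xFF, 0xFF, 0xFF)): 2,  # SD5805 chip
    # PCI bus, V100R001C20SPH372 and earlier: reported as HARD_ERR
    ("HARD_ERR", (0x15, 0x08, 0x04, 0xFF, 0xFF, 0xFF)): 3,
    ("HARD_ERR", (0x15, 0x08, 0x05, 0xFF, 0xFF, 0xFF)): 3,
    # PCI bus, V100R002C10/V100R003C10 versions: reported as HARD_BAD
    ("HARD_BAD", (0x15, 0x08, 0x04, 0xFF, 0xFF, 0xFF)): 3,
    ("HARD_BAD", (0x15, 0x08, 0x05, 0xFF, 0xFF, 0xFF)): 3,
}


def parse_alarm(line: str) -> Tuple[str, Tuple[int, ...]]:
    """Split 'NAME, 0x..,0x..' into the alarm name and integer parameters.

    Raises ValueError if a parameter is not a hexadecimal number.
    """
    name, _, params = line.partition(",")
    # int(..., 16) accepts an optional '0x' prefix.
    values = tuple(int(p.strip(), 16) for p in params.split(",") if p.strip())
    return name.strip(), values


def classify(line: str) -> Optional[int]:
    """Return the issue number for a listed signature, or None if not listed."""
    return KNOWN_SIGNATURES.get(parse_alarm(line))
```

For example, `classify("HARD_ERR, 0x15,0x08,0x04,0xff,0xff,0xff")` returns 3, while an alarm whose parameters are not listed in this notice returns None.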
Trigger condition:
An NE uses a software version that is involved in this rectification notice.
Symptom:
For details, see section "Problem Description."
Identification method:
Check whether the NE software version is involved in this rectification notice.
[Root Cause]
The following table lists the root cause of each issue.
Issue 1: soft failure of the control logic on the cross-connect board
A soft failure may occur in the FPGA chip when it is affected by external electromagnetic radiation or interference. If the soft failure affects the cross-connect matrix on the cross-connect board, services are interrupted.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
Because of a defect in the IBM manufacturing process, a hard failure may occur in the RAM inside the SD5805 chip, resulting in a HARD_BAD alarm.
Issue 3: high failure rate of the PCI bus on the cross-connect board
The reliability of the PCIE module inside the FPGA on the cross-connect board is not guaranteed, so there is a low probability that the PCI bus becomes abnormal. A cold reset must be performed on the cross-connect board to resolve this issue.
Issue 4: suspension of the SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
The cross-connect boards delivered with V100R001C20SPC360 have a software defect. When an upgrade is performed through package loading or patch loading, the signals that control page switching of the SD5805 chip are abnormal for several seconds. As a result, the automatic switching state machine of the SD5805 chip may be suspended.
Issue 5: alarm POWER_FAIL reported on the TN52SCC board
The battery discharge behavior varies with temperature: the higher the temperature, the faster the battery discharges. If heat dissipated by the chips on the running system control board raises the ambient temperature of the battery above 45°C, the battery is exhausted before the end of its expected lifespan. When the software detects that the battery power is insufficient, it reports a POWER_FAIL alarm with parameter 0x05.
[Impact and Risk]
The following table lists the impacts and risks of each issue.
Board: U2UXCS on OptiX OSN 9800/9600 U32/U64
Version: V100R001C20SPH372 and earlier versions
Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported. Services are interrupted when the cross-connect board fails.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
- The cross-connect board failure rate is high.
- The alarm is reported and the switchover is performed automatically. Services are not interrupted.
Issue 3: high failure rate of the PCI bus on the cross-connect board
- The HARD_ERR alarm is reported, but no switchover is performed automatically. Services are not interrupted.
- New configurations cannot be delivered successfully.
- Services are interrupted after a service protection group switchover.
Issue 4: abnormal SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
- No alarm is reported. Services are not interrupted.
- Services may be interrupted after a cross-connect board switchover.
Boards: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Versions: V100R002C10SPC200 and V100R002C10SPC310
Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported. Services are interrupted when the cross-connect board fails.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
- The cross-connect board failure rate is high.
- The alarm is reported and the switchover is performed automatically. Services are not interrupted.
Issue 3: high failure rate of the PCI bus on the cross-connect board
- The HARD_BAD alarm is reported, but no switchover is performed automatically. Services are not interrupted.
- New configurations cannot be delivered successfully.
- Services are interrupted after a service protection group switchover.
Boards: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Versions: V100R002C10SPC300 and V100R003C10SPC200
Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported. Services are interrupted when the cross-connect board fails.
Issue 3: high failure rate of the PCI bus on the cross-connect board
The HARD_BAD alarm is reported, and the switchover is performed automatically. Services are not interrupted.
Board: TN52SCC on OptiX OSN 9800/9600 platform subrack
Version: all versions
Issue 5: alarm POWER_FAIL reported on the TN52SCC board
Only the system control boards in the master subracks are affected.
- Services run normally and are not affected.
- After a power failure, the system time is reset to the year 1990, and configurations added within 30 minutes before the power failure are lost.
[Measures and Solutions]
Recovery measures:
Board: U2UXCS on OptiX OSN 9800/9600 U32/U64
Version: V100R001C20SPH372 and earlier versions
Issue 1: soft failure of the control logic on the cross-connect board
Perform a cold reset on the faulty cross-connect board.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
None.
Issue 3: high failure rate of the PCI bus on the cross-connect board
Perform a cross-connect board switchover, and then perform a cold reset on the faulty cross-connect board.
Issue 4: suspension of the SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
Perform a cold reset on the faulty cross-connect board.
Boards: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Versions: V100R002C10SPC200 and V100R002C10SPC310
Issue 1: soft failure of the control logic on the cross-connect board
Perform a cold reset on the faulty cross-connect board.
Issue 2: high failure rate of the SD5805 chip on the cross-connect board
None.
Issue 3: high failure rate of the PCI bus on the cross-connect board
Perform a cross-connect board switchover, and then perform a cold reset on the faulty cross-connect board.
Board: TN52SCC on OptiX OSN 9800/9600 platform subrack
Version: all versions
Issue 5: alarm POWER_FAIL reported on the TN52SCC board
- Manually synchronize the NE time on the NMS.
- Based on the NMS operation logs, re-deliver the configurations that were added within 30 minutes before the power failure.
Workarounds:
Issue 5: Mask the POWER_FAIL alarm of the TN52SCC board on the NMS.
Other issues: None.
Preventive measures:
For the NE versions involved in this rectification notice, install the corresponding latest hot patches as follows:
- For non-mainstream versions:
Upgrade the non-mainstream versions to mainstream versions, and then install the corresponding latest hot patches by referring to [TN-R-201703] Notice on Rectification for Upgrading the Non-mainstream Versions of OptiX OSN 9800&9600, available at the following URL:
http://support.huawei.com/carrier/docview!docview?nid=SC2000006945&path=PBI1-7275726/PBI1-7275738/PBI1-7275807/PBI1-22318904/PBI1-21110042
- For mainstream versions:
Install the corresponding latest hot patches. The following table lists the mapping between mainstream versions and hot patch versions.
Mainstream Version | Hot Patch Version
V100R001C20SPC360 | V100R001C20SPC360SPH376
V100R002C10SPC200 | V100R002C10SPC200SPH330
V100R002C10SPC310 | V100R002C10SPC310SPH351
V100R002C10SPC300 | V100R002C10SPC300SPH350
V100R003C10SPC200 | V100R003C10SPC200SPH220
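When many NEs have to be checked, the mapping above can be expressed as a lookup. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that, for the same mainstream version, a hot patch with a higher SPH number supersedes one with a lower SPH number.

```python
import re
from typing import Optional

# Mainstream version -> latest hot patch, copied from the table above.
LATEST_HOT_PATCH = {
    "V100R001C20SPC360": "V100R001C20SPC360SPH376",
    "V100R002C10SPC200": "V100R002C10SPC200SPH330",
    "V100R002C10SPC310": "V100R002C10SPC310SPH351",
    "V100R002C10SPC300": "V100R002C10SPC300SPH350",
    "V100R003C10SPC200": "V100R003C10SPC200SPH220",
}


def hot_patch_for(mainstream_version: str) -> Optional[str]:
    """Return the latest hot patch for a mainstream version, or None if the
    version is not listed (non-mainstream: upgrade it first per TN-R-201703)."""
    return LATEST_HOT_PATCH.get(mainstream_version.strip().upper())


def _sph_number(patch: str) -> int:
    """Extract the trailing SPH number, e.g. 351 from '...SPC310SPH351'."""
    match = re.search(r"SPH(\d+)$", patch)
    if match is None:
        raise ValueError(f"no SPH suffix in {patch!r}")
    return int(match.group(1))


def needs_hot_patch(mainstream_version: str, installed_patch: Optional[str] = None) -> bool:
    """Return True if the NE still needs the latest hot patch from this notice.

    Assumption: a higher SPH number means a newer patch for the same version.
    """
    target = hot_patch_for(mainstream_version)
    if target is None:
        raise ValueError(
            f"{mainstream_version} is not a mainstream version in this notice; "
            "upgrade it to a mainstream version first"
        )
    if installed_patch is None:
        return True  # no hot patch installed yet
    return _sph_number(installed_patch) < _sph_number(target)
```

For example, `needs_hot_patch("V100R002C10SPC310", "V100R002C10SPC310SPH351")` returns False because the latest listed patch is already installed. Raising an error for unlisted versions, rather than returning False, keeps non-mainstream NEs from being silently skipped.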
Precautions for installing the hot patches:
The following operations must be performed to activate the hot patches for cross-connect boards:
Hot patch: OptiX OSN 9800 V100R001C20SPH376
When the active and standby system control boards report NO_ELABEL alarms, reset the standby system control board first. Wait 10 minutes, confirm that the standby system control board has started, and then reset the active system control board. After the active system control board has started, the cross-connect board patch is activated.
For NEs with master and slave subracks, follow the procedures in the Slave Subrack Cross-Connect Patch Notes.
Hot patches: OptiX OSN 9800 V100R002C10SPH330, V100R002C10SPH350, V100R002C10SPH351, and V100R003C10SPH210
When the active and standby system control boards report NO_ELABEL alarms, reset the standby system control board first. Wait 10 minutes, confirm that the standby system control board has started, and then perform a working/protection switchover of the active and standby system control boards. Wait 10 minutes, confirm that the switchover is successful, and then perform a working/protection recovery to switch the active and standby system control boards back. After the second switchover is successful, the cross-connect board patch is activated.
For other precautions, see the Precautions for Installing Hot Patches of OptiX OSN 9800 Products.
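The two activation procedures above can be kept as ordered checklists so that no step is skipped. The Python sketch below only encodes, as text, the steps stated in this notice; it does not operate on any NE, and the names RESET_PROCEDURE, SWITCHOVER_PROCEDURE, and activation_checklist are hypothetical. NEs with master and slave subracks are not covered and should follow the Slave Subrack Cross-Connect Patch Notes instead.

```python
from typing import Dict, List

# Steps for OptiX OSN 9800 V100R001C20SPH376: reset standby, then active.
RESET_PROCEDURE: List[str] = [
    "Confirm that the active and standby system control boards report NO_ELABEL alarms.",
    "Reset the standby system control board.",
    "Wait 10 minutes and confirm that the standby system control board has started.",
    "Reset the active system control board.",
    "Confirm that the active system control board has started; the patch is now active.",
]

# Steps for the V100R002C10 and V100R003C10 hot patches: switch over and back.
SWITCHOVER_PROCEDURE: List[str] = [
    "Confirm that the active and standby system control boards report NO_ELABEL alarms.",
    "Reset the standby system control board.",
    "Wait 10 minutes and confirm that the standby system control board has started.",
    "Perform a working/protection switchover of the system control boards.",
    "Wait 10 minutes and confirm that the switchover is successful.",
    "Perform a working/protection recovery to switch the system control boards back.",
    "Confirm that the second switchover is successful; the patch is now active.",
]

# Hot patch name (as written in the precautions table) -> procedure.
ACTIVATION_PROCEDURES: Dict[str, List[str]] = {
    "V100R001C20SPH376": RESET_PROCEDURE,
    "V100R002C10SPH330": SWITCHOVER_PROCEDURE,
    "V100R002C10SPH350": SWITCHOVER_PROCEDURE,
    "V100R002C10SPH351": SWITCHOVER_PROCEDURE,
    "V100R003C10SPH210": SWITCHOVER_PROCEDURE,
}


def activation_checklist(patch: str) -> List[str]:
    """Return the numbered activation steps for a cross-connect board hot patch."""
    steps = ACTIVATION_PROCEDURES.get(patch.strip().upper())
    if steps is None:
        raise KeyError(f"no activation procedure listed for {patch}")
    return [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
```

For example, `activation_checklist("V100R001C20SPH376")` yields five numbered steps, and each V100R002C10 or V100R003C10 patch yields seven. The lookup raises an error for patches that this notice does not list instead of guessing a procedure.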