
Notice on Rectification for the Version Upgrade of Cross-Connect Boards and System Control Boards on OptiX OSN 9800 and 9600

Latest reply: Aug 21, 2018 08:48:11
Keywords: OptiX OSN 9800, OptiX OSN 9600, cross-connect board, soft failure, PCIE

Summary: The following issues may occur when the cross-connect boards and system control boards of OptiX OSN 9800/9600 products are running:
  • Soft failure of the control logic on the cross-connect board
  • Failure to report alarms or perform switchovers in case of a PCIE abnormality
  • Alarm POWER_FAIL reported after the battery on the system control board is exhausted
These issues may interrupt services during normal live-network equipment running, SNCP switchovers, and MPN switchovers of the cross-connect boards and will result in many faulty cross-connect boards and system control boards. To prevent these issues, proactively upgrade the products to the latest patches of the corresponding mainstream version.

[Problem Description]

The following table lists the major issues. For detailed information of other issues, see the version release note.

Board: U2UXCS on OptiX OSN 9800/9600 U32/U64
Version: V100R001C20SPH372 and earlier versions

Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported and no switchover is performed when the cross-connect board is faulty.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
When a chip is faulty, a HARD_BAD alarm is reported and the cross-connect board is switched normally. The parameters of the HARD_BAD alarm (two SD5805 chips involved) on the U2UXCS board are as follows:
  • HARD_BAD, 0x0f,0x05,0x00,0xff,0xff,0xff
  • HARD_BAD, 0x0f,0x05,0x01,0xff,0xff,0xff

Issue 3: high failure rate of the PCI bus on the cross-connect board
When a PCI bus is faulty, a HARD_ERR alarm is reported but no switchover is triggered.
The parameters of the HARD_ERR alarm (two PCI buses involved) on the U2UXCS board are as follows:
  • HARD_ERR, 0x15,0x08,0x04,0xff,0xff,0xff
  • HARD_ERR, 0x15,0x08,0x05,0xff,0xff,0xff

Issue 4: abnormal SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack

Board: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Version: V100R002C10SPC200, V100R002C10SPC310

Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported and no switchover is performed when the cross-connect board is faulty.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
When a chip is faulty, a HARD_BAD alarm is reported and the cross-connect board is switched normally. The parameters of the HARD_BAD alarm (two SD5805 chips involved) on the U2UXCS board are as follows:
  • HARD_BAD, 0x0f,0x05,0x00,0xff,0xff,0xff
  • HARD_BAD, 0x0f,0x05,0x01,0xff,0xff,0xff

Issue 3: high failure rate of the PCI bus on the cross-connect board
When a PCI bus is faulty, a HARD_BAD alarm is reported but no switchover is triggered. The parameters of the HARD_BAD alarm (two PCI buses involved) on the U2UXCS board are as follows:
  • HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff
  • HARD_BAD, 0x15,0x08,0x05,0xff,0xff,0xff
The parameters of the HARD_BAD alarm (one PCI bus involved) on the S1UXCS board are as follows:
  • HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff

Version: V100R002C10SPC300, V100R003C10SPC200

Issue 1: soft failure of the control logic on the cross-connect board
No alarm is reported and no switchover is performed when the cross-connect board is faulty.

Issue 3: high failure rate of the PCI bus on the cross-connect board
When the PCI bus is faulty, a HARD_BAD alarm is reported and the cross-connect board is switched normally.
The parameters of the HARD_BAD alarm (two PCI buses involved) on the U2UXCS board are as follows:
  • HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff
  • HARD_BAD, 0x15,0x08,0x05,0xff,0xff,0xff
The parameters of the HARD_BAD alarm (one PCI bus involved) on the S1UXCS board are as follows:
  • HARD_BAD, 0x15,0x08,0x04,0xff,0xff,0xff

Board: TN52SCC on OptiX OSN 9800/9600 platform subrack
Version: All versions

Issue 5: alarm POWER_FAIL with parameter 0x05 reported on the TN52SCC board

The preceding table lists only the major issues that may occur in mainstream versions. The issues in non-mainstream versions are not described here. For details, see [TN-R-201703] Notice on Rectification for Upgrading the Non-mainstream Versions of OptiX OSN 9800&9600.

Trigger condition: An NE uses a software version that is involved in this rectification notice.

Symptom: For details, see section "Problem Description."

Identification method: Check whether the NE software version is involved in this rectification notice.

[Root Cause]

The following table lists the root cause of each issue.

Issue 1: soft failure of the control logic on the cross-connect board
Root cause: A soft failure will occur on the FPGA chip if it is affected by external electromagnetic radiation or interference. Services will be interrupted if the soft failure affects the cross-connect matrix on the cross-connect board.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
Root cause: Because of the IBM manufacturing process defect, a hard failure may occur on the RAM inside the SD5805 chip, resulting in the HARD_BAD alarm.

Issue 3: high failure rate of the PCI bus on the cross-connect board
Root cause: The reliability of the PCIE module inside the FPGA on the cross-connect board is not ensured and there is a low probability that the PCI bus becomes abnormal. A cold reset needs to be performed on the cross-connect board to resolve this issue.
Issue 4: suspension of the SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
Root cause: There is a software defect on the delivered cross-connect boards for V100R001C20SPC360. When an upgrade is performed in package or patch loading mode, the signals for controlling the page switching of the SD5805 chip will be abnormal for several seconds. As a result, there is a possibility of suspension of the automatic switching state machine of the SD5805 chip.

Issue 5: alarm POWER_FAIL reported on the TN52SCC board
Root cause: The battery discharge capability varies with temperature. The battery discharges faster under a higher temperature. If the battery ambient temperature is high (> 45°C) due to the chip heat dissipation during system control board running, the battery will be exhausted earlier than its expected lifespan. A POWER_FAIL alarm (with parameter 0x5) will be reported after the software detects battery insufficiency.

[Impact and Risk]

The following table lists the impacts and risks of each issue.

Board: U2UXCS on OptiX OSN 9800/9600 U32/U64
Version: V100R001C20SPH372 and earlier versions

Issue 1: soft failure of the control logic on the cross-connect board
Impact and risk: No alarm is reported. Services are interrupted after the cross-connect board failure.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
Impact and risk:
  • The cross-connect board failure rate is high.
  • The alarm is reported and the switchover is performed automatically. Services are not interrupted.

Issue 3: high failure rate of the PCI bus on the cross-connect board
Impact and risk:
  • The HARD_ERR alarm is reported, but the switchover is not performed automatically. Services are not interrupted.
  • New configurations cannot be delivered successfully.
  • Services are interrupted after the service protection group switchover.

Issue 4: abnormal SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
Impact and risk:
  • No alarm is reported. Services are not interrupted.
  • Services may be interrupted after the cross-connect board switchover.

Board: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Version: V100R002C10SPC200, V100R002C10SPC310

Issue 1: soft failure of the control logic on the cross-connect board
Impact and risk: No alarm is reported. Services are interrupted after the cross-connect board failure.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
Impact and risk:
  • The cross-connect board failure rate is high.
  • The alarm is reported and the switchover is performed automatically. Services are not interrupted.

Issue 3: high failure rate of the PCI bus on the cross-connect board
Impact and risk:
  • The HARD_ERR alarm is reported, but the switchover is not performed automatically. Services are not interrupted.
  • New configurations cannot be delivered successfully.
  • Services are interrupted after the service protection group switchover.

Version: V100R002C10SPC300, V100R003C10SPC200

Issue 1: soft failure of the control logic on the cross-connect board
Impact and risk: No alarm is reported. Services are interrupted after the cross-connect board failure.

Issue 3: high failure rate of the PCI bus on the cross-connect board
Impact and risk: The HARD_BAD alarm is reported, and the switchover is performed automatically. Services are not interrupted.

Board: TN52SCC on OptiX OSN 9800/9600 platform subrack
Version: All versions

Issue 5: alarm POWER_FAIL reported on the TN52SCC board
Impact and risk: Only the system control boards in the master subracks are affected.
  • The services are running normally without impact.
  • The time is changed to the year 1990 after a power failure. New configurations added within 30 minutes before the power failure will be lost.

[Measures and Solutions]

Recovery measures:

Board: U2UXCS on OptiX OSN 9800/9600 U32/U64
Version: V100R001C20SPH372 and earlier versions

Issue 1: soft failure of the control logic on the cross-connect board
Recovery measure: Perform a cold reset on the faulty cross-connect board.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
Recovery measure: None.
Issue 3: high failure rate of the PCI bus on the cross-connect board
Recovery measure: Perform a cross-connect board switchover and a cold reset on the faulty cross-connect board.

Issue 4: suspension of the SD5805 chip on the cross-connect board in a reserved slot of the U64 subrack
Recovery measure: Perform a cold reset on the faulty cross-connect board.

Board: U2UXCS on OptiX OSN 9800/9600 U32/U64; S1UXCS on OptiX OSN 9800/9600 U16
Version: V100R002C10SPC200, V100R002C10SPC310

Issue 1: soft failure of the control logic on the cross-connect board
Recovery measure: Perform a cold reset on the faulty cross-connect board.

Issue 2: high failure rate of the SD5805 chip on the cross-connect board
Recovery measure: None.

Issue 3: high failure rate of the PCI bus on the cross-connect board
Recovery measure: Perform a switchover and a cold reset on the faulty cross-connect board.

Board: TN52SCC on OptiX OSN 9800/9600 platform subrack
Version: All versions

Issue 5: alarm POWER_FAIL reported on the TN52SCC board
Recovery measure:
  • Manually synchronize the time on the NMS.
  • Based on the NMS operation logs, re-deliver the configurations added within 30 minutes before the power failure.

Workarounds:
  • Issue 5: Mask the POWER_FAIL alarm of the TN52SCC board on the NMS.
  • Other issues: None.

Preventive measures:

Install the corresponding latest hot patches for NE versions involved in this rectification notice as follows:
  • For non-mainstream versions: Upgrade the non-mainstream versions to mainstream versions, and then install the corresponding latest hot patches with reference to [TN-R-201703] Notice on Rectification for Upgrading the Non-mainstream Versions of OptiX OSN 9800&9600 at the following URL: http://support.huawei.com/carrier/docview!docview?nid=SC2000006945&path=PBI1-7275726/PBI1-7275738/PBI1-7275807/PBI1-22318904/PBI1-21110042
  • For mainstream versions: Install the corresponding latest hot patches.

The following table lists the mapping between mainstream versions and hot patch versions.
Mainstream Version      Hot Patch Version
V100R001C20SPC360       V100R001C20SPC360SPH376
V100R002C10SPC200       V100R002C10SPC200SPH330
V100R002C10SPC310       V100R002C10SPC310SPH351
V100R002C10SPC300       V100R002C10SPC300SPH350
V100R003C10SPC200       V100R003C10SPC200SPH220

Precautions for installing the hot patches:

The following operations must be performed to activate the hot patches for cross-connect boards:

Version: OptiX OSN 9800 V100R001C20SPH376
Procedure: When the active and standby system control boards report the NO_ELABEL alarms, reset the standby system control board first, wait for 10 minutes and confirm that the standby system control board has started, then reset the active system control board. When the active system control board has started, the patch for the cross-connect board is activated. For NEs in the master and slave subracks, follow the procedures in Slave Subrack Cross-Connect Patch Notes.

Version: OptiX OSN 9800 V100R002C10SPH330, OptiX OSN 9800 V100R002C10SPH350, OptiX OSN 9800 V100R002C10SPH351, OptiX OSN 9800 V100R003C10SPH210
Procedure: When the active and standby system control boards report the NO_ELABEL alarms, reset the standby system control board first, wait for 10 minutes and confirm that the standby system control board has started, then perform a working/protection switchover to switch the active and standby system control boards. Wait for 10 minutes and confirm that the switchover is successful, then perform a working/protection recovery to switch the active and standby system control boards again. When the second switchover is successful, the patch for the cross-connect board is activated.

For other precautions, see the Precautions for Installing Hot Patches of OptiX OSN 9800 Products.
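
The identification method and the version-to-patch mapping in this notice can be expressed as a small lookup. The following is only a minimal sketch, not a Huawei tool: the names `HOT_PATCHES`, `KNOWN_ALARMS`, `find_hot_patch`, and `classify_alarm` are hypothetical, and the input is assumed to be the base mainstream version string and the alarm parameters as a comma-separated hex string. Only the version strings and the alarm parameter values are taken from this notice. The parameters of POWER_FAIL (Issue 5) are not fully listed here, so that alarm is left out.

```python
# Mainstream version -> latest hot patch, copied from the mapping table in this notice.
HOT_PATCHES = {
    "V100R001C20SPC360": "V100R001C20SPC360SPH376",
    "V100R002C10SPC200": "V100R002C10SPC200SPH330",
    "V100R002C10SPC310": "V100R002C10SPC310SPH351",
    "V100R002C10SPC300": "V100R002C10SPC300SPH350",
    "V100R003C10SPC200": "V100R003C10SPC200SPH220",
}

# Exact alarm parameter sets quoted in [Problem Description].
# Anything not listed in the notice is deliberately treated as unknown.
KNOWN_ALARMS = {
    ("HARD_BAD", (0x0F, 0x05, 0x00, 0xFF, 0xFF, 0xFF)): "Issue 2",
    ("HARD_BAD", (0x0F, 0x05, 0x01, 0xFF, 0xFF, 0xFF)): "Issue 2",
    ("HARD_ERR", (0x15, 0x08, 0x04, 0xFF, 0xFF, 0xFF)): "Issue 3",
    ("HARD_ERR", (0x15, 0x08, 0x05, 0xFF, 0xFF, 0xFF)): "Issue 3",
    ("HARD_BAD", (0x15, 0x08, 0x04, 0xFF, 0xFF, 0xFF)): "Issue 3",
    ("HARD_BAD", (0x15, 0x08, 0x05, 0xFF, 0xFF, 0xFF)): "Issue 3",
}


def find_hot_patch(ne_version):
    """Return the hot patch listed for a mainstream version, or None.

    None means the version is not in the mainstream table; per the notice,
    a non-mainstream version must first be upgraded to a mainstream one.
    """
    return HOT_PATCHES.get(ne_version.strip().upper())


def classify_alarm(name, params):
    """Map an alarm name and its parameter string to an issue in this notice.

    `params` is assumed to look like "0x0f,0x05,0x00,0xff,0xff,0xff".
    Returns "Issue 2", "Issue 3", or None if the alarm is not listed.
    """
    values = tuple(int(p, 16) for p in params.split(","))
    return KNOWN_ALARMS.get((name.strip().upper(), values))


if __name__ == "__main__":
    print(find_hot_patch("V100R002C10SPC310"))
    print(classify_alarm("HARD_ERR", "0x15,0x08,0x04,0xff,0xff,0xff"))
```

Matching on the full parameter set rather than a prefix keeps the sketch from guessing at what the individual parameter bytes mean, which this notice does not define.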


Good share, but all the text is so congested.