System Reboot and Card Reset Troubleshooting

Posted: 2017-2-28 15:28:24 Latest reply: 2017-03-03 10:05:40
交换机在江湖 (official account)

1 General Fault Analysis Flowchart

An unexpected switch reboot or card reset interrupts running services. This section describes how to handle such problems. Once you know the possible causes, you can also take preventive measures against some of them to protect running services.

Figure 1-1 provides handling methods for a reboot of a fixed switch, a reset of all cards in a modular switch, and a reset of a single card.

Figure 1-1 General fault analysis flowchart

20170228151625817001.png

 

NOTE

If a switch fails to start after a reboot or a card fails to register after a reset, see the troubleshooting manual for switch startup and card registration failures.


2 Unexpected Reboot of a Fixed Switch

2.1 A Switch Starts Successfully After an Unexpected Reboot

2.1.1 Troubleshooting Flowchart

Figure 2-1 Troubleshooting flowchart for an unexpected reboot of a fixed switch

20170228151627101003.png

 

2.1.2 Diagnosis and Troubleshooting Procedures

2.1.2.1 Checking the Switch Model and Version

Step 1  Run the display device command to check the model and status of the switch.

<HUAWEI> display device
S5720-56C-HI-AC's Device status:
Slot Sub  Type             Online    Power      Register     Status   Role
---------------------------------------------------------------------------
0    -    S5720-56C-HI-AC  Present   PowerOn    Registered   Normal   Master

The command output shows that the switch model is S5720-56C-HI-AC.

Step 2  Run the display version command to check the software version running on the switch.

<HUAWEI> display version 
Huawei Versatile Routing Platform Software
VRP (R) software, Version 5.160 (S5720 V200R008C00)
Copyright (C) 2000-2015 HUAWEI TECH CO., LTD
HUAWEI S5720-56C-HI-AC Routing Switch uptime is 0 week, 1 day, 3 hours, 24 minutes
ES5D2T52C001 0(Master) : uptime is 0 week, 1 day, 3 hours, 23 minutes
4095M bytes DDR Memory
64M bytes FLASH
Pcb      Version :  VER.A
Basic  BootROM  Version : 0208.0015 Compiled at Mar 20 2014 , 22:53:47
BootLoad  Version : 0208.0015 Compiled at Mar 14 2014 , 13:33:43
CPLD   Version : 0100
Software Version : VRP (R) Software, Version 5.160 (V200R008C00)
CARD2 information
Pcb      Version : ES5D21X04S01 VER.A
PWR1 information
Pcb      Version : PWR VER.A 
 
 

The command output shows that the software version is V200R008C00.

----End
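When the same check is repeated across many switches, the version string can be pulled out of saved display version output with a short script. The following is a minimal sketch, not a vendor tool: the function names parse_version and log_directory are hypothetical, and the version comparison assumes that V/R/C fields order numerically. The directory names come from the "Obtaining Log Files" part of this document.

```python
import re

# Matches a V/R/C version string such as V200R008C00.
VERSION_RE = re.compile(r"V(\d{3})R(\d{3})C(\d{2})")


def parse_version(display_version_output):
    """Return (V, R, C) as integers from display version output, or None."""
    match = VERSION_RE.search(display_version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def log_directory(version):
    """Pick the log directory this document names for a given version."""
    # flash:/logfile/ applies to V200R005C00 and later versions;
    # earlier versions use flash:/syslogfile/.
    if version >= (200, 5, 0):
        return "flash:/logfile/"
    return "flash:/syslogfile/"


sample = "VRP (R) software, Version 5.160 (S5720 V200R008C00)"
version = parse_version(sample)
print(version, log_directory(version))
```

Comparing tuples keeps the check readable and avoids string comparison of version names.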

2.1.2.2 Checking the Cause of Reboot

Step 1  Run the display reboot-info command to check the reboot information.

<HUAWEI> display reboot-info
Slot ID   Times          Reboot Type          Reboot Time(DST)   
==================================================================
0         1              POWER               2013/07/18 19:19:56 
0         2              SCHEDU              2013/07/18 18:51:04 
0         3              SOFTWARE            2013/07/18 18:41:22 
0         4              EXCEPTION           2013/07/18 17:38:26 
0         5              MANUAL              2013/07/18 17:31:14 
0         6              MANUAL              2013/07/18 17:26:01 
0         7              EXCEPTION           2013/07/18 17:03:28 
==================================================================
Total   7

Table 2-1 Description of the display reboot-info command output

Item | Description
Slot ID | Stack ID (stacking function enabled) or slot ID (stacking function not enabled) of the switch.
Times | Number of reboots.
Reboot Type | Reboot type, which can be MANUAL, POWER, SCHEDU, FSP, EXCEPTION, VRP, SOFTWARE, or OTHER.
Reboot Time(DST) | Time when a reboot occurs.

If the switch does not support a real-time clock (RTC), it synchronizes its system clock with the clock source on the network 120 seconds after Network Time Protocol (NTP) is configured. Until synchronization completes, or if synchronization fails, the factory default clock time is displayed.
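If the command output is saved to a text file, the rows can be turned into records for sorting or filtering. The following is a minimal sketch that assumes the column layout shown in the sample output above; the function name parse_reboot_info and the shortened SAMPLE text are illustrative only.

```python
import re
from datetime import datetime

# One data row: slot ID, reboot count column, reboot type, date and time.
ROW_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([A-Z]+)\s+"
    r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*$"
)


def parse_reboot_info(output):
    """Return one dict per data row of display reboot-info output."""
    records = []
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # prompt, header, separator, or Total line
        slot, times, reboot_type, stamp = match.groups()
        records.append({
            "slot": int(slot),
            "times": int(times),
            "type": reboot_type,
            "time": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
        })
    return records


# Shortened copy of the sample output, for illustration.
SAMPLE = """\
Slot ID   Times          Reboot Type          Reboot Time(DST)
==================================================================
0         1              POWER               2013/07/18 19:19:56
0         4              EXCEPTION           2013/07/18 17:38:26
0         5              MANUAL              2013/07/18 17:31:14
==================================================================
Total   3
"""

for record in parse_reboot_info(SAMPLE):
    print(record["slot"], record["type"], record["time"].isoformat())
```

Note that on a switch without an RTC the parsed time may be the factory default clock time, so treat timestamps with care.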

 

Step 2  Analyze the cause of the reboot and take corresponding measures. Table 2-2 describes the eight reboot types that can be displayed in the display reboot-info command output.

Table 2-2 Reboot types, causes, and handling methods

Reboot Type: MANUAL
Cause: The switch was manually rebooted using the reboot command or the network management system (NMS).
Handling Method: Check whether any authorized user has rebooted the switch.

Reboot Type: POWER
Cause: The switch was power cycled, usually by unplugging and plugging its power cable.
Handling Method: Check alarms on the switch, the power module of the switch, and the power supply environment to determine whether the reboot was caused by any of the following:
• The switch has been manually powered off.
• The external power supply system is unstable.
• The power module of the switch has failed.

Reboot Type: SCHEDU
Cause: The schedule reboot command was executed to reboot the switch at a specified time.
Handling Method: This is a normal reboot and requires no action.

Reboot Type: FSP
Cause: The reboot was caused by a stack split, a stack merge, or incorrect Mod ID allocation.
Handling Method: Check alarms and logs on the switch to locate the problem.

Reboot Type: EXCEPTION
Cause: An exception or deadloop occurred.
Handling Method: Check alarms and logs on the switch to locate the problem.

Reboot Type: VRP
Cause: The reboot was caused by the VRP software platform.
Handling Method: Check alarms and logs on the switch to locate the problem.

Reboot Type: SOFTWARE
Cause: The reboot was caused by other software-related factors that can be traced.
Handling Method: Check alarms and logs on the switch to locate the problem.

Reboot Type: OTHER
Cause:
• A hardware component, such as the flash or memory, has failed.
• The equipment has overheated.
• The switch was power cycled momentarily, for example, because the power cable has poor contact or a transient overvoltage or loss of voltage occurred. In this case, check whether the power cable is correctly connected to the switch.
• The reboot was caused by other reasons that cannot be categorized into the preceding types. For example, the switch rebooted after joining a stack.
Handling Method:
• See section 2.1.2.4 Check Switch Appearance and Environment.
• Check alarms and logs on the switch to locate the problem.

 

----End
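The mapping from reboot type to next action in Table 2-2 can be captured as a lookup for quick triage of many reboot records. This is an illustrative sketch only: the NEXT_ACTION wording paraphrases the table, and the function name triage is hypothetical.

```python
from collections import Counter

# Reboot types from Table 2-2 mapped to a short next action.
NEXT_ACTION = {
    "MANUAL": "Check whether an authorized user rebooted the switch.",
    "POWER": "Check alarms, the power module, and the power supply environment.",
    "SCHEDU": "Normal scheduled reboot; no action required.",
    "FSP": "Check alarms and logs (stack split, merge, or Mod ID allocation).",
    "EXCEPTION": "Check alarms and logs (exception or deadloop).",
    "VRP": "Check alarms and logs (VRP software platform).",
    "SOFTWARE": "Check alarms and logs (traceable software cause).",
    "OTHER": "Check appearance and environment, then alarms and logs.",
}


def triage(reboot_types):
    """Return (type, count, action) tuples, most frequent type first."""
    counts = Counter(reboot_types)
    fallback = "Unknown type; check alarms and logs."
    return [
        (reboot_type, count, NEXT_ACTION.get(reboot_type, fallback))
        for reboot_type, count in counts.most_common()
    ]


# Reboot types taken from the sample display reboot-info output.
observed = ["POWER", "SCHEDU", "SOFTWARE", "EXCEPTION",
            "MANUAL", "MANUAL", "EXCEPTION"]
for reboot_type, count, action in triage(observed):
    print(f"{reboot_type:<10} x{count}  {action}")
```

Sorting by frequency puts a recurring cause, such as repeated EXCEPTION reboots, at the top of the list.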

2.1.2.3 Checking Alarms

How to Check Alarms on a Switch

If a switch fails or cannot operate normally because the environmental conditions do not meet operating requirements, it will generate alarm messages depending on the type of the problem.

Use either of the following methods to view alarm messages:

• Log in to the network management system (for example, eSight) to view alarm messages.

• Run the display trapbuffer [ size value ] command on the CLI of the switch to view alarm messages in the trap buffer.

The value parameter determines the maximum number of alarm messages that can be displayed in the command output. If the actual number of alarm messages is smaller than the specified value, all the available alarm messages are displayed.

<HUAWEI> display trapbuffer
Trapping buffer configuration and contents : enabled                            
Allowed max buffer size : 1024                                                  
Actual buffer size : 256                                                         
Channel number : 3 , Channel name : trapbuffer                                  
Dropped messages : 0                                                            
Overwritten messages : 6248                                                      
Current messages : 256                                                                                                              
#Sep 19 2012 04:38:03+08:00 HUAWEI DS/4/DATASYNC_CFGCHANGE:OID 1.3.6.1.4.1.2011
.5.25.191.3.1 configurations have been changed. The current change number is 8, 
the change loop count is 0, and the maximum number of records is 4095.          
#Sep 19 2012 04:37:39+08:00 HUAWEI LINE/5/VTYUSERLOGIN:OID 1.3.6.1.4.1.2011.5.2
5.207.2.2 A user login. (UserIndex=34, UserName=VTY, UserIP=10.135.18.114, UserC
hannel=VTY0)                                                        
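Alarm entries in this output are wrapped at the terminal width, which splits words such as UserChannel across lines. The following is a minimal sketch for rejoining wrapped entries and extracting each alarm ID from captured output. It assumes every entry starts with "#" as in the sample; the function names are hypothetical, and a space that falls exactly on a wrap boundary may be lost when lines are rejoined.

```python
import re

# Alarm ID (MODULE/SEVERITY/MNEMONIC) follows the timestamp and host name.
ALARM_RE = re.compile(r"^#(.+?) (\S+) ([A-Z0-9_]+)/(\d)/([A-Za-z0-9_]+):")


def unwrap_traps(output):
    """Join terminal-wrapped lines so that each alarm is one string."""
    entries = []
    for line in output.splitlines():
        if line.startswith("#"):
            entries.append(line.rstrip())
        elif entries:
            # Continuation of the previous alarm entry.
            entries[-1] += line.rstrip()
    return entries


def alarm_ids(output):
    """Return the alarm ID of every entry, in display order."""
    ids = []
    for entry in unwrap_traps(output):
        match = ALARM_RE.match(entry)
        if match:
            ids.append("/".join(match.group(3, 4, 5)))
    return ids


SAMPLE = """\
Current messages : 256
#Sep 19 2012 04:38:03+08:00 HUAWEI DS/4/DATASYNC_CFGCHANGE:OID 1.3.6.1.4.1.2011
.5.25.191.3.1 configurations have been changed. The current change number is 8,
the change loop count is 0, and the maximum number of records is 4095.
#Sep 19 2012 04:37:39+08:00 HUAWEI LINE/5/VTYUSERLOGIN:OID 1.3.6.1.4.1.2011.5.2
5.207.2.2 A user login. (UserIndex=34, UserName=VTY, UserIP=10.135.18.114, UserC
hannel=VTY0)
"""

print(alarm_ids(SAMPLE))
```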

You can also use the following commands to check specific types of alarm messages:

• display alarm urgent: displays alarm messages about hardware exceptions, such as exceptions of equipment temperature, fans, and chips.

• display alarm active: displays alarms that have not been cleared since the switch started.

• display alarm history: displays historical alarms recorded since the switch started.

Common Alarms About Switch Reboots and Handling Methods

Table 2-3 Common alarms about switch reboots and handling methods

20170228160932773001.png
20170228160932943002.png
20170228160933910003.png
20170228160934484004.png
20170228160935919005.png
20170228160936556006.png
20170228160937379007.png

NOTE

The following tips will help you quickly find the reference information for a specific alarm:

• An alarm ID uniquely identifies an alarm. You can search for the ID of an alarm in the Alarm Reference to find the meaning of the alarm and the handling procedure.

• Alarms with the same ID but triggered by different causes are identified by different error codes (for example, BaseTrapProbableCause). You can search for the error code in the Alarm Reference.

• You can also use the information query tool to query alarm information.

Do not search for alarms based on variables, such as alarm generation time, interface number, process ID, and device name.

2.1.2.4 Check Switch Appearance and Environment

If the cause of reboot displayed in the display reboot-info command output is POWER or OTHER, or alarms on power supply, fan module, or temperature are found, check the appearance and operating environment of the switch to locate the fault.

Checking Whether the Reboot Is Caused by a Power Supply Exception

Step 1  Determine whether a power failure occurred around the reboot time. Check the following:

• Whether any operations caused the switch to be powered off

• Whether any exceptions are recorded in logs of the UPS (if the switch is powered by a UPS)

• Whether other devices in the same rack or powered by the same power supply system were powered off at that time

• Whether any high-power device was connected to the network at that time

• Whether any power lines are aged or loose

• Whether the input voltage is in the normal range (measure using a multimeter)

If any of the preceding situations exists, take measures to fix the problem of the external power supply system.

Step 2  Check whether the ports of the switch show obvious yellow marks. If some RJ45 connectors have turned yellowish, the switch may have been hit by a lightning strike or surge current, which can damage CPU chips. In this case, go to section 2.1.2.6 Contacting Technical Support.

Step 3  If the external power supply system is normal, check whether the power module of the switch is faulty.

• If the switch uses a built-in power module that cannot be replaced for cross testing, check whether there are alarms about the power module. If so, go to section 2.1.2.6 Contacting Technical Support.

• If the switch uses a pluggable power module, check whether it is securely installed in the slot. After obtaining the customer's approval, move the suspect power module to the other power slot or replace it with another power module to determine whether the power module or the switch itself is faulty. If the power module is faulty, replace it. If the switch is faulty, go to section 2.1.2.6 Contacting Technical Support.

----End

Checking Whether the Reboot Is Caused by High Temperature or Failure of Fans

Step 1  Check whether the operating temperature is in the normal range (generally 0°C to 45°C). If the temperature is too high, lower the temperature in the equipment room.

Step 2  Check whether the cooling airflow of the switch is blocked. If nearby obstacles impede cooling of the switch, remove them and check whether the equipment temperature drops to the normal range.

Step 3  If the switch uses the forced air cooling mode, check whether its fan modules have been removed or are loosely installed.

Step 4  If the fan modules are securely installed, check whether the fans are running normally and whether air flows out of the exhaust vents. If the fans are not running or fan alarms are generated, replace the fan module. If the switch uses built-in fans, go to section 2.1.2.6 Contacting Technical Support.

----End

2.1.2.5 Checking Logs

If the procedures described in the preceding sections cannot locate the cause of the reboot, check logs on the switch.

How to Check Logs on a Switch

The log module of the system software logs events occurring during system operations. Logs are reference information for system diagnosis and maintenance, and help you check the equipment running status, analyze network condition, and locate faults.

To check logs on a switch, log in to the switch through the console port or using Telnet, and then run the display logbuffer command. You can also save log information on the switch and use the syslog protocol to export logs to a log server.

# Run the display logbuffer command to check all logs in the log buffer.

<HUAWEI> display logbuffer
Logging buffer configuration and contents : enabled                  
Allowed max buffer size : 1024                                        
Actual buffer size : 512                                             
Channel number : 4 , Channel name : logbuffer                        
Dropped messages : 0                                                 
Overwritten messages : 0                                             
Current messages : 43                                                
 
Oct 16 2013 06:06:48 HUAWEI %VFS/4/DISKSPACE_NOT_ENOUGH(l)[3]: Disk space is insufficient. The system begins to delete unused log files. 
Oct 10 2013 19:06:48 HUAWEI %VFS/4/DISKSPACE_NOT_ENOUGH(l)[4]: Disk space is insufficient. The system begins to delete unused log files.                                                     
  ---- More----

Common Reboot-Related Logs and Handling Methods

Table 2-4 Common reboot-related logs and handling methods

Digest

Log Description

Possible Cause

Handling Method

FSP/4/ID_ASSIGNED

The master switch assigns two different stack IDs to a slave switch.

The slave switch restarts due to an exception.

The problematic switch will restart automatically. If the fault persists after the restart, go to section 2.1.2.6 Contacting Technical Support.

FSP/4/COLLECT_TIMEOUT

The connection with a slave switch times out.

The slave switch does not work normally.

FSP/4/SPDU_LOST_NOTRUN

Heartbeat packets from the master switch are lost on a slave switch when the slave switch is in non-running state.

An exception occurs on the slave switch or the master switch interface connected to the slave switch is faulty.

FSP/4/SPDU_LOST

SPDUs from the master switch are lost on a stack member switch.

SPDUs from the master switch are lost on a stack member switch.

FSP/4/LOST_IDENTIFY

The master switch cannot identify a stack member switch.

The switch cannot join the stack.

The unidentifiable switch will restart automatically. Check whether the stack configuration is correct.

FSP/4/TOPO_CHANGE

The stack topology has changed (from ring to chain or from chain to ring).

A switch has left or joined the stack system.

1. Check whether any user has triggered a stack split. If so, ignore the log. If not, go to the next step.

2. Check whether the member switch that has left the stack is powered off. If so, power it on. The switch will then join the stack and complete the merge process automatically. If the switch is not powered off, go to the next step.

3. Check whether the stack configuration has been mistakenly deleted or modified.

Run the display stack current-configuration command to check whether the stack configuration of the member switch has been modified. If so, restore the original configuration and check whether the switch joins the stack again. If the switch does not join the stack or its stack configuration has not been modified, go to the next step.

4. Check whether the stack link is faulty.

Run the display interface stack-port command to check whether the stack link connected to the member switch is Up and whether packets are sent and received normally on the stack port. If the stack port is Down or the number of packets sent and received on the stack port is small, the stack link on the port is faulty. Replace the stack cable or optical module on the stack port.

If the stack link is normal but the member switch still cannot join the stack, go to the next step.

5. Go to section 2.1.2.6 Contacting Technical Support.

FSP/4/NBR_LOST

The neighbor of a stack port is found lost.

A member switch has left the stack or failed.

FSP/4/STACK_LEAVE

A member switch has left the stack.

The stack of the switch is Down.

LOAD/6/CLIENTLEFT

A new member switch left the stack when it was downloading the system software. The log also provides the ID of the stack member switch that provided the system software for the new member switch.

During stack setup or merging, a new member switch that uses a different system software version from others will request for the system software from an adjacent switch that has downloaded the system software. If the new member switch is powered off or its stack cable does not work when it is downloading the system software, this log is recorded in the stack system.

LOAD/6/SLOTLEFT

A member switch has left the stack.

The stack splits or a member switch is removed from the stack.

MAD/4/CONFLICT_DETECT

A multi-active condition is detected.

More than one master switch exists due to a stack link failure.

Rectify the stack link failure.

FSP/4/SWTICH_REBOOTING

A member switch restarts when multiple stacks are merging.

During the stack merging process, a member switch that fails the master competition restarts and joins the new stack.

This is a normal situation, and no action is required.

SRM/3/REF_CLK_FAULT

The reference clock on the XAUI interface has failed. The switch may restart if the fault persists for long.

The reference clock on the XAUI interface has failed.

Go to section 2.1.2.6 Contacting Technical Support.

 

NOTE

The following tips will help you quickly find the reference information for a specific log:

• A digest uniquely identifies a log. You can search for the digest of a log in the Log Reference to find the meaning of the log and the handling procedure.

• Do not search for logs using variables, such as log generation time, interface number, process ID, and device name.

Example:

To find reference information for the log Apr 27 2014 07:45:35 HUAWEI %SHELL/4/LOGIN_FAIL_FOR_INPUT_TIMEOUT(s)[6]:Failed to log in due to timeout.(Ip=10.135.19.157, UserName=**, Times=1, AccessType=TELNET, VpnName=), search for the digest LOGIN_FAIL_FOR_INPUT_TIMEOUT in the Log Reference. You will then find the explanation of the log: After entering a user name or password, a user failed to log in because of a timeout.
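Separating the digest from the variable fields can be automated when many log lines need to be looked up. The following is a minimal sketch that assumes the %MODULE/SEVERITY/DIGEST(x)[n]: layout seen in the logs quoted in this document; other log formats may need a different pattern, and the function name log_digest is hypothetical.

```python
import re

# %MODULE/SEVERITY/DIGEST(type)[sequence]: as in the logs quoted above.
LOG_RE = re.compile(r"%([A-Z0-9_]+)/(\d)/([A-Za-z0-9_]+)\((\w)\)\[(\d+)\]:")


def log_digest(line):
    """Return (module, severity, digest) for a log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    module, severity, digest = match.group(1, 2, 3)
    return module, int(severity), digest


example = (
    "Apr 27 2014 07:45:35 HUAWEI %SHELL/4/LOGIN_FAIL_FOR_INPUT_TIMEOUT(s)[6]:"
    "Failed to log in due to timeout.(Ip=10.135.19.157, UserName=**, "
    "Times=1, AccessType=TELNET, VpnName=)"
)
print(log_digest(example))
```

The returned digest is the search key for the Log Reference; the timestamp, IP address, and other variables are deliberately discarded.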

2.1.2.6 Contacting Technical Support

If you cannot locate the cause of a switch reboot, collect the related information and send it to a Huawei agent or to Huawei for fault location.

Collect the following information:

• Fault occurrence time, network topology of the failure point (for example, the upstream and downstream devices connected to the failure point, and location of the failure point), operations performed before the fault occurs, measures taken to handle the fault and results of the measures, fault symptom, and impact on services.

• Name, version, and current configuration of the faulty device, as well as related interface information. For details, see Collecting Diagnostic Information Using One Command.

• Logs recorded when the fault occurred.

• If a switch fails to start after a reboot, collect the serial port information printed during the startup process.

Collecting Diagnostic Information Using One Command

The display diagnostic-information command provides outputs of multiple commonly used display commands. You can use this command to view diagnostic information about a switch, including the startup configuration, current configuration, interface information, time, and system software version. It is an effective information collection tool.

The display diagnostic-information [ file-name ] command displays diagnostic information on screen or exports it to a .txt file. If you do not specify the file-name parameter, the command displays diagnostic information on screen. If you specify a file name, diagnostic information is saved in a .txt file with the specified name. It is recommended that you export diagnostic information to a .txt file. The following is an example:

<HUAWEI> display diagnostic-information dia-info.txt
  This operation will take several minutes, please wait.........................
Info: The diagnostic information was saved to the device successfully.

The .txt file is saved in flash:/ by default. You can run the dir command in the user view to check whether the .txt file exists.

If diagnostic information is displayed on screen, you can press Ctrl+C to stop the display.

This command is used to collect diagnostic information for fault location. Executing it may affect system performance, for example, by causing high CPU usage. Therefore, do not run this command when the switch is running normally, and do not run it on multiple terminals connected to the switch at the same time. Otherwise, the CPU usage of the switch will increase sharply and system performance will deteriorate.

Commonly used terminal software supports information output to a specified file. For example, if you are using the HyperTerminal software of a Windows operating system, choose Transfer > Capture Text, enter the file name, and click Start. After that, run the display diagnostic-information command. Then all diagnostic information is displayed on the terminal screen and automatically saved in a file in the specified path.

Obtaining Log Files

Logs and alarms of a switch can be saved in log files. Perform the following steps to obtain log files:

1. Run the save logfile command to save information in the log buffer to log files.

2. Upload all files in flash:/syslogfile/ (or flash:/logfile/ in V200R005C00 or a later version) and flash:/resetinfo/ to your computer using FTP or TFTP.

NOTE

For a stack split or reset problem, collect log files in all the stack member switches.
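The file transfer in step 2 can be scripted from the computer side with Python's standard ftplib module. The following is a rough sketch under several assumptions that this document does not confirm: the FTP server function is enabled on the switch, the FTP root corresponds to flash:/, and the directories contain only files. The function names and the local directory name switch-logs are hypothetical.

```python
import os
from ftplib import FTP


def download_dir(ftp, remote_dir, local_dir):
    """Download every entry of remote_dir over an open FTP session."""
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    saved = []
    for name in ftp.nlst():
        path = os.path.join(local_dir, os.path.basename(name))
        with open(path, "wb") as handle:
            # Assumes each entry is a regular file; RETR fails on directories.
            ftp.retrbinary("RETR " + name, handle.write)
        saved.append(path)
    return saved


def collect_logs(host, user, password, local_root="switch-logs"):
    """Fetch the log and reset-information directories named in step 2."""
    # Assumption: the FTP root maps to flash:/ on the switch. Use
    # "/syslogfile" instead of "/logfile" for versions before V200R005C00.
    with FTP(host) as ftp:
        ftp.login(user, password)
        for remote in ("/logfile", "/resetinfo"):
            target = os.path.join(local_root, remote.strip("/"))
            download_dir(ftp, remote, target)

# Example call (requires a reachable switch and valid credentials):
# collect_logs("192.0.2.1", "ftpuser", "password")
```

For a stack, run the collection once per member switch so that no log files are missed.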

2.2 A Switch Fails to Start After an Unexpected Reboot

If a switch fails to start after a reboot, determine the cause of the problem from the information displayed during the startup process. Most such startup failures are caused by hardware failures or by a missing or damaged startup software package. This section describes typical symptoms and causes of startup failures and provides solutions. For solutions to other types of startup failures, see the startup failure troubleshooting manual.

Fault Symptom 1

A switch restarts repeatedly and prints any of the following information during startup processes:

DRV_Arch_Init: chip_init ret 1
 root <cx_lsw_init.c,5554> DRV_Lsw_Init: DRV_Arch_Init Fail!
BIOS LOADING ...
Copyright (c) 2008-2010 HUAWEI TECH CO., LTD.
(Ver127, Jan 18 2011, 22:45:47)
 
Press Ctrl+B to enter BOOTROM menu... 0
Auto-booting...
Update Epld file ............................ None
Decompressing VRP software .................. done
USB2 Host Stack Initialized.
USB Hub Driver Initialized
USBD  Wind River Systems, Inc. 512 Initialized
EHCI Controller found.
Waiting to attach to USBD...
USB_MODE_REG=0x3
Done.
usbPegasusEndInit () returned OK
0x62ffe68 (tRootTask): usbBulkDevInit() returned OK
logTask: 6 log messages lost.
 
root <cx_lsw_init.c,4634> DRV_PDT_Func_Init: DRV_VLAN_AddMember ret 8
root <cx_lsw_init.c,5634> DRV_Lsw_Init: DRV_PDT_Func_Init Fail!
--------------------------------------------------------------------
soc num 1, port num 28
soc 0 info:
modid 0, devid 0xdd74, venid 0x11ab, bar0 0xf4000000, bar1 0x0
--------------------------------------------------------------------

Or:

Initializing LSW ....................... failed

Or:

Drv_Lsw_Probe: Warning: Not All Chip Probed!

Or:

Error: Some LSW chips are not detected

The preceding information indicates LSW chip exceptions, which are usually LSW initialization failures. Contact technical support in this case.

Fault Symptom 2

A switch reboots unexpectedly during operations but fails to start. It prints the following information during startup processes:

BIOS LOADING ...
BIOS LDDR SDRAM test ...............fail
Error type: Data bus walk 0
Error bus : MDQ 0x0000000B

The printed information indicates that the LDDR SDRAM test fails. This is generally caused by the failure of the double data rate (DDR) memory or CPU. Contact technical support in this case.

Fault Symptom 3

A switch reboots unexpectedly during operations but fails to start. Formatting and erasing the flash fail during the startup process.

FILESYSTEM SUBMENU
1. Erase Flash
2. Format flash
3. Delete file from Flash
4. Rename file from Flash
5. Display Flash files
6. Update EPLD file
7. Return to main menu
Enter your choice(1-7): 2
Note: Format flash will damage Flash file system.
Format flash? Yes or No(Y/N): y
Formatting Flash, please waiting several minutes. Track_record_number 29.9
format failed!
FILESYSTEM SUBMENU
1. Erase Flash
2. Format flash
3. Delete file from Flash
4. Rename file from Flash
5. Display Flash files
6. Update EPLD file
7. Return to main menu
Enter your choice(1-7): 1
Note: Erasing flash will damage Flash file system.
After erasing Flash, you should reset your system.
Erase flash? Yes or No(Y/N): y
Erase flash ...Erase failed!!

The common cause of this problem is that the flash memory is faulty. Contact technical support in this case.

Fault Symptom 4

A switch restarts repeatedly and prints either of the following information during startup processes:

Begin to start the system, please waiting ......
INSTALL IPC AND VP DRIVER........OK
VOS VFS init.....................OK
Startup File Check...............OK
Paf File Read....................OK
VOS monitor init.................OK
CFM init advance.................OK
PAT init ........................OK
HA S2M init......................OK
VOS VFS init hind ...............OK
VRP_Root begin...
VRP_InitializeTask begin...
Init the Device Link.............OK
CFG_PlaneInit begin..............OK
CFM_Init begin...................OK
CLI_CmdInit begin................OK
VRP_RegestAllLINKCmd begin.......OK
create task begin................
task init begin...
ECMM.........................................................................RUN
cmd register begin...
cmd register end...
Recover configuration...
Error: PoE driver init fail.

Or:

Recover configuration...
Error: Failed to initialize the PoE chips

The preceding information indicates a PoE initialization failure, which is caused by either of the following:

• The PoE power module does not provide -53 V output.

• The PoE module in the switch is faulty.

Step 1  Check whether a non-PoE power module is used. If so, replace it with a PoE power module.

Step 2  If a PoE power module is used, perform a cross test to check whether the power module is faulty. (Replace the PoE power module with another one or install the suspect PoE power module in another switch.)

Step 3  If the switch uses a built-in power module that cannot be replaced for cross testing, contact technical support.

----End

Fault Symptom 5

A switch restarts repeatedly and prints the following information during startup processes:

BIOS
Register Contents when exception occur:
sr = 0x0040FB7E       cause = 0x0000FB7E         epc = 0xBFC0FFFE
badVAdrs = 0xBFC0FFFE    at = 0x0000FFFE          v0 = 0xBFC0FFFE
v1 = 0x0000FFFE          a0 = 0x0000FFFE          a1 = 0x0000FFFE
a2 = 0x8000FFFE          a3 = 0x0000FFFE          t0 = 0xB800FFFE
t1 = 0x0000FFFE          t2 = 0x0000FFFE          t3 = 0xFFFFFFFE
t4 = 0x0000FFFE          t5 = 0x0001FFFE          t6 = 0xFFFFFFFE
t7 = 0xBFC0FFFE          t8 = 0x0000FFFE          t9 = 0xBFC0FFFE
s0 = 0xFFFFFFFE          s1 = 0x0000FFFE          s2 = 0xF7FDFFFE
s3 = 0xFFDDFFFE          s4 = 0xFFFFFFFE          s5 = 0xFFFFFFFE
s6 = 0x0000FFFE          s7 = 0xFBFFFFFE          k0 = 0x0000FFFE
k1 = 0x5555FFFE          gp = 0xDFEDFFFE          ra = 0xBFC0FFFE

This may be a hardware fault or BootROM damage. Possible causes include abnormal voltage during write or read operations, surge current, or electrostatic discharge. Contact technical support in this case.

Fault Symptom 6

A switch restarts repeatedly and the printed information contains Nand flash errors:

Press Ctrl+B to enter BOOTROM menu... 0
Auto-booting...
Loading[flash:/S5700LI-V200R001C00SPC300.cc].............
Assert at file: 'E://V2R1_Main_1//product//BSP//bsp//drv//flash//nand//nflash.c', Line: 620
 
Assert at file: 'E://V2R1_Main_1//product//BSP//bsp//drv//flash//nand//nflash.c', Line: 620
 
Assert at file: 'E://V2R1_Main_1//product//BSP//bsp//drv//flash//nand//nflash.c', Line: 620
 
Assert at file: 'E://V2R1_Main_1//product//BSP//bsp//drv//flash//nand//nflash.c', Line: 620

This problem occurs when the Nand flash detects bit flipping. Perform the following steps to rectify the fault:

Step 1  Use the BootROM menu to erase the flash (in V200R003 or a later version).

Step 2  Format the flash.

         BootLoad Menu                                            
     1. Boot with default mode                                    
     2. Enter serial submenu                                      
     3. Enter startup submenu                                      
     4. Enter ethernet submenu                                    
     5. Enter filesystem submenu                                  
     6. Enter password submenu                                    
     7. Clear password for console user                           
     8. Reboot                                                    
              
    Enter your choice(1-8): 5                                     
                                                                  
        FILESYSTEM SUBMENU                                        
     1. Erase Flash                                               
     2. Format Flash                                             

Step 3  Load the software package matching the BootROM to the switch again.

----End

Fault Symptom 7

A switch restarts repeatedly, and the information printed during startup processes indicates that the board type cannot be obtained.

BIOS LOADING ...                                                     
Can not get board information by GPIO, Please Check!                 
Don't support board type(0x0)!                                  
Copyright (c) 2008-2010 HUAWEI TECH CO., LTD.                        
(Ver128, Aug 24 2010, 21:58:24)                                       
Press Ctrl+B to enter BOOTROM menu ...                              
Auto-booting...                                                      
Please confirm app file typeID[0x0]!                                
Invalid package file!                                                 
Auto-booting failed!                                               
Auto-booting with last time startup file...                          
Last time startup file is the same as current startup file!        
Seeking a VRP software in flash file-system...                      
Now, Current startup file is flash:/S2300-V100R005C01SPC100.cc       
Please confirm app file typeID[0x0]!                                 
Invalid package file!                                                 
Auto boot failed!                                                    
Auto-booting failed!                                                
Reboot...                                                         
BIOS LOADING ...                                                     
Can not get board information by GPIO, Please Check!                 
Don't support board type(0x0)!                                      
Copyright (c) 2008-2010 HUAWEI TECH CO., LTD.                        
(Ver128, Aug 24 2010, 21:58:24)

This problem is typically caused by the use of non-certified optical modules. Such optical modules interrupt signal transmission on the IIC bus, so the system cannot obtain the board type. Remove all non-certified optical modules and check whether the switch can start. It is recommended that you replace all non-certified optical modules with modules certified by Huawei.
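When a console capture is long, counting how often the BootROM banner appears is a quick way to confirm that the switch is in a restart loop. The following is a minimal Python sketch, not a Huawei tool. The function names and the threshold of two banners are assumptions made for illustration; the marker text is taken from the output above.

```python
def count_boot_attempts(console_text: str, marker: str = "BIOS LOADING") -> int:
    """Count how many lines of a console capture start with the BootROM banner."""
    return sum(
        1 for line in console_text.splitlines() if line.strip().startswith(marker)
    )


def is_boot_loop(console_text: str, threshold: int = 2) -> bool:
    """Assumed rule: two or more banners in one capture mean a repeated restart."""
    return count_boot_attempts(console_text) >= threshold


# Shortened excerpt of the capture shown for Fault Symptom 7.
capture = """\
BIOS LOADING ...
Can not get board information by GPIO, Please Check!
Don't support board type(0x0)!
Auto-booting failed!
Reboot...
BIOS LOADING ...
Can not get board information by GPIO, Please Check!
"""

print(count_boot_attempts(capture), is_boot_loop(capture))
```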

Fault Symptom 8

A switch restarts repeatedly and displays CRC check errors during startup:

Error: Loading error in CRC checksum. File CRC is 0x1a20, calculated CRC is 0xc173
Error: Invalid package file

This problem usually occurs when the software package is damaged, for example, when a power failure occurs while the system is writing to the flash. To rectify the fault, load the software package again.
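If you collect console captures from several switches, the two CRC values can be extracted from the error line for comparison or record keeping. The sketch below is illustrative only: the function name is hypothetical, and the regular expression assumes the exact message wording shown above.

```python
import re

# Matches the message format shown above; other software versions may word it differently.
CRC_LINE = re.compile(
    r"File CRC is (0x[0-9a-fA-F]+), calculated CRC is (0x[0-9a-fA-F]+)"
)


def parse_crc_error(line: str):
    """Return (stored_crc, calculated_crc) as integers, or None if the line does not match."""
    match = CRC_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = "Error: Loading error in CRC checksum. File CRC is 0x1a20, calculated CRC is 0xc173"
stored, calculated = parse_crc_error(line)
# A mismatch between the two values is what makes the package invalid.
print(hex(stored), hex(calculated), stored != calculated)
```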

Fault Symptom 9

A switch restarts repeatedly and prints the following information during startup:

BIOS LOADING ...
Copyright (c) 2011-2012 HUAWEI TECH CO., LTD.
(Ver121, Jun 14 2012, 10:49:20)
Current flash Fs: DosFs
                                                                     
flash:/  - Volume is OK
Press Ctrl+B to enter BOOTROM menu... 0
Auto-booting...
Loading[flash:/s5700li-v200r001c00spc300.cc]...................
Update Epld file ............................ None
Decompressing VRP software ..................
 Decoding error = 1
failed!
 
Auto-booting failed!
 
Auto-booting with last time startup file...
The last time startup file is not a .cc file!
 
Seeking a VRP software in flash file-system...
flash:/s5700li-v200r001c00spc300.cc [49+2]...................
Now, Current startup file is flash:/s5700li-v200r001c00spc300.cc
Update Epld file ............................ None
Decompressing VRP software ..................
Decoding error = 1
failed!
 
Auto boot failed!
 
Auto-booting failed!
Reboot...

The preceding information indicates that the DDR memory has failed. Contact technical support in this case.

Fault Symptom 10

A switch restarts repeatedly and prints the following information during startup:

BIOS LOADING ...
Copyright (c) 2008-2011 HUAWEI TECH CO., LTD.
(Ver148, Jun 26 2012, 18:45:31)
 
Press Ctrl+B to enter BOOTROM menu ... 0
Auto-booting...
Decompressing Image file ... done
ERR

This problem is caused by an LSW initialization failure, a DDR memory failure, or damage to the PCB. Contact technical support in this case.

Summary

The following table lists keywords that appear in the information printed during startup. Use these keywords to determine whether a startup failure is caused by a faulty hardware component.

Table 2-5 Keywords in information printed during startup processes

Printed Information

Description

flash initialization failed

Flash initialization failed.

DRV_Lsw_Init: DRV_Arch_Init Fail!

LSW initialization failed.

Initializing LSW ........................ failed

LSW initialization failed.

Drv_Lsw_Probe: Warning: Not All Chip Probed!

LSW initialization failed.

Some LSW chips are not detected

LSW initialization failed.

PoE driver init fail

PoE initialization failed.

Failed to initialize the PoE chips

PoE initialization failed.

Don't support board type(0x0)!

The board type cannot be obtained.

Open %s failed

A file failed to be opened during file check.

Interconnection threestep selftest Error

The interconnection three-step self-test failed.

DDR SDRAM test ................. fail

The memory test failed.

DDR SDRAM test ................. Untest

The memory test is not performed.

DDR SDRAM test ................. Invalid

The memory test result is invalid.

Loading error in CRC checksum

The software package failed the CRC check.

Init flash update area error!

Flash memory re-initialization failed.

Password is wrong, System will reboot...

The user entered a wrong password.

Data error in Flash description area!

The flash description area contains incorrect data.

Data error in Flash description backup area!

The flash description backup area contains incorrect data.

Auto-booting...

The system is starting automatically with the software package without displaying the BootROM menu. No operation has been performed yet when this information is displayed. The switch then searches for an available software package according to information about the previous startup.

Decompressing VRP software...

The switch is decompressing the software package. It will continue the startup process if the operation succeeds or perform a version rollback if the operation fails.

Auto-booting with last time startup file...

The last automatic start failed and the system is performing a version rollback by starting with the software package used upon the last successful startup.

Last time startup file is the same as current startup file

The software package used upon the last successful startup is the same as the software package specified for this automatic start. The system does not use this package for version rollback. Instead, it searches for an available software package for startup.

Seeking a VRP software in flash file-system...

The system is searching for a software package for startup.

Auto-booting failed!

The automatic start failed.

VFS_FLASH_INIT failed

The flash file system failed to be initialized.

haven't %s device

The index number of the flash memory failed to be obtained.

Can not open Flash file: %s

A file failed to be opened.

The last time startup file is not a .cc file!

The type of the last startup file is incorrect.

Can not find any file in flash file-system!

No file is available in the flash.

%s is not a valid startup file!

The startup file found by the system is not a valid software package.

There is not other valid startup file in flash file-system!

No valid startup file can be found.
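The keyword table above lends itself to a simple lookup when you need to review a saved console capture. The following Python sketch is one possible way to do this, assuming the capture is available as plain text. The dictionary holds only a subset of the rows in Table 2-5, and the names STARTUP_KEYWORDS and scan_startup_log are hypothetical.

```python
# Subset of Table 2-5: printed keyword -> description from the table.
STARTUP_KEYWORDS = {
    "flash initialization failed": "Flash initialization failed.",
    "DRV_Lsw_Init: DRV_Arch_Init Fail!": "LSW initialization failed.",
    "Some LSW chips are not detected": "LSW initialization failed.",
    "PoE driver init fail": "PoE initialization failed.",
    "Don't support board type(0x0)!": "The board type cannot be obtained.",
    "Loading error in CRC checksum": "The software package failed the CRC check.",
    "The last time startup file is not a .cc file!": "The type of the last startup file is incorrect.",
    "Auto-booting failed!": "The automatic start failed.",
}


def scan_startup_log(text: str):
    """Return (line_number, keyword, description) for every keyword found in the capture."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for keyword, description in STARTUP_KEYWORDS.items():
            if keyword in line:
                findings.append((number, keyword, description))
    return findings


sample = "BIOS LOADING ...\nDon't support board type(0x0)!\nAuto-booting failed!\n"
for number, keyword, description in scan_startup_log(sample):
    print(f"line {number}: {keyword} -> {description}")
```

Matching is by exact substring, so keywords that contain placeholders such as %s in the table would need a pattern-based match instead.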


3 Card Resets in a Modular Switch

3.1 All Cards Reset

 

 

3.1.1 Troubleshooting Flowchart

A modular switch uses a distributed system architecture, in which each card has an independent system. The LPUs run independently and are managed by the active MPU. In a switch with only one MPU, a failure of the MPU causes all LPUs to reset. If a switch has two MPUs, the standby MPU changes to the active state and takes over services once the original active MPU fails, and the original active MPU becomes the standby MPU after an automatic reset. Therefore, a reset of the active MPU does not cause a system reboot in this case.

Figure 3-1 Troubleshooting flowchart for resets of all cards

20170228151627008004.png

 

3.1.2 Diagnosis and Troubleshooting Procedures

                               Step 1     Run the display device command to check the number of MPUs in the switch.

<HUAWEI> display device
S7712's Device status:
Slot  Sub Type         Online    Power      Register       Status     Role
-------------------------------------------------------------------------------
3     -   -            Present   PowerOff   Unregistered   -          NA
4     -   ES0D0G48TA00 Present   PowerOn    Registered     Normal     NA
6     -   ES0D0X4UXC00 Present   PowerOn    Registered     Normal     NA
9     -   ES0D0F48TC00 Present   PowerOn    Registered     Normal     NA
10    -   ES0D0G24SC00 Present   PowerOn    Registered     Normal     NA
13    -   -            Present   PowerOn    Unregistered   -          Slave
14    -   ES0D00SRUA00 Present   PowerOn    Registered     Normal     Master
PWR1  -   -            Present   PowerOn    Registered     Normal     NA
CMU1  -   LE0DCMUA0000 Present   PowerOn    Registered     Normal     Master
FAN1  -   -            Present   PowerOn    Registered     Normal     NA
FAN2  -   -            Present   PowerOn    Registered     Normal     NA
FAN3  -   -            Present   PowerOn    Registered     Normal     NA
FAN4  -   -            Present   PowerOn    Registered     Normal     NA 

                               Step 2     If the switch has only one MPU, all LPUs of the switch will reset after the MPU resets. To diagnose the MPU reset problem, see 3.2 A Single Card Resets.

                               Step 3     If the switch has two MPUs, the switch may have rebooted due to a power failure.

Check whether the external power supply system is working normally.

Run the display logbuffer command to check the reboot records of the switch and confirm whether a power failure occurred around the reboot time displayed in the command output. Check the following:

•   Whether any operations caused the switch to be powered off

•   Whether any exceptions are recorded in logs of the UPS (if the switch is powered by a UPS)

•   Whether other devices in the same rack or powered by the same power supply system were powered off at that time

•   Whether any high-power device was connected to the network at that time

•   Whether any power lines are aged or loose

•   Whether the input voltage is in the normal range (measure using a multimeter)

If any of the preceding situations exists, take measures to fix the problem of the external power supply system.

                               Step 4     If the external power supply system works normally, run the display alarm all command to check whether there are alarms about power modules of the switch.

Common power module alarms include:

•   Power is invalid for not support: An incompatible power module is installed in the switch.

•   PWR_LACK and SWITCH_STAT sensor alarms for the same power module: The power module is present but has no power cable connected or its power switch is in OFF position.

•   PWR_FAULT alarm for a power module: The power module is experiencing a fan failure, output overvoltage, external short circuit, output failure, or input failure.

                               Step 5     After obtaining the customer's approval, move the suspect power module to another power slot or replace it with another power module to determine whether the power module or the switch itself is faulty.

                               Step 6     If the power module is not faulty, go to section 3.2.2.6 Contacting Technical Support.

----End
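Step 1 of this procedure depends on reading the number of MPUs from the display device output. If you have the output saved as text, the check can be sketched in Python as follows. This is an illustration under a stated assumption: a row is treated as an MPU when its slot ID is numeric and its Role column is Master or Slave, which holds for the sample output above but is not confirmed for every model. The function name find_mpus is hypothetical.

```python
def find_mpus(display_device_output: str):
    """Return the rows that look like MPUs (numeric slot, role Master or Slave)."""
    mpus = []
    for line in display_device_output.splitlines():
        fields = line.split()
        # Data rows have eight columns: Slot Sub Type Online Power Register Status Role.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        slot, _sub, card_type, _online, _power, register, _status, role = fields
        if role in ("Master", "Slave"):
            mpus.append(
                {"slot": int(slot), "type": card_type, "register": register, "role": role}
            )
    return mpus


# Rows copied from the sample output in Step 1.
sample = """\
S7712's Device status:
Slot  Sub Type         Online    Power      Register       Status     Role
-------------------------------------------------------------------------------
4     -   ES0D0G48TA00 Present   PowerOn    Registered     Normal     NA
13    -   -            Present   PowerOn    Unregistered   -          Slave
14    -   ES0D00SRUA00 Present   PowerOn    Registered     Normal     Master
PWR1  -   -            Present   PowerOn    Registered     Normal     NA
CMU1  -   LE0DCMUA0000 Present   PowerOn    Registered     Normal     Master
"""

mpus = find_mpus(sample)
registered = [m for m in mpus if m["register"] == "Registered"]
print(len(mpus), "MPU slots in use,", len(registered), "registered")
```

In the sample, the MPU in slot 13 is present but unregistered, so it is worth checking the Register column as well as the count.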

3.2 A Single Card Resets

 

 

3.2.1 Troubleshooting Flowchart

Figure 3-2 Troubleshooting flowchart for reset of a single card

20170228151628333005.png

 

3.2.2 Diagnosis and Troubleshooting Procedures

3.2.2.1 Checking the Switch Model and Version

                               Step 1     Run the display device command to check the switch model and status of each module in the switch.

<HUAWEI> display device
S9706's Device status:                                                           
Slot  Sub Type         Online    Power      Register       Status     Role      
------------------------------------------------------------------------------- 
1     -   EH1D2X12SSA0 Present   PowerOn    Registered     Normal     NA         
4     -   -            Present   PowerOn    Unregistered   -          NA        
7     -   EH1D2SRUDC00 Present   PowerOn    Registered     Normal     Master    
PWR1  -   -            Present   -          Unregistered   -          NA        
PWR2  -   -            Present   PowerOn    Registered     Normal     NA        
CMU1  -   EH1D200CMU00 Present   PowerOn    Registered     Normal     Master    
FAN1  -   -            Present   PowerOn    Registered     Normal     NA        
FAN2  -   -            Present   PowerOn    Registered     Normal     NA       

The command output shows that the switch model is S9706. The status of cards, power modules, and fan modules in the switch is also displayed.

                               Step 2     Run the display version command to check the software version running on the switch.

<HUAWEI> display version 
Huawei Versatile Routing Platform Software                                      
VRP (R) software, Version 5.160 (S9700 V200R008C00SPC300)                       
Copyright (C) 2000-2016 HUAWEI TECH CO., LTD                                    
Quidway S9706 Terabit Routing Switch uptime is 0 week, 3 days, 18 hours, 31 minu
tes                                                                             
BKP 0 version information:                                                       
1. PCB      Version  : LE02BAKK VER.B                                           
2. Support  PoE      : No                                                       
3. Board    Type     : EH1BS9706E00                                              
4. MPU Slot Quantity : 2                                                        
5. LPU Slot Quantity : 6                                                        
                                                                                 
MPU 7(Master) : uptime is 0 week, 3 days, 18 hours, 31 minutes                  
SDRAM Memory Size    : 2048    M bytes                                          
Flash Memory Size    : 128     M bytes                                           
NVRAM Memory Size    : 512     K bytes                                          
CF Card1 Memory Size : 479     M bytes                                          
MPU version information :                                                        
1. PCB      Version  : LE02SRUD0 VER.D                                          
2. MAB      Version  : 1                                                        
3. Board    Type     : EH1D2SRUDC00                                             
4. CPLD0    Version  : 1411.2411                                                
5. BootROM  Version  : 0209.00dc                                                
6. BootLoad Version  : 0209.00fa                                                
7. FPGA     Version  : 1100.0800 

The command output shows that the software version is V200R008C00.

----End

3.2.2.2 Checking the Cause of the Reset

                               Step 1     Run the display reset-reason command to view information about all card reset events.

<HUAWEI> display reset-reason
The LPU frame[1] board[1] has no reset records.
The LPU frame[1] board[2] has no reset records.
The LPU frame[1] board[3]'s reset total 1, detailed information:
--  1. 2012/03/13   19:58:15, Reset No.: 1
       Reason: Check mod infomation fail
The MPU frame[1] board[4] has no reset records.
The MPU frame[1] board[5]'s reset total 967, detailed information:
--  1. 2012/03/20   13:07:52, Reset No.: 967
       Reason: Warm reset board for no receiving message in a long time
--  2. 2012/03/20   12:57:52, Reset No.: 966
       Reason: Warm reset board for no receiving message in a long time
--  3. 2012/03/20   12:47:52, Reset No.: 965
       Reason: Warm reset board for no receiving message in a long time
--  4. 2012/03/20   12:37:52, Reset No.: 964
       Reason: Warm reset board for no receiving message in a long time
--  5. 2012/03/20   12:27:52, Reset No.: 963
       Reason: Warm reset board for no receiving message in a long time

Alternatively, run the display reset-reason slot ID command to view reset information of the card in a specified slot.

Table 3-1 Description of the display reset-reason command output

Item

Description

LPU/MPU

Interface card or main processing unit.

frame

Chassis ID of the card.

board

Slot ID of the card.

reset total

Number of times the card has reset.

detailed information

Detailed reset information.

Reset No.

Reset sequence number.

Reason

Cause of a reset.
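To work with many reset records, for example to sort them or measure the interval between resets, the display reset-reason output can be parsed into structured records. The following Python sketch assumes the output format shown above. The function name and the record fields are hypothetical and only mirror the items in Table 3-1.

```python
import re
from datetime import datetime

CARD = re.compile(r"^The (\w+) frame\[(\d+)\] board\[(\d+)\]")
EVENT = re.compile(
    r"^--\s+\d+\.\s+(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2}), Reset No\.: (\d+)"
)
REASON = re.compile(r"^\s*Reason: (.+)$")


def parse_reset_reason(output: str):
    """Turn display reset-reason text into a list of reset event records."""
    events = []
    card = None      # (card kind, frame, board) of the card currently being read
    pending = None   # event still waiting for its Reason line
    for line in output.splitlines():
        m = CARD.match(line)
        if m:
            card = (m.group(1), int(m.group(2)), int(m.group(3)))
            pending = None
            continue
        m = EVENT.match(line)
        if m and card:
            when = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y/%m/%d %H:%M:%S")
            pending = {"card": card[0], "frame": card[1], "board": card[2],
                       "time": when, "reset_no": int(m.group(3)), "reason": None}
            events.append(pending)
            continue
        m = REASON.match(line)
        if m and pending:
            pending["reason"] = m.group(1).strip()
    return events


sample = """\
The LPU frame[1] board[2] has no reset records.
The LPU frame[1] board[3]'s reset total 1, detailed information:
--  1. 2012/03/13   19:58:15, Reset No.: 1
       Reason: Check mod infomation fail
The MPU frame[1] board[5]'s reset total 967, detailed information:
--  1. 2012/03/20   13:07:52, Reset No.: 967
       Reason: Warm reset board for no receiving message in a long time
--  2. 2012/03/20   12:57:52, Reset No.: 966
       Reason: Warm reset board for no receiving message in a long time
"""

events = parse_reset_reason(sample)
interval = events[1]["time"] - events[2]["time"]
print(len(events), "events; interval between the last two MPU resets:", interval)
```

A constant interval between consecutive resets, as in the sample, suggests a timer-driven reset rather than random failures.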

 

                               Step 2     Analyze the cause of the reset and take corresponding measures. Table 3-2 describes the causes that may be displayed in the display reset-reason command output and provides the handling methods.

Table 3-2 Reset causes and handling methods

Cause of Reset

Handling Method

User operations

Reset by user command

A user has reset the card using the command line interface or network management system.

Check whether any user with the reset privilege has reset the card.

Power off by user command

VRP reset selfboard because of command

Reset board by vrp cmd

Reset board by snmp

Reset for rollback

The demo time of license is overtime

The temporary license loaded on the card has expired.

Obtain a commercial license from Huawei.

System loading

Reset for load

During a software upgrade, an LPU resets after loading the system software.

This is a normal reset and requires no action.

Reset for lpu resource-mode disaccord with mpu

The resource mode configured on an LPU does not match that of the MPU.

This is a normal reset and requires no action.

Reset for the LPU patch file or module does not match that on the MPU

The patch package or plug-in specified for an LPU is different from that of the MPU.

After the LPU is registered, delete the patch package or plug-in, and then load the correct one.

Reset for initializing the board's status by IFNET

An LPU's status is initialized after an active/standby switchover.

•  If the LPU configuration is not recovered after the switchover, it cannot communicate with other cards.

•  It is a normal condition if the LPU works normally after the switchover.

Reset slave board for memsize too little

The memory size of the standby MPU is smaller than that of the active MPU.

Check the memory size of the standby MPU. If its memory size is smaller than that of the active MPU, replace the standby MPU.

Reset for slave board's card statement disaccord with master's

Only one of MPUs has a subcard (such as an FSU) installed.

Install the same subcard on the other MPU or remove the current subcard to ensure that the two MPUs have the same subcard configuration.

Reset for patch load

An LPU reset after loading a patch.

This is a normal condition and requires no action.

Reset for patch get state fail

The patch failed to be loaded to a card.

•  It is normal for such resets to occur one or two times during system startup.

•  If such resets occur multiple times, go to section 3.2.2.6 Contacting Technical Support.

Reset for patch load file fail

Reset for patch synchronize file fail

Reset for patch state compare fail

Software exceptions

VRP reset selfboard because of find deadloop

A deadloop was detected.

Check alarms and logs on the switch to locate the problem.

VRP reset selfboard because of find exception

A software exception was detected.

Go to section 3.2.2.6 Contacting Technical Support.

Board reset by VRP for schedule

Congestion occurred.

Check alarms and logs on the switch to locate the problem.

VRP reset selfboard because of no memory

The memory has been used up.

•  Check whether the memory usage is high.

•  Check alarms and logs on the switch to locate the problem.

Reset for memory use out

Device management

Reset for no receiving mpu's heart

An LPU did not receive heartbeat packets from the MPU in 40 seconds.

See Checking Whether the Card Reset Is Caused by Bad Installation.

Reset for no heart

The MPU did not receive heartbeat packets from an LPU in 30 seconds.

Reset for not receiving register ack from mpu

An LPU registered 20 times but did not receive registration response packets from the MPU.

The inter-card communication failed. See Checking Whether the Card Reset Is Caused by Bad Installation.

Reset for state not stable

The MPU's communication with an LPU was interrupted intermittently.

Warm reset board for no register in a long time

An LPU failed to register in 30 minutes.

Warm reset board for no receiving message in a long time

The MPU did not receive any packet from an LPU in 10 minutes.

Cold reset board for no receiving message in a long time

The MPU did not receive any packet from an LPU in 20 minutes.

Cold reset board for CPU is not active

The MPU detected that the CPU of an LPU did not work.

Power off the board because of reset three times continuously

A card reset three times during the startup.

A card will be power cycled after three warm start failures.

Reset for unregister but receive heartbeat info

The MPU received heartbeat packets from an unregistered card.

Check alarms and logs on the switch to locate the problem.

Reset for slave board class disaccord with mpu

The active and standby MPUs are different types.

Check the types of the MPUs and replace one of them to ensure that the switch uses MPUs of the same type.

Reset for lpu or slave version disaccord with mpu

The startup version of a card differs from that of the MPU.

1. If the card is the standby MPU, check the versions of the active and standby MPUs. If the two MPUs run V1R2 and V1R3 respectively, the standby MPU will reset because the two versions do not support automatic synchronization. Manually synchronize the software version of the standby MPU with the active MPU.

2. If the card is an LPU, go to section 3.2.2.6 Contacting Technical Support.

Reset for no receiving master cpu's heart

A VASP card reset because the main core in its CPU did not receive heartbeat packets from the sub-core in 60 seconds.

Go to section 3.2.2.6 Contacting Technical Support.

Hardware components

Reset for selftest fail

A card's self-check failed.

Reinstall the card or install it into another slot, and then check whether it works normally. If the fault persists, the card is faulty.

Reset for CPLD self-test fail

The CPLD self-check failed.

Reset selfboard because of initialize fsu fail

The FSU failed to be initialized.

reset for fpga load failed

The FPGA failed to be loaded.

Reset for fpga in abnormal state

The FPGA status is abnormal.

Reset for lanswitch chip parity error

An error occurred during LSW circuit parity check.

Reset for FSU card type mismatch

The FSU does not match the chassis.

Replace the FSU with a matching one. If the problem cannot be fixed, go to section 3.2.2.6 Contacting Technical Support.

Board reset by ISIS for purging LSP error

An error occurred when clearing link state packets (LSPs).

•  It is normal for such resets to occur one or two times during system startup.

•  If such resets occur multiple times, go to section 3.2.2.6 Contacting Technical Support.

CSS

Reset for frame combine

Two chassis merged.

These are normal conditions and require no action.

Reset for frame split

The cluster split.

Reset for fsp

The cluster reset.

Reset for one frame register, but the board is not register

A card was not registered during chassis registration.

Reset for slave to master in slave frame, but self is not register

On the standby switch, the standby MPU became the active MPU before being registered.

Reset for slave to master in master frame, but self is not register

On the master switch, the standby MPU became the active MPU before being registered.

Reset by switchover command from system master chassis

The switchover command was executed in the cluster.

Reset by command from other chassis

The reset command was executed on the other cluster member switch.

Reset board after syn version

A card reset after version synchronization.

Reset board for Peer frame is in CSS force master status

The other switch was forcibly specified as the master switch.

Reset for fpga state disaccord with system master

A switch using SRUC and a switch using SRUD set up a cluster. The SRU hardware engine function is enabled on the switch using SRUD.

Run the undo detect-engine enable command to disable the SRU hardware engine function, reboot the switch for the configuration to take effect, and then reconfigure the CSS function.

 

----End
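When a switch has many reset records, grouping the reasons by the categories of Table 3-2 shows at a glance whether the resets point to user operations, software, device management, hardware, or CSS events. The following Python sketch shows the idea with a small subset of the table. The names REASON_CATEGORIES, classify, and summarize are hypothetical, and reasons that are not in the subset are reported as Unknown.

```python
from collections import Counter

# Subset of Table 3-2: reset reason -> category heading used in the table.
REASON_CATEGORIES = {
    "Reset by user command": "User operations",
    "Reset board by snmp": "User operations",
    "Reset for load": "System loading",
    "Reset for patch load": "System loading",
    "VRP reset selfboard because of find deadloop": "Software exceptions",
    "VRP reset selfboard because of no memory": "Software exceptions",
    "Reset for no heart": "Device management",
    "Warm reset board for no receiving message in a long time": "Device management",
    "Reset for selftest fail": "Hardware components",
    "Reset for frame split": "CSS",
}

# Case-insensitive lookup, because the table mixes upper and lower case.
_LOOKUP = {reason.casefold(): category for reason, category in REASON_CATEGORIES.items()}


def classify(reason: str) -> str:
    """Return the Table 3-2 category for a reason string, or 'Unknown'."""
    return _LOOKUP.get(reason.strip().casefold(), "Unknown")


def summarize(reasons):
    """Count reset reasons per category."""
    return Counter(classify(reason) for reason in reasons)


reasons = [
    "Warm reset board for no receiving message in a long time",
    "Warm reset board for no receiving message in a long time",
    "Reset by user command",
    "Check mod infomation fail",
]
print(summarize(reasons))
```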

3.2.2.3 Checking Alarms

How to Check Alarms on a Switch

If a switch fails or cannot operate normally because the environmental conditions do not meet operating requirements, it will generate alarm messages depending on the type of the problem.

Use either of the following methods to view alarm messages:

•   Log in to the network management system (for example, eSight) to view alarm messages.

•   Run the display trapbuffer [ size value ] command on the CLI of the switch to view alarm messages in the trap buffer.

The value parameter determines the maximum number of alarm messages that can be displayed in the command output. If the actual number of alarm messages is smaller than the specified value, all the available alarm messages are displayed.

<HUAWEI> display trapbuffer
Trapping buffer configuration and contents : enabled                             
Allowed max buffer size : 1024                                                  
Actual buffer size : 256                                                        
Channel number : 3 , Channel name : trapbuffer                                  
Dropped messages : 0                                                            
Overwritten messages : 6248                                                     
Current messages : 256                                              
#Sep 19 2012 04:38:03+08:00 HUAWEI DS/4/DATASYNC_CFGCHANGE:OID 1.3.6.1.4.1.2011
.5.25.191.3.1 configurations have been changed. The current change number is 8, 
the change loop count is 0, and the maximum number of records is 4095.          
#Sep 19 2012 04:37:39+08:00 HUAWEI LINE/5/VTYUSERLOGIN:OID 1.3.6.1.4.1.2011.5.2
5.207.2.2 A user login. (UserIndex=34, UserName=VTY, UserIP=10.135.18.114, UserC
hannel=VTY0)                                                        
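Each trap entry in this output starts with a # character followed by the time, device name, and a Module/Severity/Mnemonic identifier. If the output is saved to a file, these header fields can be extracted for filtering. The Python sketch below assumes the format shown above and reads only the first line of each entry, because the message body is wrapped by the terminal. The function name is hypothetical.

```python
import re

TRAP_HEADER = re.compile(
    r"^#(\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})(?:[+-]\d{2}:\d{2})? "
    r"(\S+) (\w+)/(\d)/(\w+):"
)


def parse_trap_headers(output: str):
    """Return time, host, module, severity, and mnemonic for each trap entry."""
    entries = []
    for line in output.splitlines():
        match = TRAP_HEADER.match(line)
        if match is None:
            continue  # summary lines and wrapped continuation lines are skipped
        time_text, host, module, severity, mnemonic = match.groups()
        entries.append({"time": time_text, "host": host, "module": module,
                        "severity": int(severity), "mnemonic": mnemonic})
    return entries


sample = """\
Current messages : 256
#Sep 19 2012 04:38:03+08:00 HUAWEI DS/4/DATASYNC_CFGCHANGE:OID 1.3.6.1.4.1.2011
.5.25.191.3.1 configurations have been changed. The current change number is 8,
#Sep 19 2012 04:37:39+08:00 HUAWEI LINE/5/VTYUSERLOGIN:OID 1.3.6.1.4.1.2011.5.2
5.207.2.2 A user login. (UserIndex=34, UserName=VTY, UserIP=10.135.18.114, UserC
"""

entries = parse_trap_headers(sample)
for entry in entries:
    print(entry["time"], entry["module"], entry["severity"], entry["mnemonic"])
```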

You can also use the following commands to check specific types of alarm messages:

•   display alarm all: displays all alarms on the switch.

•   display alarm active: displays alarms that have not been cleared since the switch started.

•   display alarm history: displays historical alarms recorded since the switch started.

Common Alarms About Card Resets and Handling Methods

Table 3-3 Common alarms about card resets and handling methods

20170228163200794001.png
20170228163201597002.png
20170228163202495003.png
20170228163202059004.png
20170228163203278005.png
20170228163204306006.png
20170228163205782007.png
20170228163206569008.png

20170228151626087002.jpg

The following tips will help you quickly find the reference information for a specific alarm:

•  An alarm ID uniquely identifies an alarm. You can search for the ID of an alarm in the Alarm Reference to find the meaning of the alarm and handling procedure.

•  Alarms with the same ID but triggered by different causes are identified by different error codes (for example, BaseTrapProbableCause). You can search for the error code in the Alarm Reference.

•  You can also use the information query tool to query alarm information.

Do not search for alarms based on variables, such as alarm generation time, interface number, process ID, and device name.

3.2.2.4 Checking Switch Appearance and Environment

If a card reset because its communication with the MPU was interrupted, or if power, fan, or temperature alarms were generated, check the switch appearance and environment to locate the fault.

Checking Whether the Card Reset Is Caused by Bad Installation

If the cause of a card reset is heartbeat loss or communication failure with the MPU, the card may not be securely installed in the slot.

                               Step 1     Verify that the reset card and the MPU are securely installed.

                               Step 2     Remove the reset card and check whether any pins on its connector are bent.

                               Step 3     Install the card in another slot or replace it with a new card to determine whether the card or chassis is faulty.

                               Step 4     If the fault cannot be located, go to section 3.2.2.6 Contacting Technical Support.

----End

Checking Whether the Card Reset Is Caused by Power Exception

                               Step 1     Determine whether a power failure has occurred around the reboot time. Check the following:

•   Whether any operations caused the switch to be powered off

•   Whether any exceptions are recorded in logs of the UPS (if the switch is powered by a UPS)

•   Whether other devices in the same rack or powered by the same power supply system were powered off at that time

•   Whether any high-power device was connected to the network at that time

•   Whether any power lines are aged or loose

•   Whether the input voltage is in the normal range (measure using a multimeter)

If any of the preceding situations exists, take measures to fix the problem of the external power supply system.

                               Step 2     If the external power supply is normal, check whether power modules of the switch are faulty. Check whether any power module is removed or loose. After obtaining the customer's approval, move the suspect power module to another power slot or replace it with another power module to determine whether the power module or the switch itself is faulty.

                               Step 3     If the power module is faulty, replace it. If the switch is faulty, go to section 3.2.2.6 Contacting Technical Support.

----End

Checking Whether the Card Reset Is Caused by High Temperature or Failure of Fans

                               Step 1     Check whether the operating temperature is in the normal range (generally 0°C to 45°C). If the temperature is too high, lower the temperature in the equipment room.

                               Step 2     Check the cooling system of the switch. Check the air intake vents, air exhaust vents, fan modules, and air filter to ensure that:

•   The air intake vents (at the front and the left side of the chassis) and air exhaust vents (at the rear of the chassis) are not blocked. The cabinet must have side panels to isolate the chassis from devices in other cabinets. If nearby obstacles affect cooling of the switch, remove them and check whether the equipment temperature drops to the normal range.

•   Fans are running normally. Check whether any fan trays are removed or loose, and whether air is exhausted from fan modules.

•   The air filter is clean and not blocked so that air can enter the chassis. If the air filter is blocked, clean or replace it.

                               Step 3     If any fan modules are faulty, replace them.

                               Step 4     If the problem cannot be located, go to section 3.2.2.6 Contacting Technical Support.

----End

3.2.2.5 Checking Logs

If the procedures described in the preceding sections cannot locate the cause of the reset, check logs on the switch.

How to Check Logs on a Switch

The log module of the system software records events that occur during system operation. Logs provide reference information for system diagnosis and maintenance, and help you check the equipment running status, analyze network conditions, and locate faults.

To check logs on a switch, log in to the switch through the console port or using Telnet, and then run the display logbuffer command. You can also save log information on the switch and use the syslog protocol to export logs to a log server.

# Run the display logbuffer command to check all logs in the log buffer.

<HUAWEI> display logbuffer
Logging buffer configuration and contents : enabled                  
Allowed max buffer size : 1024                                       
Actual buffer size : 512                                              
Channel number : 4 , Channel name : logbuffer                        
Dropped messages : 0                                                 
Overwritten messages : 0                                             
Current messages : 43                                                
 
Oct 16 2013 06:06:48 HUAWEI %VFS/4/DISKSPACE_NOT_ENOUGH(l)[3]: Disk space is insufficient. The system begins to delete unused log files. 
Oct 10 2013 19:06:48 HUAWEI %VFS/4/DISKSPACE_NOT_ENOUGH(l)[4]: Disk space is insufficient. The system begins to delete unused log files
  ---- More----

Common Alarms About Card Resets and Handling Methods

Table 3-4 Common alarms about card resets and handling methods

Digest: ALML/4/48V_CHECK_FAULT
Log description: The sensor of a card detects alarms on two 48 V power lines.
Possible cause: The power supply circuit of the card is faulty, and the card cannot be powered on.
Handling method:
• Check whether power modules are present.
• If power modules are present, go to section 3.2.2.6 Contacting Technical Support.

Digest: ALML/0/BRD_PWOFF
Log description: A card overheats and is powered off because of a fan failure.
Possible cause: The fans used to cool the card have been removed or stopped running.
Handling method:
• Run the display temperature all command. In the command output, the Status field shows whether the chassis temperature is within the normal range, and the Temperature.(C) field displays the temperature of each module. If the Status field displays minor, go to the next step.
• Fix the problem of the cooling system. See Checking Whether the Card Reset Is Caused by High Temperature or Failure of Fans.
• If the card temperature is still high, remove and reinstall the card, and then check whether it can register successfully. If not, go to section 3.2.2.6 Contacting Technical Support.

Digest: ALML/4/ENTPOWEROFF
Log description: A card is powered off.
Possible cause:
• The power off slot slot-id command was executed.
• The system detects power insufficiency and powers off the card.
Handling method: If the power is insufficient, rectify the fault according to Checking Whether the Card Reset Is Caused by Power Exception.

Digest: ALML/4/ENTRESET
Log description: A card was reset.
Possible cause:
• The card reset command was executed.
• The system does not run normally. Check the reason field in the log for the specific reason of the reset.
Handling method: If the card was not reset manually, check the reason of the reset in the log and go to section 3.2.2.6 Contacting Technical Support.

Digest: ALML/4/ENT_PULL_OUT
Log description: A card or subcard is removed.
Possible cause:
• The card or subcard has been removed by a user.
• The card or subcard is not securely installed.
Handling method:
• If the card or subcard has been removed by a user, ignore the log.
• If it is not securely installed, reinstall it securely in the slot.

Digest: ALML/4/HSB_SWITCH_CAUSE
Log description: The active MPU is reset.
Possible cause: The active MPU may reset for any of the following reasons:
• Unknown switch reason: The reason is unknown.
• VRP command force: The MPU is forcibly reset by a user.
• Master MPU is no memory: The active MPU does not have sufficient memory.
• VRP find task deadloop: A task deadloop occurred.
• Batch was not over: A task is not executed normally.
• Master switch to slave Interrupt: An active/standby switchover occurred.
• Ecm Channel was faulty: The Ethernet channel management (ECM) channel is faulty.
• Monitor bus communication Interrupt: The CANbus communication is interrupted.
• MPU board was pulled out: The MPU is removed.
Handling method:
• Check whether the MPU has been removed by a user.
• Run the display current-configuration command to check whether an active/standby switchover was triggered forcibly using the slave switchover command.
• Otherwise, go to section 3.2.2.6 Contacting Technical Support.

Digest: ALML/4/MASTER_TO_SLAVE
Log description: The active MPU has changed to the standby state.
Possible cause: An active/standby switchover was triggered forcibly using the slave switchover command. (This log will not be recorded if the active MPU becomes the standby one due to an exception.)
Handling method: Ignore this log.

Digest: ALML/4/POWERSUPPLY_OFF
Log description: The power supply is cut off.
Possible cause:
• The power supply is cut off manually.
• The power supply system is abnormal.
Handling method: See Checking Whether the Card Reset Is Caused by Power Exception.

Digest: ALML/4/PWRFANABSENT
Log description: A fan module is absent.
Possible cause: No fan module is installed in the slot.
Handling method: See Checking Whether the Card Reset Is Caused by High Temperature or Failure of Fans.

Digest: ALML/4/TEMP_UPPER
Log description: The temperature sensor detects that the temperature exceeds the upper limit. The cooling efficiency is low, for example, because the air filter is blocked, some fans are not running, or vacant slots are not covered by filler panels.
Possible cause:
• Heat cannot be exhausted from the switch quickly.
• The air filter is blocked by dust.
• Vacant slots are not covered with filler panels.
• The environment temperature is too high.
• There are not enough fans in the switch.
• Fans in the switch are faulty.
Handling method: See Checking Whether the Card Reset Is Caused by High Temperature or Failure of Fans.

Digest: FMEA/6/AVS_ABNORMAL
Log description: The adaptive voltage scaling (AVS) module on a card does not work normally.
Possible cause: A hardware fault has occurred.
Handling method: Replace the card.

Digest: MAD/4/CONFLICT_DETECT
Log description: A multi-active condition is detected.
Possible cause: More than one master switch exists due to a cluster link failure.
Handling method: Rectify the cluster link failure.

Digest: MAD/4/MEMBER_LOST
Log description: The cluster splits due to loss of the cluster neighbor.
Possible cause:
• The cluster link has failed.
• The cluster member switch has failed.
Handling method:
• Rectify the fault of the member switch.
• Rectify the cluster link failure.
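The table above can be condensed into a lookup keyed by digest. The following is only a rough sketch: the names HANDLING and suggest_handling are hypothetical, the handling strings are abbreviated summaries of the table rather than official wording, and the regular expression assumes the MODULE/severity/DIGEST layout seen in the logs in this post.

```python
import re

POWER = "Check whether the card reset is caused by a power exception."
COOLING = "Check whether the card reset is caused by high temperature or fan failure."

# Hypothetical lookup: digest -> first follow-up action, condensed from Table 3-4.
HANDLING = {
    "48V_CHECK_FAULT": "Check whether power modules are present; if so, contact technical support.",
    "BRD_PWOFF": COOLING,
    "ENTPOWEROFF": POWER,
    "ENTRESET": "If the reset was not manual, note the reason field and contact technical support.",
    "ENT_PULL_OUT": "Ignore if removed by a user; otherwise reinstall the card securely.",
    "HSB_SWITCH_CAUSE": "Check for MPU removal or a forced switchover; otherwise contact technical support.",
    "MASTER_TO_SLAVE": "Ignore this log.",
    "POWERSUPPLY_OFF": POWER,
    "PWRFANABSENT": COOLING,
    "TEMP_UPPER": COOLING,
    "AVS_ABNORMAL": "Replace the card.",
    "CONFLICT_DETECT": "Rectify the cluster link failure.",
    "MEMBER_LOST": "Rectify the member switch fault and the cluster link failure.",
}

# Assumed log identifier layout: MODULE/severity/DIGEST, e.g. ALML/4/ENTRESET.
DIGEST = re.compile(r"[A-Z0-9_]+/\d/([A-Za-z0-9_]+)")


def suggest_handling(log_line):
    """Return a handling hint for a log line, or None if it carries no digest."""
    match = DIGEST.search(log_line)
    if match is None:
        return None
    return HANDLING.get(match.group(1), "Search for the digest in the Log Reference.")


print(suggest_handling(
    "Dec 8 2013 13:14:10 NewCallcenter-SW-2 %ALML/4/ENTRESET(l)[778]: "
    "LPU frame[1] board[1] is reset."
))
```

Digests that are not in the table fall back to a reminder to consult the Log Reference, which matches the advice given for other logs.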

 

The following tips will help you quickly find the reference information for a specific log:

• A digest uniquely identifies a log. You can search for the digest of a log in the Log Reference to find the meaning of the log and the handling procedure.

• Do not search for logs using variables, such as the log generation time, interface number, process ID, and device name.

Example:

To find reference information for the log Apr 27 2014 07:45:35 HUAWEI %SHELL/4/LOGIN_FAIL_FOR_INPUT_TIMEOUT(s)[6]:Failed to log in due to timeout.(Ip=10.135.19.157, UserName=**, Times=1, AccessType=TELNET, VpnName=), search for the digest LOGIN_FAIL_FOR_INPUT_TIMEOUT in the Log Reference. You will find the explanation of the log: After entering a user name or password, a user failed to log in because of a timeout.
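Pulling the digest out of a log line can be automated. The sketch below is illustrative only: extract_digest is a hypothetical helper, and the pattern assumes the MODULE/severity/DIGEST layout shown in the example above.

```python
import re

# Assumed layout of the log identifier: optional "%", then MODULE/severity/DIGEST.
LOG_ID = re.compile(
    r"%?(?P<module>[A-Z0-9_]+)/(?P<severity>\d)/(?P<digest>[A-Za-z0-9_]+)"
)


def extract_digest(log_line):
    """Return (module, severity, digest) from a log line, or None if absent."""
    match = LOG_ID.search(log_line)
    if match is None:
        return None
    return match.group("module"), int(match.group("severity")), match.group("digest")


line = (
    "Apr 27 2014 07:45:35 HUAWEI %SHELL/4/LOGIN_FAIL_FOR_INPUT_TIMEOUT(s)[6]:"
    "Failed to log in due to timeout.(Ip=10.135.19.157, UserName=**, Times=1, "
    "AccessType=TELNET, VpnName=)"
)
print(extract_digest(line))
```

The variable parts of the line (time, host name, IP address) are ignored, so only the digest is left to search for.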

3.2.2.6 Contacting Technical Support

If you cannot locate the cause of a card reset, collect the related information and send it to your Huawei agent or to Huawei for fault location.

Collect the following information:

• Fault occurrence time, network topology of the failure point (for example, the upstream and downstream devices connected to the failure point, and location of the failure point), operations performed before the fault occurred, measures taken to handle the fault and results of the measures, fault symptom, and impact on services.

• Name, version, and current configuration of the faulty device, as well as related interface information. For details, see Collecting Diagnostic Information Using One Command.

• Logs recorded when the fault occurred.

• If a switch fails to start after a reboot, collect the serial port information printed during the startup process.

Collecting Diagnostic Information Using One Command

The display diagnostic-information command provides outputs of multiple commonly used display commands. You can use this command to view diagnostic information about a switch, including the startup configuration, current configuration, interface information, time, and system software version. It is an effective information collection tool.

The display diagnostic-information [ file-name ] command can display running diagnostic information on screen or export it to a .txt file. If you do not specify the file-name parameter, the command displays diagnostic information on screen. If you specify a file name, diagnostic information will be saved in the .txt file with the specified name. It is recommended that you export diagnostic information to a .txt file. The following is an example:

<HUAWEI> display diagnostic-information dia-info.txt
  This operation will take several minutes, please wait.........................
Info: The diagnostic information was saved to the device successfully.

The .txt file is saved in cfcard:/. You can run the dir command in the user view to check whether the .txt file exists.

If diagnostic information is displayed on screen, you can press Ctrl+C to stop the display.

This command is used to collect diagnostic information for fault location. Executing this command may affect system performance. For example, it may cause high CPU usage. Therefore, do not run this command when the switch is running normally. Do not run this command on multiple terminals connected to the switch at the same time. Otherwise, the CPU usage of the switch will increase sharply, causing system performance deterioration.

Commonly used terminal software supports information output to a specified file. For example, if you are using the HyperTerminal software of a Windows operating system, choose Transfer > Capture Text, enter the file name, and click Start. After that, run the display diagnostic-information command. Then all diagnostic information is displayed on the terminal screen and automatically saved in a file in the specified path.

Obtaining Log Files

Logs and alarms of a switch can be saved in log files. Perform the following steps to obtain log files:

1. Run the save logfile command to save information in the log buffer to log files.

2. Upload files in cfcard:/logfile/ to your computer using FTP or TFTP. If the log files cannot be transferred using FTP or TFTP, run the more command in the user view to display the logs. For example, run the more logfile/log.log command to display logs saved in the log.log file.

• There may be a large number of log files in the logfile folder. You only need to collect the log files generated around the fault occurrence time.

• If the standby MPU is involved, you also need to collect log files saved on the standby MPU. These log files are saved in slave#cfcard:/logfile/.

• For a cluster split or reset problem, collect log files in all the member switches.
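Once the log files are on your computer, the records around the fault occurrence time can be picked out with a short script. This is a minimal sketch, assuming each record starts with a timestamp in the "Oct 31 2013 20:56:41" layout used in this post; parse_timestamp, logs_near, and the 30-minute default window are illustrative choices, not part of any Huawei tool.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%b %d %Y %H:%M:%S"  # e.g. "Oct 31 2013 20:56:41"


def parse_timestamp(log_line):
    """Parse the leading timestamp of a log line; return None if there is none."""
    parts = log_line.split()
    if len(parts) < 4:
        return None
    try:
        return datetime.strptime(" ".join(parts[:4]), TIMESTAMP_FORMAT)
    except ValueError:
        return None


def logs_near(lines, fault_time, window=timedelta(minutes=30)):
    """Keep records within `window` of fault_time, plus their continuation lines."""
    selected = []
    keep = False
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            # A new record starts here; decide whether it is close enough.
            keep = abs(stamp - fault_time) <= window
        if keep:
            selected.append(line)
    return selected


sample = [
    "Oct 16 2013 06:06:48 HUAWEI %VFS/4/DISKSPACE_NOT_ENOUGH(l)[3]: Disk space is insufficient.",
    "Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALML/4/VOLT_LOWER(l): voltage sensor alarm",
    "(SensorNum=5, Value=0.01, UpperThreshold=1.44, LowerThreshold=0.96)",
]
for record in logs_near(sample, datetime(2013, 10, 31, 21, 0, 0)):
    print(record)
```

Lines without a timestamp are treated as continuations of the previous record, so wrapped detail fields stay with the log they belong to.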

3.3 Typical Reboot/Reset Troubleshooting Cases

 

 

3.3.1 A Switch Reboots

Fault Symptom

An S9312 switch rebooted twice in a day.

Mar 20 2014 13:54:27 7F-S9312 SNMP/4/COLDSTART:OID 1.3.6.1.6.3.1.1.5.1 coldStart.
Mar 20 2014 17:06:39 7F-S9312 SNMP/4/COLDSTART:OID 1.3.6.1.6.3.1.1.5.1 coldStart.

Cause Analysis

                               Step 1     Run the display device command to check device information.

S9312's Device status:
Slot  Sub Type         Online    Power      Register       Alarm      Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1     -   LE0D0VAMPA00 Present   PowerOn    Registered     Normal     NA    
2     -   LE0DG48CEAT0 Present   PowerOn    Registered     Normal     NA    
4     -   LE0DG48CEAT0 Present   PowerOn    Registered     Normal     NA    
6     -   LE0DG48CEAT0 Present   PowerOn    Registered     Normal     NA    
8     -   LE0DG48CEAT0 Present   PowerOn    Registered     Normal     NA    
10    -   LE0DG48CEAT0 Present   PowerOn    Registered     Normal     NA    
13    -   LE0MSRUA     Present   PowerOn    Registered     Normal     Master
14    -   LE0MSRUA     Present   PowerOn    Registered     Normal     Slave 
PWR1  -   -            Present   PowerOn    Registered     Normal     NA    
PWR2  -   -            Present   PowerOn    Registered     Normal     NA    
CMU1  -   LE0DCMUA0000 Present   PowerOn    Registered     Normal     Master
FAN1  -   -            Present   PowerOn    Registered     Normal     NA    
FAN2  -   -            Present   PowerOn    Registered     Normal     NA    
FAN3  -   -            Present   PowerOn    Registered     Normal     NA    
FAN4  -   -            Present   PowerOn    Registered     Normal     NA   

The switch has two SRUs and two power modules, and all modules in the switch are working normally.

                               Step 2     In the reboot records, coldStart indicates that the switch was powered off and then restarted. It is unlikely that both power modules in the switch failed at the same time. In addition, no power alarms were generated around the reboot time.

                               Step 3     After checking the power modules, engineers confirmed that the power modules were securely installed, with power cables connected properly.

                               Step 4     The customer said that circuit breakers in the building tripped on the day when the switch rebooted. Therefore, it can be determined that the reboots were caused by an exception in the external power supply system.

----End

Solution

Monitor the power grid status and switch running status.

Conclusion

Reboots of a switch with two main control units are usually caused by power failures. Locate such problems by checking the external power supply system, the power modules of the switch, and the reboot cause recorded in logs.
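To see how often a switch has cold-started, the SNMP coldStart records can be counted per day. The following is a small sketch under the assumption that the records look like the two shown in the fault symptom; cold_starts_per_day is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def cold_starts_per_day(lines):
    """Count SNMP coldStart records per calendar day."""
    counts = Counter()
    for line in lines:
        if "SNMP/4/COLDSTART" not in line:
            continue
        # The first three fields are assumed to be month, day, and year.
        day = datetime.strptime(" ".join(line.split()[:3]), "%b %d %Y").date()
        counts[day] += 1
    return counts


records = [
    "Mar 20 2014 13:54:27 7F-S9312 SNMP/4/COLDSTART:OID 1.3.6.1.6.3.1.1.5.1 coldStart.",
    "Mar 20 2014 17:06:39 7F-S9312 SNMP/4/COLDSTART:OID 1.3.6.1.6.3.1.1.5.1 coldStart.",
]
for day, count in cold_starts_per_day(records).items():
    print(day, count)
```

Several cold starts on the same day, with no power alarms from the switch itself, point toward the external power supply, as in this case.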

3.3.2 The Standby MPU of a Switch Resets Repeatedly

Fault Symptom

After a new standby MPU (SRUA) is installed in slot 8 of an S9306 switch, it resets repeatedly, while the old standby SRU worked normally in this slot.

Cause Analysis

                               Step 1     Run the display device command to check device information.

<HUAWEI> display device
S9306's Device status:
Slot  Sub Type     Online    Power      Register       Alarm      Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
1     -   G48TC    Present   PowerOn    Registered     Normal     NA    
3     -   X2UXC    Present   PowerOn    Registered     Normal     NA    
6     -   G48TC    Present   PowerOn    Registered     Normal     NA    
7     -   SRUA     Present   PowerOn    Registered     Normal     Master
8     -   -        Present   PowerOn    Unregistered   -          Slave 
PWR1  -   -        Present   PowerOn    Registered     Normal     NA    
PWR3  -   -        Present   PowerOn    Registered     Normal     NA    
CMU1  -   CMUA     Present   PowerOn    Registered     Normal     Master
FAN1  -   -        Present   PowerOn    Registered     Normal     NA    
FAN2  -   -        Present   PowerOn    Registered     Normal     NA   

The SRU in slot 8 is unregistered.

                               Step 2     Run the display reset-reason command to view card reset information.

<HUAWEI> display reset-reason
The LPU board[1] has no reset records.
The LPU board[2] has no reset records.
The LPU board[3] has no reset records.
The LPU board[4] has no reset records.
The LPU board[5] has no reset records.
The LPU board[6] has no reset records.
The SRU board[7] has no reset records.
The SRU board[8]'s reset total 19883, detailed information:
--  1. 2014/01/26   16:23:55, Reset No.: 19883
       Reason: Warm reset board for no receiving message in a long time
--  2. 2014/01/26   16:13:55, Reset No.: 19882
       Reason: Cold reset board for no receiving message in a long time
--  3. 2014/01/26   16:03:55, Reset No.: 19881
       Reason: Warm reset board for no receiving message in a long time
--  4. 2014/01/26   15:53:55, Reset No.: 19880
       Reason: Cold reset board for no receiving message in a long time
--  5. 2014/01/26   15:43:55, Reset No.: 19879
       Reason: Warm reset board for no receiving message in a long time

The reset records of the SRU in slot 8 alternate between warm and cold resets, all with the same reason: no receiving message in a long time. This indicates that the SRU cannot communicate with the active SRU.
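The spacing of the reset records is also informative. The sketch below parses the records shown above and computes the gap between consecutive resets; reset_intervals is a hypothetical helper, and the regular expression assumes the record layout printed in this example output.

```python
import re
from datetime import datetime

# Assumed record layout: "--  1. 2014/01/26   16:23:55, Reset No.: 19883"
RECORD = re.compile(
    r"--\s*\d+\.\s+(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2}), Reset No\.: (\d+)"
)


def reset_intervals(output):
    """Return gaps in seconds between consecutive reset records (newest first)."""
    times = [
        datetime.strptime(f"{date} {clock}", "%Y/%m/%d %H:%M:%S")
        for date, clock, _number in RECORD.findall(output)
    ]
    return [(newer - older).total_seconds() for newer, older in zip(times, times[1:])]


OUTPUT = """\
--  1. 2014/01/26   16:23:55, Reset No.: 19883
       Reason: Warm reset board for no receiving message in a long time
--  2. 2014/01/26   16:13:55, Reset No.: 19882
       Reason: Cold reset board for no receiving message in a long time
--  3. 2014/01/26   16:03:55, Reset No.: 19881
       Reason: Warm reset board for no receiving message in a long time
--  4. 2014/01/26   15:53:55, Reset No.: 19880
       Reason: Cold reset board for no receiving message in a long time
--  5. 2014/01/26   15:43:55, Reset No.: 19879
       Reason: Warm reset board for no receiving message in a long time
"""
print(reset_intervals(OUTPUT))
```

In this output every gap is 600 seconds. Resets at a fixed interval would be consistent with the active SRU resetting the card each time a communication timeout expires, rather than with random crashes.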

                               Step 3     Since the old standby SRU worked normally in this slot, the new SRU may be faulty. Replace the new SRU with another one of the same model to verify the inference.

                               Step 4     The replacement SRU can register successfully, indicating that the previous SRU is faulty.

----End

Solution

Replace the faulty SRU and send it back for repair.

3.3.3 LPUs and Standby MPU of a Switch All Reset

Fault Symptom

During a test, services on an S9300 switch were interrupted, and logs showed that the standby MPU and the only LPU in the switch both reset.

Dec 8 2013 13:14:10 NewCallcenter-SW-2 %ALML/4/ENTRESET(l)[778]: LPU frame[1] board[1] is reset. The reason is: Warm reset board for no register in a long time.
Dec 8 2013 13:14:10 NewCallcenter-SW-2 %ALML/4/PUBLISH_EVENT(l)[779]: Publish event. (Slot=1, Event ID=BOARD_RESET).
Dec 8 2013 13:14:14 NewCallcenter-SW-2 ENTMIB/4/TRAP:OID 1.3.6.1.2.1.47.2.0.1 Entity MIB change.
Dec 8 2013 13:25:10 NewCallcenter-SW-2 %ALML/4/ENTRESET(l)[780]: MPU frame[1] board[5] is reset. The reason is: Warm reset board for no receiving message in a long time.
Dec 8 2013 13:25:10 NewCallcenter-SW-2 %ALML/4/PUBLISH_EVENT(l)[781]: Publish event. (Slot=5, Event ID=BOARD_RESET).
Dec 8 2013 13:25:10 NewCallcenter-SW-2 %VFS/5/UNREGDEV_OK(l)[782]: Succeeded in unregistering the file system on device 5.
Dec 8 2013 13:25:10 NewCallcenter-SW-2 %OSPF/6/RECV_SMB_DOWN_RM(l)[783]: OSPF backup receives slave mainboard Down event from RM. (SlaveHsbState=0)
Dec 8 2013 13:25:14 NewCallcenter-SW-2 ENTMIB/4/TRAP:OID 1.3.6.1.2.1.47.2.0.1 Entity MIB change.

Cause Analysis

                               Step 1     Run the display device command to check device information.

<HUAWEI> display device
S9303's Device status:
Slot  Sub Type         Online    Power      Register       Alarm      Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1     -   -            Present   PowerOn    Unregistered   -          NA    
4     -   LE0MMCUA     Present   PowerOn    Registered     Normal     Master
5     -   -            Present   PowerOn    Unregistered   -          Slave 
PWR1  -   -            Present   PowerOn    Registered     Normal     NA    
PWR2  -   -            Present   PowerOn    Registered     Normal     NA    
FAN1  -   -            Present   PowerOn    Registered     Normal     NA   

The standby MPU (MCUA) in slot 5 and LPU in slot 1 were in Unregistered state.

                               Step 2     The reset logs showed that the resets were caused by communication failure with the active MCU.

                               Step 3     The LPU still failed to register after it was reinstalled multiple times in slot 1 and moved to other slots.

                               Step 4     When checking the connector of the LPU, field engineers found no bent pins or rust.

                               Step 5     After the active MCU in slot 4 was removed, the LPU in slot 1 and MCU in slot 5 registered successfully. Therefore, the original active MCU or slot 4 may be faulty.

                               Step 6     After field engineers swapped the slots of the two MCUs, the MCU in slot 4 registered successfully, whereas the original active MCU failed to register in slot 5. Therefore, it can be determined that the original MCU, instead of slot 4, was faulty.

----End

Solution

Replace the faulty MCU and send it back for repair.

Conclusion

If the standby MPU and all LPUs in a switch fail to communicate with the active MPU, the active MPU or the slot may be faulty. Perform cross tests to determine which one is faulty.
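The cross test in this case boils down to a small decision rule. The sketch below only illustrates that reasoning; diagnose_cross_test and its two parameters are hypothetical names, and the rule covers only the swap of a suspect card with a known-good card.

```python
def diagnose_cross_test(suspect_card_ok_in_other_slot, other_card_ok_in_suspect_slot):
    """Interpret the result of swapping a suspect card with another card.

    suspect_card_ok_in_other_slot: the suspect card registers in the other slot.
    other_card_ok_in_suspect_slot: the other card registers in the suspect slot.
    """
    if other_card_ok_in_suspect_slot and not suspect_card_ok_in_other_slot:
        return "card"  # the fault follows the card
    if suspect_card_ok_in_other_slot and not other_card_ok_in_suspect_slot:
        return "slot"  # the fault stays with the slot
    return "inconclusive"  # both work or both fail: more tests are needed


# This case: the other MCU registered in slot 4, and the original active MCU
# failed to register in slot 5.
print(diagnose_cross_test(
    suspect_card_ok_in_other_slot=False,
    other_card_ok_in_suspect_slot=True,
))
```

When both tests pass or both fail, the swap alone does not isolate the fault, so the function reports the result as inconclusive instead of guessing.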

3.3.4 Disabling Auto-booting Causes the Standby MPU to Reset Repeatedly

Fault Symptom

The standby MPU of a switch resets repeatedly.

Cause Analysis

                               Step 1     Run the display reset-reason command to check the cause of reset. The command output shows that the standby MPU was reset by the active MPU because it failed to communicate with the active MPU.

                               Step 2     Connect the serial port on the standby MPU to a PC and collect the displayed information.

****************************************************
*                                                  *
*          S9300 Bootload, Ver 102                 *
*                                                  *
****************************************************
 
Copyright(C) 2003-2009 by HUAWEI TECHNOLOGIES CO., LTD.
Creation date: Sep 10 2009, 13:52:56
 
PCB Version     : LE02SRUA VER.B
CPU L2 Cache    : 128KB
CPU Clock Speed : 700MHz
BUS Clock Speed : 133MHz
Memory Type     : DDR2 SDRAM
Memory Size     : 1024MB
Memory Speed    : 667MHz
 
CF Card Init...............................................................
cfcard:/  - Volume is OK
Done
 
Auto-booting is disabled!
Password:

The system displays Auto-booting is disabled! and stays at the password input prompt. Because the auto-booting function is disabled, the standby MPU cannot proactively start the system software package in the file system and stays in the BootLoad phase. As a result, the standby MPU cannot communicate with the active MPU. The active MPU can only detect the standby MPU but cannot communicate with it. Therefore, the active MPU resets the standby MPU and repeats this process.

----End

Solution

                               Step 1     As the BootLoad stays at the password input prompt, you first need to enter the password to enter the BootLoad menu.

                               Step 2     After entering the BootLoad menu, press Ctrl+Z to enter the hidden menu.

                               Step 3     Select Enable auto-booting with default mode.

                               Step 4     Quit the hidden menu and reboot the standby MPU.

----End

3.3.5 The Standby MPU of a Switch Resets Repeatedly and Power Alarms Are Generated

Fault Symptom

The standby MPU (SRU) in slot 7 of an S9306 switch resets repeatedly, and power alarms are generated.

Cause Analysis

                               Step 1     Run the display device command to check device information.

<HUAWEI> display device
S9306's Device status:
Slot  Sub Type  Online    Power      Register       Alarm      Primary
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
1     -   LPU   Present   PowerOn    Registered     Normal     NA    
2     -   LPU   Present   PowerOn    Registered     Normal     NA    
3     -   LPU   Present   PowerOn    Registered     Normal     NA    
4     -   LPU   Present   PowerOn    Registered     Normal     NA    
5     -   LPU   Present   PowerOn    Registered     Normal     NA    
6     -   LPU   Present   PowerOn    Registered     Normal     NA    
7     -   SRU   Present   PowerOn    Unregistered   -          Slave 
8     -   SRU   Present   PowerOn    Registered     Normal     Master

The SRU in slot 7 is unregistered.

                               Step 2     Run the display alarm all command to check alarm information. The command output contains power alarms.

<HUAWEI> display alarm all
Level          Date        Time                Info      
Warning      2013-10-31  21:18:27    The "1.2V_VDD" voltage sensor of SRU board[7](entity) exceed lower minor limit.
Warning      2013-10-31  21:18:27    The "2.5V" voltage sensor of SRU board[7](entity) exceed lower minor limit.
Warning      2013-10-31  21:18:27    The "1.8V" voltage sensor of SRU board[7](entity) exceed lower minor limit.

                               Step 3     Check logs. Logs also record power exceptions in the SRU.

Oct 31 2013 20:56:41 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:39 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:37 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:35 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:33 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:32 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:30 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALMA/4/VOLT_LOWER(l): The "1.2V_VDD" voltage sensor of SRU board[7](entity) exceed lower minor limit.
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALML/4/VOLT_LOWER(l): The "1.2V_VDD" voltage sensor of SRU board[7](entity) exceed lower minor limit. (SensorNum=5, Value=0.01, UpperThreshold=1.44, LowerThreshold=0.96)
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALMA/4/VOLT_LOWER(l): The "2.5V" voltage sensor of SRU board[7](entity) exceed lower minor limit.
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALML/4/VOLT_LOWER(l): The "2.5V" voltage sensor of SRU board[7](entity) exceed lower minor limit. (SensorNum=10, Value=0.86, UpperThreshold=3.00, LowerThreshold=2.00)
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALMA/4/VOLT_LOWER(l): The "1.8V" voltage sensor of SRU board[7](entity) exceed lower minor limit.
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALML/4/VOLT_LOWER(l): The "1.8V" voltage sensor of SRU board[7](entity) exceed lower minor limit. (SensorNum=9, Value=0.01, UpperThreshold=2.16, LowerThreshold=1.44)
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALML/4/PUBLISH_EVENT(l): Publish event. (Slot=7,Eventid=BOARD_RESET)
Oct 31 2013 20:56:27 KeFuZuoXi-S9306-1 %ALML/4/ENTRESET(l): SRU board[7] is reset, The reason is: Cold reset board for CPU is not active.
Oct 31 2013 20:56:26 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.
Oct 31 2013 20:56:24 KeFuZuoXi-S9306-1 %ALML/3/CPU_RESET(l): The canbus node of SRU board[7] detects that CPU was reset.

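The voltage readings are far below the lower thresholds (for example, the 1.2V_VDD sensor reads 0.01 against a lower threshold of 0.96). Therefore, this problem is caused by an abnormal power supply on the SRU.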
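The sensor details in the VOLT_LOWER logs can be checked mechanically. This is a minimal sketch, assuming the detail field has the key=value layout shown in the logs above, such as (SensorNum=5, Value=0.01, UpperThreshold=1.44, LowerThreshold=0.96); parse_sensor_detail and below_lower_threshold are hypothetical helper names.

```python
import re

# key=value pairs with numeric values, as in the VOLT_LOWER detail field.
FIELD = re.compile(r"(\w+)=([\d.]+)")


def parse_sensor_detail(detail):
    """Turn '(SensorNum=5, Value=0.01, ...)' into a dict of floats."""
    return {key: float(value) for key, value in FIELD.findall(detail)}


def below_lower_threshold(detail):
    """Return True if the reported value is under the lower threshold."""
    fields = parse_sensor_detail(detail)
    return fields["Value"] < fields["LowerThreshold"]


details = [
    "(SensorNum=5, Value=0.01, UpperThreshold=1.44, LowerThreshold=0.96)",
    "(SensorNum=10, Value=0.86, UpperThreshold=3.00, LowerThreshold=2.00)",
    "(SensorNum=9, Value=0.01, UpperThreshold=2.16, LowerThreshold=1.44)",
]
for detail in details:
    print(detail, below_lower_threshold(detail))
```

All three sensors in this case report values below their lower thresholds, which supports the conclusion that the power supply on the SRU is abnormal.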

----End

Solution

Replace the faulty SRU and send it back for repair.

Conclusion

None.

 

For more information, see ★★★Summary★★★ All About Huawei Switch Features and Configurations.



This post was last edited by 交换机在江湖 at 2017-02-28 17:02.

zhangyongjie   Posted on 2017-2-28 15:50:57

Could an expert please translate this?

Paradise  Administrator   Posted on 2017-3-3 10:04:58

Paradise  Administrator   Posted on 2017-3-3 10:05:40

Paradise posted on 2017-03-03 10:04 http://forum.huawei.com/enterprise/forum.php?mod=viewthread&tid=370839&page=1&extra=#pid2078015中 ...
http://support.huawei.com/huaweiconnect/enterprise/thread-227721.html
The Chinese-language summary post by 交换机在江湖
