Hi,
During daily device maintenance, we should check the health of each device so that potential failures can be found early. In this post, we introduce the common items to check during daily device maintenance.
1. CPU usage.
The CPU is responsible for processing packets. When CPU usage is too high, the administrator may notice latency when logging in to the device through Telnet, and packets may be dropped.
To check the switch's CPU usage, run the command 'display cpu-usage'. By default, the CPU usage alarm threshold is 90%, which means that alarm logs are generated when CPU usage reaches 90%. These logs prompt the administrator to find and handle the tasks that occupy too much CPU. The CPU usage of each task is listed in the output of 'display cpu-usage'.
<HUAWEI> display cpu-usage
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage: 20% Max: 99%
CPU Usage Stat. Time : 2013-10-23 10:04:45
CPU utilization for five seconds: 5%: one minute: 5%: five minutes: 5%
Max CPU Usage Stat. Time : 2013-10-21 08:14.
TaskName CPU Runtime(CPU Tick High/Tick Low) Task Explanation
VIDL 80% 0/e3a150c0 DOPRA IDLE
OS 10% 0/ bfb0440 Operation System
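If you save this output regularly (for example, from a Telnet or SSH session), a small script can flag readings that reach the alarm threshold. The following is a minimal Python sketch under those assumptions: the function names are hypothetical, the 90% threshold mirrors the default mentioned above, and treating VIDL as the idle task is inferred only from its "DOPRA IDLE" explanation in the sample.

```python
import re

# Sample text copied from the 'display cpu-usage' example above.
SAMPLE = """\
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage: 20% Max: 99%
CPU Usage Stat. Time : 2013-10-23 10:04:45
CPU utilization for five seconds: 5%: one minute: 5%: five minutes: 5%
Max CPU Usage Stat. Time : 2013-10-21 08:14.
TaskName CPU Runtime(CPU Tick High/Tick Low) Task Explanation
VIDL 80% 0/e3a150c0 DOPRA IDLE
OS 10% 0/ bfb0440 Operation System
"""

ALARM_THRESHOLD = 90  # percent; assumed to match the default alarm threshold
IDLE_TASKS = ("VIDL",)  # assumption: VIDL is the idle task ("DOPRA IDLE")


def parse_cpu_usage(text):
    """Return (current %, max %, {task name: %}) from 'display cpu-usage' text."""
    overall = re.search(r"^CPU Usage:\s*(\d+)%\s*Max:\s*(\d+)%", text, re.MULTILINE)
    if overall is None:
        raise ValueError("no 'CPU Usage:' line found")
    # Task rows start with a task name followed by a percentage, e.g. "OS 10% ...".
    tasks = {
        m.group(1): int(m.group(2))
        for m in re.finditer(r"^(\S+)\s+(\d+)%\s", text, re.MULTILINE)
    }
    return int(overall.group(1)), int(overall.group(2)), tasks


def check_cpu(text, threshold=ALARM_THRESHOLD):
    """Return a list of warnings plus the busiest non-idle task (or None)."""
    current, peak, tasks = parse_cpu_usage(text)
    warnings = []
    if current >= threshold:
        warnings.append(f"current CPU usage {current}% >= {threshold}%")
    if peak >= threshold:
        warnings.append(f"recorded maximum CPU usage {peak}% >= {threshold}%")
    busy = {name: pct for name, pct in tasks.items() if name not in IDLE_TASKS}
    busiest = max(busy.items(), key=lambda item: item[1]) if busy else None
    return warnings, busiest


if __name__ == "__main__":
    warnings, busiest = check_cpu(SAMPLE)
    for w in warnings:
        print("WARNING:", w)
    print("Busiest non-idle task:", busiest)
```

With the sample output, the current usage (20%) is below the threshold, but the recorded maximum of 99% is flagged, and OS (10%) is reported as the busiest non-idle task.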
2. Memory usage.
While the device is running, routing entries, the ARP table, the FIB, and other data are loaded into memory. Insufficient memory may cause unexpected errors, so memory usage is also an important indicator of device health. By default, alarm logs are generated when memory usage reaches 90% or 95%. To check memory usage, run the command 'display memory-usage'.
<HUAWEI> display memory-usage
Memory utilization statistics at 2013-10-21 08:14+08:00
System Total Memory Is: 394152720 bytes
Total Memory Used Is: 130975664 bytes
Memory Using Percentage Is: 33%
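The reported percentage is simply used memory divided by total memory. The Python sketch below recomputes it from the byte counts and cross-checks it against the reported value; the function names are hypothetical, and the 90% threshold is an assumption that you would set to your device's default (90% or 95%).

```python
import re

# Sample text copied from the 'display memory-usage' example above.
SAMPLE = """\
Memory utilization statistics at 2013-10-21 08:14+08:00
System Total Memory Is: 394152720 bytes
Total Memory Used Is: 130975664 bytes
Memory Using Percentage Is: 33%
"""

ALARM_THRESHOLD = 90  # percent; assumption, use 90 or 95 to match your device


def _field(pattern, text):
    """Return the integer captured by pattern, or raise if the line is missing."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return int(match.group(1))


def memory_usage(text):
    """Return (computed %, reported %) from 'display memory-usage' text."""
    total = _field(r"System Total Memory Is:\s*(\d+)", text)
    used = _field(r"Total Memory Used Is:\s*(\d+)", text)
    reported = _field(r"Memory Using Percentage Is:\s*(\d+)%", text)
    return used * 100 / total, reported


if __name__ == "__main__":
    computed, reported = memory_usage(SAMPLE)
    print(f"computed {computed:.2f}%, reported {reported}%")
    if computed >= ALARM_THRESHOLD:
        print("WARNING: memory usage at or above the alarm threshold")
```

For the sample, 130,975,664 / 394,152,720 is about 33.23%, which matches the reported 33% and is well below the alarm threshold.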
3. Temperature.
As with all electronic equipment, both excessively high and excessively low temperatures can cause the device to work abnormally. Switches are usually installed in equipment rooms where air conditioning controls the temperature, so abnormal temperatures are usually caused by a rise in ambient temperature or by a fan module failure.
To check the device temperature, run the command 'display temperature all'.
<HUAWEI> display temperature all
-------------------------------------------------------------------------------
Slot  Card  Sensor  Status  Current(C)  Lower(C)  Lower      Upper(C)  Upper
                                                  Resume(C)            Resume(C)
-------------------------------------------------------------------------------
0     NA    NA      Normal  44          0         4          72        68
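Each row gives the current reading together with the lower and upper alarm thresholds and their resume values (presumably the temperatures at which an alarm clears). A minimal Python sketch, assuming whitespace-separated rows with exactly the nine columns shown and numeric temperature fields; it flags a sensor whose status is not Normal or whose reading is at or beyond a threshold. The names are hypothetical, and counting "at the threshold" as a problem is an assumption.

```python
from dataclasses import dataclass

# Header and data row copied from the 'display temperature all' example above.
SAMPLE = """\
Slot Card Sensor Status Current(C) Lower(C) Lower Upper(C) Upper
Resume(C) Resume(C)
0 NA NA Normal 44 0 4 72 68
"""


@dataclass
class SensorReading:
    slot: str
    card: str
    sensor: str
    status: str
    current: int
    lower: int
    lower_resume: int
    upper: int
    upper_resume: int


def parse_temperature(text):
    """Parse data rows (9 fields, numeric slot); header lines are skipped."""
    readings = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 9 and fields[0].isdigit():
            readings.append(SensorReading(*fields[:4], *(int(f) for f in fields[4:])))
    return readings


def temperature_problems(readings):
    """Describe every abnormal status or reading outside (lower, upper)."""
    problems = []
    for r in readings:
        if r.status != "Normal":
            problems.append(f"slot {r.slot} sensor {r.sensor}: status {r.status}")
        if not r.lower < r.current < r.upper:
            problems.append(
                f"slot {r.slot} sensor {r.sensor}: {r.current} C outside "
                f"({r.lower}, {r.upper}) C"
            )
    return problems


if __name__ == "__main__":
    for problem in temperature_problems(parse_temperature(SAMPLE)) or ["all sensors normal"]:
        print(problem)
```

For the sample row, 44 C lies between the 0 C lower and 72 C upper thresholds and the status is Normal, so nothing is flagged.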
4. Power supply.
Most modular (chassis) switches have at least two power supplies, while most fixed (box) switches have only one. Although two power supplies reduce the risk of a power outage, it is still necessary to check the power supply during daily maintenance. To check the power supply status, run the command 'display power'.
<HUAWEI> display power
------------------------------------------------------------
Slot PowerID Online Mode State Power(W)
------------------------------------------------------------
0 PWRI Present AC Supply 500.00
0 PWRII Absent - - -
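The power table can be checked in the same scripted way. This Python sketch assumes the six-column layout shown above and treats any present module whose State is not "Supply" as needing attention; the function names and the `expected_supplying` parameter are hypothetical. Setting `expected_supplying=2` on a device that should have redundant power turns a missing second module into a warning.

```python
# Rows copied from the 'display power' example above.
SAMPLE = """\
Slot PowerID Online Mode State Power(W)
0 PWRI Present AC Supply 500.00
0 PWRII Absent - - -
"""


def parse_power(text):
    """Return one dict per power module row (6 fields, numeric slot)."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0].isdigit():
            slot, power_id, online, mode, state, _watts = fields
            modules.append(
                {"slot": slot, "id": power_id, "online": online, "mode": mode, "state": state}
            )
    return modules


def power_warnings(modules, expected_supplying=1):
    """Warn about present-but-not-supplying modules and missing redundancy."""
    warnings = []
    for m in modules:
        if m["online"] == "Present" and m["state"] != "Supply":
            warnings.append(f"{m['id']} in slot {m['slot']} is present but in state {m['state']}")
    supplying = sum(1 for m in modules if m["state"] == "Supply")
    if supplying < expected_supplying:
        warnings.append(f"only {supplying} module(s) supplying power, expected {expected_supplying}")
    return warnings


if __name__ == "__main__":
    modules = parse_power(SAMPLE)
    print(power_warnings(modules) or "power OK")
    print(power_warnings(modules, expected_supplying=2))
```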
5. Fan.
As described in item 3, a fan module failure may cause abnormal temperature, so fan modules should also be checked. To check fan status, run the command 'display fan'.
<HUAWEI> display fan
-------------------------------------------------------------------------
Slot FanID Online Status Speed Mode Airflow
-------------------------------------------------------------------------
0 1 Absent - - - -
0 2 Present Normal 100% AUTO Side-to-Side
0 3 Absent - - - -
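A similar sketch for the fan table (Python; hypothetical names; assumes the seven-column layout above) counts present fan modules and reports any present module whose Status is not "Normal". It does not flag absent slots, because in the sample only one of three slots is populated and whether that is expected depends on the device model.

```python
# Rows copied from the 'display fan' example above.
SAMPLE = """\
Slot FanID Online Status Speed Mode Airflow
0 1 Absent - - - -
0 2 Present Normal 100% AUTO Side-to-Side
0 3 Absent - - - -
"""


def fan_status(text):
    """Return (number of present fans, list of problem descriptions)."""
    present = 0
    problems = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not fields[0].isdigit():
            continue  # skip header, ruler, and malformed lines
        slot, fan_id, online, status, speed = fields[:5]
        if online == "Present":
            present += 1
            if status != "Normal":
                problems.append(f"fan {fan_id} in slot {slot}: status {status}, speed {speed}")
    return present, problems


if __name__ == "__main__":
    present, problems = fan_status(SAMPLE)
    print(f"{present} fan module(s) present")
    for p in problems:
        print("WARNING:", p)
```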
6. Device storage.
Storage on each device is limited. When storage runs out, logs and other information may be affected. It is usually recommended to delete unnecessary files when storage usage reaches 80%. To check storage usage, run the command 'dir'. The example below uses 'dir /all', which also lists files in the recycle bin (shown in square brackets, such as [11.cfg]).
<HUAWEI> dir /all
Directory of flash:/
Idx Attr Size(Byte) Date Time FileName
0 -rw- 889 Feb 25 2012 10:00:58 private-data.txt
1 -rw- 6,311 Feb 17 2012 14:05:04 backup.cfg
2 -rw- 836 Jan 01 2012 18:06:20 rr.dat
3 drw- - Jan 01 2012 18:08:20 logfile
4 -rw- 836 Jan 01 2012 18:06:20 rr.bak
5 drw- - Feb 27 2012 00:00:54 security
6 -rw- 523,240 Mar 16 2011 11:21:36 bootrom_53hib66.bin
7 -rw- 2,290 Feb 25 2012 16:46:06 vrpcfg.cfg
8 -rw- 812 Dec 12 2011 15:43:10 hostkey
9 drw- - Jan 01 2012 18:05:48 compatible
10 -rw- 25,841,428 Nov 17 2011 09:48:10 basicsoft.cc
11 -rw- 540 Dec 12 2011 15:43:12 serverkey
12 -rw- 26,101,692 Dec 21 2011 11:44:52 devicesoft.cc
13 -rw- 6,292 Feb 14 2012 11:14:32 1.cfg
14 -rw- 6,311 Feb 17 2012 10:22:56 1234.cfg
15 -rw- 6,311 Feb 25 2012 17:22:30 [11.cfg]
65,233 KB total (13,632 KB free)
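To apply the 80% guideline, compute used space as total minus free: here (65,233 - 13,632) / 65,233 is about 79.1%, just under the recommended cleanup point. The Python sketch below performs this calculation and lists the largest files as cleanup candidates. It assumes the listing format shown above (abridged in the sample), and the function names are hypothetical.

```python
import re

# Abridged from the 'dir /all' example above.
SAMPLE = """\
Directory of flash:/
Idx Attr Size(Byte) Date Time FileName
3 drw- - Jan 01 2012 18:08:20 logfile
6 -rw- 523,240 Mar 16 2011 11:21:36 bootrom_53hib66.bin
7 -rw- 2,290 Feb 25 2012 16:46:06 vrpcfg.cfg
10 -rw- 25,841,428 Nov 17 2011 09:48:10 basicsoft.cc
12 -rw- 26,101,692 Dec 21 2011 11:44:52 devicesoft.cc
65,233 KB total (13,632 KB free)
"""

CLEANUP_THRESHOLD = 80  # percent, from the recommendation above


def _to_int(number):
    """Convert a comma-grouped number such as '13,632' to 13632."""
    return int(number.replace(",", ""))


def storage_usage_percent(text):
    """Return used storage as a percentage of total, from the summary line."""
    match = re.search(r"([\d,]+) KB total \(([\d,]+) KB free\)", text)
    if match is None:
        raise ValueError("no 'KB total (... KB free)' line found")
    total, free = _to_int(match.group(1)), _to_int(match.group(2))
    return (total - free) * 100 / total


def largest_files(text, n=3):
    """Return the n largest regular files as (name, size in bytes)."""
    files = []
    for line in text.splitlines():
        fields = line.split()
        # File rows: Idx Attr Size Month Day Year Time Name; directories have size '-'.
        if len(fields) >= 8 and fields[0].isdigit() and fields[2] != "-":
            files.append((" ".join(fields[7:]), _to_int(fields[2])))
    return sorted(files, key=lambda f: f[1], reverse=True)[:n]


if __name__ == "__main__":
    usage = storage_usage_percent(SAMPLE)
    print(f"storage usage {usage:.1f}%")
    if usage >= CLEANUP_THRESHOLD:
        print("Consider deleting unnecessary files:", largest_files(SAMPLE))
```

With the sample values the script prints "storage usage 79.1%" and suggests no cleanup yet. Note that the largest files here are the .cc system software images, so review each candidate before deleting anything.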
That's all for this post. If you have any other suggestions, please comment below.