Hello everyone!
Today I would like to share a case about using rescue mode to repair host file systems.
ISSUE DESCRIPTION
After the FusionCompute compute node CNA8 was powered off unexpectedly, it hung during startup, as shown in the screenshot below:
ISSUE ANALYSIS
When the system cannot be started or accessed, the usual causes are a damaged system or missing or corrupted system files. If the system fails to start, you need to enter rescue mode to rectify the fault.
Minor damage can usually be repaired. Severe damage may not be repairable, or the system may still fail to start after the repair. In that case, the only option is to reinstall the system.
SOLUTION
1. Start or restart the server. The GRUB menu is displayed:
2. Press e to edit the grub command line parameters. The system prompts you to enter the password:
3. Enter the username and password.
4. Modify the grub parameters in the following three ways:
Delete the console=tty0 field. If the grub parameters also contain the console=ttyS0,115200 field, delete that field as well.
Add the rd.break=pre-mount field.
Change initramfs-3.10.0-514.44.5.10_109.x86_64.img (the actual version number may vary) to initramfs-hostos-rescue.img.
Before the parameters are modified:
After the parameters are modified:
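The edits in step 4 are typed by hand in the grub editor, so nothing below is meant to be run on the host. It is only a rough sketch of the text change, written as a sed filter so the before and after can be compared. The function name edit_grub_entry, the linux16/initrd16 keywords, the vmlinuz file name, and the root= value in the sample entry are assumptions made for illustration; the real entry on your CNA node may look different.

```shell
#!/bin/sh
# Illustration only: applies the three step-4 edits to grub entry text on stdin.
# edit_grub_entry is a hypothetical helper, not a FusionCompute tool.
edit_grub_entry() {
    # 1) drop the console=tty0 field
    # 2) drop the console=ttyS0,115200 field if it is present
    # 3) append rd.break=pre-mount to the kernel (linux...) line
    # 4) point the initrd line at the rescue image
    sed -e 's/ console=tty0//' \
        -e 's/ console=ttyS0,115200//' \
        -e '/^[[:space:]]*linux/ s/$/ rd.break=pre-mount/' \
        -e 's/initramfs-[^ ]*\.img/initramfs-hostos-rescue.img/'
}

# Hypothetical sample entry (file names and root= value are made up).
printf '%s\n' \
    'linux16 /vmlinuz-example root=/dev/mapper/example ro console=tty0 console=ttyS0,115200' \
    'initrd16 /initramfs-3.10.0-514.44.5.10_109.x86_64.img' |
    edit_grub_entry
```

Comparing the printed lines with the sample input shows the same three changes listed in step 4: the console fields are gone, rd.break=pre-mount is appended to the kernel line, and the initramfs image name is replaced.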
5. Press Ctrl+x to start the system. If the interface shown in the figure below is displayed, the system has entered rescue mode. You can now run commands to repair the file system, mount logical volumes, view OS logs, and modify configurations.
6. Check and repair the file system in rescue mode.
a) Run the lvscan command to scan the logical volumes.
b) If all LVs are in the active state, run the fsck command on each of them. The following uses an ext file system as an example:
fsck -n /dev/cpsVG/xxxxx
[-n answers "no" to every prompt, that is, check only and do not repair. The operation is read-only and does not affect the file system.]
fsck -y /dev/cpsVG/xxxxx
[-y answers "yes" to every prompt, that is, check the file system and repair any faults found.]
Note:
In the preceding commands, xxxxx stands for each logical volume listed by lvscan in turn. You are advised to run the command with -y to repair the volumes. If information similar to the following is displayed, the volume is not damaged:
If a large amount of output is displayed, the volume is damaged and is being repaired. After fsck -y /dev/cpsVG/xxxxx finishes, you are advised to run the same command again until it produces the clean output shown in the preceding figure.
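The "repeat fsck -y until the output is clean" advice above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure for CNA hosts: the helper names list_active_lvs and repair_lv and the MAX_PASSES limit are hypothetical, and it assumes the rescue shell provides a POSIX sh, that lvscan prints lines of the form ACTIVE '/dev/VG/LV' [size] inherit, and that fsck follows the standard exit codes (0 means no errors were found).

```shell
#!/bin/sh
# Sketch only: helper names and MAX_PASSES are hypothetical.

MAX_PASSES=3   # assumed cap so a volume that never comes back clean cannot loop forever

# Read lvscan output on stdin and print the device path of every ACTIVE volume.
# Assumed line shape:  ACTIVE  '/dev/<vg>/<lv>' [<size>] inherit
list_active_lvs() {
    while read -r state path _rest; do
        if [ "$state" = "ACTIVE" ]; then
            path=${path#\'}   # strip the leading quote
            path=${path%\'}   # strip the trailing quote
            printf '%s\n' "$path"
        fi
    done
}

# Run fsck -y on one device until a pass exits with status 0 (no errors found)
# or MAX_PASSES is reached. Returns 0 when clean, 1 otherwise.
repair_lv() {
    dev=$1
    pass=1
    while [ "$pass" -le "$MAX_PASSES" ]; do
        # stdin is redirected so fsck cannot consume a caller's input stream
        fsck -y "$dev" </dev/null
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "$dev: clean after $pass pass(es)"
            return 0
        fi
        pass=$((pass + 1))
    done
    echo "$dev: still reporting errors after $MAX_PASSES passes (last status $rc)" >&2
    return 1
}
```

In the rescue shell, the two helpers would be combined as lvscan | list_active_lvs | while read -r dev; do repair_lv "$dev"; done. A volume that still fails after the last pass is the "severe damage" case described in the analysis, where reinstalling is the remaining option.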
7. After the repair is complete, reboot the system. If the repair was successful, the system boots into the normal OS. If the system still cannot be accessed during startup, reinstall the host through PXE. If the system can be accessed, you are advised to check whether each FusionSphere service and VM service is running normally. If an exception occurs, reinstalling through PXE is also recommended.
This is my solution, how about yours? Go ahead and share it with us!