Hello all,
How do you handle exceptions that occur during an upgrade? Don’t worry about it.
This post shows what happens when a node fails to be upgraded: the upgrade is automatically suspended, and four options are provided: Rollback, Retry, Continue, and Terminate.
The four options are described as follows:
Rollback: Roll the node back to the source version.
Retry: Upgrade the node again. If the upgrade succeeds, the upgrade process continues. If the upgrade fails, the upgrade process is suspended again.
Continue: Ignore the node upgrade failure and continue the upgrade. After the cluster upgrade succeeds, the cluster resets the node, and the node is then automatically updated to the version of the upgrade package.
Terminate: Exit the upgrade process.
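To make the four options easier to compare, here is a minimal Python sketch that summarizes what each one leads to. It is only an illustration of the descriptions above, not SmartKit code: the names Option and outcome are hypothetical, and the returned strings are my own wording.

```python
from enum import Enum


class Option(Enum):
    """The four choices offered when a node upgrade fails (hypothetical names)."""
    ROLLBACK = "rollback"
    RETRY = "retry"
    CONTINUE = "continue"
    TERMINATE = "terminate"


def outcome(option: Option, retry_succeeds: bool = True) -> str:
    """Return a plain-text summary of what the chosen option leads to.

    retry_succeeds only matters for Option.RETRY; it is an input to this
    sketch, since the real result depends on the device.
    """
    if option is Option.ROLLBACK:
        return "node rolled back to the source version"
    if option is Option.RETRY:
        if retry_succeeds:
            return "node upgraded; upgrade process continues"
        return "node upgrade failed again; upgrade process suspended"
    if option is Option.CONTINUE:
        return "failure ignored; node reset and updated after the cluster upgrade succeeds"
    return "upgrade process exits"
```

For example, outcome(Option.RETRY, retry_succeeds=False) describes the case where the retry fails and the upgrade is suspended again.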
If a node fails to be upgraded, SmartKit shows that the upgrade is in the Paused state, as shown in Figure 1.
Figure 1 Paused state

Click Details, as shown in Figure 2.
Figure 2 Details

The dialog box that opens lists the available options, such as Retry, as shown in Figure 3.
Figure 3 Clicking Retry

Confirm your choice and click OK, as shown in Figure 4.
Figure 4 Clicking OK

If SmartKit cannot connect to the devices or is disabled when a node fails to be upgraded and the upgrade process is suspended, you can roll back the upgrade, perform the upgrade again, or ignore the upgrade failure on the CLI instead.
1. Run the show upgrade status command to check the current upgrade status. The upgrade status can be Suspended Before Continue, Suspended Before Rollback, or Suspended Before Terminate.
2. After the upgrade status is confirmed, Huawei R&D engineers locate the cause of the upgrade failure and then instruct operators on how to troubleshoot it.
If the upgrade status is Suspended Before Continue, select Continue or Retry.
If the upgrade status is Suspended Before Rollback, select Rollback or Retry.
If the upgrade status is Suspended Before Terminate, select Retry.
3. Run the change upgrade flow resume_type=? command on the CLI. Five options are available: continue, rollback, retry, terminate, and repair. For example, change upgrade flow resume_type=retry performs the upgrade again. For more information about the parameters, run the help upgrade command on the CLI.
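The CLI steps above can be sketched as a small Python helper that checks the chosen option against the upgrade status and builds the command text. This is only a sketch under assumptions: ALLOWED and build_resume_command are hypothetical names, the table encodes only the status-to-option guidance from step 2 (it does not cover terminate or repair), and the function only returns a string. It does not connect to any device.

```python
# Options recommended in step 2 for each suspended status.
# terminate and repair also exist as resume_type values, but this post gives
# no per-status guidance for them, so they are left out of this sketch.
ALLOWED = {
    "Suspended Before Continue": ("continue", "retry"),
    "Suspended Before Rollback": ("rollback", "retry"),
    "Suspended Before Terminate": ("retry",),
}


def build_resume_command(status: str, resume_type: str) -> str:
    """Return the CLI command text for a status/option pair, or raise ValueError."""
    if status not in ALLOWED:
        raise ValueError(f"unknown upgrade status: {status!r}")
    if resume_type not in ALLOWED[status]:
        raise ValueError(
            f"{resume_type!r} is not recommended for {status!r}; "
            f"choose one of {ALLOWED[status]}"
        )
    return f"change upgrade flow resume_type={resume_type}"
```

For example, build_resume_command("Suspended Before Rollback", "retry") returns the text change upgrade flow resume_type=retry, which you would then enter on the CLI yourself after the R&D engineers have confirmed the choice.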
Thank you.