[Issue Description]
End customer reported several EIP didn't work after HCS finished upgrade.
[Scenario information] Huawei Cloud Stack 6.5.1 type1 (upgrade from 6.3.1)
[Effect] Three EIPs were not working, the business service running on these ECS has been impacted.
[Handling Process]
1.we analyze the network topology, the three VMS were from 2 VPC, and they hosted on two physiacl hosts. in different VPC, there were many EIP which were working fine.on the two hosts, it's same.only the three didn't work. so we guessed maybe something was wrong on neutron component.
2.based on the EIP traffic model, we traced and analyzed the EIP traffic from vroute node to physical host which VM hosted on.
EIP traffic model
3.query the vroute node on controller node of cascading layer first.
logging on the two nodes, we captured the EIP traffic packets..
4.on physical hosts, however, we didn't get any EIP traffic packets.
5.we viewed the component logs of neutron-drv-compute-agent, then we found some abnormal printouts.
the error prompt "qrouter-xxxx" meant something was wrong on neutron-dvr-compute-agent.
[Root Cause]
In the scenario of huge traffic, there was a small probability that the neutron driver agent happen abnormal, it would cause some EIP didnt' work.
[Solution]
To restart the neutron driver agent on these physical hosts by manual.
stop agent command: cps host-template-instance-operate –service network-agent neutron-dvr-compute-agent –action stop –host <host ID>
start:agent command: cps host-template-instance-operate –service network-agent neutron-dvr-compute-agent –action start –host <host ID>