Problem Description: Customer X runs an MPLS network based on IS-IS. All clients of system A are located in Site 1. Server Farm 1 and Server Farm 2 provide the distributed virtual server resources for these clients.
Server Farm 2 is located in Site 2. Static VXLAN is used to extend the server network so that the servers stay synchronized. The Layer 3 gateways of the clients and servers are configured on PE3 and PE4, which form a dual-active system based on M-LAG.
The following redundancy tests were performed:
PE3 is powered off: there is no service interruption
PE3 is powered off + PE1 is powered off: there is no service interruption
PE3 is powered on while PE1 is still powered off: there is no service interruption
PE3 is powered on + PE1 is powered on: there is no service interruption
A few minutes after PE1 is powered on, PE4 is powered off: there is a service interruption of approximately 1.5 minutes. All of the servers are rebooted in order to select masters, because the heartbeat over VXLAN and the L3 connection between the witness server and the server farms are down.

The VXLAN tunnel is down and the L3 gateways are not reachable during this period.
Handling Process: Traffic recovers after about 1.5 minutes without any configuration change.
Root Cause: According to the IS-IS configuration on the PE devices,
set-overload on-startup is configured with the default setting, which is 10 minutes.
"If an IS-IS device needs to be temporarily isolated, configure the IS-IS device to enter the overload state to prevent other devices from forwarding traffic to this IS-IS device and prevent blackhole routes."
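PE4 is powered off approximately 8.5 minutes after PE1 starts up, that is, before the 10-minute overload timer on PE1 expires. While PE1 is still in the overload state, traffic from PE3 and PE4 cannot be routed via PE1. The roughly 1.5 minutes remaining on the timer match the observed interruption. The PE1 logs below show the overload state changing at 12:38:11 and again at 12:48:00.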
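
The timing in this case can be checked with simple arithmetic. The Python sketch below is illustrative only: the function and variable names are hypothetical, and the inputs are the 10-minute default timer and the approximate 8.5-minute offset reported above.

```python
def remaining_overload_seconds(overload_timeout_s: float,
                               elapsed_since_startup_s: float) -> float:
    """Time left before a restarted router clears its on-startup overload state."""
    return max(0.0, overload_timeout_s - elapsed_since_startup_s)


# Values taken from this case (both are approximate):
# default on-startup overload timer of 10 minutes,
# PE4 powered off roughly 8.5 minutes after PE1 started.
timeout_s = 10 * 60
pe4_off_after_s = 8.5 * 60

outage_s = remaining_overload_seconds(timeout_s, pe4_off_after_s)
print(f"Expected interruption: {outage_s / 60:.1f} minutes")
# prints "Expected interruption: 1.5 minutes"
```

If PE4 had been powered off after the timer expired, the function would return 0, which is consistent with the earlier tests showing no interruption.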

aa bb xxxx 12:38:11+03:00 PE1 %ISIS/3/isisDatabaseOverload(t):CID=0x8086055c-OID=1.3.6.1.3.37.2.0.1;The overload state of IS-IS LSDB changed. (isisSysInstance=1, isisSysLevelIndex=2, isisSysLevelOverloadState=2)
aa bb xxxx 12:48:00+03:00 PE1 %ISIS/3/isisDatabaseOverload(t):CID=0x8086055c-OID=1.3.6.1.3.37.2.0.1;The overload state of IS-IS LSDB changed. (isisSysInstance=1, isisSysLevelIndex=2, isisSysLevelOverloadState=1)
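
The two log entries bracket the overload window on PE1. As a rough check, the Python sketch below extracts the time of day and the isisSysLevelOverloadState value from each entry and computes how long the overload state lasted. The regular expression and helper names are illustrative assumptions. The state values are assumed to mean 2 = on and 1 = off, which is consistent with the order of the two entries, and both entries are assumed to fall on the same day because the date is anonymized in the logs.

```python
import re
from datetime import datetime

LOG_LINES = [
    "aa bb xxxx 12:38:11+03:00 PE1 %ISIS/3/isisDatabaseOverload(t):CID=0x8086055c-OID=1.3.6.1.3.37.2.0.1;The overload state of IS-IS LSDB changed. (isisSysInstance=1, isisSysLevelIndex=2, isisSysLevelOverloadState=2)",
    "aa bb xxxx 12:48:00+03:00 PE1 %ISIS/3/isisDatabaseOverload(t):CID=0x8086055c-OID=1.3.6.1.3.37.2.0.1;The overload state of IS-IS LSDB changed. (isisSysInstance=1, isisSysLevelIndex=2, isisSysLevelOverloadState=1)",
]

# Time of day, UTC offset (ignored), host name, then the overload state value.
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2})[+-]\d{2}:\d{2} (\S+) .*isisSysLevelOverloadState=(\d+)"
)

STATE_ON = 2   # assumption: overload state set
STATE_OFF = 1  # assumption: overload state cleared


def parse_entry(line):
    """Return (time, host, state) for one isisDatabaseOverload log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    time_of_day = datetime.strptime(match.group(1), "%H:%M:%S")
    return time_of_day, match.group(2), int(match.group(3))


def overload_window_seconds(lines):
    """Seconds between the first 'on' entry and the first later 'off' entry."""
    entries = [parse_entry(line) for line in lines]
    start = next(t for t, _, state in entries if state == STATE_ON)
    end = next(t for t, _, state in entries if state == STATE_OFF and t > start)
    return (end - start).total_seconds()


window_s = overload_window_seconds(LOG_LINES)
print(f"PE1 overload window: {window_s:.0f} s")
# prints "PE1 overload window: 589 s"
```

The measured window of 589 seconds is just under the 10-minute default, which supports the conclusion that the on-startup overload timer was active when PE4 was powered off.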

Solution: Before running redundancy tests, prepare a proper SOP document that anticipates all possible scenarios.
An alternative design for this topology can be suggested to the customer. Refer to the suggestions.
Suggestions and Summary:
Do not use a distributed server architecture. Since all clients are located in Site 1, keep all active servers in Site 1.
Do not use only one gateway in Site 1.
Use a BGP EVPN distributed VXLAN scenario instead of the static VXLAN scenario, so that the same IP address can be configured in Site 1 and Site 2. This allows the witness server to reach at least one of the server farms and prevents all servers from rebooting. It also allows all clients and Server Farm 1 to communicate via PE3 and PE4 even when PE1 and PE2 are not reachable.