I have two pairs of MWGs configured in HA mode and chained together across two sites: A1 and A2 form one HA pair, B1 and B2 form another, and Group A uses Group B as its next hop proxy (NHP) to reach the internet.
The client workstation points to the VIP of Group A.
The NHP list contains B1 and B2 in round-robin mode.
When going out to the internet, the client was assigned to A1 (presumably by the HA load balancer), which connected to B1 (presumably chosen by the NHP engine from the NHP list).
While browsing, B1 went down and the client could no longer reach the internet. The browser just kept waiting and nothing came back. Shouldn't NHP switch to B2 automatically if B1 is down in round-robin mode?
To achieve load balancing across B1 and B2 with failover capability, should the VIP of B1 and B2 be used in the NHP list instead of the individual B1 and B2 entries?
I tried both enabling and disabling 'use persistent connections', but the result is the same. It looks like the connection is pinned to the first member in the list and cannot switch over when that member is down.
Can anyone help?
Hope you are doing well.
I did a quick test of the NHP settings with two proxy servers configured in round robin on port 9090. Internet access worked well, with traffic being sent to both proxy servers in round-robin fashion.
On the first proxy I then changed the HTTP proxy port to 8080, so it was no longer listening on port 9090. All traffic then went through the second proxy in the NHP list, and internet access still worked fine.
For the first proxy, a SYN was seen being sent to port 9090, with an RST received in reply.
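To make the expected behaviour concrete, here is a minimal Python sketch of round-robin selection that skips members known to be down. It is only an illustration of the idea, not MWG's actual NHP engine; the class name RoundRobinNextHops, its methods, and the "B1:9090" / "B2:9090" labels are all hypothetical.

```python
class RoundRobinNextHops:
    """Hypothetical round-robin selector with failover (not MWG code)."""

    def __init__(self, members):
        self.members = list(members)
        self.down = set()      # members currently considered unavailable
        self._next = 0         # rotation pointer into self.members

    def mark_down(self, member):
        self.down.add(member)

    def mark_up(self, member):
        self.down.discard(member)

    def pick(self):
        # Try each member at most once, starting at the rotation pointer,
        # and return the first one that is not marked down.
        for _ in range(len(self.members)):
            candidate = self.members[self._next]
            self._next = (self._next + 1) % len(self.members)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no next hop proxy available")


hops = RoundRobinNextHops(["B1:9090", "B2:9090"])
healthy = [hops.pick() for _ in range(4)]     # alternates B1, B2, B1, B2
hops.mark_down("B1:9090")
after_failure = [hops.pick() for _ in range(3)]  # only B2 is returned
```

The important part is mark_down: the selector can only skip B1 once something has told it that B1 is down, which is where the RST or the connect timeout comes in.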
In version 7.8.1.x, the connect timeout (the timeout for the TCP connect to a server or next hop proxy) has a default value of 10 seconds. Receiving an RST helps, because MWG finds out straight away that the next hop device is down. If a SYN is sent and no RST is received, MWG waits for the timeout, which in older versions was 120 seconds.
So my guess is that in your case the SYN gets no RST and MWG then waits out the 120 seconds, assuming you are running a version older than 7.8.1.
MWG needs "something" on which to base the decision. If there is no RST, it has only the timeout to go on.
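The difference between the two cases can be reproduced from any machine with a plain TCP connect. The Python sketch below is only an illustration, not how MWG is implemented; the function name probe_next_hop and its return labels are made up for this example, and the 10-second default simply mirrors the connect timeout mentioned above.

```python
import socket


def probe_next_hop(host, port, connect_timeout=10.0):
    """Classify a TCP connect attempt to a next hop proxy (illustrative only)."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return "open"          # handshake completed
    except ConnectionRefusedError:
        return "refused"           # RST came back, so the failure is known at once
    except socket.timeout:
        return "timeout"           # no reply at all, full timeout was spent waiting
    except OSError:
        return "unreachable"       # other errors, e.g. no route to host
```

A "refused" result comes back almost immediately, while a "timeout" result only comes back after connect_timeout seconds have passed. With a 120-second timeout, that wait is what the browser would experience as hanging.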
We would also need a packet capture taken while the issue is occurring in order to investigate further.