What we have
There is an MWG HA cluster in place with two nodes: Node1 and Node2. Load balancing is not needed; we only need HA.
Node1 owns IP1 and 3128/tcp as a proxy end-point.
Node2 owns IP2, VIP and 3128/tcp as a proxy end-point.
- IP1 is accessible on 3128/tcp
- IP2 is accessible on 3128/tcp
- VIP is accessible on 3128/tcp
"mfend-lb -s" running on Node2 reports "state: FAULT" for Node1. Port redirects are not configured.
If we configure port redirects for 3128 on both nodes and apply the settings, then:
- "mfend-lb -s" running on Node2 reports "state: OK"
- IP1 is accessible on 3128/tcp
- IP2 is not accessible on 3128/tcp
- VIP is not accessible on 3128/tcp
Why could this be? What am I missing?
One more question
How exactly does the actual redirecting work? I've checked iptables and found it empty.
iptables is not used here. The MFEND module is a kernel driver that is responsible for the HA implementation. MFEND is capable of modifying packets: when a packet is accepted by the kernel, MFEND can modify it before it moves on to the application (MWG in this case).
When you set up a "Port Redirect", you basically tell MFEND which ports to look at. So if no port redirects are configured, MFEND will never look at any of the packets coming in on port 3128. The virtual IP address is provided by "keepalived", which is a separate process. The virtual IP address points to the MAC address of the node that owns it (the director), and anyone who talks to the virtual IP address talks to MWG on that node directly.
With redirects set up, a client connects to the virtual IP address and sends its packets to the MAC address of the director, but before a packet is received by the application (MWG), the MFEND module sees it and can decide whether it should be handled by the director itself or by any other node in the cluster (load sharing).
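To make the description above more concrete, here is a toy model of what a port redirect changes. It is only a sketch: the Packet class, the handle() and choose_node() helpers and the hash-based node selection are hypothetical and do not reflect how MFEND actually picks a node, which is not described in this thread. It only illustrates that packets on unredirected ports go straight to the local MWG, while packets on redirected ports pass through a node-selection step first.

```python
import zlib
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Packet:
    src_ip: str    # client address
    dst_port: int  # e.g. 3128


def choose_node(pkt: Packet, nodes: List[str]) -> str:
    # Hypothetical load-sharing rule: a stable hash of the client IP.
    # The real MFEND selection logic is NOT modelled here.
    return nodes[zlib.crc32(pkt.src_ip.encode()) % len(nodes)]


def handle(pkt: Packet, redirect_ports: Set[int], director: str, nodes: List[str]) -> str:
    """Return the node whose MWG ends up handling the packet (toy model)."""
    if pkt.dst_port not in redirect_ports:
        # No redirect for this port: MFEND ignores the packet and the
        # application on the node that received it (the director) gets it.
        return director
    # Redirected port: MFEND decides which cluster node should handle it.
    return choose_node(pkt, nodes)
```

With an empty redirect set, handle() always returns the director; once 3128 is in the set, the same client may be handed to any node, which is the point where the scanning nodes start to matter.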
If you don't want any balancing, your setup should in theory work (although this is neither tested nor supported). The keepalived daemon ensures that someone is listening on the virtual IP address, and the packets are handled directly by MWG. As this is not a scenario we had in mind when building the Proxy HA functionality, it is possible that mfend-lb -s indicates problems. If everything works as expected, I would leave it as it is.
BUT the official and supported proxy HA implementation assumes that port redirects are set up and that MFEND shares the load across all scanning nodes. In such a scenario, mfend-lb -s should also not indicate any failures unless a node becomes unavailable. Adding the port redirect for 3128 is OK but obviously there is something else wrong in the configuration or communication between the nodes, otherwise mfend-lb -s would not indicate errors.
You should review the configuration, or share some details with the community or support to have the configuration checked. Also make sure that all nodes are in the same broadcast domain, as the MFEND module uses broadcasts to talk to the other nodes.
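A rough way to sanity-check the broadcast-domain point is to confirm that the nodes' addresses on the cluster interface share one IP subnet. This is a minimal sketch assuming you collect those addresses in CIDR form yourself (e.g. from `ip addr` on each node); same_subnet() is a hypothetical helper, and a matching subnet is only an approximation of being in the same Layer 2 broadcast domain (a VLAN mismatch, for example, would not show up here).

```python
import ipaddress
from typing import List


def same_subnet(addresses: List[str]) -> bool:
    """True if all interface addresses (CIDR notation) belong to one IP network."""
    # ip_interface("192.168.10.11/24").network -> 192.168.10.0/24
    networks = {ipaddress.ip_interface(a).network for a in addresses}
    return len(networks) == 1
```

Differing netmasks also make the check fail, because 192.168.10.0/24 and 192.168.10.0/25 are different networks, which is usually worth fixing anyway.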
Take a look at /var/log/daemon or /var/log/messages; you should find some lines about keepalived. They tell you exactly when it sets up the virtual IP or when something changes (e.g. a node goes offline). If the VIP is not accessible, keepalived may already indicate some problems.
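If the logs are large, a small filter helps to pull out just the keepalived lines. This sketch uses a plain case-insensitive substring match, so it makes no assumptions about keepalived's message format; which file to read (/var/log/daemon or /var/log/messages) depends on the appliance's syslog setup, and the function names are hypothetical.

```python
from typing import Iterable, List


def keepalived_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention keepalived (case-insensitive), without newlines."""
    return [line.rstrip("\n") for line in lines if "keepalived" in line.lower()]


def scan_log(path: str) -> List[str]:
    """Read a syslog file and keep only the keepalived-related lines."""
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
    with open(path, errors="replace") as f:
        return keepalived_lines(f)
```

For example, `for line in scan_log("/var/log/messages"): print(line)` prints the matching lines in their original order, so state changes around the time of a failed connection attempt are easy to spot.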
Thank you for the detailed explanation. I could not find this information in the official documentation, so it is very helpful.
Adding the port redirect for 3128 is OK but obviously there is something else wrong in the configuration or communication between the nodes, otherwise mfend-lb -s would not indicate errors
The thing is that mfend-lb -s reports OK when the port redirect is in place, but the director node's IP and the VIP become inaccessible on port 3128, and only the scanning node is accessible. Any ideas how to debug this?
Yes, the behaviour is odd and not correct, so something must be misconfigured or there are communication problems. Did you look into the keepalived log files? Probably it tells you why it shuts down the VIP which is the most important thing to find out.
You may want to follow the guide Alex posted just to make sure all settings are correct. Also make sure that the settings are correct on all nodes that participate in the cluster. For example, the VIP must be configured identically on all machines, otherwise keepalived won't start up correctly.
That would be my first step in finding out what is wrong.
Probably it tells you why it shuts down the VIP which is the most important thing to find out.
The VIP does not shut down. Again, "mfend-lb -s" reports that everything is okay. netstat -nutlp shows that port 3128 is listening on both nodes, and the VIP is also in place on the director.
But if I run telnet VIP 3128 or telnet "director's IP" 3128 from a client, I get a timeout. If I run telnet "scanning node's IP" 3128, I get a connection.
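The telnet test can be scripted so that the VIP, the director's IP and the scanning node's IP are all checked the same way from a client. A minimal sketch, assuming Python 3 is available on the client; port_open() is a hypothetical helper, and a False result does not distinguish a timeout from a refused connection.

```python
import socket


def port_open(host: str, port: int = 3128, timeout: float = 3.0) -> bool:
    """Try a TCP connect like `telnet host port`; True if the handshake completes."""
    try:
        # create_connection() raises OSError on refusal, unreachable host or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_open() once for each of the three addresses from the same client gives a quick, repeatable version of the observations above, which is handy when comparing results before and after a configuration change.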
You may want to follow the guide Alex posted just to make sure all settings are correct.
I followed exactly this guide. It is pretty simple, so I can't see anything I might have missed.
Okay. Because you said the VIP is unreachable, I assumed it was not in place. So the VIP is in place (e.g. you can ping it), but there is no response when you telnet to port 3128 on either the director's VIP or the director's physical IP?
That sounds strange. Did you reboot MWG after you set the port redirects? Maybe that helps.
If it does not, and if you don't mind, could you share some screenshots of the Proxy configuration tab so that we can see some of the configuration details? The settings for Proxy HA and the proxy ports would be interesting.
Otherwise it may be necessary to review the complete configuration and/or collect some packet captures to find out why MWG does not accept the connection attempts. It might be helpful to file an SR with support so that all the data can be submitted and analyzed.
Thanks for your help.
I found that our colleagues had been changing the routing tables on these appliances, which is why the cluster stopped working.
Well, routing is fixed, but the issue persists.
Each node has two interfaces, eth0 and eth1:
- eth0: proxy end point (for users)
- eth1: management interface
When clients connect via the VIP, some of them are handled by the director node and others by the scanning node. Those redirected to the director node can surf; those redirected to the scanning node can't.
What I found with tcpdump:
1) when it works:
client1 -> to VIP (eth0, director) -> client1
2) when it does not work:
client1 -> to VIP (eth0, director node) -> (scanning node) -> from VIP to client1 (eth1, scanning node) ---x client1
Why does the scanning node send the reply back via eth1 instead of eth0, when there are no routing entries that would make it do so?
In "Configuration -> Proxies -> Proxy HA", right under "Director Priority", you configure a "Management IP". This IP address must belong to a physical interface, and that interface is used for sharing the load across the nodes, i.e. it is the interface used for all MFEND communication.
It seems you have probably configured an eth1 address here. The director picks up the traffic and decides to forward it to another scanning node, and from your description eth1 is used for this. MWG on the scanning node then tries to answer the client directly, which probably does not work.
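One way to check the "no routing entries" observation is to reproduce a basic longest-prefix-match lookup for the client's address. On the node itself, `ip route get <client IP>` asks the kernel directly; the sketch below only models a single main routing table (no policy routing rules, metrics or source-based routing), and the ROUTES entries are hypothetical placeholders. If the lookup for the client returns eth0 while tcpdump shows replies leaving on eth1, the egress choice is not coming from that table, which would be consistent with the reply following the Management IP interface.

```python
import ipaddress
from typing import List, Tuple

# Hypothetical main routing table of the scanning node: (prefix, interface).
# Replace these with the real entries shown by `ip route` on that node.
ROUTES: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "eth0"),    # default route via the user-facing interface
    ("10.1.0.0/24", "eth0"),  # user / proxy end-point network
    ("10.2.0.0/24", "eth1"),  # management network
]


def egress_interface(dst: str, routes: List[Tuple[str, str]] = ROUTES) -> str:
    """Return the interface chosen by a longest-prefix match for dst."""
    addr = ipaddress.ip_address(dst)
    best_len, best_iface = -1, None
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        # The most specific matching prefix wins, as in a plain table lookup.
        if addr in net and net.prefixlen > best_len:
            best_len, best_iface = net.prefixlen, iface
    if best_iface is None:
        raise LookupError(f"no route to {dst}")
    return best_iface
```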
I would assume that MWG picks the right interface for talking back to the client, but according to your observations this is not the case. So this might be something for support to take a look at, to find out whether it needs to be fixed or is a problem specific to your environment.
You could try picking an IP address that is bound to eth0 as the "Management IP" and see if that makes a difference.
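To double-check which interface a candidate "Management IP" is actually bound to, the address can be matched against the node's interface list. A minimal sketch; interface_for() is a hypothetical helper, and the INTERFACES mapping is a made-up example that would need to be filled in from `ip addr` on each node.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical interface addresses of one node; fill in from `ip addr`.
INTERFACES: Dict[str, str] = {"eth0": "10.1.0.11/24", "eth1": "10.2.0.11/24"}


def interface_for(ip: str, interfaces: Dict[str, str] = INTERFACES) -> Optional[str]:
    """Return the name of the interface that has exactly this address bound."""
    wanted = ipaddress.ip_address(ip)
    for name, cidr in interfaces.items():
        if ipaddress.ip_interface(cidr).ip == wanted:
            return name
    return None  # address is not bound to any listed interface
```

If the suggestion above is applied, interface_for() called with the configured Management IP should return "eth0" on every node.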