2 Replies Latest reply on Sep 7, 2017 10:38 AM by echacon

    Issue with load balancing on MWG 7.6.2

    echacon

      Hi MWG Community!

       

      I'm checking a weird issue in our MWG environment.

      We have two appliances configured in Proxy HA, and normally the load balancing works like a charm. If we check Web Reporter and the traffic monitor, we see an activity split of around 55%-45%, which we consider normal.

      But over the last 3 days, the Log Source Activity report has shown far more activity on only one of the appliances (86%-14%), and the busier appliance changed from day to day.

      Wednesday: VMmwgappl1 was the most used

      Tuesday: VMmwgappl2 was the most used

      Friday (today): VMmwgappl1 is the most used

       

       

      Is there a CLI command to check, or to force, the number of connections assigned to each device?

      Regards!

        • 1. Re: Issue with load balancing on MWG 7.6.2
          aloksard

          Hi Erick,

           

          Hope you are doing well.

           

          MWG does a sort of round robin based on the IP address of the incoming connection and tries to share IP addresses equally across the nodes. If you have 50 clients and MWG sees all of their IP addresses, it tries to put 25 of them on scanning node 1 and 25 on scanning node 2. It depends a little on how quickly new IP addresses come in, so the distribution will not be exactly equal, but you should see some traffic on both nodes.

           

          Now it could happen that one client IP address creates many more requests/second than the others. In that case you will see that, although the clients are shared almost equally across all nodes, one node handles many more requests per second than the other. The load sharing does not consider requests/second, only connections.
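
          To put rough numbers on this, here is a minimal Python sketch. The client counts and request rates are made up for illustration and are not measured from any MWG system; request_share is a hypothetical helper, not an MWG tool. It assumes 50 client IPs split 25/25, with one heavy client on the first node.

```python
def request_share(rates_node1, rates_node2):
    """Return each node's share of total requests/second, in percent."""
    total1, total2 = sum(rates_node1), sum(rates_node2)
    total = total1 + total2
    return round(100 * total1 / total), round(100 * total2 / total)

# Illustrative assumption: 49 clients send 10 requests/second each,
# and one heavy client sends 1500 requests/second.
node1 = [10] * 24 + [1500]   # 24 normal clients plus the heavy one
node2 = [10] * 25            # 25 normal clients

# Both nodes hold 25 client IPs, yet the request split is far from even.
print(len(node1), len(node2))
print(request_share(node1, node2))
```

          With these assumed rates the client IPs are balanced perfectly, but the request split comes out close to the 86%-14% seen in the report.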

           

          The load sharing algorithm is source IP based.

           

          The client IP addresses are "sticky". This means that a client that was balanced to one of the appliances will stay on this appliance until 5 minutes of inactivity.

           

          This means that when you add a new appliance to a running system, you should not expect it to get an even load right away. Only new clients will be directed to the new appliance.

           

          On the next morning when all persistent connections have timed out, the load will be distributed more or less equally.

           

           

          mwg-mon:-

           

          A health-check process that monitors the Proxy HA redirect ports you defined to ensure that they are available to accept incoming traffic.

          If a redirect port is not listening, mwg-mon detects this as a failure and load balancing will cease.

          It monitors all of the port redirects as a whole, not individually.

           

          mwg-mon -v -c

           

          Loadbalancing functionality of the mfend network driver:-

           

          The network driver has a metric for calculating the load balancing that is near to a round-robin metric, but not exactly.

           

          The metric uses the following values:-

           

          client count: the number of active connections on the node

           

          persistent timeout queue size: the number of persistent entries that are currently not active

           

          local and remote weighting: a parameter that can be configured in /proc/wsnat/status. It is 0 by default, so it does not affect the result.

           

           

          The node with the lowest score gets the next connection from a new client

           

          score = client_count + persistent_connections_on_timeout_queue * 10

           

           

          score +=  (score * weighting) / 100
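
          As a small illustration, the two formula lines above can be written as a Python function. The function and parameter names are hypothetical, and integer division is an assumption (the post only writes "/ 100").

```python
def lb_score(client_count, persist_queue_size, weighting=0):
    """Score of one node; the node with the lowest score gets the next new client."""
    # Each idle-but-sticky entry on the timeout queue counts ten times
    # as much as one active connection.
    score = client_count + persist_queue_size * 10
    # Optional weighting in percent; 0 by default, so it changes nothing.
    # Assumption: integer arithmetic.
    score += (score * weighting) // 100
    return score

print(lb_score(client_count=3, persist_queue_size=0))   # three active connections
print(lb_score(client_count=0, persist_queue_size=1))   # one idle sticky client
print(lb_score(client_count=10, persist_queue_size=0, weighting=50))
```

          With the default weighting of 0, one idle sticky client (score 10) outweighs three active connections (score 3), so the next new client would go to the node with the active connections.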

           

          The timeout for persistent connections can be controlled by a parameter in /proc/wsnat/status, i.e. lbpersisttimeout. To turn it off, one can use:

           

          echo "lbpersisttimeout 0" > /proc/wsnat/status

           

          Q: Why do I sometimes get unexpected balancing?

           

          A: Connections to the director behave differently from connections to the scanning node, and this also depends on the client.

           

           

          Example: Situation after one wget request:

           

           

              local:    client count = 0, persistto = 1

              scanning: client count = 1, persistto = 0

          The TCP socket closes the network-driver connection immediately, whereas on the scanning node the connection is removed only after a period of time. During this time it is still counted as one connection.

           

           

          Below is an example scenario:-

           

          I did some tests with two nodes and four clients, one after the other, using wget:

           

           

          director: client count = 0, persistto = 1 => score = 10

          scanning: client count = 1, persistto = 0 => score = 1

          scanning: client count = 2, persistto = 0 => score = 2

          scanning: client count = 3, persistto = 0 => score = 3

          Now there are three connections on the scanning node and one on the director.

           

           

          After waiting some time the scanning node has persistto = 3 (for the three clients) and the score 30. So now the director will get more connections.
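
          The scenario above can be replayed with a short Python sketch. It is a simplified model, not driver code: it assumes the first client lands on the director when both scores are tied, that a finished wget leaves one timeout-queue entry on the director, and that on the scanning node the connection is still counted as active.

```python
def score(node):
    return node["clients"] + node["persistto"] * 10

nodes = {
    "director": {"clients": 0, "persistto": 0},
    "scanning": {"clients": 0, "persistto": 0},
}

history = []
for _ in range(4):                           # four clients, one after the other
    target = min(nodes, key=lambda name: score(nodes[name]))
    if target == "director":
        nodes[target]["persistto"] += 1      # connection closed at once, IP stays sticky
    else:
        nodes[target]["clients"] += 1        # connection still counted as active
    history.append(target)

print(history)
print({name: score(node) for name, node in nodes.items()})

# Later the scanning node's connections time out and move to the timeout queue.
nodes["scanning"] = {"clients": 0, "persistto": 3}
print({name: score(node) for name, node in nodes.items()})
print(min(nodes, key=lambda name: score(nodes[name])))
```

          Under these assumptions the first client goes to the director and the next three to the scanning node (scores 10 and 3). Once the scanning node's score rises to 30, the director becomes the preferred target for new clients.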

           

           

          Q: How to achieve round robin?

           

           

          A: Make sure the connections are held open until all IP addresses are distributed.

           

           

          A: mfend-lb -s shows: stats: loadavg clientcount total persistto peercount

           

           

          An upcoming mfend-lb version might show the score with "mfend-lb -l [-v]".

           

           

          The load balancing in MWG 7 depends on the McAfee Network Driver in conjunction with other network variables. It is based on the source client IP address, and a certain amount of IP stickiness is needed for important functions to continue to work. There is also a running count, as you indicated, where active connections are counted, scored and diverted by the director to a scanning node to help distribute the load. The score = client_count + persistent_connections_on_timeout_queue * 10. The reason for the factor of 10 is that this queue contains one entry per IP address, whereas the client count is one per connection. A typical browser opens about 10 connections from one client. Therefore, a client IP that currently has no connection open (but might have 10 in the future) is weighted the same as a client that currently has 10 connections open.

           

           

          Regards

          Alok Sarda

          • 2. Re: Issue with load balancing on MWG 7.6.2
            echacon

            Thank you aloksard, the information you provided is very useful!

            Now it is clear to me how load balancing works, and this will help us a lot in answering questions of this kind.

             

            regards!