Best Practices: Central Management in Web Gateway 7.x

Version 4

     

     

    Introduction

     

    The goal of this document is to outline the best practices for having multiple Web Gateway proxies in a Central Management cluster. After reading this, you will have a better understanding of how to manage your Web Gateways, whether in a small cluster or on a large global scale.

     

     

    What does it do?

     

    Central Management is used to synchronize the configuration (policy) between two or more McAfee Web Gateway appliances. This is useful because it removes the need to make duplicate changes on each appliance. Everything under the "Policy" tab, along with the admin accounts, is synced automatically when the cluster is created. Each time a change is made and "Save Changes" is clicked, the change is propagated to all cluster members automatically. This allows the administrator to ensure that the filtering policy is the same no matter which appliance is handling a given request. Settings under the Configuration tab are unique to each cluster node, which allows the administrator to assign separate networking configuration (IPs, routes, etc.) on each appliance.

    Central Management will also automatically sync the full configuration from every node to every other node in the cluster. This means it is only necessary to take a backup on one appliance, as that backup file will contain the full configuration for every cluster member.

     

     

    What doesn’t it do?

     

    While the term “cluster” is used to refer to a group of MWGs joined in Central Management, it should not be confused with the idea of a traditional proxy cluster. A Central Management cluster does not include any load balancing or failover functionality for end user traffic.

     

     

    Where’s my master?

     

    Web Gateway operates using a "master-less" philosophy. Because the whole purpose of the Central Management cluster is to keep the policy in sync throughout the cluster, you may log in to the Web UI on any node to make changes, and the policy will be synced to every node as soon as the changes are saved. However, the GUI may be attached to only one node at a time in order to limit the possibility of conflicts. It is therefore recommended that a single node be chosen as the management node and that all administrators connect only to that node to make changes. If the GUI is currently attached to one node and you attempt to log in to a different node, you will be presented with the following screen, along with a redirect URL directing you to the appliance that the GUI is currently attached to so that you may log in there:

     

    tomcat is already attached.png

     

     

    Another tomcat is already attached to this cluster
    Please log on to that one.
    Overall Status "STATUS_ERROR":generic error
    Node "564D0FA3-6391-5B1C-ECDD-18116FD8DADD" reports STATUS_ERROR_GUI_ATTACH:
    could not attach tomcat to node 564D0FA3-6391-5B1C-ECDD-18116FD8DADD because there is already a tomcat attached at node 564D9E0F-0826-9B8D-3BEB-0DC024262A64
    Node "564D9E0F-0826-9B8D-3BEB-0DC024262A64" reports STATUS_ERROR_GUI_ATTACH:
    could not attach tomcat to node 564D0FA3-6391-5B1C-ECDD-18116FD8DADD because there is already a tomcat attached at node 564D9E0F-0826-9B8D-3BEB-0DC024262A64
    Please try to reattach to: https://10.10.73.71:4712/Konfigurator/request

     

    Configuration

    Prerequisites

     

    The following is needed prior to setting up a Central Management cluster:

      1. Appropriate routes must be configured in your network to allow cluster communication. If there are firewalls between your Web Gateways, you also need to ensure that the CM port (default TCP 12346) is allowed (see the quick checks after this list).
      2. Time must be in sync. Cluster communication depends heavily on timestamps, as this is how the cluster knows which node has the most up-to-date configuration. It is highly recommended to configure NTP (Configuration > Date & Time) to handle this automatically. If you do not have an internal NTP server, you may use ntp.webwasher.com.
      3. All appliances should be running the same version and build. It is never recommended to mix versions in a Central Management cluster.
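
    A quick way to sanity-check the first two prerequisites is from the shell of one appliance, pointed at a peer. This is only a sketch: 10.10.73.72 is a placeholder IP, and it assumes common utilities (ping, nc, date) are available on the appliance.

    # Confirm basic IP reachability to the peer appliance (placeholder IP)
    ping -c 3 10.10.73.72
    # Confirm the Central Management port (default TCP 12346) is open through any firewalls
    nc -vz 10.10.73.72 12346
    # Compare clocks: run this on both appliances and check that the difference is small
    date -u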

     

    Settings

     

    Configuration takes place under Configuration > Central Management. Under Central Management Settings, you will want to configure the physical IP address of the NIC that will be used for communication within the cluster. It is recommended to stick with the default port of 12346; however, you may change it as long as it is configured the same on all appliances.

    settings.png

    If you change your IP address under your network configuration, you must remember to update it here as well, as it will not be updated automatically. Do not configure it as 0.0.0.0; that will not work either.
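
    To confirm that the coordinator is actually listening on the address and port configured here, you can check the listening sockets from the appliance shell. This is a sketch that assumes the netstat utility is present and the default port 12346 is in use:

    # List listening TCP sockets and filter for the Central Management port
    netstat -tlnp | grep 12346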

     

    Node Priority

     

    Node priority configures how the cluster reacts when policy is out of sync. The node with the higher priority (lower value) wins in case of a conflict. If all nodes have the same priority, the node where the most recent change was made wins. It is recommended to keep all nodes at the same priority.

     

     

    Group Definitions

     

    Network

    Network groups are used to control communication flow in a cluster. If all nodes are in the same network group, then all nodes will talk to all other nodes. Network groups can be thought of as a kind of "routing" for Central Management communication. It is recommended to define a unique network group for each physical location and to place one node from each location into a shared "Transit" group. All nodes in a location are then forced to talk through their Transit node in order to communicate with the rest of the cluster.

     

    Update

    Update groups control how engine updates (URL Filter, AV DATs, etc.) are shared throughout the cluster so that updates only need to be downloaded from the internet once. In general, it is best practice to have one Update group for each geographic location. That way, nodes in the same location (fast connections, LAN links) can share URL Filter and AV engine updates, while updates are not shared between different locations (slower connections, WAN links), so no WAN bandwidth is consumed by these potentially large files.

     

    Runtime

    Runtime groups control how runtime data is shared. Runtime data includes coaching, quota, authorized override, and pdstorage values. If users will be browsing through multiple proxies at a location, it is recommended to have a unique Runtime group for that physical location. If there is no overlap in the users accessing specific appliances, it is recommended to put those appliances into separate Runtime groups to reduce the overhead of information sharing.

     

    groups.png

     

     

    Adding a Node

     

    Adding a node is as simple as going to Configuration > Add and entering the IP of the node to be added. Please keep in mind that the node you are adding will have its policy overwritten by the node to which the GUI is currently attached (the one you are logged in to).

     

    adding a node.png

     

     

    Once the nodes have been added to the cluster you will see them on the left side.

     

    4cluster.png

     

    Updates

     

    The Automatic Engine Updates section configures how engine updates, including URL Filter, AntiVirus, CRL, Application Database and DLP, are handled.

     

    Enable automatic updates - This checkbox enables the local "interval check" for updates (like a cron job). When the interval time is reached, the local node checks whether the option "Allow to download updates from internet" is enabled. If "… from internet" is NOT enabled, nothing is done and no update is triggered. If it is enabled, the node sends a CM message to all other nodes (in its update group) and asks for their current list versions and whether they would like to receive updates.

     

    All nodes that have the third option "Allow to download updates from other nodes" checked will respond with their current version information. The "updating" node then assembles a list of versions for the other nodes, including itself, and contacts the update server with this information. The update server sends the updates back along with information about which nodes to update. The "updating" node then distributes the updates to the corresponding nodes in the cluster (or does not, if their versions are already current).

     

    Update packages are not "stored" within the cluster; they are for immediate delivery only. If a node with no current lists comes online and cannot do its own "internet" check, it will have to wait for another node to reach its update interval.

     

     

    Example with two locations

     

    Here is an example cluster with 2 nodes in Tokyo and 2 nodes in New York City. Each location has its own unique runtime and update groups with one node from each location in the Transit group.

     

    smallcluster.png

     

    Tokyo

    tokmwg01

    Runtime: tokyo

    Update: tokyo

    Network: tokyo, transit

     

    tokmwg02

    Runtime: tokyo

    Update: tokyo

    Network: tokyo

     

    New York City

    nycmwg01

    Runtime: newyork

    Update: newyork

    Network: newyork, transit

     

    nycmwg02

    Runtime: newyork

    Update: newyork

    Network: newyork

     

     

    Example with a larger cluster

     

    As a best practice, we recommend putting no more than 10 nodes behind a single transit node. If you have more than 10 nodes in a location, you should have more than one transit node and create smaller network groups that are tied to each transit node. Here is an example of a larger cluster with nodes in Tokyo, New York and Paderborn. For the smaller locations with one transit node, the runtime and network groups use the same name.

     

    largecluster.png

    In this example:

    • 'tokyo' and 'newyork' are both runtime and update groups
    • toknet1, toknet2, nynet1 & nynet2 are individual network groups
    • paderborn is a network, runtime and update group

     

     

    Common Questions

    How do I verify my cluster is in sync?

     

    This can be checked under Configuration > select Appliances (Cluster) on the left side. Then look at the Appliances Information section in the lower part of the main pane. This shows the UUID (Universally Unique Identifier, unique per appliance), Name, Version and Current Storage Timestamp for all nodes in the cluster. An in-sync cluster will show the same Current Storage Timestamp for all nodes.

     

    cluster.png

     

    How do I troubleshoot connectivity problems?

     

    Your go-to data should be the mwg-coordinator.errors.log, found under Troubleshooting > Log Files > mwg-errors.

    Examples:

     

    Time out of sync: [Coordinator] [NodeRequestFailed] . . . failed with error ‘(405 – ‘time difference too high’)
    Network problems: [Coordinator] [NodeRequestFailed] … failed with error '(301 - 'cannot connect') - "co_distribute_init_subscription: failure 'errno: 113 - 'No route to host'' while sending request - last action: connect"'
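
    When working through these errors, it can help to pull all recent coordinator failures out of the log rather than scrolling through it in the GUI. The following is a minimal sketch from the appliance shell, assuming the log is located at /opt/mwg/log/mwg-errors/mwg-coordinator.errors.log (adjust the path if your logs live elsewhere):

    # Show the most recent node communication failures reported by the coordinator
    grep 'NodeRequestFailed' /opt/mwg/log/mwg-errors/mwg-coordinator.errors.log | tail -n 20
    # Narrow down to a specific error class, e.g. time differences (405) or connection problems (301)
    grep 'time difference too high' /opt/mwg/log/mwg-errors/mwg-coordinator.errors.log | tail -n 5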

    Why can't I add a node?

     

    Most errors seen when adding a node should be self-explanatory. For example:

     

    • No Local Listener Defined: Add appliance failed: cannot add node because local node has no running listener available - new node would not be able to talk back to this node
    • Incorrect IP Specified or No Listener on Node That is Being Added: co_distribute_add_cluster_node: failure 'errno: 111 - 'Connection refused' while sending request - last action: connect
    • No or Wrong Network Group Defined on Node That is Being Added: [StorageConfigurationRejectedLocal] Configuration rejected by binary 'COORDINATOR' with status 'checking new settings failed - Invalid node path matrix / unreachable nodes'.
    • Add appliance failed: co_distribute_add_cluster_node: ssl failure on message socket 14 while sending request - last action: certificate verification. -- The Cluster CA was changed on your primary node while your secondary node is still using the default cluster CA. The Cluster CA is found under Configuration > Appliances tab > select Appliances (Cluster); the Cluster CA button will then show up to the right. Verify that when you select this Cluster CA button, 'McAfee Web Gateway Cluster CA' appears as the CA in use. If they are different, the CA in use on your primary node will need to be imported on your secondary node (see the probe command after this list).
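
    For the "Connection refused" and certificate verification cases, it can also help to probe the Central Management listener of the node you are trying to add directly from the shell. This is only a sketch: 10.10.73.72 is a placeholder IP, it assumes the default port 12346, and it assumes the openssl client is installed and that the listener completes a standard TLS handshake.

    # Display the certificate presented on the Central Management port of the remote node,
    # so the issuing Cluster CA can be compared between the appliances
    openssl s_client -connect 10.10.73.72:12346 </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer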

       

    How do I check NTP or why is my time still off?

     

    You can check your NTP settings from the GUI under Configuration > Date & Time. If your time is too far out of sync (more than ~1000 seconds), NTP will not sync automatically. You can manually set the time from the GUI or use the following commands from the command line:

      • Force a sync  (MWG 7.2 and earlier):
    ntpd -s

     

      • Force a sync  (MWG 7.3 and above):
    service ntpd stop
    service ntpdate start
    service ntpd start

    Note: ntpdate is also called to sync time at startup.

      • Log to the console so that you can verify that communication is taking place:
    ntpd -d
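
      • Check peer status and remaining offset (this assumes the standard ntpq utility is present on the appliance):
    # The line marked with '*' is the selected time source; the offset column is in milliseconds
    ntpq -p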

     

    Here is a good link to reference how NTP works: http://www.ntp.org/ntpfaq/NTP-s-algo.htm

     

     

    How are conflicts handled?

     

    If two cluster nodes have different policy settings, the node priority is checked first (higher priority/lower value wins); if they have the same priority, the node with the latest change wins.

     

    How do I upgrade my cluster?

     

    We recommend breaking the cluster to make each node stand-alone and upgrading nodes individually. Please see KB76905.

     

    How do I create a region specific policy?

     

    You can add criteria to your rule sets that would be location specific. For example:

      • Proxy.IP
      • System.HostName
      • System.UUID

    If the locations use different IP schemes/subnets, you could also use Client.IP.

    These criteria would be placed on top-level rule sets with all sub rule sets contained inside. If you only have minor differences in the regional policies (for example, just different directories for authentication), it is easy enough to keep one global policy and make the changes only in the relevant rules. Here is an example with top-level region-specific policies:

     

    region policy.png
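
    In text form, a simplified version of this structure could look like the following. This is for illustration only: the host name patterns are hypothetical and the criteria are shown in shorthand rather than exact rule syntax.

    Rule Set: Tokyo Policy        (criteria: System.HostName matches tok*)
        ... Tokyo-specific sub rule sets ...
    Rule Set: New York Policy     (criteria: System.HostName matches nyc*)
        ... New York-specific sub rule sets ...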

     

    How do I configure my access log pushing?

     

    Please see KB76899 for information on how to configure one log configuration that works for all cluster nodes.

     

     

    What is a shared data error and should I be concerned?

     

    You might notice the following error in your dashboard alerts intermittently. This "nuisance" error might stand out or could cause unnecessary alarm. It occurs when a node in the Central Management cluster is not able to send its shared data to another node in the cluster. This "shared data" can be classified as runtime data or dashboard alert data. Runtime data is described in the Group Definitions section above.

    Should I be concerned?

    No. Given the frequency with which this data needs to be shared in the cluster, it would not be unusual for this error to occur from time to time. It is not something to be concerned about, especially if the dashboard afterwards shows a subsequent update indicating a successful synchronization.

    Review your Runtime group configuration

    As a best practice, Runtime groups should be implemented to ensure that runtime data is only shared among the appropriate cluster nodes. Without Runtime groups, the cluster nodes unnecessarily synchronize runtime data across all members of the cluster, which can increase the frequency of the 'Shared Data not synchronized' error. As described in the Group Definitions section above, Runtime groups should consist of nodes that service unique groups of users. For example, if North American users only reach Web Gateways located in North America and European users only reach Web Gateways located in EMEA, then the NA Web Gateways and the EMEA Web Gateways should each belong to their own unique Runtime groups. The unique Runtime groups prevent the NA and EMEA Web Gateways from unnecessarily attempting to synchronize runtime data with each other.

     

    Common causes

    This error can also be prevalent under the following circumstances:

      • Your Web Gateways experience an intermittent network issue while one node attempts to transmit shared data to another node (e.g. the TCP handshake failed or the node wasn't reachable).
      • While this can occur in a cluster of any size, you are more likely to see it in a large cluster, as there is a lot of information to share among the nodes!

     

    Policy Synchronization is not impacted

    One important thing to know is that your policy synchronization is not affected by this error. Your policy is passed between nodes in the "Storage Configuration".

    If you have concerns that your policy is not syncing correctly, please review the following section:

    https://community.mcafee.com/docs/DOC-4823#jive_content_id_How_do_I_verify_my_cluster_is_in_sync

     

    Additional debugging

     

     

     

    You will want to look at the mwg-coordinator.errors.log, found under Troubleshooting > Log Files > mwg-errors (the same log used when troubleshooting connectivity problems above).

    Here is an example error:

     

    [2014-12-13 21:06:33.311 +00:00] [Coordinator] [NodeRequestFailed] Message '<co_distribute_shared_data_raw>' on node 564DAD79-A7BB-BCC2-55C0-58383C9557C7 (mwg75-2) failed with error '(301 - 'cannot connect') - "co_distribute_shared_data_raw: failure 'errno: 107 - 'Transport endpoint is not connected'' while sending request - last action: connect"'.

     

    In this log entry, the "301 - Cannot Connect" denotes that node mwg75-1 couldn't establish a TCP handshake with mwg75-2.
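
    If you want to judge whether these shared data errors are the occasional nuisance described above or a persistent problem with one particular node, a quick count per target node from the coordinator error log can help. This is a sketch that assumes the log is located at /opt/mwg/log/mwg-errors/mwg-coordinator.errors.log (adjust the path to your installation):

    # Count shared data delivery failures per target node to spot a consistently unreachable peer
    grep 'co_distribute_shared_data' /opt/mwg/log/mwg-errors/mwg-coordinator.errors.log | grep -o 'on node [A-F0-9-]*' | sort | uniq -c | sort -rn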

     

     

    Conclusion

     

    In conclusion, you should now have the information necessary to manage your Web Gateway Central Management cluster, no matter how big or small. You should also have a good foundation for troubleshooting in the event of a problem.