Web Gateway domain communication for NTLM authentication

Version 5

     

    Introduction

     

    This article covers the requirements, configuration, common issues, and troubleshooting of Active Directory (AD) NTLM domain communication on the Web Gateway (MWG). Because NTLM is the most commonly used form of authentication, the article also addresses the questions and issues we see most often in support and aims to make the topic easier to understand overall. It does not cover authentication issues such as intermittent authentication prompts.

     

    Prerequisites

    For NTLM authentication, the MWG must become a member of your AD domain. For this to work, make sure the following are set up correctly:

     

    1. MWG must be able to connect to your AD server over TCP port 445 (no other ports are required).

    2. For successful NTLM authentication, the MWG needs both the IP address (for TCP-level communication) and the fully qualified domain name (FQDN from here on) of the domain controller (for SMB-level communication). You provide one of the two (either IP or FQDN) in the MWG configuration; your DNS must be able to resolve the other.

    3. When initially setting up the domain membership on the MWG, a domain administrator account is needed so a computer account can be created in AD for the MWG. Keep in mind that the domain administrator account is only used for the MWG account creation on the domain and those credentials are not stored on the MWG.
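
    The first two prerequisites can be checked before attempting a join. The following is a minimal Python sketch, not an MWG tool: the function names are hypothetical, and it assumes you run it from a host that uses the same DNS servers and network path as the Web Gateway (results from a different host may not match). For an IP entry it requires a reverse record; for an FQDN entry it requires a forward record and flags an address that resolves back to a different name, which can indicate a virtual or alias hostname.

```python
import ipaddress
import socket


def forward_lookup(fqdn):
    """Return the IPv4 addresses the resolver gives for fqdn, or [] if none."""
    try:
        infos = socket.getaddrinfo(fqdn, 445, socket.AF_INET, socket.SOCK_STREAM)
    except OSError:
        return []
    return sorted({info[4][0] for info in infos})


def reverse_lookup(ip):
    """Return the hostname the resolver maps ip back to, or None."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def port_445_open(ip, timeout=5.0):
    """Return True if a TCP connection to ip:445 can be opened."""
    try:
        with socket.create_connection((ip, 445), timeout=timeout):
            return True
    except OSError:
        return False


def same_host(a, b):
    """Compare two DNS names, ignoring case and a trailing dot."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()


def check_dc(entry, forward=forward_lookup, reverse=reverse_lookup,
             probe=port_445_open):
    """Return a list of problems found for one configured DC entry."""
    problems = []
    try:
        ipaddress.ip_address(entry)
        is_ip = True
    except ValueError:
        is_ip = False

    if is_ip:
        # IP configured: the hostname must come from reverse DNS.
        ips = [entry]
        if reverse(entry) is None:
            problems.append(f"reverse DNS: no hostname for {entry}")
    else:
        # FQDN configured: the address must come from forward DNS.
        ips = forward(entry)
        if not ips:
            problems.append(f"forward DNS: no address for {entry}")
        for ip in ips:
            name = reverse(ip)
            if name is not None and not same_host(name, entry):
                problems.append(
                    f"name mismatch: {ip} is {name}, not {entry} (virtual hostname?)")

    for ip in ips:
        if not probe(ip):
            problems.append(f"TCP 445: cannot connect to {ip}")
    return problems
```

    Calling check_dc("dc1.company.com") returns an empty list when nothing was flagged. The lookups and the port probe are passed in as parameters so the logic can be exercised without touching the network.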

     

    Configuration

     

    The first step in configuration is to join the Web Gateway(s) to the domain(s) that users will authenticate against. This is done under Configuration > [[ Appliance Name]] > Windows Domain Membership > Join

     

    join_ad_settings-1.jpg

    1. Windows Domain Name: The NetBIOS name of the AD domain that the Web Gateway should join. If you have trouble determining the correct NetBIOS name, run nbtstat -n from a Windows command prompt; the name returned with type 'GROUP' is the domain that the computer is part of.

    2. Gateway account name: This will be the name of the Web Gateway computer account that's added to Active Directory when it successfully joins the domain. After this account is created, it should not be modified, nor should it be created manually.

    3. Overwrite existing account:  If checked, this will overwrite the existing Web Gateway computer name if it exists on Active Directory. Each Web Gateway will need a unique account (computer name) on Active Directory, so if a computer name has been used by another Web Gateway or computer, it will be overwritten. Keep in mind that if the same account is overwritten, the Web Gateway that was using it will no longer be part of the domain and will no longer be able to authenticate against it.

    4. Use NTLMv2: It's recommended to use NTLMv2 if it's supported by your Active Directory environment. This option only enforces NTLMv2 for the Web Gateway while it is joining your AD domain.  It does not enforce NTLMv2 for client requests.

    5. Timeout for requests: This is the amount of time that the Web Gateway will wait for a response from Active Directory before timing out. In case this timeout is reached, the domain controller in question will be flagged as down and we will fail over to the next one (if other DCs have been configured).

    6. Configured Domain Controllers: A comma-separated list of Active Directory Domain Controllers that the Web Gateway should use for this domain. We suggest using the fully qualified domain name (FQDN) of each Active Directory server rather than its IP address, because forward DNS (hostname to IP) is more likely to resolve properly than reverse DNS (IP to hostname). You can leave this field blank to force the Web Gateway to auto-discover your DCs, but auto discovery is not recommended because it introduces more complex DNS requirements. Hard-coding the DCs is recommended for most environments.

    7. Number of Active DCs: The total number of active domain controllers the Web Gateway will use for authentication. The Web Gateway distributes authentication requests between the active DCs. See the section "Understanding 'Active' Domain Controllers, failover, & authentication request distribution" below for additional information.

    8. Administrator account/password: The domain admin account and password used to create the computer account in AD. The account and password are not stored anywhere on the Web Gateway after they are used (just like when joining a Windows PC to the domain).

     

    Join the domain by clicking 'OK'.

     


    MWG joined the domain successfully

     

    After joining the domain, select Refresh; the status indicator in the GUI should stay consistently green, as seen below. If the status is red, there is an issue (see Troubleshooting further down).

     

    properly_joined_to_domain.jpg

     

    Additionally, if the account creation was successful, the computer name should be visible within Active Directory.

     

    account_on_ad.jpg

     

    The best way to test user credentials after joining the domain is to run an authentication test and check what is returned. The authentication settings can be found under Policy > Settings > Engines > Authentication > select your configured NTLM engine (or create one) > select the arrow next to 'Authentication Test' and test with your domain credentials. Below are examples of a successful and a failed test.

     

    Good credentials

    good_auth_test.jpg

     

    Bad credentials

    bad_auth_test.jpg

    If nothing has failed so far and your authentication tests were successful, you are ready to start deploying authentication policy for your users. For more on that, see https://community.mcafee.com/docs/DOC-4384

     

    Understanding 'Active' Domain Controllers, failover, & authentication request distribution

     

    How Web Gateway finds active DCs and handles failover

    In this example, there are 4 configured Domain Controller IP addresses, which we’ll refer to as DC1, DC2, DC3, and DC4, and the ‘Number of active Domain Controllers’ is set to 2.  Default timeout values are used.

     

    Note: Web Gateway connects to at most 2 DCs at a time. It does not contact all 4 configured DCs simultaneously and keep the 2 that answer first. Rather, the DC list defines the order in which Web Gateway tries to connect to the servers. A DC is marked as offline for 3 minutes after a communication error (for example, Web Gateway cannot connect, or the connection to the DC is aborted by the 15-second timeout).

     

    • Web Gateway tries to connect to DC1 and DC2.  The connection to DC1 fails and DC1 is marked offline for 3 minutes.  The connection to DC2 succeeds and DC2 is marked active.
    • Web Gateway looks for a second active DC and tries to connect to DC3 (next in the list).  The connection to DC3 succeeds and DC3 is marked active.  Both DC2 and DC3 are now active.
    • DC2 becomes unreachable and is marked offline for 3 minutes.  DC3 is still active.
    • Web Gateway looks for a second active DC and tries to connect to DC4 (next in the list).  The connection to DC4 fails and DC4 is marked offline for 3 minutes.  No additional DC can be contacted right now (DC1, DC2, and DC4 are all still within their 3-minute offline period).
    • DC1 changes to standby status (its 3-minute offline period has expired).
    • Web Gateway tries to connect to DC1.  The connection to DC1 succeeds and DC1 is marked active.  Both DC3 and DC1 are now active.
    • DC2 and DC4 change to standby status.  DC3 and DC1 remain the active servers until one or both go offline.

     

    As described above, the ‘active’ domain controller(s) are sticky and DCs in standby status are not checked unless an active DC goes offline.   A restart forces Web Gateway to start over from the beginning to find active DCs.
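
    The walkthrough above can be replayed with a small simulation. This is an illustrative Python model only, not Web Gateway code: the class and method names are hypothetical, reachability is passed in as a set instead of being probed, and the model assumes the search for a replacement always scans the configured list from the top, skipping DCs that are active or still offline. That assumption is consistent with the example but is not confirmed beyond it.

```python
OFFLINE_SECONDS = 180  # a failed DC stays offline for 3 minutes


class DcPool:
    """Toy model of 'sticky' active DC selection with failover."""

    def __init__(self, configured, max_active):
        self.configured = list(configured)  # order defines the try order
        self.max_active = max_active
        self.active = []
        self.offline_until = {}

    def status(self, dc, now):
        if dc in self.active:
            return "active"
        if self.offline_until.get(dc, 0) > now:
            return "offline"
        return "standby"

    def mark_failed(self, dc, now):
        """Drop dc from the active set and start its offline period."""
        if dc in self.active:
            self.active.remove(dc)
        self.offline_until[dc] = now + OFFLINE_SECONDS

    def fill_active(self, now, reachable):
        """Try standby DCs in list order until max_active are active."""
        for dc in self.configured:
            if len(self.active) >= self.max_active:
                break
            if self.status(dc, now) != "standby":
                continue
            if dc in reachable:
                self.active.append(dc)
            else:
                self.mark_failed(dc, now)
        return list(self.active)


pool = DcPool(["DC1", "DC2", "DC3", "DC4"], max_active=2)
print(pool.fill_active(0, {"DC2", "DC3"}))      # ['DC2', 'DC3']
pool.mark_failed("DC2", 60)                     # DC2 becomes unreachable
print(pool.fill_active(60, {"DC1", "DC3"}))     # ['DC3']
print(pool.fill_active(180, {"DC1", "DC3"}))    # ['DC3', 'DC1']
print(pool.fill_active(240, {"DC1", "DC2", "DC3", "DC4"}))  # ['DC3', 'DC1']
```

    The last call shows the sticky behavior: DC2 and DC4 are back in standby and reachable, but they are not tried because two DCs are already active.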

     

    Authentication request distribution

    Authentication requests are distributed across the active DCs where the fastest DC (first available of the active DCs) handles the next request.

     

    What if the number of DCs in active status is lower than the specified number of active DCs?

    In an example with 3 configured Domain Controllers and 2 active, if 2 DCs are offline and only 1 remains active, Web Gateway will attempt to reconnect to the offline DCs once they return to standby status in an effort to find a 2nd active DC.  If all DCs are offline, all requests fail immediately until DCs return to standby status and Web Gateway is able to find an active DC.

     

    Troubleshooting

     

    Here are a few troubleshooting examples where the MWG did not join the domain successfully or it has issues communicating with the DCs.

     

    Note that there are only two main troubleshooting tools:

     

    1. The Web Gateway authentication debug log

    2. A network capture/tcpdump taken on the Web Gateway (this will give you the most comprehensive troubleshooting data)

     

    Authentication Debug Log

    You can find the authentication debug log under Configuration > [[ Appliance Name]] >  Troubleshooting > Authentication Troubleshooting

     

    The log files written can be found under Troubleshooting > Log files > Debug > mwg-core_Auth.debug.log

     

    There are two main options for the authentication debug log:

     

    1) Log management events

    We recommend keeping this option permanently enabled. It logs all events related to your AD connection, such as joining or leaving the domain or failing over from one DC to another. Very little log data is written, so the option can safely stay enabled.

     

    2) Log authentication events

    We recommend that you enable this option only for specific troubleshooting, limit it to a specific IP, and disable it again as soon as possible after replicating an issue. This option logs all events related to actual user authentications. The log grows fast when it is enabled, because it records not only every authentication request from a client but also group memberships and similar details. It is most useful if you have specific clients that constantly get prompted for credentials or that cannot log in at all. Enable the authentication event option, specify the IP address of the client that will replicate the problem, and reproduce it (for example, open the browser and get a prompt). Disable the option again right afterwards so the log does not grow to a point where it becomes a problem.

     

     

    log_management_events.jpg

     

    TCPdumps

    You can take a packet capture (tcpdump) from the Web Gateway UI or from the command line (recommended option) as 'root':

     

    Command Line: (ssh or console access)

     

    cd /opt

    tcpdump -i any -s0 -w ntlmcapture.cap port 445 or port 53

     

    Reproduce the problem and let the capture run for at least 3 minutes (the default period after which the MWG attempts to reconnect to a DC).

    Stop the capture (Ctrl+C).

     

    The file will be in the directory (/opt) in which you ran the command.

     

    MWG UI: (Troubleshooting > Packet Tracing)

     

    Add these command line parameters:

     

    -s0 -i any port 445 or port 53

     

    Start Capture.

     

    Reproduce the problem and let the capture run for at least 3 minutes (the default period after which the MWG attempts to reconnect to a DC).

    Stop Capture.

     

    You can view the resulting traces on your desktop with the free tool Wireshark.

     

    Below are a few examples of what you might see:

    No IP address (Forward DNS failed)

     

    In this example, I tried to join the domain by providing the FQDN bob.jimc.local for the Active Directory server in the MWG UI (see field 6 above), but there is no DNS record for this name, so DNS returns 'No such name.'

    Joining the domain fails immediately.

     

    join_attempt_dns_fails.jpg

     

     

    No or incorrect hostname (reverse DNS failed)

    In this example, I tried to join via the IP address 10.10.95.12, which has no reverse record in DNS (the same happens if an incorrect hostname is returned). The MWG can establish the TCP connection to the DC because the IP address is provided in the UI, but once the TCP connection is established and the protocol switches to SMB, the connection fails because the correct hostname is required.

     

    A similar situation applies when the domain controllers are being load balanced via a virtual hostname. For example, if you provide the FQDN of DCpool.company.com (virtual name for load balanced DCs) and it resolves to the IP of one of your DCs (for example dc1.company.com), your connection will fail because as soon as the protocol switches to SMB, the hostname provided is DCpool.company.com and not the expected/correct hostname of dc1.company.com. Do NOT use virtual hostnames for your DCs. Use the real hostnames and let the Web Gateway do the load balancing for you.

    load_bal_invalid_computer.jpg

     

     

    Bad admin credentials

     

    In this example, the credentials for the administrator used to join the domain were not valid.

     

    join_attempt_bad_password.jpg

     

     

    Computer account deleted or disabled

     

    In this example, the computer account for the Web Gateway was deleted in AD, but the same error could also be thrown if the account is disabled/modified. Also note that the error message is the same as when the incorrect administrator credentials were used while trying to initially join the domain.

     

    ad_account_deleted.jpg

     

     

    'Logon To' Account Permissions in Active Directory

     

    When you join the Web Gateway to the domain, a computer account is created within Active Directory.  When Web Gateway talks to the Domain controller to authenticate users, it uses this computer account.

     

    Some users in Active Directory may be restricted in which workstations they can log on to. If a user is only allowed to log on to specific workstations, you must make sure the Web Gateway computer account is also added as an allowed workstation.  Otherwise authentication will fail and the user will be prompted to authenticate.

     

    In this scenario, the Web Gateway is joined to the domain with computer account 'WebGateway'.  The user 'user1' is only able to logon to workstation 'Desktop1'.

     

    Web Gateway Domain Membership.png  AD Logon Workstations.png

     

     

    The example below shows the Web Gateway trying to authenticate 'user1' using the computer account 'WebGateway'.  The domain controller responds with an error message indicating that authentication failed.  The error the Domain Controller sends is STATUS_INVALID_WORKSTATION as seen in the screenshot below.

     

    WebGateway Computer Name TCP Dump.png

     

     

    For this to work properly, add the Web Gateway's computer account to the user's allowed workstations, or allow the user to log on to all workstations.
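
    The rule behind STATUS_INVALID_WORKSTATION can be expressed in a few lines. The sketch below is illustrative Python with a hypothetical function name. It assumes, from general Active Directory behavior and not from this article, that the 'Log On To' restriction is stored on the user object as a comma-separated list of computer names (the userWorkstations attribute) and that an empty value means all workstations are allowed.

```python
def workstation_allowed(user_workstations, computer_name):
    """Return True if the user may be authenticated via computer_name.

    user_workstations: comma-separated allowed computer names, or
    None/empty when the user has no 'Log On To' restriction (assumption).
    """
    if not user_workstations or not user_workstations.strip():
        return True  # no restriction: all workstations allowed
    allowed = {
        name.strip().lower()
        for name in user_workstations.split(",")
        if name.strip()
    }
    return computer_name.strip().lower() in allowed


# The scenario above: user1 may only log on to Desktop1.
print(workstation_allowed("Desktop1", "WebGateway"))             # False
# After adding the Web Gateway computer account to the list.
print(workstation_allowed("Desktop1,WebGateway", "WebGateway"))  # True
```

    The first call corresponds to the failing case in the capture; the second corresponds to the corrected 'Log On To' setting.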

     

    Web Gateway in Logon To Section.png

     

     

    Alerting

     

    If you would like to be notified when issues arise with your domain membership, you can use some of the dashboard alerts the MWG produces. Please see the following article on incident alerting: https://community.mcafee.com/docs/DOC-4837

     

    dash_error.jpg

     

     

     

    Last resorts

     

    Hosts file entry

     

    If DNS issues cannot be overcome (temporarily or permanently), an entry in the hosts file of each Web Gateway will likely be required. Make this change in the GUI as seen below (do not edit /etc/hosts on the command line).

     

    example_hosts.jpg

     

     

    Rolling captures for intermittent issues

     

    Log in to the Web Gateway as the 'root' user with a tool such as PuTTY. Change to /var (cd /var) and use 'df -k' to verify that there is enough free space to store the captures. The command below keeps up to 20 files of 100 MB each, so it needs about 2 GB of free space on /var. You can lower these values, but if you keep too few captures, the one containing the problem may be deleted before you stop the rolling capture.

     

    nohup tcpdump -Z root -s 0 -i any -C 100 -W 20 -w capturefilename.pcap port 445 or port 53 & <press enter twice>

     

    -C is the size, in MB, that a capture file can reach before a new one is started.

     

    -W is how many capture files are kept before the oldest is overwritten by a new capture.

     

    Port 445 is for Active Directory (SMB) and port 53 is for DNS.

     

    The other parameters should remain unchanged.
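
    If you change -C or -W, the space requirement changes with them. The following Python sketch (hypothetical helper names, not part of the Web Gateway) does the arithmetic and compares it with the free space on a path. It relies on tcpdump's documented unit for -C, which is millions of bytes (1,000,000, not 1,048,576).

```python
import shutil


def rolling_capture_bytes(file_size_c, file_count_w):
    """Worst-case disk usage of 'tcpdump -C file_size_c -W file_count_w'."""
    return file_size_c * 1_000_000 * file_count_w


def has_room(path, file_size_c, file_count_w, free_bytes=None):
    """Return True if path has enough free space for the rolling capture.

    free_bytes can be passed in directly; otherwise it is read from the
    filesystem that contains path.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= rolling_capture_bytes(file_size_c, file_count_w)


# The values used above: 20 files of 100 MB each.
print(rolling_capture_bytes(100, 20))  # 2000000000
```

    For example, has_room("/var", 100, 20) reports whether the filesystem holding /var currently has at least that much free space.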

     

    To stop the capture, run 'ps aux | grep tcpdump' to get the process ID of the rolling capture, then run 'kill processID'. A plain kill lets tcpdump flush and close the current file; use 'kill -9 processID' only if the process does not exit. The completed captures will be in /var/empty/

     

    Takeaways

     

      • Always hardcode the 'Configured Domain Controllers' field with the address of your Domain Controllers. Do NOT use a Virtual Hostname.
      • MWG needs both the IP and FQDN of the configured Domain Controller. You'll specify one in the field provided; the other needs to be resolved by DNS.
      • Remember to enable the 'Log Management Events' debugging option.