Incident Response: Support Recommendations

Version 10

     

    Introduction

    This document is complementary to McAfee Web Gateway support's Incident Notifications ruleset.

     

     

    ID: 5

    Description:

    This is a test incident used for Monitoring. This incident occurs every minute.

    Message:

    Monitor Incident (7): Monitor Incident

    Action:

    Ignore this email; it is used only for testing. To stop receiving it, disable "Test Notifications" in the Incident Notifications ruleset.

     

     

    ID: 20

    Description:

    RAID monitoring reported critical status or failure of one or more hard disks.

    Message:

    health monitor (3): RAID reports 0 critical disks, 1 failed disks, and 1 degraded virtual drives. Disk Info: Physical Slot: 4, Device Id : 3

    Action:

      1. Determine whether the drive has truly failed or only needs to be rebuilt. See the article "Adding a Hard Drive back into RAID on a Web Gateway 5000 or 5500 Intel based Appliance".
      2. If the drive has failed, gather the necessary hardware logs as described in the article 'Generating hardware log files with "getlogs.sh" script', then open a Support ticket. Include RMA information (contact and shipping details) and the alert message you received.
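
    When these alerts are collected by a mail filter or ticketing script, the counts and the disk location can be pulled out of the message text before the ticket is opened. The following is a minimal sketch only: the pattern is modeled on the single sample message shown above, and the function name parse_raid_alert is hypothetical. Other appliance models or product versions may word the alert differently.

```python
import re

# Pattern modeled on the one sample message in this article (assumption);
# adjust it if your appliance words the alert differently.
RAID_ALERT = re.compile(
    r"RAID reports (?P<critical>\d+) critical disks?, "
    r"(?P<failed>\d+) failed disks?,? and "
    r"(?P<degraded>\d+) degraded virtual drives?\."
    r"(?: Disk Info: Physical Slot: (?P<slot>\d+), "
    r"Device Id\s*: (?P<device_id>\d+))?"
)


def parse_raid_alert(message):
    """Return the counts and disk location from a RAID alert, or None."""
    match = RAID_ALERT.search(message)
    if match is None:
        return None
    # The "Disk Info" part is optional, so drop fields that did not match.
    return {key: int(value)
            for key, value in match.groupdict().items()
            if value is not None}
```

    The returned slot and device ID are the details worth copying into the Support ticket alongside the hardware logs.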

     

     

    ID: 22

    Description:

    File system usage has exceeded a configured limit.

    Message:

    health monitor (4): Filesystem usage on /opt exceeds selected limit (91% / 90%).

    Action:

    Identify the files that are filling the disk, disable any debug logging that was enabled erroneously, and delete any unnecessary files.

      1. Follow the steps outlined in the article "Best Practices: Monitoring the /opt partition".
      2. Once the problem has been identified and resolved, consider resizing the disk.
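
    To illustrate the first part of this action, the sketch below lists the largest files under a directory. It is not taken from the referenced Best Practices article. It assumes a Python 3 interpreter is available wherever it is run (which is not guaranteed on the appliance itself), and the function name largest_files is hypothetical.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the link itself, so symbolic links are not followed.
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                # The file vanished or is unreadable (common for rotating logs).
                continue
    return heapq.nlargest(count, sizes)
```

    Calling largest_files("/opt") returns the ten largest files on the partition named in the sample alert. Review each file before deleting it.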

     

     

    ID: 24

    Description:

    System load has exceeded a configured limit.

    Message:

    health monitor (4): 5 minute load average exceeds selected limit (3.35 / 3.00).

    Action:

    System load alerts alone are not always indicative of a problem; take other performance metrics into account as well.

      1. Identify whether there is user impact. An easy check is to test web browsing performance through the proxy. Was there slowness at the time of the alerts?
      2. Is this an appliance or a VM? Appliances have dedicated CPUs, whereas a VM-based deployment may share resources with other virtual hosts.
      3. Use the top command to identify which process is consuming the most CPU. Usually this is the mwg-antimalware or mwg-core process.
        • mwg-antimalware - run the command below to see what is currently being scanned:

    /opt/mwg/bin/mwg-antimalware -S threads

        • mwg-core - there are many possible causes. If you are seeing user impact as a result of this alert, consult your McAfee Technical Support professional and include a feedback file taken during the problem state.
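
    One way to put a load alert in context is to relate the reported load to the number of CPUs, since on Linux the load average counts runnable and uninterruptible tasks across all CPUs. The sketch below reads the numbers from the sample message format shown above and divides the load by the CPU count. The function name load_per_cpu is hypothetical, and the result is only one input among the other performance metrics.

```python
import os
import re

# Pattern modeled on the sample alert message in this article (assumption).
LOAD_ALERT = re.compile(
    r"(?P<window>\d+) minute load average exceeds selected limit "
    r"\((?P<load>\d+(?:\.\d+)?) / (?P<limit>\d+(?:\.\d+)?)\)"
)


def load_per_cpu(message, cpu_count=None):
    """Return (load, limit, load divided by CPU count) for a load alert, or None."""
    match = LOAD_ALERT.search(message)
    if match is None:
        return None
    # Default to the CPU count of the machine running this script, which is
    # only meaningful when the script runs on the system that raised the alert.
    cpus = cpu_count or os.cpu_count() or 1
    load = float(match.group("load"))
    limit = float(match.group("limit"))
    return load, limit, load / cpus
```

    For the sample message and an assumed four CPUs, the load per CPU is below 1, which by itself suggests the CPUs were not saturated.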

     

    ID: 26

    Description:

    A check has been executed to detect a BBU RAID error. (Only for Intel based appliances)

    Message:

    health monitor (4): RAID BBU check reports remaining capacity of battery as low.

    health monitor (4): RAID BBU check reports the requirement of battery replacement.

    Action:

    If the battery is low and the appliance was recently powered on, monitor it for 24 hours. If the low battery message persists or the message indicates failure, gather the necessary hardware logs as described in the article 'Generating hardware log files with "getlogs.sh" script' and open a Support ticket. Include RMA information (contact and shipping details) and the alert message you received.

     

     

    ID: 501

    Description:

    Log File Manager failed to push log files.

    Message:

    Log File Manager (3): "Cannot push '/opt/mwg/log/user-defined-logs/access.log/access1505182209.log' to 'ftp://myFtpServer:9121/access1505182209-10.10.69.72.log'"

    Action:

    If log files fail to push, those files cannot be deleted. If no action is taken, the disk could fill up.

     

    Most log push failures occur due to network or log server issues:

      1. Identify which log is failing to push (see notification).
      2. Investigate why the log push is failing. Consult the mwg-logmanager.errors.log (Troubleshooting > Log Files > mwg-errors).
        • Search the log for the file named in the alert and look for the accompanying error message, for example:

    # DNS problem
    Error output is 'curl: (6) Couldn't resolve host 'myFtpServer''

        • If using Web Reporter or Content Security Reporter, verify it is running and reachable from Web Gateway.
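
    The first checks above can be partly scripted. The sketch below extracts the local file and the destination from the alert text, using the sample message format shown above, and tests whether the destination host name resolves, which corresponds to the DNS example. The function names parse_push_failure and resolves are hypothetical, and the result only reflects the machine that runs the script, so run it from a host that uses the same DNS servers as Web Gateway.

```python
import re
import socket
from urllib.parse import urlsplit

# Pattern modeled on the sample alert message in this article (assumption).
PUSH_ALERT = re.compile(r"Cannot push '(?P<source>[^']+)' to '(?P<target>[^']+)'")


def parse_push_failure(message):
    """Return the local file and destination details from a push alert, or None."""
    match = PUSH_ALERT.search(message)
    if match is None:
        return None
    target = urlsplit(match.group("target"))
    return {
        "source": match.group("source"),
        "scheme": target.scheme,
        # urlsplit lowercases the host name; DNS lookups are case-insensitive.
        "host": target.hostname,
        "port": target.port,
    }


def resolves(host):
    """Return True if the host name (or IP address) can be resolved from here."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False
```

    If resolves() returns False for the parsed host, the failure matches the DNS example. If it returns True, continue with the log server and network checks.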

     

     

    ID: 700

    Description:

    Web Gateway entered overload state

    Message:

    Proxy (2): Overload: Connection limit of 25000 simultaneous connections has been exceeded. Delaying accepts

    Action:

    Web Gateway is overloaded.

     

    ID: 701

    Description:

    Web Gateway had entered an overload state, and is still in an overload state even after delaying accepting connections

    Message:

    Proxy (2): Overload: The Webgateway is still overloaded and delays accepts

    Action:

    Web Gateway is overloaded, and the issue should be addressed immediately.

     

    ID: 702

    Description:

    Web Gateway left an overload state

    Message:

    Proxy (4): Overload: Left overload handling. Accepts will be done immediately again

    Action:

    No action should be required unless Web Gateway repeatedly alternates between the overloaded and normal states.
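
    Whether Web Gateway is going back and forth can be judged by counting how often it entered and then left the overload state within a recent period. The sketch below assumes that incident notifications have already been collected as (timestamp, incident ID) pairs. The function name count_overload_cycles and the choice of time window are hypothetical and are not part of the product.

```python
from datetime import timedelta

ENTERED_OVERLOAD = 700  # "Connection limit ... has been exceeded"
LEFT_OVERLOAD = 702     # "Left overload handling"


def count_overload_cycles(events, window=timedelta(hours=1)):
    """Count completed overload cycles (ID 700 followed later by ID 702)
    that ended within `window` of the most recent event."""
    if not events:
        return 0
    events = sorted(events)
    cutoff = events[-1][0] - window
    cycles = 0
    overloaded = False
    for timestamp, incident_id in events:
        if incident_id == ENTERED_OVERLOAD:
            overloaded = True
        elif incident_id == LEFT_OVERLOAD and overloaded:
            overloaded = False
            if timestamp >= cutoff:
                cycles += 1
        # ID 701 ("still overloaded") does not change the state.
    return cycles
```

    How many cycles count as a problem is a judgment call for your environment. A count that keeps growing is a reason to investigate even though each ID 702 on its own needs no action.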

     

    ID: 901

    Description:

    The appliance is connected to n servers for NTLM authentication in Windows domain x.

    Message:

    Authentication (6): Connected to 1 server(s) in domain x

    Action:

    No action is required for this incident. Web Gateway successfully connected to the domain. If you recently received an alert that Web Gateway was disconnected from the domain, this incident confirms that it was able to reconnect.

    Related IDs:

    902, 903

     

     

    ID: 902

    Description:

    The appliance could not connect to n servers for NTLM authentication in Windows domain x.

    Message:

    Authentication (3): Failed to connect to DC dc1.mcafee.com in domain mcafee.com

    Action:

    Depending on your available domain controllers, this may or may not be an issue.

      1. Verify how many DCs the Web Gateway is using (Configuration > Windows Domain Membership). If more than one DC is configured, Web Gateway should continue to communicate with the remaining active DCs, so users should be unaffected.
      2. Verify with your Domain Admins that the DC was not taken offline for maintenance.
      3. If there is continued user impact, replicate the problem with a test client while the authentication debug log and tcpdump are running (see the troubleshooting steps outlined in the NTLM Best Practices). This data can be provided to McAfee Technical Support for further analysis. Along with this data, include the client IP, your observations, and a feedback file (which includes the authentication debug log).

     

     

    ID: 903

    Description:

    The appliance could not contact Windows domain x for NTLM authentication.

    Message:

    Authentication (3): The following domain(s) can't be contacted: mcafee.com

    Action:

    This alert requires action because it signifies that no configured DCs are reachable for a given domain. Users who must authenticate against this domain will be impacted.

      1. Verify how many DCs the Web Gateway is using (Configuration > Windows Domain Membership). If only one DC is configured, add more if they are available.
      2. If there is continued user impact, replicate the problem with a test client while a tcpdump is running (see the troubleshooting steps outlined in the NTLM Best Practices). This data can be provided to McAfee Technical Support for further analysis. Along with this data, include the client IP, your observations, and a feedback file.
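
    Because incidents 901, 902, and 903 are related, it can help to sort incoming authentication alerts by ID and domain so that a disconnect can be matched with a later reconnect. The sketch below classifies a message using patterns modeled on the three sample messages in this article. The function name classify_auth_alert is hypothetical, and how several domains are separated in a 903 message is not shown in the samples, so that field is returned as raw text.

```python
import re

# Patterns modeled on the sample messages for IDs 901, 902, and 903 (assumption).
AUTH_PATTERNS = [
    (901, re.compile(
        r"Connected to (?P<count>\d+) server\(s\) in domain (?P<domain>\S+)")),
    (902, re.compile(
        r"Failed to connect to DC (?P<dc>\S+) in domain (?P<domain>\S+)")),
    (903, re.compile(
        r"The following domain\(s\) can't be contacted: (?P<domains>.+)")),
]


def classify_auth_alert(message):
    """Return (incident_id, fields) for an NTLM authentication alert, or None."""
    for incident_id, pattern in AUTH_PATTERNS:
        match = pattern.search(message)
        if match:
            return incident_id, match.groupdict()
    return None
```

    A 903 for a domain that is followed by a 901 for the same domain indicates that connectivity was restored. A 903 with no later 901 still requires the action described above.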