Best Practices: Monitoring the /opt partition

Version 13

     

    Introduction

     

    This document focuses on hard drive usage (disk space) for the Mcafee Web Gateway and the best practices for monitoring it. Our primary focus will involve the /opt partition on your Mcafee Web Gateway.

     

    It is important to monitor the /opt partition as this is where the MWG (McAfee Web Gateway) software is installed. Also the MWG processes utilizes the /opt partition as a temporary work space, to store user-defined logs, and troubleshooting/debug files. A full /opt partition will cause a negative impact on your appliance.

     

    A /opt partition that is full can result in the following symptoms:

    • Failure to access MWG’s User Interface (UI)
    • Failure to save changes while in UI.
    • Unable to pass web traffic through the proxy
    • Other permission issues as MWG is prevented from writing to the /opt partition’s filesystem.

     

    How do I monitor the /opt partition?


    There are a number of ways to monitor the /opt partition.  A best practice for monitoring /opt is to configure alerts to trigger when /opt usage reaches a specific threshold. This can help you avoid problems before theyhappen. Below are some of our recommended methods to trigger alerts:

     

    Incident IDs

     

    Dashboard alerts are presented when the /optpartition usage exceeds 90%.

     

    incident.png

     

     

    Every dashboard alert equates to something called an Incident ID. An Incident ID can be used as criteria to trigger events such as an Email Notifications, syslog, etc. so that the Administrator can be warned about the event before a bigger problem can occur.  The Incident ID used when /opt partition usage exceeds 90% is Incident ID 22.  Please see this community article for best practices when monitoring based on Incident IDs: https://community.mcafee.com/docs/DOC-4837

     

     

    Statistical Counters

     

    Statistical counters are another method that you can use to monitor your /opt partition. There are many different statistical counters available. In this case we will be dealing with the counter called ‘FileSystemUsage’. The main benefit of this method is that you can set the threshold for the counters you are monitoring. Instead of only triggering an email notification when /opt is 90% utilized like above, you can now tweak this to a value that better suits your organization.

     

    Statistical counters are monitored are through a special Incident ID (5).  This “incident” is intended to be just a trigger. Incident ID 5 is calling the error handler once every minute and can be used to trigger checks. In this example, we use it to monitor our statistics counter (FileSystemUsage).

     

    incident1.png

     

     

    How do I create a rule to monitor and send alerts?

     

    Creating a rule to monitor and send alerts involves a few steps. The rules are created in the monitoring rule set which is located in Policy > Rule Sets  > Error Handler > Default > Monitoring rule set. The rule will include:

     

      • Criteria to Enter the rule
      • Building a notification message
      • Configuring alerts to use the notification message

    Criteria for Monitoring rule set

    Incident.ID equals 5

     

    Criteria for Opt Partition Rule Set

    “Statistics.Counter.GetCurrent (“FilesystemUsage”)<Default>greater than or equal 85”

     

    Building a notification message

    Build a notification message called User-Defined.notificationMessage. As an example this notification message will contain “MWG /Opt partition usage at: 87%” when usage is 87%

     

    incident2.png

     

    Add an alert that uses the notification message

    Build an email alert that uses the notification message by adding an event to the rule. Notice that the email body uses the same notification message created above.

     

    The example below has a rule to create the notification message and 4 alert rules (SNMP, syslog, email, and log file)

     

    incident3.png

    The example below has a rule to create the notification message and 4 alert rules (SNMP, syslog, email, and log file)

     

    incident4.png

     

    Example Ruleset: Check MWG Partition

     

    I received an alert, what do I do?

    Time is critical so you will want to identify and correct the problem as soon as possible.

    1.   Log into the Appliance with an SSH client using the ‘root’ user account.

        • Open the SSH client.
        • In the Host section, type the IP address of the Web Gateway Appliance.
        • Click Open.

    2.   Run the command below to review disk usage on /opt:

     

    df -h

     

     

    This will help to confirm the alert message you received.

     

    3.   Next, run:

     

    du /opt -Dm | sort -n|tail 

     

     

    This will find the largest folders on /opt and output the results in ascending order based on MB.

     

     

    4.   The example below shows that a cores directory is using approximately 8GB of disk space due to four core files.

     

    incident5.png

     

    5.   If you’re not able to determine the source of the problem, please create a listing for your support representative. You can run this command to output a .txt file with a directory listing:

     

    du /opt/ -Dm|sort -n > opt_directory_listing.txt

     

    Attach ‘opt_directory_listing.txt’ to your open Service Request email.

    Common causes, Preventative measures, and file locations

     

    •   Common reasons the /opt partition would run out of space:

     

    a.)    User-defined logs: Logs like the access.logs and access_denied.logs, are located in the /opt/mwg/log/user-defined-logs directory and sub directories.

     

    To prevent partition utilization issues you should setup an appropriate rotation, deletion, and push schedules for the logs located in Policy > Settings > Engines > File System Logging. Additionally, it is a best practice to enable the setting “GZIP log files after rotation”.

     

    b.)    Tracing files: These are located in the /opt/mwg/log/debug directory or subdirectories. Some example debug file locations are:

     

    /opt/mwg/log/debug/connection_tracing
    /opt/mwg/log/debug/ruleengine_tracing
    

     

    The /opt/mwg/log/debug/ directory also contains the authentication trace, quota trace, PD storage trace, log manager trace, and coordinator trace files.

     

    To prevent partition utilization issues these files should only be created just prior to testing and then disabled/stopped after the testing has completed. Many of the debugs tracing options allow restricting the logging to one test IP address. It is our recommendation that you do this to limit the amount of information generated.

     

    Note: Connection tracing files will be removed on reboot or restart of the service. It is still recommended to manually remove the files and/or disable the tracing immediately after.

     

    c.)    Temporary files: are located in the /opt/mwg/temp directory. It is normal for MWG to create mwg-core temp files in this directory while it is downloading and scanning files. If this directory is the source of your disk space usage on /opt, it could be that many large files are being downloaded at that time.

     

    These files are managed by the mwg-core process and are removed after download/scanning has completed.

     

    Note: Streaming Media files that are scanned by AV can cause very large temp files as streams have no known end. It is recommended that these files be bypassed from scanning by the Streaming Detector. More information on the Streaming detector can be found here: https://community.mcafee.com/docs/DOC-4806

     

    d.)    Support Files: Support may request additional troubleshooting files for review.

     

    /opt/mwg/log/debug/tcpdump
    /opt/mwg/log/debug/feedbacks 
    /opt/mwg/log/debug/cores
    

     

    Note: after support has ensured the receipt of these files, you may remove these files from your system.

    Note: Core files have a deletion schedule which is define under Configuration > Appliances tab > Log file manager> Advanced