SIEM Use Case: Maintaining High Quality of Service

Version 4

    1 Index

     

    2 Introduction

    In many SIEM environments, it's helpful to implement a set of processes that monitor quality of service and provide early visibility when problems arise.  This document provides several example dashboards that track the performance of an environment over time, using several methods, to help ensure a high quality of service.

    • By measuring the EPS rate into the SIEM and comparing it against the baseline we can detect abnormal spikes of activity.
    • By monitoring for Windows lockout events, we will proactively detect account lockouts to determine impact to users who would then not be able to access the environment.
    • We will monitor configuration changes by accounts with privilege, and review to ensure that there is no negative impact as a result of those changes.
    • Multiple failures of configuration changes will also be tracked and reported on. Repeated failures in short time frames are likely indications that something is incorrectly configured, and requires attention.

     

    Specifically, we will implement two dashboards that show this information.  The first, QoS Dashboard, focuses on event rates and denial of service (DoS) events:

    • Event rates (measured in events per second) are a useful indicator of overall network load, and particularly of load on the SIEM infrastructure. The dashboard shows how the rate deviates over time, and alarms are generated when thresholds are exceeded.
    • Potential DoS attacks could have a negative effect on, or disrupt, service.  We will show the top 10 source/destination IP pairs based on activity.

     

    1Dashboard.png
    Dashboard 1: QoS Dashboard

     

    A second view, QoS User Activity Dashboard, will be used to view activities by users including:

    • Tracking account lockouts to ensure that there is the least degree of impact to users due to either potential attack activity or simple mistakes
    • Tracking of system changes that could potentially have a negative effect on, or disrupt, service
    • Monitoring the activities of administrative users with privilege, to indicate if there are changes being made with privileged accounts that could potentially put the environment at risk

    2SecondView.png

    3 Prerequisites

    The following are prerequisites to accomplish this use case.

     

    3.1 Required data sources

    This use case has been designed to work well with a wide variety of inputs.  Suggested data sources include:

      • Perimeter firewalls
      • Active Directory (Domain Controllers)
      • Network infrastructure devices (switches, routers) including configuration logging
      • Windows Servers
      • Unix/Linux Servers

     

    3.2 Other infrastructure

      • A Correlation Engine configured in your environment, preferably an ACE appliance
      • SMTP Server configuration, for sending alert notifications via email
      • McAfee ePO, for supporting realtime responses
      • McAfee IPS, for supporting realtime system quarantine

     

    3.3 Required configuration

    The SIEM must have access to query Active Directory. This requires AD to be set up in the Asset Manager and also for there to be access to query AD to build a Dynamic Watchlist that contains accounts that are in Lockout status.  Information on how to use the Asset Manager to connect to AD can be found here: SIEM Foundations: Connecting the SIEM to a Windows Domain Controller for Asset Import

     

    4 Configuration and Buildout

    With prerequisites properly defined and configured, the next step is to begin the build out of our use case.  The sections below will step you through the process of importing the necessary content and configuring the McAfee SIEM.

     

    4.1 Import and customize dashboards

    First we will import predefined dashboards, which will serve as the basis for our use cases.  Once imported, we will customize and tune them to meet our specific needs.

     

    Import QoS Dashboard

    To import the QoS Dashboard:

      1. Obtain a copy of the dashboard definition file: High QoS – QoS Dashboard.vpx. (all content is linked below) Store it in an accessible location on your workstation.
      2. In the ESM UI, click the Manage Views icon (3ManageViewsIcon.png) in the top center
      3. Select a location in the View tree where you would like to store your new view.  As a suggestion, you might create a folder for "Monitoring Quality of Service"
      4. In the View Manager, click Import
      5. Click Choose File and browse to the location of the High QoS – QoS Dashboard.vpx file.
      6. Click Upload to complete the upload.

     

    Import QoS User Activity Dashboard

    Repeat steps 1–6 above for the file named High QoS – QoS User Activity Dashboard.vpx

     

    Customize QoS Dashboard

    The QoS dashboard includes 4 dials, each of which may be tied to a separate device or set of data sources. The queries behind these dials will need to be modified to reflect the devices in your environment.  To customize the QoS Dashboard:

      1. Edit the view by clicking the Edit icon in the top center:
        4EditIconArrow.png
      2. Start with the ESM EPS dial and click on it to select it.  Note that the frame of the query is highlighted in orange and the Properties dialog appears on the right of the screen.
      3. Now choose the Edit Query button in the Properties dialog.
      4. Click the Filter button from the Query Wizard.
      5. When the Query Filters dialog appears, click on the Display Filters List icon beside "Device ID":
        5FilterButtonarrow.png
      6. We will tie this first dial to the ESM.  Ensure your ESM device is selected from the tree and click OK.
        6QueryFilters.png
      7. Repeat the above process for the remaining three dial components. For each dial, select a different receiver device, group of receivers, or set of data sources from your environment. You may add or remove dials if desired. Ensure that you change the Title at the top of the Properties panel for each dial component to reflect what it's tracking.
      8. Finish by saving your view.
        7SaveAs.png

     

    4.2 Configure Variables

    In this use case we will leverage the Privileged Users variable which is predefined in the SIEM and can be found in the Policy Editor (Variable/application/PRIVILEGED_USERS).

    8PolicyEditor.png

    This variable can be modified to match the privileged accounts in your environment. Examine the existing values, and add/remove account names as appropriate. Good examples: Windows Domain Administrators, Local Windows Administrator accounts, user accounts with Linux root privileges, sys or sa level users in databases, etc.

     

    4.3 Create Watchlists

    Next we'll build a series of watchlists, which will be incorporated into our views, reports and correlation rules.

     

    Dynamic Watchlist – QoS Locked Accounts

    We will create a watchlist that queries Active Directory and populates a Dynamic Watchlist of accounts which are currently in a locked state. This watchlist will be used in a variety of ways. As an example this watchlist can be used as a filter on events to show today's events related to accounts which have been locked out. This can be useful in determining if a particular user has locked their account due to multiple password failures, or to show the events around multiple account lockouts to determine if there is a pattern of suspicious activity.

     

    To Create a Dynamic Watchlist to query AD for Locked Accounts:

     

      1. Select the Watchlists icon on the upper right of the main window
        9WatchlistsIcon.png
      2. Click Add to create a new watchlist.  The Add Watchlist window opens.  Enter the name QoS Locked Accounts and select the Dynamic option.  By default the query will be run at midnight GMT every day as indicated by the "Update:" section:
        10AddWatchlist.png
      3. Click on the Source tab and select LDAP from the drop down.  Enter the IP address of your AD server and the credentials used to query it.
      4. Click the Query tab and enter the LDAP query for locked accounts:
        • Lookup Attribute: sAMAccountName
        • Query: (&(objectCategory=Person)(objectClass=User)(lockoutTime>=1))
          11WatchlistQuery.png
      5. Test your query by clicking the Test button.
      6. Once the query executes successfully, select the Values tab and then select the Type of Source User.
        12SelectType.png
      7. Click Finish to save your work.

     

    The QoS Locked Accounts watchlist will now be updated every day at the configured time (midnight GMT by default) and can be used as a filter for events, in correlation rules, and anywhere a grouping of Source User entities is available.  We will use it in a custom correlation rule defined below.
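
    The LDAP filter used above selects user objects whose lockoutTime attribute is at least 1.  As a rough illustration of what that selection does, the following Python sketch applies the same condition to directory entries that have already been retrieved.  The entry dictionaries, account names, and function names are hypothetical and are not part of the SIEM; the only fact relied on is that Active Directory stores lockoutTime as a count of 100-nanosecond intervals since 1601-01-01 UTC.

```python
from datetime import datetime, timedelta, timezone

# Active Directory stores lockoutTime as the number of 100-nanosecond
# intervals since 1601-01-01 00:00:00 UTC.
AD_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def lockout_time_to_datetime(lockout_time: int) -> datetime:
    """Convert a raw lockoutTime value to a timezone-aware UTC datetime."""
    return AD_EPOCH + timedelta(microseconds=lockout_time // 10)


def locked_account_names(entries: list[dict]) -> list[str]:
    """Return sAMAccountName for entries that satisfy lockoutTime>=1."""
    return [
        entry["sAMAccountName"]
        for entry in entries
        if int(entry.get("lockoutTime", 0)) >= 1
    ]


# Hypothetical directory entries, for illustration only.
sample_entries = [
    {"sAMAccountName": "jsmith", "lockoutTime": 133500000000000000},
    {"sAMAccountName": "adoe", "lockoutTime": 0},
    {"sAMAccountName": "svc_backup"},  # attribute never set
]

watchlist_values = locked_account_names(sample_entries)
print(watchlist_values)
```

    Only the first entry is selected, which mirrors how the watchlist is populated with sAMAccountName values for accounts matching the query.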

     

    4.4 Import Custom Correlation Rules

    Our use case will leverage two custom correlation rules in order to identify relevant behaviors.

     

    QoS – Unusual Activity after multiple Account Lockouts

    This correlation rule identifies unusual activity after an account lockout. This rule is triggered by a pair of conditions. First, the rule waits for either:

      1. Three account lockout events for the same destination IP, or
      2. An event for a user who is on the QoS Locked Accounts watchlist (defined above)

    The rule ultimately triggers when the SIEM sees a high severity event within 10 minutes of one of the above, with the same destination IP.
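
    The two-stage logic described above can be sketched in Python as follows.  This is only an approximation for illustration, not the rule's actual definition: the Event fields, the severity threshold of 80 for "high severity", and the absence of a time limit on the three lockouts are all assumptions, since the text does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LOCKOUT_COUNT = 3                          # from the rule description
FOLLOW_UP_WINDOW = timedelta(minutes=10)   # from the rule description
HIGH_SEVERITY = 80                         # assumption: threshold is not stated


@dataclass
class Event:
    time: datetime
    dest_ip: str
    user: str
    kind: str       # hypothetical normalized type, e.g. "lockout"
    severity: int   # 1-100


def find_triggers(events, locked_accounts):
    """Return high severity events seen within FOLLOW_UP_WINDOW of the
    first-stage condition being met for the same destination IP."""
    lockouts_seen = {}   # dest_ip -> number of lockout events so far
    armed_at = {}        # dest_ip -> time the first stage was last met
    hits = []
    for ev in sorted(events, key=lambda e: e.time):
        armed = armed_at.get(ev.dest_ip)
        # Second stage: high severity event shortly after the first stage.
        if (ev.severity >= HIGH_SEVERITY and armed is not None
                and ev.time - armed <= FOLLOW_UP_WINDOW):
            hits.append(ev)
        # First stage, option 1: three lockouts for the same destination IP.
        if ev.kind == "lockout":
            lockouts_seen[ev.dest_ip] = lockouts_seen.get(ev.dest_ip, 0) + 1
            if lockouts_seen[ev.dest_ip] >= LOCKOUT_COUNT:
                armed_at[ev.dest_ip] = ev.time
        # First stage, option 2: the user is on the locked accounts watchlist.
        if ev.user in locked_accounts:
            armed_at[ev.dest_ip] = ev.time
    return hits


start = datetime(2024, 1, 1, 9, 0)
sample = [
    Event(start, "10.0.0.5", "jsmith", "lockout", 40),
    Event(start + timedelta(minutes=1), "10.0.0.5", "jsmith", "lockout", 40),
    Event(start + timedelta(minutes=2), "10.0.0.5", "jsmith", "lockout", 40),
    Event(start + timedelta(minutes=8), "10.0.0.5", "jsmith", "login", 90),
    Event(start + timedelta(minutes=8), "10.0.0.9", "adoe", "login", 90),
    Event(start + timedelta(minutes=20), "10.0.0.5", "jsmith", "login", 90),
]
hits = find_triggers(sample, locked_accounts=set())
print([(h.dest_ip, h.time) for h in hits])
```

    In this sample only the high severity event eight minutes into the sequence on 10.0.0.5 matches; the event on a different destination IP and the one that arrives after the 10-minute window do not.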

     

    To import the "Unusual activity after multiple Account Lockouts" correlation rule:

      1. Download the file "QoS – Unusual Activity after multiple Account Lockouts.xml" to your local workstation.
      2. Open the correlation rule editor
        13CorrelationRuleEd.png
      3. Select File > Import > Rules
        14FileImportRules.png
      4. Click Import Rules
      5. Click Choose File and browse to the "QoS – Unusual Activity after multiple Account Lockouts.xml" file to upload.
      6. Ensure your new rule is enabled in your Correlation Engine policy, and then select Operations > Rollout to push out your new correlation rules.
        15CorEngEnabled.png

    The rule is displayed below:

    16RuleDisplayed.png

    The intent of the rule is to detect when an attacker is attempting to access the environment and is locked out but is eventually successful in breaching a system. The account lockouts highlight the initial brute-force attack while the SIEM logic provides the analysis for suspicious activity after the lockouts occur. This rule leverages the analytics and normalization capabilities of the SIEM, as well as the Dynamic Watchlist capability and can be part of a standard SOC implementation.

     

    QoS – Multiple Configuration Change Failures

    We will also create a second correlation rule designed to detect 5 failed configuration changes within 5 minutes that originate from a specific IP.  The intent of this rule is to detect whether an attacker is attempting to change the configuration of one or more systems in the environment, and to alert based on that activity.

     

    Repeat steps 1–6 above with the file "QoS – Multiple Configuration Change Failures.xml".

    17CorRule2.png
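
    The counting behind this second rule (5 failed configuration changes within 5 minutes from one source IP) can be illustrated with a simple sliding window.  The sketch below is not the rule's implementation; the input format of (timestamp, source IP) pairs and the function name are assumptions made for the example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

FAILURE_THRESHOLD = 5           # from the rule description
WINDOW = timedelta(minutes=5)   # from the rule description


def sources_exceeding_threshold(failures):
    """failures: iterable of (timestamp, source_ip) pairs, one per failed
    configuration change. Returns the source IPs that reach
    FAILURE_THRESHOLD failures within any WINDOW-long period."""
    recent = defaultdict(deque)   # source_ip -> timestamps still in the window
    flagged = set()
    for ts, src in sorted(failures):
        window = recent[src]
        window.append(ts)
        # Drop failures that are older than the window.
        while ts - window[0] > WINDOW:
            window.popleft()
        if len(window) >= FAILURE_THRESHOLD:
            flagged.add(src)
    return flagged


start = datetime(2024, 1, 1, 12, 0)
# Hypothetical data: one source fails every minute, another every two minutes.
failures = [(start + timedelta(minutes=i), "192.0.2.10") for i in range(5)]
failures += [(start + timedelta(minutes=2 * i), "192.0.2.20") for i in range(5)]

flagged = sources_exceeding_threshold(failures)
print(flagged)
```

    The first source is flagged because its five failures span four minutes; the second is not, because no five-minute period contains five of its failures.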

     

    5 Concept of Operations

     

    5.1 QoS Dashboard – EPS Tracking

    The QoS Dashboard provides a single dashboard to monitor daily operations for abnormal activity that could affect the quality of service in the environment.

     

    The dial components for EPS show when activity is significantly above baseline. When high levels of activity are identified, investigate the Event Distribution graph to dig down into the cause of the deviation.  Typically, a rate more than 15 – 30% above baseline could be an indicator of an issue in the environment.  In the example below, the gauge for Receiver 2 shows exceptionally high EPS rates compared to the baseline, and warrants some investigation:

    18Dials.png
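
    The 15 – 30% guideline above amounts to a percentage-over-baseline calculation.  A minimal Python sketch follows, assuming the current and baseline EPS values are available as plain numbers; the function names, the "watch" and "investigate" labels, and the sample readings are illustrative only and do not correspond to SIEM settings.

```python
def eps_deviation(current_eps: float, baseline_eps: float) -> float:
    """Return the fractional deviation from baseline (0.25 means 25% above)."""
    if baseline_eps <= 0:
        raise ValueError("baseline_eps must be positive")
    return (current_eps - baseline_eps) / baseline_eps


def classify(current_eps: float, baseline_eps: float,
             warn: float = 0.15, alert: float = 0.30) -> str:
    """Map a reading to a label using the 15% and 30% guideline values."""
    deviation = eps_deviation(current_eps, baseline_eps)
    if deviation > alert:
        return "investigate"
    if deviation > warn:
        return "watch"
    return "normal"


# Hypothetical (current, baseline) EPS readings per dial.
readings = {
    "ESM": (2100, 2000),
    "Receiver 1": (2400, 2000),
    "Receiver 2": (5200, 2000),
}
for dial, (current, baseline) in readings.items():
    print(dial, f"{eps_deviation(current, baseline):+.0%}", classify(current, baseline))
```

    Using two thresholds rather than one leaves room for a middle band in which a reading is worth watching but does not yet call for a drill-down.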

     

    When this occurs it shows a data source (or data sources) generating a significantly greater than normal amount of data.  This could be due to:

      1. A configuration change – View the configuration changes pane to see if there are any recent configuration changes that are trending significantly above the baseline.  Unless there is a scheduled/known activity (such as an infrastructure upgrade) there should not be a large degree of deviation from baseline for configuration changes in a stable and well operating environment. A higher than normal number of configuration changes could mean that there is a problem with a device, or that an administrator who is unfamiliar with a device is attempting multiple changes by trial and error.  Neither of these scenarios lends itself to a high quality of service for end users.
      2. A malfunction – This will typically be represented as a large spike in events in the Distribution pane. Select the events in the spike to drill in on these and see if there are many repeated events of the same type (e.g. spurious LPR data from the same source IP and destination IP).
      3. An attack attempt – Greater than normal event activity could be an indication of a DoS/DDoS attack. Leverage the DoS pane to view the source events in the correlation rules that have triggered to determine if outside sources are attempting to execute DoS/DDoS attacks.  Consider taking action by blacklisting the source IP to prevent the attack from executing.

     

    In any of these scenarios the likely source of the increased event activity can be identified by digging down into the Event Distribution pane.  First select the "spike" in event activity by clicking on it:

    19Spike.png

    Then drill down into the events using the menus on the Distribution pane:

    20Events.png

     

    Once the Events window appears, sort by the number of events by clicking on the Event Count column:

    21EventCountCol.png

    The event(s) with the highest count are the ones which are likely causing a disruption to service due to their volume. At this point the source of the events can be identified (server, network device, etc.) and remediation action taken.
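
    Outside the console, the same "sort by Event Count" step can be approximated on exported event rows.  The sketch below assumes each row carries a signature, a source IP, and an event count; these field names and the sample values are hypothetical and do not reflect an actual export format.

```python
from collections import Counter

# Hypothetical exported rows from the spike, for illustration only.
rows = [
    {"signature": "LPR print job error", "source_ip": "10.1.1.20", "event_count": 1800},
    {"signature": "Login success", "source_ip": "10.1.1.31", "event_count": 40},
    {"signature": "LPR print job error", "source_ip": "10.1.1.20", "event_count": 2200},
    {"signature": "Firewall deny", "source_ip": "10.1.1.45", "event_count": 350},
]

# Sum the counts per (signature, source) pair, then rank highest first.
totals = Counter()
for row in rows:
    totals[(row["signature"], row["source_ip"])] += row["event_count"]

ranked = totals.most_common()
for (signature, source_ip), count in ranked:
    print(f"{count:>6}  {signature}  {source_ip}")
```

    The pair at the top of the ranking is the first candidate to examine as the source of the spike.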

     

    5.2 QoS Dashboard – DoS Tracking

    Additionally as part of the initial QoS dashboard, there is a DoS Events pane for detecting attacks which could cause a disruption in service.

     

    The SIEM uses correlation logic to detect DoS/DDoS attack events. DoS/DDoS attacks can show up as poor network performance and/or excessive flow data being generated. The DoS Events pane shows when a McAfee published correlation rule has detected such activity. Should there be hits on DoS/DDoS rules, action should be considered to mitigate the attack.

    22DOS.png

    In addition to the visual indication of the DoS correlation rule triggers, the view also includes the source IP addresses where the DoS attacks are originating from in the DoS Source IPs pane.

     

    McAfee's Security Connected Platform provides the SIEM with the ability to interact with controls such as network and endpoint security countermeasures, and to automate many types of responses.  In this example, we'll demonstrate using McAfee NSM to take manual action and block the attack activity via a blacklist:

     

      1. Select the Source IP of the host you wish to blacklist:
        23IPSource.png
      2. Select Actions > Blacklist [IP address] from the menu:
        24Blacklist.png
      3. Select the device on which to blacklist the host (if applicable):
        25DeviceList.png
      4. Click OK.
      5. Blacklist the host in the Blacklist Editor.  The IP address field will automatically be filled out.  Enter any specific port information and click the Add button.  Port details can be found in an event summary by viewing the Source IP events in the QoS Dashboard; alternatively, use the default of "0" to block all ports.
        26BlacklistEditor.png

    The relevant host (24.91.160.238 in the example) is now blacklisted without having left the SIEM console. This will provide protection against the DoS attack activity detected in ESM.

     

    5.3 QoS User Activity Dashboard – Account Lockouts

    Many activities by users and administrators can cause a disruption of service.  The second dashboard, QoS User Activity Dashboard, contains several panes.

    27ActivityDashbd.png

    The Account Lockout pane indicates which Active Directory accounts have been locked out during the selected time period.  Summarize on the source user to show other events related to the lockout.

     

    Account lockouts are likely to be caused by one of two scenarios:

      1. A user entering their password incorrectly too many times and being locked out by corporate policy.  This often occurs as part of normal operations: users forget their password, have the caps lock key on, etc.  Tracking the number of account lockouts (using the Account Lockout Distribution graph) and comparing it against the baseline shows whether this is a normal occurrence. A larger than normal number of account lockouts could be an indicator of…
      2. Attack activity which is targeting local accounts in an attempt to access the environment

     

    In the first case, use the Summarize feature to summarize by the source user in question and see what other event activity is associated with the account. If there is no indication of unusual activity, it may be a good idea to proactively contact the user and determine whether they have locked themselves out.

    28AccountLockouts.png

     

    If there appears to be suspicious activity related to the account lockout then additional investigation is warranted.

     

    5.4 QoS User Activity – Configuration Change Summary

    Also included as part of the QoS User Activity Dashboard is the Configuration Change Summary pane for baselined configuration changes.

     

    Errors in configuration changes can cause disruptions to service. Firewall rule changes can mistakenly prevent inbound or outbound access, aggressive IPS policies can prevent communication and changes to devices such as routers can render networks unavailable. The Configuration Change Summary pane tracks configuration changes and can be used as a forensics tool when there is an issue reported by users. It can also be used to proactively watch configuration changes, compare them against the calculated baseline, and determine if there is either a larger than typical number of changes or if there are changes in unusual or unexpected areas of the company.

    config-change-summ.png

     

    5.5 Monitoring Privileged Accounts

    On the QoS User Activity Dashboard you will also find panes for monitoring activity by accounts with privilege.  Activity from these accounts should be closely tracked, because actions from these sources, either by mistake or through malicious intent, could cause major disruptions to service.  The Most Active Privileged Accounts pane tracks the events associated with privileged users.  In addition, the Average Privileged User Activity Severity pane ranks those events by criticality level (calculated from values associated with the type of observed events):

    30PrivilegedAccts.png

    These panes should be viewed on a regular basis and if there is either:

      1. a significant deviation from baseline in the Most Active Privileged Accounts pane (other than at times of infrastructure changes such as planned upgrades), or
      2. a significant increase in the severity of events (as indicated by both the color:  green = low, yellow = medium, red = high, and the numerical value ranging from 1 to 100) in the Average Privileged User Activity Severity pane,

    then further investigation should be undertaken to determine if there is malicious activity occurring.
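
    To make the severity check concrete, the following Python sketch averages event severity per privileged account and maps the result to a color.  It is an illustration only: the account names stand in for the contents of the PRIVILEGED_USERS variable, and the band boundaries (33 and 66) are assumptions, since the text gives the 1 to 100 range and the three colors but not the cut-off points.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for the contents of the PRIVILEGED_USERS variable.
PRIVILEGED_USERS = {"root", "administrator", "sa"}


def severity_band(score: float) -> str:
    """Map a 1-100 severity to a color. The cut-offs are assumptions."""
    if score <= 33:
        return "green"
    if score <= 66:
        return "yellow"
    return "red"


def average_severity_by_user(events):
    """events: iterable of (user, severity) pairs. Returns the average
    severity for each privileged user; other users are ignored."""
    by_user = defaultdict(list)
    for user, severity in events:
        name = user.lower()
        if name in PRIVILEGED_USERS:
            by_user[name].append(severity)
    return {name: mean(values) for name, values in by_user.items()}


# Hypothetical events, for illustration only.
events = [
    ("root", 80), ("root", 90), ("jsmith", 95),
    ("sa", 20), ("Administrator", 50),
]
averages = average_severity_by_user(events)
for name, score in sorted(averages.items()):
    print(name, score, severity_band(score))
```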

     

    To explore high levels of suspicious privileged account activity:

    31ActiveAccts.png

      1. Drill down into the event activity and select the Events drilldown:
        32Events.png

      2. View the events to determine the number and type of activity:
        33Events.png

    In this example there are a significant number of "failed password" events for the privileged "root" account that originate from the 64.111.207.2 IP address. Given that these events are all from one source it is likely that there is either a configuration error (perhaps a script attempting to access a remote service) or malicious intent behind the events.

     

    6 Going further

     

    The fundamental basis of this use case is in performance management. As part of a broader initiative the scenarios that have been presented here can be integrated into a performance management framework which ties together network, application and infrastructure monitoring. Many of these tools could also be data sources for the SIEM and provide additional context to the data gathered by traditional security logs.

     

    Consider incorporating flow data, to help determine the amount of network activity when one of the scenarios (particularly the first, a flood of events causing a large EPS spike) occurs.

     

    Additionally, when considering typical SOC activities, the DoS queries included in the QoS Dashboard would be useful for security personnel to monitor closely. These queries could easily be added to a standard SOC dashboard to enhance the view and provide timely notification of DoS type events. In addition this ties in well with McAfee's Global Threat Intelligence service to be able to determine if the source IPs are known bad actors.

     

    Automation is a key capability of the McAfee SIEM. Many of the functions can be automated via alarm actions so that when specific events occur, action is taken without the need for human intervention. Consider the account lockout scenarios listed in this document: identifying those lockout activities could automatically cause an email to be sent to your helpdesk to ensure proactive service to users who have locked out their accounts.
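
    As an illustration of the kind of notification described above, the sketch below builds a helpdesk email from a list of locked accounts using Python's standard library.  It does not represent how SIEM alarm actions are configured; the addresses, account names, and function name are hypothetical, and the message is only constructed, not sent.

```python
from email.message import EmailMessage


def build_lockout_notice(locked_users, sender, recipient):
    """Build (but do not send) a helpdesk notification for locked accounts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{len(locked_users)} account(s) currently locked out"
    lines = ["The following accounts are currently locked out:"]
    lines += [f"- {user}" for user in sorted(locked_users)]
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical addresses and account names, for illustration only.
notice = build_lockout_notice(
    {"jsmith", "adoe"},
    sender="siem@example.com",
    recipient="helpdesk@example.com",
)
print(notice)

# Sending would go through your SMTP server, for example with
# smtplib.SMTP(host).send_message(notice); that step is left out here.
```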