9.2 is in early adopters' hands, so it's time to let the world know about some of the very cool things the McAfee SIEM can do now. The documentation is a must-read for the how; this post is meant to introduce you to the why of some advanced correlation features that will ship with the product.


Correlation rules will look a bit different, for example:


Notice the blue components: these are instances of the new deviation component. If you look more closely at the toolbar, you'll notice something different as well.

[Figure: the new toolbar]

Second from the right, you see the icon for the deviation component.  When you drag it onto the panel and edit it, you see this window:


[Figure: the deviation component edit window]

NOW we're talking.


This is where you can see the two biggest additions in the 9.2 release for correlation: you can now correlate on flows as well as events, and you can now set components to fire based on deviations from the norm. Baselines have been in the product for a while, but now deviations from the baseline can be the basis of correlation rules. From a use case perspective, this enables network anomaly detection, user anomaly detection, even combinations of the two! You can also use the other components to add context or other supporting conditions that tune out random noise and false positives.


You can expect a stream of use cases and implementation guides from us in the coming months, but it doesn't hurt to start with some background on how to detect anomalous behavior. The deviation component gives you a lot of options for picking out something unusual in the stream of events; changing any one of them can give you a very different means of detection. Think of one instance of one deviation component as an indicator (some call it an observable). It's a unit of behavior, something you can capture in a sentence and answer with a "yes" or "no". In the component above, that sentence would be: "An unusual increase in the amount of traffic leaving a host (outlier bytes)." This corresponds to data exfiltration in the APT kill chain, or to an active botnet member performing its assignment. I'll go through the options from top to bottom and talk about how each one changes the indicator.




Events vs. Flows: What you select depends on what data sources your threat model needs. If you are looking for deviations in events that have a clear footprint in a log (even combinations of them), select events. If you are looking for anomalies in traffic (no clear footprint in a log), select flows. Your choice of events or flows determines which choices are available in the filter and deviation field options.




Filter: This is actually a second filter, and possibly the most challenging option on the component. This option filters events or flows BEFORE the deviation logic is applied. If you don't filter them, the deviation logic will be applied to practically ALL the events or flows. If you have multiple rules (who doesn't?) using deviation components (who wouldn't?), you could create a performance problem for yourself. No doubt some use cases require this. This is where looking at your indicator definition helps. My advice: if you are NOT filtering these events or flows, your indicator may be a little too vague. My indicator here is "traffic leaving a host", with outbound traffic to an external network implied. Here the filter makes that explicit.
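To make the "filter first" idea concrete, here is a minimal Python sketch, assuming a hypothetical flow record with `src_ip`, `dst_ip`, and `dst_bytes` fields (illustrative names, not the SIEM's schema). It approximates "internal" with the standard library's `ipaddress` `is_private` check; a real deployment would match against its own internal ranges.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Flow:
    # Hypothetical flow record; field names are illustrative only.
    src_ip: str
    dst_ip: str
    dst_bytes: int


def is_outbound_external(flow: Flow) -> bool:
    """True when traffic starts on a private (internal) address and leaves
    for a non-private (external) one."""
    src = ipaddress.ip_address(flow.src_ip)
    dst = ipaddress.ip_address(flow.dst_ip)
    return src.is_private and not dst.is_private


def prefilter(flows: list[Flow]) -> list[Flow]:
    # Runs BEFORE any deviation math, so the statistics only ever see
    # the traffic the indicator is actually about.
    return [f for f in flows if is_outbound_external(f)]


flows = [
    Flow("10.0.0.5", "52.10.20.30", 5_000),   # internal -> external: kept
    Flow("10.0.0.5", "10.0.0.9", 90_000),     # internal -> internal: dropped
    Flow("151.101.1.1", "10.0.0.5", 1_200),   # external -> internal: dropped
]
print(prefilter(flows))
```

Dropping the internal-to-internal flow matters most here: it carries the most bytes and would otherwise dominate the baseline for "traffic leaving a host".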



Deviation Type and Threshold: Similar to the features available in an alarm, these are alternatives to comparisons like =, >, <, etc. Raw value allows you to look for deviations over a specific amount, which requires you to have some prior analysis that gives you those numbers. If you don't have that, the other types might be a better way to determine "unusual". While more detailed explanations are available elsewhere, in short you use standard deviation for an indicator that identifies when a value in a set of data falls far outside the range of expected values. What is cool about correlation in the SIEM is that you can group by things like source user. Now you have an indicator that tells you when a value is unusual for that user, which is much more powerful. Statistics can give you gems like "the average US family has 2.3 kids", but the group-by functionality gives you something far more meaningful. Power users would trigger an alarm on an arbitrary threshold, but they would not trigger on a deviation threshold grouped by user. Infrequent users would fall below an arbitrary threshold based on the overall average, but even small changes in their usage pattern would trigger on a deviation threshold grouped by user (there is a way around this; future blog post for sure).
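The power-user versus infrequent-user contrast can be sketched in a few lines of Python. The daily byte totals, the fixed cutoff of 500, and the 2.0 threshold are made-up illustrative values, and the z-score (using the population standard deviation) is a generic stand-in; the SIEM's exact calculation may differ.

```python
from statistics import mean, pstdev

# Made-up daily outbound byte totals per user (the baseline), plus today's value.
history = {
    "power_user": [900, 1000, 1100, 950, 1050],
    "light_user": [10, 12, 9, 11, 10],
}
today = {"power_user": 1020, "light_user": 60}

FIXED_CUTOFF = 500          # one arbitrary threshold applied to everyone
DEVIATION_THRESHOLD = 2.0   # standard deviations, evaluated per user


def z_score(value: float, samples: list[float]) -> float:
    """Number of standard deviations `value` lies from the mean of `samples`."""
    sigma = pstdev(samples)
    if sigma == 0:
        return 0.0 if value == mean(samples) else float("inf")
    return (value - mean(samples)) / sigma


def evaluate(today, history):
    """Compare a fixed cutoff with a deviation threshold grouped by user."""
    results = {}
    for user, value in today.items():
        z = z_score(value, history[user])  # baseline is this user's own history
        results[user] = {
            "fixed_alarm": value > FIXED_CUTOFF,
            "z": z,
            "deviation_alarm": z > DEVIATION_THRESHOLD,
        }
    return results


for user, r in evaluate(today, history).items():
    print(f"{user}: fixed={r['fixed_alarm']} z={r['z']:.1f} deviation={r['deviation_alarm']}")
```

The power user trips the fixed cutoff but sits well inside its own normal range, while the light user never reaches the cutoff yet lands dozens of standard deviations above its own baseline.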


Besides the deviation type, you have to pick a threshold. This is a measure of how unusual the value you are looking for is in the scheme of things. The graphic below of the beloved bell curve gives you an idea of where your choice will fall on the curve. I usually go with 2.0 and adjust from there. For roughly normally distributed data, this triggers on values that fall outside about 95% of the occurrences. As you can see, percent is a close cousin of standard deviation, but requires less math background. You can look up the 68-95-99.7 rule for that relationship.
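The 68-95-99.7 rule follows from the normal distribution, and you can check it with Python's standard `math.erf`. This sketch assumes the measured values are roughly normally distributed; real traffic often is not, so treat the percentages as approximations. The headline figures are two-sided: with a one-sided operator such as "Greater Than", a 2.0 threshold flags only the upper tail, roughly 2.3% of occurrences.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1.0, 2.0, 3.0):
    inside = coverage_within(k)
    upper_tail = (1 - inside) / 2  # what a one-sided "Greater Than" check would flag
    print(f"{k:.1f} sigma: {inside:.2%} inside, {1 - inside:.2%} outside, "
          f"{upper_tail:.2%} in the upper tail")
```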




Deviation Operator: This is closely related to the deviation threshold. Looking at the diagram above, you can see that standard deviation is symmetric: you can go n standard deviations above and n standard deviations below the mean. The question is: do you want both? If you are looking for unusual upticks, select "Greater Than". If you are looking for unusual downswings, select "Less Than". Again, the indicator should be specific enough to make this choice a no-brainer.
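Here is a minimal sketch of how the operator changes the same threshold check, assuming a z-score style comparison. The `Operator` enum and `deviates` helper are hypothetical names for illustration; the `EITHER` member models the symmetric case for comparison and is not meant to describe a specific product option.

```python
from enum import Enum


class Operator(Enum):
    GREATER_THAN = "greater"  # unusual upticks only
    LESS_THAN = "less"        # unusual downswings only
    EITHER = "either"         # symmetric check, shown for comparison


def deviates(value: float, mean: float, sigma: float, threshold: float, op: Operator) -> bool:
    """Does `value` lie more than `threshold` standard deviations from `mean`
    in the direction selected by `op`?"""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (value - mean) / sigma
    if op is Operator.GREATER_THAN:
        return z > threshold
    if op is Operator.LESS_THAN:
        return z < -threshold
    return abs(z) > threshold


# A baseline of 1000 bytes with a standard deviation of 100, threshold 2.0.
for value in (1300, 700):
    print(value, {op.name: deviates(value, 1000, 100, 2.0, op) for op in Operator})
```

For the exfiltration indicator only the uptick direction matters: a host sending unusually little data is not evidence of exfiltration, so a symmetric check would just add noise.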



Calculation Type: Differentiating these options could be an entire post or series of posts. Putting it in terms of how you implement an indicator: average per event looks at each individual event for an outlier attribute, picking out surges in the stream; total sum looks at buckets and picks out unusually large or small ones; cardinality tells you whether you are looking at an unusual variety rather than an unusual number. You can expect illustrative use cases and threat models in coming posts. For now, let's take a knee together and agree that Total Sum and Cardinality are your best bets. Whether you go with one or the other depends on your indicator: if you can describe it with words like "count" or "quantity", go with Total Sum; if you can describe it with words like "distinct" or "variety", go with Cardinality. If you think of a threat model as something composed of indicators, a good threat model will have some indicators that use Total Sum and some that use Cardinality.
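One way to picture the difference between Total Sum and Cardinality is to compute both for the same time buckets. This Python sketch assumes hypothetical flow tuples of (hour bucket, source host, destination IP, bytes); it only produces the per-bucket values, which a deviation check would then compare against the baseline. Average per event is not shown.

```python
from collections import defaultdict

# Hypothetical outbound flows: (hour_bucket, source_host, destination_ip, bytes)
flows = [
    (0, "hostA", "52.10.20.30", 4_000),
    (0, "hostA", "52.10.20.30", 6_000),
    (1, "hostA", "52.10.20.31", 500),
    (1, "hostA", "52.10.20.32", 500),
    (1, "hostA", "52.10.20.33", 500),
]


def per_bucket(flows):
    """Return {(bucket, host): (total_bytes, distinct_destinations)}."""
    totals = defaultdict(int)
    destinations = defaultdict(set)
    for bucket, host, dst, nbytes in flows:
        totals[(bucket, host)] += nbytes       # Total Sum: "how much?"
        destinations[(bucket, host)].add(dst)  # Cardinality: "how many different?"
    return {key: (totals[key], len(destinations[key])) for key in totals}


for (bucket, host), (total, distinct) in sorted(per_bucket(flows).items()):
    print(f"hour {bucket} {host}: total bytes={total}, distinct destinations={distinct}")
```

Hour 0 stands out on Total Sum (10,000 bytes to a single destination), while hour 1 stands out on Cardinality (three destinations for far fewer bytes), which is why a threat model benefits from indicators of both kinds.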



Deviation Field: Your choices here are determined by whether you selected Events, Flows, or both at the top of the deviation component. This is the value you are measuring for unusual behavior; since we were looking at outbound traffic, destination bytes is the way to go. The work you put into the indicator should drive your choice of field. I can't say that is easy, but it is made possible by defining the behavior well and knowing the data well. Those two are not always available at the same time; we hope to add content on our rules server to help out in this respect.



Sample Size: Statistical measures on their own are a bit oblique in how they describe data. The key to making statistical measures work for threat detection is to make them time-based. By this I mean that the time period over which you compare events helps you tie numbers to behavior. The time range you choose here causes the data to be put into buckets based on time, and the calculations are then performed on those buckets. It is key to tuning out false positives. For instance, for user behavior indicators, I find that 7 days is a solid sample size. We aren't robots; we don't do the same activities the same amount every day. When you go up to a week, this smooths out. For machine behavior indicators, a week is "too smooth". Everything will look normal over a long enough time period, so go for a day or even an hour in this case. The sample size is set per deviation component, so you CAN and SHOULD have different sample sizes for different components in your rule. I gave the example of user vs. machine behavior, but there are many other things to consider.
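The smoothing effect of a longer sample size can be shown by bucketing the same activity two ways. This sketch assumes hypothetical (timestamp, value) events with a made-up weekday/weekend pattern, and uses the coefficient of variation (standard deviation divided by mean) as a simple measure of how "bumpy" the buckets are; it is an illustration, not the SIEM's bucketing logic.

```python
from collections import defaultdict
from statistics import mean, pstdev

DAY = 24 * 60 * 60  # seconds
WEEK = 7 * DAY


def bucket_totals(events, size_seconds):
    """Sum event values into fixed-width time buckets, keeping empty buckets as 0."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[timestamp // size_seconds] += value
    first, last = min(totals), max(totals)
    return [totals.get(k, 0) for k in range(first, last + 1)]


def variability(samples):
    """Coefficient of variation: spread relative to the mean (0.0 means perfectly flat)."""
    return pstdev(samples) / mean(samples)


# Made-up user activity: busy weekdays, quiet weekends, repeated for three weeks.
weekly_pattern = [100, 120, 90, 110, 105, 10, 5]
events = [(day * DAY, weekly_pattern[day % 7]) for day in range(21)]

daily = bucket_totals(events, DAY)
weekly = bucket_totals(events, WEEK)
print(f"{len(daily)} daily buckets, variability {variability(daily):.2f}")
print(f"{len(weekly)} weekly buckets, variability {variability(weekly):.2f}")
```

With daily buckets, every quiet weekend looks like a dip; with weekly buckets, the three weeks are identical and the variability drops to zero. That is the smoothing that helps for user behavior, and also why a week can hide the short bursts that matter for machine behavior.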


I have given an overview of the new deviation component and of how you can use network flows with it. The use case drives the threat model, which drives the indicators, but it helps to understand what choices you have in shaping those indicators.


Grant Babb