
Web Gateway: Understanding the Rule Engine and Optimizing your Rules

Introduction

One of the most beautiful things about McAfee Web Gateway is its near-infinite flexibility. If you can dream up an idea for a rule, you can probably make it happen. This flexibility can be a double-edged sword, however: while it gives you nearly limitless potential, it also introduces the possibility of inefficiencies in your ruleset logic. My goal with this document is to equip you with the understanding and techniques you need to get the best possible performance out of your Web Gateway.

Understanding Cycles

Before we can discuss rulesets, rules, and how to lay them out, we must first grasp the concept of cycles. Web Gateway has four cycles. I will touch on all of them briefly here, but for the rest of the document we’re only going to concern ourselves with the first three.

    • Request
    • Response
    • Embedded
    • Logging

The Request Cycle works with anything available in the initial user request. This means things like URL, Client IP, User Name (if we’re authenticating), and Headers sent by the client’s browser will be available.

The Response Cycle works with, as you’d guess, the response coming back from the web server after we’ve completed the request cycle. This will have the actual data requested, as well as any server-side headers.

The Embedded Cycle comes into play when we have an opener called in either the Request or Response Cycle. Openers allow your Web Gateway to look more deeply into content of a given type. Currently, the two openers available are:

    • Composite Opener - used for looking inside container files such as .zip, .exe, etc.
    • HTML Opener - very rarely used, typically only in very advanced and specialized configs

The Logging Cycle kicks off after the Request, Response and all Embedded cycles have completed -- allowing you to write to log files.

Important note about the logging cycle

If you have values in your access.log that you always want filled (for example: category information) -- you might have to create a rule at the very top of your rule sets to call the properties that fill these values, like so:

The action on this kind of population rule would be 'Continue', since we don't want to block the traffic but still want all subsequent rules to apply.

Without this kind of rule, some log fields may be left blank if a block action or a stop-cycle action is hit before the property behind that field has been referenced.

Here is an example that illustrates the issue:

Your very first rule is your Global Whitelist, which contains youtube.com and whose action is Stop Cycle. When a user goes to youtube.com, the request will be allowed, but your log files will not show the category information for youtube.com, because the property URL.Categories was never referenced by any rule. To prevent this, you can create an initialization rule above your Global Whitelist that uses the property URL.Categories.
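To make the effect concrete, here is a small Python toy model of the youtube.com scenario. It is not MWG rule syntax or MWG's API. It assumes, as this section describes, that a property is only filled when a rule references it and that the logging cycle simply reads whatever has been filled. The lookup function, the whitelist contents, and the log format are all hypothetical.

```python
# Toy model, not MWG code: a property is only filled when a rule references it,
# and the logging cycle only reads values that are already filled.

def lookup_categories(url: str) -> list[str]:
    # Hypothetical stand-in for the URL.Categories lookup.
    return ["Streaming Media"] if "youtube.com" in url else ["Uncategorized"]

class Transaction:
    def __init__(self, url: str) -> None:
        self.url = url
        self.filled: dict[str, list[str]] = {}

    def url_categories(self) -> list[str]:
        # First reference triggers the lookup; later references reuse it.
        if "URL.Categories" not in self.filled:
            self.filled["URL.Categories"] = lookup_categories(self.url)
        return self.filled["URL.Categories"]

GLOBAL_WHITELIST = {"youtube.com"}  # hypothetical list contents

def request_cycle(tx: Transaction, init_rule: bool) -> str:
    if init_rule:
        tx.url_categories()  # initialization rule, action: Continue
    host = tx.url.split("/")[2]
    if host in GLOBAL_WHITELIST:
        return "stop cycle"  # no later rule in this cycle runs
    if "Gambling" in tx.url_categories():
        return "block"
    return "allow"

def log_line(tx: Transaction) -> str:
    # Logging reads what is there; it does not start a lookup itself.
    categories = tx.filled.get("URL.Categories", [])
    return f"{tx.url} categories={','.join(categories)}"

for init in (False, True):
    tx = Transaction("https://youtube.com/watch")
    request_cycle(tx, init_rule=init)
    print(log_line(tx))
# https://youtube.com/watch categories=
# https://youtube.com/watch categories=Streaming Media
```

Without the initialization rule the category field stays empty. With it, the value has already been filled by the time the whitelist stops the cycle.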

Here is a graphical representation of how the request and response cycle work, when handling a request from a client.

As a general principle, you want to get any traffic that is going to be ‘blocked’ out of the way as early as possible, to limit the amount of work your MWG needs to do before blocking it.

For example, take URL filtering. We already have the URL in the request cycle, so we can perform URL filtering there (which is ideally where we’d want to do it). If we were to perform URL filtering in the response cycle, we would have already retrieved the page, only to find out that the request is to be blocked.

Information that we have right off the bat (URL, Client.IP, etc.) should be checked in the request cycle whenever possible. If we’re already checking the Client.IP or URL in the request cycle and allowing it, there is no point in checking the same Client.IP/URL again in the response cycle. It’s simply additional work that doesn’t need to be done.
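The sketch below is a toy Python model, not MWG code. It shows why the choice of cycle matters: the same blacklist check either prevents the upstream fetch (request cycle) or runs only after the page has already been downloaded (response cycle). The `Upstream` class, the blacklist, and the host names are hypothetical.

```python
# Toy model, not MWG code: where a URL-filter rule runs decides whether the
# gateway fetches a page that it is going to block anyway.

BLOCKED_HOSTS = {"bad.example"}  # hypothetical blacklist

class Upstream:
    """Stands in for the web server; counts how often we fetch from it."""
    def __init__(self) -> None:
        self.fetches = 0

    def fetch(self, host: str) -> bytes:
        self.fetches += 1
        return b"<html>page body</html>"

def handle(host: str, upstream: Upstream, filter_in: str) -> str:
    # Request cycle: only what the client sent (URL, Client.IP, ...) is known.
    if filter_in == "request" and host in BLOCKED_HOSTS:
        return "blocked"
    body = upstream.fetch(host)  # the request goes out to the web server
    # Response cycle: the body and server headers are now available.
    if filter_in == "response" and host in BLOCKED_HOSTS:
        return "blocked"
    return f"delivered {len(body)} bytes"

for mode in ("request", "response"):
    server = Upstream()
    print(mode, handle("bad.example", server, mode), server.fetches)
# request blocked 0
# response blocked 1
```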

As a rule of thumb, these are the kinds of things you’d want to be doing in each cycle:

Request:

        • URL Filtering
        • Blacklisting
        • User Authentication
        • Rules based on browser-sent headers (User-Agent, etc)
        • Anti-Malware Scanning for Uploads

Response:

        • Anti-Malware Scanning
        • Media Type Filtering
        • Rules based on website-sent headers (Content-Length, etc)

Embedded:

        • Body filtering for specific content
        • Anti-Malware Scanning (if using Composite Opener to look into archives/etc)
        • Media Type Filtering

There are some other things that we will sometimes want to occur in both request and response cycles - such as Whitelisting.

Understanding Criteria

With both rulesets and rules, we can specify criteria to limit when a particular rule or ruleset will trigger. This is useful as we generally will not want to apply the same rules to all users across the board.

Some of the most common criteria used are:

  • URL / URL.Host (used for looking at URL or URL host)
  • Client.IP (IP address of client machine making request)
  • URL.Categories (Categories the requested URL falls into from the TrustedSource db)
  • Proxy.Port (The current proxy port being used - can be useful to differentiate between different clients)
  • System.Hostname (Useful if you want rules to only occur on a single Web Gateway in a cluster)

There are many more criteria available; you can see a full listing of them in the product guide, addendum A.

One important thing to keep in mind is that referencing a criterion whose value has not yet been filled will initiate whatever mechanism is required to fill it.

For example, the first time you reference URL.Categories, MWG will perform a URL lookup at that point in processing. Likewise, the first time you reference Antimalware.Infected, MWG will start its anti-malware scanning process.

Because of this, optimal performance depends on structuring the ruleset so that as many ‘cheap’ criteria as possible are checked first, before resorting to ‘expensive’ criteria such as anti-malware scanning. This applies both to large-scale ruleset design and, on a smaller scale, to rules or rulesets with multiple criteria.

If we can block a website based on having undesirable URL categories, then your Web Gateway will never have to scan it for viruses since the traffic will already have been blocked.
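The "filled on first reference" behaviour can be modelled as lazy evaluation. The Python toy model below is not MWG's implementation. It computes each property only when a rule first asks for it, so a category block placed ahead of the malware check means the scan never runs. The provider functions and their results are hypothetical.

```python
# Toy model, not MWG code: each property is computed the first time a rule
# references it, so rule order decides which lookups ever happen.
from typing import Callable

class LazyProperties:
    def __init__(self, providers: dict[str, Callable[[], object]]) -> None:
        self._providers = providers
        self._values: dict[str, object] = {}
        self.computed: list[str] = []  # which providers actually ran

    def __getitem__(self, name: str) -> object:
        if name not in self._values:  # first reference fills the value
            self.computed.append(name)
            self._values[name] = self._providers[name]()
        return self._values[name]     # later references reuse it

def make_properties() -> LazyProperties:
    # Hypothetical results; a real gateway would do a category lookup and an AV scan.
    return LazyProperties({
        "URL.Categories": lambda: ["Gambling"],
        "Antimalware.Infected": lambda: False,
    })

def evaluate(props: LazyProperties) -> str:
    if "Gambling" in props["URL.Categories"]:  # cheaper criterion first
        return "block (category)"
    if props["Antimalware.Infected"]:          # expensive criterion last
        return "block (malware)"
    return "allow"

props = make_properties()
print(evaluate(props), props.computed)  # block (category) ['URL.Categories']
```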

Criteria by Cost/Weight

Low:

        • Client.IP
        • URL / URL.Host
        • Proxy.IP / Proxy.Port
        • URL.Categories*
        • System.Hostname
        • HTTP Headers

Medium:

        • URL.Destination.IP*
        • MediaType.EnsuredTypes
        • Authentication*

High:

        • Antimalware.Infected
        • DLP
        • HTML Opener (Event)
        • Composite Opener (Event)

* Properties marked with an asterisk rely on external services such as Active Directory, DNS, or cloud lookups, which can introduce delays beyond Web Gateway's control.

Rule Engine Logic

It is possible to combine multiple criteria. With two criteria, the logical operators AND and OR come into play. Note that with AND, if the first criterion checked is false, the second is not checked; likewise, with OR, if the first criterion is true, the second is not checked.

As a rule of thumb, use the less expensive of the two as your first criterion.

AND rule (two variables)

Web Gateway will first check the client’s IP address. If it does not match, the rule is not applied and processing continues through the ruleset.

If the client IP is 1.2.3.4, then and only then does Web Gateway perform the additional check of whether the URL host matches the wildcard *abcd.com. If it does, the request is blocked. If it doesn’t, the rule is not applied and the traffic continues on.

We check the client IP first because, if the user has a different IP, we never have to run the wildcard match against URL.Host. Using the ‘matches’ operator with wildcards/asterisks isn’t especially taxing, but it is more work than a direct comparison against Client.IP.
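The short-circuit behaviour described above works the same way as Python’s `and`. This toy model (not MWG syntax) records which criteria actually run. Python’s `fnmatchcase` stands in for MWG’s ‘matches’ wildcard operator, and its exact semantics may differ from MWG’s.

```python
# Toy model, not MWG code: Python's `and` short-circuits the way the rule
# engine does, so putting the cheaper criterion first skips the other one.
from fnmatch import fnmatchcase

evaluated: list[str] = []  # records which criteria actually ran

def client_ip_equals(ip: str, expected: str) -> bool:
    evaluated.append("Client.IP")  # direct comparison: cheap
    return ip == expected

def url_host_matches(host: str, pattern: str) -> bool:
    evaluated.append("URL.Host")   # wildcard match: somewhat more work
    return fnmatchcase(host, pattern)

def and_rule_blocks(ip: str, host: str) -> bool:
    # Block if Client.IP equals 1.2.3.4 AND URL.Host matches *abcd.com
    return client_ip_equals(ip, "1.2.3.4") and url_host_matches(host, "*abcd.com")

print(and_rule_blocks("5.6.7.8", "www.abcd.com"), evaluated)
# False ['Client.IP']
```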

OR rule (two variables)

With this rule, if the first criterion is true, we won’t bother to check the second, since the rule’s condition is already confirmed as true.

So, we will first check to see if the URL.Host matches *xyz.com. If it does, we will stop the search and apply our action (Block).

If it doesn’t match, we proceed to determine URL.Destination.IP (by performing a DNS lookup) and check whether it falls within the range. If it does, we block. If not, neither criterion matches and the rule is not applied.

In this example, we use the URL.Host wildcard check first. A quick wildcard check against URL.Host (a value we have right from the start) adds far less work and latency than asking your MWG to go out and do a DNS lookup, which is what the URL.Destination.IP criterion requires.

If we match *xyz.com, there is no need to check the second criterion, because this is an OR statement.

More than two criteria

When you get beyond two criteria, you can involve another level of complexity: parentheses (). These work just as they did in algebra class, meaning whatever is inside them is evaluated first.

We generally suggest keeping your rules as straightforward as possible (ideally no more than two criteria per rule/ruleset). This is not because MWG cannot handle the complexity, but because incredibly complex rules can be very difficult for you, as the administrator, to read later on.

Compare the two sets of rules and see which is easier to understand logically.

Both of these rulesets accomplish the same thing. The first does it all in a single rule with four criteria and a couple of sets of parentheses.

The second splits the logic out into multiple rules, with no more than two criteria per rule. It’s also much easier to read and understand at first glance.

If the URL.Host matches *testdomain.com, then we check the proxy port. Any proxy ports other than 9090 will result in a block page. After that, we check to see if the user is not an admin user (signified by the group membership and IP range), and if they are not, they will be blocked.
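The original rule screenshots are not available, so the Python toy model below reconstructs the logic only from the description above. It assumes that "admin" means membership in a hypothetical "Admins" group AND a client IP inside a hypothetical 10.10.0.0/16 range. It writes the logic once as a single expression with parentheses and once as a ruleset of small rules, then checks that both forms agree on every combination of some sample inputs.

```python
# Toy model, not MWG code: one rule with four criteria and parentheses versus
# the same logic split into a ruleset with small rules. True means "block".
import ipaddress
from fnmatch import fnmatchcase
from itertools import product

ADMIN_NET = ipaddress.ip_network("10.10.0.0/16")  # hypothetical admin IP range

def is_admin(groups: set[str], ip: str) -> bool:
    # Assumption: an admin is in the hypothetical "Admins" group AND in ADMIN_NET.
    return "Admins" in groups and ipaddress.ip_address(ip) in ADMIN_NET

def combined_rule(host: str, port: int, groups: set[str], ip: str) -> bool:
    return fnmatchcase(host, "*testdomain.com") and (
        port != 9090 or not is_admin(groups, ip))

def split_rules(host: str, port: int, groups: set[str], ip: str) -> bool:
    if not fnmatchcase(host, "*testdomain.com"):  # ruleset criterion
        return False
    if port != 9090:                              # rule 1: wrong proxy port
        return True
    if not is_admin(groups, ip):                  # rule 2: not an admin
        return True
    return False

def rules_agree() -> bool:
    # Compare both forms on every combination of a few sample inputs.
    cases = product(["www.testdomain.com", "example.org"], [9090, 8080],
                    [{"Admins"}, set()], ["10.10.1.1", "172.16.0.1"])
    return all(combined_rule(*c) == split_rules(*c) for c in cases)

print(rules_agree())  # True
```

The split version reads top to bottom like the ruleset itself, which is the readability argument made above.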

One last note about criteria: if you find yourself wanting to add more than two of any specific thing (Client.IPs, URL.Host checks, etc.), or expect to add more of them in the future, you would be well served to create a list and then use the ‘is in list’ or ‘matches in list’ operator.

This will help to keep your criteria neat and easy to read, but allow you to have large lists of data in situations where it might be appropriate/necessary (such as whitelist/blacklists, group policy assignments, etc).
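As a rough Python analogue of the two list operators (not MWG’s implementation), ‘is in list’ behaves like exact set membership and ‘matches in list’ like a wildcard match against each entry. `fnmatchcase` only approximates MWG’s wildcard syntax, and the list contents are hypothetical. With either operator, adding a new host or IP changes the list’s data, not the rule.

```python
# Toy model, not MWG code: the two list operators, approximated in Python.
# MWG's own wildcard syntax may differ from fnmatch's.
from fnmatch import fnmatchcase

# Hypothetical list contents; in MWG these would be list objects.
WHITELIST_HOSTS = ["*.example.com", "update.example.org"]
ADMIN_CLIENT_IPS = {"10.0.0.10", "10.0.0.11"}

def is_in_list(value: str, items: set[str]) -> bool:
    # 'is in list': exact match against any entry
    return value in items

def matches_in_list(value: str, patterns: list[str]) -> bool:
    # 'matches in list': wildcard match against any entry
    return any(fnmatchcase(value, pattern) for pattern in patterns)

print(is_in_list("10.0.0.10", ADMIN_CLIENT_IPS))                # True
print(matches_in_list("download.example.com", WHITELIST_HOSTS))  # True
print(matches_in_list("example.net", WHITELIST_HOSTS))           # False
```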

Here’s an example whitelist ruleset that makes use of lists:

Rulesets and Policy Architecture

Now that we understand cycles, we can look at rulesets. Rulesets are the means by which we organize our rules and sub-rulesets, making a configuration easier to understand and manage. Rulesets are also where we specify which cycle(s) the rules and sub-rulesets within them run in. Much like with criteria, maximum performance depends on structuring your rulesets so that they progress from ‘least expensive’ to ‘most expensive’. While no one specific layout is necessarily ‘correct’, from reviewing a number of configurations, a good rule of thumb is a layout that looks something like this:

    • Whitelists/Blacklists
    • SSL Scanner
    • Authentication
    • URL Category filtering
    • Common rules (cache/progress indications/composite opener)
    • Media Type Filtering
    • Gateway Anti-Malware

Obviously, all of these are optional - as you can pick and choose what rulesets you wish to use in your web gateway configuration.

The vast majority of customers tend to go with a rather stock layout when it comes to the majority of the rulesets. By and large, the bulk of the customization comes by way of whitelist/blacklists, and applying URL Category Filtering based on criteria (Username/Group/IP/etc).
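The following Python toy model (not MWG’s rule engine) shows how such a layout behaves. Rulesets are evaluated top to bottom, each one only in the cycles it is enabled for, and the first block or stop-cycle action ends the cycle. The ruleset names follow the list above, while the rules inside them and the transaction fields are hypothetical.

```python
# Toy model, not MWG code: rulesets run top to bottom, each only in the
# cycles it is enabled for; the first block or stop-cycle ends the cycle.
from dataclasses import dataclass
from typing import Callable, Optional

Rule = Callable[[dict], Optional[str]]  # returns "block", "stop cycle" or None

@dataclass
class RuleSet:
    name: str
    cycles: frozenset[str]
    rules: list[Rule]

def run_cycle(cycle: str, rulesets: list[RuleSet], tx: dict) -> str:
    for ruleset in rulesets:
        if cycle not in ruleset.cycles:
            continue  # ruleset is not enabled for this cycle
        for rule in ruleset.rules:
            action = rule(tx)
            if action == "block":
                return f"blocked by {ruleset.name}"
            if action == "stop cycle":
                return f"stopped in {ruleset.name}"
    return "continue"

# Hypothetical policy, cheapest rulesets first.
POLICY = [
    RuleSet("Global Whitelist", frozenset({"request", "response"}),
            [lambda tx: "stop cycle" if tx["host"] == "intranet.example" else None]),
    RuleSet("URL Filtering", frozenset({"request"}),
            [lambda tx: "block" if "Gambling" in tx["categories"] else None]),
    RuleSet("Gateway Anti-Malware", frozenset({"response"}),
            [lambda tx: "block" if tx.get("infected") else None]),
]

print(run_cycle("request", POLICY, {"host": "casino.example", "categories": ["Gambling"]}))
# blocked by URL Filtering
```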

User-Defined Properties

User-Defined Properties can help you reduce the number of checks needed for rule evaluations. For example, group memberships in enterprise environments can get fairly complex; it is not uncommon to see users with several hundred group memberships in AD.

On the Web Gateway side, you would have to check against that long list of groups every time you need the group membership for policy assignments. Instead, you could do the check once and write the resulting policy name into a user-defined property.

Once you have your User-Defined variable set, you can decide which rules to apply based on this simple string variable instead of having to check the whole list of group memberships every time.

The check of User-Defined.URLFilteringPolicy equals "Admins" is cheaper than the check for Authentication.UserGroups contains "Administrators".
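The pattern can be sketched as a Python toy model, not MWG configuration. The long group list is scanned once, the resulting policy name is stored under a key named after the article’s User-Defined.URLFilteringPolicy property, and later rules compare only that short string. The group-to-policy mapping and the filtering outcomes are hypothetical.

```python
# Toy model, not MWG code: map the long group list to a policy name once,
# store it, and let later rules compare a single string.

# Hypothetical group-to-policy mapping, checked in priority order.
POLICY_MAP = [("Administrators", "Admins"), ("Marketing", "Marketing")]

def map_policy(user_groups: list[str]) -> str:
    groups = set(user_groups)  # one pass over possibly hundreds of groups
    for group, policy in POLICY_MAP:
        if group in groups:
            return policy
    return "Default"

def url_filtering(tx: dict) -> str:
    # Later rules only look at the stored policy name.
    policy = tx["User-Defined.URLFilteringPolicy"]
    if policy == "Admins":
        return "admin filtering"
    if policy == "Marketing":
        return "marketing filtering"
    return "default filtering"

tx = {"Authentication.UserGroups": ["Domain Users", "VPN", "Administrators"]}
tx["User-Defined.URLFilteringPolicy"] = map_policy(tx["Authentication.UserGroups"])
print(url_filtering(tx))  # admin filtering
```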

If you are interested in more information about policy mappings, please see this article:

Keep in mind that this is just one example of how to use User-Defined Properties. You can take advantage of this feature any time you need to temporarily store information for later use. User-Defined Properties persist for the duration of a transaction (request + response + logging).

Conclusion

You should now have a better understanding of how MWG works with cycles, logic and criteria -- and can use this knowledge to help weed out the inefficiencies in your configuration.

Takeaways:

  • No more than 2 criteria per rule (for easy administration!)
  • Remember cheap vs expensive criteria! Block as much ‘cheaply’ as you can.
  • Use appropriate cycles for your rules. There’s no need to run a URL-Category rule in the response cycle, since we could have blocked it in the request and saved the time and bandwidth!
Comments
Regis

Very informative and well written article.

One nit that might trigger a minor edit to remove a word: " than asking your MWG to go out and do a reverse DNS lookup -- which is what the URL.Destination.IP criteria has your MWG do". I believe that's just a regular forward DNS lookup that URL.Destination.IP is doing, no?


Thanks again for the content.


You are correct, sir. I'll talk to Justin about correcting that.

jacek

For example: I do a DNS lookup by checking URL.Destination.IP.

I do this two or more times across all my rule sets.

I'm not sure: once the DNS lookup has been performed, will the result be cached in memory, or will the DNS server be asked for resolution two or more times?

The same goes for Antimalware.Infected.

I'm checking for malware twice (checking Antimalware.Infected). Will the second check take as much time as the first, or will the result be cached and returned immediately?

I'm thinking about MWG 7.6.

Regis

Pay heed to these tips, even if you have no desire for hotrodding.  You can use them to avoid slowrodding your rulesets too.  :-)

A couple of concepts covered in this article recently bit me in the /dev/null very hard, because I hadn't paid attention to them when slapping together a rule that leveraged MediaType.EnsuredTypes (listed as a medium-impact property) and I'd screwed up the order of an AND clause.

The rule I'd brilliantly crafted was a "whitelist streaming media for host XXXX" rule. It had 3 clauses, and I did them in this order: MediaType.EnsuredTypes at least one in list (a list of 20 or so streaming media types), Client.IP equals (IP of a latency-sensitive webcasting box), and URL.Host matches in list (sites used by the webcasting provider), with action Stop Cycle. This was up top in the Global Whitelist rule set.

The net result was that I was boneheadedly evaluating the media type of EVERY SINGLE FILE FROM EVERY SINGLE REQUEST GOING THROUGH THE WEB GATEWAY.    Amazingly, we didn't notice the performance issue this caused for over a year.    Support found it after everyone was banging their heads on the desk for several days when load got large enough for the 5000b hardware to notice (Windows 10 update downloads were involved.  There was wailing and gnashing of teeth).

The fix was simple: just move the MediaType.EnsuredTypes clause to the END of the ANDs and lead the rule with the Client.IP and URL.Host clauses. That way the media type is only evaluated for files requested by that one host from just those destinations.

jacek

I understand that criteria order is very important, but it wasn't my question.

I need to use URL.Destination.IP three times in single cycle - once in Global Whitelist, once in Bandwidth throttling and once in URL Filtering rulesets.

Does Web Gateway perform DNS resolution once (doing the DNS query at the first occurrence and caching the result for later lookups in this cycle) or 3 times?

feickholt

If you are not sure: store the first result in a user-defined variable and use that.

Regis

My comment wasn't in reply to your question but another comment in support of the article. 

My suspicion is that yes, if you do the operations twice in your ruleset and for some reason both are evaluated before hitting a block, you're taking a performance hit. Rule tracing central will tell ya, I reckon. It'll tell you how much time, too.

deathbywedgie

I was once told by a McAfee employee (I wish I could remember who) that he was pretty sure that MWG retained the result of each comparison so that it didn't have to recalculate if the same comparison comes up again later in the policy. This would seem to explain why many properties contain no values when used in log rules unless they are evaluated in the policy... the log rules do not actually analyze the event, thus they can't generate the value, but they are able to tap into the "cached" results.

Is there anyone on here who might be able to confirm? If a huge comparison is performed, such as comparing URL.Destination.IP to a giant list, I'd certainly like to know that it doesn't have equal impact if it's mistakenly evaluated more than once.

(A more practical example would be if someone accidentally has an AV rule in two places for the same cycle, but since AV is its own beast it might work differently, and the engine itself might do more caching than the rules engine.)

Can confirm MWG will perform semi-intelligent (if not intelligent) caching of property evaluation for the transaction cycle.

So if you use the property URL.Destination.IP multiple times, MWG will not perform a DNS lookup multiple times. Same goes for AV, if we have multiple AV rules using the same settings in succession, we'll use the cached property result to reduce any performance hits.

Perhaps in the early days (7.0) caching wasn't as robust, but around 7.2 more robust caching of results was added.

Edit: Property evaluation caching (i.e., what is the URL.Destination.IP?) wouldn't have an impact on property-to-list evaluation (is this URL.Destination.IP in this giant list?). The former is cacheable (as mentioned), but the latter requires that we perform the property-to-list comparison every time. So the IP will not need to be fetched multiple times, but checking a giant list will still need to be performed.
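The distinction above can be illustrated with a Python toy model that mirrors the described behaviour but is not MWG’s implementation. The destination IP is resolved once per transaction and then reused, while the in-list comparison is repeated every time a ruleset asks. The resolver, the host name, and the 256-entry list are hypothetical.

```python
# Toy model, not MWG code: the property value is cached per transaction,
# but the property-to-list comparison runs every time a rule asks for it.
import ipaddress
from typing import Callable, Optional

class Transaction:
    def __init__(self, host: str, resolve: Callable[[str], str]) -> None:
        self.host = host
        self._resolve = resolve
        self._destination_ip: Optional[ipaddress.IPv4Address] = None
        self.dns_lookups = 0
        self.list_comparisons = 0

    def url_destination_ip(self) -> ipaddress.IPv4Address:
        if self._destination_ip is None:  # resolved once per transaction
            self.dns_lookups += 1
            self._destination_ip = ipaddress.IPv4Address(self._resolve(self.host))
        return self._destination_ip

    def destination_ip_in_list(self, networks: list[ipaddress.IPv4Network]) -> bool:
        self.list_comparisons += 1        # the list check is repeated each time
        ip = self.url_destination_ip()
        return any(ip in net for net in networks)

# Hypothetical resolver and list; three rulesets ask the same question.
BIG_LIST = [ipaddress.IPv4Network(f"10.{i}.0.0/16") for i in range(256)]
tx = Transaction("cdn.example", lambda host: "10.200.3.4")
for _ in ("Global Whitelist", "Bandwidth Throttling", "URL Filtering"):
    tx.destination_ip_in_list(BIG_LIST)
print(tx.dns_lookups, tx.list_comparisons)  # 1 3
```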

Best Regards,

Jon

dohi1

Thank you so much for your useful topic.

It will help me a lot in optimizing my system.

Version history
Revision #: 1 of 1
Last update: 05-01-2013 06:21 AM