Best Practices: Creating URL related list entries

     

    Introduction

     

    This guide explains how properties are used within rules and how to formulate list entries that fit those rules. A question I get a lot is: "I added [INSERT-SITE-HERE].com to [INSERT-LIST-HERE], but the site is still blocked. Why isn't the whitelist entry working as expected?" Understanding the rule criteria is essential to managing the Web Gateway's rules and how they apply. This article simplifies some very common examples and explains use cases for certain properties. It focuses on URL-based properties only.

     

     

    Best Practices

     

    If you read only one part of this document, please read this section. Afterwards, you can use the "Good/Bad Examples" for further detail and reference. The examples below outline use cases for the most commonly used URL-related properties.

     

    URL.SmartMatch Purpose

     

    The URL.SmartMatch property was created to allow for greater flexibility and usability. URL.SmartMatch has relaxed syntax requirements, whereas other properties, such as URL, URL.Host, and URL.Domain, have specific syntax requirements and require the management of multiple lists (one list per property). With URL.SmartMatch, those lists can be combined into a single list. The URL.SmartMatch property accepts the list as "input" and returns TRUE if the given URL, URL host, or URL path variation is found in the list. In our experience, customers often have to deal with multiple lists whose entries are in differing formats. URL.SmartMatch helps accommodate these variations, thereby reducing the need to manage multiple lists.

    URL.SmartMatch

     

    The URL.SmartMatch property was introduced in version 7.4.1. Like the URL.Host.BelongsToDomains property, it was designed to simplify the whitelisting process.

     

    URL.SmartMatch list entries can be entered as a host, a domain, a URL, or a fragment of a URL. Example entries (wildcards "*" are assumed on both sides of the entry):

      • host.domain.tld
        • Is equivalent to URL.Host matches *.host.domain.tld or host.domain.tld
      • domain.tld
        • Is equivalent to URL.Host matches *.domain.tld or domain.tld
      • http://domain.tld
      • domain.tld/path
        • Is equivalent to URL matches *domain.tld/path*
      • /path
        • Is equivalent to URL matches */path*
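The equivalences listed above can be approximated in a few lines of Python. This is only a sketch of the documented behaviour, not the Web Gateway's implementation: the function name `smartmatch_like` is hypothetical, `fnmatch` stands in for the "matches" operator, and entries with a protocol (`http://`) or port are deliberately not modelled.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def smartmatch_like(entry: str, url: str) -> bool:
    """Approximate the documented equivalences for host-style and path-style entries.

    Hypothetical helper: entries with a protocol or port are not handled here.
    """
    if "/" not in entry:
        # Host or domain entry: same as URL.Host matching entry or *.entry
        host = urlsplit(url).hostname or ""
        return host == entry or host.endswith("." + entry)
    # Entry containing a path: same as URL matching *entry*
    return fnmatchcase(url, "*" + entry + "*")


url = "http://www.mcafee.com/us/products/web-gateway.aspx"
print(smartmatch_like("mcafee.com", url))               # host-style entry
print(smartmatch_like("mcafee.com/us/products/", url))  # domain plus path
print(smartmatch_like("/us/products/", url))            # path only: matches on any host
```

All three calls return True for the example URL. The path-only entry would also return True for the same path on an unrelated host, which is why it is risky in a whitelist.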

     

    8.0.0_smartmatch.png

     

    Good

     

    Entries in Good: URL SmartMatch List

     

    Entry: mcafee.com

    Why it's good: This entry correctly matches mcafee.com and all of its subdomains, including www.mcafee.com, secure.mcafee.com, etc.

     

    Entry: mcafee.com/us/products/

    Why it's good: This entry allows content from the 'mcafee.com' domain whose path includes '/us/products/'.

     

    Entry: http://mcafee.com

    Why it's good: Using this entry will allow only HTTP access to all 'mcafee.com' and subdomains.

     

    Entry: http://www.mcafee.com/

    Why it's good: Using this entry will allow only HTTP access to 'www.mcafee.com'.

     

    Entry: http://mcafee.com/us/products/

    Why it's good: This entry allows content from the 'mcafee.com' domain whose path includes '/us/products/'.

     

    Entry: mcafee.com:80

    Why it's good: Using this entry will allow only HTTP access to all 'mcafee.com' and subdomains on port 80.

     

    Entry: mcafee.com:80/

    Why it's good: Using this entry will allow only HTTP access to all 'mcafee.com' and subdomains on port 80.

     

    Entry: http://mcafee.com:80/

    Why it's good: Using this entry will allow only HTTP access to all 'mcafee.com' and subdomains on port 80 if it's HTTP.

     

    Entry: mcafee.com.

    Why it's good: Using this entry will allow only HTTP access to all 'mcafee.com' and subdomains.

     

    8.0.1_smartmatch.png

    Bad

     

    Entries in Bad: URL SmartMatch List

     

    Entry: /us/products/

    Why it's bad: Using this entry could potentially match on other hosts which contain the path '/us/products/', example: http://maliciousdomain.mwginternal.com/us/products/.

     

    Entry: http://www.mcafee.com:8080/

    Why it's bad: This entry would not match because the request is for port 80, but the entry specifies port 8080.

     

    Entry: http://download.mcafee.com/

    Why it's bad: The subdomain of the requested URL is 'www', not 'download'.

     

    Entry: *.mcafee.com

    Why it's bad: Wildcards are not used in URL.SmartMatch entries.

     

    Entry: *.mcafee.com/*

    Why it's bad: Wildcards are not used in URL.SmartMatch entries.

     

    Entry: .mcafee.com

    Why it's bad: The leading period prevents the entry from matching.

     

    8.0.2_smartmatch.png

     

     

     

    Example URL Breakdown

     

    Example URL

    http://www.mcafee.com/us/products/web-gateway.aspx

     

    The following shows examples of how the Example URL above could be whitelisted when using various properties.  Please notice the syntax flexibility of the URL.SmartMatch property.

     

    URL.SmartMatch (example entries)

    mcafee.com

    http://mcafee.com

    http://www.mcafee.com/

    http://mcafee.com/us/products/

    /us/products/

    mcafee.com/us/products/

    URL

    http://www.mcafee.com/us/products/web-gateway.aspx

     

    URL.Host

    www.mcafee.com

     

    URL.Domain (7.4+)

    mcafee.com

     

    URL.Host.BelongsToDomains (example entry)

    mcafee.com

    URL.Protocol

    http

     

    URL.Path

    /us/products/web-gateway.aspx
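For readers who want to reproduce this breakdown outside the gateway, Python's standard `urllib.parse.urlsplit` splits the Example URL into roughly the same pieces. The variable names are only illustrative labels for the corresponding properties. URL.Domain is left out because deriving it correctly (for example for '.co.uk' hosts) needs more than simple string splitting.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.mcafee.com/us/products/web-gateway.aspx")

# Rough Python counterparts of the properties listed above.
url_protocol = parts.scheme    # URL.Protocol
url_host = parts.hostname      # URL.Host
url_path = parts.path          # URL.Path

print(url_protocol)  # http
print(url_host)      # www.mcafee.com
print(url_path)      # /us/products/web-gateway.aspx
```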

     

     

    Operator importance

     

    is in list

    Use of "is in list" implies exact string match. Wildcard characters will be interpreted as literal strings.

     

    matches in list

    Use of "matches in list" allows for wildcard matches. Although wildcard characters are accepted, they are not completely necessary.
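The difference between the two operators can be illustrated with a small Python sketch. It assumes that shell-style wildcards (`fnmatch`) are a fair stand-in for the gateway's wildcard handling. The function names `is_in_list` and `matches_in_list` are hypothetical and simply mirror the operator names.

```python
from fnmatch import fnmatchcase

entries = ["*.mcafee.com", "mcafee.com"]


def is_in_list(value: str, entries: list[str]) -> bool:
    # Exact string comparison: "*" is just another character.
    return value in entries


def matches_in_list(value: str, entries: list[str]) -> bool:
    # Wildcard comparison: "*" stands for any run of characters.
    return any(fnmatchcase(value, entry) for entry in entries)


print(is_in_list("www.mcafee.com", entries))       # False
print(matches_in_list("www.mcafee.com", entries))  # True
print(is_in_list("mcafee.com", entries))           # True
```

With "is in list", the entry `*.mcafee.com` would only ever equal the literal string `*.mcafee.com`, which is why wildcard entries in a string list never match a real host.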

     

     

    Good/Bad Examples by Property

     

    The following examples are listed by the property used in the rule, along with the corresponding operator.

     

    URL using "is in list"

     

    Using the property "URL" implies that you will create list entries which take into account the full URL. Using the operator "is in list" implies an exact string match.

     

    2.0.0_url_isinlist.png

    Good

     

    Entries in "Good: URL String List"

     

    Entry: http://www.mcafee.com/us/products/web-gateway.aspx

    Why it's good: Full URL is used as it is needed due to "is in list" operator.

     

    2.0.1_url_isinlist.png

     

    Bad

     

    Entries in "Bad: URL String List"

     

    Entry: www.mcafee.com/us/products/web-gateway.aspx

    Why it's bad: The entry doesn't include the protocol information (http://). The URL property evaluates the full URL, and the operator "is in list" implies an exact string match.

     

    2.0.2_url_isinlist.png

     

     

    URL using "matches in list"

     

    Using the property "URL" implies that you will create list entries which take into account the full URL. Using the operator "matches in list" allows for wildcard matches.

     

    2.1.0_url_matchesinlist.png

     

    Good

     

    Entries in "Good: URL Wildcard List"

     

    Entry: http://www.mcafee.com/*

    Why it's good: This entry contains a trailing wildcard which will allow any HTTP request to www.mcafee.com. However, it will not match on requests for http://mcafee.com/.


    Entry: regex(^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.com(\/.*|\/?))

    Why it's good: This entry is a bit more complex as it uses regular expressions. This entry will allow any request, HTTP or HTTPS, to mcafee.com and its subdomains.

     

    Entry: regex(^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.(com|co\.uk)(\/.*|\/?))

    Why it's good: This entry is the same as the previous entry but demonstrates how you can allow other top level domains, such as '.com' or '.co.uk'.

     

    MOVED TO BAD (thanks to mcnag for pointing out the error): the two regex entries without a leading "^" anchor that were previously listed here, regex(htt(p|ps)://(.*\.|\.?)mcafee.com(\/.*|\/?)) and regex(htt(p|ps)://(.*\.|\.?)mcafee.(com|co.uk)(\/.*|\/?)), now appear in the Bad list below.

     

    2.1.1_url_matchesinlist.png

     

    Bad

     

    Entries in "Bad: URL Wildcard List"

     

    Entry: *.mcafee.com*

    Why it's bad: This entry could match on another string within the URL, for example: http://malicious-download-site.mwginternal.com/malicious-file.exe?url=www.mcafee.com


    Entry: regex(htt(p|ps)://(.*\.|\.?)mcafee.com(\/.*|\/?))

    Why it's bad: This entry is meant to allow any request, HTTP or HTTPS, to mcafee.com and its subdomains. However, it is not anchored with "^" and its ".*" can span the whole URL, so it could match on another string within the URL, for example: http://malicious-download-site.mwginternal.com/malicious-file.exe?url=www.mcafee.com

     

    Entry: regex(htt(p|ps)://(.*\.|\.?)mcafee.(com|co.uk)(\/.*|\/?))

    Why it's bad: The entry could match on another string within the URL, for example: http://malicious-download-site.mwginternal.com/malicious-file.exe?url=www.mcafee.com
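The difference between the Good and Bad regex entries can be checked with Python's `re` module. This is a sketch under two assumptions: that Python's regex dialect is close enough to the gateway's for these patterns, and that the pattern is searched for within the URL (`re.search`). The variable names are illustrative.

```python
import re

# Patterns copied from the Good and Bad lists, without the regex(...) wrapper.
anchored = re.compile(r"^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.com(\/.*|\/?)")
unanchored = re.compile(r"htt(p|ps)://(.*\.|\.?)mcafee.com(\/.*|\/?)")

legit = "https://secure.mcafee.com/us/products/"
evil = ("http://malicious-download-site.mwginternal.com/"
        "malicious-file.exe?url=www.mcafee.com")

for name, pattern in (("anchored", anchored), ("unanchored", unanchored)):
    # Print whether each pattern accepts the legitimate and the malicious URL.
    print(name, bool(pattern.search(legit)), bool(pattern.search(evil)))
```

Under these assumptions the anchored pattern accepts only the legitimate URL, while the unanchored pattern accepts both, because its `.*` can run through the host and path until it reaches `www.mcafee.com` in the query string.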

     

    2.1.2_url_matchesinlist.png

     

     

    URL.Host using "is in list"

     

    Using the property "URL.Host" implies that you will create list entries which take into account only the domain portion of the URL. Using the operator "is in list" implies an exact string match.

     

    3.0.0_urlhost_isinlist.png

     

    Good

     

    Entry in "Good: URL.Host String List"

     

    Entry: www.mcafee.com

    Why it's good: The host of the requested URL is 'www.mcafee.com', which is an exact string match for this entry.

     

    3.0.1_urlhost_isinlist.png

     

    Bad

     

    Entries in "Bad: URL.Host String List"

     

    Entry: mcafee.com

    Why it's bad: The entry value (mcafee.com) does not equal the actual property value, 'www.mcafee.com'.

     

    Entry: *.mcafee.com

    Why it's bad: The operator is "is in list", which implies an exact string match, so wildcards will not match.

     

    Entry: *.mcafee.com/us*

    Why it's bad: The URL.Host property is limited to the domain portion of the URL and does not include the path (/us). In addition, the operator "is in list" implies an exact string match, so wildcards will not match.

         

        3.0.2_urlhost_isinlist.png

         

        URL.Host using "matches in list"

         

        Using the property "URL.Host" implies that you will create list entries which take into account only the domain portion of the URL. Using the operator "matches in list" allows for wildcard matches.

         

        3.1.0_urlhost_matchesinlist.png

         

        Good

         

        Entries in "Good: URL.Host Wildcard List"

         

        Entry: mcafee.com

        Why it's good: This entry will not match 'www.mcafee.com', but you will need it if you intend to allow access to mcafee.com (no www), unless you use regular expressions.

         

        Entry: *.mcafee.com

        Why it's good: This entry will match on any subdomain of mcafee.com (but not actually mcafee.com itself).


        Entry: regex((.*\.|\.?)mcafee\.com)

        Old (bad) entry: regex((.*\.|\.?)mcafee.com) -- (Thanks to darkfell for pointing out the error)

        Why it's good: This single entry uses regular expressions and will allow both mcafee.com and any subdomains of mcafee.com.
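The effect of escaping the dot can be seen with Python's `re` module. This sketch assumes the pattern has to match the whole host string (`re.fullmatch`), and the host "mcafee-com" is only an illustrative string chosen to show the difference.

```python
import re

escaped = re.compile(r"(.*\.|\.?)mcafee\.com")
unescaped = re.compile(r"(.*\.|\.?)mcafee.com")  # old entry: "." matches any character

for host in ("mcafee.com", "www.mcafee.com", "mcafee-com"):
    # Print whether the escaped and the unescaped pattern accept each host.
    print(host, bool(escaped.fullmatch(host)), bool(unescaped.fullmatch(host)))
```

Both patterns accept mcafee.com and www.mcafee.com, but only the unescaped one also accepts "mcafee-com", since an unescaped "." stands for any single character.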

         

        3.1.1_urlhost_matchesinlist.png

         

        Bad

         

        Entries in "Bad: URL.Host Wildcard List"

         

        Entry: *.mcafee.com*

        Why it's bad: Because of the trailing wildcard, this entry could match a different host that merely contains the string, for example: http://www.mcafee.com.malicious-download-site.mwginternal.com/

         

        Entry: *.mcafee.com/us*

        Why it's bad: The URL.Host property is limited to the domain portion of the URL and does not include the path (/us).
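The trailing-wildcard problem described in the Bad entries above can be reproduced with Python's `fnmatch`, used here as an assumed stand-in for the gateway's wildcard matching on URL.Host.

```python
from fnmatch import fnmatchcase

hosts = [
    "www.mcafee.com",
    "www.mcafee.com.malicious-download-site.mwginternal.com",
]

for pattern in ("*.mcafee.com", "*.mcafee.com*"):
    for host in hosts:
        # Print whether the wildcard pattern accepts the host.
        print(pattern, host, fnmatchcase(host, pattern))
```

The pattern `*.mcafee.com` accepts only the first host, while `*.mcafee.com*` accepts both, because the trailing `*` lets anything follow the trusted name.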

           

          3.1.2_urlhost_matchesinlist.png

           

          URL.Domain vs. URL.Host.BelongsToDomains

           

          The URL.Domain property was introduced in 7.4. It was designed to be more consistent with other URL related properties (URL.Host, URL, etc.). It acts nearly identically to URL.Host.BelongsToDomains, but does not require a list as a setting. Instead, the list can be the operand.

           

          URL.Domain is a string property which contains the top level domain of the requested URL (i.e. "mcafee.com").

          7.0.0_urldomain_isintlist.png

           

          URL.Host.BelongsToDomains<ListName> is a boolean property which returns true if the URL's top level domain is in the list specified on the rule (ListName). If the domain of the URL is not in the list, the property returns false.

          7.1.0_urlhost_belongs.png

           

          URL.Domain using "is in list"

           

          Using the property "URL.Domain" implies that you will create list entries which take into account just the top level domain of the URL. Using the operator "is in list" implies an exact string match.

           

          6.0.0_urldomain_isintlist.png

           

          Good

           

          Entries in "Good: URL.Domain String List"

           

          Entry: mcafee.com

          Why it's good: URL.Domain will simply equal "mcafee.com".

           

          6.0.1_urldomain_isintlist.png

           

           

          Bad

           

          Entries in "Bad: URL.Domain String List"

           

          Entry: www.mcafee.com

          Why it's bad: URL.Domain is "mcafee.com", not "www.mcafee.com". Use URL.Host instead.

           

          Entry: *.mcafee.com

          Why it's bad: URL.Domain equals "mcafee.com", so "*." would prevent matching. "is in list" implies a string, not a wildcard.

           

          6.0.2_urldomain_isintlist.png

           

           

          URL.Domain using "matches in list"

           

          Using the property "URL.Domain" implies that you will create list entries which take into account just the top level domain of the URL. Using the operator "matches in list" allows for wildcard matches.

           

          6.1.0_urldomain_matchesintlist.png

           

          Good

           

          Entries in "Good: URL.Domain Wildcard List"

           

          Entry: regex(mcafee\.(com|co\.uk))

          Old Entry: regex(mcafee.(com|co.uk)) -- (Thanks to darkfell for pointing out the error)

          Why it's good: URL.Domain equals "mcafee.com" so it will match. "mcafee.co.uk" will also match.

           

          6.1.1_urldomain_matchesintlist.png

           

           

          Bad

           

          Entries in "Bad: URL.Domain Wildcard List"

           

          Entry: *.mcafee.com

          Why it's bad: URL.Domain of "mcafee.com" will not match due to the "*.".

           

          Entry: *mcafee.com

          Why it's bad: It will match on "mcafee.com", BUT it could match on "maliciousdomainmcafee.com" too.

           

          6.1.2_urldomain_matchesintlist.png

           

           

          URL.Host.BelongsToDomains

           

          The URL.Host.BelongsToDomains property was introduced in 7.2. It was designed to simplify adding list entries. Using the property "URL.Host.BelongsToDomains" allows you to simply enter the domain of interest.

           

          So if you wish to whitelist all mcafee.com sites (including subdomains), you can simply enter mcafee.com. There is no need to worry about wildcards.
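The behaviour described here can be pictured with a short Python sketch. The function `host_belongs_to_domains` is a hypothetical stand-in, not the gateway's implementation. It assumes that a host "belongs" to a listed domain when it equals that domain or ends with a dot followed by that domain.

```python
def host_belongs_to_domains(host: str, domains: list[str]) -> bool:
    """Hypothetical stand-in for URL.Host.BelongsToDomains."""
    # A host belongs to a domain if it equals it or is a subdomain of it.
    return any(host == d or host.endswith("." + d) for d in domains)


allowed = ["mcafee.com"]
print(host_belongs_to_domains("mcafee.com", allowed))         # True
print(host_belongs_to_domains("secure.mcafee.com", allowed))  # True
print(host_belongs_to_domains("notmcafee.com", allowed))      # False
```

Under this assumption, an entry such as `www.mcafee.com` would cover only that host and anything beneath it, not other subdomains of mcafee.com.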

           

          4.0.0_urlhost_belongs.png

          Good

           

          Entries in "Good: Only Domain List"

           

          Entry: mcafee.com

          Why it's good: This entry correctly matches mcafee.com and all of its subdomains, including www.mcafee.com, secure.mcafee.com, etc.

           

          Entry: www.mcafee.com

          Why it's good: This entry correctly matches only www.mcafee.com and its subdomains. It does not allow other subdomains of the top domain 'mcafee.com'. This is useful if you want to allow a subdomain, but not the entire domain.

           

          4.0.1_urlhost_belongs.png

           

          Bad

           

          Entries in "Bad: Only Domain List"

           

          Entry: *.mcafee.com

          Why it's bad: URL.Host.BelongsToDomains does not need wildcards. The property requires an exact domain entry such as 'www.mcafee.com' or the top domain 'mcafee.com'.

             

            4.0.2_urlhost_belongs.png

             

             

             

            Test Ruleset

             

            You can use the test ruleset in your own environment to see how it works! The test ruleset will work in versions 7.4.1+.

             

             

            Conclusion

             

            From the examples, it should be clear that the cleanest and easiest way to create domain-based whitelist entries is to use the "URL.SmartMatch" property. I hope this helps clarify use cases for the various URL related properties. Perhaps it will help with understanding other properties as well.

             

             

            Changelog


            2014-10-28 - URL related regex entries were invalid. Updated examples to be http:// instead of hxxp://. Added information regarding URL.SmartMatch property.