Web Gateway: Understanding URL related Properties

Introduction

I have written the following guide to help explain how properties are used within rules and how to formulate list entries that go with the corresponding rules. For example, a question I get a lot is: "I added [INSERT-SITE-HERE].com to [INSERT-LIST-HERE], but the site is still blocked. Why isn't the whitelist entry working as expected?" Understanding the rule criteria is essential to managing the Web Gateway's rules and how they apply. This article simplifies some very common examples and explains use cases for certain properties. To start, I will focus on URL-based properties only.

Best Practices

If you read any piece of this document, please at least read this section. After you read this, you can use the "Good/Bad Examples" for further detail and reference. The below examples outline use cases for the most commonly used URL related properties.

URL.SmartMatch Purpose

The URL.SmartMatch property was created to allow for greater flexibility and usability. For example, URL.SmartMatch has relaxed syntax requirements, whereas other properties, such as URL, URL.Host, and URL.Domain, have specific syntax requirements and require managing multiple lists (one list per property). With URL.SmartMatch, those lists can be combined into a single list. The URL.SmartMatch property accepts the list as "input" and returns TRUE if the given URL, URL host, or URL path variation is found in the list. In the past, we have found that customers often have to deal with multiple lists whose entries are in differing formats. The URL.SmartMatch property was created to accommodate these variations, reducing the need to manage multiple lists.

URL.SmartMatch

The URL.SmartMatch property was introduced in version 7.4.1. Like the URL.Host.BelongsToDomains property, it was designed to simplify the whitelisting process.

SmartMatch list entries can be in the form of a host, a domain, a URL, or a fragment of a URL. Example entries (wildcards "*" are assumed on both sides of the entry):

    • host.domain.tld
      • Is equivalent to URL.Host matches *.host.domain.tld or host.domain.tld
    • domain.tld
      • Is equivalent to URL.Host matches *.domain.tld or domain.tld
    • http://domain.tld
    • domain.tld/path
      • Is equivalent to URL matches *.domain.tld/path* or domain.tld/path*
    • /path
      • Is equivalent to URL matches */path*
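The equivalence list above can be approximated in a few lines of Python. This is a rough, hypothetical sketch, not the gateway's actual SmartMatch implementation: it assumes glob-style wildcard semantics (Python's fnmatch), covers only the host, domain, /path, and domain/path rows, and rejects entries with a scheme or port because no equivalence is given for them. The names smartmatch and smartmatch_patterns are made up for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def smartmatch_patterns(entry):
    """Translate one list entry into (value_name, glob) pairs per the equivalence list."""
    if "://" in entry or ":" in entry:
        # Scheme and port entries have no equivalence row above; not modeled here.
        raise ValueError("scheme/port entries are not covered by this sketch")
    if entry.startswith("/"):
        # /path  ->  URL matches */path*
        return [("url", "*" + entry + "*")]
    if "/" in entry:
        # domain.tld/path  ->  *.domain.tld/path*  or  domain.tld/path*
        # (compared against host + path with the scheme stripped: an assumption)
        return [("host_path", "*." + entry + "*"), ("host_path", entry + "*")]
    # host.domain.tld or domain.tld  ->  URL.Host matches *.entry or entry
    return [("host", "*." + entry), ("host", entry)]


def smartmatch(url, entries):
    """Return True if any entry matches the URL under the rules above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    values = {"url": url, "host": host, "host_path": host + parts.path}
    return any(
        fnmatchcase(values[name], glob)
        for entry in entries
        for name, glob in smartmatch_patterns(entry)
    )


example = "http://www.mcafee.com/us/products/web-gateway.aspx"
print(smartmatch(example, ["mcafee.com"]))                                # True
print(smartmatch("http://www.mcafee.com.evil.example/", ["mcafee.com"]))  # False
```

In this model a 'mcafee.com' entry matches mcafee.com and its subdomains but not a look-alike host such as www.mcafee.com.evil.example, while a bare '/us/products/' entry matches that path on any host, which is the risk described in the Bad examples.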

8.0.0_smartmatch.png

Good

Entries in Good: URL SmartMatch List

Entry: mcafee.com

Why it's good: This entry matches mcafee.com itself and all of its subdomains, e.g. mcafee.com, www.mcafee.com, secure.mcafee.com, etc.

Entry: mcafee.com/us/products/

Why it's good: This entry allows content from the 'mcafee.com' domain under the path '/us/products/'.

Entry: http://mcafee.com

Why it's good: This entry allows only HTTP access to 'mcafee.com' and all of its subdomains.

Entry: http://www.mcafee.com/

Why it's good: This entry allows only HTTP access to 'www.mcafee.com'.

Entry: http://mcafee.com/us/products/

Why it's good: This entry allows only HTTP access to content from the 'mcafee.com' domain under the path '/us/products/'.

Entry: mcafee.com:80

Why it's good: This entry allows access to 'mcafee.com' and all of its subdomains only on port 80, the default HTTP port.

Entry: mcafee.com:80/

Why it's good: As with the previous entry, this allows access to 'mcafee.com' and all of its subdomains only on port 80.

Entry: http://mcafee.com:80/

Why it's good: This entry allows access to 'mcafee.com' and all of its subdomains only when the request uses HTTP on port 80.

Entry: mcafee.com.

Why it's good: This entry will allow only HTTP access to 'mcafee.com' and all of its subdomains.

8.0.1_smartmatch.png

Bad

Entries in Bad: URL SmartMatch List

Entry: /us/products/

Why it's bad: This entry could match any host whose URL contains the path '/us/products/', for example: http://maliciousdomain.mwginternal.com/us/products/.

Entry: http://www.mcafee.com:8080/

Why it's bad: This entry would not match: the request is for port 80, but the entry specifies port 8080.

Entry: http://download.mcafee.com/

Why it's bad: The requested subdomain is 'www', not 'download', so this entry does not match.

Entry: *.mcafee.com

Why it's bad: Wildcards are not used in URL.SmartMatch entries.

Entry: *.mcafee.com/*

Why it's bad: Wildcards are not used in URL.SmartMatch entries.

Entry: .mcafee.com

Why it's bad: A leading period prevents the entry from matching.

8.0.2_smartmatch.png

Example URL Breakdown

Example URL

http://www.mcafee.com/us/products/web-gateway.aspx

The following shows how the Example URL above could be whitelisted using various properties. Note the syntax flexibility of the URL.SmartMatch property.

URL.SmartMatch (example entries)

mcafee.com

http://mcafee.com

http://www.mcafee.com/

http://mcafee.com/us/products/

/us/products/

mcafee.com/us/products/

URL

http://www.mcafee.com/us/products/web-gateway.aspx

URL.Host

www.mcafee.com

URL.Domain (7.4+)

mcafee.com

URL.Host.BelongsToDomains (example entry)

mcafee.com

URL.Protocol

http

URL.Path

/us/products/web-gateway.aspx
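To see where each value above comes from, the Example URL can be split with Python's urllib.parse. This is an illustrative sketch only: the property names are used as dictionary keys, and registered_domain() is a hypothetical, simplified stand-in for how URL.Domain is derived (a tiny hard-coded suffix set instead of whatever logic the gateway really uses).

```python
from urllib.parse import urlsplit

# Hypothetical, tiny stand-in for a public-suffix list; the gateway's own
# logic for deriving URL.Domain is not shown in this article.
MULTI_LABEL_SUFFIXES = {"co.uk", "gov.in"}


def registered_domain(host):
    """Approximate URL.Domain: the host without its subdomains."""
    labels = host.split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def breakdown(url):
    """Map the example properties from this section to their values for a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "URL": url,
        "URL.Protocol": parts.scheme,
        "URL.Host": host,
        "URL.Domain": registered_domain(host),
        "URL.Path": parts.path,
    }


for name, value in breakdown("http://www.mcafee.com/us/products/web-gateway.aspx").items():
    print(f"{name:13} {value}")
```

The suffix set exists only so that a host such as www.mcafee.co.uk reduces to mcafee.co.uk rather than co.uk; a naive "last two labels" rule would get that case wrong.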

Operator importance

is in list

Using "is in list" means an exact string match. Wildcard characters are interpreted as literal characters.

matches in list

Using "matches in list" allows wildcard matches. Although wildcard characters are accepted, they are not required.
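A minimal sketch of the difference between the two operators, assuming "matches in list" behaves like glob matching in which the whole value must match the entry. Python's fnmatch also treats '?' and '[...]' as wildcards, which may not match the gateway's behavior, and regex(...) entries are not handled here. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def is_in_list(value, entries):
    # "is in list": exact string comparison, so "*" is just a literal character.
    return value in entries


def matches_in_list(value, entries):
    # "matches in list" (assumed glob semantics): "*" matches any run of
    # characters and the whole value must match the entry. An entry without
    # wildcards therefore behaves like an exact match.
    return any(fnmatchcase(value, entry) for entry in entries)


host = "www.mcafee.com"
print(is_in_list(host, ["*.mcafee.com"]))       # False: '*' taken literally
print(matches_in_list(host, ["*.mcafee.com"]))  # True
print(matches_in_list(host, [host]))            # True: no wildcard needed
```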

Good/Bad Examples by Property

The following examples are listed by the property used in the rule, along with the corresponding operator.

URL using "is in list"

Using the property "URL" implies that list entries must account for the full URL. Using the operator "is in list" implies an exact string match.

2.0.0_url_isinlist.png

Good

Entries in "Good: URL String List"

Entry: http://www.mcafee.com/us/products/web-gateway.aspx

Why it's good: The full URL is used, which is required because the "is in list" operator performs an exact string match.

2.0.1_url_isinlist.png

Bad

Entries in "Bad: URL String List"

Entry: www.mcafee.com/us/products/web-gateway.aspx

Why it's bad: The entry doesn't include the protocol (http://). The URL property contains the full URL, and the "is in list" operator implies an exact string match.

2.0.2_url_isinlist.png

URL using "matches in list"

Using the property "URL" implies that list entries must account for the full URL. Using the operator "matches in list" allows wildcard matches.

2.1.0_url_matchesinlist.png

Good

Entries in "Good: URL Wildcard List"

Entry: http://www.mcafee.com/*

Why it's good: This entry contains a trailing wildcard which will allow any HTTP request to www.mcafee.com. However, it will not match on requests for http://mcafee.com/.

Entry: regex(^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.com(\/.*|\/?))

Why it's good: This entry is a bit more complex because it uses a regular expression. It allows any request, HTTP or HTTPS, to mcafee.com and its subdomains.

Entry: regex(^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.(com|co\.uk)(\/.*|\/?))

Why it's good: This entry is the same as the previous entry but demonstrates how you can allow other top level domains, such as '.com' or '.co.uk'.

MOVED TO BAD (thanks to for pointing out the error)

Entry: regex(htt(p|ps)://(.*\.|\.?)mcafee.com(\/.*|\/?))

Why it's good: This entry is a bit more complex because it uses a regular expression. It allows any request, HTTP or HTTPS, to mcafee.com and its subdomains.

MOVED TO BAD (thanks to for pointing out the error)

Entry: regex(htt(p|ps)://(.*\.|\.?)mcafee.(com|co.uk)(\/.*|\/?))

Why it's good: This entry is the same as the previous entry but demonstrates how you can allow other top level domains, such as '.com' or '.co.uk'.

2.1.1_url_matchesinlist.png

Bad

Entries in "Bad: URL Wildcard List"

Entry: *.mcafee.com*

Why it's bad: This entry could match a string elsewhere in the URL, for example: http://malicious-download-site.mwginternal.com/malicious-file.exe?url=www.mcafee.com

Entry: regex(htt(p|ps)://(.*\.|\.?)mcafee.com(\/.*|\/?))

Why it's bad: This entry is a bit more complex because it uses a regular expression, and it is meant to allow any request, HTTP or HTTPS, to mcafee.com and its subdomains. However, because '.*' can consume any characters, including '/' and '?', the entry could match a string elsewhere in the URL, for example: http://malicious-download-site.mwginternal.com/malicious-file.exe?url=www.mcafee.com. The unescaped '.' in 'mcafee.com' also matches any single character.

Entry: regex(htt(p|ps)://(.*\.|\.?)mcafee.(com|co.uk)(\/.*|\/?))

Why it's bad: Like the previous entry, this entry could match a string elsewhere in the URL, for example: http://malicious-download-site.mwginternal.com/malicious-file.exe?url=www.mcafee.com. The unescaped dots in 'mcafee.' and 'co.uk' also match any character.
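The Good and Bad URL Wildcard entries in this section can be checked with Python's re and fnmatch modules. This sketch assumes that, as with wildcards, a regex(...) entry must match the entire URL (modeled with re.fullmatch); if the gateway instead searched for the expression anywhere in the URL, the results for the unanchored entries would differ. The regex bodies are copied from the entries above with the regex( ) wrapper removed; the constant and function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Entries from the URL Wildcard lists above ("regex(...)" wrapper removed).
GOOD_GLOB = "http://www.mcafee.com/*"
BAD_GLOB = "*.mcafee.com*"
GOOD_REGEX = r"^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.com(\/.*|\/?)"
BAD_REGEX = r"htt(p|ps)://(.*\.|\.?)mcafee.com(\/.*|\/?)"

EXAMPLE = "http://www.mcafee.com/us/products/web-gateway.aspx"
SMUGGLED = ("http://malicious-download-site.mwginternal.com/"
            "malicious-file.exe?url=www.mcafee.com")


def regex_entry_matches(pattern, url):
    # Assumption: like a wildcard entry, the regex must match the whole URL.
    return re.fullmatch(pattern, url) is not None


for url in (EXAMPLE, "http://mcafee.com/", SMUGGLED):
    print(url)
    print("  glob  good/bad:", fnmatchcase(url, GOOD_GLOB), fnmatchcase(url, BAD_GLOB))
    print("  regex good/bad:", regex_entry_matches(GOOD_REGEX, url),
          regex_entry_matches(BAD_REGEX, url))
```

Under that assumption, the Good regex rejects the smuggled '?url=www.mcafee.com' request because [\w.-]* cannot cross '/' or '?', while the Bad regex accepts it, and its unescaped '.' even accepts a string like 'http://mcafeeXcom/'.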

2.1.2_url_matchesinlist.png

URL.Host using "is in list"

Using the property "URL.Host" implies that list entries account only for the host portion of the URL. Using the operator "is in list" implies an exact string match.

3.0.0_urlhost_isinlist.png

Good

Entry in "Good: URL.Host String List"

Entry: www.mcafee.com

Why it's good: The host of the requested URL is 'www.mcafee.com', which exactly matches the entry.

3.0.1_urlhost_isinlist.png

Bad

Entries in "Bad: URL.Host String List"

Entry: mcafee.com

Why it's bad: The entry value (mcafee.com) does not equal the actual property value, 'www.mcafee.com'.

Entry: *.mcafee.com

Why it's bad: The "is in list" operator implies an exact string match, so wildcards will not match.

Entry: *.mcafee.com/us*

Why it's bad: The URL.Host property contains only the host portion of the URL, not the path (/us). In addition, the "is in list" operator implies an exact string match, so wildcards will not match.

3.0.2_urlhost_isinlist.png

URL.Host using "matches in list"

Using the property "URL.Host" implies that list entries account only for the host portion of the URL. Using the operator "matches in list" allows wildcard matches.

3.1.0_urlhost_matchesinlist.png

Good

Entries in "Good: URL.Host Wildcard List"

Entry: mcafee.com

Why it's good: This entry will not match 'www.mcafee.com', but you need it if you intend to allow access to mcafee.com itself (no www), unless you use a regular expression.

Entry: *.mcafee.com

Why it's good: This entry will match any subdomain of mcafee.com (but not mcafee.com itself).

Entry: regex((.*\.|\.?)mcafee\.com)

Old (bad) entry: regex((.*\.|\.?)mcafee.com) -- (Thanks to for pointing out the error)

Why it's good: This single regular-expression entry allows both mcafee.com and any subdomain of mcafee.com.

3.1.1_urlhost_matchesinlist.png

Bad

Entries in "Bad: URL.Host Wildcard List"

Entry: *.mcafee.com*

Why it's bad: This entry could also match hosts that merely contain 'mcafee.com', for example: http://www.mcafee.com.malicious-download-site.mwginternal.com/

Entry: *.mcafee.com/us*

Why it's bad: The URL.Host property contains only the host portion of the URL, not the path (/us), so this entry will never match.
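A small sketch of the URL.Host wildcard examples in this section, assuming glob semantics for plain wildcard entries and whole-value matching for regex(...) entries. The host_matches helper is hypothetical and only models a single list entry.

```python
import re
from fnmatch import fnmatchcase


def host_matches(host, entry):
    """Hypothetical model of URL.Host "matches in list" for a single entry."""
    if entry.startswith("regex(") and entry.endswith(")"):
        # Assumption: the expression must match the whole host value.
        return re.fullmatch(entry[len("regex("):-1], host) is not None
    # Plain entries: glob wildcards, whole value must match.
    return fnmatchcase(host, entry)


GOOD_REGEX = r"regex((.*\.|\.?)mcafee\.com)"
OLD_REGEX = r"regex((.*\.|\.?)mcafee.com)"

for host in ("mcafee.com", "www.mcafee.com", "mcafeeXcom",
             "www.mcafee.com.malicious-download-site.mwginternal.com"):
    results = [host_matches(host, e)
               for e in ("mcafee.com", "*.mcafee.com", "*.mcafee.com*", GOOD_REGEX, OLD_REGEX)]
    print(host, results)
```

With the dot escaped, the regex entry accepts mcafee.com and www.mcafee.com but rejects 'mcafeeXcom', which the old, unescaped entry accepts.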

3.1.2_urlhost_matchesinlist.png

URL.Domain vs. URL.Host.BelongsToDomains

The URL.Domain property was introduced in 7.4. It was designed to be more consistent with other URL-related properties (URL.Host, URL, etc.). It behaves almost identically to URL.Host.BelongsToDomains, but it does not require a list as a property setting; instead, the list is supplied as the operand.

URL.Domain is a string property that contains the domain of the requested URL without any subdomains (e.g. "mcafee.com" for www.mcafee.com).

7.0.0_urldomain_isintlist.png

URL.Host.BelongsToDomains<ListName> is a boolean property that returns true if the URL's host equals, or is a subdomain of, an entry in the list specified on the rule (ListName). Otherwise, the property returns false.

7.1.0_urlhost_belongs.png

URL.Domain using "is in list"

Using the property "URL.Domain" implies that list entries account only for the domain of the URL (without subdomains). Using the operator "is in list" implies an exact string match.

6.0.0_urldomain_isintlist.png

Good

Entries in "Good: URL.Domain String List"

Entry: mcafee.com

Why it's good: URL.Domain simply equals "mcafee.com".

6.0.1_urldomain_isintlist.png

Bad

Entries in "Bad: URL.Domain String List"

Entry: www.mcafee.com

Why it's bad: URL.Domain is "mcafee.com", not "www.mcafee.com". Use URL.Host instead.

Entry: *.mcafee.com

Why it's bad: URL.Domain equals "mcafee.com", so the "*." prefix prevents a match. "is in list" performs an exact string comparison, not a wildcard match.

6.0.2_urldomain_isintlist.png

URL.Domain using "matches in list"

Using the property "URL.Domain" implies that list entries account only for the domain of the URL (without subdomains). Using the operator "matches in list" allows wildcard matches.

6.1.0_urldomain_matchesintlist.png

Good

Entries in "Good: URL.Domain Wildcard List"

Entry: regex(mcafee\.(com|co\.uk))

Old Entry: regex(mcafee.(com|co.uk)) -- (Thanks to for pointing out the error)

Why it's good: URL.Domain equals "mcafee.com", so it will match. "mcafee.co.uk" will also match.

6.1.1_urldomain_matchesintlist.png

Bad

Entries in "Bad: URL.Domain Wildcard List"

Entry: *.mcafee.com

Why it's bad: A URL.Domain of "mcafee.com" will not match because of the "*." prefix.

Entry: *mcafee.com

Why it's bad: It will match "mcafee.com", BUT it could also match "maliciousdomainmcafee.com".

6.1.2_urldomain_matchesintlist.png

URL.Host.BelongsToDomains

The URL.Host.BelongsToDomains property was introduced in 7.2. It was designed to simplify list entries: with "URL.Host.BelongsToDomains", you simply enter the domain of interest.

So if you wish to whitelist all mcafee.com sites (including subdomains), you can simply enter mcafee.com; there is no need to worry about wildcards.
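The matching rule described here (an entry matches the host itself and anything below it) can be sketched as a suffix check on a label boundary. This is an assumption about the behavior based on the examples in this section, not the gateway's code; belongs_to_domains is a hypothetical name.

```python
def belongs_to_domains(host, domains):
    """Hypothetical model: True if host equals a listed domain or is below it."""
    host = host.lower()
    for domain in domains:
        domain = domain.lower()
        # Compare on a label boundary so "notmcafee.com" does not match "mcafee.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


only_domain_list = ["mcafee.com"]
for host in ("mcafee.com", "www.mcafee.com", "secure.mcafee.com", "notmcafee.com"):
    print(host, belongs_to_domains(host, only_domain_list))
```

In this model a '*.mcafee.com' entry is compared literally and never matches a real host, which mirrors the Bad example for this property.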

4.0.0_urlhost_belongs.png

Good

Entries in "Good: Only Domain List"

Entry: mcafee.com

Why it's good: This entry matches mcafee.com itself and all of its subdomains, e.g. www.mcafee.com, secure.mcafee.com, etc.

Entry: www.mcafee.com

Why it's good: This entry matches only www.mcafee.com (and any subdomains of www.mcafee.com). It does not allow other subdomains of the top domain 'mcafee.com', which is useful when you want to allow one subdomain but not the entire domain.

4.0.1_urlhost_belongs.png

Bad

Entries in "Bad: Only Domain List"

Entry: *.mcafee.com

Why it's bad: URL.Host.BelongsToDomains does not use wildcards; entries must be plain domain names such as 'www.mcafee.com' or the top domain 'mcafee.com'.

4.0.2_urlhost_belongs.png

Test Ruleset

You can use the test ruleset in your own environment to see how it works! The test ruleset works in versions 7.4.1 and later.

Conclusion

From the examples, it should be clear that the cleanest and easiest way to create domain-based whitelist entries is the "URL.SmartMatch" property. I hope this helps clarify use cases for the various URL-related properties; perhaps it will help with understanding other properties as well.

Changelog

2014-10-28 - URL-related regex entries were invalid. Updated examples to use http:// instead of hxxp://. Added information regarding the URL.SmartMatch property.

Comments

cestrada

Great article as usual, Jon. My only question is: what is the best recommendation for differentiating between what is whitelisted externally (external URLs, hosts, IP addresses, etc.) and what is whitelisted internally, especially if you are using NTLM or similar to authenticate internal users? There must be a better, more structured way.

At the moment we bunch all of this into several rulesets:

1. Global Whitelist: for all sites allowed inside the company

a. Global domain allow

b. Global Url.host

c. Global IP allow

2. Global Bypass NTLM: for authentication to internal websites

a. Global No Need to authenticate Users: just what it implies

b. Global No need to authenticate servers: just what it implies

Hey Carlos,

This guide is mainly geared toward showing you how to use the properties and how to correctly create list entries. What you do with them in the rules is up to you.

One could easily create authentication exemptions using the properties outlined above.

For questions about authentication and organizing rules, I would recommend creating a discussion thread or checking out my authentication guide: https://community.mcafee.com/docs/DOC-4384

In that guide I discuss common exception examples for authentication.

Best,

Jon

bragot

This article is really helpful. It should be added to your standard documentation.

Funny you say that! It actually was!

Web Gateway 7.3.2 Product Guide - https://kc.mcafee.com/corporate/index?page=content&id=PD24502 page 245

darkfell

regex((.*\.|\.?)mcafee.com) is not correct; it should be:

regex((.*\.|\.?)mcafee\.com)

Hi ,

I just noticed this comment!

I will be updating this document to fix the error you mentioned as well as two more. I will also be adding details regarding the new URL.SmartMatch property (added in 7.4.1).

Thanks for pointing this out, sorry I missed your comment...

Best Regards,

Jon

haaris

I have created a rule:

client.IP is in list AND URL is in list, where in the URL list I have added the URL http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=migr-5091393

But the user gets the page without the images; it looks like only half of the page loads. Do we need to allow the category? If we do, users will be able to access other URLs in that category.

Even the URL http://www.mcafee.com/us/products/web-gateway.aspx in your example above opens the same way.

Hi Haaris,

That is a correct observation. Whitelisting a single URL will, in most cases, not work as you expect. You will need to allow the domain along with it; if you do not, CSS, JavaScript, and other content will probably not load.

The very first example () is an actual use case for whitelisting a *full* URL.

The URL (http://www.mcafee.com/us/products/web-gateway.aspx) used in the document was merely for demonstration of correct use of the various URL properties.

Best Regards,

Jon

edit: "of correct use of the various URL properties"

haaris

But if I have to allow a specific URL, it will not open as expected. So if we allow the domain, what is the point of using URL.Path or URL for a specific URL, since allowing the domain will allow the full URL?

Please explain; your views might help me.

m.bagheryan

Great document.

kozzy

Fantastic, thank you for this! But I have a question about the regex when it comes to setting the .com, .net and so on.

Is there anything wrong with doing it like this:

regex(^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.(co\.uk|[\w]*)(\/.*|\/?))

So instead of using (com|net|co\.uk) I placed this: (co\.uk|[\w]*)

The only issue I see is that it will match something like http://www.mcafee.commmmmm and such, but not http://www.mcafee.maliciousdomain.com. But are there any security issues?

trishoar

Hi Kozzy,

What is the benefit of your version over the example provided? The regex you created matches things like www.mcafee.org. I suggest you use a site like http://regexr.com/ to test out your syntax.

Regards,

Tris

kozzy

Hi, exactly, and that is the benefit. Let's say you want to make one for Google: Google has many domains (.com, .net, .se, .pl, .it), pretty much one for every country, so instead of listing each one I use (co\.uk|[\w]*).

Google is just an example. Many companies have sites in each country, and like you wrote, this example works for things like www.mcafee.org or www.mcafee.net and so on, without really needing to know all the domains McAfee has.

Primarily this is useful for companies that are global but use the same Web Gateway policy.

trishoar

If you are just looking to do country-code TLDs and other common domains, then you could do something like this:

regex(^htt(p|ps):\/\/([\w]*\.)?google\.[\w]{2,3})

Looking at the Wikipedia "List of Internet top-level domains", all the country-code and common domains listed there are 2 or 3 characters long.

If you want to cover co.uk, then you could do something like this instead:

regex(^htt(p|ps):\/\/([\w]*\.)?google\.[\w]{2,3}(\.[\w]{2})?)

This includes an optional match for a 2-letter domain on the end.

Regards,

Tris

kozzy

This one does the trick for me: regex(^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.(co\.uk|[\w]{2,3})(\/.*|\/?))

I tested it, and it covers all domains of 2 or 3 letters, including co.uk, and matches input like http://test-site.mcafee.com/someplace
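The difference between kozzy's two expressions can be checked in Python, again assuming the whole URL must match the expression (modeled with re.fullmatch). The URLs are examples from this thread plus one hypothetical host (www.mcafee.xyz) showing that the bounded version accepts any 2- or 3-letter ending after 'mcafee.', whether or not McAfee owns that domain.

```python
import re

# Bodies of kozzy's two regex(...) entries from this thread.
UNBOUNDED = r"^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.(co\.uk|[\w]*)(\/.*|\/?)"
BOUNDED = r"^htt(p|ps):\/\/([\w.-]*\.|\.?)mcafee\.(co\.uk|[\w]{2,3})(\/.*|\/?)"


def allowed(pattern, url):
    # Assumption: the whole URL must match the expression.
    return re.fullmatch(pattern, url) is not None


urls = [
    "http://test-site.mcafee.com/someplace",
    "https://www.mcafee.co.uk/",
    "http://www.mcafee.commmmmm",
    "http://www.mcafee.maliciousdomain.com/",
    "http://www.mcafee.xyz/",  # hypothetical host: any 2-3 letter ending passes
]
for url in urls:
    print(f"{url:42} unbounded={allowed(UNBOUNDED, url)} bounded={allowed(BOUNDED, url)}")
```

Limiting the ending to two or three word characters removes the 'commmmmm' case, but it still trusts every 'mcafee.<2-3 letters>' host, so an explicit list such as (com|net|co\.uk) remains the tighter choice.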

belvincent

I have one question about allowing specific TLDs.

Let's allow *.gov.in

If I used regex, I would use something like this:

URL.Domain "matches in list" regex(.*\.gov\.in)

I would prefer using URL.SmartMatch rather than regex if possible, so my question is: is it dangerous to use

URL.SmartMatch(.gov.in) equals True

My fear is that URL.SmartMatch would also allow:

http://www.myhackedsite.com/getowned.gov.in/badscript.jsp

Is there a way to allow specific TLDs (which can be tricky) without regex?

Is the property URL.DomainSuffix smart enough to know the TLD?

Hi BelVincent,

This is exactly what URL.DomainSuffix was created for, and there is no need for regex either. Just use URL.DomainSuffix equals 'gov.in'.

I checked SmartMatch, and it seems to handle it correctly. Adding 'gov.in' to a SmartMatch list would *not* allow hxxp://myhackedsite.com/getowned.gov.in, nor would it allow hxxp://gov.in.getowned.com. The only thing it would allow is hxxp(s)://*.gov.in

Best Regards,

Jon

belvincent

Thanks a lot for your answer, Jon!

belvincent

Please correct me if I am wrong, but I want to be sure I understood what support said.

A string entered in the list used by a URL.SmartMatch rule is tested against the URL being requested, based on the URL, URL host, and URL path.

So we have to avoid entries containing a '?' in the SmartMatch list, as it gets interpreted.

A simple example: you want to allow a specific YouTube video while blocking all other YouTube content (the YouTube API v3 can be used on local proxies, but sadly not on a SaaS policy, so we do not really have much of a choice here).

So you add this URL to the SmartMatch list:

https://www.youtube.com/watch?v=EoTqx9mVsu4

What you are actually allowing is every video on YouTube, because the URL path of every YouTube video page (https://www.youtube.com/watch) matches the URL path of this entry (https://www.youtube.com/watch).

So this URL, for instance, will also be allowed:

https://www.youtube.com/watch?v=dfRzLP_KrBQ

In this case you want to use something other than SmartMatch for filtering, because, if I may rewrite the description of SmartMatch:

The URL.SmartMatch property will accept the list as "input", and return TRUE if the given URL, URL host, or URL path variation was found in the list, or if the URL path matches the URL path of any entry in the list.

Hi Vincent!

You can add full URLs to the list and they would work as expected (like https://www.youtube.com/watch?v=EoTqx9mVsu4 or https://www.youtube.com/watch?v=dfRzLP_KrBQ).

You are correct that all landing pages for videos would be allowed if you add "https://www.youtube.com/watch" to a global whitelist using SmartMatch.

Attempting to allow based on the URL parameters alone would not work (?v=EoTqx9mVsu4), nor would using the unique string in the parameter (EoTqx9mVsu4). For this you would need to use the URL.Path property paired with a regex list.

Best Regards,

Jon

belvincent

Then I have some trouble understanding this in the rule tracing. Can you explain a bit more?

This is my test ruleset with a single entry in the list for this example:

RuleSet.png

This is what I see in Rule Tracing when I watch another video ID:

RuleTracing.png

eelsasser

As you see, SmartMatch stops at the '?' that begins the parameters.

You could use "matches" to get the parameter, but what happens when v= is not right after the '?'?

https://www.youtube.com/watch?something=else&v=ABCDEFG

Then you would have to use a wildcard:

URL matches

https://www.youtube.com/watch?*v=ABCDEFG*

Or you could look for the parameter directly using:

Application.Name equals YouTube AND

URL.Path equals "/watch" AND

URL.HasParameter("v") equals true AND

URL.GetParameter("v") is in list ListOfVideoIDs

Where ListOfVideoIDs is a list of just the IDs you want to whitelist, like:

dfRzLP_KrBQ

8lMxpDYA5Wg

D56wGhy6qkk

LnU0Xh5_nIQ
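The parameter-based rule above can be modeled in Python with urllib.parse to show why it works no matter where 'v' appears in the query string. This is a hypothetical sketch, not gateway code: a host check stands in for Application.Name equals YouTube, parse_qs plays the role of URL.HasParameter and URL.GetParameter, and video_allowed is a made-up name.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow list mirroring ListOfVideoIDs above.
LIST_OF_VIDEO_IDS = {"dfRzLP_KrBQ", "8lMxpDYA5Wg", "D56wGhy6qkk", "LnU0Xh5_nIQ"}


def video_allowed(url, allowed_ids=LIST_OF_VIDEO_IDS):
    parts = urlsplit(url)
    # Simplified stand-in for "Application.Name equals YouTube".
    if parts.hostname not in ("youtube.com", "www.youtube.com"):
        return False
    # URL.Path equals "/watch"
    if parts.path != "/watch":
        return False
    # URL.HasParameter("v") / URL.GetParameter("v"): position in the query does not matter.
    video_ids = parse_qs(parts.query).get("v", [])
    return bool(video_ids) and video_ids[0] in allowed_ids


for url in ("https://www.youtube.com/watch?v=dfRzLP_KrBQ",
            "https://www.youtube.com/watch?something=else&v=dfRzLP_KrBQ",
            "https://www.youtube.com/watch?v=EoTqx9mVsu4"):
    print(url, video_allowed(url))
```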

belvincent

Thanks, Eric, it is much clearer now.

We already use the YouTube API v3 to allow selected videos, but some admins thought they could use full URLs as well. I will send out a communication about this.

pejacoby

Support pointed me here with questions about McAfee Client Proxy bypass entries. Is there a similar reference for the best way to set up bypasses in MCP? The documentation doesn't say much about formats or wildcards.

Version history: revision 1 of 1, last updated 01-03-2013 03:01 PM.