
Web Reporter showing subhosts separately

Not sure if this is the right section, but here goes. When running reports, I notice that sites with different "subhosts" (subdomains) in the URL are shown as separate entries, which makes the report inaccurate. Streaming Media is a good example: you will see many lines listing many different site names, but they all finish with

I want it to ignore the subhosts, add up the traffic across all of them, and show a single bandwidth total for YouTube. Below is an example of what I see in a report:

189   Streaming Media, Media Sharing   449.54
190   Streaming Media, Media Sharing   448.79
191   Streaming Media, Media Sharing   448.67
192   Streaming Media, Media Sharing   445.76
193   Streaming Media, Media Sharing   444.70
194   Streaming Media, Media Sharing   441.94
195   Streaming Media, Media Sharing   441.19
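To make the goal concrete, here is a minimal Java sketch of rolling per-host totals up to one figure per domain. The "keep the last two labels" rule, the class and method names, and the host names and figures are all made up for illustration; this is not how Web Reporter works internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SiteTotals {
    // Naive roll-up rule (an assumption for this sketch): keep only the last
    // two labels of the host, so "a.b.youtube.com" becomes "youtube.com".
    // Two-level country suffixes and IP addresses are not handled here.
    static String rollUp(String host) {
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    // Adds up bandwidth per rolled-up domain, keeping first-seen order.
    static Map<String, Double> totals(Map<String, Double> perHost) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : perHost.entrySet()) {
            out.merge(rollUp(e.getKey()), e.getValue(), Double::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical hosts and figures, not taken from the report above.
        Map<String, Double> perHost = new LinkedHashMap<>();
        perHost.put("a.youtube.com", 10.5);
        perHost.put("b.youtube.com", 20.25);
        perHost.put("www.facebook.com", 5.0);
        System.out.println(totals(perHost)); // {youtube.com=30.75, facebook.com=5.0}
    }
}
```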


Re: Web Reporter showing subhosts separately


You are correct: Web Reporter does not strip subdomains from the hostname. If you want to report on parent domains rather than individual subhosts, they will need to be put into a user-defined column that is populated by a ruleset.

I haven't tested this, and I'm relatively certain this won't be 100% correct, but the regex for the ruleset would need to be something like this.


The URLs look like this

"GET HTTP/1.1"
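If the rule only sees the request line, the host first has to be pulled out of it. A minimal Java sketch, assuming the line has the usual three parts ("GET", an absolute URL, "HTTP/1.1"); the class name and the sample URL are hypothetical, since the real URL in this post was not preserved:

```java
import java.net.URI;

class RequestHost {
    // Pulls the hostname out of a request line shaped like
    // "GET <absolute-url> HTTP/1.1". Returns null when the line does not have
    // three whitespace-separated parts, the middle part is not a parseable
    // URI, or the URI has no host component.
    static String hostOf(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length != 3) {
            return null;
        }
        try {
            return URI.create(parts[1]).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Made-up request line for illustration.
        System.out.println(hostOf("GET http://www.youtube.com/watch?v=abc123 HTTP/1.1"));
        // prints: www.youtube.com
    }
}
```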

Here is a breakdown of what I was trying to achieve:

.*       matches any character, zero or more times.

\.*     matches zero or more periods, so a period may or may not be there (a single optional period would strictly be \.?)

(stuff between the parentheses is what we will be keeping)

[a-zA-Z\-0-9]+       matches letters, digits and dashes, one or more required

\.      matches a single period (required)

[a-zA-Z]+       matches letters only, one or more required

\/*              matches zero or more forward slashes, i.e. the slash is optional

.*           matches any character, zero or more times.
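Putting the pieces of the breakdown back together gives roughly the pattern below (my reconstruction; the regex itself did not survive in this thread). A quick Java check, assuming the rule is applied to the whole string as Matcher.matches() does, shows one reason it won't be 100% correct: the leading greedy .* backtracks only as far as it has to, so on a bare host the group keeps just one character before the final dot. The second, hypothetical pattern works on the host alone and consumes whole "label." chunks, which leaves the last two labels in the group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BreakdownRegex {
    // Pattern assembled piece by piece from the breakdown (a reconstruction).
    static final Pattern AS_DESCRIBED =
        Pattern.compile(".*\\.*([a-zA-Z\\-0-9]+\\.[a-zA-Z]+)\\/*.*");

    // Host-only variant: the greedy group of "label." chunks gives back
    // just enough for the capture to hold the last two labels.
    static final Pattern HOST_ONLY =
        Pattern.compile("^(?:[a-zA-Z0-9-]+\\.)*([a-zA-Z0-9-]+\\.[a-zA-Z]+)$");

    // Returns group 1 if the whole input matches, otherwise null.
    static String capture(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture(AS_DESCRIBED, "www.youtube.com")); // e.com
        System.out.println(capture(HOST_ONLY, "www.youtube.com"));    // youtube.com
    }
}
```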

Be aware that many countries outside the US have two levels at the end of their domains. For example or

The regex above would only give you the "" or "" part.

A better solution would put a list of these known two-level endings into an optional group. For example, something like this:


Since + is a greedy operator, it would keep consuming the top-level domains in the list. In front of that, you could optionally capture one more domain level with "[a-zA-Z\-0-9]*". I say optional because you might want to allow for the possibility that "" is the complete domain.
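Here is one way to wire a list of known two-level endings into a Java pattern. It is arranged differently from the greedy approach described above: the subdomain prefix uses a lazy *? so the engine tries the fewest skipped labels first, which makes "bbc.co.uk" win over just "co.uk". The suffix list, class and method names are illustrative assumptions, not a complete list.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixListRegex {
    // Illustrative two-level endings only; extend for the countries in your logs.
    // The lazy *? skips as few leading labels as possible, so with a greedy *
    // "news.bbc.co.uk" would end up as "co.uk" instead of "bbc.co.uk".
    static final Pattern REGISTERED_DOMAIN = Pattern.compile(
        "^(?:[a-zA-Z0-9-]+\\.)*?"
        + "([a-zA-Z0-9-]+\\.(?:co\\.uk|org\\.uk|com\\.au|co\\.jp|[a-zA-Z]+))$");

    // Returns the captured domain, or null if the host does not fit.
    static String registeredDomain(String host) {
        Matcher m = REGISTERED_DOMAIN.matcher(host);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(registeredDomain("news.bbc.co.uk"));      // bbc.co.uk
        System.out.println(registeredDomain("www.youtube.com"));     // youtube.com
        System.out.println(registeredDomain("shop.example.com.au")); // example.com.au
        System.out.println(registeredDomain("192.168.1.10"));        // null
    }
}
```

A null result is where a fallback rule that keeps the whole host would take over.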

Lastly, you might want to add a safe default rule that keeps the entire host. Hopefully that's enough to get you started. Web Reporter uses Java regex notation, if you need to find documentation.


Re: Web Reporter showing subhosts separately

Another thought: keep in mind that the hostname can perfectly well be an IP address. That's another case you would need to handle, and I'm sure there are others I haven't thought of yet.
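The IP-address case, together with the safe default rule of keeping the entire host, might look like this in Java. The names are hypothetical, the IPv4 check is deliberately loose (it does not verify octets are 255 or less), and anything else the pattern does not recognise, such as IPv6 literals or single-label intranet names, simply falls through to the default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostColumn {
    // Dotted-quad IPv4 shape; it does not check that each octet is <= 255.
    static final Pattern IPV4 = Pattern.compile("\\d{1,3}(?:\\.\\d{1,3}){3}");

    // Keeps the last two labels when the final label is alphabetic.
    static final Pattern LAST_TWO_LABELS =
        Pattern.compile("(?:[a-zA-Z0-9-]+\\.)*([a-zA-Z0-9-]+\\.[a-zA-Z]+)");

    static String columnValue(String host) {
        if (IPV4.matcher(host).matches()) {
            return host; // IP address: nothing to strip, keep it as-is
        }
        Matcher m = LAST_TWO_LABELS.matcher(host);
        // Safe default: anything the pattern does not recognise keeps the entire host.
        return m.matches() ? m.group(1) : host;
    }

    public static void main(String[] args) {
        System.out.println(columnValue("10.1.2.3"));        // 10.1.2.3
        System.out.println(columnValue("cdn.example.org")); // example.org
        System.out.println(columnValue("intranet"));        // intranet
    }
}
```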


Re: Web Reporter showing subhosts separately

Thanks for the suggestion. I understand the points where we might run into problems. How about I make it a little easier? I'm specifically looking to combine YouTube and Facebook hits only. Those should be easier to nail down, since we know which domain names we are working with. What do you think?


Re: Web Reporter showing subhosts separately

OK, I have made some headway with this. I have created a rule that takes basically anything ending in and turns it into just
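A rule limited to two known sites can be much stricter than a general one. This Java sketch assumes the domains are youtube.com and facebook.com (the exact ones in the post were not preserved); the class and method names are hypothetical. Requiring a dot in front of the domain keeps look-alikes such as "notyoutube.com" out.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YoutubeFacebookRule {
    // youtube.com or facebook.com, with any number of subdomains in front,
    // matched case-insensitively.
    static final Pattern SITES =
        Pattern.compile("(?i)(?:.+\\.)?(youtube\\.com|facebook\\.com)");

    // Collapses matching hosts to the bare domain; other hosts pass through unchanged.
    static String collapse(String host) {
        Matcher m = SITES.matcher(host);
        return m.matches() ? m.group(1).toLowerCase(Locale.ROOT) : host;
    }

    public static void main(String[] args) {
        System.out.println(collapse("r4.sn-abc.youtube.com")); // youtube.com
        System.out.println(collapse("m.facebook.com"));        // facebook.com
        System.out.println(collapse("notyoutube.com"));        // notyoutube.com
    }
}
```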

For my needs this is perfect. The second part is applying this rule to a user-defined column. I want it to populate a "new" site option using this rule, but when I tell it to use a log record from the source data, the "site" field is not listed; URL is the closest option. How does Web Reporter work out the Site field? Does it use the URL field or a particular field in the log file header?



Re: Web Reporter showing subhosts separately

Unfortunately, you cannot modify the pre-populated columns such as URL or site name without modifying the access log itself before importing it. User-defined columns essentially let you pull custom data from the log and store it as an extra value.
