I have created rules to pull out the search terms into a SearchTerms.log. I have attached the rules that are imported into the log handler. This creates a separate log file with the terms loaded into a different column.
It basically takes a URL like this:
http://www.google.com/#hl=en&safe=active&sout=1&sa=X&ei=Z8QOUMTTOuGh6wHd9YDQCw&ved=0CFIQvwUoAQ&q=McAfee+web+gateway+7&spell=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=9cd3aac8854ea7f3&biw=1280&bih=891&sout=1&surl=1
and creates a log entry like this for easier reading:
[24/Jul/2012:11:51:08 -0400] 192.168.2.2 "McAfee web gateway 7" http://www.google.com/search?hl=en&safe=active&sout=1&sa=X&ei=Z8QOUMTTOuGh6wHd9YDQCw&ved=0CFIQvwUoAQ&q=McAfee+web+gateway+7&spell=1&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&fp=9cd3aac8854ea7f3&biw=1280&bih=891&tch=3&ech=4&psi=-MMOUMyCH6X26AG6loCIDg.1343144956749.1&wrapid=tlif134314495674910&sout=1&surl=1
All it really does is take the q= parameter out of the URL, isolate it, and replace each '+' with a space.
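For reference, the same extraction can be sketched in Python. This is just an illustration of the logic; the actual handler rule does this with MWG's URL.GetParameter and String.ReplaceAll properties:

```python
from urllib.parse import urlsplit, parse_qs

def extract_search_term(url):
    """Pull the q= parameter out of a search URL. parse_qs also
    decodes '+' back into spaces, matching the rule's ReplaceAll step."""
    params = parse_qs(urlsplit(url).query)
    terms = params.get("q")
    return terms[0] if terms else None

print(extract_search_term(
    "http://www.google.com/search?hl=en&q=McAfee+web+gateway+7&spell=1"))
# McAfee web gateway 7
```

Note that Google's instant-search URLs (like the one above) carry the parameters in the #-fragment rather than the query string, which is why the rule works off the rewritten /search request.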
I have not attempted to load them into WR as a user-defined field. I suppose it could be done, but I am usually reluctant to add a lot of user-defined fields to WR. In a large user environment, it could adversely affect performance if the hardware and the database are not optimized.
Thanks for the quick response, Erik! I had just started working on the rule using URL.HasParameter and comparing it against popular engines (to get the q= and p= parameters). I thought about limiting it to the search-engine category, but your URL path check is another way I hadn't thought of right away.
I've only tried this with the major engines: Google, Yahoo, and Bing.
I just noticed it doesn't work for ask.com.
However, if you change the criteria for the first rule to this, it will work on ask.com:
URL.HasParameter ("q") equals true AND (
URL.Path equals "/search" OR
URL.Path equals "/custom" OR
URL.Path equals "/images" OR
URL.Path equals "/images/search" OR
URL.Path equals "/videosearch" OR
URL.Path equals "/web" OR
URL.Path equals "/pictures")
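The combined criteria translate roughly to this Python check (the path list is taken from the rule above; "/web" is the ask.com endpoint that the extended list adds):

```python
from urllib.parse import urlsplit, parse_qs

# Search-endpoint paths from the rule criteria above.
SEARCH_PATHS = {"/search", "/custom", "/images", "/images/search",
                "/videosearch", "/web", "/pictures"}

def is_search_request(url):
    """Mirror of the rule: a q= parameter plus a known search path."""
    parts = urlsplit(url)
    return parts.path in SEARCH_PATHS and "q" in parse_qs(parts.query)

print(is_search_request("http://www.ask.com/web?q=web+gateway"))  # True
print(is_search_request("http://www.example.com/home?q=foo"))     # False
```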
Do you know why I am getting an invalid parser error in Web Reporter when trying to process this search term log? Here is a sample log output, and my Web Reporter failures.
#time_stamp src_ip "search_term" "url"
[07/Aug/2012:13:07:23 -0400] X.X.X.X "Test search term" "https://www.google.ca/search?hl=en&safe=off&sclient=psy-ab&q=Test+search+term&oq=Test+search+term&gs_l=hp.3..0j0i30j0i8i30j0i22.1303.3648.0.4083.16.16.0.0.0.1.486.4115.0j5j3j5j2.15.0...0.0...1c.Y1lm-0Dj5Mo&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=f1de276a028ad7d9&biw=1600&bih=799&tch=1&ech=1&psi=RUshUMHGIajL0QHy4IGICg.1344359238809.3"
[07/Aug/2012:13:07:31 -0400] X.X.X.X "sample search term" "https://www.google.ca/search?hl=en&safe=off&sclient=psy-ab&q=sample+search+term&oq=sample+search+term&gs_l=hp.3..0i30.3664.7919.1.822.214.171.124.126.96.36.1998.3534.0j10j5j1j1.17.0...0.0...1c.4hfrJeTB-5I&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=f1de276a028ad7d9&biw=1600&bih=799&tch=1&ech=1&psi=RUshUMHGIajL0QHy4IGICg.1344359238809.5"
2012-08-08 00:00:05,468 INFO [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) Begin processing file 'SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat'.
2012-08-08 00:00:05,468 INFO [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) Finish counting: [0 seconds to complete] File='C:\Program Files\McAfee\Web Reporter\reporter\jboss\bin\..\..\tmp\logparsing\processing\SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat' contains 7 lines and 2299 bytes.
2012-08-08 00:00:05,468 ERROR [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) Invalid parser, parser initialization failed, id='WebWasherV1'
2012-08-08 00:00:05,468 ERROR [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat: processing failed:Unable to determine log format due to invalid parser ID.
2012-08-08 00:00:05,655 INFO [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) Aborted processing file 'SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat': 0 lines processed with 0 errors.
That searchterm.log wasn't intended for Web Reporter consumption.
"url" is not a recognized value for WR log parsing.
You would have to create a custom log format and map the fields into the log source.
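Until that custom format is defined, a quick way to sanity-check the file layout is a regex matching the header order (time_stamp, src_ip, quoted search_term, quoted url). This is only a hypothetical sketch for eyeballing the file, not WR's actual parser:

```python
import re

# Hypothetical parser for the SearchTerms.log layout:
# [timestamp] src_ip "search_term" "url"
LINE_RE = re.compile(
    r'^\[(?P<time_stamp>[^\]]+)\]\s+'
    r'(?P<src_ip>\S+)\s+'
    r'"(?P<search_term>[^"]*)"\s+'
    r'"(?P<url>[^"]*)"$'
)

sample = ('[07/Aug/2012:13:07:23 -0400] 192.168.2.2 '
          '"Test search term" "https://www.google.ca/search?q=Test+search+term"')
m = LINE_RE.match(sample)
print(m.group("search_term"))  # Test search term
```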
I just pulled up the documentation on the standard formats and custom formats. For some reason my mind completely blanked on log sources and the format assigned within them. Fridays...
I'll have to get ahold of a premium license to perform some testing on custom log formats.
So I just tried this. Change the logLine to look like this:
Set User-Defined.logLine = DateTime.ToWebReporterString
+ " "
+ IP.ToString (Client.IP)
+ " """
+ String.ReplaceAll (URL.GetParameter ("q"), "+", " ")
+ """ """
Change the Headers to be:
time_stamp src_ip "search_term" "req_line"
Create a new log source that will handle only this file. Don't try to integrate it with your other access logs; that way you can generate reports with just this data.
Format the log source as McAfee Web Gateway (Autodiscover)
Specify the user-defined column as search_term
Then you can write detail reports like this:
Search Terms
Line  DateTime            User IP      Search Terms                                     Site Name
1     8/10/12 4:48:58 PM  192.168.2.2  -                                                www.google.com
2     8/10/12 4:49:05 PM  192.168.2.2  -                                                www.google.com
3     8/10/12 4:58:12 PM  192.168.2.2  -                                                www.google.com
4     8/10/12 5:10:25 PM  192.168.2.2  how to get away with murder                      www.google.com
5     8/10/12 5:10:36 PM  192.168.2.2  how to get away with murder and not get caught   www.google.com
6     8/10/12 5:13:34 PM  192.168.2.2  how to get away with murder and not get caught   www.bing.com
7     8/10/12 5:13:52 PM  192.168.2.2  how to get away with murder and not get caught   search.yahoo.com
8     8/10/12 5:10:15 PM  192.168.2.2  how to pass time while waiting on death row      www.google.com
9     8/10/12 5:14:40 PM  192.168.2.2  how to successfully defend yourself for muder    search.yahoo.com
10    8/10/12 5:14:51 PM  192.168.2.2  how to successfully defend yourself for murder   www.bing.com
11    8/10/12 5:14:59 PM  192.168.2.2  how to successfully defend yourself for murder   www.google.com