pcoates
Level 10

Custom log entry for Search terms

Hey Everyone,

I was just wondering whether anyone has developed a custom log that pulls search terms out of search engine requests, writes them to a log file, and potentially adds them as a custom column in Web Reporter.

Cheers,

Pete

Accepted Solutions
eelsasser
Level 15

Re: Custom log entry for Search terms

Yes. Partially.

I have created rules to pull out the search terms into a SearchTerms.log. I have attached the rules that are imported into the log handler. This creates a separate log file with the terms loaded into a different column.

It basically takes a URL like this:

http://www.google.com/#hl=en&safe=active&sout=1&sa=X&ei=Z8QOUMTTOuGh6wHd9YDQCw&ved=0CFIQvwUoAQ&q=McA...

and creates a log entry like this for easier reading:

[24/Jul/2012:11:51:08 -0400] 192.168.2.2 "McAfee web gateway 7" http://www.google.com/search?hl=en&safe=active&sout=1&sa=X&ei=Z8QOUMTTOuGh6wHd9YDQCw&ved=0CFIQvwUoAQ...

All it really does is take the q= parameter out of the URL, isolate it, and replace each '+' with a space.
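The transformation described above can be sketched outside the gateway in Python. This is only an illustration of the logic, not the attached rules: `extract_search_term` is a hypothetical helper, and on the gateway the work is done by the rule engine (URL.GetParameter and String.ReplaceAll). It assumes the term sits in the `q` parameter of the query string. Only the query string is examined, because the part of Google's address-bar URL after `#` is not sent in the HTTP request, which is presumably why the logged URL has the `/search?...` form. Like the rule, it turns `+` into a space and does not percent-decode anything else.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_search_term(url: str, param: str = "q") -> Optional[str]:
    """Return the value of `param` from the URL's query string with '+' turned
    into spaces, or None if the parameter is absent (hypothetical helper)."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        name, sep, value = pair.partition("=")
        # Exact name match, so a parameter such as "oq=" is not mistaken for "q=".
        if sep and name == param:
            # Same transformation as the rule: '+' becomes a space, nothing else is decoded.
            return value.replace("+", " ")
    return None


url = "https://www.google.ca/search?hl=en&safe=off&q=Test+search+term&oq=Test+search+term"
print(extract_search_term(url))  # Test search term
```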

I have not attempted to load them into WR as a user-defined field. I suppose it could be done, but I am usually reluctant to add a lot of user-defined fields to WR. In a large user environment, it could adversely affect performance if the hardware and the database are not optimized.

pcoates
Level 10

Re: Custom log entry for Search terms

Thanks for the quick response, Erik! I had just started working on the rule, using URL.HasParameter and comparing against popular engines (to get the q= and p= parameters). I thought about limiting it to the search engines category, but your URL path check is another approach I hadn't thought of right away.

Thanks again!


Cheers,

Pete

eelsasser
Level 15

Re: Custom log entry for Search terms

I've only tried this on the major Google, Yahoo, and Bing engines.

I just noticed it doesn't work for ask.com.

However, if you change the criteria for the first rule to this, it will work on ask.com:

Rule Criteria:

URL.HasParameter ("q") equals true AND (
    URL.Path equals "/search" OR
    URL.Path equals "/custom" OR
    URL.Path equals "/images" OR
    URL.Path equals "/images/search" OR
    URL.Path equals "/videosearch" OR
    URL.Path equals "/web" OR
    URL.Path equals "/pictures")
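To test which URLs the revised criteria would catch, the same check can be sketched in Python. This is a hypothetical helper, not gateway code. It assumes URL.HasParameter ("q") is true whenever a `q` parameter appears in the query string (even with an empty value) and that URL.Path is the path without the query string.

```python
from urllib.parse import parse_qs, urlsplit

# Paths taken from the rule criteria above.
SEARCH_PATHS = {
    "/search", "/custom", "/images", "/images/search",
    "/videosearch", "/web", "/pictures",
}


def is_search_request(url: str) -> bool:
    """Mirror of the rule criteria: a "q" parameter AND one of the listed paths."""
    parts = urlsplit(url)
    # keep_blank_values=True so that "q=" with no text still counts as present.
    params = parse_qs(parts.query, keep_blank_values=True)
    return "q" in params and parts.path in SEARCH_PATHS


for u in ("http://www.ask.com/web?q=web+gateway",
          "https://www.google.ca/maps?q=Toronto"):
    print(u, is_search_request(u))
```

Extending coverage to another engine is then just a matter of adding its path to the set, which matches how the ask.com fix above works.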

pcoates
Level 10

Re: Custom log entry for Search terms

Do you know why I am getting an "Invalid parser" error in Web Reporter when trying to process this search term log? Here is a sample of the log output, followed by the Web Reporter errors.

#time_stamp src_ip "search_term" "url"
[07/Aug/2012:13:07:23 -0400] X.X.X.X "Test search term" "https://www.google.ca/search?hl=en&safe=off&sclient=psy-ab&q=Test+search+term&oq=Test+search+term&gs..."
[07/Aug/2012:13:07:31 -0400] X.X.X.X "sample search term" "https://www.google.ca/search?hl=en&safe=off&sclient=psy-ab&q=sample+search+term&oq=sample+search+ter..."

2012-08-08 00:00:05,468 INFO  [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit[1239]) Begin processing file 'SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat'.
2012-08-08 00:00:05,468 INFO  [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit[1239]) Finish counting: [0 seconds to complete]  File='C:\Program Files\McAfee\Web Reporter\reporter\jboss\bin\..\..\tmp\logparsing\processing\SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat' contains 7 lines and 2299 bytes.
2012-08-08 00:00:05,468 ERROR [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit[1239]) Invalid parser, parser initialization failed, id='WebWasherV1'
2012-08-08 00:00:05,468 ERROR [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit[1239]) SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat: processing failed:Unable to determine log format due to invalid parser ID.
2012-08-08 00:00:05,655 INFO  [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit[1239]) Aborted processing file 'SearchTerm1208080000-X.X.X.X.log20120808-000005421.dat': 0 lines processed with 0 errors.

Thanks

eelsasser
Level 15

Re: Custom log entry for Search terms

That searchterm.log wasn't intended for Web Reporter consumption.

"url" is not a recognized value for WR log parsing.

You would have to create a custom log format and map the fields into the log source.
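As a rough illustration of what mapping the fields involves, the sketch below splits one line of the sample search term log into values named by its header line. It is hypothetical Python, not Web Reporter's parser, and it does not know which field names WR actually recognizes. It assumes the exact layout of the sample (bracketed timestamp, client IP, then two double-quoted values) and does not handle double quotes inside a search term. The IP address `10.0.0.1` is a placeholder for the masked `X.X.X.X`.

```python
import re
from typing import Dict, Optional

# [timestamp] client-ip "quoted value" "quoted value"
LINE_RE = re.compile(r'^\[([^\]]+)\] (\S+) "([^"]*)" "([^"]*)"$')


def parse_line(header: str, line: str) -> Optional[Dict[str, str]]:
    """Map the values of one log line onto the field names in the header."""
    # '#time_stamp src_ip "search_term" "url"' -> ['time_stamp', 'src_ip', 'search_term', 'url']
    names = [n.strip('"') for n in header.lstrip("#").split()]
    match = LINE_RE.match(line)
    if match is None or len(names) != 4:
        return None
    return dict(zip(names, match.groups()))


header = '#time_stamp src_ip "search_term" "url"'
line = ('[07/Aug/2012:13:07:23 -0400] 10.0.0.1 "Test search term" '
        '"https://www.google.ca/search?q=Test+search+term"')
print(parse_line(header, line))
```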

pcoates
Level 10

Re: Custom log entry for Search terms

Thanks.

I just pulled up the documentation on the standard formats and custom formats. For some reason my mind completely blanked out on Log sources and the format assigned within it. Fridays...

I'll have to get ahold of a premium license to perform some testing on custom log formats.

Thanks again

eelsasser
Level 15

Re: Custom log entry for Search terms

So I just tried this. Change the logLine to look like this:

Set User-Defined.logLine = DateTime.ToWebReporterString
+ " "
+ IP.ToString (Client.IP)
+ " ""
+ String.ReplaceAll (URL.GetParameter ("q"), "+", " ")
+ "" ""
+ Request.Header.FirstLine
+ """

Change the Headers to be:

time_stamp src_ip "search_term" "req_line"
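To check what a line written by this rule should look like, the format can be reproduced in Python. This is a sketch under assumptions: DateTime.ToWebReporterString is taken to produce the bracketed timestamp seen in the sample log earlier in the thread, and the request first line used below (`GET ... HTTP/1.1`) is a hypothetical example. `build_log_line` and `web_reporter_timestamp` are illustrative names, not gateway functions.

```python
from datetime import datetime, timedelta, timezone

# Header from the post above.
HEADER = 'time_stamp src_ip "search_term" "req_line"'


def web_reporter_timestamp(ts: datetime) -> str:
    # Assumed layout, copied from the sample log: [07/Aug/2012:13:07:23 -0400].
    # %b gives English month abbreviations under the default C locale.
    return ts.strftime("[%d/%b/%Y:%H:%M:%S %z]")


def build_log_line(ts: datetime, client_ip: str, q_value: str, first_line: str) -> str:
    """Timestamp, client IP, quoted search term ('+' -> space), quoted request line."""
    search_term = q_value.replace("+", " ")
    return f'{web_reporter_timestamp(ts)} {client_ip} "{search_term}" "{first_line}"'


ts = datetime(2012, 8, 7, 13, 7, 23, tzinfo=timezone(timedelta(hours=-4)))
print(HEADER)
print(build_log_line(ts, "192.168.2.2", "Test+search+term",
                     "GET http://www.google.com/search?q=Test+search+term HTTP/1.1"))
```

Keeping the full request line in its own quoted field is what lets Web Reporter derive the site name, as in the report below, without a separate "url" field.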

Create a new log source that only handles this file. Don't try to integrate it with your other access logs; that way you can generate reports from just this data.

Set the log source format to McAfee Web Gateway (Autodiscover).

Specify search_term as the user-defined column.

Then you can write detail reports like this:

Search Terms
Line | DateTime | User IP | Search Terms | Site Name
1 | 8/10/12 4:48:58 PM | 192.168.2.2 | - | www.google.com
2 | 8/10/12 4:49:05 PM | 192.168.2.2 | - | www.google.com
3 | 8/10/12 4:58:12 PM | 192.168.2.2 | - | www.google.com
4 | 8/10/12 5:10:25 PM | 192.168.2.2 | how to get away with murder | www.google.com
5 | 8/10/12 5:10:36 PM | 192.168.2.2 | how to get away with murder and not get caught | www.google.com
6 | 8/10/12 5:13:34 PM | 192.168.2.2 | how to get away with murder and not get caught | www.bing.com
7 | 8/10/12 5:13:52 PM | 192.168.2.2 | how to get away with murder and not get caught | search.yahoo.com
8 | 8/10/12 5:10:15 PM | 192.168.2.2 | how to pass time while waiting on death row | www.google.com
9 | 8/10/12 5:14:40 PM | 192.168.2.2 | how to successfully defend yourself for muder | search.yahoo.com
10 | 8/10/12 5:14:51 PM | 192.168.2.2 | how to successfully defend yourself for murder | www.bing.com
11 | 8/10/12 5:14:59 PM | 192.168.2.2 | how to successfully defend yourself for murder | www.google.com