We have been using webreporter for some time. I recently rebuilt our service on newer hardware because performance was too poor. The log parsing stage is currently working OK, but reporting is still slower than I'd like, so I'm hoping someone can suggest ways I can improve our setup.
What we have at present is a webreporter server running 5.2.0.02 1105 on RHEL 5.7 x64. The server is an IBM HS21 with 16GB of memory and two quad-core 3.0GHz CPUs, so it should have plenty of grunt. The storage is 250GB connected over Fibre Channel to an IBM XIV SAN.
I have a second of these blades running RHEL 6.1 x64 and MySQL 5.1.52, again with 250GB of storage.
I'm pulling in data from 12 WW6 servers; 8 of them are shifting 300GB per day, the others 90-100GB. This represents about 60-80 million lines per day.
When doing the import I'm processing 20-30k lines per second, which is fine. I've set webreporter up as follows:
5 Log processing Jobs
Queue throttle threshold 150000 records
Site request cache 150000
Aggregate record cache 1500000
Java is set to consume 10GB RAM
We do not store logs for detailed reporting; we condense them into page views.
Monitoring the server during the log import shows it is pushing the CPU hard, which is good.
When doing reporting, though, the DB server will sit querying the database with just one thread until it has gathered all the data for one part of the report. I've tried tuning the MySQL DB a bit, but I'm no DBA and have no real idea what I'm doing in this area. In /etc/my.cnf I've set the following parameters:
Also, memory usage on the DB server is very low, with MySQL using only about 1GB.
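For reference, this is roughly the direction I understand InnoDB tuning should go on a dedicated 16GB DB box — the parameter names are standard MySQL 5.1 options, but the values below are placeholders, not what I've actually set:

```ini
# /etc/my.cnf -- illustrative InnoDB settings only; values are placeholders
[mysqld]
# Main InnoDB cache; commonly sized to a large share of RAM on a dedicated DB server
innodb_buffer_pool_size = 8G
# Larger redo logs reduce checkpoint pressure during heavy imports
innodb_log_file_size    = 256M
innodb_log_buffer_size  = 8M
# Trade a little durability for import speed (flush once per second, not per commit)
innodb_flush_log_at_trx_commit = 2
# Avoid double-buffering through the OS page cache
innodb_flush_method     = O_DIRECT
```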
The server has already taken a noticeable slowdown over the last week or so since I set it up.
A report covering all log sources for yesterday took 30 minutes to run; it covered the following areas:
Bandwidth: Volume by Hour (Chart)
Top 30 Sites by Hits (Chart)
Top 30 Sites by Volume (Chart)
Top 20 Content Types by Volume (Chart)
Bandwidth Volume by Site
Top 100 IPs by Volume, with percent of Bytes, Hits, Hits - Blocked and percent of Hits
Log Source by Volume
Trusted source protection area (Detail)
That report was originally taking under 1 minute.
My concern is that I need to be able to produce a similar report covering a whole month for specific customers; in that instance we will filter the customers by IP.
If anyone has any thoughts on improving the DB performance I'd much appreciate it!
As to the other part of the question, I'm seeing a lot of log lines being ignored due to spaces in the referer string, such as:
2011-09-07 00:00:32,128 ERROR [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) invalid line at linenumber 22245 '10.122.145.32 - "-" [06/Sep/2011:11:19:16 +0000] "GET http://www.slideshare.net/rss/slideshow/id/7148126 HTTP/1.1" 200 2418 "http://static.slidesharecdn.com/swf/menu.swf?embedCode=<div style="width:425px" id="__ss_7148126"> <strong style="display:block;margin:12px" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" "ia, md" - "application/xml" "Teaching_Resources" 0.730 "-" -' from file 'access1109061119.merged.log20110907-000027304.dat'
and also due to buggy User-Agent strings used by Sophos:
2011-09-07 00:00:47,122 ERROR [securecomputing.smartfilter.logparsing.LogAudit] (LogAudit) invalid line at linenumber 706 '10.122.204.140 - "-" [06/Sep/2011:11:33:07 +0000] "GET http://es-web-4.sophos.com/update/ccosx/Sophos%20Anti-Virus%20Home%20Edition.mpk g/Contents/Packages/SophosAV.mpkg/Contents/Resources/agen-tgh.ide HTTP/1.1" 200 5142 "" "<UA a="Mac" c="A-Wrights-iMac.local" u="cca0009aa5e" v="7.3.3" />" "-" - "application/octet-stream" "-" 1.614 "-" -' from file 'access1109061135.merged.log20110907-000040969.dat'
I already have the following in my "exclude log lines matching" setting:
.*"<UA u="[A-Z0-9\-]*" c="[A-Z0-9\-]*"/>".*$|.*SophosUpdateManager.*
Can anyone suggest something to capture those other two cases as well?
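To make the two failure cases concrete, patterns along these lines do match short fragments of the sample lines above. This is only a rough sketch using Python's re module to check the patterns; the regexes are my own guesses and haven't been tested against real traffic or webreporter's own matcher:

```python
import re

# Short fragments taken from the two rejected log lines above
referer_fragment = '"http://static.slidesharecdn.com/swf/menu.swf?embedCode=<div style="width:425px" id="__ss_7148126">'
sophos_fragment = '"<UA a="Mac" c="A-Wrights-iMac.local" u="cca0009aa5e" v="7.3.3" />"'

# Case 1: referer strings containing embedded HTML (hence spaces and quotes)
html_referer = re.compile(r'.*"[^"]*<(?:div|strong)\b.*')

# Case 2: the Sophos updater's XML-ish "<UA .../>" user agent,
# with attributes in any order (the existing exclude only matches one ordering)
sophos_ua = re.compile(r'.*"<UA\b[^>]*/>".*')

assert html_referer.match(referer_fragment)
assert sophos_ua.match(sophos_fragment)
```

If the patterns hold up, the two alternatives could simply be appended to the existing exclude expression with `|`, the same way the SophosUpdateManager case is handled now.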