Hello, we would like to get feedback from users or McAfee experts on whether WebReporter is a sound tool for reporting on web traffic in large companies.
Here is the situation:
We have two proxy farms, each built with ten load-balanced WebGateway MWG-5500 appliances. For each site we have just installed a server running the WebReporter service. After several days of optimising, tuning and rearranging the settings, we are still unable to import the logs generated in a single day. Based on your experience, we would like to understand whether the problem lies in our implementation, design and decisions, in which case it can be worked out, or whether it is due to WebReporter's inability to deal with large data volumes.
A single farm generates approximately 150 GBytes of logs per day (220 GBytes if HTTP 407 errors are logged). The logs are rotated when they reach 100 MBytes, which gives us approximately 1,500 log files per day.
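To put those volumes in perspective, here is a rough back-of-the-envelope calculation of the import rate needed just to keep up with one day of logs. It uses only the figures quoted above and assumes decimal units (1 GByte = 1,000 MBytes); the function name `import_budget` is ours, purely for illustration.

```python
# Rough import budget for one farm, using the figures quoted in the post.
DAILY_LOG_GB = 150               # raw logs per day, without HTTP 407 entries
ROTATE_MB = 100                  # rotation threshold per log file
SECONDS_PER_DAY = 24 * 60 * 60

def import_budget(daily_gb, rotate_mb, workers=1):
    """Return (files per day, seconds available per file, required MB/s)."""
    files = daily_gb * 1000 // rotate_mb          # assumes 1 GB = 1000 MB
    seconds_per_file = SECONDS_PER_DAY * workers / files
    mb_per_second = daily_gb * 1000 / SECONDS_PER_DAY
    return files, seconds_per_file, mb_per_second

files, secs, rate = import_budget(DAILY_LOG_GB, ROTATE_MB)
print(files, round(secs, 1), round(rate, 2))      # 1500 57.6 1.74
```

In other words, a strictly sequential import has under a minute per 100 MByte file. With 32 jobs genuinely running in parallel, each job could take about half an hour per file and still finish within the day.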
To comply with national laws and regulations on user privacy, we had to develop a two-stage process: the original WebGateway logs are moved every night to a secure, certified log collection facility. Just before this transfer, a scheduled script makes an anonymised copy of these logs (the user name is replaced by a hash with a random seed) and transfers it to the WebReporter node, where the files land compressed (approximately 16 GBytes of gzip-compressed logs per day), each in its own directory (one per source appliance).
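For readers who want a concrete picture of the anonymisation step, the sketch below shows one way such a replacement could be done. It is not our production script: the keyed hash (HMAC-SHA256 with a random key standing in for the "random seed"), the space-separated layout, the position of the user name (`user_field=1`) and the sample line are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

def pseudonymise(user, key):
    """Map a user name to a fixed-length token; same key gives same token."""
    return hmac.new(key, user.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def anonymise_line(line, key, user_field=1):
    """Replace the user name in one log line.

    Assumes space-separated fields with the user name at index `user_field`
    (hypothetical layout); unauthenticated entries ("-") are left untouched.
    """
    fields = line.rstrip("\n").split(" ")
    if len(fields) > user_field and fields[user_field] not in ("", "-"):
        fields[user_field] = pseudonymise(fields[user_field], key)
    return " ".join(fields)

# A fresh random key per run plays the role of the random seed.
key = secrets.token_bytes(32)
sample = "2014-05-12T10:15:00 jsmith 10.0.0.7 200 GET http://example.com/"  # invented line
print(anonymise_line(sample, key))
```

Because the key is the same for the whole run, a given user always maps to the same token within that run, so per-user reports remain possible without exposing the name.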
Once received, WebReporter starts to import each log file (32 maximum concurrent log processing jobs are set under Administration, Performance, Advanced options). Ten log sources are defined, one for each directory, corresponding to a source appliance.
With MySQL (either internal or external) the first job took a couple of minutes, while the sixth or seventh (of more than a thousand!) took three to four hours; we were never able to successfully import all the logs for a single day.
We are now trying Oracle 11g, as suggested, but to no avail. The import takes even longer, probably due to the increased complexity of both the engine and the structure (referential integrity and system tables). We imported five logs in a day (0.3% of the daily total).
So my question is: do we have to drop WebReporter and look for an alternative solution? Do we need to improve our hardware? Could we develop an in-house solution that imports the logs into Oracle in parallel and then use WebReporter to work with that data (is a detailed description of the import operations available)?
Thank you very much for your advice and suggestions.
P.S. We are using a single HP ProLiant DL580 G8 server per site, with two Intel Xeon E5-2660 processors running at 2.20GHz, 128 GBytes of RAM and 2 Terabytes of storage provided by 6 hard disk drives in RAID. WebReporter has 16 GBytes of RAM assigned, and Oracle has 96 GBytes.
At any moment during the log import, A SINGLE CORE is busy (100%), while the remaining 31 cores are idle. The RAM is only used by the operating system to cache the files. The situation was the same with MySQL, so even with partitioning enabled the application seems to be designed for sequential import (but then why is there a concurrent jobs setting in the Administration pane?) and not for parallel processing.
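To illustrate the kind of in-house parallel step we have in mind, here is a minimal sketch that spreads the per-file work across cores using Python's standard `concurrent.futures` module. It only decompresses each rotated log and counts its entries; the database load is deliberately left out, since we do not know the schema WebReporter expects. The function names, the `*.gz` pattern and the assumption that header lines start with `#` are ours.

```python
import gzip
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def summarise_file(path):
    """Decompress one rotated log and return (path, number of log entries).

    Lines starting with '#' are treated as headers and skipped (assumption).
    """
    entries = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                entries += 1
    return str(path), entries

def process_day(log_dir, workers=32):
    """Process every .gz file under log_dir, one file per worker process."""
    files = sorted(str(p) for p in Path(log_dir).rglob("*.gz"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # Each file is handled by a separate process, so all cores can be used.
        return dict(pool.map(summarise_file, files))
```

A call such as `process_day("/path/to/anonymised/logs")` (a placeholder path), made from a script guarded by `if __name__ == "__main__":`, would return a mapping from file path to entry count. In a real pipeline the counting step would be replaced by parsing and a bulk load into the database.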