Let me explain my current landscape. We have Secure Web Reporter 5.x with three log sources for 3 proxy servers (let’s call them A, B and C). Log files are received on a daily basis for each of the log sources.
When importing, the situation is the following:
I have done a dozen or more log imports, each followed by a manual task to clean up those days at the database level. As far as I understand, there is no way to work around this unless I manually (or automatically) move log entries from one day to another so that each day only contains records for the day being processed.
Any ideas? Is there any other way to solve this? Can I tell SWR to ignore the fact that log files can contain information for the previous day? What is the best method for a mass import of a complete month?
By the way, it would be a really good and much appreciated improvement to have a command-line import tool that is a bit more verbose and more flexible. I find the current way of importing a bit unfriendly.
I have another question for which I could not find an answer in either the documentation or the KB.
I hope you can help me solve this as I have already spent too much time on this.
This issue comes about because Web Reporter may have imported a "newer" log file, and will then not import an "older" file. This is a necessary evil so that there is no redundant data. In situations like this it is best to push the data to Web Reporter. See below for instructions on importing in bulk. The below assumes that you have a log source set up to accept incoming log files, and that the username and password correspond to that log source in Web Reporter. Also, the below is written for use with Webwasher 6.x but can be adapted to any Linux-based system with LFTP.
# MANUALLY FTPing LOGS TO WEBREPORTER FROM WEBWASHER #
# Access logs are named in the following fashion:
# The below command will allow you to upload directly from the Web Gateway to Web Reporter, you will need to fill in the blanks for the [YYMMDDhhmm]:
lftp -u [LOGSOURCE-USERNAME]:[LOGSOURCE-PASSWORD] -e 'set ftp:use-feat off; mput /opt/webwasher-csm/logs/access[YYMMDDhhmm]*.log*; quit' -p 9121 [WEB-REPORTER_IP]
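To push a whole month this way, the command above can be wrapped in a loop over the days. The following is only a sketch: the function name print_push_commands is hypothetical, it assumes the YYMMDD part of the file name can be used as a prefix for the wildcard, and it only prints the commands (a dry run) so they can be reviewed before being run.

```shell
# Hypothetical helper: print one lftp upload command per day of a month.
# Arguments: log source user, password, Web Reporter IP, YYMM prefix, last day of month.
# Assumes access logs live in /opt/webwasher-csm/logs and are named access[YYMMDDhhmm]*.log*
# The body runs in a subshell ( ... ) so its variables do not leak into the caller.
print_push_commands() (
    user=$1 pass=$2 host=$3 yymm=$4 last_day=$5
    day=1
    while [ "$day" -le "$last_day" ]; do
        dd=$(printf '%02d' "$day")   # zero-pad the day: 1 -> 01
        printf "lftp -u %s:%s -e 'set ftp:use-feat off; mput /opt/webwasher-csm/logs/access%s%s*.log*; quit' -p 9121 %s\n" \
            "$user" "$pass" "$yymm" "$dd" "$host"
        day=$((day + 1))
    done
)
```

For January 2011 this would be called as print_push_commands [LOGSOURCE-USERNAME] [LOGSOURCE-PASSWORD] [WEB-REPORTER_IP] 1101 31. The commands come out oldest day first, which matters because Web Reporter will not import an "older" file after a "newer" one.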
Hope this moves you in the right direction, it should get you over the hump of importing the data at least.
Thanks for your prompt reply. I will for sure try your method.
Since I posted my message I have made some progress. As mentioned before, I have one log source that is composed of four proxy servers, each of them producing one log file per day. When I try to import each log file, SWR always fails with the message "Too old".
I merged all four log files for each day into one huge log file with the Unix sort command, and it now seems to be importing correctly.
So my assumption is that SWR does not like having more than one log file for a specific log source. Would it be better to have four log sources, one for each of the proxy servers?
I would say it is a good rule of thumb to have one log source per proxy, because otherwise you are leaving it up to Web Reporter to reach into a mixed bag and determine which data is the latest. You also have to realize that it can only go by file attributes. It would be nice if it could peek into the file to see what data it contains, but that's not possible.
If it were me, I would push all my data to a central log repository or server (where it would remain for X number of months), and each proxy would have its own repository folder. Then Web Reporter would collect from the individual folders (separate log sources).
My personal experience is that it is best to have one log source per proxy.
You can generate reports based on an individual proxy's traffic or the combined traffic across all of them. You can get more insight into how they are being utilized that way.
Merging log files from multiple proxies can have unwanted side effects. Unless the combined file is small (for example, 100 MB uncompressed), you are likely to see very severe performance issues when processing it. The caching algorithms used for log parsing are specifically designed for one log source per proxy.
So since one log source (configured to actively collect logs) can only collect from one location, I assume you are pulling from a local directory or FTP server? Please let me know which OS Web Reporter is installed on if you are collecting log files from a local directory because there is another potential issue.
In short, the recommendation to have one log source per proxy is the best solution.
I have just checked SWR and observed the following. Let me explain the details of the failure:
1. Copied 10GB log file into SWR directory specified in log source.
2. Clicked on "Process Now"
3. SWR starts importing. It copies log file into parsing directory. There is enough free space on the directory therefore this should not be an issue.
4. After 4 hours of watching it with no activity (nothing moves on the Statistics tab and the server remains under low load and memory usage), I decide to click "Delete" on the import task.
5. I perform a cleanup by clicking "Database Maintenance" -> "Manual maintenance" -> "Delete summary and detailed records" for the date that failed to import.
6. After this finishes, I try to import the same 10 GB log file again, first checking that the log parsing directory is empty. The following messages appear in the log parsing log file:
2011-02-10 19:20:54,107 INFO [securecomputing.smartfilter.logparsing.LogAudit] (FileGetter) Begin copying log files from 'X:\2011.01.12.log' into log parsing
2011-02-10 19:20:54,120 INFO [securecomputing.smartfilter.logparsing.LogAudit] (FileGetter) Skipping file 'X:\2011.01.12.log' because it is too old to be imported.
2011-02-10 19:20:54,120 INFO [securecomputing.smartfilter.logparsing.LogAudit] (FileGetter) End copying log files from 'X:\2011.01.12.log' into log parsing
On the SWR web interface I get "Completed - No files available".
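To see quickly which files were rejected this way, the log parsing log can be filtered for the "too old" message. A minimal sketch, assuming the message format of the FileGetter entries quoted above and a machine with sed available; the function name list_too_old_files is hypothetical:

```shell
# Hypothetical helper: list the files Web Reporter skipped as "too old".
# Argument: path to the log parsing log file.
# Assumes lines of the form:
#   ... (FileGetter) Skipping file '<path>' because it is too old to be imported.
list_too_old_files() {
    sed -n "s/.*Skipping file '\(.*\)' because it is too old to be imported\..*/\1/p" "$1"
}
```

Each matching line yields just the file path, so for the excerpt above it would print X:\2011.01.12.log.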
So now my question is...
How do I recover from this?
Answering your questions:
- I am importing log files from local SWR directory.
- I am using SWR 5.0.1 under Windows 2008 Enterprise Server
- Database backend is SQL Server 2005. Both SWR and MS-SQL are on the same server.
Thanks for your support!
Web Reporter cannot determine which log files have already been imported into the database. If you import a log file and then remove its contents from the DB using DB Maintenance, it will not automatically be reimported. For log sources that are configured to import logs from a local directory on the server, it uses the timestamp of the file to determine which files are newer. This prevents Web Reporter from processing the same log file twice.
A command-line utility such as touch can be used to update the timestamp of the files so that Web Reporter treats them as new files again.
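For example, on a system where a touch command is available (on Windows this assumes a Unix-style toolset such as Cygwin is installed), something along these lines would refresh the timestamps. This is only a sketch; the function name refresh_log_timestamps and the *.log pattern are assumptions.

```shell
# Hypothetical helper: mark every *.log file in a directory as "new" again
# by setting its modification time to the current time.
# Argument: the directory the log source reads from.
refresh_log_timestamps() {
    for f in "$1"/*.log; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        touch "$f"
        printf 'touched %s\n' "$f"
    done
}
```

This should only be done for days whose records have already been removed through Database Maintenance, since importing the same data twice duplicates records.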
However, there is a potential issue in the current version of Web Reporter (5.1.x) on Windows where Web Reporter may not import all access logs if there are many new logs. The problem is that the list of files is not sorted by date, and if Web Reporter happens to import the files out of order, it's possible for some log files to be missed. That bug is scheduled to be fixed in a future release. If you believe this is happening, you should open a service request with Technical Support so that you can be notified when a fix is available.
Toll Free: +1.800.700.8328
United Kingdom: +44 (0) 870 460 4755
Australia: +61 1300 559 109
Support Portal: https://mysupport.mcafee.com
OK, good to know how SWR determines which files are newer. But what happens if I import a file that was already imported (using touch to trick SWR)? Would it duplicate records? It would be great if it could determine whether a record was already imported, and skip it, using some type of hashing of the record.
If you import the same data twice, you would have duplicate records in the detail data table since it is not aggregated. The summary data would have the same number of records (no additional disk space required) but would have double the hits and double the bytes, since URLs are condensed into hourly buckets.
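The effect on the summary data can be illustrated with a small aggregation. This is only a sketch and not Web Reporter's actual schema: it assumes hypothetical input lines of the form "hour site bytes" and condenses them into one row per hour and site, with a hit count and a byte total.

```shell
# Hypothetical summary: one output row per (hour, site) with total hits and bytes.
# Input lines: "<hour> <site> <bytes>" (an assumed format, for illustration only).
# Arguments: one or more input files; passing a file twice simulates a double import.
summarize() {
    awk '{ key = $1 " " $2; hits[key]++; bytes[key] += $3 }
         END { for (k in hits) print k, hits[k], bytes[k] }' "$@" | sort
}
```

Running summarize day.log and then summarize day.log day.log on the same (hypothetical) file produces the same number of rows both times, but every hit count and byte total in the second run is doubled.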
So importing log files twice would skew certain report results, such as hits and bytes, but would not affect things like site names or categories.
Message was edited by: sroering for further clarification on 2/10/11 11:28:04 PM CST