
    Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”

       

      Hi,

       

      Let me explain my current landscape. We have Secure Web Reporter 5.x with three log sources for three sets of proxy servers (let's call them A, B and C). Log files are received on a daily basis for each log source.

       

      • Log sources A and B are not a problem, as each arrives as a single file per log source covering the complete day from 00:00:00 to 23:59:59.

       

      • Log Source C is the problem. This log source is composed of 4 proxy servers, so for each day we get 4 log files, one per proxy server. In addition, each log file always arrives with several entries belonging to the day before, more precisely from the last hour of the previous day. I guess this is related to some misalignment in how log rotation is done, but I am not sure.

       

      When importing, the situation is the following:

       

      • Log sources A and B are not a problem. They get imported correctly.

       

      • Log Source C. For this group of proxy servers I need to do a mass import of the complete month of January. My plan is to uncompress each day's log files into the import directory. When I import 2011-01-01 there is no issue; the logs get imported correctly. Repeating this process with 2011-01-02, I always get this "funny" message: "Skipping file because it is too old". My assumption is that, because the log files for 2011-01-02 contain the last hour of 2011-01-01, SWR believes I am trying to re-import the same log files. This is not correct: log entries never overlap, and I am sure I don't have repeated records with the same timestamp.

       

      I have made a dozen or more log imports, each with its corresponding manual task to clean up those days at the database level. As far as I understand, there is no way to work around this unless I manually (or automatically) move log entries from one day to another, so that each day's file only contains records for the day being processed (see the sketch below).
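
      For example, something like this awk one-liner could do the split. This is only a sketch: it assumes each log line starts with a date in YYYY-MM-DD form, which may not match the real log layout, so the field and format would need adjusting:

      # Split a day's log file into one output file per calendar day,
      # so each import file only contains records for a single day.
      awk '{ day = substr($1, 1, 10); print > ("split-" day ".log") }' 2011.01.02.log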

       

      Any ideas? Is there any other way to get this solved? Can I tell SWR to ignore the fact that log files can contain information for the previous day? What is the best method to do a mass import of a complete month?

       

      By the way, it would be a really good and much appreciated improvement to have a command-line import tool that is a bit more verbose and more flexible. I find the current way of importing a bit unfriendly.

       

      I have another question I could not find an answer to in either the documentation or the KB.

       

      • What happens to log sources when I delete them? How are the records imported through that log source treated? Do they also get deleted? Are they assigned to some other log source?

       

      I hope you can help me solve this, as I have already spent too much time on it.

       

      Kind Regards

        • 1. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”
          Jon Scholten

          This issue comes about because Web Reporter may have already imported a "newer" log file, and will not import an "older" one. This is a necessary evil so that there is no redundant data. In situations like this it is best to push the data to Web Reporter; see below for instructions on importing in bulk. The instructions assume that you have a log source set up to accept incoming log files, and that the username and password correspond to that log source in Web Reporter. They are written for use with Webwasher 6.x but can be adapted to any Linux-based system with LFTP.

           

          # MANUALLY FTPing LOGS TO WEBREPORTER FROM WEBWASHER #

           

          # Access logs are named in the following fashion:
          access[YYMMDDhhmm]*.log*

           

          # The command below uploads directly from the Web Gateway to Web Reporter.
          # Fill in the blanks for [YYMMDDhhmm] and the other placeholders:
          lftp -u [LOGSOURCE-USERNAME]:[LOGSOURCE-PASSWORD] -e 'set ftp:use-feat off; mput /opt/webwasher-csm/logs/access[YYMMDDhhmm]*.log*; quit' -p 9121 [WEB-REPORTER_IP]

           

          Hope this moves you in the right direction; it should at least get you over the hump of importing the data.

           

          ~Jon

          • 2. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”

            Hi Jon,

             

            Thanks for your prompt reply. I will for sure try your method.

             

            Since posting my message I have made some progress. As mentioned before, I have one log source composed of four proxy servers, each producing one log file per day. When trying to import each log file, SWR always fails with the "too old" message.

             

            I merged all four log files for each day into one huge log file with the Unix sort command, and now it seems to be importing correctly.
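
            For reference, the merge was something like the following (file names are illustrative; it assumes the leading timestamp on each line sorts correctly as plain text):

            sort proxy1-2011.01.02.log proxy2-2011.01.02.log proxy3-2011.01.02.log proxy4-2011.01.02.log > merged-2011.01.02.log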

             

            So, my assumption is that SWR does not like having more than one log file for one specific log source. Would it be better to have four log sources, one for each of the proxy servers?

             

            Regards,

            Mario

            • 3. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”
              Jon Scholten

               I would say that one log source per proxy is a good rule of thumb, because otherwise you are leaving it up to Web Reporter to reach into a mixed bag and determine which data is the latest. You also have to realize it can only go by file attributes. It would be nice if it could peek into a file to see what data it contains, but that's not possible.

               

               If it were me, I would push all my data to a central log repository or server (where it would remain for X amount of months), with each proxy having its own repository folder. Web Reporter would then collect from the individual folders (as separate log sources).
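
               For example, the repository could be laid out something like this (folder names are just illustrative):

               # One repository folder per proxy; each folder becomes its own
               # log source in Web Reporter.
               mkdir -p /logs/repository/proxy-a
               mkdir -p /logs/repository/proxy-b
               mkdir -p /logs/repository/proxy-c1 /logs/repository/proxy-c2
               mkdir -p /logs/repository/proxy-c3 /logs/repository/proxy-c4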

               

              ~Jon

              • 4. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”

                My personal experience is that it is best to have one log source per proxy.

                 

                You can generate reports based on an individual proxy's traffic or the combined traffic across all of them. You can get more insight into how they are being utilized that way.

                • 5. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”
                  sroering

                  Merging log files from multiple proxies can have unwanted side effects. Unless the combined file is relatively small (for example, 100 MB uncompressed), you are likely to see very severe performance issues processing it. The caching algorithms used for log parsing are specifically designed around one log source per proxy.

                   

                  Since one log source (configured to actively collect logs) can only collect from one location, I assume you are pulling from a local directory or an FTP server? If you are collecting log files from a local directory, please let me know which OS Web Reporter is installed on, because there is another potential issue.

                   

                  In short, the recommendation to have one log source per proxy is the best solution.

                  • 6. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”

                    I have just checked SWR and observed the following. Let me explain the details of the failure:

                     

                    1. Copied 10GB log file into SWR directory specified in log source.

                    2. Clicked on "Process Now"

                    3. SWR starts importing. It copies the log file into the parsing directory. There is enough free space on the volume, so this should not be an issue.

                    4. After 4 hours of watching with no activity (that is, nothing moves on the Statistics tab and the server remains under low load and memory usage), I decide to click "Delete" on the import task.

                    5. I perform a cleanup: I click on "Database Maintenance" -> "Manual maintenance" -> "Delete summary and detailed records" for the date that failed to import.

                    6. After this finishes, I try to import the same 10 GB log file again, first checking that the log parsing directory is empty. The following message is obtained in the log parsing logfile:

                     

                    2011-02-10 19:20:54,107 INFO  [securecomputing.smartfilter.logparsing.LogAudit] (FileGetter) Begin copying log files from 'X:\2011.01.12.log' into log parsing
                    2011-02-10 19:20:54,120 INFO  [securecomputing.smartfilter.logparsing.LogAudit] (FileGetter) Skipping file 'X:\2011.01.12.log' because it is too old to be imported.
                    2011-02-10 19:20:54,120 INFO  [securecomputing.smartfilter.logparsing.LogAudit] (FileGetter) End copying log files from 'X:\2011.01.12.log' into log parsing

                     

                    On the SWR web interface I see "Completed - No files available".

                     

                    So now my question is...

                     

                    How do I recover from this?

                     

                    Answering your questions:

                     

                    - I am importing log files from a local SWR directory.

                    - I am using SWR 5.0.1 on Windows Server 2008 Enterprise.

                    - The database backend is SQL Server 2005. Both SWR and MS SQL are on the same server.

                     

                    Thanks for your support!

                     

                    Regards,

                    Mario

                    • 7. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”
                      sroering

                      Web Reporter cannot determine which log files have already been imported into the database. If you import a log file and then remove its contents from the DB using DB Maintenance, it will not automatically be re-imported. For log sources configured to import logs from a local directory on the server, Web Reporter uses the timestamp of the file to determine which files are newer. This prevents it from processing the same log file twice.

                       

                      A command-line utility such as touch can be used to update the timestamps of the files so that Web Reporter treats them as new files again.
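
                      For example (the path is illustrative, and since your SWR runs on Windows you would need a Unix-style touch, e.g. from Cygwin):

                      # Update the modification time so the file looks "new" again.
                      touch /logs/import/2011.01.12.log
                      # Or refresh a whole month of files at once:
                      touch /logs/import/2011.01.*.log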

                       

                      However, there is a potential issue in the current version of Web Reporter (5.1.x) on Windows where Web Reporter may not import all access logs if there are many new logs. The problem is that the list of files is not sorted by date, so if Web Reporter happens to process the files out of order, some log files can be missed. That bug is scheduled to be fixed in a future release. If you believe this is happening, you should open a service request with Technical Support so that you can be notified when a fix is available.
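
                      As an untested workaround sketch (file naming assumed from your example), you could touch the files in chronological order so their modification times increase monotonically, which should reduce the chance of an out-of-order import:

                      # Glob expansion is already in sorted (chronological) order
                      # for this naming scheme; the sleep ensures distinct,
                      # increasing modification times.
                      for f in /logs/import/2011.01.*.log; do
                          touch "$f"
                          sleep 1
                      done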

                       

                       

                      Toll Free:      +1.800.700.8328
                      International:  +1.651.628.1500
                      United Kingdom: +44 (0) 870 460 4755
                      Australia:      +61 1300 559 109

                       

                      Support Portal: https://mysupport.mcafee.com

                       

                       

                      Regards

                      • 8. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”

                        OK, good to know how SWR determines newer files. But what happens if I import a file that was already imported (using touch to trick SWR)? Would it duplicate records? It would be great if it could determine that a record was already imported, and skip it, by some type of hashing of the record.

                         

                        Thanks!

                        • 9. Re: Secure Web Reporter. Issues when bulk importing log files -> “Skipping file because it is too old to be imported.”
                          sroering

                          If you import the same data twice, you would have duplicate records in the detail data table, since it is not aggregated. The summary data would have the same number of records (no additional disk space required) but would have double the hits and double the bytes, since URLs are condensed into hourly buckets.

                           

                          So importing log files twice would skew certain report results, such as hits, bytes, etc., but would not affect things like site names, categories, etc.

                           

                           
