McAfee Employee jebeling
Message 1 of 5

Example Bash Script for Log Pull from Web Gateway Cloud Service?


Are there any examples of Linux scripts that can be used to retrieve logs from the Web Gateway cloud service so that they can be imported into other reporting tools?

 

4 Solutions

Accepted Solutions
McAfee Employee jebeling
Message 2 of 5

Re: Example Bash Script for Log Pull from Web Gateway Cloud Service?


Why yes, Jeff, I'm glad you asked. 😉 Details about the API can be found in the Reporting section of the Web Gateway Cloud Service documentation here. The documentation lists all the additional filters that can be used alongside the two required timestamp filters, as well as the available versions, the fields each version includes, and the supported return formats (CSV and XML).

I am by no means a scripting wizard, but the attached code is fully functional and should give the reader a good idea of how to pull log files from WGCS in bulk or selectively. What you do with the retrieved files is up to you and your selected reporting tool vendor. This code is intended as an example and comes without any warranty or support. I'm sure there are better ways to implement the scripting, for example overriding settings via command line arguments instead of a file, but I didn't have time for that. If you use the code or improve upon it, please post back here so that others may benefit.

There are three files in the attached zip: logpull.conf, logpull.sh, and logpull.add.conf.

Note that none of the files includes any credentials. To download log files from WGCS, you need the admin (not user) credentials used to log in to manage.mcafee.com. Place these credentials in a .netrc file in the home directory of the user running the job.

The .netrc file format looks like this:

machine msg.mcafeesaas.com

    login <your wgcs admin email>

    password <your wgcs admin password>
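A quick way to create the file, assuming a Linux home directory (the login and password values below are placeholders):

# Create ~/.netrc with placeholder WGCS admin credentials
cat > ~/.netrc <<'EOF'
machine msg.mcafeesaas.com
    login wgcs-admin@example.com
    password YourPasswordHere
EOF
# Keep the credentials readable only by the owner, since curl reads this file silently
chmod 600 ~/.netrc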

logpull.sh is the actual bash script. If you run it by name alone, it pulls a log file using the parameters in logpull.conf. If you run it followed by a second filename and that file exists, the variables set in that file override the variables in logpull.conf. Under the hood, the pull boils down to a single curl call, roughly like the sketch below.
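This sketch only illustrates the shape of the request; the URL path and the from/to parameter names are placeholders, not the documented ones, so consult the Reporting API documentation linked above for the real values:

# Rough sketch of the kind of request logpull.sh makes. The path segment
# and the from/to parameter names are PLACEHOLDERS, not the real API names.
FROM=1567296000   # start of the window (epoch seconds)
TO=1567299600     # end of the window (epoch seconds)
curl --netrc -s -H "Accept: text/csv" \
     -o "wgcs_${FROM}_${TO}.csv" \
     "https://msg.mcafeesaas.com/<reporting-path>/<customerid>?from=${FROM}&to=${TO}"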

logpull.conf provides the initial configuration that logpull.sh uses as a base for all runs. It can be modified with your customerid and other parameters for your scheduled jobs. If logpull.conf is not present in the same directory as logpull.sh, logpull.sh will create it with default parameters, including a bogus customerid.

logpull.sh is designed to be run as a periodic cron job. You can run it at any interval you like, but by default each run pulls a log covering the window from the end time of the last successful pull plus one second up to the current time less 5 minutes. If the pull is successful, the conf file is updated so that the next pull starts one second after the end time of the previous successful pull.
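For example, a crontab entry along these lines (the installation path is illustrative) would run the script every 15 minutes:

# Hypothetical crontab entry: run logpull.sh every 15 minutes from its own directory
*/15 * * * * cd /opt/logpull && ./logpull.sh >> cron.log 2>&1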

There is also a file named logpull.add.conf, which can be used as a basis for creating override files for custom jobs, for example adding additional filters or changing the time range of a pull. You can create files with any names you like. Using an override settings file is simple: just run ./logpull.sh <setting override file name>. Since this is intended for ad-hoc log pulls, the script does not update lastSuccessEpoch when you specify an additional config file on the command line. A sketch of such an override file follows.
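The variable names in this sketch are hypothetical stand-ins; copy the real names from logpull.add.conf when building your own:

# myrange.conf - hypothetical ad-hoc override file for a fixed historical window.
# Variable names here are illustrative, not the script's actual ones.
fromEpoch=1567296000   # explicit start time (epoch seconds)
toEpoch=1567382400     # explicit end time (epoch seconds)
acceptType=csv         # pull CSV, per the performance notes later in this thread

You would then run it with: ./logpull.sh myrange.conf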

One last note: there is some minimal error checking and logging included. Execution logs go to logpull.log, and there is a check that the curl command returned without error and that the log file is at least 400 bytes. If either of those checks fails, the logpull.conf variable lastSuccessEpoch isn't updated.
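A minimal sketch of that style of check, using the file names from the thread ($URL stands in for the real request URL):

# Sketch of the success check described above; $URL stands in for the real request.
OUT="wgcs.csv"
if curl --netrc -s --fail -o "$OUT" "$URL" \
   && [ "$(stat -c %s "$OUT" 2>/dev/null || echo 0)" -ge 400 ]; then
    echo "$(date) pull OK ($OUT)" >> logpull.log
    # ...only here would lastSuccessEpoch be advanced in logpull.conf...
else
    echo "$(date) pull failed; lastSuccessEpoch not updated" >> logpull.log
fi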

 

McAfee Employee jebeling
Message 3 of 5

Re: Example Bash Script for Log Pull from Web Gateway Cloud Service?


Experience and testing are a wonderful but arduous process. Using the script at a large customer that wanted XML format surfaced several challenges, which required many changes and enhancements to the script; these can now be leveraged for general use.

Challenges and discoveries:

  • Query times
    • Queries for the CSV accept type are much, much faster than queries for the XML type
    • Queries for large amounts of data are much faster if the timestamp filters are used and the time span is recent and no longer than a day or two
  • Timeouts
    • Network timeouts can occur; keep-alive is not implemented in the API. If a network timeout occurs, curl will error and no file will be saved
    • Query timeouts can occur: if the query takes more than 5 minutes, it will time out and return a file with no data
  • Record limits
    • There is an available record limit filter that can be used to help with handling timeouts
    • If the record limit is hit, there is no guarantee that all the records for a particular timestamp have been received as part of the query
  • Record transit time from PoP to central log storage
    • It can take up to several minutes for logs to arrive in the central database
    • The backset variable was and still is used to account for this. The default is now 10 minutes, but 5 minutes should still be OK
    • There are plans to add the ability to filter based on database arrival time, but this is not yet available in production. It should eliminate the need for the backset and make it possible to retrieve all logs that arrived in the database on or before the time specified in the query filter
  • XML format
    • It generates much larger files that take longer to download
    • The query itself takes much longer on the backend, which can lead to query or network timeouts
    • There was a limit of 1,000 records when retrieving XML; the limit for CSV is 10 million records
  • Converting CSV to XML
    • A proper conversion process is relatively simple but CPU intensive
    • Doing it in bash is highly inefficient and error prone
    • Python and Perl are much more efficient, but there are still things to watch out for
      • Certain fields need characters escaped to deliver XML-compliant output
      • Any fields that are anonymized in WGCS will need to be escaped, adding significant processing time
      • The field names delivered by the API when querying for CSV (field names in the header) are not exactly the same as the field names delivered when querying for XML
      • CSV currently adds two blank lines at the end of the file
  • Multiple instances running
    • Never run multiple instances of the script from the same folder at the same time. Bad things will likely happen. See below about the lock file that protects against this potential issue.

The above challenges and issues resulted in a new script with new variables; it will update an existing logpull.conf file appropriately. However, I recommend looking at your existing conf file, copying lastSuccessEpoch, customerID, and any other variables you may have modified over to the new setup, then deleting the conf file and letting the script build a new one.

The new script has a companion Python script that converts CSV format to XML format, so the query itself is always made in CSV format for best performance. If acceptType is specified in the conf file as XML, the script calls the Python script to convert the log file to XML. The script could be further modified to actually format and send syslog directly to a SIEM. A sketch of that kind of converter appears below.
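This is a minimal sketch of such a converter, not the companion script itself: the element names and file handling are illustrative, and the real script also reconciles the CSV/XML field-name differences noted above.

#!/usr/bin/env python3
# Sketch of a CSV-to-XML converter; element names and paths are illustrative.
import csv
import sys
from xml.sax.saxutils import escape

def csv_to_xml(csv_path, xml_path):
    with open(csv_path, newline="") as src, open(xml_path, "w") as dst:
        reader = csv.DictReader(src)
        dst.write("<records>\n")
        for row in reader:
            # Skip the blank trailing lines the CSV output currently includes
            if not any((v or "").strip() for v in row.values()):
                continue
            dst.write("  <record>\n")
            for field, value in row.items():
                # Escape &, <, > so anonymized and free-text fields stay XML compliant
                dst.write("    <%s>%s</%s>\n" % (field, escape(value or ""), field))
            dst.write("  </record>\n")
        dst.write("</records>\n")

if __name__ == "__main__":
    csv_to_xml(sys.argv[1], sys.argv[2])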

The new script also implements a lock file (logpull.lck) that is created and deleted automatically by the script, as long as the script is not interrupted. This ensures that no more than one instance of the script runs at a time; all kinds of unintended consequences and issues would likely arise if two instances ran in the same directory at the same time. Be very careful about manually deleting the lock file: only delete it if you are absolutely sure the script is not currently running.
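A common bash pattern for this kind of guard looks like the sketch below. This is illustrative, not the script's exact code; note that a plain check-then-create has a small race window, which mkdir- or flock-based locks avoid.

# Sketch of a lock-file guard; logpull.lck matches the thread's file name.
LOCK="logpull.lck"
if [ -e "$LOCK" ]; then
    echo "$(date) another instance appears to be running; exiting" >> logpull.log
    exit 1
fi
echo $$ > "$LOCK"
trap 'rm -f "$LOCK"' EXIT   # remove the lock on exit, unless the script is killed uncleanly
# ...main pull logic runs here...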

 

McAfee Employee jebeling
Message 4 of 5

Re: Example Bash Script for Log Pull from Web Gateway Cloud Service?


Version 3 of the log pull example bash script is attached to this reply.

In addition to fixing a few bugs in versions 1 and 2, this version takes advantage of the newly available (8/29/19) createdTimestamp filters, which allow pulling the most current database records without the risk of missing records that arrive in the database as many as a few minutes after the actual request.

As always, this is public example code that is not supported by McAfee, although the API itself is fully supported per the available documentation. Feel free to "fold, spindle and mutilate" as you see fit. But please contribute back to the community if you have suggestions for improvement, if you find bugs, or if you actually code improvements.

McAfee Employee jebeling
Message 5 of 5

Re: Example Bash Script for Log Pull from Web Gateway Cloud Service?


On Oct 10 at 2 PM US MST, the API will change what data is delivered for the request timestamp filters. Currently it delivers request timestamps >= from and <= to; after the change it will deliver request timestamps >= from and < to. The previous scripts avoided duplicate records in scheduled pulls by adding 1 second to the previous successful pull's "to" time and using that as the current "from". This needs to change to no longer add the second, which also affected the code path where maxTime determines that more than one pull is required. The script below is modified to account for the change, but it will produce duplicate records if used before Oct 10 2 PM US MST; the old script will miss records if it is still used after the change. The windowing change amounts to the one-line difference sketched below.
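In bash terms, roughly (the variable names follow the thread's conf file, though the capitalization here is illustrative):

# New half-open interval [from, to): start exactly at the previous successful 'to'.
FROM=$lastSuccessEpoch            # previously: FROM=$((lastSuccessEpoch + 1))
TO=$(($(date +%s) - BACKSET))     # current time minus the backset window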

Usual caveats apply. Example only, no support, yada, yada, yada.

