This may be an odd response to a perfectly reasonable question, but the answer starts with this: "Parse as generic syslog" should never be enabled under any circumstances. It serves no useful purpose and should be removed from the product. In fact, it fills up your rules table and creates a mess that needs to be cleaned up, as you now see, so it's worse than useless. The initial idea was good, "auto-create parsing rules when events don't match a parser", but the implementation was flawed: every changing timestamp created a new rule. The rules table is limited to a million rules and will usually hit that limit within a few minutes to a few hours.
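To make the timestamp problem concrete, here is a toy sketch in Python. It is not the actual ESM auto-learn logic; the sample messages and both key functions are made up for illustration. It only shows how a rule signature that includes the timestamp produces one rule per message, while a signature with the timestamp stripped produces one rule in total.

```python
import re

# Hypothetical sample messages: identical apart from the timestamp.
messages = [
    "Jan 12 10:00:01 gw01 service restarted",
    "Jan 12 10:00:02 gw01 service restarted",
    "Jan 12 10:00:03 gw01 service restarted",
]

# Matches a leading BSD-style syslog timestamp such as "Jan 12 10:00:01 ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s")


def naive_rule_key(message):
    """Assumed flawed behavior: the whole message is the rule signature."""
    return message


def normalized_rule_key(message):
    """Strip the leading timestamp before building the signature."""
    return TIMESTAMP.sub("", message)


naive_rules = {naive_rule_key(m) for m in messages}
normalized_rules = {normalized_rule_key(m) for m in messages}

print(len(naive_rules))       # 3: one "rule" per message
print(len(normalized_rules))  # 1: a single rule covers all three
```

Scale the first number up to a busy firewall and the million-rule limit goes quickly.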
You'll want to go to Policy Manager | Receiver | Data Source, highlight a rule, then Edit and Delete Auto Learned Rules. From there it gives you some options. Generally you want to delete as few as you need to, but if a data source has never parsed correctly, make sure you delete all of the rules that it's associated with. If you get excited and delete all auto-learned rules (sometimes it's worth it), you will notice that some of your rule names have become just '0' in the UI. To fix this, go to Receiver Properties | Events, Flows and Logs and set the Last Downloaded String Record to some date a really long time ago (a year or more is fine if the data is there). That will usually populate all of the rule names back.
Now that that is established, and everyone who ever reads this has disabled "Parse as generic syslog" on every data source, now and forever, we are left with two other options: "Log 'unknown syslog' event" and "Do nothing". "Do nothing" is always the eventual goal for every data source, but with some data sources you want to watch the logs for a bit to verify everything is parsing as expected. If you see an unknown event get created, take a look at the Packet tab and see if it is something that you might have cared about or that was relevant to a use case. If it is relevant, write a parsing rule for it. If everything is unknown then it might be an unsupported device or a misconfigured one, but that's the answer to a different question. If you need help writing a parsing rule, that's also a separate question.
The reason that "Do nothing" is the goal is that the more events that have to hit every branch of the regex tree until they realize no one loves them and they become an unknown, the more performance will be impacted. This is a game of little wins that add up, and details matter. So while I admit to having systems that have been set to 'Log unknown' for years, the load was low enough that I didn't care. If performance matters though, then this is a goal for you.
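As a rough illustration of that cost, here is a small Python sketch. The three rules and the log lines are invented, and a flat list is a simplification of whatever tree the Receiver really uses. The point is only that a matching event stops early, while an unmatched one pays for every regex before it is declared unknown.

```python
import re

# Hypothetical stand-ins for a parser's rules, tried in order.
RULES = [
    re.compile(r"login (succeeded|failed) for user (\S+)"),
    re.compile(r"connection from (\S+) denied"),
    re.compile(r"service (\S+) restarted"),
]


def match_event(message):
    """Return (index of the matching rule or None, number of regexes tried)."""
    for tried, rule in enumerate(RULES, start=1):
        if rule.search(message):
            return tried - 1, tried
    # Nothing matched: every rule was evaluated for no result.
    return None, len(RULES)


print(match_event("login failed for user bob"))  # (0, 1)
print(match_event("fan speed nominal"))          # (None, 3)
```

With thousands of rules and a noisy source, the second case is where the wasted work piles up.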
So, more specifically for this case, a cursory investigation seems to indicate this is a CA API Gateway (formerly Layer 7 SSG), which doesn't appear to be a currently supported device. From here the next question on the decision tree is: does this device generate any data directly related to a use case? If not, uncheck the "Parsing" checkbox under the data source and never look back. If you ever need to get to the log, it's available on the ELM.
If the answer is yes, this data matters, the next question is: do you think that the data, at least the logs that you care about, can be easily represented with a few parsing rules, or is this something you need to go to your vendor and ask for formal support for?
The former is pretty quick to deal with vs. the latter, but the latter is certainly appropriate for any major product. Many times, use cases call for a limited set of the logs though, say authentication, so it's easier just to bang out a quick parsing rule or even post it here and have someone help you with it. If there is a requirement for a new device, the fastest way is to bring it up through your account team or use the Idea site.
So, in summary:
- Never use "Parse as generic syslog".
- Try to get every data source to "Do nothing".
- Write parsing rules when possible.
- Ask for help with device support or custom parsing if needed.
- Be ruthless when deciding to send data to the ESM and the ELM or just the ESM.
- Even though log management is an important use case, it's still separate from SIEM.
(I should have just made this a blog post).
Thanks for your response. What is really driving this question is our attempt to save space (on the ELM), as these devices are very noisy firewalls. Will a "Do nothing" setting help us in this regard? Retaining data is important to our use case.
As a little background, the architecture is such that there are two representations of the data: one on the ESM, which populates your UI, and one on the ELM. The data on the ESM is an aggregated, parsed version of the data on the ELM. This allows it to be quickly searched and analyzed. The ELM data is heavily compressed and, while easily searchable, is not optimized for real-time search. Since we have two sets of data, we also have two different retention windows to consider. The ESM data is measured in database records, and an aggregated event, which may represent thousands of actual events, populates one record. The amount of retention that you have in your UI is independent of the storage pool(s) that you have in the ELM.
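If the aggregation part is hard to picture, here is a minimal Python sketch. The grouping fields (source IP, destination IP, message) and the sample events are assumptions for illustration; the real ESM aggregation settings are not shown here.

```python
from collections import Counter

# Hypothetical raw events as (source IP, destination IP, message) tuples.
raw_events = [
    ("10.0.0.5", "10.0.0.9", "firewall deny"),
    ("10.0.0.5", "10.0.0.9", "firewall deny"),
    ("10.0.0.5", "10.0.0.9", "firewall deny"),
    ("10.0.0.7", "10.0.0.9", "firewall allow"),
]


def aggregate(events):
    """Collapse identical events into one record each, keeping a count."""
    return Counter(events)


records = aggregate(raw_events)

print(len(raw_events))  # 4 raw events, all of which the ELM would keep
print(len(records))     # 2 aggregated records on the ESM side
print(records[("10.0.0.5", "10.0.0.9", "firewall deny")])  # 3
```

That is why ESM retention is counted in records while ELM retention depends on the size of the storage pool.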
Under the configuration for a data source, you have Parsing and Logging checkboxes (you also have SNMP, but pretend that you don't). If the Parsing checkbox is checked, the data goes to the ESM. If the Logging checkbox is checked, the logs go to the ELM. On a per-device basis you could uncheck the Parsing checkbox and let lower-value logs be stored only on the ELM, usually for compliance purposes.
If you need more granularity than a per-device basis, for example dropping events from the vulnerability scanner's IP or for system account names that end with $, logs can be directed to the ESM, the ELM, both or neither using Receiver Filters. More information on Receiver Filters can be found in these links.
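To tie the checkboxes and filters together, here is a small Python model of the routing decision. It is only a sketch: the DataSource class, the scanner address and the choice of what each filter does are all hypothetical, and real Receiver Filters are configured in the product, not written like this.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    parsing: bool  # Parsing checkbox: parsed events go to the ESM
    logging: bool  # Logging checkbox: raw logs go to the ELM


SCANNER_IP = "192.0.2.10"  # hypothetical vulnerability scanner address


def destinations(source, src_ip, account):
    """Return the set of destinations for one event."""
    dests = set()
    if source.parsing:
        dests.add("ESM")
    if source.logging:
        dests.add("ELM")
    # Assumed filter 1: scanner traffic is dropped everywhere.
    if src_ip == SCANNER_IP:
        return set()
    # Assumed filter 2: machine accounts (ending in $) are kept on the ELM only.
    if account.endswith("$"):
        dests.discard("ESM")
    return dests


firewall = DataSource(parsing=True, logging=True)
print(sorted(destinations(firewall, "10.0.0.5", "alice")))     # ['ELM', 'ESM']
print(sorted(destinations(firewall, "192.0.2.10", "alice")))   # []
print(sorted(destinations(firewall, "10.0.0.5", "WKSTN01$")))  # ['ELM']
```

A data source with Parsing unchecked and Logging checked would return only ELM for everything, which is the per-device case described above.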