I have a log that, when ingested, turns from a multi-line file into a single line containing multiple log entries. My parser would work fine if the entries were on separate lines, but because everything arrives as a single line of wrapped text, it stops after the first complete match. I have tested my regex on regex101.com and it works great, but in the ESM it stops after the bold portion. Does anyone have any idea what could fix this?
Log
--------
############################################# # FTP access: # ############################################# DATE TIME HOST CLIENT IP ADDRESS Jan 29 06:04:31 somehost 10.30.142.80 Jan 29 06:09:31 somehost 10.30.142.80
Regex
----------
\s*(?P<Date>[A-Za-z]{3}\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2})\s(?P<Host>[A-Za-z\-]*)\s(?P<SourceIP>[\d]{1,3}\x2e[\d]{1,3}\x2e[\d]{1,3}\x2e[\d]{1,3})
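As a sanity check outside the ESM, here is a minimal Python sketch showing that the regex above does find both entries when the matcher is asked for every match on the line rather than only the first. It assumes Python's `re` module, which accepts the same `(?P<name>...)` syntax; it says nothing about how the ESM parser iterates over matches internally.

```python
import re

# The regex from the post, unchanged.
PATTERN = re.compile(
    r"\s*(?P<Date>[A-Za-z]{3}\s[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2})"
    r"\s(?P<Host>[A-Za-z\-]*)"
    r"\s(?P<SourceIP>[\d]{1,3}\x2e[\d]{1,3}\x2e[\d]{1,3}\x2e[\d]{1,3})"
)

# The flattened log from the post: every entry sits on one long line.
flat_line = (
    "############################################# # FTP access: # "
    "############################################# "
    "DATE TIME HOST CLIENT IP ADDRESS "
    "Jan 29 06:04:31 somehost 10.30.142.80 "
    "Jan 29 06:09:31 somehost 10.30.142.80"
)

# finditer keeps scanning after each match, so it yields every entry.
records = [m.groupdict() for m in PATTERN.finditer(flat_line)]

for record in records:
    print(record["Date"], record["Host"], record["SourceIP"])
```

So the pattern itself is fine; the parser is only applying it once per line, which is why getting the entries onto separate lines matters.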
How are you getting this log to the receiver? Are you polling for it or sending it via syslog or some other mechanism?
It is coming in as a flat file from a custom application that runs on a Solaris box. It is dropped into a folder and then picked up from there by the receiver. I think I may have figured it out by telling the data source to use \r?\n as the line break, but we will see once another upload comes in.
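To illustrate what a \r?\n line break does, here is a small Python sketch, assuming a made-up sample that mixes Windows (\r\n) and Unix (\n) terminators. The sample text and variable names are hypothetical, and how the ESM data source applies the setting internally is not shown here.

```python
import re

# Hypothetical sample built from the entries in the post, with mixed terminators.
raw_text = (
    "DATE TIME HOST CLIENT IP ADDRESS\r\n"
    "Jan 29 06:04:31 somehost 10.30.142.80\n"
    "Jan 29 06:09:31 somehost 10.30.142.80\n"
)

# \r?\n matches a bare LF as well as CRLF, so both styles split cleanly.
# The trailing terminator produces an empty final piece, which is dropped.
lines = [line for line in re.split(r"\r?\n", raw_text) if line]

for line in lines:
    print(repr(line))
```

Each entry ends up as its own line with no stray \r left on the end, which is what a per-line parser needs.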
Yeah, that could be the issue. Or the encoding could be set incorrectly.
A useful *nix command is file. You can run it on an arbitrary file and get some basic information about its contents. For example:
$ file data.json
data.json: ASCII text, with very long lines, with no line terminators
Then ensure the encoding in the data source is set correctly. UTF-8 is backward compatible with ASCII, so I recommend setting the encoding to UTF-8 even for ASCII files.
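If file is not handy, a rough equivalent can be sketched in Python. This is only an illustration: the helper name count_line_terminators and the sample bytes are hypothetical, and it does far less than file does. It counts the line terminators in a chunk of bytes and confirms that pure ASCII data decodes identically as UTF-8.

```python
def count_line_terminators(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR sequences in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
        "cr": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
    }


# Hypothetical samples: one with no terminators at all, one with mixed styles.
flat = b"Jan 29 06:04:31 somehost 10.30.142.80 Jan 29 06:09:31 somehost 10.30.142.80"
mixed = b"first\r\nsecond\nthird\rfourth"

flat_counts = count_line_terminators(flat)
mixed_counts = count_line_terminators(mixed)
print(flat_counts)
print(mixed_counts)

# ASCII is a strict subset of UTF-8, so ASCII bytes decode to the same text.
same_text = flat.decode("ascii") == flat.decode("utf-8")
print(same_text)
```

A file that reports zero for all three counts is the "no line terminators" case, and the last check is why choosing UTF-8 is safe for a file that is really ASCII.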