Hi,
Can anybody help me understand regular expressions for parsing? I'm working on creating a custom parser for a device, and I got stuck on the regular expressions and want to understand them.
Thank you
Regards
Soji Thomas
This page might help you understand regex --> RegExr: Learn, Build, & Test RegEx
The goal of the regular expression is to match the fields in the log that need to be captured and put into the relevant database fields so they are available for search, correlation and reporting on the ESM/ACE. Parentheses within the regex are used to create each match group. For instance, if I have the following log line:
<22>Jul 9 12:55:49 tecate dovecot: imap(bob): Disconnected: Logged out bytes=671/3325
I can parse it with the following regex:
(\w+)\s(dovecot)\x3a\simap\x28(\w+)\x29\x3a\sDisconnected\x3a\sLogged\sout\sbytes\x3d(\d+)\x2f(\d+)
Every time you see a pair of parentheses I'm capturing a field. So the fields here would be:
hostname=tecate
application=dovecot
source user=bob
bytes_received=671
bytes_sent=3325
This information alone won't directly help you write the parsing rules, since you do need to learn a bit of regex and there are some nuances with the editor, but it should give you an idea of the process.
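If you want to try the capture groups outside the ESM editor first, here is a minimal Python sketch that runs the same regex against the sample log line. Python's `re` module is only a stand-in here, and the ESM regex engine may behave differently in places. The field names in `FIELD_NAMES` are just the labels from the list above, not actual ESM field identifiers.

```python
import re

# Sample log line and regex copied from the post above.
LOG_LINE = "<22>Jul 9 12:55:49 tecate dovecot: imap(bob): Disconnected: Logged out bytes=671/3325"
PATTERN = re.compile(
    r"(\w+)\s(dovecot)\x3a\simap\x28(\w+)\x29\x3a\s"
    r"Disconnected\x3a\sLogged\sout\sbytes\x3d(\d+)\x2f(\d+)"
)

# Labels for the five capture groups, in order (illustrative names only).
FIELD_NAMES = ("hostname", "application", "source_user", "bytes_received", "bytes_sent")


def parse_line(line):
    """Return a dict of captured fields, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return dict(zip(FIELD_NAMES, match.groups()))


fields = parse_line(LOG_LINE)
print(fields)
```

Each pair of parentheses becomes one entry in `match.groups()`, in left-to-right order, which is the same idea as mapping each match group to a field in the parser.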
That being said, it looks like your log sample in your screenshot is of McAfee NSP logs. It's better to grab those via a SQL pull of the NSM. If you need to use syslog, there are existing rules for the data source.
This is the cheat sheet from RegExr, the site XDED mentioned.
They now also have a desktop app.
Character classes | |
---|---|
. | any character except newline |
\w \d \s | word, digit, whitespace |
\W \D \S | not word, digit, whitespace |
[abc] | any of a, b, or c |
[^abc] | not a, b, or c |
[a-g] | character between a & g |
Anchors | |
^abc$ | start / end of the string |
\b | word boundary |
Escaped characters | |
\. \* \\ | escaped special characters |
\t \n \r | tab, linefeed, carriage return |
\u00A9 | unicode escaped © |
Groups & Lookaround | |
(abc) | capture group |
\1 | backreference to group #1 |
(?:abc) | non-capturing group |
(?=abc) | positive lookahead |
(?!abc) | negative lookahead |
Quantifiers & Alternation | |
a* a+ a? | 0 or more, 1 or more, 0 or 1 |
a{5} a{2,} | exactly five, two or more |
a{1,3} | between one & three |
a+? a{2,}? | match as few as possible |
ab|cd | match ab or cd |
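To make a few of those cheat-sheet entries concrete, here is a short Python sketch using the dovecot log line from earlier in the thread. It is only an illustration with Python's `re` module. The `pop3` alternative and the `<a><b>` string are made up for the example.

```python
import re

line = "<22>Jul 9 12:55:49 tecate dovecot: imap(bob): Disconnected: Logged out bytes=671/3325"

# \d+ : one or more digits; search() returns the first match in the line.
first_number = re.search(r"\d+", line).group()

# (?:imap|pop3) : non-capturing group with alternation, so the user name
# is still capture group 1.
user = re.search(r"(?:imap|pop3)\((\w+)\)", line).group(1)

# .+ is greedy (as much as possible), .+? is lazy (as little as possible).
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

# (?=/) : positive lookahead, matches digits only when a "/" follows,
# without including the "/" in the match.
before_slash = re.search(r"\d+(?=/)", line).group()

print(first_number, user, greedy, lazy, before_slash)
```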
You might also find it useful to create multiple small parsers, so that each one matches a fragment instead of parsing the entire message.
Give it a try and let us know, so we can give you some hints.
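As a rough sketch of that fragment idea (again in plain Python, not in the ESM rule editor), each small pattern below captures one piece of the message on its own. `FRAGMENT_RULES` and `parse_fragments` are hypothetical names made up for this example.

```python
import re

# Each rule is (pattern, names for its capture groups). The patterns are
# fragments of the full dovecot regex from earlier in the thread.
FRAGMENT_RULES = [
    (re.compile(r"imap\x28(\w+)\x29"), ("source_user",)),
    (re.compile(r"bytes\x3d(\d+)\x2f(\d+)"), ("bytes_received", "bytes_sent")),
]


def parse_fragments(text):
    """Apply every small pattern independently and merge whatever matched."""
    fields = {}
    for pattern, names in FRAGMENT_RULES:
        match = pattern.search(text)
        if match:
            fields.update(zip(names, match.groups()))
    return fields


full = parse_fragments("tecate dovecot: imap(bob): Disconnected: Logged out bytes=671/3325")
partial = parse_fragments("tecate dovecot: imap(alice): Connected")
print(full)
print(partial)
```

The second message has no `bytes=` part, so only the user fragment matches. A single regex for the whole message would have failed on it.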