edfapack
Level 7

Fine tuning a regex in NDLP

I have defined the majority of the regexes that I need, but I'm having difficulty with one. The issue is finding an expression that will catch a string of numbers with a certain number of digits when there are no characters or spaces before or after that string. I am set as long as the string has a space or any non-digit character before and after it, but I run into trouble if the string is the only thing on a line. Any recommendations are appreciated!

0 Kudos
20 Replies
elisowash
Level 7

Re: Fine tuning a regex in NDLP

I'm running into the same sort of thing. Did you ever get any resolution or success? It's almost like nDLP is ignoring some basic regex tenets.

0 Kudos
edfapack
Level 7

Re: Fine tuning a regex in NDLP

Unfortunately, I have not had any success in finding a resolution. It would be GREAT if McAfee adopted standard regex rather than creating their own flavor, which is somewhat ineffective.

0 Kudos
rtrezza
Level 7

Re: Fine tuning a regex in NDLP

Can you post an example of what you are trying to match, together with the expression you are using? Maybe I can help you.

0 Kudos
elisowash
Level 7

Re: Fine tuning a regex in NDLP

I've had some success today, actually. I'm working with the SSN concept, and I found that using these expressions instead of the defaults solved my issue.

\D\d\d\d\d\d\d\d\d\d\D

\D\d\d\d[\D]\d\d[\D]\d\d\d\d\D

So, if I were to offer a suggestion, it'd be to work with digits and non-digits exclusively.

I have NOT yet made this change in production, so I don't have data on False Positives yet.

0 Kudos
edfapack
Level 7

Re: Fine tuning a regex in NDLP

That is the pattern I've found most helpful thus far. My issue is what happens when there is nothing else on a particular line, for example:

123-45-6789

or

123456789

there are no spaces or characters before or after the number, only a carriage return after. When I try to validate, it comes up as false. I have to put either a space or another non-digit character before and after for it to be recognized as matching their "regex".
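
For reference, the behavior described above can be reproduced in a standard regex engine. This is a minimal sketch using Python's `re` module, not NDLP's own engine, and whether NDLP accepts lookarounds is an assumption that would need checking. It shows that `\D` has to consume a real character, so it cannot match at the very start or end of the text, whereas a lookaround only asserts that no digit is adjacent:

```python
import re

# Pattern style from this thread: a non-digit must be consumed on each side.
consuming = re.compile(r"\D\d{9}\D")

# Standard-regex alternative: lookarounds assert "no digit adjacent"
# without consuming a character, so start/end of text also qualify.
# Support for this syntax in NDLP is NOT confirmed here.
bounded = re.compile(r"(?<!\d)\d{9}(?!\d)")

samples = ["123456789", " 123456789 ", "123456789\r\n"]
for s in samples:
    # Print the sample and whether each pattern finds a match in it.
    print(repr(s), bool(consuming.search(s)), bool(bounded.search(s)))
```

In this sketch only the space-padded sample satisfies the consuming pattern, while the lookaround version matches all three.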

0 Kudos
rtrezza
Level 7

Re: Fine tuning a regex in NDLP

Assuming you are using the default concept for "SOCIAL-SECURITY-NUMBER", that expression requires that the string begin with a whitespace character (\s) and end with a non-digit character (\D). If you remove those items from the default expression, the pattern will validate without the spaces. Here is the default concept:

concept-post.jpg

I removed the leading \s and trailing \D, and now when I enter the pattern into the validate window, the pattern does not require the spaces, as shown below:

concept-post2.jpg

The consideration for production is how likely it is that the number pattern you seek will appear without a space or other boundary. For example, if there is a product serial number with the string

SESD12345678934333, then the expression modified above will flag the first 9 digits inside (123456789) as a match, which is clearly a false positive.
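
To illustrate the trade-off, here is a rough sketch in Python's `re` module (again, not NDLP's engine; lookaround support there is an assumption). It compares the unbounded 9-digit pattern with two bounded variants against the serial number above. The sample string `SN123456789` is a made-up example, not taken from a real product:

```python
import re

# No boundary at all: any 9 consecutive digits match.
unbounded = re.compile(r"\d{9}")

# Reject matches that sit inside a longer run of digits.
digit_bounded = re.compile(r"(?<!\d)\d{9}(?!\d)")

# Stricter: also reject matches glued to letters, as in serial numbers.
alnum_bounded = re.compile(r"(?<![A-Za-z0-9])\d{9}(?![A-Za-z0-9])")

for text in ["SESD12345678934333", "SN123456789", "123456789"]:
    print(
        text,
        bool(unbounded.search(text)),
        bool(digit_bounded.search(text)),
        bool(alnum_bounded.search(text)),
    )
```

The digit-only boundary is enough to reject the 14-digit serial number, but a 9-digit number glued to a letter prefix still matches unless letters are excluded too. A bare 9-digit line matches all three patterns.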

0 Kudos
edfapack
Level 7

Re: Fine tuning a regex in NDLP

rtrezza, that is the ONLY way I've found for DLP to detect SSNs when they're the only text on a line. Unfortunately, this causes a large number of false positives from things like the ones you mentioned: serial numbers, order numbers, foreign phone numbers, etc. I really wish this product was a more viable DLP solution.

0 Kudos
SafeBoot
Level 21

Re: Fine tuning a regex in NDLP

Unfortunately you're coming up against the problem of machine learning - how do you tell that 1234567890 is a social security number, a telephone number, or a part number? Even you don't know if this is my social security number or not.

If I wrote it as 123-45-6789 you (and DLP) might make an inference that it's an SSN, just because of the tradition of putting the "-" in certain places, but what if I wrote it like 9876-54-321? That is much more ambiguous.

You're going to find that DLP as a product category, regardless of which vendor you choose, has this same limitation. Unless there's a way of definitively describing a concept in a mathematically defined way, you're always going to be balancing accuracy against false positives.

I wish there was a good answer for you, but it's not a problem technology can solve.

0 Kudos
edfapack
Level 7

Re: Fine tuning a regex in NDLP

There are regex patterns that can be fine-tuned to target SSNs. There are certain numbers that SSNs do not start with, and there are certain strings of numbers that are not used by the Social Security Administration. Regex patterns that consider these do exist, just not in this product.
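
As a sketch of what such a pattern can look like in a standard engine (Python's `re` here; nothing below is confirmed to work in NDLP), the expression encodes the commonly documented SSA rules: the area is never 000, 666, or 900-999, the group is never 00, and the serial is never 0000. The name `looks_like_ssn` is a hypothetical helper for illustration only:

```python
import re

SSN = re.compile(
    r"(?<!\d)"                   # not preceded by a digit
    r"(?!000|666|9\d\d)\d{3}"    # area: not 000, 666, or 900-999
    r"([- ]?)"                   # optional separator, remembered as group 1
    r"(?!00)\d{2}"               # group: not 00
    r"\1"                        # same separator as before (or none)
    r"(?!0000)\d{4}"             # serial: not 0000
    r"(?!\d)"                    # not followed by a digit
)

def looks_like_ssn(text: str) -> bool:
    """Return True if text contains a structurally plausible SSN."""
    return SSN.search(text) is not None

for sample in ["123-45-6789", "123456789", "666-12-3456",
               "9876-54-321", "SESD12345678934333"]:
    print(sample, looks_like_ssn(sample))
```

This narrows the candidates but does not prove that a number is an SSN. A structurally valid 9-digit order number would still match.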

0 Kudos