Is it possible to create a classification rule that recognizes passwords? It is possible with a dictionary (words like "password", "pswrd"), but that generates lots of false positives. We would like to block unencrypted emails that contain passwords.
You may be able to use Validation Algorithms to detect passwords in unencrypted emails. However, you would need to write your own custom regular expression, using Google RE2 syntax, to recognize your environment's particular password criteria.
NOTE: This would be very difficult, considering end users can create very long and complicated passwords using special characters, numbers, and symbols. It would be almost impossible for DLP to distinguish between what is and isn't a password in a document.
I know this is an old thread, but the question is relevant, and the answer is - as far as I know - wrong.
I do not believe it is possible to write a password-recognizing regex with RE2, because there is no backtracking. You can make an expression checking for a length of 8 characters, an expression that checks for at least one digit, and expressions that check for at least one uppercase and one lowercase character. But you cannot realistically make a single expression that checks for a length of 8 characters and at least one digit, one uppercase, and one lowercase character.
If you put those single regexes into the advanced patterns, they will be OR'ed, not AND'ed.
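To make the OR versus AND distinction concrete, here is a minimal sketch in Python. It is not how DLP evaluates advanced patterns internally; the pattern set and the function names `matches_any` and `matches_all` are assumptions made up for illustration. Python's `re` module is used in place of RE2, but the patterns avoid lookahead and backreferences, so they should also be valid RE2 syntax.

```python
import re

# Hypothetical password criteria, one lookahead-free pattern per criterion.
LENGTH = re.compile(r"^\S{8,}$")   # at least 8 non-space characters
DIGIT = re.compile(r"[0-9]")       # at least one digit
UPPER = re.compile(r"[A-Z]")       # at least one uppercase letter
LOWER = re.compile(r"[a-z]")       # at least one lowercase letter
CHECKS = (LENGTH, DIGIT, UPPER, LOWER)


def matches_any(token: str) -> bool:
    """OR semantics: a single matching pattern is enough."""
    return any(p.search(token) for p in CHECKS)


def matches_all(token: str) -> bool:
    """AND semantics: every pattern must match the same token."""
    return all(p.search(token) for p in CHECKS)


if __name__ == "__main__":
    for token in ("hello", "Tr0ub4dor", "documentation"):
        print(token, matches_any(token), matches_all(token))
```

With OR semantics, ordinary words such as "hello" already match because they contain a lowercase letter. Only the AND combination narrows the result to password-like tokens, and that combination is what separate OR'ed patterns cannot express.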
So the question remains: is there no way of identifying passwords, even simple ones, with DLP?
The expressions can be validated with a password validation algorithm. I wonder what that validation does, because I cannot see any effect, and I cannot find anything in the documentation.
As you noted, the McAfee Data Loss Prevention regular expression tool uses RE2, which does not support lookahead. As a result, there is no good way to validate passwords: it is impossible to know where in a string the digits or special characters will be, or how long the password might be. You could require a minimum length (e.g., the string must be at least 8 characters long), but you would most likely end up with massive numbers of false positives.
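The false-positive problem with a length-only rule is easy to reproduce outside of DLP. The following is a rough sketch in Python, assuming a hypothetical rule of "8 or more non-space characters"; the pattern and the helper name `find_candidates` are illustrative and not taken from the product.

```python
import re

# Hypothetical length-only rule: any run of 8 or more non-space characters.
CANDIDATE = re.compile(r"\S{8,}")


def find_candidates(text: str) -> list[str]:
    """Return every token the length-only rule would flag."""
    return CANDIDATE.findall(text)


if __name__ == "__main__":
    sample = "Please review the attached quarterly document before Thursday."
    # Every flagged token here is an ordinary word, not a password.
    print(find_candidates(sample))
```

Even a single sentence of normal business text produces several matches, which is why a length check alone is unlikely to be usable as a blocking rule.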
What you could do is look for a string of a certain length and then use that regular expression in a proximity rule that also looks for one or more terms indicating the presence of a password. That would decrease the rate of false positives, but it could also miss some legitimate hits if you guess wrong about which keywords to match.
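The proximity idea can be sketched in Python as follows. This only approximates the logic and is not the DLP implementation: the keyword list, the 20-character window, and the function name `find_password_candidates` are all assumptions chosen for the example.

```python
import re

# Hypothetical keyword list and candidate pattern; both avoid lookahead.
KEYWORD = re.compile(r"(?i)\b(?:password|passwd|pswrd|pwd)\b")
CANDIDATE = re.compile(r"\S{8,}")
WINDOW = 20  # assumed maximum distance (in characters) after a keyword


def find_password_candidates(text: str, window: int = WINDOW) -> list[str]:
    """Return long tokens that start within `window` characters after a keyword."""
    keyword_ends = [m.end() for m in KEYWORD.finditer(text)]
    hits = []
    for cand in CANDIDATE.finditer(text):
        # Keep the candidate only if some keyword ends shortly before it.
        if any(0 <= cand.start() - end <= window for end in keyword_ends):
            hits.append(cand.group())
    return hits


if __name__ == "__main__":
    print(find_password_candidates(
        "The new password is Tr0ub4dor&3 and the quarterly document is attached."
    ))
    # No keyword nearby, so this real password is missed.
    print(find_password_candidates("Use Tr0ub4dor&3 to log in."))
```

The second call shows the trade-off described above: a message that shares a password without using any of the listed keywords is not flagged at all.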