3 Replies Latest reply on Sep 11, 2013 10:22 AM by mololly

    URL False Positives when Searching for Social Security Number SSN

    JoeyMc

      Hello,

      I'm running HDLP 9.2.0.522.

       

      I'm having an issue where a hyperlink sent by a user can also be flagged as a Social Security Number (SSN).

       

      I have email and print protection rules that search for SSN tags.

      My text pattern for SSN is: (?<!(\w|\-))(?!000)(?!666)([0-6]\d\d|7[01256]\d|73[0123]|77[012])([-\s]?)(?!00)(\d{2})\3(?!0000)(\d{4})(?!(\w|\-))

      This seems much better than the built-in McAfee SSN pattern. To give proper credit, I found it in an article by Steve Lovaas of Colorado State University titled "Parsing (and improving) the CMU SSN regex".

       

      The scenario is that a user emails a link such as: http://www.telegram.com/article/20120403/NEWS/104039996/1116

      The 104039996 contained in this link can be a valid SSN.

      The email will be tagged with the SSN tag and blocked.
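      The false positive can be reproduced outside DLP. The following is a minimal sketch in Python, assuming Python's `re` engine treats this pattern the same way the DLP engine does (not verified); the variable names are illustrative only.

```python
import re

# SSN pattern from the post, with the two-digit group written as (\d{2}).
SSN = re.compile(
    r"(?<!(\w|\-))(?!000)(?!666)([0-6]\d\d|7[01256]\d|73[0123]|77[012])"
    r"([-\s]?)(?!00)(\d{2})\3(?!0000)(\d{4})(?!(\w|\-))"
)

url = "http://www.telegram.com/article/20120403/NEWS/104039996/1116"

# "/" is neither a word character nor "-", so the lookarounds on both ends
# accept a nine-digit run that is delimited by slashes inside a URL.
hits = [m.group(0) for m in SSN.finditer(url)]
print(hits)  # ['104039996']
```

      The eight-digit date segment (20120403) is not matched because it is one digit short; only the nine-digit article ID is.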

       

      To attempt to stop this false positive I have tried two things:

       

      1) Create a hyperlink text pattern and exclude that pattern from the protection rule.

      My hyperlink pattern is: (http|ftp|https|file):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

      This works fine for finding hyperlink patterns.

      I thought this was my solution until I realized that if a user sends both an SSN and a hyperlink in the same email, the exclusion lets it through the protection rule!
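      The gap in attempt 1 can be sketched as follows. This is an assumption-laden model, not DLP's actual rule engine: `rule_blocks` is a hypothetical function that blocks when the SSN pattern is present and the hyperlink pattern is absent, which is how the exclusion behaves as described above.

```python
import re

# SSN pattern from the post, with the two-digit group written as (\d{2}).
SSN = re.compile(
    r"(?<!(\w|\-))(?!000)(?!666)([0-6]\d\d|7[01256]\d|73[0123]|77[012])"
    r"([-\s]?)(?!00)(\d{2})\3(?!0000)(\d{4})(?!(\w|\-))"
)
# Hyperlink pattern from the post, with the HTML entity written as a plain "&".
URL = re.compile(
    r"(http|ftp|https|file):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)

def rule_blocks(text: str) -> bool:
    """Hypothetical model: block on an SSN unless a hyperlink is also present."""
    return bool(SSN.search(text)) and not URL.search(text)

link_only = "See http://www.telegram.com/article/20120403/NEWS/104039996/1116"
ssn_only = "My SSN is 123-45-6789."
both = ssn_only + " " + link_only

print(rule_blocks(link_only))  # False: false positive avoided
print(rule_blocks(ssn_only))   # True: real SSN blocked
print(rule_blocks(both))       # False: real SSN slips through
```

      Because the exclusion applies to the whole message, any hyperlink anywhere in the email disables the SSN check.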

       

      Since the above failed I tried:

      2) Add the hyperlink text pattern to the ignore section of the SSN text pattern.

      This never worked, because when DLP finds a match for the SSN text pattern, it checks only the matched text (not the entire string) against the exclusion list.
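      To make the difference concrete, here is a rough Python sketch contrasting the two behaviors. Both functions are hypothetical: `per_match_exclusion` models the behavior described above, and `context_aware_exclusion` shows what a check against the surrounding string would look like. Nothing here implies DLP offers the second option.

```python
import re

# SSN pattern from the post, with the two-digit group written as (\d{2}).
SSN = re.compile(
    r"(?<!(\w|\-))(?!000)(?!666)([0-6]\d\d|7[01256]\d|73[0123]|77[012])"
    r"([-\s]?)(?!00)(\d{2})\3(?!0000)(\d{4})(?!(\w|\-))"
)
# Hyperlink pattern from the post, with the HTML entity written as a plain "&".
URL = re.compile(
    r"(http|ftp|https|file):\/\/[\w\-_]+(\.[\w\-_]+)+"
    r"([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?"
)

def per_match_exclusion(text):
    """Test the exclusion against the matched digits alone (observed behavior)."""
    return [m.group(0) for m in SSN.finditer(text)
            if not URL.search(m.group(0))]

def context_aware_exclusion(text):
    """Drop SSN matches whose span lies entirely inside a hyperlink match."""
    url_spans = [u.span() for u in URL.finditer(text)]
    return [m.group(0) for m in SSN.finditer(text)
            if not any(s <= m.start() and m.end() <= e for s, e in url_spans)]

link = "http://www.telegram.com/article/20120403/NEWS/104039996/1116"
mixed = "SSN 123-45-6789, see " + link

print(per_match_exclusion(link))       # ['104039996']
print(context_aware_exclusion(link))   # []
print(context_aware_exclusion(mixed))  # ['123-45-6789']
```

      The nine digits on their own never look like a hyperlink, so an exclusion tested against the match alone can never fire.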

       

      I will open a ticket with McAfee support, but sometimes they do not know anything about text patterns.

      Forgive me if this is answered on another post.  I couldn't find it.

       

      Any suggestions will be welcome!

       

      On another note: I am also getting false positives for ZIP+4 codes. I know these are not true false positives, because some ZIP+4 codes are also valid Social Security numbers. I just want to say: what was the government thinking! Nothing other than an SSN should be allowed to use the Social Security number format!