Regex causes many false positives, so we don't block. I've heard that 'fingerprinting' data is the way to go. How has fingerprinting worked for anyone in the community?
Your RegEx needs to be created based on your company's business requirements.
Are you seeing false positives
1. across all rules or is it limited to certain rules only?
2. for all data types or is it limited to certain data types only?
Need to protect source code (files & snippets)
Need to protect account numbers together with the corresponding last name. I want a last name to be blocked only when it appears with its own account number. How do I do this?
Thank you for your help
For source code or any data that could be dynamic in nature, use RegExes.
It is not easy, since the code varies depending on the programming language.
If the data protection process had been followed per standards, i.e. data identification and inventory completed before DLP deployment, you wouldn't have issues knowing what to look for.
For static data, in your example, account number + last name, fingerprinting would be the best option. You can create RegExes for account number, but not for any name.
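To make the account number + last name idea concrete, here is a minimal Python sketch of row-level fingerprinting (sometimes called exact data matching). It is a generic illustration, not how McAfee DLP implements it. The 10-digit account format, the function names `build_index` and `find_matches`, and the use of unsalted SHA-256 are all assumptions made for the example.

```python
import hashlib
import re

# Assumption: account numbers are exactly 10 digits. Adjust to your real format.
ACCOUNT_RE = re.compile(r"\b\d{10}\b")
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def _digest(account, last_name):
    """Hash one (account, last name) pair so the index holds no plaintext."""
    key = f"{account}|{last_name.strip().lower()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_index(records):
    """Fingerprint every (account, last_name) row of the protected data set."""
    return {_digest(account, name) for account, name in records}


def find_matches(text, index):
    """Return (account, word) pairs in the text that match a protected row."""
    words = {w.lower() for w in WORD_RE.findall(text)}
    hits = []
    for account in ACCOUNT_RE.findall(text):
        for word in words:
            if _digest(account, word) in index:
                hits.append((account, word))
    return sorted(hits)


index = build_index([("1234567890", "Smith"), ("9876543210", "Jones")])
print(find_matches("Please refund Smith, account 1234567890.", index))
print(find_matches("Jones called about 1234567890.", index))
```

The RegEx only finds candidate account numbers. The block decision comes from the fingerprint lookup, so an account number next to someone else's last name does not trigger. Multi-word last names would need extra handling that this sketch leaves out.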
The data inventory is not accurate due to employee error (e.g. they tag data as public when it should be private or privileged), so it seems (from what I've researched) that fingerprinting is the best way to go, right?
The problem with RegExes is the false positives (and false negatives), so currently I am unable to protect source code or documents (inherently dynamic) with McAfee DLP. Are there any "fingerprinting" modules that my org can add? If so, how accurate is the detection?
My previous reply still holds for your questions.
Other than RegExes, since you mentioned that the users are not classifying data properly, to me it sounds like they need to be trained better.
Since tagging or classifying data with keywords produces false positives and, as you said earlier, classification will not work for dynamic data, we've realized that we need to fingerprint the data in order to catch a partial file or code snippet. So how is this done? Does NDLP have a fingerprinting module?
So how do I fingerprint my source code and set up the policy for a partial match? This is a much better alternative than classifying and running RegExes on file tags. Also, how accurate is the detection with the NDLP fingerprinting algorithm?
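For anyone following the thread, partial-match fingerprinting is commonly built on hashing overlapping token windows (shingles) of the protected file and checking how many of a snippet's windows show up in that index. The Python sketch below illustrates that general technique only. It is not the NDLP algorithm, and the window size `K`, the tokenizer, and the function names are assumptions chosen for the example.

```python
import hashlib
import re

K = 5  # tokens per shingle; an assumed value, tune for your code base


def tokenize(code):
    """Split code into identifiers, numbers and single punctuation marks,
    so whitespace and line-break changes do not affect the fingerprint."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)


def fingerprints(code, k=K):
    """Hash every run of k consecutive tokens."""
    tokens = tokenize(code)
    grams = (" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def containment(snippet, protected_index, k=K):
    """Fraction of the snippet's shingles found in the protected index."""
    fp = fingerprints(snippet, k)
    if not fp:
        return 0.0
    return len(fp & protected_index) / len(fp)


protected = '''
def transfer(src, dst, amount):
    if src.balance < amount:
        raise ValueError("insufficient funds")
    src.balance -= amount
    dst.balance += amount
'''
index = fingerprints(protected)

# A copied fragment with different spacing still matches completely.
print(containment("src.balance   -=   amount\n dst.balance += amount", index))
# Unrelated code shares no shingles with the protected file.
print(containment("print('hello world') for x in range(10)", index))
```

A policy would then block when the containment score crosses a threshold. Accuracy depends on that threshold and on the window size: small windows catch shorter snippets but match common boilerplate more often.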