6.1 Detect sensitive information

The main function of the Proactive DLP engine is to detect and block sensitive data in files, such as credit card numbers and Social Security numbers. The engine supports a wide range of file types, including Microsoft Office documents and PDF files.

Sensitive Data

  • Social Security Number (SSN)

  • Credit Card Number (CCN)

  • IPv4

  • Classless Inter-Domain Routing (CIDR)

  • Any specific data pattern defined with a regular expression
    Note: Overly simple regular expressions, for example \d or \w, are not recommended. Most documents contain many numbers, so such a pattern matches almost everywhere and greatly increases the complexity and time of a scan. Enabling redaction with such a pattern degrades performance even further.
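
The effect described in the note can be illustrated outside the engine. The following is a minimal sketch using Python's standard re module, not the engine's own matcher. The sample text and the EMP-###### identifier format are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical sample text; not taken from any real document.
SAMPLE = (
    "Invoice 2024-0117, 3 items, total 1499.00 USD. "
    "Employee ID: EMP-004512. Call extension 5521."
)

# Overly broad pattern: matches every single digit in the text.
BROAD = re.compile(r"\d")

# More specific pattern: a hypothetical employee ID format
# (the literal prefix "EMP-" followed by exactly six digits).
SPECIFIC = re.compile(r"\bEMP-\d{6}\b")

broad_hits = BROAD.findall(SAMPLE)
specific_hits = SPECIFIC.findall(SAMPLE)

print(f"broad pattern:    {len(broad_hits)} hits")
print(f"specific pattern: {len(specific_hits)} hit(s): {specific_hits}")
```

Even in two short sentences the broad pattern reports a hit for every digit, while the specific pattern reports only the identifier of interest. Every hit must be scored, reported, and possibly redacted, so the narrower the pattern, the less work a scan has to do.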

Certainty score

The certainty score reflects the relevance of a given hit in its context. It is calculated from multiple factors, such as the number of digits, Bank Identification Number (BIN) lookup, and context.

  • SSN Certainty levels

    • High

    • Medium

    • Low

  • CCN/IPv4/CIDR Certainty levels

    • Very High

    • High

    • Medium

    • Low

    • Very Low

  • Custom RegEx Certainty levels

    • Medium

    • High

    • Very High

  • Custom metadata Certainty levels

    • High
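
The certainty levels listed above can be modeled as an ordered scale when post-processing scan results. The sketch below is illustrative only: the Certainty enum, the LEVELS table keys, and the filter_hits function are hypothetical names, and it assumes that the levels of all detection types share one ordering from Very Low to Very High. It does not reproduce how the engine calculates a score.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """Hypothetical ordered scale; a higher value means more certain."""
    VERY_LOW = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    VERY_HIGH = 5


# Levels each detection type can report, as listed in this section.
# The dictionary keys are illustrative labels, not engine identifiers.
LEVELS = {
    "ssn": {Certainty.LOW, Certainty.MEDIUM, Certainty.HIGH},
    "ccn": set(Certainty),
    "ipv4": set(Certainty),
    "cidr": set(Certainty),
    "regex": {Certainty.MEDIUM, Certainty.HIGH, Certainty.VERY_HIGH},
    "metadata": {Certainty.HIGH},
}


def filter_hits(hits, minimum):
    """Keep only the hits whose certainty is at or above `minimum`."""
    kept = []
    for kind, certainty in hits:
        if certainty not in LEVELS[kind]:
            raise ValueError(f"{kind} cannot report {certainty.name}")
        if certainty >= minimum:
            kept.append((kind, certainty))
    return kept


# Hypothetical hits, as (detection type, certainty) pairs.
hits = [
    ("ssn", Certainty.LOW),
    ("ccn", Certainty.VERY_HIGH),
    ("regex", Certainty.MEDIUM),
    ("metadata", Certainty.HIGH),
]
kept = filter_hits(hits, Certainty.HIGH)
print(kept)
```

With a minimum of High, only the credit card hit and the metadata hit remain. Raising the threshold in this way trades fewer false positives for a higher chance of missing a genuine low-certainty hit.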

Supported File Types

Text and Documents

  • ANSI Text (*.txt)

  • ASCII Text

  • CSV (Comma-separated values) (*.csv)

  • Microsoft Excel for Mac 2.2, 3, 4, 5, 98, 2001, X, 2004, 2008, 2011

  • Microsoft Excel for Windows 2, 3, 4, 5

  • Microsoft Excel 95, 97, 2000, XP, 2003, 2007, 2010, 2013, 2016 (*.xls)

  • Microsoft Excel Office Open XML 2007, 2010, 2013, and 2016 (*.xlsx)

  • Microsoft PowerPoint 3, 4, 95, 97, 98, 2000, 2001, 2002, 2003, 2004, 2007, 2008, 2010, 2011, 2013, 2016 (*.ppt)

  • Microsoft PowerPoint Office Open XML 2007, 2010, 2013, and 2016 (*.pptx)

  • Microsoft Rich Text Format (*.rtf)

  • Microsoft Word for DOS 1, 2, 3, 4, 5, 6 (*.doc)

  • Microsoft Word for Mac 1, 3, 4, 5, 6, 98, 2001, X, 2004, 2008, 2011

  • Microsoft Word for Windows 1, 2, 6 (*.doc)

  • Microsoft Word 95, 97, 98, 2000, 2002, 2003, 2007, 2010, 2013, 2016 (*.doc)

  • Microsoft Word 2003 XML (*.xml)

  • Microsoft Word Office Open XML 2007, 2010, 2013, 2016 (*.docx)

  • OpenOffice/LibreOffice versions 1, 2, 3, 4, and 5 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)

  • PDF files (*.pdf). Note: Encrypted PDF files cannot be scanned unless the file can be opened without a password and its permissions allow text extraction.

  • PDF Portfolio files (*.pdf), including embedded non-PDF documents.

  • Unicode (UCS16, Mac or Windows byte order, or UTF-8)

  • XML (*.xml)

  • JSON (*.json)

Email, HTML

  • EML (emails saved by Outlook Express) (*.eml)

  • MSG (emails saved by Outlook), including attachments (*.msg)

  • Eudora MBX message files (*.mbx)

  • HTML (*.htm, *.html)

Media (Metadata check only)

  • Adobe Photoshop images (*.psd)

  • ASF media files (*.asf)

  • JPEG (*.jpg)

  • MP3 (*.mp3)

  • TIFF (*.tif)

  • WMA media files (*.wma)

  • WMV video files (*.wmv)

  • GIF (*.gif)

  • PNG (*.png)
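
When preparing files for scanning, it can help to know in advance whether a file's content will be inspected or only its metadata. The following is a minimal sketch that classifies a file by its extension using the lists above. The scan_scope function and both set names are hypothetical, the OpenOffice/LibreOffice entries are only a subset of the extensions listed, and the sketch assumes the extension reflects the real file type, which the engine itself may determine differently.

```python
from pathlib import Path

# Extensions from the "Media (Metadata check only)" list.
METADATA_ONLY = {
    ".psd", ".asf", ".jpg", ".mp3", ".tif", ".wma", ".wmv", ".gif", ".png",
}

# Extensions from the "Text and Documents" and "Email, HTML" lists.
# Only a subset of the OpenOffice/LibreOffice extensions is included here.
FULL_CONTENT = {
    ".txt", ".csv", ".xls", ".xlsx", ".ppt", ".pptx", ".rtf",
    ".doc", ".docx", ".pdf", ".xml", ".json",
    ".odt", ".ods", ".odp",
    ".eml", ".msg", ".mbx", ".htm", ".html",
}


def scan_scope(filename):
    """Return 'content', 'metadata', or 'unsupported' for a file name."""
    ext = Path(filename).suffix.lower()
    if ext in METADATA_ONLY:
        return "metadata"
    if ext in FULL_CONTENT:
        return "content"
    return "unsupported"


for name in ("report.PDF", "photo.jpg", "backup.tar.gz"):
    print(name, "->", scan_scope(name))
```

A lookup like this is only a pre-check. A file reported as "content" can still be skipped by the engine, for example an encrypted PDF that requires a password.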