5. Data Sanitization (CDR)

What is Data Sanitization?

An increasingly popular and effective method of compromising computer security, especially as part of a targeted attack, involves sharing common document types or image files with victims. Even though such files do not normally contain executable data, attackers have found ways to make them run embedded malicious code. Popular techniques include VBA macros, exploit payloads, and embedded Flash or JavaScript code. This type of attack has a high success rate because most users do not expect common file types to carry infections.

For high-risk files or scenarios, Data Sanitization, also known as Content Disarm & Reconstruction (CDR), prevents malicious content, including zero-day threats, from executing. High-risk files can be sanitized through the following methods:

  • Removing hidden exploitable objects (e.g., scripts and macros)

  • Converting the file format
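As a rough illustration of the first method, the sketch below locates parts inside an Office Open XML package (docx, xlsx, pptx) that commonly carry active content. It is a minimal sketch and not how the product itself works: the function name `find_active_parts` and the pattern lists are hypothetical, and the patterns are far from exhaustive. It only locates candidates. Actually removing them would also require updating the package's relationship and content-type entries.

```python
import io
import zipfile
from typing import List

# Assumption: illustrative patterns only, not a product-defined or complete list.
SUSPECT_SUFFIXES = ("vbaProject.bin",)           # VBA macro storage
SUSPECT_DIRS = ("/embeddings/", "/activeX/")     # embedded OLE objects, ActiveX controls


def find_active_parts(package_bytes: bytes) -> List[str]:
    """Return the names of ZIP members that commonly carry active content."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return [
            name
            for name in package.namelist()
            if name.endswith(SUSPECT_SUFFIXES)
            or any(directory in name for directory in SUSPECT_DIRS)
        ]
```

A caller would read the file into memory and pass the bytes, for example `find_active_parts(open("report.docx", "rb").read())`, where `report.docx` is a placeholder file name.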

Supported File Types

Source File Type    Target Sanitized Types
----------------    ----------------------
doc                 doc, pdf
xls                 xls, pdf
ppt                 ppt, pdf
rtf                 rtf
docx                docx, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
xlsx                xlsx, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
pptx                pptx, pdf
htm/html            html, pdf, ps, jpg, bmp, png, svg
pdf                 pdf, html, svg, jpg, bmp, png, tiff, txt
jpg                 jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
bmp                 bmp, jpg, png, tiff, svg, gif, ps, eps, pdf
png                 png, jpg, bmp, tiff, svg, gif, ps, eps, pdf
tiff                tiff, jpg, bmp, png, svg, gif, ps, eps
svg                 jpg, bmp, png, tiff, gif, ps, eps
gif                 jpg, bmp, png, tiff, svg, ps, eps, pdf
hwp*                hwp
jtd*                jtd
xml*                xml, pdf

* Sanitization of JTD, HWP, XML, and WMF files is in BETA. Do not enable it for production use. Enabling it should not affect the sanitization of other file types. Please contact OPSWAT tech support if you have any samples that you would like to share with us for investigation.
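The table above can be expressed as a simple lookup, which is convenient when validating a requested conversion before submitting a file. This is a minimal sketch: the names `SANITIZATION_TARGETS`, `BETA_SOURCES`, `can_sanitize`, and `is_beta` are hypothetical and not part of any product API. The data is copied from the table, with htm treated as an alias of html. WMF appears in the BETA note but not in the table, so it is left out of the lookup.

```python
# Source file type -> target sanitized types, copied from the table above.
SANITIZATION_TARGETS = {
    "doc": ["doc", "pdf"],
    "xls": ["xls", "pdf"],
    "ppt": ["ppt", "pdf"],
    "rtf": ["rtf"],
    "docx": ["docx", "txt", "html", "pdf", "ps", "jpg", "bmp", "png", "tiff", "svg"],
    "xlsx": ["xlsx", "csv", "html", "tiff", "pdf", "ps", "jpg", "bmp", "png", "svg"],
    "pptx": ["pptx", "pdf"],
    "html": ["html", "pdf", "ps", "jpg", "bmp", "png", "svg"],
    "pdf": ["pdf", "html", "svg", "jpg", "bmp", "png", "tiff", "txt"],
    "jpg": ["jpg", "bmp", "png", "tiff", "svg", "gif", "ps", "eps", "pdf"],
    "bmp": ["bmp", "jpg", "png", "tiff", "svg", "gif", "ps", "eps", "pdf"],
    "png": ["png", "jpg", "bmp", "tiff", "svg", "gif", "ps", "eps", "pdf"],
    "tiff": ["tiff", "jpg", "bmp", "png", "svg", "gif", "ps", "eps"],
    "svg": ["jpg", "bmp", "png", "tiff", "gif", "ps", "eps"],
    "gif": ["jpg", "bmp", "png", "tiff", "svg", "ps", "eps", "pdf"],
    "hwp": ["hwp"],
    "jtd": ["jtd"],
    "xml": ["xml", "pdf"],
}

# Source types marked with * in the table (BETA, not for production use).
BETA_SOURCES = {"hwp", "jtd", "xml"}

# The table lists "htm/html" as one row, so htm maps to html.
ALIASES = {"htm": "html"}


def normalize(extension: str) -> str:
    """Lowercase an extension, drop a leading dot, and resolve aliases."""
    ext = extension.strip().lower().lstrip(".")
    return ALIASES.get(ext, ext)


def can_sanitize(source: str, target: str) -> bool:
    """Return True if the table lists `target` as an output for `source`."""
    return normalize(target) in SANITIZATION_TARGETS.get(normalize(source), [])


def is_beta(source: str) -> bool:
    """Return True if sanitization of this source type is marked BETA."""
    return normalize(source) in BETA_SOURCES
```

For example, `can_sanitize("docx", "pdf")` is true, while `can_sanitize("svg", "svg")` is false because the table does not list svg as a target for svg sources.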

Single or Multiple Output Files

If the sanitized output consists of only one file, it is not zipped and is returned as a single output file. For example, converting a one-page PDF file to JPG produces a single JPG file. Converting a PDF file with more than one page produces multiple JPG files, which are returned as a single ZIP file. The following conversions can produce multiple files (packaged as a single ZIP file):

  • PDF→HTML

  • PDF→IMG

  • DOCX→HTML, IMG

  • XLSX→HTML, CSV, IMG

  • PPTX→HTML, IMG
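The packaging rule described above (one output file is returned as-is, several are bundled into one ZIP file) can be sketched as follows. This is an illustration of the rule only, under stated assumptions: the function name `package_outputs`, the default archive name `sanitized.zip`, and the in-memory representation of output files are all hypothetical and do not reflect the product's internals.

```python
import io
import zipfile
from typing import Dict, Tuple


def package_outputs(
    outputs: Dict[str, bytes], zip_name: str = "sanitized.zip"
) -> Tuple[str, bytes]:
    """Return (file name, content) for a set of sanitized output files.

    A single output is returned unchanged. Multiple outputs are written
    into one ZIP archive, which is returned instead.
    """
    if not outputs:
        raise ValueError("no output files to package")

    if len(outputs) == 1:
        # Exactly one file: no ZIP wrapper.
        name, data = next(iter(outputs.items()))
        return name, data

    # Several files (e.g. one JPG per PDF page): bundle into a single ZIP.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in outputs.items():
            archive.writestr(name, data)
    return zip_name, buffer.getvalue()
```

With this sketch, a one-page PDF converted to JPG yields a single `(name, bytes)` pair for the JPG, while a two-page PDF yields the ZIP archive containing both page images.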

Data Sanitization is only available on Windows OS.