3. Data Sanitization (CDR)

What is Data Sanitization?

An increasingly popular and effective method of compromising computer security, especially as part of a targeted attack, involves sharing common document types or image files with victims. Although these file types are not executable in themselves, attackers have found ways to make them run embedded malicious code. Common techniques include VBA macros, exploit payloads, and embedded Flash or JavaScript code. This type of attack has a high success rate because most users don’t expect common file types to carry infections. For high-risk files or scenarios, Data Sanitization, also known as Content Disarm & Reconstruction (CDR), prevents malicious content (including zero-day threats) from executing. High-risk files can be sanitized through the following methods:

  • Removing hidden exploitable objects (e.g., scripts and macros)

  • Converting the file format
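As a rough illustration of the first method, the sketch below copies a ZIP-based container (the packaging used by docx, xlsx, and pptx files) while dropping any entry named vbaProject.bin, which is where macro-enabled Office files store their VBA project. The function name strip_active_entries and the ACTIVE_SUFFIXES list are hypothetical, and this is not how the product performs sanitization: a real CDR engine also rewrites content types and relationships and rebuilds the document rather than filtering entries.

```python
import io
import zipfile

# Hypothetical list of entry names treated as "active content" in this sketch.
ACTIVE_SUFFIXES = ("vbaProject.bin",)


def strip_active_entries(data: bytes) -> bytes:
    """Return a copy of a ZIP container without entries that look like macros."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.endswith(ACTIVE_SUFFIXES):
                continue  # drop the embedded macro project
            # Copy every other entry unchanged.
            dst.writestr(info, src.read(info.filename))
    return out_buf.getvalue()
```

The second method, converting the file format, achieves the same goal differently: the content is re-rendered into a new file so that hidden objects are not carried over.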

Supported File Types For Windows

Source File Type    Target Sanitized Types
doc                 doc, pdf
dot                 dot
xls                 xls, pdf
ppt                 ppt, pdf
rtf                 rtf
docx                docx, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
docm                docm, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
dotx                dotx
xlsx                xlsx, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
xlsm                xlsm, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
xlsb                xlsb
pptx                pptx, html, pdf, ps, jpg, bmp, png, tiff, svg
pptm                pptm, html, pdf, ps, jpg, bmp, png, tiff, svg
htm/html            html, pdf, ps, jpg, bmp, png, svg
pdf                 pdf, html, svg, jpg, bmp, png, tiff, txt
hwp                 hwp
jtd                 jtd
xml*                xml, pdf
jpg                 jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
bmp                 bmp, jpg, png, tiff, svg, gif, ps, eps, pdf
png                 png, jpg, bmp, tiff, svg, gif, ps, eps, pdf
tiff                tiff, jpg, bmp, png, svg, gif, ps, eps
svg                 jpg, bmp, png, tiff, gif, ps, eps
gif                 jpg, bmp, png, tiff, svg, ps, eps, pdf
wmf*                jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
7z                  zip
gz                  zip
rar                 zip
xz                  zip
zip                 zip

* XML and WMF sanitization are in BETA. Do not enable them for production usage. Enabling them should not affect other sanitization, however. Please contact OPSWAT tech support if you have any samples that you would like to share with us for investigation.

XML sanitization targets XML parser vulnerabilities only. It does not eliminate other threats, such as those carried in Microsoft Office XML formats. For example, Microsoft Office 2003 supports an XML document format (different from Microsoft Open XML, which is a stricter, zipped format). Do not enable XML sanitization on a production server to sanitize XML-based documents. Use it only to reduce the risk of XML parser vulnerabilities.
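The Windows table can be read as a lookup from source type to allowed target types. The following is a minimal sketch of that lookup, using only a few rows copied from the table. The names WINDOWS_TARGETS, BETA_SOURCES, and can_sanitize are hypothetical and are not part of any OPSWAT API. The allow_beta flag is an assumption that mirrors the recommendation to keep XML and WMF sanitization off in production.

```python
# A few rows copied from the Windows table; the remaining rows follow the same shape.
WINDOWS_TARGETS = {
    "doc": ["doc", "pdf"],
    "docx": ["docx", "txt", "html", "pdf", "ps", "jpg", "bmp", "png", "tiff", "svg"],
    "xlsx": ["xlsx", "csv", "html", "tiff", "pdf", "ps", "jpg", "bmp", "png", "svg"],
    "pdf": ["pdf", "html", "svg", "jpg", "bmp", "png", "tiff", "txt"],
    "svg": ["jpg", "bmp", "png", "tiff", "gif", "ps", "eps"],
    "xml": ["xml", "pdf"],
    "7z": ["zip"],
}

# Source types marked with * in the table (BETA).
BETA_SOURCES = {"xml", "wmf"}


def can_sanitize(source_ext: str, target_ext: str, allow_beta: bool = False) -> bool:
    """Return True if the table lists target_ext as a target for source_ext."""
    source = source_ext.lower().lstrip(".")
    target = target_ext.lower().lstrip(".")
    if source in BETA_SOURCES and not allow_beta:
        return False  # BETA types are refused unless explicitly allowed
    return target in WINDOWS_TARGETS.get(source, [])
```

Note that a source type is not always among its own targets: the table lists no svg target for svg input, so can_sanitize("svg", "svg") is False in this sketch.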

Supported File Types For Linux (BETA)

Source File Type    Target Sanitized Types
doc                 doc, pdf
docx                docx, txt, html, pdf
xlsx                xlsx, csv, html
pptx                pptx
pdf                 pdf, bmp
jpg                 jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
bmp                 bmp, jpg, png, tiff, svg, gif, ps, eps, pdf
png                 png, jpg, bmp, tiff, svg, gif, ps, eps, pdf
tiff                tiff, jpg, bmp, png, svg, gif, ps, eps
gif                 jpg, bmp, png, tiff, svg, ps, eps, pdf
7z                  zip
gz                  zip
rar                 zip
xz                  zip
zip                 zip

Single / Multiple Output File

If the target contains only one file, it is not zipped and is treated as a single output file. For example, converting a one-page PDF file to JPG produces a single JPG file. If the PDF file has more than one page, the conversion produces multiple JPG files, which are packaged into a single ZIP file. The following sanitizations can result in multiple files (delivered as a single ZIP file):

  • PDF->HTML

  • PDF->IMG

  • DOCX->HTML, IMG

  • XLSX->HTML, CSV, IMG

  • PPTX->HTML, IMG
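The single versus multiple output rule can be sketched as follows. This is only an illustration of the rule as described in this section: the function name package_outputs and the archive name sanitized.zip are hypothetical, and the actual naming used by the product is not specified here.

```python
import io
import zipfile


def package_outputs(outputs: dict[str, bytes]) -> tuple[str, bytes]:
    """Return (name, content) for a mapping of output file names to contents.

    One output file is returned as-is; several are bundled into one ZIP.
    """
    if not outputs:
        raise ValueError("sanitization produced no output files")
    if len(outputs) == 1:
        # Single output file: no ZIP wrapper.
        return next(iter(outputs.items()))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in outputs.items():
            archive.writestr(name, data)
    # "sanitized.zip" is a placeholder name chosen for this sketch.
    return "sanitized.zip", buf.getvalue()
```

For example, a one-page PDF converted to JPG would pass a single entry and get that JPG back unchanged, while a three-page PDF would pass three entries and get one ZIP archive containing all three images.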