Data Sanitization (CDR)

What is Data Sanitization?

An increasingly popular and effective way to compromise computer security, especially in targeted attacks, is to send victims common document or image files. These file formats are not meant to carry executable data, but attackers have found ways to make them run embedded malicious code. Common techniques include VBA macros, exploit payloads, and embedded Flash or JavaScript code. This type of attack has a high success rate because most users do not expect common file types to be infected. For high-risk files or scenarios, Data Sanitization, also known as Content Disarm & Reconstruction (CDR), removes the possibility of malicious content (including zero-day threats) executing. High-risk files can be sanitized through the following methods:

  • Removing hidden exploitable objects (e.g., scripts and macros)

  • Converting the file format
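As a rough illustration of the first method, the sketch below copies an Office Open XML package (which is a ZIP container) while leaving out any part whose file name starts with `vbaProject`, the part where such packages usually store VBA macros. This is a simplified example and not how the Data Sanitization engine is implemented: the function name `strip_vba_parts` is hypothetical, and a real tool would also have to update the package's content types and relationships so the result opens cleanly.

```python
import io
import zipfile


def strip_vba_parts(package_bytes: bytes) -> bytes:
    """Return a copy of a ZIP-based package without vbaProject* parts.

    Illustration only: references to the removed part inside the package
    are NOT rewritten here.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            base_name = info.filename.rsplit("/", 1)[-1].lower()
            if base_name.startswith("vbaproject"):
                continue  # drop the macro storage part
            dst.writestr(info, src.read(info))
    return out_buf.getvalue()
```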

Supported File Types (both Windows and Linux)

 

# | Source File Type | Description | Target Sanitized Types
---|---|---|---
1 | doc | Microsoft Word 97-2003 Document | doc, pdf
2 | dot | Microsoft Word 97-2003 Template | dot
3 | xls | Microsoft Excel 97-2003 Workbook | xls, pdf*
4 | ppt | Microsoft PowerPoint 97-2003 Presentation | ppt, pdf*
5 | rtf | Microsoft Rich Text Format | rtf
6 | docx | Microsoft Word Document | docx, txt, html, pdf, ps*, jpg*, bmp*, png*, tiff*, svg*
7 | docm | Microsoft Word Macro-Enabled Document | docm, docx*, txt*, html*, pdf*, ps*, jpg*, bmp*, png*, tiff*, svg*
8 | dotx | Microsoft Word Template | dotx
9 | dotm | Microsoft Word Macro-Enabled Template | dotm, dotx*
10 | xlsx | Microsoft Excel Workbook | xlsx, csv, html, tiff*, pdf*, ps*, jpg*, bmp*, png*, svg*
11 | xlsm | Microsoft Excel Macro-Enabled Workbook | xlsm, xlsx*, csv*, html*, tiff*, pdf*, ps*, jpg*, bmp*, png*, svg*
12 | xlsb | Microsoft Excel Binary Workbook | xlsb
13 | csv | Comma-separated values | csv
14 | pptx | Microsoft PowerPoint Presentation | pptx, html*, pdf*, ps*, jpg*, bmp*, png*, tiff*, svg*
15 | pptm | Microsoft PowerPoint Macro-Enabled Presentation | pptm, pptx*, html*, pdf*, ps*, jpg*, bmp*, png*, tiff*, svg*
16 | ppsx | Microsoft PowerPoint Show | ppsx
17 | vsdx | Microsoft Visio Drawing | vsdx
18 | vsdm | Microsoft Visio Macro-Enabled Drawing | vsdm
19 | odt | OpenDocument Text | odt
20 | htm/html | Hypertext Markup Language | html, pdf*, ps*, jpg*, bmp*, png*, svg*
21 | pdf | Adobe Portable Document Format | pdf, html*, svg*, jpg*, bmp, png*, tiff*, txt*
22 | hwp | Hangul Word Processor | hwp
23 | jtd | Ichitaro Document | jtd
24 | jtdc | Ichitaro Compressed Document | jtdc
25 | xml | Extensible Markup Language | xml
26 | xml-doc | Microsoft Word 2003 XML Document | pdf
27 | xml-docx | Microsoft Word XML Document | pdf
28 | xml-xls | Microsoft XML Spreadsheet 2003 | pdf
29 | vcs | vCalendar | vcs
30 | ics | iCalendar | ics
31 | jpg | JPEG Image | jpg, bmp, png, tiff, svg, gif, ps, eps, pdf*
32 | bmp | Windows Bitmap Image | bmp, jpg, png, tiff, svg, gif, ps, eps, pdf*
33 | png | Portable Network Graphics | png, jpg, bmp, tiff, svg, gif, ps, eps, pdf*
34 | tiff | Tagged Image File Format | tiff, jpg, bmp, png, svg, gif, ps, eps
35 | svg | Scalable Vector Graphics | svg, jpg*, bmp*, png*, tiff*, gif*, ps*, eps*
36 | gif | Graphics Interchange Format | gif, jpg, bmp, png, tiff, svg, ps, eps, pdf*
37 | wmf | Windows Metafile | wmf, jpg, bmp*, png*, tiff*, svg*, gif*, ps*, eps*, pdf*
38 | emf | Windows Enhanced Metafile | emf
39 | dwg | AutoCAD | dwg
40 | 7z | 7-zip Compressed Archive | 7z, zip, gz, xz
41 | gz | GNU Zipped Archive | gz, 7z, zip, xz
42 | rar | WinRAR Compressed Archive | zip, 7z, gz, xz
43 | xz | XZ Compressed Archive | xz, zip, 7z, gz
44 | zip | ZIP Archive | zip, 7z, gz, xz

* Only supported on Windows for now.
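To make the asterisk notation concrete, here is a small sketch that filters a source type's target list by platform. The `TARGETS` dictionary holds only a few rows copied from the table above, and the function name `available_targets` and the platform strings are assumptions made for this example; they are not part of the product.

```python
from typing import Dict, List

# A few rows copied from the table; '*' marks a Windows-only target.
TARGETS: Dict[str, str] = {
    "doc": "doc, pdf",
    "xls": "xls, pdf*",
    "docx": "docx, txt, html, pdf, ps*, jpg*, bmp*, png*, tiff*, svg*",
    "rar": "zip, 7z, gz, xz",
}


def available_targets(source_type: str, platform: str) -> List[str]:
    """List target types for source_type on 'windows' or 'linux'."""
    result = []
    for entry in TARGETS[source_type].split(","):
        entry = entry.strip()
        windows_only = entry.endswith("*")
        if windows_only and platform != "windows":
            continue  # starred targets are skipped on Linux
        result.append(entry.rstrip("*"))
    return result
```

With these rows, an xls file can only be sanitized to xls on Linux, while on Windows pdf is also offered.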

Sanitization of ODT, XML, WMF, SVG (to SVG), VSDM, VSDX, EMF, JTDC, VCS, and ICS files is in BETA. Please do not enable it for production use. Enabling it should not affect sanitization of other file types. Please contact OPSWAT tech support if you have any samples that you would like to share with us for investigation.

XML sanitization targets XML-specific vulnerabilities only. It does not eliminate other threats, such as those carried in Microsoft Office XML formats. For example, Microsoft Office 2003 supports an XML document format (different from Office Open XML, which is a stricter, zipped format). Please do not enable XML sanitization on a production server to sanitize XML-based documents. XML sanitization should be used only to reduce the risk of XML parser vulnerabilities.

HTML sanitization is designed for email security purposes and should not be used to sanitize normal HTML traffic.

HWP: there are two versions of the HWP format, v3.0 and v5.0. v3.0 documents can only be created with the legacy Hangul Word Processor. For this reason, HWP v3.0 is not supported, and sanitizing such a file results in "failed to sanitize". We recommend treating files in this old version as suspicious. If you need support for v3.0, please contact support.
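If you want to pre-screen HWP files before submitting them, a check along the following lines may help. It is only a sketch: the two byte signatures are assumptions based on publicly described file headers (a plain-text header for v3.0 files and the OLE2 compound-file magic for v5.0 containers), they are not taken from this documentation, and the function name `classify_hwp` is hypothetical. Verify the signatures against the format specifications before relying on them.

```python
# Assumed signatures; verify against the HWP format specifications.
HWP3_PREFIX = b"HWP Document File V3"           # assumed header of legacy v3.0 files
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # compound-file container (used by v5.0)


def classify_hwp(header: bytes) -> str:
    """Classify the first bytes of a file that claims to be HWP."""
    if header.startswith(HWP3_PREFIX):
        return "hwp-v3"            # unsupported; treat as suspicious
    if header.startswith(OLE2_MAGIC):
        return "hwp-v5-candidate"  # an OLE2 container, possibly HWP v5.0
    return "unknown"
```

An OLE2 container is not necessarily an HWP document (legacy Microsoft Office files use the same container), which is why the second result is only a candidate.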

Archive sanitization (7z, gz, rar, xz, zip) is available in Metadefender Core v4 only.

Additional notes for Metadefender Core v3.x:

  • The Metadefender service must be restarted after any change to the configuration. The ini file is located at <Metadefender Core v3.x install directory>\omsDSConfig.ini

Additional notes for Metadefender Core v4.x:

  • To change configuration, log into the Web Management Console then go to Inventory → Technologies. Press the edit button on the Data Sanitization row and enter the configuration in the Advanced Engine Configuration box.

  • The modified configuration will be deployed within a few minutes.

  • There is no need to restart the Metadefender service.

  • Due to strict file type enforcement, a file type listed in this table is sanitized only if file type analysis detects it as that type. For example, if a file is not detected as PDF, no PDF sanitization will be performed.
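The effect of strict file type enforcement can be pictured with the sketch below, which decides on PDF sanitization from the file content and not from the file name. It is a simplified stand-in for real file type analysis: it only checks that the data begins with the `%PDF-` header, and the names `looks_like_pdf` and `choose_action` as well as the returned strings are assumptions for this example.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Very small content check: PDF files start with the '%PDF-' header."""
    return data.startswith(b"%PDF-")


def choose_action(filename: str, data: bytes) -> str:
    """Pick an action based on detected content; the file name is ignored."""
    if looks_like_pdf(data):
        return "sanitize-as-pdf"
    # Not detected as PDF, so no PDF sanitization is performed,
    # even if the file name ends in .pdf.
    return "skip-pdf-sanitization"
```

For instance, a file named report.pdf whose content is actually a ZIP archive would not go through PDF sanitization.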

Single / Multiple Output File

If the sanitized output consists of only one file, it is not zipped and is returned as a single output file. For example, converting a one-page PDF file to JPG produces a single JPG file. If the PDF file has more than one page, the conversion produces multiple JPG files, which are packaged into a single ZIP file. The following conversions can produce multiple files (delivered as a single ZIP file):

  • PDF → HTML

  • PDF → IMG

  • DOCX → HTML, IMG

  • XLSX → HTML, CSV, IMG

  • PPTX → HTML, IMG
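The single versus multiple output rule can be sketched as follows. This is an illustration of the packaging rule only, not the product's code; the function name `package_outputs` and the default archive name `sanitized.zip` are assumptions.

```python
import io
import zipfile
from typing import Dict, Tuple


def package_outputs(outputs: Dict[str, bytes],
                    archive_name: str = "sanitized.zip") -> Tuple[str, bytes]:
    """Return (name, data): the file itself if there is one, else a ZIP."""
    if not outputs:
        raise ValueError("no output files to package")
    if len(outputs) == 1:
        # A single result is returned as-is, without zipping.
        name, data = next(iter(outputs.items()))
        return name, data
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in outputs.items():
            archive.writestr(name, data)
    return archive_name, buf.getvalue()
```

A one-page PDF converted to JPG would pass a single entry and get the JPG back unchanged, while a three-page PDF would pass three entries and get one ZIP file containing all three images.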