3. Data Sanitization (CDR)

What is Data Sanitization?

An increasingly popular and effective way to compromise computer security, especially in targeted attacks, is to share common document or image files with victims. Although these file formats are not executable in themselves, attackers have found ways to make them run embedded malicious code. Popular techniques include VBA macros, exploit payloads, and embedded Flash or JavaScript code. This type of attack has a high success rate because most users do not expect common file types to carry infections.

For high-risk files or scenarios, Data Sanitization, also known as Content Disarm & Reconstruction (CDR), prevents malicious content (including zero-day threats) from executing. High-risk files can be sanitized through the following methods:

  • Removing hidden exploitable objects (e.g., scripts and macros)

  • Converting the file format
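The first method, removing hidden exploitable objects, can be illustrated with a minimal sketch. It assumes an Office Open XML package (docx, docm, xlsx, and so on), which is a ZIP container, and copies it while leaving out members that look like macro or embedded-object storage. The marker list and the function names `is_active_part` and `strip_active_parts` are hypothetical and chosen for illustration; this is not how Metadefender performs sanitization. A real tool would also have to rewrite the content-type and relationship parts so that the result still opens cleanly, which is omitted here.

```python
import io
import zipfile
from typing import List, Tuple

# Assumption: member names that commonly carry active content in OOXML
# packages. Illustrative only; not the product's actual rule set.
ACTIVE_PART_MARKERS = ("vbaproject.bin", "/activex/", "/embeddings/")


def is_active_part(name: str) -> bool:
    """Return True if a ZIP member name looks like macro or embedded-object storage."""
    lowered = "/" + name.lower()
    return any(marker in lowered for marker in ACTIVE_PART_MARKERS)


def strip_active_parts(package: bytes) -> Tuple[bytes, List[str]]:
    """Copy an OOXML package, leaving out members flagged by is_active_part.

    Returns the rewritten package bytes and the names that were dropped.
    """
    removed: List[str] = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if is_active_part(info.filename):
                removed.append(info.filename)
                continue
            # Reuse the original ZipInfo so names and compression are preserved.
            dst.writestr(info, src.read(info.filename))
    return out.getvalue(), removed
```

Calling `strip_active_parts` on the bytes of a docm file returns the filtered package together with the list of dropped members, which can be logged for auditing.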

Supported File Types For Windows

#   Source File Type  Target Sanitized Types
1   doc               doc, pdf
2   dot               dot
3   xls               xls, pdf
4   ppt               ppt, pdf
5   rtf               rtf
6   docx              docx, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
7   docm              docm, docx, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
8   dotx              dotx
9   dotm              dotm, dotx
10  xlsx              xlsx, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
11  xlsm              xlsm, xlsx, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
12  xlsb              xlsb
13  csv               csv
14  pptx              pptx, html, pdf, ps, jpg, bmp, png, tiff, svg
15  pptm              pptm, pptx, html, pdf, ps, jpg, bmp, png, tiff, svg
16  ppsx              ppsx
17  odt               odt
18  htm/html          html, pdf, ps, jpg, bmp, png, svg
19  pdf               pdf, html, svg, jpg, bmp, png, tiff, txt
20  hwp               hwp
21  jtd               jtd
22  xml               xml
23  xml-doc           pdf
24  xml-docx          pdf
25  xml-xls           pdf
26  jpg               jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
27  bmp               bmp, jpg, png, tiff, svg, gif, ps, eps, pdf
28  png               png, jpg, bmp, tiff, svg, gif, ps, eps, pdf
29  tiff              tiff, jpg, bmp, png, svg, gif, ps, eps
30  svg               svg, jpg, bmp, png, tiff, gif, ps, eps
31  gif               gif, jpg, bmp, png, tiff, svg, ps, eps, pdf
32  wmf               jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
33  dwg               dwg
34  7z                7z, zip, gz, xz
35  gz                gz, 7z, zip, xz
36  rar               zip, 7z, gz, xz
37  xz                xz, zip, 7z, gz
38  zip               zip, 7z, gz, xz

Sanitization of DWG, ODT, XML, WMF, and SVG (to SVG) is in BETA. Please do not enable it for production use. Enabling it should not affect the sanitization of other file types. Please contact OPSWAT tech support if you have any samples that you would like to share with us for investigation.

XML sanitization targets XML parser vulnerabilities only. It does not remove other threats, such as those carried in Microsoft Office XML formats. For example, Microsoft Office 2003 supports an XML document format (different from Office Open XML, which is a stricter, zipped format). Please do not enable XML sanitization on a production server to sanitize XML-based documents. Use it only to reduce the risk of XML parser vulnerabilities.

The xml-* source types (xml-doc, xml-docx, xml-xls) are Microsoft Office XML formats.

HWP: there are two versions of HWP, v3.0 and v5.0. v3.0 documents can only be created with the legacy Hangul Word Processor. For this reason, HWP v3.0 is not supported, and sanitizing such a file results in "failed to sanitize". We recommend treating files in this old version as suspicious. If you need support for v3.0, please contact support.

Archive sanitization (7z, gz, rar, xz, zip) is available in Metadefender Core v4 only.

Additional notes for Metadefender Core v3.x:

  • The Metadefender service must be restarted after any change to the configuration. The ini file is located at <Metadefender Core v3.x install directory>\omsDSConfig.ini

Additional notes for Metadefender Core v4.x:

  • To change the configuration, log into the Web Management Console, then go to Inventory→Engines. Press the edit button on the Data Sanitization row and enter the configuration in the Advanced Engine Configuration box.

  • The modified configuration will be deployed within a few minutes.

  • There is no need to restart the Metadefender service.

  • Because file types are strictly enforced, some file types listed in this table may not be sanitized, depending on the file type analysis result. For example, if a file is not correctly detected as PDF, no PDF sanitization is performed.
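The effect of strict file type enforcement can be sketched as follows. This is a simplified illustration that assumes detection relies on the PDF header signature `%PDF-` at the start of the content; the product's actual file type analysis is not described here and is certainly more thorough. The names `looks_like_pdf` and `select_sanitizer` are hypothetical.

```python
from typing import Optional

# The PDF header signature; a PDF file starts with "%PDF-" followed by a version.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the content begins with the PDF header signature."""
    return data.startswith(PDF_MAGIC)


def select_sanitizer(filename: str, data: bytes) -> Optional[str]:
    """Pick a sanitizer from the detected content type, ignoring the extension.

    Returns "pdf" when the content is detected as PDF, otherwise None,
    meaning no PDF sanitization would be performed.
    """
    if looks_like_pdf(data):
        return "pdf"
    return None
```

In this sketch a file named report.pdf whose content does not start with the PDF signature is not routed to PDF sanitization, which mirrors the behavior described in the note above.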

Supported File Types For Linux

#   Source File Type  Target Sanitized Types
1   doc               doc, pdf
2   docx              docx, txt, html, pdf
3   dot               dot
4   xlsx              xlsx, csv, html
5   pptx              pptx
6   odt               odt
7   pdf               pdf, bmp
8   xml               xml
9   csv               csv
10  jpg               jpg, bmp, png, tiff, svg, gif, ps, eps
11  bmp               bmp, jpg, png, tiff, svg, gif, ps, eps
12  png               png, jpg, bmp, tiff, svg, gif, ps, eps
13  tiff              tiff, jpg, bmp, png, svg, gif, ps, eps
14  gif               gif, jpg, bmp, png, tiff, svg, ps, eps
15  svg               svg
16  7z                7z, zip, gz, xz
17  gz                gz, 7z, zip, xz
18  rar               zip, 7z, gz, xz
19  xz                xz, zip, 7z, gz
20  zip               zip, 7z, gz, xz
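When scripting against a Linux deployment, it can help to check a requested conversion against the table before submitting a file. The sketch below copies a few rows from the Linux table above into a dictionary; the remaining rows are left out for brevity, and the names `LINUX_TARGETS` and `is_supported` are hypothetical.

```python
# A few rows copied from the Linux table; the other rows are omitted for brevity.
LINUX_TARGETS = {
    "doc": ["doc", "pdf"],
    "docx": ["docx", "txt", "html", "pdf"],
    "xlsx": ["xlsx", "csv", "html"],
    "pptx": ["pptx"],
    "pdf": ["pdf", "bmp"],
    "rar": ["zip", "7z", "gz", "xz"],
}


def is_supported(source: str, target: str) -> bool:
    """Return True if the table lists `target` as a sanitized type for `source`."""
    return target.lower() in LINUX_TARGETS.get(source.lower(), [])
```

Note that rar has no rar target: a sanitized RAR archive is always repackaged as zip, 7z, gz, or xz.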

Single / Multiple Output File

If the sanitized output consists of a single file, it is not zipped and is returned as a single output file. For example, converting a one-page PDF to JPG produces a single JPG file. If the PDF has more than one page, the conversion produces multiple JPG files, which are returned in a single ZIP file. The following conversions can produce multiple files (returned as a single ZIP file):

  • PDF→HTML

  • PDF→IMG

  • DOCX→HTML, IMG

  • XLSX→HTML, CSV, IMG

  • PPTX→HTML, IMG
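The single versus multiple output rule can be sketched as below. This is an illustration of the described behavior only, not the product's code; the function name `package_outputs` and the default archive name `sanitized.zip` are assumptions.

```python
import io
import zipfile
from typing import Dict, Tuple


def package_outputs(outputs: Dict[str, bytes],
                    archive_name: str = "sanitized.zip") -> Tuple[str, bytes]:
    """Return a single file as-is, or bundle several files into one ZIP.

    `outputs` maps each output file name to its content.
    """
    if not outputs:
        raise ValueError("no output files to package")
    if len(outputs) == 1:
        # One output file: return it directly, without zipping.
        name, data = next(iter(outputs.items()))
        return name, data
    # Several output files: write them all into one in-memory ZIP archive.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in outputs.items():
            archive.writestr(name, data)
    return archive_name, buf.getvalue()
```

A one-page PDF converted to JPG would pass a single entry and get the JPG back unchanged, while a three-page PDF would pass three entries and get one ZIP containing all three images.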