5. Data Sanitization (CDR)

What is Data Sanitization?

An increasingly popular and effective way to compromise computer security, especially in targeted attacks, is to send victims common document or image files. Although these file types are not inherently executable, attackers have found ways to make them execute embedded malicious code, for example through VBA macros, exploit payloads, or embedded Flash or JavaScript code. This type of attack has a high success rate because most users don't expect common file types to carry infections. For high-risk files or scenarios, Data Sanitization, also known as Content Disarm & Reconstruction (CDR), is designed to prevent any malicious content (including zero-day threats) from executing. High-risk files can be sanitized in the following ways:

  • Removing hidden exploitable objects (e.g., scripts and macros)

  • Converting the file format
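To illustrate the first method, the sketch below removes macro parts from an Office Open XML package (docm, xlsm, pptm), where the VBA project is stored as a ZIP entry named vbaProject.bin. This is a minimal sketch under stated assumptions, not OPSWAT's implementation: the function name strip_macro_parts and the MACRO_PARTS list are hypothetical, and a real CDR engine would also need to rewrite [Content_Types].xml and the relationship parts so the result opens cleanly as a macro-free document.

```python
import io
import zipfile

# Hypothetical list of OOXML part names treated as macro containers in this sketch.
MACRO_PARTS = ("vbaProject.bin", "vbaData.xml")


def strip_macro_parts(ooxml_bytes: bytes) -> tuple[bytes, list[str]]:
    """Copy every ZIP entry except macro parts; return the new bytes and removed names.

    Assumption: only the macro parts are dropped. Content types and
    relationships that still point at them are NOT rewritten here.
    """
    removed = []
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(ooxml_bytes)) as src, \
            zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Compare only the last path component, e.g. "word/vbaProject.bin".
            if info.filename.rsplit("/", 1)[-1] in MACRO_PARTS:
                removed.append(info.filename)
                continue
            dst.writestr(info, src.read(info))
    return out.getvalue(), removed


# Usage: build a toy macro-enabled package in memory and strip it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document/>")
    z.writestr("word/vbaProject.bin", b"\x00fake-macro")
clean, removed = strip_macro_parts(buf.getvalue())
print(removed)
```

Removing objects keeps the original format, whereas the second method (format conversion) discards anything the target format cannot represent, which is why both approaches appear in the tables below.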

Supported File Types For Windows

 

#   Source File Type   Target Sanitized Types
1   doc                doc, pdf
2   dot                dot
3   xls                xls, pdf
4   ppt                ppt, pdf
5   rtf                rtf
6   docx               docx, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
7   docm               docm, docx, txt, html, pdf, ps, jpg, bmp, png, tiff, svg
8   dotx               dotx
9   dotm               dotm, dotx
10  xlsx               xlsx, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
11  xlsm               xlsm, xlsx, csv, html, tiff, pdf, ps, jpg, bmp, png, svg
12  xlsb               xlsb
13  csv                csv
14  pptx               pptx, html, pdf, ps, jpg, bmp, png, tiff, svg
15  pptm               pptm, pptx, html, pdf, ps, jpg, bmp, png, tiff, svg
16  ppsx               ppsx
17  odt                odt
18  htm/html           html, pdf, ps, jpg, bmp, png, svg
19  pdf                pdf, html, svg, jpg, bmp, png, tiff, txt
20  hwp                hwp
21  jtd                jtd
22  xml*               xml
23  xml-doc*           pdf
24  xml-docx*          pdf
25  xml-xls*           pdf
26  jpg                jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
27  bmp                bmp, jpg, png, tiff, svg, gif, ps, eps, pdf
28  png                png, jpg, bmp, tiff, svg, gif, ps, eps, pdf
29  tiff               tiff, jpg, bmp, png, svg, gif, ps, eps
30  svg                jpg, bmp, png, tiff, gif, ps, eps
31  gif                gif, jpg, bmp, png, tiff, svg, ps, eps, pdf
32  wmf*               jpg, bmp, png, tiff, svg, gif, ps, eps, pdf
33  dwg                dwg
34  7z                 zip
35  gz                 zip
36  rar                zip
37  xz                 zip
38  zip                zip

Notes:

  • Archive sanitization (7z, gz, rar, xz, zip) is available for MetaDefender Core v4 only.
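The tables map every archive type to zip. The sketch below shows one plausible way such a step could work: unpack each member, pass it to a per-file sanitizer, and always repack the results into a ZIP archive. It is an illustration under stated assumptions, not OPSWAT's implementation: the names sanitize_archive, _members and Sanitizer are hypothetical, only formats readable by the Python standard library (zip, gz, xz) are handled, and 7z or rar input would need a third-party library. Note that a .tar.gz input would come out as a single .tar member.

```python
import gzip
import io
import lzma
import zipfile
from typing import Callable, Iterator, Tuple

# Hypothetical per-file sanitizer: takes (member_name, data) and returns sanitized bytes.
Sanitizer = Callable[[str, bytes], bytes]


def _members(name: str, data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (member_name, member_bytes) for archive formats the stdlib can read."""
    lower = name.lower()
    if lower.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as z:
            for info in z.infolist():
                if not info.is_dir():
                    yield info.filename, z.read(info)
    elif lower.endswith(".gz"):
        # A .gz file wraps exactly one member; drop the ".gz" suffix for its name.
        yield name[:-3], gzip.decompress(data)
    elif lower.endswith(".xz"):
        yield name[:-3], lzma.decompress(data)
    else:
        raise ValueError(f"unsupported archive in this sketch: {name}")


def sanitize_archive(name: str, data: bytes, sanitize: Sanitizer) -> bytes:
    """Sanitize each member and always repack the result as a ZIP archive."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as dst:
        for member, payload in _members(name, data):
            dst.writestr(member, sanitize(member, payload))
    return out.getvalue()
```

Repacking into a single output format keeps the downstream handling uniform regardless of which archive type arrived.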

XML and WMF sanitization are in beta. Do not enable them for production use; enabling them should not, however, affect sanitization of other file types. Please contact OPSWAT technical support if you have samples you would like to share with us for investigation.

XML sanitization targets XML-specific vulnerabilities only. It does not eliminate other threats carried by XML-based documents such as the Microsoft Office XML formats. For example, Microsoft Office 2003 supports an XML document format (distinct from Office Open XML, which is a stricter, zipped format). Do not enable XML sanitization on a production server to sanitize XML-based documents; use it only to reduce the risk of XML parser vulnerabilities.

The xml-* source types (xml-doc, xml-docx, xml-xls) are Microsoft Office XML formats.
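As an example of reducing XML parser risk, many entity-expansion and external-entity attacks rely on a DOCTYPE declaration. The sketch below rejects any XML input that declares a DTD, using the standard library's expat parser. This is an assumed policy chosen for illustration (similar in spirit to the forbid-DTD option in the defusedxml package), not a description of what OPSWAT's XML sanitization does; the names reject_dtd and DTDForbidden are hypothetical.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when the input declares a DTD (hypothetical policy for this sketch)."""


def reject_dtd(xml_bytes: bytes) -> None:
    """Parse the document and raise DTDForbidden if it contains a DOCTYPE."""
    # A fresh parser per document: expat parser objects cannot be reused.
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Fires at the start of the DOCTYPE, before any entities in the
        # internal subset are defined or expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    # Malformed input raises xml.parsers.expat.ExpatError instead.
    parser.Parse(xml_bytes, True)
```

Rejecting rather than rewriting the DTD keeps the check simple; documents that legitimately need a DTD would have to be handled by a different policy.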

Supported File Types For Linux (BETA)

 

#   Source File Type   Target Sanitized Types
1   doc                doc, pdf
2   docx               docx, txt, html, pdf
3   xlsx               xlsx, csv, html
4   pptx               pptx
5   odt                odt
6   pdf                pdf, bmp
7   jpg                jpg, bmp, png, tiff, svg, gif, ps, eps
8   bmp                bmp, jpg, png, tiff, svg, gif, ps, eps
9   png                png, jpg, bmp, tiff, svg, gif, ps, eps
10  tiff               tiff, jpg, bmp, png, svg, gif, ps, eps
11  gif                jpg, bmp, png, tiff, svg, ps, eps
12  7z                 zip
13  gz                 zip
14  rar                zip
15  xz                 zip
16  zip                zip
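Because the supported conversions differ between Windows and Linux (for example, docx can be converted to jpg on Windows but not on Linux, and on Linux gif cannot be sanitized back to gif), a client may want to check a requested conversion before submitting a file. The sketch below encodes an excerpt of the two tables above as a lookup. The SUPPORTED dictionary and the helpers can_sanitize and first_supported are hypothetical and not part of any OPSWAT API; only some rows are included, and the values are copied from the tables.

```python
from typing import List, Optional

# Excerpt of the Windows and Linux tables above (not every row is included).
SUPPORTED = {
    "windows": {
        "doc": {"doc", "pdf"},
        "docx": {"docx", "txt", "html", "pdf", "ps", "jpg", "bmp", "png", "tiff", "svg"},
        "xlsx": {"xlsx", "csv", "html", "tiff", "pdf", "ps", "jpg", "bmp", "png", "svg"},
        "pptx": {"pptx", "html", "pdf", "ps", "jpg", "bmp", "png", "tiff", "svg"},
        "pdf": {"pdf", "html", "svg", "jpg", "bmp", "png", "tiff", "txt"},
        "gif": {"gif", "jpg", "bmp", "png", "tiff", "svg", "ps", "eps", "pdf"},
        "zip": {"zip"},
    },
    "linux": {
        "doc": {"doc", "pdf"},
        "docx": {"docx", "txt", "html", "pdf"},
        "xlsx": {"xlsx", "csv", "html"},
        "pptx": {"pptx"},
        "pdf": {"pdf", "bmp"},
        "gif": {"jpg", "bmp", "png", "tiff", "svg", "ps", "eps"},
        "zip": {"zip"},
    },
}


def can_sanitize(platform: str, source: str, target: str) -> bool:
    """True if the excerpt lists source -> target for the given platform."""
    table = SUPPORTED.get(platform.lower(), {})
    return target.lower() in table.get(source.lower(), set())


def first_supported(platform: str, source: str, preferred: List[str]) -> Optional[str]:
    """Return the first target in the preference list that the platform supports."""
    for target in preferred:
        if can_sanitize(platform, source, target):
            return target
    return None


print(first_supported("linux", "pptx", ["pdf", "pptx"]))
```

With a preference list, a caller can fall back gracefully (here from pdf to pptx on Linux) instead of failing when its first choice is unavailable.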

Single / Multiple Output File

If the sanitized output consists of a single file, it is not zipped and is returned as a single output file. For example, converting a one-page PDF to JPG produces one JPG file, whereas converting a multi-page PDF produces one JPG per page, and these files are packaged into a single ZIP file. The following conversions can produce multiple files and are therefore delivered as a single ZIP file:

  • PDF → HTML

  • PDF → IMG

  • DOCX → HTML, IMG

  • XLSX → HTML, CSV, IMG

  • PPTX → HTML, IMG
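The single-versus-multiple output rule can be sketched as follows: return the file unchanged when sanitization yields exactly one output, otherwise bundle all outputs into one ZIP. The function name package_output, the per-page file names, and the archive name sanitized.zip are hypothetical choices for illustration; the product's actual naming is not specified here.

```python
import io
import zipfile
from typing import Dict, Tuple


def package_output(files: Dict[str, bytes]) -> Tuple[str, bytes]:
    """Return (name, data): the file itself if there is one, else a ZIP of all files."""
    if not files:
        raise ValueError("sanitization produced no output")
    if len(files) == 1:
        # Single output (e.g. a one-page PDF converted to JPG): no ZIP wrapper.
        name, data = next(iter(files.items()))
        return name, data
    # Multiple outputs (e.g. one JPG per PDF page): bundle into a single ZIP.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        for name, data in files.items():
            z.writestr(name, data)
    return "sanitized.zip", buf.getvalue()


# Usage: a hypothetical two-page PDF converted to JPG yields a ZIP.
name, data = package_output({"page-1.jpg": b"...", "page-2.jpg": b"..."})
print(name)
```

Callers therefore need to check whether the returned file is a ZIP before handing it to a viewer, since the same conversion can yield either form depending on the input.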