3. Data Sanitization (CDR)

What is CDR?

An increasingly popular and effective way to compromise computer security, especially in targeted attacks, is to share common document types or image files with victims. Although these file formats are not themselves executable, attackers have found ways to make them run embedded malicious code. Popular techniques include VBA macros, exploit payloads, and embedded Flash or JavaScript code. This type of attack has a high success rate because most users don’t expect common file types to carry infections. For high-risk files or scenarios, Content Disarm & Reconstruction (CDR) prevents malicious content (including zero-day threats) from executing. High-risk files can be sanitized through several methods:

  • Removing hidden exploitable objects (e.g., scripts, macros)

  • Converting the file format
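
As a rough illustration of what a "hidden exploitable object" can look like, the sketch below only locates macro storage inside a macro-enabled Office Open XML file (such as docm or xlsm), which is a ZIP container. It does not remove anything and is not how the product implements CDR. The function names are hypothetical; the only fact relied on is that such files store VBA code in a part named vbaProject.bin.

```python
import io
import zipfile


def find_macro_parts(data: bytes) -> list[str]:
    """Return the names of ZIP members that look like VBA project storage."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def build_sample(with_macro: bool) -> bytes:
    """Build a tiny in-memory ZIP that mimics the layout of a Word file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        if with_macro:
            # Placeholder bytes only; a real VBA project is an OLE stream.
            archive.writestr("word/vbaProject.bin", b"\x00")
    return buffer.getvalue()
```

A real sanitizer would also have to rebuild the document without these parts and keep the remaining package consistent, which is well beyond this sketch.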

Supported File Types (both Windows and Linux)

 

# | Source File Type | Description | Target Sanitized Types
1 | doc | Microsoft Word 97-2003 Document | doc, docx, pdf, rtf
2 | dot | Microsoft Word 97-2003 Template | dot, dotx
3 | xls | Microsoft Excel 97-2003 Workbook | xls, pdf*, csv
4 | xlt | Microsoft Excel 97-2003 Template | xlt, pdf*, png*
5 | ppt | Microsoft PowerPoint 97-2003 Presentation | ppt, pdf*
6 | pot | Microsoft PowerPoint 97-2003 Template | pot, pdf*, png*
7 | rtf | Microsoft Rich Text Format | rtf, pdf*
8 | docx | Microsoft Word Document | docx, txt, html, pdf, ps*, jpg*, bmp*, png*, tiff*, svg*
9 | docm | Microsoft Word Macro-Enabled Document | docm, docx*, txt*, html*, pdf*, ps*, jpg*, bmp*, png*, tiff*, svg*, rtf
10 | dotx | Microsoft Word Template | dotx
11 | dotm | Microsoft Word Macro-Enabled Template | dotm, dotx*
12 | xlsx | Microsoft Excel Workbook | xlsx, csv, html, tiff*, pdf*, ps*, jpg*, bmp*, png*, svg*
13 | xlsm | Microsoft Excel Macro-Enabled Workbook | xlsm, xlsx*, csv*, html*, tiff*, pdf*, ps*, jpg*, bmp*, png*, svg*
14 | xlsb | Microsoft Excel Binary Workbook | xlsb
15 | xltx | Microsoft Excel Template | xltx, pdf*, png*, csv
16 | xltm | Microsoft Excel Macro-Enabled Template | xltm, pdf*, png*, csv
17 | csv | Comma-separated values | csv
18 | pptx | Microsoft PowerPoint Presentation | pptx, html*, pdf*, ps*, jpg*, bmp, png*, tiff*, svg*
19 | potx | Microsoft PowerPoint Template | potx, pdf*, png*
20 | pptm | Microsoft PowerPoint Macro-Enabled Presentation | pptm, pptx*, html*, pdf*, ps*, jpg*, bmp*, png*, tiff*, svg*
21 | potm | Microsoft PowerPoint Macro-Enabled Template | potm, pdf*, png*
22 | pps | Microsoft PowerPoint 97-2003 Show | pps, pdf*, png*
23 | ppsm | Microsoft PowerPoint Macro-Enabled Show | ppsm, pdf*, png*
24 | ppsx | Microsoft PowerPoint Show | ppsx, bmp
25 | sldx | Microsoft PowerPoint Slide 2007+ | sldx
26 | sldm | Microsoft Office PowerPoint 2007 Slide - Macro Enabled | sldm
27 | vsdx | Microsoft Visio Drawing | vsdx, pdf, xps, jpg, png, bmp, tiff, svg, emf, html, xaml, swf
28 | vssx | Microsoft Visio Stencil | vssx*, pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
29 | vstx | Microsoft Visio Template | vstx*, pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
30 | vsdm | Microsoft Visio Macro-Enabled Drawing | vsdm, pdf, xps, jpg, png, bmp, tiff, svg, emf, html, xaml, swf
31 | vssm | Microsoft Visio Macro-Enabled Stencil | vstx*, pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
32 | vstm | Microsoft Visio Macro-Enabled Template | vstx*, pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
33 | vsx | Microsoft Visio XML Stencil | pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
34 | vtx | Microsoft Visio XML Template | pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
35 | vdx | Microsoft Visio XML Drawing | pdf*, xps*, jpg*, png*, bmp*, tiff*, svg*, emf*, html*, xaml*, swf*
36 | odt | OpenDocument Text | odt
37 | ods | OpenDocument Spreadsheet | ods
38 | ott | OpenDocument Document Template | ott
39 | ots | OpenDocument Spreadsheet Template | ots
40 | odp | OpenDocument Presentation | odp
41 | htm/html | Hypertext Markup Language | html, pdf*, ps*, jpg*, bmp*, png*, svg*, txt
42 | mht | MIME HTML | pdf*, jpg*, bmp*, png*, tiff*
43 | hta | HTML Application | hta
44 | pdf | Adobe Portable Document Format | pdf, html*, svg*, jpg*, bmp, png*, tiff*, txt*
45 | ai | Adobe Illustrator | ai
46 | hwp | Hangul Word Processor | hwp
47 | hwt | Hangul Word Template | hwt
48 | cell | Hancom Cell | cell
49 | show | Hancom Show | show
50 | jtd | Ichitaro Document | jtd
51 | jtdc | Ichitaro Compressed Document | jtdc
52 | xml | Extensible Markup Language | xml
53 | xml-doc | Microsoft Word 2003 XML Document | pdf
54 | xml-docx | Microsoft Word XML Document | pdf
55 | xml-xls | Microsoft XML Spreadsheet 2003 | pdf
56 | vcs | vCalendar | vcs
57 | ics | iCalendar | ics
58 | lnk | Windows Shortcut | lnk
59 | jpg | JPEG Image | jpg, bmp, png, tiff, svg, gif, ps, eps, pdf*
60 | jp2 | JPEG 2000 | jp2
61 | bmp | Windows Bitmap Image | bmp, jpg, png, tiff, svg, gif, ps, eps, pdf*
62 | png | Portable Network Graphics | png, jpg, bmp, tiff, svg, gif, ps, eps, pdf*
63 | tiff | Tagged Image File Format | tiff, jpg, bmp, png, svg, gif, ps, eps
64 | svg | Scalable Vector Graphics | svg, jpg*, bmp, png*, tiff*, gif*, ps*, eps*
65 | gif | Graphics Interchange Format | gif, jpg, bmp, png, tiff, svg, ps, eps, pdf*
66 | wmf | Windows Metafile | wmf, jpg, bmp*, png*, tiff*, svg*, gif*, ps*, eps*, pdf*
67 | emf | Windows Enhanced Metafile | emf
68 | ico | Icon | ico*
69 | cur | Cursor | cur*
70 | webp | Google Image File Format for Web | webp
71 | wdp | HD Photo | wdp*
72 | dwg | AutoCAD | dwg
73 | dwt | AutoCAD Drawing Template | dwt
74 | dws | AutoCAD Drawing Standards | dws
75 | dxf | Drawing Interchange Format | pdf*, jpg*, png*, bmp*, gif*, tiff*
76 | dwf | Design Web Format | pdf*, jpg*, png*, bmp*, gif*, tiff*
77 | 3ds | 3D Studio | 3ds*, dae*, stl*, fbx*
78 | dae | Digital Asset Exchange | dae*, 3ds*, stl*, fbx*
79 | u3d | Universal 3D | u3d*, 3ds*, dae*, stl*, pdf*, drc*, rvm*, fbx*
80 | drc | Google Draco | drc*, 3ds*, dae*, pdf*, u3d*, rvm*, fbx*
81 | rvm | AVEVA Plant Design Management System Model | rvm*, 3ds*, dae*, stl*, pdf*, u3d*, drc*, fbx*
82 | dcm | Digital Imaging and Communications in Medicine | dcm*
83 | wmv | Windows Media Video | wmv*
84 | mpeg | Moving Picture Experts Group | mpeg*
85 | wav | Waveform Audio | wav*
86 | mp3 | MPEG-1 Audio Layer-3 | mp3*
87 | mp4 | MPEG-4 Part 14 | mp4*
88 | avi | Audio Video Interleave | avi*
89 | eml | Electronic mail | eml
90 | msg | Microsoft Outlook Message | msg
91 | pst | Outlook Personal Folder | pst*
92 | txt | Text | txt*, pdf*
93 | 7z | 7-zip Archive | 7z, zip, gz, xz, tar
94 | gz | GNU Zipped Archive | gz, 7z, zip, xz, tar
95 | rar | WinRAR Archive | rar, zip, 7z, gz, xz, tar
96 | xz | XZ Archive | xz, zip, 7z, gz, tar
97 | zip | ZIP Archive | zip, 7z, gz, xz, tar
98 | alz | ALZip | zip, 7z, gz, xz, tar
99 | tar | Tape Archive | tar, zip, 7z, gz, xz
100 | bz2 | BZ2 Archive | zip, 7z, gz, xz, tar
101 | lzma | LZMA Archive | zip, 7z, gz, xz, tar
102 | lzh | LZH Archive | zip, 7z, gz, xz, tar
103 | arj | ARJ Archive | zip, 7z, gz, xz, tar
104 | cab | Cabinet Archive | zip, 7z, gz, xz, tar
105 | wsp | Windows SharePoint | zip, 7z, gz, xz, tar
106 | ace | WinAce archive format | zip, 7z, gz, xz, tar
107 | tse | TIP Test Selection Engine | tse, zip, 7z, gz, xz, tar
108 | tsez | TIP Test Selection Engine | tsez, zip, 7z, gz, xz, tar
109 | tsec | TIP Test Selection Engine | tsec, zip, 7z, gz, xz, tar

* Only supported on Windows for now.
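
The asterisk convention in the Target Sanitized Types column can be read mechanically. The following is a minimal sketch, assuming each entry is a comma-separated list such as "xls, pdf*, csv"; the names Target, parse_targets and targets_for_platform are hypothetical helpers for illustration and are not part of the product.

```python
from typing import NamedTuple


class Target(NamedTuple):
    extension: str
    windows_only: bool


def parse_targets(cell: str) -> list[Target]:
    """Split one 'Target Sanitized Types' entry into structured targets."""
    targets = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        # A trailing asterisk marks a target that is supported on Windows only.
        targets.append(Target(item.rstrip("*"), item.endswith("*")))
    return targets


def targets_for_platform(cell: str, platform: str) -> list[str]:
    """Return the target extensions usable on 'windows' or 'linux'."""
    return [
        target.extension
        for target in parse_targets(cell)
        if platform == "windows" or not target.windows_only
    ]
```

With this reading, the xls entry offers xls, pdf and csv on Windows but only xls and csv on Linux.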

Sanitization is in BETA for these file types:

  • XLT / XLTX / XLTM

  • PPS / POT / PPSM / POTX / POTM / SLDM / SLDX

  • VSDX / VSDM / VSSX / VSTX / VSTM / VSSM / VSX / VTX / VDX

  • ODT / OTT / ODS / OTS

  • SVG (to SVG) / WMF / EMF / AI / DCM / WebP / WDP

  • ICO / CUR

  • DXF / DWF / DWT / DWS

  • DAE / 3DS / U3D / DRC / RVM

  • MP3 / MP4 / WMV / MPEG / AVI

  • EML / MSG / PST

  • MHT

  • JTDC

  • XML

  • TXT

  • LNK

  • TAR / CAB / LZH / LZMA / BZ2 / ARJ / ALZ / WSP / ACE

  • CELL / SHOW

  • TSE / TSEZ / TSEC

XML sanitization is specific to XML vulnerabilities. It does not eliminate other threats, such as those carried in Microsoft Office XML formats. For example, Microsoft Office 2003 supports an XML document format (different from Microsoft Open XML, which is a stricter, zipped format). Do not enable XML sanitization on a production server to sanitize XML-based documents. XML sanitization should be used only to reduce the risk of XML parser vulnerabilities.

HTML sanitization is designed for email security purposes and should not be used to sanitize normal HTML traffic.

HWP: there are two versions of HWP, v3.0 and v5.0. v3.0 documents can be created only with the legacy Hangul Word Processor. For this reason, HWP v3.0 is not supported, and sanitizing such a file results in "failed to sanitize". We recommend treating files in this old version as suspicious. If you need support for v3.0, please contact support.

Archive sanitization is for MetaDefender Core V4 only.

EML sanitization is available starting with MetaDefender Core 4.14.2.

Additional notes for MetaDefender Core v3.x:

  • The MetaDefender service must be restarted after any change to the configuration. The ini file is located at <Metadefender Core v3.x install directory>\omsDSConfig.ini

Additional notes for MetaDefender Core v4.x:

  • To change the configuration, log in to the Web Management Console, then go to Inventory → Technologies. Press the edit button on the Deep CDR row and enter the configuration in the Advanced Engine Configuration box.

  • The modified configuration will be deployed within a few minutes.

  • There is no need to restart the MetaDefender service.

  • Due to strict file type enforcement, a file type listed in the table above is sanitized only when the file type analysis result matches it. For example, if a specific file is not detected correctly as PDF, no PDF sanitization will be performed.
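
The effect of strict file type enforcement can be pictured with a small sketch. It assumes a deliberately simplified detector that checks only two well-known file signatures; the product's actual file type analysis is far more thorough, and the function names here are hypothetical.

```python
def detect_type(data: bytes) -> str:
    """Guess a file type from its leading bytes (simplified illustration)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    return "unknown"


def should_sanitize_as(data: bytes, claimed_extension: str) -> bool:
    """Allow sanitization only when the detected type matches the claimed one."""
    claimed = claimed_extension.lower().lstrip(".")
    return detect_type(data) == claimed
```

A file named report.pdf whose content does not start with a PDF header would therefore be skipped by PDF sanitization in this model, regardless of its extension.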

Single / Multiple Output File

If the target contains only one file, it is not zipped and is treated as a single output file. For example, if a PDF file has only one page, converting it to JPG produces a single JPG file. If a PDF file has more than one page, the conversion produces multiple JPG files, which are delivered as one ZIP file. The following sanitizations can potentially result in multiple files (delivered as a single ZIP file):

  • PDF→HTML

  • PDF→IMG

  • DOCX→HTML, IMG

  • XLSX→HTML, CSV, IMG

  • PPTX→HTML, IMG
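
The single versus multiple output rule above can be sketched as follows. This is an illustration under stated assumptions, not the product's code: package_outputs and the archive name "sanitized.zip" are hypothetical, and the outputs are represented as a mapping from file name to content.

```python
import io
import zipfile


def package_outputs(outputs: dict[str, bytes]) -> tuple[str, bytes]:
    """Return one file as-is, or bundle several files into a single ZIP."""
    if not outputs:
        raise ValueError("conversion produced no output files")
    if len(outputs) == 1:
        # Exactly one result (e.g. a one-page PDF converted to JPG): no ZIP.
        return next(iter(outputs.items()))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in outputs.items():
            archive.writestr(name, data)
    # Hypothetical archive name, chosen only for this example.
    return "sanitized.zip", buffer.getvalue()
```

Callers that expect an image back from a PDF→IMG conversion should therefore be prepared to receive a ZIP file whenever the source has more than one page.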

Known Issues

  1. The Microsoft Office 95 document format is not supported.

  2. Conversion from HTML to an image fails if the HTML file is larger than 90KB.

  3. AutoCAD (.DWG) files are supported for versions 2004-2018. For versions 2007-2009, when a macro is removed from the original file, opening the sanitized file displays the error message "Failed to load project from storage", but the file still works as usual.

  4. For MPEG, only MPEG2 is supported.

  5. TXT is supported in ASCII and UTF-8 encodings only.

  6. When converting Excel files to TXT, only the first sheet is converted.

  7. AI files are supported only when they are in PDF format.
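
Two of these limits (issues 2 and 5) are easy to check before submitting a file. The sketch below is a hedged example, not part of the product: the function names are hypothetical, and "90KB" is assumed to mean 90 * 1024 bytes, which the text above does not specify.

```python
# Assumption: "90KB" is interpreted as 90 * 1024 bytes.
MAX_HTML_TO_IMAGE_BYTES = 90 * 1024


def is_utf8_text(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 (ASCII is a subset of UTF-8)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def html_to_image_allowed(data: bytes) -> bool:
    """True if the HTML payload is within the assumed size limit."""
    return len(data) <= MAX_HTML_TO_IMAGE_BYTES
```

A text file saved as UTF-16, for instance, fails the first check because its byte order mark is not valid UTF-8.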