8.1.3.2.2. Polling mechanism

Retrieve Processing Results Using Data ID

Analysis is asynchronous, and each analysis request is tracked by a data ID. Initiating file analysis and retrieving the results require two separate API calls. The retrieval request must be repeated until the analysis is complete. Completion can be tracked through the "process_info.progress_percentage" value in the response, which reaches 100 when processing has finished.

Request   Value
Method    GET
URL       /file/{data_id} or /process/{data_id}
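
The polling described above can be sketched as follows. This is a minimal illustration, not an official client: the function names, the retry limits, and the base URL passed by the caller are assumptions, and the fetch step is passed in as a callable so the loop can be exercised without a server.

```python
import json
import time
import urllib.request


def fetch_result(base_url, data_id, apikey=None):
    """GET /file/{data_id} and return the decoded JSON body."""
    request = urllib.request.Request(f"{base_url}/file/{data_id}")
    if apikey:
        # The apikey header is optional, see the header parameters table.
        request.add_header("apikey", apikey)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll_until_complete(fetch, data_id, interval=1.0, max_attempts=60):
    """Call fetch(data_id) until process_info.progress_percentage reaches 100."""
    for _ in range(max_attempts):
        result = fetch(data_id)
        # An unknown data ID is answered with {data_id: "Not Found"}.
        if result.get(data_id) == "Not Found":
            raise LookupError(f"data ID {data_id} is not known to the server")
        progress = result.get("process_info", {}).get("progress_percentage", 0)
        if progress >= 100:
            return result
        time.sleep(interval)
    raise TimeoutError(f"analysis of {data_id} did not finish in time")
```

A caller would typically wrap fetch_result in a lambda, for example poll_until_complete(lambda d: fetch_result(base_url, d), data_id), where base_url is the address of the server (a value this section does not specify).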

Retrieve Processing Results Using Hash

Request   Value
Method    GET
URL       /hash/{md5|sha1|sha256 hash}
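
As a small sketch of how a client might build the hash lookup path, the helper below checks that the value looks like an MD5, SHA-1, or SHA-256 hex digest before formatting it. The function names are hypothetical, and the validation is a client-side convenience rather than documented server behavior.

```python
import hashlib
import re

# Hex digest lengths of the three hash types the endpoint accepts.
_HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def hash_lookup_path(file_hash):
    """Return the /hash/{hash} path for an MD5, SHA-1, or SHA-256 hex digest."""
    digest = file_hash.strip()
    if not re.fullmatch(r"[0-9a-fA-F]+", digest) or len(digest) not in _HASH_LENGTHS:
        raise ValueError("expected an MD5, SHA-1, or SHA-256 hex digest")
    return f"/hash/{digest}"


def sha256_lookup_path(content):
    """Hash the given bytes locally and return the matching lookup path."""
    return hash_lookup_path(hashlib.sha256(content).hexdigest())
```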

Request HTTP header parameters

name     type     required   value
rule     string   false      Name of the rule to query for (see 8.1.3.5. Fetching available processing rules)
apikey   string   false      User's session ID. If an API key was sent with the request in 8.1.3.1. Process a file, the API key is also required for fetching the result

The retrieved result is always the most recent one for the processed item. If rule is set, it is the most recent result produced under the given rule.
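
Because both headers are optional, a client can build them conditionally. The helper below is a hypothetical sketch that returns only the headers that were actually supplied.

```python
def build_headers(apikey=None, rule=None):
    """Return the optional request headers, omitting any that are not set."""
    headers = {}
    if apikey:
        headers["apikey"] = apikey
    if rule:
        # Restricts the lookup to the most recent result under this rule.
        headers["rule"] = rule
    return headers
```

The returned dictionary can be passed as the headers argument of urllib.request.Request or of any other HTTP client.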

Successful response

HTTP status code: 200

{
"data_id": "8101abae27be4d63859c55d9e0ed0135",
"dlp_info": {
"certainty": "High",
"errors": {
},
"filename": "OPSWAT_Proactive_DLP_CCN_proactive-dlp-processed_by_OPSWAT_MetaDefender_8101abae27be4d63859c55d9e0ed0135.pdf",
"hits": {
"ccn": {
"display_name": "Credit Card Number",
"hits": [
{
"after": "123 Cherry Lane st.",
"before": "Card Number",
"certainty": "Very High",
"certainty_score": 100,
"hit": "XXXXXXXXXXXXXXX1938",
"isRedacted": true,
"severity": 0
}
]
},
"ssn": {
"display_name": "Social Security Number",
"hits": [
{
"after": "",
"before": "Social Security Number:",
"certainty": "High",
"certainty_score": 100,
"hit": "XXXXXXX2315",
"isRedacted": true,
"severity": 0
},
{
"after": "",
"before": "• Your reference number is",
"certainty": "Low",
"certainty_score": 8,
"hit": "XXXXX3578",
"isRedacted": false,
"severity": 0
}
]
}
},
"metadata_removal": {
"result": "not removed"
},
"redact": {
"result": "redacted"
},
"severity": 0,
"verdict": 1,
"watermark": {
"result": "added"
}
},
"file_info": {
"display_name": "OPSWAT_Proactive_DLP_CCN.pdf",
"file_size": 75906,
"file_type": "application/pdf",
"file_type_description": "Adobe Portable Document Format",
"md5": "c4863c8ce44fb7ae84eb48c9b78f8b5e",
"sha1": "a33c72a996a9603d479e3dff3d23bf619c975fbe",
"sha256": "b9fdc10b47950b9e503ef4dc0ef42d28e7c37ccd749d4a5dcd7d9b3218996b7f",
"upload_timestamp": "2020-03-12T08:37:05.412Z"
},
"process_info": {
"blocked_reason": "Sensitive Data Found",
"file_type_skipped_scan": false,
"outdated_data": [
"sanitization",
"enginedefinitions"
],
"post_processing": {
"actions_failed": "",
"actions_ran": "Sanitized",
"converted_destination": "",
"converted_to": "",
"copy_move_destination": "",
"sanitization_details": {
"description": "Sanitized successfully.",
"details": [
{
"action": "removed",
"count": 2,
"object_name": "hyperlink"
}
],
"sanitized_file_info": {
"file_size": 2312,
"sha256": "3603748179C79628AE4025E5252456286DC57FA7A420799B9EE268AFB884DB9E"
}
}
},
"processing_time": 4804,
"profile": "File process",
"progress_percentage": 100,
"queue_time": 15,
"result": "Blocked",
"user_agent": "webscan",
"username": "LOCAL/admin",
"verdicts": [
"Sensitive Data Found"
]
},
"scan_results": {
"data_id": "8101abae27be4d63859c55d9e0ed0135",
"progress_percentage": 100,
"scan_all_result_a": "Sensitive Data Found",
"scan_all_result_i": 20,
"scan_details": {
"ClamAV": {
"def_time": "2020-03-11T11:08:00.000Z",
"eng_id": "clamav_1_windows",
"location": "local",
"scan_result_i": 0,
"scan_time": 336,
"threat_found": "",
"wait_time": 3
}
},
"start_time": "2020-03-12T08:37:05.427Z",
"total_avs": 1,
"total_time": 4804
},
"vulnerability_info": {
"verdict": 0
},
"yara_info": {
}
}
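
To illustrate how the sample response above might be consumed, the sketch below pulls out the fields a caller usually checks first. The function name and the shape of the returned summary are assumptions made for this example; the field names come from the sample.

```python
def summarize_result(response):
    """Collect the headline fields of a processing result into one dictionary."""
    process_info = response.get("process_info", {})
    scan_results = response.get("scan_results", {})
    return {
        # Processing is finished once progress_percentage is 100.
        "complete": process_info.get("progress_percentage") == 100,
        "result": process_info.get("result"),
        "blocked_reason": process_info.get("blocked_reason"),
        "scan_all_result": scan_results.get("scan_all_result_a"),
        "engines_used": scan_results.get("total_avs"),
    }
```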

Response description:

  • data_id: data ID of the requested file

  • file_info: basic information of the scanned file

  • scan_results: results of the scan

    • data_id: data ID of the requested file

    • progress_percentage: percentage of progress; a value of 100 means the scan is complete

    • scan_all_result_a: the overall scan result as a string

    • scan_all_result_i: the overall scan result as a numeric code

    • individual scan engine results will be consolidated according to the following priority:

      1. Threat found

      2. Object is suspicious

      3. Object is encrypted / too deep (archive only) / too big (archive only) / containing too many files (archive only) / extraction timeout exceeded (archive only)

      4. Filetype mismatch

      5. No threat detected

      6. Object was not scanned

      7. Failed to scan the object

    • scan_details: scan results for each antivirus engine. The key is the name of the antivirus engine and the value is the result of the antivirus engine

      • def_time: the database definition time for this engine

      • eng_id: the unique identification string for the engine

      • location: where the scan engine runs (for example, "local")

      • scan_result_i: numeric code of engine scan result

      • scan_time: time elapsed during scan with the engine in milliseconds

      • wait_time: time elapsed between sending file to node and receiving the result from the engine in milliseconds

      • threat_found: name of the threat reported by the engine (an empty string when nothing was found)

    • start_time: start time of scan

    • total_avs: number of used antivirus engines

    • total_time: total time elapsed during scan in milliseconds

  • process_info: process information

    • post_processing: Contains information about result of data sanitization

      • "actions_ran": "Sanitized" or "" and the names of Post Actions that were also run.
        The separator is "|" (pipe). (e.g.: actions_ran: "PAscript" or actions_ran: "Sanitized | PAscript")

      • "actions_failed": "Sanitization Failed" or "" and the names of failed Post Actions.
        The separator is "|" (pipe). (e.g.: actions_failed: "PAscript failed" or actions_failed: "Sanitization Failed | PAscript failed" )

      • "converted_to": contains target type name of sanitization

      • "copy_move_destination": ""

      • "converted_destination": contains the name of the sanitized file

    • processing_time: total time spent processing the file on the node, in milliseconds

    • progress_percentage: percentage of processing completed

    • queue_time: total time the file spent waiting in the queue, in milliseconds

    • user_agent: identifies the client that called this API

    • username: identifier of the user who submitted the scan request

    • profile: the name of the rule used

    • result: the final result of processing the file (Allowed / Blocked / Processing)

    • blocked_reason: gives the reason if the file is blocked

    • file_type_skipped_scan: indicates if the input file's detected type was configured to skip scanning

    • issues: task-related issues (e.g. blocked by third-party software, cannot access the file for scanning)

    • outdated_data: array of flags, present only when applicable, that describe which parts of the result are outdated. Possible flags:

      • enginedefinitions: at least one of the AV engines the item was scanned with has a newer definition database

      • configuration: the process' rule - or any item used by the rule - was modified since the item was processed

      • sanitization: if item was sanitized this flag notifies that the sanitization information regarding this result is outdated, meaning the sanitized item is no longer available

  • vulnerability_info: see 8.1.6. Vulnerability Info In Processing Result

  • dlp_info: information on matched sensitive data

    • certainty: describes how certain the hit is, possible values:

      • Very Low

      • Low

      • Medium

      • High

      • Very High

    • errors: a list of error objects (empty if no errors happened), each error object contains following keys:

      • scan: scan related error description

      • redact: redaction related error description

      • watermark: watermark related error description

      • metadata_removal: metadata removal related error description

    • filename: name of the processed output file (pre-configured in the engine settings under Core's workflow rule)

    • hits: detailed results that contains:

      • type of matched rule: ccn (credit card number), ssn (social security number), regex_<number> (regular expression with a number in order to differentiate the RegEx rules if there are more.)

        • display_name: Credit Card Number, Social Security Number, or in case of RegEx, the name of the rule that has been given by the user

        • hits: the hits for that type

          • before: the context before the matched data

          • after: the context after the matched data

          • certainty: text version of "certainty_score", possible values:

            • Very Low

            • Low

            • Medium

            • High

            • Very High

          • certainty_score: is defined by the relevance of the given hit in its context. It is calculated based on multiple factors such as the number of digits, possible values: [0-100]

          • hit: the matched data

          • isRedacted: whether this hit was redacted

          • severity (NOTE: this field is deprecated): can be 0 (detected) or 1 (suspicious)

    • metadata_removal: result of metadata removal

      • result: result of the metadata removal process, possible values:

        • "removed"

        • "not removed"

        • "failed to remove"

    • redact: result of redaction

      • result: result of the redaction process, possible values:

        • "redacted"

        • "not redacted"

        • "failed to redact"

    • watermark: result of watermarking

      • result: result of the watermarking process, possible values:

        • "added"

        • "not added"

        • "failed to add"

    • severity (NOTE: this field is deprecated): represents the severity of the data loss, possible values:

      • 0 - Certainly is data loss

      • 1 - Might be data loss

    • verdict: the overall result for the scanned file. It can be

      • 0 - clean

      • 1 - found matched data

      • 2 - suspicious

      • 3 - failed

      • 4 - not scanned (e.g. not supported file type)

  • yara_info: information on data that matched yara rules

    • hits: detailed results that contains:

      • the name of the matched rules

      • a description

    • verdict: the overall result for the scanned file.

      • 0 - clean

      • 1 - found matched data

      • 2 - suspicious

      • 3 - failed

      • 4 - not scanned
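
The dlp_info.hits structure described above is keyed by rule type, with the individual matches nested one level down. A minimal sketch of flattening it into rows follows; the function names and the row layout are hypothetical, and only fields listed in the description are read.

```python
def list_dlp_hits(response):
    """Flatten dlp_info.hits into one list with a row per matched value."""
    rows = []
    hits_by_type = response.get("dlp_info", {}).get("hits", {})
    for rule_type, entry in hits_by_type.items():
        for hit in entry.get("hits", []):
            rows.append({
                "rule_type": rule_type,  # e.g. "ccn", "ssn", "regex_<number>"
                "display_name": entry.get("display_name"),
                "hit": hit.get("hit"),
                "certainty": hit.get("certainty"),
                "certainty_score": hit.get("certainty_score"),
                "redacted": hit.get("isRedacted"),
            })
    return rows


def unredacted_hits(response, min_score=0):
    """Return the rows that were not redacted and score at least min_score."""
    return [
        row for row in list_dlp_hits(response)
        if not row["redacted"] and (row["certainty_score"] or 0) >= min_score
    ]
```

Applied to the sample response, unredacted_hits would return the single low-certainty Social Security Number hit, since the other two hits have isRedacted set to true.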

Please find possible overall and per engine scan results here.
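
The consolidation priority listed under scan_results can be illustrated with a short sketch. The label strings below are shorthand for the entries of that list and are assumptions for this example; the API itself reports results as numeric codes, whose mapping is not given in this section.

```python
# Priority groups, highest priority first, following the list in the
# response description. Outcomes in the same tuple share one priority.
PRIORITY_GROUPS = [
    ("Threat found",),
    ("Object is suspicious",),
    ("Object is encrypted", "Too deep", "Too big", "Too many files",
     "Extraction timeout exceeded"),
    ("Filetype mismatch",),
    ("No threat detected",),
    ("Object was not scanned",),
    ("Failed to scan the object",),
]

# Map every label to the index of its group; a lower index wins.
RANK = {
    label: rank
    for rank, group in enumerate(PRIORITY_GROUPS)
    for label in group
}


def consolidate(engine_results):
    """Return the engine result with the highest priority."""
    if not engine_results:
        raise ValueError("at least one engine result is required")
    return min(engine_results, key=RANK.__getitem__)
```

Under this ordering a single "Threat found" outweighs any number of clean results, while "No threat detected" still takes precedence over an engine that failed to scan the object.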

Response (nonexistent data_id)

HTTP status code: 200

{
"61dffeaa728844adbf49eb090e4ece0e": "Not Found"
}
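
Because an unknown data ID is also answered with status code 200, a client cannot rely on the HTTP status alone and has to inspect the body. A minimal check, with a hypothetical function name, might look like this:

```python
def is_not_found(response, data_id):
    """Return True when the body is the {data_id: "Not Found"} placeholder."""
    return response.get(data_id) == "Not Found"
```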

Error response

Unexpected event on server

HTTP status code: 500

{
"err": "<error message>"
}

Note: Check the MetaDefender Core server logs for more information.
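
A client can surface the "err" text of a 500 response as sketched below. The function name is hypothetical, and falling back to the raw body when it is not valid JSON is an assumption of this example rather than documented behavior.

```python
import json


def error_message(status_code, body):
    """Return the "err" text of a 500 response, or None for any other status."""
    if status_code != 500:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        # The body was not JSON; return it unchanged so nothing is lost.
        return body
    if isinstance(payload, dict):
        return payload.get("err", body)
    return body
```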