URL extraction

Cortex XSIAM Developer Guide

Product: Cortex XSIAM
Creation date: 2023-05-01
Last date published: 2024-06-04
Category: Developer Guide
Abstract

Extract URL indicators from text: a regular expression recognizes each URL, and a formatting script then formats it.

The Cortex XSIAM URL indicator type is built using a regular expression and a formatting script. The following describes the URL extraction components and the output to expect when extracting URL indicators.

URL extraction components

Extracting URL indicators involves two components:

  • Regular expression

  • Formatting script

Regular expression

Given a text, the URL regular expression attempts to match valid URLs based on the following characteristics:

  • A URL prefixed by one of the following protocols:

    • HTTP

    • HTTPS

    • FTP

    • FTPS

    • HXXP (defanged HTTP)

    • HXXPS (defanged HTTPS)

  • A URL with ASCII or non-ASCII characters

  • Escaped and unescaped URLs

  • URL with or without query parameters
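The characteristics above can be illustrated with a deliberately simplified pattern. This is a minimal sketch, not the expression Cortex XSIAM ships: the names URL_SKETCH and find_urls are hypothetical, and the pattern only handles URLs that carry one of the listed protocol prefixes. It treats everything up to the next whitespace, quote, or angle bracket as part of the URL, so escaped characters, non-ASCII characters, and query parameters are all captured.

```python
import re
from typing import List

# Hypothetical pattern for illustration only; the product's actual
# expression is not shown in this guide and is far more thorough.
URL_SKETCH = re.compile(
    r"""
    \b(?:https?|ftps?|hxxps?)   # HTTP, HTTPS, FTP, FTPS, and defanged HXXP/HXXPS
    ://
    [^\s<>"']+                  # host, path, and query; ASCII or non-ASCII, escaped or not
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_urls(text: str) -> List[str]:
    """Return every substring of `text` that the sketch pattern matches."""
    return URL_SKETCH.findall(text)


if __name__ == "__main__":
    sample = "Seen hxxps://www[.]evil.tld/a?b=1 and http://öevil.tld/ today"
    for match in find_urls(sample):
        print(match)
```

Because the sketch stops only at whitespace, quotes, and angle brackets, punctuation that directly follows a URL in running text (for example, a closing period) would be captured as part of the match. A production pattern needs to handle that case.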

Formatting script

After the regular expression extracts the URLs, the FormatURL formatting script iterates over each URL and does the following:

  1. If the URL is prefixed by a URL defense system (Proofpoint or ATP), the script extracts the redirected URL and performs steps 3-6 on both the original URL and the extracted redirected URL.

  2. If the URL is NOT prefixed by a URL defense system (Proofpoint or ATP), the script checks whether the first query parameter is a redirect parameter, that is, whether the value of the first parameter starts with HTTP or HTTPS.

    For example:

    https://www.good.site/index.html?redirectURL=https://evil.com/mal.html

    If such a query parameter exists, the script extracts the redirected URL and performs steps 3-6 on both the given URL and the URL extracted from the query parameter.

  3. Replaces "[.]" with ".".

    For example:

    https://www[.]example.com becomes https://www.example.com

  4. Decodes the URL.

    For example:

    https://www.example.com%2F%21%40 becomes https://www.example.com/!@

  5. Converts obfuscated characters.

    For example:

    hxxp becomes http, and hxxps becomes https

  6. Returns the formatted URL.
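Steps 2 through 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FormatURL script itself: the function names refang, redirect_target, and format_url are hypothetical, step 1 (Proofpoint and ATP unwrapping) is omitted because this guide does not describe those wrapper formats, and the redirect check looks for a value beginning with "http://" or "https://".

```python
from typing import List, Optional
from urllib.parse import parse_qsl, unquote


def refang(url: str) -> str:
    """Steps 3-5: replace "[.]", percent-decode, and restore hxxp/hxxps."""
    url = url.replace("[.]", ".")          # step 3
    url = unquote(url)                     # step 4
    scheme, sep, rest = url.partition("://")
    if sep and scheme.lower() in ("hxxp", "hxxps"):
        # step 5: hxxp becomes http, hxxps becomes https
        url = scheme.lower().replace("xx", "tt") + sep + rest
    return url


def redirect_target(url: str) -> Optional[str]:
    """Step 2: return the first query parameter's value if it looks like a URL."""
    _, _, query = url.partition("?")
    params = parse_qsl(query)              # decodes names and values
    if params:
        value = params[0][1]
        if value.lower().startswith(("http://", "https://")):
            return value
    return None


def format_url(url: str) -> List[str]:
    """Step 6: return the formatted URL, then the formatted redirect target if any."""
    results = [refang(url)]
    target = redirect_target(url)
    if target is not None:
        results.append(refang(target))
    return results


if __name__ == "__main__":
    print(format_url("hxxps://www[.]example.com%2F%21%40"))
    print(format_url("https://www.good.site/index.html?redirectURL=https://evil.com/mal.html"))
```

The query string is split with str.partition rather than urllib.parse.urlsplit because recent Python versions reject hosts containing square brackets that are not IP literals, which a defanged host such as www[.]example.com would trigger.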

Common URL structures

The following are the most common URL structures that are supported:

  • http://öevil.tld/

  • https://evilö.tld/evil.html

  • www.evilö.tld/evil.aspx

  • https://www.evöl.tld/

  • www.evil.tld/resource

  • http://xn--e1v2i3l4.tld/evilagain.aspx

  • https://www.xn--e1v2i3l4.tld

  • hxxps://www.xn--e1v2i3l4.tld

  • hxxp://www.xn--e1v2i3l4.tld

  • www.evil.tld:443/path/to/resource.html

  • https://1.2.3.4/path/to/resource.html

  • 1.2.3.4/path

  • 1.2.3.4/path/to/resource.html

  • http://1.2.3.4:8080/

  • http://1.2.3.4:8080/resource.html

  • http://☺.evil.tld/

  • http://1.2.3.4

  • ftp://foo.bar/resource

  • ftps://foo.bar/resource
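The structures above differ mainly in whether a protocol prefix and a port are present. The following sketch, which uses only the Python standard library, breaks a structure into those parts to make the differences visible. The names Parts and split_structure are hypothetical, and the code is an illustration of the list rather than part of the extraction pipeline.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Parts(NamedTuple):
    scheme: Optional[str]   # None when the structure has no protocol prefix
    host: str
    port: Optional[int]     # None when no port is given
    path: str


def split_structure(url: str) -> Parts:
    """Split one of the listed URL structures into scheme, host, port, and path."""
    has_scheme = "://" in url
    # urlsplit recognizes the host only when the "//" authority marker is present,
    # so it is added for structures such as www.evil.tld/resource or 1.2.3.4/path.
    parsed = urlsplit(url if has_scheme else "//" + url)
    return Parts(
        scheme=parsed.scheme if has_scheme else None,
        host=parsed.hostname or "",
        port=parsed.port,
        path=parsed.path,
    )


if __name__ == "__main__":
    for example in (
        "www.evil.tld:443/path/to/resource.html",
        "http://1.2.3.4:8080/",
        "hxxps://www.xn--e1v2i3l4.tld",
    ):
        print(split_structure(example))
```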

For more information, see Indicator Extraction.