Use the DBotPredictURLPhishing script for URL phishing detection.
The following describes how to use the machine learning model for URL Phishing detection. The model is pre-trained and does not need any training from the user.
The URL Phishing model ingests data such as the screenshot and HTML of a web page, URL syntax, and domain information, and predicts if the URL is a phishing attack. The verdict can be Malicious, Suspicious, or Benign.
The command to use is DBotPredictURLPhishing.
Arguments
After you run the DBotPredictURLPhishing command with your arguments, the model extracts URLs from the emailBody, emailHTML, and urls arguments. The model then selects at most maxNumberOfURL of the extracted URLs and runs only on those.
The selection gives priority to URLs from different domains and to URLs whose domain does not belong to our top 300k domains list (Majestic top domains).
By default, if one of the selected URLs belongs to the top 300k Majestic domains, the model does not run on it. To change this behavior, set the forceModel argument to True.
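The selection described above can be sketched as follows. This is only an illustration of the stated priorities, not the script's actual implementation: the function name select_urls, the use of the hostname as the "domain", and the sample URLs and top-domain set are all assumptions made for the example.

```python
from urllib.parse import urlparse


def select_urls(urls, top_domains, max_number_of_url):
    """Pick at most max_number_of_url URLs (hypothetical helper).

    Priority 1: the first URL seen for each distinct hostname.
    Priority 2: within each group, hostnames NOT in top_domains come first.
    """
    seen_hosts = set()
    first_per_host, repeats = [], []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host in seen_hosts:
            repeats.append(url)
        else:
            seen_hosts.add(host)
            first_per_host.append(url)

    def is_top(url):
        # Simplification: exact hostname match against the top-domain set.
        return (urlparse(url).hostname or "").lower() in top_domains

    # sorted() is stable, so URLs keep their original order within each group.
    ranked = sorted(first_per_host, key=is_top) + sorted(repeats, key=is_top)
    return ranked[:max_number_of_url]


candidates = [
    "https://google.com/a",
    "https://google.com/b",
    "http://login-example.test/verify",
    "http://other-example.test/",
]
selected = select_urls(candidates, top_domains={"google.com"}, max_number_of_url=2)
print(selected)
```

With these sample inputs, the two URLs on distinct, non-top hostnames are chosen ahead of both google.com URLs.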
Inputs
The command accepts the following inputs:
urls: Space-separated list of URLs.
emailBody: The plain text of the email body for which you want to get a prediction.
emailHTML: The HTML of the email for which you want to get a prediction.
forceModel: Whether to force the model to run if the URL belongs to the whitelist. If True, the model will run in every case. If False, the model will run only if the URL does not belong to the whitelist.
resetModel: Whether to reset the model to the model existing in Docker.
maxNumberOfURL: The maximum number of extracted URLs on which to run the model.
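As a rough illustration of how these inputs fit together, the sketch below assembles an argument dictionary using the input names listed above. The helper build_args and the sample URLs are hypothetical, and passing booleans and numbers as strings is an assumption. Inside a Cortex XSOAR automation, such a dictionary would typically be passed to demisto.executeCommand, which is shown only as a comment because it is not available outside XSOAR.

```python
def build_args(urls=(), email_body=None, email_html=None,
               force_model=None, reset_model=None, max_number_of_url=None):
    """Build the argument dictionary for DBotPredictURLPhishing (hypothetical helper)."""
    args = {}
    if urls:
        # The urls input is a space-separated list.
        args["urls"] = " ".join(urls)
    if email_body is not None:
        args["emailBody"] = email_body
    if email_html is not None:
        args["emailHTML"] = email_html
    if force_model is not None:
        args["forceModel"] = str(force_model)  # "True" or "False"
    if reset_model is not None:
        args["resetModel"] = str(reset_model)
    if max_number_of_url is not None:
        args["maxNumberOfURL"] = str(max_number_of_url)
    return args


args = build_args(
    urls=["http://a.example.test/login", "http://b.example.test/"],
    force_model=True,
    max_number_of_url=3,
)
print(args)

# Inside an XSOAR automation (assumption, not runnable here):
# results = demisto.executeCommand("DBotPredictURLPhishing", args)
```

Only the inputs that are explicitly provided end up in the dictionary, so any omitted input falls back to the command's own default.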
Outputs
After running the command, Cortex XSOAR returns the following:
Phishing prediction summary for URLs: Final verdict for each of the extracted URLs.
Phishing prediction evidence | domain: Explanation of the verdict for each of the URLs.
In the Phishing prediction evidence section, the following information will appear:
Domain: Domain of the URL.
Is there a login form: Indicates if there is a login form in the HTML. Phishing attacks usually try to steal credentials from the victim, and attackers use a login form to collect this information.
New Domain (less than 6 months ago): Indicates if the domain is younger than 6 months. New domains tend to be malicious.
Search engine optimization: Evaluates the SEO quality of the URL. Malicious domains tend to have poor SEO.
Suspicious use of company logo: Checks if a logo (from our list of the companies most often impersonated in phishing) has been fraudulently used. Our predefined list of logos is: PayPal, Instagram, Gmail, Outlook, LinkedIn, Facebook, eBay, Amazon, Google, Microsoft.
URL severity score: Probability that the URL is malicious based only on the URL syntax.
A screenshot of the page is displayed with a matched logo (if available).
DBotPredictURLPhishing.URL: URL on which the model was run.
DBotPredictURLPhishing.FinalVerdict: Final verdict of the URL.
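A minimal sketch of how the DBotPredictURLPhishing.URL and DBotPredictURLPhishing.FinalVerdict outputs might be consumed downstream. It assumes the context holds a list of entries, each with a URL and a FinalVerdict key. That shape, the helper name urls_needing_review, and the sample entries are assumptions made for illustration and are not actual command output.

```python
def urls_needing_review(entries):
    """Return the URLs whose FinalVerdict is Malicious or Suspicious (hypothetical helper)."""
    flagged = {"malicious", "suspicious"}
    return [
        entry.get("URL")
        for entry in entries
        if str(entry.get("FinalVerdict", "")).lower() in flagged
    ]


# Invented sample entries, shaped after the two output keys listed above.
sample = [
    {"URL": "http://login-example.test/verify", "FinalVerdict": "Malicious"},
    {"URL": "https://example.test/", "FinalVerdict": "Benign"},
    {"URL": "http://promo-example.test/", "FinalVerdict": "Suspicious"},
]
to_review = urls_needing_review(sample)
print(to_review)
```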
Troubleshooting
URL not correct: If the URL is misspelled or does not exist, the model will display the corresponding error in the Phishing prediction summary for URLs section.
URL blocked by firewall: If the URL is blocked by your firewall, the model will display the corresponding error in the Phishing prediction summary for URLs section.
Logo appears in legitimate URL: If a logo from our predefined list of logos appears on a legitimate web page whose domain does not belong to the top Majestic domains list, the model will raise an alert. This is because the page might not be a popular legitimate URL that was registered with Google.
Skip phishing page registered under top Majestic domain: A page can be registered under a domain that belongs to our top Majestic domains list. In that case, the URL is predicted as Benign, even though it can be malicious. We use this whitelist to skip such URLs because applying the model to many URLs might cause performance issues. For example, a phishing page registered under https://docs.google.com/ is skipped even if it is malicious.
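The skip behavior, and how forceModel overrides it, can be illustrated with a short sketch. The function should_run_model, the suffix-based domain match, and the one-entry stand-in for the Majestic list are assumptions made for this example. The script's real matching logic may differ.

```python
from urllib.parse import urlparse


def should_run_model(url, top_domains, force_model=False):
    """Decide whether the model would run on a URL (hypothetical helper)."""
    if force_model:
        # forceModel=True runs the model in every case.
        return True
    host = (urlparse(url).hostname or "").lower()
    # Assumption: a host is whitelisted if it equals a top domain or is a subdomain of one.
    in_top = any(host == d or host.endswith("." + d) for d in top_domains)
    return not in_top


top = {"google.com"}  # stand-in for the top Majestic domains list
print(should_run_model("https://docs.google.com/", top))                    # skipped by default
print(should_run_model("https://docs.google.com/", top, force_model=True))  # forced to run
```

Under these assumptions, a page hosted under https://docs.google.com/ is skipped unless forceModel is set to True, which is the way to get a model verdict for a suspicious page hosted on a popular domain.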