Use the Phishing Classifier in Production - Administrator Guide - 6.6 - Cortex XSOAR - Cortex

Cortex XSOAR Administrator Guide

Product: Cortex XSOAR
Version: 6.6
Creation date: 2022-09-29
Last date published: 2024-04-08
End_of_Life: EoL
Category: Administrator Guide

Abstract

The phishing classifier enables you to train a machine learning model for incidents.

The phishing classifier enables you to train a machine learning model based on your incidents, which can be applied later to phishing incidents in a production environment. The machine learning model prediction can be used in various ways:

As an additional indicator
As a criterion for setting severity
As a trigger for actions based on the model prediction, such as auto-closing an incident

For more information about using the phishing classifier generally, see the Phishing blog and Machine Learning Models.

Prerequisites

The phishing classifier requires available incidents for training. To train a classifier, you need the following:

At least 100 incidents of each class (such as malicious or false positive). Different classes can be merged into a single class, but their combined number of incidents must exceed 100. Although this is the required minimum, in some cases more incidents are needed to achieve a model with high precision.
Phishing incidents need an email subject and email body stored as incident fields. The Phishing - Generic v3 playbook extracts the original email subject and email body and stores them as incident fields.
Phishing incidents need a field that specifies their class. You can use out-of-the-box fields such as Email Classification and closeReason, or any single-select custom field. You can set the field in the close form or as a manual step in the playbook.

As it might take some time to obtain enough incidents for training, it is recommended to validate the second and third requirements at an early stage as they are part of the preprocessing and investigation process of phishing incidents.
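
The prerequisites above can be checked early with a small script. The following is a minimal sketch, assuming incidents have been exported as a list of dictionaries; the field names emailsubject, emailbody, and emailclassification are assumptions for illustration and should be replaced with the incident fields you actually use.

```python
from collections import Counter

MIN_INCIDENTS_PER_CLASS = 100  # minimum stated in the prerequisites

# Assumed field names for illustration; substitute your own incident fields.
SUBJECT_FIELD = "emailsubject"
BODY_FIELD = "emailbody"
CLASS_FIELD = "emailclassification"


def check_training_readiness(incidents, min_per_class=MIN_INCIDENTS_PER_CLASS):
    """Summarize whether a list of incident dicts meets the prerequisites."""
    # An incident is usable only if subject, body, and class are all populated.
    usable = [
        inc for inc in incidents
        if inc.get(SUBJECT_FIELD) and inc.get(BODY_FIELD) and inc.get(CLASS_FIELD)
    ]
    counts = Counter(inc[CLASS_FIELD] for inc in usable)
    # Classes that do not yet reach the required minimum.
    short = {label: n for label, n in counts.items() if n < min_per_class}
    return {
        "usable_incidents": len(usable),
        "skipped_incidents": len(incidents) - len(usable),
        "class_counts": dict(counts),
        "classes_below_minimum": short,
    }
```

A non-empty classes_below_minimum entry suggests either collecting more incidents or merging that class with another one before training.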

Phishing Classifier Demo

If you want to see what the output of the phishing classifier looks like without training a classifier yourself, you can use the Phishing Classifier Demo to get a prediction for a phishing incident from the pre-trained model. Note that the script is mainly intended as a demo. For production, it is recommended to train a phishing classifier based on your own incidents, as described in the next section. The demo lets you see how the model works before you create your own machine learning model.

Note

You need to install the Machine Learning content pack to use the phishing classifier demo.

Training a Phishing Classifier

You can create a machine learning model in the ML Models tab of the Settings page. Once the training is complete, you can examine the generated model evaluation. If one of the model's metrics does not match expectations, you can retrain the model using a higher number of incidents, or try different classification values. For example, if the original model was trained on three classes (Legit, Spam, and Malicious), you can try merging Legit and Spam into a single verdict for the next training.
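
To estimate the effect of merging classes before retraining, you can remap the labels offline and recount them. This is a minimal sketch, not the product's own mapping mechanism; the merged verdict name Non-Malicious is a hypothetical label chosen for illustration.

```python
from collections import Counter

# Hypothetical mapping: fold Legit and Spam into one verdict, keep Malicious.
MERGE_MAP = {"Legit": "Non-Malicious", "Spam": "Non-Malicious"}


def merge_labels(labels, merge_map=MERGE_MAP):
    """Return labels with merged classes replaced; unmapped labels pass through."""
    return [merge_map.get(label, label) for label in labels]


def class_counts(labels):
    """Count how many incidents fall into each class."""
    return dict(Counter(labels))


if __name__ == "__main__":
    labels = ["Legit"] * 60 + ["Spam"] * 70 + ["Malicious"] * 120
    print(class_counts(labels))
    print(class_counts(merge_labels(labels)))
```

In this made-up sample, Legit and Spam are each below 100 incidents on their own, while the merged class holds 130.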

Apply the Model

After a phishing classifier has been trained successfully, the next phase is to apply it as part of your investigation process, which enables you to store the model predictions as incident fields. To do so, add a task that runs the DBotPredictPhishingWords script as part of your playbook. Set the modelName argument to the name of the trained model, and set setIncidentFields=true so that the model's predictions are stored as incident fields (the DBotPrediction field stores the predicted class, and DBotPredictionProbability contains the probability of the prediction). You can find a usage example of this script in the Phishing Investigation - Generic v2 playbook, under the Machine Learning section.
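
If you prefer to call the script from an automation rather than directly from a playbook task, the arguments described above could be assembled as follows. This is a sketch under assumptions: the command runner is passed in as a parameter so the example runs outside the product, and the model name phishing_model in the usage note is hypothetical.

```python
def build_predict_args(model_name):
    """Build the two arguments described above for DBotPredictPhishingWords."""
    if not model_name:
        raise ValueError("model_name must match the name of the trained model")
    return {
        "modelName": model_name,
        # Ask the script to store its prediction in incident fields.
        "setIncidentFields": "true",
    }


def run_prediction(execute_command, model_name):
    """Run the script through an injected command runner.

    Inside an XSOAR automation, execute_command would typically be
    demisto.executeCommand (an assumption about your environment); it is a
    parameter here so the sketch can be exercised anywhere.
    """
    return execute_command("DBotPredictPhishingWords", build_predict_args(model_name))
```

For example, run_prediction(demisto.executeCommand, "phishing_model") would pass modelName and setIncidentFields to the script from within an automation.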

Second Evaluation

After training and applying the phishing classifier for the first time, it is recommended to let it run for a while, and not yet perform any action based on its predictions.

Once your model has been applied for a period of time and a few hundred incidents have been predicted by the new model, you may want to run a second evaluation. Contact the Customer Support Data Science team, who can guide you through this process. Based on this second evaluation and how you want to use the model, they can suggest the best ways to use it.
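
While waiting for a guided evaluation, you can get a rough sense of how the stored predictions compare with analyst verdicts. The sketch below is not the Data Science team's procedure; it simply computes precision for one class above a probability threshold. The key analyst_class is a hypothetical name for whichever field holds the class assigned by the analyst.

```python
def precision_at_threshold(records, target_class, threshold):
    """Precision of target_class predictions with probability >= threshold.

    Each record is a dict holding the predicted class (DBotPrediction), the
    prediction probability (DBotPredictionProbability), and the class the
    analyst eventually assigned (analyst_class, a hypothetical key).
    Returns None when no prediction meets the threshold.
    """
    selected = [
        r for r in records
        if r["DBotPrediction"] == target_class
        and r["DBotPredictionProbability"] >= threshold
    ]
    if not selected:
        return None
    correct = sum(1 for r in selected if r["analyst_class"] == target_class)
    return correct / len(selected)
```

Comparing the result at several thresholds can help you choose a starting threshold to discuss during the second evaluation.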

Use the Machine Learning Model in the Investigation

Based on the evaluation, there are a number of different ways to involve the model in the investigation.

Display the model’s output as an indicator as part of the phishing layout.
Set severity based on the model prediction. For example, if DBotPrediction is Malicious and DBotPredictionProbability is larger than some threshold, the incident severity can be raised to High. As a result, phishing incidents that were predicted as malicious with a high probability are prioritized and handled first.
Auto-close malicious or non-malicious incidents, based on a condition similar to the one described above.

It is recommended to begin with the first two options and consider moving to the last one after the model has been deployed for a while and you have gained more confidence in its performance. When setting severity or auto-closing incidents based on the prediction fields, consider adding other meaningful fields to the condition to make it stronger. For example, auto-close only incidents that were predicted as legitimate and have no indicators with a bad reputation.
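
The conditions described above can be sketched as a single decision function. This is an illustration only: the threshold values, the Legit class label, and the returned action names are assumptions, and in practice the same logic would usually be built as conditional tasks in a playbook.

```python
def decide_action(prediction, probability, has_bad_indicators,
                  malicious_threshold=0.9, legit_threshold=0.95):
    """Map the prediction fields to a suggested action.

    prediction corresponds to DBotPrediction, probability to
    DBotPredictionProbability. The thresholds are hypothetical defaults.
    """
    # Raise severity when the model is confident the incident is malicious.
    if prediction == "Malicious" and probability > malicious_threshold:
        return "raise_severity_high"
    # Auto-close only when the model is confident the incident is legitimate
    # AND no indicator has a bad reputation (the stronger, combined condition).
    if (prediction == "Legit" and probability > legit_threshold
            and not has_bad_indicators):
        return "auto_close"
    # Everything else stays with the analyst.
    return "manual_review"
```

Keeping manual review as the default means that a low-confidence or unexpected prediction never triggers an automatic action.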

Machine Learning Model Disclaimer

The phishing classifier is a machine learning model, and as such it is a statistical model, which means that you should expect some false positive predictions. This is why it is recommended to perform multiple evaluations over a period of time. When deciding how to involve the model in the investigation process, take the possibility of false positives into account. Contact the Data Science team via Customer Support for help with any related issue and for guidance on producing the best results from a phishing classifier.