Create a machine learning (ML) model in Cortex XSOAR to predict the classification of phishing incidents.
A machine learning model enables Cortex XSOAR to predict the classification of phishing incidents, for example, whether an incident should be classified as legitimate, malicious, or Spam. You can use these models in conjunction with your default investigation playbooks, or run commands separately in the War Room. The main goal of the machine learning model is to leverage past phishing incidents to assist with the investigation of future incidents.
Select → → → .
Define the Incidents Training Set Scope.
In the Model name field, type the name of the model that you want to create.
(Optional) In the Description field, type a meaningful description for the model.
To choose which incidents are used for training the model, in the Incident type field, select the type of incident from the dropdown list, such as Phishing.
Select the date range from which incidents will be used for the training set. The more incidents, the better the expected results, so it is recommended to use a longer period.
Select the field for which you want the model to learn to predict.
In the Incident field from the dropdown list, select the relevant field.
The incident field (classification field) stores the classification of the incident. This is a single-select field where the classification or the close reason of incidents is stored. The out-of-the-box fields are Email Classification and Close Reason, but you can use any other custom field.
After selecting the incident field, the Field Values field displays the different classification values and the number of incidents for each value across the selected scope of incidents.
Set the final classification values.
In the Verdict column, define the names of the verdicts to which your existing classification values will be mapped.
This stage allows you to control which incidents' classifications are used in training, and also to merge multiple classifications into a single category. A verdict is a group of classification values; each verdict includes one or more classifications. The trained model predicts each new incident as one of these verdicts.
Map your data by associating the verdict with your defined classification values by dragging and dropping the Field Values into the respective Verdict fields.
Values that remain in the Field Values column are excluded, meaning their corresponding incidents are not used in training. You may want to leave out classifications such as Undetermined, Internal Phishing Test, or any other classifications that you do not want to participate in the training. For example:
You can drag multiple classification values into a single verdict. If you do, the model treats all the classification values under that verdict as if they had the same classification. This allows you to better define the prediction task of the model and merge smaller groups into a single group.
This might be helpful if you have different sub-types of classifications. For example, if you have classification values of Spear Phishing, Malware, and Ransomware, you may want to map them all to a single verdict called Phishing. If you want a model that distinguishes one classification from the rest (for instance, phishing versus all other classifications), you can map every classification other than phishing into a single verdict called Non-Phishing. In the following example there are two verdicts: one contains phishing, the other contains everything else:
You can define two to three different verdicts, and each verdict needs a minimum of 50 incidents. For an example, see Machine Learning Model Example.
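Conceptually, the verdict mapping behaves like a lookup table that collapses classification values into verdict labels and drops unmapped values from the training set. The following Python sketch illustrates the idea only; the verdict names, classification values, and incident structure are hypothetical, not an XSOAR API:

```python
# Hypothetical verdict mapping: each verdict groups one or more
# classification values; unmapped values are excluded from training.
verdict_map = {
    "Spear Phishing": "Phishing",
    "Malware": "Phishing",
    "Ransomware": "Phishing",
    "Legitimate": "Non-Phishing",
    "Spam": "Non-Phishing",
    # "Undetermined" is intentionally left unmapped.
}

incidents = [
    {"id": 1, "classification": "Malware"},
    {"id": 2, "classification": "Legitimate"},
    {"id": 3, "classification": "Undetermined"},
]

# Keep only incidents whose classification maps to a verdict,
# and attach the verdict label the model will learn to predict.
training_set = [
    {**inc, "verdict": verdict_map[inc["classification"]]}
    for inc in incidents
    if inc["classification"] in verdict_map
]

print(training_set)
```

Incident 3 is excluded because Undetermined was left in the Field Values column; incidents 1 and 2 are labeled with the Phishing and Non-Phishing verdicts respectively.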
(Optional) Change any of the Advanced training configuration settings.
In the Query incidents to include in training field, add a query to include specific incidents in the training model.
For example, if you use the phishing classifier to close incidents automatically without a manual review, you can train the model on only those incidents that were not closed automatically. If you define the query as closenotes:x, only incidents where the Close Notes field equals x are used to train the model. You may also want to train a classifier on specific incidents from a playbook based on a query. Use the Incidents page to test your query and then copy it to this field.
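As a rough illustration of what such a query does (this is not the actual XSOAR query engine, and the field values shown are hypothetical), filtering the training set by the Close Notes field might look like this:

```python
# Hypothetical incidents; "closenotes" stands in for the Close Notes field.
incidents = [
    {"id": 1, "closenotes": "x", "subject": "Urgent: verify your account"},
    {"id": 2, "closenotes": "closed automatically", "subject": "Weekly digest"},
]

# Mirror the query closenotes:x -- keep only incidents whose
# Close Notes field equals "x" (i.e. manually reviewed incidents).
training_incidents = [inc for inc in incidents if inc["closenotes"] == "x"]

print(len(training_incidents))
```

Only incident 1 survives the filter, so the model never learns from incidents that were closed automatically.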
If your environment contains phishing incidents with multiple languages, and you want to train a machine learning model using incidents of one specific language, you can select a Language. Incidents recognized as being in different languages will be filtered out of training.
Select a Training algorithm.
From Scratch. Train a new model based only on your incidents.
Fine Tune. Use your incidents to better adjust a model that is already partially trained.
Auto. One of the two aforementioned algorithm options is automatically selected based on the number of incidents provided.
In general, From Scratch is designed for use with a larger number of incidents, while Fine Tune is designed for use with a smaller number of incidents. If you’re not sure which option to choose, you can leave the default value of Auto selected, or you can train different models using different algorithms to test which model achieves better results.
In the Maximum number of incidents to test field, type the number of incidents that will be used to train the model.
Reduce this number only if the number of incidents is too large and causes performance problems. Use a higher number if you have more samples in your environment. The default is 3000.
In the Argument Mapping section, select the equivalent fields for Email body, Email HTML, and Email subject.
By default, training is done based on the Email body, Email HTML, and Email subject.
Train the model by clicking Start Training.
You will be redirected back to the Machine Learning Models page. The training process takes several minutes (you can close the page).
If training completes successfully, percentage scores appear, reflecting the precision of the model for the different verdicts.
(Optional) View detailed performance information of the model.
Expand the results information by clicking + next to the model name.
View a detailed evaluation by clicking Evaluation of model performance.
A window opens showing a detailed evaluation of the model, which helps you decide whether and how to use the trained model. You can see a detailed breakdown of the expected performance of the model for each class, with metrics such as precision and coverage, and suggestions for applying a confidence threshold.
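To make these metrics concrete, the following sketch computes per-class precision and overall coverage when predictions below a confidence threshold are left unclassified. This is a conceptual illustration of the metrics, not XSOAR's internal evaluation code, and the example predictions are hypothetical:

```python
# Each prediction: (true verdict, predicted verdict, confidence score).
predictions = [
    ("Phishing", "Phishing", 0.95),
    ("Phishing", "Non-Phishing", 0.55),
    ("Non-Phishing", "Non-Phishing", 0.90),
    ("Non-Phishing", "Phishing", 0.40),
]

threshold = 0.5

# Coverage: fraction of incidents the model classifies at this threshold;
# predictions below the threshold are left for manual review.
covered = [p for p in predictions if p[2] >= threshold]
coverage = len(covered) / len(predictions)

def precision(cls):
    """Correct predictions of cls / all predictions of cls, over covered incidents."""
    predicted = [p for p in covered if p[1] == cls]
    correct = [p for p in predicted if p[0] == cls]
    return len(correct) / len(predicted) if predicted else 0.0

print(coverage)                   # 0.75
print(precision("Phishing"))      # 1.0
print(precision("Non-Phishing"))  # 0.5
```

Raising the threshold typically trades coverage for precision: fewer incidents are classified automatically, but those that are classified are more likely to be correct.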
If you are using the phishing incident type, you can now use the model in the machine learning window, in the War Room, or in a playbook. For more information, see Machine Learning Models.