The phishing classifier enables you to train a machine learning model on your own incidents and later apply it to phishing incidents in a production environment. The model's predictions can be used in several ways:
As additional indicators
As criteria for setting severity
As a basis for taking action, such as auto-closing an incident
For general information about using the phishing classifier, see the Phishing blog and Machine Learning Models.
Prerequisites
Training a phishing classifier requires existing incidents. Your incidents must meet the following requirements:
At least 100 incidents of each class (such as malicious, false-positive, etc.). Different classes can be merged into a single class, but the merged class must still contain at least 100 incidents in total. Although this is the required minimum, in some cases more incidents are needed to achieve a model with high precision.
Phishing incidents need an email subject and email body stored as incident fields. The Phishing - Generic v3 playbook extracts the original email subject and email body and stores them as incident fields.
Phishing incidents need a field that specifies their class. You can use out-of-the-box fields such as Email Classification and closeReason, or any single-select custom field. You can set this field's value using the close form or as a manual step in the playbook.
Because it might take some time to accumulate enough incidents for training, it is recommended to validate the second and third requirements early. Both are part of the preprocessing and investigation process for phishing incidents.
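You can check the class-count requirement before you start training. The following is a minimal sketch. It assumes you have exported your incidents as a list of Python dicts, with the class stored under a field key that you pass in. The function `check_class_counts`, its `merge_map` parameter, and the `emailclassification` key are hypothetical and not part of any product API. The optional merge map shows how merging classes changes the counts.

```python
from collections import Counter

MIN_PER_CLASS = 100  # minimum number of incidents per class, per the requirement above


def check_class_counts(incidents, class_field, merge_map=None, minimum=MIN_PER_CLASS):
    """Count training incidents per class and report classes below the minimum.

    incidents: list of dicts, each holding its class under `class_field`
    merge_map: optional {original_class: merged_class} mapping applied before counting
    Returns (counts, classes_below_minimum).
    """
    merge_map = merge_map or {}
    counts = Counter()
    for incident in incidents:
        label = incident.get(class_field)
        if not label:
            continue  # an incident without a class cannot be used for training
        counts[merge_map.get(label, label)] += 1
    below = sorted(label for label, n in counts.items() if n < minimum)
    return counts, below


if __name__ == "__main__":
    # Hypothetical export: 120 Malicious, 60 Legit and 50 Spam incidents.
    incidents = (
        [{"emailclassification": "Malicious"}] * 120
        + [{"emailclassification": "Legit"}] * 60
        + [{"emailclassification": "Spam"}] * 50
    )
    print(check_class_counts(incidents, "emailclassification")[1])  # ['Legit', 'Spam']
    merged = check_class_counts(incidents, "emailclassification", merge_map={"Spam": "Legit"})
    print(merged[1])  # []  (Legit and Spam now total 110)
```

In this sketch, merging is only a relabeling step before counting. Which classes make sense to merge is a decision about how you intend to use the verdicts.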
Phishing Classifier Demo
If you want a sense of what the phishing classifier's output looks like without training a classifier yourself, you can use the Phishing Classifier Demo. It returns a prediction for a phishing incident from a pretrained model, so you can see how the model works before you create your own machine learning model. Note that the script is intended mainly as a demo. For production, it is recommended to train a phishing classifier on your own incidents, as described in the next section.
Note
You need to install the Machine Learning content pack to use the Phishing Classifier Demo.
Training a Phishing Classifier
You can create a machine learning model in the ML Models tab on the Settings page. Once training is complete, examine the generated model evaluation. If any of the model's metrics do not meet your expectations, you can retrain the model on a larger number of incidents, or try setting different classification values. For example, suppose the original model was trained on three classes: Legit, Spam, and Malicious. You can try merging Legit and Spam into a single verdict for the next training.
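To illustrate what per-class metrics measure and why merging can help, here is a minimal sketch that computes precision and recall per class from true and predicted labels. It does not reproduce the evaluation the platform generates. The functions `per_class_metrics` and `merge_labels` and the toy labels are hypothetical. Relabeling existing predictions, as done at the end, only approximates what a retrained model would achieve.

```python
def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall)} for every class in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)  # incidents the model assigned to cls
        actual = sum(1 for t in y_true if t == cls)     # incidents that truly belong to cls
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics[cls] = (precision, recall)
    return metrics


def merge_labels(labels, merge_map):
    """Relabel classes; for example {"Spam": "Legit"} folds Spam into Legit."""
    return [merge_map.get(label, label) for label in labels]


if __name__ == "__main__":
    # Toy data: the model confuses Legit and Spam but never Malicious.
    y_true = ["Legit", "Spam", "Malicious", "Spam", "Legit", "Malicious"]
    y_pred = ["Spam", "Legit", "Malicious", "Spam", "Legit", "Malicious"]
    print(per_class_metrics(y_true, y_pred))
    # {'Legit': (0.5, 0.5), 'Malicious': (1.0, 1.0), 'Spam': (0.5, 0.5)}
    merge = {"Spam": "Legit"}
    print(per_class_metrics(merge_labels(y_true, merge), merge_labels(y_pred, merge)))
    # {'Legit': (1.0, 1.0), 'Malicious': (1.0, 1.0)}
```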
Apply the Model
After a phishing classifier has been trained successfully, the next phase is to apply it as part of your investigation process. To do so, add a task to your playbook that runs the DBotPredictPhishingWords script. Set the modelName argument to the name of the trained model, and set setIncidentFields=true so that the model's predictions are stored as incident fields. The DBotPrediction field stores the predicted class, and DBotPredictionProbability stores the probability of the prediction. For a usage example of this script, see the Phishing - Machine Learning Analysis playbook.
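To make the meaning of the two fields concrete, here is a minimal sketch that maps a per-class probability distribution to DBotPrediction-style values. It assumes the stored prediction is the most probable class together with that class's probability. This is an illustrative assumption, not the internal logic of DBotPredictPhishingWords. The helper `to_prediction_fields` is hypothetical.

```python
def to_prediction_fields(class_probabilities):
    """Turn a {class: probability} dict into DBotPrediction-style field values.

    Assumption (illustration only): the predicted class is the most probable
    class, and the stored probability is that class's probability.
    """
    if not class_probabilities:
        raise ValueError("class_probabilities must not be empty")
    predicted, probability = max(class_probabilities.items(), key=lambda item: item[1])
    return {"DBotPrediction": predicted, "DBotPredictionProbability": probability}


if __name__ == "__main__":
    print(to_prediction_fields({"Legit": 0.15, "Malicious": 0.85}))
    # {'DBotPrediction': 'Malicious', 'DBotPredictionProbability': 0.85}
```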
Second Evaluation
After training and applying the phishing classifier for the first time, it is recommended to let it run for a while without yet taking any action based on its predictions.
Once the model has been applied for a period of time and has made predictions on a few hundred incidents, you may want to run a second evaluation. Contact the Data Science team through Customer Support for guidance through this process. Based on the second evaluation and how you intend to use the model, they can suggest the best ways to use it.
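One way to think about a second evaluation is to compare the model's production predictions with the final verdicts analysts assigned. The following is a minimal sketch of that comparison, not the Data Science team's procedure. Each record is assumed to hold the prediction, its probability, and the final verdict under the hypothetical keys `predicted`, `probability` and `verdict`. For a given probability threshold, the function reports how many incidents qualify and how often the prediction was correct.

```python
def precision_at_threshold(records, label, threshold):
    """Check predictions of `label` with probability >= threshold against analyst verdicts.

    Returns (number_of_selected_incidents, fraction_correct), or (0, None) if none qualify.
    """
    selected = [r for r in records if r["predicted"] == label and r["probability"] >= threshold]
    if not selected:
        return 0, None
    correct = sum(1 for r in selected if r["verdict"] == label)
    return len(selected), correct / len(selected)


if __name__ == "__main__":
    # Hypothetical records collected after the model ran in production for a while.
    records = [
        {"predicted": "Malicious", "probability": 0.95, "verdict": "Malicious"},
        {"predicted": "Malicious", "probability": 0.90, "verdict": "Malicious"},
        {"predicted": "Malicious", "probability": 0.60, "verdict": "Legit"},
        {"predicted": "Legit", "probability": 0.80, "verdict": "Legit"},
    ]
    for threshold in (0.5, 0.85):
        print(threshold, precision_at_threshold(records, "Malicious", threshold))
```

With these toy records, a threshold of 0.5 selects three Malicious predictions, one of which is a false positive. At 0.85 only the two correct predictions remain. This is the kind of trade-off to weigh before acting on predictions.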
Use the Machine Learning Model in the Investigation
Depending on the evaluation, there are several ways to involve the model in the investigation:
Display the model’s output as an indicator as part of the phishing layout.
Set severity based on the model prediction. For example, if DBotPrediction is Malicious and DBotPredictionProbability exceeds a chosen threshold, the incident severity can be raised to High. As a result, phishing incidents predicted as malicious with high probability are prioritized and handled first.
Auto-close malicious or non-malicious incidents based on a similar condition.
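The severity condition can be sketched as a small function. The sketch assumes the prediction fields are available as a Python dict keyed by DBotPrediction and DBotPredictionProbability. In a real playbook you would more likely express this as a condition task. The threshold value 0.8, the severity names list, and the function name are hypothetical. The function only raises severity and never lowers an incident that is already more severe.

```python
SEVERITY_ORDER = ["Unknown", "Low", "Medium", "High", "Critical"]  # assumed ordering
MALICIOUS_THRESHOLD = 0.8  # hypothetical value; derive yours from the model evaluation


def severity_from_prediction(fields, current_severity, threshold=MALICIOUS_THRESHOLD):
    """Raise severity to High when the model predicts Malicious above `threshold`.

    Never lowers the severity: an incident already at Critical stays Critical.
    """
    prediction = fields.get("DBotPrediction")
    probability = fields.get("DBotPredictionProbability")
    if prediction == "Malicious" and probability is not None and float(probability) > threshold:
        return max(current_severity, "High", key=SEVERITY_ORDER.index)
    return current_severity


if __name__ == "__main__":
    fields = {"DBotPrediction": "Malicious", "DBotPredictionProbability": 0.93}
    print(severity_from_prediction(fields, "Low"))       # High
    print(severity_from_prediction(fields, "Critical"))  # Critical
```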
It is recommended to begin with the first two approaches. Consider moving to the last one only after the model has been deployed for a while and you have gained confidence in its performance. When setting severity or auto-closing incidents based on the prediction fields, consider adding other meaningful fields to make the condition stronger. For example, auto-close only incidents that were predicted as legitimate and have no indicators with a bad reputation.
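The combined auto-close condition can be sketched the same way. The sketch assumes the prediction fields are a dict and that the reputations of the incident's indicators are available as a list of strings. The label "Legit", the reputation value "Bad", the 0.9 threshold, and the function name are all hypothetical placeholders for your own classes and reputation values.

```python
LEGIT_THRESHOLD = 0.9      # hypothetical; choose it from your evaluation results
BAD_REPUTATIONS = {"Bad"}  # assumption: how a bad indicator reputation is represented


def should_auto_close(fields, indicator_reputations, legit_label="Legit", threshold=LEGIT_THRESHOLD):
    """Auto-close only if the model predicts the legitimate class with high probability
    AND none of the incident's indicators has a bad reputation."""
    prediction = fields.get("DBotPrediction")
    probability = fields.get("DBotPredictionProbability")
    if prediction != legit_label or probability is None:
        return False
    if float(probability) < threshold:
        return False
    return not any(rep in BAD_REPUTATIONS for rep in indicator_reputations)


if __name__ == "__main__":
    fields = {"DBotPrediction": "Legit", "DBotPredictionProbability": 0.95}
    print(should_auto_close(fields, ["Good", "Unknown"]))  # True
    print(should_auto_close(fields, ["Good", "Bad"]))      # False
```

Requiring both conditions means a confident Legit prediction alone is never enough to close an incident. This reflects the recommendation to strengthen prediction-based conditions with other fields.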
Machine Learning Model Disclaimer
The phishing classifier is a machine learning model and therefore a statistical model, so you should expect some false positive predictions. This is why it is recommended to evaluate the model multiple times over a period of time. When deciding how to involve the model in the investigation process, take the possibility of false positives into account. Contact the Data Science team via Customer Support for help with any related issue and with getting the best results from a phishing classifier.