Train a classifier for non-English machine learning. Adjust the language and tokenization method used to train a classifier in Cortex XSOAR.
To train a classifier on languages other than those referred to in Train a Classifier on Languages with Adjusted Tokenization, you need to configure the language and tokenization method. Tokenization is the method by which the classifier breaks up sentences and words to analyze threats appropriately. When the language for the classifier is set to Other, you can set the tokenization method used to train the classifier to one of the following options:
Tokenization - (Default) automatically separates sentences into words
Word - separates the text based on spacing
Letter - separates the text based on characters and symbols
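The difference between the three options can be illustrated with a minimal Python sketch. This is not the code Cortex XSOAR runs: the function names are hypothetical, and the regular expression in tokenize_default is only an assumed stand-in for the automatic tokenizer, whose actual behavior is not described here.

```python
import re

def tokenize_by_word(text):
    """Word: split the text on spacing (any run of whitespace)."""
    return text.split()

def tokenize_by_letter(text):
    """Letter: treat each non-space character or symbol as its own token."""
    return [ch for ch in text if not ch.isspace()]

def tokenize_default(text):
    """Assumed stand-in for the default option: words and punctuation
    marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

sample = "Reset your password!"
print(tokenize_by_word(sample))    # punctuation stays attached to the word
print(tokenize_by_letter(sample))  # one token per character or symbol
print(tokenize_default(sample))    # punctuation split off as its own token
```

Splitting on spacing returns a single token for text that contains no spaces, which is one situation where the Letter option may be the more useful choice.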
Follow the steps below to adjust the language and tokenization method used to train a classifier for other languages.
Go to Automation.
Search for DBotPreProcessTextData.
Copy the automation by selecting Duplicate Automation.
(Optional) Change the name of the duplicated script to make it distinguishable.
From the Argument section, expand the tokenizationMethod field, and change the Initial value to the desired tokenization method. For example, byWord.
Expand the language field and change the Initial value to Other.
Click Save.
Search for DBotPredictPhishingWords.
Copy the automation by selecting Duplicate Automation.
(Optional) Change the name of the duplicated script to make it distinguishable.
From the Argument section, expand the tokenizationMethod field, and change the Initial value to the desired tokenization method. For example, byWord.
Expand the language field and change the Initial value to Other.
Click Save.
Navigate to Playbooks.
Search for the DBot Create Phishing Classifier V2 playbook to update.
Copy the playbook by selecting Duplicate Playbook.
(Optional) Change the name of the duplicated playbook to make it distinguishable.
Select the Pre-process file task.
From the dropdown menu, replace the automation with the duplicated version of DBotPreProcessTextData created in Step 2.
Click OK and Save Version.
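The values set during the procedure above can be summarized and sanity-checked with a short Python sketch. The dictionary layout, the "_Custom" names for the duplicated scripts, and the check_overrides helper are illustrative assumptions and not part of Cortex XSOAR; only the original script names, the argument names, and the values Other and byWord come from the steps. The check that both scripts use the same tokenization method is also an assumption, on the reasoning that training and prediction should split text the same way.

```python
# Hypothetical summary of the Initial values set on each duplicated script.
# The "_Custom" suffix is a placeholder for whatever name you gave the copy.
OVERRIDES = {
    "DBotPreProcessTextData_Custom": {
        "language": "Other",
        "tokenizationMethod": "byWord",
    },
    "DBotPredictPhishingWords_Custom": {
        "language": "Other",
        "tokenizationMethod": "byWord",
    },
}

def check_overrides(overrides):
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    for script, args in overrides.items():
        # The tokenization method is configurable when the language is Other.
        if args.get("language") != "Other":
            problems.append(f"{script}: language is not set to Other")
    # Assumption: both scripts should use the same tokenization method.
    methods = {args.get("tokenizationMethod") for args in overrides.values()}
    if len(methods) > 1:
        problems.append("tokenizationMethod differs between the scripts")
    return problems

print(check_overrides(OVERRIDES))
```

An empty list means both duplicated scripts have the language set to Other and share one tokenization method.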