Elasticsearch Best Practices - Administrator Guide - 6.10 - Cortex XSOAR - Cortex

Cortex XSOAR Administrator Guide

Product: Cortex XSOAR
Version: 6.10
Creation date: 2022-10-13
Last date published: 2024-04-15
End of life: EoL

Performance

To achieve high availability, you will need to set up at least three nodes.
Note
Disaster Recovery for Elasticsearch is implemented using snapshots. Elasticsearch Cross-Cluster Replication (CCR) is not supported.
Take advantage of Elasticsearch shards and replicas to distribute data and load across the cluster. If you are using more than one node, we recommend that you set elasticsearch.defaultShardsPerIndex to 1 and elasticsearch.defaultReplicasPerIndex to 2 (with only two data nodes, use 1 replica, as listed in the table below). You can also use elasticsearch.shards and elasticsearch.replicas to configure a specific number of shards and replicas for individual indices.
For example, setting the following values in demisto.conf configures one shard with two replicas for the incidents index:
"elasticsearch": { "shards": { "common-incident": 1 }, "replicas": { "common-incident": 2 } }
Note
This configuration takes effect when the index is created.
Suggested configuration based on the number of data nodes in the Elasticsearch cluster:

Configuration	2 Data Nodes	3 Data Nodes	4 Data Nodes
defaultShardsPerIndex	1	1	1
defaultReplicasPerIndex	1	2	2
Shards (common-entry, common-invplaybook)	2	3	4
Replicas (common-entry, common-invplaybook)	1	1	1
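
The table above can be turned into a demisto.conf fragment mechanically. The following is a minimal Python sketch, not an official tool: the function name suggested_settings is hypothetical, and the key layout under "elasticsearch" is assumed to match the other examples in this guide.

```python
import json


def suggested_settings(data_nodes: int) -> dict:
    """Return the table's suggested values for 2, 3, or 4 data nodes."""
    if data_nodes not in (2, 3, 4):
        raise ValueError("the table only covers 2, 3, or 4 data nodes")
    # The two indices that the table sizes separately from the defaults.
    heavy_indices = ("common-entry", "common-invplaybook")
    return {
        "elasticsearch": {
            "defaultShardsPerIndex": 1,
            # One replica with two data nodes, two replicas otherwise.
            "defaultReplicasPerIndex": 1 if data_nodes == 2 else 2,
            # The table uses one shard per data node for the heavy indices.
            "shards": {name: data_nodes for name in heavy_indices},
            "replicas": {name: 1 for name in heavy_indices},
        }
    }


if __name__ == "__main__":
    print(json.dumps(suggested_settings(3), indent=2))
```

Because these settings take effect only when an index is created, a fragment like this is most useful before the indices exist.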

Cortex XSOAR sends documents to Elasticsearch in internal batches to avoid heavy requests and high indexing latency. You can increase or decrease the following two flags to tune performance.

Flag	Description
elasticsearch.innerBatchSize	Number of docs in each batch. The default is 250.
elasticsearch.maxContentLength	If `http.max_content_length` is not defined in your Elasticsearch, you can set `elasticsearch.maxContentLength` in configurations instead. This is used to calculate batch size in MB. The default is 100 MB.

Because Cortex XSOAR uses Elasticsearch heavily, any latency between Cortex XSOAR and Elasticsearch greatly impacts performance. We recommend configuring the lowest possible latency between the XSOAR server and the Elasticsearch nodes. We also recommend low latency between the Elasticsearch nodes themselves, to avoid slow indexing or possible data loss during outages.
Elasticsearch is implemented in Java and manages its memory in the Java heap. The heap size defaults to 1 GB. If the machine is dedicated to Elasticsearch, we recommend increasing the heap size to 50% of the machine's memory on data nodes and to 80% on master and coordinating nodes.
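
As a worked example of the heap recommendation, the following Python sketch applies the 50% and 80% figures above to a machine's total memory. The function name recommended_heap_gb and the role labels are hypothetical, and the sketch assumes the machine is dedicated to Elasticsearch.

```python
def recommended_heap_gb(machine_ram_gb: float, role: str) -> float:
    """Apply the guide's heap fractions to a dedicated Elasticsearch machine."""
    # Fractions taken from the recommendation above.
    fractions = {"data": 0.5, "master": 0.8, "coordinating": 0.8}
    if role not in fractions:
        raise ValueError(f"unknown role: {role!r}")
    return machine_ram_gb * fractions[role]


if __name__ == "__main__":
    for role in ("data", "master", "coordinating"):
        print(role, recommended_heap_gb(64, role))
```

For a data node with 64 GB of memory, this gives a 32 GB heap.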
Elasticsearch uses a refresh mechanism to flush updated documents to the disk every configured number of seconds. We recommend setting the refresh interval of the configuration index to one second. You can use the Cortex XSOAR demisto.conf file to set a specific refresh interval in seconds for each index using elasticsearch.refreshIntervals.
For example, the following sets the refresh interval of the configuration index to one second:
"elasticsearch": { "refreshIntervals": { "common-configuration": "1s" } }
Note
This configuration takes effect when the index is created.
We recommend using the most recent supported version of Elasticsearch to ensure you have the latest features and all security updates.
Circuit breakers in Elasticsearch are enabled by default. Verify they are enabled and configured correctly, specifically the request circuit breaker and the parent circuit breaker. These circuit breakers will prevent data nodes from crashing on high memory consumption and will also drop certain requests that Cortex XSOAR will retry later.
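
One way to verify the request and parent circuit breakers is to read them from the Elasticsearch node stats API (GET _nodes/stats/breaker). The following Python sketch is illustrative only: the function names are hypothetical, and it assumes an unauthenticated HTTP endpoint, so add authentication and TLS handling as your cluster requires.

```python
import json
import urllib.request


def fetch_breaker_stats(base_url: str) -> dict:
    """Fetch circuit breaker statistics for all nodes."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/breaker") as resp:
        return json.load(resp)


def summarize_breakers(stats: dict, names=("request", "parent")) -> list:
    """Return (node, breaker, limit_in_bytes, tripped_count) tuples."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        node_name = node.get("name", node_id)
        breakers = node.get("breakers", {})
        for name in names:
            breaker = breakers.get(name)
            if breaker is None:
                # Breaker missing from the response: report it with no values.
                rows.append((node_name, name, None, None))
            else:
                rows.append((
                    node_name,
                    name,
                    breaker.get("limit_size_in_bytes"),
                    breaker.get("tripped"),
                ))
    return rows
```

For example, summarize_breakers(fetch_breaker_stats("http://localhost:9200")) lists each node's limits, where the URL is a placeholder for one of your nodes. A steadily rising tripped count suggests the node is under memory pressure.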
Elasticsearch requires a large number of open file descriptors. Verify that the limit on the number of open file descriptors is sufficient to handle a high load of requests, to avoid possible data loss or service interruptions.
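
The file descriptor limit that each node actually runs with is reported by the node stats API (GET _nodes/stats/process) as max_file_descriptors. The following Python sketch flags nodes below a threshold. The 65535 minimum comes from the Elasticsearch documentation rather than from this guide, the function names are hypothetical, and the sketch assumes an unauthenticated HTTP endpoint.

```python
import json
import urllib.request

# Minimum suggested by the Elasticsearch documentation (an assumption here,
# not a value stated in this guide).
MIN_OPEN_FILES = 65535


def fetch_process_stats(base_url: str) -> dict:
    """Fetch process statistics for all nodes."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/process") as resp:
        return json.load(resp)


def nodes_below_limit(stats: dict, minimum: int = MIN_OPEN_FILES) -> list:
    """Return (node, max_file_descriptors) for nodes under the minimum."""
    low = []
    for node_id, node in stats.get("nodes", {}).items():
        limit = node.get("process", {}).get("max_file_descriptors")
        # A missing or negative value means the limit is not reported.
        if limit is None or limit < 0:
            continue
        if limit < minimum:
            low.append((node.get("name", node_id), limit))
    return low
```

An empty result from nodes_below_limit means every node that reports a limit meets the threshold.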

Scaling

To optimize an Elasticsearch deployment for scaling, we recommend using dedicated master nodes and coordinating nodes to handle requests, thus having all connections and requests go through the coordinating nodes. Using dedicated coordinating nodes improves Elasticsearch balancing when some data nodes might be unavailable or unresponsive due to high load. Dedicated master node(s) ease data node responsibility around shard management, snapshots, and monitoring. Elasticsearch distributes the requests more efficiently through the use of master and coordinating nodes. For full high availability, use three master nodes and three coordinating nodes.

We also recommend using a load balancer or round-robin DNS server so that elasticsearch.url points to all of the coordinating nodes rather than to a single one. A load balancer enables you to swap, add, or remove data nodes, master nodes, and coordinating nodes without having to update the Cortex XSOAR configuration file and restart the Cortex XSOAR service, which significantly decreases downtime for any changes.

Mapping

Indices with custom fields, such as incidents, evidence, and indicators, might exceed the limit on the number of mapped fields in Elasticsearch. By default, Cortex XSOAR sets the limit to 2000. If you require more, you can increase the elasticsearch.totalFields value for the index in the demisto.conf file.

For example, the following sets the incidents index maximum to 3000 mapped fields:

"elasticsearch": { "totalFields": { "common-incident": 3000 } }

Note

This configuration takes effect when the index is created.
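
To judge how close an index is to the limit, you can count the fields in its mapping (as returned by GET <index>/_mapping). The following Python sketch gives a rough estimate only: it counts every entry under "properties" and every multi-field under "fields", which approximates but may not exactly match how Elasticsearch counts toward the limit. The function name count_mapped_fields is hypothetical.

```python
def count_mapped_fields(properties: dict) -> int:
    """Roughly count mapped fields, including object fields and multi-fields."""
    total = 0
    for definition in properties.values():
        total += 1  # the field (or object) itself
        # Nested object fields live under "properties".
        total += count_mapped_fields(definition.get("properties", {}))
        # Multi-fields (for example a "keyword" sub-field) live under "fields".
        total += count_mapped_fields(definition.get("fields", {}))
    return total


if __name__ == "__main__":
    sample_properties = {
        "name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "owner": {
            "properties": {
                "id": {"type": "keyword"},
                "email": {"type": "keyword"},
            }
        },
    }
    print(count_mapped_fields(sample_properties))
```

Pass the "properties" object of the index mapping, not the whole API response.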

Suggested Configuration for High Availability

The following is a recommended configuration for Cortex XSOAR indexing within Elasticsearch for three data nodes.

"elasticsearch": {
    "shards": {
        "common-invplaybook": 3,
        "common-entry": 3
    },
    "replicas": {
        "common-invplaybook": 1,
        "common-entry": 1
    },
    "defaultShardsPerIndex": 1,
    "defaultReplicasPerIndex": 2,
    "refreshIntervals": {
        "*": "30s",
        "common-configuration": "1s",
        "common-incident": "1s"
    }
}

In this configuration, the default shards and replicas are 1 and 2, respectively, to support high availability. In addition, the playbook and entry indices are set to 3 shards each, because playbooks and entries are both resource intensive.

The default refresh interval is set to 30 seconds (the default in Elasticsearch is one second), and specific indices, such as incidents and configuration, are set to one second. This makes changes to these specific indices available more quickly.

Note

Index names in the configuration do not include any indexPrefix you have configured or the default dmst prefix, because the system calculates the prefixes automatically. Use only the part of the index name that follows dmst-. For example, the index myprefix-dmst-common-incident_202203 is referred to in the configuration as common-incident.
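
The naming rule in the note can be sketched in Python as follows. The function name config_index_name is hypothetical, and the sketch assumes that a trailing underscore followed by six digits (as in _202203) is a monthly suffix that is not part of the configured name, which is inferred from the example rather than stated in this guide.

```python
import re


def config_index_name(full_index_name: str) -> str:
    """Derive the name used in demisto.conf from a full Elasticsearch index name."""
    marker = "dmst-"
    position = full_index_name.find(marker)
    # Keep only what follows the first "dmst-"; leave the name as-is if absent.
    name = full_index_name[position + len(marker):] if position != -1 else full_index_name
    # Drop a trailing suffix such as _202203 (assumed to be a year and month).
    return re.sub(r"_\d{6}$", "", name)


if __name__ == "__main__":
    print(config_index_name("myprefix-dmst-common-incident_202203"))
```

Run as a script, this prints common-incident, the name used in the examples above.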

Indexing

To limit memory consumption, indexing for HTML and markdown fields is disabled by default, so these fields are not searchable. If you want to search these fields, add the following server configurations:

server.large.markdown.unsearchable: Set to false to make markdown fields searchable in the UI. Default is true.
server.large.html.unsearchable: Set to false to make HTML fields searchable in the UI. Default is true.

Note

Marking the fields as searchable takes effect only at the start of the next month. For example, if you make a change on February 10, the change takes effect on March 1.
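
The timing in the note can be computed as the first day of the month after the change. The following Python sketch shows the arithmetic; the function name effective_date is hypothetical, and the first-of-month rule is taken from the example in the note.

```python
from datetime import date


def effective_date(change_date: date) -> date:
    """Return the first day of the month following the change."""
    if change_date.month == 12:
        # December rolls over to January of the next year.
        return date(change_date.year + 1, 1, 1)
    return date(change_date.year, change_date.month + 1, 1)


if __name__ == "__main__":
    print(effective_date(date(2024, 2, 10)))
```

For a change made on February 10, 2024, this returns March 1, 2024.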

In addition, by default, indexing of HTML, markdown, and long text fields is limited to 30,000 characters. If a field is larger, only its first 30,000 characters are searchable. You can change this limit by adding the server.text.max.characters server configuration and setting it to the required number of characters.

Note

Increasing the number of characters can decrease performance. Reducing the number of characters limits disk space consumption and increases performance.