High Availability Overview

Cortex XSOAR Administrator Guide

Product: Cortex XSOAR
Version: 6.10
Creation date: 2022-10-13
Last date published: 2024-12-04
End of Life: EoL
Category: Administrator Guide
Abstract

Overview of high availability in Cortex XSOAR, including information about the different deployment architectures.

High availability is intended to keep your systems running even if one of the components in the system fails. It provides redundancy for the different components so if a problem occurs, there is minimal effect on your system. High availability is an active-active failover configuration, and is different from Live Backup, which is active-passive and requires manually enabling a backup server. Live Backup is not available for high availability deployments.

Out of the box, Cortex XSOAR is installed with the app server and database on the same machine, but for a high availability deployment, the app servers and database must be separated.

The app server processes all requests, such as running playbooks or creating custom content, while the database stores all of the data, including the content (custom and out-of-the-box), indicators, and incidents.

Cortex XSOAR supports different degrees of redundancy. The first degree, which is required for all others, is database redundancy. To enable any level of high availability (HA) in Cortex XSOAR, you must migrate your database to Elasticsearch. This means that the app server and database server are located on different machines.

In addition, you can implement high availability by installing additional app servers, and if you are using Cortex XSOAR Multi-tenant, also configure HA Groups.

Flow

Implementing a high availability deployment requires you to prepare your environment with the following:

Architecture

Depending on the configuration of your system, single instance or multi-tenant, you can achieve high availability using one of the following architectures.

Single instance deployment

In a single-instance deployment, the application server is installed on a dedicated machine and connects to an Elasticsearch database server. The Elasticsearch database server automatically provides redundancy in accordance with how you have configured Elasticsearch.

ha_single_instance.png

In addition, you can install multiple application servers behind a load balancer. The load balancer manages the requests, most commonly (but not necessarily) using a round-robin methodology.

Also, the app servers use a shared file system to ensure that all of the necessary files are available to all of the application servers in the cluster.

Multi-Tenant Deployment

Similar to a single instance, in a multi-tenant configuration you must first migrate your data to an Elasticsearch database, separating the main account server from the database server. Elasticsearch, depending on how it is configured, provides the database redundancy.

To achieve full high availability, you can then install multiple main account servers behind a load balancer. The main account servers also use an NFS server to share the required files, and they communicate directly with the designated indexes in Elasticsearch.

Note

Once the Main host servers are highly available, you can no longer host new accounts on those servers. Existing accounts on the Main host remain in place, but are not highly available. Therefore, Cortex XSOAR recommends that you move the accounts from the Main host to an HA group.

ha_multi-tenant.png

In addition, Cortex XSOAR v6.2 and above enables you to create high availability groups (HA groups), which form a cluster of hosts that provide redundancy for all of the accounts on those hosts. The HA groups also use an NFS server to share the required files. The hosts communicate directly with the designated indexes in Elasticsearch.

HA groups provide your system with:

  • Redundancy for every account in the group

  • Redundancy for every host machine in the group

  • Performance improvements for every account in the group

Note

All hosts in the HA Group must have the same hardware specifications.

In the full high availability architecture, you can still have hosts that are not part of an HA group and work with Elasticsearch, and you can also maintain hosts that use the out-of-the-box configuration where the application server and database are on the same machine, using a Bolt/Bleve database.

High Availability and Remote Repositories

The remote repositories feature in the UI is not supported on development environments that run as high availability (multiple app servers). You can still use a development > staging > production setup, where development is a single server (not high availability) and production is high availability. In this setup, both staging and production pull from the same Git repository. If your development environment runs as high availability, use the CI/CD Solution.

Load Balancing

You can install multiple application servers behind a load balancer. The load balancer must be configured to use sticky sessions and request timeouts. The load balancer should also use a health check to verify that each app server is up and running. Cortex XSOAR recommends that you use the /health route and verify that you receive an HTTP 200 response.
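The health check described above can be sketched as a small probe. This is a minimal illustration, not a Cortex XSOAR tool: the function name is_app_server_healthy, the timeout value, and the example URL are assumptions, and the sketch assumes the server certificate is trusted by the machine that runs the check. Only the /health route and the expected 200 response come from this guide.

```python
import urllib.error
import urllib.request


def is_app_server_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET <base_url>/health answers with HTTP 200.

    base_url is a hypothetical app server address, for example
    "https://xsoar-app-1.example.com".
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, or an HTTP error status:
        # treat every one of these as "not healthy".
        return False
```

A load balancer would typically run an equivalent probe against each app server at a fixed interval and stop routing requests to any server for which the probe fails.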

Load balancing within HA groups for Multi-Tenant Deployment

Load balancing within HA groups uses an internal algorithm to distribute requests between all available hosts in the group. By default, the algorithm used is round robin, where requests are routed to available hosts on a cyclical basis.

You can set the load balancing algorithm used with the ha.host.selection.alg server configuration. Possible values are:

  • round-robin - (Default) Requests are routed to available hosts on a cyclical basis.

  • random - Each host is randomly assigned the next request.
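To illustrate the difference between the two values, the following sketch implements both selection strategies in plain Python. It is not the internal Cortex XSOAR algorithm; the function name make_host_selector and the host names are hypothetical, and only the value names round-robin and random come from this guide.

```python
import itertools
import random
from typing import Callable, Sequence


def make_host_selector(hosts: Sequence[str],
                       algorithm: str = "round-robin") -> Callable[[], str]:
    """Return a function that picks the host for the next request."""
    if not hosts:
        raise ValueError("at least one host is required")
    if algorithm == "round-robin":
        # Cycle through the hosts in order, wrapping around at the end.
        cycle = itertools.cycle(list(hosts))
        return lambda: next(cycle)
    if algorithm == "random":
        # Pick any host with equal probability for each request.
        pool = list(hosts)
        return lambda: random.choice(pool)
    raise ValueError(f"unknown algorithm: {algorithm!r}")


pick = make_host_selector(["host-a", "host-b", "host-c"])
print([pick() for _ in range(4)])  # ['host-a', 'host-b', 'host-c', 'host-a']
```

Round robin spreads requests evenly and predictably, while random selection needs no shared counter but is only even on average.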

When a host is alive and reachable, the requested account needs to be alive and reachable as well for the request to be directed to it:

  • On the host level, as long as at least 1 host in an HA group is alive and reachable, the host and all of the accounts within that group are fully available.

  • On the account level, as long as an account is alive and reachable on any of the hosts within that group, requests are directed to it, offering full availability for that account.

  • Load balancing on the Main account level is handled by the external load balancer.
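The host-level and account-level conditions above can be modeled as a simple filter: a request for an account can only go to a host that is itself reachable and on which that account is alive. The sketch below is an illustrative model under that reading, not product code; the function names, data structures, and host and account names are all hypothetical.

```python
from typing import Dict, List, Set


def eligible_hosts(account: str,
                   alive_hosts: Set[str],
                   accounts_alive_on: Dict[str, Set[str]]) -> List[str]:
    """Hosts that are reachable and on which the account is alive."""
    return sorted(
        host for host in alive_hosts
        if account in accounts_alive_on.get(host, set())
    )


def is_account_available(account: str,
                         alive_hosts: Set[str],
                         accounts_alive_on: Dict[str, Set[str]]) -> bool:
    """An account is available if at least one eligible host exists."""
    return bool(eligible_hosts(account, alive_hosts, accounts_alive_on))


# Hypothetical HA group: host-3 is down, so it is missing from alive_hosts.
alive = {"host-1", "host-2"}
accounts = {
    "host-1": {"acme"},
    "host-2": {"acme", "globex"},
    "host-3": {"acme", "globex"},
}
print(eligible_hosts("acme", alive, accounts))    # ['host-1', 'host-2']
print(eligible_hosts("globex", alive, accounts))  # ['host-2']
```

In this model, the load balancing algorithm would then choose among the eligible hosts only, so the loss of host-3 does not affect the availability of either account.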

Engines and Load balancing

Before you install an engine, ensure that you set both the Base URL and the External Host Name to the IP address of the load balancer (Settings > About > Troubleshooting).

The Base URL enables engines to connect to the load balancer. Setting the External Host Name to the load balancer enables the engines to stay connected through the load balancer even if one of the app servers fails. You can also add or remove app servers while the engines remain connected.

Note

When you upgrade to v6.5 or later from a previous version, you need to either create a new engine or edit the configuration file of the existing engine.