Overview of high availability in Cortex XSOAR, including information about the different deployment architectures.
High availability is intended to keep your systems running even if one of the components in the system fails. It provides redundancy for the different components so if a problem occurs, there is minimal effect on your system. High availability is an active-active failover configuration, and is different from Live Backup, which is active-passive and requires manually enabling a backup server. Live Backup is not available for high availability deployments.
Out of the box, Cortex XSOAR is installed with the app server and database on the same machine, but for a high availability deployment, the app servers and database must be separated.
The app server processes all requests, such as running playbooks or creating custom content, while the database stores all of the data, including content (custom and out-of-the-box), indicators, and incidents.
In Cortex XSOAR there are different degrees of redundancy. The first degree, which is a prerequisite for the others, is database redundancy. To enable any level of high availability (HA) in Cortex XSOAR, you must migrate your database to Elasticsearch. This means that the app server and database server are located on different machines.
In addition, you can implement high availability by installing additional app servers, and if you are using Cortex XSOAR Multi-tenant, also configure HA Groups.
Flow
Implementing a high availability deployment requires you to prepare your environment with the following:
Migrate your existing database to Elasticsearch or, for new customers, install a new instance of Elasticsearch.
Configure a load balancer.
Configure a shared file server. For single-instance deployments, the file server must be shared between the app servers. For multi-tenant deployments, it must be shared between the main hosts, as well as between the hosts in each HA group.
Proceed with the implementation of the high availability configuration.
Architecture
Depending on the configuration of your system, single instance or multi-tenant, you can achieve high availability using one of the following architectures.
Single instance deployment
In a single-instance deployment, the application server is installed on a dedicated machine and connects to an Elasticsearch database server. The Elasticsearch database server automatically provides redundancy in accordance with how you have configured Elasticsearch.
In addition, you can install multiple application servers behind a load balancer. The load balancer manages the requests, most commonly (but not necessarily) using a round-robin methodology.
Also, the app servers use a shared file system to ensure that all of the necessary files are available to all of the application servers in the cluster.
Multi-Tenant Deployment
Similar to a single instance, in a multi-tenant configuration you must first migrate your data to an Elasticsearch database, separating the main account server from the database server. Elasticsearch, depending on how it is configured, provides the database redundancy.
To achieve full high availability, you can then install multiple main account servers behind a load balancer. The main account servers also use an NFS server to share the required files, and they communicate directly with the designated indexes in Elasticsearch.
Note
Once the main host servers are highly available, you can no longer host new accounts on those servers. Existing accounts on the main host will still exist, but will not be highly available. Therefore, Cortex XSOAR recommends that you move the accounts from the main host to an HA group.
In addition, Cortex XSOAR v6.2 and above enables you to create high availability groups (HA groups), which form a cluster of hosts that provide redundancy for all of the accounts on those hosts. The HA groups also use an NFS server to share the required files. The hosts communicate directly with the designated indexes in Elasticsearch.
HA groups provide your system with:
Redundancy for every account in the group
Redundancy for every host machine in the group
Performance improvements for every account in the group
Note
All hosts in the HA Group must have the same hardware specifications.
In the full high availability architecture, you can still have hosts that are not part of an HA group and work with Elasticsearch, and you can also maintain hosts that use the out-of-the-box configuration where the application server and database are on the same machine, using a Bolt/Bleve database.
High Availability and Remote Repositories
The remote repositories feature in the UI is not supported on development environments that run as high availability (multiple app servers). You can still use a development > staging > production setup, where development is a single server (not high availability), but production can be high availability. In this setup, both staging and production pull from the same Git repository. If your development environment runs as high availability, use the CI/CD Solution.
Load Balancing
You can install multiple application servers behind a load balancer. The load balancer must be configured to use sticky sessions and request timeouts. The load balancer should use a health check to verify that the App Server is up and running. Cortex XSOAR recommends you use the /health route and verify that you receive a 200 HTTP response.
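As an illustration of the health check described above, the following is a minimal Python sketch of the probe a load balancer or monitoring script might run against an app server. The function name, the base URL, and the timeout value are assumptions made for this example. Only the /health route and the expected 200 response come from the text.

```python
import urllib.error
import urllib.request


def is_app_server_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET <base_url>/health answers with HTTP 200.

    base_url is a hypothetical app server address, for example
    "https://xsoar-app-1.example.com".
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, TLS failures, and
        # 4xx/5xx responses (HTTPError is a subclass of URLError).
        return False
```

A load balancer would typically run an equivalent check at a fixed interval and stop routing traffic to any app server for which the check fails.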
Load balancing within HA groups for Multi-Tenant Deployment
Load balancing within HA groups uses an internal algorithm to distribute requests between all available hosts in the group. By default, the algorithm used is round robin, where requests are routed to available hosts on a cyclical basis.
You can set the load balancing algorithm with the ha.host.selection.alg server configuration. Possible values are:
round-robin - (Default) Requests are routed to available hosts on a cyclical basis.
random - Each host is randomly assigned the next request.
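To make the difference between the two values concrete, here is a small Python sketch of both selection strategies. It is an illustration only, not the internal Cortex XSOAR implementation. The function name make_selector and the host names are hypothetical.

```python
import itertools
import random
from typing import Callable, Sequence


def make_selector(hosts: Sequence[str], algorithm: str = "round-robin") -> Callable[[], str]:
    """Return a function that picks the host for the next request."""
    if not hosts:
        raise ValueError("at least one host is required")
    if algorithm == "round-robin":
        # Cycle through the hosts in order, wrapping around at the end.
        cycle = itertools.cycle(hosts)
        return lambda: next(cycle)
    if algorithm == "random":
        # Pick any host with equal probability for each request.
        return lambda: random.choice(hosts)
    raise ValueError(f"unsupported algorithm: {algorithm!r}")


# Hypothetical host names, used only for illustration.
pick = make_selector(["host-a", "host-b", "host-c"])
first_five = [pick() for _ in range(5)]
# first_five == ["host-a", "host-b", "host-c", "host-a", "host-b"]
```

Round robin spreads requests evenly and predictably, while random selection avoids keeping any shared position state at the cost of a less even short-term distribution.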
When a host is alive and reachable, the requested account needs to be alive and reachable as well for the request to be directed to it:
On the host level, as long as at least one host in an HA group is alive and reachable, the group and all of the accounts within it are fully available.
On the account level, as long as an account is alive and reachable on any of the hosts within that group, requests will be directed to it, offering full availability for that account.
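The two availability rules above can be expressed as a short Python sketch. The Host class and its fields are hypothetical data structures invented for this example, and they do not reflect how Cortex XSOAR tracks host or account state internally.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Host:
    """Hypothetical view of one host in an HA group."""
    name: str
    alive: bool = True
    # Accounts that are alive and reachable on this host.
    alive_accounts: Set[str] = field(default_factory=set)


def eligible_hosts(group: List[Host], account: str) -> List[Host]:
    """Hosts that can serve the account: host and account must both be alive."""
    return [h for h in group if h.alive and account in h.alive_accounts]


def is_account_available(group: List[Host], account: str) -> bool:
    """An account is available if at least one host in the group can serve it."""
    return bool(eligible_hosts(group, account))
```

In this sketch, a request for an account would be routed only among the hosts returned by eligible_hosts, using whichever selection algorithm is configured.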
Load balancing on the Main account level is handled by the external load balancer.
Engines and Load balancing
Before you install an engine, ensure that you set both the Base URL and the External Host Name to the IP address of the Load Balancer. ( → → ).
The Base URL enables engines to connect to the load balancer. Setting the External Host Name to the load balancer enables the engines to stay connected even if one of the app servers fails. You can also add or remove app servers while the engines remain connected.
Note
When you upgrade to v6.5 or later from a previous version, you need to either create a new engine or edit the engine configuration file.