Ensure reliable and continuous operation with High Availability.
High availability keeps your systems running even if one of your components fails. It provides redundancy for the different components, so if a problem occurs, it has a minimal effect on your system.
If you deploy a cluster of three nodes and configure access to Cortex XSOAR through either a virtual IP or the IP address of the reverse proxy/ingress controller, the system implements built-in high availability. This enables workload distribution and data replication across the nodes, and continuous operation if one node fails.
Note
Kubernetes requires a majority of control plane nodes to be online in order to function, so a three-node cluster requires at least two nodes to be online. If two nodes fail but are repaired and come back online, the cluster recovers. However, if two nodes fail and cannot be brought back online, open a support session for assistance.
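The majority rule in this note can be worked through with a little arithmetic. The following is a minimal sketch, not part of Cortex XSOAR or Kubernetes; the function names `quorum`, `tolerated_failures`, and `is_functional` are hypothetical and exist only for illustration.

```python
def quorum(control_plane_nodes: int) -> int:
    """Return the smallest node count that forms a strict majority."""
    if control_plane_nodes < 1:
        raise ValueError("a cluster needs at least one control plane node")
    return control_plane_nodes // 2 + 1


def tolerated_failures(control_plane_nodes: int) -> int:
    """Return how many nodes can be offline while a majority remains."""
    return control_plane_nodes - quorum(control_plane_nodes)


def is_functional(total_nodes: int, online_nodes: int) -> bool:
    """Return True if the online nodes still form a majority."""
    return online_nodes >= quorum(total_nodes)


# Hypothetical three-node cluster, matching the note above.
print(quorum(3))              # majority needed out of three nodes
print(tolerated_failures(3))  # nodes that can fail without losing the majority
print(is_functional(3, 1))    # two nodes down: no majority
```

For three nodes the majority is two, so the cluster tolerates one failed node. Losing two nodes leaves a single node online, which is below the majority.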
Built-in High Availability
Built-in High Availability works as follows:
Tasks and data are distributed across the nodes to balance the load.
Data is replicated across nodes, ensuring no single point of failure.
If a node goes down, workloads on the failed node are automatically distributed to the other nodes.
Note
There may be several minutes of downtime until the other nodes take over.
Once the failed node is restored, it automatically reintegrates into the cluster and the workloads are automatically rescheduled.
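The failover behavior described above can be pictured with a small simulation. This is an illustrative sketch only: it does not reflect how the Kubernetes scheduler actually places workloads, and the node names, workload names, and the least-loaded placement rule in `redistribute` are all assumptions made for the example.

```python
def redistribute(assignments: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Move the failed node's workloads onto the surviving nodes.

    Assumed placement rule (hypothetical): each workload goes to the
    surviving node that currently holds the fewest workloads, with ties
    broken by node name.
    """
    survivors = {node: list(work) for node, work in assignments.items() if node != failed}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the workloads")
    for workload in assignments.get(failed, []):
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(workload)
    return survivors


# Hypothetical three-node cluster with two workloads per node.
cluster = {
    "node-1": ["a", "b"],
    "node-2": ["c", "d"],
    "node-3": ["e", "f"],
}
after_failure = redistribute(cluster, failed="node-3")
print(after_failure)
```

In this sketch the two workloads from the failed node end up split between the two surviving nodes, so every workload keeps running on some node.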
For more information on setting up built-in High Availability by deploying a three-node cluster for your specific deployment, see Cortex XSOAR Installation.
Backup and restore between primary and secondary data centers
Once you deploy your cluster, you can use backup and restore operations for disaster recovery.
Important
To ensure seamless restoration, the restore environment must run the same Cortex XSOAR version and have the same resources as the original environment (the clusters must be the same).
If you periodically back up the cluster to external storage, you can restore it from that storage if the original cluster becomes unavailable. For more information, see Back up data.
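The requirement that the restore environment match the original can be checked before a restore is attempted. The following is a minimal sketch under stated assumptions: the `ClusterSpec` fields and the `restore_mismatches` helper are hypothetical, the values are placeholders, and nothing here is a Cortex XSOAR API. It only illustrates comparing two environment descriptions field by field.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of an environment; field names are assumptions."""
    xsoar_version: str
    node_count: int
    cpu_cores_per_node: int
    memory_gb_per_node: int
    disk_gb_per_node: int


def restore_mismatches(original: ClusterSpec, target: ClusterSpec) -> list[str]:
    """Return one message per field that differs between the two environments."""
    problems = []
    for field in fields(ClusterSpec):
        expected = getattr(original, field.name)
        actual = getattr(target, field.name)
        if expected != actual:
            problems.append(f"{field.name}: original={expected!r}, restore={actual!r}")
    return problems


# Placeholder values for illustration only.
primary = ClusterSpec("1.0.0", 3, 16, 64, 500)
secondary = ClusterSpec("1.1.0", 3, 16, 32, 500)

for problem in restore_mismatches(primary, secondary):
    print(problem)
```

An empty result means the two descriptions match on every listed field. Any reported mismatch should be resolved before restoring.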
Monitor and manage nodes
Once you set up and install your cluster, you can monitor node status and recover from node failure as needed.
In Cortex XSOAR, monitor the node health on the System Diagnostics page. For more information, see View system status in the System Diagnostics page.
If there is a node failure, manage the nodes from the textual UI.
For example, if a node fails, remove it and then add a new node to replace it. For more information, see Manage nodes in a cluster.
To replace a node in the cluster after completing the installation, you need to set the host again and reestablish trust between all the nodes.