Ensure reliable and continuous operation with High Availability.
High availability (HA) keeps your systems running even if one of the components in the system fails. It provides redundancy for the different components so that if a problem occurs, the effect on your system is minimal.
If you deploy a three-node cluster and set the Cortex XSOAR IP address access to either a virtual IP or the IP of a reverse proxy/ingress controller, the system implements built-in High Availability (HA). Workloads are distributed and data is replicated across the nodes, and the system continues to operate if a node fails.
Once you deploy your cluster, you can deploy a second cluster in a secondary data center to enable High Availability and disaster recovery functionality using backup and restore operations.
Note
Kubernetes requires a majority of control plane nodes to be online to function, so a three-node cluster requires two nodes to be online. If two nodes fail but are fixed and come back online, the cluster recovers. However, if two nodes fail and cannot be brought back online, open a support session for assistance.
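The majority rule in the note above can be expressed as simple arithmetic. The following is a minimal sketch, not part of Cortex XSOAR or Kubernetes: the function names `quorum`, `has_quorum`, and `tolerated_failures` are hypothetical and exist only to illustrate why a three-node cluster needs two nodes online.

```python
def quorum(total_nodes: int) -> int:
    """Return the smallest number of nodes that forms a majority."""
    if total_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, online_nodes: int) -> bool:
    """Return True if enough nodes are online to form a majority."""
    return online_nodes >= quorum(total_nodes)


def tolerated_failures(total_nodes: int) -> int:
    """Return how many nodes can be offline while a majority remains."""
    return total_nodes - quorum(total_nodes)


if __name__ == "__main__":
    # Illustrative check for the three-node cluster described above.
    print("majority needed:", quorum(3))
    print("failures tolerated:", tolerated_failures(3))
    print("still functional with 2 online:", has_quorum(3, 2))
    print("still functional with 1 online:", has_quorum(3, 1))
```

Under this rule, a three-node cluster tolerates the loss of one node; losing a second node leaves the cluster without a majority until at least one failed node returns.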
Built-in High Availability
Built-in High Availability works as follows:
Tasks and data are distributed across the nodes to balance the load.
Data is replicated across nodes, ensuring no single point of failure.
If a node goes down, workloads on the failed node are automatically distributed to the other nodes.
Note
There may be several minutes of downtime until the other nodes take over.
Once the failed node is restored, it automatically reintegrates into the cluster and the workloads are automatically rescheduled.
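The failover behavior described above can be pictured with a toy model. This is a conceptual sketch only: the real scheduling is handled by the platform, and the `redistribute` function, the node names, and the "least-loaded node first" policy are all assumptions made for illustration, not the product's actual algorithm.

```python
def redistribute(assignments: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Move the failed node's workloads to the least-loaded surviving nodes.

    `assignments` maps a node name to the workloads it runs. The input is
    not modified; a new mapping without the failed node is returned.
    """
    survivors = {
        node: list(workloads)
        for node, workloads in assignments.items()
        if node != failed
    }
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the workloads")

    for workload in assignments.get(failed, []):
        # Pick the survivor with the fewest workloads; break ties by name
        # so the result is deterministic.
        target = min(survivors, key=lambda node: (len(survivors[node]), node))
        survivors[target].append(workload)
    return survivors


if __name__ == "__main__":
    # Hypothetical three-node cluster with two workloads per node.
    cluster = {
        "node-1": ["a", "b"],
        "node-2": ["c", "d"],
        "node-3": ["e", "f"],
    }
    print(redistribute(cluster, "node-3"))
```

In this model, the two workloads from the failed node are split between the two remaining nodes, which mirrors the idea that the surviving nodes absorb the load until the failed node rejoins.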
For more information on setting up built-in High Availability for your specific deployment, see Cortex XSOAR Installation.
Monitor and manage nodes
Once you set up your cluster and install Cortex XSOAR, you can monitor node status and recover from node failure as needed.
In Cortex XSOAR, monitor the node health in the System Diagnostics page. For more information, see View system status in the System Diagnostics page.
If there is a node failure, manage the nodes from the textual UI.
For example, if a node fails, remove it and then add a new node to replace it. For more information, see Manage nodes in a cluster.
If you replace a node in the cluster after completing installation, you need to set the host again and reestablish trust between all the nodes.