High availability (HA) is a deployment in which at least two Broker VMs are placed in a Broker VM cluster and their configuration is synchronized to prevent a single point of failure on your network at the hardware and application levels. A heartbeat connection between the Broker VM nodes and the Cortex XDR Server ensures seamless failover if a node fails. Setting up an HA cluster provides redundancy and enables data collection continuity.
The Clusters tab on the Broker VMs page displays your cluster configurations, including the associated nodes, node statuses, configured applets, and applet statuses. You can add as many clusters as you want in a tenant, and each Cortex XDR cluster can include as many nodes as you need. The cluster operation is fully managed from the tenant, so there is no need to install additional components, and cluster nodes do not need to communicate with one another on the network.
In each cluster, one Broker VM is designated as the Primary cluster node and the rest of the nodes are designated as standby nodes. The cluster architecture depends on the type of applets configured in the cluster. Applets on cluster nodes run in either active/active mode or active/passive mode and behave differently in each mode, as detailed in the table below.
With Cortex XDR Prevent, an HA cluster is only relevant with a Local Agent Settings applet, because this is the only applet supported for this product license. The other applets are collector applets, which are only available in Cortex XDR Pro or Cortex XSIAM.
In each cluster, whenever the Primary node fails, Cortex XDR automatically switches to one of the standby nodes, initiates the applets on the new Primary node, and continues data collection on that node. Every failover attempt, whether successful or unsuccessful, displays an alert in the notification area and is logged in the Management Audit Logs table.
The following conditions can trigger a failover for the Primary node:
Connectivity issues between a Primary node and the Cortex XDR server.
Application failure, such as an applet failing to start or crashing.
Any failure of one of the internal components, such as MariaDB, Redis, RabbitMQ, or Docker engine.
Hardware failure, including:
Running out of disk space
CPU usage of more than 95% for more than 10 minutes
Memory usage of more than 95% for more than 10 minutes
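The trigger conditions above can be read as a simple health check. The following Python sketch is illustrative only and is not the Cortex XDR implementation: the `NodeHealth` class, its field names, and the `failover_reasons` function are hypothetical, and only the 95% threshold and the 10-minute duration come from the list above.

```python
from dataclasses import dataclass, field
from typing import List

# Values taken from the list above: usage above 95% for more than 10 minutes.
USAGE_THRESHOLD_PERCENT = 95.0
SUSTAINED_SECONDS = 10 * 60


@dataclass
class NodeHealth:
    """Hypothetical snapshot of a Primary node's health (names are assumptions)."""
    connected_to_server: bool = True
    failed_applets: List[str] = field(default_factory=list)
    # For example "MariaDB", "Redis", "RabbitMQ", or "Docker engine".
    failed_components: List[str] = field(default_factory=list)
    disk_free_bytes: int = 1
    # Continuous time, in seconds, that usage has stayed above the threshold.
    cpu_seconds_above_threshold: float = 0.0
    memory_seconds_above_threshold: float = 0.0


def failover_reasons(health: NodeHealth) -> List[str]:
    """Return every listed condition that the snapshot satisfies."""
    reasons = []
    if not health.connected_to_server:
        reasons.append("connectivity")
    if health.failed_applets:
        reasons.append("application")
    if health.failed_components:
        reasons.append("internal component")
    if health.disk_free_bytes <= 0:
        reasons.append("disk space")
    if health.cpu_seconds_above_threshold > SUSTAINED_SECONDS:
        reasons.append("cpu")
    if health.memory_seconds_above_threshold > SUSTAINED_SECONDS:
        reasons.append("memory")
    return reasons
```

In this sketch, a non-empty result means at least one condition for a failover is met. A CPU spike that lasts exactly 10 minutes does not qualify, because the list specifies more than 10 minutes.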
At any time, you can transfer the Primary role from the current Primary node to another node in the HA cluster by initiating a manual switchover, for example, to perform maintenance.
You can configure automatic upgrades for Broker VM HA cluster nodes. Automatic upgrades use a rolling upgrade mechanism to update the cluster nodes without noticeable downtime or other disruption of the HA cluster service. An automatic upgrade is performed in the following order:
Standby nodes are upgraded one by one.
The Primary node is switched over to one of the upgraded standby nodes.
The previous Primary node, now a standby node, is upgraded.
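The upgrade order above can be sketched as a small planning function. This is a minimal illustration, not the product's upgrade logic: the function name, the node names, and the `(action, node)` tuple format are hypothetical, and picking the first upgraded standby as the new Primary is an assumption, since the text only says "one of the upgraded standby nodes".

```python
from typing import List, Tuple


def rolling_upgrade_plan(primary: str, standbys: List[str]) -> List[Tuple[str, str]]:
    """Return the ordered (action, node) steps of a rolling upgrade."""
    if not standbys:
        # With no standby node there is nothing to switch over to.
        raise ValueError("a rolling upgrade needs at least one standby node")

    # Step 1: standby nodes are upgraded one by one.
    plan = [("upgrade", node) for node in standbys]

    # Step 2: the Primary role moves to an upgraded standby node.
    # Assumption: the first standby is chosen; the source does not say which one.
    plan.append(("switchover", standbys[0]))

    # Step 3: the previous Primary node, now a standby node, is upgraded.
    plan.append(("upgrade", primary))
    return plan


if __name__ == "__main__":
    for action, node in rolling_upgrade_plan("broker-1", ["broker-2", "broker-3"]):
        print(action, node)
```

Because the Primary node is upgraded last and only after the switchover, an already upgraded node holds the Primary role while the previous Primary node is being upgraded.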