Broker VM High Availability Cluster

Cortex Cloud Runtime Security Documentation

Product
Cortex Cloud Application Security > Cortex CLOUD
License
Cloud Runtime Security
Creation date
2024-12-24
Last date published
2026-06-10
Category
Administrator Guide
Abstract

Learn more about creating Broker VMs in a High Availability Cluster

High availability (HA) is a deployment in which at least two Broker VMs are placed in a Broker VM cluster and their configuration is synchronized, which prevents a single point of failure on your network at the hardware and application levels. A heartbeat connection between the Broker VM nodes and the Cortex Cloud server ensures seamless failover if a node fails. Setting up an HA cluster provides redundancy and enables data collection continuity.

The Clusters tab on the Broker VMs page displays your cluster configurations, including the associated nodes, node statuses, configured applets, and applet statuses. You can add as many clusters as you want to a tenant, and each Cortex Cloud cluster can include as many nodes as you need. Cluster operation is fully managed from the tenant: you do not need to install additional components, and cluster nodes do not need to communicate with one another on the network.

In each cluster, one Broker VM is designated as the Primary cluster node, and the remaining nodes are designated as standby nodes. The cluster architecture depends on the type of applets configured in the cluster. Applets on cluster nodes run in either active/active mode or active/passive mode and exhibit different behaviors, as detailed in the table below.

In each cluster, whenever the Primary node fails, Cortex Cloud automatically switches to one of the standby nodes, initiates the applets on the new Primary node, and continues data collection on that node. Every failover attempt, whether successful or unsuccessful, generates an issue in the notification area and is logged in the Management Audit Logs table.
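
The failover behavior described above can be pictured with a small Python model. This is only an illustrative sketch, not Cortex Cloud code or its API: the `Cluster` class, the `fail_over` method, and the node names are hypothetical. The documentation does not state how the standby node is chosen or what role the failed node takes afterward, so the sketch assumes the first healthy standby is promoted and the failed node remains in the cluster as a standby.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Cluster:
    """Toy model of an HA cluster: one Primary node plus standby nodes."""
    primary: str
    standbys: List[str]
    audit_log: List[str] = field(default_factory=list)

    def fail_over(self, healthy: Set[str]) -> bool:
        """Try to move the Primary role to a healthy standby node.

        Every attempt, successful or not, is appended to audit_log,
        mirroring the statement that all failover attempts are logged.
        """
        # Assumption: promote the first healthy standby in list order.
        candidate = next((n for n in self.standbys if n in healthy), None)
        if candidate is None:
            self.audit_log.append(
                f"failover from {self.primary} failed: no healthy standby"
            )
            return False

        failed = self.primary
        self.standbys.remove(candidate)
        # Assumption: the failed node stays in the cluster as a standby.
        self.standbys.append(failed)
        self.primary = candidate
        self.audit_log.append(f"failover from {failed} to {candidate} succeeded")
        return True


cluster = Cluster(primary="broker-a", standbys=["broker-b", "broker-c"])
cluster.fail_over(healthy={"broker-c"})
print(cluster.primary)    # broker-c
print(cluster.standbys)   # ['broker-b', 'broker-a']
```

In the real product this logic runs on the tenant side, which is why the nodes in the model never talk to each other directly.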

The following conditions can trigger a failover for the Primary node:

  • Connectivity issues between a Primary node and the Cortex Cloud server

  • Application failure, such as an applet failing to start or crashing

  • Failure of any internal component, such as MariaDB, Redis, RabbitMQ, or the Docker engine

  • Hardware failure, including:

    • Running out of disk space

    • CPU usage of more than 95% for more than 10 minutes

    • Memory usage of more than 95% for more than 10 minutes
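
To make the trigger conditions above concrete, here is a minimal Python sketch of how such checks could be evaluated. It is not how Cortex Cloud implements them: `Sample`, `NodeHealth`, `sustained_breach`, and `failover_reasons` are hypothetical names, and the sketch assumes that "for more than 10 minutes" means every usage sample over a span longer than 10 minutes exceeded 95%. Only the 95% threshold and the 10-minute duration come from the list above.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

THRESHOLD_PERCENT = 95.0    # from the text: "more than 95%"
SUSTAIN_SECONDS = 10 * 60   # from the text: "more than 10 minutes"


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds since an arbitrary starting point
    percent: float    # resource usage, 0-100


def sustained_breach(samples: Iterable[Sample],
                     threshold: float = THRESHOLD_PERCENT,
                     duration: float = SUSTAIN_SECONDS) -> bool:
    """True if usage stayed above threshold for more than duration seconds."""
    breach_start: Optional[float] = None
    for s in sorted(samples, key=lambda s: s.timestamp):
        if s.percent > threshold:
            if breach_start is None:
                breach_start = s.timestamp
            if s.timestamp - breach_start > duration:
                return True
        else:
            breach_start = None  # usage dropped, so the streak restarts
    return False


@dataclass
class NodeHealth:
    """Hypothetical health snapshot of a Primary node."""
    disk_free_bytes: int
    server_reachable: bool = True
    failed_applets: List[str] = field(default_factory=list)
    failed_components: List[str] = field(default_factory=list)
    cpu_samples: List[Sample] = field(default_factory=list)
    memory_samples: List[Sample] = field(default_factory=list)


def failover_reasons(h: NodeHealth) -> List[str]:
    """Return every listed condition that the snapshot currently meets."""
    reasons = []
    if not h.server_reachable:
        reasons.append("connectivity")
    if h.failed_applets:
        reasons.append("application")
    if h.failed_components:
        reasons.append("internal component")
    if h.disk_free_bytes <= 0:
        reasons.append("disk space")
    if sustained_breach(h.cpu_samples):
        reasons.append("cpu")
    if sustained_breach(h.memory_samples):
        reasons.append("memory")
    return reasons


# One sample per minute at 97% CPU, from 0 s to 660 s (11 minutes).
cpu = [Sample(t, 97.0) for t in range(0, 661, 60)]
health = NodeHealth(disk_free_bytes=10**9, cpu_samples=cpu)
print(failover_reasons(health))   # ['cpu']
```

An empty result means none of the listed conditions holds, so no failover would be triggered in this model.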

At any time, you can initiate a manual switchover to transfer the Primary role to another node in the HA cluster, for example, to perform maintenance on the current Primary node.

You can configure automatic upgrades for the nodes in a Broker VM HA cluster. Automatic upgrades use a rolling upgrade mechanism, which updates the cluster nodes without noticeable downtime or other disruption of the HA cluster service. An automatic upgrade is performed in the following order:

  1. Standby nodes are upgraded one by one.

  2. The Primary node is switched over to one of the upgraded standby nodes.

  3. The previous Primary node, now a standby node, is upgraded.
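
The upgrade order above can be expressed as a short planning function. The Python sketch below only illustrates the ordering and is not part of the product: `rolling_upgrade_plan` and the node names are hypothetical. The documentation does not say which upgraded standby receives the Primary role, so the sketch assumes the first one in the list.

```python
from typing import List, Tuple


def rolling_upgrade_plan(primary: str,
                         standbys: List[str]) -> List[Tuple[str, str]]:
    """Return (action, node) steps in the documented rolling-upgrade order."""
    if not standbys:
        raise ValueError("a rolling upgrade needs at least one standby node")

    # Step 1: upgrade the standby nodes one by one.
    plan = [("upgrade", node) for node in standbys]

    # Step 2: switch the Primary role to an upgraded standby.
    # Assumption: the first upgraded standby is chosen.
    plan.append(("switchover", standbys[0]))

    # Step 3: upgrade the previous Primary, which is now a standby.
    plan.append(("upgrade", primary))
    return plan


for action, node in rolling_upgrade_plan("broker-a", ["broker-b", "broker-c"]):
    print(action, node)
# upgrade broker-b
# upgrade broker-c
# switchover broker-b
# upgrade broker-a
```

Because the Primary node is upgraded last and only after it has handed over its role, an already-upgraded node holds the Primary role at each point where a node is taken down for upgrade.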