Unified, cost-effective data querying against distributed, non-ingested data sources
Federated Search is a query mechanism designed to provide unified access to distributed data sources without requiring pre-ingestion or centralization. This capability enables you to query data in place, significantly reducing the complexity and operational costs associated with the ingestion process and long-term data retention.
Notice
Federated Search is not enabled by default. To enable it in your tenant, please contact your Customer Support Team.
Modern enterprises store massive volumes of data across multiple cloud providers and hybrid environments. Centralized data ingestion and warehousing may be insufficient or expensive for cold or regulatory-mandated data. Federated Search allows you to:
Decouple data management from data analytics for cost optimization.
Maintain economical solutions for long-term data storage.
Perform on-demand incident response or compliance audits against existing long-term storage solutions without the overhead of ingestion.
The main use cases for Federated Search include:
Incident Investigation: Querying events that occurred a long time ago, where the data might not have been ingested into Cortex XSIAM.
Compliance audits: Accessing historical data needed for audits without the need for extensive ingestion.
Long-term data storage: Providing an integrated solution for retaining data for many years.
Data linking: Joining external datasets with ingested datasets for comprehensive and unified data analysis.
You can keep non-critical, high-volume data types in their native storage locations and still query them using XQL (Extended Query Language). This gives you visibility into a broader range of data while preserving the core value of deep analytics on ingested data.
Note
Federated Search query costs are calculated according to timeframe, complexity, and any cross-cloud egress costs that may apply.
Supported configurations
Federated Search supports the following configurations.
| Property | Configuration |
|---|---|
| Storage solutions | Amazon Web Services (AWS) S3, Google Cloud Storage (GCS), Azure Blob Storage |
| Formats | CSV, Parquet, JSONL. Note: For optimal results, we recommend the Parquet format. |
| Partitioning/File Structure | Your data must be partitioned and must follow the Hive partitioning format, which uses key-value pairs. Partitions must be named in the yyyy-mm-dd format (for example, ds=2025-10-07). |
| Supported Regions | AWS S3: us-east-1, us-west-2, ap-northeast-2, ap-southeast-2, eu-west-1, eu-central-1. GCS: africa-south1, asia-east1, asia-east2, asia-northeast1, asia-northeast2, asia-northeast3, asia-south1, asia-south2, asia-southeast1, asia-southeast2, australia-southeast1, australia-southeast2, europe-central2, europe-north1, europe-north2, europe-southwest1, europe-west1, europe-west10, europe-west12, europe-west2, europe-west3, europe-west4, europe-west6, europe-west8, europe-west9, me-central1, me-central2, me-west1, northamerica-northeast1, northamerica-northeast2, northamerica-south1, southamerica-east1, southamerica-west1, us-central1, us-east1, us-east4, us-east5, us-south1, us-west1, us-west2, us-west3, us-west4. Azure Blob Storage: eastus2. Note: The list of supported regions may change in the future. |
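As a minimal sketch of the partitioning requirement in the table, the following Python helpers build and check an object key that contains a Hive-style date partition. The `ds` partition key comes from the example in the table. The `logs/firewall` prefix, the file name, and both function names are hypothetical. The check covers only the date partition segment, not any other requirement the product may enforce.

```python
import re
from datetime import date

# Matches one Hive-style partition segment such as "ds=2025-10-07".
# The key name is captured; the value must have the yyyy-mm-dd shape.
_PARTITION_RE = re.compile(
    r"^(?P<key>[A-Za-z_][A-Za-z0-9_]*)=(?P<value>\d{4}-\d{2}-\d{2})$"
)


def partition_key(prefix: str, day: date, filename: str, key: str = "ds") -> str:
    """Build an object key like '<prefix>/ds=2025-10-07/<filename>'."""
    return f"{prefix.strip('/')}/{key}={day.isoformat()}/{filename}"


def has_valid_date_partition(object_key: str) -> bool:
    """Return True if any path segment is a key=yyyy-mm-dd pair with a real date."""
    for segment in object_key.split("/"):
        match = _PARTITION_RE.match(segment)
        if match is None:
            continue
        try:
            date.fromisoformat(match.group("value"))
        except ValueError:
            continue  # right shape, but not a calendar date (e.g. 2025-13-40)
        return True
    return False
```

For example, `partition_key("logs/firewall", date(2025, 10, 7), "part-0000.parquet")` produces a key with a `ds=2025-10-07` segment, which `has_valid_date_partition` accepts. A layout such as `logs/firewall/2025/10/07/...` has no key-value pair and is rejected.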
Limitations
The following limitations apply to Federated Search:
| Limitation | Description |
|---|---|
| Regions | If your tenant is on a single-region server (not a multi-region server), the bucket must be in the same region as your tenant. If your tenant is on a multi-region server, the bucket must be in a region that belongs to the same multi-region as your Cortex tenant. For example, if your Cortex XSIAM tenant is located in the US multi-region, you can configure an external dataset only from regions in the US multi-region. |
| Queries | The following functions are not available in Federated Search and remain exclusive to fully ingested data: |
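The region rule in the Limitations table can be expressed as a small pre-flight check. This is a sketch under stated assumptions, not product behavior: the function name, its parameters, and the `EXAMPLE_MULTI_REGIONS` map are hypothetical, and the map is illustrative only. Use the multi-region membership that applies to your own tenant.

```python
from typing import Mapping, Optional, Set


def bucket_region_allowed(
    bucket_region: str,
    tenant_region: Optional[str] = None,
    tenant_multi_region: Optional[str] = None,
    multi_region_members: Optional[Mapping[str, Set[str]]] = None,
) -> bool:
    """Apply the region rule described in the Limitations table.

    - Single-region tenant: the bucket must be in the tenant's region.
    - Multi-region tenant: the bucket must be in a region that belongs
      to the tenant's multi-region.
    """
    if tenant_multi_region is not None:
        members = (multi_region_members or {}).get(tenant_multi_region, set())
        return bucket_region in members
    return tenant_region is not None and bucket_region == tenant_region


# Hypothetical membership map, for illustration only; not an official list.
EXAMPLE_MULTI_REGIONS = {"US": {"us-central1", "us-east1", "us-west1"}}
```

With this map, a tenant in the `US` multi-region passes the check for a bucket in `us-east1` and fails it for a bucket in `europe-west1`. A single-region tenant passes only when the bucket region equals the tenant region.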