Configure Federated Search to create external datasets for querying distributed data sources
Before you run federated searches, you must create an external dataset to query.
To define a new external dataset, go to → → → → and click Add External Dataset. You can also access the wizard through the Query builder page → → → .
Select the storage provider and follow the wizard, which takes you through the following steps:
Prerequisites: Perform preliminary steps on your remote storage, such as creating a policy and attaching it to a role.
Connection setup and dataset definition:
Configure the connection and trust relationship with the CSP.
Define the dataset name, description, path within the storage, region, and format.
Schema validation: Initiate the process that accesses the remote storage, pulls sample data, and deduces the schema. You can view the auto-detected schema and, if the fields aren't accurate, add or delete fields as needed.
Configuration review and dataset creation: Go over the details and create the dataset.
Add an external dataset for an Amazon S3 bucket.
Prerequisites
Network access for Cortex XSIAM communication. For a list of the authorized IP addresses, see Enable access to required PANW resources.
An Amazon S3 bucket that contains your data sources.
Permissions to modify IAM policies in AWS.
In AWS IAM, create a policy that allows access to your bucket.
Navigate to → → and select S3.
In Actions Allowed, select → .
In List, select ListBucket.
In Read, select GetObject.
In Resources, click ARN and enter your bucket name for both Bucket and Object. For Object, use an asterisk (*) and select Any object name.
Check the details and click Next.
Click Create Policy.
Your policy appears in the Policies table.
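If you prefer the policy editor's JSON tab, the selections above correspond roughly to the following policy. This is a minimal Python sketch that prints the JSON, assuming a hypothetical bucket named `my-federated-data`; the visual editor may group the statements differently, but the granted actions and resources are the same.

```python
import json


def build_s3_read_policy(bucket_name: str) -> dict:
    """Return an IAM policy granting list and read access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # GetObject applies to objects, hence the "/*" (any object name).
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "my-federated-data" is a hypothetical bucket name.
    print(json.dumps(build_s3_read_policy("my-federated-data"), indent=2))
```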
Create a role for the policy you created.
Navigate to → → .
In the Trusted entity type page, select Web identity.
In the Web identity page, under Identity provider, select Google, and under Audience, type 00000, and click Next.
This will later be replaced by the identity created by Cortex XSIAM.
Select your policy and click Next.
Type a name for your role and select Create Role.
Configure the connection.
In the Federated Search wizard, type the Role ARN from AWS.
Specify the bucket region. Supported regions are us-east-1, us-west-2, ap-northeast-2, ap-southeast-2, eu-west-1, eu-central-1.
Click Generate to create a new Identity for this connection and copy the generated Identity.
This is the identity provided by Cortex XSIAM to create a trust relationship with AWS.
In the AWS IAM console, add a trust relationship by adding the identity you generated above to the role and set a maximum session duration.
In AWS IAM, select Roles.
Select the role you created.
Click Edit and set Maximum session duration to 12 hours.
This configures the length of time the session lasts before requiring re-authentication.
Click Save changes.
Select Trust Relationships and click Edit policy.
Replace the value of accounts.google.com:aud with the identity you generated above. You can also replace the policy content with the provided code snippet.
Click Update policy.
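The trust policy that a Web identity role created with the Google provider starts with, and the edit described above, can be sketched as follows. This assumes the standard AWS shape of such a trust policy, with the `00000` placeholder from role creation held in the `accounts.google.com:aud` condition; it is not the snippet Cortex XSIAM provides, and the identity value in the example is hypothetical. It also records that IAM stores the maximum session duration in seconds, so 12 hours is 43200.

```python
import json

# IAM expresses MaxSessionDuration in seconds; 12 hours is the upper limit.
MAX_SESSION_SECONDS = 12 * 60 * 60


def build_trust_policy(generated_identity: str) -> dict:
    """Trust policy letting the Google web identity assume the role.

    At role creation the audience is the "00000" placeholder; pass the
    Identity generated in the Federated Search wizard to replace it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {"accounts.google.com:aud": generated_identity}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical identity; use the value copied from the wizard.
    print(json.dumps(build_trust_policy("123456789012345678901"), indent=2))
    print("MaxSessionDuration:", MAX_SESSION_SECONDS)
```

If you manage the role from the AWS CLI instead of the console, the same changes map to `aws iam update-assume-role-policy --role-name <role> --policy-document file://trust.json` and `aws iam update-role --role-name <role> --max-session-duration 43200`.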
Configure the dataset.
In the Federated Search wizard, type a meaningful dataset name and add an optional description. External dataset names must always begin with external_.
In S3 URI, enter the Amazon S3 path of the partition directory using the S3 format. To find the path, in the AWS bucket click the directory to display Object overview and copy the S3 URI. The path can only include letters, digits, and the symbols "-", "_", "=", and ".". For example, s3://bucket-name/table-name/. Don't use wildcards.
Your partitioned data must follow the Hive partitioning format, in which each partition directory is named as a key=value pair. Name your partition directories with the date in yyyy-mm-dd format, for example ds=2025-10-07. This creates external datasets based on your partitioned data source paths.
When you filter a query using the Time frame selection, the query uses the dates in the partition.
Specify the format. You must specify the correct file format so that the schema can be deduced correctly. Federated Search supports CSV, Parquet, and JSONL files. For best results, we recommend Parquet files with an explicit schema definition.
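The naming and path rules above can be checked locally before you run the wizard. The following is a hypothetical pre-flight sketch, not part of Cortex XSIAM: it validates the dataset name and S3 URI against the stated rules, parses `ds=yyyy-mm-dd` partition names, and illustrates how a Time frame selection can narrow a query to the matching partitions. How Federated Search prunes partitions internally isn't described here; the range filter only shows the idea.

```python
import re
from datetime import date
from typing import Iterable, List, Optional

# Allowed characters in each path segment, per the rules above.
_SEGMENT = re.compile(r"^[A-Za-z0-9_=.-]+$")
# Hive-style partition directory holding an ISO date, e.g. ds=2025-10-07.
_PARTITION = re.compile(r"^ds=(\d{4}-\d{2}-\d{2})$")


def is_valid_dataset_name(name: str) -> bool:
    """External dataset names must begin with external_."""
    return name.startswith("external_")


def is_valid_s3_uri(uri: str) -> bool:
    """Check the s3://bucket/path/ shape and the allowed characters."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        return False
    segments = [s for s in uri[len(prefix):].split("/") if s]
    # Wildcards such as "*" and spaces fail the character check.
    return bool(segments) and all(_SEGMENT.match(s) for s in segments)


def partition_date(dirname: str) -> Optional[date]:
    """Return the date encoded in a partition name, or None if it isn't one."""
    match = _PARTITION.match(dirname)
    if match is None:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:  # well-formed but impossible, e.g. ds=2025-13-40
        return None


def partitions_in_range(dirnames: Iterable[str], start: date, end: date) -> List[str]:
    """Keep only partitions whose ds date falls within [start, end]."""
    selected = []
    for name in dirnames:
        day = partition_date(name)
        if day is not None and start <= day <= end:
            selected.append(name)
    return selected
```

For example, `partitions_in_range(["ds=2025-10-06", "ds=2025-10-07", "tmp"], date(2025, 10, 7), date(2025, 10, 7))` returns `["ds=2025-10-07"]`.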
Test the connection.
If the connection was successful, click Next.
Validate the schema.
Cortex XSIAM displays the detected schema. This schema is based on the first 500 records in your data.
Note
We highly recommend that you don't change the auto-detected schema. However, you can add, edit or delete fields.
You can't delete the ds field, which is used for Hive partitioning.
After you save the schema, you can't delete any fields you added during setup.
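To get a feel for what sampling-based detection produces, the following sketch infers a field-to-type mapping from the first 500 records of a JSONL file. It is an illustration under simple assumptions (JSON types only, integer and float conflicts widened to float, any other conflict widened to string), not the algorithm Cortex XSIAM uses.

```python
import json
from itertools import islice
from typing import Dict, Iterable

SAMPLE_SIZE = 500  # detection is based on the first 500 records


def json_type(value) -> str:
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    return "json"  # nested objects and arrays


def infer_schema(lines: Iterable[str], sample_size: int = SAMPLE_SIZE) -> Dict[str, str]:
    """Infer a field -> type mapping from the first `sample_size` JSONL lines."""
    schema: Dict[str, str] = {}
    for line in islice(lines, sample_size):
        if not line.strip():
            continue  # tolerate blank lines
        for field, value in json.loads(line).items():
            new = json_type(value)
            old = schema.get(field)
            if old is None or old == "null":
                schema[field] = new
            elif new not in ("null", old):
                # Widen integer/float conflicts to float; anything else to string.
                schema[field] = "float" if {old, new} == {"integer", "float"} else "string"
    return schema
```

In a Hive-partitioned layout the ds column comes from the directory name rather than from the file contents, so a file-level sketch like this one never produces it.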
Review all the details. You can go back to change any details you want, save the query and return to the external datasets table, or save and start a query in the XQL query page. Saving and starting a query can take some time.
Add an external dataset for a Google Cloud Storage bucket.
Prerequisites
Network access for Cortex XSIAM communication. For a list of the authorized IP addresses, see Enable access to required PANW resources.
A GCS bucket that contains your data sources.
Permissions to modify IAM policies in your Google Cloud project.
Configure the connection.
Specify the bucket region. Federated Search supports the following regions: africa-south1, asia-east1, asia-east2, asia-northeast1, asia-northeast2, asia-northeast3, asia-south1, asia-south2, asia-southeast1, asia-southeast2, australia-southeast1, australia-southeast2, europe-central2, europe-north1, europe-north2, europe-southwest1, europe-west1, europe-west10, europe-west12, europe-west2, europe-west3, europe-west4, europe-west6, europe-west8, europe-west9, me-central1, me-central2, me-west1, northamerica-northeast1, northamerica-northeast2, northamerica-south1, southamerica-east1, southamerica-west1, us-central1, us-east1, us-east4, us-east5, us-south1, us-west1, us-west2, us-west3, us-west4
Note
You can only configure regions that are in the multi-region of your tenant.
Click Generate to create a new Identity for this connection and copy the generated Identity.
This is the service account that will allow read access to the GCS bucket.
Grant access to the connection.
In your Google Cloud project, navigate to IAM.
In → , click Grant access.
Under New principals, paste the Identity you generated in the Federated Search wizard.
Under Assign roles, select the role Storage Object Viewer.
Click Save.
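Grant access adds a role binding to the project's IAM policy. The sketch below shows that change on the JSON form of a policy (as returned, for example, by `gcloud projects get-iam-policy PROJECT_ID --format=json`). It assumes the generated Identity is a service account email, so it is prefixed with `serviceAccount:`; the policy and email in the example are hypothetical.

```python
import copy
import json

STORAGE_OBJECT_VIEWER = "roles/storage.objectViewer"


def add_object_viewer_binding(policy: dict, service_account_email: str) -> dict:
    """Return a copy of an IAM policy that grants the identity Storage Object Viewer."""
    updated = copy.deepcopy(policy)
    member = f"serviceAccount:{service_account_email}"
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == STORAGE_OBJECT_VIEWER and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            break
    else:
        bindings.append({"role": STORAGE_OBJECT_VIEWER, "members": [member]})
    return updated


if __name__ == "__main__":
    # Hypothetical policy and service account, for illustration only.
    example = {"bindings": [{"role": "roles/viewer", "members": ["user:admin@example.com"]}]}
    sa = "fs-reader@example-project.iam.gserviceaccount.com"
    print(json.dumps(add_object_viewer_binding(example, sa), indent=2))
```

The equivalent one-step command is `gcloud projects add-iam-policy-binding PROJECT_ID --member=serviceAccount:IDENTITY --role=roles/storage.objectViewer`.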
Configure the dataset.
In the Federated Search wizard, type a meaningful dataset name and add an optional description. External dataset names must always begin with external_.
In GS URI, specify the partition directory. For example, gs://bucket-name/table-name/. Don't use wildcards.
Your partitioned data must follow the Hive partitioning format, in which each partition directory is named as a key=value pair. Name your partition directories with the date in yyyy-mm-dd format, for example ds=2025-10-07. This creates external datasets based on your partitioned data source paths.
When you filter a query using the Time frame selection, the query uses the dates in the partition.
Specify the format. You must specify the correct file format so that the schema can be deduced correctly. Federated Search supports CSV, Parquet, and JSONL files. For best results, we recommend Parquet files with an explicit schema definition.
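If your data isn't partitioned yet, the following sketch writes records into the expected Hive layout (one `ds=yyyy-mm-dd` directory per day) as JSONL files, ready to upload under the path you enter in GS URI. It assumes each record has a hypothetical `ts` field holding an ISO 8601 timestamp; the file name `part-00000.jsonl` is also an arbitrary choice.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def write_hive_partitioned_jsonl(records: Iterable[dict], root: str,
                                 date_field: str = "ts") -> List[Path]:
    """Write records to root/ds=YYYY-MM-DD/part-00000.jsonl, one JSON object per line."""
    by_day: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        # Assumes an ISO 8601 timestamp such as "2025-10-07T12:00:00Z";
        # its first 10 characters are the yyyy-mm-dd partition value.
        by_day[record[date_field][:10]].append(record)

    written = []
    for day, rows in sorted(by_day.items()):
        partition_dir = Path(root) / f"ds={day}"
        partition_dir.mkdir(parents=True, exist_ok=True)
        path = partition_dir / "part-00000.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row) + "\n")
        written.append(path)
    return written
```

After writing, upload the resulting ds= directories under your table path, for example with `gcloud storage cp --recursive`.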
Test the connection.
If the connection was successful, click Next.
Validate the schema.
Cortex XSIAM displays the detected schema. This schema is based on the first 500 records in your data.
Note
We highly recommend that you don't change the auto-detected schema. However, you can add, edit or delete fields.
You can't delete the ds field, which is used for Hive partitioning.
After you save the schema, you can't delete any fields you added during setup.
Review all the details. You can go back to change any details you want, save the query and return to the external datasets table, or save and start a query in the XQL query page. Saving and starting a query may take some time.
Add an external dataset for an Azure blob.
Prerequisites
Network access for Cortex XSIAM communication. For a list of the authorized IP addresses, see Enable access to required PANW resources.
An Azure Blob Storage container that contains your data sources.
Permissions to modify IAM policies in Azure Blob Storage.
Create a new app registration in your Azure tenant for Federated Search to use.
In your Azure tenant, navigate to → , and click New registration.
Fill in the following fields:
Name: Type a name
Supported account types: Accounts in this organizational directory only
Redirect URI: Leave blank for now.
Click Register.
Note
Copy the Directory (tenant) ID, the Application (client) ID, and the Object ID. You will use these in the connection step.
Configure the connection.
In the Federated Search wizard, paste the following values from Azure: Directory ID, Application ID, Object ID.
Specify the blob region. Federated Search supports only eastus2.
Note
You can only configure regions that are in the multi-region of your tenant.
Click Generate to create a new Identity for this connection and copy the generated Identity.
This is the identity used to establish the trust with Azure Storage.
Create credentials for the application.
In your Azure tenant, under → , select your application and click Add a certificate or secret.
Select Federated credentials and click Add credential.
In the Add a credential page, fill in the following values:
Federated credential scenario: Other issuer
Issuer: https://accounts.google.com
Type: Explicit subject identifier
Value: Identity you generated above in the Federated Search wizard.
Type a name and description, and click Add.
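The form above corresponds to a federated identity credential on the app registration. This sketch builds the equivalent JSON body, using the field names of the Microsoft Graph `federatedIdentityCredential` resource. The `api://AzureADTokenExchange` audience is the portal's default and is an assumption here, because the steps above don't set it; the credential name and identity in the example are hypothetical.

```python
import json


def build_federated_credential(name: str, generated_identity: str,
                               description: str = "") -> dict:
    """JSON body matching the 'Other issuer' federated credential form."""
    return {
        "name": name,
        "issuer": "https://accounts.google.com",
        # Explicit subject identifier: the Identity generated in the wizard.
        "subject": generated_identity,
        # Assumed default audience for workload identity federation.
        "audiences": ["api://AzureADTokenExchange"],
        "description": description,
    }


if __name__ == "__main__":
    # Hypothetical values; use your own name and the generated Identity.
    body = build_federated_credential("xsiam-federated-search", "123456789012345678901")
    print(json.dumps(body, indent=2))
```

With the Azure CLI, a file containing this body can be passed to `az ad app federated-credential create --id <application-id> --parameters credential.json`.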
Assign a role to the application.
In → , select your blob.
Select the container and in the left menu click Access Control (IAM).
Under Check Access, click Add role assignment.
In → , select Storage Blob Data Reader and click Next.
For the Assign access to field, select User, group, or service principal.
Click Select members, search for the name of your app registration. Select the app registration and click Select.
Click Review + assign to finalize.
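The role assignment above is scoped to a single container. This sketch builds that container's Azure resource ID and the equivalent Azure CLI command line, without running it. The subscription, resource group, storage account, container, and application ID values in the example are hypothetical.

```python
import shlex
from typing import List


def container_scope(subscription_id: str, resource_group: str,
                    storage_account: str, container: str) -> str:
    """Azure resource ID of a blob container, used as the role-assignment scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
        f"/blobServices/default/containers/{container}"
    )


def role_assignment_command(application_id: str, scope: str) -> List[str]:
    """argv for assigning Storage Blob Data Reader to the app registration."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", application_id,
        "--role", "Storage Blob Data Reader",
        "--scope", scope,
    ]


if __name__ == "__main__":
    scope = container_scope("00000000-0000-0000-0000-000000000000",
                            "rg-logs", "mystorage", "logs")
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(role_assignment_command("<application-id>", scope)))
```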
Configure the dataset.
In the Federated Search wizard, type a meaningful dataset name and add an optional description. External dataset names must always begin with external_.
In Container URL, specify the partition directory. For example, https://storage-account-name.blob.core.windows.net/container-name/table-name/. Don't use wildcards.
Your partitioned data must follow the Hive partitioning format, in which each partition directory is named as a key=value pair. Name your partition directories with the date in yyyy-mm-dd format, for example ds=2025-10-07. This creates external datasets based on your partitioned data source paths.
When you filter a query using the Time frame selection, the query uses the dates in the partition.
Specify the format. You must specify the correct file format so that the schema can be deduced correctly. Federated Search supports CSV, Parquet, and JSONL files. For best results, we recommend Parquet files with an explicit schema definition.
Test the connection.
If the connection was successful, click Next.
Validate the schema.
Cortex XSIAM displays the detected schema. This schema is based on the first 500 records in your data.
Note
We highly recommend that you don't change the auto-detected schema. However, you can add, edit or delete fields.
You can't delete the ds field, which is used for Hive partitioning.
After you save the schema, you can't delete any fields you added during setup.
Review all the details. You can go back to change any details you want, save the query and return to the external datasets table, or save and start a query in the XQL query page. Saving and starting a query can take some time.