Import historical data into cold storage - Administrator Guide - Cortex XSIAM - Cortex - Security Operations

Cortex XSIAM Documentation

Product
Cortex XSIAM
Creation date
2024-03-06
Last date published
2025-03-17
Category
Administrator Guide
Abstract

Learn more about importing historical data into cold storage.

Notice

Importing historical data into cold storage requires a Period-Based Retention - Cold Storage add-on license.

Danger

Importing historical data into cold storage requires a View/Edit RBAC permission for Data Management (under Configurations).

Importing historical data into Cortex XSIAM cold storage is a detailed process made up of different phases, as illustrated in the image below. This dedicated process is available for data migration to ensure secure, long-term storage. Some phases, such as the Data Extraction phase and the Data Preparation phase, are not performed in Cortex XSIAM; they are the customer’s responsibility to complete before the data is ready to be sent to Cortex XSIAM. It is critical that the data is sent in the recommended format so that you can access and search the data for analysis, compliance, and audits.

Each data source that is imported to Cortex XSIAM is available as a cold storage dataset and can be accessed using the Cortex Query Language (XQL). These datasets are a new type of archived dataset in cold storage and after the import are renamed using the format archive_<dataset name>.

The entire import process requires sufficient time and planning, as it takes time to extract data from your third-party sources, prepare the data according to your requirements, send the files to the HTTP collector, and import the data into cold storage. There are also limitations imposed by your retention licenses for cold and hot storage, and by how the HTTP collector is configured to work. We highly recommend that you carefully review the different phases explained below, so you can make the best decisions for accessing and analyzing your data from cold storage.

The import process requires initial planning and data preparation before the data is ready to import into Cortex XSIAM. This process can take time, so be sure to factor it into your import timeline.

The sections below explain the various phases in the import process along with the necessary preparation and prerequisite guidelines, so you understand what is required at each phase before your data is ready to be imported. Review these initial guidelines carefully, so that your data is imported successfully and meets your expectations when you choose to query it in cold storage.

These phases are listed in a suggested order for importing your data in cold storage. Some phases can overlap or are interchangeable according to your preferences. This is explained more in each phase.

[Image: the phases of the import process]
Data extraction

Reason to perform: Initial prerequisite task for you to import any data into Cortex XSIAM.

Where is this performed: You must perform this task on the third-party system that holds the data. This phase is not performed in Cortex XSIAM.

Description: To import data into Cortex XSIAM, you need to extract the data from the third-party system. It’s your responsibility to perform this task according to your vendor’s specifications. Since extracting data can involve multiple sources with different schemas (called datasets in Cortex XSIAM), you’ll need to bring each one into Cortex XSIAM individually.

Understand the process:

Typically there are two approaches to data extraction depending on the system:

  • Querying the system: Involves querying the system for the data and storing the output results.

  • Bulk extraction: In some systems, there are bulk extraction capabilities that you can leverage to extract the data.

Preparation and prerequisite guidelines:

  • How to decide which data extraction method to use? Both approaches require the data to be prepared before it can be sent to the HTTP collector. It’s important to carefully review the Data preparation phase to decide which of the two approaches to implement. Each extraction method involves tradeoffs.

    • Example 1: If you need to extract a lot of data, it could be that bulk extraction is the more suitable method. In contrast, if your data is relatively small, querying the system may make more sense.

    • Example 2: Bulk extraction might output the data faster, yet require more data engineering effort to prepare the data after it’s extracted. In this case, you may decide to go with querying the system, as it can be the more efficient option.

  • Understand all Cortex XSIAM data preparation requirements before you choose an extraction method, as they can influence your decision.

  • The Cortex XSIAM retention licenses for hot and cold storage limit the data that can be imported. Only data that falls within the combined hot and cold storage retention period can be sent, so make sure you only extract data from within this time window.
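As an illustration, the oldest data date still worth extracting could be estimated from your retention license. This is a minimal sketch, assuming 30-day months (consistent with the capacity formula later in this guide); the actual cutoff is enforced by Cortex XSIAM according to your license:

```python
from datetime import date, timedelta

def earliest_importable_date(hot_months: int, cold_months: int, today: date) -> date:
    """Estimate the oldest data date that still falls inside the
    combined hot + cold retention window (30-day months assumed)."""
    retention_days = (hot_months + cold_months) * 30
    return today - timedelta(days=retention_days)

# Example: 1 month of hot storage + 12 months of cold storage
print(earliest_importable_date(1, 12, date(2025, 1, 1)))  # 390 days earlier
```

Any records older than the returned date would fall outside the retention window, so filtering them out at extraction time avoids wasted preparation and upload effort.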

Data preparation

Reason to perform: Initial prerequisite task for you to import historical data into Cortex XSIAM.

Where is this performed: You must perform this task on all data extracted from the third-party systems in Phase 1. This phase is not performed in Cortex XSIAM.

Description: Prepare the extracted data into manageable chunks that can be sent to the HTTP collector and that meet the Cortex XSIAM requirements for sending data to the HTTP collector. This phase requires you to make decisions about your data by carefully reviewing the guidelines provided, so that you'll be able to access the data according to your requirements in cold storage.

Understand the output of the process:

  • Separate data according to the data source / schema (dataset).

  • For each separated source, separate by time periods (date).

  • For each separated time period, separate the data into files, where no file can be more than 25 MB.

  • Separate records with a new line.

  • Files sent to the HTTP collector must be uncompressed.
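The output requirements above (one dataset and date per batch, newline-separated records, uncompressed files under 25 MB) could be sketched as follows. This is a minimal illustration; the file-naming convention is purely hypothetical, and you would repeat the call per dataset and per date:

```python
import os

MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB per-file limit for the HTTP collector

def chunk_records(records, out_dir, dataset, day):
    """Write newline-separated records for one dataset/date into
    uncompressed files that never exceed 25 MB each.
    The file-naming scheme below is illustrative only."""
    os.makedirs(out_dir, exist_ok=True)
    paths, buf, size, idx = [], [], 0, 0

    def flush():
        nonlocal buf, size, idx
        path = os.path.join(out_dir, f"{dataset}_{day}_{idx:04d}.json")
        with open(path, "wb") as f:
            f.writelines(buf)
        paths.append(path)
        buf, size = [], 0
        idx += 1

    for rec in records:
        encoded = (rec.rstrip("\n") + "\n").encode("utf-8")  # one record per line
        if size + len(encoded) > MAX_FILE_BYTES and buf:
            flush()  # current file would exceed 25 MB; start a new one
        buf.append(encoded)
        size += len(encoded)
    if buf:
        flush()
    return paths
```

For example, `chunk_records(json_lines, "out/", "my_logs", "2024-01-01")` would return the list of file paths ready to be sent, one HTTP POST request per file.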

Preparation and prerequisite guidelines:

Before dividing up the data, consider the following points to determine how best to prepare the data before sending it to the HTTP collector:

  • Data format: The format of the data and how it is prepared affect your ability to query the data in Cortex XSIAM. You can send the data to cold storage in any format; however, querying some data formats from cold storage can be difficult. If you plan to query and analyze the data in XQL, we recommend that you transform the data to a JSON format and parse it. Otherwise, the data is stored as is in cold storage. Within the JSON file, you can store the raw log, the parsed data, or both. When JSON files are imported, they are parsed and brought into cold storage, so instead of a table with a single column, you'll have a table of n columns according to the dataset schema.

  • Raw format: If it's important for you to hold on to the original raw log, you can prepare the data to include the raw log in its original format. For data sent to cold storage, the original raw log is stored in a unique column called RAW_FORMAT. If you want to save on import capacity and don't want to send a lot of data, you don't have to include the raw format. This data is only for you to query from cold storage; it is not used for any other purpose, such as detection or analysis.

  • File size: No file can be more than 25 MB.

  • Files sent to the HTTP collector must be uncompressed.
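A data-preparation step along these lines could turn each raw log line into a parsed JSON record that also preserves the original line. This is a hedged sketch: the raw log shape and the parsed field names (`timestamp`, `severity`, `message`, `raw_log`) are assumptions for illustration, not a required schema:

```python
import json

def to_json_record(raw_line: str) -> str:
    """Parse a raw log line into a JSON record while keeping the
    original line, so both the parsed fields and the raw log survive
    the import. Field names here are illustrative only."""
    # Assumed raw shape: "<timestamp> <severity> <message>"
    ts, severity, message = raw_line.split(" ", 2)
    return json.dumps({
        "timestamp": ts,
        "severity": severity,
        "message": message,
        "raw_log": raw_line,  # keep the original raw log alongside the parsed fields
    })

print(to_json_record("2024-03-06T10:00:00Z INFO user login succeeded"))
```

Emitting one such JSON object per line produces the newline-separated, parseable records recommended above; dropping the `raw_log` field is the trade-off described in the Raw format guideline when you want to save on import capacity.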

HTTP POST requests preparation

Reason to perform: Prerequisite task for you to import data into Cortex XSIAM.

Where is this performed: You must create the HTTP POST requests outside Cortex XSIAM.

Description: Prepare an HTTP POST request for each file that needs to be sent (as explained in Phase 2). Each request sent to the HTTP collector must contain the appropriate headers for the dataset and date as well as conform to a specific format. You will need to use the API URL and generated key that is available when you enable the HTTP collector connection (as explained in Phase 4).

Understand the process:

For examples of how to prepare the HTTP POST requests, see Task 2. Send data to your Cortex XSIAM HTTP collector. Note that you can only retrieve all the information necessary to complete your HTTP POST requests after you enable the HTTP collector connection, as explained in Phase 4.

Preparation and prerequisite guidelines:

  • When sending files to the HTTP collector, the HTTP request must include the following headers to store the data correctly:

    • Dataset name: A valid dataset name is alphanumeric with an option to use underscores (_) to concatenate multiple names using the format <dataset name1>_<dataset name2>_.... This is added in the header using the x-cortex-source-dataset parameter.

    • Date: Use the format YYYY-MM-DD. This is added in the header using the x-cortex-partition parameter.

  • What determines whether data is found when you query it? These headers determine how the data is accessible when querying cold storage. In particular, the date in the header of the HTTP request is the date assigned to the data in cold storage. For example, if you put the date on which you sent the data in the header, but the data actually belongs to a different day, you won't find the data when you search cold storage by its actual date.

  • How is the data sent? The headers are critical for sending data to cold storage, and the alignment of the data with its headers is important. As long as the headers in an HTTP POST request are valid, the HTTP collector accepts the data, so you need to ensure the data itself is set up correctly. For example, if you send different data with the same headers, or data from the same source with the same date, you'll end up with data of different schemas, potentially in different formats, for the same day, and the output will be difficult to understand when you query the data in XQL.
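The header rules above could be enforced before sending, along these lines. The `x-cortex-source-dataset` and `x-cortex-partition` header names come from this guide; the exact dataset-name validation and the placement of the generated key in an `Authorization` header are assumptions for illustration, so check the request examples in Task 2 for the authoritative format:

```python
import re

def build_headers(dataset: str, day: str, api_key: str) -> dict:
    """Build and validate the per-file headers for the HTTP collector."""
    # Assumption: alphanumeric names, optionally concatenated with underscores
    if not re.fullmatch(r"[A-Za-z0-9]+(_[A-Za-z0-9]+)*", dataset):
        raise ValueError("dataset name must be alphanumeric, with optional "
                         "underscores concatenating multiple names")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", day):
        raise ValueError("date must use the YYYY-MM-DD format")
    return {
        "x-cortex-source-dataset": dataset,   # dataset name header from this guide
        "x-cortex-partition": day,            # date header from this guide
        "Authorization": api_key,             # assumption: key placement may differ
    }
```

Validating the dataset name and date once per file, before any request goes out, helps avoid the misalignment problem described above, where valid-looking headers silently attach data to the wrong dataset or day.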

Enable the HTTP collector connection

Reason to perform: Prerequisite task for you to perform in Cortex XSIAM to retrieve the required HTTP collector settings to define in the HTTP POST requests (as explained in Phase 3), and send these requests (as explained in Phase 5).

Where is this performed: You must perform this task in Cortex XSIAM to prepare the HTTP POST requests and be able to send the requests to the HTTP collector (as explained in Phase 5).

Description: Data is sent to Cortex XSIAM in an HTTP request using the dedicated API of the HTTP collector when the HTTP connection is enabled and a key is generated. You'll need to use the API URL and the generated key to prepare your HTTP POST requests.

Understand the process:

See Task 1. Enable the HTTP collector connection in Cortex XSIAM.

Once the HTTP collector connection is enabled, you can finish defining the HTTP POST requests as explained in Task 2. Send data to your Cortex XSIAM HTTP collector.

Preparation and prerequisite guidelines:

The HTTP collector connection is automatically disabled if not used for 14 days. If disabled, you will need to generate a new key for your HTTP POST requests.

Send data to HTTP collector through the HTTP requests

Reason to perform: Prerequisite task for you to import data into Cortex XSIAM.

Where is this performed: You must send all the HTTP POST requests that you've created in Phase 3.

Description: Send the files through the HTTP POST requests to the HTTP collector. Every POST request is answered with a notification indicating whether the request to upload the data was successful. If there is an issue, the request can fail, and different error messages are provided to help you troubleshoot.

Understand the process:

For details on how to send the HTTP POST requests, see Task 2. Send data to your Cortex XSIAM HTTP collector.

Preparation and prerequisite guidelines:

  • Files uploaded by the HTTP collector must be uncompressed.

  • Maximum file size limit is 25 MB for uploading data by the HTTP collector.

  • HTTP request headers must include the dataset name and date as explained in the step above.

  • Daily upload limit of 100,000 files sent to the HTTP Collector.

  • Total daily upload capacity in GB units for sending files to the HTTP collector is based on the formula: 100 * (Daily GB License).

  • Total upload capacity in GB units for sending files to the HTTP Collector is related to the retention license using the formula: (# of months of hot + cold storage) * 30 * (Daily ingest limit), where 30 represents the number of days in a month.
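The two capacity formulas above are straightforward arithmetic and could be worked out like this (a minimal sketch; the example license figures are hypothetical):

```python
def daily_upload_capacity_gb(daily_gb_license: float) -> float:
    """Total daily upload capacity: 100 * (Daily GB License)."""
    return 100 * daily_gb_license

def total_upload_capacity_gb(hot_months: int, cold_months: int,
                             daily_ingest_limit_gb: float) -> float:
    """Total upload capacity: (# months of hot + cold storage) * 30 *
    (Daily ingest limit), with 30 days per month."""
    return (hot_months + cold_months) * 30 * daily_ingest_limit_gb

# Example: a 10 GB/day license with 1 month hot + 12 months cold retention
print(daily_upload_capacity_gb(10))         # 1000 GB per day
print(total_upload_capacity_gb(1, 12, 10))  # 3900 GB total
```

In this example, the total import would be capped at 3,900 GB overall and 1,000 GB on any single day, which is the kind of estimate to feed into your import timeline planning.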

Import the files in Cortex XSIAM

Reason to perform: Final task performed in Cortex XSIAM.

Where is this performed: You must perform this task in Cortex XSIAM.

Description: After you validate that all the files sent to Cortex XSIAM are listed in the Remote Files table in the Archived Data page with a Remote status, you can now import the dataset files into cold storage. Once imported, the files are no longer available in the Remote Files table. They are now accessible in cold storage as archived datasets and are listed in the Dataset Management page.

Understand the process:

See Task 3. Import the dataset files to cold storage.

Preparation and prerequisite guidelines:

  • You can import multiple remote files at once, which can take time (up to several days) to complete. A notification is sent once the import is completed.

Perform the following procedures in the order listed below.

The Remote Files table on the Archived Data page enables you to keep track of data that you're in the process of uploading for import to cold storage, as well as data that is ready to import. As soon as any files are received by the HTTP collector through the HTTP POST requests, the file contents are displayed in the Remote Files table. The data is organized by dataset; for each dataset you can see the aggregated number of folders/files, the total folder size (when calculated), and the status of the files. When you select a dataset name, the folders indicate the different dates of the data as sent in the HTTP request header. For each folder, the same details are listed: the aggregated number of folders/files, the total folder size (when calculated), and the status of the files.

You can select different datasets and folders to import to cold storage. It can take time for all the files to be sent to the HTTP collector and then imported, as this depends on several factors, such as the number of files, the daily upload limit of files sent to the HTTP collector, and the total daily upload capacity of the HTTP collector. In some cases, it can take several days to complete. Use the statuses to help you monitor your data.

You can pivot (right-click) any dataset or folder listed in the Remote Files table to import it, delete it, or calculate the folder size. You can calculate the folder size of a dataset or folder when its status is either Remote or Partially Imported.