Learn more about importing historical data into cold storage.
Notice
Importing historical data into cold storage requires a Period-Based Retention - Cold Storage add-on license.
Danger
Importing historical data into cold storage requires a View/Edit RBAC permission for Data Management (under Configurations).
Importing historical data into Cortex XSIAM cold storage is a detailed process made up of several phases, described below. This dedicated process is available for data migration to ensure secure, long-term storage. Some phases, such as the Data Extraction phase and Data Preparation phase, are not performed in Cortex XSIAM, and are the customer's responsibility to complete before the data is ready to be sent to Cortex XSIAM. It is critical that the data is sent in the recommended format so that you can access and search the data for analysis, compliance, and audits.
Each data source that is imported to Cortex XSIAM is available as a cold storage dataset and can be accessed using the Cortex Query Language (XQL). These datasets are a new type of archived dataset in cold storage and, after the import, are renamed using the format archive_<dataset name>.
The entire import process requires sufficient time and planning: it takes time to extract data from your third-party sources, prepare the data according to your requirements, send the files to the HTTP collector, and import the data into cold storage. There are also limitations imposed by your retention licenses for cold and hot storage and by how the HTTP collector is configured. We highly recommend that you carefully review the different phases explained below, so you can make the best decisions for accessing and analyzing your data from cold storage.
The import process requires initial planning and data preparation before the data is ready to import into Cortex XSIAM. This can take time, so make sure to factor it into your import timeline.
The sections below explain the various phases in the import process along with the necessary preparation and prerequisite guidelines, so you understand what is required at each phase before your data is ready to be imported. Review these initial guidelines carefully, so that your data is imported successfully and meets your expectations when you choose to query the data in cold storage.
These phases are listed in a suggested order for importing your data into cold storage. Some phases can overlap or be reordered according to your preferences. This is explained further in each phase.
Phase 1: Data extraction
Reason to perform: Initial prerequisite task for you to import any data into Cortex XSIAM.
Where is this performed: You must perform this task on the third-party system that holds the data. This phase is not performed in Cortex XSIAM.
Description: To import data into Cortex XSIAM, you need to extract the data from the third-party system. It's your responsibility to perform this task according to your vendor's specifications. Because extracting data can involve multiple sources with different schemas (called datasets in Cortex XSIAM), you'll need to bring each one into Cortex XSIAM individually.
Understand the process:
Typically, there are two approaches to data extraction, depending on the system (a query-based sketch follows this list):
Querying the system: Involves querying the system for the data and storing the output results.
Bulk extraction: In some systems, there are bulk extraction capabilities that you can leverage to extract the data.
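As a minimal sketch of the query-based approach, assuming a vendor SDK that can be queried by day (the client object and its query method below are hypothetical placeholders, not a real library):

# Hypothetical sketch: page through a source system in one-day windows
# and store the results locally, one record per line.
# client and client.query() stand in for your vendor's SDK.
from datetime import timedelta

def extract_by_day(client, dataset, start, end, out_dir):
    day = start                                   # start/end are datetime.date objects
    while day <= end:
        results = client.query(dataset=dataset, day=day.isoformat())  # vendor-specific call
        with open(f"{out_dir}/{dataset}_{day.isoformat()}.jsonl", "w") as f:
            for record in results:
                f.write(record + "\n")            # newline-separated records
        day += timedelta(days=1)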
Preparation and prerequisite guidelines:
How do you decide which data extraction method to use? Both approaches require the data to be prepared before it can be sent to the HTTP collector. It's important to carefully review the Data preparation phase to decide which of the two approaches to implement, because each extraction method involves tradeoffs.
Example 1: If you need to extract a lot of data, bulk extraction may be the more suitable method. In contrast, if your data is relatively small, querying the system may make more sense.
Example 2: Bulk extraction may output the data more quickly, yet require more data engineering effort to prepare the data after it's extracted. In this case, you may decide that querying the system is the more efficient option.
Understand all the Cortex XSIAM data preparation requirements, as they should influence your extraction decision.
The Cortex XSIAM retention licenses for hot and cold storage limit the data that can be imported. You must send data that falls within the license retention period for hot and cold storage, so make sure you extract only data within this time period.
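For illustration only, assuming a combined hot and cold retention of 12 months (substitute your actual license terms), the earliest importable date could be estimated as follows:

# Illustrative only: estimate the earliest date covered by an assumed
# 12-month hot + cold retention license (adjust to your actual license).
from datetime import date, timedelta

RETENTION_MONTHS = 12  # assumption, not a product default
earliest = date.today() - timedelta(days=RETENTION_MONTHS * 30)  # 30 days per month
print(f"Only extract data dated on or after {earliest.isoformat()}")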
Phase 2: Data preparation
Reason to perform: Initial prerequisite task for you to import historical data into Cortex XSIAM.
Where is this performed: You must perform this task on all data extracted from the third-party systems in Phase 1. This phase is not performed in Cortex XSIAM.
Description: Prepare the extracted data in manageable chunks that can be sent to the HTTP collector and that meet the Cortex XSIAM requirements for sending data to the HTTP collector. This phase requires you to make decisions about your data by carefully reviewing the guidelines provided, so that you'll be able to access the data according to your requirements in cold storage.
Understand the required output of this phase (a chunking sketch follows this list):
Separate data according to the data source / schema (dataset).
For each separated source, separate by time periods (date).
For each separated time period, separate the data into files, where no file can be more than 25 MB.
Separate records with a new line.
Files sent to the HTTP collector must be uncompressed.
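As a minimal sketch of this output, assuming the records for one dataset and one date are already grouped together (the directory layout and file names below are illustrative, not required):

# Minimal sketch: write newline-separated records into uncompressed
# files of at most 25 MB each, for one dataset and one date.
MAX_BYTES = 25 * 1024 * 1024  # 25 MB per-file limit

def chunk_records(records, dataset, day, out_dir):
    index, size, out = 0, 0, None
    for record in records:
        line = record + "\n"                      # one record per line
        encoded = len(line.encode("utf-8"))
        if out is None or size + encoded > MAX_BYTES:
            if out:
                out.close()
            index += 1
            out = open(f"{out_dir}/{dataset}_{day}_{index:04d}.json", "w")  # illustrative name
            size = 0
        out.write(line)
        size += encoded
    if out:
        out.close()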
Preparation and prerequisite guidelines:
Before dividing up the data, consider the following points to determine how best to prepare the data before sending it to the HTTP collector:
Data format: The format of the data and how the data is prepared impacts your ability to query the data in Cortex XSIAM. You can send the data to cold storage in any format, but some data formats are difficult to query from cold storage. If you plan to query and analyze the data in XQL, we recommend that you transform the data to JSON format and parse it; otherwise, the data is stored as is in cold storage. Within the JSON file, you can store the raw log and/or the parsed data. When JSON files are imported, they are parsed and brought into cold storage, so instead of a table with a single column, you'll have a table of n columns according to the dataset schema. (A record-format sketch follows this list.)
Raw format: If it's important for you to retain the original raw log, you can prepare the data to include it. The raw format of the raw log for data sent to cold storage is stored in a unique column called RAW_FORMAT. If you want to save on import capacity and don't want to send a lot of data, you don't have to include the raw format. This data is only available for you to query from cold storage; it is not used for any other purpose, such as detection or analysis.
File size: No file can be more than 25 MB.
Compression: Files sent to the HTTP collector must be uncompressed.
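As a sketch only, a prepared JSON record might carry parsed fields alongside the original line. The parsing below is a naive placeholder, and the field names are assumptions you would tailor to your data:

# Illustrative only: turn a raw log line into a one-line JSON record
# that keeps the original text. Field names and parsing are assumptions.
import json

def to_json_record(raw_line):
    parts = raw_line.split(" ", 2)               # naive parse; adapt to your format
    record = {
        "timestamp": parts[0],
        "host": parts[1] if len(parts) > 1 else None,
        "message": parts[2] if len(parts) > 2 else None,
        "raw": raw_line,                         # optional: omit to save import capacity
    }
    return json.dumps(record)                    # one JSON object per line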
Phase 3: Prepare the HTTP POST requests
Reason to perform: Prerequisite task for you to import data into Cortex XSIAM.
Where is this performed: You must create the HTTP POST requests, which is performed outside Cortex XSIAM.
Description: Prepare an HTTP POST request for each file that needs to be sent (as explained in Phase 2). Each request sent to the HTTP collector must contain the appropriate headers for the dataset and date, and must conform to a specific format. You will need to use the API URL and generated key that are available when you enable the HTTP collector connection (as explained in Phase 4).
Understand the process:
For examples of how to prepare the HTTP POST requests, see Task 2. Send data to your Cortex XSIAM HTTP collector. However, you can only retrieve all the information necessary to complete your HTTP POST requests after you enable the HTTP collector connection, as explained in Phase 4.
Preparation and prerequisite guidelines:
When sending files to the HTTP collector, the header of the HTTP request must include the following tags / headers to store the data correctly:
Dataset name: A valid dataset name is alphanumeric, with the option to use underscores (_) to concatenate multiple names using the format <dataset name1>_<dataset name2>_... This is added in the header using the x-cortex-source-dataset parameter.
Date: Use the format YYYY-MM-DD. This is added in the header using the x-cortex-partition parameter.
What determines whether data is found when you query it? These headers determine how the data is accessible when querying cold storage. For example, the date in the header of the HTTP request impacts how you can query the data: if you put the date you sent the data in the header, but the data actually belongs to a different day, you won't find it when you search cold storage. The date in the header is the date assigned to the data in cold storage.
How is the data sent? The headers are critical to sending data to cold storage, and aligning the data with the headers is important. As long as the headers in the HTTP POST request sent to the HTTP collector are valid, the data is accepted, so you need to ensure the data is set up correctly. For example, if you send different data with the same headers, or the same source with the same date, you'll end up with data of different schemas, potentially in different formats, for the same day, and the output will be difficult to understand when you query the data in XQL.
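As a sketch only, you might validate the two header values before sending, to catch the mistakes described above. The validation rules below come from the format requirements listed earlier:

# Illustrative pre-send check of the two required header values,
# based on the format rules above (alphanumeric names joined by
# underscores for the dataset, YYYY-MM-DD for the date).
import re
from datetime import datetime

def validate_headers(dataset_name, partition_date):
    if not re.fullmatch(r"[A-Za-z0-9]+(_[A-Za-z0-9]+)*", dataset_name):
        raise ValueError(f"invalid dataset name: {dataset_name}")
    datetime.strptime(partition_date, "%Y-%m-%d")  # raises if not YYYY-MM-DD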
Phase 4: Enable the HTTP collector connection
Reason to perform: Prerequisite task that you perform in Cortex XSIAM to retrieve the HTTP collector settings required to define the HTTP POST requests (as explained in Phase 3) and to send those requests (as explained in Phase 5).
Where is this performed: You must perform this task in Cortex XSIAM to prepare the HTTP POST requests and be able to send the requests to the HTTP collector (as explained in Phase 5).
Description: Data is sent to Cortex XSIAM in an HTTP request using the dedicated API of the HTTP collector when the HTTP connection is enabled and a key is generated. You'll need to use the API URL and the generated key to prepare your HTTP POST requests.
Understand the process:
See Task 1. Enable the HTTP collector connection in Cortex XSIAM.
Once the HTTP collector connection is enabled, you can finish defining the HTTP POST requests as explained in Task 2. Send data to your Cortex XSIAM HTTP collector.
Preparation and prerequisite guidelines:
The HTTP collector connection is automatically disabled if not used for 14 days. If disabled, you will need to generate a new key for your HTTP POST requests.
Phase 5: Send the HTTP POST requests to the HTTP collector
Reason to perform: Prerequisite task for you to import data into Cortex XSIAM.
Where is this performed: You must send all the HTTP POST requests that you created in Phase 3.
Description: Send the files through the HTTP POST requests to the HTTP collector. Every POST request receives a response indicating whether the request to upload the data was successful. If there is an issue, the request can fail, and different error messages are provided to help you troubleshoot.
Understand the process:
For details on how to send the HTTP POST requests, see Task 2. Send data to your Cortex XSIAM HTTP collector.
Preparation and prerequisite guidelines:
Files uploaded by the HTTP collector must be uncompressed.
Maximum file size limit is 25 MB for uploading data by the HTTP collector.
HTTP request headers must include the dataset name and date, as explained in Phase 3.
Daily upload limit of 100,000 files sent to the HTTP Collector.
Total daily upload capacity, in GB, for sending files to the HTTP collector is based on the formula 100 * (Daily GB License).
Total upload capacity, in GB, for sending files to the HTTP collector is related to the retention license using the formula (# of months of hot + cold storage) * 30 * (Daily ingest limit), where 30 represents the number of days in a month. (A worked example follows this list.)
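For illustration only, assuming a 10 GB daily ingest license and 12 months of combined hot and cold retention (both values are assumptions, not defaults):

# Worked example of the two capacity formulas above, with assumed
# license values (10 GB/day ingest, 12 months hot + cold retention).
daily_gb_license = 10
retention_months = 12

daily_capacity_gb = 100 * daily_gb_license                    # = 1000 GB per day
total_capacity_gb = retention_months * 30 * daily_gb_license  # = 3600 GB total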
Phase 6: Import the dataset files to cold storage
Reason to perform: Final task performed in Cortex XSIAM.
Where is this performed: You must perform this task in Cortex XSIAM.
Description: After you validate that all the files sent to Cortex XSIAM are listed in the Remote Files table in the Archived Data page with a Remote status, you can now import the dataset files into cold storage. Once imported, the files are no longer available in the Remote Files table. They are now accessible in cold storage as archived datasets and are listed in the Dataset Management page.
Understand the process:
See Task 3. Import the dataset files to cold storage.
Preparation and prerequisite guidelines:
You can import multiple remote files at once, which can take time, up to several days, to complete. A notification is sent once the import is completed.
Perform the following procedures in the order listed below.
Task 1. Enable the HTTP collector connection in Cortex XSIAM
Select Settings → Configurations → Data Management → Archived Data.
Open the HTTP collector settings by clicking HTTP Collector.
Select the Enable HTTP connection toggle.
Copy the API URL.
Click the copy icon beside the API URL displayed and record it somewhere safe. You'll use this URL when you configure your HTTP POST request to send your data to the HTTP collector.
Click Generate Key.
In the Generated Key dialog box, click the copy icon next to the key displayed and record the key somewhere safe. You will need to provide this key when you configure your HTTP POST request and define the Authorization key. If you forget to record the key and close the window, you will need to generate a new key and repeat this process. The HTTP connection is only established after a key is generated.
Click Close when finished.
Task 2. Send data to your Cortex XSIAM HTTP collector
Send an HTTP POST request to the URL for your HTTP collector.
Here is a CURL example:
curl -X POST "https://api-{tenant external URL}/logs/v1/bulk_load" \
  -H "Authorization: {generated_key}" \
  -H "x-cortex-partition: {partition_date_of_the_data}" \
  -H "x-cortex-source-dataset: {dataset_name}" \
  -H "Content-Type: application/json" \
  -d '{"example1": "test", "timestamp": 1609100113039}
{"example2": [12321,546456,45687,1]}'
Python 3 example:
import requests

def test_http_collector(api_url, generated_key, partition_date_of_the_data, dataset_name):
    # api_url format: https://api-{tenant external URL}/logs/v1/bulk_load
    headers = {
        "Authorization": generated_key,
        "x-cortex-partition": partition_date_of_the_data,
        "x-cortex-source-dataset": dataset_name,
        "Content-Type": "application/json",
    }
    # Note: the records must be separated by a new line
    body = '{"example1": "test", "timestamp": 1609100113039}\n' \
           '{"example2": [12321,546456,45687,1]}'
    res = requests.post(url=api_url, headers=headers, data=body)
    return res
Substitute the values specific to your configuration (a usage sketch follows this list).
API URL: Paste the API URL that you copied when you enabled the HTTP collector from the Archived Data page. The format of the URL is https://api-{tenant external URL}/logs/v1/bulk_load.
Authorization: Paste the generated key you previously recorded when enabling the HTTP collector. This is defined in the header.
x-cortex-partition: Enter the name of the file/folder containing the data from the log source / schema that you want to send to the HTTP collector. The file/folder name must be a date in the format YYYY-MM-DD. This is defined as part of the header.
x-cortex-source-dataset: Enter the name of the dataset for the data you want to send to the HTTP collector. A valid dataset name is alphanumeric, with the option to use underscores (_) to concatenate multiple names using the format <dataset name1>_<dataset name2>_... This is defined as part of the header.
Content-Type: This setting depends on the data object format of your files. For example, use application/json for JSON format or text/plain for Text format. This is defined as part of the header.
Body: The body contains the records you want to send to Cortex XSIAM. Separate records with a \n (new line) delimiter. The request body can contain up to 25 MB of records and must be uncompressed. In a cURL command, the records are contained in the -d '<records>' parameter.
Review the possible success and failure code responses to your HTTP POST requests.
For more information on the possible error codes you can encounter and how to troubleshoot them, see Success and failure code responses to your HTTP POST requests.
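As a sketch only, a sender can check each response and set failed files aside for retry. The exact success and failure codes are listed in the reference above, so the check below is deliberately generic:

# Generic response check; consult the success/failure code reference
# above for the exact codes to handle. send_fn wraps your HTTP POST.
def send_and_check(send_fn, path):
    res = send_fn(path)
    if res.ok:                                   # 2xx response: upload accepted
        print(f"uploaded {path}")
    else:
        print(f"failed {path}: {res.status_code} {res.text}")  # keep file for retry
    return res.ok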
Task 3. Import the dataset files to cold storage
Monitor the Remote Files table in the Settings → Configurations → Data Management → Archived Data page.
Once the HTTP requests are sent to the HTTP collector, the data begins to appear by dataset name in the Remote Files table. It can take time for all the datasets and associated folders to be displayed, as this depends on several factors, such as the number of files, the daily upload limit of the HTTP collector, and the total daily upload capacity for sending files to the HTTP collector. In some cases, it can take several days to complete. When the dataset has a Remote status, the upload is complete.
Select the datasets and associated folders with the data to upload in the Remote Files table.
You can select the data to upload in two different ways:
To import all the folders associated with the dataset, select the dataset name in the Remote Files table.
To import specific folders from a dataset, click the dataset name in the Remote Files table, and then select the files you want to import.
When you've finished selecting the applicable datasets and folders to import, right-click, and select Import.
Confirm the import in the dialog box that opens by clicking Start Import.
The statuses of the datasets and folders imported in the Remote Files table will update to In Progress. The import can take time, even up to several days. For more information, see Phase 6.
When the import finishes, a notification is sent to your Notification Center indicating whether the import was successful. The statuses of the imported datasets and folders are updated, depending on what was imported:
Import entire dataset: If you import a dataset, or multiple datasets, with all their folders, after the import completes the status of the dataset updates to Imported, the dataset is disabled in the Remote Files table, and the dataset is now listed on the Dataset Management page with a name using the format archive_<dataset name>.
Partial import of dataset: If you import only some of the folders of a dataset, after the import completes the status of the dataset updates to Partially Imported, the imported folders have a status of Imported, and the remaining folders keep a Remote status. The dataset is left enabled so you can import the rest of the data at a later time. The dataset is also listed on the Dataset Management page with a name using the format archive_<dataset name>. If you later import the rest of the folders, the leftover files are added to the dataset that was created previously on the Dataset Management page; in the Remote Files table, the dataset is then disabled and the status updates to Imported.
Reimport dataset or folders in a dataset: If you resend data to the HTTP collector that was already imported, the statuses of the dataset and folders update to Partially Imported. If you choose to reimport, these files are added (not replaced) to the existing dataset on the Dataset Management page. It's also possible to combine previously imported folders with new folders; the statuses of the new folders update to Remote, and the previously imported dataset and folders update to Partially Imported.
The Remote Files table on the Archived Data page enables you to track the data that you're uploading for import to cold storage and the data that's ready to import. As soon as any files are received by the HTTP collector through the HTTP POST requests sent, the file contents are displayed in the Remote Files table. The data is ordered by dataset, and for each dataset you can see the aggregated number of folders/files, the total folder size when calculated, and the status of the files. When you select a dataset name, the folders indicate the different dates of the data as sent in the HTTP request header. For each folder, the following is listed: aggregated number of folders/files, total folder size when calculated, and the status of the files.
You can select different datasets and folders to import to cold storage. It can take time for all the files to be sent to the HTTP collector and then imported, as this depends on several factors, such as the number of files, the daily upload limit of files sent to the HTTP collector, and the total daily upload capacity of the HTTP collector. In some cases, it can take several days to complete. Use the statuses to help you monitor your data.
You can pivot (right-click) on any dataset or folder listed in the Remote Files table to import it, delete it, or calculate the folder size. You can calculate the folder size of a dataset or folder when its status is either Remote or Partially Imported.