WSO2 BAM uses MapReduce jobs to archive Cassandra data, so you can archive large amounts of data across a cluster of Hadoop nodes. To start the data archiving process, log in to the BAM management console and click Archive Data in the Configure menu.

The general configuration parameters of the data archival process are as follows:

Stream Name - In the BAM data model, a stream name maps to a Cassandra column family. Specify the stream name under which the data to be archived is stored.

Version - The version of the stream. Use this to specify which version to archive when there are multiple versions under the same stream name (as recommended).

Date Range - Select this option to archive data manually for a specific date range.

Retention Period - Select this option to archive data on a schedule defined by a cron expression.

Archival / Deletion - Select Archival. For instructions on deleting data, see Deleting Cassandra Data.
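
For example, assuming the sample stream org.wso2.bam.phone.retail_store.kpi with version 1.0.0 (both values are illustrative), the parameters map to Cassandra as follows:

    Stream Name:             org.wso2.bam.phone.retail_store.kpi
    Version:                 1.0.0
    Column family:           org_wso2_bam_phone_retail_store_kpi
    Archive column family:   org_wso2_bam_phone_retail_store_kpi_arch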

You can run the archival process through the BAM management console, either manually or by scheduling it using a cron expression as explained below.

Archiving data manually

Follow the steps below to archive data manually:

  1. Click Date Range to archive data within a specific date range.

    The specific configuration parameters of the manual data archival process are explained below:

    From - The start date of the range. For example: 25/01/2013 00:00:00 AM
    To - The end date of the range. For example: 03/02/2013 00:00:00 AM
  2. Click Submit.

Scheduling the data archive

Follow the steps below to schedule the data archiving process:

  1. Click Retention Period to schedule an archive process.
    The specific configuration parameters of the scheduled data archival process are explained below:

    No of days - Keeps only the last 'n' days of data in the column family. For example, if you specify 90, the system retains only the last 90 days of data and archives the older data.
    Cron expression - The cron expression used to schedule the archive process. For example, the following expression configures the archive job to run at 12:00 PM (noon) every day: 0 0 12 * * ? (See the sample expressions after this procedure.) For more information on cron expressions, see the Oracle documentation.
    • The name of the archive column family is: <original column family name> + _arch
    • Cassandra column families are generated with underscores (_) in place of the dots in the stream name. Use dots (.) in the stream name when archiving. For example, if the column family is org_wso2_bam_phone_retail_store_kpi, enter the stream name as org.wso2.bam.phone.retail_store.kpi when archiving.
  2. Click Submit. Once you submit a scheduled archive, the system creates a Hive script and executes it. (A rough sketch of such a script is given at the end of this page.)

  3.  Click List under Analytics in the Main menu.

    This step is not applicable to the manual data archival process, which only executes the Hive query but does not save it.

  4. Click Schedule Script associated with your script to change its scheduled time.
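
The sample cron expressions below follow the same six-field format as the example above. They are illustrative values only, so adjust them to suit your own schedule:

    0 0 12 * * ?      Runs the archive job at 12:00 PM (noon) every day.
    0 0 2 * * ?       Runs the archive job at 2:00 AM every day.
    0 30 1 ? * SUN    Runs the archive job at 1:30 AM every Sunday.
    0 0 0 1 * ?       Runs the archive job at midnight on the first day of every month.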

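For reference, the following is a rough sketch of the kind of Hive script an archival job runs. It assumes the sample stream used above (org.wso2.bam.phone.retail_store.kpi), the default EVENT_KS keyspace on localhost, two hypothetical payload columns (payload_brand and payload_quantity), and a 90-day retention period. The script BAM generates for you will differ in its table names, column mappings, and archival condition, so treat this only as an illustration of the flow: map the original column family, map the _arch column family, and copy the old rows across.

    -- Map the original column family to a Hive table (the column names here are hypothetical).
    -- If your keyspace requires credentials, also add "cassandra.ks.username" and
    -- "cassandra.ks.password" to the SERDEPROPERTIES.
    CREATE EXTERNAL TABLE IF NOT EXISTS PhoneRetailKPI
        (itemKey STRING, payload_brand STRING, payload_quantity INT, event_timestamp BIGINT)
        STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
        WITH SERDEPROPERTIES (
            "cassandra.host" = "127.0.0.1",
            "cassandra.port" = "9160",
            "cassandra.ks.name" = "EVENT_KS",
            "cassandra.cf.name" = "org_wso2_bam_phone_retail_store_kpi",
            "cassandra.columns.mapping" = ":key, payload_brand, payload_quantity, Timestamp");

    -- Map the archive column family (<original column family name> + _arch) in the same way.
    CREATE EXTERNAL TABLE IF NOT EXISTS PhoneRetailKPIArch
        (itemKey STRING, payload_brand STRING, payload_quantity INT, event_timestamp BIGINT)
        STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
        WITH SERDEPROPERTIES (
            "cassandra.host" = "127.0.0.1",
            "cassandra.port" = "9160",
            "cassandra.ks.name" = "EVENT_KS",
            "cassandra.cf.name" = "org_wso2_bam_phone_retail_store_kpi_arch",
            "cassandra.columns.mapping" = ":key, payload_brand, payload_quantity, Timestamp");

    -- Copy rows older than the retention period (90 days here) into the archive column family.
    -- BAM event timestamps are stored in milliseconds, so the cut-off is computed in milliseconds.
    INSERT OVERWRITE TABLE PhoneRetailKPIArch
        SELECT itemKey, payload_brand, payload_quantity, event_timestamp
        FROM PhoneRetailKPI
        WHERE event_timestamp < (unix_timestamp() - 90 * 24 * 60 * 60) * 1000;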