The latest version for DAS is WSO2 Data Analytics Server 3.2.0. View documentation for the latest release.
WSO2 Data Analytics Server is succeeded by WSO2 Stream Processor. To view the latest documentation for WSO2 SP, see WSO2 Stream Processor Documentation.

WSO2 DAS introduces a pluggable Data Access Layer (DAL). The DAL (Analytics Data Service) is made up of two main components, the Analytics Record Store and the Analytics File System, which are configured in the <DAS_HOME>/repository/conf/analytics/analytics-config.xml file as follows.

sample analytics-config.xml
<analytics-dataservice-configuration>
   <!-- The name of the primary record store -->
   <primaryRecordStore>EVENT_STORE</primaryRecordStore>
   <!-- The name of the index staging record store -->
   <indexStagingRecordStore>INDEX_STAGING_STORE</indexStagingRecordStore>
   <!-- Analytics File System - properties related to index storage implementation -->
   <analytics-file-system>
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsFileSystem</implementation>
      <properties>
         <!-- the data source name mentioned in data sources configuration -->
         <property name="datasource">WSO2_ANALYTICS_FS_DB</property>
         <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-file-system>
   <!-- Analytics Record Store - properties related to record storage implementation -->
   <analytics-record-store name="EVENT_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
         <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <analytics-record-store name="INDEX_STAGING_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
         <property name="category">limited_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <analytics-record-store name="PROCESSED_DATA_STORE">
      <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
      <properties>
         <property name="datasource">WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</property>
         <property name="category">large_dataset_optimized</property>
      </properties>
   </analytics-record-store>
   <!-- The data indexing analyzer implementation -->
   <analytics-lucene-analyzer>
      <implementation>org.apache.lucene.analysis.standard.StandardAnalyzer</implementation>
   </analytics-lucene-analyzer>
   <!-- The maximum number of threads used for indexing per node; -1 signals to auto-detect the optimum value,
        which is equal to (number of CPU cores in the system - 1) -->
   <indexingThreadCount>-1</indexingThreadCount>
   <!-- The number of index shards; this should be equal to or higher than the number of indexing nodes that will be working,
        the ideal count being 'number of indexing nodes * [CPU cores used for indexing per node]' -->
   <shardCount>6</shardCount>
   <!-- The number of batch index records the indexing node will process per indexing thread. A batch index record
        encapsulates a batch of records retrieved from the receiver to be indexed -->
   <shardIndexRecordBatchSize>100</shardIndexRecordBatchSize>
   <!-- Data purging related configuration -->
   <analytics-data-purging>
      <!-- Indicates whether purging is enabled. To enable data purging for a cluster, this property
       must be enabled on all nodes -->
      <purging-enable>false</purging-enable>
      <cron-expression>0 0 0 * * ?</cron-expression>
      <!-- Tables to include in purging. Use regular expressions to specify the names of the tables to include. -->
      <purge-include-tables>
         <table>.*</table>
         <!--<table>.*jmx.*</table>-->
      </purge-include-tables>
      <!-- All records inserted before the specified retention time are eligible for purging -->
      <data-retention-days>365</data-retention-days>
   </analytics-data-purging>
   <!-- Receiver/Indexing flow-control configuration -->
   <analytics-receiver-indexing-flow-control enabled="true">
      <!-- maximum number of records that can be in index staging area before receiving is throttled -->
      <recordReceivingHighThreshold>10000</recordReceivingHighThreshold>
      <!-- the number of records the index staging area must drop below for throttling to be reduced -->
      <recordReceivingLowThreshold>5000</recordReceivingLowThreshold>
   </analytics-receiver-indexing-flow-control>
</analytics-dataservice-configuration>

Analytics Record Store

The Analytics Record Store is the component that stores the records WSO2 DAS receives in the form of events. It holds the raw event data in tabular form so that the data can be retrieved later.

The following record stores are configured in the <DAS_HOME>/repository/conf/analytics/analytics-config.xml file by default.

Record Store Type | Default Name | Description
Primary Store | EVENT_STORE | This record store is used to store the persisted incoming events of WSO2 DAS. It contains raw data in a tabular structure which can be used later.
Index Staging Store | INDEX_STAGING_STORE | This record store is used to store metadata that needs to be saved before indexed data is written to the file system.
Processed Data Store | PROCESSED_DATA_STORE | This record store is used to store summarised event data.

Configuring a record store 

The following is a sample configuration of a record store.

<analytics-record-store name="EVENT_STORE">
   <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
   <properties>
         <property name="datasource">WSO2_ANALYTICS_EVENT_STORE_DB</property>
         <property name="category">large_dataset_optimized</property>
   </properties>
</analytics-record-store>

The following needs to be specified for each record store.

  • Name: A unique name for the record store.
  • Implementation: This specifies the implementation for the record store. For the record store to function, the provider for the datasource type mentioned in this implementation should be enabled in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file.
  • Record store specific properties: The properties that are defined per record store are described in the table below.

    Property | Description | Default Value (EVENT_STORE) | Default Value (INDEX_STAGING_STORE) | Default Value (PROCESSED_DATA_STORE)
    datasource | The name of the datasource used to connect to the database used by the record store. | WSO2_ANALYTICS_EVENT_STORE_DB | WSO2_ANALYTICS_EVENT_STORE_DB | WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB
    category | The category of the record store. Possible values are listed below. | large_dataset_optimized | limited_dataset_optimized | large_dataset_optimized

    Possible values for the category property are as follows:

    • large_dataset_optimized: With this value, the record store is better suited to event streams with a high load of events.
    • limited_dataset_optimized: With this value, the record store is better suited to event streams that handle relatively few events.
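
As a sketch of how an additional record store might be declared alongside the defaults, consider the fragment below. The store name ARCHIVE_STORE and the datasource name WSO2_ANALYTICS_ARCHIVE_STORE_DB are hypothetical and do not ship with the product; the datasource is assumed to be defined separately in the analytics-datasources.xml file.

```xml
<!-- Hypothetical additional record store; the name ARCHIVE_STORE is illustrative only -->
<analytics-record-store name="ARCHIVE_STORE">
   <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsRecordStore</implementation>
   <properties>
      <!-- Hypothetical datasource name, assumed to be defined in analytics-datasources.xml -->
      <property name="datasource">WSO2_ANALYTICS_ARCHIVE_STORE_DB</property>
      <!-- Chosen on the assumption that this store serves streams with relatively few events -->
      <property name="category">limited_dataset_optimized</property>
   </properties>
</analytics-record-store>
```

The element would sit next to the existing analytics-record-store elements inside analytics-dataservice-configuration.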

Once a record store is configured in the analytics-config.xml file, you can select it as the record store for the required event streams. For more information, see Persisting Data for Interactive Analytics.

Analytics File System

The Analytics File System is the storage area used for index data. The index data is generated from the records added to DAS. A set of background tasks runs continuously to look up data in the Analytics Record Store and identify the tables that should be indexed. These tables are identified based on the information entered when persisting event streams. The index processing is then carried out using Apache Lucene, and the resulting data is stored in the Analytics File System.

Configuring the file system

The following is a sample configuration of the file system in the analytics-config.xml file.

<analytics-file-system>
   <implementation>org.wso2.carbon.analytics.datasource.rdbms.RDBMSAnalyticsFileSystem</implementation>
   <properties>
         <!-- the data source name mentioned in data sources configuration -->
         <property name="datasource">WSO2_ANALYTICS_FS_DB</property>
         <property name="category">large_dataset_optimized</property>
   </properties>
</analytics-file-system>

The following needs to be specified in a file system configuration.

  • Implementation: This specifies the implementation for the file system. For the file system to function, the provider for the datasource type mentioned in this implementation should be enabled in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file.
  • File system specific properties: The properties that are defined for a file system are described in the table below.

    Property | Description | Default Value
    datasource | The name of the datasource used to connect to the database used by the file system. | WSO2_ANALYTICS_FS_DB
    category | The category of the file system. Possible values are listed below. | large_dataset_optimized

    Possible values for the category property are as follows:

    • large_dataset_optimized: With this value, the file system is better suited to event streams with a high load of events.
    • limited_dataset_optimized: With this value, the file system is better suited to event streams that handle relatively few events.

Analytics indexing

By default, WSO2 DAS executes indexing operations when the server is started. The following system property can be used to disable the indexing operations if required.

  • For Windows: wso2server.bat -DdisableIndexing
  • For Linux: wso2server.sh -DdisableIndexing

This option allows you to create servers that are dedicated to specific operations such as event receiving, analytics, indexing, etc.

Configuring common parameters

Data purging parameters

Parameter | Description | Default Value
<purging-enable> | Specifies whether the functionality to purge data from event tables is enabled. To enable purging for a cluster, this parameter must be enabled on all nodes. | false
<cron-expression> | A cron expression that specifies the schedule on which the purging task runs. The default value runs the task daily at midnight. | 0 0 0 * * ?
<purge-include-tables> | The event tables from which data should be purged, defined as <table> subelements of this element. Each entry is a regular expression that is matched against table names. |
<data-retention-days> | The number of days for which data is retained in the event tables selected for purging. Records inserted before this retention period are eligible for purging. | 365
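
To illustrate how the purging parameters fit together, the following sketch enables purging for a subset of tables. The retention period of 30 days is an illustrative assumption, and the table pattern is the one that appears commented out in the sample configuration.

```xml
<analytics-data-purging>
   <!-- Must be set to true on every node when purging is enabled for a cluster -->
   <purging-enable>true</purging-enable>
   <!-- Same schedule as in the sample configuration -->
   <cron-expression>0 0 0 * * ?</cron-expression>
   <purge-include-tables>
      <!-- Regular expression that matches table names containing "jmx" -->
      <table>.*jmx.*</table>
   </purge-include-tables>
   <!-- Illustrative value: records inserted more than 30 days ago become eligible for purging -->
   <data-retention-days>30</data-retention-days>
</analytics-data-purging>
```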

Flow control parameters

Parameter | Description | Default Value
<recordReceivingHighThreshold> | The maximum number of records that can accumulate in the index staging record store before receiving further records is throttled. | 10000
<recordReceivingLowThreshold> | The number of records that the index staging record store must drop below for throttling to be reduced. This is relevant when receiving further records has been throttled because the number of records reached the value specified for the recordReceivingHighThreshold parameter. | 5000
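
As a rough example of adjusting the two thresholds together, the fragment below doubles both defaults. The values 20000 and 10000 are illustrative assumptions, not recommendations; the low threshold is kept below the high threshold so that throttling can ease once the staging store drains.

```xml
<analytics-receiver-indexing-flow-control enabled="true">
   <!-- Illustrative value: receiving is throttled once 20000 records are in the index staging store -->
   <recordReceivingHighThreshold>20000</recordReceivingHighThreshold>
   <!-- Illustrative value: throttling is reduced once the count drops below 10000 -->
   <recordReceivingLowThreshold>10000</recordReceivingLowThreshold>
</analytics-receiver-indexing-flow-control>
```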

 

Other parameters

Parameter | Description | Default Value
<analytics-lucene-analyzer> | The implementation of the Analytics Lucene Analyzer is defined as a subelement of this parameter, e.g., <implementation>org.apache.lucene.analysis.standard.StandardAnalyzer</implementation> |
<indexingThreadCount> | The maximum number of threads used for indexing per node. When -1 is specified, the optimum value is detected automatically from the number of CPU cores in the system and the number of threads is set accordingly. | -1
<shardCount> | The number of index shards the server should maintain per cluster. This fine-tunes the scaling nature of the indexing cluster. This parameter can only be set once for the lifetime of the cluster, and cannot be changed later on. | 6
<shardIndexRecordBatchSize> | The number of batch index records the indexing node should process per indexing thread at a given time. An index record contains the data of a record batch inserted in a single put operation. This batch can be as large as the event receiver queue data size, which is 10MB by default. Therefore, the highest amount of in-memory record data that an indexing thread can hold is 10MB * 100. Configure this parameter to control the maximum amount of memory used by the indexing node based on your requirements. | 100

The record store and file system implementations described above are pluggable: users can write their own implementations and plug them into the server at will. This allows implementors to provide new implementations, as well as wrapper implementations on top of existing ones, that add enhanced features such as data encryption and custom auditing.

The two interfaces for these implementations can be found in <DAS_HOME>/repository/components/plugins/org.wso2.carbon.analytics.datasource.core-*.jar.
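
As a small worked example of the shardIndexRecordBatchSize memory estimate, the fragment below halves the default batch size. The value 50 is an illustrative assumption; the arithmetic in the comment follows the same 10MB-per-batch upper bound used in the parameter description.

```xml
<!-- Illustrative value: at most 50 batch index records per indexing thread.
     With the default 10MB event receiver queue data size, the upper bound on
     in-memory record data per indexing thread becomes 10MB * 50 = 500MB. -->
<shardIndexRecordBatchSize>50</shardIndexRecordBatchSize>
```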