This documentation is for WSO2 Enterprise Integrator version 6.4.0. View documentation for the latest release in the 6.x.x family and the latest release in the 7.x.x family.


The recommended deployment pattern for EI Analytics is the Active-Active pattern, which is highly scalable. For an overview of the Active-Active deployment pattern and instructions to configure it, see the following topics.

Overview


The above diagram represents a deployment that is not limited to two nodes. You can scale the analytics setup horizontally by adding more EI Analytics nodes to the deployment. In this deployment, it is recommended to configure the EI node to publish events to the EI Analytics nodes in a round-robin manner to ensure better fault tolerance. The same event should not be duplicated to multiple analytics nodes.

The Active-Active deployment pattern uses distributed aggregations to perform analytics in a scalable manner. Distributed aggregations allow multiple nodes to write data to the same aggregation in parallel. This allows you to deploy any number of nodes to process a single aggregation and thereby avoid performance bottlenecks. In this setup, all EI Analytics nodes must share the same EI_ANALYTICS database.

To understand how an active-active cluster processes aggregations when they are partitioned and assigned to different nodes, consider the following Siddhi query, which defines an aggregation named EIStatsAgg. Assume that this aggregation is processed in a distributed manner.

define stream PreProcessedESBStatStream(componentType string, componentID string, requestTimestamp long);

@store(type = 'rdbms', datasource = 'EI_ANALYTICS')
define aggregation EIStatsAgg

from PreProcessedESBStatStream
select componentType, componentID, count() as totalRequestCount
group by componentType, componentID
aggregate by requestTimestamp every seconds...years;

The above query addresses a simple use case: calculating the total request count for different types of ESB components (i.e., different proxies, REST APIs, sequences, mediators, etc.). Each request received by EI results in an event being published. This event contains information about the request, including the component ID, the component type, and the timestamp to which the information applies. When an analytics node receives such an event, the event is passed into the aggregation, the required calculations are performed, and the results are stored in the EI_ANALYTICS data store defined in the <EI_HOME>/wso2/analytics/conf/worker/deployment.yaml file.

Now let's assume that during a specific hour, the EI node publishes 30,000 events to analytics-node-1, and 40,000 events to analytics-node-2 for a proxy named JMSProxy. When you retrieve the total request count during that hour for the JMSProxy proxy via a retrieval query, the result is 70,000.
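
For example, the combined hourly count could be read back with a retrieval (store) query against the EIStatsAgg aggregation, along the lines of the following sketch. The proxy name, time range, and timestamp format used here are illustrative placeholders; adapt them to your deployment.

-- Read the hourly request count aggregated across all analytics nodes.
-- The time range below is illustrative; point it at the hour you want to inspect.
from EIStatsAgg
on componentID == 'JMSProxy'
within "2019-06-06 10:00:00 +05:30", "2019-06-06 11:00:00 +05:30"
per "hours"
select componentType, componentID, totalRequestCount;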

The steps to enable aggregation partitioning are provided under Configuring an active-active cluster.


Configuring an active-active cluster

To deploy the EI Analytics nodes as an active-active cluster, edit the <EI_HOME>/wso2/analytics/conf/worker/deployment.yaml file on each node as follows:

Before you begin:

  • Download two binary packs of WSO2 EI Analytics.
  • Set up a working RDBMS instance to be used by the WSO2 EI Analytics cluster.


  1. For each node, enter a unique ID for the id property under the wso2.carbon section. This is used to identify each node within a cluster. For example, you can add IDs as shown below.
    • For node 1:

      wso2.carbon:
        id: wso2-ei-analytics-1
    • For node 2:

      wso2.carbon:
        id: wso2-ei-analytics-2
  2. Enable aggregation partitioning for each node, and assign a unique shard ID to each node. To do this, set the partitionById and shardId parameters as Siddhi properties as shown below.

    Assigning shard IDs to nodes allows the system to identify each unique node when assigning parts of the aggregation. If the shard IDs are not assigned, the system uses the unique node IDs (defined in step 1) for this purpose.

    • For node 1:

      siddhi:
        properties:
          partitionById: true
          shardId: wso2-sp-analytics-1
    • For node 2:

      siddhi:
        properties:
          partitionById: true
          shardId: wso2-sp-analytics-2
      • To maintain data consistency, do not change the shard IDs after the first configuration.
      • When you enable the aggregation partitioning feature, a new column named SHARD_ID is introduced to the aggregation tables. Therefore, you need to do one of the following after enabling this feature to avoid errors occurring due to the differences in the table schema.
        • Delete all the aggregation tables for each granularity (i.e., the SECONDS, MINUTES, HOURS, DAYS, MONTHS, and YEARS tables), or
        • Edit the aggregation tables by adding a new column named SHARD_ID, and specify it as a primary key.
  3. Configure a database, and then update the default configuration for the EI_ANALYTICS data source with parameter values suitable for your requirements, as shown in the example below.
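
    The following is a minimal sketch of what the EI_ANALYTICS data source entry under the wso2.datasources section of the deployment.yaml file could look like, assuming a MySQL database. The JDBC URL, credentials, and pool settings are placeholders; replace them with values for your own RDBMS, and make sure the matching JDBC driver is available to the analytics runtime.

      wso2.datasources:
        dataSources:
          # All EI Analytics nodes in the cluster must point to this same database.
          - name: EI_ANALYTICS
            description: Data source shared by the EI Analytics cluster
            definition:
              type: RDBMS
              configuration:
                jdbcUrl: 'jdbc:mysql://localhost:3306/EI_ANALYTICS?useSSL=false'
                username: root
                password: root
                driverClassName: com.mysql.jdbc.Driver
                maxPoolSize: 50
                idleTimeout: 60000
                connectionTestQuery: SELECT 1
                validationTimeout: 30000
                isAutoCommit: false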

For instructions to configure WSO2 EI to publish statistics to this EI Analytics deployment, see Publishing ESB Data to the Analytics Profile.

As explained above, events are processed in multiple active nodes. Even though aggregation is usually a stateful operation, distributed aggregation removes the node-dependent nature of these calculations. This allows WSO2 EI to execute EI Analytics scripts that depend on incremental distributed aggregation.

However, an active-active deployment can affect alerts because alerts also depend on in-memory stateful operations such as windows. Due to this, alerts are generated based only on the events received by a specific node. Thus the alerts are node-dependent, and you need to disable them to run scripts with distributed incremental aggregation.
