The following diagram illustrates the fully distributed deployment pattern used for high availability.
|Distributed component||Minimum number of nodes||Description|
|Receiver nodes||2||Before any analytics can happen, the relevant data must be collected. DAS provides data agents that capture information on the messages flowing through WSO2 ESB, WSO2 Application Server, and other products that use the DAS data publisher. This information is obtained by the data receivers and is then stored in a datastore, where it is optimized for analysis. The receiver nodes are used to obtain this data from the data agents.|
|Indexer nodes|| ||A background indexing process fetches the data from the datastore and performs the indexing operations. In a fully distributed, highly available system, these operations are handled by the indexer nodes.|
|Analyzer (Spark) nodes||2||The analyzer engine, which is powered by Apache Spark, analyzes this data according to defined analytic queries. This will usually follow a pattern of retrieving data from the datastore, performing a data operation such as an addition, and storing the data back in the datastore. The analyzer operations are performed by the analyzer nodes.|
|Dashboard nodes|| ||The dashboard sends queries to the datastore for the analyzed data and displays the results graphically. This function can be distributed to the dashboard nodes.|
|Storm nodes||0||Apache Storm can be used to handle any additional load. Any number of Storm nodes can be used, and they are not needed in a fully distributed system unless required.|
When configuring the fully distributed cluster, apply the following settings on each DAS node.
- Mount all governance registries to a single governance registry and config registries to a single config registry.
- Point all user stores to a single user store DB.
- Enable Hazelcast clustering on all nodes and configure them as a single WKA-based (well-known address) cluster.
- In the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file on all analyzer nodes, set the property that specifies the number of Spark masters in the setup. Also configure that file with the carbon.das.symbolic.link configurations as described in Spark Configurations.
In a multi-node DAS cluster that runs in a Red Hat Linux environment, you also need to add the following entry to the <DAS_HOME>/bin/wso2server.sh file so that <DAS_HOME> is exported. This is necessary because the symbolic link may not be resolved correctly on this operating system.
export CARBON_HOME=<symbolic link>
- In the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file on all nodes, point WSO2_ANALYTICS_FS_DB, WSO2_ANALYTICS_EVENT_STORE_DB and WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB to three separate datasources that are shared by all nodes.
- If you use other datasource provider types, uncomment the relevant providers at the beginning of that configuration file.
- Enable dep-sync on all DAS nodes, and set commit to true on only a single receiver node.
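The symbolic-link step in the list above can be sketched as follows. This is only an illustration, assuming the symbolic link is a path that points at the DAS installation directory and is the same on every node. The directory names (wso2das-install, das_symlink) and variable names are hypothetical, and a temporary directory stands in for a real installation so that the script can be run safely anywhere.

```shell
#!/bin/sh
set -e

# Assumption: a throwaway directory stands in for the real DAS installation,
# so this sketch never touches a live server.
WORK_DIR="$(mktemp -d)"
REAL_DAS_HOME="$WORK_DIR/wso2das-install"   # hypothetical install location
DAS_LINK="$WORK_DIR/das_symlink"            # hypothetical link path, identical on every node

mkdir -p "$REAL_DAS_HOME/bin"

# Create the symbolic link that every node refers to.
ln -s "$REAL_DAS_HOME" "$DAS_LINK"

# Equivalent of the wso2server.sh entry: export the link itself,
# not the path it resolves to.
export CARBON_HOME="$DAS_LINK"

echo "CARBON_HOME=$CARBON_HOME"
echo "physical directory: $(cd "$CARBON_HOME" && pwd -P)"
```

On a real node, the link would point at the actual installation directory, and only the export line would go into wso2server.sh.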
When starting the instances, you can specify a predefined profile to start each instance as a receiver node, an indexer node, or an analyzer node.
|Node Type||Disabled Components||Option|
|Receiver Node||AnalyticsEngine, AnalyticsExecution, Indexing, DataPurging, AnalyticsSparkCtx, AnalyticsStats||-receiverNode|
|Indexer Node||AnalyticsExecution, AnalyticsEngine, EventSink, AnalyticsSparkCtx, AnalyticsStats, DataPurging||-indexerNode|
|Analyzer Node||Indexing, EventSink, DataPurging, IndexThrottling, AnalyticsStats||-analyzerNode|
These can be provided at the server startup. For example:
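A minimal sketch of how the options in the table might be supplied, assuming each option is passed directly as an argument to the wso2server.sh startup script. The helper function profile_option, the role keywords, and the default /opt/wso2das path are hypothetical. The script only prints the commands (a dry run) and does not start a server.

```shell
#!/bin/sh
# Map a node role to the profile option listed in the table above.
# The function name and the role keywords are hypothetical.
profile_option() {
  case "$1" in
    receiver) echo "-receiverNode" ;;
    indexer)  echo "-indexerNode" ;;
    analyzer) echo "-analyzerNode" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Assumed install path; override it by setting DAS_HOME beforehand.
DAS_HOME="${DAS_HOME:-/opt/wso2das}"

# Dry run: print the startup command for each role instead of executing it.
for role in receiver indexer analyzer; do
  echo "sh $DAS_HOME/bin/wso2server.sh $(profile_option "$role")"
done
```

Printing the command first makes it easy to confirm that each node will start with the intended profile before running it on the actual host.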