The data analyzer component of each BAM node uses Hive query language scripts to retrieve data from the Cassandra cluster, process it into meaningful information, and save that information in an RDBMS. In this example, we use MySQL as the RDBMS. WSO2 BAM ships with an H2 database by default, but H2 is not recommended for high-volume production settings. In this setup, the analyzer components on node1 and node2 are clustered, and data processing is extended to a separate external Apache Hadoop cluster.
The data analyzer cluster uses the Registry to store metadata related to Hive scripts and scheduled tasks. It uses Hazelcast to coordinate the nodes in the analyzer cluster when running Hive scripts. Together, these settings provide high availability through failover: if one node fails, the remaining nodes take over its load and complete the task. The diagram below depicts this setup:
The BAM nodes in the analyzer cluster are used for three main purposes:
- Submit analytics queries to the Hadoop cluster periodically, as scheduled
- Receive data from data agents and persist it to the Cassandra cluster
- Host end-user dashboards
The following steps describe how to configure the analyzer cluster. Perform these configurations on the analyzer nodes. The instructions in this section assume that node 1 and node 2 are the data analyzer nodes.
Do the following steps for both node 1 and node 2.
- Download and extract WSO2 BAM to both analyzer nodes.
- Download the MySQL connector JAR file and place it in the <BAM_HOME>/repository/components/lib folder.
Add the following datasource configuration to the master-datasources.xml file. Be sure to change the database URL and credentials to match your environment. In this example, the shared registry uses the WSO2_REG_DB database.
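A datasource entry of this kind might look like the sketch below. It assumes the Carbon 4.x datasource format and the MySQL Connector/J 5.x driver class; the JNDI name jdbc/WSO2RegDB, the host mysql-host, the database name regdb, and the credentials are hypothetical placeholders, not values taken from this guide.

```xml
<!-- Hypothetical example: replace host, database name, and credentials -->
<datasource>
    <name>WSO2_REG_DB</name>
    <description>Datasource used by the shared registry</description>
    <!-- JNDI name that registry.xml can refer to (assumed name) -->
    <jndiConfig>
        <name>jdbc/WSO2RegDB</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:mysql://mysql-host:3306/regdb</url>
            <username>regadmin</username>
            <password>regadmin</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <!-- Connection pool settings -->
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
        </configuration>
    </definition>
</datasource>
```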
Add the following to the <BAM_HOME>/repository/conf/registry.xml file. These mounting configurations share the registry between both nodes.
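A mount configuration along these lines is sketched below, assuming the WSO2_REG_DB datasource is exposed under the hypothetical JNDI name jdbc/WSO2RegDB. The names wso2registry_mount and sharedRegistry are illustrative identifiers, and these elements sit inside the root wso2registry element of registry.xml.

```xml
<!-- Database configuration pointing at the shared registry datasource (assumed JNDI name) -->
<dbConfig name="wso2registry_mount">
    <dataSource>jdbc/WSO2RegDB</dataSource>
</dbConfig>

<!-- Registry instance backed by the shared database -->
<remoteInstance url="https://localhost:9443/registry">
    <id>sharedRegistry</id>
    <dbConfig>wso2registry_mount</dbConfig>
    <readOnly>false</readOnly>
    <enableCache>true</enableCache>
    <registryRoot>/</registryRoot>
</remoteInstance>

<!-- Mount the config and governance collections from the shared instance -->
<mount path="/_system/config" overwrite="true">
    <instanceId>sharedRegistry</instanceId>
    <targetPath>/_system/config</targetPath>
</mount>
<mount path="/_system/governance" overwrite="true">
    <instanceId>sharedRegistry</instanceId>
    <targetPath>/_system/governance</targetPath>
</mount>
```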
Now the registry has been mounted and shared in both nodes.
To create the registry schema, execute the relevant database script against the reg-db database. This needs to be done on only one node, because the registry is now shared.
Alternatively, you can create the required tables (if they have not been created already) by starting the server with the -Dsetup option: sh wso2server.sh -Dsetup (for Linux) or wso2server.bat -Dsetup (for Windows). This also needs to be done on only one node. When the server starts up, you can also check whether the registry has been mounted properly.
- Edit the .xml file and enable clustering as follows. Do this on both nodes.
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
- In the above clustering configuration, also make sure the following properties are set correctly:
membershipScheme: The cluster membership scheme being used. Set it to "multicast".
localMemberHost: The host name or IP address of the member. Set it to the host name of the relevant machine (e.g., node1).
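With those two properties filled in, the clustering element on node 1 could look roughly like the sketch below. Only membershipScheme and localMemberHost come from this guide; the other parameters of the clustering element are omitted and should be left as they are in your file.

```xml
<clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent"
            enable="true">
    <!-- Members discover each other through multicast -->
    <parameter name="membershipScheme">multicast</parameter>
    <!-- Host name of this machine; use node2 on the second analyzer node -->
    <parameter name="localMemberHost">node1</parameter>
    <!-- Remaining clustering parameters omitted in this sketch -->
</clustering>
```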
Add the following to the tasks-config.xml file, which is in the <BAM_HOME>/repository/conf/etc/ directory on the analyzer nodes.
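A minimal sketch of these entries for the two analyzer nodes, assuming the taskServerMode element and CLUSTERED mode of the Carbon 4.x task service; only taskServerCount is named in this guide.

```xml
<!-- Run the task service in clustered mode across the analyzer nodes (assumed mode) -->
<taskServerMode>CLUSTERED</taskServerMode>
<!-- Hold server startup until two task servers (node 1 and node 2) are up -->
<taskServerCount>2</taskServerCount>
```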
About the task server count
This value specifies the number of task servers running in the cluster; in this setup, the analyzer nodes act as the task servers.
The task server count also controls analyzer node startup: each analyzer node holds its startup until the number of servers specified in the taskServerCount property have started. Only when that count is reached do all of those servers complete their startup.
Analyzer nodes are held at startup so that the other analyzers can join before the tasks (Hive script jobs) are scheduled. This way, the scripts are shared among all available analyzers instead of all being scheduled on the first server that starts.
Complete the following configuration if you want to change the database used to store metadata for Hive scripts. Modify <BAM_HOME>/repository/conf/advanced/hive-site.xml as follows. The configuration adds an entry to the hive.aux.jars.path property so that the MySQL connector JAR is included in the Hadoop job execution runtime. Windows users must use the
Additional details and recommendations
By default, Hive metadata is stored in an H2 database; these steps store it in MySQL instead, as appropriate for this scenario. While this step is optional, using a separate database instance such as MySQL or Oracle as the Hive metastore is recommended for production environments. See Configuring a Metadata Store for Hive for more information.
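The metastore connection itself is set through the standard Hive javax.jdo.option properties, sketched below for MySQL. The host mysql-host, database hive_metastore, and credentials are hypothetical placeholders. The hive.aux.jars.path change is not shown, because its existing value is not reproduced in this section.

```xml
<!-- JDBC URL of the MySQL metastore; the database is created if missing -->
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysql-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<!-- MySQL Connector/J 5.x driver class -->
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<!-- Placeholder credentials for the metastore database -->
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
</property>
```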
Add the following configuration to the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file on both analyzer nodes. Be sure to change the database URL and credentials to match your environment.
WSO2BAM_DATASOURCE is the default data source available in BAM, and it should be connected to the database you are using. This example uses the bam-db database to store BAM summary data.
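As a rough sketch, assuming the same Carbon 4.x datasource format as master-datasources.xml, the WSO2BAM_DATASOURCE entry pointed at MySQL might look like this. The host mysql-host, database name bamdb, and credentials are hypothetical placeholders.

```xml
<datasource>
    <name>WSO2BAM_DATASOURCE</name>
    <description>Datasource for BAM summary data</description>
    <definition type="RDBMS">
        <configuration>
            <!-- Placeholder URL and credentials: adjust for your environment -->
            <url>jdbc:mysql://mysql-host:3306/bamdb</url>
            <username>bamadmin</username>
            <password>bamadmin</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
            <validationInterval>30000</validationInterval>
        </configuration>
    </definition>
</datasource>
```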
Note that if you are using BAM 2.4.0 instead of BAM 2.4.1, this configuration must be changed in the master-datasources.xml file instead.
If you are using BAM 2.4.1, start the BAM server on both analyzer nodes, and use the Deployment Synchronizer to configure one node as a read/write node and the other as a read-only node.
Tip: The BAM cluster has no concept of worker/manager separation, but the topic on the SVN-based deployment synchronizer describes worker and manager configurations. Treat the manager and worker nodes mentioned there as node 1 and node 2, respectively.
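For orientation, the DeploymentSynchronizer section of <BAM_HOME>/repository/conf/carbon.xml on node 1 (the read/write node) could look like the sketch below, assuming SVN-based synchronization. The SVN URL and credentials are hypothetical. On node 2 (the read-only node), AutoCommit would be set to false so that it only checks out changes.

```xml
<DeploymentSynchronizer>
    <Enabled>true</Enabled>
    <!-- Read/write node: commit local deployment artifacts to SVN -->
    <AutoCommit>true</AutoCommit>
    <!-- Check out changes from SVN -->
    <AutoCheckout>true</AutoCheckout>
    <RepositoryType>svn</RepositoryType>
    <!-- Placeholder repository location and credentials -->
    <SvnUrl>http://svn-host/repos/bam</SvnUrl>
    <SvnUser>svnuser</SvnUser>
    <SvnPassword>svnpassword</SvnPassword>
    <SvnUrlAppendTenantId>true</SvnUrlAppendTenantId>
</DeploymentSynchronizer>
```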
Additional instructions and points to note
When starting BAM instances, use the disable.cassandra.server.startup property (for example, -Ddisable.cassandra.server.startup=true) to prevent the Cassandra server bundled with BAM from starting by default. The nodes must point to the external Cassandra cluster instead.
For BAM 2.4.0, or for setups without SVN, use the Feature Manager to remove the BAM Toolbox Deployer feature from node 2. Having deployers on both analyzer BAM nodes interferes with proper Hive task failover, so the BAM Toolbox Deployer feature is kept only on node 1, where it copies the relevant files to the target location and schedules the Hive script.
In addition, BAM 2.4.1 lets you disable certain BAM components. See here for more information.
You may also use the following to disable notifications.