Data that reaches BAM through data receivers is usually stored in the default Cassandra database. The image above shows how the Cassandra databases of both BAM nodes are deployed in a cluster. This ensures that if one node fails, data can still be received and stored in the remaining databases in the cluster, and that the data stays highly available for the Hive scripts to run on.
Information to know before you start
- Increase the heap memory size of the BAM nodes to at least 2 GB, and synchronize the system time across all nodes.
- BAM 2.4.0 uses Cassandra version 1.1.3 while BAM 2.4.1 uses Cassandra version 1.2.13.
- The fully-distributed BAM setup uses nodes 3, 4 and 5, so this topic includes configurations for those nodes. If you are using a different setup, change the configurations accordingly.
- You can start the BAM server using the Cassandra profile, so that a BAM node acts as a Cassandra node in your cluster. See Running the Product on a Preferred Profile for more information on how to do this.
- For instructions on using external Cassandra with WSO2 BAM, see Connecting to External Cassandra.
Add the following configurations to the <BAM_HOME>/repository/conf/etc/cassandra.yaml file in the nodes mentioned below.
In WSO2 BAM 2.4.1, we use Cassandra version 1.2.13. You can generate tokens for the nodes using the script available in Apache Cassandra Documentation - Generating Tokens.
In WSO2 BAM 2.4.0, we use Cassandra version 1.1.3. You can generate tokens for the nodes using the script available in http://www.datastax.com/docs/0.8/install/cluster_init#calculating-tokens-for-a-single-data-center
For Cassandra 1.2.13 (in BAM 2.4.1), the initial_token value cannot be 0. You must enter the value generated by the script.
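To illustrate the arithmetic behind token generation, here is a minimal Python sketch that spaces tokens evenly around the ring. It is not the script referenced above and does not replace it. The function and constant names are hypothetical, and which partitioner your BAM version is configured with is an assumption you should verify in cassandra.yaml. Note that the RandomPartitioner formula yields 0 for the first node, which the note above says is not accepted for Cassandra 1.2.13, so use the value from the referenced script in that case.

```python
# Ring boundaries for Cassandra's two common partitioners,
# expressed as (first token, ring size).
RANDOM_PARTITIONER = (0, 2 ** 127)
MURMUR3_PARTITIONER = (-(2 ** 63), 2 ** 64)


def evenly_spaced_tokens(node_count, partitioner=RANDOM_PARTITIONER):
    """Return one token per node, spaced evenly around the ring.

    Illustrative only: the partitioner in use is an assumption.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    start, size = partitioner
    # Integer arithmetic keeps the tokens exact at 127-bit size.
    return [start + i * size // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Print tokens for a hypothetical three-node cluster.
    for position, token in enumerate(evenly_spaced_tokens(3), start=1):
        print(f"ring position {position}: initial_token: {token}")
```

Each node would take one of the printed values as its initial_token, so that every node owns an equal share of the ring.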
Connect the nodes to Cassandra endpoints.
This is for Cassandra version 1.2.13. Change the hector-config.xml file in all nodes as follows.
This is for Cassandra version 1.1.3. Change the cassandra-component.xml file in all nodes as follows.
Change the <BAM_HOME>/repository/conf/advanced/streamdefn.xml file in all nodes as follows. This changes the replication factor and the read/write consistency levels that data receivers use when writing data to Cassandra. For example, if you have four Cassandra nodes in the cluster, enter 3 as the value for the
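The replication factor and the read/write consistency levels interact through simple arithmetic. The following Python sketch shows the standard Cassandra rules: a quorum is a majority of replicas, and reads are guaranteed to see the latest write when the replicas contacted for reads and writes overlap. The function names are hypothetical and the sketch says nothing about which levels streamdefn.xml should contain for your deployment.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority (QUORUM level)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_replicas, write_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_replica_failures(replication_factor, replicas_required):
    """Replicas that can be down while a request still succeeds."""
    return replication_factor - replicas_required


if __name__ == "__main__":
    rf = 3  # the replication factor from the four-node example above
    q = quorum(rf)
    print("quorum size:", q)
    print("QUORUM reads + QUORUM writes overlap:",
          reads_see_latest_write(rf, q, q))
    print("replicas that may be down at QUORUM:",
          tolerated_replica_failures(rf, q))
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and writes keep working with one replica down.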
Configure the datasources. When load balancing is required, add the set of JDBC URLs as a comma-separated list.
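As a rough illustration of the comma-separated list, the Python sketch below builds one from a list of hosts. Everything specific here is an assumption: the host names are placeholders, the jdbc:cassandra:// URL form and port 9160 (Cassandra's default Thrift port) should be checked against the URL already present in your datasource file, and the helper name is hypothetical.

```python
def jdbc_url_list(hosts, port=9160, keyspace="EVENT_KS"):
    """Join one JDBC URL per Cassandra host into a comma-separated string.

    The URL scheme and port are assumptions, not taken from the product.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    urls = [f"jdbc:cassandra://{host}:{port}/{keyspace}" for host in hosts]
    return ",".join(urls)


if __name__ == "__main__":
    # Placeholder host names for a three-node cluster.
    print(jdbc_url_list(["node3.example.com",
                         "node4.example.com",
                         "node5.example.com"]))
```

The resulting string would be used as the URL value of the datasource in place of a single URL.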
This is for Cassandra version 1.2.13. These configurations are done in the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file for all nodes as follows.
Additionally, you need to configure the
Set the externalCassandra property to true if you are connecting to an external Cassandra cluster.
If you are using Hive analyzing functions, update the replication factor in the
This is for Cassandra version 1.1.3. These configurations are done in the master-datasources.xml file in all nodes.
Additionally, you need to configure the
Optionally, to view the cluster information in the Cassandra Keyspaces List UI, add a file named cassandra-endpoint.xml in <BAM_HOME>/repository/conf/etc with the following configuration. The cassandra-endpoint.xml file is required when deploying the backend Cassandra cluster in an IaaS such as AWS. An IaaS may not expose real IPs, so this configuration file is needed to list the mapped real IPs.
When configuring an external Cassandra cluster, you must additionally enable clustering in the
After starting the Cassandra cluster, you can verify its status using a NodeTool command. For example, the following command uses NodeTool to access the Cassandra keyspaces. (Port 9999 is the JMX port.)
./nodetool -u admin -pw admin -h localhost -p 9999 cfstats
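If you want to run the same check against every node, a small wrapper can build the command line for each host. The Python sketch below is one way to do that, reusing the flags, credentials, and JMX port from the command above as defaults. The function names and host list are hypothetical, and the ring subcommand is used only as an example of another nodetool subcommand.

```python
import subprocess


def nodetool_command(host, subcommand, jmx_port=9999,
                     user="admin", password="admin",
                     nodetool="./nodetool"):
    """Build the argument list for one nodetool invocation."""
    return [nodetool, "-u", user, "-pw", password,
            "-h", host, "-p", str(jmx_port), subcommand]


def run_on_nodes(hosts, subcommand="ring"):
    """Run a nodetool subcommand on each host and collect the results.

    Returns a dict mapping host to (exit code, standard output).
    """
    results = {}
    for host in hosts:
        proc = subprocess.run(nodetool_command(host, subcommand),
                              capture_output=True, text=True)
        results[host] = (proc.returncode, proc.stdout)
    return results


if __name__ == "__main__":
    # Must be run from the directory that contains nodetool.
    for host, (code, output) in run_on_nodes(["localhost"]).items():
        print(f"{host}: exit code {code}")
        print(output)
```

A non-zero exit code for a host suggests that the node is down or that its JMX port is not reachable.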
You can connect to the Cassandra cluster using the Cassandra CLI tool. For example, the following commands are used to access the EVENT_KS Cassandra keyspace using Cassandra CLI.
When configuring the Cassandra cluster in this setup, you need to do the following for the Cassandra keyspaces feature to function and to list the Cassandra keyspaces in the Main menu of the WSO2 BAM management console.
If you are using internal Cassandra, which is shipped with WSO2 BAM, both BAM nodes and Cassandra nodes should be in the same clustering domain.
If you are using external Cassandra, change the following configuration in the <BAM_HOME>/repository/conf/etc/cassandra.yaml file to use the