This documentation is for older WSO2 products.
Configuring Cassandra Cluster - Clustering Guide 4.2.0 - WSO2 Documentation

Data that comes to BAM through data receivers is usually stored in the default Cassandra database. The image above shows how the Cassandra databases of the BAM nodes are deployed in a cluster. Clustering ensures that data can still be received and stored in the other databases if one node fails, and keeps the data highly available for running Hive scripts.

Information to know before you start

  • Increase the heap memory size of the BAM nodes to at least 2 GB, and synchronize the system time on all nodes.
  • BAM 2.4.0 uses Cassandra version 1.1.3 while BAM 2.4.1 uses Cassandra version 1.2.13.
  • The fully distributed BAM setup uses nodes 3, 4 and 5, so this topic shows the configurations for those nodes. Change the configurations accordingly if you are using a different setup.
  • You can start the BAM server using the Cassandra profile, so that BAM acts as a Cassandra node in your cluster. See Running the Product on a Preferred Profile for more information on how to do this.
  • For instructions on using external Cassandra with WSO2 BAM, see Connecting to External Cassandra.
  1. Add the following configurations to the <BAM_HOME>/repository/conf/etc/cassandra.yaml file on the nodes mentioned below.

    In WSO2 BAM 2.4.1, we use Cassandra version 1.2.13. You can generate tokens for the nodes using the script available in Apache Cassandra Documentation - Generating Tokens.

    In WSO2 BAM 2.4.0, we use Cassandra version 1.1.3. You can generate tokens for the nodes using the script available in http://www.datastax.com/docs/0.8/install/cluster_init#calculating-tokens-for-a-single-data-center

    To node3:

    cluster_name: Test Cluster
    initial_token: 0
    seed_provider:
           - seeds: "node3,node4,node5"
    listen_address: node3
    rpc_address: node3
    rpc_port: 9160

    For Cassandra 1.2.13 (in BAM 2.4.1) the initial_token value cannot be 0. You must enter the value generated by the script.

    To node4:

    cluster_name: Test Cluster
    initial_token: 56713727820156410577229101238628035242
    seed_provider:
           - seeds: "node3,node4,node5"
    listen_address: node4
    rpc_address: node4
    rpc_port: 9160

    To node5:

    cluster_name: Test Cluster
    initial_token: 113427455640312821154458202477256070485
    seed_provider:
           - seeds: "node3,node4,node5"
    listen_address: node5
    rpc_address: node5
    rpc_port: 9160
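The initial_token values shown above for node3, node4 and node5 are evenly spaced over a ring of size 2^127, which is the token range of Cassandra's RandomPartitioner. The following is a minimal sketch of that arithmetic, not the script referenced above; the function name `initial_tokens` is hypothetical. It assumes the nodes use RandomPartitioner; with another partitioner the token range differs, so use the values produced by the referenced script in that case.

```python
# Sketch: evenly spaced initial_token values on a ring of size 2**127.
# Assumption: the nodes use RandomPartitioner, as the sample tokens in this guide suggest.

RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return one evenly spaced token per node, starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact for these very large numbers.
    return [(i * RING_SIZE) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in zip(("node3", "node4", "node5"), initial_tokens(3)):
        print(f"{node}: initial_token: {token}")
```

For three nodes this reproduces the values 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485 used in this step.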
  2. Connect the nodes to Cassandra endpoints.

    This is for Cassandra version 1.2.13. Change the hector-config.xml file in all nodes as follows.

    <Cassandra>
        <Cluster>
            <Name>Test Cluster</Name>
            <Nodes>node3:9160,node4:9160,node5:9160</Nodes>
            <DefaultPort>9160</DefaultPort>
            <AutoDiscovery disable="false" delay="1000"/>
        </Cluster>
    </Cassandra>

    This is for Cassandra version 1.1.3. Change the cassandra-component.xml file in all nodes as follows.

    <Cassandra>
        <Cluster>
            <Name>Test Cluster</Name>
            <Nodes>node3:9160,node4:9160,node5:9160</Nodes>
            <DefaultPort>9160</DefaultPort>
            <AutoDiscovery disable="false" delay="1000"/>
        </Cluster>
    </Cassandra>
  3. Edit the <BAM_HOME>/repository/conf/advanced/streamdefn.xml file in all nodes as follows. This sets the replication factor and the read/write consistency levels that data receivers use when writing data to Cassandra. For example, if you have four Cassandra nodes in the cluster, enter 3 as the value for the <ReplicationFactor> property.

    <StreamDefinition>
    	<ReplicationFactor>3</ReplicationFactor>
    	<ReadConsistencyLevel>QUORUM</ReadConsistencyLevel>
    	<WriteConsistencyLevel>ONE</WriteConsistencyLevel>
    	<StrategyClass>org.apache.cassandra.locator.SimpleStrategy</StrategyClass>
    </StreamDefinition>
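To see how the replication factor interacts with the consistency levels in the configuration above, it can help to work out how many replicas each level must reach. The sketch below uses the standard Cassandra rule that QUORUM means a majority of replicas (replication factor / 2, rounded down, plus 1). The function names are hypothetical and only the ONE, QUORUM and ALL levels are covered.

```python
# Sketch: replicas that must respond for a given consistency level.
# Assumption: only the ONE, QUORUM and ALL levels are modelled here.


def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge an operation at this level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def reads_overlap_writes(read_level, write_level, replication_factor):
    """True when every read is guaranteed to touch a replica holding the latest write."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    # Values from the streamdefn.xml example: RF 3, read QUORUM, write ONE.
    print(replicas_required("QUORUM", 3))
    print(reads_overlap_writes("QUORUM", "ONE", 3))
```

With a replication factor of 3, a QUORUM read contacts 2 replicas and a ONE write is acknowledged by 1, so the two sets are not guaranteed to overlap. This favors write availability; a read issued right after a write may not return it until replication catches up.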
  4. Configure the datasources. When load balancing is required, add the set of JDBC URLs as a comma-separated list.

    This is for Cassandra version 1.2.13. These configurations are done in the <BAM_HOME>/repository/conf/datasources/bam-datasources.xml file for all nodes as follows.

    <datasource>
    	<name>WSO2BAM_CASSANDRA_DATASOURCE</name>
    	<description>The datasource used for Cassandra data</description>
    	<definition type="RDBMS">
    		<configuration>
    			<url>jdbc:cassandra://node3:9160/EVENT_KS,jdbc:cassandra://node4:9160/EVENT_KS,jdbc:cassandra://node5:9160/EVENT_KS</url>
    			<username>admin</username>
    			<password>admin</password>
    		</configuration>
    	</definition>
    </datasource>

    Additionally, you need to configure the WSO2BAM_UTIL_DATASOURCE as follows.

    Change the externalCassandra property of this datasource to true if you are connecting to an external Cassandra cluster.

    <datasource>
            <name>WSO2BAM_UTIL_DATASOURCE</name>
            <description>The datasource used for BAM utilities, such as message store etc..</description>
            <definition type="RDBMS">
                    <configuration>
                            <url>jdbc:cassandra://localhost:9160/BAM_UTIL_KS</url>
                            <username>admin</username>
                            <password>admin</password>
                            <dataSourceProps>
                                  <property name="externalCassandra">false</property>
                            </dataSourceProps>
                    </configuration>
            </definition>
    </datasource>

    If you are using Hive analyzing functions, update the replication factor in the WSO2BAM_HIVE_INCREMENTAL_DATASOURCE as follows.

    <datasource>
        <name>WSO2BAM_HIVE_INCREMENTAL_DATASOURCE</name>
        <definition type="RDBMS">
            <configuration>
                <username>admin</username>
                <password>admin</password>
                <dataSourceProps>
                    <property name="replicationFactor">1</property>
                    <property name="strategyClass">org.apache.cassandra.locator.SimpleStrategy</property>
                    <property name="readConsistencyLevel">QUORUM</property>
                    <property name="writeConsistencyLevel">QUORUM</property>
                    <property name="keyspaceName">HIVE_INCREMENTAL_KS</property>
                </dataSourceProps>
            </configuration>
        </definition>
    </datasource>

    This is for Cassandra version 1.1.3. These configurations are done in the <BAM_HOME>/repository/conf/datasources/master-datasources.xml file in all nodes.

    <datasource>
    	<name>WSO2BAM_CASSANDRA_DATASOURCE</name>
    	<description>The datasource used for Cassandra data</description>
    	<definition type="RDBMS">
    		<configuration>
    			<url>jdbc:cassandra://node3:9160/EVENT_KS,jdbc:cassandra://node4:9160/EVENT_KS,jdbc:cassandra://node5:9160/EVENT_KS</url>
    			<username>admin</username>
    			<password>admin</password>
    		</configuration>
    	</definition>
    </datasource>

    Additionally, you need to configure the WSO2BAM_UTIL_DATASOURCE.

    <datasource>
    	<name>WSO2BAM_UTIL_DATASOURCE</name>
    	<description>The datasource used for BAM utilities, such as message store etc..</description>
    	<definition type="RDBMS">
    		<configuration>
    			<url>jdbc:cassandra://localhost:9160/BAM_UTIL_KS</url>
    			<username>admin</username>
    			<password>admin</password>
    		</configuration>
    	</definition>
    </datasource>
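The comma-separated value of the <url> element in WSO2BAM_CASSANDRA_DATASOURCE follows a regular pattern: one jdbc:cassandra://host:port/keyspace entry per node. The sketch below builds that string from a list of host names so that it can be pasted into the datasource file. The function name `cassandra_jdbc_urls` is hypothetical, and the default port of 9160 is taken from the examples in this guide.

```python
# Sketch: build the comma-separated JDBC URL list for a set of Cassandra nodes.
# Assumption: every node exposes the same port and keyspace, as in this guide.


def cassandra_jdbc_urls(hosts, keyspace, port=9160):
    """Return one jdbc:cassandra URL per host, joined with commas."""
    if not hosts:
        raise ValueError("at least one host is required")
    return ",".join(f"jdbc:cassandra://{host}:{port}/{keyspace}" for host in hosts)


if __name__ == "__main__":
    print(cassandra_jdbc_urls(["node3", "node4", "node5"], "EVENT_KS"))
```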
  5. Optionally, to view the cluster information in the Cassandra Keyspaces List UI, add a file named cassandra-endpoint.xml to <BAM_HOME>/repository/conf/etc with the following configuration. The cassandra-endpoint.xml file is required when deploying the backend Cassandra cluster in an IaaS such as AWS. An IaaS may not provide real IPs, so use this configuration file to list the mapped real IPs.

    <Cassandra>
     <EndPoints>
        <EndPoint><HostName>name_of_machine1(BAM N1)</HostName></EndPoint>
        <EndPoint><HostName>name_of_machine2(BAM N2)</HostName></EndPoint>
     </EndPoints>
    </Cassandra>

    When configuring an external Cassandra cluster, you must additionally enable clustering in the <BAM_HOME>/repository/conf/axis2/axis2.xml file.

    <clustering class="org.wso2.carbon.core.clustering.hazelcast.HazelcastClusteringAgent" enable="true">
  6. After starting the Cassandra cluster, you can verify the status of the cluster using a NodeTool command. For example, the following command accesses the Cassandra keyspaces via NodeTool. (Port 9999 is the JMX port.)

    ./nodetool -u admin -pw admin -h localhost -p 9999 cfstats
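Before running NodeTool, it can be useful to confirm that each node accepts TCP connections on the port used in the configurations above (9160 in this guide). The following is a minimal sketch of such a reachability check using only the Python standard library. It is not a replacement for NodeTool and reports only whether the port is open, not the health of Cassandra itself. The function names are hypothetical.

```python
# Sketch: check that a TCP port is reachable on each cluster node.
# Assumption: host names such as node3 resolve from the machine running this check.
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_nodes(hosts, port=9160, timeout=2.0):
    """Map each host name to whether its port accepted a connection."""
    return {host: port_is_open(host, port, timeout) for host in hosts}

# Example call (not executed here): check_nodes(["node3", "node4", "node5"])
```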

  • You can connect to the Cassandra cluster using the Cassandra CLI tool. For example, the following commands are used to access the EVENT_KS Cassandra keyspace using Cassandra CLI.

    ./cassandra-cli -h localhost -pw admin -u admin
    show keyspaces;
    use EVENT_KS;
    show schema EVENT_KS;

    When configuring the Cassandra cluster in this setup, you need to do the following for the Cassandra keyspaces feature to function and list the Cassandra keyspaces in the Main menu of the WSO2 BAM management console.

    • If you are using internal Cassandra, which is shipped with WSO2 BAM, both BAM nodes and Cassandra nodes should be in the same clustering domain.

    • If you are using external Cassandra, change the following configuration in the <BAM_HOME>/repository/conf/etc/cassandra.yaml file to use the AllowAllAuthenticator.

      authenticator: AllowAllAuthenticator