This page guides you through setting up deployment pattern 2, which is an HA clustered deployment of WSO2 Identity Server with WSO2 Identity Analytics. For more information about deployment pattern 2 and its high-level architecture, see Deployment Patterns - Pattern 2.
Minimum High Availability Deployment for WSO2 IS Analytics
This section explains how to configure WSO2 Identity Server Analytics in a distributed setup. You can configure alerts to monitor these APIs and detect unusual activity, manage locations via geo location statistics, and carry out detailed analysis of logs relating to the APIs. WSO2 IS Analytics is powered by WSO2 DAS. The following diagram indicates the minimum deployment pattern used for high availability.
WSO2 Identity Server Analytics supports a deployment scenario that focuses on high availability (HA) along with HA processing. To enable HA processing, you should have two WSO2 IS Analytics servers in a cluster.
For this deployment, both nodes should be configured to receive all events. To achieve this, clients can either send all the requests to both nodes or send each request to any one of the two nodes (i.e., using load balancing or failover mechanisms). If clients send all the requests to both nodes, the user has to specify that events are duplicated in the cluster (i.e., the same event comes to all the members of the cluster). Alternatively, if a client sends a request to one node, that node internally sends the request to the other node as well. This way, even if the clients send requests to only one node, both IS Analytics nodes receive all the requests.
In this scenario, one IS Analytics node works in active mode and the other works in passive mode. However, both nodes process all the data.
If the active node fails, the other node becomes active and receives all the requests.
When the failed node is up again, it fetches all the internal states of the current active node via synching.
The newly arrived node then becomes the passive node and starts processing all the incoming messages to keep its state synched with the active node so that it can become active if the current active node fails.
Warning: Some of the requests may be lost during the time the passive node switches to the active mode.
Before you configure a minimum high availability IS Analytics cluster, the following needs to be carried out.
- Download the WSO2 IS Analytics distribution. Click DOWNLOAD ANALYTICS in the WSO2 Identity and Access Management page.
- Take the following steps to install WSO2 IS Analytics. Since this procedure is identical to installing WSO2 Data Analytics Server (DAS), these steps take you to the DAS documentation for details.
- Follow the steps below to set up MySQL.
Download and install MySQL Server.
Download the MySQL JDBC driver.
Unzip the downloaded MySQL driver zipped archive, and copy the MySQL JDBC driver JAR (mysql-connector-java-x.x.xx-bin.jar) into the <IS Analytics_HOME>/repository/components/lib directory of all the nodes in the cluster.
- Enter the following command in a terminal/command window, where username is the username you want to use to access the databases.
mysql -u username -p
- When prompted, specify the password that will be used to access the databases with the username you specified.
Create two databases named
About using MySQL in different operating systems
For users of Microsoft Windows, when creating the database in MySQL, it is important to specify the character set as latin1. Failure to do this may result in an error (error code: 1709) when starting your cluster. This error occurs in certain versions of MySQL (5.6.x) and is related to the UTF-8 encoding. MySQL originally used the latin1 character set by default, which stores each character in a single byte. However, in recent versions, MySQL defaults to UTF-8 to be friendlier to international users. Hence, you must use latin1 as the character set, as indicated in the database creation command below, to avoid this problem. Note that this may result in issues with non-Latin characters (such as Hebrew or Japanese). Your database creation command should look as follows.
mysql> create database <DATABASE_NAME> character set latin1;
For users of other operating systems, the standard database creation commands will suffice. For these operating systems, the following is how your database creation command should look.
mysql> create database <DATABASE_NAME>;
Execute the following script for the two databases you created in the previous step.
mysql> source <IS Analytics_HOME>/dbscripts/mysql.sql;
From WSO2 Carbon Kernel 4.4.6 onwards, there are two MySQL DB scripts available in the product distribution. Use the script that matches your MySQL version.
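The database creation and script execution steps above can be combined into a small shell helper. The following is a minimal sketch rather than tooling shipped with the product: the function name print_db_setup_sql, the database names EXAMPLE_DB_1 and EXAMPLE_DB_2, and the installation path are hypothetical placeholders. The function only prints the statements, so you can review them before passing them to the mysql client.

```shell
# Sketch: print the MySQL statements that create one database and run the
# product's mysql.sql script against it.
#   $1 - database name (hypothetical placeholder)
#   $2 - path to the IS Analytics home directory
#   $3 - "windows" to add "character set latin1"; anything else uses the default
print_db_setup_sql() {
  local db_name="$1"
  local analytics_home="$2"
  local os_type="${3:-other}"
  if [ "$os_type" = "windows" ]; then
    printf 'create database %s character set latin1;\n' "$db_name"
  else
    printf 'create database %s;\n' "$db_name"
  fi
  printf 'use %s;\n' "$db_name"
  printf 'source %s/dbscripts/mysql.sql;\n' "$analytics_home"
}

# Example usage (commented out because it needs a running MySQL server):
# for db in EXAMPLE_DB_1 EXAMPLE_DB_2; do
#   print_db_setup_sql "$db" /opt/wso2is-analytics
# done | mysql -u username -p
```

Printing the statements first keeps the sketch safe to run anywhere, and the same function covers both the Windows and the non-Windows form of the create command.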
Configure the datasource in the <IS Analytics_HOME>/repository/conf/analytics/analytics-conf.xml file as shown in the code extract below. As it is possible to maintain the data in one database, you can point all three datasources to a single database.
Alternatively, if you want to separate the data logically, create the following two databases in MySQL and point to the respective database as shown in the extract below.
When configuring the minimum high availability cluster, complete the following steps on both nodes.
- Do the following database-related configurations.
Follow the steps below to configure the <IS Analytics_HOME>/repository/conf/datasources/master-datasources.xml file as required.
Note that you can point all these datasources to a single database as it is not technically necessary to separate the data into different databases. However, if required, you can have separate databases as well.
The steps given below demonstrate the flow assuming you have created separate databases for each. If you are using a single database instead, simply point the datasources indicated below to a single database.
Enable all the nodes to access the users database by configuring a datasource to be used by user manager as shown below.
Enable the nodes to access the registry database by configuring the WSO2REG_DB data source as follows.
For detailed information about registry sharing strategies, see the library article Sharing Registry Space across Multiple Product Instances.
Point to your database in the <IS Analytics_HOME>/repository/conf/datasources/analytics-datasources.xml file as shown below.
For more information, see Datasources in DAS documentation.
To share the user store among the nodes, open the <IS Analytics_HOME>/repository/conf/user-mgt.xml file and modify the dataSource property of the <configuration> element as follows.
The datasource name specified in this configuration should be the same as the datasource used by user manager that you configured in sub step a, i.
In the <IS Analytics_HOME>/repository/conf/registry.xml file, add or modify the dataSource attribute of the <dbConfig name="govregistry"> element as follows.
Do not replace the following configuration when adding in the mounting configurations. The registry mounting configurations mentioned in the above steps should be added in addition to the following.
- Update the <IS Analytics_HOME>/repository/conf/axis2/axis2.xml file as follows to enable Hazelcast clustering for both nodes.
Set the clustering flag to true as shown below to enable Hazelcast clustering.
Enable wka mode on both nodes as shown below. For more information on wka mode, read About membership schemes.
Add both the nodes as well-known members in the cluster under the members tag in each node as shown in the example below.
For each node, enter the respective server IP address as the value for the localMemberHost property as shown below.
- Update the <IS Analytics_HOME>/repository/conf/event-processor.xml file as follows to cluster IS Analytics in the Receiver.
Enable the HA mode by setting the following property.
Disable the Distributed mode by setting the following property.
For each node, enter the respective server IP address under the HA mode Config section as shown in the example below.
When you enable the HA mode for WSO2 IS Analytics, the following are enabled by default:
- State persistence: If there is no real-time use case that requires any state information after starting the cluster, you should disable event persistence by updating the <IS Analytics_HOME>/repository/conf/event-processor.xml file as shown below.
When state persistence is enabled for WSO2 IS Analytics, the internal state of IS Analytics is persisted in files. These files are not automatically deleted. Therefore, if you want to save space in your IS Analytics pack, you need to delete them manually.
These files are created in the <IS Analytics_HOME>/cep_persistence/<tenant-id> directory. This directory has a separate sub-directory for each execution plan. Each execution plan can have multiple files. The format of each file name is <timestamp>_<execution plan name> (e.g., 1493101044948_MyExecutionPlan). If you want to clear files for a specific execution plan, you need to leave the two files with the latest timestamps and delete the rest.
- Event synchronization: If you set the event.duplicated.in.cluster=true property for an event receiver configured in a node, IS Analytics does not perform event synchronization for that receiver.
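The manual clean-up of state persistence files described above can be scripted. The following is a minimal sketch, assuming that every file name in an execution plan's sub-directory starts with a numeric timestamp followed by an underscore, as in 1493101044948_MyExecutionPlan. The function name prune_persistence_files is hypothetical and is not part of the product. Try it on a copy of the directory first.

```shell
# Sketch: delete all but the two newest persistence files in the
# sub-directory of ONE execution plan.
#   $1 - path to that sub-directory, e.g. a directory under
#        <IS Analytics_HOME>/cep_persistence/<tenant-id>
prune_persistence_files() {
  local plan_dir="$1"
  if [ ! -d "$plan_dir" ]; then
    echo "not a directory: $plan_dir" >&2
    return 1
  fi
  # Print the matching file names, sort them by the numeric timestamp
  # prefix (newest first), skip the first two, and delete the rest.
  for f in "$plan_dir"/[0-9]*_*; do
    if [ -f "$f" ]; then
      basename "$f"
    fi
  done | sort -t _ -k 1,1nr | tail -n +3 |
  while IFS= read -r name; do
    rm -f -- "$plan_dir/$name"
  done
}
```

The sort is on the timestamp embedded in the file name rather than on the file modification time, so the result does not change if the files have been copied or touched.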
The following node types are configured for the HA deployment mode in the
eventSync: Both the active and the passive nodes in this setup are event synchronizing nodes as explained in the introduction. Therefore, each node should have the host and the port on which it is operating specified under the
Note that the eventSync port is not automatically updated to the port in which each node operates via port offset.
management: In this setup, both the nodes carry out the same tasks, and therefore, both nodes are considered manager nodes. Therefore, each node should have the host and the port on which it is operating specified under the
Note that the management port is not automatically updated to the port in which each node operates via port offset.
presentation: You can optionally specify only one of the two nodes in this setup as the presenter node. The dashboards in which processed information is displayed are configured only in the presenter node. Each node should have the host and the port on which the assigned presenter node is operating specified under the <presentation> element. The host and the port as well as the other configurations under the <presentation> element are effective only when the presenter enable="false" property is set under the <!-- HA Mode Config --> section.
- Update the <IS Analytics_HOME>/repository/conf/analytics/spark/spark-defaults.conf file as follows to use the Spark cluster embedded within IS Analytics.
- Keep the Spark master setting as local. This instructs Spark to create a Spark cluster using the Hazelcast cluster.
- Enter 2 as the value for the carbon.spark.master.count configuration. This specifies that there should be two controllers in the Spark cluster. One controller serves as an active controller and the other serves as a stand-by controller.
The following example shows the <IS Analytics_HOME>/repository/conf/analytics/spark/spark-defaults.conf file with the changes mentioned above.
For more information, see Spark Configurations in DAS documentation.
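After editing spark-defaults.conf, a quick check from the command line can confirm that the controller count is set as described. The following is a minimal sketch that assumes each setting is written as a key and a value separated by whitespace on one line. The helper names spark_conf_value and check_spark_master_count are hypothetical and are not part of the product.

```shell
# Sketch: read one setting from a spark-defaults.conf style file.
# Assumes "<key> <value>" lines separated by whitespace.
#   $1 - path to the conf file, $2 - key to look up
spark_conf_value() {
  awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Sketch: verify that carbon.spark.master.count is 2, as this HA setup requires.
#   $1 - path to the conf file
check_spark_master_count() {
  local conf_file="$1"
  local count
  count="$(spark_conf_value "$conf_file" carbon.spark.master.count)"
  if [ "$count" = "2" ]; then
    echo "carbon.spark.master.count is 2"
  else
    echo "carbon.spark.master.count is '${count}', expected 2" >&2
    return 1
  fi
}

# Example usage (the path is a placeholder for your installation):
# check_spark_master_count "/opt/wso2is-analytics/repository/conf/analytics/spark/spark-defaults.conf"
```

Running the check on both nodes helps catch a node that was left with a different value.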
Important: If the path to <IS Analytics_HOME> is different in the two nodes, please do the following.
Create a symbolic link to <IS Analytics_HOME> in both nodes, where the paths of those symbolic links are identical. This ensures that a common path can be used when IS Analytics is accessed through the symbolic link. To do this, set the following property in the
In the Windows environment there is a strict requirement to have both IS Analytics distributions in a common path.
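On Linux, the symbolic link described above can be created with ln -s. The following is a minimal sketch: the function name link_analytics_home and the example paths are hypothetical, and the function deliberately refuses to overwrite anything that already exists at the link path.

```shell
# Sketch: create a symbolic link to the local IS Analytics home at a path
# that is identical on both nodes.
#   $1 - actual <IS Analytics_HOME> on this node
#   $2 - common link path to use on every node (placeholder)
link_analytics_home() {
  local real_home="$1"
  local link_path="$2"
  if [ ! -d "$real_home" ]; then
    echo "no such directory: $real_home" >&2
    return 1
  fi
  # Refuse to replace an existing file, directory, or link.
  if [ -e "$link_path" ] || [ -L "$link_path" ]; then
    echo "already exists: $link_path" >&2
    return 1
  fi
  ln -s "$real_home" "$link_path"
}

# Example usage on each node (paths are illustrative only):
# link_analytics_home /home/user1/wso2is-analytics /opt/analytics_symlink
# cd /opt/analytics_symlink   # should succeed on both nodes
```

Run it on each node with that node's real installation directory and the same link path.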
In order to share the C-Apps deployed among the nodes, configure the SVN-based deployment synchronizer. For detailed instructions, see Configuring SVN-Based Deployment Synchronizer.
The IS Analytics Minimum High Availability Deployment setup does not use a manager and a worker. For the purpose of configuring the deployment synchronizer, you can add the configurations relevant to the manager for the node of your choice, and add the configurations relating to the worker for the other node.
If you do not configure the deployment synchronizer, you are required to deploy any C-App you use in the IS Analytics Minimum High Availability Deployment setup to both the nodes.
If the physical IS Analytics server has multiple network interfaces with different IPs, and if you want Spark to use a specific interface IP, open either the <IS Analytics_HOME>/bin/load-spark-env-vars.sh file (for Linux) or the <IS Analytics_HOME>/bin/load-spark-env-vars.bat file (for Windows), and add the following parameter to configure the Spark IP address.
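Apache Spark reads the address it should bind to from the SPARK_LOCAL_IP environment variable. Assuming that this is the parameter intended here, the line added to the Linux script would look like the following sketch. The address 192.0.2.10 is a documentation placeholder and must be replaced with the IP of the interface you want Spark to use.

```shell
# Assumption: SPARK_LOCAL_IP is the standard Apache Spark environment
# variable for the local bind address. 192.0.2.10 is a placeholder.
export SPARK_LOCAL_IP="192.0.2.10"
```

In the Windows .bat file, the equivalent line would use set instead of export.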
Starting the cluster
Once you complete the configurations mentioned above, start the two IS Analytics nodes. If the cluster is successfully configured, the following CLI logs are generated.
The following is displayed in the CLIs of both nodes, and it indicates that the registry mounting is successfully done.
A CLI log similar to the following is displayed for the first node you start to indicate that it has successfully started.
Once you start the second node, a CLI log similar to the following will be displayed for the first node to indicate that another node has joined the cluster.
A CLI log similar to the following is displayed for the second node once it joins the cluster.
The following are some exceptions you may see in the startup log when you start the cluster.
When you start the passive node of the HA cluster, the following errors are displayed.
This is because the artifacts are yet to be deployed in the passive node even though it has received the sync message from the active node. This error is no longer displayed once the start up for the passive node is complete.
When the Apache Spark cluster is not properly instantiated, the following errors are displayed.
All the nodes in the Spark cluster should be started in order to stop this exception from occurring.
Testing the HA deployment
The HA deployment you configured can be tested as follows.
- Access the Spark UIs of the active controller and the stand-by controller using <node ip>:8081 in each node.
- Information relating to the active controller is displayed as shown in the example below.
- Information relating to the stand-by controller is displayed as shown in the example below.
- Click the links under Running Applications in the Spark UI of the active controller to check the Spark application UIs of those applications. A working application is displayed as shown in the following example.
- Click the Environment tab of a Spark application UI to check whether all the configuration parameters are correctly set. You can also check whether the class path variables in this tab can be accessed manually.
- Check the Spark UIs of workers to check whether they have running executors. If a worker UI does not have running executors or if it is continuously creating executors, it indicates an issue in the Spark cluster configuration. The following example shows a worker UI with a running executor.
- Check the symbolic parameter, and check whether you can manually access it via a cd <directory> command in the CLI.
- Log into the IS Analytics Management Console and navigate to Main => Manage => Batch Analytics => Console to open the Interactive Analytics Console. Run a query in this console.