This section explains how to start a DAS instance and connect it to an external Apache Spark cluster, i.e., a Spark cluster that runs in a location separate from the DAS instance.
Prerequisites
You need access to an existing external Apache Spark cluster. For more information about configuring an Apache Spark cluster, see Apache Spark Documentation - Cluster Mode Overview.
Configuring DAS to connect to an external Apache Spark cluster
Download WSO2 DAS from here. Unzip the downloaded file in your node, as well as in each node of the external Apache Spark cluster.
- Set up MySQL as follows in your node, as well as in each node of the external Apache Spark cluster. WSO2 DAS is configured with MySQL in this scenario because the datasource used needs to be of a type that the external Apache Spark cluster can access.
Download and install MySQL Server.
Download the MySQL JDBC driver.
Unzip the downloaded MySQL driver archive, and copy the MySQL JDBC driver JAR (mysql-connector-java-x.x.xx-bin.jar) into the <DAS_HOME>/repository/components/lib directory of all the nodes in the cluster.
- Enter the following command in a terminal/command window, where username is the username you want to use to access the databases.
mysql -u username -p
- When prompted, specify the password that will be used to access the databases with the username you specified.
Create two databases named userdb and regdb.
About using MySQL in different operating systems
For users of Microsoft Windows, when creating the database in MySQL, it is important to specify the character set as latin1. Failure to do this may result in an error (error code: 1709) when starting your cluster. This error occurs in certain versions of MySQL (5.6.x) and is related to UTF-8 encoding. MySQL originally used the latin1 character set by default, which stores each character in a single byte. In recent versions, however, MySQL defaults to UTF-8, which uses multi-byte characters, to be friendlier to international users, and this can trigger the error above. Hence, you must use latin1 as the character set as indicated below in the database creation commands to avoid this problem. Note that this may result in issues with non-latin characters (such as Hebrew or Japanese). The following is how your database creation command should look.
mysql> create database <DATABASE_NAME> character set latin1;
For users of other operating systems, the standard database creation commands will suffice. For these operating systems, the following is how your database creation command should look.
mysql> create database <DATABASE_NAME>;
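For example, the two databases named above can be created as follows (append character set latin1 on Microsoft Windows, as explained in the note above).
mysql> create database userdb;
mysql> create database regdb;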
Execute the following script against each of the two databases you created in the previous step.
mysql> source <DAS_HOME>/dbscripts/mysql.sql;
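For example, you can switch to each database with the use command before sourcing the script, as shown below.
mysql> use userdb;
mysql> source <DAS_HOME>/dbscripts/mysql.sql;
mysql> use regdb;
mysql> source <DAS_HOME>/dbscripts/mysql.sql;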
- Create the following databases in MySQL.
  - WSO2_ANALYTICS_EVENT_STORE_DB
  - WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB
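For example, these can be created as shown below. Note that the datasource configuration in the next step also points to a WSO2_ANALYTICS_FS_DB database, so create that too if it does not already exist.
mysql> create database WSO2_ANALYTICS_EVENT_STORE_DB;
mysql> create database WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB;
mysql> create database WSO2_ANALYTICS_FS_DB;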
- Point to WSO2_ANALYTICS_FS_DB, WSO2_ANALYTICS_EVENT_STORE_DB and WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file as shown below. This configuration should be done in your node, as well as in each node of the external Apache Spark cluster.
<datasources-configuration>
    <providers>
        <provider>org.wso2.carbon.ndatasource.rdbms.RDBMSDataSourceReader</provider>
    </providers>
    <datasources>
        <datasource>
            <name>WSO2_ANALYTICS_FS_DB</name>
            <description>The datasource used for analytics file system</description>
            <definition type="RDBMS">
                <configuration>
                    <url>jdbc:mysql://[MySQL DB url]:[port]/WSO2_ANALYTICS_FS_DB</url>
                    <username>[username]</username>
                    <password>[password]</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>
        <datasource>
            <name>WSO2_ANALYTICS_EVENT_STORE_DB</name>
            <description>The datasource used for analytics record store</description>
            <definition type="RDBMS">
                <configuration>
                    <url>jdbc:mysql://[MySQL DB url]:[port]/WSO2_ANALYTICS_EVENT_STORE_DB</url>
                    <username>[username]</username>
                    <password>[password]</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>
        <datasource>
            <name>WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</name>
            <description>The datasource used for analytics record store</description>
            <definition type="RDBMS">
                <configuration>
                    <url>jdbc:mysql://[MySQL DB url]:[port]/WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB</url>
                    <username>[username]</username>
                    <password>[password]</password>
                    <driverClassName>com.mysql.jdbc.Driver</driverClassName>
                    <maxActive>50</maxActive>
                    <maxWait>60000</maxWait>
                    <testOnBorrow>true</testOnBorrow>
                    <validationQuery>SELECT 1</validationQuery>
                    <validationInterval>30000</validationInterval>
                    <defaultAutoCommit>false</defaultAutoCommit>
                </configuration>
            </definition>
        </datasource>
    </datasources>
</datasources-configuration>
For more information, see Datasources.
- Create a symbolic link pointing to the <DAS_HOME> in your DAS node, as well as in each node of the external Apache Spark cluster. The path in which this symbolic link is located should be the same on each node. You can use a command similar to the following to create the symbolic link (change the locations specified as required).
sudo ln -s /home/centos/das/wso2das-3.0.0-SNAPSHOT /mnt/das/das_symlink
This creates a symbolic link named das_symlink in the /mnt/das location.
In a multi-node DAS cluster that runs in a RedHat Linux environment, you also need to update the <DAS_HOME>/bin/wso2server.sh file with the following entry so that the <DAS_HOME> is exported, because the symbolic link may not be resolved correctly in this operating system.
export CARBON_HOME=<symbolic link>
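To verify that the link resolves to the correct location on each node, you can list it as shown below (assuming the illustrative /mnt/das/das_symlink path used above).
ls -l /mnt/das/das_symlink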
- Do the following in the <DAS_HOME>/repository/conf/analytics/spark/spark-defaults.conf file in your DAS node, as well as in all the nodes of the external Apache Spark cluster (see the sample configuration after this list).
  - Set the carbon.spark.master property to cluster.
  - Add the carbon.das.symbolic.link property and enter the symbolic link you created in the previous step as the value.
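For example, the relevant entries in spark-defaults.conf would look similar to the following (the symbolic link path shown is the illustrative /mnt/das/das_symlink used above).
carbon.spark.master cluster
carbon.das.symbolic.link /mnt/das/das_symlink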
- Start the external Apache Spark cluster if it is not already started.
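If you need to start a standalone Spark cluster manually, you can use Spark's standard scripts, similar to the following (run from the Spark installation directory; the master host and port are assumptions, so substitute your own values).
./sbin/start-master.sh
./sbin/start-slave.sh spark://<master-host>:7077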
- Start the DAS instance. It connects to the external Apache Spark cluster at startup.
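For example, on Linux the server can be started as shown below.
sh <DAS_HOME>/bin/wso2server.sh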
- Go to http://<external-spark-node-IP>:4040 and check whether the Spark application UI is displayed. The class paths listed in the spark.driver.extraClassPath property should include the symbolic link location (e.g., /mnt/das/das_symlink).
