
Click the relevant tab for instructions to deploy WSO2 ML in the preferred mode.

Before deploying WSO2 ML, follow the instructions in Production Deployment Guidelines

WSO2 ML is bundled with an inbuilt Apache Spark instance. In the standalone deployment pattern of WSO2 ML, Spark runs in local mode with one or more worker threads on the same machine. WSO2 ML is set to run in standalone mode by default. ML hosts the driver program of the Spark instance, which submits jobs to the Spark master.

The number of worker threads with which Spark runs can be set via the spark.master property in the <WSO2ML_HOME>/repository/conf/etc/spark-config.xml file. Possible values are as follows.

  • local — Runs Spark locally with one worker thread (no threads run in parallel).
  • local[k] — Runs Spark locally with k worker threads (ideally, set k to the number of cores on your machine).
  • local[*] — Runs Spark locally with as many worker threads as there are logical cores on your machine.
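For example, to use all logical cores, spark.master could be set as follows. This is an illustrative fragment that assumes the same <property name="…"> convention used by the h2o-config.xml file described later on this page; check your spark-config.xml for the exact element names.

```xml
<!-- <WSO2ML_HOME>/repository/conf/etc/spark-config.xml (illustrative fragment) -->
<property name="spark.master">local[*]</property>
```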

By default, WSO2 ML runs with an inbuilt Apache Spark instance. However, when working with big data, you can handle large data sets in a distributed environment through WSO2 ML. You can carry out data pre-processing and model building on an Apache Spark cluster to share the workload between the nodes of the cluster. Using a Spark cluster improves performance and reduces the time taken to build and train a machine learning model on a large data set.

Follow the steps below to run the ML jobs by connecting WSO2 ML to an external Apache Spark cluster.

  • When following the instructions below, the Apache Spark cluster must run Apache Spark version 1.4.1 with Apache Hadoop version 2.6 or later.
  • The Spark deployment pattern can be Standalone, YARN, or Mesos.
  • WSO2 ML is unaware of the underlying configuration of the Spark cluster. It only interacts with the Spark master to which the jobs are submitted.
  1. Press Ctrl+C to shut down the WSO2 ML server. For more information on shutting down the WSO2 ML server, see Running the Product.

  2. Create a directory named <SPARK_HOME>/ml/ and copy the following jar files into it. These jar files can be found in the <ML_HOME>/repository/components/plugins directory.
    • org.wso2.carbon.ml.core_1.1.3.jar
    • org.wso2.carbon.ml.commons_1.1.3.jar
    • org.wso2.carbon.ml.database_1.1.3.jar
    • kryo_2.24.0.wso2v1.jar
  3. Create a file named spark-env.sh in the <SPARK_HOME>/conf/ directory and add the following entries.


    Instead of performing steps 2 and 3, you can create a symbolic link pointing to <ML_HOME> on each node of the external Apache Spark cluster, following the steps below. The symbolic link must be located at the same path on each node.

    1. Issue a command similar to the command given below in order to create the symbolic link (change the location specified as required).
      sudo ln -s /home/ml/wso2ml-1.1.1 ml_symlink
    2. Open the <ML_HOME>/repository/conf/analytics/spark/spark-defaults.conf file, and enter the symbolic link you created in the previous step as the value for the carbon.das.symbolic.link property.
  4. Restart the external Spark cluster using the following commands:

    {SPARK_HOME}$ ./sbin/stop-all.sh
    {SPARK_HOME}$ ./sbin/start-all.sh 
  5. In the <ML_HOME>/repository/conf/analytics/spark/spark-defaults.conf file, enter the Spark master URL as the value of the carbon.spark.master property as shown in the example below.

    You can find the Spark master URL in the Apache Spark Web UI.

    carbon.spark.master  {SPARK_MASTER_URL}
  6. Restart the WSO2 ML server. For more information on restarting WSO2 ML server, see Running the Product.
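The jar-copying part of the steps above can be sketched as the following dry run. The <ML_HOME> and <SPARK_HOME> paths are stand-ins created in temporary directories so the commands can be followed end to end; in a real deployment, substitute your actual installation paths and skip the touch line (the jars already exist under <ML_HOME>/repository/components/plugins).

```shell
# Dry-run sketch of step 2: copying the ML jars into <SPARK_HOME>/ml/.
# Throwaway directories stand in for the real <ML_HOME> and <SPARK_HOME>.
ML_HOME=$(mktemp -d)/wso2ml-1.1.1
SPARK_HOME=$(mktemp -d)/spark-1.4.1
mkdir -p "$ML_HOME/repository/components/plugins" "$SPARK_HOME/ml"

# The four bundles listed in step 2 (empty stand-ins for the dry run):
for jar in org.wso2.carbon.ml.core_1.1.3.jar \
           org.wso2.carbon.ml.commons_1.1.3.jar \
           org.wso2.carbon.ml.database_1.1.3.jar \
           kryo_2.24.0.wso2v1.jar; do
  touch "$ML_HOME/repository/components/plugins/$jar"
  cp "$ML_HOME/repository/components/plugins/$jar" "$SPARK_HOME/ml/"
done

ls "$SPARK_HOME/ml"
```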

WSO2 DAS has an embedded Spark server which automatically creates a Spark cluster when DAS is started in clustered mode. Follow the steps below to run ML jobs by connecting WSO2 ML to a WSO2 DAS cluster that serves as a Spark cluster.

  1. Set up a DAS cluster using Carbon clustering, configured with at least one worker node. For more information on setting up a DAS cluster, see Clustering Data Analytics Server.

  2. Stop all DAS nodes. For more information on stopping DAS nodes, see Running the Product in DAS documentation.
  3. Start the DAS cluster again without initializing the Spark contexts of the CarbonAnalytics and ML features, using the following option when starting the cluster.

    This option disables the CarbonAnalytics Spark context.

  4. To configure ML to use DAS as the Spark cluster, set the following property in the <ML_HOME>/repository/conf/analytics/spark/spark-defaults.conf file.

    carbon.spark.master  {SPARK_MASTER_URL}

    {SPARK_MASTER_URL} should be replaced with the URL of the Spark cluster created with the DAS cluster.

  5. For the following two properties in the <ML_HOME>/repository/conf/analytics/spark/spark-defaults.conf file, enter values that are less than or equal to the resources allocated to the Spark workers in the DAS cluster. This ensures that ML does not request resources that the DAS Spark cluster cannot satisfy.
    • spark.executor.memory:

      spark.executor.memory {memory_in_m/g}


    • spark.executor.cores

      spark.executor.cores {number_of_cores}


  6. Start the ML server. For more information on starting WSO2 ML server, see Running the Product.
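Putting steps 4 and 5 together, the relevant section of the <ML_HOME>/repository/conf/analytics/spark/spark-defaults.conf file would look similar to the following. The master URL, memory, and core values are placeholders; substitute values matching your own DAS cluster.

```
# <ML_HOME>/repository/conf/analytics/spark/spark-defaults.conf (illustrative)
carbon.spark.master   spark://das-master.example.com:7077
spark.executor.memory 2g
spark.executor.cores  2
```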

Use this deployment method only when you are using deep learning algorithms.

The deep learning algorithms in WSO2 ML use the H2O library. Therefore, when using these algorithms, ML needs to connect to an H2O server. This connection can be made in one of the following two modes.

Local Mode — The H2O server starts along with the ML server.
Client Mode — ML connects to an external H2O cloud as a client node.

Click on the relevant tab for instructions to deploy ML with an external H2O cluster in the preferred mode.

This is the default scenario when ML is deployed in standalone mode. When H2O is set to local mode, the H2O server starts automatically along with the ML server. This is configured by setting the following property in the <ML_HOME>/repository/conf/etc/h2o-config.xml file.

<property name="mode">local</property>

This property is set by default.



To start H2O in client mode with ML, a running external H2O cluster is required. WSO2 ML currently uses the H2O Slater release. Therefore, the external H2O cluster in this scenario should be created from the same H2O version. To download this H2O version, follow the instructions on the official H2O website.

Starting ML with external H2O cluster

  1. Start the H2O server with the following command.
    java -jar h2o.jar -md5skip

    Make sure you include the -md5skip flag in the command to prevent the H2O cluster from comparing the MD5 checksums of the two h2o.jar files in the H2O cluster and in WSO2 ML. If a difference in the MD5 checksums is detected, the ML server may be refused access to the external H2O cluster.

    To start a customized H2O cluster, see H2O deployment documentation.

    You can view the configuration of the H2O server in the command-line output.

    The IP address and the name of the H2O cloud needed to connect to the external H2O cluster can be taken from this log. In this example, the name of the cloud is maheshakya. Note that the H2O cloud uses ports 54321 and 54322.

  2. Configure WSO2 ML to start H2O in client mode in order to connect to the external cluster. This configuration is done by setting the following properties in the  <ML_HOME>/repository/conf/etc/h2o-config.xml file.

    <property name="mode">client</property> 
    <property name="ip">{IP}</property>

    {IP} is the address of the H2O server.

    <property name="port">{PORT}</property>

    {PORT} should be a port that is not in use on the external node, e.g., 54345. Ports 54321 and 54322 cannot be used since they are used by the external H2O cluster.

    <property name="name">{H2O_CLOUD_NAME}</property>

    {H2O_CLOUD_NAME} is the name of the external H2O server.
  3. Start the ML server. The connection to the external H2O cluster is reflected in the command-line output.
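For reference, a complete client-mode configuration in the <ML_HOME>/repository/conf/etc/h2o-config.xml file would look similar to the following. The IP address, port, and cloud name are illustrative placeholders; substitute the values from your own H2O startup log.

```xml
<!-- illustrative client-mode fragment; all values are placeholders -->
<property name="mode">client</property>
<property name="ip">192.168.1.10</property>   <!-- hypothetical H2O server IP -->
<property name="port">54345</property>        <!-- any free port; not 54321 or 54322 -->
<property name="name">maheshakya</property>   <!-- cloud name from the H2O startup log -->
```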