This documentation is for WSO2 ML version 1.0.0.


WSO2 Machine Learner uses Apache Spark for integration with WSO2 Data Analytics Server (DAS). You can use the Spark server embedded within WSO2 ML or connect to an external Spark cluster. Follow the instructions for your preferred method in the relevant section below.

WSO2 Machine Learner (WSO2 ML) ships with an embedded Spark server for ease of use.

In order to create datasets out of the data tables created by WSO2 DAS, and then build models using the data collected in those tables, make sure that the following databases have the same URL in both the <ML_HOME>/repository/conf/datasources/analytics-datasources.xml file and the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file (see the sample entry after the list below).

  • WSO2_ANALYTICS_FS_DB
  • WSO2_ANALYTICS_EVENT_STORE_DB
  • WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB
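
For example, if the DAS file defines the event store datasource with the URL shown below, the ML file must use exactly the same URL for that datasource. This is only an illustrative sketch: the H2 path and the stock wso2carbon credentials are assumptions, so copy the actual values from your own DAS configuration.

    <datasource>
        <name>WSO2_ANALYTICS_EVENT_STORE_DB</name>
        <definition type="RDBMS">
            <configuration>
                <!-- Illustrative H2 URL; the value must match in both files -->
                <url>jdbc:h2:<DAS_HOME>/repository/database/ANALYTICS_EVENT_STORE;AUTO_SERVER=TRUE</url>
                <username>wso2carbon</username>
                <password>wso2carbon</password>
                <driverClassName>org.h2.Driver</driverClassName>
            </configuration>
        </definition>
    </datasource>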

Once WSO2 ML is started, you can proceed with dataset creation.

WSO2 ML can connect to an external Spark cluster and retrieve data using the Data Access Layer (DAL) of WSO2 DAS.


Prerequisites

In order to connect to an external Spark cluster, you need to do the following.

  • Set up an external Spark cluster with a master node and at least one worker node.

    Use the Spark 1.4.1 binary built for Hadoop 2.6.0.

  • Create an event stream in DAS and publish some events to it. (This is the data we are going to use as our dataset to perform predictive analysis.) 

Configurations in the Spark cluster

Follow the procedure below to do the required Spark cluster-related configurations.

  1. Create a directory named analytics in the <SPARK_HOME> directory and copy all of the following DAS-related JARs to it. These JARs can be found in the <DAS_HOME>/repository/components/plugins directory.
    • axiom_1.2.11.wso2v6.jar
    • axis2_1.6.1.wso2v14.jar
    • h2-database-engine_1.2.140.wso2v3.jar
    • hazelcast_3.5.0.wso2v1.jar
    • jdbc-pool_7.0.34.wso2v2.jar
    • lucene_5.2.1.wso2v1.jar
    • org.wso2.carbon.analytics.api_1.0.3.jar
    • org.wso2.carbon.analytics.dataservice.commons_1.0.3.jar
    • org.wso2.carbon.analytics.dataservice.core_1.0.3.jar
    • org.wso2.carbon.analytics.datasource.cassandra_1.0.3.jar
    • org.wso2.carbon.analytics.datasource.commons_1.0.3.jar
    • org.wso2.carbon.analytics.datasource.core_1.0.3.jar
    • org.wso2.carbon.analytics.datasource.hbase_1.0.3.jar
    • org.wso2.carbon.analytics.datasource.rdbms_1.0.3.jar
    • org.wso2.carbon.analytics.io.commons_1.0.3.jar
    • org.wso2.carbon.analytics.spark.admin_1.0.3.jar
    • org.wso2.carbon.analytics.spark.core_1.0.3.jar
    • org.wso2.carbon.analytics.spark.utils_1.0.3.jar
    • org.wso2.carbon.analytics.tools.backup_1.0.3.jar
    • org.wso2.carbon.analytics.tools.migration_1.0.3.jar
    • org.wso2.carbon.base_4.4.1.jar
    • org.wso2.carbon.core.common_4.4.1.jar
    • org.wso2.carbon.core.services_4.4.1.jar
    • org.wso2.carbon.core_4.4.1.jar
    • org.wso2.carbon.datasource.reader.hadoop_4.3.1.jar
    • org.wso2.carbon.ndatasource.common_4.4.1.jar
    • org.wso2.carbon.ndatasource.core_4.4.1.jar
    • org.wso2.carbon.ndatasource.rdbms_4.4.1.jar
    • org.wso2.carbon.ntask.common_4.4.7.jar
    • org.wso2.carbon.ntask.core_4.4.7.jar
    • org.wso2.carbon.ntask.solutions_4.4.7.jar
    • org.wso2.carbon.registry.admin.api_4.4.8.jar
    • org.wso2.carbon.registry.api_4.4.1.jar
    • org.wso2.carbon.registry.common_4.4.8.jar
    • org.wso2.carbon.registry.core_4.4.1.jar
    • org.wso2.carbon.registry.indexing_4.4.8.jar
    • org.wso2.carbon.registry.properties_4.4.8.jar
    • org.wso2.carbon.registry.resource_4.4.8.jar
    • org.wso2.carbon.registry.search_4.4.8.jar
    • org.wso2.carbon.registry.server_4.4.1.jar
    • org.wso2.carbon.registry.servlet_4.4.8.jar
    • org.wso2.carbon.utils_4.4.1.jar
  2. Create a directory named ml in the <SPARK_HOME> directory and copy the following ML-related JARs to it. These JARs can be found in the <ML_HOME>/repository/components/plugins directory.
    • org.wso2.carbon.ml.commons_1.0.2.jar
    • org.wso2.carbon.ml.core_1.0.2.jar
    • org.wso2.carbon.ml.database_1.0.2.jar
    • kryo_2.24.0.wso2v1.jar
  3. Create a file named spark-env.sh with the following entries and save it in the <SPARK_HOME>/conf directory.

    Change SPARK_MASTER_IP and SPARK_CLASSPATH values accordingly.
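
    The exact entries are not reproduced here; the sketch below shows the kind of content the file needs, assuming the analytics and ml directories created in steps 1 and 2 and a Spark installation at /opt/spark. The IP address and paths are placeholders to adapt to your environment.

        # spark-env.sh (sketch; the values below are assumptions to adapt)
        # IP address of the node running the Spark master
        export SPARK_MASTER_IP=192.168.1.10
        # Add the DAS and ML jars copied in steps 1 and 2 to the Spark classpath
        export SPARK_CLASSPATH=/opt/spark/analytics/*:/opt/spark/ml/*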

  4. Create a directory named datasources in the <SPARK_HOME>/conf directory and copy the following files to it from the <DAS_HOME>/repository/conf/datasources directory. Make sure that these files contain the URLs pointing to the exact databases used by WSO2 DAS.
    • analytics-datasources.xml
    • master-datasources.xml

    As noted in the prerequisites section, you need to first publish events/data into an event stream of WSO2 DAS.

    For the H2 database (which is the default for DAS), you need to append AUTO_SERVER=TRUE to the database connection string as shown below.
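
    For example, the H2 URL in analytics-datasources.xml would look similar to the following after the parameter is appended (the database path is illustrative; keep the path already used by your DAS instance):

        <url>jdbc:h2:<DAS_HOME>/repository/database/ANALYTICS_EVENT_STORE;AUTO_SERVER=TRUE</url>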

  5. Create a directory named analytics in the <SPARK_HOME>/conf directory and copy the following files to it from the <DAS_HOME>/repository/conf/analytics directory.
    • analytics-config.xml

      Comment out the following section in the analytics-config.xml file once you copy it.

       

    • analytics-data-config.xml
    • rdbms-query-config.xml 

  6. Restart the Spark cluster using the following commands.

    To stop the cluster: <SPARK_HOME>$ ./sbin/stop-all.sh 

    To start the cluster: <SPARK_HOME>$ ./sbin/start-all.sh

Configurations in WSO2 ML

Follow the procedure below to do the required ML-related configurations.

  1. Open the <ML_HOME>/repository/conf/etc/spark.config.xml file and make the following changes. A sample of both entries is shown after this list.
    • Change the spark.master property as required (for example, to point to your external Spark master).
    • Add the spark.executor.extraJavaOptions property.
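
    A sketch of both entries is given below. It assumes that spark.config.xml lists Spark settings as <property name="...">value</property> elements and that -Dwso2_custom_conf_dir should point to the <SPARK_HOME>/conf directory prepared in the previous section; verify both assumptions against the file shipped with your distribution and replace the placeholders with your own values.

        <!-- Point WSO2 ML at the external Spark master (replace the host and port) -->
        <property name="spark.master">spark://<SPARK_MASTER_HOST>:7077</property>
        <!-- Assumed option: lets Spark executors find the copied WSO2 configuration files -->
        <property name="spark.executor.extraJavaOptions">-Dwso2_custom_conf_dir=<SPARK_HOME>/conf</property>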

  2. Open the <ML_HOME>/repository/conf/datasources/analytics-datasources.xml file. Make sure that the URLs in this file for the following databases are the same as those in the <DAS_HOME>/repository/conf/datasources/analytics-datasources.xml file.

    • WSO2_ANALYTICS_FS_DB
    • WSO2_ANALYTICS_EVENT_STORE_DB
    • WSO2_ANALYTICS_PROCESSED_DATA_STORE_DB

    The H2 database (which is the default) additionally requires AUTO_SERVER=TRUE to be appended to the database connection string as shown in the example below.
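
    For example (the path is illustrative; use the URL already configured for your DAS databases):

        <url>jdbc:h2:<DAS_HOME>/repository/database/ANALYTICS_EVENT_STORE;AUTO_SERVER=TRUE</url>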

After setting up the above configurations, start WSO2 ML. You can then create datasets out of the data tables created by WSO2 DAS, and build models using the data collected in those tables.

Create Dataset

Follow the procedure below to create a dataset out of a WSO2 DAS data table.

  1. Log in to the ML UI using admin/admin credentials and the following URL: http://<ML_HOST>:<ML_PORT>/ml.
  2. Create a dataset, selecting DAS as the source type.
  3. In the Data Source parameter, select the required table from the list of available tables. To view the available tables, click the icon next to the Data Source parameter.
  4. Click Create Dataset once you have entered values for all the required parameters.
After creating a dataset, you can follow the WSO2 ML model generation wizard and build models. For more information, see Generating Models.