This documentation is for Machine Learner 1.1.0. View documentation for the latest release.

WSO2 Machine Learner maintains a set of product-specific configurations in the <ML_HOME>/repository/conf/machine-learner.xml file. The sections below describe each configuration in detail.


Database configurations

The following configuration specifies the datasource that connects WSO2 ML to the database in which product-specific ML data is stored.

<DataSourceName>jdbc/WSO2ML_DB</DataSourceName>

The following table describes the parameters of the database related configuration.

Parameter Name | Description | Type | Default Value
DataSourceName | The datasource that connects WSO2 ML to the database in which product-specific ML data is stored. The default value points to the inbuilt H2 database. The configuration of this database can be found in the <ML_HOME>/repository/conf/datasources/ml-datasources.xml file. | String | jdbc/WSO2ML_DB

WSO2 ML currently supports H2 and MySQL as its database. To change the default database, do the following.

  • Use the scripts in the <ML_HOME>/dbscripts/ directory to create the tables of the database.
  • Change the properties of the datasource in the ml-datasources.xml file accordingly.

For instructions on changing the default database, see Setting up H2 and Setting up MySQL.
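
As a rough illustration of the second step, the sketch below shows what the datasource entry in ml-datasources.xml might look like when pointed at MySQL. It assumes the RDBMS datasource layout commonly used by WSO2 Carbon products. The host, port, database name (ml_db), credentials, and pool settings are placeholders rather than values taken from this documentation, so compare it against the H2 entry shipped with your release before applying it.

```xml
<datasource>
	<name>WSO2ML_DB</name>
	<description>Datasource used by WSO2 Machine Learner (illustrative)</description>
	<jndiConfig>
		<!-- Must match the DataSourceName value in machine-learner.xml -->
		<name>jdbc/WSO2ML_DB</name>
	</jndiConfig>
	<definition type="RDBMS">
		<configuration>
			<!-- Placeholder connection details: adjust to your MySQL server -->
			<url>jdbc:mysql://localhost:3306/ml_db</url>
			<username>ml_user</username>
			<password>ml_password</password>
			<driverClassName>com.mysql.jdbc.Driver</driverClassName>
			<!-- Assumed pool settings; tune or remove as needed -->
			<maxActive>50</maxActive>
			<maxWait>60000</maxWait>
			<testOnBorrow>true</testOnBorrow>
			<validationQuery>SELECT 1</validationQuery>
		</configuration>
	</definition>
</datasource>
```

The JNDI name is the link between the two files: as long as it stays jdbc/WSO2ML_DB, the DataSourceName element in machine-learner.xml does not need to change.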

Summary statistics settings 

When a dataset is created, WSO2 ML calculates summary statistics for the dataset. The following configurations are used to calculate these summary statistics.

<SummaryStatisticsSettings>
	<HistogramBins>20</HistogramBins>
	<CategoricalThreshold>20</CategoricalThreshold>
	<SampleSize>10000</SampleSize>
</SummaryStatisticsSettings>

The following table describes the parameters of the summary statistics configuration.

Parameter Name | Description | Type | Default Value
HistogramBins | The number of intervals generated for continuous variables when plotting histograms. | Integer | 20
CategoricalThreshold | The cut-off for the number of unique values in a numerical variable, used to decide whether the variable is categorical or continuous. Any numerical variable in the dataset with a number of unique values less than or equal to this value is treated as a categorical variable. Otherwise, it is treated as a continuous variable. | Integer | 80
SampleSize | The size of the sample that is used for the summary statistics calculation. | Integer | 10000
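
To make the effect of these parameters concrete, here is a sketch of an adjusted configuration. The values are arbitrary examples chosen for illustration, not recommendations from this documentation.

```xml
<SummaryStatisticsSettings>
	<!-- Plot continuous variables with 50 intervals instead of 20 -->
	<HistogramBins>50</HistogramBins>
	<!-- Numerical variables with 10 or fewer unique values are treated as categorical -->
	<CategoricalThreshold>10</CategoricalThreshold>
	<!-- Calculate summary statistics from a sample of size 50000 -->
	<SampleSize>50000</SampleSize>
</SummaryStatisticsSettings>
```

With this threshold, a numerical column that only takes the values 1 to 5 (five unique values) would be treated as categorical, while a column with 300 distinct readings would be treated as continuous and plotted with 50 histogram intervals.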

Input/output handling configurations 

The following set of properties defines the input/output handling configurations of WSO2 ML.

<Properties>
	<Property name="ml.thread.pool.size" value="100" />
	<Property name="file.in" value="org.wso2.carbon.ml.core.impl.FileInputAdapter" />
	<Property name="file.out" value="org.wso2.carbon.ml.core.impl.FileOutputAdapter" />
	<Property name="hdfs.in" value="org.wso2.carbon.ml.core.impl.HdfsInputAdapter" />
	<Property name="hdfs.out" value="org.wso2.carbon.ml.core.impl.HdfsOutputAdapter" />
	<Property name="das.in" value="org.wso2.carbon.ml.core.impl.BAMInputAdapter" />
	<Property name="registry.in" value="org.wso2.carbon.ml.core.impl.RegistryInputAdapter" />
	<Property name="registry.out" value="org.wso2.carbon.ml.core.impl.RegistryOutputAdapter" />
</Properties>

The following table describes the properties of the input/output handling configuration.

Property Name | Description | Type | Default Value
ml.thread.pool.size | The size of the thread pool used by WSO2 ML. | Integer | 100
file.in | The adapter that reads files from the local file system. | String | org.wso2.carbon.ml.core.impl.FileInputAdapter
file.out | The adapter that writes files to the local file system. | String | org.wso2.carbon.ml.core.impl.FileOutputAdapter
hdfs.in | The adapter that reads files from a Hadoop File System (HDFS). | String | org.wso2.carbon.ml.core.impl.HdfsInputAdapter
hdfs.out | The adapter that writes files to a Hadoop File System (HDFS). | String | org.wso2.carbon.ml.core.impl.HdfsOutputAdapter
registry.in | The adapter that reads data from the WSO2 registry. | String | org.wso2.carbon.ml.core.impl.RegistryInputAdapter
registry.out | The adapter that writes data into the WSO2 registry. | String | org.wso2.carbon.ml.core.impl.RegistryOutputAdapter

If you want to add a custom input/output adapter, add the following properties to the above input/output handling configurations:

<Property name="custom.in" value="org.wso2.carbon.ml.custom.adapter.input.CustomMLInputAdapter"/>
<Property name="custom.out" value="org.wso2.carbon.ml.custom.adapter.output.CustomMLOutputAdapter"/>

Storage configurations

This section contains configurations relating to the storage of datasets and models using the storage type file or hdfs. Configurations relating to storage are defined as shown in the example below. This configuration is optional and commented out by default. You can uncomment it and edit the default configurations as required.

<HdfsURL>hdfs://localhost:9000</HdfsURL>
<!-- DatasetStorage> 
	<StorageType>file</StorageType> 
	<StorageDirectory>/tmp</StorageDirectory> 
</DatasetStorage -->

<!-- ModelStorage> 
	<StorageType>file</StorageType> 
	<StorageDirectory>/tmp</StorageDirectory> 
</ModelStorage -->

The following table explains the parameters of the storage configuration.

Parameter Name | Description | Type | Default Value
HdfsURL | The HDFS location in which the ML is allowed to store files. This needs to be specified if you select HDFS as the storage type for datasets and models. | |
DatasetStorage | Location where datasets are stored. By default, the value of this server configuration is the file system. For information on using HDFS as the dataset storage, see HDFS Support, and for information on using custom input/output adapters as the dataset storage, see ML Custom Adapter Extension. | N/A | N/A
ModelStorage | Location where models are persisted. By default, the value of this server configuration is the file system. For information on using HDFS as the model storage, see HDFS Support, and for information on using custom input/output adapters as the model storage, see ML Custom Adapter Extension. | N/A | N/A
StorageType | This parameter specifies whether the relevant artifact should be stored in the file system, HDFS, or a storage defined by a custom input/output adapter. | String | See the list below.
StorageDirectory | The storage directory in which the relevant artifact should be saved. | String | See the list below.

Values of StorageType:

  • If you want to use the file system as the storage type, enter file as the value of this parameter.
  • If you want to use HDFS as the storage type, enter hdfs as the value of this parameter.
  • If you want to use a storage defined by a custom input/output adapter as the storage type, enter the prefix (e.g. custom) of the custom input/output adapter property name (e.g. custom.in) as the value of this parameter.

Values of StorageDirectory:

  • If the storage type is file, the artifact is saved in the <CARBON_HOME>/datasets or <CARBON_HOME>/models/ directory by default (i.e. depending on whether you are configuring storage parameters for datasets or models).
  • If the storage type is hdfs, the artifact is saved in a directory within the location to which the HDFS URL points. Specify this directory as the value of this parameter.
  • If the storage type is a storage defined by a custom input/output adapter, the artifact is saved in the directory which you define as the value of this parameter.
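
As a sketch of how these parameters fit together, the fragment below uncomments both storage blocks and switches them to HDFS. The HDFS URL is the one from the example above. The directory names /ml/datasets and /ml/models are hypothetical and should be replaced with directories that exist in your HDFS instance.

```xml
<!-- HDFS instance that WSO2 ML is allowed to write to -->
<HdfsURL>hdfs://localhost:9000</HdfsURL>

<DatasetStorage>
	<!-- "hdfs" selects the hdfs.in/hdfs.out adapters -->
	<StorageType>hdfs</StorageType>
	<!-- Hypothetical directory under the HDFS URL -->
	<StorageDirectory>/ml/datasets</StorageDirectory>
</DatasetStorage>

<ModelStorage>
	<StorageType>hdfs</StorageType>
	<!-- Hypothetical directory under the HDFS URL -->
	<StorageDirectory>/ml/models</StorageDirectory>
</ModelStorage>
```

To use a custom adapter instead, the StorageType value would be the property-name prefix, for example custom when the adapters are registered as custom.in and custom.out.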

Algorithm configurations

WSO2 ML supports various machine learning algorithms. Configurations of these algorithms are defined as shown in the example below.

<Algorithms>
	<Algorithm>
		<Name>LINEAR_REGRESSION</Name>
		<Type>Numerical_Prediction</Type>
		<Parameters>
			<Name>Iterations</Name>
			<Value>100</Value>
		</Parameters>
		<Parameters>
			<Name>Learning_Rate</Name>
			<Value>0.001</Value>
		</Parameters>
		<Parameters>
			<Name>SGD_Data_Fraction</Name>
			<Value>1</Value>
		</Parameters>
	</Algorithm>
</Algorithms>

The following table describes the parameters of an algorithm configuration.

Parameter Name | Description | Type
Name | The name of the algorithm. | String
Type | The type of the algorithm. | String
Iterations | The number of iterations of gradient descent to run. | Integer
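
The fragment below sketches how the default hyperparameters of an algorithm could be adjusted. It reuses the element names from the LINEAR_REGRESSION example and changes only the values, which are arbitrary. The table documents only Iterations. Reading Learning_Rate as the gradient descent step size and SGD_Data_Fraction as the fraction of the data sampled in each iteration is an assumption based on the parameter names.

```xml
<Algorithms>
	<Algorithm>
		<Name>LINEAR_REGRESSION</Name>
		<Type>Numerical_Prediction</Type>
		<Parameters>
			<!-- Run more gradient descent iterations than the default of 100 -->
			<Name>Iterations</Name>
			<Value>500</Value>
		</Parameters>
		<Parameters>
			<!-- Assumed to be the step size of each gradient descent update -->
			<Name>Learning_Rate</Name>
			<Value>0.01</Value>
		</Parameters>
		<Parameters>
			<!-- Assumed to be the fraction of the data used per iteration (0.5 = half) -->
			<Name>SGD_Data_Fraction</Name>
			<Value>0.5</Value>
		</Parameters>
	</Algorithm>
</Algorithms>
```

Each Parameters element carries one Name/Value pair, so adding or tuning a hyperparameter means adding or editing one such element inside the relevant Algorithm block.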

 

Other configurations

Parameter Name | Description | Type | Default Value
EmailNotificationEndpoint | A comma-separated list of email addresses to which model building status mails should be sent. This is an optional parameter. | String | N/A
ModelRegistryLocation | The location in the Governance Registry where ML related models are published, e.g., <ModelRegistryLocation>ml</ModelRegistryLocation> | String | ml