
Machine Learner Algorithms

WSO2 ML uses the following algorithms to create models using the data in a given dataset. Each entry below lists the algorithm, its type (in parentheses), a description, and the publish/download formats it supports.

LINEAR REGRESSION (Numerical Prediction)
The Linear Regression algorithm trains a Generalized Linear Model that captures the relationship between the independent variables (feature values in the data) and the dependent variable (the response variable in the data). Supported formats:
• Serialized
• PMML
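
The fitted relationship can be sketched in plain Python. This is an illustrative closed-form least-squares fit for a single feature, not the WSO2 ML implementation; the function name `fit_linear` is hypothetical.

```python
# Minimal sketch of ordinary least squares with one feature:
# fit y = w*x + b by minimizing the sum of squared errors.
def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x       # intercept passes through the means
    return w, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]         # exactly y = 2x + 1
w, b = fit_linear(xs, ys)
```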

RIDGE REGRESSION (Numerical Prediction)
The Ridge Regression algorithm is a variant of Linear Regression in which the loss function is the linear least squares function and the regularization is L2. Supported formats:
• Serialized
• PMML
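
The effect of the L2 penalty can be seen in a one-feature sketch (illustrative only, no intercept; `fit_ridge_1d` is a hypothetical name, not part of WSO2 ML):

```python
def fit_ridge_1d(xs, ys, lam):
    # Closed-form single-feature ridge regression:
    # minimizes sum((y - w*x)^2) + lam * w^2.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                    # y = 2x exactly
w_ols   = fit_ridge_1d(xs, ys, 0.0)     # lam = 0 reduces to least squares
w_ridge = fit_ridge_1d(xs, ys, 14.0)    # the L2 penalty shrinks the weight
```

With `lam = 0` the weight is the plain least-squares solution; increasing `lam` shrinks it toward zero, which is the regularizing behavior the description refers to.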

LASSO REGRESSION (Numerical Prediction)
The Lasso Regression algorithm is a variant of Linear Regression trained with an L1 prior as the regularizer. Supported formats:
• Serialized
• PMML
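
Unlike the L2 penalty, the L1 prior can drive weights exactly to zero. A one-feature sketch (illustrative, no intercept; `fit_lasso_1d` is a hypothetical name) shows the soft-thresholding solution:

```python
def fit_lasso_1d(xs, ys, lam):
    # Closed-form single-feature lasso:
    # minimizes sum((y - w*x)^2) + lam * |w|,
    # solved by soft-thresholding the correlation term.
    rho = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if rho > lam / 2:
        return (rho - lam / 2) / sxx
    if rho < -lam / 2:
        return (rho + lam / 2) / sxx
    return 0.0    # the L1 penalty zeroes out weak coefficients entirely

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                     # y = 2x exactly
w_plain  = fit_lasso_1d(xs, ys, 0.0)     # no penalty: least squares
w_sparse = fit_lasso_1d(xs, ys, 60.0)    # strong penalty: exactly zero
```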

LOGISTIC REGRESSION (Binary Classification)
The Logistic Regression algorithm is a Generalized Linear Model that predicts the probability of a binary outcome. The logistic function is used to determine the probabilities of the outcomes. Supported formats:
• Serialized
• PMML
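
The logistic function mentioned above maps a linear score to a probability. A from-scratch gradient-descent sketch (illustrative only; `fit_logistic` is a hypothetical name, not the WSO2 ML API):

```python
import math

def sigmoid(z):
    # the logistic function: maps any real score into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=2000):
    # fit p(y=1|x) = sigmoid(w*x + b) by gradient descent
    w = b = 0.0
    for _ in range(steps):
        # gradient of the average negative log-likelihood
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
        gb = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys))
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]            # the outcome flips between x=1 and x=2
w, b = fit_logistic(xs, ys)
```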

SUPPORT VECTOR MACHINE (Binary Classification)
A Support Vector Machine is a non-probabilistic binary classifier. It constructs a hyperplane, or set of hyperplanes, in a high (or infinite) dimensional space that generates a good separation of data points between classes. Supported formats:
• Serialized
• PMML
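
The separating-hyperplane idea can be sketched with a linear SVM trained by subgradient descent on the hinge loss. This is a rough illustration under assumed hyperparameters, not the WSO2 ML implementation; `fit_svm` is a hypothetical name.

```python
def fit_svm(points, labels, lam=0.01, lr=0.1, steps=500):
    # labels must be +1 / -1; decision rule: sign(w . x + b)
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(steps):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:                       # point violates the margin
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:                                # only apply L2 shrinkage
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

points = [(0, 0), (0, 1), (3, 3), (4, 3)]
labels = [-1, -1, 1, 1]                          # two separable classes
w, b = fit_svm(points, labels)
```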

LOGISTIC REGRESSION L-BFGS (Multiclass Classification)
Binary logistic regression can be generalized to multinomial logistic regression to train and predict multiclass classification problems. For k classes, the algorithm treats one class against the remaining k-1 classes in turn, and the class with the largest probability is chosen as the prediction. L-BFGS (limited-memory BFGS) is used as the optimization technique for faster convergence. Supported formats:
• Serialized

DECISION TREE (Multiclass Classification)
The Decision Tree algorithm creates a tree-like model that predicts the value of a target variable by learning simple decision rules inferred from the features of the dataset. Supported formats:
• Serialized
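
A single decision rule of the kind described above can be illustrated with a one-level tree (a "stump") that picks the threshold minimizing weighted Gini impurity. This sketch assumes 0/1 labels and one numeric feature; `best_split` is a hypothetical name.

```python
def gini(labels):
    # Gini impurity for binary 0/1 labels: 2 * p * (1 - p)
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(xs, ys):
    # try every candidate threshold; keep the one with lowest impurity
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
```

A full decision tree applies this split search recursively to each resulting partition.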

RANDOM FOREST CLASSIFICATION (Multiclass Classification)
The Random Forest Classification algorithm is an ensemble learning method that combines many decision trees in order to reduce the risk of overfitting. Different decision trees are trained on different bootstrap samples drawn from the dataset (both feature bootstrapping and data point bootstrapping). At prediction time, a majority vote is taken over the trained decision trees. Supported formats:
• Serialized
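
The bootstrap-and-vote mechanism can be sketched with deliberately weak learners (simple threshold stumps instead of full trees). Everything here is illustrative; `fit_stump` and `fit_forest` are hypothetical names, not the WSO2 ML API.

```python
import random

def fit_stump(xs, ys):
    # weak learner: predict 1 iff x exceeds the midpoint of the class means
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / max(1, ys.count(0))
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / max(1, ys.count(1))
    t = (m0 + m1) / 2
    return lambda x: 1 if x > t else 0

def fit_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        # bootstrap: resample the data points with replacement
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):
        # majority vote over the ensemble
        votes = sum(s(x) for s in stumps)
        return 1 if votes > len(stumps) / 2 else 0
    return predict

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
predict = fit_forest(xs, ys)
```

Each stump sees a slightly different bootstrap sample, so their errors decorrelate and the vote is more robust than any single learner.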

RANDOM FOREST REGRESSION (Numerical Prediction)
The Random Forest Regression algorithm is an ensemble learning method that combines many decision tree regressors in order to reduce the risk of overfitting. Different decision tree regressors are trained on different bootstrap samples drawn from the dataset (both feature bootstrapping and data point bootstrapping). The predicted value is the average of the individual tree predictions. Supported formats:
• Serialized

NAIVE BAYES (Multiclass Classification)
The Naive Bayes algorithm assumes independence between every pair of features in the dataset. It computes the conditional probability distribution of each feature given the class label, then applies Bayes' theorem to compute the conditional probability distribution of the label given a data point, which it uses for prediction. Negative feature values are not allowed when training a Naive Bayes model. Supported formats:
• Serialized
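
The Bayes'-theorem step can be illustrated with a toy categorical model built from counts. This sketch omits Laplace smoothing for brevity and is not the WSO2 ML implementation; `fit_nb` is a hypothetical name.

```python
from collections import Counter

def fit_nb(features, labels):
    label_counts = Counter(labels)
    pair_counts = Counter(zip(features, labels))
    def posterior(f, label):
        # Bayes' theorem, up to a normalizing constant:
        # P(label | f) is proportional to P(label) * P(f | label)
        prior = label_counts[label] / len(labels)
        likelihood = pair_counts[(f, label)] / label_counts[label]
        return prior * likelihood
    return posterior

features = ["sunny", "sunny", "rain", "rain", "rain"]
labels   = ["play", "play", "stay", "stay", "play"]
posterior = fit_nb(features, labels)
```

Prediction picks the label with the largest unnormalized posterior; with several features, the independence assumption lets the per-feature likelihoods simply be multiplied together.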

K-MEANS (Clustering)
The K-Means algorithm partitions the data points into a predefined number of clusters (k), in which each data point belongs to the cluster with the nearest mean, which serves as a representative (cluster center) of the cluster. Supported formats:
• Serialized
• PMML
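
The assign-then-recompute loop (Lloyd's algorithm) can be sketched in one dimension for k = 2. This is illustrative only; `kmeans_1d` is a hypothetical name, and the starting centers are assumed.

```python
def kmeans_1d(xs, c1, c2, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        a = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        b = [x for x in xs if abs(x - c1) > abs(x - c2)]
        # update step: each center moves to the mean of its cluster
        if a: c1 = sum(a) / len(a)
        if b: c2 = sum(b) / len(b)
    return sorted([c1, c2])

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # two obvious groups
centers = kmeans_1d(xs, 1.0, 12.0)
```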

K-MEANS WITH UNLABELED DATA (Anomaly Detection)
This is a state-of-the-art algorithm that performs K-Means clustering on the training data. Data points that fall beyond the cluster boundaries (according to a specified percentile value) are detected as anomalies. Labeled data is not required. Supported formats:
• Serialized
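
The percentile-boundary idea can be sketched on top of already-computed cluster centers: flag any point whose distance to its nearest center exceeds the chosen percentile of the training distances. Illustrative only; `anomaly_detector` is a hypothetical name, and the simple index-based percentile is an assumption.

```python
def anomaly_detector(xs, centers, percentile=90):
    # distance of every training point to its nearest cluster center
    dists = sorted(min(abs(x - c) for c in centers) for x in xs)
    # boundary = the given percentile of those distances (simple index rule)
    cutoff = dists[int(len(dists) * percentile / 100) - 1]
    return lambda x: min(abs(x - c) for c in centers) > cutoff

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers = [2.0, 11.0]          # e.g. from a prior K-Means run
is_anomaly = anomaly_detector(xs, centers)
```

A point near either cluster stays inside the boundary; a point far from both is flagged.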

K-MEANS WITH LABELED DATA (Anomaly Detection)
This is a state-of-the-art algorithm that performs K-Means clustering on the training data. Data points that fall beyond the cluster boundaries (according to a specified percentile value) are detected as anomalies. This variant is used when labels (normal and anomalous) are available. Supported formats:
• Serialized

STACKED AUTOENCODERS (Deep Learning)
The Stacked Autoencoders algorithm is a multi-layer, feed-forward artificial neural network trained with stochastic gradient descent using back-propagation. The nodes in the input layer represent the features in the dataset, and the nodes in the output layer represent the class labels of the outcomes. Supported formats:
• Serialized
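
The SGD/back-propagation training loop can be illustrated with a toy linear autoencoder: one hidden unit, trained to reconstruct its 2-D input. This is only a sketch of the training mechanics under assumed hyperparameters, far from a full stacked network; `train_autoencoder` is a hypothetical name.

```python
import random

def train_autoencoder(data, lr=0.002, epochs=500, seed=1):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]   # encoder weights
    v = [rng.uniform(-0.5, 0.5) for _ in range(2)]   # decoder weights

    def loss():
        # total squared reconstruction error over the dataset
        total = 0.0
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]            # encode
            total += sum((x[i] - v[i] * h) ** 2 for i in range(2))
        return total

    first = loss()
    for _ in range(epochs):
        for x in data:                               # stochastic updates
            h = w[0] * x[0] + w[1] * x[1]
            err = [x[i] - v[i] * h for i in range(2)]
            for i in range(2):                       # back-prop into decoder
                v[i] += lr * 2 * err[i] * h
            g = sum(err[i] * v[i] for i in range(2))
            for j in range(2):                       # back-prop into encoder
                w[j] += lr * 2 * g * x[j]
    return first, loss()

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]          # rank-1 data
before, after = train_autoencoder(data)
```

Training reduces the reconstruction error, which is the signal a stacked autoencoder uses to learn each layer's representation.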

COLLABORATIVE FILTERING WITH EXPLICIT DATA (Recommendation)
Collaborative Filtering is used in recommendation systems and aims to fill in the missing entries of a user-item association matrix. This algorithm treats entries in the user-item matrix as explicit preferences (ratings) given by users to items. Recommendations are based on these explicit ratings. Supported formats:
• Serialized
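
"Filling in the missing entries" is typically done by factorizing the rating matrix into user and item factors. A rank-1 SGD sketch on a tiny 2x2 matrix (illustrative only, under assumed hyperparameters; `fit_mf` is a hypothetical name):

```python
import random

def fit_mf(ratings, n_users, n_items, lr=0.02, epochs=1000, seed=3):
    # learn user factors u and item factors p so that u[i] * p[j] ~ rating
    rng = random.Random(seed)
    u = [rng.uniform(0.1, 0.9) for _ in range(n_users)]
    p = [rng.uniform(0.1, 0.9) for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - u[i] * p[j]
            # SGD step on the squared error of this observed entry
            u[i], p[j] = u[i] + lr * err * p[j], p[j] + lr * err * u[i]
    return u, p

# (user, item, rating) triples; user 0's rating of item 1 is missing
ratings = [(0, 0, 4.0), (1, 0, 2.0), (1, 1, 1.0)]
u, p = fit_mf(ratings, n_users=2, n_items=2)
predicted = u[0] * p[1]      # fill in the missing matrix entry
```

Only observed entries drive the updates, yet the learned factors give a prediction for the unobserved user-item pair, which is what the recommender ranks.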

COLLABORATIVE FILTERING WITH IMPLICIT FEEDBACK DATA (Recommendation)
Collaborative Filtering is used in recommendation systems and aims to fill in the missing entries of a user-item association matrix. This algorithm treats preferences for items as implicit feedback such as views, clicks, purchases, likes, and shares. Recommendations are based on this implicit feedback. Supported formats:
• Serialized