Machine learning has emerged as a key component of the big data analytics space. The goal of WSO2 Machine Learner is to make machine learning accessible to the WSO2 Data Analytics Platform. WSO2 Machine Learner provides a user-friendly, wizard-like interface that guides users through a set of steps to find and configure machine learning algorithms. The outcome of this process is a model that can be deployed in multiple WSO2 products, such as WSO2 Enterprise Service Bus (ESB) and WSO2 Complex Event Processor (CEP).
The novice-friendly machine learning workflow allows developers and data scientists to quickly implement machine learning solutions. You can use WSO2 Machine Learner to build machine learning models for various tasks, such as fraud detection, anomaly detection, and classification. WSO2 Machine Learner is built on top of the award-winning WSO2 Carbon platform, which is based on the OSGi framework, enabling better modularity for your service-oriented architecture (SOA). WSO2 Machine Learner exposes all its operations via a RESTful API.
WSO2 Machine Learner is released under the Apache Software License Version 2.0, one of the most business-friendly licenses available today.
Key Concepts of WSO2 ML
The following are the key concepts and terminology of WSO2 Machine Learner, as illustrated in the image below.
A Dataset is a collection of data organized according to a defined schema in CSV or TSV format. The first row should contain the header; if no header is available, WSO2 Machine Learner generates one of the form V1, V2 ... VX. Datasets can be uploaded from a file system, a Hadoop Distributed File System (HDFS), or WSO2 Data Analytics Server (DAS).
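The header fallback described above can be illustrated with a minimal Python sketch. It is not WSO2 Machine Learner's own code: the function names and the `has_header` flag are assumptions made for illustration, and how the product itself decides that a header is missing is not covered here.

```python
import csv
import io


def generate_header(column_count):
    """Build placeholder column names V1, V2, ..., VX for a headerless dataset."""
    return [f"V{i}" for i in range(1, column_count + 1)]


def read_dataset(text, has_header, delimiter=","):
    """Parse CSV or TSV text and return (header, rows).

    `has_header` is a hypothetical flag supplied by the caller; when it is
    False, a V1..VX header is generated from the width of the first row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    return generate_header(len(rows[0])), rows


# A TSV sample without a header row.
sample = "5.1\t3.5\tsetosa\n4.9\t3.0\tversicolor\n"
header, rows = read_dataset(sample, has_header=False, delimiter="\t")
print(header)  # ['V1', 'V2', 'V3']
```

Passing the delimiter explicitly keeps the same function usable for both CSV and TSV input.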
A Dataset version is always inherited from a dataset. You can maintain new data as versions under a dataset. All dataset versions of a given dataset have the same schema (i.e., the same feature set).
A Project is a logical grouping of machine learning analyses that are performed on a dataset. To analyze multiple datasets, you need to create multiple projects. A project is bound to a dataset, not to a dataset version.
An Analysis is a logical grouping of a set of machine learning tasks. It holds a pre-processed feature set, a selected machine learning algorithm, and its calibrated set of hyper-parameters. An ML project contains one or more ML analyses, which are immutable.
A Model is an entity that is generated by running an ML analysis on a selected version of a dataset.
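The relationships between these concepts can be sketched as a small Python data model. This is an illustrative sketch only, not the product's internal representation: all class, field, and method names are hypothetical, and the algorithm label is a placeholder string. It encodes the rules stated above: versions share the dataset's schema, a project is bound to a dataset rather than to a version, analyses are immutable, and a model results from running an analysis on one selected version.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    version: str
    features: Tuple[str, ...]


@dataclass
class Dataset:
    name: str
    features: Tuple[str, ...]  # the schema shared by every version
    versions: List[DatasetVersion] = field(default_factory=list)

    def add_version(self, version, features):
        # Every version of a dataset must have the same feature set.
        if tuple(features) != self.features:
            raise ValueError("a dataset version must share the dataset's schema")
        dataset_version = DatasetVersion(version, tuple(features))
        self.versions.append(dataset_version)
        return dataset_version


@dataclass(frozen=True)  # frozen: an analysis is immutable once created
class Analysis:
    name: str
    features: Tuple[str, ...]  # the pre-processed feature set
    algorithm: str  # placeholder label, not a real algorithm identifier
    hyper_parameters: Tuple[Tuple[str, float], ...]


@dataclass(frozen=True)
class Model:
    analysis: Analysis
    dataset_version: DatasetVersion


@dataclass
class Project:
    name: str
    dataset: Dataset  # bound to the dataset, not to one of its versions
    analyses: List[Analysis] = field(default_factory=list)

    def build_model(self, analysis, dataset_version):
        # A model comes from one analysis run on one version of the project's dataset.
        if analysis not in self.analyses:
            raise ValueError("analysis does not belong to this project")
        if dataset_version not in self.dataset.versions:
            raise ValueError("version does not belong to the project's dataset")
        return Model(analysis, dataset_version)


iris = Dataset("iris", ("sepal_length", "species"))
v1 = iris.add_version("1.0.0", ["sepal_length", "species"])
project = Project("iris-project", iris)
analysis = Analysis("first-try", ("sepal_length",), "EXAMPLE_ALGORITHM", (("iterations", 100.0),))
project.analyses.append(analysis)
model = project.build_model(analysis, v1)
print(model.dataset_version.version)  # 1.0.0
```

Because the project references the dataset rather than a version, the same analysis can be run again on a later version to produce a new model without creating a new project.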