This sample demonstrates how to run linear regression using the Timeseries Toolbox. It uses the Event Simulator for input and the logger publisher to log the output to the CEP console.
The data used for the regression comes from a baseball statistics dataset. The dependent (response) variable is the salary of the baseball player. It is modeled on the player's performance statistics, which are the independent (predictor) variables: rbi, walks, strikeouts and errors.
The execution plan used in this sample is as follows:
The inputs to the regression function are as follows.
Calculation Interval – 2
Batch size – 10,000
Confidence Interval – 0.95
Y (dependent) variable – salary
X (independent) variables – rbi, walks, strikeouts, errors
The output of the query is the set of coefficients of the regression equation, recalculated over the accumulated dataset after every second event. The output attributes include the input variable values, the beta coefficient for each X variable, the intercept (beta zero) and the standard error.
For more detail on the input and output parameters of the regression function, refer to https://docs.wso2.com/display/CEP400/Regression
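To make the shape of this output concrete, the following Python sketch mimics the calculation outside of CEP. It is not the Timeseries Toolbox implementation. It assumes an ordinary least squares fit of y = beta0 + beta1*x1 + ... + betaK*xK, treats the batch size as a cap on the most recent events kept, and reports the residual standard error as the "standard error". The function names (solve_linear_system, fit_ols, regress_stream) and the demo data are hypothetical and do not come from the sample.

```python
import math
from collections import deque


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear predictors or too little data")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(ys, xs):
    """Return (betas, std_error); betas[0] is the intercept (beta zero)."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 models the intercept
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    betas = solve_linear_system(xtx, xty)
    sse = sum((y - sum(b * v for b, v in zip(betas, r))) ** 2
              for r, y in zip(rows, ys))
    dof = len(ys) - p
    # Assumption: "standard error" means the residual standard error.
    std_error = math.sqrt(sse / dof) if dof > 0 else float("nan")
    return betas, std_error


def regress_stream(events, calc_interval=2, batch_size=10000):
    """Yield (event_count, (betas, std_error)) every calc_interval events.

    Each event is a tuple (y, x1, x2, ...). Assumption: batch_size caps
    the number of most recent events used for the fit.
    """
    window = deque(maxlen=batch_size)
    for count, (y, *x) in enumerate(events, start=1):
        window.append((y, x))
        # Skip the fit until there are more events than coefficients.
        if count % calc_interval == 0 and len(window) > len(x) + 1:
            ys = [e[0] for e in window]
            xs = [e[1] for e in window]
            yield count, fit_ols(ys, xs)


if __name__ == "__main__":
    # Made-up events that follow y = 1 + 2*x1 + 3*x2 exactly.
    points = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
    demo_events = [(1 + 2 * a + 3 * b, a, b) for a, b in points]
    for count, (betas, std_error) in regress_stream(demo_events):
        print(count, [round(b, 6) for b in betas], round(std_error, 6))
```

In the sample itself, each event would carry salary as y and rbi, walks, strikeouts and errors as the four x values, with calc_interval=2 and batch_size=10000 matching the inputs listed above. The confidence interval parameter is not modeled in this sketch.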
See Prerequisites in the CEP Samples Setup page.
Building the sample
Start the WSO2 CEP server with the sample configuration numbered 0116. For instructions, see Starting sample CEP configurations. This sample configuration does the following:
- Points the default Axis2 repo to <CEP_HOME>/sample/artifacts/0116 (by default, the Axis2 repo is <CEP_HOME>/repository/deployment/server).
Executing the sample
Log in to the CEP management console, which is located at https://localhost:9443/carbon.
Go to Tools -> Event Simulator. Under the 'Multiple Events' section, the 'BaseballData.csv' file, which contains the sample data, is listed. Click 'play' to start sending sample events from the file.
View the output events in the CEP console. This sample uses the logger publisher to log output events to the console.
For example, given below is a screenshot of the final regression output for this data.