Siddhi enables users to perform linear regression on real-time data streams. The regress function takes a dependent event stream (Y) and any number of independent event streams (X1, X2, ..., Xn), and returns all coefficients of the regression equation.
Input Parameters
Parameter | Required / Optional | Description |
Calculation Interval | Optional | How often, in events, the regression is recalculated. Default value: 1 (i.e. at every event) |
Batch Size | Optional | The maximum number of events used for a regression calculation. Default value: 1,000,000,000 events |
Confidence Interval | Optional | The confidence interval used for the regression calculation. Default value: 0.95 |
Y Stream | Required | Data stream of the dependent variable |
X Stream(s) | Required | Data stream(s) of the independent variable(s) |
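The interaction between the calculation interval and the batch size can be sketched in Python. This is only an illustration, not Siddhi's implementation: it assumes the batch behaves as a sliding window of the most recent events, it handles a single independent variable, and the names `fit_line` and `regress_stream` are hypothetical.

```python
from collections import deque


def fit_line(points):
    """Least-squares fit of y = beta0 + beta1 * x over (y, x) pairs."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    if sxx == 0:
        return None  # all x values are identical, so the slope is undefined
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in points)
    beta1 = sxy / sxx
    return mean_y - beta1 * mean_x, beta1


def regress_stream(events, calc_interval=1, batch_size=1_000_000_000):
    """Refit every `calc_interval` events using at most `batch_size` events.

    Assumption: the batch is a sliding window over the latest events.
    """
    window = deque(maxlen=batch_size)  # oldest events drop out automatically
    results = []
    for count, event in enumerate(events, start=1):
        window.append(event)
        if count % calc_interval == 0 and len(window) >= 2:
            fit = fit_line(window)
            if fit is not None:
                results.append(fit)
    return results
```

For example, `regress_stream(events, calc_interval=10, batch_size=100_000)` would refit after every tenth event while never looking at more than the latest 100,000 events.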
Output Parameters
Parameter | Name | Description |
Standard Error | stdError | Standard Error of the Regression Equation |
β coefficients | beta0, beta1, beta2, etc. | n+1 β coefficients, where n is the number of X parameters |
Input Stream Data | Name given in the input stream | All attributes sent in the input stream |
The regress function nullifies any β coefficient that fails the t-test at the given confidence interval. The user can access any of the output parameters using the ‘Name’ of the parameter given above.
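To illustrate what failing the t-test means, the following Python sketch applies a standard two-sided t-test to the slope of a single-variable regression. It is not taken from Siddhi: the function name `slope_after_t_test` is hypothetical, the critical value is supplied by the caller rather than derived from the confidence interval, and representing a nullified coefficient as `None` is an assumption.

```python
import math


def slope_after_t_test(points, t_critical):
    """Return beta1 for y = beta0 + beta1 * x, or None if it fails the t-test.

    `points` holds (y, x) pairs; at least 3 points with varying x are needed.
    `t_critical` is the two-sided critical value for n - 2 degrees of freedom.
    """
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in points)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x

    # Standard error of the regression, then of the slope itself.
    rss = sum((y - (beta0 + beta1 * x)) ** 2 for y, x in points)
    std_error = math.sqrt(rss / (n - 2))
    se_beta1 = std_error / math.sqrt(sxx)
    if se_beta1 == 0:
        return beta1  # perfect fit: the slope is trivially significant

    t_stat = beta1 / se_beta1
    # Assumption: a "nullified" coefficient is represented as None.
    return beta1 if abs(t_stat) > t_critical else None
```

With five events there are 3 degrees of freedom, for which the two-sided critical value at 0.95 is about 3.182; a slope whose t statistic falls below that value is dropped.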
Examples
The following query specifies a calculation interval (every 10 events), a batch size (100,000 events), a confidence interval (0.95), a dependent input stream (Y), and three independent input streams (X1, X2, X3), which are used to perform linear regression between Y and all X streams.
from StockExchangeStream#transform.timeseries:regress(10, 100000, 0.95, Y, X1, X2, X3)
select *
insert into StockForecaster
When executed, the above query returns the standard error of the regression equation (ε), four β coefficients (β_{0}, β_{1}, β_{2}, β_{3}), and all the attributes available in the input stream. Using these results, the user can build a relationship between Y and all Xs (the regression equation) as follows:
Y = β_{0} + β_{1}X1 + β_{2}X2 + β_{3}X3 + ε
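Once the coefficients are available, the regression equation can be evaluated for new X values. The Python sketch below shows one way a consumer of the output might do this; the function name `predict` and the sample values are illustrative assumptions, not part of Siddhi.

```python
def predict(betas, xs):
    """Evaluate beta0 + beta1 * x1 + ... + betan * xn.

    `betas` is [beta0, beta1, ..., betan]; `xs` is [x1, ..., xn].
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("expected one more beta than x values")
    intercept, *slopes = betas
    return intercept + sum(b * x for b, x in zip(slopes, xs))


# Hypothetical coefficients for three independent variables.
estimate = predict([1.0, 2.0, 0.5, -1.0], [3.0, 4.0, 2.0])
```

The error term ε is left out because the estimate is the expected value of Y; the standard error indicates how far actual values typically fall from it.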