Siddhi lets users identify outliers in real-time data streams using linear regression. The outlier function takes a dependent event stream (Y), an independent event stream (X) and a user-specified range for outliers. It returns whether the current event is an outlier, based on the regression equation fitted to historical data.
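The decision described above can be sketched in Python. This is a minimal sketch under stated assumptions, not Siddhi's implementation: it assumes ordinary least squares with a single independent variable, and it assumes an event counts as an outlier when its distance from the fitted line exceeds the range multiplied by the standard error of the regression. The helpers `fit_line` and `is_outlier` are hypothetical names used only for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = b0 + b1 * x.

    Returns (b0, b1, standard_error), where standard_error is
    sqrt(SSE / (n - 2)) for a regression with one independent variable.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the standard error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, math.sqrt(sse / (n - 2))


def is_outlier(xs, ys, x, y, range_sd):
    """Assumed rule: flag (x, y) if it lies more than range_sd standard
    errors away from the line fitted to the historical points (xs, ys)."""
    b0, b1, se = fit_line(xs, ys)
    predicted = b0 + b1 * x
    return abs(y - predicted) > range_sd * se
```

For example, with the historical points x = [1, 2, 3, 4, 5] and y = [3.1, 4.9, 7.2, 8.8, 11.0] and a range of 2, the new point (6, 20.0) is flagged as an outlier, while (6, 13.0) is not.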
Parameters:
- Calculation frequency (optional): how often the regression is recalculated. Default: 1 (i.e., at every event).
- Batch size (optional): the maximum number of events used for a regression calculation. Default: 1,000,000,000 events.
- Confidence interval (optional): the confidence interval used for the regression calculation. Default: 0.95.
- Range: the number of standard deviations from the regression equation beyond which an event is treated as an outlier.
- Y: the data stream of the dependent variable.
- X: the data stream of the independent variable.
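The optional parameters control how often the regression is refitted and how much history it uses. The following Python class is a hypothetical sketch of that behaviour; `OutlierDetector`, `process`, `range_sd`, `calc_interval` and `batch_size` are illustrative names, not Siddhi's API. It assumes that each event is judged against the model fitted on earlier events and is then added to the history, and that only the most recent `batch_size` events are kept. The confidence interval parameter is not modelled, because its exact role in the outlier decision is not described here.

```python
import math
from collections import deque


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1, std_error)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, math.sqrt(sse / (n - 2))


class OutlierDetector:
    """Hypothetical streaming wrapper around fit_line (not the Siddhi API)."""

    def __init__(self, range_sd, calc_interval=1, batch_size=1_000_000_000):
        self.range_sd = range_sd
        self.calc_interval = calc_interval
        # deque(maxlen=...) silently drops the oldest event once full,
        # so at most batch_size events feed each regression.
        self.window = deque(maxlen=batch_size)
        self.events_since_fit = 0
        self.model = None  # (b0, b1, std_error) from the most recent fit

    def process(self, y, x):
        """Return (is_outlier, std_error, b0, b1) for this event,
        or None while no regression has been fitted yet."""
        result = None
        if self.model is not None:
            b0, b1, se = self.model
            result = (abs(y - (b0 + b1 * x)) > self.range_sd * se, se, b0, b1)
        self.window.append((x, y))
        self.events_since_fit += 1
        # Refit only every calc_interval events, once enough points exist.
        if self.events_since_fit >= self.calc_interval and len(self.window) >= 3:
            xs = [p[0] for p in self.window]
            ys = [p[1] for p in self.window]
            try:
                self.model = fit_line(xs, ys)
                self.events_since_fit = 0
            except ValueError:
                pass  # e.g. all x values equal: keep the previous model
        return result
```

With the default calculation frequency of 1, the model is refitted after every event, which matches the "at every event" default; a larger value trades accuracy for less computation.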
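Output attributes:
- Outlier flag: true if the current event is an outlier, false otherwise.
- Standard error (ε): the standard error of the regression equation.
- β coefficients: the coefficients of the regression equation.
- Input stream data: all items sent in the input stream, under the names given in the input stream.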
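For a single independent variable, the β coefficients and the standard error ε are conventionally computed with the ordinary least squares formulas below, over the n historical events (x_i, y_i). This is a sketch assuming the extension uses standard least squares; the final line states the assumed outlier rule for a current event (x, y) and a user-specified range r.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

\varepsilon = \sqrt{\frac{1}{n - 2} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2}

\text{outlier} \iff \left| \, y - (\hat{\beta}_0 + \hat{\beta}_1 x) \, \right| > r \, \varepsilon
```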
The following query passes the number of standard deviations to use as the outlier range (2), the dependent input stream (Y) and the independent input stream (X). The function performs a linear regression of Y on X and outputs whether the current event is an outlier.
from StockExchangeStream#transform.timeseries:outlier(2, Y, X)
insert into StockForecaster;
When executed, the query outputs whether the current event is an outlier, together with the standard error of the regression equation (ε), the β coefficients and all the items available in the input stream.