The latest version for DAS is WSO2 Data Analytics Server 3.1.0. View documentation for the latest release.
WSO2 Data Analytics Server is succeeded by WSO2 Stream Processor. To view the latest documentation for WSO2 SP, see WSO2 Stream Processor Documentation.


Siddhi allows you to identify outliers in real-time data streams using linear regression. The outlier function takes a dependent variable (Y), an independent variable (X), and a user-specified range for outliers, expressed as a number of standard deviations. It returns an output indicating whether the current event is an outlier with respect to the regression equation fitted to historical data.
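The core idea can be illustrated with a minimal Python sketch. It assumes ordinary least squares and defines the standard error of the regression as sqrt(SSE / (n - 2)). An event counts as an outlier when its residual from the fitted line exceeds `range` standard errors. The function names `fit_line` and `check_outlier` are hypothetical, and this is not the Siddhi timeseries extension's source. The sketch does not model how the extension uses the confidence interval, or whether it includes the current event in the fit.

```python
import math
from typing import Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float]:
    """Hypothetical helper: ordinary least squares fit of y = beta0 + beta1 * x.

    Returns (beta0, beta1, std_error), where std_error = sqrt(SSE / (n - 2)).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the standard error")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx                 # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    return beta0, beta1, math.sqrt(sse / (n - 2))


def check_outlier(range_: float, y: float, x: float,
                  hist_ys: Sequence[float], hist_xs: Sequence[float]) -> dict:
    """Hypothetical model of outlier(range, Y, X): flag (x, y) if its residual
    from the line fitted to the historical points exceeds range_ standard errors."""
    beta0, beta1, std_error = fit_line(hist_xs, hist_ys)
    residual = y - (beta0 + beta1 * x)
    return {
        "outlier": abs(residual) > range_ * std_error,
        "stdError": std_error,
        "beta0": beta0,
        "beta1": beta1,
    }
```

With history points (1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), the fitted line is roughly y = 0.05 + 1.99x. A new event at x = 6 with y = 20.0 lies far outside 2 standard errors and is flagged, whereas y = 12.0 is not.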

The two implementations of the outlier function differ as follows.

  • outlier: Lets you optionally specify a batch size, i.e., the maximum number of events considered in the regression calculation used to find outliers.
  • lengthTimeOutlier: Restricts the events considered in the regression calculation to those that fall within a specified time window and batch size.

Input parameters of each implementation are as follows.

Input parameters for the outlier function

The following table describes the input parameters available for the outlier function.

Parameter | Description | Required/Optional | Default Value
Calculation Interval | The frequency with which the regression calculation is carried out. | Optional | 1 (i.e., for every event)
Batch Size | The maximum number of events used for a regression calculation. | Optional | 100,000,000
Confidence Interval | The confidence interval used for a regression calculation. | Optional | 0.95
Range | The number of standard deviations from the regression equation beyond which an event is considered an outlier. | Required | None
Y | The attribute holding the dependent variable. | Required | None
X | The attribute holding the independent variable. | Required | None

Format: outlier(range, Y, X) or outlier(calculation interval, batch size, confidence interval, range, Y, X) 
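The calculation interval and batch size in the longer form can be modelled in Python as follows. The sketch keeps at most `batch_size` recent pairs and refits the regression only every `calc_interval` events, reusing the previous fit in between. It is a hypothetical illustration, not the extension's implementation. The class name, the choice to score each event against the previous fit before adding it to the history, and the omission of the confidence interval are all assumptions.

```python
import math
from collections import deque


class OutlierSketch:
    """Hypothetical model of outlier(calculation interval, batch size,
    confidence interval, range, Y, X); the confidence interval is ignored.

    Keeps at most batch_size (x, y) pairs and refits the regression only
    every calc_interval events, reusing the previous fit in between.
    """

    def __init__(self, range_: float, calc_interval: int = 1,
                 batch_size: int = 100_000_000):
        self.range_ = range_
        self.calc_interval = calc_interval
        self.history = deque(maxlen=batch_size)  # oldest pairs drop out first
        self.events_seen = 0
        self.fit = None  # (beta0, beta1, std_error) from the last refit

    @staticmethod
    def _ols(points):
        """Least-squares fit; returns None if all x values are equal."""
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        if sxx == 0:
            return None
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        beta1 = sxy / sxx
        beta0 = mean_y - beta1 * mean_x
        sse = sum((y - beta0 - beta1 * x) ** 2 for x, y in points)
        return beta0, beta1, math.sqrt(sse / (n - 2))

    def process(self, y: float, x: float):
        """Return True/False for the event, or None while no fit exists yet."""
        result = None
        if self.fit is not None:
            beta0, beta1, std_error = self.fit
            result = abs(y - (beta0 + beta1 * x)) > self.range_ * std_error
        self.history.append((x, y))
        self.events_seen += 1
        # Refit every calc_interval events, once at least 3 points are held.
        if self.events_seen % self.calc_interval == 0 and len(self.history) >= 3:
            new_fit = self._ols(list(self.history))
            if new_fit is not None:
                self.fit = new_fit
        return result
```

A larger calculation interval trades freshness of the regression for less computation. A smaller batch size makes the fit follow recent data more closely.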

Input parameters for the lengthTimeOutlier function

The following table describes the input parameters available for the lengthTimeOutlier function.

Parameter | Description | Required/Optional | Default Value
Time Window | The maximum time span of events considered for a regression calculation. | Required | None
Batch Size | The maximum number of events used for a regression calculation. | Required | None
Range | The number of standard deviations from the regression equation beyond which an event is considered an outlier. | Required | None
Calculation Interval | The frequency with which the regression calculation is carried out. | Optional | 1 (i.e., for every event)
Confidence Interval | The confidence interval used for a regression calculation. | Optional | 0.95
Y | The attribute holding the dependent variable. | Required | None
X | The attribute holding the independent variable. | Required | None

Format: lengthTimeOutlier(time window, batch size, range, Y, X) or lengthTimeOutlier(time window, batch size, range, calculation interval, confidence interval, Y, X)
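The combination of time window and batch size can be pictured as a buffer. New events are appended, events whose age has reached the time window are expired, and the buffer is then trimmed to the newest `batch_size` events. The following Python sketch is a hypothetical model rather than the extension's implementation. The class name, the use of millisecond timestamps, and the rule that an event expires exactly when its age equals the window are assumptions.

```python
from collections import deque


class LengthTimeWindow:
    """Hypothetical sketch of a window bounded by both time and count.

    An event is kept only while it is younger than time_window_ms and is
    among the most recent batch_size events.
    """

    def __init__(self, time_window_ms: int, batch_size: int):
        self.time_window_ms = time_window_ms
        self.batch_size = batch_size
        self.events = deque()  # (timestamp_ms, x, y), oldest first

    def add(self, timestamp_ms: int, x: float, y: float) -> list:
        """Add an event and return the (x, y) pairs left for the regression."""
        self.events.append((timestamp_ms, x, y))
        # Expire events whose age has reached the time window...
        while self.events and timestamp_ms - self.events[0][0] >= self.time_window_ms:
            self.events.popleft()
        # ...then keep only the most recent batch_size events.
        while len(self.events) > self.batch_size:
            self.events.popleft()
        return [(ex, ey) for _, ex, ey in self.events]
```

In this sketch, Example 2's `lengthTimeOutlier(2 sec, 100, 2, Y, X)` corresponds to `LengthTimeWindow(2000, 100)`. The regression and the 2-standard-deviation range check would then run over the pairs returned by `add`.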

Output parameters

The following table describes the output parameters. Both implementations return the same output parameters.

Parameter | Name | Description
Outlier | outlier | True if the current event is an outlier, False otherwise.
Standard Error | stdError | The standard error of the regression equation.
β coefficients | beta0, beta1 | The β coefficients (intercept and slope) of the regression equation.
Input Stream Data | The attribute names used in the input stream. | All the attributes sent in the input stream.

Examples

In each example below, the query returns an indication of whether the current event is an outlier, together with the standard error of the regression equation (ε), the β coefficients, and all the attributes available in the input stream.

Example 1

The following query passes the number of standard deviations to use as the range (2), the dependent variable (Y), and the independent variable (X), which are used to perform linear regression between Y and X. It returns an output that indicates whether the current event is an outlier.

from StockExchangeStream#timeseries:outlier(2, Y, X)
select *
insert into StockForecaster 

Example 2

The following query passes a time window (2 seconds), a batch size (100 events), the number of standard deviations to use as the range (2), the dependent variable (Y), and the independent variable (X), which are used to perform linear regression between Y and X. It returns an output that indicates whether the current event is an outlier.

from StockExchangeStream#timeseries:lengthTimeOutlier(2 sec, 100, 2, Y, X)
select *
insert into StockForecaster  