In the previous tutorials, you created a simple Siddhi application and understood how data arriving from outside sources can be captured and pre-processed by WSO2 SP. Further, you understood how to persist data in data stores to be used later.
In this tutorial, let's consider a more complax scenario which involves summarizing data in real time.
The foreman of the sweet factory requires the following information to understand the production capacity of the factory for each sweet category.
- The total sweet production for each sweet category for the last minute (at any given time).
- The highest amount of sweets produced during a production run needs to be identified for each 10 production runs.
For both expected results mentioned above, WSO2 SP needs to consider events that fall within a certain frame instead of considering all the events sent to a specific stream. WSO2 Siddhi supports this via the Window concept.
A window allows you to capture a subset of events based on a specific criterion from an input stream to generate a result. The specific criterion can be time or length. Time windows capture events that occur during a specific time frame (e.g., within a minute), and a length windows capture events based on the number of events (e.g., every 10 events). Further, a window can be a sliding window (continuous window updates) or a batch/tumbling window (where window updates take place only when the specified time period has elapsed or the number of events have occured).
This tutorial covers the following Siddhi concepts:
- Introduction to windows
- Time windows for time-based summarization
- Unit windows for count-based summarization
Before you begin:
This section covers the two scenarios mentioned above. Let's get started!
Scenario 1 - Calculating the total sweet production for each sweet category for the last minute
In this scenario, a Siddhi application is created to produce a time-based summarization.
Let's reuse the following input stream definition that you used in previous tutorials to capture data about the sweet production.
To output the overall production during a minute per sweet category for the past minute, let's define an output stream as follows.
To specify how the data must be derived from the
SweetProductionStreaminput stream and inserted into the output stream, let's add a query as follows.
This inserts the value for
PastMinuteProductionStreamoutput stream with the same attribute name. The sum for the
amountis calculated for all the events that have arrived, and inserted into the output stream as
pastMinuteTotal. The output is grouped by the name of the sweet category.
- The query given in the above step calculates the total produced for a sweet category based on all the events sent to the
SweetProductionStreaminput stream. However, at any given time, you need to see only the total amount produced during the last minute. To achieve this, let's update the query as follows:
To consider only events that are within a specific time frame, let's add a window as follows.
In this scenario, the subset of events to be captured by the window is based on time and the period of time considered is one minute. To specify this, update the window as follows.
#window.time(1 minute)indicates that the window is a sliding window. This means that the window is of a fixed duration (i.e., 1 minute in this scenario), and it slides over incoming events to maintain this constant duration.
Once these changes are applied, the
SweetTotalsAppSiddhi application looks as follows.
- Let's try out this Siddhi application in the Stream Processor Studio. To do this, start and access the Stream Processor Studio. Then add the
PastMinuteSweetProductionAppSiddhi application you created as a new file, and save it. Now you can start it by clicking the followig icon for it while it is open.
To try out the
SweetTotalsAppSiddhi application with the latest changes, let's send the following four cURL commands.
This generates an output similar to the following. (Note: the Gateau amount is increased to 10)
The actual output may differ based on the time taken to issue the above cURL commands.
Scenario 2 - Identifying the highest amount of sweets produced during a production run
In this scenario, let's create a new Siddhi application named
MaximumSweetProductionApp to capture the highest production reported for each sweet category during a production run, for 10 production runs.
The data arriving from the Sweet Bots is the same as in the previous scenario of this tutorial. Therefore, we can use the same input stream definition.
The output should include the name of the sweet and the highest production total observed during the last 10 production runs. Therefore, let's define an output stream definition as follows.
To calculate the highest production total observed in a production run, the
max()Siddhi function can be used as follows.
In this scenario, the output is derived based on events that fall within a a fixed batch of 10 events. For this purpose, let's add a window as follows:
Unlike the previous scenario, the window must be a length window and not a time window. Therefore, let's add a window and specify that it needs to be a length window as shown below. You also need specify the exact length of the length window (10 in this scenario).
The above configuration has added a sliding length window of 10 production runs. However, the requirement of the foreman is to calculate the maximum once per 10 production runs. Therefore, let's convert the window you added to a batch window by adding
Batchto the window configuration as shown below.
The completed Siddhi application with source and sink mappings added should look as follows:
To test the
MaximumSweetProductionAppSiddhi application, let's start the Siddhi application in the Stream Processor Studio and send 10 events by issuing 11 cURL commands as follows.
This generates the following log in the console.
Note that in the last event representing the last production run, the total production was 17, but the maximum detected total production output is 16. This is because you have used a batch withdow, and the 11th event does not belong to the fixed batch of 10 events.