The latest version for DAS is WSO2 Data Analytics Server 3.2.0. View documentation for the latest release.
WSO2 Data Analytics Server is succeeded by WSO2 Stream Processor. To view the latest documentation for WSO2 SP, see WSO2 Stream Processor Documentation.

This section summarizes the results of performance tests carried out on the minimum fully distributed DAS deployment setup. The tests were run separately with an RDBMS (MySQL) event store and an HBase event store.

Infrastructure used

  • c4.2xlarge Amazon EC2 instances as the DAS nodes
  • One DAS node was used as the publisher
  • c3.2xlarge Amazon instances as database nodes

Receiver node Data Persistence Performance

A reduction in throughput is observed after 1200000 events in both DAS receiver nodes, as shown below. This reduction is caused by limitations of MySQL. The receiver performance variation of the second node of the two-node receiver cluster is given below. Only the event rate after the first 1200000 events is considered for the following graph, because the initial buffer filling in the receiver queues produces a very high receiver rate at the beginning of event publishing.

 

MySQL Breakpoint

After around 30 million events are published, a sudden drop can be observed in receiver performance. This can be considered the breakpoint of the MySQL event store. Another type of event store, such as the HBase event store, should be used when receiver performance has to be kept steady beyond this point.

 

Testing with large events

The following results were obtained by testing the two-node HA DAS cluster with 10 million events published via the Analysing Wikipedia Data sample. Each event in this sample is several kilobytes in size, which makes it representative of large events.

In the above graph, TPS represents the total number of events published per second. This stabilizes at about 8500 events per second.

The above graph shows the amount of data that is published per second (referred to as the data rate). The data rate published is significantly reduced at the initial stages due to the flow control mechanisms of the receiver. It stabilizes at around 25 MB per second.
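As a rough cross-check of the two stabilized figures above, dividing the data rate by the event rate gives the implied average event size. The sketch below is illustrative only: the function name is hypothetical, and it assumes 1 MB = 1,000,000 bytes, since the source does not say whether MB is decimal or binary.

```python
def average_event_size_bytes(data_rate_mb_per_s: float,
                             events_per_s: float,
                             bytes_per_mb: int = 1_000_000) -> float:
    """Implied mean event size, given a data rate and an event rate.

    bytes_per_mb is an assumption (decimal megabytes); pass 1_048_576
    if the measurements were taken in binary megabytes.
    """
    if events_per_s <= 0:
        raise ValueError("events_per_s must be positive")
    return data_rate_mb_per_s * bytes_per_mb / events_per_s


# Stabilized figures reported above: about 25 MB/s at about 8500 events/s.
size = average_event_size_bytes(25, 8500)
print(f"{size:.0f} bytes per event")
```

Under these assumptions the result is a little under 3 KB per event, which is consistent with the statement that each Wikipedia event is several kilobytes in size.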

 

With MySQL RDBMS event store

DAS data persistence was measured by publishing to two load-balanced receiver nodes backed by a MySQL database.

| Sample | Number of Events | Mean Event Rate |
|---|---|---|
| Smart Home sample | 100000000 | 5741 events per second |
| Wikipedia sample | 15901127 | 4438 events per second |

 

With HBase event store

DAS data persistence was measured by publishing to two load-balanced receiver nodes backed by a three-node HBase database cluster.

| Sample | Number of Events | Mean Event Rate |
|---|---|---|
| Smart Home sample | 500000000 | 12638 events per second |
| Wikipedia sample | 15901127 | 1640 events per second |
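To put the mean event rates in perspective, the time needed to persist a data set can be estimated as the event count divided by the mean rate. The sketch below applies this to the Wikipedia sample (15901127 events) for both event stores. It assumes the mean rate holds for the whole run, which the MySQL breakpoint described earlier suggests is only an approximation; the function name is hypothetical.

```python
def publish_duration_seconds(event_count: int, mean_events_per_s: float) -> float:
    """Estimated time to persist event_count events at a constant mean rate."""
    if mean_events_per_s <= 0:
        raise ValueError("mean_events_per_s must be positive")
    return event_count / mean_events_per_s


WIKIPEDIA_EVENTS = 15_901_127

# Mean event rates reported for the Wikipedia sample on each event store.
for store, rate in (("MySQL", 4438), ("HBase", 1640)):
    seconds = publish_duration_seconds(WIKIPEDIA_EVENTS, rate)
    print(f"{store}: about {seconds / 60:.0f} minutes")
```

With the reported rates, this works out to roughly one hour for MySQL and a little over two and a half hours for HBase on the Wikipedia sample.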

 

Analyzer Performance

This section provides information about Spark analyzing performance with different database types.

With MySQL RDBMS event store

Spark analyzing performance (time to complete execution) was measured using a two-node DAS analyzer cluster with a MySQL database.

The time taken for each Spark query is given below.

| Data set | Event Count | Query | Time Taken (seconds) |
|---|---|---|---|
| Smart Home | 10000000 | INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 26.304 |
| Smart Home | 10000000 | INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id | 21.659 |
| Smart Home | 10000000 | INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData GROUP BY state | 21.003 |
| Smart Home | 10000000 | INSERT OVERWRITE TABLE stateUsageDifference SELECT a2.state, (a2.state_avg_usage-a1.overall_avg) AS avg_usage_difference FROM (select avg(state_avg_usage) as overall_avg from stateAvgUsage) as a1 join stateAvgUsage as a2 | 0.759 |
| Wikipedia | 10000000 | INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 2883.66 |
| Wikipedia | 10000000 | INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 6288.236 |
| Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 2619.713 |
| Wikipedia | 10000000 | INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 4626.654 |

 

With HBase event store

Spark analyzing performance (time to complete execution) was measured using a two-node DAS analyzer cluster with a three-node HBase database cluster.

The time taken for each Spark query is given below.

| Data set | Event Count | Query | Time Taken (seconds) |
|---|---|---|---|
| Smart Home | 500000000 | INSERT OVERWRITE TABLE cityUsage SELECT metro_area, avg(power_reading) AS avg_usage, min(power_reading) AS min_usage, max(power_reading) AS max_usage FROM smartHomeData GROUP BY metro_area | 2218.23 |
| Smart Home | 500000000 | INSERT OVERWRITE TABLE peakDeviceUsageRange SELECT house_id, (max(power_reading) - min(power_reading)) AS usage_range FROM smartHomeData WHERE is_peak = true AND metro_area = "Seattle" GROUP BY house_id | 2229.134 |
| Smart Home | 500000000 | INSERT OVERWRITE TABLE stateAvgUsage SELECT state, avg(power_reading) AS state_avg_usage FROM smartHomeData GROUP BY state | 2185.097 |
| Smart Home | 500000000 | INSERT OVERWRITE TABLE stateUsageDifference SELECT a2.state, (a2.state_avg_usage-a1.overall_avg) AS avg_usage_difference FROM (select avg(state_avg_usage) as overall_avg from stateAvgUsage) as a1 join stateAvgUsage as a2 | 0.923 |
| Wikipedia | 15901127 | INSERT INTO TABLE wikiContributorSummary SELECT contributor_username, COUNT(*) as page_count FROM wiki GROUP BY contributor_username | 829.075 |
| Wikipedia | 15901127 | INSERT INTO TABLE wikiTotalArticleLength SELECT SUM(length) as total_article_chars FROM wiki | 741.101 |
| Wikipedia | 15901127 | INSERT INTO TABLE wikiTotalArticlePages SELECT COUNT(*) as total_pages FROM wiki | 643.101 |
| Wikipedia | 15901127 | INSERT INTO TABLE wikiAvgArticleLength SELECT AVG(length) as avg_article_length FROM wiki | 709.001 |
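Because the MySQL and HBase runs used different event counts, raw execution times are not directly comparable. One way to normalize them is to divide the event count by the time taken, giving records processed per second. The sketch below does this for the wikiTotalArticlePages (COUNT(*)) query using the figures reported in the two analyzer tables; the function and variable names are hypothetical, and the metric ignores any fixed job start-up cost.

```python
def records_per_second(event_count: int, seconds: float) -> float:
    """Normalize a query's execution time by the number of records scanned."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return event_count / seconds


# (event count, time taken in seconds) for the wikiTotalArticlePages query.
runs = {
    "MySQL": (10_000_000, 4626.654),
    "HBase": (15_901_127, 643.101),
}

for store, (events, seconds) in runs.items():
    print(f"{store}: {records_per_second(events, seconds):,.0f} records per second")
```

On these figures the HBase-backed run scans on the order of ten times as many records per second as the MySQL-backed run for this query, although the two runs also differ in data volume, so the comparison is indicative rather than exact.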


Indexing Performance

shardIndexRecordBatchSize: The amount of index data (in bytes) to be processed at a time by a shard index worker.

| Mode | Data set | shardIndexRecordBatchSize | Replication Factor | Event Count | Time Taken (seconds) | Average TPS |
|---|---|---|---|---|---|---|
| Standalone | Wikipedia | 10 MB | NA | 15901127 | 7975 | 1993.871724 |
| Standalone | Wikipedia | 20 MB | NA | 15901127 | 6765 | 2350.499187 |
| Standalone | Smart Home | 20 MB | NA | 20000000 | 1385 | 14440.43321 |
| Minimum Fully Distributed | Wikipedia | 20 MB | 1 | 15901127 | 6870 | 2314.574527 |
| Minimum Fully Distributed | Wikipedia | 20 MB | 0 | 15901127 | 7280 | 2184.220742 |
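The Average TPS column in the indexing results appears to be the event count divided by the time taken. The sketch below recomputes it for each reported row as a sanity check; the helper name and the row layout are illustrative assumptions, not part of DAS.

```python
def average_tps(event_count: int, seconds: float) -> float:
    """Average events indexed per second over the whole run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return event_count / seconds


# (mode, data set, event count, time taken in seconds, reported average TPS)
rows = [
    ("Standalone", "Wikipedia", 15_901_127, 7975, 1993.871724),
    ("Standalone", "Wikipedia", 15_901_127, 6765, 2350.499187),
    ("Standalone", "Smart Home", 20_000_000, 1385, 14440.43321),
    ("Minimum Fully Distributed", "Wikipedia", 15_901_127, 6870, 2314.574527),
    ("Minimum Fully Distributed", "Wikipedia", 15_901_127, 7280, 2184.220742),
]

for mode, data_set, events, seconds, reported in rows:
    computed = average_tps(events, seconds)
    # Each recomputed value should agree with the reported one to ~3 decimals.
    status = "matches" if abs(computed - reported) < 1e-3 else "differs"
    print(f"{mode} / {data_set}: computed {computed:.3f}, reported {reported} ({status})")
```

The same division can be used to compare configurations: for example, doubling shardIndexRecordBatchSize from 10 MB to 20 MB in standalone mode shortened the Wikipedia run from 7975 to 6765 seconds.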