WSO2 Data Analytics Server is succeeded by WSO2 Stream Processor. To view the latest documentation for WSO2 SP, see WSO2 Stream Processor Documentation.


Introduction

This sample demonstrates how to analyze data collected on the usage of Wikipedia.

Prerequisites

Follow the steps below to set up the prerequisites before you start.

  1. Set up the general prerequisites required for WSO2 DAS.
  2. Download a Wikipedia data dump, and extract the compressed XML articles dump file to a preferred location on your machine.
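
The extraction in step 2 could be scripted roughly as follows. This is a minimal sketch that assumes the dump was downloaded as a bzip2 archive (Wikipedia article dumps are typically distributed as .xml.bz2 files). The function name and the example paths are hypothetical and not part of the sample.

```python
import bz2
import shutil


def extract_dump(compressed_path, output_path, chunk_size=1024 * 1024):
    """Decompress a .bz2 dump to output_path, streaming in chunks.

    Streaming avoids loading the whole dump into memory, which matters
    because a full articles dump is very large once decompressed.
    """
    with bz2.open(compressed_path, "rb") as src, open(output_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return output_path


# Example usage (hypothetical paths, adjust to your download location):
# extract_dump("/path/to/enwiki-pages-articles.xml.bz2",
#              "/path/to/enwiki-pages-articles.xml")
```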

Building the sample

Follow the steps below to build the sample.

Uploading the Carbon Application

Follow the steps below to upload the Carbon Application (c-App) file of this sample. For more information, see Carbon Application Deployment for DAS.

  1. Log in to the DAS management console using the following URL: https://<DAS_HOST>:<DAS_PORT>/carbon/
  2. Click Main, and then click Add in the Carbon Applications menu.
  3. Click Choose File, and upload the <DAS_HOME>/capps/Wikipedia.car file.
  4. Click Main, then click Carbon Applications, and then click List to view the uploaded Carbon Application.

Executing the sample

Follow the steps below to execute the sample.

Tuning the server configurations

The Wikipedia dataset is published with each article sent as a single event. Each event is therefore relatively large (~300 KB), so you need to tune the server configurations as follows.

  1. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-bridge-config.xml file as shown below, to tune the queue sizes available for data receiving.

    <dataBridgeConfiguration>
    	<maxEventBufferCapacity>5000</maxEventBufferCapacity>
    	<eventBufferSize>2000</eventBufferSize>
    </dataBridgeConfiguration>
  2. Edit the values of the following properties in the <DAS_HOME>/repository/conf/data-bridge/data-agent-config.xml file as shown below, to tune the queue sizes available for data persistence.

    <DataAgentsConfiguration>
    	<Agent>
    		<Name>Thrift</Name>
    		<QueueSize>65536</QueueSize>
    		<BatchSize>500</BatchSize>
    	</Agent>
    </DataAgentsConfiguration>
  3. Edit the values of the following properties in the <DAS_HOME>/repository/conf/analytics/analytics-eventsink-config.xml file as shown below, to change the Thrift publisher related configurations.

    <AnalyticsEventSinkConfiguration>
    	<QueueSize>65536</QueueSize>
    	<maxQueueCapacity>1000</maxQueueCapacity>
    	<maxBatchSize>1000</maxBatchSize>
    </AnalyticsEventSinkConfiguration>

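
Editing the three files by hand is fine for a one-off setup. If you want to script it, the following is a minimal sketch of setting element values in an XML string with Python's standard xml.etree.ElementTree module. The helper name apply_values and the inline sample are illustrative assumptions. The sketch also assumes the target elements carry no XML namespace, which may not hold for the files shipped with your DAS version, and it drops XML comments when re-serializing, so back up the original files before applying anything like it.

```python
import xml.etree.ElementTree as ET

# Values taken from step 1 of the tuning steps above, keyed by element name.
DATA_BRIDGE_VALUES = {
    "maxEventBufferCapacity": "5000",
    "eventBufferSize": "2000",
}


def apply_values(xml_text, values):
    """Return xml_text with the text of each named element replaced.

    Raises KeyError if any element is missing, so a silent no-op cannot
    be mistaken for a successful update.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for tag, value in values.items():
        element = root.find(f".//{tag}")  # first descendant with this tag
        if element is None:
            missing.append(tag)
        else:
            element.text = value
    if missing:
        raise KeyError(f"elements not found: {missing}")
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; the starting values here are made up.
sample = """<dataBridgeConfiguration>
    <maxEventBufferCapacity>10</maxEventBufferCapacity>
    <eventBufferSize>5</eventBufferSize>
</dataBridgeConfiguration>"""

print(apply_values(sample, DATA_BRIDGE_VALUES))
```

The same helper could be called with {"QueueSize": "65536", "BatchSize": "500"} for the data agent configuration, again assuming the element names match those shown in the steps above.
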
Running the data publisher

Navigate to the <DAS_HOME>/samples/wikipedia/ directory in a new CLI tab, and execute the following command to run the data publisher:

    ant -Dpath=/home/laf/Downloads/enwiki-20150805-pages-articles.xml -Dcount=1000

In the above command, set the -Dpath Java system property to the location of the Wikipedia article XML dump file that you downloaded in Prerequisites, and set -Dcount to the number of articles to publish as events out of the total dataset (e.g., -Dcount=-1 publishes all articles). This sends events to the event stream that is deployed through the above C-App.
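
If you run the publisher repeatedly with different inputs, it can help to assemble the command from parameters. The sketch below only builds and prints the ant invocation described above and does not run it. The function name build_publisher_command and the example path are hypothetical; the -Dpath and -Dcount properties and the -1 convention for publishing all articles come from this page.

```python
import shlex


def build_publisher_command(dump_path, count=-1):
    """Build the ant invocation for the sample data publisher.

    dump_path: location of the extracted Wikipedia XML dump.
    count: number of articles to publish; -1 publishes all articles.
    """
    return ["ant", f"-Dpath={dump_path}", f"-Dcount={int(count)}"]


command = build_publisher_command("/path/to/enwiki-pages-articles.xml", count=1000)
print(shlex.join(command))

# To actually run it from the sample directory, something like the
# following could be used (the directory is an assumption):
# import subprocess
# subprocess.run(command, cwd="<DAS_HOME>/samples/wikipedia", check=True)
```

Returning the command as a list of arguments rather than a single string avoids quoting problems when the dump path contains spaces.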

Executing the scripts

Follow the steps below to execute the Spark scripts which are deployed by the sample C-App.

  1. Log in to the DAS management console using the following URL: https://<DAS_HOST>:<DAS_PORT>/carbon/
  2. Click Main, and then click Scripts in the Batch Analytics menu.
  3. Click the corresponding Execute option of each of the following scripts to execute them.
    scripts to be executed

Viewing the output

You may use the Data Explorer or the Analytics Dashboard of the WSO2 DAS Management Console to browse published sample events.

Using the Data Explorer 

Follow the steps below to use the Data Explorer to view the output. 

  1. Log in to the DAS management console if you are not already logged in.
  2. Click Main, and then click Data Explorer in the Interactive Analytics menu.
  3. Select ORG_WSO2_DAS_SAMPLE_WIKIPEDIA_DATA for the Table Name as shown below.

    select the event stream

    You can also select the other streams which are deployed by the sample C-App as shown below.

  4. Click Search to view the published data.

Using the Analytics Dashboard

Follow the steps below to use the Analytics Dashboard to view the output. 

  1. Log in to the DAS management console if you are not already logged in.
  2. Click Main, and then click Analytics Dashboard in the Dashboard menu.
  3. Log in to the Analytics Dashboard using admin/admin credentials.
  4. Click the following CREATE DASHBOARD button in the top navigational bar to create a new dashboard.

  5. Enter a Title and a Description for the new dashboard, and click Next as shown below.
    create a new Dashboard
  6. Select a layout to place its components as shown below.

    select a layout

  7. Click the Select button of the Single Column layout. You view a layout editor with the chosen layout blocks marked using dashed lines.
  8. Click the following CREATE GADGET button in the top menu bar. 
  9. Select the input data source as shown below, and click Next.

    select the data source
  10. Select a Chart Type, enter the preferred x-axis, y-axis, and additional parameters based on the selected chart type as shown below, and click Preview.
    create a new gadget
  11. Click Add to Gadget Store.
  12. Click the corresponding Design button of the Wikepedia_Samples_Dashboard to add the Contributor Summary gadget as shown below.
    designing the Dashboard
  13. Click the following gadget browser icon in the side menu bar.

    You view the new gadget listed in the gadget browser as shown below.

    new gadget in the list of all available gadgets
  14. Click the new gadget, drag it out, and place it in the preferred grid of the selected layout in the dashboard editor as shown below.

    add gadget to layout

  15. Click the following PREVIEW button in the top menu bar.

    You view the preview of the Wikepedia_Samples_Dashboard with the Contributor Summary gadget added to it as shown below.
    previewing the Dashboard