This documentation is for WSO2 Data Analytics Server 3.0.1.


Interactive SQL (Structured Query Language) queries are widely used by business intelligence users to explore and analyze data. WSO2 DAS 3.0.0 ships with the ability to run SQL queries on the underlying datasources specified in the DAS Data Access Layer (DAL).

It uses Spark SQL as the query engine, which succeeds Apache Hive from WSO2 DAS 3.0.0 onwards. This provides a powerful integration with the rest of the Spark analytics engine. For more information on Spark SQL, see Spark SQL Programming Guide.

Spark SQL queries

Spark SQL follows the standard SQL format. For information on the syntax explanations of the standard SQL format, see SQL Syntax.

Some query types of the standard SQL format are not supported by Spark SQL.

The query types supported by the Spark SQL parser are not yet documented by the Apache Spark project. For more information on the Spark SQL query syntax, see the Spark SQL parser code and the SQL query test suite.

WSO2 DAS SQL guide

WSO2 DAS inherits the query parsing options from Spark SQL's native query parser. The following sections describe the query formats to be used for each action.

Use the following query syntax to register a temporary table in the Spark environment using data from Carbon analytics or any other relation provider class.

CREATE TEMPORARY TABLE <table_name> 
USING <provider_name> 
OPTIONS ( <options> ) 
AS <alias>;

The parameters of the above syntax are described below.

<table_name>: The name of the table that is created in the Spark environment.
<provider_name>: The provider of the data used to create the table. This can be either Carbon analytics or a relation provider class.
<options>: Other options for Spark to refer to when creating the table.
<alias>: An alias to uniquely identify the created table. This is optional.

The provider used to create the temporary table can be Carbon Analytics, Carbon JDBC, or another relation provider.

Creating the table using Carbon Analytics as the provider

Use the following query to create a table in the Spark environment (if it does not already exist), using data from Carbon analytics. Carbon analytics refers to either the built-in H2 database or any external database that is connected to the DAL.

CREATE TEMPORARY TABLE plugUsage 
USING CarbonAnalytics 
OPTIONS (tableName "plug_usage", 
         schema "house_id INT, household_id INT, plug_id INT, usage FLOAT -sp, composite FACET -i",
         primaryKeys "household_id, plug_id"
        );

 

Carbon analytics relation provider options

The options that can be used with the Carbon analytics relation provider are described below. 

Specify the options in key-value pairs separated by commas, and give the values within quotation marks.

tableName or streamName
    The name of the table in the DAL.
    Example: tableName "plug_usage" or streamName "plug.usage"

schema
    The schema of the table in the DAL. This is optional.
    You do not need to specify a schema for a table that already exists in the DAL, as its schema is inferred. Specifying a schema again for an existing table with the given name overwrites the initial schema.
    Schema fields are column name and column type value pairs with indexing options. These fields should be comma separated. Following are the schema indexing options.
      • -i denotes an indexed column.
      • -sp denotes an indexed column with a score parameter. Score parameters should be of a numeric type.
    Example: schema "house_id INT, household_id INT, plug_id INT, usage FLOAT -sp, composite FACET -i"

primaryKeys
    The primary key of the table in the DAL. This is optional. Assign primary keys if and only if you have provided a schema.
    Example: primaryKeys "household_id, plug_id"

mergeSchema
    A boolean flag used for schema merging. If this option is set to true, the given schema is merged with the corresponding table schema in the Data Access Layer (if a schema exists). If it is set to false, the given schema overwrites the table schema in the Data Access Layer.
    Example: mergeSchema "false"

recordStore
    The Analytics Record Store in which this table is created. The default Analytics Record Store used by CarbonAnalytics is the PROCESSED_DATA_STORE.
    Example: recordStore "EVENT_STORE"
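As an illustration of combining several of these options in one query (the table and column names below are hypothetical, not part of any shipped sample), the following sketch creates a table in the EVENT_STORE record store and merges the given schema with any schema already defined for the table in the DAL:

CREATE TEMPORARY TABLE sensorReadings 
USING CarbonAnalytics 
OPTIONS (tableName "sensor_readings", 
         schema "sensor_id INT -i, temperature FLOAT -sp",
         mergeSchema "true",
         recordStore "EVENT_STORE"
        );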

Creating the table using Carbon JDBC as the provider

Use the following query syntax to create a table in the Spark environment using data from Carbon JDBC.

CREATE TEMPORARY TABLE <temp_table> 
USING CarbonJDBC 
OPTIONS (dataSource "<datasource name>", 
         tableName "<table name>");

Options in the above syntax are described below.

dataSource "<datasource name>"
    The name of the data source from which data should be obtained for the temporary table.
    Example: dataSource "test"

tableName "<table name>"
    The name of the table in the selected data source from which data should be obtained for the temporary table.
    Example: tableName "TEST.PEOPLE"
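Putting the two options together, a minimal CarbonJDBC query using the example values above would look as follows:

CREATE TEMPORARY TABLE people 
USING CarbonJDBC 
OPTIONS (dataSource "test", 
         tableName "TEST.PEOPLE");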


Creating the table using other relation providers

Use the following query syntax to create a table in the Spark environment, using data from a relation provider class. A relation provider builds the connection from Spark to any external database. For example, the following query creates a table in the Spark environment using the Spark JDBC provider connecting to an H2 database.

CREATE TEMPORARY TABLE foo 
USING jdbc 
OPTIONS (url "jdbc:h2:mem:testdb0", 
         dbtable "TEST.PEOPLE", 
         user "testUser",
         password "testPass"
         ); 

 

Other relation provider options

For more information on the options that can be used with the Spark JDBC relation provider, see Spark SQL and DataFrame Guide.

Specify the options in key-value pairs separated by commas, and give the values within quotation marks.


Use the following query syntax to insert data into the temporary tables that already exist in the Spark environment.

INSERT INTO/OVERWRITE TABLE <table_name> <SELECT_query>

Parameters of the above syntax are described below.

<table_name>: The name of the temporary table into which you want to insert values.
<SELECT_query>: The SELECT statement used to retrieve the values that are inserted into the temporary table.

 

For example:

INSERT OVERWRITE TABLE plugUsage 
SELECT house_id, household_id, plug_id, 
       max(value) - min(value) AS usage, 
       compositeID(house_id, household_id, plug_id) AS composite_id 
FROM debsData 
WHERE property = false 
GROUP BY house_id, household_id, plug_id;

You can use any SELECT query in the standard SQL syntax to select data from a table which is created in the Spark environment.

SELECT * FROM <temp_table>;

The <temp_table> parameter specifies the name of the temporary table from which data should be selected.
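For example, using the plugUsage table registered earlier, a standard aggregate query can be run directly in the Spark environment (the filter value here is purely illustrative):

SELECT house_id, sum(usage) AS total_usage 
FROM plugUsage 
WHERE household_id = 1 
GROUP BY house_id;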

Reserved words in Spark SQL

The following are the reserved words in Spark SQL by default. These words cannot be used in Data Definition Language (DDL) tasks (e.g., as column names).

The reserved words are case-insensitive.

  • ABS
  • ALL
  • AND
  • APPROXIMATE
  • AS
  • ASC
  • AVG
  • BETWEEN
  • BY
  • CASE
  • CAST
  • COALESCE
  • COUNT
  • DESC
  • DISTINCT
  • ELSE
  • END
  • EXCEPT
  • FALSE
  • FIRST
  • FROM
  • FULL
  • GROUP
  • HAVING
  • IF
  • IN
  • INNER
  • INSERT
  • INTERSECT
  • INTO
  • IS
  • JOIN
  • LAST
  • LEFT
  • LIKE
  • LIMIT
  • LOWER
  • MAX
  • MIN
  • NOT
  • NULL
  • ON
  • OR
  • ORDER
  • SORT
  • OUTER
  • OVERWRITE
  • REGEXP
  • RIGHT
  • RLIKE
  • SELECT
  • SEMI
  • SQRT
  • SUBSTR
  • SUBSTRING
  • SUM
  • TABLE
  • THEN
  • TRUE
  • UNION
  • UPPER
  • WHEN
  • WHERE
  • WITH

Reserved words in the WSO2 Carbon environment

The following words are reserved in the WSO2 Carbon environment. 

The reserved words are case-sensitive.

  • CarbonAnalytics
  • CarbonJDBC