Interactive SQL (Structured Query Language) queries are widely used by business intelligence users for exploring and analyzing data. WSO2 DAS 3.0.0 ships with the ability to run SQL queries on the underlying datasources specified in the DAS Data Access Layer (DAL).
It uses Spark SQL as the query engine, which replaces Apache Hive from WSO2 DAS 3.0.0 onwards and provides powerful integration with the rest of the Spark analytics engine. For more information on Spark SQL, see the Spark SQL Programming Guide.
Spark SQL queries
Spark SQL follows the standard SQL format. For an explanation of the standard SQL syntax, see SQL Syntax.
Note that some query types of the standard SQL format are not supported by Spark SQL.
The query types supported by the Spark SQL parser are not yet covered in the documentation published by the Apache Spark project. For more information on the Spark SQL query syntax, see the Spark SQL parser code and the SQL query test suite.
WSO2 DAS SQL guide
WSO2 DAS inherits its query parsing options from Spark SQL's native query parser. The following sections describe the query formats to be used for each action.
Use the following query syntax to register a temporary table in the Spark environment using data from Carbon analytics or any other relation provider class.
CREATE TEMPORARY TABLE <table_name> USING <provider_name> OPTIONS ( <options> ) AS <alias>;
The parameters of the above syntax are described below.
Element | Description |
---|---|
<table_name> | The name of the table to be created in the Spark environment. |
<provider_name> | The provider of the data used to create the table. This can be either Carbon Analytics or another relation provider class. |
<options> | Other options for Spark to refer to when creating the table. |
<alias> | An optional alias to uniquely identify the created table. |
The provider used to create the temporary table can be Carbon Analytics, Carbon JDBC, Compressed Event Analytics or other. See the following sections for detailed information on how to use each provider to create the table.
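For example, the following sketch registers a temporary table backed by an underlying Carbon Analytics table. The table names used here are illustrative, not part of any shipped configuration.

```sql
-- Register a temporary table in the Spark environment using the
-- Carbon Analytics provider. "plugUsage" and "PLUG_USAGE" are
-- hypothetical names chosen for illustration.
CREATE TEMPORARY TABLE plugUsage
USING CarbonAnalytics
OPTIONS (tableName "PLUG_USAGE");
```

Once registered, the temporary table can be referenced by name in subsequent SELECT and INSERT queries within the same Spark environment.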
Use the following query syntax to insert data into the temporary tables that already exist in the Spark environment.
INSERT INTO/OVERWRITE TABLE <table_name> <SELECT_query>
Parameters of the above syntax are described below.
Parameter | Description |
---|---|
<table_name> | The name of the temporary table you want to insert values into. |
<SELECT_query> | The SELECT statement used to select the values to be inserted into the temporary table. |
For example:
INSERT OVERWRITE TABLE plugUsage select house_id, household_id, plug_id, max(value) - min(value) as usage, compositeID(house_id, household_id, plug_id) as composite_id from debsData where property = false group by house_id, household_id, plug_id;
You can use any SELECT query in the standard SQL syntax to select data from a table created in the Spark environment.
SELECT * from <temp_table>;
The <temp_table> parameter specifies the name of the temporary table from which data should be selected.
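As a sketch, assuming a temporary table named plugUsage (a hypothetical name) has already been registered in the Spark environment:

```sql
-- Select an aggregated value per house from a temporary table.
-- "plugUsage", "house_id", and "value" are illustrative names.
SELECT house_id, max(value) AS max_value
FROM plugUsage
GROUP BY house_id;
```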
From DAS 3.1.0 onwards, it is possible to process data incrementally based on the timestamp provided. Incremental processing is only available for the Carbon Analytics relation provider, and it should be enabled per table via the IncrementalParams option. For more information about this option, see Creating the Table Using Carbon Analytics as the Provider.
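As a rough sketch, a table could be registered with incremental processing enabled at creation time. The table name, the incremental ID, and the exact format of the option value below are assumptions for illustration; see Creating the Table Using Carbon Analytics as the Provider for the authoritative option values.

```sql
-- Register a Carbon Analytics table with incremental processing enabled.
-- "plugUsageID" is a hypothetical unique incremental ID for this table,
-- and the option value format shown here is an assumption.
CREATE TEMPORARY TABLE plugUsage
USING CarbonAnalytics
OPTIONS (tableName "PLUG_USAGE",
         incrementalParams "plugUsageID, DAY");
```

The incremental ID given here is the identifier later passed to the INCREMENTAL_TABLE_COMMIT, INCREMENTAL_TABLE_SHOW, and INCREMENTAL_TABLE_RESET queries.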
The incremental processing queries that can be carried out are as follows.
Query | Incremental_table_commit |
---|---|
Description | This query sets the incremental metadata. It should be set at a time when processing a set of data is complete. Once this query is committed, only the new data from the last processed data row is processed. |
Syntax | INCREMENTAL_TABLE_COMMIT <incremental ID1, incremental ID2, ...> |
Example | incremental_table_commit TableID1, TableID2; |
Query | Incremental_table_show |
---|---|
Description | This query shows the incremental metadata for the given tables. |
Syntax | INCREMENTAL_TABLE_SHOW <incremental ID1, incremental ID2, ...> |
Example | incremental_table_show TableID1, TableID2; |
Query | incremental_table_reset |
---|---|
Description | This query resets the incremental metadata for the given tables. It resets the incremental metadata values to Long.MIN_VALUE. |
Syntax | INCREMENTAL_TABLE_RESET <incremental ID1, incremental ID2, ...> |
Example | incremental_table_reset TableID1, TableID2; |
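Putting the incremental queries together, a typical processing script might first run the processing query and then commit the incremental metadata so that the next run picks up only new data. All table and ID names below are hypothetical.

```sql
-- Process only the data added since the last commit.
INSERT OVERWRITE TABLE plugUsageSummary
  SELECT house_id, max(value) AS max_value
  FROM plugUsage
  GROUP BY house_id;

-- Commit the incremental metadata: this marks the current batch as
-- processed, so the next run starts from the last processed row.
INCREMENTAL_TABLE_COMMIT plugUsageID;
```

If a run needs to reprocess the full dataset, INCREMENTAL_TABLE_RESET plugUsageID; would reset the stored metadata before the next processing query.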
Reserved words in Spark SQL
The following are the reserved words in Spark SQL by default. These words cannot be used in Data Definition Language (DDL) tasks (e.g., as column names).
The reserved words are case-insensitive.
- ABS
- ALL
- AND
- APPROXIMATE
- AS
- ASC
- AVG
- BETWEEN
- BY
- CASE
- CAST
- COALESCE
- COUNT
- DESC
- DISTINCT
- ELSE
- END
- EXCEPT
- FALSE
- FIRST
- FROM
- FULL
- GROUP
- HAVING
- IF
- IN
- INNER
- INSERT
- INTERSECT
- INTO
- IS
- JOIN
- LAST
- LEFT
- LIKE
- LIMIT
- LOWER
- MAX
- MIN
- NOT
- NULL
- ON
- OR
- ORDER
- SORT
- OUTER
- OVERWRITE
- REGEXP
- RIGHT
- RLIKE
- SELECT
- SEMI
- SQRT
- SUBSTR
- SUBSTRING
- SUM
- TABLE
- THEN
- TRUE
- UNION
- UPPER
- WHEN
- WHERE
- WITH
Reserved words in the WSO2 Carbon environment
The following words are reserved in the WSO2 Carbon environment.
The reserved words are case-sensitive.
- CarbonAnalytics
- CarbonJDBC