||
Skip to end of metadata
Go to start of metadata

JDBC Storage Handler for Hive is a storage handling feature of WSO2 BAM. This page describes the usage of Hive JDBC Storage Handler with its basic functionality. Currently It supports writing into any database and reading from major databases (MySQL, Microsoft SQL, Oracle, H2, PostgreSQL).

Setting up BAM to use Hive JDBC handler

  1. Obtain the jdbc-driver JAR file that is shipped with the respective database type you use.
  2. Add the jdbc-driver to <BAM_HOME>/repository/component/lib/ directory, before starting the server. 

Sample 

Below is a sample Hive query, which uses the JDBC Storage Handler to call and connect to the specified database, and store the query results in it.

CREATE EXTERNAL TABLE IF NOT EXISTS PhonebrandTable(brand STRING, 
totalOrders INT, totalQuantity INT) STORED BY 
'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler' 
TBLPROPERTIES ( 
'wso2.carbon.datasource.name'='WSO2BAM_DATASOURCE',
'hive.jdbc.update.on.duplicate' = 'true', 
'hive.jdbc.primary.key.fields' = 'brand', 
'hive.jdbc.table.create.query' = 
'CREATE TABLE brandSummary (brand VARCHAR(100) NOT NULL PRIMARY KEY, 
totalOrders INT, totalQuantity INT)');
insert overwrite table PhonebrandTable

Storage handler table properties

Following are the properties that can be defined under  TBLPROPERTIES in a Hive query, which uses the Hive JDBC Storage Handler:

Property nameRequiredDetails
mapred.jdbc.driver.classYesThe classname for the JDBC Driver to use. This should be available on Hive's classpath.
mapred.jdbc.url YesThe connection URL for the database.
hive.jdbc.primary.key.fieldsYesWhetehr you have any primary keys in the database table.
mapred.jdbc.usernameNoThe username to access the database, if it's required.
mapred.jdbc.password NoThe password to access the database, if it's required.
hive.jdbc.table.create.queryNoIf the table already exists in the database, then you don't need this. Else, provide the SQL query for creating the table in the database.
mapred.jdbc.output.table.name NoThe name of the table in the database. It need not be the same as the name of the table in Hive. If you specified the SQL query for creating the table, handler  will pick the table name from the query. Otherwise, if your meta  table name is different from the table in database, specify this.
hive.jdbc.update.on.duplicateNoEnter either true or false as the value. If true is entered, then the storage handler will  update the records with duplicate keys. Otherwise it will insert all the data. 
hive.jdbc.output.upsert.queryNo

This is used to optimize the update operation. The default implementation is  to use either insert or update statements after the select statement. Therefore, there will be two database  round trips. But we can reduce it to one, by using database specific upsert statements.

For Example a query for a MySQL database is:

'INSERT INTO productSummary (product, itemsSold) values (?,?) ON DUPLICATE KEY UPDATE itemsSold=?'

 

hive.jdbc.upsert.query.values.orderNo

If you are using an upsert query, then this is mandatory. Sample values for the above query  will be:

'product,itemsSold,itemsSold' //values order for each question mark

hive.jdbc.input.columns.mappingNo

This is mandatory if your field names in the meta table and database tables are different. P rovide the field names in the database table in the same order as the field names in meta  table with ',' separated values.

For example: productNames,noOfItemsSold

These will map to  your meta table with product,itemsSold field names.

mapred.jdbc.input.table.nameNoUsed when reading from a database table. This is needed if the meta table name and  database table name are different.
  • No labels