
When executing Hive scripts, you might want to populate parameter values (e.g., username, password) at runtime. This way, you do not have to hard-code values in the Hive scripts, and your implementation becomes more dynamic, more secure and easier to maintain. You can do this in two ways:

  • Using the GUI to get values from the Registry
  • Writing a custom Hive analyzer

Let's take a look at each option.

Using the GUI to get values from the registry

BAM provides a simple GUI for configuring Hive scripts to get values from the Registry at runtime, using placeholder keys with particular registry/repository prefixes. Only the text/plain media type supports parameterization.

Follow the instructions below for a sample demonstration.

1. Log on to the BAM Management Console and select Registry > Browse from the Main menu. This opens the Browse window. From there, select the Detail View tab.

2. Click the Add Resource link to add a resource.

Now, let's see how to configure the Hive script to get the value of this resource at runtime.

3. To access a resource from a Hive script, use one of the following notations: ${conf:path}, ${gov:path} or ${local:path}. There are three prefixes because the Registry consists of three repositories:

  • Local Data Repository  (local)
  • Configuration Registry (conf)
  • Governance Registry (gov)

The resource path you use in the script is prefixed with one of these three repository keys (local, conf or gov), depending on where you created the resource. For example, consider a resource named password in the local repository.

To refer to the password resource in the Hive script, use ${local:/repository/credential/password}.
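
The placeholder notation can be illustrated with a small Java sketch. This is not BAM's implementation: the class PlaceholderSketch, its resolve method and the in-memory map standing in for the Registry are all hypothetical, and the sketch only shows how a ${prefix:path} key splits into a repository prefix and a resource path.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: PlaceholderSketch is a hypothetical class, not part of BAM.
// The Map passed to resolve() stands in for the Registry.
final class PlaceholderSketch {

    // Matches ${conf:...}, ${gov:...} or ${local:...};
    // group 1 is the repository prefix, group 2 is the resource path.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{(conf|gov|local):([^}]+)\\}");

    // Replaces each placeholder with the value stored under "prefix:path".
    // Placeholders without a matching entry are left unchanged.
    static String resolve(String script, Map<String, String> registry) {
        Matcher matcher = PLACEHOLDER.matcher(script);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1) + ":" + matcher.group(2);
            String value = registry.getOrDefault(key, matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With a map entry keyed local:/repository/credential/password, a script line containing ${local:/repository/credential/password} comes back with the stored value in its place, while keys that have no entry pass through untouched.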

Custom Hive analyzer

Using the GUI as described above has some limitations, since it only allows you to get values from the Registry at runtime. There can be instances where you have to dynamically get and set variable values that are not necessarily stored in the Registry. For example, sometimes you have to perform a logical operation on data, store the result in a temporary variable and use that variable's value in yet another operation. In such cases, you can write your own Java class that implements the HiveAnalyzer interface and its execute method.

The HiveAnalyzer interface is shown below.

public interface HiveAnalyzer {
    public abstract void execute(AnalyzerContext analyzerContext);
}

HiveAnalyzer is an interface that allows you to plug custom logic into Hive scripts. Its execute method receives an AnalyzerContext, which contains the parameters parsed at runtime. Use the getParameters() method in AnalyzerContext to retrieve those parameters and write custom Java logic using them.

You can also use the following property methods in AnalyzerContext to set parameters in the Hive configuration used during a particular Hive script execution.

 

setProperty(String key, String value)
getProperty(String key)
removeProperty(String key)

For example, if you set a property named lastRunTime using the setProperty method, the property is available in subsequent Hive queries with the notation ${hiveconf:lastRunTime}. You can dynamically populate this lastRunTime property in each run of the script by looking up values from the Registry, databases, etc. This is useful for maintaining state between two executions of a Hive script.
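
As a minimal sketch of the lastRunTime case, the following analyzer stores the current time as a property. To keep the example self-contained, AnalyzerContext and HiveAnalyzer are redeclared here as simplified stand-ins that model only the methods named above; the real types ship with BAM. LastRunTimeAnalyzer is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for BAM's AnalyzerContext (assumption: only the three property
// methods listed above are modeled, backed by a plain in-memory map).
class AnalyzerContext {
    private final Map<String, String> properties = new HashMap<>();

    void setProperty(String key, String value) { properties.put(key, value); }
    String getProperty(String key) { return properties.get(key); }
    void removeProperty(String key) { properties.remove(key); }
}

// Stand-in copy of the HiveAnalyzer interface shown above.
interface HiveAnalyzer {
    void execute(AnalyzerContext analyzerContext);
}

// Hypothetical analyzer: records the current time (epoch milliseconds)
// under the property name lastRunTime.
class LastRunTimeAnalyzer implements HiveAnalyzer {
    @Override
    public void execute(AnalyzerContext analyzerContext) {
        String now = String.valueOf(System.currentTimeMillis());
        analyzerContext.setProperty("lastRunTime", now);
    }
}
```

In a real deployment the value would more likely come from the Registry or a database than from the system clock; the clock is used here only to keep the sketch free of external dependencies.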

To use the custom Java implementation, copy the JAR file containing the class to <BAM_HOME>/repository/components/lib and restart the server.

There are two ways to use a custom Hive analyzer in a Hive script:

Syntax 1

class 'Fully qualified class name of the implementation';

Syntax 2

Aliasing custom Hive analyzers

To make class analyzers more flexible and user-friendly, you can add aliases for custom Hive analyzers. The steps are as follows.

Step 1  

Register the alias in analyzer-config.xml by adding an analyzer entry with the following details:

name - alias that maps to the class analyzer

class - class analyzer

parameters - parameters accepted by the class analyzer (optional)

The following is an example configuration:

<analyzerConfig xmlns="http://wso2.org/carbon/analytics">
    <analyzers>
        <analyzer>
            <name>foo</name>
            <class>org.wso2.carbon.analytics.hive.extension.builtin.FooAnalyzer</class>
            <parameters>bar,bar1,*</parameters>
        </analyzer>
    </analyzers>
</analyzerConfig>

Step 2

You can then use the defined analyzer in Hive scripts with the following syntax:

analyzer foo(bar="value",bar1="value1",*);

If you do not have an alias for the Hive analyzer, you can use the fully qualified class name along with the parameters, as shown below:

analyzer org.wso2.carbon.analytics.hive.extension.builtin.FooAnalyzer(bar="value",bar1="value1",*);
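
To show how the bar and bar1 parameters from the statement above might reach the analyzer class, here is a hedged sketch. AnalyzerContext and HiveAnalyzer are simplified stand-ins, not the BAM classes. In particular, the return type of getParameters() is an assumption (a name-to-value map), since it is not stated here, and both the body of FooAnalyzer and the property name fooResult are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for BAM's AnalyzerContext.
// ASSUMPTION: getParameters() returns a map of parameter names to values.
class AnalyzerContext {
    private final Map<String, String> parameters = new HashMap<>();
    private final Map<String, String> properties = new HashMap<>();

    AnalyzerContext(Map<String, String> parameters) {
        this.parameters.putAll(parameters);
    }

    Map<String, String> getParameters() { return parameters; }
    void setProperty(String key, String value) { properties.put(key, value); }
    String getProperty(String key) { return properties.get(key); }
}

// Stand-in copy of the HiveAnalyzer interface.
interface HiveAnalyzer {
    void execute(AnalyzerContext analyzerContext);
}

// Hypothetical body for FooAnalyzer: reads bar and bar1, joins them with a
// comma, and exposes the result as a property named fooResult.
class FooAnalyzer implements HiveAnalyzer {
    @Override
    public void execute(AnalyzerContext analyzerContext) {
        Map<String, String> params = analyzerContext.getParameters();
        String bar = params.getOrDefault("bar", "");
        String bar1 = params.getOrDefault("bar1", "");
        analyzerContext.setProperty("fooResult", bar + "," + bar1);
    }
}
```

Reading parameters with a default value keeps the analyzer from failing when a script omits an optional parameter.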

Built-in Hive Analyzers

Built-in analyzers cover common use cases in Hive scripting. The following built-in analyzer is currently available.

analyzer resolvePath(path="value");

The resolvePath analyzer can be used to get the correct file path for the OS platform.

Parameters

path - file path (e.g., file://${CARBON_HOME}/repository/components/lib/CustomUDF_Country.jar)

Generally, you can place an analyzer statement anywhere in the script, according to your requirements. For example, if you want some pre-processing done, add it at the beginning of the script. If you want post-processing done after the normal Hive queries, add it at the end.

Maven dependencies for Hive custom analyzer

To build a custom analyzer, add the following dependency under the Maven dependencies of your project.

<dependency>
    <groupId>org.wso2.carbon</groupId>
    <artifactId>org.wso2.carbon.analytics.hive</artifactId>
    <version>4.2.0</version>
</dependency>