
Data analysis is one of the most fundamental tasks of BAM. The BAM analytics framework runs summarization and data analytics on the collected data. WSO2 BAM implements data analysis using an Apache Hadoop-based big data analytics framework, which is built on highly scalable MapReduce technology. MapReduce is a programming model designed to process large data sets. As a result, the BAM analytics framework can scale out data processing operations across a large number of data processing nodes and handle large data volumes.
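To make the programming model concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases in Python. It is an illustration only, not BAM or Hadoop code: the event fields (`service`, `response_ms`) and all function names are hypothetical, and a real framework would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    # Hypothetical event shape: {"service": str, "response_ms": int}
    yield record["service"], record["response_ms"]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Summarize all values collected for one key."""
    return key, {"count": len(values), "avg_ms": sum(values) / len(values)}


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory data set."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Made-up sample events, for illustration only.
events = [
    {"service": "orders", "response_ms": 120},
    {"service": "orders", "response_ms": 80},
    {"service": "billing", "response_ms": 200},
]
summary = run_job(events)
print(summary)
```

Because each record is mapped independently and each key is reduced independently, both phases can be split across nodes, which is what allows this model to scale out.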

Although BAM uses MapReduce technology underneath, you do not have to write complex Hadoop jobs to process data. BAM shields you from these underlying complexities and lets you write data processing queries and analytic jobs in the integrated Apache Hive query language. Hive is a simple query language similar to SQL, and it is easy to learn and use. Hive gives you the right level of abstraction from the Hadoop engine while internally submitting the analytic jobs to Hadoop: it either spawns a Hadoop JVM internally or delegates to a Hadoop cluster. Refer to the section Creating Hive Queries to Analyze Data for instructions on writing Hive scripts, configuring Hive tables, and so on.
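To give a feel for this level of abstraction, the sketch below expresses a per-service summary as a single declarative SQL-style query instead of hand-written processing code. It uses Python's built-in sqlite3 module purely as a stand-in for a SQL-like engine; it is not Hive and does not involve Hadoop. The table `service_events` and its columns are hypothetical. A Hive script for a similar summary would likewise rely on `SELECT`, `GROUP BY`, and aggregate functions, although the table definitions and storage setup differ.

```python
import sqlite3

# In-memory database used only as a stand-in for a SQL-like query engine.
conn = sqlite3.connect(":memory:")

# Hypothetical table of collected events.
conn.execute("CREATE TABLE service_events (service TEXT, response_ms INTEGER)")
conn.executemany(
    "INSERT INTO service_events VALUES (?, ?)",
    [("orders", 120), ("orders", 80), ("billing", 200)],
)

# One declarative query states the desired summary; the engine decides
# how to group and aggregate the rows.
rows = conn.execute(
    "SELECT service, COUNT(*) AS request_count, AVG(response_ms) AS avg_ms "
    "FROM service_events GROUP BY service ORDER BY service"
).fetchall()
conn.close()

print(rows)
```

The query names only the result that is wanted. How the work is executed is left to the engine, which in the case of BAM means Hive translating the script into jobs that run on Hadoop.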

By default, Hive submits analytic jobs to a Hadoop instance running in local mode, but you can configure it to use a multi-node Hadoop cluster instead. You can read about these execution modes in the Hadoop documentation at http://hadoop.apache.org/docs/stable/index.html.

The following feature in the WSO2 feature repository provides the functionality of the BAM analytics framework:

Name : WSO2 Carbon - Analytics Feature
Identifier : org.wso2.carbon.analytics.server.feature.group

This feature is bundled by default in WSO2 BAM. Next, see the section Creating Hive Queries to Analyze Data to learn how to write Hive queries and set up the databases to store and execute them.
