I, Me and Myself

Tuesday, September 09, 2014

HiveAccessControlException Permission Denied. (Error from hive: error code:'40000')

If you get a HiveAccessControlException (Permission Denied) while importing data from the Hortonworks Hive DSN into Office 2013, you need to grant permission on the table.

Log in to the Hortonworks sandbox console as root (password: hadoop), open the hive shell, and grant SELECT on the table you are trying to import to the user the DSN connects as (hue in this example):

hive> grant SELECT on table m_user to user hue;

Then re-import the table in Office 2013.

Thursday, September 04, 2014

Creating jar with dependencies in Maven Project

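Simple but quite useful:

Just add this to your pom.xml and run mvn package (or mvn install). Maven then writes target/<artifactId>-<version>-jar-with-dependencies.jar, which bundles your classes with all of their dependencies. Replace fully.qualified.MainClass with your own main class so the jar can be started with java -jar.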

<!-- create jar with dependencies -->
<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>fully.qualified.MainClass</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id> <!-- this is used for inheritance merges -->
          <phase>package</phase> <!-- bind to the packaging phase -->
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
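A quick way to confirm that the plugin picked up your main class is to read the manifest of the generated jar. Below is a minimal Python sketch; read_main_class and make_demo_jar are hypothetical helper names, and the only thing it relies on is that a jar is a zip file with its manifest at META-INF/MANIFEST.MF.

```python
import io
import zipfile
from typing import IO, Optional, Union


def read_main_class(jar: Union[str, IO[bytes]]) -> Optional[str]:
    """Return the Main-Class value from a jar's META-INF/MANIFEST.MF, or None."""
    with zipfile.ZipFile(jar) as zf:
        try:
            manifest = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:  # the archive has no manifest
            return None
    # Manifest lines are wrapped at 72 bytes; a continuation line starts with a space.
    lines = []
    for line in manifest.splitlines():
        if line.startswith(" ") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)
    for line in lines:
        if line.startswith("Main-Class:"):
            return line.split(":", 1)[1].strip()
    return None


def make_demo_jar(main_class: str) -> io.BytesIO:
    """Build a tiny in-memory jar so the helper can be tried without Maven."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF",
                    f"Manifest-Version: 1.0\r\nMain-Class: {main_class}\r\n\r\n")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_main_class(make_demo_jar("fully.qualified.MainClass")))
```

On a real build you would pass the path instead, e.g. read_main_class("target/<artifactId>-<version>-jar-with-dependencies.jar").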

Wednesday, September 03, 2014

How to resolve Missing artifact jdk.tools:jdk.tools:jar:1.X

I have hit this issue with a couple of Maven artifacts, such as hbase-client (v0.98.0-hadoop2) and solr-core (v4.8.0).

Error:
Description Resource Path Location Type
The container 'Maven Dependencies' references non existing library 'C:\Users\spras3\.m2\repository\jdk\tools\jdk.tools\1.7\jdk.tools-1.7.jar' storm-analytics Build path Build Path Problem

The easiest fix I found is to add an exclusion for jdk.tools (groupId jdk.tools, artifactId jdk.tools) to the dependency that pulls it in, for example hbase-client.
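If you have several POMs to fix, the edit can also be scripted. This is a minimal sketch using Python's standard xml.etree.ElementTree module; add_jdk_tools_exclusion is a hypothetical helper, it assumes the POM declares the standard namespace http://maven.apache.org/POM/4.0.0, and ElementTree drops XML comments when it rewrites the file, so editing by hand may still be preferable.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; ElementTree needs it to find elements.
NS = "http://maven.apache.org/POM/4.0.0"
ET.register_namespace("", NS)  # write it back as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the POM namespace."""
    return f"{{{NS}}}{tag}"


def add_jdk_tools_exclusion(pom_xml: str, artifact_id: str) -> str:
    """Add a jdk.tools:jdk.tools exclusion to every dependency whose
    artifactId equals artifact_id. XML comments are not preserved."""
    root = ET.fromstring(pom_xml)
    for dep in list(root.iter(q("dependency"))):
        if dep.findtext(q("artifactId")) != artifact_id:
            continue
        exclusions = dep.find(q("exclusions"))
        if exclusions is None:
            exclusions = ET.SubElement(dep, q("exclusions"))
        present = any(
            ex.findtext(q("groupId")) == "jdk.tools"
            and ex.findtext(q("artifactId")) == "jdk.tools"
            for ex in exclusions.findall(q("exclusion"))
        )
        if not present:  # keep the helper idempotent
            exclusion = ET.SubElement(exclusions, q("exclusion"))
            ET.SubElement(exclusion, q("groupId")).text = "jdk.tools"
            ET.SubElement(exclusion, q("artifactId")).text = "jdk.tools"
    return ET.tostring(root, encoding="unicode")


SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>0.98.0-hadoop2</version>
    </dependency>
  </dependencies>
</project>"""

if __name__ == "__main__":
    print(add_jdk_tools_exclusion(SAMPLE_POM, "hbase-client"))
```

The printed POM shows the resulting <exclusions><exclusion> block inside the hbase-client dependency.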

Wednesday, August 27, 2014

Simple Item-Based Recommendation using Mahout on Hortonworks sandbox

1. Set up the Hortonworks sandbox.

2. Install Mahout on the sandbox (yum install mahout).

3. This is an extension of chapter . That chapter shows how to download and set up the Omniture logs, product and user data:

· User table – 38455 rows

· Product table – 31 rows

· Omniture logs – 421266 rows

4. Process the data to give both products and users an ID (primary key), then apply those IDs to the Omniture logs.

Create an Omniture view (m_omniture) with both userId and product_id. In this example the userId is a hash of the swid column, and the product ID is extracted from the URL.

This hack is required because Mahout's input must be rows of userId, productId and score (the strength of the relationship), and Mahout expects numeric IDs.

create view m_omniture as
select
  col_2 ts,
  col_8 ip,
  col_13 url,
  substr(split(col_13, '/')[4], 3) product_id,
  col_14 swid,
  positive(hash(col_14)) userId,
  col_50 city,
  col_51 country,
  col_53 state
from omniturelogs;
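To make the two expressions in this view concrete, here is a rough Python equivalent. It is a sketch under assumptions: the sample URL is hypothetical, and it assumes Hive's hash() on an ASCII string returns the same value as Java's String.hashCode(). Note that Hive's positive() simply returns its argument (it is not abs()), which is why negative user IDs show up in the Mahout output further down. The same hash is applied to swid in m_user, so the IDs line up.

```python
from typing import Optional


def java_string_hash(s: str) -> int:
    """31-based string hash with 32-bit signed overflow, like Java's
    String.hashCode() for BMP characters. Assumption: for ASCII values
    such as swid, Hive's hash() returns the same number."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only 32 bits
    return h - 0x100000000 if h & 0x80000000 else h  # reinterpret as signed int


def product_id_from_url(url: Optional[str]) -> Optional[str]:
    """Mimic substr(split(url, '/')[4], 3) in Hive.

    Hive array indexes are 0-based but substr() is 1-based, so a start
    position of 3 means "drop the first two characters"."""
    if url is None:
        return None
    parts = url.split("/")
    if len(parts) <= 4:
        return None  # Hive yields NULL for an out-of-range index
    return parts[4][2:]


if __name__ == "__main__":
    # Hypothetical URL shaped like /<section>/<2-letter prefix><product id>
    print(product_id_from_url("http://www.acme.com/SH55126545/VD55149415"))  # 55149415
    print(product_id_from_url("http://www.acme.com/"))  # None (home page)
    print(java_string_hash("hello"))  # 99162322
```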

Create the product view with a product ID:

create view m_product as
select
  substr(split(url, '/')[4], 3) product_id,
  url url,
  category category
from products;

Create the user table with a userId:

CREATE TABLE m_user AS
SELECT
  positive(hash(swid)) userId,
  birth_dt bday,
  gender_cd gender,
  swid sessionId
FROM user;

5. Create the input data for the Mahout algorithm.

Group the page views by user and product, and use the number of times the user accessed the product URL as the score.

Home-page URLs are excluded as well: they carry no product ID and would produce rows with a NULL entry. The final output is 3733 rows.

hive -e 'select userId,product_id,count(*) relation from m_omniture group by userId,product_id' > /tmp/temp.tsv

hadoop fs -put /tmp/temp.tsv /apps/mahout_input/

hadoop fs -rmr /user/**/temp/*

Note: before running the hadoop fs -put, remove the NULL rows from the file (for example in vi: :g/NULL/d).
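The aggregation in this step is easy to check on a small sample. The Python sketch below builds the same userId, product_id, count lines as the Hive GROUP BY and skips rows without a product ID, which replaces the manual NULL clean-up (in Hive itself, adding where product_id is not null to the query would have the same effect). build_mahout_input and the sample events are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def build_mahout_input(events: Iterable[Tuple[int, Optional[str]]]) -> List[str]:
    """Turn (userId, product_id) page views into 'userId<TAB>product_id<TAB>count'
    lines; views without a product ID (home-page hits, NULL in Hive) are skipped."""
    counts = Counter(
        (user_id, product_id)
        for user_id, product_id in events
        if product_id is not None and product_id != "NULL"
    )
    # Sort so the output is deterministic.
    return [f"{user}\t{product}\t{n}" for (user, product), n in sorted(counts.items())]


if __name__ == "__main__":
    views = [  # hypothetical sample events
        (-2144047953, "55173281"),
        (-2144047953, "55173281"),
        (-2144047953, None),
        (-2142884193, "55149415"),
    ]
    print("\n".join(build_mahout_input(views)))
```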

6. Run the Mahout item-based recommender to create the output file:

mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /apps/mahout_input/temp.tsv -o /apps/mahout_output --numRecommendations 3

This writes the output to the mahout_output folder: one line per user ID followed by its list of recommended product IDs and scores, for example:

-2144047953 [55173281:28.527079,55175948:28.522099,55156528:28.460249,55170364:27.141182,55149415:26.843468,55173061:26.8081,55165149:26.568054,55166807:26.551744]
-2142884193 [55177927:40.85718,55149415:37.643143,55179070:37.496075,55173281:37.040237,55175948:36.780933,55147564:36.429764,55169229:33.978317,55156528:33.863533]
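SIMILARITY_LOGLIKELIHOOD scores two products by how surprising it is that the same users viewed both, using Dunning's log-likelihood ratio on a 2x2 co-occurrence table. The Python sketch below illustrates the idea only. The entropy formulation is the standard LLR definition; the 1 - 1/(1 + LLR) mapping is my understanding of how Mahout turns the ratio into a similarity, so treat it as an assumption. As far as I know, this similarity only looks at whether a user viewed a product, not at the count.

```python
import math


def x_log_x(x: float) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: float) -> float:
    """Unnormalised entropy term used by Dunning's log-likelihood ratio."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 table: k11 = users who viewed both items,
    k12 = only item A, k21 = only item B, k22 = neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise


def llr_similarity(users_a: set, users_b: set, num_users: int) -> float:
    """Map the LLR of two items' user sets into [0, 1) (assumed mapping)."""
    both = len(users_a & users_b)
    only_a = len(users_a - users_b)
    only_b = len(users_b - users_a)
    neither = num_users - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    return 1.0 - 1.0 / (1.0 + llr)


if __name__ == "__main__":
    # Hypothetical data: which user IDs viewed each product, out of 4 users.
    print(llr_similarity({1, 2}, {1, 2}, 4))  # always viewed together
    print(llr_similarity({1, 2}, {1, 3}, 4))  # independent
```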

7. Load this output into HBase for faster access, and store the m_product table in HBase as well.

In the hive shell:

CREATE TABLE mahout_recommendations (id STRING, c1 STRING)
STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
TBLPROPERTIES (
  'hbase.table.name' = 'mahout_recommendations',
  'hbase.columns.mapping' = 'd:c1',
  'hcat.hbase.output.bulkMode' = 'true'
);

vi pig.txt

inpt = LOAD 'hdfs://sandbox.hortonworks.com/apps/mahout_output/itemitemout/part-r-00000' USING PigStorage('\t') AS (id:chararray, c1:chararray);

STORE inpt INTO 'mahout_recommendations' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ('d:c1');

Then run the script:

pig -x local pig.txt

Load the product data into HBase:

CREATE TABLE product_hbase(product_id string, category string, url string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:category,cf:url");

SET hive.hbase.bulk=true;

INSERT OVERWRITE TABLE product_hbase
SELECT product_id, category, url FROM m_product WHERE product_id IS NOT NULL;

8. Analyze the results, either by logging in to the hbase shell or by starting the HBase REST service and querying it with a REST client:

./bin/hbase-daemon.sh start rest -p 8500

curl http://localhost:8500/mahout_recommendations/-2135953226/d

curl http://localhost:8500/product_hbase/55173281/cf
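If you ask the REST service for JSON (for example by adding -H "Accept: application/json" to the curl calls), the row key, column names and cell values come back base64-encoded. The Python sketch below decodes such a response and splits the stored recommendation string into (product ID, score) pairs. The field names Row, Cell, column and $ reflect my understanding of the HBase REST JSON format, so verify them against your version; decode_hbase_row, parse_recommendations and fetch_row are hypothetical helpers.

```python
import base64
import json
from typing import Dict, List, Tuple
from urllib.parse import quote
from urllib.request import Request, urlopen


def decode_hbase_row(body: str) -> Dict[str, str]:
    """Decode an HBase REST JSON row into {column: value}."""
    data = json.loads(body)
    cells = {}
    for row in data.get("Row", []):
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            cells[column] = base64.b64decode(cell["$"]).decode("utf-8")
    return cells


def parse_recommendations(value: str) -> List[Tuple[str, float]]:
    """Parse Mahout's '[item:score,item:score,...]' into (item, score) pairs."""
    inner = value.strip().lstrip("[").rstrip("]")
    pairs = []
    for token in inner.split(","):
        if token:
            item, score = token.split(":")
            pairs.append((item, float(score)))
    return pairs


def fetch_row(base_url: str, table: str, row_key: str, column: str) -> Dict[str, str]:
    """GET one row from the REST service as JSON (needs a running server)."""
    url = f"{base_url}/{quote(table)}/{quote(row_key)}/{quote(column)}"
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=10) as resp:
        return decode_hbase_row(resp.read().decode("utf-8"))
```

For example, parse_recommendations(fetch_row("http://localhost:8500", "mahout_recommendations", "-2135953226", "d")["d:c1"]) would list the recommendations for that user, assuming the server and row exist.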

Thursday, February 27, 2014

Akamai Cache Headers check using POSTMAN

Install Postman (Chrome app).

Download the collection directly from:

https://www.getpostman.com/collections/531dde467a635f8ef502

Or set it up manually: enter any URL and set the Pragma header to akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no

Most of these headers, and how to interpret the responses, are explained in the diagram itself.
Alternatively, you can use curl to get the same information:

curl -H "Pragma: akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no" -IXGET http://abc.com/na/terms-n-conditions
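The same check can be scripted, for example to run it over a list of URLs. This is a minimal Python sketch using only urllib from the standard library; build_akamai_debug_request, debug_headers and fetch_debug_headers are hypothetical helper names. Keeping only headers that start with X- is a simplification based on the Akamai debug headers I have seen (such as X-Cache and X-Cache-Key).

```python
import urllib.request
from typing import Dict, Iterable, Tuple

# The same Pragma values as in the curl command above.
AKAMAI_PRAGMA = [
    "akamai-x-cache-on",
    "akamai-x-cache-remote-on",
    "akamai-x-check-cacheable",
    "akamai-x-get-cache-key",
    "akamai-x-get-extracted-values",
    "akamai-x-get-nonces",
    "akamai-x-get-ssl-client-session-id",
    "akamai-x-get-true-cache-key",
    "akamai-x-serial-no",
]


def build_akamai_debug_request(url: str) -> urllib.request.Request:
    """GET request carrying all Akamai debug Pragma values in one header."""
    return urllib.request.Request(
        url, headers={"Pragma": ", ".join(AKAMAI_PRAGMA)}, method="GET"
    )


def debug_headers(headers: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only X-* response headers (X-Cache, X-Cache-Key, ...)."""
    return {name: value for name, value in headers if name.lower().startswith("x-")}


def fetch_debug_headers(url: str) -> Dict[str, str]:
    """Send the request and return the X-* headers (needs network access;
    urlopen raises HTTPError for 4xx/5xx responses)."""
    with urllib.request.urlopen(build_akamai_debug_request(url), timeout=10) as resp:
        return debug_headers(resp.getheaders())
```

Calling fetch_debug_headers("http://abc.com/na/terms-n-conditions") should return headers such as X-Cache if the site is served through Akamai.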