Question:

Let's say we have JSON data and we want to generate some results for business users. Does the following seem like a good approach?

Load the data from HDFS into Hive, then analyse it in Pig using HCatalog. I have the following question in this regard:

Q. Is it OK to load the data through HCatalog and analyse it in Pig? Will this have a performance overhead compared to having Pig read the data directly from HDFS?

Answer:

I would personally prefer to do the ETL in Pig. In your case, the JSON data can be loaded with JsonLoader and stored with JsonStorage. So I would load the data with JsonLoader, store it as CSV (for example with PigStorage(',')), and then use Hive to analyze it.
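As a rough local illustration of the JSON-to-CSV step (this is not Pig, and not how JsonLoader works internally), the Python sketch below flattens newline-delimited JSON records into comma-separated rows of the kind a Hive table declared with a comma field delimiter could read. The function name `json_lines_to_csv`, the sample records and the field list are hypothetical; it assumes one JSON object per line and keeps only selected top-level fields.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, fields):
    """Flatten newline-delimited JSON records into CSV text (hypothetical helper).

    Only the listed top-level fields are kept, in the given order;
    a missing key becomes an empty cell. No header row is written.
    """
    out = io.StringIO()
    # One record per "\n"-terminated line in the output.
    writer = csv.writer(out, lineterminator="\n")
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow([record.get(f, "") for f in fields])
    return out.getvalue()


# Hypothetical sample input: one JSON object per line.
sample = [
    '{"user": "alice", "country": "US", "amount": 12.5}',
    '{"user": "bob", "amount": 3}',
]
print(json_lines_to_csv(sample, ["user", "country", "amount"]), end="")
```

Python's `csv` module quotes values that contain commas, and Hive's default delimited text format does not strip such quotes, so real data may need a different delimiter or a CSV-aware SerDe.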

Loading and storing JSON in Pig:

http://joshualande.com/read-write-json-apache-pig/

Alternatively, we can use Twitter's Elephant Bird JSON loader:

http://eric.lubow.org/2011/hadoop/pig-queries-parsing-json-on-amazons-elastic-map-reduce-using-s3-data/
