Question:

So I'm trying to import my log files into a Hadoop cluster using Hive through the HUE web interface. The format of the log files is:

"/log/apache/apache91" "10.93.123.135" "8081" "12.93.145.7" "12.93.123.7" "/index.html" "" "114" "111211" "21111" "200" "200" "[14/Mar/2013:23:00:15 -0400]" "-" "-" "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)" "-" "-" "-" "-"

So I tried the automatic table creation in HUE with a quotation mark as the delimiter, but this gives me a null column for every second column. I understand why this happens: it's because of the delimiter. Is there a way to import the data without the null columns? Or can I delete the null columns, or create a new table from the existing table and extract only the data I want?

I have a lot of data to import. If anyone has a better solution for me, I would be open to it.

Answer:

Hive's default delimited row format only supports a single character as the field separator, so you would indeed need data with a single field separator, i.e. a TSV/CSV format.

Maybe you can configure the logger's separator (switch to TAB or comma instead of space); then you won't need a preprocessing step at all.
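If you can't change the logger, the preprocessing step could look roughly like the sketch below. It assumes every line follows the sample in the question (fields wrapped in double quotes and separated by single spaces); the function names `log_line_to_tsv` and `convert` are made up for illustration and are not part of Hive or HUE.

```python
import csv


def log_line_to_tsv(line: str) -> str:
    """Turn one space-separated, double-quoted log line into a TAB-separated line."""
    # csv.reader copes with quoted fields that contain spaces and with empty "" fields.
    fields = next(csv.reader([line], delimiter=" ", quotechar='"'))
    # Replace stray TABs inside a field so they cannot be mistaken for separators.
    return "\t".join(field.replace("\t", " ") for field in fields)


def convert(src, dst) -> None:
    """Read log lines from the file-like object src and write TSV lines to dst."""
    for line in src:
        line = line.rstrip("\r\n")
        if line:  # skip blank lines
            dst.write(log_line_to_tsv(line) + "\n")
```

You would call `convert` with an open log file and an open output file, then load the output into a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`. That should give you one column per field, without the null columns in between.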
