Question:

I have a csv file full of float numbers encoded with a comma rather than a dot.

I wrote a Pig load script that declares the fields as float, but Pig can't convert values containing a comma to float (it expects a dot as the decimal separator).

How can I replace the commas with dots during the loading phase?

I understand a UDF could do the trick, but is there a simpler way?

Thanks.

Answer:

Okay, tested this just for fun with some simple data:

1,2;2,3;4,5
5,6;6,7;7,8

Pig script:

data = load 'commatest.csv' using PigStorage(';') as (f1:chararray, f2:chararray, f3:chararray);
replaced = foreach data generate REPLACE(f1, ',', '.') as f1dot, REPLACE(f2, ',', '.') as f2dot, REPLACE(f3, ',', '.') as f3dot;
fdata = foreach replaced generate (float)f1dot as f1, (float)f2dot as f2, (float)f3dot as f3;
dump fdata;

Output:

(1.2,2.3,4.5)
(5.6,6.7,7.8)

To test if it is really converted to float:

test = foreach fdata generate f1*f2*f3;
dump test;

Output:

(12.42)
(292.65598)
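
If the intermediate relation is not needed, the replace and the cast can probably be folded into a single foreach. This is only a sketch that I have not run against the test data above; it assumes the same commatest.csv file and field names, and that Pig accepts a cast applied directly to the result of REPLACE:

```pig
-- Load every field as chararray so the comma decimals survive the load
data = load 'commatest.csv' using PigStorage(';') as (f1:chararray, f2:chararray, f3:chararray);

-- Swap the decimal comma for a dot and cast to float in the same step
fdata = foreach data generate
    (float)REPLACE(f1, ',', '.') as f1,
    (float)REPLACE(f2, ',', '.') as f2,
    (float)REPLACE(f3, ',', '.') as f3;

dump fdata;
```

Note that REPLACE treats its second argument as a regular expression. A plain comma is safe there, but a dot or other metacharacter in that position would need escaping.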