Apache Pig LOAD: how to mix scalar and map datatypes?
I am using Apache Pig version 0.10.1.21 (rexported).
Content of the sample data file:
    atomicnumber,elementname,symbol,atomicmass,propertymap
    46,palladium,pd,106.42,[p#46,n#60,struc#cubic]
    49,indium,in,114.818,[p#49,n#66,struc#tetragonal]
    52,tellurium,te,127.6,[p#52,n#76,struc#hexagonal]
    86,radon,222.0,rn,[p#86,n#136,struc#cubic]
    38,strontium,sr,87.62,[p#38,n#50,struc#cubic]
    plutonium,94,pu,244.0,[p#94,n#150,struc#monoclinic]
Note: the columns are swapped intentionally (for radon and plutonium) to see how Pig handles a datatype mismatch.
Pig script:
    atomelem = LOAD 'data/atoms.txt' USING PigStorage(',')
        AS (atomicnumber:int, elementname:chararray, symbol:chararray, atomicmass:float, propertymap:map[]);
    DUMP atomelem;
Results:
    (,elementname,symbol,,)
    (46,palladium,pd,106.42,)
    (49,indium,in,114.818,)
    (52,tellurium,te,127.6,)
    (86,radon,222.0,,)
    (38,strontium,sr,87.62,)
    (,94,pu,244.0,)
Question 1: I was hoping the propertymap would be displayed. Can someone please show me how to modify either the Pig script or the data file so that the propertymap column is displayed as a map datatype?
Question 2: In the declaration of the map schema, I would like to strongly type the values. When I declared the schema as propertymap:map[int, int, chararray], Pig rejected the syntax (error on the comma, right bracket expected). Is it possible to declare a map having several keys with different types? If yes, what should the schema declaration look like?
Thanks in advance for your help.
The reason your script does not produce a map is that you have used the comma both as the field delimiter and as the delimiter between map key-value pairs. When Pig splits a line into fields, the fifth field is not [p#46,n#60,struc#cubic], as you expect, but rather [p#46. Pig cannot parse that as a map, so it is converted to null.
As for your second question, you cannot specify the datatypes of individual map values. In the first place, order means nothing in a map. And a map can have any number of elements. If you want to specify a single datatype for all values, you can do so, but beyond that either Pig will figure out the type or you will need to explicitly cast the value when you use it.
To illustrate both of these points, I have modified your input data to be tab-delimited (and accordingly updated the script to use PigStorage('\t')), and I swapped the position of two map elements in the second data line to show that Pig does not reproduce the order they were provided in.
    $ cat data.txt
    atomicnumber	elementname	symbol	atomicmass	propertymap
    46	palladium	pd	106.42	[p#46,n#60,struc#cubic]
    49	indium	in	114.818	[n#66,struc#tetragonal,p#49]
    52	tellurium	te	127.6	[p#52,n#76,struc#hexagonal]
    86	radon	222.0	rn	[p#86,n#136,struc#cubic]
    38	strontium	sr	87.62	[p#38,n#50,struc#cubic]
    plutonium	94	pu	244.0	[p#94,n#150,struc#monoclinic]

    $ cat test.pig
    atomelem = LOAD 'data.txt' USING PigStorage('\t')
        AS (atomicnumber:int, elementname:chararray, symbol:chararray, atomicmass:float, propertymap:map[]);
    DUMP atomelem;

    $ pig -x local test.pig
    (,elementname,symbol,,)
    (46,palladium,pd,106.42,[p#46,n#60,struc#cubic])
    (49,indium,in,114.818,[p#49,n#66,struc#tetragonal])
    (52,tellurium,te,127.6,[p#52,n#76,struc#hexagonal])
    (86,radon,222.0,,[p#86,n#136,struc#cubic])
    (38,strontium,sr,87.62,[p#38,n#50,struc#cubic])
    (,94,pu,244.0,[p#94,n#150,struc#monoclinic])
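
The explicit cast mentioned earlier can be sketched roughly as follows. This assumes the tab-delimited data.txt and the key names p, n, and struc from the sample data. The relation name props and the aliases protons, neutrons, and structure are made up for illustration and are not part of the original script.

```pig
-- Load with an untyped map: the values come in as bytearray.
atomelem = LOAD 'data.txt' USING PigStorage('\t')
    AS (atomicnumber:int, elementname:chararray, symbol:chararray,
        atomicmass:float, propertymap:map[]);

-- Look up each key with # and cast the value to the type you expect.
-- (props, protons, neutrons, structure are illustrative names.)
props = FOREACH atomelem GENERATE
    elementname,
    (int) propertymap#'p' AS protons,
    (int) propertymap#'n' AS neutrons,
    (chararray) propertymap#'struc' AS structure;

DUMP props;
```

If all values should share one type, the schema can name it inside the brackets instead, for example propertymap:map[chararray]. Mixed values such as these would then still need a cast wherever a number is required.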