hadoop - Load data into Hive with custom delimiter
I'm trying to create an internal (managed) table in Hive that can store incremental log data. The table creation goes like this:
create table logs (foo int, bar string, created_date timestamp)
row format delimited fields terminated by '<=>'
stored as textfile;
I need to load data into this table periodically:
load data inpath '/user/foo/data/logs' into table logs;
But the data is not getting inserted into the table properly. There seems to be a problem with the delimiter, and I can't figure out why. An example log line:
120<=>abcdefg<=>2016-01-01 12:14:11
On select * from logs; I get:
120    =>abcdefg    NULL
The first attribute is fine. The second still contains part of the delimiter: Hive only honoured the first character '<' of the delimiter, so the field comes through as '=>abcdefg' (being a string, it gets inserted anyway). The third is NULL because '=>2016-01-01 12:14:11' does not parse as a timestamp.
Can someone please advise on how to provide a custom multi-character delimiter and load the data successfully?
By default, Hive only allows the user to use a single-character field delimiter. Although there is RegexSerDe for specifying a multiple-character delimiter, it can be daunting to use, especially for amateurs.
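For comparison, a RegexSerDe version of this table might look like the sketch below. The table name logs_regex is made up for illustration, and note that the built-in RegexSerDe treats every column as a string, so the typed values have to be cast back at query time:
-- logs_regex is a hypothetical table name; RegexSerDe only supports string columns
create table logs_regex (foo string, bar string, created_date string)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties ("input.regex" = "(.*?)<=>(.*?)<=>(.*)")
stored as textfile;
-- typed values are recovered with casts at query time
select cast(foo as int), bar, cast(created_date as timestamp) from logs_regex;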
The patch (HIVE-5871) adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way similar to typical table creation:
hive> create table logs (foo int, bar string, created_date timestamp)
    > row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    > with serdeproperties ("field.delim"="<=>")
    > stored as textfile;
hive> dfs -put /home/user1/multi_char.txt /user/hive/warehouse/logs/. ;
hive> select * from logs;
OK
120    abcdefg    2016-01-01 12:14:11
Time taken: 1.657 seconds, Fetched: 1 row(s)
hive>
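After that, the periodic load from the question should work unchanged against the new table. On Hive installations where MultiDelimitSerDe is not on the classpath by default, the contrib jar has to be registered first; the jar path below is an assumption and varies by installation:
-- jar location is installation-specific (assumption)
add jar /usr/lib/hive/lib/hive-contrib.jar;
load data inpath '/user/foo/data/logs' into table logs;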