hadoop - Load data into Hive with custom delimiter -


I'm trying to create an internal (managed) table in Hive that can store incremental log data. The table is created like this:

create table logs (foo int, bar string, created_date timestamp) row format delimited fields terminated by '<=>' stored as textfile; 

I need to load data into this table periodically.

load data inpath '/user/foo/data/logs' into table logs; 

But the data is not getting inserted into the table properly. There might be a problem with the delimiter; I can't figure out why.

An example log line:

120<=>abcdefg<=>2016-01-01 12:14:11 

On select * from logs; I get:

120  =>abcdefg  null 

The first attribute is fine; the second contains part of the delimiter (since it's a string, it gets inserted anyway); and the third is null, since Hive expects a timestamp there.

Can someone please advise on how to provide custom delimiters and load the data successfully?

By default, Hive only allows a single-character field delimiter. Although there is RegexSerDe to specify a multiple-character delimiter, it can be daunting to use, especially for amateurs.
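For reference, the RegexSerDe alternative mentioned above might look like the sketch below. The table name logs_regex and the regex are illustrative; note that RegexSerDe deserializes every column as a string, so the int and timestamp fields would have to be declared as string and cast at query time:

```sql
-- Illustrative RegexSerDe variant; all columns must be declared string.
create table logs_regex (foo string, bar string, created_date string)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties (
  -- three capture groups, one per column, split on the literal <=>
  "input.regex" = "(.*?)<=>(.*?)<=>(.*)"
)
stored as textfile;
```

A query would then cast the columns back, e.g. cast(foo as int) and cast(created_date as timestamp), which is part of what makes this approach clumsier than a proper multi-character delimiter.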

The patch (HIVE-5871) adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way similar to typical table creation.

hive> create table logs (foo int, bar string, created_date timestamp)
    > row format serde 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    > with serdeproperties ("field.delim"="<=>")
    > stored as textfile;

hive> dfs -put /home/user1/multi_char.txt /user/hive/warehouse/logs/. ;

hive> select * from logs;
OK
120	abcdefg	2016-01-01 12:14:11
Time taken: 1.657 seconds, Fetched: 1 row(s)
hive>
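If Hive reports that it cannot find the SerDe class, the hive-contrib jar that ships with the distribution may need to be added to the session first (the jar path below is a placeholder for your install):

```sql
-- MultiDelimitSerDe is packaged in hive-contrib; adjust the path to your setup.
add jar /path/to/hive-contrib.jar;
```

After adding the jar, the create table and select statements above should work as shown.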
