java - Spark Job Takes Too Much Time
I have 1 GB of data in Kafka. A Spark Streaming job reads the data from Kafka, runs it through Java cleansing code, and then writes it to Hadoop. The parsing/cleansing step is where the time is being spent. The Spark standalone cluster configuration is: 2 nodes, each with 16 GB RAM and 4 cores.
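For reference, a minimal sketch of how such a stream is typically created with the Spark 1.x receiver-based Kafka API (the same API generation as the code below, which uses the Iterable-returning FlatMapFunction). The app name, ZooKeeper quorum, consumer group, topic name, and batch interval here are placeholders, not values from the actual job:

import java.util.Collections;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaCleansingJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("KafkaCleansingJob");
        // Batch interval is a placeholder; tune it to the actual ingest rate.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));
        JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(
                jssc,
                "zk-host:2181",       // ZooKeeper quorum (placeholder)
                "cleansing-group",    // consumer group id (placeholder)
                Collections.singletonMap("topic-name", 1)); // topic -> receiver threads
        // The processRDD(messages) method shown below would be called here.
        jssc.start();
        jssc.awaitTermination();
    }
}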
The cleansing code is given below:
private void processRDD(JavaPairDStream<String, String> messages) throws JSONException {
    // messages.print();

    // Parse every Kafka message value with the custom cleansing parser.
    JavaDStream<String> fields = messages.map(new Function<Tuple2<String, String>, String>() {
        private static final long serialVersionUID = 2179983619724411716L;

        public String call(Tuple2<String, String> message) throws JSONException {
            CustomParser customParser = new CustomParser();
            String parseData = customParser.parse(message._2.getBytes());
            // System.out.println("=====>>>>>>1");
            return parseData;
        }
    });

    // Split each parsed record into individual lines.
    JavaDStream<String> msgLines = fields.flatMap(new FlatMapFunction<String, String>() {
        public Iterable<String> call(String x) {
            // return Lists.newArrayList(x.split("\n"));
            return Arrays.asList(x.split("\n"));
        }
    });

    // Write the cleansed lines out as text files.
    msgLines.dstream().saveAsTextFiles("/tmp/", ".csv");
}
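One detail worth noting about the map step above is that it constructs a new CustomParser for every single record, which can dominate runtime on a high-volume stream. Below is a sketch of the same step using mapPartitions so that one parser instance is reused for a whole partition. This is a common Spark optimization, not something from the original job, and it assumes CustomParser can safely be reused across records; it also needs java.util.Iterator, java.util.List, and java.util.ArrayList in addition to the imports already used:

// Sketch only: same parsing step, but one CustomParser per partition.
// Uses the Spark 1.x API (FlatMapFunction returns Iterable), as above.
JavaDStream<String> fields = messages.mapPartitions(
        new FlatMapFunction<Iterator<Tuple2<String, String>>, String>() {
            public Iterable<String> call(Iterator<Tuple2<String, String>> records)
                    throws JSONException {
                CustomParser customParser = new CustomParser(); // reused for all records
                List<String> parsed = new ArrayList<String>();
                while (records.hasNext()) {
                    parsed.add(customParser.parse(records.next()._2.getBytes()));
                }
                return parsed;
            }
        });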