java - Spark job takes too much time


I have 1 GB of data in Kafka. A Spark Streaming job takes the data from Kafka, runs it through Java cleansing code, and then puts it into Hadoop; it is the parsing/cleansing step that takes the time. The Spark standalone cluster configuration is: 2 nodes, 16 GB RAM, 4 cores.
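For reference, a minimal sketch of how such a Kafka input stream might be created. This part is not shown in the question, so everything here is an assumption: it presumes Spark 1.x with the spark-streaming-kafka artifact on the classpath, and the class name, broker address, topic name, and batch interval are placeholders.

import java.util.HashMap;
import java.util.HashSet;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaCleansingJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("KafkaCleansingJob");
        // Batch interval is a placeholder; tune it to the actual workload.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        HashMap<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder broker
        HashSet<String> topics = new HashSet<String>();
        topics.add("my-topic"); // placeholder topic

        // The direct stream creates one Spark partition per Kafka partition,
        // so the Kafka partition count caps the read parallelism.
        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // processRdd(messages); // the processing method shown below

        jssc.start();
        jssc.awaitTermination();
    }
}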

Sample code is given below:

private void processRdd(JavaPairDStream<String, String> messages) throws JSONException {
    //messages.print();

    // Parse/cleanse each Kafka message value.
    JavaDStream<String> fields = messages.map(new Function<Tuple2<String, String>, String>() {
        private static final long serialVersionUID = 2179983619724411716L;

        /**
         * @throws JSONException
         */
        public String call(Tuple2<String, String> message) throws JSONException {
            // A new parser is constructed for every single message.
            CustomParser customParser = new CustomParser();
            String parseData = customParser.parse(message._2.getBytes());
            //System.out.println("=====>>>>>>1");
            return parseData;
        }
    });

    // Split the parsed output into individual lines.
    JavaDStream<String> msgLines = fields.flatMap(new FlatMapFunction<String, String>() {
        public Iterable<String> call(String x) {
            //return Lists.newArrayList(x.split("\n"));
            return Arrays.asList(x.split("\n"));
        }
    });

    msgLines.dstream().saveAsTextFiles("/tmp/", ".csv");
}
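One likely reason the parsing step is slow is visible above: a new CustomParser is constructed for every single message. Below is a hedged sketch of the same map step rewritten with mapPartitions, so one parser instance is reused for a whole partition; this assumes CustomParser is safe to reuse across records, which the question does not confirm.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.FlatMapFunction;
import scala.Tuple2;

JavaDStream<String> fields = messages.mapPartitions(
        new FlatMapFunction<Iterator<Tuple2<String, String>>, String>() {
            public Iterable<String> call(Iterator<Tuple2<String, String>> records) throws Exception {
                // One parser per partition instead of one per record.
                CustomParser customParser = new CustomParser();
                List<String> parsed = new ArrayList<String>();
                while (records.hasNext()) {
                    parsed.add(customParser.parse(records.next()._2.getBytes()));
                }
                return parsed;
            }
        });

Separately, with a direct Kafka stream the parallelism is capped by the number of Kafka topic partitions; with 2 nodes and 4 cores, having at least 8 partitions (or repartitioning before the parsing step) lets all cores work on the cleansing at once.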

