Spark off-heap memory leak on YARN with Kafka direct stream
I am running Spark Streaming 1.4.0 on YARN (Apache distribution 2.6.0) with Java 1.8.0_45 and the Kafka direct stream. I am using the Spark build with Scala 2.11 support.
The issue I am seeing is that both the driver and the executor containers gradually increase their physical memory usage until YARN kills the container. I have configured up to 192 MB of heap and 384 MB of off-heap space for the driver, but it eventually runs out of it.
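As a rough check on these numbers, the sketch below works out the container size implied by a heap setting plus a memory overhead. `ContainerBudget` and its method names are hypothetical helpers, not Spark or YARN APIs. The rounding step assumes the scheduler rounds each request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, taken here as 1024 MB, which may not match a given cluster.

```scala
// Hypothetical helper, not part of Spark or YARN: estimates container sizes
// from the heap and memory-overhead settings used in this post.
object ContainerBudget {
  // Heap plus overhead (both in MB) is the size requested for the container.
  def requestedMb(heapMb: Int, overheadMb: Int): Int = heapMb + overheadMb

  // Assumption: requests are rounded up to a multiple of the minimum
  // allocation (yarn.scheduler.minimum-allocation-mb, assumed 1024 here).
  def grantedMb(requested: Int, minAllocationMb: Int = 1024): Int =
    ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb

  def main(args: Array[String]): Unit = {
    val driver = requestedMb(192, 384)   // driver heap + driver overhead
    val executor = requestedMb(128, 256) // executor heap + executor overhead
    println(s"driver requested:   $driver MB, granted: ${grantedMb(driver)} MB")
    println(s"executor requested: $executor MB, granted: ${grantedMb(executor)} MB")
  }
}
```

With the settings in this post the requested sizes come to 576 MB for the driver and 384 MB for the executor. Under the rounding assumption both would be granted 1024 MB, so the limit YARN actually enforces may be higher than the configured sum.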
The heap memory appears to be fine, with regular GC cycles. No OutOfMemoryError is ever encountered in such runs.
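One way to separate heap from non-heap growth is to read the JVM's own counters through the standard `java.lang.management` beans. The sketch below is only an illustration: `MemorySnapshot` is a hypothetical name, and the counters cover what the JVM itself tracks (heap, non-heap pools, NIO buffer pools), so memory allocated directly by native libraries would not show up here.

```scala
import java.lang.management.{BufferPoolMXBean, ManagementFactory}

// Hypothetical diagnostic helper: reports JVM-tracked memory so it can be
// compared with the physical memory YARN reports for the container.
object MemorySnapshot {
  private val Mb = 1024L * 1024L

  // Bytes currently used on the JVM heap.
  def heapUsedBytes: Long =
    ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed()

  // Bytes used by JVM-managed non-heap pools (such as metaspace and code cache).
  def nonHeapUsedBytes: Long =
    ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed()

  // (pool name, bytes used) for each NIO buffer pool exposed by the JVM.
  def bufferPools: Seq[(String, Long)] = {
    val pools = ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean])
    (0 until pools.size()).map { i =>
      val pool = pools.get(i)
      (pool.getName(), pool.getMemoryUsed())
    }
  }

  // One-line summary in MB, suitable for periodic logging.
  def report(): String = {
    val buffers = bufferPools
      .map { case (name, used) => s"$name=${used / Mb}MB" }
      .mkString(", ")
    s"heap=${heapUsedBytes / Mb}MB nonHeap=${nonHeapUsedBytes / Mb}MB buffers=[$buffers]"
  }

  def main(args: Array[String]): Unit = println(report())
}
```

Logging `MemorySnapshot.report()` at intervals in the driver would show whether the growth is visible to the JVM at all. If these figures stay flat while the container's physical memory keeps rising, the growth is probably in native allocations outside these pools.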
In fact, it still happens even when I am not generating any traffic on the Kafka queues. Here is the code I am using:
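    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object SimpleSparkStreaming extends App {
      val conf = new SparkConf()
      // Batch interval in seconds, read from spark.batch.window.size (default 1).
      val ssc = new StreamingContext(conf, Seconds(conf.getLong("spark.batch.window.size", 1L)))
      ssc.checkpoint("checkpoint")

      val topics = Set(conf.get("spark.kafka.topic.name"))
      val kafkaParams = Map[String, String]("metadata.broker.list" -> conf.get("spark.kafka.broker.list"))
      val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

      // Print the value of every message in each batch.
      kafkaStream.foreachRDD(rdd => {
        rdd.foreach(x => {
          println(x._2)
        })
      })
      kafkaStream.print()

      ssc.start()
      ssc.awaitTermination()
    }

I am running this on CentOS 7. The spark-submit command I use is the following: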
    ./bin/spark-submit --class com.rasa.cloud.prototype.spark.SimpleSparkStreaming \
      --conf spark.yarn.executor.memoryOverhead=256 \
      --conf spark.yarn.driver.memoryOverhead=384 \
      --conf spark.kafka.topic.name=test \
      --conf spark.kafka.broker.list=172.31.45.218:9092 \
      --conf spark.batch.window.size=1 \
      --conf spark.app.name="simple spark kafka application" \
      --master yarn-cluster \
      --num-executors 1 \
      --driver-memory 192m \
      --executor-memory 128m \
      --executor-cores 1 \
      /home/centos/spark-poc/target/lib/spark-streaming-prototype-0.0.1-SNAPSHOT.jar

Any help is appreciated.
Regards,
Apoorva
Try increasing the number of executor cores. In your example, the only core is dedicated to consuming the streaming data, leaving no cores to process the incoming data.