Why does a Spark Streaming application receiving data from Kafka use more memory than executorMemory * executorCount + driverMemory?
I submitted a Spark Streaming application to a YARN cluster in client mode as follows:
./spark-submit \
  --jars $jars \
  --class $appcls \
  --master yarn-client \
  --driver-memory 64m \
  --executor-memory 64m \
  --conf spark.shuffle.service.enabled=false \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 6 \
  /data/apps/app.jar
executorMemory * executorCount + driverMemory = 64 MB * 6 + 64 MB = 448 MB,
but the application actually uses 3968 MB. Why does this happen, and how can I reduce the memory usage?
There are the Spark configuration parameters spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead, which default to 384 MB in your case (see the docs).
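If you want that overhead to be explicit rather than implicit, you can pass it on the command line. A minimal sketch reusing the submit command from the question, with the values shown being just the defaults (these are the legacy property names; newer Spark releases call them spark.executor.memoryOverhead and spark.driver.memoryOverhead):

./spark-submit \
  --class $appcls \
  --master yarn-client \
  --driver-memory 64m \
  --executor-memory 64m \
  --conf spark.yarn.executor.memoryOverhead=384 \
  --conf spark.yarn.driver.memoryOverhead=384 \
  --num-executors 6 \
  /data/apps/app.jar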
Then there is the fact that YARN has a memory allocation granularity (yarn.scheduler.increment-allocation-mb), which defaults to 512 MB. Each request is rounded up to a multiple of that.
There is also a minimum allocation size (yarn.scheduler.minimum-allocation-mb), which defaults to 1 GB. It has either been set lower in your case, or you are not looking at the memory allocation correctly.
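As a rough illustration of how the rounding can add up, here is one plausible breakdown that happens to land on the reported 3968 MB. It assumes the minimum allocation has been lowered, that each executor request is rounded up to 512 MB, and that the YARN ApplicationMaster container (which yarn-client mode still allocates, with spark.yarn.am.memory defaulting to 512 MB plus its 384 MB overhead) is granted as requested; the exact numbers on your cluster depend on the scheduler settings.

# Back-of-the-envelope arithmetic only, not a definitive accounting.
executor_request=$((64 + 384))                            # heap + default overhead = 448 MB
per_executor=$(( (executor_request + 511) / 512 * 512 ))  # rounded up to the 512 MB increment -> 512 MB
am_container=$((512 + 384))                               # AM memory + its overhead = 896 MB
total=$((6 * per_executor + am_container))
echo "6 executors * ${per_executor} MB + AM ${am_container} MB = ${total} MB"   # 3968 MB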
All of that overhead should be negligible compared to your actual memory use. You should set --executor-memory to 20 GB or more. Why are you trying to configure such a ridiculously low amount of memory?
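For example, something along these lines would be a more realistic starting point (the driver size here is just a placeholder; tune both figures to your workload):

./spark-submit \
  --jars $jars \
  --class $appcls \
  --master yarn-client \
  --driver-memory 4g \
  --executor-memory 20g \
  --num-executors 6 \
  /data/apps/app.jar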