Spark Memory Internals

The biggest change in Spark 1.6 was the Unified Memory Management model; its main advantage is dynamic allocation of memory between execution and storage.

Two types of memory:

  • Execution: for computing shuffles, joins, sorts and aggregations
  • Storage: for caching and for moving internal data across the cluster

Both live in a single unified region: execution can borrow storage memory and evict cached blocks until used storage memory falls below a protected threshold, while storage can borrow unused execution memory but cannot evict running tasks.
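The borrowing rule above can be sketched as a toy model (this is an illustration of the policy, not Spark's actual `UnifiedMemoryManager` implementation; the pool structure and function name are invented for the example):

```python
# Toy model of the unified region: execution requests may evict cached
# storage blocks, but only down to the protected storage threshold.
def acquire_execution(request, pool):
    """pool: dict with total, execution_used, storage_used, storage_threshold (all in MiB)."""
    free = pool["total"] - pool["execution_used"] - pool["storage_used"]
    if free < request:
        # Evict cached blocks, but never below the protected threshold.
        evictable = max(0, pool["storage_used"] - pool["storage_threshold"])
        reclaimed = min(request - free, evictable)
        pool["storage_used"] -= reclaimed
        free += reclaimed
    granted = min(request, free)   # grant what fits; real Spark may block/spill instead
    pool["execution_used"] += granted
    return granted
```

For example, with 80 MiB cached out of a 100 MiB pool and a 50 MiB protected threshold, a 40 MiB execution request evicts 20 MiB of cached blocks and is granted in full.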

Unified Memory#

spark.memory.fraction is the fraction of the JVM heap used for the unified execution + storage region; the default is 0.6 of (JVM heap space − 300 MiB of reserved memory).

spark.memory.storageFraction is the fraction of the unified (execution + storage) region dedicated to storage, where cached blocks are immune to eviction by execution. The default is 0.5.
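As a back-of-envelope example of how these two fractions carve up the heap (the 4 GiB heap size is illustrative; actual usable memory also depends on JVM overheads):

```python
# Split of a hypothetical 4 GiB executor heap under the default fractions.
heap_mib = 4096
reserved_mib = 300                       # fixed reserved memory
usable = heap_mib - reserved_mib         # memory Spark can actually manage
spark_memory = usable * 0.6              # spark.memory.fraction (default 0.6)
storage_protected = spark_memory * 0.5   # spark.memory.storageFraction (default 0.5)
```

Here the unified region is about 2278 MiB, of which about 1139 MiB is the eviction-protected storage portion; the remaining ~1518 MiB of usable heap is "user memory" for user data structures and Spark metadata.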

Reserved Memory#

Reserved memory is fixed at 300 MiB (it can be changed only via spark.testing.reservedMemory, which is intended for testing). It is not used by Spark for execution or storage; it only sets aside room for Spark's internal objects and metadata, guaranteeing a baseline amount of memory for Spark's core functionality even when executor memory is low. If you don't give a Spark executor at least 1.5 × reserved memory = 450 MiB of heap, the job will fail to initialize with a "please use larger heap size" error.
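The minimum-heap rule above can be expressed as a small check (a sketch of the policy described here; the function name is invented and this is not Spark's actual startup code):

```python
# Minimum-heap check: an executor needs at least 1.5 x reserved memory.
RESERVED_MIB = 300

def check_heap(heap_mib):
    minimum = 1.5 * RESERVED_MIB   # 450 MiB
    if heap_mib < minimum:
        raise ValueError(f"please use larger heap size (need >= {minimum:.0f} MiB)")
    return heap_mib - RESERVED_MIB  # memory left after the reserved carve-out
```

So a 1 GiB executor heap passes with 724 MiB left over, while a 400 MiB heap is rejected.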

References#

https://0x0fff.com/spark-memory-management/
https://spark.apache.org/docs/3.5.1/tuning.html#memory-management-overview