Spark OOM on K8s: RAM Spill & Affinity Traps

Alps Wang

Jun 3, 2026

Kubernetes Infrastructure's Subtle Dangers

This article dissects a subtle but critical infrastructure misconfiguration that caused Spark OOM failures on Kubernetes. The core insight is that infrastructure choices that look minor in isolation, particularly storage semantics and pod placement, can combine into catastrophic failures under production-scale load. The authors show how a 'lift-and-shift' approach to cloud migration becomes a pitfall when the underlying infrastructure contracts are not validated. The failure came from two settings interacting: RAM-backed scratch directories (spark.kubernetes.local.dirs.tmpfs=true) meant shuffle spills consumed memory instead of disk, and a hard podAffinity rule forced the executors onto a single node, so all of that memory pressure landed on one machine. The lesson for anyone running Spark on Kubernetes is that troubleshooting OOMs means looking beyond Spark's own tuning parameters to the host infrastructure.
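The way the two settings compound can be sketched with simple arithmetic. The model below is a rough illustration, not something taken from the article: the class name `NodeLoad`, its fields, and every number in the example are hypothetical, and it ignores OS overhead, per-pod memory limits, and eviction behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLoad:
    """Hypothetical, simplified model of RAM demand on one Kubernetes node."""
    node_ram_gib: float            # total RAM on the node
    executors_on_node: int         # executor pods scheduled onto this node
    executor_mem_gib: float        # heap plus overhead per executor
    spill_gib_per_executor: float  # shuffle spill written to scratch space
    scratch_in_ram: bool           # True when scratch dirs are tmpfs-backed

    def ram_demand_gib(self) -> float:
        base = self.executors_on_node * self.executor_mem_gib
        # Spill only counts against RAM when scratch space lives in memory.
        spill = (self.executors_on_node * self.spill_gib_per_executor
                 if self.scratch_in_ram else 0.0)
        return base + spill

    def over_capacity(self) -> bool:
        return self.ram_demand_gib() > self.node_ram_gib


# Illustrative numbers only: 8 executors, 6 GiB each, 4 GiB of spill each.
packed_tmpfs = NodeLoad(64, 8, 6.0, 4.0, scratch_in_ram=True)   # both settings
packed_disk = NodeLoad(64, 8, 6.0, 4.0, scratch_in_ram=False)   # affinity only
spread_tmpfs = NodeLoad(64, 2, 6.0, 4.0, scratch_in_ram=True)   # tmpfs only

for label, load in [("packed + tmpfs", packed_tmpfs),
                    ("packed + disk", packed_disk),
                    ("spread + tmpfs", spread_tmpfs)]:
    print(label, load.ram_demand_gib(), load.over_capacity())
```

With these made-up figures, only the combination of a packed node and RAM-backed scratch exceeds node capacity; either setting alone stays under it. That is the sense in which the failure is a compound one.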

The article's main limitation is that it assumes familiarity with Kubernetes scheduling and storage concepts and with Spark internals. It explains the problem and the solution clearly, but a reader new to these environments would benefit from more foundational context. For its target audience of experienced developers and SREs, the level of detail is excellent. The lessons are actionable and apply directly to anyone planning or managing Spark on Kubernetes: validate the infrastructure before migrating rather than after. The 'why pre-migration testing didn't catch this' section is particularly strong, showing why complex distributed systems need load testing at realistic scale.

Key Points

  • RAM-backed scratch directories (spark.kubernetes.local.dirs.tmpfs=true) can cause Spark shuffle spills to exhaust node RAM instead of disk, leading to OOM failures.
  • A hard podAffinity rule forcing executors onto the same node concentrates memory pressure, exacerbating OOM risks during shuffle-heavy stages.
  • Pre-migration testing must simulate production-scale workloads to uncover compound infrastructure failures.
  • Explicitly validate storage semantics (disk vs. RAM) and scheduling behavior (pod placement) during cloud migrations.
  • The fixes are to back scratch directories with disk instead of RAM and to raise the scratch volume size limit.
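The validation steps in the key points above could be automated with a small pre-migration check. What follows is a minimal sketch under stated assumptions: the function name `audit_executor_setup` and the finding labels are hypothetical, the pod template is assumed to be already parsed into a dict, and treating a missing `sizeLimit` as a finding is this sketch's choice rather than a rule from the article. The Spark property and the Kubernetes field names (`emptyDir.medium`, `sizeLimit`, `requiredDuringSchedulingIgnoredDuringExecution`) are the standard ones.

```python
from typing import Any, Dict, List


def audit_executor_setup(spark_conf: Dict[str, str],
                         pod_template: Dict[str, Any]) -> List[str]:
    """Return findings about RAM-backed scratch space and hard pod affinity."""
    findings: List[str] = []

    # Spark-level switch that mounts local scratch directories as tmpfs.
    tmpfs = str(spark_conf.get("spark.kubernetes.local.dirs.tmpfs", "false"))
    if tmpfs.strip().lower() == "true":
        findings.append("tmpfs-scratch: spark.kubernetes.local.dirs.tmpfs=true")

    spec = pod_template.get("spec") or {}

    # Pod-level emptyDir volumes can also be memory-backed or unbounded.
    for volume in spec.get("volumes") or []:
        if "emptyDir" not in volume:
            continue
        empty_dir = volume["emptyDir"] or {}
        name = volume.get("name", "<unnamed>")
        if empty_dir.get("medium") == "Memory":
            findings.append(f"ram-volume: emptyDir '{name}' uses medium=Memory")
        if "sizeLimit" not in empty_dir:
            findings.append(f"no-size-limit: emptyDir '{name}' has no sizeLimit")

    # A 'required' podAffinity term is a hard rule the scheduler must honor.
    affinity = spec.get("affinity") or {}
    pod_affinity = affinity.get("podAffinity") or {}
    if pod_affinity.get("requiredDuringSchedulingIgnoredDuringExecution"):
        findings.append("hard-affinity: required podAffinity co-locates pods")

    return findings


# Hypothetical inputs resembling the failure described in the article.
risky_conf = {"spark.kubernetes.local.dirs.tmpfs": "true"}
risky_template = {
    "spec": {
        "volumes": [{"name": "spark-local-dir-1",
                     "emptyDir": {"medium": "Memory"}}],
        "affinity": {"podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {"matchLabels": {"spark-role": "executor"}},
                "topologyKey": "kubernetes.io/hostname",
            }]
        }},
    }
}

for finding in audit_executor_setup(risky_conf, risky_template):
    print(finding)
```

A static check like this only catches the configuration pattern. It does not replace the production-scale load test the article recommends, since whether the pattern actually causes an OOM depends on spill volume.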

📖 Source: Article: Two Misconfigurations That Caused Spark OOM Failures on Kubernetes
