Netflix's Kueue Leap: Batch Compute Simplified

Alps Wang

Jun 23, 2026

Kubernetes-Native Batch Evolution

Netflix's migration of its Compute Managed Batch (CMB) system to Kueue is a significant step in its Kubernetes-native journey. The key insight is that adopting a cloud-native job queueing system in place of complex, homegrown logic streamlines operations and unlocks features such as preemption and fairer resource sharing. The migration was transparent and required zero lift from end users, which minimized disruption, built confidence, and allowed a rapid production rollout. The article also highlights a critical technical point: because Kueue integrates natively with Kubernetes, Netflix could keep using its existing Titus scheduling profiles without fragmentation. Kueue's ability to manage multi-tenant quotas over heterogeneous hardware and its support for several job abstractions (Pod, Job, RayJob) show its versatility and help future-proof the platform.

However, the article could go deeper into the specific challenges of tuning Kueue's QPS, Burst, and groupKindConcurrency settings for high load. Load tests are mentioned, but a description of the precise bottlenecks and how they were overcome would be more valuable to readers facing similar scale issues. Likewise, the benefits of preemption are clear, but quantifying the gains in resource utilization and turnaround time would strengthen the case for adoption. The comparison with alternatives such as YuniKorn and Volcano is brief; explaining why Kueue was chosen, beyond its compatibility with Titus scheduling profiles, would offer more nuanced insight to organizations evaluating similar technologies. Despite these gaps, the article is a compelling case study in using open-source Kubernetes tooling to modernize complex batch compute infrastructure.
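For readers unfamiliar with where these three knobs live, here is a minimal sketch. It assumes the layout of Kueue's v1beta1 manager Configuration, where the API client rate limits sit under clientConnection and the per-kind reconciler worker counts sit under controller.groupKindConcurrency. The numbers are illustrative placeholders, not Netflix's settings (the source does not give them), and scale_throughput is a hypothetical helper written only for this example.

```python
import copy
import json

# Illustrative baseline only. Netflix's real values are not in the source.
kueue_config = {
    "apiVersion": "config.kueue.x-k8s.io/v1beta1",
    "kind": "Configuration",
    # Rate limits for the controller's calls to the Kubernetes API server.
    "clientConnection": {"qps": 50, "burst": 100},
    "controller": {
        # Number of concurrent reconcile workers per resource kind.
        "groupKindConcurrency": {
            "Job.batch": 5,
            "Pod": 5,
            "Workload.kueue.x-k8s.io": 5,
        }
    },
}


def scale_throughput(config, factor):
    """Hypothetical helper: return a copy with all three knobs multiplied by factor."""
    scaled = copy.deepcopy(config)
    client = scaled["clientConnection"]
    client["qps"] *= factor
    client["burst"] *= factor
    workers = scaled["controller"]["groupKindConcurrency"]
    for kind in workers:
        workers[kind] *= factor
    return scaled


if __name__ == "__main__":
    # Print a candidate configuration to try in a load test.
    print(json.dumps(scale_throughput(kueue_config, 4), indent=2))
```

Raising all three together is only one possible starting point for a load test; in practice each knob would likely be tuned separately against observed API server and controller behavior.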

Key Points

  • Netflix migrated its homegrown Compute Managed Batch (CMB) to Kueue, a cloud-native job queueing system, to simplify batch compute.
  • The migration was designed to be transparent to end-users, requiring zero lift.
  • Because Kueue is Kubernetes-native, Netflix could keep using its existing Titus scheduling profiles without fragmentation.
  • Key benefits include native support for features like preemption, all-or-nothing scheduling, and topology-aware scheduling.
  • The adoption involved migrating millions of batch jobs and required tuning Kueue for high QPS, Burst, and groupKindConcurrency.
  • Kueue enables fairer sharing and preemption, leading to better utilization of reserved capacity and improved turnaround times for critical workloads.
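The last point, that fair sharing plus preemption improves utilization of reserved capacity, can be illustrated with a toy model. The sketch below is not Kueue's algorithm or API; Cohort, Workload, and admit are hypothetical names, and the rules are deliberately simplified: a queue may borrow idle capacity beyond its guaranteed quota, and a queue that is still within its guarantee may reclaim capacity by preempting the most recently admitted borrowers.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    queue: str
    cpus: int


@dataclass
class Cohort:
    """Toy pool of queues sharing capacity. Not Kueue's real data model."""
    nominal: dict[str, int]  # queue name -> guaranteed CPUs
    admitted: list[Workload] = field(default_factory=list)

    def usage(self, queue):
        return sum(w.cpus for w in self.admitted if w.queue == queue)

    def free(self):
        return sum(self.nominal.values()) - sum(w.cpus for w in self.admitted)

    def admit(self, wl):
        """Admit wl. Return the list of preempted workloads, or None if wl must wait."""
        within_guarantee = self.usage(wl.queue) + wl.cpus <= self.nominal[wl.queue]
        preempted = []
        if wl.cpus > self.free():
            if not within_guarantee:
                return None  # borrowing is only allowed from idle capacity
            # Reclaim: evict newest-first from queues running above their guarantee.
            # Once every other queue is back at or under its guarantee, enough
            # capacity is free, because wl itself fits within its own guarantee.
            for victim in reversed(list(self.admitted)):
                if self.free() >= wl.cpus:
                    break
                over = self.usage(victim.queue) > self.nominal[victim.queue]
                if victim.queue != wl.queue and over:
                    self.admitted.remove(victim)
                    preempted.append(victim)
        self.admitted.append(wl)
        return preempted


if __name__ == "__main__":
    cohort = Cohort(nominal={"research": 8, "prod": 8})
    for i in range(4):
        cohort.admit(Workload(f"r{i}", "research", 4))  # last two borrow idle prod quota
    evicted = cohort.admit(Workload("p0", "prod", 8))   # prod reclaims its guarantee
    print([w.name for w in evicted], cohort.usage("research"), cohort.usage("prod"))
```

In this model idle reserved capacity is never wasted, since another queue can borrow it, while the owner of the reservation still gets it back promptly when a critical workload arrives.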

📖 Source: How Netflix Simplified Batch Compute with Kueue
