Salesforce's Karpenter Leap: 1,000+ EKS Clusters
Alps Wang
Jan 13, 2026
Scaling Kubernetes: A Salesforce Story
This article is a compelling case study of Salesforce's migration from Cluster Autoscaler to Karpenter, highlighting gains in operational efficiency, cost optimization, and developer experience. The insights are valuable for any organization running Kubernetes at scale, particularly those struggling with node group management, scaling latency, and resource utilization. The detailed walkthrough of the migration process, including the custom tooling Salesforce built (a Karpenter transition tool and a Karpenter patching check tool), is especially noteworthy, and the phased rollout strategy, risk-based sequencing, and emphasis on zero-disruption migration are textbook examples of good practice.

That said, the article focuses primarily on the successes and does not examine potential downsides or trade-offs in depth. The challenges encountered are mentioned, but a fuller discussion of the complexity of managing a migration at this scale, the potential for unexpected issues, and the ongoing maintenance burden would strengthen the analysis. A comparison with other node autoscaling options, such as the incumbent Kubernetes Cluster Autoscaler or commercial alternatives, would also give a more complete picture of the landscape.

Finally, the article is grounded in Salesforce's specific environment; the core concepts generalize, but the specific tooling and configurations may not transfer directly to other organizations.
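The article names the transition and patching check tools but does not show their source, so the following is a purely hypothetical sketch of the kind of check such tooling might perform: counting how many nodes in a cluster are Karpenter-managed versus legacy node-group nodes, using the official kubernetes Python client. The `karpenter.sh/nodepool` label is the one Karpenter applies to nodes it provisions (per the public Karpenter docs); the function name is illustrative.

```python
# Hypothetical sketch, not Salesforce's actual tooling: report how far a
# cluster has progressed from node-group nodes to Karpenter-managed nodes.
from kubernetes import client, config

# Label Karpenter applies to nodes it provisions (karpenter.sh/v1).
KARPENTER_LABEL = "karpenter.sh/nodepool"

def transition_progress() -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    nodes = client.CoreV1Api().list_node().items

    karpenter_nodes = [n for n in nodes if KARPENTER_LABEL in (n.metadata.labels or {})]
    legacy_nodes = [n for n in nodes if KARPENTER_LABEL not in (n.metadata.labels or {})]

    print(f"Karpenter-managed nodes: {len(karpenter_nodes)}")
    print(f"Legacy (node-group) nodes: {len(legacy_nodes)}")
    for n in legacy_nodes:
        print(f"  still pending migration: {n.metadata.name}")

if __name__ == "__main__":
    transition_progress()
```

At Salesforce's scale, a check like this would presumably run per cluster across the fleet; the article's zero-disruption emphasis suggests the real tooling also verified workload health, not just node counts.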
Key Points
- Salesforce migrated 1,000+ EKS clusters from Cluster Autoscaler to Karpenter, improving operational efficiency and reducing costs (a sample NodePool configuration follows this list).
- The migration relied on custom tooling (a Karpenter transition tool and a patching check tool) and a phased, risk-mitigated rollout.
- Key benefits include lower scaling latency, better node utilization, and more self-service infrastructure for developers.
- Challenges included preserving application availability during node updates, Kubernetes label constraints, and storage requirements.
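For readers unfamiliar with Karpenter, the sketch below shows what the shift away from node groups looks like in practice: a minimal NodePool (schema per the public karpenter.sh/v1 API) applied with the official kubernetes Python client. This is illustrative only; the pool name, limits, and requirements are placeholder values, not Salesforce's configuration.

```python
# Illustrative sketch: apply a minimal Karpenter NodePool. Unlike Cluster
# Autoscaler's fixed node groups, one NodePool lets Karpenter pick from many
# instance types that satisfy pending pods' requirements.
from kubernetes import client, config

nodepool = {
    "apiVersion": "karpenter.sh/v1",
    "kind": "NodePool",
    "metadata": {"name": "general-purpose"},  # placeholder name
    "spec": {
        "template": {
            "spec": {
                # Constraints on what Karpenter may provision.
                "requirements": [
                    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
                    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["on-demand"]},
                ],
                # Reference to an EC2NodeClass holding AWS-specific settings
                # (AMI, subnets, security groups).
                "nodeClassRef": {
                    "group": "karpenter.k8s.aws",
                    "kind": "EC2NodeClass",
                    "name": "default",
                },
            }
        },
        "limits": {"cpu": "1000"},  # cap total CPU this pool may provision
        "disruption": {
            # Consolidate under-utilized nodes, one source of the
            # utilization gains the article describes.
            "consolidationPolicy": "WhenEmptyOrUnderutilized",
            "consolidateAfter": "1m",
        },
    },
}

def apply_nodepool() -> None:
    config.load_kube_config()
    client.CustomObjectsApi().create_cluster_custom_object(
        group="karpenter.sh", version="v1", plural="nodepools", body=nodepool
    )

if __name__ == "__main__":
    apply_nodepool()
```

Because a single NodePool can stand in for many statically sized node groups, this is where the reduction in node group management overhead the article cites comes from.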
