Netflix's ML Routing: Switchboard to Lightbulb

Alps Wang

May 2, 2026

Netflix's detailed exploration of their ML model serving infrastructure, particularly the evolution from Switchboard to Lightbulb, offers a masterclass in tackling the complexities of routing at massive scale. The core insight lies in the explicit recognition that ML models at Netflix are not just simple inference functions but complex, self-contained workflows. This distinction is crucial and informs their architectural decisions, leading to a domain-independent API abstraction and a centralized serving platform. The introduction of 'Objectives' as a unifying concept for business use cases is particularly elegant, decoupling clients from specific model implementations and simplifying integration. The progressive rollout strategy, from Switchboard's centralized proxy to the planned distributed approach with Lightbulb, demonstrates a mature understanding of operational challenges like single points of failure and latency.
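To make the 'Objectives' idea concrete, here is a minimal sketch of what such a decoupling layer might look like. The names (`Objective`, `resolve`, `ModelVariant`, `RequestContext`) and the region-based resolver are illustrative assumptions, not Netflix's actual API:

```typescript
// Hypothetical sketch: clients ask for an Objective (a business use case),
// and the platform resolves it to a concrete model variant. All names here
// are illustrative, not Netflix's real interfaces.

interface ModelVariant {
  modelId: string;
  version: string;
}

interface RequestContext {
  memberId: string;
  region: string;
}

interface Objective {
  name: string; // business use case, e.g. "rank-homepage-rows"
  resolve(ctx: RequestContext): ModelVariant;
}

// A trivial resolver routing by region; the client never sees model IDs,
// so the model behind the objective can change without client changes.
const homepageRanking: Objective = {
  name: "rank-homepage-rows",
  resolve(ctx) {
    return ctx.region === "EU"
      ? { modelId: "ranker-eu", version: "v12" }
      : { modelId: "ranker-global", version: "v9" };
  },
};

// Client code depends only on the objective name and request context.
const variant = homepageRanking.resolve({ memberId: "m1", region: "EU" });
```

The point of the indirection is that swapping `ranker-eu` for a new candidate model is a platform-side change, invisible to every consuming client.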

However, the article highlights a common trade-off in building custom, highly optimized solutions: the inherent complexity introduced. While Switchboard provided essential flexibility and integration with Netflix's experimentation platform, its monolithic nature became a bottleneck. The added latency and the obscuring of client request origins are valid concerns. The article effectively teases the next iteration, Lightbulb, which aims to address these limitations by distributing responsibilities. This suggests a continuous process of learning and adaptation, which is commendable. A potential limitation, not fully detailed, is the overhead of maintaining the JavaScript-based 'Switchboard Rules' and ensuring their robust parsing and execution across different components. The reliance on a custom routing service also implies significant in-house engineering investment, making it less directly transferable to organizations without similar resources and expertise.
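The maintenance concern around routing rules is easier to see with an example. Below is a hedged sketch of a traffic-splitting rule expressed as plain data and evaluated by a small function; the rule shape (`when`/`route`/`weight`) is an assumption for illustration, not the actual Switchboard Rules format:

```typescript
// Illustrative sketch of a context-aware, weighted routing rule, evaluated
// as data rather than executable script. The rule schema is hypothetical,
// not Netflix's Switchboard Rules format.

type Rule = {
  when: { field: "region" | "device"; equals: string };
  route: string;  // target model deployment
  weight: number; // fraction of matching traffic in [0, 1]
};

function pickRoute(
  rules: Rule[],
  ctx: Record<string, string>,
  fallback: string,
  rand: () => number, // injected for deterministic testing
): string {
  for (const r of rules) {
    // Rule matches the request context AND the request falls in the
    // weighted sample; otherwise keep scanning, then fall back.
    if (ctx[r.when.field] === r.when.equals && rand() < r.weight) {
      return r.route;
    }
  }
  return fallback;
}

const rules: Rule[] = [
  { when: { field: "region", equals: "US" }, route: "candidate-v2", weight: 0.1 },
];

// 10% of US traffic is shifted to the candidate; everything else
// (and all non-US traffic) stays on the stable deployment.
const route = pickRoute(rules, { region: "US" }, "stable-v1", () => 0.05);
```

Even this toy version shows why consistent parsing and execution matters: if two components disagree on how `weight` or rule precedence is interpreted, the same request can be routed differently depending on where the rule runs.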

This article is invaluable for ML engineers, MLOps professionals, and architects involved in building or scaling ML serving infrastructure. It provides concrete examples of how to manage model versioning, A/B testing, and traffic shifting in a production environment. Developers building client applications that consume ML models will benefit from understanding the underlying abstractions and how they enable faster iteration. The insights into decoupling clients from model sharding and the challenges of dynamic configuration management are universally applicable. For organizations aiming for high-velocity ML innovation, the Netflix approach offers a compelling blueprint, albeit one that requires significant engineering maturity and commitment to custom solutions.

Key Points

  • Netflix's ML models are treated as self-contained workflows, not just inference functions.
  • A centralized ML serving platform with a domain-independent API abstraction is key to rapid innovation.
  • 'Objectives' serve as a unified concept for business use cases, decoupling clients from concrete models.
  • Switchboard, a custom routing service, provided context-aware routing, dynamic traffic splitting, and model lifecycle management.
  • Challenges with Switchboard at scale included single point of failure and added latency.
  • The evolution towards Lightbulb aims to address these limitations by distributing responsibilities.
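The last point, distributing responsibilities to avoid a central proxy, can be sketched as a client-side router that resolves objectives from a locally cached routing table. This is a speculative illustration of the direction, assuming a control plane pushes table updates; the `LocalRouter` class and its endpoints are invented for the example:

```typescript
// Hypothetical sketch of distributed routing: each client resolves
// objectives from a locally cached table, so no central proxy sits on the
// request path. The class and endpoint names are illustrative assumptions.

type RoutingTable = Map<string, string>; // objective name -> model endpoint

class LocalRouter {
  private table: RoutingTable = new Map();

  // In a real system this would be refreshed asynchronously from a
  // control plane; here we set the table directly.
  update(table: RoutingTable): void {
    this.table = table;
  }

  endpointFor(objective: string): string {
    const ep = this.table.get(objective);
    if (!ep) throw new Error(`no route for objective: ${objective}`);
    return ep; // resolved in-process: no extra network hop
  }
}

const router = new LocalRouter();
router.update(
  new Map([["rank-homepage-rows", "https://ranker-global.internal/v9"]]),
);
const endpoint = router.endpointFor("rank-homepage-rows");
```

The trade-off mirrors the one the article describes: the proxy's single point of failure and added hop disappear, but the platform now has to keep many cached routing tables fresh and consistent across clients.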

📖 Source: State of Routing in Model Serving
