ClickHouse Cloud Beta: Run Python ML Models Directly in SQL
Alps Wang
Jun 2, 2026
Bridging ML and Data: ClickHouse's Executable UDFs
Executable UDFs in ClickHouse Cloud are a significant step forward, particularly for workloads that need real-time inference alongside data ingestion and querying. Uploading Python code and running it in a managed sandbox removes the need to set up and maintain separate model-serving infrastructure. Because scoring happens where the data already lives, rows no longer have to travel to an external service and back, which cuts both latency and data movement. The provided example is a compelling demonstration: anomaly detection for stock trades that scores billions of ticks inline with ingest and feeds downstream materialized views and a web application. The approach also opens trained models to data professionals who are comfortable with SQL but may not have extensive MLOps expertise.
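To make the "upload Python code" idea concrete, here is a minimal sketch of what such a script could look like. It assumes the stdin/stdout contract used by executable UDFs in self-managed ClickHouse: one tab-separated row per input line, one result per output line. The exact contract on ClickHouse Cloud may differ. The two-column input (price, size), the constants, and the `anomaly_score` function are hypothetical stand-ins for a trained autoencoder, not the model from the demo.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical "model" parameters: made-up per-feature mean and scale.
# A real deployment would load a trained autoencoder here instead.
FEATURE_MEANS = (100.0, 500.0)   # (price, size)
FEATURE_SCALES = (10.0, 250.0)


def anomaly_score(price: float, size: float) -> float:
    """Mean squared standardized deviation, a stand-in for reconstruction error."""
    z_price = (price - FEATURE_MEANS[0]) / FEATURE_SCALES[0]
    z_size = (size - FEATURE_MEANS[1]) / FEATURE_SCALES[1]
    return (z_price ** 2 + z_size ** 2) / 2.0


def score_lines(lines: Iterable[str]) -> Iterator[str]:
    """Turn tab-separated 'price<TAB>size' rows into one score string per row."""
    for line in lines:
        price_text, size_text = line.rstrip("\n").split("\t")
        yield f"{anomaly_score(float(price_text), float(size_text)):.6f}"


def main() -> None:
    # A long-lived process keeps reading rows until stdin closes.
    # Flushing after each row returns results to the caller promptly.
    for result in score_lines(sys.stdin):
        sys.stdout.write(result + "\n")
        sys.stdout.flush()

# In a deployed script, main() would be called under an
# `if __name__ == "__main__":` guard; it is left uncalled here so the
# scoring logic can be imported and tested on its own.
```

Keeping the scoring logic in `score_lines`, separate from the I/O loop, means the same function can be unit-tested locally before anything is uploaded.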
The public beta is exciting, but several aspects warrant consideration. Python is a common choice for UDFs, yet it may limit teams heavily invested in other ecosystems such as R, or in specialized ML frameworks that are not easily packaged for Python. The managed sandbox is convenient, but the abstraction could mask performance bottlenecks or introduce unexpected overhead. Production deployments will need detailed observability into UDF execution, resource consumption, and potential cold starts of sandboxed processes. Running arbitrary code within the database environment, even in a sandbox, also has security implications that call for robust access controls and monitoring. The announcement notes that network-access UDFs are in private beta, which expands capabilities further but adds new security and operational considerations for external service integrations. For complex models, managing dependencies and ensuring consistent execution across the pool of sandboxed processes will be an ongoing challenge for users.
The long-term implications of this feature are substantial. It extends what a data warehouse or analytics database can do and blurs the line between data processing and application logic. The move could inspire similar integrations in other database platforms and change how real-time AI applications are architected. For businesses with large volumes of streaming data and a need for immediate insights from ML models, ClickHouse Cloud's executable UDFs offer a compelling, integrated solution that bypasses many traditional hurdles. The reported backfill of 6 billion rows at 35K rows/sec is impressive and suggests that the approach is viable for high-throughput scenarios. Embedding model logic directly in materialized views means new rows are scored continuously as they arrive, which is especially valuable for real-time anomaly detection and feature engineering.
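As a quick sanity check on the backfill figure, the arithmetic can be worked out directly. This sketch assumes the quoted 35K rows/sec is a sustained aggregate rate for the whole job, which the source summary does not spell out; the function name is illustrative.

```python
def backfill_hours(total_rows: int, rows_per_second: float) -> float:
    """Wall-clock hours needed to process total_rows at a constant rate."""
    seconds = total_rows / rows_per_second
    return seconds / 3600.0


hours = backfill_hours(6_000_000_000, 35_000)
days = hours / 24.0
print(f"{hours:.1f} hours, about {days:.1f} days")
```

Under that assumption, a 6-billion-row backfill takes a little under two days, which is a useful planning number when deciding whether to rescore history after a model update.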
Key Points
- ClickHouse Cloud now offers executable UDFs in public beta, allowing users to run custom Python code directly within SQL queries.
- This feature enables real-time ML model inference inline with data ingestion and querying, eliminating the need for separate model serving infrastructure.
- The system manages long-lived, sandboxed Python processes that execute the UDFs, simplifying deployment to a single upload.
- A demo showcases an autoencoder for real-time anomaly detection on stock trades, scoring billions of records inline with ingest.
- The integration significantly reduces latency and architectural complexity for AI-driven data processing.
- Network-access UDFs, currently in private beta, allow outbound HTTPS calls from UDFs for data enrichment.

📖 Source: Executable UDFs are now in public beta on ClickHouse Cloud
