Qonto Reimagines Observability with ClickHouse Cloud & AI

Alps Wang

Jun 2, 2026

From Constraints to Capabilities

Qonto's move to ClickHouse Cloud is a compelling case study in modernizing an observability stack, particularly for organizations struggling with high-cardinality data and performance bottlenecks. Moving tracing from Grafana Tempo to ClickHouse Cloud removed the need for sampling and lifted the time-range constraints on queries, which enables deeper and more complete incident investigations. The 99.84% compression achieved translates into significant cost savings (about $70k annually in S3 storage) and underscores how efficiently ClickHouse handles massive datasets. It is a strong argument for columnar databases in analytical workloads.

The integration of ClickHouse MCP with AI to create an 'incident companion' is the most innovative step. It opens incident investigation to more people by accepting natural language queries, hiding SQL complexity, and exposing the AI's reasoning. This lowers the barrier to complex debugging and lets a wider range of engineers, and even product teams, derive insights from operational data. The standardized structure of OpenTelemetry data, which makes it readily usable by LLMs, is a crucial enabler for this kind of AI integration.

However, while the article paints a very positive picture, some nuances warrant consideration. Success depends on a deep understanding and effective use of ClickHouse's capabilities. The article implies a relatively seamless transition, but migrating complex observability pipelines at Qonto's scale often involves intricate tuning and trade-offs that are not fully explored here. For instance, ClickHouse Cloud is a managed service, yet its initial setup and ongoing operation still require expertise. The article also mentions Qonto's plan to bring logs into ClickHouse. This is promising for a unified view, but it could introduce new challenges around ingestion patterns, query performance across differing log structures, and cost relative to specialized log aggregation tools if not managed carefully. Relying on AI for incident investigation, while powerful, carries risks of hallucination or misinterpretation, though Qonto's approach of making the generated SQL queries transparent mitigates this significantly. The article would benefit from a brief description of how the team validates AI-generated insights or handles edge cases where the AI might struggle.
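To make the validation concern above more concrete, here is a minimal sketch of a pre-execution check on model-generated SQL. Everything in it is an assumption for illustration: the function name `check_generated_sql`, the keyword list, and the rules themselves are hypothetical and are not described in the article or attributed to Qonto. It is a crude keyword filter, not a SQL parser, so it can reject harmless queries whose string literals happen to contain a blocked word.

```python
import re

# Hypothetical policy for illustration only; not Qonto's actual validation rules.
FORBIDDEN = re.compile(
    r"\b(insert|alter|drop|truncate|delete|update|create|grant|system)\b",
    re.IGNORECASE,
)


def check_generated_sql(sql: str) -> list[str]:
    """Return a list of policy violations; an empty list means the query may run."""
    problems = []
    # Ignore surrounding whitespace and a single trailing semicolon.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        problems.append("multiple statements")
    if not re.match(r"(?is)^(select|with)\b", statement):
        problems.append("not a SELECT query")
    if FORBIDDEN.search(statement):
        problems.append("contains a write or admin keyword")
    if not re.search(r"\blimit\s+\d+", statement, re.IGNORECASE):
        problems.append("missing LIMIT")
    return problems


if __name__ == "__main__":
    # `otel_traces` and `ServiceName` are assumed names, used only as sample input.
    query = "SELECT ServiceName, count() FROM otel_traces GROUP BY ServiceName LIMIT 10"
    print(check_generated_sql(query))
    print(check_generated_sql("DROP TABLE otel_traces"))
```

A check like this complements, rather than replaces, the transparency the article describes: the engineer still sees the SQL, but obviously unsafe or unbounded queries are stopped before they reach the database.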

This case study is relevant to a wide array of organizations, from growing SaaS companies to established enterprises running distributed systems with massive data volumes. Any team that is hitting the limits of traditional observability tools, facing escalating storage costs for telemetry data, or looking to give more people the ability to investigate incidents will find it useful. Engineering teams at scale that are already invested in or considering OpenTelemetry will find the integration path with ClickHouse especially compelling. The potential for cost savings, combined with greater analytical depth and AI-driven insights, makes this a strong blueprint for future-proofing an observability strategy. The technical details on data compression and the AI companion's workflow are concrete enough for developers to understand and adapt to their own environments. The emphasis on moving beyond the 'three pillars' of observability toward a unified 'data' approach reflects a significant architectural shift that many organizations are contemplating.

Key Points

  • Qonto migrated its observability stack from Grafana Tempo to ClickHouse Cloud, significantly expanding queryable time ranges (2-3 hours to 2 weeks) and eliminating sampling.
  • ClickHouse Cloud achieved a 99.84% compression rate on 231 TB of high-cardinality trace data, reducing it to 376 GB and saving approximately $70,000 annually in S3 storage costs.
  • Qonto built an AI-powered 'incident companion' using ClickHouse MCP, enabling any employee to investigate incidents in natural language without needing to write SQL.
  • The new architecture leverages OpenTelemetry for semantic data, making it readily understandable by LLMs for AI integrations.
  • The shift from a 'three pillars' (logs, metrics, traces) model to a unified 'data' approach with ClickHouse breaks down silos and unlocks new use cases, including product analytics.
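
The compression figure in the key points can be checked with a few lines of arithmetic. The sketch below uses only the two sizes quoted above; treating 1 TB as 1000 GB is an assumption (the percentage rounds to the same value with 1024), and the helper name `compression_stats` is made up for this example.

```python
# Sizes quoted in the key points above.
RAW_TB = 231
COMPRESSED_GB = 376
GB_PER_TB = 1000  # assumption: decimal units


def compression_stats(raw_gb: float, compressed_gb: float) -> tuple[float, float]:
    """Return (percent size reduction, compression ratio)."""
    reduction_pct = (1 - compressed_gb / raw_gb) * 100
    ratio = raw_gb / compressed_gb
    return reduction_pct, ratio


reduction_pct, ratio = compression_stats(RAW_TB * GB_PER_TB, COMPRESSED_GB)
print(f"{reduction_pct:.2f}% smaller, about {ratio:.0f}:1")
# prints "99.84% smaller, about 614:1"
```

The result matches the 99.84% reported in the article. The $70,000 savings figure cannot be reproduced the same way, since it depends on storage pricing that the summary does not state.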

Source: Goodbye, constraints. Hello, data. How Qonto rebuilds observability with ClickHouse Cloud
