Benchmark Fallout: ClickHouse Challenges Databricks' Reyden Claims

The Benchmark Transparency Imperative

The ClickHouse blog post dissects Databricks' Reyden benchmark announcement and raises pointed questions about benchmark transparency and reproducibility. The core criticism is that Databricks used very small datasets (TPC-H SF 1 and a 22K-row NYC Taxi dataset), which cannot meaningfully measure a large-scale query engine. As ClickHouse argues, data this small exercises in-memory caching rather than engine performance at scale. The announcement also omitted key details of the methodology: hardware configuration, specific settings, cost, and whether ClickHouse was self-managed or cloud-deployed. Without them, the results are opaque and cannot be reproduced. ClickHouse's own attempt to reproduce the 'crash' scenario did not produce a crash. Instead, ClickHouse sustained significantly higher QPS than Databricks claimed, while rejecting some queries to protect system stability, which is normal behavior for such systems.
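The difference between a system that rejects queries under load and one that crashes can be made concrete when tallying load-test results. The following is a minimal sketch only, not the method used by ClickHouse or Databricks: the status labels ("ok", "rejected", "error") and the function name summarize_load_test are hypothetical, chosen for illustration.

```python
from collections import Counter


def summarize_load_test(statuses, duration_s):
    """Summarize a load test from a list of per-query status labels.

    The labels are hypothetical: "ok" for a served query, "rejected" for a
    query the server refused in order to protect itself, and "error" for
    any other failure.
    """
    counts = Counter(statuses)
    attempted = len(statuses)
    return {
        # Rate at which the client sent queries.
        "attempted_qps": attempted / duration_s,
        # Rate at which the server actually answered queries.
        "served_qps": counts["ok"] / duration_s,
        # Share of queries refused under load (load shedding, not a crash).
        "rejected_share": counts["rejected"] / attempted if attempted else 0.0,
        # Failures that are neither served nor cleanly rejected.
        "errors": counts["error"],
    }


if __name__ == "__main__":
    # Illustrative numbers only, not taken from either vendor's results.
    print(summarize_load_test(["ok"] * 80 + ["rejected"] * 20, duration_s=10))
```

Reporting served and rejected queries separately keeps a benchmark from describing deliberate load shedding as a failure of the whole system.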

The implications of such opaque benchmarking are significant. It can lead to 'benchmarketing,' where results are presented in a way that favors the vendor, misleading users and hindering fair competition. For developers and architects choosing data infrastructure, unreliable benchmarks can lead to misinformed choices, wasted resources, and, ultimately, project failures. ClickHouse's call for open, transparent, and reproducible benchmarks, backed by its commitment to publishing its own methodologies and data, stands in stark contrast and sets a standard for the industry. The article also notes that Databricks' Reyden product was not yet accessible for independent testing at the time of writing, which further compounds the lack of transparency.
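One way to picture what a reproducible benchmark report needs is a simple disclosure checklist. The sketch below is an assumption made for illustration, not a format proposed by ClickHouse: the class name BenchmarkDisclosure and its field names are hypothetical, and the fields only mirror the details the article says were missing.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BenchmarkDisclosure:
    """Hypothetical checklist of details a benchmark report should state."""
    dataset: Optional[str] = None        # e.g. "TPC-H SF 1"
    hardware: Optional[str] = None       # machine type, cores, memory
    configuration: Optional[str] = None  # engine settings used for the run
    cost: Optional[str] = None           # cost of the tested setup
    deployment: Optional[str] = None     # "self-managed" or "cloud"


def missing_details(report: BenchmarkDisclosure) -> List[str]:
    """Return the names of fields that were left empty, in declaration order."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


if __name__ == "__main__":
    # A report that names only its dataset leaves four details undisclosed.
    print(missing_details(BenchmarkDisclosure(dataset="TPC-H SF 1")))
```

A report for which this check returns an empty list is not automatically fair, but one that leaves fields blank cannot be reproduced by a third party.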

Key Points

Databricks' Reyden benchmark announcement is criticized for using extremely small datasets, which do not accurately reflect large-scale query engine performance.
The benchmark methodology lacked crucial details (hardware, configuration, cost, deployment type), rendering it opaque and unreproducible.
ClickHouse's attempt to reproduce the claimed 'crash' failed; instead, it demonstrated ClickHouse's ability to handle significantly higher QPS than reported.
The article advocates for open, transparent, and reproducible benchmarking as essential for fair competition and informed decision-making.
Lack of access to the Databricks Reyden product further limited independent validation.

📖 Source: Benchmarks and Obscurantism: A “red” line that should not be crossed
