ClickHouse — a Weird but Powerful Columnar, LSM-Based Data Warehouse

In previous discussions, we talked about how the Lakehouse architecture is emerging in the industry and gradually replacing the classic two-tier setup of data lakes and data warehouses. However, in the company I work for, we currently use ClickHouse as our single data warehouse. As we move toward recommendation engines and machine learning models, an important question naturally arises:

Is ClickHouse really the right system for where we are heading?

To answer questions like this, I believe we should focus on understanding the architecture of a system, the architectural trade-offs it makes, and the use cases it is explicitly designed to support, rather than making decisions based on surface-level observations or low-context comparisons.

With this mindset, I came across an excellent paper titled “ClickHouse” (Schulze et al., 2024), written by several members of the ClickHouse team. The paper does a great job of explaining what ClickHouse is, the design decisions behind it, and why those decisions were made.

Architecture Overview

At its core, ClickHouse uses an LSM-tree–inspired storage engine, but with important differences compared to traditional LSM implementations. Instead of writing data first to a memtable and later flushing it to disk, ClickHouse writes data directly to immutable on-disk “parts.” Each part consists of separate files for each column. These parts are then merged asynchronously in the background.
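To make the part-and-merge model concrete, here is a minimal Python sketch (illustrative only, not ClickHouse code; the names `Part` and `merge_parts` are hypothetical): each insert produces an immutable part sorted by the table's sorting key, and a background merge combines sorted parts into one larger sorted part.

```python
import heapq

class Part:
    """A toy immutable batch of rows, sorted once at write time."""
    def __init__(self, rows):
        self.rows = sorted(rows)

def merge_parts(parts):
    """Merge-sort several already-sorted parts into a single new part."""
    merged = list(heapq.merge(*(p.rows for p in parts)))
    return Part(merged)

# Two inserts produce two parts; the background merge yields one sorted part.
p1 = Part([(3, "c"), (1, "a")])
p2 = Part([(2, "b"), (4, "d")])
big = merge_parts([p1, p2])
print(big.rows)  # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```

Because every part is already sorted, the merge is a cheap streaming merge-sort rather than a full re-sort — the same reason LSM-style systems can sustain high write rates.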

To control the number of parts and sustain high ingestion rates, ClickHouse supports asynchronous ingestion and in-memory buffering. Unlike many LSM-based systems, however, it does not rely on a traditional write-ahead log (WAL). As a result, in certain failure scenarios (such as sudden power loss), a small amount of recently ingested data may be lost, which is a conscious trade-off in favor of ingestion throughput.

All ingested data is stored using the MergeTree family of table engines, which also support different transformations during the merge process, such as:

  • Replacing rows
  • Aggregation
  • TTL-based cleanup
  • Other merge-time transformations
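The "replacing rows" transformation above can be sketched in a few lines of Python (in the spirit of ClickHouse's ReplacingMergeTree, but not actual ClickHouse code; `replacing_merge` is an illustrative name): during a merge, only the highest-version row for each key survives.

```python
def replacing_merge(rows):
    """rows: iterable of (key, version, value); keep the max-version row per key."""
    latest = {}
    for key, version, value in rows:
        if key not in latest or version > latest[key][0]:
            latest[key] = (version, value)
    return sorted((k, v, val) for k, (v, val) in latest.items())

rows = [("user:1", 1, "old"), ("user:2", 1, "x"), ("user:1", 2, "new")]
print(replacing_merge(rows))
# [('user:1', 2, 'new'), ('user:2', 1, 'x')]
```

Note that deduplication happens only when parts are merged, so until a merge runs, queries may still see multiple versions of the same key — a direct consequence of the eventually-merged design.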

For storage efficiency, ClickHouse applies several optimizations:

  • Bitmaps to represent NULL values
  • Dictionary encoding for low-cardinality columns
  • Compression, using the fast LZ4 codec by default, with heavier codecs such as ZSTD also available

Query Processing and Performance Model

Because ClickHouse is designed for petabyte-scale analytical workloads, the system strongly emphasizes minimizing unnecessary data reads and maximizing parallel execution.

To avoid loading irrelevant data into memory, ClickHouse relies on:

  • Sparse primary key indexes
  • Projections
  • Skipping indexes, such as min-max indexes, set indexes, and Bloom filters
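The min-max variant of a skipping index is simple enough to sketch (illustrative only; `build_minmax` and `scan_with_skip` are hypothetical names, and real granules are fixed-size row blocks): each granule records its min and max, and a range query skips any granule whose range cannot overlap the predicate.

```python
def build_minmax(granules):
    """Record (min, max) per granule of values."""
    return [(min(g), max(g)) for g in granules]

def scan_with_skip(granules, minmax, lo, hi):
    """Return matching values and how many granules were actually read."""
    out, read = [], 0
    for g, (gmin, gmax) in zip(granules, minmax):
        if gmax < lo or gmin > hi:
            continue  # granule pruned without touching its data
        read += 1
        out.extend(v for v in g if lo <= v <= hi)
    return out, read

granules = [[1, 3, 5], [10, 12, 14], [20, 22, 24]]
idx = build_minmax(granules)
print(scan_with_skip(granules, idx, 11, 13))  # ([12], 1) — two granules skipped
```

The index itself is tiny (two values per granule), which is why such "sparse" structures fit in memory even for petabyte-scale tables.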

In distributed setups, data is processed across multiple nodes. Within each node:

  • Each CPU core runs its own execution thread
  • The query execution pipeline attempts to preserve data locality to keep CPU caches hot
  • Each core leverages SIMD instructions to process multiple values per CPU cycle

After local processing, intermediate results are sent to the coordinating node, where final aggregation is performed and the result is returned to the user.
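This two-phase pattern can be sketched for an average, under a simplified model (the `partial_state` / `merge_states` names are illustrative, not ClickHouse internals): each node ships only a small (sum, count) state, and the coordinator merges the states instead of re-reading raw rows.

```python
def partial_state(rows):
    """Local aggregation on one node: a mergeable (sum, count) state."""
    return (sum(rows), len(rows))

def merge_states(states):
    """Final aggregation on the coordinator: combine partial states."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total / count

node_a = [10, 20, 30]
node_b = [40, 50]
states = [partial_state(node_a), partial_state(node_b)]
print(merge_states(states))  # 30.0 — identical to averaging all rows at once
```

The key design point is that the intermediate state is small and mergeable, so network traffic scales with the number of groups, not the number of scanned rows.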

Trade-offs and Limitations

The efficiencies ClickHouse achieves in ingestion, query execution, and parallel processing come with explicit trade-offs.

ClickHouse intentionally relaxes full ACID guarantees and strong transactional semantics in favor of performance. While operations are atomic at the level of individual data parts, ClickHouse does not provide multi-table transactions or full snapshot isolation. Data durability is also relaxed compared to traditional transactional systems.

In addition, ClickHouse is not storage-format agnostic. It relies on a tightly coupled internal columnar storage layout rather than general-purpose columnar formats such as Parquet.

Data modification is another area where trade-offs are evident. Updates and deletes are expensive and non-atomic, as they require rewriting multiple parts and column files in the background. As a result, large-scale backfilling or extensive data rewrites are operationally expensive and often impractical.

For similar reasons, while it is technically possible to extract large datasets for machine learning, ClickHouse is not well-suited for bulk feature extraction or large-scale training workflows, especially compared to lake- or lakehouse-based systems.

Closing Thought

Understanding ClickHouse through its architecture makes one thing clear: it is an extremely powerful system within the boundaries it was designed for. The real challenge is not whether ClickHouse is “good” or “bad,” but whether an organization’s current and future workloads align with those architectural boundaries.
