Deep Dive on Kianoosh's Blog

ClickHouse: A Weird but Powerful Columnar, LSM-Based Data Warehouse

Thu, 22 Jan 2026 00:00:00 +0000

In previous posts, we talked about how the Lakehouse architecture is emerging in the industry and gradually replacing the classic two-tier setup of data lakes and data warehouses. However, at the company I work for, we currently use ClickHouse as our single data warehouse. As we move toward recommendation engines and machine learning models, an important question naturally arises:

Is ClickHouse really the right system for where we are heading?

OLAPs and Columnar File Formats

Tue, 06 Jan 2026 00:00:00 +0000

In earlier posts, we explored the Lakehouse architecture, which separates storage from compute. By building on S3 object storage and open file formats, we can store a practically unlimited volume of data.

But which file format is most efficient for this architecture? 🤔

The paper “A Deep Dive into Common Open Formats for Analytical DBMSs”, written by researchers at Microsoft, compares Parquet, Arrow, and ORC to determine which is most suitable for OLAP workloads. The verdict? There is no single winner. The “best” format depends entirely on the specific use case.

Engineering for Exabytes: Lessons from IBM COS

Mon, 29 Dec 2025 00:00:00 +0000

Beyond Standard Storage

After my last post on the MinIO shift, I started going down the rabbit hole. We often take object storage for granted, but when you step back and ask, “How do you actually manage data at exabyte scale?”, the answers get fascinating.

How do you design a system that reaches 15 nines of durability without bankrupting the company on hardware?

I recently found an IBM Redpaper on their Cloud Object Storage (COS) that peels back the layers. It explains the engineering choices required to operate at that magnitude.
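
To get a feel for what "15 nines" means in practice, here is a rough back-of-the-envelope sketch. It assumes the durability figure is an annual, per-object survival probability and that objects fail independently; both are simplifying assumptions of mine, not something stated in the Redpaper. The function name `expected_annual_losses` is made up for this illustration.

```python
def expected_annual_losses(num_objects: int, nines: int) -> float:
    """Expected number of objects lost per year.

    Assumption (mine, for illustration): each object independently survives
    one year with probability 1 - 10**(-nines).
    """
    # Work with the loss probability directly; computing 1 - 0.999... in
    # floating point would throw away precision at this many nines.
    loss_probability = 10.0 ** (-nines)
    return num_objects * loss_probability


if __name__ == "__main__":
    # Compare a billion, a trillion, and a quadrillion stored objects.
    for n in (10**9, 10**12, 10**15):
        losses = expected_annual_losses(n, 15)
        print(f"{n:.0e} objects at 15 nines: {losses:.0e} expected losses per year")
```

Under these assumptions, even a trillion stored objects would be expected to lose far less than one object per year, which is why the interesting question is the hardware cost of getting there.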