Paper Review on Kianoosh's Blog

ClickHouse: A Weird but Powerful Columnar, LSM-Based Data Warehouse

Thu, 22 Jan 2026 00:00:00 +0000

In previous discussions, we talked about how the Lakehouse architecture is emerging in the industry and gradually replacing the classic two-tier setup of data lakes and data warehouses. However, at the company I work for, we currently use ClickHouse as our single data warehouse. As we move toward recommendation engines and machine learning models, an important question naturally arises:

Is ClickHouse really the right system for where we are heading?

Software Modularity: Trivial Concept, Yet Still Rarely Done Right!

Wed, 07 Jan 2026 00:00:00 +0000

If you’ve ever read any software engineering blog or book, you’ve probably seen the word “Modularity” mentioned many times. But ask most engineers what modularity really means, what benefits it brings, and how to actually break a system into modules, and you’ll often get vague answers.

To find clear and timeless answers, we need to go back to the 1960s and 70s, when these ideas were first introduced and refined. In his famous 1972 paper (over 8,000 citations and still cited 250+ times in 2025!), David Parnas tackled a problem that was unsolved until then: the criteria for decomposing software into modules.

OLAPs and Columnar File Formats

Tue, 06 Jan 2026 00:00:00 +0000

In earlier posts, we explored the Lakehouse architecture, which focuses on separating storage from computation. By leveraging S3 object storage and open file formats, we can store a practically unlimited volume of data.

But which file format is most efficient for this architecture? 🤔

The paper “A Deep Dive into Common Open Formats for Analytical DBMSs”, written by researchers at Microsoft, compares Parquet, Arrow, and ORC to determine which is most suitable for OLAP workloads. The verdict? There is no single winner. The “best” format depends entirely on the specific use case.

The Architecture of Evolution: Why the Lakehouse is Inevitable

Thu, 01 Jan 2026 00:00:00 +0000

The Structural Bottleneck

In an earlier post, I touched on why Data Lakes emerged. But to understand where the industry is going next, we need to understand the history of how we got here.

Great architecture isn’t about chasing new tools; it’s about recognizing the structural bottlenecks that force systems to evolve.

The paper “Lakehouse: A New Generation of Open Platforms” (CIDR 2021) is a brilliant breakdown of this evolution. It maps the shift from Gen 1 (Warehouses) to Gen 2 (Lakes + Warehouses) and explains why the industry is inevitably moving to Gen 3 (Lakehouse).

Engineering for Exabytes: Lessons from IBM COS

Mon, 29 Dec 2025 00:00:00 +0000

Beyond Standard Storage

After my last post on the MinIO shift, I started going down the rabbit hole. We often take object storage for granted, but when you step back and ask, “How do you actually manage data at exabyte scale?”, the answers get fascinating.

How do you design a system that reaches 15 nines of durability without bankrupting the company on hardware?

I recently found an IBM Redpaper on their Cloud Object Storage (COS) that peels back the layers. It explains the engineering choices required to operate at that magnitude.