<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep Dive on Kianoosh's Blog</title><link>https://kianoosh.dev/tags/deep-dive/</link><description>Recent content in Deep Dive on Kianoosh's Blog</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Thu, 22 Jan 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://kianoosh.dev/tags/deep-dive/index.xml" rel="self" type="application/rss+xml"/><item><title>ClickHouse — a Weird but Powerful Columnar, LSM-based Data Warehouse</title><link>https://kianoosh.dev/posts/2026-01-22-clickhouse-a-weird-but-powerful-columnar-lsm-based-data-warehouse/</link><pubDate>Thu, 22 Jan 2026 00:00:00 +0000</pubDate><guid>https://kianoosh.dev/posts/2026-01-22-clickhouse-a-weird-but-powerful-columnar-lsm-based-data-warehouse/</guid><description>&lt;p>In previous discussions, we talked about how the &lt;strong>Lakehouse architecture&lt;/strong> is emerging in the industry and gradually replacing the classic two-tier setup of data lakes and data warehouses. However, at the company I work for, we currently use &lt;strong>ClickHouse as our single data warehouse&lt;/strong>. As we move toward &lt;strong>recommendation engines and machine learning models&lt;/strong>, an important question naturally arises:&lt;/p>
&lt;p>&lt;strong>Is ClickHouse really the right system for where we are heading?&lt;/strong>&lt;/p></description></item><item><title>OLAPs and Columnar File Formats</title><link>https://kianoosh.dev/posts/2026-01-06-olaps-and-columnar-file-formats/</link><pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate><guid>https://kianoosh.dev/posts/2026-01-06-olaps-and-columnar-file-formats/</guid><description>&lt;p>In earlier &lt;a href="https://kianoosh.dev/posts/2026-01-01-the-architecture-of-evolution-why-the-lakehouse-is-inevitable">posts&lt;/a>, we explored the Lakehouse architecture, which focuses on separating storage from compute. By building on S3 object storage and open file formats, we can store a practically unlimited volume of data.&lt;/p>
&lt;p>But which file format is most efficient for this architecture? 🤔&lt;/p>
&lt;p>The paper &lt;a href="https://www.vldb.org/pvldb/vol16/p3044-liu.pdf">&amp;ldquo;A Deep Dive into Common Open Formats for Analytical&amp;rdquo;&lt;/a>, written by researchers at Microsoft, compares Parquet, Arrow, and ORC to determine which is most suitable for OLAP workloads. The verdict? There is no single winner. The &amp;ldquo;best&amp;rdquo; format depends entirely on the specific use case.&lt;/p></description></item><item><title>Engineering for Exabytes: Lessons from IBM COS</title><link>https://kianoosh.dev/posts/2025-12-29-engineering-for-exabytes-lessons-from-ibm-cos/</link><pubDate>Mon, 29 Dec 2025 00:00:00 +0000</pubDate><guid>https://kianoosh.dev/posts/2025-12-29-engineering-for-exabytes-lessons-from-ibm-cos/</guid><description>&lt;h2 id="beyond-standard-storage">Beyond Standard Storage&lt;/h2>
&lt;p>After &lt;a href="https://kianoosh.dev/posts/2025-12-28-open-source-is-not-enough-the-post-minio-era">my last post&lt;/a> on the MinIO shift, I started going down the rabbit hole. We often take object storage for granted, but when you step back and ask, &lt;em>&amp;ldquo;How do you actually manage data at exabyte scale?&amp;rdquo;&lt;/em>, the answers get fascinating.&lt;/p>
&lt;p>How do you design a system that reaches 15 nines of durability without bankrupting the company on hardware?&lt;/p>
&lt;p>I recently found an IBM Redpaper on their Cloud Object Storage (COS) that peels back the layers. It explains the engineering choices required to operate at that magnitude.&lt;/p></description></item></channel></rss>