pg_clickhouse & The Art of Pushdown: An Engineer's Deep Dive

Table of Contents

  1. The Core Problem
  2. What is pg_clickhouse and Why Does It Exist?
  3. The Default Behavior: Naive Federation
  4. Pushdown: The Mental Model
  5. How Postgres FDWs Actually Implement Pushdown
  6. The April 2026 Release: What Got Pushed Down
  7. The Other Direction: Streaming (Pushback)
  8. Why This Pattern Shows Up Everywhere
  9. Failure Modes & Edge Cases
  10. Core Principles Cheat Sheet

1. The Core Problem

When you split a query engine from its storage, the network becomes the new disk — and it's much slower than disk. Every byte you move across the wire is a byte you didn't need to move. Pushdown is how you avoid moving them.

Rendering diagram…

The key insight: pushdown is not a ClickHouse thing or a Postgres thing — it's a fundamental requirement for any federated system. Spark SQL does it. Trino does it. BigQuery's external tables do it. Every distributed query engine eventually rediscovers this lesson.


2. What is pg_clickhouse and Why Does It Exist?

pg_clickhouse is a Postgres Foreign Data Wrapper (FDW) that lets a Postgres instance query ClickHouse tables as if they were native Postgres tables. The motivation is the classic OLTP + OLAP split:

Rendering diagram…

The FDW collapses this into a single front door:

Rendering diagram…

The application sees Postgres. Postgres handles the routing. ClickHouse stays specialized for what it's good at. From the setup in the blog:

SQL
CREATE SERVER ch FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(driver 'http'); CREATE USER MAPPING FOR CURRENT_USER SERVER ch; CREATE EXTENSION pg_clickhouse; CREATE SCHEMA customer; IMPORT FOREIGN SCHEMA "default" FROM SERVER ch INTO customer;

After this, customer.events looks like a Postgres table. The illusion is what makes it useful — and the illusion is what makes pushdown so critical to get right.


3. The Default Behavior: Naive Federation

This is what happens without pushdown. It's the trap every FDW falls into on day one.

Rendering diagram…

The bug here isn't conceptual — Postgres is doing exactly what you asked. The bug is that the FDW didn't tell Postgres "I can do filters remotely." So the planner, conservatively, pulled everything and filtered locally.

This is especially painful with ClickHouse on the other end. ClickHouse is a world-class filtering engine — primary-key skip indices, sparse indexes, vectorized SIMD scans, columnar pruning. Bypassing all of that to filter in Postgres is like flying first-class to the airport and then walking to your destination.


4. Pushdown: The Mental Model

The fix is conceptually simple: before executing the foreign scan, look at the query plan and see what work the remote side can do. Translate as much of the WHERE/ORDER BY/LIMIT/aggregation as possible into the remote dialect, send it, and let the remote do the heavy lifting.

Rendering diagram…

The pushdown taxonomy

Not all pushdowns are equal. There's a hierarchy of how aggressive an FDW can be:

Rendering diagram…

The April 2026 pg_clickhouse release is heavily focused on Level 3 — pushing down JSONB accessors, date/time functions, and array functions. This is the level where most real-world apps live: complex WHERE clauses with function calls everywhere.


5. How Postgres FDWs Actually Implement Pushdown

This is the part most blog posts skip. Pushdown isn't magic — it's a specific dance between the Postgres planner and the FDW's callback functions.

Rendering diagram…

The translation step

The hardest part of building an FDW is the expression walker — the code that walks the Postgres expression tree and decides, for each node, whether the remote system has an equivalent.

Rendering diagram…

This is exactly what you see in the EXPLAIN VERBOSE output:

Remote SQL: SELECT id, event, props FROM "default".events
            WHERE ((props.cid = 'C200'))

The Postgres planner is showing you the literal string it generated. That string is what travels to ClickHouse.


6. The April 2026 Release: What Got Pushed Down

The blog covers four major pushdown additions. Let's look at each from an engineering angle.

Rendering diagram…

6a. JSONB accessor pushdown

This is the headline feature. Postgres's ->> operator (JSON path → text) gets translated to ClickHouse's sub-column syntax on the new JSON type.

Rendering diagram…

The -> operator (returns JSONB, not text) is trickier — Postgres compares JSONB-to-JSONB, ClickHouse needs the sub-column wrapped in toJSONString() to match Postgres semantics:

SQL
-- Postgres input WHERE props -> 'cid' = '"C300"'::jsonb -- ClickHouse remote SQL WHERE toJSONString(props.cid) = '"C300"'

This is a critical detail: pushdown isn't just translation, it's semantic-preserving translation. If the wrapped expression doesn't return exactly what Postgres would have returned, the result is wrong — and that's worse than slow.

6b. Date/time function pushdown

The win here is more subtle but more important than it looks. Without pushdown, CURRENT_DATE evaluated on the Postgres side becomes a literal date before the query even leaves Postgres. So why does pushdown matter?

Rendering diagram…

This is the cascade effect of pushdown: one un-pushable function in the middle of an expression tree can poison the whole branch. That's why a customer hit failures combining CURRENT_DATE with date_part() — the latter was pushable, but the former wasn't, so neither could go.

The fix in v0.2.0 was to push down all the "current"-style functions with proper timezone handling:

Postgres functionClickHouse equivalent
CURRENT_DATEtoDate(now('America/New_York'))
CURRENT_TIMESTAMPnow64(6, 'America/New_York')
CURRENT_TIMESTAMP(3)now64(3, 'America/New_York')
clock_timestamp()nowInBlock64(6, 'America/New_York')
statement_timestamp()nowInBlock64(6, 'America/New_York')
transaction_timestamp()nowInBlock64(6, 'America/New_York')

The timezone is captured from the Postgres session — that's what makes the translation semantically correct, not just syntactically.

6c. Array function pushdown

The same translation pattern, applied to arrays:

Rendering diagram…

The blog shows a particularly nice example — composing array pushdown with JSON pushdown:

SQL
WHERE string_to_array(jsonb_extract_path_text(props, 'address', 'city'), ', ') = ARRAY['Portland', 'USA']

becomes

SQL
WHERE splitByString(', ', props.address.city) = ['Portland','USA']

Two layers of translation collapse cleanly into a single remote expression. This is what you want — pushdown that composes, not pushdown that breaks the moment you nest functions.


7. The Other Direction: Streaming (Pushback)

Pushdown solves the "send less data over the wire" problem. But what if the user legitimately asks for a huge result set — say, exporting a billion-row table for ML training?

Rendering diagram…

The memory profile, visualized

The blog includes timings — here's what happens during a 40-second extract of the NYC taxi dataset:

Rendering diagram…

Same query. Same result set. One holds the whole thing in memory and unwinds at the end. The other holds 50–86 MiB throughout.

Why batched streaming is the right design

Rendering diagram…

50MB is a deliberate choice. Big enough to amortize per-batch overhead. Small enough that ten concurrent queries don't blow up the Postgres backend. This is the same trade-off Spark made with its task batch size, JDBC made with setFetchSize(), and Arrow Flight made with record batches.


8. Why This Pattern Shows Up Everywhere

Once you internalize the pushdown mental model, you start seeing it in every distributed system you touch.

Rendering diagram…

The unifying principle: wherever you have a separation between "the thing that knows what's wanted" and "the thing that has the data," you want to push the want as close to the data as possible. This is true for query engines, APIs, file systems, networks, even CPU caches (data locality is just CPU-scale pushdown).

A case study: what changes when pushdown is missing

Rendering diagram…

The numbers are illustrative but not exaggerated — selectivity of 1-in-a-million is normal for analytical queries. Pushdown isn't a 2x optimization; it's a 1000x or 1,000,000x optimization depending on the selectivity.


9. Failure Modes & Edge Cases

Pushdown is dangerous if you get it wrong. Here are the traps every FDW author eventually steps in:

Rendering diagram…

The April 2026 release shows multiple instances of careful semantic preservation:

  • The -> operator wraps the result in toJSONString() so JSONB-to-JSONB comparison works
  • CURRENT_DATE carries the Postgres session timezone explicitly into the ClickHouse expression
  • CURRENT_TIMESTAMP defaults to precision 6 (Postgres default), and respects CURRENT_TIMESTAMP(3) when the user specifies it

These aren't details — they're the difference between a pushdown you can trust and one that silently returns wrong answers.


10. Core Principles Cheat Sheet

Rendering diagram…

TL;DR table

ConceptOne-line summary
PushdownSend the filter/transform to where the data lives, not the other way around.
Foreign Data Wrapper (FDW)The Postgres extension API that lets one DB query another as if it were local.
pg_clickhouseAn FDW that connects Postgres to ClickHouse, with aggressive pushdown of WHERE/ORDER BY/HAVING.
EXPLAIN VERBOSEYour friend. Shows the literal remote SQL the FDW will send. Use it to verify pushdown is working.
Cascade failureWhen one un-pushable expression forces a whole predicate to be evaluated locally. The reason every function pushdown matters.
Semantic preservationPushdown is only safe if the translated expression returns exactly what the original would have.
Streaming / pushbackThe other direction of the same problem — bound memory when results are legitimately large.
Universal patternSpark, Trino, BigQuery, DuckDB, GraphQL, ORMs — the pushdown idea shows up everywhere compute and data are separated.

A final note from the engineer's chair

If you're building a system that sits in front of a fast backend — whether that's ClickHouse, BigQuery, Iceberg-on-S3, or a custom storage engine — pushdown is not a nice-to-have. It's the single biggest determinant of whether your federation is actually usable in production.

The pg_clickhouse team's "pushdown, pushdown, pushdown!" mantra is not a slogan; it's a recognition that every release of a federation tool is fundamentally an exercise in reducing the gap between what the user wrote and what runs natively on the fast side.

Every function you push down is one less reason for the user to drop the abstraction and go talk to ClickHouse directly. That's the whole game.

Created 5/8/2026, 3:06:52 AM · z0Q3Rm1H

This document is protected with AES-256-GCM encryption at rest