pg_clickhouse & The Art of Pushdown: An Engineer's Deep Dive
Table of Contents
- The Core Problem
- What is pg_clickhouse and Why Does It Exist?
- The Default Behavior: Naive Federation
- Pushdown: The Mental Model
- How Postgres FDWs Actually Implement Pushdown
- The April 2026 Release: What Got Pushed Down
- The Other Direction: Streaming (Pushback)
- Why This Pattern Shows Up Everywhere
- Failure Modes & Edge Cases
- Core Principles Cheat Sheet
1. The Core Problem
When you split a query engine from its storage, the network becomes the new disk — and it's much slower than disk. Every byte you move across the wire is a byte you didn't need to move. Pushdown is how you avoid moving them.
The key insight: pushdown is not a ClickHouse thing or a Postgres thing — it's a fundamental requirement for any federated system. Spark SQL does it. Trino does it. BigQuery's external tables do it. Every distributed query engine eventually rediscovers this lesson.
2. What is pg_clickhouse and Why Does It Exist?
pg_clickhouse is a Postgres Foreign Data Wrapper (FDW) that lets a Postgres instance query ClickHouse tables as if they were native Postgres tables. The motivation is the classic OLTP + OLAP split:
The FDW collapses this into a single front door:
The application sees Postgres. Postgres handles the routing. ClickHouse stays specialized for what it's good at. From the setup in the blog:
CREATE SERVER ch FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(driver 'http');
CREATE USER MAPPING FOR CURRENT_USER SERVER ch;
CREATE EXTENSION pg_clickhouse;
CREATE SCHEMA customer;
IMPORT FOREIGN SCHEMA "default" FROM SERVER ch INTO customer;After this, customer.events looks like a Postgres table. The illusion is what makes it useful — and the illusion is what makes pushdown so critical to get right.
3. The Default Behavior: Naive Federation
This is what happens without pushdown. It's the trap every FDW falls into on day one.
The bug here isn't conceptual — Postgres is doing exactly what you asked. The bug is that the FDW didn't tell Postgres "I can do filters remotely." So the planner, conservatively, pulled everything and filtered locally.
This is especially painful with ClickHouse on the other end. ClickHouse is a world-class filtering engine — primary-key skip indices, sparse indexes, vectorized SIMD scans, columnar pruning. Bypassing all of that to filter in Postgres is like flying first-class to the airport and then walking to your destination.
4. Pushdown: The Mental Model
The fix is conceptually simple: before executing the foreign scan, look at the query plan and see what work the remote side can do. Translate as much of the WHERE/ORDER BY/LIMIT/aggregation as possible into the remote dialect, send it, and let the remote do the heavy lifting.
The pushdown taxonomy
Not all pushdowns are equal. There's a hierarchy of how aggressive an FDW can be:
The April 2026 pg_clickhouse release is heavily focused on Level 3 — pushing down JSONB accessors, date/time functions, and array functions. This is the level where most real-world apps live: complex WHERE clauses with function calls everywhere.
5. How Postgres FDWs Actually Implement Pushdown
This is the part most blog posts skip. Pushdown isn't magic — it's a specific dance between the Postgres planner and the FDW's callback functions.
The translation step
The hardest part of building an FDW is the expression walker — the code that walks the Postgres expression tree and decides, for each node, whether the remote system has an equivalent.
This is exactly what you see in the EXPLAIN VERBOSE output:
Remote SQL: SELECT id, event, props FROM "default".events
WHERE ((props.cid = 'C200'))
The Postgres planner is showing you the literal string it generated. That string is what travels to ClickHouse.
6. The April 2026 Release: What Got Pushed Down
The blog covers four major pushdown additions. Let's look at each from an engineering angle.
6a. JSONB accessor pushdown
This is the headline feature. Postgres's ->> operator (JSON path → text) gets translated to ClickHouse's sub-column syntax on the new JSON type.
The -> operator (returns JSONB, not text) is trickier — Postgres compares JSONB-to-JSONB, ClickHouse needs the sub-column wrapped in toJSONString() to match Postgres semantics:
-- Postgres input
WHERE props -> 'cid' = '"C300"'::jsonb
-- ClickHouse remote SQL
WHERE toJSONString(props.cid) = '"C300"'This is a critical detail: pushdown isn't just translation, it's semantic-preserving translation. If the wrapped expression doesn't return exactly what Postgres would have returned, the result is wrong — and that's worse than slow.
6b. Date/time function pushdown
The win here is more subtle but more important than it looks. Without pushdown, CURRENT_DATE evaluated on the Postgres side becomes a literal date before the query even leaves Postgres. So why does pushdown matter?
This is the cascade effect of pushdown: one un-pushable function in the middle of an expression tree can poison the whole branch. That's why a customer hit failures combining CURRENT_DATE with date_part() — the latter was pushable, but the former wasn't, so neither could go.
The fix in v0.2.0 was to push down all the "current"-style functions with proper timezone handling:
| Postgres function | ClickHouse equivalent |
|---|---|
CURRENT_DATE | toDate(now('America/New_York')) |
CURRENT_TIMESTAMP | now64(6, 'America/New_York') |
CURRENT_TIMESTAMP(3) | now64(3, 'America/New_York') |
clock_timestamp() | nowInBlock64(6, 'America/New_York') |
statement_timestamp() | nowInBlock64(6, 'America/New_York') |
transaction_timestamp() | nowInBlock64(6, 'America/New_York') |
The timezone is captured from the Postgres session — that's what makes the translation semantically correct, not just syntactically.
6c. Array function pushdown
The same translation pattern, applied to arrays:
The blog shows a particularly nice example — composing array pushdown with JSON pushdown:
WHERE string_to_array(jsonb_extract_path_text(props, 'address', 'city'), ', ')
= ARRAY['Portland', 'USA']becomes
WHERE splitByString(', ', props.address.city) = ['Portland','USA']Two layers of translation collapse cleanly into a single remote expression. This is what you want — pushdown that composes, not pushdown that breaks the moment you nest functions.
7. The Other Direction: Streaming (Pushback)
Pushdown solves the "send less data over the wire" problem. But what if the user legitimately asks for a huge result set — say, exporting a billion-row table for ML training?
The memory profile, visualized
The blog includes timings — here's what happens during a 40-second extract of the NYC taxi dataset:
Same query. Same result set. One holds the whole thing in memory and unwinds at the end. The other holds 50–86 MiB throughout.
Why batched streaming is the right design
50MB is a deliberate choice. Big enough to amortize per-batch overhead. Small enough that ten concurrent queries don't blow up the Postgres backend. This is the same trade-off Spark made with its task batch size, JDBC made with setFetchSize(), and Arrow Flight made with record batches.
8. Why This Pattern Shows Up Everywhere
Once you internalize the pushdown mental model, you start seeing it in every distributed system you touch.
The unifying principle: wherever you have a separation between "the thing that knows what's wanted" and "the thing that has the data," you want to push the want as close to the data as possible. This is true for query engines, APIs, file systems, networks, even CPU caches (data locality is just CPU-scale pushdown).
A case study: what changes when pushdown is missing
The numbers are illustrative but not exaggerated — selectivity of 1-in-a-million is normal for analytical queries. Pushdown isn't a 2x optimization; it's a 1000x or 1,000,000x optimization depending on the selectivity.
9. Failure Modes & Edge Cases
Pushdown is dangerous if you get it wrong. Here are the traps every FDW author eventually steps in:
The April 2026 release shows multiple instances of careful semantic preservation:
- The
->operator wraps the result intoJSONString()so JSONB-to-JSONB comparison works CURRENT_DATEcarries the Postgres session timezone explicitly into the ClickHouse expressionCURRENT_TIMESTAMPdefaults to precision 6 (Postgres default), and respectsCURRENT_TIMESTAMP(3)when the user specifies it
These aren't details — they're the difference between a pushdown you can trust and one that silently returns wrong answers.
10. Core Principles Cheat Sheet
TL;DR table
| Concept | One-line summary |
|---|---|
| Pushdown | Send the filter/transform to where the data lives, not the other way around. |
| Foreign Data Wrapper (FDW) | The Postgres extension API that lets one DB query another as if it were local. |
| pg_clickhouse | An FDW that connects Postgres to ClickHouse, with aggressive pushdown of WHERE/ORDER BY/HAVING. |
| EXPLAIN VERBOSE | Your friend. Shows the literal remote SQL the FDW will send. Use it to verify pushdown is working. |
| Cascade failure | When one un-pushable expression forces a whole predicate to be evaluated locally. The reason every function pushdown matters. |
| Semantic preservation | Pushdown is only safe if the translated expression returns exactly what the original would have. |
| Streaming / pushback | The other direction of the same problem — bound memory when results are legitimately large. |
| Universal pattern | Spark, Trino, BigQuery, DuckDB, GraphQL, ORMs — the pushdown idea shows up everywhere compute and data are separated. |
A final note from the engineer's chair
If you're building a system that sits in front of a fast backend — whether that's ClickHouse, BigQuery, Iceberg-on-S3, or a custom storage engine — pushdown is not a nice-to-have. It's the single biggest determinant of whether your federation is actually usable in production.
The pg_clickhouse team's "pushdown, pushdown, pushdown!" mantra is not a slogan; it's a recognition that every release of a federation tool is fundamentally an exercise in reducing the gap between what the user wrote and what runs natively on the fast side.
Every function you push down is one less reason for the user to drop the abstraction and go talk to ClickHouse directly. That's the whole game.
Created 5/8/2026, 3:06:52 AM · z0Q3Rm1H