Four NOT EXISTS and a COUNT: rewriting a query to run in linear time
Someone handed me a query that ran for the better part of an hour to return a single number, and asked why. It counts the distinct results of a four-way chain join, each row filtered by four NOT EXISTS subqueries:
SELECT COUNT(*) AS q_full_size
FROM (
SELECT DISTINCT
R1.x1, R1.x2, R2.x3, R3.x4, R4.x5
FROM R1
JOIN R2 ON R1.x2 = R2.x2
JOIN R3 ON R2.x3 = R3.x3
JOIN R4 ON R3.x4 = R4.x4
WHERE
NOT EXISTS (SELECT 1 FROM S1 WHERE S1.x1 = R1.x1 AND S1.x2 = R1.x2 AND S1.x3 = R2.x3)
AND NOT EXISTS (SELECT 1 FROM S2 WHERE S2.x2 = R2.x2 AND S2.x3 = R2.x3 AND S2.x4 = R3.x4)
AND NOT EXISTS (SELECT 1 FROM S3 WHERE S3.x2 = R2.x2 AND S3.x3 = R2.x3 AND S3.x4 = R3.x4 AND S3.x5 = R4.x5)
AND NOT EXISTS (SELECT 1 FROM S4 WHERE S4.x3 = R3.x3 AND S4.x4 = R3.x4 AND S4.x5 = R4.x5)
) AS full_results;
The interesting part is not that it was slow. It’s that this query is, almost verbatim, Example 3.3 of a paper called Computing Conjunctive Queries with Negations Efficiently, and the paper tells you how to make it run in time linear in the input plus the answer.
Why the obvious plan is expensive
Read the plan an optimizer naturally picks. First build the four-way join R1 ⋈ R2 ⋈ R3 ⋈ R4. Then, for each surviving row, probe S1 through S4 to drop the forbidden ones. Then DISTINCT. Then count.
The cost is dominated by the first step. That intermediate join can be far larger than the final count, because every dangling tuple that will eventually be killed by a negation still gets built, materialized, and carried through the pipeline before anyone checks whether it belongs. You pay for rows you immediately throw away. Call the input size \(N\) and the number of answers \(\mathrm{OUT}\):
\[N \;=\; \sum_{e \in \mathcal{E}^{+}} |R_e| \;+\; \sum_{e \in \mathcal{E}^{-}} |S_e|, \qquad \mathrm{OUT} \;=\; |Q(D)|.\]How big can that intermediate get? The query is acyclic, so it is polynomial, not exponential. The honest worst case for a path of four binary relations follows from the AGM bound: the join size is at most \(N^{\rho^{*}}\), where \(\rho^{*}\) is the fractional edge-cover number, and for a path that number is \(2\). So the naive plan is quadratic, \(O(N^2)\), while the rewrite below is linear. And here \(\mathrm{OUT}\) is the number we want — the outer COUNT(*) over a DISTINCT projection. We are building a quadratic intermediate object solely to count what survives it.
The structural insight: this is a chain Signed CQ
Strip the SQL down to relational algebra. Writing \(\bowtie\) for natural join and \(\triangleright\) for anti-join (keep left rows with no match on the right), the query denotes:
\[Q \;=\; R_1(x_1,x_2)\bowtie R_2(x_2,x_3)\bowtie R_3(x_3,x_4)\bowtie R_4(x_4,x_5)\;\triangleright\;S_1(x_1,x_2,x_3)\;\triangleright\;S_2(x_2,x_3,x_4)\;\triangleright\;S_3(x_2,x_3,x_4,x_5)\;\triangleright\;S_4(x_3,x_4,x_5).\]This is a Signed Conjunctive Query: positive relations joined, negative relations anti-joined. Model it as a hypergraph on the variables \(\mathcal{V}=\{x_1,\dots,x_5\}\). The positive edges \(\mathcal{E}^{+}=\{R_1,R_2,R_3,R_4\}\) are binary and form a path:
\[x_1 - x_2 - x_3 - x_4 - x_5.\]Now look at where each negation lives on that path. \(S_1\) covers \(\{x_1,x_2,x_3\}\), \(S_2\) covers \(\{x_2,x_3,x_4\}\), \(S_3\) covers \(\{x_2,x_3,x_4,x_5\}\), \(S_4\) covers \(\{x_3,x_4,x_5\}\). Every negation spans a contiguous range of the chain. That contiguity is the property the algorithm exploits.
A query is signed-acyclic (Def 2.4) when \((\mathcal{V},\,\mathcal{E}^{+}\cup S)\) is \(\alpha\)-acyclic for every subset \(S\subseteq\mathcal{E}^{-}\) of the negations. A path is acyclic, and adding a contiguous-range hyperedge to a path keeps it acyclic, so any combination of these negations preserves acyclicity. The query is signed-acyclic. By Lemma 2.6, signed-acyclic is exactly the class that is quasi-linear-time solvable (quasi-linear meaning \(O(N\log N)\), i.e. linear up to log factors) — the negation analog of Yannakakis’s theorem that acyclic CQs are quasi-linear (Lemma 2.2). And because the positive part is a path and each negation is a contiguous range, it is the well-behaved special case the paper calls a chain SCQ, which sharpens the quasi-linear bound to a clean \(O(N + \mathrm{OUT})\) relational algorithm (Theorem 3.5).
The rewrite recipe
Four steps turn the structure into a plan.
Step 1 — Reduce (Algorithm 1). Fold away the easy negations: any NOT EXISTS whose columns all sit inside a single base relation becomes a direct anti-join on that relation, and two negations with the same schema get UNIONed. For this query nothing simplifies — no negation fits inside one binary positive edge — but it matters in general, and Reduce provably preserves both the answers and signed-acyclicity (Lemmas 2.7, 2.8).
Step 2 — Yannakakis full reducer. Run a forward then backward semijoin pass over the positive relations only — Yannakakis’s classic full reducer: two semijoin passes over an acyclic join tree remove every dangling tuple, so every surviving positive tuple participates in a full join of the positive part:
pseudocode, not runnable; ⋉ is semijoin
-- forward, toward x5: R2 := R2 ⋉ R1; R3 := R3 ⋉ R2; R4 := R4 ⋉ R3;
-- backward, toward x1: R3 := R3 ⋉ R4; R2 := R2 ⋉ R3; R1 := R1 ⋉ R2;
This passes over \(\mathcal{E}^{+}\) alone, so it establishes consistency with respect to the positive joins, not the whole query — a tuple it keeps can still be killed later by a negation. It preserves \(Q(D)\) regardless: a positive tuple deleted as dangling cannot appear in any full join of the positive part, so it could never have survived the anti-joins either, and anti-joins only ever remove tuples. The pass is cheap and it shrinks everything downstream.
Step 3 — Handle the negations at a covering node. An anti-join \(S_e\) is only legal once all of its attributes are available somewhere. A negation whose schema is covered by an already-available attribute set can simply be pushed there as an early anti-join, before the full join. But every negation in this query spans more than one positive edge — none fits inside a single binary \(R\) — so none can be pushed as a free-standing anti-join. They are instead consumed by the marginalization step below, which is exactly why Step 1 found nothing to simplify.
Step 4 — Marginalize endpoints by counting. This counting step is what avoids materializing the join. By the full-join decomposition (Lemma 2.9), with \(U\) the unique attributes,
\[Q(D) \;=\; \Join_{x\in U}\; \pi_{\mathcal{V}-\{x\}}\,Q(D).\]So instead of materializing the join we peel off one endpoint at a time. The endpoints \(x_1\) and \(x_5\) are signed-leaves (Lemma 3.1), and removing one leaves a smaller chain SCQ (Lemma 3.2). ChainProject (Algorithm 3) eliminates an endpoint not by enumerating its values but by counting them.
Walking it through on this query
A note on semantics before the arithmetic: all relations are sets, duplicate-free. Every count below is over distinct extensions — which is why the S-plus CTEs use SELECT DISTINCT and why COUNT(*) FROM R1 GROUP BY x2 correctly counts distinct \(x_1\) values. The full-join decomposition reconstructs the distinct answer set, so the propagated count equals COUNT(DISTINCT ...) \(= \mathrm{OUT}\) exactly. With bag semantics the count would be wrong.
Peel \(x_1\). The only positive edge touching \(x_1\) is \(R_1\), and the only negation mentioning it is \(S_1\). Count, per residual tuple, how many values of \(x_1\) exist versus how many are forbidden:
WITH
R1plus AS (SELECT x2, COUNT(*) AS count0 FROM R1 GROUP BY x2),
S1plus AS (SELECT x2, x3, COUNT(*) AS count1
FROM (SELECT DISTINCT S1.x1, S1.x2, S1.x3
FROM S1 JOIN R1 ON S1.x1=R1.x1 AND S1.x2=R1.x2) t
GROUP BY x2, x3),
-- exclude a residual tuple iff EVERY x1 is killed: forbidden == total
S1prime AS (SELECT p.x2, s.x3
FROM R1plus p JOIN S1plus s ON p.x2 = s.x2
WHERE s.count1 = p.count0)
-- residual (⋈ join, ▷ anti-join in the comment below):
-- Q1 = R2 ⋈ R3 ⋈ R4 ▷ S1prime(x2,x3) ▷ S2 ▷ S3 ▷ S4
The count-equality predicate carries the work. A (x2,x3) combination is dropped exactly when the number of \(x_1\) values forbidden by \(S_1\) equals the total number of \(x_1\) extensions — meaning no value of \(x_1\) survives, so the tuple contributes nothing. In symbols, with negations \(f_1\subseteq\cdots\subseteq f_\ell\) on edge \(e_x\):
We learned everything about \(x_1\) without ever listing its values. Note that S1prime has schema \(\{x_2,x_3\}\), which sits inside \(R_2\)’s edge \(\{x_2,x_3\}\), so the next Reduce pass folds it straight into \(R_2\) (R2 := R2 ⋈ S1prime) and it disappears from \(\mathcal{E}^{-}\) — matching the paper’s reduced \(Q_1 = R_2 \bowtie R_3 \bowtie R_4 \triangleright S_2 \triangleright S_3 \triangleright S_4\).
Peel \(x_5\). Symmetric, but now two negations mention the endpoint, nested \(S_4\subseteq S_3\). The displayed formula uses a disjoint union \(\biguplus\) over \(f_1\subseteq\cdots\subseteq f_\ell\), and the reason it is disjoint is a non-obvious lemma: the per-negation forbidden \(x\)-value sets are pairwise disjoint (Lemma 3.4, second observation, \(\pi_x(S^{\oplus}_{f_i}\ltimes\cdots)\cap\pi_x(S^{\oplus}_{f_j}\ltimes\cdots)=\varnothing\) for \(i\neq j\)). That disjointness is exactly what lets us add counts instead of doing inclusion-exclusion. To make the partition watertight, count2 must count the \(x_5\) values \(S_3\) forbids that \(S_4\) does not already forbid:
WITH
R4plus AS (SELECT x4, COUNT(*) AS count0 FROM R4 GROUP BY x4),
-- forbidden-by-S4 x5 values, per (x3,x4)
S4fb AS (SELECT DISTINCT S4.x3, S4.x4, S4.x5
FROM S4 JOIN R4 ON S4.x4=R4.x4 AND S4.x5=R4.x5),
S4plus AS (SELECT x3, x4, COUNT(*) AS count1 FROM S4fb GROUP BY x3, x4),
-- forbidden-by-S3 x5 values, per (x2,x3,x4)
S3fb AS (SELECT DISTINCT S3.x2, S3.x3, S3.x4, S3.x5
FROM S3 JOIN R4 ON S3.x4=R4.x4 AND S3.x5=R4.x5),
-- count2 = S3-forbidden x5 NOT already forbidden by S4 (keeps the partition disjoint)
S3plus AS (SELECT a.x2, a.x3, a.x4, COUNT(*) AS count2
FROM S3fb a
WHERE NOT EXISTS (SELECT 1 FROM S4fb b
WHERE b.x3=a.x3 AND b.x4=a.x4 AND b.x5=a.x5)
GROUP BY a.x2, a.x3, a.x4),
S4prime AS (SELECT p.x4, s.x3 FROM R4plus p JOIN S4plus s ON p.x4=s.x4
WHERE s.count1 = p.count0),
S3prime AS (SELECT s3.x2, s3.x3, s3.x4
FROM R4plus p JOIN S4plus s4 ON p.x4=s4.x4
JOIN S3plus s3 ON s3.x4=s4.x4 AND s3.x3=s4.x3
WHERE (s4.count1 + s3.count2) = p.count0)
-- residual: Q2 = R1 ⋈ R2 ⋈ R3 ▷ S1 ▷ S2 ▷ S3prime(x2,x3,x4) ▷ S4prime(x3,x4)
Because \(S_4\subseteq S_3\), every \(x_5\) value forbidden via \(S_4\) already lies inside \(S_3\)’s range, so each forbidden \(x_5\) can be attributed to exactly one negation. With count2 excluding the values \(S_4\) already caught, the two forbidden sets partition the forbidden \(x_5\) values and their cardinalities add: count1 + count2 is the true size of the union, not a multiset over-count. That is why the disjoint union \(\biguplus\) in the formula becomes a plain + in SQL. (Sum the two raw counts without the NOT EXISTS filter and any \(x_5\) forbidden by both is double-counted, the equality trips early, and you drop residual tuples that actually had a surviving \(x_5\) — undercounting \(\mathrm{OUT}\).)
Both residuals \(Q_1\) and \(Q_2\) are smaller chain SCQs. Recurse on each down to the base case of a single positive relation, then natural-join the two recursive results to reassemble the answer — that’s Lemma 2.9 in action, and ChainProject is correct by Lemma 3.4: \(Q_{\bar x}(D_{\bar x}) = \pi_{\mathcal{V}-\{x\}}\,Q(D)\).
What this buys you, especially for COUNT
The paper guarantees \(O(N + \mathrm{OUT})\) for computing \(Q(D)\) (Theorem 3.5), and it is honest about the \(\mathrm{OUT}\) term: the recursion’s base case returns a relation, and its final step natural-joins the two recursive results into an output of size \(\mathrm{OUT}\), bounded by \(O(\mathrm{OUT})\). The algorithm materializes \(\mathrm{OUT}\)-sized residuals; the counting inside ChainProject is only over an eliminated endpoint’s extensions, not a running total of the global answer, so it does not, by itself, collapse that bound.
But the user never asked for the rows — they asked COUNT(*), which opens a specialization. In principle one could carry counts instead of tuples through the recursion: each ChainProject is already a constant number of GROUP BY/semijoin passes linear in \(N\), so if every step propagates a count of surviving distinct tuples, nothing is ever enumerated and the \(\mathrm{OUT}\) term has nowhere to come from, leaving \(O(N)\). To be clear, this \(O(N)\) counting variant is my extrapolation, not something Theorem 3.5 states — the paper’s algorithm still materializes \(\mathrm{OUT}\)-sized residuals and its stated bound is \(O(N + \mathrm{OUT})\). Read it as the counting-DP specialization I would reach for, not a property of the cited method.
The catch is that no SQL optimizer recognizes signed-acyclicity on its own. It sees four NOT EXISTS clauses, picks a join order blind to the chain structure, and applies the negations late. You get the linear-time behavior only by writing the plan out: semijoin reduction over the positive part, then the count-equality marginalization expressed as GROUP BY ... COUNT(*) CTEs with count-equality predicates. It is more SQL to read, but it is SQL whose cost you can actually bound.
The broader lesson is one I keep relearning: the negations weren’t the problem, and neither was the join. The problem was order — checking forbidden-ness only after paying to build everything. Push the negation into a count, and the question “does any extension survive?” gets answered with arithmetic instead of enumeration. Next time a NOT EXISTS query crawls, the first thing I’ll do is draw the hypergraph and check whether every negation lands on a contiguous piece of an acyclic skeleton. If it does, there’s a linear-time plan hiding in there, waiting to be written by hand.