View Single Post

   
  #2 (permalink)  
Old 04-19-2008, 09:26 AM
Kevin Grittner
 
Posts: n/a
Default Re: [HACKERS] EXISTS optimization



>>> On Fri, Mar 23, 2007 at 4:49 PM, in message <25339.1174686582@sss.pgh.pa.us>,

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> explain analyze
>> SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date",

> "H"."userId", "H"."time"
>> FROM "Adjustment" "A"
>> JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND

> "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
>> WHERE "H"."tranType" = 'A'
>> AND "A"."date" > DATE '2006- 01- 01'
>> AND "H"."countyNo" = 66
>> AND "A"."countyNo" = 66
>> AND EXISTS
>> (
>> SELECT 1 FROM "TranDetail" "D"
>> WHERE "D"."tranNo" = "H"."tranNo"
>> AND "D"."countyNo" = "H"."countyNo"
>> AND "D"."caseNo" LIKE '2006TR%'
>> )
>> ;

>
>> The commercial product scans the index on caseNo in TranDetail to build a

> work table of unique values, then uses indexed access to the TranHeader and
> then to Adjustment.
>
> If you want that, try rewriting the EXISTS to an IN:
>
> AND ("H"."tranNo", "H"."countyNo") IN
> (
> SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
> WHERE "D"."caseNo" LIKE '2006TR%'
> )


Nice. I get this:

explain analyze
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
FROM "Adjustment" "A"
JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
WHERE "H"."tranType" = 'A'
AND "A"."date" > DATE '2006- 01- 01'
AND "H"."countyNo" = 66
AND "A"."countyNo" = 66
AND ("H"."tranNo", "H"."countyNo") IN
(
SELECT "D"."tranNo", "D"."countyNo" FROM "TranDetail" "D"
WHERE "D"."caseNo" LIKE '2006TR%'
)
;

Nested Loop (cost=27.76..36.38 rows=1 width=46) (actual time=92.999..200.398 rows=2209 loops=1)
Join Filter: (("H"."tranNo")::integer = ("A"."tranNo")::integer)
-> Nested Loop (cost=27.76..32.08 rows=1 width=50) (actual time=92.970..176.472 rows=2209 loops=1)
-> HashAggregate (cost=27.76..27.77 rows=1 width=6) (actual time=92.765..100.810 rows=9788 loops=1)
-> Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D" (cost=0.00..27.66 rows=20 width=6) (actual time=0.059..60.967 rows=46301 loops=1)
Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint = 66))
Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)
-> Index Scan using "TranHeader_pkey" on "TranHeader" "H" (cost=0.00..4.30 rows=1 width=46) (actual time=0.006..0.006 rows=0 loops=9788)
Index Cond: ((("H"."tranNo")::integer = ("D"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
Filter: (("tranType")::bpchar = 'A'::bpchar)
-> Index Scan using "Adjustment_pkey" on "Adjustment" "A" (cost=0.00..4.28 rows=1 width=22) (actual time=0.008..0.009 rows=1 loops=2209)
Index Cond: ((("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar) AND (("A"."countyNo")::smallint = 66))
Filter: ((date)::date > '2006-01-01'::date)
Total runtime: 201.306 ms

That's the good news. The bad news is that I operate under a management portability dictate which doesn't currently allow that syntax, since not all of the products they want to cover support it. I tried something which seems equivalent, but it is running for a very long time. I'll show it with just the explain while I wait to see how long the explain analyze takes.

explain
SELECT "A"."adjustmentNo", "A"."tranNo", "A"."countyNo", "H"."date", "H"."userId", "H"."time"
FROM "Adjustment" "A"
JOIN "TranHeader" "H" ON ("H"."tranId" = "A"."adjustmentNo" AND "H"."countyNo" = "A"."countyNo" AND "H"."tranNo" = "A"."tranNo")
WHERE "H"."tranType" = 'A'
AND "A"."date" > DATE '2006- 01- 01'
AND "H"."countyNo" = 66
AND "A"."countyNo" = 66
AND "H"."tranNo" IN
(
SELECT "D"."tranNo" FROM "TranDetail" "D"
WHERE "D"."caseNo" LIKE '2006TR%'
AND "D"."countyNo" = "H"."countyNo"
)
;

Nested Loop (cost=0.00..181673.08 rows=1 width=46)
Join Filter: (("H"."tranId")::bpchar = ("A"."adjustmentNo")::bpchar)
-> Seq Scan on "Adjustment" "A" (cost=0.00..2384.27 rows=11733 width=22)
Filter: (((date)::date > '2006-01-01'::date) AND (("countyNo")::smallint = 66))
-> Index Scan using "TranHeader_pkey" on "TranHeader" "H" (cost=0.00..15.27 rows=1 width=46)
Index Cond: ((("H"."tranNo")::integer = ("A"."tranNo")::integer) AND (("H"."countyNo")::smallint = 66))
Filter: ((("tranType")::bpchar = 'A'::bpchar) AND (subplan))
SubPlan
-> Index Scan using "TranDetail_TranDetCaseNo" on "TranDetail" "D" (cost=0.00..27.66 rows=20 width=4)
Index Cond: ((("caseNo")::bpchar >= '2006TR'::bpchar) AND (("caseNo")::bpchar < '2006TS'::bpchar) AND (("countyNo")::smallint = ($0)::smallint))
Filter: (("caseNo")::bpchar ~~ '2006TR%'::text)

> We don't currently try to flatten EXISTS into a unique/join plan as we
> do for IN. I seem to recall not doing so when I rewrote IN planning
> because I didn't think it would be exactly semantically equivalent,
> but that was awhile ago. Right at the moment it seems like it ought
> to be equivalent as long as the comparison operators are strict.


There are a great many situations where they are exactly semantically equivalent. In fact, the commercial database product usually generates an identical plan. I could try to work out (or better yet find) a formal description of when that equivalence holds, if someone would be up for implementing it. Barring that, I could see if management would approve some time for me to look at submitting a patch, but I haven't looked at the code involved, so I have no idea of the scale of effort involved yet.

-Kevin



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply With Quote