vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I've requested this before without response, but I'm asking again because it just caused me pain again: could we get a TODO added to have the planner recognize equivalent IN and EXISTS constructs and have them compete on cost estimates? I know it's not a trivial improvement, but if it's on the list maybe someone will pick it up, and I see it as the single biggest weakness in PostgreSQL performance. I don't need help resolving this particular case, because the fix is always blinding obvious when we hit this, and it doesn't even break portability because no other database we've tested fails to recognize these equivalent cases. step=# explain DELETE FROM "Body" WHERE "bodySeqNo" NOT IN (SELECT "bodySeqNo" FROM "Message"); QUERY PLAN ---------------------------------------------------------------------------------- Seq Scan on "Body" (cost=90277.43..285235351699.39 rows=3313379 width=6) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=90277.43..159793.40 rows=6627957 width=11) -> Seq Scan on "Message" (cost=0.00..80413.07 rows=6627957width=11) (5 rows) step=# explain DELETE FROM "Body" WHERE NOT EXISTS (SELECT * FROM "Message" m WHERE m."bodySeqNo" = "Body"."bodySeqNo"); QUERY PLAN -------------------------------------------------------------------------------------------- Seq Scan on "Body" (cost=0.00..3401760.88 rows=3313416 width=6) Filter: (NOT (subplan)) SubPlan -> Index Scan using "Message_Body" on "Message" m (cost=0.00..0.49rows=1 width=136) Index Cond: (("bodySeqNo")::numeric = ($0)::numeric) (5 rows) The bodySeqNo column is NOT NULL in both tables, and is the primary key in the Body table. The Message table has a non-unique index on it. (\d lists will follow at the bottom.) I cancelled the first query after it had been running for 54 hours over our slowest hours (the weekend). The second form ran in four minutes in competition with peak time queries. -Kevin step=# \d "Body" Table "public.Body" Column | Type | Modifiers -------------+------------------------+----------- bodySeqNo | "SequenceT" | not null contentType | character varying(255) | not null encoding | character varying(255) | body | "BodyT" | Indexes: "Body_pkey" PRIMARY KEY, btree ("bodySeqNo") step=# \d "Message" Table "public.Message" Column | Type | Modifiers -----------------+--------------------------+----------- messageId | "SequenceT" | not null clientMessageId | "ClientMessageIdT" | not null correlationId | "SequenceT" | destQueue | "QueueNameT" | not null replyToQueue | "QueueNameT" | not null typeCode | character(2) | expiration | timestamp with time zone | priority | smallint | not null status | character(2) | not null created | timestamp with time zone | not null lastModified | timestamp with time zone | not null bodySeqNo | "SequenceT" | not null messageIdSearch | "PrioritySequenceT" | not null Indexes: "Message_pkey" PRIMARY KEY, btree ("messageId") "MessageIndex2" UNIQUE, btree ("destQueue", "clientMessageId") "Message_MessageIdSearch" UNIQUE, btree ("destQueue", status, "messageIdSearch") CLUSTER "Message_Body" btree ("bodySeqNo") "Message_Created" btree ("destQueue", status, created) "Message_Created2" btree ("destQueue", created) "Message_Expiration" btree (expiration) "Message_LastModified" btree ("destQueue", "lastModified") "Message_ReplyToQueue" btree ("replyToQueue") Foreign-key constraints: "Message_fk1" FOREIGN KEY ("destQueue") REFERENCES "Queue"(name) "Message_fk2" FOREIGN KEY ("replyToQueue") REFERENCES "Queue"(name) ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote: > I've requested this before without response, but I'm asking again > because it just caused me pain again: could we get a TODO added to > have the planner recognize equivalent IN and EXISTS constructs and > have them compete on cost estimates? I know it's not a trivial > improvement, but if it's on the list maybe someone will pick it up, > and I see it as the single biggest weakness in PostgreSQL > performance. I'll pick it up as a default unless someone requests they have it from me. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| >>> On Mon, Oct 22, 2007 at 1:30 PM, in message <1193077831.4319.61.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com> wrote: > On Mon, 2007-10-22 at 09:31 -0500, Kevin Grittner wrote: >> I've requested this before without response, but I'm asking again >> because it just caused me pain again: could we get a TODO added to >> have the planner recognize equivalent IN and EXISTS constructs and >> have them compete on cost estimates? I know it's not a trivial >> improvement, but if it's on the list maybe someone will pick it up, >> and I see it as the single biggest weakness in PostgreSQL >> performance. > > I'll pick it up as a default unless someone requests they have it from > me. Thanks, Simon. One more logically equivalent, PostgreSQL-specific form which costs out even better was suggested off-list: step=# explain DELETE FROM "Body" USING "Message" WHERE "Message"."bodySeqNo" = "Body"."bodySeqNo"; QUERY PLAN -------------------------------------------------------------------------------------------------- Merge Join (cost=0.00..696766.20 rows=4048543 width=6) Merge Cond: (("Body"."bodySeqNo")::numeric = ("Message"."bodySeqNo")::numeric) -> Index Scan using "Body_pkey" on "Body" (cost=0.00..326108.11 rows=4048543 width=18) -> Index Scan using "Message_Body" on "Message" (cost=0.00..310085.16 rows=4048847 width=12) (4 rows) If both of the other syntaxes could compete against that, it would be fantastic. (If that's feasible.) -Kevin ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| >>> On Mon, Oct 22, 2007 at 4:37 PM, in message <471CD1BE.EE98.0025.0@wicourts.gov>, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > One more logically equivalent, PostgreSQL-specific form which > costs out even better was suggested off-list: Oops. That is not logically equivalent. We want to delete WHERE NOT EXISTS; the logic of that suggestion is backwards. Disregard that last post, please. -Kevin ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| 2007/10/23, Kevin Grittner <Kevin.Grittner@wicourts.gov>: > >>> On Mon, Oct 22, 2007 at 4:37 PM, in message > <471CD1BE.EE98.0025.0@wicourts.gov>, "Kevin Grittner" > <Kevin.Grittner@wicourts.gov> wrote: > > > One more logically equivalent, PostgreSQL-specific form which > > costs out even better was suggested off-list: > > Oops. That is not logically equivalent. We want to delete WHERE NOT > EXISTS; the logic of that suggestion is backwards. > > Disregard that last post, please. > > -Kevin > > my mistake, sorry Pavel ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| >>> On Mon, Oct 22, 2007 at 5:04 PM, in message <471CD819.EE98.0025.0@wicourts.gov>, "Kevin Grittner" > Oops. That is not logically equivalent. We want to delete WHERE NOT > EXISTS; the logic of that suggestion is backwards. > > Disregard that last post, please. Maybe that last post shouldn't be totally disregarded -- it wouldn't be a bad idea to support a Merge NOT IN Join if it the effort isn't out of line with the benefit. Pavel suggested a clever kludge to accomplish this, which costs out better than anything else I've tried: step=# explain DELETE FROM "Body" step-# WHERE "bodySeqNo" IN (SELECT "Body"."bodySeqNo" step(# FROM "Body" step(# LEFT JOIN "Message" step(# ON "Body"."bodySeqNo" = "Message"."bodySeqNo" step(# WHERE "Message"."bodySeqNo" IS NULL); QUERY PLAN -------------------------------------------------------------------------------------------------------------- Merge IN Join (cost=825315.30..1265285.81 rows=2010418 width=6) Merge Cond: ((public."Body"."bodySeqNo")::numeric = (public."Body"."bodySeqNo")::numeric) -> Index Scan using "Body_pkey" on "Body" (cost=0.00..383702.32 rows=4020835 width=18) -> Materialize (cost=825315.30..846401.18 rows=2010418 width=12) -> Merge Left Join (cost=0.00..822323.18 rows=2010418 width=12) Merge Cond: ((public."Body"."bodySeqNo")::numeric = ("Message"."bodySeqNo")::numeric) Filter: ("Message"."bodySeqNo" IS NULL) -> Index Scan using "Body_pkey" on "Body" (cost=0.00..383702.32 rows=4020835 width=12) -> Index Scan using "Message_Body" on "Message" (cost=0.00..378901.17 rows=4021733 width=12) (9 rows) Just some ideas to look at while you're "in the neighborhood." -Kevin ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |