vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Consider this setup - which is a gross simplification of parts of our production system ;-) create table c (id integer primary key); create table b (id integer primary key, c_id integer); create index b_on_c on b(c_id) insert into c (select ... lots of IDs ...); insert into b (select id, id from c); /* keep it simple :-) */ Now, I'm just interessted in some few rows. All those gives good plans: explain select c.id from c order by c.id limit 1; explain select c.id from c group by c.id order by c.id limit 1; explain select c.id from c join b on c_id=c.id order by c.id limit 1; .... BUT ... combining join, group and limit makes havoc: explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5; QUERY PLAN ------------------------------------------------------------------------------------- Limit (cost=3809.65..3809.67 rows=5 width=4) -> Group (cost=3809.65..3940.59 rows=26187 width=4) -> Sort (cost=3809.65..3875.12 rows=26188 width=4) Sort Key: c.id -> Hash Join (cost=559.34..1887.89 rows=26188 width=4) Hash Cond: ("outer".id = "inner".c_id) -> Seq Scan on c (cost=0.00..403.87 rows=26187 width=4) -> Hash (cost=403.87..403.87 rows=26187 width=4) -> Seq Scan on b (cost=0.00..403.87 rows=26187 width=4) (9 rows) I get the same behaviour on pg 7.4.7 and pg 8.0.2. Of course, I can probably use subqueries instead of join - though, I would have wished the planner could do better ;-) -- Notice of Confidentiality: This information may be confidential, and blah-blah-blah - so please keep your eyes closed. Please delete and destroy this email. Failure to comply will cause my lawyer to yawn. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Fri, 2005-08-26 at 02:27 +0200, Tobias Brox wrote: > Consider this setup - which is a gross simplification of parts of our > production system ;-) > > create table c (id integer primary key); > create table b (id integer primary key, c_id integer); > create index b_on_c on b(c_id) > > insert into c (select ... lots of IDs ...); > insert into b (select id, id from c); /* keep it simple :-) */ > > Now, I'm just interessted in some few rows. > > All those gives good plans: > > explain select c.id from c order by c.id limit 1; > explain select c.id from c group by c.id order by c.id limit 1; > explain select c.id from c join b on c_id=c.id order by c.id limit 1; > > ... BUT ... combining join, group and limit makes havoc: > > explain select c.id from c join b on c_id=c.id group by c.id order by c.id > desc limit 5; Where's b in this join clause? It looks like a cartesian product to me. -jwb ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| [Jeffrey W. Baker - Thu at 06:56:59PM -0700] > > explain select c.id from c join b on c_id=c.id group by c.id order by c.id > > desc limit 5; > > Where's b in this join clause? "join b on c_id=c.id" It just a funny way of writing: select c.id from c,b where c_id=c.id group by c.id order by c.id desc limit 5; > It looks like a cartesian product to me. No. The query will return exactly the same as the simplest query: select c.id from c order by c.id desc limit 5; As said, this is a gross oversimplification of the production envorinment. In the production environment, I really need to use both join, group and limit. I tested a bit with subqueries, it was not a good solution (selecting really a lot of rows and aggregates from many of the tables). The next idea is to hack it up by manually finding out where the "limit" will cut, and place a restriction in the where-part of the query. -- Notice of Confidentiality: This information may be confidential, and blah-blah-blah - so please keep your eyes closed. Please delete and destroy this email. Failure to comply will cause my lawyer to yawn. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Thu, 2005-08-25 at 18:56 -0700, Jeffrey W. Baker wrote: > On Fri, 2005-08-26 at 02:27 +0200, Tobias Brox wrote: > > Consider this setup - which is a gross simplification of parts of our > > production system ;-) > > > > create table c (id integer primary key); > > create table b (id integer primary key, c_id integer); > > create index b_on_c on b(c_id) > > > > insert into c (select ... lots of IDs ...); > > insert into b (select id, id from c); /* keep it simple :-) */ > > > > Now, I'm just interessted in some few rows. > > > > All those gives good plans: > > > > explain select c.id from c order by c.id limit 1; > > explain select c.id from c group by c.id order by c.id limit 1; > > explain select c.id from c join b on c_id=c.id order by c.id limit 1; > > > > ... BUT ... combining join, group and limit makes havoc: > > > > explain select c.id from c join b on c_id=c.id group by c.id order by c.id > > desc limit 5; > > Where's b in this join clause? It looks like a cartesian product to me. Nevermind. I read c_id as c.id. -jwb ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Tobias, Interesting example: The 'desc' seems to be the guy triggering the sort, e.g: explain select c.id from c join b on c_id=c.id group by c.id order by c.id limit 5; QUERY PLAN ----------------------------------------------------------------------------------------- Limit (cost=0.00..0.28 rows=5 width=4) -> Group (cost=0.00..4476.00 rows=80000 width=4) -> Merge Join (cost=0.00..4276.00 rows=80000 width=4) Merge Cond: ("outer".id = "inner".c_id) -> Index Scan using c_pkey on c (cost=0.00..1518.00 rows=80000 width=4) -> Index Scan using b_on_c on b (cost=0.00..1558.00 rows=80000 width=4) (6 rows) Whereas with it back in again: explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5; QUERY PLAN -------------------------------------------------------------------------------------- Limit (cost=10741.08..10741.11 rows=5 width=4) -> Group (cost=10741.08..11141.08 rows=80000 width=4) -> Sort (cost=10741.08..10941.08 rows=80000 width=4) Sort Key: c.id -> Hash Join (cost=1393.00..4226.00 rows=80000 width=4) Hash Cond: ("outer".c_id = "inner".id) -> Seq Scan on b (cost=0.00..1233.00 rows=80000 width=4) -> Hash (cost=1193.00..1193.00 rows=80000 width=4) -> Seq Scan on c (cost=0.00..1193.00 rows=80000 width=4) (9 rows) However being a bit brutal: set enable_mergejoin=false; set enable_hashjoin=false; explain select c.id from c join b on c_id=c.id group by c.id order by c.id desc limit 5; QUERY PLAN -------------------------------------------------------------------------------------------------- Limit (cost=0.00..15.24 rows=5 width=4) -> Group (cost=0.00..243798.00 rows=80000 width=4) -> Nested Loop (cost=0.00..243598.00 rows=80000 width=4) -> Index Scan Backward using c_pkey on c (cost=0.00..1518.00 rows=80000 width=4) -> Index Scan using b_on_c on b (cost=0.00..3.01 rows=1 width=4) Index Cond: (b.c_id = "outer".id) (6 rows) What is interesting is why this plan is being rejected... Cheers Mark Tobias Brox wrote: > Consider this setup - which is a gross simplification of parts of our > production system ;-) > > create table c (id integer primary key); > create table b (id integer primary key, c_id integer); > create index b_on_c on b(c_id) > > insert into c (select ... lots of IDs ...); > insert into b (select id, id from c); /* keep it simple :-) */ > > Now, I'm just interessted in some few rows. > > All those gives good plans: > > explain select c.id from c order by c.id limit 1; > explain select c.id from c group by c.id order by c.id limit 1; > explain select c.id from c join b on c_id=c.id order by c.id limit 1; > > ... BUT ... combining join, group and limit makes havoc: > > explain select c.id from c join b on c_id=c.id group by c.id order by c.id > desc limit 5; > ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| [Mark Kirkwood - Fri at 03:01:01PM +1200] > Tobias, > Interesting example: > > The 'desc' seems to be the guy triggering the sort, e.g: Oh; really an accident that I didn't notice myself, I was actually going to remove all instances of "desc" in my simplification, but seems like I forgot. > However being a bit brutal: > > set enable_mergejoin=false; > set enable_hashjoin=false; :-) maybe I can use that in production. I'll check. -- Notice of Confidentiality: This information may be confidential, and blah-blah-blah - so please keep your eyes closed. Please delete and destroy this email. Failure to comply will cause my lawyer to yawn. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Fri, 26 Aug 2005, Mark Kirkwood wrote: > However being a bit brutal: > > set enable_mergejoin=false; > set enable_hashjoin=false; > > explain select c.id from c join b on c_id=c.id group by c.id order by > c.id desc limit 5; > QUERY PLAN > > -------------------------------------------------------------------------------------------------- > Limit (cost=0.00..15.24 rows=5 width=4) > -> Group (cost=0.00..243798.00 rows=80000 width=4) > -> Nested Loop (cost=0.00..243598.00 rows=80000 width=4) > -> Index Scan Backward using c_pkey on c > (cost=0.00..1518.00 rows=80000 width=4) > -> Index Scan using b_on_c on b (cost=0.00..3.01 > rows=1 width=4) > Index Cond: (b.c_id = "outer".id) > (6 rows) > > What is interesting is why this plan is being rejected... Well, it expects 80000 probles into b_on_c to be more expensive than the hash join and sort. I wonder what explain analyze shows for the original and the version with the enables changed. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Mark Kirkwood <markir@paradise.net.nz> writes: > What is interesting is why this plan is being rejected... Which PG version are you using exactly? That mistake looks like an artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently: http://archives.postgresql.org/pgsql...7/msg00474.php But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy cost issue explains his results. As far as the "desc" point goes, the problem is that mergejoins aren't capable of dealing with backward sort order, so a merge plan isn't considered for that case (or at least, it would have to have a sort after it, which pretty much defeats the point for a fast-start plan). I have some ideas about fixing this but it won't happen before 8.2. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > Mark Kirkwood <markir@paradise.net.nz> writes: > >>What is interesting is why this plan is being rejected... > > > Which PG version are you using exactly? That mistake looks like an > artifact of the 8.0 "fuzzy plan cost" patch, which we fixed recently: > http://archives.postgresql.org/pgsql...7/msg00474.php > Right on - 8.0.3 (I might look at how CVS tip handles this, could be interesting). > But Tobias wasn't happy with 7.4 either, so I'm not sure that the fuzzy > cost issue explains his results. > > As far as the "desc" point goes, the problem is that mergejoins aren't > capable of dealing with backward sort order, so a merge plan isn't > considered for that case (or at least, it would have to have a sort > after it, which pretty much defeats the point for a fast-start plan). > I have some ideas about fixing this but it won't happen before 8.2. That doesn't explain why the nested loop is being kicked tho', or have I missed something obvious? - it's been known to happen :-)... Cheers Mark ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| Mark Kirkwood <markir@paradise.net.nz> writes: > That doesn't explain why the nested loop is being kicked tho', No, but I think the fuzzy-cost bug does. There are two different issues here. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| Thread Tools | |
| Display Modes | |
|
|