vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Dear Gurus, Version: 7.4.6 I use a query on a heavily indexed table which picks a wrong index unexpectedly. Since this query is used in response to certain user interactions thousands of times in succession (with different constants), 500ms is not affordable for us. I can easily work around this, but I'd like to understand the root of the problem. Basically, there are two relevant indexes: - muvelet_vonalkod_muvelet btree (muvelet, ..., idopont) - muvelet_vonalkod_pk3 btree (idopont, ...) Query is: SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1. I expected the planner to choose the index on muvelet, then sort by idopont. Instead, it took the other index. I think there is heavy correlation since muvelet references to a sequenced pkey and idopont is a timestamp (both increase with passing time). May that be a cause? See full table description and explain analyze results at end of the email. TIA, -- G. ---- table : Table "public.muvelet_vonalkod" Column | Type | Modifiers ------------+--------------------------+----------------------------------- az | integer | not null def. nextval('...') olvaso_nev | character varying | not null vonalkod | character varying | not null mozgasnem | integer | not null idopont | timestamp with time zone | not null muvelet | integer | minoseg | integer | not null cikk | integer | muszakhely | integer | muszakkod | integer | muszaknap | date | repre | boolean | not null default false hiba | integer | not null default 0 Indexes: "muvelet_vonalkod_pkey" primary key, btree (az) "muvelet_vonalkod_pk2" unique, btree (olvaso_nev, idopont) "muvelet_vonalkod_muvelet" btree (muvelet, mozgasnem, vonalkod, olvaso_nev, idopont) "muvelet_vonalkod_pk3" btree (idopont, olvaso_nev) "muvelet_vonalkod_vonalkod" btree (vonalkod, mozgasnem, olvaso_nev, idopont) Foreign-key constraints: "$1" FOREIGN KEY (mozgasnem) REFERENCES mozgasnem(az) "$2" FOREIGN KEY (muvelet) REFERENCES muvelet(az) "$3" FOREIGN KEY (minoseg) REFERENCES minoseg(az) "$4" FOREIGN KEY (cikk) REFERENCES cikk(az) "$5" FOREIGN KEY (muszakhely) REFERENCES hely(az) "$6" FOREIGN KEY (muszakkod) REFERENCES muszakkod(az) "muvelet_vonalkod_muszak_fk" FOREIGN KEY (muszakhely, muszaknap, muszakkod) REFERENCES muszak(hely, nap, muszakkod) Triggers: muvelet_vonalkod_aiud AFTER INSERT OR DELETE OR UPDATE ON muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_aiud() muvelet_vonalkod_biu BEFORE INSERT OR UPDATE ON muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_biu() muvelet_vonalkod_noty AFTER INSERT OR DELETE OR UPDATE ON muvelet_vonalkod FOR EACH ROW EXECUTE PROCEDURE muvelet_vonalkod_noty() -- original query, limit # explain analyze select idopont from muvelet_vonalkod where muvelet=6859 order by idopont limit 1; QUERY PLAN ---------------------------------------------------------------------------- Limit (cost=0.00..25.71 rows=1 width=8) (actual time=579.528..579.529 rows=1 loops=1) -> Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod (cost=0.00..8304.42 rows=323 width=8) (actual time=579.522..579.522 rows=1 loops=1) Filter: (muvelet = 6859) Total runtime: 579.606 ms (4 rows) -- however, if I omit the limit clause: # explain analyze select idopont from muvelet_vonalkod where muvelet=6859 order by idopont; QUERY PLAN --------------------------------------------------------------------------- Sort (cost=405.41..405.73 rows=323 width=8) (actual time=1.295..1.395 rows=360 loops=1) Sort Key: idopont -> Index Scan using muvelet_vonalkod_muvelet on muvelet_vonalkod (cost=0.00..400.03 rows=323 width=8) (actual time=0.049..0.855 rows=360 loops=1) Index Cond: (muvelet = 6859) Total runtime: 1.566 ms (5 rows) -- workaround 1: the planner is hard to trick... # explain analyze select idopont from (select idopont from muvelet_vonalkod where muvelet=6859) foo order by idopont limit 1; QUERY PLAN --------------------------------------------------------------------------- Limit (cost=0.00..25.71 rows=1 width=8) (actual time=584.403..584.404 rows=1 loops=1) -> Index Scan using muvelet_vonalkod_pk3 on muvelet_vonalkod (cost=0.00..8304.42 rows=323 width=8) (actual time=584.397..584.397 rows=1 loops=1) Filter: (muvelet = 6859) Total runtime: 584.482 ms (4 rows) -- workaround 2: quite ugly but seems to work (at least for this -- one test case): # explain analyze select idopont from (select idopont from muvelet_vonalkod where muvelet=6859 order by idopont) foo order by idopont limit 1; QUERY PLAN --------------------------------------------------------------------------- Limit (cost=405.41..405.42 rows=1 width=8) (actual time=1.754..1.755 rows=1 loops=1) -> Subquery Scan foo (cost=405.41..407.35 rows=323 width=8) (actual time=1.751..1.751 rows=1 loops=1) -> Sort (cost=405.41..405.73 rows=323 width=8) (actual time=1.746..1.746 rows=1 loops=1) Sort Key: idopont -> Index Scan using muvelet_vonalkod_muvelet on muvelet_vonalkod (cost=0.00..400.03 rows=323 width=8) (actual time=0.377..1.359 rows=360 loops=1) Index Cond: (muvelet = 6859) Total runtime: 1.853 ms (7 rows) ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Wed, Dec 21, 2005 at 07:03:00PM +0100, Sz?cs Gbor wrote: > Version: 7.4.6 [...] > Query is: > SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1. > > I expected the planner to choose the index on muvelet, then sort by idopont. > Instead, it took the other index. I think the planner is guessing that since you're ordering on idopont, scanning the idopont index will find the first matching row faster than using the muvelet index would. In many cases that's a good bet, but in this case the guess is wrong and you end up with a suboptimal plan. I just ran some tests with 8.1.1 and it chose the better plan for a query similar to what you're doing. One of the developers could probably explain why; maybe it's because of the changes that allow better use of multicolumn indexes. Try 8.1.1 if you can and see if you get better results. > -- workaround 2: quite ugly but seems to work (at least for this > -- one test case): > # explain analyze > select idopont from > (select idopont from muvelet_vonalkod > where muvelet=6859 order by idopont) foo > order by idopont limit 1; Another workaround is to use OFFSET 0 in the subquery. -- Michael Fuhr ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| =?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes: > Query is: > SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1. Much the best solution for this would be to have an index on (muvelet, idopont) --- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet" instead of making a whole new index --- and then say SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1 PG 8.1 can apply such an index to your original query, but older versions will need the help of the modified ORDER BY to recognize that the index is usable. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Dear Tom, On 2005.12.21. 20:34, Tom Lane wrote: > =?ISO-8859-2?Q?Sz=FBcs_G=E1bor?= <surrano@gmail.com> writes: >> Query is: >> SELECT idopont WHERE muvelet = x ORDER BY idopont LIMIT 1. > > Much the best solution for this would be to have an index on > (muvelet, idopont) > --- perhaps you can reorder the columns of "muvelet_vonalkod_muvelet" > instead of making a whole new index --- and then say > > SELECT idopont WHERE muvelet = x ORDER BY muvelet, idopont LIMIT 1 I was far too tired yesterday evening to produce such a clean solution but finally came to this conclusion this morning it picks the index on muvelet, which decreases time to ~1.5ms. The new index takes it down to 0.1ms. However, this has a problem; namely, what if I don't (or can't) tell the exact int value in the WHERE clause? In general: will the following query: SELECT indexed_ts_field FROM table WHERE indexed_int_field IN (100,200) -- or even: indexed_int_field BETWEEN 100 AND 200 ORDER BY indexed_ts_field LIMIT n always pick the index on the timestamp field, or does it depend on something else, say the limit size n and the attributes' statistics? > PG 8.1 can apply such an index to your original query, but older > versions will need the help of the modified ORDER BY to recognize > that the index is usable. So the direct cause is that 7.x planners prefer ORDER BY to WHERE when picking indexes? But only when there is a LIMIT clause present? I'd like to know how much of our code should I review; if it's explicitly connected to LIMIT, I'd probably have to check far less code. -- G. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| Thread Tools | |
| Display Modes | |
|
|