This is a discussion on quickly getting the top N rows within the Pgsql Performance forums, part of the PostgreSQL category; --> If I have this: create table foo (bar int primary key); ....then in my ideal world, Postgres would be ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| If I have this: create table foo (bar int primary key); ....then in my ideal world, Postgres would be able to use that index on bar to help me with this: select bar from foo order by bar desc limit 20; But in my experience, PG8.2 is doing a full table scan on foo, then sorting it, then doing the limit. I have a more complex primary key, but I was hoping the same concept would still apply. Am I doing something wrong, or just expecting something that doesn't exist? ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Thu, 2007-10-04 at 11:00 -0700, Ben wrote: > If I have this: > > create table foo (bar int primary key); > > ...then in my ideal world, Postgres would be able to use that index on bar > to help me with this: > > select bar from foo order by bar desc limit 20; > > But in my experience, PG8.2 is doing a full table scan on foo, then > sorting it, then doing the limit. I have a more complex primary key, but I > was hoping the same concept would still apply. Am I doing something wrong, > or just expecting something that doesn't exist? It has to do with the way that NULL values are stored in the index. This page has details and instructions for how to get it to work: http://developer.postgresql.org/pgdo...-ordering.html -- Mark Lewis ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| In response to Ben <bench@silentmedia.com>: > If I have this: > > create table foo (bar int primary key); > > ...then in my ideal world, Postgres would be able to use that index on bar > to help me with this: > > select bar from foo order by bar desc limit 20; > > But in my experience, PG8.2 is doing a full table scan on foo, then > sorting it, then doing the limit. I have a more complex primary key, but I > was hoping the same concept would still apply. Am I doing something wrong, > or just expecting something that doesn't exist? Show us the explain. However, 2 guesses: 1) You never analyzed the table, thus PG has awful statistics and doesn't know how to pick a good plan. 2) You have so few rows in the table that a seq scan is actually faster than an index scan, which is why PG uses it instead. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Ben <bench@silentmedia.com> schrieb: > If I have this: > > create table foo (bar int primary key); > > ...then in my ideal world, Postgres would be able to use that index on bar > to help me with this: > > select bar from foo order by bar desc limit 20; > > But in my experience, PG8.2 is doing a full table scan on foo, then sorting > it, then doing the limit. I have a more complex primary key, but I was Please show us the output from EXPLAIN ANALYSE select bar from foo order by bar desc limit 20; I try it, with 8.1, and i can see an index scan. You have, maybe, wrong statistics ot not enough (to few) rows in your table. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889° ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| On Thu, 4 Oct 2007, Bill Moran wrote: > However, 2 guesses: > 1) You never analyzed the table, thus PG has awful statistics and > doesn't know how to pick a good plan. > 2) You have so few rows in the table that a seq scan is actually > faster than an index scan, which is why PG uses it instead. No, the tables are recently analyzed and there are a couple hundred thousand rows in there. But I think I just figured it out.... it's a 3-column index, and two columns of that index are the same for every row. When I drop those two columns from the ordering restriction, the index gets used and things speed up 5 orders of magnitude. Maybe the planner is smart enough to think that if a column in the order by clause is identical for most rows, then using an index won't help.... but not smart enough to realize that if said column is at the *end* of the order by arguments, after columns which do sort quite well, then it should use an index after all. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Ben <bench@silentmedia.com> writes: > No, the tables are recently analyzed and there are a couple hundred > thousand rows in there. But I think I just figured it out.... it's a > 3-column index, and two columns of that index are the same for every row. > When I drop those two columns from the ordering restriction, the index > gets used and things speed up 5 orders of magnitude. > Maybe the planner is smart enough to think that if a column in the order > by clause is identical for most rows, then using an index won't help.... > but not smart enough to realize that if said column is at the *end* of the > order by arguments, after columns which do sort quite well, then it should > use an index after all. You're being about as clear as mud here, except that you obviously lied about what you were doing in your first message. If you have a planner problem, show us the *exact* query, the *exact* table definition, and unfaked EXPLAIN ANALYZE output. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Thu, 4 Oct 2007, Tom Lane wrote: > You're being about as clear as mud here, except that you obviously lied > about what you were doing in your first message. If you have a planner > problem, show us the *exact* query, the *exact* table definition, and > unfaked EXPLAIN ANALYZE output. I didn't realize that simplification was viewed as so sinister, but thanks, I'll remember that in the future. The table: Table "public.log" Column | Type | Modifiers ----------------+-----------------------------+--------------------- clientkey | character(30) | not null premiseskey | character(30) | not null logkey | character(30) | not null logicaldel | character(1) | default 'N'::bpchar lockey | character(30) | devlockey | character(30) | eventkey | character(30) | logshorttext | character varying(255) | logdesc | character varying(255) | loguserkey | character(30) | logassetkey | character(30) | logresourcekey | character(30) | logtime | timestamp without time zone | logip | character varying(50) | logarchived | character(1) | logarchivedate | timestamp without time zone | loghasvideo | character(1) | loghasaudio | character(1) | resvehiclekey | character(30) | synccreated | character(1) | logtypekey | character(30) | Indexes: "log_pkey" PRIMARY KEY, btree (clientkey, premiseskey, logkey) "eventkey_idx" btree (eventkey), "log_ak1" btree (clientkey, premiseskey, logtime, logkey) The original, slow query: explain analyze SELECT * FROM log WHERE clientkey in ('000000004500000000010000000001') AND premiseskey in ('000000004500000000010000000001') and logicaldel = 'N' ORDER BY logtime desc, logkey desc, clientkey desc, premiseskey desc LIMIT 20 offset 0; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=356402.58..356402.63 rows=20 width=563) (actual time=215858.481..215858.527 rows=20 loops=1) -> Sort (cost=356402.58..357598.25 rows=478267 width=563) (actual time=215858.478..215858.498 rows=20 loops=1) Sort Key: logtime, logkey, clientkey, premiseskey -> Seq Scan on log (cost=0.00..52061.67 rows=478267 width=563) (actual time=29.340..100043.313 rows=475669 loops=1) Filter: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar) AND (logicaldel = 'N'::bpchar)) Total runtime: 262462.582 ms (6 rows) Every row in log has identical clientkey and premiseskey values, so if I just remove those columns from the order by clause, I get this far superior plan: explain analyze SELECT * FROM log WHERE clientkey in ('000000004500000000010000000001') AND premiseskey in ('000000004500000000010000000001') and logicaldel = 'N' ORDER BY logtime desc, logkey desc LIMIT 20 offset 0; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..12.33 rows=20 width=563) (actual time=0.047..0.105 rows=20 loops=1) -> Index Scan Backward using log_ak1 on log (cost=0.00..294735.70 rows=478267 width=563) (actual time=0.044..0.076 rows=20 loops=1) Index Cond: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar)) Filter: (logicaldel = 'N'::bpchar) Total runtime: 0.165 ms (5 rows) ....which made me to think that maybe postgres is not using log_ak1 in the former case because two of the columns in the order by match every row. Unfortunately, in this case it's not an option to alter the query. I'm just trying to figure out an explaination. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, 2007-10-04 at 12:52 -0700, Ben wrote: > The original, slow query: > > explain analyze SELECT * FROM log WHERE clientkey in > ('000000004500000000010000000001') AND premiseskey in > ('000000004500000000010000000001') and logicaldel = 'N' > ORDER BY logtime desc, logkey desc, clientkey desc, premiseskey desc LIMIT 20 offset 0; > > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=356402.58..356402.63 rows=20 width=563) (actual time=215858.481..215858.527 rows=20 loops=1) > -> Sort (cost=356402.58..357598.25 rows=478267 width=563) (actual time=215858.478..215858.498 rows=20 loops=1) > Sort Key: logtime, logkey, clientkey, premiseskey > -> Seq Scan on log (cost=0.00..52061.67 rows=478267 width=563) (actual time=29.340..100043.313 rows=475669 loops=1) > Filter: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar) AND (logicaldel = 'N'::bpchar)) > Total runtime: 262462.582 ms > (6 rows) > > > Every row in log has identical clientkey and premiseskey values, so if I > just remove those columns from the order by clause, I get this far > superior plan: > > explain analyze SELECT * FROM log WHERE clientkey in > ('000000004500000000010000000001') AND premiseskey in > ('000000004500000000010000000001') and logicaldel = 'N' > ORDER BY logtime desc, logkey desc LIMIT 20 offset 0; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..12.33 rows=20 width=563) (actual time=0.047..0.105 rows=20 loops=1) > -> Index Scan Backward using log_ak1 on log (cost=0.00..294735.70 rows=478267 width=563) (actual time=0.044..0.076 rows=20 loops=1) > Index Cond: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar)) > Filter: (logicaldel = 'N'::bpchar) > Total runtime: 0.165 ms > (5 rows) > > > ...which made me to think that maybe postgres is not using log_ak1 in the > former case because two of the columns in the order by match every row. > > Unfortunately, in this case it's not an option to alter the query. I'm > just trying to figure out an explaination. In the first query, Postgres cannot use the index because the sort order of the index does not match the sort order of the query. When you change the sort order of the query so that it matches that of the index, then the index is used. If you define your index on (logtime, logkey, clientkey, premiseskey) rather than on (clientkey, premiseskey, logtime, logkey) you will have a fast query. Yes, the column order matters. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| On 10/4/07, Ben <bench@silentmedia.com> wrote: > If I have this: > > create table foo (bar int primary key); > > ...then in my ideal world, Postgres would be able to use that index on bar > to help me with this: > > select bar from foo order by bar desc limit 20; > > But in my experience, PG8.2 is doing a full table scan on foo, then > sorting it, then doing the limit. I have a more complex primary key, but I > was hoping the same concept would still apply. Am I doing something wrong, > or just expecting something that doesn't exist? pg uses an intelligent planner. It looks at the table, the number of rows, the distribution of values, and makes a decision whether to use seq scan or index. Do you have any evidence that in your case seq scan is a bad choice? try this experiment: psql mydb =# select * from foo; -- this will prime the db and put the table in memory if it will fit =# \timing =# set enable_seqscan=off; =# select bar from foo order by bar desc limit 20; =# set enable_seqscan=on; =# select bar from foo order by bar desc limit 20; and compare the times each takes. run each way a few times to be sure you're not getting random variance. On my reporting db with somewhere around 75 million tables, a similar query 0.894 mS and uses an index scan. Which is good, because a sequential scan on that table takes about 15 to 30 minutes. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| On Thu, 4 Oct 2007, Simon Riggs wrote: > In the first query, Postgres cannot use the index because the sort order > of the index does not match the sort order of the query. When you change > the sort order of the query so that it matches that of the index, then > the index is used. > > If you define your index on (logtime, logkey, clientkey, premiseskey) > rather than on (clientkey, premiseskey, logtime, logkey) you will have a > fast query. Yes, the column order matters. I thought that might explain it, but then I'm surprised that it can still use an index when the first two columns of the index aren't in the query. Wouldn't that mean that it might have to walk the entire index to find matching rows? .....unless it's smart enough to realize that the first two columns will match everything. Which would be cool. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |