This is a discussion on BUG #2658: Query not using index within the pgsql Bugs forums, part of the PostgreSQL category; --> The following bug has been logged online: Bug reference: 2658 Logged by: Graham Davis Email address: gdavis@refractions.net PostgreSQL version: ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| The following bug has been logged online: Bug reference: 2658 Logged by: Graham Davis Email address: gdavis@refractions.net PostgreSQL version: 8.1.4 Operating system: Linux Description: Query not using index Details: I know that in version 8 you guys added support so that aggregate functions can take advantage of indexes. However, I have a simple query that is not taking advantage of an index where I believe it should. I have a large table full of GPS positions. I want to query the table for the most recent location of each asset (an asset is essentially a vehicle). The ts column is the timestamp, so I am using this to figure out the most recent position. I use the following query to do it: SELECT assetid, max(ts) AS ts FROM asset_positions GROUP BY assetid; I have an index on (ts), another index on (assetid) and a multikey index on (assetid, ts). I know the assetid index is pointless since the multikey one takes its place, but I put it there while testing just to make sure. The ANALYZE EXPLAIN for this query is: QUERY PLAN ---------------------------------------------------------------------------- ------------------------------------------------------------- HashAggregate (cost=125423.96..125424.21 rows=20 width=12) (actual time=39693.995..39694.036 rows=20 loops=1) -> Seq Scan on asset_positions (cost=0.00..116654.64 rows=1753864 width=12) (actual time=20002.362..34724.896 rows=1738693 loops=1) Total runtime: 39694.245 ms (3 rows) You can see it is doing a sequential scan on the table when it should be using the (assetid, ts) index, or at the very least the (ts) index. This query takes about 40 seconds to complete with a table of 1.7 million rows. I tested running the query without the group by as follows: SELECT max(ts) AS ts FROM asset_positions; This query DOES use the (ts) index and takes less than 1 ms to complete. So I'm not sure why my initial query is not using one of the indexes. I have to use the GROUP BY in my query so that I get the max ts of EACH asset. I've tried restructuring my query so that it will use an index, but nothing seems to work. I tried this syntax for example: SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid, ts DESC; It still does a sequential scan and takes 40+ seconds to complete. If I am missing something here, please let me know, but I believe this is a bug that needs addressing. If it is not a bug (and there just simply isn't support for this with multikey indexes yet), please let me know so I can either try restructuring the coding I am working on, or move on for now. The documentation does not mention anything about this, but I know from reading a list of changes in version 8 that this sort of support was added for aggregate functions. If you need more information, please let me know, thanks in advance. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| This shouldn't have been submitted to the bugs list, as it isn't a bug. The best spot for this kind of question is the performance list so I am copying it there and redirecting followups there. On Wed, Sep 27, 2006 at 20:56:32 +0000, Graham Davis <gdavis@refractions.net> wrote: > > SELECT assetid, max(ts) AS ts > FROM asset_positions > GROUP BY assetid; > > I have an index on (ts), another index on (assetid) and a multikey index on > (assetid, ts). I know the assetid index is pointless since the multikey one > takes its place, but I put it there while testing just to make sure. The > ANALYZE EXPLAIN for this query is: > > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------------------------- > HashAggregate (cost=125423.96..125424.21 rows=20 width=12) (actual > time=39693.995..39694.036 rows=20 loops=1) > -> Seq Scan on asset_positions (cost=0.00..116654.64 rows=1753864 > width=12) (actual time=20002.362..34724.896 rows=1738693 loops=1) > Total runtime: 39694.245 ms > (3 rows) > > You can see it is doing a sequential scan on the table when it should be > using the (assetid, ts) index, or at the very least the (ts) index. This > query takes about 40 seconds to complete with a table of 1.7 million rows. > I tested running the query without the group by as follows: > SELECT DISTINCT ON (assetid) assetid, ts > FROM asset_positions > ORDER BY assetid, ts DESC; This is almost what you want to do to get an alternative plan. But you need to ORDER BY assetid DESC, ts DESC to make use of the multicolumn index. If you really need the other output order, reverse it in your application or use the above as a subselect in another query that orders by assetid ASC. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |