This is a discussion on Re: index for group by within the pgsql Sql forums, part of the PostgreSQL category; --> At 09:20 AM 7/22/2008, pgsql-sql-owner@postgresql.org wrote: >Date: Tue, 22 Jul 2008 13:27:24 +0200 >From: "A. Kretschmer" <andreas.kretschmer@schollglas.com> >To: pgsql-sql@postgresql.org ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| At 09:20 AM 7/22/2008, pgsql-sql-owner@postgresql.org wrote: >Date: Tue, 22 Jul 2008 13:27:24 +0200 >From: "A. Kretschmer" <andreas.kretschmer@schollglas.com> >To: pgsql-sql@postgresql.org >Subject: Re: index for group by >Message-ID: <20080722112724.GD2742@a-kretschmer.de> > >am Tue, dem 22.07.2008, um 13:18:30 +0200 mailte Patrick Scharrenberg >folgendes: > > Hi, > > > > is there a way to speedup "group by" queries with an index? > > > > In particular if I have a table like this: > > > > CREATE TABLE data > > ( > > id1 integer, > > id2 integer, > > somedata character varying, > > ts timestamp with time zone > > ); > > > > where continously data is logged about "id1" and "id2" into > "somedata", > > together with the timestamp when it was logged. > > > > So I have multiple rows with the same id1 and id2 but different > > timestamp (and data maybe). > > > > At the moment I have ~40.000.000 rows in that table so doing a > > > > SELECT id1, id2 FROM data GROUP BY id1, id2; > > >without a where-clause every select forces a seq-scan. First, why are you doing a group by when you aren't doing an aggregation (like COUNT, SUM, etc)? It seems like you can get way better performance by doing this: SELECT DISTINCT ON (id1, id2) id1, id2 FROM data ORDER BY id1, id2 (Assuming your compound index is in "id1,id2" order). Am I missing something? A different more cumbersome idea I have for you (if you really do need a GROUP BY) is to build a warehouse table that precalculates the data you want. You can build some recurring process that runs every NN minutes or hours and fires off a stored procedure which grabs all the data from this "data" table, aggregates it and saves it to warehouse table. You could aggregate against your datetime stamp by N hours or days as well. If this idea is of interest you can write back to the list or off-list to me for more info. Steve -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql |
| Thread Tools | |
| Display Modes | |
|
|