Unix Technical Forum

Re: index for group by

This is a discussion on Re: index for group by within the pgsql Sql forums, part of the PostgreSQL category; --> At 09:20 AM 7/22/2008, pgsql-sql-owner@postgresql.org wrote: >Date: Tue, 22 Jul 2008 13:27:24 +0200 >From: "A. Kretschmer" <andreas.kretschmer@schollglas.com> >To: pgsql-sql@postgresql.org ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Sql

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 07-23-2008, 11:40 AM
Steve Midgley
 
Posts: n/a
Default Re: index for group by

At 09:20 AM 7/22/2008, pgsql-sql-owner@postgresql.org wrote:
>Date: Tue, 22 Jul 2008 13:27:24 +0200
>From: "A. Kretschmer" <andreas.kretschmer@schollglas.com>
>To: pgsql-sql@postgresql.org
>Subject: Re: index for group by
>Message-ID: <20080722112724.GD2742@a-kretschmer.de>
>
>am Tue, dem 22.07.2008, um 13:18:30 +0200 mailte Patrick Scharrenberg
>folgendes:
> > Hi,
> >
> > is there a way to speedup "group by" queries with an index?
> >
> > In particular if I have a table like this:
> >
> > CREATE TABLE data
> > (
> > id1 integer,
> > id2 integer,
> > somedata character varying,
> > ts timestamp with time zone
> > );
> >
> > where continously data is logged about "id1" and "id2" into

> "somedata",
> > together with the timestamp when it was logged.
> >
> > So I have multiple rows with the same id1 and id2 but different
> > timestamp (and data maybe).
> >
> > At the moment I have ~40.000.000 rows in that table so doing a
> >
> > SELECT id1, id2 FROM data GROUP BY id1, id2;

>
>
>without a where-clause every select forces a seq-scan.


First, why are you doing a group by when you aren't doing an
aggregation (like COUNT, SUM, etc)? It seems like you can get way
better performance by doing this:

SELECT DISTINCT ON (id1, id2) id1, id2 FROM data ORDER BY id1, id2

(Assuming your compound index is in "id1,id2" order). Am I missing
something?

A different more cumbersome idea I have for you (if you really do need
a GROUP BY) is to build a warehouse table that precalculates the data
you want. You can build some recurring process that runs every NN
minutes or hours and fires off a stored procedure which grabs all the
data from this "data" table, aggregates it and saves it to warehouse
table. You could aggregate against your datetime stamp by N hours or
days as well. If this idea is of interest you can write back to the
list or off-list to me for more info.

Steve


--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:17 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com