Unix Technical Forum

index for group by

This is a discussion on index for group by within the pgsql Sql forums, part of the PostgreSQL category; --> Hi, is there a way to speedup "group by" queries with an index? In particular if I have a ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Sql

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 07-23-2008, 11:40 AM
Patrick Scharrenberg
 
Posts: n/a
Default index for group by

Hi,

is there a way to speedup "group by" queries with an index?

In particular if I have a table like this:

CREATE TABLE data
(
id1 integer,
id2 integer,
somedata character varying,
ts timestamp with time zone
);

where continously data is logged about "id1" and "id2" into "somedata",
together with the timestamp when it was logged.

So I have multiple rows with the same id1 and id2 but different
timestamp (and data maybe).

At the moment I have ~40.000.000 rows in that table so doing a

SELECT id1, id2 FROM data GROUP BY id1, id2;

takes some time (~10 minutes)
and return about 1.000.000 rows.

I created an index on both colums id1 and id2 (together) which takes
about 800 MB but doesn't speedup things.
In fact it even doesn't seem to be used.

Is there any way to speedup this "group by" or does it seem more likely
that I have a conceptional flaw?

regards
Patrick

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 07-23-2008, 11:40 AM
A. Kretschmer
 
Posts: n/a
Default Re: index for group by

am Tue, dem 22.07.2008, um 13:18:30 +0200 mailte Patrick Scharrenberg folgendes:
> Hi,
>
> is there a way to speedup "group by" queries with an index?
>
> In particular if I have a table like this:
>
> CREATE TABLE data
> (
> id1 integer,
> id2 integer,
> somedata character varying,
> ts timestamp with time zone
> );
>
> where continously data is logged about "id1" and "id2" into "somedata",
> together with the timestamp when it was logged.
>
> So I have multiple rows with the same id1 and id2 but different
> timestamp (and data maybe).
>
> At the moment I have ~40.000.000 rows in that table so doing a
>
> SELECT id1, id2 FROM data GROUP BY id1, id2;



without a where-clause every select forces a seq-scan.


>
> takes some time (~10 minutes)
> and return about 1.000.000 rows.
>
> I created an index on both colums id1 and id2 (together) which takes
> about 800 MB but doesn't speedup things.
> In fact it even doesn't seem to be used.


The database has to read all rows, an index can't help in this case.


>
> Is there any way to speedup this "group by" or does it seem more likely
> that I have a conceptional flaw?


Hard to say...


Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:38 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com