Unix Technical Forum

Re: group by will not use an index?

This is a discussion on Re: group by will not use an index? within the Pgsql Performance forums, part of the PostgreSQL category; --> Actually, as I recently discovered, GROUP BY is faster than DISTINCT. It'sjust due to how they are implemented, so ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:03 AM
Brian Herlihy
 
Posts: n/a
Default Re: group by will not use an index?

Actually, as I recently discovered, GROUP BY is faster than DISTINCT. It'sjust due to how they are implemented, so don't go looking for any deep reason The thread "GROUP BY vs DISTINCT" from 2006-12-20 discusses it. DISTINCT sorts the results to find the unique rows, but GROUP BY uses a hash.

Brian

----- Original Message ----
From: Adam Rich <adam.r@sbcglobal.net>
To: tsuraan <tsuraan@gmail.com>; pgsql-performance <pgsql-performance@postgresql.org>
Sent: Wednesday, 10 January, 2007 7:32:50 AM
Subject: Re: [PERFORM] group by will not use an index?

Message





That
query looks strange to me (a group by without an aggregate). See if this
is

any
faster:



SELECT
DISTINCT DATE(inserted) FROM Messages



I
won't hold my breath though, I don't think there's any way around the full table
scan

in
Postgres, because the index does not contain enough information about
transactional

state,
so table access is always required (unlike virtually every other type of
db)






-----Original Message-----
From:
pgsql-performance-owner@postgresql.org
[mailtogsql-performance-owner@postgresql.org] On Behalf Of
tsuraan
Sent: Tuesday,January 09, 2007 5:06 PM
To:
pgsql-performance
Subject: [PERFORM]group by will not use an
index?


I have a table of messages with paths and inserted
dates (among other things), like so:

CREATETABLE Messages
(
msgkey BIGSERIAL PRIMARY KEY,

path TEXT NOT NULL,
inserted TIMESTAMP WITHOUT TIMEZONE
DEFAULT NOW()
);

I run a query to determine which days actually saw
emails come in, like so:

SELECT DATE(inserted) FROM Messages GROUP BY
DATE(inserted);

That's obviously not very efficient, so I made an
index:

CREATE INDEX messages_date_inserted_ind ON
Messages(DATE(inserted));

However, GROUP BY does not use this
index:

=# explain analyze select date(inserted) from messages group by
date(inserted);

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------
HashAggregate
(cost=104773.10..104789.51 rows=1313 width=8) (actual
time=31269.476..31269.557 rows=44 loops=1)
-> Seq
Scan on messages (cost=0.00..101107.25 rows=1466340 width=8) (actual
time=23.923..25248.400 rows=1467036 loops=1)
Total runtime: 31269.735
ms
(3 rows)


Is it possible to get pg to use an index in a group
by? I don't see why it wouldn't be possible, but maybe I'm missing
something.

Using pg 8.1.4...




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:03 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: group by will not use an index?

On Tue, Jan 09, 2007 at 05:07:03PM -0800, Brian Herlihy wrote:
> Actually, as I recently discovered, GROUP BY is faster than DISTINCT. It's
> just due to how they are implemented, so don't go looking for any deep
> reason The thread "GROUP BY vs DISTINCT" from 2006-12-20 discusses it.
> DISTINCT sorts the results to find the unique rows, but GROUP BY uses a
> hash.


Actually, GROUP BY _can_ use a sort too, it's just that a hash is usually
faster.

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:13 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com