Unix Technical Forum

Performance indexing of a simple query

This is a discussion on Performance indexing of a simple query within the Pgsql Performance forums, part of the PostgreSQL category; --> I have a table called 'jobs' with several million rows, and the only columns that are important to this ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 12:17 PM
Mark Fox
 
Posts: n/a
Default Performance indexing of a simple query

I have a table called 'jobs' with several million rows, and the only
columns that are important to this discussion are 'start_time' and
'completion_time'.

The sort of queries I want to execute (among others) are like:

SELECT * FROM jobs
WHERE completion_time > SOMEDATE AND start_time < SOMEDATE;

In plain english: All the jobs that were running at SOMEDATE. The
result of the query is on the order of 500 rows.

I've got seperate indexes on 'start_time' and 'completion_time'.

Now, if SOMEDATE is such that the number of rows with completion_time
> SOMEDATE is small (say 10s of thousands), the query uses index scans

and executes quickly. If not, the query uses sequential scans and is
unacceptably slow (a couple of minutes). I've used EXPLAIN and
EXPLAIN ANALYZE to confirm this. This makes perfect sense to me.

I've played with some of the memory settings for PostgreSQL, but none
has had a significant impact.

Any ideas on how to structure the query or add/change indexes in such
a way to improve its performance? In desperation, I tried using a
subquery, but unsurprisingly it made no (positive) difference. I feel
like there might be a way of using an index on both 'completion_time'
and 'start_time', but can't put a temporal lobe on the details.


Mark

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 12:17 PM
Jim C. Nasby
 
Posts: n/a
Default Re: Performance indexing of a simple query

Try

CREATE INDEX start_complete ON jobs( start_time, completion_time );

Try also completion_time, start_time. One might work better than the
other. Or, depending on your data, you might want to keep both.

In 8.1 you'll be able to do bitmap-based index combination, which might
allow making use of the seperate indexes.

On Wed, Aug 24, 2005 at 02:43:51PM -0600, Mark Fox wrote:
> I have a table called 'jobs' with several million rows, and the only
> columns that are important to this discussion are 'start_time' and
> 'completion_time'.
>
> The sort of queries I want to execute (among others) are like:
>
> SELECT * FROM jobs
> WHERE completion_time > SOMEDATE AND start_time < SOMEDATE;
>
> In plain english: All the jobs that were running at SOMEDATE. The
> result of the query is on the order of 500 rows.
>
> I've got seperate indexes on 'start_time' and 'completion_time'.
>
> Now, if SOMEDATE is such that the number of rows with completion_time
> > SOMEDATE is small (say 10s of thousands), the query uses index scans

> and executes quickly. If not, the query uses sequential scans and is
> unacceptably slow (a couple of minutes). I've used EXPLAIN and
> EXPLAIN ANALYZE to confirm this. This makes perfect sense to me.
>
> I've played with some of the memory settings for PostgreSQL, but none
> has had a significant impact.
>
> Any ideas on how to structure the query or add/change indexes in such
> a way to improve its performance? In desperation, I tried using a
> subquery, but unsurprisingly it made no (positive) difference. I feel
> like there might be a way of using an index on both 'completion_time'
> and 'start_time', but can't put a temporal lobe on the details.
>
>
> Mark
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 12:17 PM
Tom Lane
 
Posts: n/a
Default Re: Performance indexing of a simple query

Mark Fox <mark.fox@gmail.com> writes:
> The sort of queries I want to execute (among others) are like:
> SELECT * FROM jobs
> WHERE completion_time > SOMEDATE AND start_time < SOMEDATE;
> In plain english: All the jobs that were running at SOMEDATE.


AFAIK there is no good way to do this with btree indexes; the problem
is that it's fundamentally a 2-dimensional query and btrees are
1-dimensional. There are various hacks you can try if you're willing
to constrain the problem (eg, if you can assume some not-very-large
maximum on the running time of jobs) but in full generality btrees are
just the Wrong Thing.

So what you want to look at is a non-btree index, ie, rtree or gist.
For example, the contrib/seg data type could pretty directly be adapted
to solve this problem, since it can index searches for overlapping
line segments.

The main drawback of these index types in existing releases is that they
are bad on concurrent updates and don't have WAL support. Both those
things are (allegedly) fixed for GIST in 8.1 ... are you interested in
trying out 8.1beta?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 12:18 PM
Jim C. Nasby
 
Posts: n/a
Default Re: Performance indexing of a simple query

On Wed, Aug 24, 2005 at 07:42:00PM -0400, Tom Lane wrote:
> Mark Fox <mark.fox@gmail.com> writes:
> > The sort of queries I want to execute (among others) are like:
> > SELECT * FROM jobs
> > WHERE completion_time > SOMEDATE AND start_time < SOMEDATE;
> > In plain english: All the jobs that were running at SOMEDATE.


Uh, the plain english and the SQL don't match. That query will find
every job that was NOT running at the time you said.

> AFAIK there is no good way to do this with btree indexes; the problem
> is that it's fundamentally a 2-dimensional query and btrees are
> 1-dimensional. There are various hacks you can try if you're willing
> to constrain the problem (eg, if you can assume some not-very-large
> maximum on the running time of jobs) but in full generality btrees are
> just the Wrong Thing.


Ignoring the SQL and doing what the author actually wanted, wouldn't a
bitmap combination of indexes work here?

Or with an index on (start_time, completion_time), start an index scan
at start_time = SOMEDATE and only include rows where completion_time <
SOMEDATE. Of course if SOMEDATE is near the beginning of the table that
wouldn't help.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 12:18 PM
Tom Lane
 
Posts: n/a
Default Re: Performance indexing of a simple query

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Uh, the plain english and the SQL don't match. That query will find
> every job that was NOT running at the time you said.


No, I think it was right. But anyway it was just an example.

> On Wed, Aug 24, 2005 at 07:42:00PM -0400, Tom Lane wrote:
>> AFAIK there is no good way to do this with btree indexes; the problem
>> is that it's fundamentally a 2-dimensional query and btrees are
>> 1-dimensional. There are various hacks you can try if you're willing
>> to constrain the problem (eg, if you can assume some not-very-large
>> maximum on the running time of jobs) but in full generality btrees are
>> just the Wrong Thing.


> Ignoring the SQL and doing what the author actually wanted, wouldn't a
> bitmap combination of indexes work here?


> Or with an index on (start_time, completion_time), start an index scan
> at start_time = SOMEDATE and only include rows where completion_time <
> SOMEDATE. Of course if SOMEDATE is near the beginning of the table that
> wouldn't help.


The trouble with either of those is that you have to scan very large
fractions of the index (if not indeed *all* of it) in order to get your
answer; certainly you hit much more of the index than just the region
containing matching rows. Btree just doesn't have a good way to answer
this type of query.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:16 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com