Unix Technical Forum

Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

This is a discussion on Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever within the pgsql Bugs forums, part of the PostgreSQL category; --> "Fahad G." <Fahad.Gilani@anusf.anu.edu.au> writes: > -- Indexes > CREATE INDEX jobstat_lc_q4_2004_jobid ON jobstat_lc_q4_2004 USING btree > (jobid); > CREATE ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Bugs

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 09:26 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

"Fahad G." <Fahad.Gilani@anusf.anu.edu.au> writes:
> -- Indexes
> CREATE INDEX jobstat_lc_q4_2004_jobid ON jobstat_lc_q4_2004 USING btree
> (jobid);
> CREATE INDEX jobstat_lc_q4_2004_fetchtime ON jobstat_lc_q4_2004 USING btree
> (fetchtime);
> CREATE UNIQUE INDEX jobstat_lc_q4_2004_walltime ON
> unq_jobstat_lc_q4_2004_jobid_fetch USING btree (jobid, fetchtime);


I bet it's choosing the wrong index. What does EXPLAIN show in each
case?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 09:26 AM
Michael Fuhr
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

On Fri, Jan 14, 2005 at 11:31:05PM -0500, Tom Lane wrote:
> "Fahad G." <Fahad.Gilani@anusf.anu.edu.au> writes:
> > -- Indexes
> > CREATE INDEX jobstat_lc_q4_2004_jobid ON jobstat_lc_q4_2004 USING btree
> > (jobid);
> > CREATE INDEX jobstat_lc_q4_2004_fetchtime ON jobstat_lc_q4_2004 USING btree
> > (fetchtime);
> > CREATE UNIQUE INDEX jobstat_lc_q4_2004_walltime ON
> > unq_jobstat_lc_q4_2004_jobid_fetch USING btree (jobid, fetchtime);


The last index is created on a different table -- should it be
created on the table we're working with? And if so, are the columns
(jobid, fetchtime) correct? The index name suggests otherwise.

> I bet it's choosing the wrong index. What does EXPLAIN show in each
> case?


I created the table and the two indexes (the third is on a different
table; creating it on this table didn't change anything), populated
the table with random data, and ANALYZEd it. Below are several
tests run on 8.0.0rc5; notice how case 4 is much slower than the
others. My random data probably doesn't have the same distribution
as Fahad's, but I appear to have duplicated the problem.


Case 1: jobid exists, no LIMIT

EXPLAIN ANALYZE SELECT * FROM jobstat_lc_q4_2004
WHERE jobid = 500 AND curr_walltime != 0 ORDER BY fetchtime;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=189.80..190.05 rows=98 width=149) (actual time=2.768..3.189 rows=94 loops=1)
Sort Key: fetchtime
-> Index Scan using jobstat_lc_q4_2004_jobid on jobstat_lc_q4_2004 (cost=0.00..186.56 rows=98 width=149) (actual time=0.099..1.727 rows=94 loops=1)
Index Cond: (jobid = 500)
Filter: (curr_walltime <> 0)
Total runtime: 3.851 ms
(6 rows)


Case 2: jobid exists, LIMIT

EXPLAIN ANALYZE SELECT * FROM jobstat_lc_q4_2004
WHERE jobid = 500 AND curr_walltime != 0 ORDER BY fetchtime LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..33.22 rows=1 width=149) (actual time=6.659..6.664 rows=1 loops=1)
-> Index Scan using jobstat_lc_q4_2004_fetchtime on jobstat_lc_q4_2004 (cost=0.00..3255.97 rows=98 width=149) (actual time=6.644..6.644 rows=1 loops=1)
Filter: ((jobid = 500) AND (curr_walltime <> 0))
Total runtime: 6.900 ms
(4 rows)


Case 3: jobid doesn't exist, no LIMIT

EXPLAIN ANALYZE SELECT * FROM jobstat_lc_q4_2004
WHERE jobid = 9999 AND curr_walltime != 0 ORDER BY fetchtime;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=189.80..190.05 rows=98 width=149) (actual time=0.103..0.103 rows=0 loops=1)
Sort Key: fetchtime
-> Index Scan using jobstat_lc_q4_2004_jobid on jobstat_lc_q4_2004 (cost=0.00..186.56 rows=98 width=149) (actual time=0.064..0.064 rows=0 loops=1)
Index Cond: (jobid = 9999)
Filter: (curr_walltime <> 0)
Total runtime: 0.325 ms
(6 rows)


Case 4: jobid doesn't exist, LIMIT

EXPLAIN ANALYZE SELECT * FROM jobstat_lc_q4_2004
WHERE jobid = 9999 AND curr_walltime != 0 ORDER BY fetchtime LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..33.22 rows=1 width=149) (actual time=684.957..684.957 rows=0 loops=1)
-> Index Scan using jobstat_lc_q4_2004_fetchtime on jobstat_lc_q4_2004 (cost=0.00..3255.97 rows=98 width=149) (actual time=684.937..684.937 rows=0 loops=1)
Filter: ((jobid = 9999) AND (curr_walltime <> 0))
Total runtime: 685.197 ms
(4 rows)

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 09:27 AM
Michael Fuhr
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

I've simplified the test case to the following:

CREATE TABLE foo (
id integer NOT NULL,
value integer NOT NULL
);

INSERT INTO foo (id, value)
SELECT random() * 1000, random() * 1000
FROM generate_series(1, 100000);

CREATE INDEX foo_id_idx ON foo (id);
CREATE INDEX foo_value_idx ON foo (value);

VACUUM ANALYZE foo;

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = -1 ORDER BY value;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=186.46..186.71 rows=99 width=8) (actual time=0.101..0.101 rows=0 loops=1)
Sort Key: value
-> Index Scan using foo_id_idx on foo (cost=0.00..183.18 rows=99 width=8) (actual time=0.067..0.067 rows=0 loops=1)
Index Cond: (id = -1)
Total runtime: 0.259 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE id = -1 ORDER BY value LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..25.79 rows=1 width=8) (actual time=631.964..631.964 rows=0 loops=1)
-> Index Scan using foo_value_idx on foo (cost=0.00..2552.75 rows=99 width=8) (actual time=631.942..631.942 rows=0 loops=1)
Filter: (id = -1)
Total runtime: 632.135 ms
(4 rows)

Maybe I don't understand something about what EXPLAIN is showing,
but why does Limit have an estimated cost of 0.00..25.79 when the
thing it's limiting has a cost of 0.00..2552.75? Is that the cost
of just the limit operation? Is it supposed to be the cumulative
cost of everything up to that point? Is the planner preferring
this plan because of the 25.79 cost?

A workaround appears to be:

EXPLAIN ANALYZE SELECT * FROM (SELECT * FROM foo WHERE id = -1 ORDER BY value) AS s LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=186.46..186.48 rows=1 width=8) (actual time=0.124..0.124 rows=0 loops=1)
-> Subquery Scan s (cost=186.46..187.70 rows=99 width=8) (actual time=0.110..0.110 rows=0 loops=1)
-> Sort (cost=186.46..186.71 rows=99 width=8) (actual time=0.099..0.099 rows=0 loops=1)
Sort Key: value
-> Index Scan using foo_id_idx on foo (cost=0.00..183.18 rows=99 width=8) (actual time=0.064..0.064 rows=0 loops=1)
Index Cond: (id = -1)
Total runtime: 0.313 ms
(7 rows)

I see that the Limit in this query has an estimated cost of
186.46..186.48, so I'm still wondering why the Limit in the previous
query had a cost of 0.00..25.79. Is that my ignorance about how
the planner works, or is it a bug?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 09:27 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

Michael Fuhr <mike@fuhr.org> writes:
> Limit (cost=0.00..25.79 rows=1 width=8) (actual time=631.964..631.964 rows=0 loops=1)
> -> Index Scan using foo_value_idx on foo (cost=0.00..2552.75 rows=99 width=8) (actual time=631.942..631.942 rows=0 loops=1)
> Filter: (id = -1)
> Total runtime: 632.135 ms
> (4 rows)


> Maybe I don't understand something about what EXPLAIN is showing,
> but why does Limit have an estimated cost of 0.00..25.79 when the
> thing it's limiting has a cost of 0.00..2552.75?


This represents the planner assuming that the indexscan will only need
to be run 1/99th of the way to completion. That is, having estimated
that there were 99 matching rows to be found, it assumes those are
uniformly distributed in the index-by-value, and that the scan can stop
as soon as the first one is found.

Since in reality there aren't *any* matching rows, the index scan has to
go all the way to the end :-(. Even if there were matching rows, they
might be much further out in the index order than the
uniform-distribution hypothesis predicts, because the id and value
columns might have been correlated.

Basically, what you're looking at here is that the planner is thinking
it should go for a fast-start plan in a scenario where that bet loses.
It's still a good bet though. I'm not sure how to formulate the notion
that there's too much risk of a slow result in this scenario.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-10-2008, 09:27 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

Michael Fuhr <mike@fuhr.org> writes:
> Would it be accurate to say that the planner makes the bet most
> likely to win without regard to how badly it might lose?


Yes, I think that's a fair summary.

> Is taking the downside into consideration a tough problem to solve, or
> is it simply not worthwhile in the large?


I don't know how to solve it, and whether it would be worthwhile would
depend considerably on how expensive the proposed solution is ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-10-2008, 09:27 AM
Michael Fuhr
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

On Sun, Jan 16, 2005 at 02:56:11PM -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
>
> > Maybe I don't understand something about what EXPLAIN is showing,
> > but why does Limit have an estimated cost of 0.00..25.79 when the
> > thing it's limiting has a cost of 0.00..2552.75?

>
> This represents the planner assuming that the indexscan will only need
> to be run 1/99th of the way to completion.


Thanks -- I understood the rationale for considering a scan on this
index but not why that plan was preferred. Your explanation provides
the piece I was missing.

> Basically, what you're looking at here is that the planner is thinking
> it should go for a fast-start plan in a scenario where that bet loses.
> It's still a good bet though. I'm not sure how to formulate the notion
> that there's too much risk of a slow result in this scenario.


Would it be accurate to say that the planner makes the bet most
likely to win without regard to how badly it might lose? Is taking
the downside into consideration a tough problem to solve, or is it
simply not worthwhile in the large?

Thanks again.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-10-2008, 09:27 AM
Michael Fuhr
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

On Sun, Jan 16, 2005 at 04:08:35PM -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
>
> > Is taking the downside into consideration a tough problem to solve, or
> > is it simply not worthwhile in the large?

>
> I don't know how to solve it, and whether it would be worthwhile would
> depend considerably on how expensive the proposed solution is ...


Would the topic merit discussion in pgsql-hackers after the dust
from the 8.0 release settles down? I know little of the theory
behind query planning; I'd hate to waste the developers' time on a
topic that's already been debated or that has little merit.

If the topic is worthwhile, then I was thinking of a configuration
setting that would allow the user to request either "the plan most
likely to be the fastest" or "the plan least likely to be the slowest,"
or maybe something in between.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-10-2008, 09:27 AM
Fahad G.
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

Tom,

You're right. Here's what explain says:

hpc=> explain SELECT fetchtime, curr_walltime FROM jobstat_lc_q4_2004 WHERE
jobid = 213213 ORDER BY fetchtime DESC;
QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------
Sort (cost=107726.01..107801.53 rows=30205 width=12)
Sort Key: fetchtime
-> Index Scan using jobstat_lc_q4_2004_jobid on jobstat_lc_q4_2004
(cost=0.00..105478.38 rows=30205 width=12)
Index Cond: (jobid = 213213)
(4 rows)


And with LIMIT 1, I get:

hpc=> explain SELECT fetchtime, curr_walltime FROM jobstat_lc_q4_2004 WHERE
jobid = 213213 ORDER BY fetchtime DESC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------
------------------------------------------------------
Limit (cost=0.00..600.14 rows=1 width=12)
-> Index Scan Backward using jobstat_lc_q4_2004_fetchtime on
jobstat_lc_q4_2004 (cost=0.00..18127339.29 rows=30205 width=12)
Filter: (jobid = 213213)
(3 rows)

Is there some way to fix this problem? I don't see why adding LIMIT 1 should
choose the wrong index. Thanks,

Fahad


On 15/1/05 3:31 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Fahad G." <Fahad.Gilani@anusf.anu.edu.au> writes:
>> -- Indexes
>> CREATE INDEX jobstat_lc_q4_2004_jobid ON jobstat_lc_q4_2004 USING btree
>> (jobid);
>> CREATE INDEX jobstat_lc_q4_2004_fetchtime ON jobstat_lc_q4_2004 USING btree
>> (fetchtime);
>> CREATE UNIQUE INDEX jobstat_lc_q4_2004_walltime ON
>> unq_jobstat_lc_q4_2004_jobid_fetch USING btree (jobid, fetchtime);

>
> I bet it's choosing the wrong index. What does EXPLAIN show in each
> case?
>
> regards, tom lane


--
main(){int j=12345;char t[]=":aAbcdefFgGhijklmnNopqrsStuUvwyz \n",
*i="dUGScUiAbpmwqbmgduAvpmmlzce\nlmGGUbFbzjdb";whi le(*i){j+=
strchr(t,*i++)-t;j%=sizeof t-1;putchar(t[j]);}return 0;}




---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-10-2008, 09:27 AM
Fahad G.
 
Posts: n/a
Default Re: BUG #1393: Adding 'LIMIT 1' to the query halts forever

Michael,


On 16/1/05 12:48 AM, "Michael Fuhr" <mike@fuhr.org> wrote:

> On Fri, Jan 14, 2005 at 11:31:05PM -0500, Tom Lane wrote:
>> "Fahad G." <Fahad.Gilani@anusf.anu.edu.au> writes:
>>> -- Indexes
>>> CREATE INDEX jobstat_lc_q4_2004_jobid ON jobstat_lc_q4_2004 USING btree
>>> (jobid);
>>> CREATE INDEX jobstat_lc_q4_2004_fetchtime ON jobstat_lc_q4_2004 USING btree
>>> (fetchtime);
>>> CREATE UNIQUE INDEX jobstat_lc_q4_2004_walltime ON
>>> unq_jobstat_lc_q4_2004_jobid_fetch USING btree (jobid, fetchtime);

>
> The last index is created on a different table -- should it be
> created on the table we're working with? And if so, are the columns
> (jobid, fetchtime) correct? The index name suggests otherwise.
>


I'm sorry. My mistake again while copying indexes from the log. The index
is:

CREATE UNIQUE INDEX unq_jobstat_lc_q4_2004_jobid_fetch ON jobstat_lc_q4_2004
USING btree( jobid, fetchtime);

So basically the unique index is on the same table.

Regards,
Fahad
--
main(){int j=12345;char t[]=":aAbcdefFgGhijklmnNopqrsStuUvwyz \n",
*i="dUGScUiAbpmwqbmgduAvpmmlzce\nlmGGUbFbzjdb";whi le(*i){j+=
strchr(t,*i++)-t;j%=sizeof t-1;putchar(t[j]);}return 0;}




---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 02:13 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com