Unix Technical Forum

Poor index choice -- multiple indexes of the same columns

This is a discussion on Poor index choice -- multiple indexes of the same columns within the Pgsql Performance forums, part of the PostgreSQL category; --> Postgresql 8.0.3 Hi, I have a query select 1 from census where date < '1975-9-21' and sname = 'RAD' ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 11:57 AM
Karl O. Pinc
 
Posts: n/a
Default Poor index choice -- multiple indexes of the same columns

Postgresql 8.0.3

Hi,

I have a query

select 1
from census
where date < '1975-9-21' and sname = 'RAD' and status != 'A'
limit 1;

Explain analyze says it always uses the index made by:

CREATE INDEX census_date_sname ON census (date, sname);

this is even after I made the index:

CREATE INDEX census_sname_date ON census (sname, date);

I made census_sname_date because it ran too slow. By deleting
census_date_sname (temporarly, because my apps don't like this)
I can force the use of census_sname_date and the query runs fine.

Seems to me that when there's a constant value in the query
and an = comparision it will always be faster to use the (b-tree)
index that's ordered first by the constant value, as then all further
blocks are guarenteed to have a higher relevant information
density. At least when compared with another index that has the
same columns in it.

As you might imagine there are relatively few sname values and
relatively many date values in my data. I use a query like the
above in a trigger to enforce bounds limitations. I don't
expect (want) to find any rows returned.

I've figured out how to avoid executing this code very often,
so this is not presently a serious problem for me.

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 11:57 AM
Josh Berkus
 
Posts: n/a
Default Re: Poor index choice -- multiple indexes of the same columns

Karl,

> Seems to me that when there's a constant value in the query
> and an = comparision it will always be faster to use the (b-tree)
> index that's ordered first by the constant value, as then all further
> blocks are guarenteed to have a higher relevant information
> density. At least when compared with another index that has the
> same columns in it.


That really depends on the stats. Such a choice would *not* be
appropriate if the < comparison was expected to return 1- rows while the =
condition applied to 15% of the table.

What is your STATISTICS_TARGET for the relevant columns set to? When's
the last time you ran analyze? If this is all updated, you want to post
the pg_stats rows for the relevant columns?

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 11:57 AM
Karl O. Pinc
 
Posts: n/a
Default Re: Poor index choice -- multiple indexes of the same


On 06/27/2005 05:37:41 PM, Josh Berkus wrote:
> Karl,
>
> > Seems to me that when there's a constant value in the query
> > and an = comparision it will always be faster to use the (b-tree)
> > index that's ordered first by the constant value, as then all

> further
> > blocks are guarenteed to have a higher relevant information
> > density. At least when compared with another index that has the
> > same columns in it.

>
> That really depends on the stats. Such a choice would *not* be
> appropriate if the < comparison was expected to return 1- rows while
> the =
> condition applied to 15% of the table.


We're talking internals here so I don't know what I'm talking
about, but, when the = comparison returns 15% of the table
you can find your way straight to the 1- (sic) relevent rows
because that 15% is further sorted by the second column of the
index. So that's one disk read and after that when you scan
the rest of the blocks every datum is relevant/returned.
So your scan will pass through fewer disk blocks. The only
case that would make sense to consider using the other
index is if the planner knew it could
get the answer in 1 disk read, in which case it should be
able to get the answer out of either index in one disk read
as both indexes are on the same columns.

> What is your STATISTICS_TARGET for the relevant columns set to?


STATISTICS_TARGET is the default, which I read as 10 the docs.

> When's
> the last time you ran analyze?


I'm doing this in a torture test script, loading data.
Every fibnocci number of rows * 100 I VACCUM ANALYZE.
So, 100, 200, 300, 500, 800, etc.

Just for grins I've created the index I'd like it to use
and run VACUUM ANALYZE and shown the EXPLAIN ANALYZE below.

> If this is all updated, you want to
> post
> the pg_stats rows for the relevant columns?


Pg_stats rows below. (I've tried to wrap the lines short
so as not to mess up anybody's mailer.)

# create index census_sname_date on census (sname, date);
CREATE INDEX
# vacuum analyze census;
VACUUM
# explain analyze select 1 from census where date < '1975-9-21'
and sname = 'RAD' and status != 'A' ;
QUERY
PLAN
---------------------------------------------------------------
---------------------------------------------------------------
----
Index Scan using census_date_sname on census (cost=0.00..2169.51
rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1)
Index Cond: ((date < '1975-09-21'::date) AND (sname =
'RAD'::bpchar))
Filter: (status <> 'A'::bpchar)
Total runtime: 40.652 ms
(4 rows)

Compare with:

# drop index census_date_sname;
DROP INDEX
# explain analyze select date from census where sname = 'RAD'
and date < '1975-9-21' and status != 'A' limit 1;
QUERY
PLAN
-------------------------------------------------------------------
-------------------------------------------------------------------
Limit (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097
rows=0 loops=1)
-> Index Scan using census_sname_date on census
(cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094
rows=0 loops=1)
Index Cond: ((sname = 'RAD'::bpchar) AND (date <
'1975-09-21'::date))
Filter: (status <> 'A'::bpchar)
Total runtime: 0.133 ms
(5 rows)




# select * from pg_stats where tablename = 'census' and (attname =
'sname' or attname = 'date');
schemaname | tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals | most_common_freqs | histogram_bounds |
correlation
------------+-----------+---------+-----------+-----------+-----------
-+--------------------------------------------------------------------
---------------------------------------------+------------------------
----------------------------------------------------------------------
--------------+-------------------------------------------------------
---------------------------------------------------------------------+
-------------
babase | census | date | 0 | 4 | 4687 |
{1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28
,1972-04-28,1972-08-27,1974-04-06,1975-03-19}
|
{0.002,0.00166667,0.00166667,0.00166667,0.00166667 ,0.00133333
,0.00133333,0.00133333,0.00133333,0.00133333}
|
{1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09
,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02}
| 1
babase | census | sname | 0 | 7 | 177 |
{MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL} |
{0.0166667,0.015,0.015,0.0146667
,0.0143333,0.014,0.0136667,0.0136667,0.0133333,0.0 133333}
| {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN} | 0.0446897
(2 rows)

Thanks.


Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 11:58 AM
Karl O. Pinc
 
Posts: n/a
Default Re: Poor index choice -- multiple indexes of the same


On 06/27/2005 09:36:51 PM, Karl O. Pinc wrote:

> I'm doing this in a torture test script, loading data.
> Every fibnocci number of rows * 100 I VACCUM ANALYZE.
> So, 100, 200, 300, 500, 800, etc.


(And of course disconnect my client and re-connect so
as to use the new statistics. sure would be nice if
I didn't have to do this.)

Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 11:58 AM
Tom Lane
 
Posts: n/a
Default Re: Poor index choice -- multiple indexes of the same columns

"Karl O. Pinc" <kop@meme.com> writes:
> I have a query


> select 1
> from census
> where date < '1975-9-21' and sname = 'RAD' and status != 'A'
> limit 1;


> Explain analyze says it always uses the index made by:


> CREATE INDEX census_date_sname ON census (date, sname);


> this is even after I made the index:


> CREATE INDEX census_sname_date ON census (sname, date);


I don't believe that any existing release can tell the difference
between these two indexes as far as costs go. I just recently
added some code to btcostestimate that would cause it to prefer
the index on (sname, date) but of course that's not released yet.

However: isn't the above query pretty seriously underspecified?
With a LIMIT and no ORDER BY, you are asking for a random one
of the rows matching the condition. I realize that with
"select 1" you may not care much, but adding a suitable ORDER BY
would help push the planner towards using the right index. In
this case "ORDER BY sname DESC, date DESC" would probably do the
trick.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 11:58 AM
Karl O. Pinc
 
Posts: n/a
Default Re: Poor index choice -- multiple indexes of the same


On 06/28/2005 01:40:56 AM, Tom Lane wrote:
> "Karl O. Pinc" <kop@meme.com> writes:
> > I have a query

>
> > select 1
> > from census
> > where date < '1975-9-21' and sname = 'RAD' and status != 'A'
> > limit 1;

>
> > Explain analyze says it always uses the index made by:

>
> > CREATE INDEX census_date_sname ON census (date, sname);

>
> > this is even after I made the index:

>
> > CREATE INDEX census_sname_date ON census (sname, date);

>
> I don't believe that any existing release can tell the difference
> between these two indexes as far as costs go. I just recently
> added some code to btcostestimate that would cause it to prefer
> the index on (sname, date) but of course that's not released yet.
>
> However: isn't the above query pretty seriously underspecified?
> With a LIMIT and no ORDER BY, you are asking for a random one
> of the rows matching the condition. I realize that with
> "select 1" you may not care much, but adding a suitable ORDER BY
> would help push the planner towards using the right index. In
> this case "ORDER BY sname DESC, date DESC" would probably do the
> trick.


Yes, that works. I'd already tried "ORDER BY date DESC", before
I first wrote, and that did not work. (I started with no LIMIT
either, and tried adding specifications until I gave up. It's
very good that the new planner will figure out things by itself.)
"ORDER BY sname DESC" works as well. This is a
bit odd, as with the constant in the = comparison "ORDER BY date
DESC" is the same as "ORDER BY sname DESC, date DESC".
I guess that's why I gave up on my attempts to get the planner
to use the (sname, date) index before I got to your solution.

Thanks everybody for the help.


Karl <kop@meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:59 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com