Unix Technical Forum

Expression index ignores column statistics target

This is a discussion on Expression index ignores column statistics target within the pgsql Hackers forums, part of the PostgreSQL category; --> I've noticed that row count estimates for expression indexes appear to rely on default_statistics_target rather than on a column's ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 07:01 AM
Michael Fuhr
 
Posts: n/a
Default Expression index ignores column statistics target

I've noticed that row count estimates for expression indexes appear
to rely on default_statistics_target rather than on a column's
actual statistics target. That is, if I use ALTER TABLE SET
STATISTICS to increase a column's statistics target and then run
ANALYZE, then estimates for non-expression-index queries improve
as expected. However, queries that use an expression index remain
accurate for only around the N most common values, where N is the
default_statistics_target that was in effect when ANALYZE ran. I'm
still rummaging through the archives looking for past discussion;
is this behavior a known limitation or just an oversight?

CREATE TABLE foo (x integer);

CREATE INDEX foo_x_idx ON foo (x);
CREATE INDEX foo_abs_x_idx ON foo (abs(x));

INSERT INTO foo (x)
SELECT r1 % r2
FROM generate_series(1, 100) AS g1(r1),
generate_series(1, 100) AS g2(r2);

SET default_statistics_target TO 15;
ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
ANALYZE foo;

SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
most_common_vals
-----------------------------------------------------
{0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18 }
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=2.72..50.28 rows=205 width=4) (actual time=0.370..1.766 rows=220 loops=1)
Recheck Cond: (x = 13)
-> Bitmap Index Scan on foo_x_idx (cost=0.00..2.72 rows=205 width=0) (actual time=0.314..0.314 rows=220 loops=1)
Index Cond: (x = 13)
Total runtime: 2.905 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=2.72..50.80 rows=205 width=4) (actual time=0.358..1.720 rows=220 loops=1)
Recheck Cond: (abs(x) = 13)
-> Bitmap Index Scan on foo_abs_x_idx (cost=0.00..2.72 rows=205 width=0) (actual time=0.305..0.305 rows=220 loops=1)
Index Cond: (abs(x) = 13)
Total runtime: 2.875 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=2.60..49.75 rows=172 width=4) (actual time=0.312..1.442 rows=180 loops=1)
Recheck Cond: (x = 18)
-> Bitmap Index Scan on foo_x_idx (cost=0.00..2.60 rows=172 width=0) (actual time=0.262..0.262 rows=180 loops=1)
Index Cond: (x = 18)
Total runtime: 2.393 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foo (cost=2.22..43.65 rows=63 width=4) (actual time=0.313..1.436 rows=180 loops=1)
Recheck Cond: (abs(x) = 18)
-> Bitmap Index Scan on foo_abs_x_idx (cost=0.00..2.22 rows=63 width=0) (actual time=0.263..0.263 rows=180 loops=1)
Index Cond: (abs(x) = 18)
Total runtime: 2.418 ms
(5 rows)

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 07:01 AM
Bruce Momjian
 
Posts: n/a
Default Re: Expression index ignores column statistics target


This is expected. The main TODO items is:

* Allow accurate statistics to be collected on indexes with more than
one column or expression indexes, perhaps using per-index statistics

Basically, we don't have multi-column or expression statistics. ANALYZE
just analyzes columns, even if an expression index exists.

---------------------------------------------------------------------------

Michael Fuhr wrote:
> I've noticed that row count estimates for expression indexes appear
> to rely on default_statistics_target rather than on a column's
> actual statistics target. That is, if I use ALTER TABLE SET
> STATISTICS to increase a column's statistics target and then run
> ANALYZE, then estimates for non-expression-index queries improve
> as expected. However, queries that use an expression index remain
> accurate for only around the N most common values, where N is the
> default_statistics_target that was in effect when ANALYZE ran. I'm
> still rummaging through the archives looking for past discussion;
> is this behavior a known limitation or just an oversight?
>
> CREATE TABLE foo (x integer);
>
> CREATE INDEX foo_x_idx ON foo (x);
> CREATE INDEX foo_abs_x_idx ON foo (abs(x));
>
> INSERT INTO foo (x)
> SELECT r1 % r2
> FROM generate_series(1, 100) AS g1(r1),
> generate_series(1, 100) AS g2(r2);
>
> SET default_statistics_target TO 15;
> ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
> ANALYZE foo;
>
> SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
> most_common_vals
> -----------------------------------------------------
> {0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18 }
> (1 row)
>
> EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on foo (cost=2.72..50.28 rows=205 width=4) (actual time=0.370..1.766 rows=220 loops=1)
> Recheck Cond: (x = 13)
> -> Bitmap Index Scan on foo_x_idx (cost=0.00..2.72 rows=205 width=0) (actual time=0.314..0.314 rows=220 loops=1)
> Index Cond: (x = 13)
> Total runtime: 2.905 ms
> (5 rows)
>
> EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
> QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on foo (cost=2.72..50.80 rows=205 width=4) (actual time=0.358..1.720 rows=220 loops=1)
> Recheck Cond: (abs(x) = 13)
> -> Bitmap Index Scan on foo_abs_x_idx (cost=0.00..2.72 rows=205 width=0) (actual time=0.305..0.305 rows=220 loops=1)
> Index Cond: (abs(x) = 13)
> Total runtime: 2.875 ms
> (5 rows)
>
> EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on foo (cost=2.60..49.75 rows=172 width=4) (actual time=0.312..1.442 rows=180 loops=1)
> Recheck Cond: (x = 18)
> -> Bitmap Index Scan on foo_x_idx (cost=0.00..2.60 rows=172 width=0) (actual time=0.262..0.262 rows=180 loops=1)
> Index Cond: (x = 18)
> Total runtime: 2.393 ms
> (5 rows)
>
> EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on foo (cost=2.22..43.65 rows=63 width=4) (actual time=0.313..1.436 rows=180 loops=1)
> Recheck Cond: (abs(x) = 18)
> -> Bitmap Index Scan on foo_abs_x_idx (cost=0.00..2.22 rows=63 width=0) (actual time=0.263..0.263 rows=180 loops=1)
> Index Cond: (abs(x) = 18)
> Total runtime: 2.418 ms
> (5 rows)
>
> --
> Michael Fuhr
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-11-2008, 07:02 AM
Tom Lane
 
Posts: n/a
Default Re: Expression index ignores column statistics target

Michael Fuhr <mike@fuhr.org> writes:
> I've noticed that row count estimates for expression indexes appear
> to rely on default_statistics_target rather than on a column's
> actual statistics target. That is, if I use ALTER TABLE SET
> STATISTICS to increase a column's statistics target and then run
> ANALYZE, then estimates for non-expression-index queries improve
> as expected. However, queries that use an expression index remain
> accurate for only around the N most common values, where N is the
> default_statistics_target that was in effect when ANALYZE ran.


The code does in fact honor per-column statistics targets attached to
expression indexes, viz

alter table myfuncindex alter column pg_expression_1 set statistics 100;

This isn't documented, mainly because pg_dump doesn't promise to dump
such things, which it doesn't do because I didn't want to see the
"pg_expression_N" naming for expression index columns become graven on
stone tablets. I seem to recall bringing up the question of whether
we could find a less implementation-specific way of commanding this
behavior, but I can't find it in the archives right now.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-11-2008, 07:02 AM
Michael Fuhr
 
Posts: n/a
Default Re: Expression index ignores column statistics target

On Fri, Sep 30, 2005 at 11:59:26PM -0400, Bruce Momjian wrote:
> This is expected. The main TODO items is:
>
> * Allow accurate statistics to be collected on indexes with more than
> one column or expression indexes, perhaps using per-index statistics
>
> Basically, we don't have multi-column or expression statistics. ANALYZE
> just analyzes columns, even if an expression index exists.


But the row count estimates imply that expression index queries do
use column statistics, presumably as a proxy in the absence of
expression statistics. This looks like a relevant commit:

http://archives.postgresql.org/pgsql...2/msg00124.php

The behavior I observed is that the planner does appear to use
column statistics when planning an expression index query, but it
doesn't appear to honor a column's non-default statistics target.
In other words:

* Row count estimates for expression index queries (at least simple
ones) are reasonably accurate for the N most common column values,
where N is the value of default_statistics_target when ANALYZE
was run.

* Specifically setting the column's statistics target with ALTER
TABLE SET STATISTICS doesn't result in better statistics for
expression index queries.

That difference in behavior seems odd: if default_statistics_target
has an effect, why doesn't ALTER TABLE SET STATISTICS?

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-11-2008, 07:02 AM
Michael Fuhr
 
Posts: n/a
Default Re: Expression index ignores column statistics target

On Sat, Oct 01, 2005 at 12:53:03AM -0400, Tom Lane wrote:
> The code does in fact honor per-column statistics targets attached to
> expression indexes, viz
>
> alter table myfuncindex alter column pg_expression_1 set statistics 100;


Aha -- that's the piece I didn't know about. I was wondering where
those statistics were being stored, since they were affected by
default_statistics_target but not by per-column statistics targets.
And now I see them when I don't restrict queries against pg_stats
by just the table or column name. Thanks.

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-11-2008, 07:02 AM
Tom Lane
 
Posts: n/a
Default Re: Expression index ignores column statistics target

I wrote:
> I seem to recall bringing up the question of whether
> we could find a less implementation-specific way of commanding this
> behavior, but I can't find it in the archives right now.


Ah, here it is:
http://archives.postgresql.org/pgsql...3/msg00502.php

No responses :-(

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-11-2008, 07:02 AM
Michael Fuhr
 
Posts: n/a
Default Re: Expression index ignores column statistics target

On Sat, Oct 01, 2005 at 02:19:06AM -0400, Tom Lane wrote:
> I wrote:
> > I seem to recall bringing up the question of whether
> > we could find a less implementation-specific way of commanding this
> > behavior, but I can't find it in the archives right now.

>
> Ah, here it is:
> http://archives.postgresql.org/pgsql...3/msg00502.php
>
> No responses :-(


Would an ALTER INDEX SET STATISTICS form be possible?

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-11-2008, 07:02 AM
Tom Lane
 
Posts: n/a
Default Re: Expression index ignores column statistics target

Michael Fuhr <mike@fuhr.org> writes:
> On Sat, Oct 01, 2005 at 02:19:06AM -0400, Tom Lane wrote:
>> Ah, here it is:
>> http://archives.postgresql.org/pgsql...3/msg00502.php


> Would an ALTER INDEX SET STATISTICS form be possible?


It's not so much the table/index misnomer that's bothering me, it's
the lack of a clean way to identify which column of the index you
are talking about.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-11-2008, 07:02 AM
Michael Fuhr
 
Posts: n/a
Default Re: Expression index ignores column statistics target

On Sat, Oct 01, 2005 at 02:42:32AM -0400, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Would an ALTER INDEX SET STATISTICS form be possible?

>
> It's not so much the table/index misnomer that's bothering me, it's
> the lack of a clean way to identify which column of the index you
> are talking about.


Ah, I see -- I wasn't thinking about expressions in multicolumn
indexes. What about identifying the column with the expression
itself, ala quote_ident(pg_get_indexdef())? That might be tedious
for the user to type but it would be attractive from a self-documentation
standpoint.

ALTER INDEX indexname ALTER COLUMN "the expression" SET STATISTICS 100;

I do see that indexes allow multiple instances of the same expression,
so this approach could be ambiguous. Or should such repetition be
prohibited as it is with column names?

test=> CREATE TABLE foo (x integer);
CREATE TABLE
test=> CREATE INDEX foo1_idx ON foo (x, x);
ERROR: duplicate key violates unique constraint "pg_attribute_relid_attnam_index"
test=> CREATE INDEX foo2_idx ON foo (abs(x), abs(x));
CREATE INDEX

--
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-11-2008, 07:02 AM
Bruce Momjian
 
Posts: n/a
Default Re: Expression index ignores column statistics target

Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > I've noticed that row count estimates for expression indexes appear
> > to rely on default_statistics_target rather than on a column's
> > actual statistics target. That is, if I use ALTER TABLE SET
> > STATISTICS to increase a column's statistics target and then run
> > ANALYZE, then estimates for non-expression-index queries improve
> > as expected. However, queries that use an expression index remain
> > accurate for only around the N most common values, where N is the
> > default_statistics_target that was in effect when ANALYZE ran.

>
> The code does in fact honor per-column statistics targets attached to
> expression indexes, viz
>
> alter table myfuncindex alter column pg_expression_1 set statistics 100;
>
> This isn't documented, mainly because pg_dump doesn't promise to dump
> such things, which it doesn't do because I didn't want to see the
> "pg_expression_N" naming for expression index columns become graven on
> stone tablets. I seem to recall bringing up the question of whether
> we could find a less implementation-specific way of commanding this
> behavior, but I can't find it in the archives right now.


Is this a TODO?

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:52 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com