Unix Technical Forum

Postgres Optimizer is not smart enough?

This is a discussion on Postgres Optimizer is not smart enough? within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi All, Here is my test comparison between Postgres (7.3.2) optimizer vs Oracle (10g) optimizer. It seems to me ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 10:53 AM
Litao Wu
 
Posts: n/a
Default Postgres Optimizer is not smart enough?

Hi All,

Here is my test comparison between Postgres (7.3.2)
optimizer vs Oracle (10g) optimizer.

It seems to me that Postgres optimizer is not smart
enough.

Did I miss anything?

Thanks,

In Postgres:
============
drop table test;
create table test (
module character varying(50),
action_deny integer,
created timestamp with time zone,
customer_id integer,
domain character varying(255));
create or replace function insert_rows () returns
integer as '
BEGIN
for i in 1 .. 500000 loop
insert into test values (i, 2, now(), 100, i);
end loop;
return 1;
END;
' LANGUAGE 'plpgsql';

select insert_rows();

create index test_id1 on test (customer_id, created,
domain);

analyze test;

explain analyze
SELECT module, sum(action_deny)
FROM test
WHERE created >= ('now'::timestamptz - '1
day'::interval) AND customer_id='100'
AND domain='100'
GROUP BY module;


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.12..3.13 rows=1 width=9) (actual
time=91.05..91.05 rows=1 loops=1)
-> Group (cost=3.12..3.12 rows=1 width=9) (actual
time=91.04..91.04 rows=1 loops=1)
-> Sort (cost=3.12..3.12 rows=1 width=9)
(actual time=91.03..91.03 rows=1 loops=1)
Sort Key: module
-> Index Scan using test_id1 on test
(cost=0.00..3.11 rows=1 width=9) (actual
time=0.03..91.00 rows=1 loops=1)
Index Cond: ((customer_id = 100)
AND (created >= '2005-01-11
14:48:44.832552-07'::timestamp with time zone) AND
("domain" = '100'::character varying))
Total runtime: 91.13 msec
(7 rows)

create index test_id2 on test(domain);
analyze test;

explain analyze
SELECT module, sum(action_deny)
FROM test
WHERE created >= ('now'::timestamptz - '1
day'::interval) AND customer_id='100'
AND domain='100'
GROUP BY module;


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.12..3.13 rows=1 width=9) (actual
time=90.30..90.30 rows=1 loops=1)
-> Group (cost=3.12..3.12 rows=1 width=9) (actual
time=90.29..90.30 rows=1 loops=1)
-> Sort (cost=3.12..3.12 rows=1 width=9)
(actual time=90.29..90.29 rows=1 loops=1)
Sort Key: module
-> Index Scan using test_id1 on test
(cost=0.00..3.11 rows=1 width=9) (actual
time=0.03..90.25 rows=1 loops=1)
Index Cond: ((customer_id = 100)
AND (created >= '2005-01-11
14:51:09.555974-07'::timestamp with time zone) AND
("domain" = '100'::character varying))
Total runtime: 90.38 msec
(7 rows)

WHY PG STILL CHOOSE INDEX test_id1???
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BECAUSE QUERY WILL RUN MUCH FASTER USING test_id2!!!
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^

drop index test_id1;
explain analyze
SELECT module, sum(action_deny)
FROM test
WHERE created >= ('now'::timestamptz - '1
day'::interval) AND customer_id='100'
AND domain='100'
GROUP BY module;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3.12..3.13 rows=1 width=9) (actual
time=0.08..0.08 rows=1 loops=1)
-> Group (cost=3.12..3.13 rows=1 width=9) (actual
time=0.08..0.08 rows=1 loops=1)
-> Sort (cost=3.12..3.13 rows=1 width=9)
(actual time=0.07..0.07 rows=1 loops=1)
Sort Key: module
-> Index Scan using test_id2 on test
(cost=0.00..3.11 rows=1 width=9) (actual
time=0.04..0.05 rows=1 loops=1)
Index Cond: ("domain" =
'100'::character varying)
Filter: ((created >= '2005-01-11
14:53:58.806364-07'::timestamp with time zone) AND
(customer_id = 100))
Total runtime: 0.14 msec
(8 rows)

In Oracle:
==========
drop table test;
create table test (
module character varying(50),
action_deny integer,
created timestamp with time zone,
customer_id integer,
domain character varying(255));

begin
for i in 1..500000 loop
insert into test values (i, 2, current_timestamp,
100, i);
end loop;
end;
/

create index test_id1 on test (customer_id, created,
domain);

analyze table test compute statistics;

set autot on
set timing on

SELECT module, sum(action_deny)
FROM test
WHERE created >= (current_timestamp - interval '1'
day) AND customer_id=100
AND domain='100'
GROUP BY module
/

MODULE
SUM(ACTION_DENY)
--------------------------------------------------
----------------
100
2

Elapsed: 00:00:00.67

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=25
Card=1 Bytes=29
)

1 0 SORT (GROUP BY) (Cost=25 Card=1 Bytes=29)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TEST'
(TABLE) (Cost=24
Card=1 Bytes=29)

3 2 INDEX (RANGE SCAN) OF 'TEST_ID1'
(INDEX) (Cost=23 Card
=4500)





Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2292 consistent gets
2291 physical reads
0 redo size
461 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed

create index test_id2 on test (domain);

SELECT module, sum(action_deny)
FROM test
WHERE created >= (current_timestamp - interval '1'
day) AND customer_id=100
AND domain='100'
GROUP BY module
/

MODULE
SUM(ACTION_DENY)
--------------------------------------------------
----------------
100
2

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5
Card=1 Bytes=29)
1 0 SORT (GROUP BY) (Cost=5 Card=1 Bytes=29)
2 1 TABLE ACCESS (BY INDEX ROWID) OF 'TEST'
(TABLE) (Cost=4
Card=1 Bytes=29)

3 2 INDEX (RANGE SCAN) OF 'TEST_ID2'
(INDEX) (Cost=3 Card=
1)





Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
461 bytes sent via SQL*Net to client
508 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed





__________________________________
Do you Yahoo!?
All your favorites on one personal page – Try My Yahoo!
http://my.yahoo.com

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 10:53 AM
Tom Lane
 
Posts: n/a
Default Re: Postgres Optimizer is not smart enough?

Mark Kirkwood <markir@coretech.co.nz> writes:
> Hmmm ... so it's only the selectivity that is the same (sourced from
> index->amcostestimate which I am guessing points to btcostestimate), is
> that correct?


No, the point is that btcostestimate will compute not only the same
selectivities but the identical index access cost values, because it
thinks that only one index entry will be fetched in both cases. It
needs to account for the fact that the inequality condition will cause a
scan over a larger range of the index than is actually returned. See
_bt_preprocess_keys() and _bt_checkkeys().

The small differences you are showing have to do with different
assumptions about where the now() function will get evaluated (once per
row or once at scan start). That's not the effect that I'm worried
about.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 10:53 AM
Tom Lane
 
Posts: n/a
Default Re: Postgres Optimizer is not smart enough?

Mark Kirkwood <markir@coretech.co.nz> writes:
> I happen to have some debugging code enabled for the optimizer, and the
> issue appears to be that the costs of paths using these indexes are
> quite similar, so are quite sensitive to (some) parameter values.


They'll be exactly the same, actually, as long as the thing predicts
exactly one row retrieved. So it's quasi-random which plan you get.

btcostestimate needs to be improved to understand that in multicolumn
index searches with inequality conditions, we may have to scan through
tuples that don't meet all the qualifications. It's not accounting for
that cost at the moment, which is why the estimates are the same.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:34 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com