Unix Technical Forum

Some issues with planner and query optimization

This is a discussion on Some issues with planner and query optimization within the Pgsql General forums, part of the PostgreSQL category; --> Postgres 8.1 Sample test table: (all queries done on fresh vacuumed analyzed table with statistics on rub_id and news_dtime ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 02:49 PM
Boguk Maxim
 
Posts: n/a
Default Some issues with planner and query optimization

Postgres 8.1
Sample test table:
(all queries done on fresh vacuumed analyzed table with statistics on
rub_id and news_dtime set to 1000)
(all table in memory and server do not doing anything other)

media=> \d test_table
Table "public.test_table"
Column | Type | Modifiers
------------+-----------------------------+-----------
id | integer |
rub_id | integer |
news_id | integer |
news_dtime | timestamp without time zone |
Indexes:
"test_table_pk" UNIQUE, btree (id)
"test_table_main_idx" btree (rub_id, news_dtime)

media=> select count(*) from test_table;
count
---------
5834463
media=> select count(distinct rub_id) from test_table;
count
-------
342

Now doing 3 simple query:

First:
media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5)
order by news_dtime limit 20;
QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
Limit (cost=0.00..10.73 rows=20 width=20) (actual time=0.018..0.121
rows=20 loops=1)
-> Index Scan using test_table_main_idx on test_table
(cost=0.00..29758.11 rows=55447 width=20) (actual time=0.014..0.054
rows=20 loops=1)
Index Cond: (rub_id = 5)
Total runtime: 0.186 ms

Second (almost same but with rub_id 8):
media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (8)
order by news_dtime limit 20;
QUERY
PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-
Limit (cost=0.00..1.98 rows=20 width=20) (actual time=0.019..0.121
rows=20 loops=1)
-> Index Scan using test_table_main_idx on test_table
(cost=0.00..45976.37 rows=463684 width=20) (actual time=0.014..0.054
rows=20 loops=1)
Index Cond: (rub_id = 8)
Total runtime: 0.186 ms


Now try with rub_id IN (5,8) (I was assumed query will work 2-10 time
longer max... With almost same plan)
But i'm got bad plan/really slow query:

media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5,8)
order by news_dtime limit 20;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------
Limit (cost=103337.45..103337.50 rows=20 width=20) (actual
time=4437.841..4437.976 rows=20 loops=1)
-> Sort (cost=103337.45..104624.26 rows=514725 width=20) (actual
time=4437.836..4437.873 rows=20 loops=1)
Sort Key: news_dtime
-> Bitmap Heap Scan on test_table (cost=3818.96..54506.92
rows=514725 width=20) (actual time=82.139..1100.021 rows=515340 loops=1)
Recheck Cond: ((rub_id = 5) OR (rub_id = 8))
-> BitmapOr (cost=3818.96..3818.96 rows=519131 width=0)
(actual time=80.498..80.498 rows=0 loops=1)
-> Bitmap Index Scan on test_table_main_idx
(cost=0.00..409.06 rows=55447 width=0) (actual time=8.342..8.342
rows=54959 loops=1)
Index Cond: (rub_id = 5)
-> Bitmap Index Scan on test_table_main_idx
(cost=0.00..3409.89 rows=463684 width=0) (actual time=72.146..72.146
rows=460381 loops=1)
Index Cond: (rub_id = 8)
Total runtime: 4458.999 ms
(11 rows)

Ouch.... 25000 slower...
Why planner not try two index scan and merge results...
Only one workaround wich i found was use query like:

media=> EXPLAIN ANALYZE
media-> (select * from test_table where rub_id IN (8) order by
news_dtime limit 20)
media-> UNION
media-> (select * from test_table where rub_id IN (5) order by
news_dtime limit 20)
media-> ORDER BY news_dtime LIMIT 20;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------------------------
Limit (cost=15.75..15.80 rows=20 width=20) (actual time=0.788..0.893
rows=20 loops=1)
-> Sort (cost=15.75..15.85 rows=40 width=20) (actual
time=0.784..0.826 rows=20 loops=1)
Sort Key: news_dtime
-> Unique (cost=14.18..14.68 rows=40 width=20) (actual
time=0.462..0.694 rows=40 loops=1)
-> Sort (cost=14.18..14.28 rows=40 width=20) (actual
time=0.458..0.527 rows=40 loops=1)
Sort Key: id, rub_id, news_id, news_dtime
-> Append (cost=0.00..13.12 rows=40 width=20)
(actual time=0.021..0.370 rows=40 loops=1)
-> Limit (cost=0.00..1.98 rows=20 width=20)
(actual time=0.017..0.118 rows=20 loops=1)
-> Index Scan using
test_table_main_idx on test_table (cost=0.00..45976.37 rows=463684
width=20) (actual time=0.014..0.052 rows=20 loops=1)
Index Cond: (rub_id = 8)
-> Limit (cost=0.00..10.73 rows=20
width=20) (actual time=0.012..0.114 rows=20 loops=1)
-> Index Scan using
test_table_main_idx on test_table (cost=0.00..29758.11 rows=55447
width=20) (actual time=0.010..0.050 rows=20 loops=1)
Index Cond: (rub_id = 5)
Total runtime: 0.987 ms
(14 rows)

Performance look good.
But writing query for 10-20 rub_id inside IN (list) is pain.
I tested and UNION work faster even with 100 rub_id inside IN (list).

So any way rewrite query for get fast results without using long UNION
lists?
Or any way force postgres use different plan for query?



PS: trying disable bitmap_scan lead to seq_scan:

media=> set enable_bitmapscan=off;
SET
media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5,8)
order by news_dtime limit 20;
QUERY PLAN

------------------------------------------------------------------------
-------------------------------------------------------------
Limit (cost=179248.47..179248.52 rows=20 width=20) (actual
time=6815.917..6816.014 rows=20 loops=1)
-> Sort (cost=179248.47..180535.28 rows=514725 width=20) (actual
time=6815.912..6815.945 rows=20 loops=1)
Sort Key: news_dtime
-> Seq Scan on test_table (cost=0.00..130417.94 rows=514725
width=20) (actual time=549.371..3529.336 rows=515340 loops=1)
Filter: ((rub_id = 5) OR (rub_id = 8))
Total runtime: 6838.502 ms
(6 rows)


Regars, Maxim Boguk
astar@rambler-co.ru | www.rambler.ru


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 02:50 PM
Richard Huxton
 
Posts: n/a
Default Re: Some issues with planner and query optimization

Boguk Maxim wrote:
> Postgres 8.1
> Sample test table:
> (all queries done on fresh vacuumed analyzed table with statistics on
> rub_id and news_dtime set to 1000)
> (all table in memory and server do not doing anything other)
>
> media=> \d test_table
> Table "public.test_table"
> Column | Type | Modifiers
> ------------+-----------------------------+-----------
> id | integer |
> rub_id | integer |
> news_id | integer |
> news_dtime | timestamp without time zone |
> Indexes:
> "test_table_pk" UNIQUE, btree (id)
> "test_table_main_idx" btree (rub_id, news_dtime)
>
> media=> select count(*) from test_table;
> count
> ---------
> 5834463
> media=> select count(distinct rub_id) from test_table;
> count
> -------
> 342
>
> Now doing 3 simple query:
>
> First:
> media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5)
> order by news_dtime limit 20;
> QUERY
> PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> Limit (cost=0.00..10.73 rows=20 width=20) (actual time=0.018..0.121
> rows=20 loops=1)
> -> Index Scan using test_table_main_idx on test_table
> (cost=0.00..29758.11 rows=55447 width=20) (actual time=0.014..0.054
> rows=20 loops=1)
> Index Cond: (rub_id = 5)
> Total runtime: 0.186 ms
>
> Second (almost same but with rub_id 8):
> media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (8)
> order by news_dtime limit 20;
> QUERY
> PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> -
> Limit (cost=0.00..1.98 rows=20 width=20) (actual time=0.019..0.121
> rows=20 loops=1)
> -> Index Scan using test_table_main_idx on test_table
> (cost=0.00..45976.37 rows=463684 width=20) (actual time=0.014..0.054
> rows=20 loops=1)
> Index Cond: (rub_id = 8)
> Total runtime: 0.186 ms
>
>
> Now try with rub_id IN (5,8) (I was assumed query will work 2-10 time
> longer max... With almost same plan)
> But i'm got bad plan/really slow query:
>
> media=> EXPLAIN ANALYZE select * from test_table where rub_id IN (5,8)
> order by news_dtime limit 20;
>
> QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> -------------
> Limit (cost=103337.45..103337.50 rows=20 width=20) (actual
> time=4437.841..4437.976 rows=20 loops=1)
> -> Sort (cost=103337.45..104624.26 rows=514725 width=20) (actual
> time=4437.836..4437.873 rows=20 loops=1)
> Sort Key: news_dtime
> -> Bitmap Heap Scan on test_table (cost=3818.96..54506.92
> rows=514725 width=20) (actual time=82.139..1100.021 rows=515340 loops=1)
> Recheck Cond: ((rub_id = 5) OR (rub_id = 8))
> -> BitmapOr (cost=3818.96..3818.96 rows=519131 width=0)
> (actual time=80.498..80.498 rows=0 loops=1)
> -> Bitmap Index Scan on test_table_main_idx
> (cost=0.00..409.06 rows=55447 width=0) (actual time=8.342..8.342
> rows=54959 loops=1)
> Index Cond: (rub_id = 5)
> -> Bitmap Index Scan on test_table_main_idx
> (cost=0.00..3409.89 rows=463684 width=0) (actual time=72.146..72.146
> rows=460381 loops=1)
> Index Cond: (rub_id = 8)
> Total runtime: 4458.999 ms
> (11 rows)
>
> Ouch.... 25000 slower...
> Why planner not try two index scan and merge results...


Try: ORDER BY rub_id, news_dtime
Does that give it enough of a hint?
The problem is you're asking for the 20 oldest regardless of rub_id, so
the index isn't as much use as it might be.

Perhaps an index on (news_dtime,rub_id) rather than the other way around?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:39 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com