Unix Technical Forum

Re: [PERFORM] 8.2rc1 (much) slower than 8.2dev?

This is a discussion on Re: [PERFORM] 8.2rc1 (much) slower than 8.2dev? within the Pgsql Patches forums, part of the PostgreSQL category; --> Arjen van der Meijden <acmmailing@tweakers.net> writes: >> The file is attached. (bz)Grepping for 'Was executed on 8.2 final slower ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 09:10 AM
Tom Lane
 
Posts: n/a
Default Re: [PERFORM] 8.2rc1 (much) slower than 8.2dev?

Arjen van der Meijden <acmmailing@tweakers.net> writes:
>> The file is attached. (bz)Grepping for 'Was executed on 8.2 final slower
>> than 8.2 dev' with -A1 will show you the timecomparisons, which you
>> could than look up using your favorite editor.


I dug through this file, and it seems that all the cases where 8.2 final
is choosing a really markedly worse plan are instances of the same
query, for which 8.2 chose a nestloop plan with an inner indexscan on
a clustered index. 8.2 final is failing to choose that because in the
nestloop case it's not giving any cost credit for the index's
correlation. Obviously a clustered index should have very high
correlation. In the test case, half a dozen or so heap tuples need to
be fetched per index scan, and because of the correlation it's likely
they are all on the same heap page ... but the costing is assuming that
they are scattered randomly. I had punted on this point back in June
because it seemed too complicated to handle in combination with the
cross-scan caching, but obviously we need to do something. After
thinking a bit more, I propose the attached patch --- to be applied on
top of the other ones I sent you --- which seems to fix the problem
here. Please give it a try.

regards, tom lane


*** src/backend/optimizer/path/costsize.c.orig Fri Dec 8 15:58:34 2006
--- src/backend/optimizer/path/costsize.c Thu Dec 14 15:37:40 2006
***************
*** 276,288 ****
if (outer_rel != NULL && outer_rel->rows > 1)
{
/*
! * For repeated indexscans, scale up the number of tuples fetched in
* the Mackert and Lohman formula by the number of scans, so that we
! * estimate the number of pages fetched by all the scans. Then
* pro-rate the costs for one scan. In this case we assume all the
! * fetches are random accesses. XXX it'd be good to include
! * correlation in this model, but it's not clear how to do that
! * without double-counting cache effects.
*/
double num_scans = outer_rel->rows;

--- 276,287 ----
if (outer_rel != NULL && outer_rel->rows > 1)
{
/*
! * For repeated indexscans, the appropriate estimate for the
! * uncorrelated case is to scale up the number of tuples fetched in
* the Mackert and Lohman formula by the number of scans, so that we
! * estimate the number of pages fetched by all the scans; then
* pro-rate the costs for one scan. In this case we assume all the
! * fetches are random accesses.
*/
double num_scans = outer_rel->rows;

***************
*** 291,297 ****
(double) index->pages,
root);

! run_cost += (pages_fetched * random_page_cost) / num_scans;
}
else
{
--- 290,316 ----
(double) index->pages,
root);

! max_IO_cost = (pages_fetched * random_page_cost) / num_scans;
!
! /*
! * In the perfectly correlated case, the number of pages touched
! * by each scan is selectivity * table_size, and we can use the
! * Mackert and Lohman formula at the page level to estimate how
! * much work is saved by caching across scans. We still assume
! * all the fetches are random, though, which is an overestimate
! * that's hard to correct for without double-counting the cache
! * effects. (But in most cases where such a plan is actually
! * interesting, only one page would get fetched per scan anyway,
! * so it shouldn't matter much.)
! */
! pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
!
! pages_fetched = index_pages_fetched(pages_fetched * num_scans,
! baserel->pages,
! (double) index->pages,
! root);
!
! min_IO_cost = (pages_fetched * random_page_cost) / num_scans;
}
else
{
***************
*** 312,326 ****
min_IO_cost = random_page_cost;
if (pages_fetched > 1)
min_IO_cost += (pages_fetched - 1) * seq_page_cost;

! /*
! * Now interpolate based on estimated index order correlation to get
! * total disk I/O cost for main table accesses.
! */
! csquared = indexCorrelation * indexCorrelation;

! run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
! }

/*
* Estimate CPU costs per tuple.
--- 331,345 ----
min_IO_cost = random_page_cost;
if (pages_fetched > 1)
min_IO_cost += (pages_fetched - 1) * seq_page_cost;
+ }

! /*
! * Now interpolate based on estimated index order correlation to get
! * total disk I/O cost for main table accesses.
! */
! csquared = indexCorrelation * indexCorrelation;

! run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

/*
* Estimate CPU costs per tuple.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:30 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com