Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 10:47 AM
David Brown
 
Posts: n/a
Default Re: Seqscan rather than Index

> You might want to reduce random_page_cost a little.

> Keep in mind that your test case is small enough to fit in RAM and is
> probably not reflective of what will happen with larger tables.


I am also running 8.0 rc1 for Windows. Despite many hours spent tweaking various planner cost constants, I found little effect on cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had negligible impact and failed to significantly influence the planner.

Increasing the statistics target for the last_name column to 250 or so *may* help, at least if you're only selecting one name at a time. That's the standard advice around here and the only thing I've found useful. Half the threads in this forum are about under-utilized indexes. It would be great if someone could admit the planner is broken and talk about actually fixing it!

I'm unconvinced that the planner only favours sequential scans as table size decreases. In my experience so far, larger tables have the same problem only it's more noticeable.

The issue hits PostgreSQL harder than others because of its awful sequential scan speed, which is two to five times slower than other DBMS. The archives show there has been talk for years about this, but it seems, no solution. The obvious thing to consider is the block size, but people have tried increasing this in the past with only marginal success.

Regards

David

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 10:47 AM
Richard Huxton
 
Posts: n/a
Default Re: Seqscan rather than Index

David Brown wrote:
>> You might want to reduce random_page_cost a little.

>
>
>> Keep in mind that your test case is small enough to fit in RAM and
>> is probably not reflective of what will happen with larger tables.

>
>
> I am also running 8.0 rc1 for Windows. Despite many hours spent
> tweaking various planner cost constants, I found little effect on
> cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had
> negligible impact and failed to significantly influence the planner.


I'm not sure setting random_page_cost below 1.0 makes much sense.

> Increasing the statistics target for the last_name column to 250 or
> so *may* help, at least if you're only selecting one name at a time.


Not going to do anything in this case. The planner is roughly right
about how many rows will be returned, it's just not expecting everything
to be in RAM.

> That's the standard advice around here and the only thing I've found
> useful. Half the threads in this forum are about under-utilized
> indexes. It would be great if someone could admit the planner is
> broken and talk about actually fixing it!


Not sure I agree here - when the stats are accurate, you can get the
planner to make near-optimal choices most of the time. Is there any
particular pattern you've seen?

> I'm unconvinced that the planner only favours sequential scans as
> table size decreases. In my experience so far, larger tables have the
> same problem only it's more noticeable.


Hmm - assuming your statistics are good, this would suggest the other
cost settings just aren't right for your hardware.

> The issue hits PostgreSQL harder than others because of its awful
> sequential scan speed, which is two to five times slower than other
> DBMS. The archives show there has been talk for years about this, but
> it seems, no solution. The obvious thing to consider is the block
> size, but people have tried increasing this in the past with only
> marginal success.


Must admit this puzzles me. Are you saying you can't saturate your disk
I/O? Or are you saying other DBMS store records in 0.5 to 0.2 times less
space than PG?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 10:47 AM
Greg Stark
 
Posts: n/a
Default Re: Seqscan rather than Index

Richard Huxton <dev@archonet.com> writes:

> Not going to do anything in this case. The planner is roughly right about how
> many rows will be returned, it's just not expecting everything to be in RAM.


That doesn't make sense or else it would switch to the index at
random_page_cost = 1.0. If it was still using a sequential scan at
random_page_cost < 1 then perhaps he had some problem with his query like
mismatched data types that forced it to use a full scan.

> > That's the standard advice around here and the only thing I've found
> > useful. Half the threads in this forum are about under-utilized
> > indexes. It would be great if someone could admit the planner is
> > broken and talk about actually fixing it!

>
> Not sure I agree here - when the stats are accurate, you can get the planner to
> make near-optimal choices most of the time. Is there any particular pattern
> you've seen?


The most common cause I've seen here is that Postgres makes very pessimistic
assumptions about selectivity when it doesn't know better. Every other
database I've tested assumes 'col > ?' is about 5% selectivity . Postgres
assumes 33%.

Postgres is also more pessimistic about the efficiency of index scans. It's
willing to use a sequential scan down to well below 5% selectivity when other
databases use the more traditional rule of thumb of 10%.

In combination these effects do seem to cause an _awful_ lot of complaints.


> > The issue hits PostgreSQL harder than others because of its awful
> > sequential scan speed, which is two to five times slower than other
> > DBMS. The archives show there has been talk for years about this, but
> > it seems, no solution. The obvious thing to consider is the block
> > size, but people have tried increasing this in the past with only
> > marginal success.

>
> Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or
> are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?


I don't know what he's talking about either. Perhaps he's thinking of people
who haven't been running vacuum enough?

--
greg


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 10:47 AM
Tom Lane
 
Posts: n/a
Default Re: Seqscan rather than Index

Greg Stark <gsstark@mit.edu> writes:
> Postgres is also more pessimistic about the efficiency of index scans. It's
> willing to use a sequential scan down to well below 5% selectivity when other
> databases use the more traditional rule of thumb of 10%.


However, other databases are probably basing their analysis on a
different execution model. Since we have to visit both heap and index
in all cases, we do indeed have a larger penalty for index use.

I've looked pretty closely at the cost model for index access, believe me.
It's not pessimistic; if anything it is undercharging for index access.
(For one thing it treats the index's internal fetches as sequential
access, when in reality they are probably random.)

I think the one effect that's not being modeled is amortization of index
fetches across successive queries. The cost model is pretty much based
on the assumption that each query starts from ground zero, whereas in
reality a heavily used index will certainly have all its upper levels in
RAM, and if it's not too large the leaf pages might all be cached too.
I wouldn't want to switch the planner over to making that assumption
exclusively, but we could talk about having a cost parameter that dials
the assumption up or down.

Awhile back I tried rewriting btcostestimate to charge zero for
accessing the metapage and the upper index levels, but charge
random_page_cost for fetching leaf pages. For small indexes this came
out with worse (larger) numbers than we have now, which is not the
direction we want to go in :-(. So I think that we have to somehow
honestly model caching of index pages across queries.

Of course, to be completely fair such a modification should account for
caching of heap pages as well, so it would also bring down the estimates
for seqscans. But I'd be willing to accept a model that considers only
caching of index pages as a zero-order approximation.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 10:47 AM
Greg Stark
 
Posts: n/a
Default Re: Seqscan rather than Index

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Postgres is also more pessimistic about the efficiency of index scans. It's
> > willing to use a sequential scan down to well below 5% selectivity when other
> > databases use the more traditional rule of thumb of 10%.

>
> However, other databases are probably basing their analysis on a
> different execution model. Since we have to visit both heap and index
> in all cases, we do indeed have a larger penalty for index use.


It's only in special cases that other databases do not have to look at the
heap. For simple queries like "select * from x where foo > ?" they still have
to look at the heap. I never looked into how much of a bonus Oracle gives for
the index-only case, I'm not sure it even takes it into account.

> I've looked pretty closely at the cost model for index access, believe me.
> It's not pessimistic; if anything it is undercharging for index access.


I think there's another effect here beyond the physical arithmetic. There's a
kind of teleological reasoning that goes something like "If the user created
the index chances are it's because he wanted it to be used".

I guess that argues more for more aggressive selectivity estimates than for
biased index costing though. If I'm doing "where foo > ?" then if there's an
index on foo I probably put it there for a reason and want it to be used even
if postgres doesn't really have a clue how selective the query will be.

> I think the one effect that's not being modeled is amortization of index
> fetches across successive queries.


And across multiple fetches in a single query, such as with a nested loop.

It seems like the effective_cache_size parameter should be having some
influence here.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 10:47 AM
Tom Lane
 
Posts: n/a
Default Re: Seqscan rather than Index

Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I think the one effect that's not being modeled is amortization of index
>> fetches across successive queries.


> And across multiple fetches in a single query, such as with a nested loop.


Right, that's effectively the same problem. You could imagine making a
special-purpose solution for nestloop queries but I think the issue is
more general than that.

> It seems like the effective_cache_size parameter should be having some
> influence here.


But it doesn't :-(. e_c_s is currently only used to estimate
amortization of repeated heap-page fetches within a single indexscan.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 10:48 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: Seqscan rather than Index

On Fri, Dec 17, 2004 at 10:47:57AM -0500, Greg Stark wrote:
>> Must admit this puzzles me. Are you saying you can't saturate your disk I/O? Or
>> are you saying other DBMS store records in 0.5 to 0.2 times less space than PG?

> I don't know what he's talking about either. Perhaps he's thinking of people
> who haven't been running vacuum enough?


I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 10:48 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: Seqscan rather than Index

On Fri, Dec 17, 2004 at 10:56:27PM +0100, Steinar H. Gunderson wrote:
> I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG 7.4,
> everything in cache, 32-byte rows) take ~3500ms on an Athlon 64 2800+?


(I realize I was a bit unclear here. This is a completely separate case, not
related to the original poster -- I was just wondering if what I'm seeing is
normal or not.)

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 10:48 AM
Frank Wiles
 
Posts: n/a
Default Re: Seqscan rather than Index

On Fri, 17 Dec 2004 23:09:07 +0100
"Steinar H. Gunderson" <sgunderson@bigfoot.com> wrote:

> On Fri, Dec 17, 2004 at 10:56:27PM +0100, Steinar H. Gunderson wrote:
> > I'm a bit unsure -- should counting ~3 million rows (no OIDs, PG
> > 7.4, everything in cache, 32-byte rows) take ~3500ms on an Athlon 64
> > 2800+?

>
> (I realize I was a bit unclear here. This is a completely separate
> case, not related to the original poster -- I was just wondering if
> what I'm seeing is normal or not.)


It depends more on your disk IO than the processor. Counting isn't
processor intensive, but reading through the entire table on disk
is. I've also seen a huge difference between select count(*) and
select count(1) in older versions, haven't tried it on a recent
version however.

---------------------------------
Frank Wiles <frank@wiles.org>
http://www.wiles.org
---------------------------------


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 10:48 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: Seqscan rather than Index

On Fri, Dec 17, 2004 at 05:02:29PM -0600, Frank Wiles wrote:
> It depends more on your disk IO than the processor. Counting isn't
> processor intensive, but reading through the entire table on disk
> is. I've also seen a huge difference between select count(*) and
> select count(1) in older versions, haven't tried it on a recent
> version however.


Like I said, all in cache, so no disk IO. count(*) and count(1) give me
identical results. (BTW, I don't think this is a count problem, it's a
"sequential scan" problem -- I'm just trying to find out if this is natural
or not, ie. if this is just something I have to expect in a relational
database, even with no I/O.)

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:19 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573