Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 11:07 AM
Mark Aufflick
 
Posts: n/a
Default seq scan cache vs. index cache smackdown

Hi All,

I have boiled my situation down to the following simple case: (postgres
version 7.3)

* Query 1 is doing a sequential scan over a table (courtesy of field
ILIKE 'foo%') and index joins to a few others
* Query 2 is doing a functional index scan over the same table
(lower(field) LIKE 'foo%') and index joins to a few others
* neither query has an order by clause
* for the purpose of testing, both queries are designed to return the
same result set

Obviously Q2 is faster than Q1, but if I ever run them both at the same
time (lets say I run two of Q1 and one of Q2 at the same time) then Q2
consistently returns WORSE times than Q1 (explain analyze confirms that
it is using the index).

My assumption is that the sequential scan is blowing the index from any
cache it might live in, and simultaneously stealing all the disk IO
that is needed to access the index on disk (the table has 200,000
rows).

If I simplify the case to not do the index joins (ie. operate on the
one table only) the situation is not as dramatic, but similar.

My thoughts are:

1) kill the sequential scan - but unfortunately I don't have direct
control over that code
2) change the way the server allocates/prioritizes different caches - i
don't know enough about how postgres caches work to do this (if it's
possible)
3) try it on postgres 7.4 - possible, but migrating the system to 7.4
in production will be hard because the above code that I am not
responsible for has a lot of (slightly wacky) implicit date casts
4) ask the fine people on the mailing list for other suggestions!
--
Mark Aufflick
e mark@pumptheory.com
w www.pumptheory.com (work)
w mark.aufflick.com (personal)
p +61 438 700 647
f +61 2 9436 4737


================================================== ======================
iBurst Wireless Broadband from $34.95/month www.platformnetworks.net
Forward undetected SPAM to: spam@mailsecurity.net.au
================================================== ======================


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 11:07 AM
Christopher Browne
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

The world rejoiced as mark@pumptheory.com (Mark Aufflick) wrote:
> Hi All,
>
> I have boiled my situation down to the following simple case:
> (postgres version 7.3)
>
> * Query 1 is doing a sequential scan over a table (courtesy of field
> ILIKE 'foo%') and index joins to a few others
> * Query 2 is doing a functional index scan over the same table
> (lower(field) LIKE 'foo%') and index joins to a few others
> * neither query has an order by clause
> * for the purpose of testing, both queries are designed to return the
> same result set
>
> Obviously Q2 is faster than Q1, but if I ever run them both at the
> same time (lets say I run two of Q1 and one of Q2 at the same time)
> then Q2 consistently returns WORSE times than Q1 (explain analyze
> confirms that it is using the index).
>
> My assumption is that the sequential scan is blowing the index from
> any cache it might live in, and simultaneously stealing all the disk
> IO that is needed to access the index on disk (the table has 200,000
> rows).


There's something to be said for that...

> If I simplify the case to not do the index joins (ie. operate on the
> one table only) the situation is not as dramatic, but similar.
>
> My thoughts are:
>
> 1) kill the sequential scan - but unfortunately I don't have direct
> control over that code


This is a good choice, if plausible...

> 2) change the way the server allocates/prioritizes different caches -
> i don't know enough about how postgres caches work to do this (if it's
> possible)


That's what the 8.0 cache changes did... Patent claim issues are
leading to some changes to the prioritization, which is liable to
change 8.0.something and 8.1.

> 3) try it on postgres 7.4 - possible, but migrating the system to 7.4
> in production will be hard because the above code that I am not
> responsible for has a lot of (slightly wacky) implicit date casts


Moving to 7.4 wouldn't materially change the situation; you'd have to
go all the way to version 8.
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://linuxdatabases.info/~cbbrowne/postgresql.html
Rules of the Evil Overlord #32. "I will not fly into a rage and kill a
messenger who brings me bad news just to illustrate how evil I really
am. Good messengers are hard to come by."
<http://www.eviloverlord.com/>
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 11:08 AM
Greg Stark
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Mark Aufflick <mark@pumptheory.com> writes:

> Obviously Q2 is faster than Q1,


That's not really obvious at all. If there are lots of records being returned
the index might not be faster than a sequential scan.

> My assumption is that the sequential scan is blowing the index from any cache
> it might live in, and simultaneously stealing all the disk IO that is needed to
> access the index on disk (the table has 200,000 rows).


It kind of sounds to me like you've lowered random_page_cost to reflect the
fact that your indexes are nearly always completely cached. But when they're
not this unrealistic random_page_cost causes indexes to be used when they're
no longer faster.

Perhaps you should post an "EXPLAIN ANALYZE" of your Q1 and Q2 (the latter
preferable with and without enable_indexscan, but since it's a join you may
not be able to get precisely the comparable plan without just that one index
scan.)

> 2) change the way the server allocates/prioritizes different caches - i don't
> know enough about how postgres caches work to do this (if it's possible)


Postgres keeps one set of shared buffers, not separate pools . Normally you
only allocate a small amount of your memory for Postgres and let the OS handle
disk caching.

What is your shared_buffers set to and how much memory do you have?

> 3) try it on postgres 7.4 - possible, but migrating the system to 7.4 in
> production will be hard because the above code that I am not responsible for
> has a lot of (slightly wacky) implicit date casts


I can't think of any 7.4 changes that would affect this directly, but there
were certainly plenty of changes that had broad effects. you never know.

8.0, on the other hand, has a new algorithm that specifically tries to protect
against the shared buffers being blown out by a sequential scan. But that will
only help if it's the shared buffers being thrashed that's hurting you, not
the entire OS file system cache.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 11:08 AM
Greg Stark
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > 8.0, on the other hand, has a new algorithm that specifically tries to
> > protect against the shared buffers being blown out by a sequential
> > scan. But that will only help if it's the shared buffers being
> > thrashed that's hurting you, not the entire OS file system cache.

>
> Something we ought to think about sometime: what are the performance
> implications of the real-world situation that we have another level of
> caching sitting underneath us?


It seems inevitable that Postgres will eventually eliminate that redundant
layer of buffering. Since mmap is not workable, that means using O_DIRECT to
read table and index data.

Every other database eventually goes this direction, and for good reason.
Having two layers of caching and buffering is inherently inefficient. It also
makes it impossible for Postgres to offer any application-specific hints to
the caching replacement algorithms.

In that world you would configure Postgres much like you configure Oracle,
with shared_buffers taking up as much of your memory as you can afford. And
the OS file system cache is kept entirely out of the loop.

> AFAIK all the theoretical studies we've looked at consider only a single
> level of caching. But for example, if our buffer management algorithm
> recognizes an index page as being heavily hit and therefore keeps it in
> cache for a long time, then when it does fall out of cache you can be sure
> it's going to need to be read from disk when it's next used, because the
> OS-level buffer cache has not seen a call for that page in a long time.
> Contrariwise a page that we think is only on the fringe of usefulness is
> going to stay in the OS cache because we repeatedly drop it and then have to
> ask for it again.


Hum. Is it clear that that's bad? By the same logic it's the ones on the
fringe that you're likely to have to read again anyways. The ones that are
being heavily used are likely not to have to be read again anyways.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 11:08 AM
Tom Lane
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Josh Berkus <josh@agliodbs.com> writes:
> Why is mmap not workable?


We can't control write order. There are other equally bad problems,
but that one alone eliminates it from consideration. See past discussions.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 11:08 AM
Greg Stark
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown


Josh Berkus <josh@agliodbs.com> writes:

> Why is mmap not workable? It would require far-reaching changes to our code
> -- certainly -- but I don't think it can be eliminated from consideration.


Fundamentally because there is no facility for being notified by the OS before
a page is written to disk. And there's no way to prevent a page from being
written to disk (mlock just prevents it from being flushed from memory, not
from being synced to disk).

So there's no way to guarantee the WAL will be written before the buffer is
synced to disk.



Maybe it could be done by writing and syncing the WAL independently before the
shared buffer is written to at all, but that would be a completely different
model. And it would locking the shared buffer until the sync is done, and
require a private copy of the shared buffer necessitating more copies than the
double buffering in the first place.

--
greg


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 11:08 AM
Iain
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Hi Rod,

> Any solution fixing buffers should probably not take into consideration
> the method being performed (do you really want to skip caching a
> sequential scan of a 2 tuple table because it didn't use an index) but
> the volume of data involved as compared to the size of the cache.


Yes, in fact indexes aren't so different to tables really in that regard.

It sounds like version 8 may help out anyway.

regards
Iain

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 11:08 AM
Iain
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Hi,

I think there was some discussion about seq scans messing up the cache, and
talk about doing something about it but I don't think it has been addressed
yet. Maybe worth a troll through the archives.

It is certainly true that in many situations, a seq scan is preferable to
using an index. I have been testing a situation here on two versions of the
same database, one of the databases is much bigger than the other
(artificially bloated for testing purposes). Some of the query plans change
to use seq scans on the big database, where they used indexes on the little
database - but either way, in *single user* testing the performance is fine.
My concern is that this kind of testing has very little relevance to the
real world of multiuser processing where contention for the cache becomes an
issue. It may be that, at least in the current situation, postgres is
giving too much weight to seq scans based on single user, straight line
performance comparisons. If your assumption is correct, then addressing that
might help, though it's bound to have it's compromises too...

regards
Iain




----- Original Message -----
From: "Mark Aufflick" <mark@pumptheory.com>
To: <pgsql-performance@postgresql.org>
Sent: Tuesday, February 15, 2005 8:34 AM
Subject: [PERFORM] seq scan cache vs. index cache smackdown


> Hi All,
>
> I have boiled my situation down to the following simple case: (postgres
> version 7.3)
>
> * Query 1 is doing a sequential scan over a table (courtesy of field ILIKE
> 'foo%') and index joins to a few others
> * Query 2 is doing a functional index scan over the same table
> (lower(field) LIKE 'foo%') and index joins to a few others
> * neither query has an order by clause
> * for the purpose of testing, both queries are designed to return the same
> result set
>
> Obviously Q2 is faster than Q1, but if I ever run them both at the same
> time (lets say I run two of Q1 and one of Q2 at the same time) then Q2
> consistently returns WORSE times than Q1 (explain analyze confirms that it
> is using the index).
>
> My assumption is that the sequential scan is blowing the index from any
> cache it might live in, and simultaneously stealing all the disk IO that
> is needed to access the index on disk (the table has 200,000 rows).
>
> If I simplify the case to not do the index joins (ie. operate on the one
> table only) the situation is not as dramatic, but similar.
>
> My thoughts are:
>
> 1) kill the sequential scan - but unfortunately I don't have direct
> control over that code
> 2) change the way the server allocates/prioritizes different caches - i
> don't know enough about how postgres caches work to do this (if it's
> possible)
> 3) try it on postgres 7.4 - possible, but migrating the system to 7.4 in
> production will be hard because the above code that I am not responsible
> for has a lot of (slightly wacky) implicit date casts
> 4) ask the fine people on the mailing list for other suggestions!
> --
> Mark Aufflick
> e mark@pumptheory.com
> w www.pumptheory.com (work)
> w mark.aufflick.com (personal)
> p +61 438 700 647
> f +61 2 9436 4737
>
>
> ================================================== ======================
> iBurst Wireless Broadband from $34.95/month www.platformnetworks.net
> Forward undetected SPAM to: spam@mailsecurity.net.au
> ================================================== ======================
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings



---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 11:08 AM
Tom Lane
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Greg Stark <gsstark@mit.edu> writes:
> 8.0, on the other hand, has a new algorithm that specifically tries to
> protect against the shared buffers being blown out by a sequential
> scan. But that will only help if it's the shared buffers being
> thrashed that's hurting you, not the entire OS file system cache.


Something we ought to think about sometime: what are the performance
implications of the real-world situation that we have another level of
caching sitting underneath us? AFAIK all the theoretical studies we've
looked at consider only a single level of caching. But for example,
if our buffer management algorithm recognizes an index page as being
heavily hit and therefore keeps it in cache for a long time, then when
it does fall out of cache you can be sure it's going to need to be read
from disk when it's next used, because the OS-level buffer cache has not
seen a call for that page in a long time. Contrariwise a page that we
think is only on the fringe of usefulness is going to stay in the OS
cache because we repeatedly drop it and then have to ask for it again.

I have no idea how to model this situation, but it seems like it needs
some careful thought.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 11:08 AM
Josh Berkus
 
Posts: n/a
Default Re: seq scan cache vs. index cache smackdown

Tom, Greg, Merlin,

> But for example,
> if our buffer management algorithm recognizes an index page as being
> heavily hit and therefore keeps it in cache for a long time, then when
> it does fall out of cache you can be sure it's going to need to be read
> from disk when it's next used, because the OS-level buffer cache has not
> seen a call for that page in a long time. Contrariwise a page that we
> think is only on the fringe of usefulness is going to stay in the OS
> cache because we repeatedly drop it and then have to ask for it again.


Now you can see why other DBMSs don't use the OS disk cache. There's other
issues as well; for example, as long as we use the OS disk cache, we can't
eliminate checkpoint spikes, at least on Linux. No matter what we do with
the bgwriter, fsyncing the OS disk cache causes heavy system activity.

> It seems inevitable that Postgres will eventually eliminate that redundant
> layer of buffering. Since mmap is not workable, that means using O_DIRECT
> to read table and index data.


Why is mmap not workable? It would require far-reaching changes to our code
-- certainly -- but I don't think it can be eliminated from consideration.

> What about going the other way and simply letting the o/s do all the
> caching? *How bad (or good) would the performance really be? *


Pretty bad. You can simulate this easily by turning your shared_buffers way
down ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:18 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492