Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 10:33 AM
Brad Nicholson
 
Posts: n/a
Default Long Running Commits - Not Checkpoints

I'm having a problem with long running commits appearing in my database
logs. It may be hardware related, as the problem appeared when we moved
the database to a new server connected to a different disk array. The
disk array is a lower class array, but still more than powerful enough
to handle the IO requirements. One big difference though is that the
old array had 16 GB of cache, the new one has 4 GB.

Running Postgres 8.1.8 on AIX 5.3

We have enough IO to spare that we have the bgwriter cranked up pretty
high, dirty buffers are getting quickly. Vmstat indicates 0 io wait
time, no swapping or anything nasty like that going on.

The long running commits do not line up with checkpoint times.

The postgresql.conf config are identical except that wal_buffers was 8
on the old master, and it is set to 16 on the new one.

We have other installations of this product running on the same array
(different servers though) and they are not suffering from this
problem.

The only other thing of note is that the wal files sit on the same disk
as the data directory. This has not changed between the old and new
config, but the installs that are running fine do have their wal files
on a separate partition.

Any ideas where the problem could lie? Could having the wal files on
the same data partition cause long running commits when there is plenty
of IO to spare?

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 10:33 AM
Brad Nicholson
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
> I'm having a problem with long running commits appearing in my database
> logs. It may be hardware related, as the problem appeared when we moved
> the database to a new server connected to a different disk array. The
> disk array is a lower class array, but still more than powerful enough
> to handle the IO requirements. One big difference though is that the
> old array had 16 GB of cache, the new one has 4 GB.
>
> Running Postgres 8.1.8 on AIX 5.3
>
> We have enough IO to spare that we have the bgwriter cranked up pretty
> high, dirty buffers are getting quickly. Vmstat indicates 0 io wait
> time, no swapping or anything nasty like that going on.
>
> The long running commits do not line up with checkpoint times.
>
> The postgresql.conf config are identical except that wal_buffers was 8
> on the old master, and it is set to 16 on the new one.
>
> We have other installations of this product running on the same array
> (different servers though) and they are not suffering from this
> problem.
>
> The only other thing of note is that the wal files sit on the same disk
> as the data directory. This has not changed between the old and new
> config, but the installs that are running fine do have their wal files
> on a separate partition.
>
> Any ideas where the problem could lie? Could having the wal files on
> the same data partition cause long running commits when there is plenty
> of IO to spare?


More on this - we also have long running commits on installations that
do have the wal files on a separate partition.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 10:33 AM
Tom Lane
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

Brad Nicholson <bnichols@ca.afilias.info> writes:
> On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
>> I'm having a problem with long running commits appearing in my database
>> logs. It may be hardware related, as the problem appeared when we moved
>> the database to a new server connected to a different disk array.


> More on this - we also have long running commits on installations that
> do have the wal files on a separate partition.


What's your definition of "long running commit" --- seconds? milliseconds?
Exactly what are you measuring? Can you correlate the problem with what
the transaction was doing?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 10:33 AM
Brad Nicholson
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 2007-09-13 at 11:10 -0400, Tom Lane wrote:
> Brad Nicholson <bnichols@ca.afilias.info> writes:
> > On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
> >> I'm having a problem with long running commits appearing in my database
> >> logs. It may be hardware related, as the problem appeared when we moved
> >> the database to a new server connected to a different disk array.

>
> > More on this - we also have long running commits on installations that
> > do have the wal files on a separate partition.

>
> What's your definition of "long running commit" --- seconds? milliseconds?
> Exactly what are you measuring? Can you correlate the problem with what


log_min_duration is set to 150ms

Commits running over that up to 788ms. Here is what we see in the logs
(with obfuscated dbname, username and IP):

2007-09-13 10:01:49.787 CUT [782426] dbname username 1.2.3.171 LOG:
duration: 224.286 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:19:16.373 CUT [737404] dbname username 1.2.3.174 LOG:
duration: 372.545 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:19:24.437 CUT [1806498] dbname username 11.2.3.171 LOG:
duration: 351.544 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:33:11.204 CUT [962598] dbname username 1.2.3.170 LOG:
duration: 504.057 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:40:33.735 CUT [1282104] dbname username 1.2.3.174 LOG:
duration: 250.127 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:49:54.752 CUT [1188032] dbname username 1.2.3.170 LOG:
duration: 382.781 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 11:30:43.339 CUT [1589464] dbname username 1.2.3.172 LOG:
duration: 408.463 ms statement: EXECUTE <unnamed> [PREPARE: commit]


--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 10:33 AM
Greg Smith
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 13 Sep 2007, Brad Nicholson wrote:

> One big difference though is that the old array had 16 GB of cache, the
> new one has 4 GB.
>
> We have enough IO to spare that we have the bgwriter cranked up pretty
> high, dirty buffers are getting quickly.


If your system is very active, running the bgwriter very aggressively will
result in the cache on the disk array being almost continuously filled
with pending writes that then trickle their way onto real disk eventually.
If the typical working set on this system results in the amount of
background writer cached writes regularly being >4GB but <16GB, that would
explain what you're seeing. The resolution is actually unexpected by most
people: you make the background writer less aggressive so that it's
spewing less redundant writes clogging the array's cache, leaving more
cache to buffer the actual commits so they don't block. This will
increase the odds that there will be a checkpoint block instead, but if
you're seeing none of them right now you may have some margin there to
reduce the BGW activity without aggrevating checkpoints.

Since you're probably not monitoring I/O waits and similar statistics on
how the disk array's cache is being used, whether this is happening or not
to you won't be obvious from what the operating system is reporting. One
or two blocked writes every couple of minutes won't even show up on the
gross statistics for operating system I/O waits; on average, they'll still
be 0. So it's possible you may think you have plenty of I/O to spare,
when in fact you don't quite have enough--what you've got is enough cache
that the OS can't see where the real I/O bottleneck is.

I'd be curious to see how you've got your background writer configured to
see if it matches situations like this I've seen in the past. The
parameters controlling the all scan are the ones you'd might consider
turning down, definately the percentage and possibly the maxpages as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 10:33 AM
Brad Nicholson
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> On Thu, 13 Sep 2007, Brad Nicholson wrote:


> I'd be curious to see how you've got your background writer configured to
> see if it matches situations like this I've seen in the past. The
> parameters controlling the all scan are the ones you'd might consider
> turning down, definately the percentage and possibly the maxpages as well.



bgwriter_delay = 50 # 10-10000 milliseconds between
rounds
bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers
scanned/round
bgwriter_lru_maxpages = 300 # 0-1000 buffers max
written/round
bgwriter_all_percent = 20 # 0-100% of all buffers
scanned/round
bgwriter_all_maxpages = 600 # 0-1000 buffers max
written/round


--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 10:33 AM
Brad Nicholson
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote:
> On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> > On Thu, 13 Sep 2007, Brad Nicholson wrote:

>
> > I'd be curious to see how you've got your background writer configured to
> > see if it matches situations like this I've seen in the past. The
> > parameters controlling the all scan are the ones you'd might consider
> > turning down, definately the percentage and possibly the maxpages as well.

>
>
> bgwriter_delay = 50 # 10-10000 milliseconds between
> rounds
> bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers
> scanned/round
> bgwriter_lru_maxpages = 300 # 0-1000 buffers max
> written/round
> bgwriter_all_percent = 20 # 0-100% of all buffers
> scanned/round
> bgwriter_all_maxpages = 600 # 0-1000 buffers max
> written/round


I should add, there are 6 back ends running on this disk array
(different servers and different data partitions) with these bgwriter
settings.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 10:33 AM
Alvaro Herrera
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

Brad Nicholson wrote:
> On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote:
> > On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> > > On Thu, 13 Sep 2007, Brad Nicholson wrote:

> >
> > > I'd be curious to see how you've got your background writer configured to
> > > see if it matches situations like this I've seen in the past. The
> > > parameters controlling the all scan are the ones you'd might consider
> > > turning down, definately the percentage and possibly the maxpages as well.

> >
> >
> > bgwriter_delay = 50 # 10-10000 milliseconds between
> > rounds
> > bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers
> > scanned/round
> > bgwriter_lru_maxpages = 300 # 0-1000 buffers max
> > written/round
> > bgwriter_all_percent = 20 # 0-100% of all buffers
> > scanned/round
> > bgwriter_all_maxpages = 600 # 0-1000 buffers max
> > written/round

>
> I should add, there are 6 back ends running on this disk array
> (different servers and different data partitions) with these bgwriter
> settings.


Maybe it is running deferred triggers or something?

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 10:33 AM
Brad Nicholson
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> Since you're probably not monitoring I/O waits and similar statistics on
> how the disk array's cache is being used, whether this is happening or not
> to you won't be obvious from what the operating system is reporting.



A sysadmin looked at cache usage on the disk array. The read cache is
being used heavily, and the write cache is not.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-19-2008, 10:33 AM
Greg Smith
 
Posts: n/a
Default Re: Long Running Commits - Not Checkpoints

On Thu, 13 Sep 2007, Brad Nicholson wrote:

> A sysadmin looked at cache usage on the disk array. The read cache is
> being used heavily, and the write cache is not.


Given that information, you can take the below (which I was just about to
send before the above update came in) as something to think about and test
but perhaps not your primary line of attack. Even if my theory about the
exact mechanism involved isn't correct, the background writer is still
problematic in terms of its impact on the system when run as aggressively
as you're doing it; I'm not sure but I think that's even more true on 8.1
than it is on 8.2 where I did most my testing in this area.

> bgwriter_delay = 50
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 300
> bgwriter_all_percent = 20
> bgwriter_all_maxpages = 600


That was what I was expecting. Your all scan has the potential to be
writing 600*8K*(1/50 msec)=98MB/sec worth of data to your disk array.
Since some of this data has a random access component to it, your array
cannot be expected to keep with a real peak load; the only thing saving
you if something starts dirtying buffers as far as possible is that the
array cache is buffering things. And that 4GB worth of cache could be
filling in very little time.

Every time the all scan writes a buffer that is frequently used, that
write has a good chance that it was wasted because the block will be
modified again before checkpoint time. Your settings are beyond regular
aggressive and into the hyperactive terrority where I'd expect such
redundant writes are happening often. I'd suggest you try to move toward
dropping bgwriter_all_percent dramatically from its current setting and
see how far down you can go before it starts to introduce blocks at
checkpoint time. With bgwriter_delay set to 1/4 the default, I would
expect that even 5% would be a high setting for you. That may be a more
dramatic change than you want to make at once though, so lowering it in
that direction more slowly (perhaps drop 5% each day) and seeing whether
things improve as that happens may make more sense.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:05 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542