vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I'm having a problem with long running commits appearing in my database logs. It may be hardware related, as the problem appeared when we moved the database to a new server connected to a different disk array. The disk array is a lower class array, but still more than powerful enough to handle the IO requirements. One big difference though is that the old array had 16 GB of cache, the new one has 4 GB. Running Postgres 8.1.8 on AIX 5.3 We have enough IO to spare that we have the bgwriter cranked up pretty high, dirty buffers are getting quickly. Vmstat indicates 0 io wait time, no swapping or anything nasty like that going on. The long running commits do not line up with checkpoint times. The postgresql.conf config are identical except that wal_buffers was 8 on the old master, and it is set to 16 on the new one. We have other installations of this product running on the same array (different servers though) and they are not suffering from this problem. The only other thing of note is that the wal files sit on the same disk as the data directory. This has not changed between the old and new config, but the installs that are running fine do have their wal files on a separate partition. Any ideas where the problem could lie? Could having the wal files on the same data partition cause long running commits when there is plenty of IO to spare? -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote: > I'm having a problem with long running commits appearing in my database > logs. It may be hardware related, as the problem appeared when we moved > the database to a new server connected to a different disk array. The > disk array is a lower class array, but still more than powerful enough > to handle the IO requirements. One big difference though is that the > old array had 16 GB of cache, the new one has 4 GB. > > Running Postgres 8.1.8 on AIX 5.3 > > We have enough IO to spare that we have the bgwriter cranked up pretty > high, dirty buffers are getting quickly. Vmstat indicates 0 io wait > time, no swapping or anything nasty like that going on. > > The long running commits do not line up with checkpoint times. > > The postgresql.conf config are identical except that wal_buffers was 8 > on the old master, and it is set to 16 on the new one. > > We have other installations of this product running on the same array > (different servers though) and they are not suffering from this > problem. > > The only other thing of note is that the wal files sit on the same disk > as the data directory. This has not changed between the old and new > config, but the installs that are running fine do have their wal files > on a separate partition. > > Any ideas where the problem could lie? Could having the wal files on > the same data partition cause long running commits when there is plenty > of IO to spare? More on this - we also have long running commits on installations that do have the wal files on a separate partition. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Brad Nicholson <bnichols@ca.afilias.info> writes: > On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote: >> I'm having a problem with long running commits appearing in my database >> logs. It may be hardware related, as the problem appeared when we moved >> the database to a new server connected to a different disk array. > More on this - we also have long running commits on installations that > do have the wal files on a separate partition. What's your definition of "long running commit" --- seconds? milliseconds? Exactly what are you measuring? Can you correlate the problem with what the transaction was doing? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Thu, 2007-09-13 at 11:10 -0400, Tom Lane wrote: > Brad Nicholson <bnichols@ca.afilias.info> writes: > > On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote: > >> I'm having a problem with long running commits appearing in my database > >> logs. It may be hardware related, as the problem appeared when we moved > >> the database to a new server connected to a different disk array. > > > More on this - we also have long running commits on installations that > > do have the wal files on a separate partition. > > What's your definition of "long running commit" --- seconds? milliseconds? > Exactly what are you measuring? Can you correlate the problem with what log_min_duration is set to 150ms Commits running over that up to 788ms. Here is what we see in the logs (with obfuscated dbname, username and IP): 2007-09-13 10:01:49.787 CUT [782426] dbname username 1.2.3.171 LOG: duration: 224.286 ms statement: EXECUTE <unnamed> [PREPARE: commit] 2007-09-13 10:19:16.373 CUT [737404] dbname username 1.2.3.174 LOG: duration: 372.545 ms statement: EXECUTE <unnamed> [PREPARE: commit] 2007-09-13 10:19:24.437 CUT [1806498] dbname username 11.2.3.171 LOG: duration: 351.544 ms statement: EXECUTE <unnamed> [PREPARE: commit] 2007-09-13 10:33:11.204 CUT [962598] dbname username 1.2.3.170 LOG: duration: 504.057 ms statement: EXECUTE <unnamed> [PREPARE: commit] 2007-09-13 10:40:33.735 CUT [1282104] dbname username 1.2.3.174 LOG: duration: 250.127 ms statement: EXECUTE <unnamed> [PREPARE: commit] 2007-09-13 10:49:54.752 CUT [1188032] dbname username 1.2.3.170 LOG: duration: 382.781 ms statement: EXECUTE <unnamed> [PREPARE: commit] 2007-09-13 11:30:43.339 CUT [1589464] dbname username 1.2.3.172 LOG: duration: 408.463 ms statement: EXECUTE <unnamed> [PREPARE: commit] -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, 13 Sep 2007, Brad Nicholson wrote: > One big difference though is that the old array had 16 GB of cache, the > new one has 4 GB. > > We have enough IO to spare that we have the bgwriter cranked up pretty > high, dirty buffers are getting quickly. If your system is very active, running the bgwriter very aggressively will result in the cache on the disk array being almost continuously filled with pending writes that then trickle their way onto real disk eventually. If the typical working set on this system results in the amount of background writer cached writes regularly being >4GB but <16GB, that would explain what you're seeing. The resolution is actually unexpected by most people: you make the background writer less aggressive so that it's spewing less redundant writes clogging the array's cache, leaving more cache to buffer the actual commits so they don't block. This will increase the odds that there will be a checkpoint block instead, but if you're seeing none of them right now you may have some margin there to reduce the BGW activity without aggrevating checkpoints. Since you're probably not monitoring I/O waits and similar statistics on how the disk array's cache is being used, whether this is happening or not to you won't be obvious from what the operating system is reporting. One or two blocked writes every couple of minutes won't even show up on the gross statistics for operating system I/O waits; on average, they'll still be 0. So it's possible you may think you have plenty of I/O to spare, when in fact you don't quite have enough--what you've got is enough cache that the OS can't see where the real I/O bottleneck is. I'd be curious to see how you've got your background writer configured to see if it matches situations like this I've seen in the past. The parameters controlling the all scan are the ones you'd might consider turning down, definately the percentage and possibly the maxpages as well. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote: > On Thu, 13 Sep 2007, Brad Nicholson wrote: > I'd be curious to see how you've got your background writer configured to > see if it matches situations like this I've seen in the past. The > parameters controlling the all scan are the ones you'd might consider > turning down, definately the percentage and possibly the maxpages as well. bgwriter_delay = 50 # 10-10000 milliseconds between rounds bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers scanned/round bgwriter_lru_maxpages = 300 # 0-1000 buffers max written/round bgwriter_all_percent = 20 # 0-100% of all buffers scanned/round bgwriter_all_maxpages = 600 # 0-1000 buffers max written/round -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote: > On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote: > > On Thu, 13 Sep 2007, Brad Nicholson wrote: > > > I'd be curious to see how you've got your background writer configured to > > see if it matches situations like this I've seen in the past. The > > parameters controlling the all scan are the ones you'd might consider > > turning down, definately the percentage and possibly the maxpages as well. > > > bgwriter_delay = 50 # 10-10000 milliseconds between > rounds > bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers > scanned/round > bgwriter_lru_maxpages = 300 # 0-1000 buffers max > written/round > bgwriter_all_percent = 20 # 0-100% of all buffers > scanned/round > bgwriter_all_maxpages = 600 # 0-1000 buffers max > written/round I should add, there are 6 back ends running on this disk array (different servers and different data partitions) with these bgwriter settings. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Brad Nicholson wrote: > On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote: > > On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote: > > > On Thu, 13 Sep 2007, Brad Nicholson wrote: > > > > > I'd be curious to see how you've got your background writer configured to > > > see if it matches situations like this I've seen in the past. The > > > parameters controlling the all scan are the ones you'd might consider > > > turning down, definately the percentage and possibly the maxpages as well. > > > > > > bgwriter_delay = 50 # 10-10000 milliseconds between > > rounds > > bgwriter_lru_percent = 20.0 # 0-100% of LRU buffers > > scanned/round > > bgwriter_lru_maxpages = 300 # 0-1000 buffers max > > written/round > > bgwriter_all_percent = 20 # 0-100% of all buffers > > scanned/round > > bgwriter_all_maxpages = 600 # 0-1000 buffers max > > written/round > > I should add, there are 6 back ends running on this disk array > (different servers and different data partitions) with these bgwriter > settings. Maybe it is running deferred triggers or something? -- Alvaro Herrera http://www.advogato.org/person/alvherre "I suspect most samba developers are already technically insane... Of course, since many of them are Australians, you can't tell." (L. Torvalds) ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote: > Since you're probably not monitoring I/O waits and similar statistics on > how the disk array's cache is being used, whether this is happening or not > to you won't be obvious from what the operating system is reporting. A sysadmin looked at cache usage on the disk array. The read cache is being used heavily, and the write cache is not. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| On Thu, 13 Sep 2007, Brad Nicholson wrote: > A sysadmin looked at cache usage on the disk array. The read cache is > being used heavily, and the write cache is not. Given that information, you can take the below (which I was just about to send before the above update came in) as something to think about and test but perhaps not your primary line of attack. Even if my theory about the exact mechanism involved isn't correct, the background writer is still problematic in terms of its impact on the system when run as aggressively as you're doing it; I'm not sure but I think that's even more true on 8.1 than it is on 8.2 where I did most my testing in this area. > bgwriter_delay = 50 > bgwriter_lru_percent = 20.0 > bgwriter_lru_maxpages = 300 > bgwriter_all_percent = 20 > bgwriter_all_maxpages = 600 That was what I was expecting. Your all scan has the potential to be writing 600*8K*(1/50 msec)=98MB/sec worth of data to your disk array. Since some of this data has a random access component to it, your array cannot be expected to keep with a real peak load; the only thing saving you if something starts dirtying buffers as far as possible is that the array cache is buffering things. And that 4GB worth of cache could be filling in very little time. Every time the all scan writes a buffer that is frequently used, that write has a good chance that it was wasted because the block will be modified again before checkpoint time. Your settings are beyond regular aggressive and into the hyperactive terrority where I'd expect such redundant writes are happening often. I'd suggest you try to move toward dropping bgwriter_all_percent dramatically from its current setting and see how far down you can go before it starts to introduce blocks at checkpoint time. With bgwriter_delay set to 1/4 the default, I would expect that even 5% would be a high setting for you. That may be a more dramatic change than you want to make at once though, so lowering it in that direction more slowly (perhaps drop 5% each day) and seeing whether things improve as that happens may make more sense. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |