Long checkpoints on AIX/IDS7.31/IBM Shark
(Informix forums, Database Server Software category)
List,

We have very sporadic, very long checkpoints on our IBM RS/6000 boxes (AIX 4.3.3, IDS 7.31.UD4) connected to an IBM Shark (2105-F20), eg. 66 seconds to write away 2772 KB. Normally checkpoints are 0 - 1 seconds for 10 - 20 MB:

    $ onstat -c | grep -e BUFFERS -e LRU -e CKPTINTVL
    BUFFERS         450000  # Maximum number of shared buffers
    LRUS            16      # Number of LRU queues
    LRU_MAX_DIRTY   2       # LRU percent dirty begin cleaning limit
    LRU_MIN_DIRTY   1       # LRU percent dirty end cleaning limit
    CKPTINTVL       120     # Check point interval (in sec)

Yes, the io queues are long, this might impress you :-)

    $ onstat -g ioq | sort -nrk4 | head -10
    --
    AIO I/O queues:
    q name/id  len maxlen  totalops   dskread dskwrite dskcopy
    --
    gfd    53    0   3824  13913625  12711860  1201765       0
    gfd    51    0   3228  21124282  17794407  3329875       0
    gfd    13    0   1920  64013103  37281232 26731871       0
    gfd     9    0   1486  63746636  37051648 26694988       0
    gfd    35    0    448   7975258   5494291  2480967       0
    gfd    59    0    293 139126728 138370033   756695       0
    gfd    47    0    257  12013748   1900379 10113369       0
    gfd   113    0    255  23153575  19759065  3394510       0
    gfd   114    0    253  56520026  53125987  3394039       0
    gfd    60    0    245  88378187  87675628   702559       0

This instance uses cooked spaces and is an HDR primary. The secondary also has cooked space on the same Shark and also has long checkpoints at the same times.

Other instances on the same box use raw and don't appear to suffer from these long checkpoints.

Anyone seen this sort of behaviour before? Any help or pointers greatly appreciated. TIA,

-------------------------------------------
Willem Roos - (+27) 21 980 4941
Per sercas vi malkovri

Disclaimer
http://www.shoprite.co.za/disclaimer.html

sending to informix-list

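To put the figures in the post side by side, here is a minimal Python sketch that turns them into average write rates. It assumes 1 MB = 1024 KB and takes the least favourable reading of a "normal" checkpoint (10 MB taking a full second); the function name is only illustrative.

```python
# Rough comparison of the checkpoint figures quoted in the post.
# Assumption: 1 MB = 1024 KB. The "normal" case is a lower bound
# (10 MB in 1 second), so the real gap is probably larger.
KB_PER_MB = 1024


def throughput_kb_per_s(kilobytes: float, seconds: float) -> float:
    """Average write rate of one checkpoint in KB/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return kilobytes / seconds


slow = throughput_kb_per_s(2772, 66)             # the 66-second checkpoint
normal = throughput_kb_per_s(10 * KB_PER_MB, 1)  # lower bound for a normal one

print(f"slow checkpoint:   {slow:8.1f} KB/s")
print(f"normal checkpoint: {normal:8.1f} KB/s (at least)")
print(f"slowdown factor:   {normal / slow:8.0f}x (at least)")
```

The slow checkpoint works out to roughly 42 KB/s, a couple of hundred times slower than the normal case even on this conservative reading.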
Willem Roos wrote:
> We have very sporadic, very long checkpoints on our IBM RS/6000 boxes
> (AIX 4.3.3, IDS 7.31.UD4) connected to an IBM Shark (2105-F20), eg. 66
> seconds to write away 2772 KB. Normally checkpoints are 0 - 1 seconds
> for 10 - 20 MB:
> [...]
> This instance uses cooked spaces and is an HDR primary. The secondary
> also has cooked space on the same Shark and also has long checkpoints at
> the same times.
>
> Other instances on the same box use raw and don't appear to suffer from
> these long checkpoints.
>
> Anyone seen this sort of behaviour before? Any help or pointers greatly
> appreciated. TIA,

Impressive

Well ...

Single point of failure ... Shark ... humm.

Cooked devices show the problem, raw don't ... suggests some sort of caching being done on cooked.

How many AIO VPs have you got?

How many cleaners?

Post an onstat -g ioa / onstat -d

On Thu, 22 Apr 2004 14:29:20 +0100, "TBP@Nospam.Nothere.Co.Uk" <TBP@Nospam.Nothere.Co.Uk> wrote:

>Willem Roos wrote:
>
>> We have very sporadic, very long checkpoints on our IBM RS/6000 boxes
>> (AIX 4.3.3, IDS 7.31.UD4) connected to an IBM Shark (2105-F20), eg. 66
>> seconds to write away 2772 KB. Normally checkpoints are 0 - 1 seconds
>> for 10 - 20 MB:
>> [...]
>> Yes, the io queues are long, this might impress you :-)
>> $ onstat -g ioq | sort -nrk4 | head -10
>> --
>> AIO I/O queues:
>> q name/id  len maxlen  totalops   dskread dskwrite dskcopy
>> --
>> gfd    53    0   3824  13913625  12711860  1201765       0
>> gfd    51    0   3228  21124282  17794407  3329875       0
>> gfd    13    0   1920  64013103  37281232 26731871       0
>> gfd     9    0   1486  63746636  37051648 26694988       0
>> gfd    35    0    448   7975258   5494291  2480967       0
>> gfd    59    0    293 139126728 138370033   756695       0
>> gfd    47    0    257  12013748   1900379 10113369       0
>> gfd   113    0    255  23153575  19759065  3394510       0
>> gfd   114    0    253  56520026  53125987  3394039       0
>> gfd    60    0    245  88378187  87675628   702559       0

Rather unbalanced distribution there in terms of max queue lengths. Those first 4 certainly show an inability to complete I/Os in a reasonable amount of time, and the first two look quite bad considering the total number of writes actually done.

>> This instance uses cooked spaces and is an HDR primary. The secondary
>> also has cooked space on the same Shark and also has long checkpoints at
>> the same times.
>>
>> Other instances on the same box use raw and don't appear to suffer from
>> these long checkpoints.
>> [...]

>Impressive
>
>Well ...
>
>Single point of failure ... Shark ... humm.
>
>Cooked devices show the problem, raw don't ... suggests some sort of
>caching being done on cooked.

Have to disagree. Caching should appear to improve I/O performance, as you wouldn't actually be performing the physical I/O. The overhead imposed by the O/S in handling buffered writes should not be that great.

>How many AIO VPs have you got?

I'd say this is a good question. Remember that cooked chunks cannot take advantage of KAIO, so the AIO VPs will have to do blocking writes. If there aren't enough of them (at least 1 per chunk), your I/O to the cooked chunks is going to queue up.

However, the bit of info at the beginning of the message, "eg. 66 seconds to write away 2772 KB", would seem to indicate something else is going on: a dearth of AIO VPs would make all checkpoints slower, and more so the ones with more data to write. So it seems to me the problem lies outside of the database.

How are the Shark disks configured for the volume(s) containing the cooked files (particularly those corresponding to the top 4 in the onstat output)? Could they be on a RAID 5 volume?

>How many cleaners?
>
>Post an onstat -g ioa / onstat -d

Dave
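Dave's remark that the first two queues look bad relative to the writes actually done can be put into numbers. The sketch below is a rough illustration in Python: it parses the `onstat -g ioq` rows quoted in the thread and ranks the queues by maximum queue length per million writes. That ratio is an ad hoc yardstick made up for this example, not an Informix metric, and the `IoQueue` type and `parse_ioq` helper are hypothetical names. The field layout is taken from the header line in the quoted output.

```python
from typing import List, NamedTuple

# Rows copied from the onstat -g ioq output quoted in the thread.
SAMPLE = """\
q name/id len maxlen totalops dskread dskwrite dskcopy
gfd 53 0 3824 13913625 12711860 1201765 0
gfd 51 0 3228 21124282 17794407 3329875 0
gfd 13 0 1920 64013103 37281232 26731871 0
gfd 9 0 1486 63746636 37051648 26694988 0
gfd 35 0 448 7975258 5494291 2480967 0
gfd 59 0 293 139126728 138370033 756695 0
gfd 47 0 257 12013748 1900379 10113369 0
gfd 113 0 255 23153575 19759065 3394510 0
gfd 114 0 253 56520026 53125987 3394039 0
gfd 60 0 245 88378187 87675628 702559 0
"""


class IoQueue(NamedTuple):
    """One row of the queue listing (hypothetical helper type)."""
    name: str
    qid: int
    length: int
    maxlen: int
    totalops: int
    dskread: int
    dskwrite: int
    dskcopy: int


def parse_ioq(text: str) -> List[IoQueue]:
    """Parse data rows, skipping headers, separators and blank lines."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, *numbers = fields
        queues.append(IoQueue(name, *map(int, numbers)))
    return queues


def maxlen_per_million_writes(q: IoQueue) -> float:
    """Ad hoc ratio: how long the queue got relative to write volume."""
    if q.dskwrite == 0:
        return float("inf")
    return q.maxlen / (q.dskwrite / 1_000_000)


queues = parse_ioq(SAMPLE)
ranked = sorted(queues, key=maxlen_per_million_writes, reverse=True)

for q in ranked:
    print(f"{q.name} {q.qid:>4}  maxlen={q.maxlen:>5}  "
          f"writes={q.dskwrite:>9}  "
          f"maxlen/Mwrites={maxlen_per_million_writes(q):8.1f}")
```

On this yardstick gfd 53 and gfd 51 come out on top by a wide margin, which matches the observation in the post. Note that `maxlen` is a high-water mark since the counters were last reset, so the ratio only hints at where to look and says nothing about when the queues backed up.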