RE: Long checkpoints on AIX/IDS7.31/IBM Shark (Informix forums, Database Server Software category)
Thank you TBP, Dave & Alex for your suggestions. I have duly increased my LRUS to 128, and CLEANERS & NUMAIOVPS as well, after looking at onstat -g iov and checking the io/wup column.

We've found the culprit, which affects all hosts connected to the same Shark (and there are quite a few): it was a SQLServer box doing a large delete. How the Shark can misbehave so badly because of one particular operation (i.e. a SQLServer delete; it's not volume related, the SAN is quiet and the reports on the Shark don't show any high throughput) is beyond me, and apparently beyond local support as well, since we've struggled with this since November last year and have blamed everything from the usual firmware thing to backups and whatnot (I even found some new entries for my BOFH DIY excuse board).

Sooo, you do something quite innocent on one box and you bring all critical systems in your entire enterprise to a grinding halt, and worse, you're not even able to tell. Now isn't that what mass storage is all about :-...

Kind regards,
-------------------------------------------
Willem Roos - (+27) 21 980 4941
Per sercas vi malkovri
Disclaimer http://www.shoprite.co.za/disclaimer.html

sending to informix-list
Willem Roos wrote:
[ ... SNIP ... ]
>
> We've found the culprit which affects all hosts connected to the same
> Shark (and there are quite a few) - it was a SQLServer box doing a large
> delete. How come the Shark misbehaves so badly due to one particular
> operation (ie. SQLServer delete, it's not volume related, the SAN is
> quiet and the reports on the Shark don't show any high throughput) is
> beyond me - and apparently beyond local support as well since we've
[ ... SNIP ... ]
> Sooo, you do something quite innocent on one box and you bring all
> critical systems in your entire enterprise to a grinding halt, and
> worse, you're not even able to tell. Now isn't that what mass storage is
> all about :-...

* caution: sarcasm intermixed, as I still do not have the guts to construct my monthly NO SAN with RAID5 posting *

I'd say this is what RAID5 in today's SANs is all about. To see this type of behavior it is not necessary at all to use virtual or consolidated storage.

The most expensive operation on a RAID5 is to do many small writes after reads, as a delete tends to do. You will actually see only a few MB/second and *think* this is not much. BUT: if the unit size or page size of the database software doing the deletes is 4 KB or even 8 KB, and every page holds some 50 records to be deleted, the SAN must fetch 4 KB worth of data from the disk (or must already have it in the SAN buffer) to do each delete, and the page must be written back to buffer or disk 50 times. In *all* the cases I know of (and this is at least 1 dozen or 3), the SAN buffer layout is way too small for all the servers consolidated into 1 box and for all the disks you need.

Let us assume now that the records to be deleted are nicely spread over tons of 4 KB pages with respect to the sequence of deletion.
The only way to see from outside what happens in the up- and downstaging inside the SAN is to have monitoring tools there; the only way around it is to be able to delete in the very same sequence as the records are packed into pages. Otherwise you will not notice much traffic, neither in the OS tools nor on the fabric switches, because the deletes are slow but cause a lot of action inside the SAN box, and exactly what you describe will happen on every RAID5, be it standalone or in a SAN.

Search in this newsgroup for 'NO RAID5' postings/rants from Art Kagel and you will read about more nasty lil details that come along with RAID5.

dic_k
--
Richard Kofler
SOLID STATE EDV Dienstleistungen GmbH
Vienna/Austria/Europe
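
To put rough numbers on the small-write argument above, here is a minimal back-of-the-envelope sketch in Python. It is not a model of the Shark itself: the delete rate, the 4 KB page size, the 50 records per page and the fraction of writes absorbed by the array cache are all assumed inputs, and the function names are made up for this illustration. The only fixed figure is the classic RAID5 read-modify-write cost of four disk I/Os per small write (read old data, read old parity, write new data, write new parity).

```python
# Rough model of the RAID5 small-write penalty discussed in the thread.
# All inputs are illustrative assumptions, not measurements from any array.

# Classic RAID5 read-modify-write: read old data, read old parity,
# write new data, write new parity.
RAID5_IOS_PER_SMALL_WRITE = 4


def backend_load(deletes_per_sec, page_kb=4, absorbed_by_cache=0.0):
    """Return (host_write_mb_per_sec, backend_disk_ios_per_sec).

    Assumes each deleted record forces one page-sized write from the host,
    and that a fraction `absorbed_by_cache` of those writes never reaches
    the disks because the array cache coalesces them.
    """
    host_mb = deletes_per_sec * page_kb / 1024
    destaged_writes = deletes_per_sec * (1.0 - absorbed_by_cache)
    backend_ios = destaged_writes * RAID5_IOS_PER_SMALL_WRITE
    return host_mb, backend_ios


def page_rewrites(records_per_page, deletes_in_page_order):
    """Writes needed to delete every record on one page.

    Simplification: deleting in page order lets the page be written once;
    scattered deletes rewrite the page once per record.
    """
    return 1 if deletes_in_page_order else records_per_page


if __name__ == "__main__":
    host_mb, ios = backend_load(deletes_per_sec=500)
    print(f"host sees {host_mb:.2f} MB/s, disks see {ios:.0f} I/Os per second")
    print("rewrites per page, scattered deletes:", page_rewrites(50, False))
    print("rewrites per page, deletes in page order:", page_rewrites(50, True))
```

With these assumed inputs the host-side write rate stays at a couple of MB/s, which looks harmless in OS tools or on the fabric, while the disks behind the RAID5 handle thousands of I/Os per second. That is the gap between "quiet SAN" and "busy box" described in the post.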