Subject: RAID5 performance
Newsgroups: comp.os.linux.setup
Date: Thu, 11 Dec 2003 09:45:02 GMT

I'm setting up a database server with an LSI Logic U320 RAID controller, using the megaraid2 driver.

So far, I've only tested simple read and write performance. I'm not particularly impressed using the defaults.

There are six Ultrastar 146Z10 drives, each capable of up to 66MB/sec sustained throughput (near track 0; it goes down to around 40MB/sec towards the last track).

With the six drives set up as RAID5 with the default 64KB stripe size, the write performance seems to be around 8 to 12MB/sec. That's horribly slow. I know writing parity will slow down writes with RAID5, but it shouldn't slow them down to a fraction of the performance of a single drive.

Read performance is better, about 150MB/sec. However, that's an average of only 25MB/sec per drive, which is less than half the transfer rate for the region of the drives that the read testing was done from.

I'm hoping someone knows offhand how best to tune this, so I don't have to spend a whole lot of time testing different stripe sizes and cache settings.

--
- Mike
Remove 'spambegone.net' and reverse to send e-mail.
| |||
Mike Ruskai <spamten.knilhtrae@begonedynnaht.net> wrote:
> I'm setting up a database server with a LSI Logic U320 RAID controller,
> using the megaraid2 driver.
[..]
> With the six drives set up as RAID5 with the default 64KB stripe size, the
> write performance seems to be around 8 to 12MB/sec. That's horribly slow. I
> know writing parity will slow down writes with RAID5, but it shouldn't
> slow them down to a fraction of the performance of a single drive.

I can't remember for sure whether it was with a megaraid.o-powered controller like this one, but the write performance I saw was even lower. The only way to get some throughput was running RAID 10 (1+0) on the controller.

I'd go for RAID1 on the system disk anyway. If this works better, I'd drop the author a mail about the problem.

--
Michael Heiming
Remove +SIGNS and www. if you expect an answer, sorry for the inconvenience, but I get tons of SPAM
| |||
"Mike Ruskai" <spamten.knilhtrae@begonedynnaht.net> said:
>There are six Ultrastar 146Z10 drives, each capable of up to 66MB/sec
>sustained throughput (near track 0 - goes down to around 40MB/sec towards
>the last track).
>
>With the six drives set up as RAID5 with the default 64KB stripe size, the
>write performance seems to be around 8 to 12MB/sec. That's horribly slow. I
>know writing parity will slow down writes with RAID5, but it shouldn't
>slow them down to a fraction of the performance of a single drive.
>
>Read performance is better, about 150MB/sec. However, that's an average
>of only 25MB/sec per drive, which is less than half the transfer rate for
>the region of the drives that the read testing was done from.

How are you measuring? This might well depend on the write size you're using.

With small writes (below the stripe size, or was it below (ndisks-1)*stripesz), the disk subsystem needs to do read-modify-calculate-write cycles to update the parity. When you write big enough chunks at a time, the read-modify part becomes obsolete, because what is being written overwrites all the old data within the stripe. There's then no need to read the old data from the stripe to calculate the new parity; the RAID controller can just calculate the new parity and write the full stripe (incl. new parity).

--
Wolf a.k.a. Juha Laiho Espoo, Finland
(GC 3.0) GIT d- s+: a C++ ULSH++++$ P++@ L+++ E- W+$@ N++ !K w !O !M V PS(+) PE Y+ PGP(+) t- 5 !X R !tv b+ !DI D G e+ h---- r+++ y++++
"...cancel my subscription to the resurrection!" (Jim Morrison)
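
To make the threshold in the reply above concrete, here is a minimal sketch of the arithmetic. It assumes that the controller's "64KB stripe size" is the per-disk stripe unit and that a write avoids the read-modify step only when it starts on a full-stripe boundary and covers whole stripes. Both are assumptions about this controller, and the function names are made up for illustration.

```python
def full_stripe_bytes(ndisks: int, stripe_unit_bytes: int) -> int:
    """Data bytes protected by one parity unit: (ndisks - 1) * stripe unit."""
    if ndisks < 3:
        raise ValueError("RAID5 needs at least three disks")
    return (ndisks - 1) * stripe_unit_bytes


def is_full_stripe_write(offset: int, length: int,
                         ndisks: int, stripe_unit_bytes: int) -> bool:
    """True if the write starts on a stripe boundary and covers whole stripes only.

    Only such writes let parity be computed from the new data alone;
    anything else touches a partial stripe and needs old data or old parity.
    """
    full = full_stripe_bytes(ndisks, stripe_unit_bytes)
    return length > 0 and offset % full == 0 and length % full == 0


if __name__ == "__main__":
    # Six disks with a 64KB unit, as in the original post (assumed per-disk unit).
    print(full_stripe_bytes(6, 64 * 1024))                 # 327680 bytes = 320KB
    print(is_full_stripe_write(0, 4096, 6, 64 * 1024))     # False: small write
    print(is_full_stripe_write(0, 327680, 6, 64 * 1024))   # True: one full stripe
```

Under these assumptions, a write would have to be an aligned multiple of 320KB to skip the read-modify step on a six-disk array.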
| |||
Juha Laiho <Juha.Laiho@iki.fi> wrote:
> "Mike Ruskai" <spamten.knilhtrae@begonedynnaht.net> said:
> >There are six Ultrastar 146Z10 drives, each capable of up to 66MB/sec
> >sustained throughput (near track 0 - goes down to around 40MB/sec towards
> >the last track).
> >
> >With the six drives set up as RAID5 with the default 64KB stripe size, the
> >write performance seems to be around 8 to 12MB/sec. That's horribly slow. I
> >know writing parity will slow down writes with RAID5, but it shouldn't
> >slow them down to a fraction of the performance of a single drive.

Eh? Won't it have to read the parity disk, read the target disk, then write the target disk, then write the parity disk (yesss, I know it's not a disk but a stripe)? That would make it four times as slow on write to start with. And that's if they do it the way I just suggested. They could read all the data disks instead of reading the parity disk :-).

> >Read performance is better, about 150MB/sec. However, that's an average

You'd expect it to be equal to the bandwidth available on your buses and controllers. How are these arranged? Are these SCSI or IDE? If IDE, then you're talking three buses, no? And you can only access one drive at a time on each bus. So I'd expect the bandwidth to be three times 60MB/s or so, which is what you are getting.

> >of only 25MB/sec per drive, which is less than half the transfer rate for
> >the region of the drives that the read testing was done from.

Eh?

> How are you measuring? This might well depend on the write size you're
> using.
>
> So, with small writes (below the stripe size - or was it below
> (ndisks-1)*stripesz), the disk subsystem needs to do read-modify-
> calculate-write cycles to update the parity. When you write big

Yes.

> enough chunks at a time, the read-modify part becomes obsolete as
> what is being written overwrites all the old data within the stripe,

I don't get that! Surely he's not rewriting the same data area again and again!

> so as there's no need to read the old data from the stripe to calculate
> the new parity, the RAID controller can just calculate the new parity
> and write the full stripe (incl. new parity).

Eh? He'd have to read the data off all disks to find the current parity without reading the parity stripe itself. I don't get the argument. Are you saying he can build up a picture in cache of the parity as he writes? That's not clear to me.

Peter
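
The operation counts discussed in this post can be tallied in a short sketch. It counts disk operations only, not elapsed time; a real controller overlaps operations and uses its cache, so this is a rough model rather than a prediction. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoCost:
    reads: int
    writes: int

    @property
    def total(self) -> int:
        return self.reads + self.writes


def read_modify_write_cost() -> IoCost:
    """Update one stripe unit: read old data and old parity, write new data and new parity."""
    return IoCost(reads=2, writes=2)


def reconstruct_write_cost(ndisks: int) -> IoCost:
    """Update one stripe unit by reading the other data units instead of the old parity."""
    return IoCost(reads=ndisks - 2, writes=2)


def full_stripe_write_cost(ndisks: int) -> IoCost:
    """Overwrite every data unit in the stripe: no reads, one write per disk."""
    return IoCost(reads=0, writes=ndisks)


if __name__ == "__main__":
    n = 6  # six drives, as in the original post
    print(read_modify_write_cost().total)            # 4 operations for one data unit
    print(reconstruct_write_cost(n).total)           # 6 operations for one data unit
    print(full_stripe_write_cost(n).total / (n - 1)) # 1.2 operations per data unit
```

By this count a small write costs four operations per data unit, which matches the "four times as slow" estimate, while a full-stripe write on six disks costs 1.2 operations per data unit.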
| |||
Subject: Re: RAID5 performance
Date: Fri, 12 Dec 2003 23:09:05 GMT

On Fri, 12 Dec 2003 20:10:14 GMT, P.T. Breuer wrote:

>> "Mike Ruskai" <spamten.knilhtrae@begonedynnaht.net> said:
>> >Read performance is better, about 150MB/sec. However, that's an average
>
>You'd expect it to be equal to the bandwidth available on your buses and
>controllers. How are these arranged? Are these scsi or IDE? If IDE, then
>you're talking three buses, no? And you can only access one at a time
>on each bus. So I'd expect the bandwidth to be three times 60MB/s or
>so, which is what you are getting.

"IDE" and "RAID" do not belong together in the same sentence.

The drives are U320 SCSI, on a single channel. At full transfer speed they would saturate the channel, but I don't expect that to happen often. Being SCSI, all drives can be read simultaneously.

>> >of only 25MB/sec per drive, which is less than half the transfer rate for
>> >the region of the drives that the read testing was done from.
>
>Eh?

Hard drives do not have the same media transfer rate from one track to the next. Track 0 is much faster than the last track, typically by about 100%. The data being read was at the beginning of the volume, meaning it should have been stored physically near track 0 on each drive.

>I don't get that! Surely he's not rewriting the same data area again
>and again!

The (very) simple test involved copying a 1.7GB file to the volume after it had been read completely and still resided entirely in the read buffer of the source drive (the box has 4GB of RAM), leaving the read speed of that drive out of the equation.

Not a scientific test, to be sure, but rough and ready.

--
- Mike
Remove 'spambegone.net' and reverse to send e-mail.
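
A rough-and-ready test like the one described can be made repeatable with a few lines of script. The sketch below is not the test that was actually run: it writes a buffer of zeros in fixed-size blocks, flushes to disk with `os.fsync`, and reports MB/sec with 1MB taken as 10^6 bytes. The function name, the target path and the sizes are placeholders chosen for illustration.

```python
import os
import time


def write_throughput_mb_s(path: str, total_bytes: int, block_bytes: int) -> float:
    """Write total_bytes to path in block_bytes chunks and return MB/sec.

    The data comes from memory, so no source drive is involved. fsync is
    called before the clock stops so the OS page cache cannot hide the
    cost of the write.
    """
    block = b"\0" * block_bytes
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_bytes, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Placeholder path; point it at a file on the volume under test.
    target = "throughput_test.bin"
    for block in (4 * 1024, 64 * 1024, 320 * 1024):
        rate = write_throughput_mb_s(target, 256 * 1024 * 1024, block)
        print(f"{block // 1024:4d}KB blocks: {rate:.1f} MB/sec")
    os.remove(target)
```

Running it with several block sizes shows whether write size matters. The controller's own write cache can still absorb part of the burst, so the total written should be well above the cache size.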
| |||
Mike Ruskai <spamten.knilhtrae@begonedynnaht.net> wrote:
> >I don't get that! Surely he's not rewriting the same data area again
> >and again!
>
> The (very) simple test involved copying a 1.7GB file to the volume, after
> it had been read completely and still resided completely in the read
> buffer of the source drive (the box has 4GB of RAM), leaving the read

Oh, I see. I imagined the whole file could not be cached in RAM (I've never had my hands on as much RAM as that!) given the figure of 1.7GB, hence my puzzlement.

> speed of that drive out of the equation.

Peter
| |||
On Thu, 11 Dec 2003 09:45:02 +0000, Mike Ruskai wrote:

> I'm setting up a database server with a LSI Logic U320 RAID controller,
> using the megaraid2 driver.
>
> So far, I've only tested simple read and write performance. I'm not
> particularly impressed using the defaults.
>
> There are six Ultrastar 146Z10 drives, each capable of up to 66MB/sec
> sustained throughput (near track 0 - goes down to around 40MB/sec towards
> the last track).
>
> With the six drives set up as RAID5 with the default 64KB stripe size, the
> write performance seems to be around 8 to 12MB/sec. That's horribly slow. I
> know writing parity will slow down writes with RAID5, but it shouldn't
> slow them down to a fraction of the performance of a single drive.
>
> Read performance is better, about 150MB/sec. However, that's an average
> of only 25MB/sec per drive, which is less than half the transfer rate for
> the region of the drives that the read testing was done from.
>
> I'm hoping someone knows off hand how best to tune this, so I don't have
> to spend a whole lot of time testing different stripe sizes and cache
> settings.

This may be too late for you, but here are my recommendations:

1. Do not use RAID 5; its write performance is slow. Use RAID 10 if your controller supports it (it probably does). At work our software is basically a database application; on RAID 5 it crawls, but on RAID 10 it is much, much better.

2. Use a smaller stripe size. 16KB would be good; this is what we have found to work best. If we are setting up the server for our clients on UNIX/Linux, we go with 8KB or 16KB for the stripe.

Don
| ||||
ptb@oboe.it.uc3m.es (P.T. Breuer) said:
>Juha Laiho <Juha.Laiho@iki.fi> wrote:
>> So, with small writes (below the stripe size - or was it below
>> (ndisks-1)*stripesz), the disk subsystem needs to do read-modify-
>> calculate-write cycles to update the parity. When you write big
>
>Yes.
>
>> enough chunks at a time, the read-modify part becomes obsolete as
>> what is being written overwrites all the old data within the stripe,
>
>I don't get that! Surely he's not rewriting the same data area again
>and again!

No, not the same. But all data on RAID5 is arranged in stripes, so there's a set of data stripes corresponding to a single parity stripe. Whenever your write is so large that it completely overwrites a complete set of data stripes corresponding to a single parity stripe, the parity can be calculated purely from the written data, so the old parity (and the rest of the old stripe) does not need to be read from the disks.

--
Wolf a.k.a. Juha Laiho Espoo, Finland
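
The point about calculating parity purely from the written data can be illustrated with XOR parity, which is what RAID5 conventionally uses. This is a toy sketch, not the controller's firmware: the function name and the 4-byte stripe unit are made up for readability, and the payload is assumed to be exactly one full stripe.

```python
from functools import reduce
from operator import xor


def parity_for_full_stripe(payload: bytes, ndisks: int,
                           unit: int) -> tuple[list[bytes], bytes]:
    """Split one full stripe of new data into units and compute its parity.

    Nothing is read from disk: the parity unit is the byte-wise XOR of the
    (ndisks - 1) data units contained in the payload itself.
    """
    data_units = ndisks - 1
    if len(payload) != data_units * unit:
        raise ValueError("payload must be exactly one full stripe")
    units = [payload[i * unit:(i + 1) * unit] for i in range(data_units)]
    parity = bytes(reduce(xor, column) for column in zip(*units))
    return units, parity


if __name__ == "__main__":
    # Six disks, toy 4-byte stripe unit: 20 bytes of new data, 4 bytes of parity.
    units, parity = parity_for_full_stripe(bytes(range(20)), ndisks=6, unit=4)
    print(len(units), list(parity))  # 5 [16, 17, 18, 19]
```

Because the payload replaces every data unit in the stripe, the old contents never enter the calculation. That is why no read is needed in this case.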
|
|