* mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Ralf Gross @ 2007-09-23 9:38 UTC
To: linux-xfs

Hi,

we have a new large raid array; the shelf has 48 disks, and the max. number of disks in a single RAID 5 set is 16. There will be one global spare disk, so we have two RAID 5 sets with 15 data disks and one with 14 data disks.

The data on these raid sets will be video data plus some metadata. Typically each set of data consists of a 2 GB + 500 MB + 100 MB + 20 KB + 2 KB file. There will be some dozen of these sets in a single directory - but not many hundred or thousand.

Often the data will be transferred from the Windows clients to the server in some parallel copy jobs at night (e.g. 5-10, for each new data directory). The clients will access the data later (mostly) read-only; the data will not be changed after it was stored on the file server. Each client then needs a data stream of about 17 MB/s (max. 5 clients are expected to access the data in parallel).

I expect the filesystems, each with a size of 10-11 TB, to be filled > 90%. I know this is not ideal, but we need every GB we can get.

I already played with different mkfs.xfs options (sw, su) but didn't see much of a difference. The volume sets of the hw raid have the following parameters:

11,xx TB (15 data disks):
Chunk Size : 64 KB (values of 64/128/256 KB are possible, I'll try 256 KB next week)
Stripe Size : 960 KB (15 x 64 KB)

or

10,xx TB (14 data disks):
Chunk Size : 64 KB
Stripe Size : 896 KB (14 x 64 KB)

The created logical volumes have a block size of 512 bytes (the only possible value).

Any ideas what options I should use for mkfs.xfs? At the moment I get about 150 MB/s in seq. writing (tiobench) and 160 MB/s in seq. reading. This is ok, but I'm curious what I could get with tuned xfs parameters.

The system is running Debian etch (amd64) with 16 GB of RAM. The raid array is connected to the server by fibre channel.

Ralf

^ permalink raw reply [flat|nested] 48+ messages in thread
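[Editor's sketch: the stripe geometry above maps onto mkfs.xfs's su/sw options roughly as follows. The arithmetic uses the numbers from the post; the mkfs.xfs line is left commented out because it is destructive, and /dev/sdd is only a placeholder device name.]

```shell
# Stripe geometry for the 15-data-disk RAID 5 set described above.
CHUNK_KB=64        # per-disk chunk size reported by the controller
DATA_DISKS=15      # data disks in the set (parity excluded)
STRIPE_KB=$((CHUNK_KB * DATA_DISKS))
echo "stripe unit: ${CHUNK_KB} KB, stripe width: ${DATA_DISKS} disks, full stripe: ${STRIPE_KB} KB"

# Matching mkfs.xfs alignment options (destructive -- do not run on a live device):
# mkfs.xfs -d su=${CHUNK_KB}k,sw=${DATA_DISKS} /dev/sdd   # device name is hypothetical
```

With the 256 KB chunk mentioned in the post, the same formula gives su=256k and a 3840 KB full stripe.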
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Peter Grandi @ 2007-09-23 12:56 UTC
To: Linux XFS

>>> On Sun, 23 Sep 2007 11:38:41 +0200, Ralf Gross
>>> <Ralf-Lists@ralfgross.de> said:

Ralf> Hi, we have a new large raid array, the shelf has 48 disks,
Ralf> the max. amount of disks in a single raid 5 set is 16.

Too bad about that petty limitation ;-).

Ralf> There will be one global spare disk, thus we have two raid 5
Ralf> with 15 data disks and one with 14 data disks.

Ahhh, a positive-thinking, can-do, brave design ;-).

[ ... ]

Ralf> Often the data will be transferred from the windows clients
Ralf> to the server in some parallel copy jobs at night (eg. 5-10,
Ralf> for each new data directory). The clients will access the
Ralf> data later (mostly) read only, the data will not be changed
Ralf> after it was stored on the file server.

This is good, and perhaps one of the few cases in which even RAID5 naysayers might not object too much.

Ralf> Each client then needs a data stream of about 17 MB/s
Ralf> (max. 5 clients are expected to access the data in parallel).

Do the requirements include as features some (possibly several) hours of ''challenging'' read performance if any disk fails, or total loss of data if another disk fails during that time? ;-)

IIRC Google have reported 5% per year disk failure rates across a very wide, mostly uncorrelated population; you have 48 disks, so perhaps 2-3 disks per year will fail. Perhaps more, and more often, because they will likely all be from the same manufacturer, model and batch, spinning in the same environment.

Ralf> [ ... ] I expect the fs, each will have a size of 10-11 TB,
Ralf> to be filled > 90%. I know this is not ideal, but we need
Ralf> every GB we can get.

That "every GB we can get" is often the key in ''wide RAID5'' stories. Cheap as well as fast and safe, you can have it all with wide RAID5 setups, or so the salesmen would say ;-).

Ralf> [ ... ] Stripe Size : 960 KB (15 x 64 KB)
Ralf> [ ... ] Stripe Size : 896 KB (14 x 64 KB)

Pretty long stripes; I wonder what happens when a whole stripe cannot be written at once, or it can but is not naturally aligned ;-).

Ralf> [ ... ] about 150 MB/s in seq. writing

Surprise surprise ;-).

Ralf> (tiobench) and 160 MB/s in seq. reading.

This is sort of low. If there's something that RAID5 can do sort of OK, it is reads (if there are no faults). I'd look at the underlying storage system and the maximum performance that you can get out of a single disk.

I have seen a 45-drive 500GB storage subsystem where each drive can deliver at most 7-10MB/s (even though the same disk standalone in an ordinary PC can do 60-70MB/s), and the supplier actually claims so in their published literature (that RAID product is meant to compete *only* with tape backup subsystems). Your later comment that "The raid array is connect to the server by fibre channel" makes me suspect that it may be the same brand.

Ralf> This is ok,

As the total aggregate requirement is 5x17MB/s, this is probably the case [as long as there are no drive failures ;-)].

Ralf> but I'm curious what I could get with tuned xfs parameters.

Looking at the archives of this mailing list, the topic ''good mkfs parameters'' reappears frequently, even if usually for smaller arrays, as many have yet to discover the benefits of 15-wide RAID5 setups ;-). Threads like these may help:

http://OSS.SGI.com/archives/xfs/2007-01/msg00079.html
http://OSS.SGI.com/archives/xfs/2007-05/msg00051.html
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Ralf Gross @ 2007-09-26 14:54 UTC
To: linux-xfs

Peter Grandi schrieb:
> Ralf> Hi, we have a new large raid array, the shelf has 48 disks,
> Ralf> the max. amount of disks in a single raid 5 set is 16.
>
> Too bad about that petty limitation ;-).

Yeah, I'd prefer 24x RAID 5 without spare. Why waste so much space ;)

After talking to the people who own the data and wanted to use as much of the device's space as possible, we'll start with four 12/11-disk RAID 6 volumes (47 disks + 1 spare). That's ~12% less space than before with five RAID 5 volumes. I think this is a good compromise between safety and max. usable disk space.

There's only one point left: will the RAID 6 be able to deliver 2-3 streams of 17 MB/s during a rebuild? Write performance is not the issue then, but the clients will be running simulations for up to 5 days and need this (more or less) constant data rate. Now that I'm getting ~400 MB/s (which is limited by the FC controller), this should be possible.

> Ralf> There will be one global spare disk, thus we have two raid 5
> Ralf> with 15 data disks and one with 14 data disks.
>
> Ahhh a positive-thinking, can-do, brave design ;-).

We have a 60-slot tape lib too (well, we'll have it next week...I hope). I know that raid != backup.

> Ralf> Each client then needs a data stream of about 17 MB/s
> Ralf> (max. 5 clients are expected to access the data in parallel).
>
> Do the requirements include as features some (possibly several)
> hours of ''challenging'' read performance if any disk fails or
> total loss of data if another disk fails during that time? ;-)

The data then is still on USB disks and on tape. Maybe I'll pull a disk out of one of the new RAID 6 volumes and see how much the read performance drops. At the moment only one test bed is active, so 17 MB/s would be ok. Later, with 5 test beds, 5 x 17 MB/s are needed (if they are online at the same time).

> IIRC Google have reported 5% per year disk failure rates across a
> very wide mostly uncorrelated population, you have 48 disks,
> perhaps 2-3 disks per year will fail. [ ... ]

Hey, these are ENTERPRISE disks ;) As far as I know, we couldn't even use other disks than the ones the manufacturer provides (modified firmware?).

> That "every GB we can get" is often the key in ''wide RAID5''
> stories. [ ... ]

I think we have now found a reasonable solution.

> Ralf> [ ... ] Stripe Size : 960 KB (15 x 64 KB)
> Ralf> [ ... ] Stripe Size : 896 KB (14 x 64 KB)
>
> Pretty long stripes, I wonder what happens when a whole stripe
> cannot be written at once or it can but is not naturally aligned
> ;-).

I'm still confused by the chunk/stripe and block size values. The block size of the HW RAID is fixed at 512 bytes; I think that's a bit small.

Also, I first thought larger chunk/stripe sizes (HW RAID) would waste disk space, but as the OS/FS doesn't necessarily know about these values, that can't be true - unlike the FS block size, which defines the smallest possible file size.

> I'd look at the underlying
> storage system and the maximum performance that you can get out of
> a single disk.

/sbin/blockdev --setra 16384 /dev/sdc

was the key to ~400 MB/s read performance.

> Looking at the archives of this mailing list the topic ''good mkfs
> parameters'' reappears frequently [ ... ] Threads like these may help:
>
> http://OSS.SGI.com/archives/xfs/2007-01/msg00079.html
> http://OSS.SGI.com/archives/xfs/2007-05/msg00051.html

I've seen some of JP's postings before. I couldn't get much more performance with the sw/su options; I got the best results with the default values. But I haven't tried external logs yet.

Ralf
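[Editor's sketch: the --setra value above is in 512-byte sectors, so the quoted setting amounts to an 8 MiB readahead window. Only the unit arithmetic is run here; the blockdev lines are shown commented out since they need root and a real device (/dev/sdc as in the post).]

```shell
SECTORS=16384                     # value passed to blockdev --setra in the post
RA_BYTES=$((SECTORS * 512))       # --setra counts 512-byte sectors
echo "readahead window: $((RA_BYTES / 1024 / 1024)) MiB"

# To inspect/apply on a real device (needs root):
# blockdev --getra /dev/sdc
# blockdev --setra 16384 /dev/sdc
```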
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Justin Piszcz @ 2007-09-26 16:27 UTC
To: Ralf Gross; +Cc: linux-xfs

On Wed, 26 Sep 2007, Ralf Gross wrote:

[ ... ]

> /sbin/blockdev --setra 16384 /dev/sdc
>
> was the key to ~400 MB/s read performance.

Nice, what do you get for write speed?

Justin.
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Ralf Gross @ 2007-09-26 16:54 UTC
To: linux-xfs

Justin Piszcz schrieb:
> >
> > /sbin/blockdev --setra 16384 /dev/sdc
> >
> > was the key to ~400 MB/s read performance.
>
> Nice, what do you get for write speed?

Still 170-200 MB/s. The command above just tunes the read-ahead value.

Ralf
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Justin Piszcz @ 2007-09-26 16:59 UTC
To: Ralf Gross; +Cc: linux-xfs

On Wed, 26 Sep 2007, Ralf Gross wrote:
> Still 170-200 MB/s. The command above just tunes the read-ahead value.

Yes, I understand; what is the equivalent tweak for HW RAID? I have tried to tweak some HW RAIDs (3ware 9550SX's) with ~10 drives, and one can set the read-ahead for better reads, but writes are still slow. That 3ware 'tuning' doc always gets passed around, but it never helps much, at least in my testing. I wonder where the bottleneck lies.

Justin.
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Bryan J. Smith @ 2007-09-26 17:38 UTC
To: Justin Piszcz, Ralf Gross; +Cc: linux-xfs

Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I wonder where the bottleneck lies.

The microcontroller.

Listen, for the last time: hardware RAID is _not_ for non-blocking I/O. Hardware RAID is for in-line XOR streaming off-load, so it doesn't tie up the system interconnect (not an ideal use for it). A hardware RAID card is for when you have other things going on in your interconnect that you don't want the parity LOAD-XOR-STORE to take away from what the interconnect could be doing for the service.

It will _never_ have the "raw performance" of OS-optimized software RAID. At the same time, OS-optimized software RAID's impact on the system interconnect is one of those "unmeasurable" details _unless_ you actually benchmark your application. I have repeatedly had issues with elementary UDP/IP NFS performance when the PIO of software RAID is hogging the system interconnect. Same deal for large numbers of large database record commits.

--
Bryan J. Smith   Professional, Technical Annoyance
b.j.smith@ieee.org   http://thebs413.blogspot.com
--------------------------------------------------
Fission Power: An Inconvenient Solution
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Justin Piszcz @ 2007-09-26 17:41 UTC
To: b.j.smith; +Cc: Ralf Gross, linux-xfs

On Wed, 26 Sep 2007, Bryan J. Smith wrote:
> Listen, for the last time, hardware RAID is _not_ for non-blocking
> I/O. Hardware RAID is for in-line XOR streaming off-load, so it
> doesn't tie up a system interconnect.

I agree, and this makes sense, but real-world loads make me wonder, at least with the 2.4 kernel. I see hosts where no pure streaming takes place; instead, lots of little files are copied on and off a host, and with the 2.4 kernel (RHEL3) the system 'feels' as if it were buried even though the load is not that high, ~9-15. This is using ext3 on a 9- or 10-disk RAID5 with default RAID parameters on a 3ware 9550SX card.

> It will _never_ have the "raw performance" of OS optimized software
> RAID. [ ... ]
> I have repeatedly had issues with elementary UDP/IP NFS performance
> when the PIO of software RAID is hogging the system interconnect.
> Same deal for large numbers of large database record commits.

Understood.

Justin.
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Bryan J. Smith @ 2007-09-26 17:55 UTC
To: Justin Piszcz, b.j.smith; +Cc: Ralf Gross, linux-xfs

Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I agree and this makes sense but in real-world loads it makes me
> wonder, at least with the 2.4 kernel.

It took 3Ware (actually AMCC) a good 18 months to get the firmware and driver tuned well; I purposely avoided any new 3Ware solution until a good 12-18 months after release because of that. The 9550SX series was the first microcontroller approach (PowerPC 400 series) done by 3Ware. All of their prior designs were an older 64-bit ASIC design with SRAM (and only DRAM slapped on, poorly, in the 9500S), which only worked well for RAID-0/1/10, not 5. That was well into the 2.6 era. I'd say you're well out of date with what 3Ware, let alone the Intel X-Scale-based Areca, are actually capable of with RAID-5/6 now.

-- Bryan

P.S. Both AMD and Intel are currently putting serious R&D into the first embedded x86 designs with added ASICs for network, storage, etc. I.e., this is going to be mainstream shortly, as AMD got out of the 29000 long ago, and Intel is putting less and less focus on the IOP33x/34x X-Scale. NPE, SPE and other units can literally handle DTRs that are 10x what a general CPU/interconnect LOAD-op-STORE can do. Don't be surprised when your 2009+ server mainboard ICH is actually an embedded x86 processor with NPE and SPE units. That will finally remove the whole "separate card" issue in general.
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Bryan J. Smith @ 2007-09-26 17:13 UTC
To: Ralf Gross, linux-xfs

Ralf Gross <Ralf-Lists@ralfgross.de> wrote:
> Still 170-200 MB/s. The command above just tunes the read ahead
> value.

Don't expect your commits to an external subsystem to be anywhere near as fast as software RAID in simple disk benchmarks.

-- Bryan
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Justin Piszcz @ 2007-09-26 17:27 UTC
To: b.j.smith; +Cc: Ralf Gross, linux-xfs

On Wed, 26 Sep 2007, Bryan J. Smith wrote:
> Don't expect your commits to an external subsystem to be anywhere
> near as fast as software RAID in simple disk benchmarks.

So what tunables do the 9550/9650SE users utilize to achieve > 500 MiB/s write on HW RAID5/6?

Justin.
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Bryan J. Smith @ 2007-09-26 17:35 UTC
To: Justin Piszcz, b.j.smith; +Cc: Ralf Gross, linux-xfs

Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> So what tunables do the 9550/9650SE users utilize to achieve > 500
> MiB/s write on HW RAID5/6?

Don't know, but I've never claimed it was capable of it either.

At the same time, I've seen software RAID do over 500MBps, only to drop to under 50MBps aggregate client DTR under load.

-- Bryan
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Justin Piszcz @ 2007-09-26 17:37 UTC
To: Bryan J. Smith; +Cc: Ralf Gross, linux-xfs

On Wed, 26 Sep 2007, Bryan J. Smith wrote:
> Don't know. But I've never claimed it was capable of it either.
>
> At the same time, I've seen software RAID do over 500MBps, only to
> drop to under 50MBps aggregate client DTR under load.

Do you have any type of benchmarks to similate the load you are mentioning? What did HW RAID drop to when the same test was run with SW RAID / 50 MBps under load? Did it achieve better performance due to an on-board / raid-card controller cache, or?

Justin.
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Justin Piszcz @ 2007-09-26 17:38 UTC
To: Bryan J. Smith; +Cc: Ralf Gross, linux-xfs

On Wed, 26 Sep 2007, Justin Piszcz wrote:
> Do you have any type of benchmarks to similate the load you are
> mentioning?

simulate* rather.
* Re: [UNSURE] Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: Bryan J. Smith @ 2007-09-26 17:49 UTC
To: Justin Piszcz, Bryan J. Smith; +Cc: Ralf Gross, linux-xfs

Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Do you have any type of benchmarks to similate the load you are
> mentioning?

Yes: write different, non-zero, 100GB data files from 30 NFSv3 sync clients at the same time. You can easily script firing that off and get the number of seconds it takes to commit. Use NFS with UDP to avoid the overhead of TCP.

> What did HW RAID drop to when the same test was run with SW
> RAID / 50 MBps under load?

I saw an aggregate commit average of around 150MBps using a pair of 8-channel 3Ware Escalade 9550SX cards (each on its own PCI-X bus), with an LVM stripe between them. Understand that the test literally took 5 hours to run! The software RAID-50 -- two "dumb" 8-channel Marvell SATA cards (each on its own PCI-X bus), with an LVM stripe between them -- was not complete after 15 hours (overnight), so I finally terminated it. Each system had a 4x GbE trunk to a layer-3 switch. I would have run the same test with SMB over TCP/IP, possibly with a LeWiz 4x GbE RX TOE HBA, except I honestly didn't have the time to wait on it.

> Did it achieve better performance due to an on-board /
> raid-card controller cache, or?

It has nothing to do with cache. The OS is far better at scheduling and buffering in system RAM, in addition to the fact that it does an async buffer, whereas many HW RAID drivers are sync to the NVRAM of the HW RAID card (that's part of the problem with comparisons). It has to do with the fact that in software RAID-5 you are streaming 100% of the data through the general system interconnect for the LOAD-XOR-STORE operation. XORs are extremely fast; LOAD/STORE through a general-purpose CPU is not.

It's the same reason why we don't use general-purpose CPUs for layer-3 switches either, but a "core" CPU with NPE (network processor engine) ASICs. Same deal with most HW RAID cards: a "core" CPU with SPE ASICs -- for off-load from the general CPU system interconnect. XORs are done "in-line" with the transfer, instead of hogging up the system interconnect. It's the direct difference between PIO and DMA: an in-line NPE/SPE ASIC basically acts like a DMA transfer, real-time. A general-purpose CPU and its interconnect cannot do that, so it has all the issues of PIO. PIO in a general-purpose CPU is to be avoided at all costs when you have other needs for the system interconnect, like I/O. If you don't have much else bothering the I/O, like in a web server or a read-only system (where you're not doing the writes), then it doesn't matter, and software RAID-5 is great.

-- Bryan
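[Editor's sketch: the parallel-commit benchmark described above, scaled way down so it can be run locally -- background `dd` writers standing in for the 30 NFS clients, small temp files standing in for the 100 GB data files. Client count, file sizes and paths are placeholders, not the original setup.]

```shell
DIR=$(mktemp -d)   # scratch directory standing in for the NFS export
CLIENTS=4          # stand-in for the 30 NFSv3 clients
MB_EACH=8          # stand-in for the 100 GB data files
start=$(date +%s)
for i in $(seq 1 "$CLIENTS"); do
    # each background dd plays one client committing its file
    dd if=/dev/zero of="$DIR/client$i.dat" bs=1M count="$MB_EACH" conv=fsync 2>/dev/null &
done
wait               # aggregate wall time covers all writers, as in the original test
end=$(date +%s)
echo "wrote $((CLIENTS * MB_EACH)) MB total in $((end - start)) s"
```

Run once against the hardware-RAID volume and once against the software-RAID volume; the wall-clock difference is the "seconds to commit" figure the post talks about.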
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-26 14:54 ` Ralf Gross 2007-09-26 16:27 ` [UNSURE] " Justin Piszcz @ 2007-09-27 15:22 ` Ralf Gross 1 sibling, 0 replies; 48+ messages in thread From: Ralf Gross @ 2007-09-27 15:22 UTC (permalink / raw) To: linux-xfs Ralf Gross schrieb: > Peter Grandi schrieb: > > Ralf> Hi, we have a new large raid array, the shelf has 48 disks, > > Ralf> the max. amount of disks in a single raid 5 set is 16. > > > > Too bad about that petty limitation ;-). > > Yeah, I prefer 24x RAID 5 without spare. Why waste so much space ;) > > After talking to the people that own the data and wanted to use as > much as possible space of the device, we'll start with four 12/11 > disk RAID 6 volumes (47 disk + 1 spare). That's ~12% less space than > before with five RAID 5 volumes. I think this is a good compromise > between safety and max. usable disk space. Ok, the init of the new 12 disk RAID 6 volume is complete. The numbers I get now are a bit disappointing: ~210 MB/s for read and ~110 MB/s for write. I know that RAID 6 is slower than RAID 5, and that fewer data disks (10 instead of 15) also slow things down. But going from 390 MB/s read performance down to 220 MB/s is a bit surprising, particularly because the RAID 5 read performance was limited by the FC (I think). I thought I would still get about 2/3 of the RAID 5 read throughput, given the 5 fewer data disks of the RAID 6. I have to test this again with a larger chunk size (256k); we'll see how much that affects read/write performance. Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
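The expectation in the message above can be checked with naive arithmetic: scale read throughput by the data-disk count. This ignores the controller, chunk size, and FC effects, so treat it only as a rough bound:

```shell
# Naive scaling of read throughput with data-disk count
# (390 MB/s and the 15-vs-10 disk counts are from the thread).
RAID5_MBS=390
R5_DATA=15
R6_DATA=10
EST=$(( RAID5_MBS * R6_DATA / R5_DATA ))
echo "naive RAID 6 read estimate: ${EST} MB/s"   # prints 260 MB/s
```

That predicts ~260 MB/s, so the measured ~210 MB/s is low but not wildly off the naive model; the rest may be RAID 6 overhead or chunk-size effects.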
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-23 9:38 mkfs options for a 16x hw raid5 and xfs (mostly large files) Ralf Gross 2007-09-23 12:56 ` Peter Grandi @ 2007-09-24 17:31 ` Ralf Gross 2007-09-24 18:01 ` Justin Piszcz 1 sibling, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-24 17:31 UTC (permalink / raw) To: linux-xfs Ralf Gross schrieb: > > we have a new large raid array, the shelf has 48 disks, the max. > amount of disks in a single raid 5 set is 16. There will be one global > spare disk, thus we have two raid 5 with 15 data disks and one with 14 > data disk. > > The data on these raid sets will be video data + some meta data. > Typically each set of data consist of a 2 GB + 500 MB + 100 MB + 20 KB > +2 KB file. There will be some dozen of these sets in a single > directory - but not many hundred or thousend. > ... > I already played with different mkfs.xfs options (sw, su) but didn't > see much of a difference. > > The volume sets of the hw raid have the following parameters: > > 11,xx TB (15 data disks): > Chunk Size : 64 KB > (values of 64/128/256 KB are possible, I'll try 256 KB next week) > Stripe Size : 960 KB (15 x 64 KB) > ... I did some more benchmarks with the 64KB/256KB chunk size option of the RAID array and the 64K/256K su option for mkfs.xfs. 4 tests: two RAID 5 volumes (sdd + sdh, both in the same 48 disk shelf), each with 15 data disks + 1 parity, 750 GB SATA disks 1. 256KB chunk size (HW RAID, sdd) + su=256K + sw=15 2. 256KB chunk size (HW RAID, sdd) + su=64K + sw=15 3. 64KB chunk size (HW RAID, sdh) + su=256K + sw=15 4. 64KB chunk size (HW RAID, sdh) + su=64K + sw=15 Although the manual of the HW RAID mentions that a 64KB chunk size would be better with more drives, the results for the 256KB chunk size look better to me, and the chunk size seems to matter more than the mkfs options. The same manual states that RAID 5 would be best for databases... 
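As a reference for reproducing these runs: su should match the controller chunk size, and sw the number of data-bearing disks (parity excluded). A sketch for test 1 (the device name is the one used in this thread; the mkfs line is only printed, not executed):

```shell
# su = hardware chunk size, sw = number of data disks (parity excluded).
CHUNK_KB=256
DATA_DISKS=15
STRIPE_KB=$(( CHUNK_KB * DATA_DISKS ))
echo "mkfs.xfs -f -d su=${CHUNK_KB}k -d sw=${DATA_DISKS} /dev/sdd1"
echo "expected full stripe: ${STRIPE_KB} KB"   # 3840 KB, matching the array
```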
A bit ot: will I waste space on the RAID device with a 256K chunk size and small files? Or does this only depend on the block size of the fs (4KB at the moment). 1.) Chunk Size: 256 KB Stripe Size: 3840 KB Array size: 11135 GB Logical Drive Block Size: 512 bytes (only possible value) mkfs.xfs -d su=256k -d sw=15 /dev/sdd1 /mnt# tiobench --numruns 3 --threads 1 --threads 2 --block 4096 --size 20000 Sequential Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ------ ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 207.80 23.88% 0.055 50.43 0.00000 0.00000 870 20000 4096 2 197.86 44.29% 0.117 373.10 0.00000 0.00000 447 Random Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ------- ---- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 2.90 0.569% 4.035 42.83 0.00000 0.00000 510 20000 4096 2 4.47 1.679% 5.201 69.75 0.00000 0.00000 266 Sequential Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ------- ---- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 167.84 36.31% 0.055 9151.42 0.00053 0.00000 462 20000 4096 2 170.77 84.39% 0.099 8471.22 0.00066 0.00000 202 Random Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ------- ---- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 1.97 0.990% 0.016 0.05 0.00000 0.00000 199 20000 4096 2 1.68 1.739% 0.019 3.04 0.00000 0.00000 97 2.) 
Chunk Size: 256 KB Stripe Size: 3840 KB Array size: 11135 GB Logical Drive Block Size: 512 bytes (only possible value) mkfs.xfs -d su=64k -d sw=15 /dev/sdd1 Sequential Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 203.15 25.13% 0.056 47.58 0.00000 0.00000 808 20000 4096 2 190.85 44.67% 0.121 370.55 0.00000 0.00000 427 Random Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 1.98 0.592% 5.908 41.81 0.00000 0.00000 335 20000 4096 2 3.55 1.665% 6.417 69.23 0.00000 0.00000 213 Sequential Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 168.97 35.47% 0.054 8338.06 0.00056 0.00000 476 20000 4096 2 159.21 73.18% 0.109 8133.66 0.00103 0.00000 218 Random Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 2.01 1.046% 0.018 2.46 0.00000 0.00000 192 20000 4096 2 1.78 1.668% 0.020 2.98 0.00000 0.00000 107 3.) 
Chunk Size: 64 KB Stripe Size: 960 KB Array size: 11135 GB Logical Drive Block Size: 512 bytes (only possible value) mkfs.xfs -d su=256k -d sw=15 /dev/sdh1 Sequential Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 189.84 23.00% 0.061 43.77 0.00000 0.00000 825 20000 4096 2 173.20 40.87% 0.134 365.86 0.00000 0.00000 424 Random Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 2.16 0.461% 5.415 38.47 0.00000 0.00000 469 20000 4096 2 2.94 1.379% 7.772 69.02 0.00000 0.00000 213 Sequential Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 130.48 26.59% 0.076 10970.30 0.00097 0.00000 491 20000 4096 2 124.93 59.08% 0.134 10370.07 0.00173 0.00000 211 Random Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 1.73 0.827% 0.018 2.32 0.00000 0.00000 209 20000 4096 2 1.83 1.609% 0.019 2.88 0.00000 0.00000 114 4.) 
Chunk Size: 64 KB Stripe Size: 960 KB Array size: 11135 GB Logical Drive Block Size: 512 bytes (only possible value) mkfs.xfs -d su=64k -d sw=15 /dev/sdh1 Sequential Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 193.87 21.96% 0.059 59.45 0.00000 0.00000 883 20000 4096 2 185.08 40.73% 0.125 369.16 0.00000 0.00000 454 Random Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 2.88 0.565% 4.061 39.23 0.00000 0.00000 510 20000 4096 2 4.37 1.640% 5.199 75.55 0.00000 0.00000 266 Sequential Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 143.80 31.12% 0.068 10424.88 0.00072 0.00000 462 20000 4096 2 115.01 53.56% 0.147 11421.10 0.00209 0.00000 215 Random Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 2.05 0.753% 0.016 0.09 0.00000 0.00000 273 20000 4096 2 1.86 1.539% 0.018 0.09 0.00000 0.00000 121 Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 17:31 ` Ralf Gross @ 2007-09-24 18:01 ` Justin Piszcz 2007-09-24 20:39 ` Ralf Gross 0 siblings, 1 reply; 48+ messages in thread From: Justin Piszcz @ 2007-09-24 18:01 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs A bit ot: will I waste space on the RAID device with a 256K chunk size and small files? Or does this only depend on the block size of the fs (4KB at the moment). That's a good question, I believe its only respective of the filesystem size, but will wait for someone to confirm, nice benchmarks! I use a 1 MiB stripe myself as I found that to give the best performance. Justin. On Mon, 24 Sep 2007, Ralf Gross wrote: > Ralf Gross schrieb: >> >> we have a new large raid array, the shelf has 48 disks, the max. >> amount of disks in a single raid 5 set is 16. There will be one global >> spare disk, thus we have two raid 5 with 15 data disks and one with 14 >> data disk. >> >> The data on these raid sets will be video data + some meta data. >> Typically each set of data consist of a 2 GB + 500 MB + 100 MB + 20 KB >> +2 KB file. There will be some dozen of these sets in a single >> directory - but not many hundred or thousend. >> ... >> I already played with different mkfs.xfs options (sw, su) but didn't >> see much of a difference. >> >> The volume sets of the hw raid have the following parameters: >> >> 11,xx TB (15 data disks): >> Chunk Size : 64 KB >> (values of 64/128/256 KB are possible, I'll try 256 KB next week) >> Stripe Size : 960 KB (15 x 64 KB) >> ... > > I did some more benchmarks with the 64KB/256KB chunk size option of > the RAID array and 64K/256K sw option for mkfs.xfs. > > 4 tests: > two RAID 5 volumes (sdd + sdh, both in the same 48 disk shelf), each > with 15 data disks + 1 parity, 750 GB SATA disks > > 1. 256KB chunk size (HW RAID, sdd) + su=256K + sw=15 > 2. 256KB chunk size (HW RAID, sdd) + su=64K + sw=15 > 3. 
64KB chunk size (HW RAID, sdh) + su=256K + sw=15 > 4. 64KB chunk size (HW RAID, sdh) + su=64K + sw=15 > > Although the manual of the HW RAID mentions that a 64KB chunk size would be > better with more drives, the result for the 256KB chunk size seems to > me better and more important than the mkfs options. The same manual > states that RAID 5 would be best for databases... > > A bit ot: will I waste space on the RAID device with a 256K chunk size > and small files? Or does this only depend on the block size of the fs > (4KB at the moment). > > 1.) > Chunk Size: 256 KB > Stripe Size: 3840 KB > Array size: 11135 GB > Logical Drive Block Size: 512 bytes (only possible value) > mkfs.xfs -d su=256k -d sw=15 /dev/sdd1 > > /mnt# tiobench --numruns 3 --threads 1 --threads 2 --block 4096 --size 20000 > > Sequential Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ------ ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 207.80 23.88% 0.055 50.43 0.00000 0.00000 870 > 20000 4096 2 197.86 44.29% 0.117 373.10 0.00000 0.00000 447 > > Random Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ------- ---- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.90 0.569% 4.035 42.83 0.00000 0.00000 510 > 20000 4096 2 4.47 1.679% 5.201 69.75 0.00000 0.00000 266 > > Sequential Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ------- ---- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 167.84 36.31% 0.055 9151.42 0.00053 0.00000 462 > 20000 4096 2 170.77 84.39% 0.099 8471.22 0.00066 0.00000 202 > > Random Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ------- ---- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 1.97 0.990% 0.016 0.05 
0.00000 0.00000 199 > 20000 4096 2 1.68 1.739% 0.019 3.04 0.00000 0.00000 97 > > > 2.) > Chunk Size: 256 KB > Stripe Size: 3840 KB > Array size: 11135 GB > Logical Drive Block Size: 512 bytes (only possible value) > mkfs.xfs -d su=64k -d sw=15 /dev/sdd1 > > Sequential Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 203.15 25.13% 0.056 47.58 0.00000 0.00000 808 > 20000 4096 2 190.85 44.67% 0.121 370.55 0.00000 0.00000 427 > > Random Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 1.98 0.592% 5.908 41.81 0.00000 0.00000 335 > 20000 4096 2 3.55 1.665% 6.417 69.23 0.00000 0.00000 213 > > Sequential Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 168.97 35.47% 0.054 8338.06 0.00056 0.00000 476 > 20000 4096 2 159.21 73.18% 0.109 8133.66 0.00103 0.00000 218 > > Random Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.01 1.046% 0.018 2.46 0.00000 0.00000 192 > 20000 4096 2 1.78 1.668% 0.020 2.98 0.00000 0.00000 107 > > 3.) 
> Chunk Size: 64 KB > Stripe Size: 960 KB > Array size: 11135 GB > Logical Drive Block Size: 512 bytes (only possible value) > mkfs.xfs -d su=256k -d sw=15 /dev/sdh1 > > Sequential Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 189.84 23.00% 0.061 43.77 0.00000 0.00000 825 > 20000 4096 2 173.20 40.87% 0.134 365.86 0.00000 0.00000 424 > > Random Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.16 0.461% 5.415 38.47 0.00000 0.00000 469 > 20000 4096 2 2.94 1.379% 7.772 69.02 0.00000 0.00000 213 > > Sequential Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 130.48 26.59% 0.076 10970.30 0.00097 0.00000 491 > 20000 4096 2 124.93 59.08% 0.134 10370.07 0.00173 0.00000 211 > > Random Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 1.73 0.827% 0.018 2.32 0.00000 0.00000 209 > 20000 4096 2 1.83 1.609% 0.019 2.88 0.00000 0.00000 114 > > > 4.) 
> Chunk Size: 64 KB > Stripe Size: 960 KB > Array size: 11135 GB > Logical Drive Block Size: 512 bytes (only possible value) > mkfs.xfs -d su=64k -d sw=15 /dev/sdh1 > > Sequential Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 193.87 21.96% 0.059 59.45 0.00000 0.00000 883 > 20000 4096 2 185.08 40.73% 0.125 369.16 0.00000 0.00000 454 > > Random Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.88 0.565% 4.061 39.23 0.00000 0.00000 510 > 20000 4096 2 4.37 1.640% 5.199 75.55 0.00000 0.00000 266 > > Sequential Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 143.80 31.12% 0.068 10424.88 0.00072 0.00000 462 > 20000 4096 2 115.01 53.56% 0.147 11421.10 0.00209 0.00000 215 > > Random Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.05 0.753% 0.016 0.09 0.00000 0.00000 273 > 20000 4096 2 1.86 1.539% 0.018 0.09 0.00000 0.00000 121 > > > Ralf > > ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 18:01 ` Justin Piszcz @ 2007-09-24 20:39 ` Ralf Gross 2007-09-24 20:43 ` Justin Piszcz 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-24 20:39 UTC (permalink / raw) To: linux-xfs Justin Piszcz schrieb: >> A bit ot: will I waste space on the RAID device with a 256K chunk size >> and small files? Or does this only depend on the block size of the fs >> (4KB at the moment). > > That's a good question, I believe its only respective of the filesystem > size, but will wait for someone to confirm, nice benchmarks! > > I use a 1 MiB stripe myself as I found that to give the best performance. 256KB is the largest chunk size I can choose for a raid set. BTW: the HW-RAID is an Overland Ultamus 4800. The funny thing is, that performance (256KB chunks) is even better without adding any sw/su option to the mkfs command. mkfs.xfs /dev/sdd1 -f Sequential Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 208.33 23.81% 0.055 49.55 0.00000 0.00000 875 20000 4096 2 199.48 43.72% 0.116 376.85 0.00000 0.00000 456 Random Reads File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 2.83 0.604% 4.131 38.81 0.00000 0.00000 469 20000 4096 2 4.53 1.700% 4.995 67.15 0.00000 0.00000 266 Sequential Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- 20000 4096 1 188.15 42.98% 0.047 7547.93 0.00027 0.00000 438 20000 4096 2 167.76 76.89% 0.100 7521.34 0.00078 0.00000 218 Random Writes File Blk Num Avg Maximum Lat% Lat% CPU Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff ----- ----- --- ------ ------ 
--------- ----------- -------- -------- ----- 20000 4096 1 2.08 0.869% 0.016 0.13 0.00000 0.00000 239 20000 4096 2 1.80 1.501% 0.020 6.28 0.00000 0.00000 12 Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 20:39 ` Ralf Gross @ 2007-09-24 20:43 ` Justin Piszcz 2007-09-24 21:33 ` Ralf Gross 0 siblings, 1 reply; 48+ messages in thread From: Justin Piszcz @ 2007-09-24 20:43 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs On Mon, 24 Sep 2007, Ralf Gross wrote: > Justin Piszcz schrieb: >>> A bit ot: will I waste space on the RAID device with a 256K chunk size >>> and small files? Or does this only depend on the block size of the fs >>> (4KB at the moment). >> >> That's a good question, I believe its only respective of the filesystem >> size, but will wait for someone to confirm, nice benchmarks! >> >> I use a 1 MiB stripe myself as I found that to give the best performance. > > 256KB is the largest chunk size I can choose for a raid set. BTW: the HW-RAID > is an Overland Ultamus 4800. > > The funny thing is, that performance (256KB chunks) is even better without > adding any sw/su option to the mkfs command. > > mkfs.xfs /dev/sdd1 -f > > Sequential Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 208.33 23.81% 0.055 49.55 0.00000 0.00000 875 > 20000 4096 2 199.48 43.72% 0.116 376.85 0.00000 0.00000 456 > > Random Reads > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.83 0.604% 4.131 38.81 0.00000 0.00000 469 > 20000 4096 2 4.53 1.700% 4.995 67.15 0.00000 0.00000 266 > > Sequential Writes > File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 188.15 42.98% 0.047 7547.93 0.00027 0.00000 438 > 20000 4096 2 167.76 76.89% 0.100 7521.34 0.00078 0.00000 218 > > Random Writes > 
File Blk Num Avg Maximum Lat% Lat% CPU > Size Size Thr Rate (CPU%) Latency Latency >2s >10s Eff > ----- ----- --- ------ ------ --------- ----------- -------- -------- ----- > 20000 4096 1 2.08 0.869% 0.016 0.13 0.00000 0.00000 239 > 20000 4096 2 1.80 1.501% 0.020 6.28 0.00000 0.00000 12 > > > Ralf > > I find that to be the case with SW RAID (defaults are best). Although with 16 drives(?) that is awfully slow. With 6 SATA disks I get 160-180 MiB/s raid5 and 250-280 MiB/s raid 0 (sw raid). With 10 raptors I get ~450 MiB/s write and ~550-600 MiB/s read, again XFS+SW raid. Justin. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 20:43 ` Justin Piszcz @ 2007-09-24 21:33 ` Ralf Gross 2007-09-24 21:36 ` Justin Piszcz 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-24 21:33 UTC (permalink / raw) To: linux-xfs Justin Piszcz schrieb: > > > On Mon, 24 Sep 2007, Ralf Gross wrote: > > >Justin Piszcz schrieb: > >>>A bit ot: will I waste space on the RAID device with a 256K chunk size > >>>and small files? Or does this only depend on the block size of the fs > >>>(4KB at the moment). > >> > >>That's a good question, I believe its only respective of the filesystem > >>size, but will wait for someone to confirm, nice benchmarks! > >> > >>I use a 1 MiB stripe myself as I found that to give the best performance. > > > >256KB is the largest chunk size I can choose for a raid set. BTW: the > >HW-RAID > >is an Overland Ultamus 4800. > > > >The funny thing is, that performance (256KB chunks) is even better without > >adding any sw/su option to the mkfs command. 
> > > >mkfs.xfs /dev/sdd1 -f > > > >Sequential Reads > >File Blk Num Avg Maximum Lat% Lat% > >CPU > >Size Size Thr Rate (CPU%) Latency Latency >2s >10s > >Eff > >----- ----- --- ------ ------ --------- ----------- -------- -------- > >----- > >20000 4096 1 208.33 23.81% 0.055 49.55 0.00000 0.00000 > >875 > >20000 4096 2 199.48 43.72% 0.116 376.85 0.00000 0.00000 > >456 > > > >Random Reads > >File Blk Num Avg Maximum Lat% Lat% > >CPU > >Size Size Thr Rate (CPU%) Latency Latency >2s >10s > >Eff > >----- ----- --- ------ ------ --------- ----------- -------- -------- > >----- > >20000 4096 1 2.83 0.604% 4.131 38.81 0.00000 0.00000 > >469 > >20000 4096 2 4.53 1.700% 4.995 67.15 0.00000 0.00000 > >266 > > > >Sequential Writes > >File Blk Num Avg Maximum Lat% Lat% > >CPU > >Size Size Thr Rate (CPU%) Latency Latency >2s >10s > >Eff > >----- ----- --- ------ ------ --------- ----------- -------- -------- > >----- > >20000 4096 1 188.15 42.98% 0.047 7547.93 0.00027 0.00000 > >438 > >20000 4096 2 167.76 76.89% 0.100 7521.34 0.00078 0.00000 > >218 > > > >Random Writes > >File Blk Num Avg Maximum Lat% Lat% > >CPU > >Size Size Thr Rate (CPU%) Latency Latency >2s >10s > >Eff > >----- ----- --- ------ ------ --------- ----------- -------- -------- > >----- > >20000 4096 1 2.08 0.869% 0.016 0.13 0.00000 0.00000 > >239 > >20000 4096 2 1.80 1.501% 0.020 6.28 0.00000 0.00000 > >12 > > > > I find that to be the case with SW RAID (defaults are best) > > Although with 16 drives(?) that is awfully slow. > > I have 6 SATA's I get 160-180 MiB/s raid5 and 250-280 MiB/s raid 0 (sw > raid). > > With 10 raptors I get ~450 MiB/s write and ~550-600 MiB/s read, again > XFS+SW raid. Hm, with the different HW-RAIDs I've used so far (easyRAID, Infortrend, internal Areca controller), I always got 160-200 MiB/s read/write with 7-15 disks. That's one reason why I asked if there are some xfs options I could use for better performance. But I guess fs options won't boost performance that much. 
Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 21:33 ` Ralf Gross @ 2007-09-24 21:36 ` Justin Piszcz 2007-09-24 21:52 ` Ralf Gross 0 siblings, 1 reply; 48+ messages in thread From: Justin Piszcz @ 2007-09-24 21:36 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs On Mon, 24 Sep 2007, Ralf Gross wrote: > Justin Piszcz schrieb: >> >> >> On Mon, 24 Sep 2007, Ralf Gross wrote: >> >>> Justin Piszcz schrieb: >>>>> A bit ot: will I waste space on the RAID device with a 256K chunk size >>>>> and small files? Or does this only depend on the block size of the fs >>>>> (4KB at the moment). >>>> >>>> That's a good question, I believe its only respective of the filesystem >>>> size, but will wait for someone to confirm, nice benchmarks! >>>> >>>> I use a 1 MiB stripe myself as I found that to give the best performance. >>> >>> 256KB is the largest chunk size I can choose for a raid set. BTW: the >>> HW-RAID >>> is an Overland Ultamus 4800. >>> >>> The funny thing is, that performance (256KB chunks) is even better without >>> adding any sw/su option to the mkfs command. 
>>> >>> mkfs.xfs /dev/sdd1 -f >>> >>> Sequential Reads >>> File Blk Num Avg Maximum Lat% Lat% >>> CPU >>> Size Size Thr Rate (CPU%) Latency Latency >2s >10s >>> Eff >>> ----- ----- --- ------ ------ --------- ----------- -------- -------- >>> ----- >>> 20000 4096 1 208.33 23.81% 0.055 49.55 0.00000 0.00000 >>> 875 >>> 20000 4096 2 199.48 43.72% 0.116 376.85 0.00000 0.00000 >>> 456 >>> >>> Random Reads >>> File Blk Num Avg Maximum Lat% Lat% >>> CPU >>> Size Size Thr Rate (CPU%) Latency Latency >2s >10s >>> Eff >>> ----- ----- --- ------ ------ --------- ----------- -------- -------- >>> ----- >>> 20000 4096 1 2.83 0.604% 4.131 38.81 0.00000 0.00000 >>> 469 >>> 20000 4096 2 4.53 1.700% 4.995 67.15 0.00000 0.00000 >>> 266 >>> >>> Sequential Writes >>> File Blk Num Avg Maximum Lat% Lat% >>> CPU >>> Size Size Thr Rate (CPU%) Latency Latency >2s >10s >>> Eff >>> ----- ----- --- ------ ------ --------- ----------- -------- -------- >>> ----- >>> 20000 4096 1 188.15 42.98% 0.047 7547.93 0.00027 0.00000 >>> 438 >>> 20000 4096 2 167.76 76.89% 0.100 7521.34 0.00078 0.00000 >>> 218 >>> >>> Random Writes >>> File Blk Num Avg Maximum Lat% Lat% >>> CPU >>> Size Size Thr Rate (CPU%) Latency Latency >2s >10s >>> Eff >>> ----- ----- --- ------ ------ --------- ----------- -------- -------- >>> ----- >>> 20000 4096 1 2.08 0.869% 0.016 0.13 0.00000 0.00000 >>> 239 >>> 20000 4096 2 1.80 1.501% 0.020 6.28 0.00000 0.00000 >>> 12 >>> >> >> I find that to be the case with SW RAID (defaults are best) >> >> Although with 16 drives(?) that is awfully slow. >> >> I have 6 SATA's I get 160-180 MiB/s raid5 and 250-280 MiB/s raid 0 (sw >> raid). >> >> With 10 raptors I get ~450 MiB/s write and ~550-600 MiB/s read, again >> XFS+SW raid. > > Hm, with the different HW-RAIDs I've used so far (easyRAID, > Infortrend, internal Areca controller), I always got 160-200 MiB/s > read/write with 7-15 disks. That's one reason why I asked if there are > some xfs options I could use for better performance. 
But I guess fs > options won't boost performance that much. > > Ralf > > What do you get when (reading) from the raw device? dd if=/dev/sda bs=1M count=10240 ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 21:36 ` Justin Piszcz @ 2007-09-24 21:52 ` Ralf Gross 2007-09-25 12:35 ` Ralf Gross 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-24 21:52 UTC (permalink / raw) To: linux-xfs Justin Piszcz schrieb: > >>I find that to be the case with SW RAID (defaults are best) > >> > >>Although with 16 drives(?) that is awfully slow. > >> > >>I have 6 SATA's I get 160-180 MiB/s raid5 and 250-280 MiB/s raid 0 (sw > >>raid). > >> > >>With 10 raptors I get ~450 MiB/s write and ~550-600 MiB/s read, again > >>XFS+SW raid. > > > >Hm, with the different HW-RAIDs I've used so far (easyRAID, > >Infortrend, internal Areca controller), I always got 160-200 MiB/s > >read/write with 7-15 disks. That's one reason why I asked if there are > >some xfs options I could use for better performance. But I guess fs > >options won't boost performance that much. > > What do you get when (reading) from the raw device? > > dd if=/dev/sda bs=1M count=10240 The server has 16 GB RAM, so I tried it with 20 GB of data. dd if=/dev/sdd of=/dev/null bs=1M count=20480 20480+0 Datensätze ein 20480+0 Datensätze aus 21474836480 Bytes (21 GB) kopiert, 95,3738 Sekunden, 225 MB/s and a second try: dd if=/dev/sdd of=/dev/null bs=1M count=20480 20480+0 Datensätze ein 20480+0 Datensätze aus 21474836480 Bytes (21 GB) kopiert, 123,78 Sekunden, 173 MB/s I'm too tired to interpret these numbers at the moment, I'll do some more testing tomorrow. Good night, Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
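A sanity check on those dd figures, plus a cache-free variant: dd reports MB as 10^6 bytes, and since the machine has 16 GB of RAM, part of a 20 GB read can be served from the page cache, which O_DIRECT avoids (iflag=direct assumes GNU dd; the device path is the one from the thread):

```shell
# Recompute the first run's rate from the numbers dd printed above.
awk 'BEGIN { printf "%.0f MB/s\n", 21474836480 / 95.3738 / 1e6 }'   # 225 MB/s

# Cache-free re-run (not executed here; needs the real device):
#   dd if=/dev/sdd of=/dev/null bs=1M count=20480 iflag=direct
```

The spread between the two runs (225 vs. 173 MB/s) is consistent with cache effects, which direct I/O would take out of the picture.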
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-24 21:52 ` Ralf Gross @ 2007-09-25 12:35 ` Ralf Gross 2007-09-25 12:50 ` Justin Piszcz 2007-09-25 12:57 ` KELEMEN Peter 0 siblings, 2 replies; 48+ messages in thread From: Ralf Gross @ 2007-09-25 12:35 UTC (permalink / raw) To: linux-xfs Ralf Gross schrieb: > > What do you get when (reading) from the raw device? > > > > dd if=/dev/sda bs=1M count=10240 > > The server has 16 GB RAM, so I tried it with 20 GB of data. > > dd if=/dev/sdd of=/dev/null bs=1M count=20480 > 20480+0 Datensätze ein > 20480+0 Datensätze aus > 21474836480 Bytes (21 GB) kopiert, 95,3738 Sekunden, 225 MB/s > > and a second try: > > dd if=/dev/sdd of=/dev/null bs=1M count=20480 > 20480+0 Datensätze ein > 20480+0 Datensätze aus > 21474836480 Bytes (21 GB) kopiert, 123,78 Sekunden, 173 MB/s > > I'm taoo tired to interprete these numbers at the moment, I'll do some > more testing tomorrow. There is a second RAID device attached to the server (24x RAID5). The numbers I get from this device are a bit worse than the 16x RAID 5 numbers (150MB/s read with dd). I'm really wondering how people can achieve transfer rates of 400MB/s and more. I know that I'm limited by the FC controller, but I don't even get >200MB/s. Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
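If the array is on a single 2 Gb FC link (an assumption; the posts don't name the HBA speed), the ~200 MB/s ceiling is roughly what the wire allows once 8b/10b line coding is stripped:

```shell
# 2GFC signals at 2125 Mbaud; 8b/10b coding leaves 8/10 of that as
# payload bits, before FC framing overhead.
BAUD_MBIT=2125
PAYLOAD_MBS=$(( BAUD_MBIT * 8 / 10 / 8 ))   # -> ~212 MB/s upper bound
echo "~${PAYLOAD_MBS} MB/s per 2 Gb FC link"
```

Framing and protocol overhead shave that further, so 170-200 MB/s from one 2 Gb link is unsurprising; the 400 MB/s figures people report usually involve 4 Gb FC or multiple links.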
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 12:35 ` Ralf Gross @ 2007-09-25 12:50 ` Justin Piszcz 2007-09-25 13:44 ` Bryan J Smith 2007-09-25 12:57 ` KELEMEN Peter 1 sibling, 1 reply; 48+ messages in thread From: Justin Piszcz @ 2007-09-25 12:50 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs [-- Attachment #1: Type: TEXT/PLAIN, Size: 1378 bytes --] On Tue, 25 Sep 2007, Ralf Gross wrote: > Ralf Gross schrieb: >>> What do you get when (reading) from the raw device? >>> >>> dd if=/dev/sda bs=1M count=10240 >> >> The server has 16 GB RAM, so I tried it with 20 GB of data. >> >> dd if=/dev/sdd of=/dev/null bs=1M count=20480 >> 20480+0 Datensätze ein >> 20480+0 Datensätze aus >> 21474836480 Bytes (21 GB) kopiert, 95,3738 Sekunden, 225 MB/s >> >> and a second try: >> >> dd if=/dev/sdd of=/dev/null bs=1M count=20480 >> 20480+0 Datensätze ein >> 20480+0 Datensätze aus >> 21474836480 Bytes (21 GB) kopiert, 123,78 Sekunden, 173 MB/s >> >> I'm taoo tired to interprete these numbers at the moment, I'll do some >> more testing tomorrow. > > There is a second RAID device attached to the server (24x RAID5). The > numbers I get from this device are a bit worse than the 16x RAID 5 > numbers (150MB/s read with dd). > > I'm really wondering how people can achieve transfer rates of > 400MB/s and more. I know that I'm limited by the FC controller, but > I don't even get >200MB/s. > > Ralf > > Perhaps something is wrong with your setup? Here are my 10 raptors in RAID5 using Software RAID (no hw raid controller): p34:~# dd if=/dev/md3 of=/dev/null bs=1M count=16384 16384+0 records in 16384+0 records out 17179869184 bytes (17 GB) copied, 29.8193 seconds, 576 MB/s p34:~# ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 12:50 ` Justin Piszcz @ 2007-09-25 13:44 ` Bryan J Smith 0 siblings, 0 replies; 48+ messages in thread From: Bryan J Smith @ 2007-09-25 13:44 UTC (permalink / raw) To: Justin Piszcz, xfs-bounce, Ralf Gross; +Cc: linux-xfs There is not a week that goes by without this on some list. Benchmarks not under load are useless, and hardware RAID shows no advantage at all, and can actually be hurt since all data is committed to the I/O controller synchronously at the driver. Furthermore, there is a huge difference between software RAID-5 reads and writes, and read benchmarks are basically RAID-0 (minus one disc), which is always faster with software RAID. Again, testing under actual, production load is how you gauge performance. If your application is CPU bound, like most web servers, then software RAID-5 is fine because A) little I/O is required, so there is plenty of system interconnect throughput available for LOAD-XOR-STOR, and B) web servers do far more reads than writes. But if your server is a file server, then the amount of interconnect required for the LOAD-XOR-STOR of software RAID-5 detracts from that available for the I/O intensive operations of the file service. You can't measure that at the kernel at all, much less when not under load. Benchmark multiple clients hitting the server to see what they get. Furthermore, when you're concerned about I/O, you don't stop at your storage controller, but also look at RX TOE on your HBA GbE NIC(s), your latency v. throughput of your discs, etc...
-- Bryan J Smith - mailto:b.j.smith@ieee.org http://thebs413.blogspot.com Sent via BlackBerry from T-Mobile -----Original Message----- From: Justin Piszcz <jpiszcz@lucidpixels.com> Date: Tue, 25 Sep 2007 08:50:15 To:Ralf Gross <Ralf-Lists@ralfgross.de> Cc:linux-xfs@oss.sgi.com Subject: Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) ^ permalink raw reply [flat|nested] 48+ messages in thread
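[Editor's sketch of Bryan's "benchmark under load" point: several concurrent sequential readers instead of one idle dd. This is an illustration only; it reads five scratch files in a temp directory as stand-ins for five clients hitting the array.]

```shell
# Spawn several concurrent sequential readers -- a crude stand-in for
# "multiple clients hitting the server" -- and time the aggregate.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do
    dd if=/dev/zero of="$dir/f$i" bs=1M count=8 2>/dev/null
done
start=$(date +%s)
for i in 1 2 3 4 5; do
    dd if="$dir/f$i" of=/dev/null bs=1M 2>/dev/null &   # one "client" each
done
wait                              # aggregate finishes when the slowest does
echo "5 concurrent readers done in $(( $(date +%s) - start ))s"
rm -r "$dir"
```

On a real server the files would live on the array under test and be larger than RAM, per the earlier caching caveat.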
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 12:35 ` Ralf Gross 2007-09-25 12:50 ` Justin Piszcz @ 2007-09-25 12:57 ` KELEMEN Peter 2007-09-25 13:49 ` Ralf Gross 1 sibling, 1 reply; 48+ messages in thread From: KELEMEN Peter @ 2007-09-25 12:57 UTC (permalink / raw) To: linux-xfs * Ralf Gross (ralf-lists@ralfgross.de) [20070925 14:35]: > There is a second RAID device attached to the server (24x > RAID5). The numbers I get from this device are a bit worse than > the 16x RAID 5 numbers (150MB/s read with dd). You are expecting 24 spindles to align up when you have a write request, which has to be 23*chunksize bytes in order to avoid RMW. Additionally, your array is so big that you're very likely to hit another error while rebuilding. Chop up your monster RAID5 array into smaller arrays and stripe across them. Even better, consider RAID10. Peter -- .+'''+. .+'''+. .+'''+. .+'''+. .+'' Kelemen Péter / \ / \ Peter.Kelemen@cern.ch .+' `+...+' `+...+' `+...+' `+...+' ^ permalink raw reply [flat|nested] 48+ messages in thread
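[Editor's sketch making Peter's RMW arithmetic concrete, using the chunk size from the thread: a full-stripe write is data_disks × chunk, and mkfs.xfs's su/sw options encode exactly those two numbers. /dev/sdX is a placeholder.]

```shell
# Full-stripe write size for the arrays discussed: 64 KB chunks,
# 15 data disks (16-disk RAID5) or 23 data disks (24-disk RAID5).
chunk_kb=64
for data_disks in 15 23; do
    stripe_kb=$((chunk_kb * data_disks))
    echo "$data_disks data disks: full stripe = ${stripe_kb} KB"
    # su = chunk size, sw = number of data (non-parity) disks
    echo "  mkfs.xfs -d su=${chunk_kb}k,sw=${data_disks} /dev/sdX"
done
```

Any write smaller than the full stripe (960 KB or 1472 KB here) forces a read-modify-write of the parity chunk, which is Peter's point about the 24-disk array.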
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 12:57 ` KELEMEN Peter @ 2007-09-25 13:49 ` Ralf Gross 2007-09-25 14:08 ` Bryan J Smith 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-25 13:49 UTC (permalink / raw) To: linux-xfs KELEMEN Peter schrieb: > * Ralf Gross (ralf-lists@ralfgross.de) [20070925 14:35]: > > > There is a second RAID device attached to the server (24x > > RAID5). The numbers I get from this device are a bit worse than > > the 16x RAID 5 numbers (150MB/s read with dd). > > You are expecting 24 spindles to align up when you have a write > request, which has to be 23*chunksize bytes in order to avoid RMW. > Additionally, your array is so big that you're very likely to hit > another error while rebuilding. Chop up your monster RAID5 array > into smaller arrays and stripe across them. Even better, consider > RAID10. RAID10 is not an option, we need 60+ TB at the moment, mostly large video files. Basically the read/write performance we get with the 16x RAID 5 is sufficient for our needs. The 24x RAID 5 is only a test device. The volumes that will be used in the future are the 16/15x RAIDs (48 disk shelf with 3 volumes). I'm just wondering how people get 400+ MB/s with HW-RAID 5. Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 13:49 ` Ralf Gross @ 2007-09-25 14:08 ` Bryan J Smith 2007-09-25 16:07 ` Ralf Gross 0 siblings, 1 reply; 48+ messages in thread From: Bryan J Smith @ 2007-09-25 14:08 UTC (permalink / raw) To: Ralf Gross, linux-xfs Use multiple cards on multiple PCI-X/PCIe channels, each with their own RAID-5 (or 6) volume, and then stripe (OS LVM) RAID-0 across the volumes. Depending on your network service and application, you can use either hardware or software for the RAID-5 (or 6). If it's heavily read-only servicing, then software RAID works great, because it's essentially RAID-0 (minus 1 disc). But always use the OS RAID (e.g., LVM stripe) to stripe RAID-0 across all volumes, assuming there is not an OS volume limit (of course ;). Software RAID is extremely fast at XORs, that's not the problem. The problem is how the data streams through the PC's inefficient I/O interconnect. PCs have gotten much better, but the load still detracts from other I/O that services may contend with. Software RAID-5 writes are, essentially, "programmed I/O." Every single commit has to have its parity block programmed by the CPU, which is difficult to benchmark because the bottleneck is not the CPU, but the LOAD-XOR-STOR of the interconnect. An IOP is designed with ASIC peripherals to do that in-line, real-time. In fact, by the very nature of the IOP driver, the operation is synchronous from the OS's standpoint, unlike software RAID optimizations by the OS. -- Bryan J Smith - mailto:b.j.smith@ieee.org http://thebs413.blogspot.com Sent via BlackBerry from T-Mobile -----Original Message----- From: Ralf Gross <Ralf-Lists@ralfgross.de> Date: Tue, 25 Sep 2007 15:49:56 To:linux-xfs@oss.sgi.com Subject: Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) ^ permalink raw reply [flat|nested] 48+ messages in thread
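[Editor's sketch of Bryan's recipe (per-controller RAID-5 LUNs with an OS-level RAID-0 stripe on top) in LVM terms. Device names, VG/LV names, and the 256 KB stripe size are all assumptions; the commands are printed as a dry run rather than executed.]

```shell
# Dry-run sketch: stripe (RAID-0) across two hardware RAID-5 LUNs.
run() { echo "$@"; }   # swap for: run() { "$@"; } to execute for real
run pvcreate /dev/sdc /dev/sdd
run vgcreate vg_video /dev/sdc /dev/sdd
# -i 2: stripe across both PVs; -I 256: 256 KB stripe unit per PV
run lvcreate -i 2 -I 256 -l 100%FREE -n lv_video vg_video
run mkfs.xfs /dev/vg_video/lv_video
```

The same layout can also be built with mdadm RAID-0 over the LUNs; LVM is shown because Bryan names "OS LVM" explicitly.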
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 14:08 ` Bryan J Smith @ 2007-09-25 16:07 ` Ralf Gross 2007-09-25 16:28 ` Bryan J. Smith 2007-09-25 16:48 ` Justin Piszcz 0 siblings, 2 replies; 48+ messages in thread From: Ralf Gross @ 2007-09-25 16:07 UTC (permalink / raw) To: linux-xfs Bryan J Smith schrieb: > Use multiple cards on multiple PCI-X/PCIe channels, each with > their own RAID-5 (or 6) volume, and then stripe (OS LVM) RAID-0 > across the volumes. The hardware is fixed to one PCI-X FC HBA (4Gb) and two 48x shelves. The performance I get with this setup is ok for us. The data will be stored in bunches of multiple TB. Only a few clients will access the data, maybe 5-10 clients at the same time. > Depending on your network service and application, you can use > either hardware or software for the RAID-5 (or 6). > If it's heavily read-only servicing, then software RAID works great, > because it's essentially RAID-0 (minus 1 disc). > But always use the OS RAID (e.g., LVM stripe) to stripe RAID-0 > across all volumes, assuming there is not an OS volume limit > (of course ;). > [...] I always use SW-RAID for RAID0 and RAID1. But for RAID 5/6 I choose either external arrays or internal controllers (Areca). Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 16:07 ` Ralf Gross @ 2007-09-25 16:28 ` Bryan J. Smith 2007-09-25 17:25 ` Ralf Gross 2007-09-25 16:48 ` Justin Piszcz 1 sibling, 1 reply; 48+ messages in thread From: Bryan J. Smith @ 2007-09-25 16:28 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs Ralf Gross <Ralf-Lists@ralfgross.de> wrote: > The hardware is fixed to one PCI-X FC HBA (4Gb) and two 48x shelfs. > The performance I get with this setup is ok for us. The data will > be stored in bunches of multiple TB. Only few clients will access > the data, maybe 5-10 clients at the same time. If raw performance is your ultimate goal, the closer you are to the hardware, and the less overhead in the protocol, the better. Direct SATA channels (software RAID-10), or taking advantage of the 3Ware ASIC+SRAM (hardware RAID-10) is most ideal. I've put in a setup myself that used three (3) 3Ware Escalade 9550SX cards on three (3) different PCI-X channels, and then striped RAID-0 across all three (3) volumes (found little difference between using the OS LVM or the 3Ware manager for the RAID-0 stripe across volumes). Using a buffered RAID-5 hardware solution is not going to get you the best latency or direct DTR, if that is what matters. In most cases, it does not, depending on your application. > I always use SW-RAID for RAID0 and RAID1. But for RAID 5/6 I choose > either external arrays or internal controllers (Areca). Areca is the Intel IOP + firmware. Intel's X-Scale storage processing engines (SPE) seem to best 3Ware's AMCC PowerPC engine. The off-load is massive when I/O is an issue. Unfortunately, I still find I prefer 3Ware's firmware and software support in Linux over Areca, and Intel clearly does not have the dedication to addressing issues that 3Ware does (just like back in the IOP30x/i960 days, sigh). To me, support is key. I've yet to drop a 3Ware volume myself. 
The only people who seem to drop a volume are typically using 3Ware in JBOD mode, or are "early adopters" of new products. I don't care if it's hardware or software, "early adoption" of anything is just not worth it. I'd rather have reduced performance for "peace of mind." 3Ware has a solid history on Linux, and my experience over 7 years is the ultimate proof.** [ **NOTE: Don't get me started. The common "proprietary" or "hardware reliance" argument doesn't hold, because 3Ware's volume upward compatibility is proven (I've moved volumes of ATA 6000 to 7000 series, SATA 8000 to 9000, etc...), and they have shared the data organization so you can read them with dmraid as well. I.e., you can always fall back to reading your data off a 3Ware volume with dmraid these days. I've also _never_ had an "ATA timeout" issue with 3Ware cards, because 3Ware updates its firmware regularly to "deal" with troublesome [S]ATA drives. That has bitten me far too many times in Linux with direct [S]ATA -- not Linux's fault, just the fault of hardware [S]ATA PHY chips and their on-drive IDE firmware, something 3Ware has mitigated for me time and time again. ] I'm completely biased though, I assemble file and database servers, not web or other CPU-bound systems. Turning my system interconnect (not the CPU, a PC CPU crunches XOR very fast) into a bottlenecked PIO operation is not ideal for NFS writes or large record SQL commits in my experience. Heck, one look at NetApp's volume w/NVRAM and SPE-accelerated RAID-4 designs will quickly change your opinion as well (and make you wonder if they aren't worth the cost at times as well ;). -- Bryan J. Smith Professional, Technical Annoyance b.j.smith@ieee.org http://thebs413.blogspot.com -------------------------------------------------- Fission Power: An Inconvenient Solution ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 16:28 ` Bryan J. Smith @ 2007-09-25 17:25 ` Ralf Gross 2007-09-25 17:41 ` Bryan J. Smith 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-25 17:25 UTC (permalink / raw) To: linux-xfs Bryan J. Smith schrieb: > Ralf Gross <Ralf-Lists@ralfgross.de> wrote: > ... > I'm completely biased though, I assemble file and database servers, > not web or other CPU-bound systems. Turning my system interconnect > (not the CPU, a PC CPU crunches XOR very fast) into a bottlenecked > PIO operation is not ideal for NFS writes or large record SQL commits > in my experience. Heck, one look at NetApp's volume w/NVRAM and > SPE-accelerated RAID-4 designs will quickly change your opinion as > well (and make you wonder if they aren't worth the cost at times as > well ;). Thanks for all the details. Before I leave the office (it's getting dark here): I think the Overland RAID we have (48x Disk) is from the same manufacturer (Xyratex) that builds some devices for NetApp. Our profile is not that performance driven, thus the ~200MB/s read/write performance is ok. We just need cheap storage ;) Still I'm wondering how other people saturate a 4 Gb FC controller with one single RAID 5. At least that's what I've seen in some benchmarks and here on the list. If dd doesn't give me more than 200MB/s, the problem could only be the array, the controller or the FC connection. Given that the other setups are similar and do not use different controllers or stripe layouts. Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 17:25 ` Ralf Gross @ 2007-09-25 17:41 ` Bryan J. Smith 2007-09-25 19:13 ` Ralf Gross 0 siblings, 1 reply; 48+ messages in thread From: Bryan J. Smith @ 2007-09-25 17:41 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs Ralf Gross <Ralf-Lists@ralfgross.de> wrote: > Thanks for all the details. Before I leave the office (it's getting > dark here): I think the Overland RAID we have (48x Disk) is from > the same manufacturer (Xyratex) that builds some devices for NetApp. There's a lot of cross-fabbing these days. I was referring more to NetApp's combined hardware-OS-volume approach, although that was clearly a poor tangent by myself. > Our profile is not that performance driven, thus the ~200MB/s > read/write performace is ok. We just need cheap storage ;) For what application? That is the question. I mean, sustained software RAID-5 writes can be a PITA. E.g., the dd example prior doesn't even do XOR recalculation, it merely copies the existing parity block with data. Doing sustained software RAID-5 writes can easily drop under 50MBps, as the PC interconnect was not designed to stream data (programmed I/O), only direct it (Direct Memory Access). > Still I'm wondering how other people saturate a 4 Gb FC controller > with one single RAID 5. At least that's what I've seen in some > benchmarks and here on the list. Depends on the solution, the benchmark, etc... > If dd doesn't give me more than 200MB/s, the problem could only be > the array, the controller or the FC connection. I think you're getting confused. There are many factors in how dd performs. Using an OS-managed volume will result in non-blocking I/O, of which dd will scream. Especially when the OS knows it's merely just copying one block to another, unlike the FC array, and doesn't need to recalculate the parity block. 
I know software RAID proponents like to show those numbers, but they are far removed from "real world," they literally leverage the fact that parity doesn't need to be recalculated for the blocks moved. You need to benchmark from your application -- e.g., clients. If you want "raw" disk access benchmarks, then build a software RAID volume with a massive number of SATA channels using "dumb" SATA ASICs. Don't even use an intelligent hardware RAID card in JBOD mode, that will only slow the DTR. > Given that other setup are similar and not using different > controllers and stripes. Again, benchmark from your application -- e.g., clients. Everything else means squat. I cannot stress this enough. The only way I can show otherwise is with hardware taps (e.g., PCI-X, PCIe). I literally couldn't explain "well enough" to one client, who was getting only 60MBps at 10% CPU utilization, why their software RAID was the bottleneck until I put in a PCI-X card and showed the amount of traffic on the bus. And even that wasn't the system interconnect (although it should be possible with a HTX card on an AMD solution, although the card would probably cost 5 figures and have some limits). -- Bryan J. Smith Professional, Technical Annoyance b.j.smith@ieee.org http://thebs413.blogspot.com -------------------------------------------------- Fission Power: An Inconvenient Solution ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 17:41 ` Bryan J. Smith @ 2007-09-25 19:13 ` Ralf Gross 2007-09-25 20:23 ` Bryan J. Smith 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-25 19:13 UTC (permalink / raw) To: linux-xfs Bryan J. Smith schrieb: > > Our profile is not that performance driven, thus the ~200MB/s > > read/write performace is ok. We just need cheap storage ;) > > For what application? That is the question. I mean, sustained > software RAID-5 writes can be a PITA. E.g., the dd example prior > doesn't even do XOR recalculation, it merely copies the existing > parity block with data. Doing sustained software RAID-5 writes can > easily drop under 50MBps, as the PC interconnect was not designed to > stream data (programmed I/O), only direct it (Direct Memory Access). The server should be able to provide five 17MB/s streams (5 win clients). Each file is ~2 GB in size. The clients will access the data with smb/cifs, I think the main bottleneck will be samba. There will not be much write access; the data that will later be streamed to the clients has to be transferred to the server from the win clients first. The files will not be changed afterwards. So there will be weeks where no data is written, and some days where several TB will be transfered to the server in 48 hours. Furthermore, the win clients read the data from external USB/PCIe SATA drives. Sometimes the clients transfer the data from an external enclosure with 5 drives (no raid) to the server. This will also be a limiting factor. > > Still I'm wondering how other people saturate a 4 Gb FC controller > > with one single RAID 5. At least that's what I've seen in some > > benchmarks and here on the list. > > Depends on the solution, the benchmark, etc... I've seen benchmark results from 3ware, areca and other hw raid 5 vendors (bonnie++, tiobench).
> > If dd doesn't give me more than 200MB/s, the problem could only be > > the array, the controller or the FC connection. > > I think you're getting confused. > > There are many factors in how dd performs. Using an OS-managed > volume will result in non-blocking I/O, of which dd will scream. > Especially when the OS knows it's merely just copying one block to > another, unlike the FC array, and doesn't need to recalculate the > parity block. I know software RAID proponents like to show those > numbers, but they are beyond removed from "real world," they > literally leverage the fact that parity doesn't need to be > recalculated for the blocks moved. > > You need to benchmark from your application -- e.g., clients. If you > want "raw" disk access benchmarks, then build a software RAID volume > with a massive number of SATA channels using "dumb" SATA ASICs. > Don't even use an intelligent hardware RAID card in JBOD mode, that > will only slow the DTR. > > > Given that other setup are similar and not using different > > controllers and stripes. > > Again, benchmark from your application -- e.g., clients. Everything > else means squat. > > I cannot stress this enough. The only way I can show otherwise, is > with hardware taps (e.g., PCI-X, PCIe). I literally couldn't explain > "well enough" to one client was only getting 60MBps and seeing only > 10% CPU utilization why their software RAID was the bottleneck until > I put in a PCI-X card and showed the amount of traffic on the bus. > And even that wasn't the system interconnect (although it should be > possible with a HTX card on an AMD solution, although the card would > probably cost 5 figures and have some limits). Maybe I'm just confused by the benchmarks I found in the net and my 200MB/s seq. read/write with tiobench are perfectly ok. @Justin Piszcz: could you provide some tiobench numbers for your sw raid 5? Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
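[Editor's back-of-envelope check of the stated workload, using only numbers from the thread: five clients at 17 MB/s each.]

```shell
streams=5
rate_mb=17                       # per-client stream rate, as stated
total=$((streams * rate_mb))
echo "aggregate demand: ${total} MB/s"
# ~85 MB/s is well under the ~200 MB/s the array already delivers --
# consistent with Ralf's guess that samba and the network, not the
# RAID, will be the bottleneck (a single GbE link tops out around
# 110-115 MB/s in practice).
```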
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 19:13 ` Ralf Gross @ 2007-09-25 20:23 ` Bryan J. Smith 0 siblings, 0 replies; 48+ messages in thread From: Bryan J. Smith @ 2007-09-25 20:23 UTC (permalink / raw) To: Ralf Gross, linux-xfs Ralf Gross <Ralf-Lists@ralfgross.de> wrote: > The server should be able to provide five 17MB/s streams (5 win > clients). Each file is ~2GB large. The clients will access the > data with smb/cifs, I think the main bottleneck will be samba. So it's largely read-only SMB access? So we're talking ... - Largely read-only disk access - Largely server TX-only TCP/IP serving Read-only is cheap, and software RAID-5 is essentially RAID-0 (sans one disc). So software RAID-5 is just fine there (assuming there are no volume addressing limitations). Server TX is also cheap, most commodity server NICs (i.e., even those built into mainboards, or typical dual-MAC 96-128KiB SRAM unified buffer) have a TX TCP Off-load Engine (TOE), some even with Linux driver support. You don't need any hardware accelerated RAID or RX TOE (which is far, far more expensive than TX TOE, largely for receive buffer and processing). > Furthermore, the win clients read the data from external USB/PCIe > SATA drives. Ouch. But I won't go there. ;) > Sometimes the clients transfers the data from a external > enclosure with 5 drives (no raid) to the server. The will also be a > limiting factor. Ouch. But I won't go there. ;) > I've seen benchmark results from 3ware, areca and other hw raid 5 > vendors (bonnie++, tiobench). Bonnie++ is really only good for NFS mounts from multiple clients to a server, and then it will vary. Aggregate, median, etc... studies are required. > Maybe I'm just confused by the benchmarks I found in the > net and my 200MB/s sql. read/write with tiobench are > perfectly ok. 
I've striped RAID-0 over two, RAID-10 volumes on old 3Ware Escalade 8500-8LP series products over two PCI-X (66MHz) busses and reached close to 400MBps reads, and over 200MBps writes. And that was old ASIC+SRAM (only 4MB) technology in the Escalade 8500 series, not even native SATA (PATA with SATA PHY). But I wouldn't get even close to that over the network, especially not for SMB, unless I used a 4xGbE with a RX TOE and a layer-3 switch. -- Bryan J. Smith Professional, Technical Annoyance b.j.smith@ieee.org http://thebs413.blogspot.com -------------------------------------------------- Fission Power: An Inconvenient Solution ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 16:07 ` Ralf Gross 2007-09-25 16:28 ` Bryan J. Smith @ 2007-09-25 16:48 ` Justin Piszcz 2007-09-25 18:00 ` Bryan J. Smith 1 sibling, 1 reply; 48+ messages in thread From: Justin Piszcz @ 2007-09-25 16:48 UTC (permalink / raw) To: Ralf Gross; +Cc: linux-xfs On Tue, 25 Sep 2007, Ralf Gross wrote: > Bryan J Smith schrieb: >> Use multiple cards on multiple PCI-X/PCIe channels, each with >> their own RAID-5 (or 6) volume, and then stripe (OS LVM) RAID-0 >> across the volumes. > > The hardware is fixed to one PCI-X FC HBA (4Gb) and two 48x shelfs. > The performance I get with this setup is ok for us. The data will be > stored in bunches of multiple TB. Only few clients will access the > data, maybe 5-10 clients at the same time. > >> Depending on your network service and application, you can use >> either hardware or software for the RAID-5 (or 6). >> If it's heavily read-only servicing, then software RAID works great, >> because it's essentially RAID-0 (minus 1 disc). >> But always use the OS RAID (e.g., LVM stripe) to stripe RAID-0 >> across all volumes, assuming there is not an OS volume limit >> (of course ;). >> [...] > > I always use SW-RAID for RAID0 and RAID1. But for RAID 5/6 I choose > either external arrays or internal controllers (Areca). > > Ralf > > Just out of curiosity have you tried SW RAID5 on this array? Also what do you get if you use RAID0 (hw or sw)? Justin. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 16:48 ` Justin Piszcz @ 2007-09-25 18:00 ` Bryan J. Smith 2007-09-25 18:33 ` Ralf Gross 2007-09-25 23:38 ` Justin Piszcz 0 siblings, 2 replies; 48+ messages in thread From: Bryan J. Smith @ 2007-09-25 18:00 UTC (permalink / raw) To: Justin Piszcz, Ralf Gross; +Cc: linux-xfs Justin Piszcz <jpiszcz@lucidpixels.com> wrote: > Just out of curisosity have you tried SW RAID5 on this array? > Also what do you get if you use RAID0 (hw or sw)? According to him, if I read it correctly, it is an external FC RAID-5 chassis. I.e., all of the logic is in the chassis. So your question is N/A. Although I'm more than ready to be proven incorrect. Furthermore, what benchmark do you use? If dd on the volume itself, software RAID wins, hands down. Doesn't matter what size you give it, it literally copies (and doesn't recalculate) the parity. It's the rawest form of non-blocking I/O, and uses virtually no system interconnect to the CPU (just pushes disk-mem-disk). -- Bryan J. Smith Professional, Technical Annoyance b.j.smith@ieee.org http://thebs413.blogspot.com -------------------------------------------------- Fission Power: An Inconvenient Solution ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 18:00 ` Bryan J. Smith @ 2007-09-25 18:33 ` Ralf Gross 0 siblings, 0 replies; 48+ messages in thread From: Ralf Gross @ 2007-09-25 18:33 UTC (permalink / raw) To: linux-xfs Bryan J. Smith schrieb: > Justin Piszcz <jpiszcz@lucidpixels.com> wrote: > > Just out of curisosity have you tried SW RAID5 on this array? > > Also what do you get if you use RAID0 (hw or sw)? > > According to him, if I read it correclty, it is an external FC RAID-5 > chassis. I.e., all of the logic is in the chassis. So your question > is N/A. > > Although I'm more than ready to be proven incorrect. No, you're right. It's an external chassis with FC connection to the server. Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 18:00 ` Bryan J. Smith 2007-09-25 18:33 ` Ralf Gross @ 2007-09-25 23:38 ` Justin Piszcz 2007-09-26 8:23 ` Ralf Gross 1 sibling, 1 reply; 48+ messages in thread From: Justin Piszcz @ 2007-09-25 23:38 UTC (permalink / raw) To: b.j.smith; +Cc: Ralf Gross, linux-xfs On Tue, 25 Sep 2007, Bryan J. Smith wrote: > Justin Piszcz <jpiszcz@lucidpixels.com> wrote: >> Just out of curisosity have you tried SW RAID5 on this array? >> Also what do you get if you use RAID0 (hw or sw)? > > According to him, if I read it correclty, it is an external FC RAID-5 > chassis. I.e., all of the logic is in the chassis. So your question > is N/A. > > Although I'm more than ready to be proven incorrect. > > Furthermore, what benchmark do you use? If dd on the volume itself, > software RAID wins, hands down. Doesn't matter what size you give > it, it literally copies (and doesn't recalculate) the parity. It's > the rawest form of non-blocking I/O, and uses virtually no system > interconnect to the CPU (just pushes disk-mem-disk). > > -- > Bryan J. Smith Professional, Technical Annoyance > b.j.smith@ieee.org http://thebs413.blogspot.com > -------------------------------------------------- > Fission Power: An Inconvenient Solution > bonnie++, iozone, etc.. all show ~430-460 MiB/s write and ~550 MiB/s read Justin. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files) 2007-09-25 23:38 ` Justin Piszcz @ 2007-09-26 8:23 ` Ralf Gross 2007-09-26 8:42 ` Justin Piszcz 0 siblings, 1 reply; 48+ messages in thread From: Ralf Gross @ 2007-09-26 8:23 UTC (permalink / raw) To: linux-xfs Justin Piszcz schrieb: > > > On Tue, 25 Sep 2007, Bryan J. Smith wrote: > > >Justin Piszcz <jpiszcz@lucidpixels.com> wrote: > >>Just out of curisosity have you tried SW RAID5 on this array? > >>Also what do you get if you use RAID0 (hw or sw)? > > > >According to him, if I read it correclty, it is an external FC RAID-5 > >chassis. I.e., all of the logic is in the chassis. So your question > >is N/A. > > > >Although I'm more than ready to be proven incorrect. > > > >Furthermore, what benchmark do you use? If dd on the volume itself, > >software RAID wins, hands down. Doesn't matter what size you give > >it, it literally copies (and doesn't recalculate) the parity. It's > >the rawest form of non-blocking I/O, and uses virtually no system > >interconnect to the CPU (just pushes disk-mem-disk). > > bonnie++, iozone, etc.. > > all show ~430-460 MiB/s write and ~550 MiB/s read I'm happy :) I was able to boost the read performance by setting blockdev --setra 16384 /dev/sdc I knew this parameter is necessary for 3ware controllers, but I haven't noticed any difference with areca controllers or ext. raids yet. The write performance may not be ideal, but these read numbers make much more sense now because the FC controller is the limiting factor.
Sequential Reads
File  Blk   Num  Avg     Maximum  Lat%     Lat%     CPU
Size  Size  Thr  Rate    (CPU%)   Latency  Latency  >2s      >10s     Eff
----- ----- ---  ------  -------  -------  -------  -------  -------  -----
20000 4096  1    391.20  50.24%   0.019    43.01    0.00000  0.00000  779
20000 4096  2    387.79  92.22%   0.040    278.71   0.00000  0.00000  420

Random Reads
File  Blk   Num  Avg     Maximum  Lat%     Lat%     CPU
Size  Size  Thr  Rate    (CPU%)   Latency  Latency  >2s      >10s     Eff
----- ----- ---  ------  -------  -------  -------  -------  -------  -----
20000 4096  1    2.87    0.698%   2.720    27.47    0.00000  0.00000  411
20000 4096  2    4.37    2.013%   3.473    47.35    0.00000  0.00000  217

Sequential Writes
File  Blk   Num  Avg     Maximum  Lat%     Lat%     CPU
Size  Size  Thr  Rate    (CPU%)   Latency  Latency  >2s      >10s     Eff
----- ----- ---  ------  -------  -------  -------  -------  -------  -----
20000 4096  1    189.23  42.73%   0.029    5670.66  0.00014  0.00000  443
20000 4096  2    173.92  84.93%   0.064    4590.56  0.00029  0.00000  205

Random Writes
File  Blk   Num  Avg     Maximum  Lat%     Lat%     CPU
Size  Size  Thr  Rate    (CPU%)   Latency  Latency  >2s      >10s     Eff
----- ----- ---  ------  -------  -------  -------  -------  -------  -----
20000 4096  1    1.85    0.662%   0.011    0.05     0.00000  0.00000  279
20000 4096  2    1.68    0.772%   0.012    0.05     0.00000  0.00000  217

Ralf ^ permalink raw reply [flat|nested] 48+ messages in thread
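[Editor's note on units, a sketch: blockdev --setra takes 512-byte sectors, so the 16384 Ralf used is 8 MiB of readahead.]

```shell
ra_sectors=16384                            # value passed to blockdev --setra
ra_mib=$((ra_sectors * 512 / 1024 / 1024))  # sectors are 512 bytes each
echo "readahead: ${ra_sectors} sectors = ${ra_mib} MiB"
# To inspect the current setting on a device:  blockdev --getra /dev/sdc
```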
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26  8:23 ` Ralf Gross
@ 2007-09-26  8:42   ` Justin Piszcz
  2007-09-26  8:49     ` Ralf Gross
  0 siblings, 1 reply; 48+ messages in thread
From: Justin Piszcz @ 2007-09-26  8:42 UTC (permalink / raw)
To: Ralf Gross; +Cc: linux-xfs

On Wed, 26 Sep 2007, Ralf Gross wrote:

> Justin Piszcz schrieb:
>>
>> On Tue, 25 Sep 2007, Bryan J. Smith wrote:
>>
>>> Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>>>> Just out of curiosity, have you tried SW RAID5 on this array?
>>>> Also, what do you get if you use RAID0 (hw or sw)?
>>>
>>> According to him, if I read it correctly, it is an external FC RAID-5
>>> chassis, i.e. all of the logic is in the chassis. So your question
>>> is N/A.
>>>
>>> Although I'm more than ready to be proven incorrect.
>>>
>>> Furthermore, what benchmark do you use? If dd on the volume itself,
>>> software RAID wins, hands down. It doesn't matter what size you give
>>> it; it literally copies (and doesn't recalculate) the parity. It's
>>> the rawest form of non-blocking I/O and uses virtually no system
>>> interconnect to the CPU (it just pushes disk-mem-disk).
>>
>> bonnie++, iozone, etc. all show ~430-460 MiB/s write and ~550 MiB/s read.
>
> I'm happy :) I was able to boost the read performance by setting
>
>   blockdev --setra 16384 /dev/sdc
>
> I knew this parameter was necessary for 3ware controllers, but I hadn't
> noticed any difference with Areca controllers or external RAIDs before.
>
> The write performance may not be ideal, but these read numbers make much
> more sense now, because the FC controller is the limiting factor.
> [quoted tiobench sequential/random read and write tables snipped]

What was the command line you used for that output?
tiobench.. ?

Justin.
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26  8:42 ` Justin Piszcz
@ 2007-09-26  8:49   ` Ralf Gross
  2007-09-26  9:52     ` Justin Piszcz
  0 siblings, 1 reply; 48+ messages in thread
From: Ralf Gross @ 2007-09-26  8:49 UTC (permalink / raw)
To: linux-xfs

Justin Piszcz schrieb:
> What was the command line you used for that output?
> tiobench.. ?

tiobench --numruns 3 --threads 1 --threads 2 --block 4096 --size 20000

--size 20000 because the server has 16 GB RAM.

Ralf
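[The reason for `--size 20000` is the standard benchmarking rule that the
working set must exceed RAM; otherwise the page cache serves the reads and
the run measures memory, not disks. A trivial sanity check of that rule,
using the 16 GB figure from Ralf's message:]

```shell
# tiobench --size is in megabytes; RAM on the server is 16 GB = 16384 MB.
size_mb=20000
ram_mb=16384
if [ "$size_mb" -gt "$ram_mb" ]; then
    echo "OK: benchmark file (${size_mb} MB) exceeds RAM (${ram_mb} MB)"
else
    echo "WARNING: results will largely measure the page cache"
fi
```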
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26  8:49 ` Ralf Gross
@ 2007-09-26  9:52   ` Justin Piszcz
  2007-09-26 15:03     ` Bryan J Smith
  0 siblings, 1 reply; 48+ messages in thread
From: Justin Piszcz @ 2007-09-26  9:52 UTC (permalink / raw)
To: Ralf Gross; +Cc: linux-xfs, linux-raid

On Wed, 26 Sep 2007, Ralf Gross wrote:

> Justin Piszcz schrieb:
>> What was the command line you used for that output?
>> tiobench.. ?
>
> tiobench --numruns 3 --threads 1 --threads 2 --block 4096 --size 20000
>
> --size 20000 because the server has 16 GB RAM.
>
> Ralf

Here is my output on my SW RAID5. Keep in mind it is currently being used,
so the numbers are a little slower than they probably should be.

My machine only has 8 GiB of memory, but I used the same command you did.

This is with the 2.6.22.6 kernel; 2.6.23-rcX/final, when released, is
supposed to have the SW RAID5 accelerator code, correct?

Unit information
================
File size = megabytes
Blk Size  = bytes
Rate      = megabytes per second
CPU%      = percentage of CPU used during the test
Latency   = milliseconds
Lat%      = percent of requests that took longer than X seconds
CPU Eff   = Rate divided by CPU% - throughput per cpu load

Sequential Reads
            File   Blk   Num                   Avg      Maximum   Lat%     Lat%     CPU
Identifier  Size   Size  Thr   Rate    (CPU%)  Latency  Latency   >2s      >10s     Eff
----------  -----  ----- ---   ------  ------  -------- --------  -------- -------- -----
2.6.22.6    20000  4096  1     523.01  45.79%  0.022    510.77    0.00000  0.00000   1142
2.6.22.6    20000  4096  2     501.29  85.84%  0.046    855.59    0.00000  0.00000    584

Random Reads
            File   Blk   Num                   Avg      Maximum   Lat%     Lat%     CPU
Identifier  Size   Size  Thr   Rate    (CPU%)  Latency  Latency   >2s      >10s     Eff
----------  -----  ----- ---   ------  ------  -------- --------  -------- -------- -----
2.6.22.6    20000  4096  1     0.90    0.276%  13.003   74.41     0.00000  0.00000    326
2.6.22.6    20000  4096  2     1.61    1.167%  14.443   126.43    0.00000  0.00000    137

Sequential Writes
            File   Blk   Num                   Avg      Maximum   Lat%     Lat%     CPU
Identifier  Size   Size  Thr   Rate    (CPU%)  Latency  Latency   >2s      >10s     Eff
----------  -----  ----- ---   ------  ------  -------- --------  -------- -------- -----
2.6.22.6    20000  4096  1     363.46  75.72%  0.030    2757.45   0.00000  0.00000    480
2.6.22.6    20000  4096  2     394.45  287.9%  0.056    2798.92   0.00000  0.00000    137

Random Writes
            File   Blk   Num                   Avg      Maximum   Lat%     Lat%     CPU
Identifier  Size   Size  Thr   Rate    (CPU%)  Latency  Latency   >2s      >10s     Eff
----------  -----  ----- ---   ------  ------  -------- --------  -------- -------- -----
2.6.22.6    20000  4096  1     3.16    1.752%  0.011    1.02      0.00000  0.00000    180
2.6.22.6    20000  4096  2     3.07    3.769%  0.013    0.10      0.00000  0.00000     82
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26  9:52 ` Justin Piszcz
@ 2007-09-26 15:03   ` Bryan J Smith
  2007-09-26 15:15     ` Ralf Gross
  2007-09-26 16:24     ` Justin Piszcz
  0 siblings, 2 replies; 48+ messages in thread
From: Bryan J Smith @ 2007-09-26 15:03 UTC (permalink / raw)
To: Justin Piszcz, xfs-bounce, Ralf Gross; +Cc: linux-xfs, linux-raid

Everyone can play local benchmarking games all they want, and software
RAID will almost always be faster, significantly at times.

What matters is actual, multiple-client performance under full load.
Anything less is completely irrelevant.

--
Bryan J Smith - mailto:b.j.smith@ieee.org
http://thebs413.blogspot.com
Sent via BlackBerry from T-Mobile


-----Original Message-----
From: Justin Piszcz <jpiszcz@lucidpixels.com>
Date: Wed, 26 Sep 2007 05:52:39
To: Ralf Gross <Ralf-Lists@ralfgross.de>
Cc: linux-xfs@oss.sgi.com, linux-raid@vger.kernel.org
Subject: Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)

[Justin's previous message, including the tiobench unit information and
result tables, quoted in full -- snipped]
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26 15:03 ` Bryan J Smith
@ 2007-09-26 15:15   ` Ralf Gross
  2007-09-26 17:08     ` Bryan J. Smith
  2007-09-26 16:24   ` Justin Piszcz
  1 sibling, 1 reply; 48+ messages in thread
From: Ralf Gross @ 2007-09-26 15:15 UTC (permalink / raw)
To: linux-xfs

Bryan J Smith schrieb:
> Everyone can play local benchmarking games all they want, and software
> RAID will almost always be faster, significantly at times.
>
> What matters is actual, multiple-client performance under full load.
> Anything less is completely irrelevant.

You're right, but these benchmarks help to find simple failures or
misconfigurations at an earlier stage of the process.

Ralf
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26 15:15 ` Ralf Gross
@ 2007-09-26 17:08   ` Bryan J. Smith
  0 siblings, 0 replies; 48+ messages in thread
From: Bryan J. Smith @ 2007-09-26 17:08 UTC (permalink / raw)
To: Ralf Gross, linux-xfs

Ralf Gross <Ralf-Lists@ralfgross.de> wrote:
> You're right, but these benchmarks help to find simple failures or
> misconfigurations at an earlier stage of the process.

Yes, as long as you are comparing to a benchmark of a known, similar
quantity. It's not uncommon for Linux's RAID-5 to be 2-3x faster at dd
and single-file operations, especially if there are no actual parity
operations.

--
Bryan J. Smith    Professional, Technical Annoyance
b.j.smith@ieee.org    http://thebs413.blogspot.com
--------------------------------------------------
Fission Power: An Inconvenient Solution
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26 15:03 ` Bryan J Smith
  2007-09-26 15:15   ` Ralf Gross
@ 2007-09-26 16:24   ` Justin Piszcz
  2007-09-26 17:11     ` Bryan J. Smith
  1 sibling, 1 reply; 48+ messages in thread
From: Justin Piszcz @ 2007-09-26 16:24 UTC (permalink / raw)
To: Bryan J Smith; +Cc: xfs-bounce, Ralf Gross, linux-xfs, linux-raid

I have a question: when I use multiple writer threads (2 or 3), I see
550-600 MiB/s write speed (vmstat), but when using only 1 thread,
~420-430 MiB/s...

Also, without tweaking, SW RAID is very slow (180-200 MiB/s) using the
same disks.

Justin.

On Wed, 26 Sep 2007, Bryan J Smith wrote:

> Everyone can play local benchmarking games all they want, and software
> RAID will almost always be faster, significantly at times.
>
> What matters is actual, multiple-client performance under full load.
> Anything less is completely irrelevant.
> --
> Bryan J Smith - mailto:b.j.smith@ieee.org
> http://thebs413.blogspot.com
> Sent via BlackBerry from T-Mobile
>
> [quoted copy of Justin's earlier message with the tiobench unit
> information and result tables snipped]
* Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
  2007-09-26 16:24 ` Justin Piszcz
@ 2007-09-26 17:11   ` Bryan J. Smith
  0 siblings, 0 replies; 48+ messages in thread
From: Bryan J. Smith @ 2007-09-26 17:11 UTC (permalink / raw)
To: Justin Piszcz, Bryan J Smith
Cc: xfs-bounce, Ralf Gross, linux-xfs, linux-raid

Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> I have a question: when I use multiple writer threads (2 or 3), I see
> 550-600 MiB/s write speed (vmstat), but when using only 1 thread,
> ~420-430 MiB/s...

It's called scheduling buffer flushes, as well as the buffering itself.

> Also, without tweaking, SW RAID is very slow (180-200 MiB/s) using the
> same disks.

But how much of that tweaking is actually just buffering? That's a
continued theme (and issue). Unless you can force completely synchronous
writes, you honestly don't know. Using a size larger than memory is not
anywhere near the same.

Plus, it makes software RAID utterly n/a in comparison to hardware RAID,
where the driver waits until the commit to actual NVRAM or disc is
complete.

--
Bryan J. Smith    Professional, Technical Annoyance
b.j.smith@ieee.org    http://thebs413.blogspot.com
--------------------------------------------------
Fission Power: An Inconvenient Solution
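[One way to act on Bryan's point about forcing the data to stable storage,
sketched here as an editorial aside rather than anything proposed in the
thread: GNU dd's `conv=fsync` calls fsync() before exiting, so the
reported rate includes committing the data, not just dirtying the page
cache. Path and size are illustrative:]

```shell
# Write 64 MiB and fsync before dd exits; buffered-but-unwritten data
# then cannot inflate the apparent throughput.
dd if=/dev/zero of=./ddtest bs=1M count=64 conv=fsync
rm -f ./ddtest
```

[For per-request synchrony rather than a single fsync at the end,
`oflag=direct` or `oflag=sync` are the stronger variants, at a further
cost in throughput.]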
End of thread, other threads: [~2007-09-27 15:23 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
2007-09-23  9:38 mkfs options for a 16x hw raid5 and xfs (mostly large files) Ralf Gross
2007-09-23 12:56 ` Peter Grandi
2007-09-26 14:54   ` Ralf Gross
2007-09-26 16:27     ` [UNSURE] " Justin Piszcz
2007-09-26 16:54       ` Ralf Gross
2007-09-26 16:59         ` Justin Piszcz
2007-09-26 17:38           ` Bryan J. Smith
2007-09-26 17:41             ` Justin Piszcz
2007-09-26 17:55               ` Bryan J. Smith
2007-09-26 17:13     ` [UNSURE] " Bryan J. Smith
2007-09-26 17:27       ` Justin Piszcz
2007-09-26 17:35         ` Bryan J. Smith
2007-09-26 17:37           ` Justin Piszcz
2007-09-26 17:38             ` Justin Piszcz
2007-09-26 17:49             ` Bryan J. Smith
2007-09-27 15:22   ` Ralf Gross
2007-09-24 17:31 ` Ralf Gross
2007-09-24 18:01   ` Justin Piszcz
2007-09-24 20:39     ` Ralf Gross
2007-09-24 20:43       ` Justin Piszcz
2007-09-24 21:33         ` Ralf Gross
2007-09-24 21:36           ` Justin Piszcz
2007-09-24 21:52             ` Ralf Gross
2007-09-25 12:35               ` Ralf Gross
2007-09-25 12:50                 ` Justin Piszcz
2007-09-25 13:44                   ` Bryan J Smith
2007-09-25 12:57                 ` KELEMEN Peter
2007-09-25 13:49                   ` Ralf Gross
2007-09-25 14:08                     ` Bryan J Smith
2007-09-25 16:07                       ` Ralf Gross
2007-09-25 16:28                         ` Bryan J. Smith
2007-09-25 17:25                           ` Ralf Gross
2007-09-25 17:41                             ` Bryan J. Smith
2007-09-25 19:13                               ` Ralf Gross
2007-09-25 20:23                                 ` Bryan J. Smith
2007-09-25 16:48                       ` Justin Piszcz
2007-09-25 18:00                         ` Bryan J. Smith
2007-09-25 18:33                 ` Ralf Gross
2007-09-25 23:38                   ` Justin Piszcz
2007-09-26  8:23                     ` Ralf Gross
2007-09-26  8:42                       ` Justin Piszcz
2007-09-26  8:49                         ` Ralf Gross
2007-09-26  9:52                           ` Justin Piszcz
2007-09-26 15:03                             ` Bryan J Smith
2007-09-26 15:15                               ` Ralf Gross
2007-09-26 17:08                                 ` Bryan J. Smith
2007-09-26 16:24                               ` Justin Piszcz
2007-09-26 17:11                                 ` Bryan J. Smith