* raid10 performance question

From: Jon Nelson @ 2007-12-23 14:26 UTC
To: linux-raid

I've found in some tests that raid10,f2 gives me the best I/O of any
raid5 or raid10 format. However, the performance of raid10,o2 and
raid10,n2 in degraded mode is nearly identical to the non-degraded
mode performance (for me, this hovers around 100MB/s). raid10,f2 has
degraded-mode write performance that is indistinguishable from its
non-degraded mode performance.

It's the raid10,f2 *read* performance in degraded mode that is
strange: I get almost exactly 50% of the non-degraded mode read
performance. Why is that?

--
Jon
* Re: raid10 performance question

From: Peter Grandi @ 2007-12-25 19:08 UTC
To: Linux RAID

>>> On Sun, 23 Dec 2007 08:26:55 -0600, "Jon Nelson"
>>> <jnelson-linux-raid@jamponi.net> said:

> I've found in some tests that raid10,f2 gives me the best I/O
> of any raid5 or raid10 format.

Mostly, depending on the type of workload. Anyhow, in general most
forms of RAID10 are cool, handle disk losses better, and so on.

> However, the performance of raid10,o2 and raid10,n2 in
> degraded mode is nearly identical to the non-degraded mode
> performance (for me, this hovers around 100MB/s).

You don't say how many drives you have, but that figure may suggest
that your array transfers are limited by the PCI host bus speed.

> raid10,f2 has degraded-mode write performance that is
> indistinguishable from its non-degraded mode performance.

> It's the raid10,f2 *read* performance in degraded mode that is
> strange: I get almost exactly 50% of the non-degraded mode read
> performance. Why is that?

Well, the best description I have found of the odd Linux RAID10
modes is here:

  http://en.Wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

The key here is: "The driver also supports a 'far' layout where all
the drives are divided into f sections."

Now, when there are two sections, as in 'f2', each block is written
to a block in the first half of one disk and to the second half of
the "next" disk. Consider this layout for the first 4 blocks on a
2x2 array, compared to the standard ('near') layout:

        DISK                    DISK
    A   B   C   D           A   B   C   D

    1   2   3   4           1   1   2   2
    .   .   .   .           3   3   4   4
    .   .   .   .           .   .   .   .
    .   .   .   .           .   .   .   .
    -------
    4   1   2   3
    .   .   .   .
    .   .   .   .
    .   .   .   .

This means that with the far layout one can read blocks 1,2,3,4 at
the same speed as a RAID0 on the outer cylinders of each disk; but
if one of the disks fails, the mirror blocks have to be read from
the inner cylinders of the next disk, which are usually a lot slower
than the outer ones.

Now, there is a very interesting detail here: one idea for getting a
fast array is to make it out of large, high-density drives and just
use the outer cylinders of each drive, thus at the same time having
a much smaller range of arm travel and higher transfer rates. The
'f2' layout means that (until a drive fails) for all reads and for
"short" writes, MD is effectively using just the outer half of each
drive, *as well as* what is effectively a RAID0 layout.

Note that the sustained write speed of 'f2' is going to be the same
*across the whole capacity* of the RAID, while the sustained write
speed of an 'n2' layout will be higher at the beginning and lower at
the end, just like for a single disk. Interesting; I hadn't realized
that, even though I am keenly aware of the non-uniform speeds of
disks across cylinders.
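To make the 'far' mapping above concrete, here is a minimal Python
sketch that computes the same placements as the diagram. It is an
illustration of the layout only, not the md driver's code; the
(disk, half, stripe) numbering and the f2_layout name are assumptions
chosen for clarity.

    def f2_layout(num_disks, num_chunks):
        """Return {chunk: (primary, mirror)} placements for raid10,f2.

        Each placement is (disk, half, stripe): half 0 is the outer
        section of a disk (the RAID0-like stripe), half 1 the inner
        section (the mirror, rotated by one disk).
        """
        placement = {}
        for chunk in range(num_chunks):
            stripe = chunk // num_disks
            primary = (chunk % num_disks, 0, stripe)       # outer half
            mirror = ((chunk + 1) % num_disks, 1, stripe)  # next disk, inner half
            placement[chunk] = (primary, mirror)
        return placement

    if __name__ == "__main__":
        names = "ABCD"
        for chunk, (primary, mirror) in f2_layout(4, 4).items():
            print(f"block {chunk + 1}: primary on disk {names[primary[0]]} (outer),"
                  f" mirror on disk {names[mirror[0]]} (inner)")

Running it prints block 1 on disk A with its mirror on disk B's inner
half, and so on through block 4 mirrored back onto disk A, matching
the left-hand diagram above.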
* Re: raid10 performance question

From: Peter Grandi @ 2007-12-25 21:34 UTC
To: Linux RAID

>>> On Tue, 25 Dec 2007 19:08:15 +0000,
>>> pg_lxra@lxra.for.sabi.co.UK (Peter Grandi) said:

[ ... ]

>> It's the raid10,f2 *read* performance in degraded mode that is
>> strange: I get almost exactly 50% of the non-degraded mode read
>> performance. Why is that?

> [ ... ] the mirror blocks have to be read from the inner
> cylinders of the next disk, which are usually a lot slower
> than the outer ones. [ ... ]

Just to be complete, there is of course the other issue, which
affects sustained writes too: extra seeks. If disk B fails, the
situation becomes:

        DISK
    A   X   C   D

    1   X   3   4
    .   .   .   .
    .   .   .   .
    .   .   .   .
    -------
    4   X   2   3
    .   .   .   .
    .   .   .   .
    .   .   .   .

Not only must block 2 be read from an inner cylinder, but to read
block 3 there must be a seek back to an outer cylinder on the same
disk. This is the same well-known issue that occurs when doing
sustained writes with RAID10 'f2'.
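For illustration, the following sketch (repeating the hypothetical
f2_layout from the earlier note so it runs alone) plans a sequential
read with disk B failed and counts how often a disk must jump between
its outer and inner halves. The switch count is an assumed proxy for
the long seeks described above, not actual driver behaviour.

    def f2_layout(num_disks, num_chunks):
        # As in the earlier sketch: (disk, half, stripe) per copy.
        return {c: ((c % num_disks, 0, c // num_disks),
                    ((c + 1) % num_disks, 1, c // num_disks))
                for c in range(num_chunks)}

    def degraded_read_plan(num_disks, num_chunks, failed):
        """Pick a copy of each chunk for a sequential read, falling
        back to the mirror (inner half) when the primary disk failed."""
        plan = []
        for primary, mirror in f2_layout(num_disks, num_chunks).values():
            plan.append(primary if primary[0] != failed else mirror)
        return plan

    def half_switches_per_disk(plan):
        """Count consecutive reads on one disk that jump between the
        outer and inner halves, i.e. the long seeks."""
        last_half, switches = {}, {}
        for disk, half, _stripe in plan:
            if disk in last_half and last_half[disk] != half:
                switches[disk] = switches.get(disk, 0) + 1
            last_half[disk] = half
        return switches

    if __name__ == "__main__":
        plan = degraded_read_plan(4, 16, failed=1)  # disk B fails
        print(plan[:4])                  # block 2 now comes from disk C's inner half
        print(half_switches_per_disk(plan))

With disk B (index 1) failed, the surviving "next" disk C ends up
alternating between its halves on nearly every other chunk, while the
other disks stay on their outer halves, which matches the 50%
degraded read figure Jon reported.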
* Re: raid10 performance question

From: Bill Davidsen @ 2007-12-26 20:59 UTC
To: Peter Grandi; +Cc: Linux RAID

Peter Grandi wrote:
>>>> On Tue, 25 Dec 2007 19:08:15 +0000,
>>>> pg_lxra@lxra.for.sabi.co.UK (Peter Grandi) said:
>
> [ ... ]
>
>>> It's the raid10,f2 *read* performance in degraded mode that is
>>> strange: I get almost exactly 50% of the non-degraded mode read
>>> performance. Why is that?
>
>> [ ... ] the mirror blocks have to be read from the inner
>> cylinders of the next disk, which are usually a lot slower
>> than the outer ones. [ ... ]
>
> Just to be complete, there is of course the other issue, which
> affects sustained writes too: extra seeks. If disk B fails, the
> situation becomes:
>
>         DISK
>     A   X   C   D
>
>     1   X   3   4
>     .   .   .   .
>     .   .   .   .
>     .   .   .   .
>     -------
>     4   X   2   3
>     .   .   .   .
>     .   .   .   .
>     .   .   .   .
>
> Not only must block 2 be read from an inner cylinder, but to read
> block 3 there must be a seek back to an outer cylinder on the same
> disk. This is the same well-known issue that occurs when doing
> sustained writes with RAID10 'f2'.

I have often wondered why the elevator code doesn't do better on this
sustained load: grouping the writes at the drive extremities, so there
would be lots of writes to nearby cylinders, then a big seek, and then
lots of writes near the next position. I tried bumping the
stripe_cache, changing to alternate elevators, and just increasing the
physical memory, and never saw any serious improvement beyond the
speed with the default settings.

--
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will
  still be valid when the war is over..." Otto von Bismarck
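As a purely hypothetical illustration of the batching Bill wonders
about (this is not actual kernel elevator code, and batched_order is
an invented name): one disk's queued writes could be grouped by which
half of the disk they land in, each group serviced in order, and a
single long seek made between the groups instead of one per request.

    def batched_order(sectors, half_boundary):
        """Order one disk's queued writes: all outer-half requests
        first, then all inner-half requests, each batch in LBA order."""
        outer = sorted(s for s in sectors if s < half_boundary)
        inner = sorted(s for s in sectors if s >= half_boundary)
        return outer + inner   # one long seek instead of one per write

    if __name__ == "__main__":
        queue = [10, 5_000_010, 20, 5_000_020, 30, 5_000_030]
        print(batched_order(queue, half_boundary=2_500_000))
        # [10, 20, 30, 5000010, 5000020, 5000030]

A plain LBA-sorted elevator already produces this order for a static
queue; the hard part in practice is that new requests keep arriving,
which is presumably why the tuning Bill tried made little difference.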