* raid10 performance question
From: Jon Nelson @ 2007-12-23 14:26 UTC
To: linux-raid
I've found in some tests that raid10,f2 gives me the best I/O of any
raid5 or raid10 format. However, the performance of raid10,o2 and
raid10,n2 in degraded mode is nearly identical to their non-degraded
performance (for me, this hovers around 100MB/s). raid10,f2's
degraded-mode *write* performance is likewise indistinguishable from
its non-degraded performance. It's the raid10,f2 *read* performance
in degraded mode that is strange - I get almost exactly 50% of the
non-degraded mode read performance. Why is that?
--
Jon
* Re: raid10 performance question
From: Peter Grandi @ 2007-12-25 19:08 UTC
To: Linux RAID
>>> On Sun, 23 Dec 2007 08:26:55 -0600, "Jon Nelson"
>>> <jnelson-linux-raid@jamponi.net> said:
> I've found in some tests that raid10,f2 gives me the best I/O
> of any raid5 or raid10 format.
Mostly, depending on the type of workload. Anyhow, in general most
forms of RAID10 are cool: they handle disk losses better, and so on.
> However, the performance of raid10,o2 and raid10,n2 in
> degraded mode is nearly identical to the non-degraded mode
> performance (for me, this hovers around 100MB/s).
You don't say how many drives you have, but that figure suggests
your array transfers may be limited by the PCI host bus speed.
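For what it's worth, if the host adapter hangs off a classic
32-bit/33MHz PCI bus (an assumption on my part; you didn't say),
the arithmetic fits that ~100MB/s figure. A quick Python sketch:

  # Rough arithmetic for a classic 32-bit / 33 MHz PCI shared bus.
  bus_width_bytes = 4            # 32-bit data path
  clock_hz = 33_000_000          # 33 MHz
  peak = bus_width_bytes * clock_hz / 1e6
  print(f"theoretical PCI peak: {peak:.0f} MB/s")  # ~132 MB/s
  # After protocol overhead and sharing the bus with other
  # devices, a sustained ceiling around 100 MB/s is typical.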
> raid10,f2's degraded-mode *write* performance is likewise
> indistinguishable from its non-degraded performance.
> It's the raid10,f2 *read* performance in degraded mode that is
> strange - I get almost exactly 50% of the non-degraded mode
> read performance. Why is that?
Well, the best description I found of the odd Linux RAID10 modes
is here:
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
The key here is:
"The driver also supports a "far" layout where all the drives
are divided into f sections."
Now when there are two sections, as in 'f2', each block is written
once in the first half of one disk and again in the second half of
the "next" disk.
Consider this layout for the first 4 blocks on a 4-drive array,
'f2' on the left compared to the standard 'n2' layout on the
right:

      DISK ('f2')           DISK ('n2')
   A   B   C   D         A   B   C   D

   1   2   3   4         1   1   2   2
   .   .   .   .         3   3   4   4
   .   .   .   .         .   .   .   .
   .   .   .   .         .   .   .   .
  ---------------
   4   1   2   3
   .   .   .   .
   .   .   .   .
   .   .   .   .
This means that with the far layout one can read blocks 1,2,3,4
at the same speed as a RAID0 on the outer cylinders of each
disk; but if one of the disks fails, the mirror blocks have to
be read from the inner cylinders of the next disk, which are
usually a lot slower than the outer ones.
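To make the placement rule concrete, here is a minimal Python
sketch of it (my own simplification, one block per chunk; the real
md driver works in chunk-sized units, but the rotation between the
halves is the same idea):

  NDISKS = 4
  DISK_NAMES = "ABCD"

  def far2_copies(block):
      """Return the two (disk, half) locations of a logical block."""
      primary = block % NDISKS          # striped across the outer halves
      mirror = (primary + 1) % NDISKS   # one disk over, inner half
      return (primary, "outer half"), (mirror, "inner half")

  for b in range(4):
      (d1, h1), (d2, h2) = far2_copies(b)
      print(f"block {b + 1}: disk {DISK_NAMES[d1]} ({h1})"
            f" and disk {DISK_NAMES[d2]} ({h2})")

This reproduces the 'f2' table above: blocks 1..4 striped across
the outer halves, each mirror one disk over in the inner half.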
Now, there is a very interesting detail here: one idea for getting
a fast array is to build it out of large, high-density drives and
use just the outer cylinders of each drive, thus at the same time
getting a much smaller range of arm travel and higher transfer
rates.
The 'f2' layout means that (until a drive fails) for all reads and
for "short" writes MD is effectively using just the outer half of
each drive, *and* doing so in what amounts to a RAID0 layout.
Note that the sustained write speed of 'f2' is going to be the
same *across the whole capacity* of the RAID, while the sustained
write speed of an 'n2' layout will be higher at the beginning and
slower at the end, just like for a single disk. Interesting - I
hadn't realized that, even though I am keenly aware of the
non-uniform speeds of disks across cylinders.
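A toy model makes the contrast visible. All the numbers here are
assumptions of mine for illustration (a drive streaming 80MB/s on
its outermost cylinders, falling linearly to 40MB/s on the
innermost, with the slower copy gating each mirrored write):

  OUTER, INNER = 80.0, 40.0  # MB/s, assumed single-disk streaming rates

  def disk_rate(pos):
      """Streaming rate at fractional position pos (0.0 = outer edge)."""
      return OUTER + (INNER - OUTER) * pos

  def n2_rate(x):
      # 'near': both copies of logical fraction x land near position x.
      return disk_rate(x)

  def f2_rate(x):
      # 'far': one copy at x/2 (outer half), one at 0.5 + x/2 (inner
      # half); the slower, inner copy gates the write.
      return min(disk_rate(x / 2), disk_rate(0.5 + x / 2))

  for x in (0.0, 0.5, 1.0):
      print(f"at {x:4.0%} of capacity:"
            f" n2 ~{n2_rate(x):3.0f} MB/s, f2 ~{f2_rate(x):3.0f} MB/s")

Under these assumed rates 'n2' spans 80 down to 40MB/s while 'f2'
spans only 60 down to 40MB/s: a much flatter profile across the
capacity, bought at the cost of the seeks between the two halves.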
* Re: raid10 performance question
From: Peter Grandi @ 2007-12-25 21:34 UTC
To: Linux RAID
>>> On Tue, 25 Dec 2007 19:08:15 +0000,
>>> pg_lxra@lxra.for.sabi.co.UK (Peter Grandi) said:
[ ... ]
>> It's the raid10,f2 *read* performance in degraded mode that is
>> strange - I get almost exactly 50% of the non-degraded mode
>> read performance. Why is that?
> [ ... ] the mirror blocks have to be read from the inner
> cylinders of the next disk, which are usually a lot slower
> than the outer ones. [ ... ]
Just to be complete, there is of course the other issue, which
affects sustained writes too: extra seeks. If disk B fails the
situation becomes:
      DISK
   A   X   C   D

   1   X   3   4
   .   .   .   .
   .   .   .   .
   .   .   .   .
  ---------------
   4   X   2   3
   .   .   .   .
   .   .   .   .
   .   .   .   .
Not only must block 2 be read from an inner cylinder, but to read
block 3 there must then be a seek back to an outer cylinder on the
same disk. This is the same well-known issue that arises when
doing sustained writes with RAID10 'f2'.
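Extending the earlier placement sketch shows the degraded read
pattern directly (again my own simplified model, not the actual
md read-balancing code):

  NDISKS = 4
  DISK_NAMES = "ABCD"
  FAILED = 1  # disk B

  def degraded_read_source(block):
      """Disk and half each block is read from once disk B is gone."""
      primary = block % NDISKS
      if primary != FAILED:
          return primary, "outer half"
      # Primary copy lost: fall back to the mirror, one disk over,
      # in that disk's inner half.
      return (primary + 1) % NDISKS, "inner half"

  for b in range(4):
      disk, half = degraded_read_source(b)
      print(f"block {b + 1}: read from disk {DISK_NAMES[disk]} ({half})")

Disk C ends up serving block 2 from its inner half and block 3
from its outer half back to back - exactly the seek described
above.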
* Re: raid10 performance question
From: Bill Davidsen @ 2007-12-26 20:59 UTC
To: Peter Grandi; +Cc: Linux RAID
Peter Grandi wrote:
>>>> On Tue, 25 Dec 2007 19:08:15 +0000,
>>>> pg_lxra@lxra.for.sabi.co.UK (Peter Grandi) said:
> [ ... ]
>>> It's the raid10,f2 *read* performance in degraded mode that is
>>> strange - I get almost exactly 50% of the non-degraded mode
>>> read performance. Why is that?
>> [ ... ] the mirror blocks have to be read from the inner
>> cylinders of the next disk, which are usually a lot slower
>> than the outer ones. [ ... ]
> Just to be complete, there is of course the other issue, which
> affects sustained writes too: extra seeks. If disk B fails the
> situation becomes:
>
>       DISK
>    A   X   C   D
>
>    1   X   3   4
>    .   .   .   .
>    .   .   .   .
>    .   .   .   .
>   ---------------
>    4   X   2   3
>    .   .   .   .
>    .   .   .   .
>    .   .   .   .
>
> Not only must block 2 be read from an inner cylinder, but to read
> block 3 there must then be a seek back to an outer cylinder on the
> same disk. This is the same well-known issue that arises when
> doing sustained writes with RAID10 'f2'.
I have often wondered why the elevator code doesn't do better on
this sustained load: grouping the writes at the drive extremities,
so there would be lots of writes to nearby cylinders, then a big
seek and lots of writes near the next position. I tried bumping
the stripe_cache, changing to alternate elevators, and just
increasing the physical memory, and never saw any serious
improvement over the speed with the default settings.
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark