* layout of far blocks in raid10
@ 2010-05-11 15:12 Keld Simonsen
2010-05-11 17:13 ` Aryeh Gregor
0 siblings, 1 reply; 6+ messages in thread
From: Keld Simonsen @ 2010-05-11 15:12 UTC (permalink / raw)
To: linux-raid
Hi
There is a question about block layout in the raid10 far layout
that I would like to know more about.
For 4 drives with 2 copies (-n 4 -p n2) I see several
possible layouts; 3 of them are shown below, giving the beginning of each raid0 section:
Disks:
a b c d
Layout 1:
1 2 3 4
..............
4 1 2 3
Layout 2:
1 2 3 4
..............
3 4 1 2
Layout 3:
1 2 3 4
..............
2 3 4 1
This gives C(4,2) = 6 possible combinations of double failure.
These are the combinations that will contain all blocks with only 2 working
drives:
Layout 1: a+c, b+d will work, total 2 combinations
Layout 2: a+b, c+d, a+d, b+c would work, total 4 combinations
Layout 3: a+c, b+d would work, total 2 combinations
So the best layout would be layout 2, as it provides a
4/6 = 67 % chance of surviving the 2nd disk failure,
while the 2 others only have a 2/6 = 33 % chance of surviving
the 2nd disk failure.
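These counts can be verified by brute force; a short Python sketch (just my own enumeration of the 6 double failures per layout, nothing from md/mdadm):

```python
from itertools import combinations

# The three candidate layouts: one row per raid0 section, column i is the
# block stored on disk i (disks a..d are columns 0..3).
layouts = {
    "layout 1": [[1, 2, 3, 4], [4, 1, 2, 3]],
    "layout 2": [[1, 2, 3, 4], [3, 4, 1, 2]],
    "layout 3": [[1, 2, 3, 4], [2, 3, 4, 1]],
}

def survivable_failures(layout):
    """Number of 2-disk failures that leave every block readable."""
    n = len(layout[0])
    ok = 0
    for failed in combinations(range(n), 2):
        remaining = {row[d] for row in layout
                     for d in range(n) if d not in failed}
        if remaining == {1, 2, 3, 4}:
            ok += 1
    return ok

for name, layout in layouts.items():
    print(name, "survives", survivable_failures(layout), "of 6 double failures")
```

This reproduces the 2/6, 4/6 and 2/6 counts above.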
How is the "-n 4 -p f2" layout actually done?
I think one could do similar analyses for other numbers of drives.
Is there any general pattern (algorithm) one could use?
I think one key is allocating copies in pairs. Maybe more could be
gained by allocating the blocks in groups of powers of 2.
Or maybe the only gain comes from the grouping in pairs as there are
only 2 copies.
best regards
Keld
* Re: layout of far blocks in raid10
2010-05-11 15:12 layout of far blocks in raid10 Keld Simonsen
@ 2010-05-11 17:13 ` Aryeh Gregor
2010-05-11 21:56 ` Neil Brown
0 siblings, 1 reply; 6+ messages in thread
From: Aryeh Gregor @ 2010-05-11 17:13 UTC (permalink / raw)
To: Keld Simonsen; +Cc: linux-raid
On Tue, May 11, 2010 at 11:12 AM, Keld Simonsen <keld@keldix.com> wrote:
> There is a question about block layout in the raid10 far layout
> that I would like to know more about.
> For 4 drives, and with 2 copies (-n 4 -p n2) I see several
> possible layouts, 3 of them are, showing the beginning of each raid0 section:
There are only two layouts possible here: cyclic, and
double-transposition. The first can be summarized in cycle notation
<http://en.wikipedia.org/wiki/Cycle_notation> as (abcd), where two
letters are adjacent if the extra copy of the first letter is on the
same disk as the second letter, and it's assumed the letters wrap
around in the parentheses (so the extra copy of d is on the same disk
as a). The second is (ab)(cd). So for instance, your example 1 is
(1432), example 2 is (13)(24), and example 3 is (1234). For larger
numbers you have more possibilities, like (abc)(def) or (abcd)(ef) for
six drives. The exact number of possibilities is the number of
partitions of the number of drives
<http://en.wikipedia.org/wiki/Partition_(number_theory)> that don't
include 1.
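That count is easy to compute; a small sketch (helper names are mine) counting the partitions with every part at least 2:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions_min2(n, largest):
    """Number of partitions of n into parts of size 2..largest."""
    if n == 0:
        return 1
    return sum(partitions_min2(n - k, min(k, n - k))
               for k in range(2, min(largest, n) + 1))

def num_layouts(n_drives):
    """Distinct 2-copy layouts on n drives =
    partitions of n_drives with no part of size 1."""
    return partitions_min2(n_drives, n_drives)

print([num_layouts(n) for n in range(2, 9)])
```

For 4 drives this gives the two layouts described above ((abcd) and (ab)(cd)), and 4 for six drives.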
As far as I know (hopefully someone will correct me if I'm wrong),
RAID10 in mdadm stores data like (ab)(cd)(ef)..., at least if you have
an even number of drives. Thus one disk out of every pair can fail
and you'll still have your data, where the pairs are determined by the
order you specify on the command line. I don't know if this behavior
is guaranteed, but you can verify it by leaving some devices missing
-- trying to create a RAID10 with "/dev/sda1 /dev/sdb1 missing
missing" will fail, but "/dev/sda1 missing /dev/sdb1 missing" will
succeed, at least in my limited experience.
I don't know what mdadm does if there are an odd number of drives --
perhaps something like (ab)(cd)(efg), perhaps something more
complicated. I know more about mathematics than about mdadm. :)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: layout of far blocks in raid10
2010-05-11 17:13 ` Aryeh Gregor
@ 2010-05-11 21:56 ` Neil Brown
2010-05-11 22:22 ` Aryeh Gregor
2010-05-11 22:35 ` Keld Simonsen
0 siblings, 2 replies; 6+ messages in thread
From: Neil Brown @ 2010-05-11 21:56 UTC (permalink / raw)
To: Aryeh Gregor; +Cc: Keld Simonsen, linux-raid
On Tue, 11 May 2010 13:13:06 -0400
Aryeh Gregor <Simetrical+list@gmail.com> wrote:
> On Tue, May 11, 2010 at 11:12 AM, Keld Simonsen <keld@keldix.com> wrote:
> > There is a question about block layout in the raid10 far layout
> > that I would like to know more about.
> > For 4 drives, and with 2 copies (-n 4 -p n2) I see several
> > possible layouts, 3 of them are, showing the beginning of each raid0 section:
>
> There are only two layouts possible here: cyclic, and
> double-transposition. The first can be summarized in cycle notation
> <http://en.wikipedia.org/wiki/Cycle_notation> as (abcd), where two
> letters are adjacent if the extra copy of the first letter is on the
> same disk as the second letter, and it's assumed the letters wrap
> around in the parentheses (so the extra copy of d is on the same disk
> as a). The second is (ab)(cd). So for instance, your example 1 is
> (1432), example 2 is (13)(24), and example 3 is (1234). For larger
> numbers you have more possibilities, like (abc)(def) or (abcd)(ef) for
> six drives. The exact number of possibilities is the number of
> partitions of the number of drives
> <http://en.wikipedia.org/wiki/Partition_(number_theory)> that don't
> include 1.
>
> As far as I know (hopefully someone will correct me if I'm wrong),
> RAID10 in mdadm stores data like (ab)(cd)(ef)..., at least if you have
> an even number of drives.
I'm not quite sure how to respond to this... As a mathematician I would
expect you to understand the importance of precision in choosing words, yet
you use the word "know" for something that is exactly wrong. Either you mean
"guess" or you have been seriously misinformed. If it is the latter, then
please let me know where this misinformation came from so I can see about
getting it corrected.
md/raid10 uses a simple cyclic layout in all cases. It does so because this
layout is completely general and works for all numbers of devices and copies.
So you can only survive multiple device failures where at most N-1 of the
failed devices are adjacent, where N is the number of copies; the first and
last devices are treated as adjacent.
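Read that way, the survival condition for the cyclic layout can be sketched as follows (my paraphrase of the rule, not code from md):

```python
def survives(n_devices, n_copies, failed):
    """Cyclic layout: the n_copies copies of any block land on n_copies
    cyclically consecutive devices, so data is lost exactly when some run
    of n_copies adjacent devices (with wrap-around) has failed entirely."""
    failed = set(failed)
    return not any(
        all((start + j) % n_devices in failed for j in range(n_copies))
        for start in range(n_devices)
    )

# 4 devices, 2 copies: only the two "opposite" pairs may fail together.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print([p for p in pairs if survives(4, 2, p)])  # [(0, 2), (1, 3)]
```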
NeilBrown
> Thus one disk out of every pair can fail
> and you'll still have your data, where the pairs are determined by the
> order you specify on the command line. I don't know if this behavior
> is guaranteed, but you can verify it by leaving some devices missing
> -- trying to create a RAID10 with "/dev/sda1 /dev/sdb1 missing
> missing" will fail, but "/dev/sda1 missing /dev/sdb1 missing" will
> succeed, at least in my limited experience.
>
> I don't know what mdadm does if there are an odd number of drives --
> perhaps something like (ab)(cd)(efg), perhaps something more
> complicated. I know more about mathematics than about mdadm. :)
* Re: layout of far blocks in raid10
2010-05-11 21:56 ` Neil Brown
@ 2010-05-11 22:22 ` Aryeh Gregor
2010-05-11 22:54 ` Neil Brown
2010-05-11 22:35 ` Keld Simonsen
1 sibling, 1 reply; 6+ messages in thread
From: Aryeh Gregor @ 2010-05-11 22:22 UTC (permalink / raw)
To: Neil Brown; +Cc: Keld Simonsen, linux-raid
On Tue, May 11, 2010 at 5:56 PM, Neil Brown <neilb@suse.de> wrote:
> I'm not quite sure how to respond to this... As a mathematician I would
> expect you to understand the importance of precision in choosing words, yet
> you use the word "know" for something that is exactly wrong. Either you mean
> "guess" or you have been seriously misinformed. If it is the latter, then
> please let me know where this misinformation came from so I can see about
> getting it corrected.
>
> md/raid10 uses a simple cyclic layout in all cases. It does so because this
> layout is completely general and works for all numbers of devices and copies.
>
> So you can only survive multiple device failures where at most N-1 of the
> failed devices are adjacent, where N is the number of copies; the first and
> last devices are treated as adjacent.
Mathematicians are sometimes wrong too, sadly. :) (And I'm only a
grad student!) I believe this is where I got my info:
http://git.debian.org/?p=pkg-mdadm/mdadm.git;a=blob_plain;f=debian/FAQ;hb=HEAD
The answer to question 20 of that suggests that if you have four
disks, 0 1 2 3, then 0 and 1 form one pair and 2 and 3 form the other.
If 2 fails, then 0 or 1 could still fail without data loss, but a
failure of 3 will cause data loss. Obviously, you know what you're
talking about better than a Debian FAQ, so unless I'm misunderstanding
the FAQ or you or both, maybe you should talk to the author of that.
Testing with loopback files does seem to show that failing the second
and third drives in a four-drive RAID will cause the RAID to fail, as
I would predict from what you say and contrary to what I interpreted
that FAQ to mean, so hopefully now I understand correctly.
Thanks for the correction. Next time I'll be more cautious.
* Re: layout of far blocks in raid10
2010-05-11 21:56 ` Neil Brown
2010-05-11 22:22 ` Aryeh Gregor
@ 2010-05-11 22:35 ` Keld Simonsen
1 sibling, 0 replies; 6+ messages in thread
From: Keld Simonsen @ 2010-05-11 22:35 UTC (permalink / raw)
To: Neil Brown; +Cc: Aryeh Gregor, linux-raid
On Wed, May 12, 2010 at 07:56:56AM +1000, Neil Brown wrote:
> On Tue, 11 May 2010 13:13:06 -0400
> Aryeh Gregor <Simetrical+list@gmail.com> wrote:
>
> > On Tue, May 11, 2010 at 11:12 AM, Keld Simonsen <keld@keldix.com> wrote:
> > > There is a question on block layout in the raid10 far layout,
> > > that I would like to know more about.
> > > For 4 drives, and with 2 copies (-n 4 -p n2) I see several
> > > possible layouts, 3 of them are, showing the beginning of each raid0 section:
> >
> > There are only two layouts possible here: cyclic, and
> > double-transposition. The first can be summarized in cycle notation
> > <http://en.wikipedia.org/wiki/Cycle_notation> as (abcd), where two
> > letters are adjacent if the extra copy of the first letter is on the
> > same disk as the second letter, and it's assumed the letters wrap
> > around in the parentheses (so the extra copy of d is on the same disk
> > as a). The second is (ab)(cd). So for instance, your example 1 is
> > (1432), example 2 is (13)(24), and example 3 is (1234). For larger
> > numbers you have more possibilities, like (abc)(def) or (abcd)(ef) for
> > six drives. The exact number of possibilities is the number of
> > partitions of the number of drives
> > <http://en.wikipedia.org/wiki/Partition_(number_theory)> that don't
> > include 1.
> >
> > As far as I know (hopefully someone will correct me if I'm wrong),
> > RAID10 in mdadm stores data like (ab)(cd)(ef)..., at least if you have
> > an even number of drives.
>
> I'm not quite sure how to respond to this... As a mathematician I would
> expect you to understand the importance of precision in choosing words, yet
> you use the word "know" for something that is exactly wrong. Either you mean
> "guess" or you have been seriously misinformed. If it is the latter, then
> please let me know where this misinformation came from so I can see about
> getting it corrected.
>
> md/raid10 uses a simple cyclic layout in all cases. It does so because this
> layout is completely general and works for all numbers of devices and copies.
>
> So you can only survive multiple device failures where at most N-1 of the
> failed devices are adjacent, where N is the number of copies; the first and
> last devices are treated as adjacent.
Hmm, I think there is then room for improvement here.
For a 4 drive raid10,f2 I do think it would be a significant enhancement to
go from a 33 % chance of recovery with 2 failing disks to 67 %.
This would also go for raid10,n2, I think. And a 4 drive raid1+0 would
then have better probabilities than a 4 drive raid10,n2...
Enhancements would probably be even better for raid10 with more drives.
Any bid on the order of improvements to be theoretically obtainable?
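For 2 copies the sizes are easy to tabulate: the cyclic layout loses data for the n cyclically adjacent pairs of drives, while a hypothetical (ab)(cd)... pairing (even n) loses data only for the n/2 mirrored pairs. A quick sketch (the pairing layout is hypothetical, not something md implements):

```python
from math import comb

def p_cyclic(n):
    """2-copy cyclic layout on n drives: data loss iff the 2 failed
    drives are cyclically adjacent -- n bad pairs out of C(n,2)."""
    return 1 - n / comb(n, 2)

def p_paired(n):
    """Hypothetical (ab)(cd)... pairing, even n: data loss iff both
    drives of one of the n/2 mirrored pairs fail."""
    return 1 - (n // 2) / comb(n, 2)

for n in (4, 6, 8):
    print(n, round(p_cyclic(n), 3), round(p_paired(n), 3))
```

For 4 drives this gives the 33 % vs 67 % figures above; the absolute gap shrinks as n grows, since both probabilities approach 1.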
It would also be interesting to find out what could be done for the case
where you want to protect against controller failure or the like, that is, a
failure of a whole group of drives within an array.
I would like to have some kind of guidance written up for the wiki.
best regards
keld
> NeilBrown
>
> > Thus one disk out of every pair can fail
> > and you'll still have your data, where the pairs are determined by the
> > order you specify on the command line. I don't know if this behavior
> > is guaranteed, but you can verify it by leaving some devices missing
> > -- trying to create a RAID10 with "/dev/sda1 /dev/sdb1 missing
> > missing" will fail, but "/dev/sda1 missing /dev/sdb1 missing" will
> > succeed, at least in my limited experience.
> >
> > I don't know what mdadm does if there are an odd number of drives --
> > perhaps something like (ab)(cd)(efg), perhaps something more
> > complicated. I know more about mathematics than about mdadm. :)
* Re: layout of far blocks in raid10
2010-05-11 22:22 ` Aryeh Gregor
@ 2010-05-11 22:54 ` Neil Brown
0 siblings, 0 replies; 6+ messages in thread
From: Neil Brown @ 2010-05-11 22:54 UTC (permalink / raw)
To: Aryeh Gregor; +Cc: Keld Simonsen, linux-raid
On Tue, 11 May 2010 18:22:58 -0400
Aryeh Gregor <Simetrical+list@gmail.com> wrote:
> On Tue, May 11, 2010 at 5:56 PM, Neil Brown <neilb@suse.de> wrote:
> > I'm not quite sure how to respond to this... As a mathematician I would
> > expect you to understand the importance of precision in choosing words, yet
> > you use the word "know" for something that is exactly wrong. Either you mean
> > "guess" or you have been seriously misinformed. If it is the latter, then
> > please let me know where this misinformation came from so I can see about
> > getting it corrected.
> >
> > md/raid10 uses a simple cyclic layout in all cases. It does so because this
> > layout is completely general and works for all numbers of devices and copies.
> >
> > So you can only survive multiple device failures where at most N-1 of the
> > failed devices are adjacent, where N is the number of copies; the first and
> > last devices are treated as adjacent.
>
> Mathematicians are sometimes wrong too, sadly. :) (And I'm only a
> grad student!) I believe this is where I got my info:
A grad student! You must be over-educated:
http://www.maa.org/devlin/devlin_02_10.html
:-)
>
> http://git.debian.org/?p=pkg-mdadm/mdadm.git;a=blob_plain;f=debian/FAQ;hb=HEAD
Thanks... I guess I should read through that and report errors...
>
> The answer to question 20 of that suggests that if you have four
> disks, 0 1 2 3, then 0 and 1 form one pair and 2 and 3 form the other.
> If 2 fails, then 0 or 1 could still fail without data loss, but a
> failure of 3 will cause data loss. Obviously, you know what you're
> talking about better than a Debian FAQ, so unless I'm misunderstanding
> the FAQ or you or both, maybe you should talk to the author of that.
The conclusion stated in question 20 is correct if you are considering the
'near' layout, though the reasoning is foggy and doesn't generalise to the
'far' or 'offset' layout.
With a 'near 2' layout on 4 drives, the blocks are:
0 0 1 1
2 2 3 3
which looks like striping across mirrored pairs, but that is really just a
coincidence.
On 5 drives it would look like:
0 0 1 1 2
2 3 3 4 4
The rule "OK as long as no two adjacent devices fail" still holds, though
there are some cases where it is OK even if two adjacent devices fail, for
the even-number-of-devices case.
NeilBrown
>
> Testing with loopback files does seem to show that failing the second
> and third drives in a four-drive RAID will cause the RAID to fail, as
> I would predict from what you say and contrary to what I interpreted
> that FAQ to mean, so hopefully now I understand correctly.
>
> Thanks for the correction. Next time I'll be more cautious.
Thread overview: 6+ messages
2010-05-11 15:12 layout of far blocks in raid10 Keld Simonsen
2010-05-11 17:13 ` Aryeh Gregor
2010-05-11 21:56 ` Neil Brown
2010-05-11 22:22 ` Aryeh Gregor
2010-05-11 22:54 ` Neil Brown
2010-05-11 22:35 ` Keld Simonsen