* RAID 10 far and offset on-disk layouts
@ 2013-12-27 14:29 Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 14:29 UTC (permalink / raw)
To: linux-raid
Hi all,
I think I understand quite well how the far and offset layouts work,
but I cannot find any data on the precise on-disk layout.
FAR LAYOUT
md(4) states:
"The first copy of all data blocks will be striped across the early part
of all drives in RAID0 fashion, and then the next copy of all blocks
will be striped across a later section of all drives, always ensuring
that all copies of any given block are on different drives"
The "on different drives" part makes me wonder _how_ the chunks are
distributed. On a 4-disk array, I can imagine a couple of different schemas:
1) A1 A2 A3 A4
.. .. .. ..
A4 A1 A2 A3
2) A1 A2 A3 A4
.. .. .. ..
A2 A1 A4 A3
The first schema is the one depicted in the SUSE documentation [1], while
the second is the one described by Wikipedia [2].
Question 1: as the two schemas have different reliability
characteristics, which one is really used?
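To make the two candidate far layouts concrete, here is a minimal Python sketch that generates both. It assumes two copies, 0-based chunk numbers, and two hypothetical placement rules named `shifted` (schema 1, second copy one device to the right) and `swapped` (schema 2, second copy on the other device of a pair). It only reproduces the diagrams above and is not taken from the md driver.

```python
def far_layout(n_devices, stripes, partner):
    """Return (first_section, second_section) as lists of rows of chunk numbers.

    partner(d, n) is the device that holds the second copy of a chunk
    whose first copy lives on device d.
    """
    first = [[s * n_devices + d for d in range(n_devices)]
             for s in range(stripes)]
    second = []
    for row in first:
        mirrored = [None] * n_devices
        for d, chunk in enumerate(row):
            mirrored[partner(d, n_devices)] = chunk
        second.append(mirrored)
    return first, second


def shifted(d, n):
    # Schema 1 (assumption): second copy moves one device to the right.
    return (d + 1) % n


def swapped(d, n):
    # Schema 2 (assumption): second copy swaps within device pairs; needs even n.
    return d ^ 1


for name, partner in (("schema 1", shifted), ("schema 2", swapped)):
    first, second = far_layout(4, 2, partner)
    print(name)
    for row in first:
        print(" ".join(f"{c:2d}" for c in row))
    print(" ..")
    for row in second:
        print(" ".join(f"{c:2d}" for c in row))
```

Each column is one device; the rows after the ".." line are the second-copy section located later on the drives.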
OFFSET LAYOUT
md(4) states:
"When 'offset' replicas are chosen, the multiple copies of a given chunk
are laid out on consecutive drives and at consecutive offsets.
Effectively each stripe is duplicated and the copies are offset by one
device."
This means a schema like this:
3) A1 A2 A3 A4
A4 A1 A2 A3
.. .. .. ..
However, this is susceptible to any consecutive two-disk failure. A
schema like
4) A1 A2 A3 A4
A2 A1 A4 A3
would not suffer from this problem (e.g. disks 2 & 3 can fail and the
array is still working).
Question 2: apart from simplicity, why does the offset layout use
schema n.3? Am I missing something?
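The offset layout of schema 3 can be sketched in the same spirit. This is a minimal illustration, assuming two copies and a rotation of one device to the right as drawn above; the function name `offset_layout` is hypothetical and the code is not derived from the md driver.

```python
def offset_layout(n_devices, stripes):
    """Rows of chunk numbers: each stripe is followed by a rotated duplicate."""
    rows = []
    for s in range(stripes):
        row = [s * n_devices + d for d in range(n_devices)]
        rows.append(row)
        # Duplicate stripe, rotated right by one device (assumption from schema 3).
        rows.append(row[-1:] + row[:-1])
    return rows


for row in offset_layout(4, 2):
    print(" ".join(f"{c:2d}" for c in row))
```

Unlike the far layout, the two copies of a stripe sit on consecutive rows, so they stay close together on each drive.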
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
@ 2013-12-27 14:46 ` Peter Grandi
2013-12-27 15:16 ` Gionatan Danti
2013-12-27 15:19 ` keld
2 siblings, 0 replies; 22+ messages in thread
From: Peter Grandi @ 2013-12-27 14:46 UTC (permalink / raw)
To: Linux RAID
[ ... ]
> The "on different drives" part let me wonder _how_ are chunks
> distributed. [ ... ] Question 1: as the two schema have
> different reliability characteristics, which is really used?
It does not matter (except to people writing MD-specific tools).
There is nothing special as to the ordering of drives or chunks
on drives. Also reliability is a *statistical* property not a
geometric one...
> [ ... ] However, this is susceptible to any consecutive
> two-disk failures. A schema like [... ] would not suffer from
> this problem (eg: disk 2 & 3 can fail and the array is still
> working).
That "consecutive two-disk failures" is really funny!
If two-paired-disk failure in RAID10 bother you, try RAID14:
http://www.sabi.co.uk/blog/13-two.html#131213
Warning: that does not come at no cost :-).
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
@ 2013-12-27 15:16 ` Gionatan Danti
2013-12-27 17:16 ` Peter Grandi
2013-12-27 15:19 ` keld
2 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 15:16 UTC (permalink / raw)
To: linux-raid
> It does not matter (except to people writing MD-specific tools).
> There is nothing special as to the ordering of drives or chunks
> on drives. Also reliability is a *statistical* property not a
> geometric one...
Uhm, why doesn't it matter? For clarity, let me redraw the two schemas:
1) A1 A2 A3 A4
.. .. .. ..
A4 A1 A2 A3
2) A1 A2 A3 A4
.. .. .. ..
A2 A1 A4 A3
Schema n.1 will fail on any adjacent two-disk failure, e.g. 1 & 2,
2 & 3, 3 & 4, 4 & 1.
On the other hand, schema n.2 will become inactive only when disks 1 & 2
or 3 & 4 fail, but not, for example, when 2 & 3 or 1 & 4 fail.
Or am I misunderstanding something?
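The claim can be checked by brute force. The sketch below is only an illustration under stated assumptions: two copies per chunk, devices numbered from 0 (so disk 1 in the text is device 0), and the hypothetical rules `shifted` for schema n.1 and `swapped` for schema n.2.

```python
from itertools import combinations


def shifted(d, n):
    # Schema n.1 (assumption): second copy on the next device, wrapping around.
    return (d + 1) % n


def swapped(d, n):
    # Schema n.2 (assumption): second copy on the other device of the pair.
    return d ^ 1


def fatal_pairs(n, partner):
    """Two-device failures that destroy both copies of at least one chunk."""
    copies = [{d, partner(d, n)} for d in range(n)]
    return [pair for pair in combinations(range(n), 2)
            if any(c <= set(pair) for c in copies)]


for name, partner in (("schema n.1", shifted), ("schema n.2", swapped)):
    # Print with 1-based disk numbers to match the text.
    print(name, [(a + 1, b + 1) for a, b in fatal_pairs(4, partner)])
```

With four devices there are six possible two-disk failures in total, so the length of each list directly gives the number of fatal combinations.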
> That "consecutive two-disk failures" is really funny!
Er, my English is not very good :p
I really was talking about adjacent disk failures. Sorry!
> If two-paired-disk failure in RAID10 bother you, try RAID14:
>
> http://www.sabi.co.uk/blog/13-two.html#131213
>
> Warning: that does not come at no cost :-).
Thank you very much for the link! I need some time to read it carefully...
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
2013-12-27 15:16 ` Gionatan Danti
@ 2013-12-27 15:19 ` keld
2013-12-27 15:22 ` Gionatan Danti
2 siblings, 1 reply; 22+ messages in thread
From: keld @ 2013-12-27 15:19 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-raid
On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> Hi all,
> I (think of) quite well understand how far and offset work, but I can
> not find any data on the precise on-disk layout.
>
> FAR LAYOUT
> md(4) states:
> "The first copy of all data blocks will be striped across the early part
> of all drives in RAID0 fashion, and then the next copy of all blocks
> will be striped across a later section of all drives, always ensuring
> that all copies of any given block are on different drives"
>
> The "on different drives" part let me wonder _how_ are chunks
> distributed. On a 4-disk array, I can imagine some different schemas:
>
> 1) A1 A2 A3 A4
> .. .. .. ..
> A4 A1 A2 A3
>
> 2) A1 A2 A3 A4
> .. .. .. ..
> A2 A1 A4 A3
>
> The first schema is the one depicted by SuSe documentation [1], while
> the second is the one described by Wikipedia [2].
>
> Question 1: as the two schema have different reliability
> characteristics, which is really used?
The Wikipedia description is what you get for new arrays with newer
kernels, while the SUSE documentation is what you get with older kernels.
The Wikipedia layout was made because it gives better chances of recovery:
with e.g. 4 drives, the chance of surviving 2 failing drives went from
1/3 to 2/3.
I would say that the SUSE description is just not updated.
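The 1/3 and 2/3 figures can be reproduced with a short calculation. This is a sketch under explicit assumptions: 4 drives, two copies, every two-drive failure equally likely, and the hypothetical pairing rules `shifted` (older depiction) and `swapped` (newer depiction).

```python
from fractions import Fraction
from itertools import combinations


def shifted(d, n):
    # Assumption: second copy on the next device, wrapping around.
    return (d + 1) % n


def swapped(d, n):
    # Assumption: second copy on the other device of a fixed pair.
    return d ^ 1


def survival_chance(n, partner):
    """Fraction of two-drive failures that leave every chunk with one copy."""
    pairs = list(combinations(range(n), 2))
    copies = [frozenset((d, partner(d, n))) for d in range(n)]
    survived = sum(1 for p in pairs if frozenset(p) not in copies)
    return Fraction(survived, len(pairs))


print("shifted:", survival_chance(4, shifted))
print("swapped:", survival_chance(4, swapped))
```

With two copies, a two-drive failure is fatal exactly when the failed drives hold both copies of some chunk, which is why a simple membership test is enough here.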
Best regards
Keld
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 15:19 ` keld
@ 2013-12-27 15:22 ` Gionatan Danti
2013-12-27 15:49 ` keld
0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 15:22 UTC (permalink / raw)
To: keld; +Cc: linux-raid
> On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> The wikipedia description is what you get for new arrays with newer
> kernels, while the suse documentation is what you will get with older kernels.
> The wikipedia layout was made because there are better chances of recovery,
> Chances went from 1/3 to 2/3 with eg 4 drives, when 2 drives were failing.
>
> I would say that the Suse description is just not updated.
>
> Best regards
> Keld
>
Interesting. Two questions:
1) from which kernel version is the layout the one depicted by Wikipedia?
2) is it possible, using mdadm, to check which "far" layout is in use?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 15:22 ` Gionatan Danti
@ 2013-12-27 15:49 ` keld
2014-01-09 8:03 ` Gionatan Danti
0 siblings, 1 reply; 22+ messages in thread
From: keld @ 2013-12-27 15:49 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-raid
On Fri, Dec 27, 2013 at 04:22:55PM +0100, Gionatan Danti wrote:
>
> >On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> >The wikipedia description is what you get for new arrays with newer
> >kernels, while the suse documentation is what you will get with older
> >kernels.
> >The wikipedia layout was made because there are better chances of recovery,
> >Chances went from 1/3 to 2/3 with eg 4 drives, when 2 drives were failing.
> >
> >I would say that the Suse description is just not updated.
> >
> >Best regards
> >Keld
> >
>
> Interesting. Two question:
> 1) from which kernel the layout is the one depicted by Wikipedia?
> 2) it is possible, using mdadm, check what "far" layout is in use?
I cannot answer that. Neil Brown should know.
Best regards
Keld
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 15:16 ` Gionatan Danti
@ 2013-12-27 17:16 ` Peter Grandi
2013-12-27 17:32 ` Gionatan Danti
0 siblings, 1 reply; 22+ messages in thread
From: Peter Grandi @ 2013-12-27 17:16 UTC (permalink / raw)
To: Linux RAID
>> It does not matter (except to people writing MD-specific
>> tools). There is nothing special as to the ordering of
>> drives or chunks on drives. Also reliability is a
>> *statistical* property not a geometric one...
[ ... ]
> 1) A1 A2 A3 A4
> .. .. .. ..
> A4 A1 A2 A3
> 2) A1 A2 A3 A4
> .. .. .. ..
> A2 A1 A4 A3
> Schema n.1 will fail on any adjacent disk failure. Eg: 1 & 2,
> 2 & 3, 3 & 4, 4 & 1. On the other hand, schema n.2 will become
> inactive only when 1 & 2 or 3 & 4 disk fail, but not, for
> example, when 2 & 3 or 1 & 4 fail.
>> That "consecutive two-disk failures" is really funny!
> [ ... ] talking about adjacent disk failures. Sorry!
Same thing: whether the MD member devices are adjacent or not does
not matter; ordering is irrelevant.
When you compare layout 1) above and 2) above what matters is
how many 2-device failures lead to loss of data, not how many
"adjacent" 2-device failures.
RAID10 has the property that only the failure of 2 *paired* (for
the usual case of two copies of the same chunk) member devices,
whether "adjacent" or not, will lead to loss of data. So what
matters are which devices are paired, not whether they are
adjacent or not.
Using the layout convention of:
https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#byz81ho
and drawing the full picture of 4 stripes, with chunks 0 to 7,
each replicated on 2 distinct drives out of 4:
1)
a b c d
--------------------------
0 1 2 3
3 0 1 2
4 5 6 7
7 4 5 6
2)
a b c d
--------------------------
0 1 2 3
1 0 3 2
4 5 6 7
5 4 7 6
It becomes more easily apparent that in layout 1):
* 'a' is paired with:
- 'b' (chunks 0 and 4);
- 'd' (chunks 3 and 7).
* 'c' is paired with:
- 'b' (chunks 1 and 5);
- 'd' (chunks 2 and 6).
while in layout 2) 'a' is paired with 'b' (chunks 0, 1, 4, 5)
and 'c' with 'd' (chunks 2, 3, 6, 7). Therefore the failure of, for
example, 'a' and 'c' will result in no data loss. It is very easy to
swap around 'b' and 'c' to get an entirely equivalent layout where not
every failure of two "adjacent" devices results in data loss.
Therefore the *probability* of loss of data because of 2 member
devices failing is higher in layout 1) than layout 2), whether
or not the drives are "adjacent".
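The pairing can also be derived mechanically from the two tables. The sketch below transcribes the columns of layouts 1) and 2) into hypothetical dictionaries `layout1` and `layout2` and lists, for each pair of drives, the chunks that exist only on those two drives.

```python
from itertools import combinations

# Columns of the two tables above, one list of chunk numbers per drive.
layout1 = {"a": [0, 3, 4, 7], "b": [1, 0, 5, 4],
           "c": [2, 1, 6, 5], "d": [3, 2, 7, 6]}
layout2 = {"a": [0, 1, 4, 5], "b": [1, 0, 5, 4],
           "c": [2, 3, 6, 7], "d": [3, 2, 7, 6]}


def shared_chunks(layout):
    """Map each drive pair to the chunks whose two copies are on that pair."""
    result = {}
    for x, y in combinations(sorted(layout), 2):
        common = sorted(set(layout[x]) & set(layout[y]))
        if common:
            result[x + y] = common
    return result


print("layout 1:", shared_chunks(layout1))
print("layout 2:", shared_chunks(layout2))
```

Every key in the result is a two-drive failure that loses data, so fewer keys means fewer fatal combinations out of the six possible ones.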
Note that arguably layout 1) is not really RAID10, because an
important property of RAID10 is or should be that there are
only N/2 pairs out of N drives. Otherwise it is not quite
'RAID1' if a chunk position in a stripe can be replicated on 2
other devices, half the replicas on one and half on another.
That the member devices are *adjacent* is irrelevant; what
matters is the statistical chance, which is driven by the
percentage of cases where 2 failures result in data loss, which
in turn is driven by the number of paired drives.
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 17:16 ` Peter Grandi
@ 2013-12-27 17:32 ` Gionatan Danti
2013-12-27 18:26 ` keld
0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 17:32 UTC (permalink / raw)
To: Peter Grandi, Linux RAID
> <snip>
> Therefore the *probability* of loss of data because of 2 member
> devices failing is higher in layout 1) than layout 2), whether
> or not the drives are "adjacent".
>
> Note that arguably layout 1) is not really RAID10, because an
> important property of RAID10 is or should be that there are
> only N/2 pairs out of N drives. Otherwise it is not quite
> 'RAID1' if a chunk position in a stripe can be replicated on 2
> other devices, half the replicas on one and half on another.
>
> That the member devices are *adjacent* is irrelevant; what
> matters is the statistical chance, which is driven by the
> percent of cases where 2 failures result in data loss, which
> driven by the number of paired drives.
Very detailed answer, thank you Peter :)
Based on what Keld said before, the current scheme is n.2 (Wikipedia's
one), right? Is it possible, using mdadm, to determine the physical
layout (n.1 or n.2) of a live RAID10 array?
As schema n.1 leads to an increased probability of data loss, why does
the offset layout use it instead of, say, some variant of schema n.2?
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 17:32 ` Gionatan Danti
@ 2013-12-27 18:26 ` keld
0 siblings, 0 replies; 22+ messages in thread
From: keld @ 2013-12-27 18:26 UTC (permalink / raw)
To: Gionatan Danti; +Cc: Peter Grandi, Linux RAID
On Fri, Dec 27, 2013 at 06:32:48PM +0100, Gionatan Danti wrote:
> ><snip>
> >Therefore the *probability* of loss of data because of 2 member
> >devices failing is higher in layout 1) than layout 2), whether
> >or not the drives are "adjacent".
> >
> > Note that arguably layout 1) is not really RAID10, because an
> > important property of RAID10 is or should be that there are
> > only N/2 pairs out of N drives. Otherwise it is not quite
> > 'RAID1' if a chunk position in a stripe can be replicated on 2
> > other devices, half the replicas on one and half on another.
> >
> >That the member devices are *adjacent* is irrelevant; what
> >matters is the statistical chance, which is driven by the
> >percent of cases where 2 failures result in data loss, which
> >driven by the number of paired drives.
>
> Very detailed answer, thank you Peter :)
>
> Based on what keld told before, the current scheme if n.2 (wikipedia's
> one), right? It is possible, using mdadm, understand the physical layout
> (if n.1 or n.2) of a live RAID10 array?
>
> As schema n.1 lead to increased probability of data loss, why offset
> layout use it instead of, say, some variance of schema n.2?
I am not sure about the chances of surviving more
than 1 failing drive with the offset layout, but my intuition tells
me they are rather bad. As it shifts the blocks one block at a time,
my gut feeling is that it really cannot survive more than one
failing disk.
On the other hand, for raid10,far in the second layout (Wikipedia's, and
I am the author of that text :-) I am quite sure that the layout is
theoretically optimal, as in the luckiest case you can survive
n/2 drives failing, where n is the number of drives and the division
is integer division. I designed this layout for maximum
redundancy.
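A brute-force check over small arrays gives a feel for these numbers. It is only a sketch under assumptions: two copies per chunk, the one-device shift modeled by a hypothetical `shifted` rule, and the pair-based design modeled by a hypothetical `swapped` rule. It does not model the real md driver.

```python
from itertools import combinations


def shifted(d, n):
    # Assumption: second copy on the next device, wrapping around.
    return (d + 1) % n


def swapped(d, n):
    # Assumption: second copy on the other device of a fixed pair (even n).
    return d ^ 1


def survives(failed, n, partner):
    """True if every chunk position keeps at least one copy."""
    return all(not ({d, partner(d, n)} <= failed) for d in range(n))


def survivable_sets(n, k, partner):
    """All sets of k failed devices that the array survives."""
    return [set(c) for c in combinations(range(n), k)
            if survives(set(c), n, partner)]


def max_survivable(n, partner):
    """Largest number of failed devices survivable in the luckiest case."""
    return max(k for k in range(n + 1) if survivable_sets(n, k, partner))


for n in (4, 6):
    for name, partner in (("shifted", shifted), ("swapped", swapped)):
        k = max_survivable(n, partner)
        print(n, name, "max failures:", k,
              "lucky sets:", len(survivable_sets(n, k, partner)))
```

Under these assumptions both rules tolerate n/2 failures in the luckiest case, but the pair-based rule has many more lucky combinations, which is what drives the difference in probability.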
The main reason for choosing raid10,far is that it is faster
for single reads, with the speed of raid0, while for other operations
it is about the same. For degraded arrays raid10,far is probably worse
than the other raid10 types, although the IO scheduling algorithm
probably remedies some of the bad raw performance of the degraded
raid10,far.
Also, the use of the outer and faster sectors of a hard drive gives
raid10,far an edge over the other raid10 types.
Best regards
Keld
* Re: RAID 10 far and offset on-disk layouts
2013-12-27 15:49 ` keld
@ 2014-01-09 8:03 ` Gionatan Danti
2014-01-12 23:20 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-09 8:03 UTC (permalink / raw)
To: linux-raid; +Cc: keld, neilb
>>
>> Interesting. Two question:
>> 1) from which kernel the layout is the one depicted by Wikipedia?
>> 2) it is possible, using mdadm, check what "far" layout is in use?
>
> I cannot answer that. Neil Brown should know.
>
> Best regards
> Keld
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Hi all,
does anyone have an update on these two questions?
I was thinking of using the kernel block trace facility to track disk
accesses and infer the on-disk data structure, but I haven't tried it yet.
On the other hand, I carefully looked at the mdadm output, without finding
anything related to physical block placement.
Any new advice in that regard?
Thanks.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2014-01-09 8:03 ` Gionatan Danti
@ 2014-01-12 23:20 ` NeilBrown
2014-01-13 8:52 ` Gionatan Danti
0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2014-01-12 23:20 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-raid, keld
On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> >>
> >> Interesting. Two question:
> >> 1) from which kernel the layout is the one depicted by Wikipedia?
Exactly what depiction in wikipedia are you referring to? A link to the
image might help.
> >> 2) it is possible, using mdadm, check what "far" layout is in use?
mdadm --detail /dev/mdWHATEVER | grep Layout
> >
> > I cannot answer that. Neil Brown should know.
> >
> > Best regards
> > Keld
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
>
> Hi all,
> anyone with an update on these two questions?
>
> I was thinking to use the kernel block trace facility to track disk
> access and infer the on-disk data structure, but I haven't tried for now.
>
> On the other hand, I carefully looked at mdadm output, without finding
> anything related to physical block placing.
Look for "Layout".
NeilBrown
>
> Any new advices on that regard?
> Thanks.
>
* Re: RAID 10 far and offset on-disk layouts
2014-01-12 23:20 ` NeilBrown
@ 2014-01-13 8:52 ` Gionatan Danti
2014-01-13 9:45 ` NeilBrown
0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-13 8:52 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti
Hi Neil,
let me recap from a previous message:
>FAR LAYOUT
>md(4) states:
>"The first copy of all data blocks will be striped across the early part
>of all drives in RAID0 fashion, and then the next copy of all blocks
>will be striped across a later section of all drives, always ensuring
>that all copies of any given block are on different drives"
>
>The "on different drives" part let me wonder _how_ are chunks
>distributed. On a 4-disk array, I can imagine some different schemas:
>
>1) A1 A2 A3 A4
> .. .. .. ..
> A4 A1 A2 A3
>
>2) A1 A2 A3 A4
> .. .. .. ..
> A2 A1 A4 A3
>
>The first schema is the one depicted by SuSe documentation [1], while
>the second is the one described by Wikipedia [2].
>
>Question 1: as the two schema have different reliability
>characteristics, which is really used?
SuSe entry:
https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
Wikipedia entry:
http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
far layout is depicted)
Keld kindly told me that the SUSE documentation is simply not updated,
as it depicts a situation that changed with newer kernels. So my two
questions are:
1) from which kernel version is the layout the one depicted by Wikipedia?
2) is it possible, using mdadm, to check which "far" layout is in use?
From what I can see, "mdadm --detail /dev/mdWHATEVER | grep Layout"
tells me whether the array uses the far, near or offset layout, but not
the physical on-disk chunk organization (e.g. far "type" 1 or 2).
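For scripting, the "Layout" line can be picked out of the command output with a few lines of Python. This is a minimal sketch: the `SAMPLE` text is hand-written for illustration and is an assumption about the output format, not captured from a real array, and `parse_layout` is a hypothetical helper name.

```python
import re

# Hand-written sample (assumption), imitating "mdadm --detail" output lines.
SAMPLE = """\
     Raid Level : raid10
   Raid Devices : 4
         Layout : far=2
     Chunk Size : 512K
"""


def parse_layout(detail_text):
    """Return e.g. {'far': 2} from the 'Layout' line, or None if absent."""
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Layout":
            return {name: int(count)
                    for name, count in re.findall(r"(near|far|offset)=(\d+)", value)}
    return None


print(parse_layout(SAMPLE))
```

As noted above, this only reveals the layout family and the number of copies; it says nothing about which device holds the second copy of a given chunk.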
Anyway, the thread started because I wonder why the OFFSET layout couples
each disk to two other disks. Let me quote again:
>OFFSET LAYOUT
>md(4) states:
>"When 'offset' replicas are chosen, the multiple copies of a given chunk
>are laid out on consecutive drives and at consecutive offsets.
>Effectively each stripe is duplicated and the copies are offset by one
>device."
>
>This means a schema like this:
>
>3) A1 A2 A3 A4
> A4 A1 A2 A3
> .. .. .. ..
>
>However, this is susceptible to any consecutive two-disk failures. A
>schema like
>
>4) A1 A2 A3 A4
> A2 A1 A4 A3
>
>would not suffer from this problem (eg: disk 2 & 3 can fail and the
>array is still working).
>
>Question 2: apart from simplicity, why the offset layout use the schema
>as n.3? I miss something?
Full thread link: http://marc.info/?t=138815504400002&r=1&w=2
Excuse me for the long email, I am simply trying to learn something :)
Thank you very much.
On 01/13/2014 12:20 AM, NeilBrown wrote:
> On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
>
>>>>
>>>> Interesting. Two question:
>>>> 1) from which kernel the layout is the one depicted by Wikipedia?
>
> Exactly what depiction in wikipedia are you referring to? A link to the
> image might help.
>
>>>> 2) it is possible, using mdadm, check what "far" layout is in use?
>
> mdadm --detail /dev/mdWHATEVER | grep Layout
>
>
>>>
>>> I cannot answer that. Neil Brown should know.
>>>
>>> Best regards
>>> Keld
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> Hi all,
>> anyone with an update on these two questions?
>>
>> I was thinking to use the kernel block trace facility to track disk
>> access and infer the on-disk data structure, but I haven't tried for now.
>>
>> On the other hand, I carefully looked at mdadm output, without finding
>> anything related to physical block placing.
>
> Look for "Layout".
>
> NeilBrown
>
>
>>
>> Any new advices on that regard?
>> Thanks.
>>
>
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 8:52 ` Gionatan Danti
@ 2014-01-13 9:45 ` NeilBrown
2014-01-13 10:15 ` Gionatan Danti
2014-01-14 10:06 ` keld
0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2014-01-13 9:45 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-raid, keld
On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> Hi Neil,
> let me recap from a previous message:
>
> >FAR LAYOUT
> >md(4) states:
> >"The first copy of all data blocks will be striped across the early >part
> >of all drives in RAID0 fashion, and then the next copy of all blocks
> >will be striped across a later section of all drives, always ensuring
> >that all copies of any given block are on different drives"
> >
> >The "on different drives" part let me wonder _how_ are chunks
> >distributed. On a 4-disk array, I can imagine some different schemas:
> >
> >1) A1 A2 A3 A4
> > .. .. .. ..
> > A4 A1 A2 A3
> >
> >2) A1 A2 A3 A4
> > .. .. .. ..
> > A2 A1 A4 A3
> >
> >The first schema is the one depicted by SuSe documentation [1], while
> >the second is the one described by Wikipedia [2].
> >
> >Question 1: as the two schema have different reliability
> >characteristics, which is really used?
>
> SuSe entry:
> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
>
> Wikipedia entry:
> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
> far layout is depicted)
>
> Keld kindly told me that the SuSe is simply not updated, as it depict a
> situation changed with newer kernels. So my two questions:
I cannot see an important difference between the two pages you reference.
Both appear to be correct.
> 1) from which kernel the layout is the one depicted by Wikipedia?
These are both valid for any kernel since 2.6.18 with mdadm 2.5 or later.
> 2) it is possible, using mdadm, check what "far" layout is in use?
I think I know what you are talking about now. The md driver in the kernel
supports two sorts of 'far' or 'offset' layouts for arrays where the number
of devices is not an integer multiple of the number of copies.
This has been supported in Linux since v3.9, but is not yet supported
by mdadm.
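For readers who want to see how a RAID10 layout is commonly encoded as a single number, here is a hedged sketch. The bit positions are assumptions recalled from the md(4) description and the kernel raid10 driver (low byte: near copies, next byte: far copies, bit 16: offset, bit 17: the newer far-set variant); they should be verified against the kernel source in use. The function name `decode_raid10_layout` is hypothetical.

```python
def decode_raid10_layout(layout):
    """Split a RAID10 layout number into its fields (bit positions assumed)."""
    return {
        "near_copies": layout & 0xFF,
        "far_copies": (layout >> 8) & 0xFF,
        "offset": bool(layout & (1 << 16)),      # assumption: bit 16 = offset
        "far_sets": bool(layout & (1 << 17)),    # assumption: bit 17 = new variant
    }


# Example values built by hand under the same assumptions.
for label, value in (("n2", 0x102), ("f2", 0x201),
                     ("o2", 0x10201), ("f2 + far sets", 0x20201)):
    print(label, decode_raid10_layout(value))
```

If the assumption about bit 17 holds, a tool that ignores that bit would report the same "far=2" for both far variants, which would be consistent with the behaviour described above.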
>
> From what I can see, a "mdadm --detail /dev/mdWHATEVER | grep Layout"
> tell me if using far vs near vs offset layout, but not the physical
> on-disk chunks organization (eg: far "type" 1 or 2).
This is because mdadm does not yet create or report on the new type.
When it does, the above command will be the correct command to find out which
layout is in use (but I don't yet know what the output will say exactly).
NeilBrown
>
> Anyway, the thread started because I wonder why the OFFSET layout couple
> each disk to other two disks. Let me quote again:
>
> >OFFSET LAYOUT
> >md(4) states:
> >"When 'offset' replicas are chosen, the multiple copies of a given >chunk
> >are laid out on consecutive drives and at consecutive offsets.
> >Effectively each stripe is duplicated and the copies are offset by one
> >device."
> >
> >This means a schema like this:
> >
> >3) A1 A2 A3 A4
> > A4 A1 A2 A3
> > .. .. .. ..
> >
> >However, this is susceptible to any consecutive two-disk failures. A
> >schema like
> >
> >4) A1 A2 A3 A4
> > A2 A1 A4 A3
> >
> >would not suffer from this problem (eg: disk 2 & 3 can fail and the
> >array is still working).
> >
> >Question 2: apart from simplicity, why the offset layout use the schema
> >as n.3? I miss something?
>
> Full thread link: http://marc.info/?t=138815504400002&r=1&w=2
>
> Excuse me for the long email, I am simply trying to learn something :)
> Thank you very much.
>
> On 01/13/2014 12:20 AM, NeilBrown wrote:
> > On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> >
> >>>>
> >>>> Interesting. Two question:
> >>>> 1) from which kernel the layout is the one depicted by Wikipedia?
> >
> > Exactly what depiction in wikipedia are you referring to? A link to the
> > image might help.
> >
> >>>> 2) it is possible, using mdadm, check what "far" layout is in use?
> >
> > mdadm --detail /dev/mdWHATEVER | grep Layout
> >
> >
> >>>
> >>> I cannot answer that. Neil Brown should know.
> >>>
> >>> Best regards
> >>> Keld
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>
> >>
> >> Hi all,
> >> anyone with an update on these two questions?
> >>
> >> I was thinking to use the kernel block trace facility to track disk
> >> access and infer the on-disk data structure, but I haven't tried for now.
> >>
> >> On the other hand, I carefully looked at mdadm output, without finding
> >> anything related to physical block placing.
> >
> > Look for "Layout".
> >
> > NeilBrown
> >
> >
> >>
> >> Any new advices on that regard?
> >> Thanks.
> >>
> >
>
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 9:45 ` NeilBrown
@ 2014-01-13 10:15 ` Gionatan Danti
2014-01-13 22:27 ` NeilBrown
2014-01-14 10:06 ` keld
1 sibling, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-13 10:15 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti
On 01/13/2014 10:45 AM, NeilBrown wrote:
> On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
>
>> Hi Neil,
>> let me recap from a previous message:
>>
>> >FAR LAYOUT
>> >md(4) states:
>> >"The first copy of all data blocks will be striped across the early >part
>> >of all drives in RAID0 fashion, and then the next copy of all blocks
>> >will be striped across a later section of all drives, always ensuring
>> >that all copies of any given block are on different drives"
>> >
>> >The "on different drives" part let me wonder _how_ are chunks
>> >distributed. On a 4-disk array, I can imagine some different schemas:
>> >
>> >1) A1 A2 A3 A4
>> > .. .. .. ..
>> > A4 A1 A2 A3
>> >
>> >2) A1 A2 A3 A4
>> > .. .. .. ..
>> > A2 A1 A4 A3
>> >
>> >The first schema is the one depicted by SuSe documentation [1], while
>> >the second is the one described by Wikipedia [2].
>> >
>> >Question 1: as the two schema have different reliability
>> >characteristics, which is really used?
>>
>> SuSe entry:
>> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
>>
>> Wikipedia entry:
>> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
>> far layout is depicted)
>>
>> Keld kindly told me that the SuSe is simply not updated, as it depict a
>> situation changed with newer kernels. So my two questions:
>
> I cannot see an important difference between the two pages you reference.
> Both appear to be correct.
Mmm... they seem different to me.
SUSE FAR Layout:
sda1 sdb1 sdc1 sde1
0 1 2 3
4 5 6 7
. . .
3 0 1 2
7 4 5 6
Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
sdc1 (1,5). If sdb1 fails, a failure of either sda1 or sdc1 leads to
data loss.
Now, Wikipedia FAR Layout:
4 drives (sda1, sdb1, sdc1, sdd1)
--------------------
A1 A2 A3 A4
A5 A6 A7 A8
A9 A10 A11 A12
.. .. .. ..
A2 A1 A4 A3
A6 A5 A8 A7
A10 A9 A12 A11
.. .. .. ..
Notice now how a single disk (e.g. sdb1) is coupled to only one other
_single_ disk (e.g. sda1). In this case, if sdb1 fails, you would have
to lose sda1 as well to lose data. Losing sdc1 or sdd1 will _not_ lead
to data loss.
Am I wrong?
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 10:15 ` Gionatan Danti
@ 2014-01-13 22:27 ` NeilBrown
2014-01-13 23:38 ` keld
2014-01-14 9:06 ` Gionatan Danti
0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2014-01-13 22:27 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-raid, keld
On Mon, 13 Jan 2014 11:15:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> On 01/13/2014 10:45 AM, NeilBrown wrote:
> > On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> >
> >> Hi Neil,
> >> let me recap from a previous message:
> >>
> >> >FAR LAYOUT
> >> >md(4) states:
> >> >"The first copy of all data blocks will be striped across the early >part
> >> >of all drives in RAID0 fashion, and then the next copy of all blocks
> >> >will be striped across a later section of all drives, always ensuring
> >> >that all copies of any given block are on different drives"
> >> >
> >> >The "on different drives" part let me wonder _how_ are chunks
> >> >distributed. On a 4-disk array, I can imagine some different schemas:
> >> >
> >> >1) A1 A2 A3 A4
> >> > .. .. .. ..
> >> > A4 A1 A2 A3
> >> >
> >> >2) A1 A2 A3 A4
> >> > .. .. .. ..
> >> > A2 A1 A4 A3
> >> >
> >> >The first schema is the one depicted by SuSe documentation [1], while
> >> >the second is the one described by Wikipedia [2].
> >> >
> >> >Question 1: as the two schema have different reliability
> >> >characteristics, which is really used?
> >>
> >> SuSe entry:
> >> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> >>
> >> Wikipedia entry:
> >> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
> >> far layout is depicted)
> >>
> >> Keld kindly told me that the SuSe is simply not updated, as it depict a
> >> situation changed with newer kernels. So my two questions:
> >
> > I cannot see an important difference between the two pages you reference.
> > Both appear to be correct.
>
> Mmm... they seem different to me.
>
> SeSe FAR Layout:
>
> sda1 sdb1 sdc1 sde1
> 0 1 2 3
> 4 5 6 7
> . . .
> 3 0 1 2
> 7 4 5 6
>
> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
>
> Now, Wikipedia FAR Layout:
>
> 4 drives (sda1, sdb1, sdc1, sdd1)
> --------------------
> A1 A2 A3 A4
> A5 A6 A7 A8
> A9 A10 A11 A12
> .. .. .. ..
> A2 A1 A4 A3
> A6 A5 A8 A7
> A10 A9 A12 A11
> .. .. .. ..
>
> Notice now how a single disk (eg: sdb1) is coupled to only another
> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
>
Thanks for being explicit - it is much easier to answer explicit questions :-)
Yes, they are different. So the wikipedia article is wrong, or at least
misleading. That is not what the "f2" layout looks like.
The md driver does support that layout. I don't know yet what mdadm will
call it, but it won't be called "f2".
So this change:
http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
was wrong.
NeilBrown
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 22:27 ` NeilBrown
@ 2014-01-13 23:38 ` keld
2014-01-14 0:46 ` Stan Hoeppner
2014-01-14 9:06 ` Gionatan Danti
1 sibling, 1 reply; 22+ messages in thread
From: keld @ 2014-01-13 23:38 UTC (permalink / raw)
To: NeilBrown; +Cc: Gionatan Danti, linux-raid
On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
> On Mon, 13 Jan 2014 11:15:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
>
> > On 01/13/2014 10:45 AM, NeilBrown wrote:
> > > On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> > >
> > >> Hi Neil,
> > >> let me recap from a previous message:
> > >>
> > >> >FAR LAYOUT
> > >> >md(4) states:
> > >> >"The first copy of all data blocks will be striped across the early >part
> > >> >of all drives in RAID0 fashion, and then the next copy of all blocks
> > >> >will be striped across a later section of all drives, always ensuring
> > >> >that all copies of any given block are on different drives"
> > >> >
> > >> >The "on different drives" part let me wonder _how_ are chunks
> > >> >distributed. On a 4-disk array, I can imagine some different schemas:
> > >> >
> > >> >1) A1 A2 A3 A4
> > >> > .. .. .. ..
> > >> > A4 A1 A2 A3
> > >> >
> > >> >2) A1 A2 A3 A4
> > >> > .. .. .. ..
> > >> > A2 A1 A4 A3
> > >> >
> > >> >The first schema is the one depicted by SuSe documentation [1], while
> > >> >the second is the one described by Wikipedia [2].
> > >> >
> > >> >Question 1: as the two schema have different reliability
> > >> >characteristics, which is really used?
> > >>
> > >> SuSe entry:
> > >> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> > >>
> > >> Wikipedia entry:
> > >> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
> > >> far layout is depicted)
> > >>
> > >> Keld kindly told me that the SuSe is simply not updated, as it depict a
> > >> situation changed with newer kernels. So my two questions:
> > >
> > > I cannot see an important difference between the two pages you reference.
> > > Both appear to be correct.
> >
> > Mmm... they seem different to me.
> >
> > SeSe FAR Layout:
> >
> > sda1 sdb1 sdc1 sde1
> > 0 1 2 3
> > 4 5 6 7
> > . . .
> > 3 0 1 2
> > 7 4 5 6
> >
> > Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
> > sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
> >
> > Now, Wikipedia FAR Layout:
> >
> > 4 drives (sda1, sdb1, sdc1, sdd1)
> > --------------------
> > A1 A2 A3 A4
> > A5 A6 A7 A8
> > A9 A10 A11 A12
> > .. .. .. ..
> > A2 A1 A4 A3
> > A6 A5 A8 A7
> > A10 A9 A12 A11
> > .. .. .. ..
> >
> > Notice now how a single disk (eg: sdb1) is coupled to only another
> > _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
> > sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
> >
>
> Thanks for being explicit - it is much easier to answer explicit questions :-)
>
> Yes, they are different. So the wikipedia article is wrong, or at least
> misleading. That is not what the "f2" layout looks like.
>
> The md driver does support that layout. I don't know yet what mdadm will
> call it, but it won't be called "f2".
>
> So this change:
>
> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>
> was wrong.
Well, it was me doing the Wikipedia edit. The edit was based on information from Neil that this was actually
the layout. Later we found out that it was not, but that it should be, and then Neil implemented
the better layout. Maybe it is not called "f2"; I look forward to being informed what the actual name
will be.
I think the name should be "f2", as it is a "far" layout with 2 copies, and it really should be
the default for "far" with 2 copies, as the redundancy is much better than with the old layout.
Keeping the name would mean that we would not need to write and spread new documentation,
and people following existing documentation would automatically get the better implementation.
There is no reason for new RAID instances of "far" to get the old layout, except for
backwards compatibility.
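To put a rough number on the redundancy difference, here is a small Python
sketch that counts which two-disk failures lose data. It assumes 2 far copies
and that the second copy is rotated by one device within a set of
`far_set_size` devices: the whole array for the old layout, 2 for the
pair-wise layout. The name `survivable_fraction` and the parameter
`far_set_size` are illustrative only; this is a model, not driver code.

```python
from math import comb


def survivable_fraction(n_disks, far_set_size):
    """Fraction of all two-disk failures that lose no data (2 far copies)."""
    if far_set_size < 2 or n_disks % far_set_size != 0:
        raise ValueError("n_disks must be a multiple of far_set_size >= 2")
    fatal = set()
    for disk in range(n_disks):
        # First device of the set this disk belongs to.
        base = disk - disk % far_set_size
        # The second copy of this disk's chunk sits on the next device in the set.
        partner = base + (disk - base + 1) % far_set_size
        fatal.add(frozenset((disk, partner)))
    return 1 - len(fatal) / comb(n_disks, 2)


if __name__ == "__main__":
    for n in (4, 6, 8):
        old = survivable_fraction(n, n)
        new = survivable_fraction(n, 2)
        print(f"{n} disks: old layout {old:.2f}, pair-wise layout {new:.2f}")
```

For 4 disks this model gives 1/3 of two-disk failures survivable with the old
layout and 2/3 with the pair-wise one.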
Best regards
keld
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 23:38 ` keld
@ 2014-01-14 0:46 ` Stan Hoeppner
2014-01-14 9:38 ` keld
0 siblings, 1 reply; 22+ messages in thread
From: Stan Hoeppner @ 2014-01-14 0:46 UTC (permalink / raw)
To: keld, NeilBrown; +Cc: Gionatan Danti, linux-raid
On 1/13/2014 5:38 PM, keld@keldix.com wrote:
> On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
...
>> So this change:
>>
>> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>>
>> was wrong.
>
> Well, it was me doing the wikipedia edit. The edit was done based on information from Neil that this was actually
> the layout. Then later we found out that it really was not, but it should be; and then Neil implemented
> the better layout. Maybe it is not called "f2", I look forward to be informed what the actual name
> will be.
>
> I think the name should be "f2" as it is a "far" layout, with 2 copies, and it really should be
> the default for "far" with 2 copies, as the redundancy is much better than the old layout.
> Keeping the name would mean that we would not need to make and spread documentation on this,
> so that people following existing documentation would automatically get the better implementation.
> There is no need that new raid instances of "far" should get the old layout, except for
> backwards compatibility.
The problem here is that you're creating the Wikipedia page as if it
*is* source reference material. I.e. you're including "original work",
your original work. This is a violation of the Wikipedia rules of
editing. And this kind of situation is exactly why those rules exist.
The layout tables you are including need to exist in a free to duplicate
reference document, and should be copied verbatim from said document.
They should not be created from scratch simply based on information in
an email exchange on a mailing list, just as web forums are not
considered a valid reference source.
Therefore, if such layout tables do not exist in official Linux
documentation they should not be included in Wikipedia. If they do
exist, the information should be copied verbatim, and the source
document referenced. There are no such references in the article.
Wikipedia is an encyclopedia, not a reference work. All information
needs to be sourced from reference works. If you want your original work
to be included in Wikipedia, you need to create your own documentation,
add it to the Linux Documentation Project, and have it peer reviewed.
Then you can include it, and cite that source.
--
Stan
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 22:27 ` NeilBrown
2014-01-13 23:38 ` keld
@ 2014-01-14 9:06 ` Gionatan Danti
2014-01-14 9:16 ` NeilBrown
1 sibling, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-14 9:06 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti
On 01/13/2014 11:27 PM, NeilBrown wrote:
>>
>> Mmm... they seem different to me.
>>
>> SeSe FAR Layout:
>>
>> sda1 sdb1 sdc1 sde1
>> 0 1 2 3
>> 4 5 6 7
>> . . .
>> 3 0 1 2
>> 7 4 5 6
>>
>> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
>> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
>>
>> Now, Wikipedia FAR Layout:
>>
>> 4 drives (sda1, sdb1, sdc1, sdd1)
>> --------------------
>> A1 A2 A3 A4
>> A5 A6 A7 A8
>> A9 A10 A11 A12
>> .. .. .. ..
>> A2 A1 A4 A3
>> A6 A5 A8 A7
>> A10 A9 A12 A11
>> .. .. .. ..
>>
>> Notice now how a single disk (eg: sdb1) is coupled to only another
>> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
>> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
>>
>
> Thanks for being explicit - it is much easier to answer explicit questions :-)
>
> Yes, they are different. So the wikipedia article is wrong, or at least
> misleading. That is not what the "f2" layout looks like.
>
> The md driver does support that layout. I don't know yet what mdadm will
> call it, but it won't be called "f2".
>
> So this change:
>
> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>
> was wrong.
>
> NeilBrown
>
OK, so let me recap:
1) The FAR layout is the one depicted by the SuSe documentation, while
the Wikipedia entry is wrong.
2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
know what the user-space mdadm tool will call it (maybe it is not
implemented yet?)
3) Is there any reason why the FAR and OFFSET layouts scramble data in
this manner, coupling each disk with two other disks? Was it done for
simplicity, or am I missing something?
4) Can you confirm that there is currently _no_ way to create a FAR
layout like the one depicted by Wikipedia? What about the OFFSET layout?
Thank you very much.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2014-01-14 9:06 ` Gionatan Danti
@ 2014-01-14 9:16 ` NeilBrown
2014-01-14 9:27 ` Gionatan Danti
0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2014-01-14 9:16 UTC (permalink / raw)
To: Gionatan Danti; +Cc: linux-raid, keld
On Tue, 14 Jan 2014 10:06:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> On 01/13/2014 11:27 PM, NeilBrown wrote:
> >>
> >> Mmm... they seem different to me.
> >>
> >> SeSe FAR Layout:
> >>
> >> sda1 sdb1 sdc1 sde1
> >> 0 1 2 3
> >> 4 5 6 7
> >> . . .
> >> 3 0 1 2
> >> 7 4 5 6
> >>
> >> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
> >> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
> >>
> >> Now, Wikipedia FAR Layout:
> >>
> >> 4 drives (sda1, sdb1, sdc1, sdd1)
> >> --------------------
> >> A1 A2 A3 A4
> >> A5 A6 A7 A8
> >> A9 A10 A11 A12
> >> .. .. .. ..
> >> A2 A1 A4 A3
> >> A6 A5 A8 A7
> >> A10 A9 A12 A11
> >> .. .. .. ..
> >>
> >> Notice now how a single disk (eg: sdb1) is coupled to only another
> >> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
> >> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
> >>
> >
> > Thanks for being explicit - it is much easier to answer explicit questions :-)
> >
> > Yes, they are different. So the wikipedia article is wrong, or at least
> > misleading. That is not what the "f2" layout looks like.
> >
> > The md driver does support that layout. I don't know yet what mdadm will
> > call it, but it won't be called "f2".
> >
> > So this change:
> >
> > http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
> >
> > was wrong.
> >
> > NeilBrown
> >
>
> Ok, so let recap:
>
> 1) FAR layout is the one depicted by SuSe documentation, while the
> Wikipedia entry is wrong
Yes.
>
> 2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
> know how the user-space mdadm tool call it (maybe it is not implemented
> yet?)
Yes. Not implemented yet.
>
> 3) There are any reasons why FAR and OFFSET layout scramble data in this
> manner, coupling any disk with two more disks? It was done for
> simplicity, or I am missing something?
It just seemed the easiest thing to do at the time.
>
> 4) you confirm that currently we can _not_ create a FAR layout as the
> one depicted by wikipedia by no means? What about OFFSET layout?
You certainly can create the FAR layout depicted on Wikipedia, e.g. by
binary-editing the metadata on some devices, or writing some code which does
that for you. It requires flipping one bit in the metadata and updating the
checksum. You can probably even do it by writing something appropriate into
some sysfs files.
But mdadm cannot do it yet.
Ditto for the new OFFSET layout.
(The old offset layout can be created with "--layout=o2").
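As a rough illustration of the "one bit" remark, the Python sketch below
decodes and modifies a RAID10 layout word. The thread does not say which bit
is meant. The near-copies, far-copies and offset fields follow the layout
description in md(4); treating bit 17 as the flag that selects the new
far-set arrangement is an assumption that should be checked against the
kernel's raid10 driver. The helper names are hypothetical, and the sketch
does not touch on-disk metadata or its checksum.

```python
OFFSET_BIT = 1 << 16    # "offset" flag, as described in md(4)
FAR_SETS_BIT = 1 << 17  # ASSUMPTION: flag for the new far-set arrangement


def decode_raid10_layout(layout):
    """Split a RAID10 layout word into its fields."""
    return {
        "near_copies": layout & 0xFF,
        "far_copies": (layout >> 8) & 0xFF,
        "offset": bool(layout & OFFSET_BIT),
        "far_sets": bool(layout & FAR_SETS_BIT),  # assumption, see above
    }


def with_far_sets(layout):
    """Return the layout word with the (assumed) far-set bit switched on."""
    return layout | FAR_SETS_BIT


if __name__ == "__main__":
    f2 = (2 << 8) | 1     # far=2, near=1
    o2 = f2 | OFFSET_BIT  # same, with the offset flag set
    for name, word in (("f2", f2), ("o2", o2), ("f2 + far sets", with_far_sets(f2))):
        print(f"{name:14s} {word:#08x} {decode_raid10_layout(word)}")
```

The superblock checksum would still have to be recomputed after changing the
stored value, as noted above; that step is deliberately left out here.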
NeilBrown
* Re: RAID 10 far and offset on-disk layouts
2014-01-14 9:16 ` NeilBrown
@ 2014-01-14 9:27 ` Gionatan Danti
0 siblings, 0 replies; 22+ messages in thread
From: Gionatan Danti @ 2014-01-14 9:27 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti
>> Ok, so let recap:
>>
>> 1) FAR layout is the one depicted by SuSe documentation, while the
>> Wikipedia entry is wrong
>
> Yes.
>
>>
>> 2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
>> know how the user-space mdadm tool call it (maybe it is not implemented
>> yet?)
>
> Yes. Not implemented yet.
>
>>
>> 3) There are any reasons why FAR and OFFSET layout scramble data in this
>> manner, coupling any disk with two more disks? It was done for
>> simplicity, or I am missing something?
>
> It just seemed the easiest thing to do at the time.
>
>>
>> 4) you confirm that currently we can _not_ create a FAR layout as the
>> one depicted by wikipedia by no means? What about OFFSET layout?
>
> You certainly can created the FAR layout depicted on wikipedia, e.g. by
> binary-editing the metadata on some devices, or writing some code which does
> that for you. It requires flipping one bit in the metadata and updating the
> checksum. You can probably even to it by writing something appropriate into
> some sysfs files.
> But mdadm cannot do it yet.
> Ditto for the new OFFSET layout.
> (The old offset layout can be created with "--layout=o2").
>
> NeilBrown
>
All clear now :)
Thank you very much Neil.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
2014-01-14 0:46 ` Stan Hoeppner
@ 2014-01-14 9:38 ` keld
0 siblings, 0 replies; 22+ messages in thread
From: keld @ 2014-01-14 9:38 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: NeilBrown, Gionatan Danti, linux-raid
On Mon, Jan 13, 2014 at 06:46:15PM -0600, Stan Hoeppner wrote:
> On 1/13/2014 5:38 PM, keld@keldix.com wrote:
> > On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
> ...
> >> So this change:
> >>
> >> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
> >>
> >> was wrong.
> >
> > Well, it was me doing the wikipedia edit. The edit was done based on information from Neil that this was actually
> > the layout. Then later we found out that it really was not, but it should be; and then Neil implemented
> > the better layout. Maybe it is not called "f2", I look forward to be informed what the actual name
> > will be.
> >
> > I think the name should be "f2" as it is a "far" layout, with 2 copies, and it really should be
> > the default for "far" with 2 copies, as the redundancy is much better than the old layout.
> > Keeping the name would mean that we would not need to make and spread documentation on this,
> > so that people following existing documentation would automatically get the better implementation.
> > There is no need that new raid instances of "far" should get the old layout, except for
> > backwards compatibility.
>
> The problem here is that you're creating the Wikipedia page as if it
> *is* source reference material. I.e. you're including "original work,
> your original work. This is a violation of the Wikipedia rules of
> editing. And this kind of situation is exactly why those rules exist.
I am only referencing material available in other places.
> The layout tables you are including need to exist in a free to duplicate
> reference document, and should be copied verbatim from said document.
> They should not be created from scratch simply based on information in
> an email exchange on a mailing list, just as web forums are not
> considered a valid reference source.
I only described things that were already described.
best regards
keld
* Re: RAID 10 far and offset on-disk layouts
2014-01-13 9:45 ` NeilBrown
2014-01-13 10:15 ` Gionatan Danti
@ 2014-01-14 10:06 ` keld
1 sibling, 0 replies; 22+ messages in thread
From: keld @ 2014-01-14 10:06 UTC (permalink / raw)
To: NeilBrown; +Cc: Gionatan Danti, linux-raid
On Mon, Jan 13, 2014 at 08:45:34PM +1100, NeilBrown wrote:
> On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
>
> I think I know what you are talking about now. The md driver in the kernel
> supports two sorts of 'far' or 'offset' layouts for arrays where the number
> of devices is not an integer multiple of the number of copies.
> This has been supported in Linux since v3.9. but is not yet supported by
> mdadm.
Hmm, we also discussed the new layouts for when the number of drives is
a whole multiple of the number of copies. That layout should follow the same
principles.
How do I generate the new format on kernel 3.9?
best regards
keld
end of thread, newest message 2014-01-14 10:06 UTC
Thread overview: 22+ messages
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
2013-12-27 15:16 ` Gionatan Danti
2013-12-27 17:16 ` Peter Grandi
2013-12-27 17:32 ` Gionatan Danti
2013-12-27 18:26 ` keld
2013-12-27 15:19 ` keld
2013-12-27 15:22 ` Gionatan Danti
2013-12-27 15:49 ` keld
2014-01-09 8:03 ` Gionatan Danti
2014-01-12 23:20 ` NeilBrown
2014-01-13 8:52 ` Gionatan Danti
2014-01-13 9:45 ` NeilBrown
2014-01-13 10:15 ` Gionatan Danti
2014-01-13 22:27 ` NeilBrown
2014-01-13 23:38 ` keld
2014-01-14 0:46 ` Stan Hoeppner
2014-01-14 9:38 ` keld
2014-01-14 9:06 ` Gionatan Danti
2014-01-14 9:16 ` NeilBrown
2014-01-14 9:27 ` Gionatan Danti
2014-01-14 10:06 ` keld