RAID 10 far and offset on-disk layouts

linux-raid.vger.kernel.org archive mirror

* RAID 10 far and offset on-disk layouts
@ 2013-12-27 14:29 Gionatan Danti
  2013-12-27 14:46 ` Peter Grandi
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 14:29 UTC
  To: linux-raid

Hi all,
I think I understand quite well how far and offset work, but I cannot 
find any data on the precise on-disk layout.

FAR LAYOUT
md(4) states:
"The first copy of all data blocks will be striped across the early part 
of all drives in RAID0 fashion, and then the next copy of all blocks 
will be striped across a later section of all drives, always ensuring 
that all copies of any given block are on different drives"

The "on different drives" part makes me wonder _how_ the chunks are 
distributed. On a 4-disk array, I can imagine several different schemas:

1)	A1 A2 A3 A4
	.. .. .. ..
	A4 A1 A2 A3

2)	A1 A2 A3 A4
	.. .. .. ..
	A2 A1 A4 A3

The first schema is the one depicted by the SUSE documentation [1], while 
the second is the one described by Wikipedia [2].

Question 1: since the two schemas have different reliability 
characteristics, which one is actually used?

OFFSET LAYOUT
md(4) states:
"When 'offset' replicas are chosen, the multiple copies of a given chunk 
are laid out on consecutive drives and at consecutive offsets. 
Effectively each stripe is duplicated and the copies are offset by one 
device."

This means a schema like this:

3)	A1 A2 A3 A4
	A4 A1 A2 A3
	.. .. .. ..

However, this is susceptible to any consecutive two-disk failures. A 
schema like

4)	A1 A2 A3 A4
	A2 A1 A4 A3

would not suffer from this problem (e.g. disks 2 & 3 can fail and the 
array still works).
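A quick way to check this claim is to model each offset schema as the two rows of a stripe pair and ask whether every chunk still has a copy on a surviving disk. This is a minimal sketch, assuming the 4-disk schemas exactly as drawn above; `SCHEMA_3`, `SCHEMA_4` and `survives` are hypothetical names, not anything from md or mdadm.

```python
# Hypothetical model of the two 4-disk offset schemas above: each row lists
# which chunk sits on disks 1..4, and a chunk's copies are wherever its label
# appears in the two rows of the stripe pair.
SCHEMA_3 = [["A1", "A2", "A3", "A4"],
            ["A4", "A1", "A2", "A3"]]
SCHEMA_4 = [["A1", "A2", "A3", "A4"],
            ["A2", "A1", "A4", "A3"]]


def survives(schema, failed):
    """Return True if every chunk still has a copy on a non-failed disk.

    `failed` holds 1-based disk numbers, matching the text.
    """
    chunks = {label for row in schema for label in row}
    for chunk in chunks:
        holders = {disk + 1 for row in schema
                   for disk, label in enumerate(row) if label == chunk}
        if holders <= set(failed):
            return False  # both copies of this chunk were on failed disks
    return True


print(survives(SCHEMA_3, {2, 3}))  # False: both copies of A2 are on disks 2 and 3
print(survives(SCHEMA_4, {2, 3}))  # True
```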

Question 2: apart from simplicity, why does the offset layout use schema 
no. 3? Am I missing something?

Regards.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
@ 2013-12-27 14:46 ` Peter Grandi
  2013-12-27 15:16 ` Gionatan Danti
  2013-12-27 15:19 ` keld
  2 siblings, 0 replies; 22+ messages in thread
From: Peter Grandi @ 2013-12-27 14:46 UTC
  To: Linux RAID

[ ... ]
> The "on different drives" part let me wonder _how_ are chunks
> distributed. [ ... ] Question 1: as the two schema have
> different reliability characteristics, which is really used?

It does not matter (except to people writing MD-specific tools).
There is nothing special as to the ordering of drives or chunks
on drives. Also reliability is a *statistical* property not a
geometric one...

> [ ... ] However, this is susceptible to any consecutive
> two-disk failures. A schema like [... ] would not suffer from
> this problem (eg: disk 2 & 3 can fail and the array is still
> working).

That "consecutive two-disk failures" is really funny!

If two-paired-disk failures in RAID10 bother you, try RAID14:

  http://www.sabi.co.uk/blog/13-two.html#131213

Warning: that does not come at no cost :-).

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
  2013-12-27 14:46 ` Peter Grandi
@ 2013-12-27 15:16 ` Gionatan Danti
  2013-12-27 17:16   ` Peter Grandi
  2013-12-27 15:19 ` keld
  2 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 15:16 UTC
  To: linux-raid

 > It does not matter (except to people writing MD-specific tools).
 > There is nothing special as to the ordering of drives or chunks
 > on drives. Also reliability is a *statistical* property not a
 > geometric one...

Uhm, why doesn't it matter? For clarity, let me redraw the two schemas:

1) A1 A2 A3 A4
    .. .. .. ..
    A4 A1 A2 A3

2) A1 A2 A3 A4
    .. .. .. ..
    A2 A1 A4 A3

Schema no. 1 will fail if any two adjacent disks fail, e.g. 1 & 2, 2 & 3, 
3 & 4, 4 & 1.

On the other hand, schema no. 2 will become inactive only when disks 1 & 2 
or 3 & 4 fail, but not, for example, when 2 & 3 or 1 & 4 fail.

Or am I misunderstanding something?
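The lists of fatal pairs above can be checked mechanically by enumerating every two-disk failure. A minimal sketch, assuming only the first stripe of each far schema needs modelling (later stripes repeat the same pattern); `SCHEMA_1`, `SCHEMA_2`, `copy_holders` and `fatal_pairs` are illustrative names only.

```python
from itertools import combinations

# Far-copy row of each 4-disk schema; the first-copy row is "A1 A2 A3 A4",
# so chunk A<i> always has one copy on disk i.
SCHEMA_1 = ["A4", "A1", "A2", "A3"]  # far copies rotated by one disk
SCHEMA_2 = ["A2", "A1", "A4", "A3"]  # far copies swapped within disk pairs


def copy_holders(far_row):
    """Map each chunk label to the set of 1-based disks holding a copy."""
    holders = {f"A{i}": {i} for i in range(1, len(far_row) + 1)}
    for disk, chunk in enumerate(far_row, start=1):
        holders[chunk].add(disk)
    return holders


def fatal_pairs(far_row):
    """Return every two-disk failure that takes out both copies of a chunk."""
    holders = copy_holders(far_row)
    disks = range(1, len(far_row) + 1)
    return [pair for pair in combinations(disks, 2)
            if any(h <= set(pair) for h in holders.values())]


print(fatal_pairs(SCHEMA_1))  # [(1, 2), (1, 4), (2, 3), (3, 4)]
print(fatal_pairs(SCHEMA_2))  # [(1, 2), (3, 4)]
```

So 4 of the 6 possible two-disk failures are fatal for schema 1, and 2 of 6 for schema 2.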

 > That "consecutive two-disk failures" is really funny!

Er, my English is not very good :p
I was really talking about adjacent disk failures. Sorry!

 > If two-paired-disk failures in RAID10 bother you, try RAID14:
 >
 >  http://www.sabi.co.uk/blog/13-two.html#131213
 >
 > Warning: that does not come at no cost :-).

Thank you very much for the link! I need some time to read it carefully...

Regards.

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
  2013-12-27 14:46 ` Peter Grandi
  2013-12-27 15:16 ` Gionatan Danti
@ 2013-12-27 15:19 ` keld
  2013-12-27 15:22   ` Gionatan Danti
  2 siblings, 1 reply; 22+ messages in thread
From: keld @ 2013-12-27 15:19 UTC
  To: Gionatan Danti; +Cc: linux-raid

On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> Hi all,
> I think I understand quite well how far and offset work, but I cannot 
> find any data on the precise on-disk layout.
> 
> FAR LAYOUT
> md(4) states:
> "The first copy of all data blocks will be striped across the early part 
> of all drives in RAID0 fashion, and then the next copy of all blocks 
> will be striped across a later section of all drives, always ensuring 
> that all copies of any given block are on different drives"
> 
> The "on different drives" part makes me wonder _how_ the chunks are 
> distributed. On a 4-disk array, I can imagine several different schemas:
> 
> 1)	A1 A2 A3 A4
> 	.. .. .. ..
> 	A4 A1 A2 A3
> 
> 2)	A1 A2 A3 A4
> 	.. .. .. ..
> 	A2 A1 A4 A3
> 
> The first schema is the one depicted by the SUSE documentation [1], while 
> the second is the one described by Wikipedia [2].
> 
> Question 1: since the two schemas have different reliability 
> characteristics, which one is actually used?

The Wikipedia description is what you get for new arrays with newer
kernels, while the SUSE documentation is what you will get with older kernels.
The Wikipedia layout was made because it gives better chances of recovery:
with e.g. 4 drives, the chance of surviving 2 failing drives went from 1/3 to 2/3.

I would say that the SUSE description is just not updated.
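The 1/3 and 2/3 figures follow from a short count. Assume n drives (n even, n >= 4), one drive already failed, and each of the remaining n - 1 drives equally likely to be the next to fail. In the rotated (SUSE) layout every drive shares chunks with its two neighbours, so 2 of the remaining drives are fatal; in the paired (Wikipedia) layout each drive shares chunks with exactly one partner, so only 1 is fatal:

```latex
P_{\mathrm{survive}}^{\mathrm{rotated}} = \frac{(n-1) - 2}{n-1} = \frac{n-3}{n-1},
\qquad
P_{\mathrm{survive}}^{\mathrm{paired}} = \frac{(n-1) - 1}{n-1} = \frac{n-2}{n-1},
\qquad
n = 4:\quad \tfrac{1}{3} \ \text{vs.}\ \tfrac{2}{3}.
```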

Best regards
Keld

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 15:19 ` keld
@ 2013-12-27 15:22   ` Gionatan Danti
  2013-12-27 15:49     ` keld
  0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 15:22 UTC
  To: keld; +Cc: linux-raid


> On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> The Wikipedia description is what you get for new arrays with newer
> kernels, while the SUSE documentation is what you will get with older kernels.
> The Wikipedia layout was made because it gives better chances of recovery:
> with e.g. 4 drives, the chance of surviving 2 failing drives went from 1/3 to 2/3.
>
> I would say that the SUSE description is just not updated.
>
> Best regards
> Keld
>

Interesting. Two questions:
1) from which kernel version onwards is the layout the one depicted by Wikipedia?
2) is it possible, using mdadm, to check which "far" layout is in use?

Thanks.

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 15:22   ` Gionatan Danti
@ 2013-12-27 15:49     ` keld
  2014-01-09  8:03       ` Gionatan Danti
  0 siblings, 1 reply; 22+ messages in thread
From: keld @ 2013-12-27 15:49 UTC
  To: Gionatan Danti; +Cc: linux-raid

On Fri, Dec 27, 2013 at 04:22:55PM +0100, Gionatan Danti wrote:
> 
> >On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> >The Wikipedia description is what you get for new arrays with newer
> >kernels, while the SUSE documentation is what you will get with older 
> >kernels.
> >The Wikipedia layout was made because it gives better chances of recovery:
> >with e.g. 4 drives, the chance of surviving 2 failing drives went from 1/3 to 2/3.
> >
> >I would say that the SUSE description is just not updated.
> >
> >Best regards
> >Keld
> >
> 
> Interesting. Two questions:
> 1) from which kernel version onwards is the layout the one depicted by Wikipedia?
> 2) is it possible, using mdadm, to check which "far" layout is in use?

I cannot answer that. Neil Brown should know.

Best regards
Keld

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 15:16 ` Gionatan Danti
@ 2013-12-27 17:16   ` Peter Grandi
  2013-12-27 17:32     ` Gionatan Danti
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Grandi @ 2013-12-27 17:16 UTC
  To: Linux RAID

>> It does not matter (except to people writing MD-specific
>> tools).  There is nothing special as to the ordering of
>> drives or chunks on drives. Also reliability is a
>> *statistical* property not a geometric one...

[ ... ]

> 1) A1 A2 A3 A4
>    .. .. .. ..
>    A4 A1 A2 A3

> 2) A1 A2 A3 A4
>    .. .. .. ..
>    A2 A1 A4 A3

> Schema no. 1 will fail if any two adjacent disks fail, e.g. 1 & 2,
> 2 & 3, 3 & 4, 4 & 1. On the other hand, schema no. 2 will become
> inactive only when disks 1 & 2 or 3 & 4 fail, but not, for
> example, when 2 & 3 or 1 & 4 fail.

>> That "consecutive two-disk failures" is really funny!

> [ ... ] talking about adjacent disk failures. Sorry!

Same thing: whether the MD member devices are adjacent or not does
not matter; ordering is irrelevant.

When you compare layout 1) above and 2) above what matters is
how many 2-device failures lead to loss of data, not how many
"adjacent" 2-device failures.

RAID10 has the property that only the failure of 2 *paired* (for
the usual case of two copies of the same chunk) member devices,
whether "adjacent" or not, will lead to loss of data. So what
matters is which devices are paired, not whether they are
adjacent or not.

Using the layout convention of:

  https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#byz81ho

and drawing the full picture of 4 stripes, with chunks 0, 1, 2, 3,
4, 5, 6, 7, each replicated on 2 distinct drives out of 4:

1)
	a	b	c	d
	--------------------------

	0	1	2	3
	3	0	1	2

	4	5	6	7
	7	4	5	6

2)
	a	b	c	d
	--------------------------

	0	1	2	3
	1	0	3	2

	4	5	6	7
	5	4	7	6

It becomes more easily apparent that in layout 1):

  * 'a' is paired with:
    - 'b' (chunks 0 and 4);
    - 'd' (chunks 3 and 7).
  * 'c' is paired with:
    - 'b' (chunks 1 and 5);
    - 'd' (chunks 2 and 6).

while in layout 2) 'a' is paired with 'b' (chunks 0, 1, 4, 5)
and 'c' with 'd' (chunks 2, 3, 6, 7). Therefore only the failure
of both 'a' and 'b', or of both 'c' and 'd', results in data
loss; the failure of, for example, 'a' and 'c' does not. It is
also very easy to swap 'b' and 'c' around to get an entirely
equivalent layout in which not every failure of two "adjacent"
devices results in data loss.
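The pairing argument can be reproduced by recording, for each drive, which other drives hold copies of its chunks. A minimal sketch, assuming the two tables above verbatim (drives 'a' to 'd', chunks 0 to 7); `partners` is a hypothetical helper, not an md tool.

```python
# The two 4-drive tables above: each inner list is one row over drives a..d.
DRIVES = "abcd"
LAYOUT_1 = [[0, 1, 2, 3], [3, 0, 1, 2], [4, 5, 6, 7], [7, 4, 5, 6]]
LAYOUT_2 = [[0, 1, 2, 3], [1, 0, 3, 2], [4, 5, 6, 7], [5, 4, 7, 6]]


def partners(layout):
    """For each drive, the other drives that hold a copy of one of its chunks."""
    holders = {}
    for row in layout:
        for drive, chunk in zip(DRIVES, row):
            holders.setdefault(chunk, set()).add(drive)
    result = {drive: set() for drive in DRIVES}
    for drives in holders.values():
        for drive in drives:
            result[drive] |= drives - {drive}
    return {drive: "".join(sorted(p)) for drive, p in result.items()}


print(partners(LAYOUT_1))  # {'a': 'bd', 'b': 'ac', 'c': 'bd', 'd': 'ac'}
print(partners(LAYOUT_2))  # {'a': 'b', 'b': 'a', 'c': 'd', 'd': 'c'}
```

Every drive has two partners in layout 1) and exactly one in layout 2), which is what drives the difference in the number of fatal two-drive failures.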

Therefore the *probability* of loss of data because of 2 member
devices failing is higher in layout 1) than layout 2), whether
or not the drives are "adjacent".

  Note that arguably layout 1) is not really RAID10, because an
  important property of RAID10 is or should be that there are
  only N/2 pairs out of N drives. It is not quite 'RAID1' if the
  chunks at one position in a stripe can be replicated on 2 other
  devices, half the replicas on one and half on the other.

That the member devices are *adjacent* is irrelevant; what
matters is the statistical chance, which is driven by the
percent of cases where 2 failures result in data loss, which is
driven by the number of paired drives.

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 17:16   ` Peter Grandi
@ 2013-12-27 17:32     ` Gionatan Danti
  2013-12-27 18:26       ` keld
  0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2013-12-27 17:32 UTC
  To: Peter Grandi, Linux RAID

> <snip>
> Therefore the *probability* of loss of data because of 2 member
> devices failing is higher in layout 1) than layout 2), whether
> or not the drives are "adjacent".
>
>    Note that arguably layout 1) is not really RAID10, because an
>    important property of RAID10 is or should be that there are
>    only N/2 pairs out of N drives. It is not quite 'RAID1' if the
>    chunks at one position in a stripe can be replicated on 2 other
>    devices, half the replicas on one and half on the other.
>
> That the member devices are *adjacent* is irrelevant; what
> matters is the statistical chance, which is driven by the
> percent of cases where 2 failures result in data loss, which is
> driven by the number of paired drives.

Very detailed answer, thank you Peter :)

Based on what Keld said before, the current scheme is no. 2 (the Wikipedia 
one), right? Is it possible, using mdadm, to determine the physical layout 
(no. 1 or no. 2) of a live RAID10 array?

As schema no. 1 leads to an increased probability of data loss, why does 
the offset layout use it instead of, say, some variant of schema no. 2?

Regards.

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 17:32     ` Gionatan Danti
@ 2013-12-27 18:26       ` keld
  0 siblings, 0 replies; 22+ messages in thread
From: keld @ 2013-12-27 18:26 UTC
  To: Gionatan Danti; +Cc: Peter Grandi, Linux RAID

On Fri, Dec 27, 2013 at 06:32:48PM +0100, Gionatan Danti wrote:
> ><snip>
> >Therefore the *probability* of loss of data because of 2 member
> >devices failing is higher in layout 1) than layout 2), whether
> >or not the drives are "adjacent".
> >
> >   Note that arguably layout 1) is not really RAID10, because an
> >   important property of RAID10 is or should be that there are
> >   only N/2 pairs out of N drives. It is not quite 'RAID1' if the
> >   chunks at one position in a stripe can be replicated on 2 other
> >   devices, half the replicas on one and half on the other.
> >
> >That the member devices are *adjacent* is irrelevant; what
> >matters is the statistical chance, which is driven by the
> >percent of cases where 2 failures result in data loss, which is
> >driven by the number of paired drives.
> 
> Very detailed answer, thank you Peter :)
> 
> Based on what Keld said before, the current scheme is no. 2 (the Wikipedia 
> one), right? Is it possible, using mdadm, to determine the physical layout 
> (no. 1 or no. 2) of a live RAID10 array?
> 
> As schema no. 1 leads to an increased probability of data loss, why does 
> the offset layout use it instead of, say, some variant of schema no. 2?

I am not sure of the probability of surviving more than
1 failing drive with the offset layout, but my intuition tells
me it is rather bad. As it shifts the blocks one block at a time,
my gut feeling is that it really cannot survive more than one
failing disk.

On the other hand, for raid10,far in the second layout (Wikipedia, and 
I am the author of the text :-) I am quite sure that the layout is 
theoretically optimal, as in the luckiest case you can survive
n/2 drives failing, where n is your number of drives, using
integer division... I designed this layout for maximum
redundancy.
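The "luckiest case" claim can be checked by brute force for small arrays. A minimal sketch, assuming the pairwise layout in which the chunk whose first copy is on drive i (0-based) has its far copy on drive i XOR 1, which matches the second (Wikipedia) layout for an even number of drives; it only tests chunk placement, not anything md does at runtime, and the helper names are hypothetical.

```python
from itertools import combinations


def paired_holders(n_disks):
    """Copy holders per chunk for the pairwise far layout (n_disks even):
    the chunk whose first copy is on drive i has its far copy on drive i ^ 1."""
    return [{i, i ^ 1} for i in range(n_disks)]


def max_survivable(holders, n_disks):
    """Largest number of failed drives that at least one failure set survives."""
    for k in range(n_disks, 0, -1):
        for failed in combinations(range(n_disks), k):
            if not any(h <= set(failed) for h in holders):
                return k
    return 0


for n in (4, 6, 8):
    print(n, max_survivable(paired_holders(n), n))  # prints "4 2", "6 3", "8 4"
```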

The main reason for choosing raid10,far is that it is faster
for single reads (raid0 speed), while for other operations it is 
about the same. For degraded arrays, raid10,far is probably worse
than the other raid10 types, although the IO scheduling algorithm
probably remedies some of the bad raw performance of a degraded
raid10,far.

Also, reading from the faster, outer sectors of a hard drive (the
first half of each drive) gives raid10,far an edge over the other
raid10 types.

Best regards
Keld

* Re: RAID 10 far and offset on-disk layouts
  2013-12-27 15:49     ` keld
@ 2014-01-09  8:03       ` Gionatan Danti
  2014-01-12 23:20         ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-09  8:03 UTC
  To: linux-raid; +Cc: keld, neilb

>>
>> Interesting. Two questions:
>> 1) from which kernel version onwards is the layout the one depicted by Wikipedia?
>> 2) is it possible, using mdadm, to check which "far" layout is in use?
>
> I cannot answer that. Neil Brown should know.
>
> Best regards
> Keld
>

Hi all,
anyone with an update on these two questions?

I was thinking of using the kernel block trace facility (blktrace) to 
track disk access and infer the on-disk layout, but I haven't tried it yet.
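For such a block-trace experiment it helps to know where each copy is expected to land. A minimal sketch of the rotate-by-one far=2 placement (the schema described in md(4) and drawn in the SUSE figure), working in whole chunks; `far2_locations` and its `half` parameter (chunks per device half) are hypothetical, and a real array also has a data offset reserved by its metadata that this model ignores.

```python
def far2_locations(chunk, n_disks, half):
    """Return [(disk, chunk_offset), ...] for both copies of logical `chunk`.

    Models a near=1, far=2 array with the rotate-by-one placement: first
    copies are striped RAID0-style over the first `half` chunks of every
    disk, and the far copy of each stripe sits `half` chunks further in,
    shifted one disk to the right. Disks and offsets are 0-based.
    """
    row, disk = divmod(chunk, n_disks)
    near_copy = (disk, row)
    far_copy = ((disk + 1) % n_disks, half + row)
    return [near_copy, far_copy]


# 4 disks with 1000 chunks per half (a made-up size).
print(far2_locations(3, 4, 1000))  # [(3, 0), (0, 1000)]
print(far2_locations(5, 4, 1000))  # [(1, 1), (2, 1001)]
```

To compare against trace output, the chunk offsets would still need to be multiplied by the chunk size in sectors and shifted by the member's data offset.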

On the other hand, I carefully looked at mdadm output, without finding 
anything related to physical block placement.

Any new advice in that regard?
Thanks.

* Re: RAID 10 far and offset on-disk layouts
  2014-01-09  8:03       ` Gionatan Danti
@ 2014-01-12 23:20         ` NeilBrown
  2014-01-13  8:52           ` Gionatan Danti
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2014-01-12 23:20 UTC
  To: Gionatan Danti; +Cc: linux-raid, keld

On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> >>
> >> Interesting. Two questions:
> >> 1) from which kernel version onwards is the layout the one depicted by Wikipedia?

Exactly what depiction in wikipedia are you referring to?  A link to the
image might help.

> >> 2) is it possible, using mdadm, to check which "far" layout is in use?

mdadm --detail /dev/mdWHATEVER | grep Layout


> >
> > I cannot answer that. Neil Brown should know.
> >
> > Best regards
> > Keld
> >
> 
> Hi all,
> anyone with an update on these two questions?
> 
> I was thinking of using the kernel block trace facility (blktrace) to 
> track disk access and infer the on-disk layout, but I haven't tried it yet.
> 
> On the other hand, I carefully looked at mdadm output, without finding 
> anything related to physical block placement.

Look for "Layout".

NeilBrown


> 
> Any new advice in that regard?
> Thanks.
> 


* Re: RAID 10 far and offset on-disk layouts
  2014-01-12 23:20         ` NeilBrown
@ 2014-01-13  8:52           ` Gionatan Danti
  2014-01-13  9:45             ` NeilBrown
  0 siblings, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-13  8:52 UTC
  To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti

Hi Neil,
let me recap from a previous message:

 >FAR LAYOUT
 >md(4) states:
 >"The first copy of all data blocks will be striped across the early part
 >of all drives in RAID0 fashion, and then the next copy of all blocks
 >will be striped across a later section of all drives, always ensuring
 >that all copies of any given block are on different drives"
 >
 >The "on different drives" part makes me wonder _how_ the chunks are
 >distributed. On a 4-disk array, I can imagine several different schemas:
 >
 >1)	A1 A2 A3 A4
 >	.. .. .. ..
 >	A4 A1 A2 A3
 >
 >2)	A1 A2 A3 A4
 >	.. .. .. ..
 >	A2 A1 A4 A3
 >
 >The first schema is the one depicted by the SUSE documentation [1], while
 >the second is the one described by Wikipedia [2].
 >
 >Question 1: since the two schemas have different reliability
 >characteristics, which one is actually used?

SUSE entry: 
https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk

Wikipedia entry: 
http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how 
the far layout is depicted)

Keld kindly told me that the SUSE page is simply not updated, as it depicts a 
situation that changed with newer kernels. So, my two questions:
1) from which kernel version onwards is the layout the one depicted by Wikipedia?
2) is it possible, using mdadm, to check which "far" layout is in use?

From what I can see, "mdadm --detail /dev/mdWHATEVER | grep Layout" 
tells me whether the far, near, or offset layout is in use, but not the 
physical on-disk chunk organization (e.g. far "type" 1 or 2).

Anyway, the thread started because I was wondering why the OFFSET layout 
couples each disk to two other disks. Let me quote again:

 >OFFSET LAYOUT
 >md(4) states:
 >"When 'offset' replicas are chosen, the multiple copies of a given chunk
 >are laid out on consecutive drives and at consecutive offsets.
 >Effectively each stripe is duplicated and the copies are offset by one
 >device."
 >
 >This means a schema like this:
 >	
 >3)	A1 A2 A3 A4
 >	A4 A1 A2 A3
 >	.. .. .. ..
 >
 >However, this is susceptible to any consecutive two-disk failures. A
 >schema like
 >
 >4)	A1 A2 A3 A4
 >	A2 A1 A4 A3
 >
 >would not suffer from this problem (e.g. disks 2 & 3 can fail and the
 >array still works).
 >
 >Question 2: apart from simplicity, why does the offset layout use schema
 >no. 3? Am I missing something?

Full thread link: http://marc.info/?t=138815504400002&r=1&w=2

Excuse me for the long email; I am simply trying to learn something :)
Thank you very much.

* Re: RAID 10 far and offset on-disk layouts
  2014-01-13  8:52           ` Gionatan Danti
@ 2014-01-13  9:45             ` NeilBrown
  2014-01-13 10:15               ` Gionatan Danti
  2014-01-14 10:06               ` keld
  0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2014-01-13  9:45 UTC
  To: Gionatan Danti; +Cc: linux-raid, keld

On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> Hi Neil,
> let me recap from a previous message:
> 
>  >FAR LAYOUT
>  >md(4) states:
>  >"The first copy of all data blocks will be striped across the early part
>  >of all drives in RAID0 fashion, and then the next copy of all blocks
>  >will be striped across a later section of all drives, always ensuring
>  >that all copies of any given block are on different drives"
>  >
>  >The "on different drives" part makes me wonder _how_ the chunks are
>  >distributed. On a 4-disk array, I can imagine several different schemas:
>  >
>  >1)	A1 A2 A3 A4
>  >	.. .. .. ..
>  >	A4 A1 A2 A3
>  >
>  >2)	A1 A2 A3 A4
>  >	.. .. .. ..
>  >	A2 A1 A4 A3
>  >
>  >The first schema is the one depicted by the SUSE documentation [1], while
>  >the second is the one described by Wikipedia [2].
>  >
>  >Question 1: since the two schemas have different reliability
>  >characteristics, which one is actually used?
> 
> SUSE entry: 
> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> 
> Wikipedia entry: 
> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how 
> the far layout is depicted)
> 
> Keld kindly told me that the SUSE page is simply not updated, as it depicts a 
> situation that changed with newer kernels. So, my two questions:

I cannot see an important difference between the two pages you reference.
Both appear to be correct.

> 1) from which kernel version onwards is the layout the one depicted by Wikipedia?

These are both valid for any kernel since 2.6.18 with mdadm 2.5 or later.

> 2) is it possible, using mdadm, to check which "far" layout is in use?

I think I know what you are talking about now.  The md driver in the kernel
supports two sorts of 'far' or 'offset' layouts for arrays where the number
of devices is not an integer multiple of the number of copies.
This has been supported in Linux since v3.9, but is not yet supported by
mdadm.

> 
> From what I can see, "mdadm --detail /dev/mdWHATEVER | grep Layout" 
> tells me whether the far, near, or offset layout is in use, but not the 
> physical on-disk chunk organization (e.g. far "type" 1 or 2).

This is because mdadm does not yet create or report on the new type.
When it does, the above command will be the correct command to find out which
layout is in use (but I don't yet know what the output will say exactly).
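The raw number behind the "Layout" line can also be read from sysfs (`/sys/block/mdX/md/layout`). A minimal decoding sketch, assuming the bit encoding used by the kernel's raid10 driver as I understand it: near copies in bits 0-7, far copies in bits 8-15, bit 16 for "offset", and bit 17 for the newer far-set placement. Treat the bit-17 interpretation in particular as an assumption to verify against your kernel's drivers/md/raid10.c; `md0` below is a hypothetical device name.

```python
def decode_raid10_layout(layout: int) -> dict:
    """Split an md RAID10 layout number into its fields (assumed encoding)."""
    return {
        "near": layout & 0xFF,                 # bits 0-7: near copies
        "far": (layout >> 8) & 0xFF,           # bits 8-15: far copies
        "offset": bool(layout & (1 << 16)),    # bit 16: far copies are 'offset'
        "far_sets": bool(layout & (1 << 17)),  # bit 17: newer far-set placement (assumed)
    }


def read_layout(md_name: str) -> int:
    """Read the raw layout number from sysfs, e.g. read_layout("md0")."""
    with open(f"/sys/block/{md_name}/md/layout") as f:
        return int(f.read().strip())


print(decode_raid10_layout(0x102))    # near=2, far=1 (mdadm's "near=2")
print(decode_raid10_layout(0x201))    # near=1, far=2 ("far=2")
print(decode_raid10_layout(0x10201))  # near=1, far=2, offset ("offset=2")
```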

NeilBrown


* Re: RAID 10 far and offset on-disk layouts
  2014-01-13  9:45             ` NeilBrown
@ 2014-01-13 10:15               ` Gionatan Danti
  2014-01-13 22:27                 ` NeilBrown
  2014-01-14 10:06               ` keld
  1 sibling, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-13 10:15 UTC
  To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti

On 01/13/2014 10:45 AM, NeilBrown wrote:
> I cannot see an important difference between the two pages you reference.
> Both appear to be correct.

Mmm... they seem different to me.

SUSE FAR layout:

sda1 sdb1 sdc1 sde1
   0    1    2    3
   4    5    6    7
   . . .
   3    0    1    2
   7    4    5    6

Notice how (for example) sdb1 is coupled both to sda1 (0,4) and 
sdc1 (1,5). If sdb1 fails, a failure of either sda1 or sdc1 leads to data loss.

Now, Wikipedia FAR Layout:

4 drives (sda1, sdb1, sdc1, sdd1)
--------------------
A1   A2   A3   A4
A5   A6   A7   A8
A9   A10  A11  A12
..   ..   ..   ..
A2   A1   A4   A3
A6   A5   A8   A7
A10  A9   A12  A11
..   ..   ..   ..
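Both figures can be generated from the same rule, differing only in which drive receives a chunk's far copy. A minimal sketch: `rotate` reproduces the SUSE drawing and `swap` the Wikipedia one (the latter assumes an even number of drives); labels follow the A1, A2, ... convention used here, and all helper names are hypothetical.

```python
def far_tables(n_disks, rows, second_copy):
    """Return (near_rows, far_rows) for a near=1, far=2 layout.

    `second_copy(disk, n_disks)` gives the 0-based drive that holds the far
    copy of the chunk whose first copy is on `disk`.
    """
    near = [[f"A{r * n_disks + d + 1}" for d in range(n_disks)]
            for r in range(rows)]
    far = []
    for row in near:
        placed = [""] * n_disks
        for disk, label in enumerate(row):
            placed[second_copy(disk, n_disks)] = label
        far.append(placed)
    return near, far


def rotate(disk, n_disks):
    """SUSE figure: far copy shifted by one drive."""
    return (disk + 1) % n_disks


def swap(disk, n_disks):
    """Wikipedia figure: far copies swapped within drive pairs (n_disks even)."""
    return disk ^ 1


for name, rule in (("SUSE", rotate), ("Wikipedia", swap)):
    near, far = far_tables(4, 3, rule)
    print(name)
    for row in near + [["..", "..", "..", ".."]] + far:
        print(" ".join(f"{label:<4}" for label in row))
```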

Notice now how a single disk (e.g. sdb1) is coupled to only one other 
_single_ disk (e.g. sda1). In this case, if sdb1 fails, you would have to lose 
sda1 as well to suffer data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.

Am I wrong?

Regards.

* Re: RAID 10 far and offset on-disk layouts
  2014-01-13 10:15               ` Gionatan Danti
@ 2014-01-13 22:27                 ` NeilBrown
  2014-01-13 23:38                   ` keld
  2014-01-14  9:06                   ` Gionatan Danti
  0 siblings, 2 replies; 22+ messages in thread
From: NeilBrown @ 2014-01-13 22:27 UTC
  To: Gionatan Danti; +Cc: linux-raid, keld

On Mon, 13 Jan 2014 11:15:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> On 01/13/2014 10:45 AM, NeilBrown wrote:
> > ...
> > I cannot see an important difference between the two pages you reference.
> > Both appear to be correct.
> 
> Mmm... they seem different to me.
> 
> SeSe FAR Layout:
> 
> sda1 sdb1 sdc1 sde1
>    0    1    2    3
>    4    5    6    7
>    . . .
>    3    0    1    2
>    7    4    5    6
> 
> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and 
> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
> 
> Now, Wikipedia FAR Layout:
> 
> 4 drives (sda1, sdb1, sdc1, sdd1)
> --------------------
> A1   A2   A3   A4
> A5   A6   A7   A8
> A9   A10  A11  A12
> ..   ..   ..   ..
> A2   A1   A4   A3
> A6   A5   A8   A7
> A10  A9   A12  A11
> ..   ..   ..   ..
> 
> Notice now how a single disk (eg: sdb1) is coupled to only another 
> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose 
> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
> 

Thanks for being explicit - it is much easier to answer explicit questions :-)

Yes, they are different.  So the wikipedia article is wrong, or at least
misleading.  That is not what the "f2" layout looks like.

The md driver does support that layout.  I don't know yet what mdadm will
call it, but it won't be called "f2".

So this change:

http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733

was wrong.

NeilBrown

* Re: RAID 10 far and offset on-disk layouts
  2014-01-13 22:27                 ` NeilBrown
@ 2014-01-13 23:38                   ` keld
  2014-01-14  0:46                     ` Stan Hoeppner
  2014-01-14  9:06                   ` Gionatan Danti
  1 sibling, 1 reply; 22+ messages in thread
From: keld @ 2014-01-13 23:38 UTC (permalink / raw)
  To: NeilBrown; +Cc: Gionatan Danti, linux-raid

On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
> ...
> Thanks for being explicit - it is much easier to answer explicit questions :-)
> 
> Yes, they are different.  So the wikipedia article is wrong, or at least
> misleading.  That is not what the "f2" layout looks like.
> 
> The md driver does support that layout.  I don't know yet what mdadm will
> call it, but it won't be called "f2".
> 
> So this change:
> 
> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
> 
> was wrong.

Well, it was me who made the Wikipedia edit. The edit was based on information from Neil that this was actually
the layout. Later we found out that it really was not, but that it should be; and then Neil implemented
the better layout.  Maybe it will not be called "f2"; I look forward to being told what the actual name
will be.

I think the name should be "f2" as it is a "far" layout with 2 copies, and it really should be
the default for "far" with 2 copies, as the redundancy is much better than with the old layout.
Keeping the name would mean that we would not need to write and spread new documentation,
and people following existing documentation would automatically get the better implementation.
There is no reason for new RAID instances of "far" to get the old layout, except for
backwards compatibility.

Best regards
keld



* Re: RAID 10 far and offset on-disk layouts
  2014-01-13 23:38                   ` keld
@ 2014-01-14  0:46                     ` Stan Hoeppner
  2014-01-14  9:38                       ` keld
  0 siblings, 1 reply; 22+ messages in thread
From: Stan Hoeppner @ 2014-01-14  0:46 UTC (permalink / raw)
  To: keld, NeilBrown; +Cc: Gionatan Danti, linux-raid

On 1/13/2014 5:38 PM, keld@keldix.com wrote:
> On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
...
>> So this change:
>>
>> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>>
>> was wrong.
> 
> Well, it was me doing the wikipedia edit. The edit was done based on information from Neil that this was actually 
> the layout. Then later we found out that it really was not, but it should be; and then Neil implemented
> the better layout.  Maybe it is not called "f2", I look forward to be informed what the actual name 
> will be. 
> 
> I think the name should be "f2" as it is a "far" layout, with 2 copies, and it really should be
> the default for "far" with 2 copies, as the redundancy is much better than the old layout.
> Keeping the name would mean that  we would not need to make and spread documentation on this,
> so that people following existing documentation would automatically get the better implementation.
> There is no need that new raid instances of "far" should get the old layout, except for
> backwards compatibility. 

The problem here is that you're creating the Wikipedia page as if it
*is* source reference material.  I.e. you're including "original work",
your original work.  This is a violation of the Wikipedia rules of
editing.  And this kind of situation is exactly why those rules exist.

The layout tables you are including need to exist in a free-to-duplicate
reference document, and should be copied verbatim from said document.
They should not be created from scratch simply based on information in
an email exchange on a mailing list, just as web forums are not
considered a valid reference source.

Therefore, if such layout tables do not exist in official Linux
documentation they should not be included in Wikipedia.  If they do
exist, the information should be copied verbatim, and the source
document referenced.  There are no such references in the article.

Wikipedia is an encyclopedia, not a reference work.  All information
needs to be sourced from reference works.  If you want your original work
to be included in Wikipedia, you need to create your own documentation,
add it to the Linux Documentation Project, and have it peer reviewed.
Then you can include it, and cite that source.

-- 
Stan

* Re: RAID 10 far and offset on-disk layouts
  2014-01-13 22:27                 ` NeilBrown
  2014-01-13 23:38                   ` keld
@ 2014-01-14  9:06                   ` Gionatan Danti
  2014-01-14  9:16                     ` NeilBrown
  1 sibling, 1 reply; 22+ messages in thread
From: Gionatan Danti @ 2014-01-14  9:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti

On 01/13/2014 11:27 PM, NeilBrown wrote:
>> ...
>
> Thanks for being explicit - it is much easier to answer explicit questions :-)
>
> Yes, they are different.  So the wikipedia article is wrong, or at least
> misleading.  That is not what the "f2" layout looks like.
>
> The md driver does support that layout.  I don't know yet what mdadm will
> call it, but it won't be called "f2".
>
> So this change:
>
> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>
> was wrong.
>
> NeilBrown
>

OK, so let me recap:

1) The FAR layout is the one depicted by the SuSE documentation, while the
Wikipedia entry is wrong

2) MD _can_ produce a FAR layout like the one depicted by Wikipedia, but we
don't know what the user-space mdadm tool will call it (maybe it is not
implemented yet?)

3) Is there any reason why the FAR and OFFSET layouts arrange data in this
manner, coupling each disk with two other disks? Was it done for
simplicity, or am I missing something?

4) Can you confirm that there is currently _no_ way to create a FAR layout
like the one depicted by Wikipedia? What about the OFFSET layout?

Thank you very much.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

* Re: RAID 10 far and offset on-disk layouts
  2014-01-14  9:06                   ` Gionatan Danti
@ 2014-01-14  9:16                     ` NeilBrown
  2014-01-14  9:27                       ` Gionatan Danti
  0 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2014-01-14  9:16 UTC (permalink / raw)
  To: Gionatan Danti; +Cc: linux-raid, keld

On Tue, 14 Jan 2014 10:06:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> ...
>
> Ok, so let recap:
> 
> 1) FAR layout is the one depicted by SuSe documentation, while the 
> Wikipedia entry is wrong

Yes.

> 
> 2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't 
> know how the user-space mdadm tool call it (maybe it is not implemented 
> yet?)

Yes.  Not implemented yet.

> 
> 3) There are any reasons why FAR and OFFSET layout scramble data in this 
> manner, coupling any disk with two more disks? It was done for 
> simplicity, or I am missing something?

It just seemed the easiest thing to do at the time.

> 
> 4) you confirm that currently we can _not_ create a FAR layout as the 
> one depicted by wikipedia by no means? What about OFFSET layout?

You certainly can create the FAR layout depicted on Wikipedia, e.g. by
binary-editing the metadata on some devices, or writing some code which does
that for you.  It requires flipping one bit in the metadata and updating the
checksum.  You can probably even do it by writing something appropriate into
some sysfs files.
But mdadm cannot do it yet.
Ditto for the new OFFSET layout.
(The old offset layout can be created with "--layout=o2").
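The "one bit" lives in the layout number md keeps in the array metadata. The sketch below decodes such a number, assuming the RAID10 encoding is: near copies in bits 0-7, far copies in bits 8-15, bit 16 selecting "offset" instead of "far", and bits 17 and up selecting a newer placement variant. Treat that bit assignment as an assumption; `encode` only mimics mdadm's layout names and is not an mdadm API, and none of this is a tool for editing real superblocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Raid10Layout:
    near_copies: int
    far_copies: int
    offset: bool  # True: "offset" layout, False: "far" (when far_copies > 1)
    variant: int  # 0 = original placement, non-zero = newer placement (assumed)


def decode(layout: int) -> Raid10Layout:
    """Split an md RAID10 layout number into fields (assumed encoding)."""
    return Raid10Layout(
        near_copies=layout & 0xFF,        # bits 0-7
        far_copies=(layout >> 8) & 0xFF,  # bits 8-15
        offset=bool(layout & (1 << 16)),  # bit 16
        variant=layout >> 17,             # bits 17 and up
    )


def encode(name: str) -> int:
    """Layout number for an mdadm-style name such as "n2", "f2" or "o2"."""
    kind, copies = name[0], int(name[1:])
    if kind == "n":
        return (1 << 8) | copies               # far=1, near=copies
    if kind == "f":
        return (copies << 8) | 1               # far=copies, near=1
    if kind == "o":
        return (1 << 16) | (copies << 8) | 1   # offset flag + far=copies
    raise ValueError(f"unknown layout kind {kind!r}")


f2 = encode("f2")
print(hex(f2), decode(f2))
# Hypothetical: the newer far placement, assuming it is selected by bit 17.
print(decode(f2 | (1 << 17)))
```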

NeilBrown

* Re: RAID 10 far and offset on-disk layouts
  2014-01-14  9:16                     ` NeilBrown
@ 2014-01-14  9:27                       ` Gionatan Danti
  0 siblings, 0 replies; 22+ messages in thread
From: Gionatan Danti @ 2014-01-14  9:27 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti

>> Ok, so let recap:
>>
>> 1) FAR layout is the one depicted by SuSe documentation, while the
>> Wikipedia entry is wrong
>
> Yes.
>
>>
>> 2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
>> know how the user-space mdadm tool call it (maybe it is not implemented
>> yet?)
>
> Yes.  Not implemented yet.
>
>>
>> 3) There are any reasons why FAR and OFFSET layout scramble data in this
>> manner, coupling any disk with two more disks? It was done for
>> simplicity, or I am missing something?
>
> It just seemed the easiest thing to do at the time.
>
>>
>> 4) you confirm that currently we can _not_ create a FAR layout as the
>> one depicted by wikipedia by no means? What about OFFSET layout?
>
> You certainly can created the FAR layout depicted on wikipedia, e.g. by
> binary-editing the metadata on some devices, or writing some code which does
> that for you.  It requires flipping one bit in the metadata and updating the
> checksum.  You can probably even to it by writing something appropriate into
> some sysfs files.
> But mdadm cannot do it yet.
> Ditto for the new OFFSET layout.
> (The old offset layout can be created with "--layout=o2").
>
> NeilBrown
>

All clear now :)

Thank you very much Neil.

-- 
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

* Re: RAID 10 far and offset on-disk layouts
  2014-01-14  0:46                     ` Stan Hoeppner
@ 2014-01-14  9:38                       ` keld
  0 siblings, 0 replies; 22+ messages in thread
From: keld @ 2014-01-14  9:38 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: NeilBrown, Gionatan Danti, linux-raid

On Mon, Jan 13, 2014 at 06:46:15PM -0600, Stan Hoeppner wrote:
> On 1/13/2014 5:38 PM, keld@keldix.com wrote:
> ...
> The problem here is that you're creating the Wikipedia page as if it
> *is* source reference material.  I.e. you're including "original work,
> your original work.  This is a violation of the Wikipedia rules of
> editing.  And this kind of situation is exactly why those rules exist.


I am only referencing material available elsewhere.

> The layout tables you are including need to exist in a free to duplicate
> reference document, and should be copied verbatim from said document.
> They should not be created from scratch simply based on information in
> an email exchange on a mailing list, just as web forums are not
> considered a valid reference source.

I only described things that were already described.

best regards
keld

* Re: RAID 10 far and offset on-disk layouts
  2014-01-13  9:45             ` NeilBrown
  2014-01-13 10:15               ` Gionatan Danti
@ 2014-01-14 10:06               ` keld
  1 sibling, 0 replies; 22+ messages in thread
From: keld @ 2014-01-14 10:06 UTC (permalink / raw)
  To: NeilBrown; +Cc: Gionatan Danti, linux-raid

On Mon, Jan 13, 2014 at 08:45:34PM +1100, NeilBrown wrote:
> On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> 
> I think I know what you are talking about now.  The md driver in the kernel
> supports two sorts of 'far' or 'offset' layouts for arrays where the number
> of devices is not an integer multiple of the number of copies.
> This has been supported in Linux since v3.9, but is not yet supported by
> mdadm.

Hmm, we also discussed the new layouts for when the number of drives is
a whole multiple of the number of copies. Those layouts should follow the same
principles.
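One way to get the Wikipedia picture, and to cope with device counts that are not a multiple of the copy count, is to rotate the far copy within small sets of devices rather than across the whole array. A minimal sketch, assuming 2 copies, sets of 2 devices, and leftover devices joining the last set; `far_set_partner` is hypothetical, and the kernel's actual rules for set size and leftovers may differ.

```python
def far_set_partner(disk: int, n_disks: int, set_size: int = 2) -> int:
    """Device holding the far copy of chunks whose near copy is on `disk`.

    Hypothetical "far sets" sketch: the far half is rotated by one device
    within each set of `set_size` devices instead of across all devices.
    Leftover devices (n_disks % set_size) are folded into the last set.
    """
    if n_disks < set_size:
        raise ValueError("need at least one full set of devices")
    n_sets = n_disks // set_size
    set_index = min(disk // set_size, n_sets - 1)  # leftovers join last set
    start = set_index * set_size
    length = n_disks - start if set_index == n_sets - 1 else set_size
    return start + (disk - start + 1) % length


print([far_set_partner(d, 4) for d in range(4)])  # [1, 0, 3, 2]
print([far_set_partner(d, 5) for d in range(5)])  # [1, 0, 3, 4, 2]
```

With 4 devices this yields the pairwise swap from the Wikipedia table; with 5 devices, the first two devices mirror each other while the last three rotate among themselves.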

How do I generate the new format on kernel 3.9?

best regards
keld

Thread overview: 22+ messages
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
2013-12-27 15:16 ` Gionatan Danti
2013-12-27 17:16   ` Peter Grandi
2013-12-27 17:32     ` Gionatan Danti
2013-12-27 18:26       ` keld
2013-12-27 15:19 ` keld
2013-12-27 15:22   ` Gionatan Danti
2013-12-27 15:49     ` keld
2014-01-09  8:03       ` Gionatan Danti
2014-01-12 23:20         ` NeilBrown
2014-01-13  8:52           ` Gionatan Danti
2014-01-13  9:45             ` NeilBrown
2014-01-13 10:15               ` Gionatan Danti
2014-01-13 22:27                 ` NeilBrown
2014-01-13 23:38                   ` keld
2014-01-14  0:46                     ` Stan Hoeppner
2014-01-14  9:38                       ` keld
2014-01-14  9:06                   ` Gionatan Danti
2014-01-14  9:16                     ` NeilBrown
2014-01-14  9:27                       ` Gionatan Danti
2014-01-14 10:06               ` keld
