* RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2013-12-27 14:29 UTC
To: linux-raid
Hi all,
I think I understand quite well how far and offset work, but I cannot
find any data on the precise on-disk layout.
FAR LAYOUT
md(4) states:
"The first copy of all data blocks will be striped across the early part
of all drives in RAID0 fashion, and then the next copy of all blocks
will be striped across a later section of all drives, always ensuring
that all copies of any given block are on different drives"
The "on different drives" part made me wonder _how_ the chunks are
distributed. On a 4-disk array, I can imagine several different schemas:
1) A1 A2 A3 A4
.. .. .. ..
A4 A1 A2 A3
2) A1 A2 A3 A4
.. .. .. ..
A2 A1 A4 A3
The first schema is the one depicted in the SUSE documentation [1], while
the second is the one described by Wikipedia [2].
Question 1: since the two schemas have different reliability
characteristics, which one is actually used?
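To make the two candidate far layouts concrete, here is a small Python sketch (an editorial illustration, not md or mdadm code) that derives the far-copy row from the first-copy row under each schema as drawn above. The function name `far_second_copy_row` is hypothetical, and schema 2 is assumed to need an even number of drives.

```python
def far_second_copy_row(first_row, schema):
    """Return the far-copy row that mirrors `first_row` under a given schema.

    schema 1: every chunk moves one device to the right (A4 A1 A2 A3).
    schema 2: neighbouring devices swap chunks pairwise (A2 A1 A4 A3).
    """
    n = len(first_row)
    if schema == 1:
        # device d stores what device d-1 stored in the first copy (wrapping)
        return [first_row[(d - 1) % n] for d in range(n)]
    if schema == 2:
        if n % 2:
            raise ValueError("schema 2 needs an even number of drives")
        # device d stores what its partner d XOR 1 stored (0<->1, 2<->3, ...)
        return [first_row[d ^ 1] for d in range(n)]
    raise ValueError(f"unknown schema {schema!r}")


row = ["A1", "A2", "A3", "A4"]
print(far_second_copy_row(row, 1))  # ['A4', 'A1', 'A2', 'A3']
print(far_second_copy_row(row, 2))  # ['A2', 'A1', 'A4', 'A3']
```

Under schema 1 each drive shares chunks with both of its neighbours; under schema 2 drives are coupled in fixed pairs (1-2 and 3-4).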
OFFSET LAYOUT
md(4) states:
"When 'offset' replicas are chosen, the multiple copies of a given chunk
are laid out on consecutive drives and at consecutive offsets.
Effectively each stripe is duplicated and the copies are offset by one
device."
This means a schema like this:
3) A1 A2 A3 A4
A4 A1 A2 A3
.. .. .. ..
However, this is susceptible to any consecutive two-disk failures. A
schema like
4) A1 A2 A3 A4
A2 A1 A4 A3
would not suffer from this problem (e.g. disks 2 & 3 can fail and the
array still works).
Question 2: apart from simplicity, why does the offset layout use schema
no. 3? Am I missing something?
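The md(4) wording for the offset layout ("each stripe is duplicated and the copies are offset by one device") can be modelled as below. This is a hypothetical sketch of schema 3 as read from the man page, not the kernel's mapping code; `offset_layout` and `partners` are illustrative names. It lists, for each drive, which other drives hold copies of its chunks, making the "each disk is coupled to two others" observation explicit.

```python
from itertools import combinations


def offset_layout(n_drives, n_stripes):
    """Grid of chunk numbers, one row per on-disk chunk position.

    Each stripe of n_drives chunks is written once, then repeated in the
    next row shifted right by one device (schema 3 above).
    """
    rows = []
    for s in range(n_stripes):
        stripe = [s * n_drives + d for d in range(n_drives)]
        rows.append(stripe)
        rows.append([stripe[(d - 1) % n_drives] for d in range(n_drives)])
    return rows


def partners(rows):
    """Map each drive to the set of drives it shares at least one chunk with."""
    where = {}
    for row in rows:
        for drive, chunk in enumerate(row):
            where.setdefault(chunk, set()).add(drive)
    result = {d: set() for d in range(len(rows[0]))}
    for drives in where.values():
        for a, b in combinations(sorted(drives), 2):
            result[a].add(b)
            result[b].add(a)
    return result


grid = offset_layout(4, 2)
print(grid[0], grid[1])  # [0, 1, 2, 3] [3, 0, 1, 2]
print({d: sorted(p) for d, p in partners(grid).items()})
# {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```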
Regards.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8

* Re: RAID 10 far and offset on-disk layouts
From: Peter Grandi @ 2013-12-27 14:46 UTC
To: Linux RAID

[ ... ]

> The "on different drives" part let me wonder _how_ are chunks
> distributed. [ ... ] Question 1: as the two schema have
> different reliability characteristics, which is really used?

It does not matter (except to people writing MD-specific tools).
There is nothing special about the ordering of drives or chunks
on drives. Also, reliability is a *statistical* property, not a
geometric one...

> [ ... ] However, this is susceptible to any consecutive
> two-disk failures. A schema like [ ... ] would not suffer from
> this problem (eg: disk 2 & 3 can fail and the array is still
> working).

That "consecutive two-disk failures" is really funny!

If failures of two paired disks in RAID10 bother you, try RAID14:

  http://www.sabi.co.uk/blog/13-two.html#131213

Warning: that does not come at no cost :-).

* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2013-12-27 15:16 UTC
To: linux-raid

> It does not matter (except to people writing MD-specific tools).
> There is nothing special as to the ordering of drives or chunks
> on drives. Also reliability is a *statistical* property not a
> geometric one...

Uhm, why doesn't it matter? For clarity, let me redraw the two schemas:

1) A1 A2 A3 A4
   .. .. .. ..
   A4 A1 A2 A3

2) A1 A2 A3 A4
   .. .. .. ..
   A2 A1 A4 A3

Schema no. 1 will fail on any adjacent two-disk failure, e.g. 1 & 2,
2 & 3, 3 & 4, 4 & 1. Schema no. 2, on the other hand, will become
inactive only when disks 1 & 2 or 3 & 4 fail, but not, for example,
when 2 & 3 or 1 & 4 fail. Or am I misunderstanding something?

> That "consecutive two-disk failures" is really funny!

Er, my English is not very good :p I really meant adjacent disk
failures. Sorry!

> If two-paired-disk failure in RAID10 bother you, try RAID14:
>
> http://www.sabi.co.uk/blog/13-two.html#131213
>
> Warning: that does not come at no cost :-).

Thank you very much for the link! I need some time to read it
carefully...

Regards.

* Re: RAID 10 far and offset on-disk layouts
From: Peter Grandi @ 2013-12-27 17:16 UTC
To: Linux RAID

>> It does not matter (except to people writing MD-specific
>> tools). There is nothing special as to the ordering of
>> drives or chunks on drives. Also reliability is a
>> *statistical* property not a geometric one...
[ ... ]
> 1) A1 A2 A3 A4
>    .. .. .. ..
>    A4 A1 A2 A3
> 2) A1 A2 A3 A4
>    .. .. .. ..
>    A2 A1 A4 A3
> Schema n.1 will fail on any adjacent disk failure. Eg: 1 & 2,
> 2 & 3, 3 & 4, 4 & 1. On the other hand, schema n.2 will become
> inactive only when 1 & 2 or 3 & 4 disk fail, but not, for
> example, when 2 & 3 or 1 & 4 fail.

>> That "consecutive two-disk failures" is really funny!

> [ ... ] talking about adjacent disk failures. Sorry!

Same thing: whether the MD member devices are adjacent or not
does not matter; ordering is irrelevant.

When you compare layouts 1) and 2) above, what matters is how
many 2-device failures lead to loss of data, not how many
"adjacent" 2-device failures do.

RAID10 has the property that only the failure of 2 *paired* (for
the usual case of two copies of the same chunk) member devices,
whether "adjacent" or not, will lead to loss of data. So what
matters is which devices are paired, not whether they are
adjacent.

Using the layout convention of:

  https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#byz81ho

and drawing the full picture of 4 stripe rows, with chunks 0 to 7,
each replicated on 2 distinct drives out of 4:

1)   a   b   c   d
    ---------------
     0   1   2   3
     3   0   1   2
     4   5   6   7
     7   4   5   6

2)   a   b   c   d
    ---------------
     0   1   2   3
     1   0   3   2
     4   5   6   7
     5   4   7   6

It then becomes apparent that in layout 1):

  * 'a' is paired with:
    - 'b' (chunks 0 and 4);
    - 'd' (chunks 3 and 7).
  * 'c' is paired with:
    - 'b' (chunks 1 and 5);
    - 'd' (chunks 2 and 6).

while in layout 2) 'a' is paired with 'b' (chunks 0, 1, 4, 5) and
'c' with 'd' (chunks 2, 3, 6, 7).

Therefore in layout 1) only the failure of 'a' and 'c' (or of 'b'
and 'd') will result in no data loss. It is very easy, by swapping
'b' and 'c' around, to get an entirely equivalent layout in which
not every failure of two "adjacent" devices results in data loss.

Therefore the *probability* of loss of data because of 2 member
devices failing is higher in layout 1) than in layout 2), whether
or not the drives are "adjacent".

Note that arguably layout 1) is not really RAID10, because an
important property of RAID10 is, or should be, that there are
only N/2 pairs out of N drives. It is not quite 'RAID1' if a chunk
position in a stripe can be replicated on 2 other devices, half
the replicas on one and half on the other.

That the member devices are *adjacent* is irrelevant; what
matters is the statistical chance, which is driven by the
percentage of cases where 2 failures result in data loss, which
in turn is driven by the number of paired drives.
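Peter's pairing argument can be checked mechanically. The sketch below (illustrative only; `pairings` is a hypothetical helper) takes his two 4-drive tables verbatim and groups chunks by the pair of drives that hold their two copies.

```python
from collections import defaultdict

# Peter's two 4-drive layouts, one row per chunk position, columns a..d.
LAYOUT_1 = [[0, 1, 2, 3], [3, 0, 1, 2], [4, 5, 6, 7], [7, 4, 5, 6]]
LAYOUT_2 = [[0, 1, 2, 3], [1, 0, 3, 2], [4, 5, 6, 7], [5, 4, 7, 6]]
DRIVES = "abcd"


def pairings(layout):
    """Return {(drive, drive): [chunks both drives hold]} for a 2-copy layout."""
    holders = defaultdict(list)
    for row in layout:
        for col, chunk in enumerate(row):
            holders[chunk].append(DRIVES[col])
    pairs = defaultdict(list)
    for chunk in sorted(holders):
        pairs[tuple(sorted(holders[chunk]))].append(chunk)
    return dict(pairs)


print(pairings(LAYOUT_1))
# {('a', 'b'): [0, 4], ('b', 'c'): [1, 5], ('c', 'd'): [2, 6], ('a', 'd'): [3, 7]}
print(pairings(LAYOUT_2))
# {('a', 'b'): [0, 1, 4, 5], ('c', 'd'): [2, 3, 6, 7]}
```

Layout 1) couples every drive with two others (four distinct pairs), while layout 2) yields only N/2 = 2 disjoint pairs.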

* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2013-12-27 17:32 UTC
To: Peter Grandi, Linux RAID

> <snip>
> Therefore the *probability* of loss of data because of 2 member
> devices failing is higher in layout 1) than layout 2), whether
> or not the drives are "adjacent".
>
> Note that arguably layout 1) is not really RAID10, because an
> important property of RAID10 is or should be that there are
> only N/2 pairs out of N drives. Otherwise it is not quite
> 'RAID1' if a chunk position in a stripe can be replicated on 2
> other devices, half the replicas on one and half on another.
>
> That the member devices are *adjacent* is irrelevant; what
> matters is the statistical chance, which is driven by the
> percent of cases where 2 failures result in data loss, which
> driven by the number of paired drives.

Very detailed answer, thank you Peter :)

Based on what Keld said earlier, the current scheme is no. 2
(Wikipedia's), right? Is it possible, using mdadm, to find out the
physical layout (no. 1 or no. 2) of a live RAID10 array?

As schema no. 1 leads to an increased probability of data loss, why
does the offset layout use it instead of, say, some variant of
schema no. 2?

Regards.

* Re: RAID 10 far and offset on-disk layouts
From: keld @ 2013-12-27 18:26 UTC
To: Gionatan Danti
Cc: Peter Grandi, Linux RAID

On Fri, Dec 27, 2013 at 06:32:48PM +0100, Gionatan Danti wrote:
> ><snip>
> > [ ... ]
>
> Very detailed answer, thank you Peter :)
>
> Based on what keld told before, the current scheme if n.2 (wikipedia's
> one), right? It is possible, using mdadm, understand the physical layout
> (if n.1 or n.2) of a live RAID10 array?
>
> As schema n.1 lead to increased probability of data loss, why offset
> layout use it instead of, say, some variance of schema n.2?

I am not sure about the chances of surviving more than one failing
drive with the offset layout, but my intuition tells me they are
rather bad. As it shifts the blocks one block at a time, my gut
feeling is that it really cannot survive more than one failing disk.

On the other hand, for raid10,far in the second layout (Wikipedia's,
and I am the author of that text :-) I am quite sure that the layout
is theoretically optimal, as in the luckiest case you can survive
n/2 failing drives, where n is the number of drives (integer
division)...

I designed this layout for maximum redundancy.

The main reason for choosing raid10,far is that it is faster for
single reads, reaching the speed of raid0, while for other
operations it is about the same. For degraded arrays raid10,far is
probably worse than the other raid10 types, although the IO
scheduling algorithm probably remedies some of the bad raw
performance of a degraded raid10,far. Also, the use of the faster,
outer (low-numbered) sectors of a hard drive gives raid10,far an
edge over the other raid10 types.

Best regards
Keld
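Keld's "luckiest case" claim can be checked by brute force for small arrays. This sketch assumes two copies per chunk and models each layout purely by its set of mirrored drive pairs: pairwise swap for schema 2, rotate-by-one for schema 1 and for the offset layout as described in md(4). `max_survivable_failures` is a hypothetical helper, not part of any md tool.

```python
from itertools import combinations


def max_survivable_failures(n_drives, pairs):
    """Largest number of failed drives that can still leave all data readable.

    Exhaustive search, so only practical for small arrays. A failure set is
    survivable if it does not contain both drives of any mirrored pair.
    """
    best = 0
    for k in range(1, n_drives + 1):
        for failed in combinations(range(n_drives), k):
            failed_set = set(failed)
            if not any(p <= failed_set for p in pairs):
                best = k
                break  # one survivable set of size k is enough
    return best


for n in (4, 6, 8):
    swap = [frozenset((d, d ^ 1)) for d in range(n)]          # schema 2
    rotate = [frozenset((d, (d + 1) % n)) for d in range(n)]  # schema 1 / offset
    print(n, max_survivable_failures(n, swap), max_survivable_failures(n, rotate))
# 4 2 2
# 6 3 3
# 8 4 4
```

Under this model both pairings reach n // 2 in the luckiest case (compare Peter's 'a' and 'c' example for layout 1); where they differ is in how many failure combinations are fatal.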

* Re: RAID 10 far and offset on-disk layouts
From: keld @ 2013-12-27 15:19 UTC
To: Gionatan Danti
Cc: linux-raid

On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> Hi all,
> [ ... ]
> The "on different drives" part let me wonder _how_ are chunks
> distributed. On a 4-disk array, I can imagine some different schemas:
>
> 1) A1 A2 A3 A4
>    .. .. .. ..
>    A4 A1 A2 A3
>
> 2) A1 A2 A3 A4
>    .. .. .. ..
>    A2 A1 A4 A3
>
> The first schema is the one depicted by SuSe documentation [1], while
> the second is the one described by Wikipedia [2].
>
> Question 1: as the two schema have different reliability
> characteristics, which is really used?

The Wikipedia description is what you get for new arrays with newer
kernels, while the SUSE documentation describes what you get with
older kernels. The Wikipedia layout was introduced because it gives
better chances of recovery: with e.g. 4 drives, when 2 drives fail,
the chance of survival went from 1/3 to 2/3.

I would say that the SUSE description is just out of date.

Best regards
Keld
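Keld's 1/3 versus 2/3 figure follows from counting two-drive failure combinations. A minimal sketch, assuming exactly two drives fail, every pair of drives is equally likely to be the failing pair, and each layout is described only by its mirrored drive pairs; `survival_probability` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations


def survival_probability(pairs, n_drives):
    """Fraction of all 2-drive failures that leave every chunk with a copy.

    `pairs` is the set of drive pairs that hold both copies of some chunk;
    losing exactly such a pair loses data, any other pair does not.
    """
    failures = list(combinations(range(n_drives), 2))
    survived = [f for f in failures if frozenset(f) not in pairs]
    return Fraction(len(survived), len(failures))


n = 4
rotate = {frozenset((d, (d + 1) % n)) for d in range(n)}  # SUSE-style: A4 A1 A2 A3
swap = {frozenset((d, d ^ 1)) for d in range(n)}          # Wikipedia-style: A2 A1 A4 A3
print(survival_probability(rotate, n))  # 1/3
print(survival_probability(swap, n))    # 2/3
```

With 4 drives there are 6 possible failing pairs; the rotating layout has 4 fatal pairs and the pairwise-swap layout only 2.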

* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2013-12-27 15:22 UTC
To: keld
Cc: linux-raid

> On Fri, Dec 27, 2013 at 03:29:49PM +0100, Gionatan Danti wrote:
> The wikipedia description is what you get for new arrays with newer
> kernels, while the suse documentation is what you will get with older kernels.
> The wikipedia layout was made because there are better chances of recovery,
> Chances went from 1/3 to 2/3 with eg 4 drives, when 2 drives were failing.
>
> I would say that the Suse description is just not updated.
>
> Best regards
> Keld

Interesting. Two questions:
1) Starting from which kernel is the layout the one depicted by Wikipedia?
2) Is it possible, using mdadm, to check which "far" layout is in use?

Thanks.

* Re: RAID 10 far and offset on-disk layouts
From: keld @ 2013-12-27 15:49 UTC
To: Gionatan Danti
Cc: linux-raid

On Fri, Dec 27, 2013 at 04:22:55PM +0100, Gionatan Danti wrote:
> > [ ... ]
>
> Interesting. Two question:
> 1) from which kernel the layout is the one depicted by Wikipedia?
> 2) it is possible, using mdadm, check what "far" layout is in use?

I cannot answer that. Neil Brown should know.

Best regards
Keld

* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2014-01-09 8:03 UTC
To: linux-raid
Cc: keld, neilb

>>
>> Interesting. Two question:
>> 1) from which kernel the layout is the one depicted by Wikipedia?
>> 2) it is possible, using mdadm, check what "far" layout is in use?
>
> I cannot answer that. Neil Brown should know.
>
> Best regards
> Keld

Hi all,
does anyone have an update on these two questions?

I was thinking of using the kernel block trace facility to track disk
accesses and infer the on-disk data structure, but I haven't tried it
yet.

On the other hand, I looked carefully at the mdadm output without
finding anything related to physical block placement.

Any new advice in that regard?
Thanks.

* Re: RAID 10 far and offset on-disk layouts
From: NeilBrown @ 2014-01-12 23:20 UTC
To: Gionatan Danti
Cc: linux-raid, keld

On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> >>
> >> Interesting. Two question:
> >> 1) from which kernel the layout is the one depicted by Wikipedia?

Exactly which depiction on Wikipedia are you referring to? A link to the
image might help.

> >> 2) it is possible, using mdadm, check what "far" layout is in use?

  mdadm --detail /dev/mdWHATEVER | grep Layout

> >
> > I cannot answer that. Neil Brown should know.
> >
> > Best regards
> > Keld
>
> Hi all,
> anyone with an update on these two questions?
>
> I was thinking to use the kernel block trace facility to track disk
> access and infer the on-disk data structure, but I haven't tried for now.
>
> On the other hand, I carefully looked at mdadm output, without finding
> anything related to physical block placing.

Look for "Layout".

NeilBrown

>
> Any new advices on that regard?
> Thanks.

* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2014-01-13 8:52 UTC
To: NeilBrown
Cc: linux-raid, keld, Gionatan Danti

Hi Neil,
let me recap from a previous message:

>FAR LAYOUT
>[ ... ]
>The "on different drives" part let me wonder _how_ are chunks
>distributed. On a 4-disk array, I can imagine some different schemas:
>
>1) A1 A2 A3 A4
>   .. .. .. ..
>   A4 A1 A2 A3
>
>2) A1 A2 A3 A4
>   .. .. .. ..
>   A2 A1 A4 A3
>
>The first schema is the one depicted by SuSe documentation [1], while
>the second is the one described by Wikipedia [2].
>
>Question 1: as the two schema have different reliability
>characteristics, which is really used?

SUSE entry [1]:
https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk

Wikipedia entry [2]:
http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how the
far layout is depicted)

Keld kindly told me that the SUSE documentation is simply out of date, as
it depicts a situation that changed with newer kernels. So my two
questions:
1) Starting from which kernel is the layout the one depicted by Wikipedia?
2) Is it possible, using mdadm, to check which "far" layout is in use?

From what I can see, "mdadm --detail /dev/mdWHATEVER | grep Layout" tells
me whether the far, near or offset layout is in use, but not the physical
on-disk chunk organization (e.g. far "type" 1 or 2).

Anyway, the thread started because I wondered why the OFFSET layout
couples each disk to two other disks. Let me quote again:

>OFFSET LAYOUT
>[ ... ]
>This means a schema like this:
>
>3) A1 A2 A3 A4
>   A4 A1 A2 A3
>   .. .. .. ..
>
>However, this is susceptible to any consecutive two-disk failures. A
>schema like
>
>4) A1 A2 A3 A4
>   A2 A1 A4 A3
>
>would not suffer from this problem (eg: disk 2 & 3 can fail and the
>array is still working).
>
>Question 2: apart from simplicity, why the offset layout use the schema
>as n.3? I miss something?

Full thread link: http://marc.info/?t=138815504400002&r=1&w=2

Sorry for the long email, I am simply trying to learn something :)
Thank you very much.

On 01/13/2014 12:20 AM, NeilBrown wrote:
> [ ... ]

* Re: RAID 10 far and offset on-disk layouts
From: NeilBrown @ 2014-01-13 9:45 UTC
To: Gionatan Danti
Cc: linux-raid, keld

On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> Hi Neil,
> let me recap from a previous message:
> [ ... ]
> SuSe entry:
> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
>
> Wikipedia entry:
> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
> far layout is depicted)
>
> Keld kindly told me that the SuSe is simply not updated, as it depict a
> situation changed with newer kernels. So my two questions:

I cannot see an important difference between the two pages you reference.
Both appear to be correct.

> 1) from which kernel the layout is the one depicted by Wikipedia?

These are both valid for any kernel since 2.6.18 with mdadm 2.5 or later.

> 2) it is possible, using mdadm, check what "far" layout is in use?

I think I know what you are talking about now.
The md driver in the kernel supports two sorts of 'far' or 'offset' layouts
for arrays where the number of devices is not an integer multiple of the
number of copies. This has been supported in Linux since v3.9, but is not
yet supported by mdadm.

>
> From what I can see, a "mdadm --detail /dev/mdWHATEVER | grep Layout"
> tell me if using far vs near vs offset layout, but not the physical
> on-disk chunks organization (eg: far "type" 1 or 2).

This is because mdadm does not yet create or report on the new type.
When it does, the above command will be the correct command to find out
which layout is in use (but I don't yet know what the output will say
exactly).

NeilBrown

> [ ... ]
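For readers who want to see which variant an existing array uses, the raw RAID10 layout number can be split into its fields. This sketch assumes the bit assignment described in the comments of the kernel's drivers/md/raid10.c (low byte near copies, next byte far copies, bit 16 the offset flag, higher bits selecting a far-set variant, presumably the newer layouts Neil mentions for v3.9 and later); verify it against your kernel source before relying on it. Where the raw number comes from (for example the md sysfs `layout` attribute) is also an assumption, and `decode_raid10_layout` is a hypothetical helper.

```python
def decode_raid10_layout(layout):
    """Split a raw md RAID10 layout number into its fields.

    Bit assignment assumed from the comments in drivers/md/raid10.c:
    bits 0-7 near copies, bits 8-15 far copies, bit 16 offset flag,
    bits 17 and up select the far-set variant (0 = original layout).
    """
    return {
        "near_copies": layout & 0xFF,
        "far_copies": (layout >> 8) & 0xFF,
        "offset": bool(layout & (1 << 16)),
        "far_set_variant": layout >> 17,
    }


# Example raw values; the mdadm-style names are assumptions, not verified output.
for raw in (0x102, 0x201, 0x10201, 0x20201):  # n2, f2, o2, f2 with far sets
    print(hex(raw), decode_raid10_layout(raw))
```

A non-zero `far_set_variant` would indicate one of the newer layouts rather than the original f2/o2 mapping, under the assumptions above.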

* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2014-01-13 10:15 UTC
To: NeilBrown
Cc: linux-raid, keld, Gionatan Danti

On 01/13/2014 10:45 AM, NeilBrown wrote:
> On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> [ ... ]
> I cannot see an important difference between the two pages you reference.
> Both appear to be correct.

Mmm... they seem different to me.

SUSE FAR layout:

  sda1 sdb1 sdc1 sde1
    0    1    2    3
    4    5    6    7
    .    .    .
    3    0    1    2
    7    4    5    6

Notice how (for example) sdb1 is coupled both to sda1 (0,4) and to
sdc1 (1,5). If sdb1 fails, any sda1 or sdc1 failure leads to data loss.

Now, Wikipedia FAR layout:

  4 drives (sda1, sdb1, sdc1, sdd1)
  --------------------
   A1   A2   A3   A4
   A5   A6   A7   A8
   A9  A10  A11  A12
   ..   ..   ..   ..
   A2   A1   A4   A3
   A6   A5   A8   A7
  A10   A9  A12  A11
   ..   ..   ..   ..

Notice how a single disk (e.g. sdb1) is now coupled to only one
_single_ other disk (e.g. sda1). In this case, if sdb1 fails, you
would also have to lose sda1 to lose data. Losing sdc1 or sdd1 will
_not_ lead to data loss.

Am I wrong?

Regards.
* Re: RAID 10 far and offset on-disk layouts
From: NeilBrown @ 2014-01-13 22:27 UTC
To: Gionatan Danti; +Cc: linux-raid, keld

On Mon, 13 Jan 2014 11:15:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> On 01/13/2014 10:45 AM, NeilBrown wrote:
> > On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> >
> >> Hi Neil,
> >> let me recap from a previous message:
> >>
> >> >FAR LAYOUT
> >> >md(4) states:
> >> >"The first copy of all data blocks will be striped across the early part
> >> >of all drives in RAID0 fashion, and then the next copy of all blocks
> >> >will be striped across a later section of all drives, always ensuring
> >> >that all copies of any given block are on different drives"
> >> >
> >> >The "on different drives" part let me wonder _how_ are chunks
> >> >distributed. On a 4-disk array, I can imagine some different schemas:
> >> >
> >> >1) A1 A2 A3 A4
> >> >   .. .. .. ..
> >> >   A4 A1 A2 A3
> >> >
> >> >2) A1 A2 A3 A4
> >> >   .. .. .. ..
> >> >   A2 A1 A4 A3
> >> >
> >> >The first schema is the one depicted by SuSe documentation [1], while
> >> >the second is the one described by Wikipedia [2].
> >> >
> >> >Question 1: as the two schema have different reliability
> >> >characteristics, which is really used?
> >>
> >> SuSe entry:
> >> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> >>
> >> Wikipedia entry:
> >> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
> >> far layout is depicted)
> >>
> >> Keld kindly told me that the SuSe is simply not updated, as it depict a
> >> situation changed with newer kernels. So my two questions:
> >
> > I cannot see an important difference between the two pages you reference.
> > Both appear to be correct.
>
> Mmm... they seem different to me.
>
> SeSe FAR Layout:
>
> sda1 sdb1 sdc1 sde1
>  0    1    2    3
>  4    5    6    7
>  .    .    .
>  3    0    1    2
>  7    4    5    6
>
> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
>
> Now, Wikipedia FAR Layout:
>
> 4 drives (sda1, sdb1, sdc1, sdd1)
> --------------------
> A1   A2   A3   A4
> A5   A6   A7   A8
> A9   A10  A11  A12
> ..   ..   ..   ..
> A2   A1   A4   A3
> A6   A5   A8   A7
> A10  A9   A12  A11
> ..   ..   ..   ..
>
> Notice now how a single disk (eg: sdb1) is coupled to only another
> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
>

Thanks for being explicit - it is much easier to answer explicit questions :-)

Yes, they are different. So the wikipedia article is wrong, or at least
misleading. That is not what the "f2" layout looks like.

The md driver does support that layout. I don't know yet what mdadm will
call it, but it won't be called "f2".

So this change:

http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733

was wrong.

NeilBrown
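The f2 placement Neil confirms here (the SUSE table) can be reproduced with a little arithmetic. This Python sketch is an illustrative model, not md's real address translation, and rests on these assumptions: chunks are numbered from 0, each device is split into two equal halves of `half_rows` chunk rows, and the second copy of every chunk sits one device to the right in the second half. The names `far_f2` and `render_far` are hypothetical; the kernel also has to deal with chunk size, near copies and other layout options, none of which are modelled.

```python
"""Two-copy RAID10 "far" (f2) placement, as drawn in the SUSE table.

Illustrative model only (not md source code). Assumptions: chunks are numbered
from 0, each device is split into two equal halves of `half_rows` chunk rows,
and the second copy of each chunk sits one device to the right in the second half.
"""
from typing import List, Optional, Tuple


def far_f2(chunk: int, n: int, half_rows: int) -> List[Tuple[int, int]]:
    """Return [(device, row), (device, row)] for both copies of `chunk`."""
    row, dev = divmod(chunk, n)
    if row >= half_rows:
        raise ValueError("chunk does not fit in the first half of the devices")
    return [(dev, row), ((dev + 1) % n, half_rows + row)]


def render_far(n: int, half_rows: int) -> List[List[Optional[int]]]:
    """Fill every row of every device with the chunk number stored there."""
    grid: List[List[Optional[int]]] = [[None] * n for _ in range(2 * half_rows)]
    for chunk in range(n * half_rows):
        for dev, row in far_f2(chunk, n, half_rows):
            grid[row][dev] = chunk
    return grid


if __name__ == "__main__":
    print("sda1 sdb1 sdc1 sde1")
    for row in render_far(4, 2):
        print("  ".join(f"{c:>3}" for c in row))
```

`render_far(4, 2)` yields the rows `[0, 1, 2, 3]`, `[4, 5, 6, 7]`, `[3, 0, 1, 2]` and `[7, 4, 5, 6]`, i.e. the SUSE table in device order sda1, sdb1, sdc1, sde1. Replacing `(dev + 1) % n` with a pairwise swap would give the Wikipedia drawing instead, which Neil says is not what "f2" produces.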
* Re: RAID 10 far and offset on-disk layouts
From: keld @ 2014-01-13 23:38 UTC
To: NeilBrown; +Cc: Gionatan Danti, linux-raid

On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
> On Mon, 13 Jan 2014 11:15:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
>
> > On 01/13/2014 10:45 AM, NeilBrown wrote:
> > > On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> > >
> > >> Hi Neil,
> > >> let me recap from a previous message:
> > >>
> > >> >FAR LAYOUT
> > >> >md(4) states:
> > >> >"The first copy of all data blocks will be striped across the early part
> > >> >of all drives in RAID0 fashion, and then the next copy of all blocks
> > >> >will be striped across a later section of all drives, always ensuring
> > >> >that all copies of any given block are on different drives"
> > >> >
> > >> >The "on different drives" part let me wonder _how_ are chunks
> > >> >distributed. On a 4-disk array, I can imagine some different schemas:
> > >> >
> > >> >1) A1 A2 A3 A4
> > >> >   .. .. .. ..
> > >> >   A4 A1 A2 A3
> > >> >
> > >> >2) A1 A2 A3 A4
> > >> >   .. .. .. ..
> > >> >   A2 A1 A4 A3
> > >> >
> > >> >The first schema is the one depicted by SuSe documentation [1], while
> > >> >the second is the one described by Wikipedia [2].
> > >> >
> > >> >Question 1: as the two schema have different reliability
> > >> >characteristics, which is really used?
> > >>
> > >> SuSe entry:
> > >> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> > >>
> > >> Wikipedia entry:
> > >> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how
> > >> far layout is depicted)
> > >>
> > >> Keld kindly told me that the SuSe is simply not updated, as it depict a
> > >> situation changed with newer kernels. So my two questions:
> > >
> > > I cannot see an important difference between the two pages you reference.
> > > Both appear to be correct.
> >
> > Mmm... they seem different to me.
> >
> > SeSe FAR Layout:
> >
> > sda1 sdb1 sdc1 sde1
> >  0    1    2    3
> >  4    5    6    7
> >  .    .    .
> >  3    0    1    2
> >  7    4    5    6
> >
> > Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
> > sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
> >
> > Now, Wikipedia FAR Layout:
> >
> > 4 drives (sda1, sdb1, sdc1, sdd1)
> > --------------------
> > A1   A2   A3   A4
> > A5   A6   A7   A8
> > A9   A10  A11  A12
> > ..   ..   ..   ..
> > A2   A1   A4   A3
> > A6   A5   A8   A7
> > A10  A9   A12  A11
> > ..   ..   ..   ..
> >
> > Notice now how a single disk (eg: sdb1) is coupled to only another
> > _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
> > sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
> >
>
> Thanks for being explicit - it is much easier to answer explicit questions :-)
>
> Yes, they are different. So the wikipedia article is wrong, or at least
> misleading. That is not what the "f2" layout looks like.
>
> The md driver does support that layout. I don't know yet what mdadm will
> call it, but it won't be called "f2".
>
> So this change:
>
> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>
> was wrong.

Well, it was me doing the wikipedia edit. The edit was done based on
information from Neil that this was actually the layout. Then later we
found out that it really was not, but it should be; and then Neil
implemented the better layout. Maybe it is not called "f2"; I look
forward to being informed of what the actual name will be.

I think the name should be "f2" as it is a "far" layout, with 2 copies,
and it really should be the default for "far" with 2 copies, as the
redundancy is much better than the old layout. Keeping the name would
mean that we would not need to make and spread documentation on this,
so that people following existing documentation would automatically get
the better implementation. There is no need for new raid instances of
"far" to get the old layout, except for backwards compatibility.

Best regards
keld
* Re: RAID 10 far and offset on-disk layouts
From: Stan Hoeppner @ 2014-01-14 0:46 UTC
To: keld, NeilBrown; +Cc: Gionatan Danti, linux-raid

On 1/13/2014 5:38 PM, keld@keldix.com wrote:
> On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
...
>> So this change:
>>
>> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>>
>> was wrong.
>
> Well, it was me doing the wikipedia edit. The edit was done based on information from Neil that this was actually
> the layout. Then later we found out that it really was not, but it should be; and then Neil implemented
> the better layout. Maybe it is not called "f2", I look forward to be informed what the actual name
> will be.
>
> I think the name should be "f2" as it is a "far" layout, with 2 copies, and it really should be
> the default for "far" with 2 copies, as the redundancy is much better than the old layout.
> Keeping the name would mean that we would not need to make and spread documentation on this,
> so that people following existing documentation would automatically get the better implementation.
> There is no need that new raid instances of "far" should get the old layout, except for
> backwards compatibility.

The problem here is that you're creating the Wikipedia page as if it
*is* source reference material. I.e. you're including "original work",
your original work. This is a violation of the Wikipedia rules of
editing. And this kind of situation is exactly why those rules exist.

The layout tables you are including need to exist in a free-to-duplicate
reference document, and should be copied verbatim from said document.
They should not be created from scratch simply based on information in
an email exchange on a mailing list, just as web forums are not
considered a valid reference source.

Therefore, if such layout tables do not exist in official Linux
documentation they should not be included in Wikipedia. If they do
exist, the information should be copied verbatim, and the source
document referenced. There are no such references in the article.

Wikipedia is an encyclopedia, not a reference work. All information
needs to be sourced from reference works. If you want your original
work to be included in Wikipedia, you need to create your own
documentation, add it to the Linux Documentation Project, and have it
peer reviewed. Then you can include it, and cite that source.

--
Stan
* Re: RAID 10 far and offset on-disk layouts
From: keld @ 2014-01-14 9:38 UTC
To: Stan Hoeppner; +Cc: NeilBrown, Gionatan Danti, linux-raid

On Mon, Jan 13, 2014 at 06:46:15PM -0600, Stan Hoeppner wrote:
> On 1/13/2014 5:38 PM, keld@keldix.com wrote:
> > On Tue, Jan 14, 2014 at 09:27:51AM +1100, NeilBrown wrote:
> ...
> >> So this change:
> >>
> >> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
> >>
> >> was wrong.
> >
> > Well, it was me doing the wikipedia edit. The edit was done based on information from Neil that this was actually
> > the layout. Then later we found out that it really was not, but it should be; and then Neil implemented
> > the better layout. Maybe it is not called "f2", I look forward to be informed what the actual name
> > will be.
> >
> > I think the name should be "f2" as it is a "far" layout, with 2 copies, and it really should be
> > the default for "far" with 2 copies, as the redundancy is much better than the old layout.
> > Keeping the name would mean that we would not need to make and spread documentation on this,
> > so that people following existing documentation would automatically get the better implementation.
> > There is no need that new raid instances of "far" should get the old layout, except for
> > backwards compatibility.
>
> The problem here is that you're creating the Wikipedia page as if it
> *is* source reference material. I.e. you're including "original work,
> your original work. This is a violation of the Wikipedia rules of
> editing. And this kind of situation is exactly why those rules exist.

I am only referencing material available in other places.

> The layout tables you are including need to exist in a free to duplicate
> reference document, and should be copied verbatim from said document.
> They should not be created from scratch simply based on information in
> an email exchange on a mailing list, just as web forums are not
> considered a valid reference source.

I only described things that were already described.

best regards
keld
* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2014-01-14 9:06 UTC
To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti

On 01/13/2014 11:27 PM, NeilBrown wrote:
>>
>> Mmm... they seem different to me.
>>
>> SeSe FAR Layout:
>>
>> sda1 sdb1 sdc1 sde1
>>  0    1    2    3
>>  4    5    6    7
>>  .    .    .
>>  3    0    1    2
>>  7    4    5    6
>>
>> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
>> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
>>
>> Now, Wikipedia FAR Layout:
>>
>> 4 drives (sda1, sdb1, sdc1, sdd1)
>> --------------------
>> A1   A2   A3   A4
>> A5   A6   A7   A8
>> A9   A10  A11  A12
>> ..   ..   ..   ..
>> A2   A1   A4   A3
>> A6   A5   A8   A7
>> A10  A9   A12  A11
>> ..   ..   ..   ..
>>
>> Notice now how a single disk (eg: sdb1) is coupled to only another
>> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
>> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
>>
>
> Thanks for being explicit - it is much easier to answer explicit questions :-)
>
> Yes, they are different. So the wikipedia article is wrong, or at least
> misleading. That is not what the "f2" layout looks like.
>
> The md driver does support that layout. I don't know yet what mdadm will
> call it, but it won't be called "f2".
>
> So this change:
>
> http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
>
> was wrong.
>
> NeilBrown
>

Ok, so let's recap:

1) FAR layout is the one depicted by SuSe documentation, while the
Wikipedia entry is wrong

2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
know what the user-space mdadm tool will call it (maybe it is not
implemented yet?)

3) Are there any reasons why the FAR and OFFSET layouts scramble data in
this manner, coupling each disk with two other disks? Was it done for
simplicity, or am I missing something?

4) Can you confirm that currently there is no way at all to create a FAR
layout like the one depicted by Wikipedia? What about the OFFSET layout?

Thank you very much.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
From: NeilBrown @ 2014-01-14 9:16 UTC
To: Gionatan Danti; +Cc: linux-raid, keld

On Tue, 14 Jan 2014 10:06:13 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> On 01/13/2014 11:27 PM, NeilBrown wrote:
> >>
> >> Mmm... they seem different to me.
> >>
> >> SeSe FAR Layout:
> >>
> >> sda1 sdb1 sdc1 sde1
> >>  0    1    2    3
> >>  4    5    6    7
> >>  .    .    .
> >>  3    0    1    2
> >>  7    4    5    6
> >>
> >> Notice how (for example) sdb1 is coupled both to sda1 (0,4) and
> >> sdc1(1,5). If sdb1 fails, any sda1 or sdc1 failure lead to data loss.
> >>
> >> Now, Wikipedia FAR Layout:
> >>
> >> 4 drives (sda1, sdb1, sdc1, sdd1)
> >> --------------------
> >> A1   A2   A3   A4
> >> A5   A6   A7   A8
> >> A9   A10  A11  A12
> >> ..   ..   ..   ..
> >> A2   A1   A4   A3
> >> A6   A5   A8   A7
> >> A10  A9   A12  A11
> >> ..   ..   ..   ..
> >>
> >> Notice now how a single disk (eg: sdb1) is coupled to only another
> >> _single_ disk (eg: sda1). In this case, if sdb1 fails, you had to lose
> >> sda1 to have a data loss. Losing sdc1 or sdd1 will _not_ lead to data loss.
> >>
> >
> > Thanks for being explicit - it is much easier to answer explicit questions :-)
> >
> > Yes, they are different. So the wikipedia article is wrong, or at least
> > misleading. That is not what the "f2" layout looks like.
> >
> > The md driver does support that layout. I don't know yet what mdadm will
> > call it, but it won't be called "f2".
> >
> > So this change:
> >
> > http://en.wikipedia.org/w/index.php?title=Non-standard_RAID_levels&diff=501908270&oldid=501604733
> >
> > was wrong.
> >
> > NeilBrown
> >
>
> Ok, so let recap:
>
> 1) FAR layout is the one depicted by SuSe documentation, while the
> Wikipedia entry is wrong

Yes.

>
> 2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
> know how the user-space mdadm tool call it (maybe it is not implemented
> yet?)

Yes. Not implemented yet.

>
> 3) There are any reasons why FAR and OFFSET layout scramble data in this
> manner, coupling any disk with two more disks? It was done for
> simplicity, or I am missing something?

It just seemed the easiest thing to do at the time.

>
> 4) you confirm that currently we can _not_ create a FAR layout as the
> one depicted by wikipedia by no means? What about OFFSET layout?

You certainly can create the FAR layout depicted on wikipedia, e.g. by
binary-editing the metadata on some devices, or writing some code which does
that for you. It requires flipping one bit in the metadata and updating the
checksum. You can probably even do it by writing something appropriate into
some sysfs files.
But mdadm cannot do it yet.
Ditto for the new OFFSET layout.
(The old offset layout can be created with "--layout=o2").

NeilBrown
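For comparison, the old offset layout that Neil says `--layout=o2` creates is described in md(4) as each stripe being duplicated, with the copies offset by one device (schema 3) in the first message). The Python sketch below models only that description; it is not md source code, it does not cover the new paired OFFSET layout that mdadm cannot create yet, and the helper names `offset_o2`, `render_offset` and `fatal_pairs_o2` are hypothetical.

```python
"""Sketch of the classic two-copy RAID10 "offset" (o2) placement.

Model of the md(4) description only (not md source code): every stripe is
written twice on consecutive rows, the second row rotated one device right.
"""
from typing import List, Optional, Tuple


def offset_o2(chunk: int, n: int) -> List[Tuple[int, int]]:
    """Return [(device, row), (device, row)] for both copies of `chunk`."""
    stripe, dev = divmod(chunk, n)
    return [(dev, 2 * stripe), ((dev + 1) % n, 2 * stripe + 1)]


def render_offset(n: int, chunks: int) -> List[List[Optional[str]]]:
    """Lay out chunks A1, A2, ... row by row, like the tables in this thread."""
    rows = 2 * -(-chunks // n)  # two rows per (possibly partial) stripe
    grid: List[List[Optional[str]]] = [[None] * n for _ in range(rows)]
    for c in range(chunks):
        for dev, row in offset_o2(c, n):
            grid[row][dev] = f"A{c + 1}"
    return grid


def fatal_pairs_o2(n: int) -> List[Tuple[int, ...]]:
    # Device pairs holding both copies of some chunk: always neighbouring devices.
    pairs = {tuple(sorted({dev for dev, _ in offset_o2(c, n)})) for c in range(n)}
    return sorted(pairs)


if __name__ == "__main__":
    for row in render_offset(4, 8):
        print(" ".join(f"{label or '..':>3}" for label in row))
    print(fatal_pairs_o2(4))
```

With four devices the rendered rows are `A1 A2 A3 A4`, `A4 A1 A2 A3`, `A5 A6 A7 A8`, `A8 A5 A6 A7`, matching schema 3), and `fatal_pairs_o2(4)` lists exactly the neighbouring device pairs (wrapping from the last device back to the first). That is the "consecutive two-disk failure" exposure Gionatan asked about in his Question 2.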
* Re: RAID 10 far and offset on-disk layouts
From: Gionatan Danti @ 2014-01-14 9:27 UTC
To: NeilBrown; +Cc: linux-raid, keld, Gionatan Danti

>> Ok, so let recap:
>>
>> 1) FAR layout is the one depicted by SuSe documentation, while the
>> Wikipedia entry is wrong
>
> Yes.
>
>>
>> 2) MD _can_ produce a FAR layout as depicted by Wikipedia, but we don't
>> know how the user-space mdadm tool call it (maybe it is not implemented
>> yet?)
>
> Yes. Not implemented yet.
>
>>
>> 3) There are any reasons why FAR and OFFSET layout scramble data in this
>> manner, coupling any disk with two more disks? It was done for
>> simplicity, or I am missing something?
>
> It just seemed the easiest thing to do at the time.
>
>>
>> 4) you confirm that currently we can _not_ create a FAR layout as the
>> one depicted by wikipedia by no means? What about OFFSET layout?
>
> You certainly can created the FAR layout depicted on wikipedia, e.g. by
> binary-editing the metadata on some devices, or writing some code which does
> that for you. It requires flipping one bit in the metadata and updating the
> checksum. You can probably even to it by writing something appropriate into
> some sysfs files.
> But mdadm cannot do it yet.
> Ditto for the new OFFSET layout.
> (The old offset layout can be created with "--layout=o2").
>
> NeilBrown
>

All clear now :)
Thank you very much Neil.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@assyoma.it - info@assyoma.it
GPG public key ID: FF5F32A8
* Re: RAID 10 far and offset on-disk layouts
From: keld @ 2014-01-14 10:06 UTC
To: NeilBrown; +Cc: Gionatan Danti, linux-raid

On Mon, Jan 13, 2014 at 08:45:34PM +1100, NeilBrown wrote:
> On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
>
> I think I know what you are talking about now. The md driver in the kernel
> supports two sorts of 'far' or 'offset' layouts for arrays where the number
> of devices is not an integer multiple of the number of copies.
> This has been supported in Linux since v3.9. but is not yet supported by
> mdadm.

Hmm, we also discussed the new layouts for when the number of drives is a
whole multiple of the number of copies. That layout should follow the same
principles.

How do I generate the new format on kernel 3.9?

best regards
keld
end of thread

Thread overview: 22+ messages
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
2013-12-27 15:16 ` Gionatan Danti
2013-12-27 17:16 ` Peter Grandi
2013-12-27 17:32 ` Gionatan Danti
2013-12-27 18:26 ` keld
2013-12-27 15:19 ` keld
2013-12-27 15:22 ` Gionatan Danti
2013-12-27 15:49 ` keld
2014-01-09 8:03 ` Gionatan Danti
2014-01-12 23:20 ` NeilBrown
2014-01-13 8:52 ` Gionatan Danti
2014-01-13 9:45 ` NeilBrown
2014-01-13 10:15 ` Gionatan Danti
2014-01-13 22:27 ` NeilBrown
2014-01-13 23:38 ` keld
2014-01-14 0:46 ` Stan Hoeppner
2014-01-14 9:38 ` keld
2014-01-14 9:06 ` Gionatan Danti
2014-01-14 9:16 ` NeilBrown
2014-01-14 9:27 ` Gionatan Danti
2014-01-14 10:06 ` keld