linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: linux-raid@vger.kernel.org, keld@keldix.com
Subject: Re: RAID 10 far and offset on-disk layouts
Date: Mon, 13 Jan 2014 20:45:34 +1100
Message-ID: <20140113204534.737a98f6@notabene.brown>
In-Reply-To: <52D3A962.4000308@assyoma.it>

On Mon, 13 Jan 2014 09:52:50 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:

> Hi Neil,
> let me recap from a previous message:
> 
>  >FAR LAYOUT
>  >md(4) states:
>  >"The first copy of all data blocks will be striped across the early part
>  >of all drives in RAID0 fashion, and then the next copy of all blocks
>  >will be striped across a later section of all drives, always ensuring
>  >that all copies of any given block are on different drives"
>  >
>  >The "on different drives" part made me wonder _how_ chunks are
>  >distributed. On a 4-disk array, I can imagine several different schemas:
>  >
>  >1)	A1 A2 A3 A4
>  >	.. .. .. ..
>  >	A4 A1 A2 A3
>  >
>  >2)	A1 A2 A3 A4
>  >	.. .. .. ..
>  >	A2 A1 A4 A3
>  >
>  >The first schema is the one depicted by SuSe documentation [1], while
>  >the second is the one described by Wikipedia [2].
>  >
>  >Question 1: as the two schemas have different reliability
>  >characteristics, which one is actually used?
> 
> SuSe entry: 
> https://www.suse.com/documentation/sles11/stor_admin/data/raidmdadmr10cpx.html#b7cynnk
> 
> Wikipedia entry: 
> http://en.wikipedia.org/wiki/Linux_MD_RAID_10#LINUX-MD-RAID-10 (see how 
> far layout is depicted)
> 
> Keld kindly told me that the SuSe page is simply out of date, as it depicts a
> situation that changed with newer kernels. So my two questions:

I cannot see an important difference between the two pages you reference.
Both appear to be correct.

> 1) starting from which kernel is the layout the one depicted by Wikipedia?

These are both valid for any kernel since 2.6.18 with mdadm 2.5 or later.

> 2) is it possible, using mdadm, to check which "far" layout is in use?

I think I know what you are talking about now.  The md driver in the kernel
supports two sorts of 'far' or 'offset' layouts for arrays where the number
of devices is not an integer multiple of the number of copies.
This has been supported in Linux since v3.9, but is not yet supported by
mdadm.
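To make the reliability difference between schemas 1 and 2 (from the quoted message) concrete, here is a small Python sketch. It is not md's code: it models only *which* disk holds the second copy of each chunk in a 4-disk, 2-copy array. Schema 1 rotates the second copy by one disk across the whole array; schema 2 rotates it within pairs of disks. The helper names are hypothetical.

```python
from itertools import combinations


def second_copy_disk(chunk: int, set_size: int) -> int:
    """Disk holding the second copy of `chunk`; its first copy is on disk `chunk`.

    set_size == number of disks: rotate by one across the array (schema 1).
    set_size == 2: rotate within pairs of disks (schema 2).
    Assumes the number of disks is a multiple of set_size.
    """
    start = (chunk // set_size) * set_size
    return start + (chunk - start + 1) % set_size


def second_row(n_disks: int, set_size: int) -> list[str]:
    """The row of second copies, labelled A1..An like the schemas in the thread."""
    row = [""] * n_disks
    for chunk in range(n_disks):
        row[second_copy_disk(chunk, set_size)] = f"A{chunk + 1}"
    return row


def fatal_pairs(n_disks: int, set_size: int) -> list[tuple[int, int]]:
    """Two-disk failures (0-based disk numbers) that lose both copies of some chunk."""
    holders = [{c, second_copy_disk(c, set_size)} for c in range(n_disks)]
    return [pair for pair in combinations(range(n_disks), 2)
            if any(h <= set(pair) for h in holders)]


if __name__ == "__main__":
    for name, set_size in (("schema 1", 4), ("schema 2", 2)):
        print(name, " ".join(second_row(4, set_size)), fatal_pairs(4, set_size))
```

With 4 disks, schema 1 loses data in 4 of the 6 possible two-disk failures (any two neighbouring disks, including disks 0 and 3), while schema 2 loses data only in 2 of 6 (disks 0+1 or 2+3). The offset schemas 3 and 4 pair disks exactly like schemas 1 and 2, so the same counts apply to them.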

> 
> From what I can see, "mdadm --detail /dev/mdWHATEVER | grep Layout"
> tells me whether the far, near or offset layout is in use, but not the physical
> on-disk chunk organization (e.g. far "type" 1 or 2).

This is because mdadm does not yet create or report on the new type.
When it does, the above command will be the correct command to find out which
layout is in use (but I don't yet know what the output will say exactly).
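The Layout line is derived from the array's numeric RAID10 layout value. The sketch below decodes such a number roughly the way mdadm formats it, assuming this encoding: bits 0-7 hold the near-copy count, bits 8-15 the far-copy count, and bit 16 marks the far copies as "offset". Anything above bit 16 is returned unparsed, which is (as an assumption) where a newer variant would have to be flagged, and is exactly the information the Layout line cannot show yet. `describe_r10_layout` is a hypothetical helper.

```python
def describe_r10_layout(layout: int) -> str:
    """Render a RAID10 layout number roughly as mdadm's Layout line does (sketch).

    Assumed encoding: bits 0-7 near copies, bits 8-15 far copies,
    bit 16 set means the far copies are 'offset' rather than 'far'.
    """
    near = layout & 0xFF
    far = (layout >> 8) & 0xFF
    offset = bool(layout & 0x10000)
    parts = []
    if near != 1:
        parts.append(f"near={near}")
    if far != 1:
        parts.append(f"{'offset' if offset else 'far'}={far}")
    if near * far == 1:
        parts.append("NO REDUNDANCY")
    extra = layout >> 17
    if extra:
        # Higher bits are not interpreted here; a newer layout variant may use them.
        parts.append(f"(unparsed high bits: {extra:#x})")
    return ", ".join(parts)


if __name__ == "__main__":
    for value in (0x102, 0x201, 0x10201, 0x202):
        print(hex(value), describe_r10_layout(value))
```

The numeric value can typically be read from the array's sysfs directory (e.g. `/sys/block/md0/md/layout`; the device name here is illustrative).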

NeilBrown


> 
> Anyway, the thread started because I wondered why the OFFSET layout couples
> each disk to two other disks. Let me quote again:
> 
>  >OFFSET LAYOUT
>  >md(4) states:
>  >"When 'offset' replicas are chosen, the multiple copies of a given chunk
>  >are laid out on consecutive drives and at consecutive offsets.
>  >Effectively each stripe is duplicated and the copies are offset by one
>  >device."
>  >
>  >This means a schema like this:
>  >	
>  >3)	A1 A2 A3 A4
>  >	A4 A1 A2 A3
>  >	.. .. .. ..
>  >
>  >However, this is vulnerable to the failure of any two consecutive disks.
>  >A schema like
>  >
>  >4)	A1 A2 A3 A4
>  >	A2 A1 A4 A3
>  >
>  >would not suffer from this problem (e.g. disks 2 & 3 can fail and the
>  >array keeps working).
>  >
>  >Question 2: apart from simplicity, why does the offset layout use schema
>  >n.3? Am I missing something?
> 
> Full thread link: http://marc.info/?t=138815504400002&r=1&w=2
> 
> Excuse me for the long email; I am simply trying to learn something :)
> Thank you very much.
> 
> On 01/13/2014 12:20 AM, NeilBrown wrote:
> > On Thu, 09 Jan 2014 09:03:37 +0100 Gionatan Danti <g.danti@assyoma.it> wrote:
> >
> >>>>
> >>>> Interesting. Two question:
> >>>> 1) starting from which kernel is the layout the one depicted by Wikipedia?
> >
> > Exactly what depiction in Wikipedia are you referring to?  A link to the
> > image might help.
> >
> >>>> 2) is it possible, using mdadm, to check which "far" layout is in use?
> >
> > mdadm --detail /dev/mdWHATEVER | grep Layout
> >
> >
> >>>
> >>> I cannot answer that. Neil Brown should know.
> >>>
> >>> Best regards
> >>> Keld
> >>>
> >>
> >> Hi all,
> >> anyone with an update on these two questions?
> >>
> >> I was thinking of using the kernel block trace facility to track disk
> >> accesses and infer the on-disk data structure, but I haven't tried it yet.
> >>
> >> On the other hand, I carefully looked at the mdadm output, without finding
> >> anything related to physical block placement.
> >
> > Look for "Layout".
> >
> > NeilBrown
> >
> >
> >>
> >> Any new advice in that regard?
> >> Thanks.
> >>
> >
> 
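Question 2 is easier to discuss with the physical placement in view. Below is a sketch of where each logical chunk lands in the far and offset layouts, assuming the classic rotate-by-one variant (schemas 1 and 3), 2 copies and no near copies. `place`, `render` and `rows_per_copy` are hypothetical names for illustration, not md code.

```python
def place(chunk: int, n_disks: int, layout: str, rows_per_copy: int = 0) -> list[tuple[int, int]]:
    """(disk, device_row) of both copies of logical `chunk`.

    Simplified model: 2 copies, second copy one disk to the right (schemas 1 and 3).
    'far': second copy sits in the later section of each device, rows_per_copy rows down
           (rows_per_copy must exceed the number of stripes used, i.e. half the device).
    'offset': each stripe is written twice, in two consecutive rows.
    """
    stripe, col = divmod(chunk, n_disks)
    nxt = (col + 1) % n_disks
    if layout == "far":
        if rows_per_copy <= 0:
            raise ValueError("far layout needs rows_per_copy > 0")
        return [(col, stripe), (nxt, stripe + rows_per_copy)]
    if layout == "offset":
        return [(col, 2 * stripe), (nxt, 2 * stripe + 1)]
    raise ValueError(f"unknown layout: {layout}")


def render(n_disks: int, n_chunks: int, layout: str, rows_per_copy: int = 0) -> list[str]:
    """Draw the device rows in the thread's notation; logical chunk 0 is A1."""
    rows: dict[int, list[str]] = {}
    for chunk in range(n_chunks):
        for disk, row in place(chunk, n_disks, layout, rows_per_copy):
            rows.setdefault(row, [".."] * n_disks)[disk] = f"A{chunk + 1}"
    return [" ".join(rows[r]) for r in sorted(rows)]


if __name__ == "__main__":
    print("\n".join(render(4, 8, "offset")))
    print()
    print("\n".join(render(4, 8, "far", rows_per_copy=2)))
```

The offset output reproduces schema 3 (each stripe followed by a copy shifted by one device), while the far output puts all second copies after the first `rows_per_copy` rows, as md(4) describes.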




Thread overview: 22+ messages
2013-12-27 14:29 RAID 10 far and offset on-disk layouts Gionatan Danti
2013-12-27 14:46 ` Peter Grandi
2013-12-27 15:16 ` Gionatan Danti
2013-12-27 17:16   ` Peter Grandi
2013-12-27 17:32     ` Gionatan Danti
2013-12-27 18:26       ` keld
2013-12-27 15:19 ` keld
2013-12-27 15:22   ` Gionatan Danti
2013-12-27 15:49     ` keld
2014-01-09  8:03       ` Gionatan Danti
2014-01-12 23:20         ` NeilBrown
2014-01-13  8:52           ` Gionatan Danti
2014-01-13  9:45             ` NeilBrown [this message]
2014-01-13 10:15               ` Gionatan Danti
2014-01-13 22:27                 ` NeilBrown
2014-01-13 23:38                   ` keld
2014-01-14  0:46                     ` Stan Hoeppner
2014-01-14  9:38                       ` keld
2014-01-14  9:06                   ` Gionatan Danti
2014-01-14  9:16                     ` NeilBrown
2014-01-14  9:27                       ` Gionatan Danti
2014-01-14 10:06               ` keld
