From: Dan Williams <dan.j.williams@intel.com>
To: Doug Ledford <dledford@redhat.com>
Cc: "Labun, Marcin" <Marcin.Labun@intel.com>,
	Neil Brown <neilb@suse.de>,
	"Hawrylewicz Czarnowski,
	Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com>,
	"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	Bill Davidsen <davidsen@tmr.com>
Subject: Re: Auto Rebuild on hot-plug
Date: Mon, 29 Mar 2010 17:46:15 -0700
Message-ID: <e9c3a7c21003291746o49191295r876c7fef50fe63a5@mail.gmail.com>
In-Reply-To: <4BB13830.8070709@redhat.com>

On Mon, Mar 29, 2010 at 4:30 PM, Doug Ledford <dledford@redhat.com> wrote:
> On 03/29/2010 05:36 PM, Dan Williams wrote:
>> I agree once you have a DOMAIN you implicitly have a spare-group.  So
>> DOMAIN would supersede the existing spare-group identifier in the
>> ARRAY line and cause mdadm --monitor to auto-migrate spares between
>> 0.90 and 1.x metadata arrays in the same DOMAIN.  For the imsm case
>> the expectation is that spares migrate between containers regardless
>> of the DOMAIN line as that is what the implementation expects.
>
> Give me some clearer explanation here because I think you and I are
> using terms differently and so I want to make sure I have things right.
>  My understanding of imsm raid containers is that all the drives that
> belong to a single option rom, as long as they aren't listed as jbod in
> the option rom setup, belong to the same container.

I think the disconnect in the imsm case is that the container to
DOMAIN relationship is N:1, not 1:1.  The mdadm notion of an
imsm-container correlates directly with a 'family' in the imsm
metadata.  The rules of a family are:

1/ Every family member must be a member of every defined volume.  For
example with a 4-drive container you could not simultaneously have a
4-drive (sd[abcd]) raid10 and a 2-drive (sd[ab]) raid1 volume because
every volume would need to incorporate all 4 disks.  Also, per the rules
if you create two raid1 volumes sd[ab] and sd[cd] those would show up
as two containers.

2/ A spare drive does not belong to any particular family
('family_number' is undefined for a spare).  The Windows driver will
automatically use a spare to fix any degraded family in the system.
In the mdadm/mdmon case since we break families into containers we
need a mechanism to migrate spare devices between containers because
they are equally valid hot spare candidates for any imsm container in
the system.
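
To make rule 1 concrete, here is a minimal sketch in Python.  It is not
mdadm code: the function names and the dict-of-volumes layout are
hypothetical, and it only models the grouping described above (one
family, and hence one container, per distinct set of member disks).

```python
from collections import defaultdict


def group_into_families(volumes):
    """Group volumes by their exact set of member disks.

    volumes: dict mapping a volume name to an iterable of disk names.
    Returns a dict mapping frozenset(disks) -> sorted list of volume names.
    Each key models one family, i.e. one mdadm container.
    """
    families = defaultdict(list)
    for name, disks in volumes.items():
        families[frozenset(disks)].append(name)
    return {members: sorted(names) for members, names in families.items()}


def violates_rule_1(volumes):
    """Return True if two different families share a disk.

    A shared disk would be a family member that is not part of every
    volume defined on that family, which rule 1 forbids.
    """
    families = list(group_into_families(volumes))
    for i, a in enumerate(families):
        for b in families[i + 1:]:
            if a & b:
                return True
    return False


# The two examples from rule 1 (hypothetical volume names).
mixed = {"r10": ["sda", "sdb", "sdc", "sdd"], "r1": ["sda", "sdb"]}
split = {"r1a": ["sda", "sdb"], "r1b": ["sdc", "sdd"]}
```

With these inputs, violates_rule_1(mixed) is True, while split is valid
and groups into two families, matching the "two containers" case.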

> That container is
> then split up into various chunks and that's where you get logical
> volumes.  I know there are odd rules for logical volumes inside a
> container, but I think those are mostly irrelevant to this discussion.
> So, when I think of a domain for imsm, I think of all the sata ports or
> sas ports under a single option rom.  From that perspective, spares can
> *not* move between domains as a spare on a sas port can't be added to a
> sata option rom container array.  I was under the impression that if you
> had, say, a 6 port sata controller option rom, you couldn't have the
> first three ports be one container and the next three ports be another
> container.  Is that impression wrong?

Yes, that impression is wrong: we can have exactly this situation.

This raises the question, why not change the definition of an imsm
container to incorporate anything with imsm metadata?  That definitely
would make spare management easier.  Mapping one container to one
family was an early design decision, and it had the nice side effect
that it lined up naturally with the failure and rebuild boundaries of a
family.  I could give it more thought, but right now I believe there is
a lot riding on this 1:1 container-to-family relationship, and I would
rather not go there.

> However, that just means (to me anyway) that I would treat all of the
> sata ports as one domain with multiple container arrays in that domain
> just like we can have multiple native md arrays in a domain.  If a disk
> dies and we hot plug a new one, then mdadm would look for the degraded
> container present in the domain and add the spare to it.  It would then
> be up to mdmon to determine what logical volumes are currently degraded
> and slice up the new drive to work as spares for those degraded logical
> volumes.  Does this sound correct to you, and can mdmon do that already
> or will this need to be added?

This sounds correct, and no, mdmon cannot do this today.  The
discussions we (Marcin and I) have had with Neil offlist were about
extending mdadm --monitor to handle spare migration for containers,
since it already handles spare migration for native md arrays.  It will
need some mdmon coordination since mdmon is the only agent that can
disambiguate a spare from a stale device at any given point in time.
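
As a rough illustration of what container-aware spare migration could
look like, here is a sketch in Python.  Nothing in it is existing mdadm
or mdmon behaviour: the Container type, migrate_spares() and the
is_true_spare callback are all hypothetical, with the callback standing
in for the mdmon coordination mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    name: str
    domain: str
    degraded: bool = False
    spares: list = field(default_factory=list)


def migrate_spares(containers, is_true_spare):
    """Move one spare to each degraded container that has none.

    Donors must be healthy containers in the same domain.
    is_true_spare(disk) is a hypothetical hook representing mdmon's
    answer to "is this a real spare rather than a stale device?".
    Returns a list of (disk, donor_name, target_name) moves.
    """
    moves = []
    for target in containers:
        if not target.degraded or target.spares:
            continue
        for donor in containers:
            if donor is target or donor.degraded:
                continue
            if donor.domain != target.domain:
                continue  # spares never cross a domain boundary
            candidates = [d for d in donor.spares if is_true_spare(d)]
            if candidates:
                disk = candidates[0]
                donor.spares.remove(disk)
                target.spares.append(disk)
                moves.append((disk, donor.name, target.name))
                break
    return moves


# Hypothetical setup: two containers on one controller, one on another.
pool = [
    Container("c0", "ahci", spares=["sde"]),
    Container("c1", "ahci", degraded=True),
    Container("c2", "sas", degraded=True),
]
moves = migrate_spares(pool, is_true_spare=lambda disk: True)
```

Here the spare moves from c0 to c1 because both sit in the "ahci"
domain, while c2 stays degraded since no spare exists in its domain.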

>> However this is where we get into questions of DOMAIN conflicting with
>> 'platform' expectations, under what conditions, if any, should DOMAIN
>> be allowed to conflict/override the platform constraint?  Currently
>> there is an environment variable IMSM_NO_PLATFORM, do we also need a
>> configuration op
>
> I'm not sure I would ever allow breaking valid platform limitations.  I
> think if you want to break platform limitations, then you need to use
> native md raid arrays and not imsm/ddf.  It seems to me that if you
> allow the creation of an imsm/ddf array that the BIOS can't work with
> then you've potentially opened an entire can of worms we don't want to
> open about expectations that the BIOS will be able to work with things
> but can't.  If you force native arrays as the only type that can break
> platform limitations, then you are at least perfectly clear with the
> user that the BIOS can't do what the user wants.

Agreed.

--
Dan
