Re: "failed" vs "removed" or "locked-out" state and --incremental auto-re-adding


From: Doug Ledford <dledford@redhat.com>
To: Christian Gatzemeier <c.gatzemeier@tu-bs.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: "failed" vs "removed" or "locked-out" state and --incremental auto-re-adding
Date: Mon, 26 Apr 2010 19:15:51 -0400
Message-ID: <4BD61EA7.5060609@redhat.com>
In-Reply-To: <loom.20100427T002749-963@post.gmane.org>


On 04/26/2010 06:28 PM, Christian Gatzemeier wrote:
> 
> As coming to "terms" with mdadm took me a while, I'll add my current
> "translations" of the actions to the discussion:
> 
> 1) The "failed" state is the state a member that failed or is missing gets,
> while it can stay listed in mdstat.

Yes.  In particular failed simply means that the kernel no longer
considers it a running member of the array.  However, the kernel still
holds open the reference to the device (which means anything/anyone else
is still locked out from attempting to access the device, which prevents
anything bad from happening to the data that was on it when it failed).


You can use --remove failed/detached/<devname>, they all work.  But yes,
the underlying action here is to take an already failed device and
release all references to the device from the raid stack.  In
particular, this releases the exclusive open the raid stack holds on the
device and now makes the device available for other things to
open/modify.  At this point there is no longer any guarantee that the
device will not be modified from the pristine state it was in when it
failed.
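
The difference between the two states can be sketched as a toy model.
This is not how the md driver is implemented; the names (Member, State,
held_by_md) are invented for illustration. It only shows that failing a
member keeps the exclusive hold, while removing it releases the hold.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    FAILED = "failed"
    REMOVED = "removed"


class Member:
    """Toy stand-in for one array member (hypothetical, not md code)."""

    def __init__(self, name):
        self.name = name
        self.state = State.ACTIVE
        self.held_by_md = True  # md holds the device open exclusively

    def fail(self):
        # Failed: no longer a running member, but md keeps its reference,
        # so nothing else can open the device.
        self.state = State.FAILED

    def remove(self):
        # Removed: md drops every reference; others may now open the device.
        if self.state is not State.FAILED:
            raise ValueError(f"{self.name} must be failed before removal")
        self.state = State.REMOVED
        self.held_by_md = False


sdc1 = Member("/dev/sdc1")
sdc1.fail()
print(sdc1.state.value, sdc1.held_by_md)   # failed True
sdc1.remove()
print(sdc1.state.value, sdc1.held_by_md)   # removed False
```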

> 3) A safe way to "lock-out" or "really remove" members from udev/--incremental
> assembly is not available yet AFAIK. (--zero-superblock on mirror members makes
> the md device content detectable/available directly)

This is a shortcoming of version 0.90/1.0 superblocks and raid1 arrays.
For all other superblock versions and raid types, this is not true.
The default superblock version changed from 0.90 to 1.2 as of the mdadm
3.1 series and so this won't be a problem in the future.

> IMHO the ones mentioned first could be seen as implied by those mentioned later.

No, and this is a safety feature.  We won't remove a good device in
order to prevent a typo from rendering an array dead.  Imagine that
/dev/sdd1 was already failed, and you typed mdadm /dev/md0 -r /dev/sdc1
and we just blindly failed and then removed sdc1.  If the array can
only handle one failed member (e.g., raid4 or raid5), you've just
rendered the array dead in the water.  We could ask questions I suppose,
but it's just as well to require that a drive be failed before we
remove it.
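
The typo scenario can be played through with a small toy model. The
member names, the state strings, and the MAX_FAILED value are assumptions
made for the example, and remove_blind is the hypothetical behaviour that
mdadm deliberately does not implement.

```python
# Toy model of the typo scenario; nothing here is mdadm internals.
members = {"sda1": "active", "sdb1": "active",
           "sdc1": "active", "sdd1": "failed"}
MAX_FAILED = 1  # raid4/raid5 survive exactly one missing member


def usable(states):
    """The array is usable while at most MAX_FAILED members are not active."""
    missing = sum(1 for s in states.values() if s != "active")
    return missing <= MAX_FAILED


def remove_blind(states, dev):
    """Hypothetical: fail and remove in one step, no questions asked."""
    after = dict(states)
    after[dev] = "removed"
    return after


def remove_guarded(states, dev):
    """Behaviour described above: only an already failed device may go."""
    if states[dev] != "failed":
        raise ValueError(f"{dev} is not failed; refusing to remove it")
    after = dict(states)
    after[dev] = "removed"
    return after


# Typo: sdc1 instead of sdd1. Blind removal leaves two members missing.
print(usable(remove_blind(members, "sdc1")))    # False

# The guard turns the same typo into an error instead of a dead array.
try:
    remove_guarded(members, "sdc1")
except ValueError as err:
    print(err)

# The intended removal of the failed member leaves the array usable.
print(usable(remove_guarded(members, "sdd1")))  # True
```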

> I am unclear why --incremental seems to require a device to be unbound first
> (--remove) in order to re-add it after it failed. IMHO it could do it itself if
> it is really necessary without bothering the user.

It would be kind of useless to put that support into incremental.
Incremental isn't really intended to be run from the command line
(although you can), it's intended to be done on hotplug events.  Those
hotplug events never happen when the device is failed but not removed
from an array, so it's a condition we don't need to handle.

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband
