Re: Last working drive in RAID1

From: NeilBrown <neilb@suse.de>
To: Eric Mei <meijia@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Last working drive in RAID1
Date: Thu, 5 Mar 2015 10:26:22 +1100
Message-ID: <20150305102622.016ec792@notabene.brown>
In-Reply-To: <54F78BD9.403@gmail.com>

On Wed, 04 Mar 2015 15:48:57 -0700 Eric Mei <meijia@gmail.com> wrote:

> Hi Neil,
> 
> I see, that does make sense. Thank you.
> 
> But it poses a problem for HA. We have 2 nodes as an active-standby pair;
> if the HW on node 1 has a problem (e.g. the SAS cable gets pulled, so all
> access to the physical drives is gone), we want the array to fail over to
> node 2. But with the lingering drive reference, mdadm will report that the
> array is still alive, so failover won't happen.
> 
> I guess it depends on what kind of error the drive has. If it's just a
> media error we should keep it online as long as possible. But if the
> drive is really bad or physically gone, keeping the stale reference
> won't help anything. Back to your comparison with a single drive /dev/sda:
> I think MD as an array should behave the same as /dev/sda, but not the
> individual drives inside MD; those we should just let go. What do
> you think?

If there were some way that md could be told that the device really was gone
and not just returning errors, then I would be OK with it being marked as
faulty and being removed from the array.

I don't think there is any mechanism in the kernel to allow that.  It would
be easiest to capture a "REMOVE" event via udev, and have udev run "mdadm" to
tell the md array that the device was gone.
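
A minimal sketch of that udev-to-mdadm plumbing, assuming a udev rule passes
the event's ACTION value and the array node to a small helper. The helper and
its function names are hypothetical; the "detached" keyword for --fail and
--remove is the one documented in mdadm(8). This only shows how the request
would be issued; it does not change how raid1 treats a fail request for its
last working device.

```python
import subprocess
from typing import Callable, List, Sequence


def mdadm_detached_cmd(array: str) -> List[str]:
    """Build the mdadm call that fails, then removes, detached members."""
    return ["mdadm", array, "--fail", "detached", "--remove", "detached"]


def on_udev_event(
    action: str,
    array: str,
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> bool:
    """Issue the mdadm call for a udev "remove" action; ignore anything else.

    `run` is injectable so the logic can be exercised without mdadm or root.
    Returns True if a command was issued.
    """
    if action != "remove":
        return False
    run(mdadm_detached_cmd(array))
    return True
```

A rule would invoke this as something like on_udev_event("remove", "/dev/md0");
the array name here is only an example.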

Currently there is no way to do that ... I guess we could change raid1 so
that a 'fail' event that came from user-space would always cause the device
to be marked failed, even when an IO error would not.
To preserve current behaviour, it should require something like "faulty-force"
to be written to the "state" file. We would need to check that raid1 copes
with having zero working drives - currently it might always assume there is
at least one device.
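
To make the "state" file idea concrete, here is a rough sketch of what the
user-space side could look like. The path layout
/sys/block/<array>/md/dev-<member>/state and the "faulty" keyword are the
existing md sysfs interface; "faulty-force" is only the keyword proposed
above and is not something the kernel accepts today. The function names and
the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def member_state_path(array: str, member: str, sysfs_root: str = "/sys") -> Path:
    """Locate the per-member md state file, e.g. /sys/block/md0/md/dev-sdb1/state."""
    return Path(sysfs_root) / "block" / array / "md" / f"dev-{member}" / "state"


def request_fail(
    array: str, member: str, force: bool = False, sysfs_root: str = "/sys"
) -> str:
    """Write a fail request to the member's state file and return the keyword used.

    force=True writes "faulty-force", the keyword proposed in this thread;
    a kernel without that change would be expected to reject the write.
    """
    keyword = "faulty-force" if force else "faulty"
    member_state_path(array, member, sysfs_root).write_text(keyword)
    return keyword
```

For example, request_fail("md0", "sdb1") asks md to mark sdb1 faulty in the
usual way, which raid1 may decline for the last working device.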

NeilBrown
