From: Asdo <asdo@shiftmail.org>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: 4 out of 16 drives show up as 'removed'
Date: Fri, 09 Dec 2011 17:40:54 +0100
Message-ID: <4EE23A16.6060102@shiftmail.org>
In-Reply-To: <20111209142006.7ad399f9@notabene.brown>

Dear Neil,

this issue turned out fine for the OP (and thanks for your continuous
support); however, exactly this situation is my worst nightmare regarding
MD RAID.

It seems to me that MD has no mechanism to guard against a cable (carrying
multiple drives) becoming disconnected, and I think this could seriously
confuse the user, potentially followed by major data loss.

I think it should be possible, in principle, to implement a mechanism that
discriminates between a cable disconnect (affecting multiple drives) and a
single failed drive.

The technique would be:

BEFORE failing a drive on any symptom that *could* be caused by a cable
disconnect, (maybe wait a couple of seconds and then) perform a read and/or
a write on each of the drives of the array. The read must not be served
from cache but must come from the platters, and a write must be
synchronous.

If multiple drives that were believed to be working do not respond to this
read/write, then assume a cable is disconnected and either block the array
(is there a blocked state like for other Linux block devices? If not, it
should be implemented) or set it read-only. Or, worst case, disassemble the
array. But DO NOT proceed with failing the drive.

OTOH, if all the other drives respond correctly, assume it is not a cable
problem and go ahead and fail the drive that was about to be failed.
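
To make the decision rule concrete, here is a minimal user-space sketch in
Python. It is not MD code and says nothing about how the kernel would
actually do this; the names decide_on_suspect, Action and make_probe are
made up for illustration, and the probe callback stands in for the
uncached read / synchronous write described above. I also assume that a
single other non-responding member is already enough to suspect the cable.

```python
from enum import Enum
from typing import Callable, Iterable, Set


class Action(Enum):
    """Possible outcomes of the check (hypothetical names)."""
    FAIL_DRIVE = "fail the suspect drive"
    HOLD_ARRAY = "block the array or set it read-only"


def decide_on_suspect(
    suspect: str,
    members: Iterable[str],
    probe: Callable[[str], bool],
) -> Action:
    """Decide what to do when `suspect` shows a symptom that could be a
    cable disconnect.

    `probe(dev)` is an assumed callback that returns True only if an
    uncached read (or synchronous write) on `dev` completes.
    """
    # Probe every member that is still believed to be working.
    others = [dev for dev in members if dev != suspect]
    silent = [dev for dev in others if not probe(dev)]

    if silent:
        # Other "good" drives are silent too: treat it as a cable or
        # controller problem and do not record any drive as failed.
        return Action.HOLD_ARRAY

    # Everything else answers, so the suspect really is a single failure.
    return Action.FAIL_DRIVE


def make_probe(unreachable: Set[str]) -> Callable[[str], bool]:
    """Build a fake probe for testing: devices in `unreachable` fail."""
    return lambda dev: dev not in unreachable


if __name__ == "__main__":
    array = ["sda", "sdb", "sdc", "sdd"]
    # Only the suspect is gone: fail it as MD does today.
    print(decide_on_suspect("sda", array, make_probe({"sda"})))
    # Two drives on the same cable are gone: hold the array instead.
    print(decide_on_suspect("sda", array, make_probe({"sda", "sdb"})))
```

The point of returning HOLD_ARRAY before anything is written is that no
failure gets recorded on the remaining drives until the other members have
been checked.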

The current behaviour is not good because MD will start recording each
failed drive in the metadata of the good drives before discovering that
there are so many failed drives that the array cannot be kept running at
all.

So you end up with an array that is down, but which is also in an
inconsistent state (I think writes could have been performed between the
first discovered failure and the last discovered failure, so the array
would indeed be inconsistent) and which no longer assembles cleanly.

Thank you