From: NeilBrown <neilb@suse.de>
To: Asdo <asdo@shiftmail.org>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Some md/mdadm bugs
Date: Fri, 3 Feb 2012 08:17:17 +1100
Message-ID: <20120203081717.195bfec8@notabene.brown>
In-Reply-To: <4F2ADF45.4040103@shiftmail.org>
On Thu, 02 Feb 2012 20:08:53 +0100 Asdo <asdo@shiftmail.org> wrote:
> Hello list
>
> I removed sda from the system and I confirmed /dev/sda did not exist any
> more.
> After some time an I/O was issued to the array and sda6 was failed by MD
> in /dev/md5:
>
> md5 : active raid1 sdb6[2] sda6[0](F)
> 10485688 blocks super 1.0 [2/1] [_U]
> bitmap: 1/160 pages [4KB], 32KB chunk
>
> At this point I tried:
>
> mdadm /dev/md5 --remove detached
> --> no effect !
> mdadm /dev/md5 --remove failed
> --> no effect !
Which version of mdadm are you running (mdadm --version)?
The "detached" and "failed" keywords stopped working at one stage and were
fixed in 3.1.5.
> mdadm /dev/md5 --remove /dev/sda6
> --> mdadm: cannot find /dev/sda6: No such file or directory (!!!)
> mdadm /dev/md5 --remove sda6
> --> finally worked ! (I don't know how I had the idea to actually try
> this...)
Well done.
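If you want to script that step, here is a minimal sketch in Python. It
reads text in the /proc/mdstat format, picks out members flagged (F), and
builds the matching "--remove" command lines using the bare kernel name,
which is the form that worked above. The function names failed_members and
remove_commands are mine, purely for illustration; the sketch only builds
the commands and does not run them.

```python
import re


def failed_members(mdstat_text):
    """Map each md array to the kernel names of its members flagged (F)."""
    result = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md5 : active raid1 sdb6[2] sda6[0](F)"
        m = re.match(r"^(md\S+)\s*:\s*(.*)$", line.strip())
        if not m:
            continue
        array, rest = m.groups()
        # A failed member is "<name>[<slot>](F)".
        failed = re.findall(r"(\S+?)\[\d+\]\(F\)", rest)
        if failed:
            result[array] = failed
    return result


def remove_commands(mdstat_text):
    """Build (but do not run) one mdadm --remove command per failed member."""
    commands = []
    for array, names in failed_members(mdstat_text).items():
        for name in names:
            # Use the kernel name (no /dev/ prefix): the device node
            # may no longer exist once the disk has been pulled.
            commands.append(["mdadm", "/dev/" + array, "--remove", name])
    return commands


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        for cmd in remove_commands(f.read()):
            print(" ".join(cmd))
```

Printing the commands rather than executing them leaves the decision to
the operator.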
>
>
> Then here is another array:
>
> md1 : active raid1 sda2[0] sdb2[2]
> 10485688 blocks super 1.0 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> This one did not even realize that sda was removed from the system long ago.
Nobody told it.
> Apparently only when an I/O is issued, mdadm realizes the drive is not
> there anymore.
Only when there is IO, or someone tells it.
> I am wondering (and this would be very serious) what happens if a new
> drive is inserted and it takes the /dev/sda identifier!? Would MD start
> writing or do any operation THERE!?
Wouldn't happen. As long as md holds onto the shell of the old sda, nothing
else will get the name 'sda'.
>
> There is another problem...
> I tried to make MD realize that the drive is detached:
>
> mdadm /dev/md1 --fail detached
> --> no effect !
> however:
> ls /dev/sda2
> --> ls: cannot access /dev/sda2: No such file or directory
> so "detached" also seems broken...
Before 3.1.5 it was. If you are using a newer mdadm I'll need to look into
it.
>
>
>
> And here goes also a feature request:
>
> if a device is detached from the system, (echo 1 > device/delete or
> removing via hardware hot-swap + AHCI) MD should detect this situation
> and mark the device (and all its partitions) as failed in all arrays, or
> even remove the device completely from the RAID.
This needs to be done via a udev rule.
That is why --remove understands names like "sda6" (no /dev).
When a device is removed, udev processes the remove notification.
The rule
ACTION=="remove", RUN+="/sbin/mdadm -If $name"
in /etc/udev/rules.d/something.rules
will make that happen.
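To illustrate why the bare kernel name is enough: md keeps a dev-<name>
directory under /sys/block/mdX/md/ for each member, so the owning array can
be found even after the /dev node has gone. The following is a rough Python
sketch of that lookup only. It is not how mdadm itself is implemented, and
the function name arrays_holding and its sysfs_block parameter are made up
for this example.

```python
import os


def arrays_holding(kernel_name, sysfs_block="/sys/block"):
    """Return the md arrays that still list kernel_name as a member.

    kernel_name is the bare name udev reports (e.g. "sda6"), so this
    works even when /dev/sda6 no longer exists.
    """
    holders = []
    for entry in sorted(os.listdir(sysfs_block)):
        if not entry.startswith("md"):
            continue
        # Each member of an array appears as <array>/md/dev-<kernel name>.
        member_dir = os.path.join(sysfs_block, entry, "md", "dev-" + kernel_name)
        if os.path.isdir(member_dir):
            holders.append(entry)
    return holders


if __name__ == "__main__":
    import sys

    for array in arrays_holding(sys.argv[1]):
        print(array)
```

Run with a kernel name such as sda6 as its only argument, it prints the
arrays that still hold that member.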
> In my case I have verified that MD did not realize the device was
> removed from the system, and only much later when an I/O was issued to
> the disk, it would mark the device as failed in the RAID.
>
> After the above is implemented, it could be an idea to actually allow a
> new disk to take the place of a failed disk automatically if that would
> be a "re-add" (probably the same failed disk is being reinserted by the
> operator) and this even if the array is running, and especially if there
> is a bitmap.
It should do that, provided you have a udev rule like:
ACTION=="add", RUN+="/sbin/mdadm -I $tempnode"
You can even get it to add other devices as spares with e.g.
policy action=force-spare
though you almost certainly don't want that general a policy. You would
want to restrict that to certain ports (device paths).
> Now it doesn't happen:
> When I reinserted the disk, udev triggered the --incremental, to
> reinsert the device, but mdadm refused to do anything because the old
> slot was still occupied with a failed+detached device. I manually
> removed the device from the raid then I ran --incremental, but mdadm
> still refused to re-add the device to the RAID because the array was
> running. I think that if it is a re-add, and especially if the bitmap is
> active, I can't think of a situation in which the user would *not* want
> to do an incremental re-add even if the array is running.
Hmmm.. that doesn't seem right. What version of mdadm are you running?
Maybe a newer one would get this right.
Thanks for the reports.
NeilBrown
>
> Thank you
> Asdo
>
>