From: "Rémi Rérolle" <rrerolle@lacie.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm: can't removed failed/detached drives when using metadata 1.x
Date: Mon, 14 Feb 2011 15:05:25 +0100
Message-ID: <4D5936A5.4010106@lacie.com>
In-Reply-To: <20110214142701.28950b00@notabene.brown>
On 14/02/2011 04:27, NeilBrown wrote:
> On Thu, 10 Feb 2011 16:28:12 +0100 Rémi Rérolle<rrerolle@lacie.com> wrote:
>
>> Hi Neil,
>>
>> I recently came across what I believe is a regression in mdadm, which
>> has been introduced in version 3.1.3.
>>
>> It seems that, when using metadata 1.x, the handling of failed/detached
>> drives isn't effective anymore.
>>
>> Here's a quick example:
>>
>> [root@GrosCinq ~]# mdadm -C /dev/md4 -l1 -n2 --metadata=1.0 /dev/sdc1 /dev/sdd1
>> mdadm: array /dev/md4 started.
>> [root@GrosCinq ~]#
>> [root@GrosCinq ~]# mdadm --wait /dev/md4
>> [root@GrosCinq ~]#
>> [root@GrosCinq ~]# mdadm -D /dev/md4
>> /dev/md4:
>> Version : 1.0
>> Creation Time : Thu Feb 10 13:56:31 2011
>> Raid Level : raid1
>> Array Size : 1953096 (1907.64 MiB 1999.97 MB)
>> Used Dev Size : 1953096 (1907.64 MiB 1999.97 MB)
>> Raid Devices : 2
>> Total Devices : 2
>> Persistence : Superblock is persistent
>>
>> Update Time : Thu Feb 10 13:56:46 2011
>> State : clean
>> Active Devices : 2
>> Working Devices : 2
>> Failed Devices : 0
>> Spare Devices : 0
>>
>> Name : GrosCinq:4 (local to host GrosCinq)
>> UUID : bbfef508:252e7ce1:c95d4a03:8beb3cbd
>> Events : 17
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8        1        0      active sync   /dev/sdc1
>>        1       8       49        1      active sync   /dev/sdd1
>>
>> [root@GrosCinq ~]# mdadm --fail /dev/md4 /dev/sdc1
>> mdadm: set /dev/sdc1 faulty in /dev/md4
>> [root@GrosCinq ~]#
>> [root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       0        0        0      removed
>>        1       8       49        1      active sync   /dev/sdd1
>>
>>        0       8        1        -      faulty spare   /dev/sdc1
>> [root@GrosCinq ~]#
>> [root@GrosCinq ~]# mdadm --remove /dev/md4 failed
>> [root@GrosCinq ~]#
>> [root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       0        0        0      removed
>>        1       8       49        1      active sync   /dev/sdd1
>>
>>        0       8        1        -      faulty spare   /dev/sdc1
>> [root@GrosCinq ~]#
>>
>> This is with mdadm 3.1.4, 3.1.3 or even 3.2, but not 3.1.2. I did a git
>> bisect to try and isolate the regression and it appears the guilty
>> commit is:
>>
>> b3b4e8a : "Avoid skipping devices where removing all faulty/detached
>> devices."
>>
>> As stated in the commit, this is only true with metadata 1.x. With 0.9,
>> there is no problem. I also tested with detached drives as well as
>> raid5/6 and encountered the same issue. Actually, with detached drives,
>> it's even more annoying, since using --remove detached is the only way
>> to remove the device without restarting the array. For a failed drive,
>> there is still the possibility to use the device name.
>>
>> Do you have any idea of the reason behind that regression? Should this
>> patch apply only in the case of 0.9 metadata?
>>
>> Regards,
>>
>
>
> Thanks for the report - especially for bisecting it down to the erroneous
> commit!
>
> This patch should fix the regression. I'll ensure it is in all future
> releases.
>
Hi Neil,
I've tested your patch with the setup that was causing me trouble. It
did fix the regression.
Thanks!
Rémi
> Thanks,
> NeilBrown
>
>
> diff --git a/Manage.c b/Manage.c
> index 481c165..8c86a53 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -421,7 +421,7 @@ int Manage_subdevs(char *devname, int fd,
>  				dnprintable = dvname;
>  				break;
>  			}
> -			if (jnext == 0)
> +			if (next != dv)
>  				continue;
>  		} else if (strcmp(dv->devname, "detached") == 0) {
>  			if (dv->disposition != 'r' && dv->disposition != 'f') {
> @@ -461,7 +461,7 @@ int Manage_subdevs(char *devname, int fd,
>  				dnprintable = dvname;
>  				break;
>  			}
> -			if (jnext == 0)
> +			if (next != dv)
>  				continue;
>  		} else if (strcmp(dv->devname, "missing") == 0) {
>  			if (dv->disposition != 'a' || dv->re_add == 0) {
>
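The thread does not spell out why the old test failed. One plausible reading of the patch is that `jnext == 0` was used to mean "no device found", while 0 is also a valid slot number. In the -D output above the faulty device keeps Number 0, which would fit that reading. The sketch below illustrates only this general pitfall. It is not mdadm code: `struct slot`, `find_faulty_index_or_zero`, `find_faulty` and the two `remove_all_*` helpers are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an array member slot (not mdadm's types). */
struct slot {
	bool present;
	bool faulty;
};

/*
 * Problematic variant: returns the index of the first faulty slot, with 0
 * doubling as "nothing found". A faulty device in slot 0 cannot be told
 * apart from no match at all.
 */
static size_t find_faulty_index_or_zero(const struct slot *slots, size_t n)
{
	for (size_t j = 0; j < n; j++)
		if (slots[j].present && slots[j].faulty)
			return j;
	return 0;
}

/* Safer variant: "found" is reported separately from the index. */
static bool find_faulty(const struct slot *slots, size_t n, size_t *found)
{
	assert(found != NULL);
	for (size_t j = 0; j < n; j++) {
		if (slots[j].present && slots[j].faulty) {
			*found = j;
			return true;
		}
	}
	return false;
}

/* Removes faulty slots using the index-as-sentinel test; returns the count. */
static size_t remove_all_sentinel(struct slot *slots, size_t n)
{
	size_t removed = 0;

	for (;;) {
		size_t j = find_faulty_index_or_zero(slots, n);
		if (j == 0)	/* analogous to the old `jnext == 0` check */
			break;
		slots[j].present = false;
		removed++;
	}
	return removed;
}

/* Removes faulty slots using the explicit found flag; returns the count. */
static size_t remove_all_flag(struct slot *slots, size_t n)
{
	size_t removed = 0;
	size_t j;

	while (find_faulty(slots, n, &j)) {
		slots[j].present = false;
		removed++;
	}
	return removed;
}
```

With a faulty device in slot 0, `remove_all_sentinel` stops before removing anything, while `remove_all_flag` removes it. When the faulty devices sit in higher slots, both variants behave the same.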
Thread overview: 4+ messages
2011-02-10 15:28 mdadm: can't removed failed/detached drives when using metadata 1.x Rémi Rérolle
2011-02-14 3:27 ` NeilBrown
2011-02-14 14:05 ` Rémi Rérolle [this message]
2011-02-15 0:05 ` NeilBrown