From: NeilBrown <neilb@suse.de>
To: "Rémi Rérolle" <rrerolle@lacie.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: mdadm: can't removed failed/detached drives when using metadata 1.x
Date: Mon, 14 Feb 2011 14:27:01 +1100
Message-ID: <20110214142701.28950b00@notabene.brown>
In-Reply-To: <4D54040C.4040201@lacie.com>

On Thu, 10 Feb 2011 16:28:12 +0100 Rémi Rérolle <rrerolle@lacie.com> wrote:

> Hi Neil,
> 
> I recently came across what I believe is a regression in mdadm, which
> has been introduced in version 3.1.3.
> 
> It seems that, when using metadata 1.x, the handling of failed/detached
> drives isn't effective anymore.
> 
> Here's a quick example:
> 
> [root@GrosCinq ~]# mdadm -C /dev/md4 -l1 -n2 --metadata=1.0 /dev/sdc1 /dev/sdd1
> mdadm: array /dev/md4 started.
> [root@GrosCinq ~]#
> [root@GrosCinq ~]# mdadm --wait /dev/md4
> [root@GrosCinq ~]#
> [root@GrosCinq ~]# mdadm -D /dev/md4
> /dev/md4:
>          Version : 1.0
>    Creation Time : Thu Feb 10 13:56:31 2011
>       Raid Level : raid1
>       Array Size : 1953096 (1907.64 MiB 1999.97 MB)
>    Used Dev Size : 1953096 (1907.64 MiB 1999.97 MB)
>     Raid Devices : 2
>    Total Devices : 2
>      Persistence : Superblock is persistent
> 
>      Update Time : Thu Feb 10 13:56:46 2011
>            State : clean
>   Active Devices : 2
> Working Devices : 2
>   Failed Devices : 0
>    Spare Devices : 0
> 
>             Name : GrosCinq:4  (local to host GrosCinq)
>             UUID : bbfef508:252e7ce1:c95d4a03:8beb3cbd
>           Events : 17
> 
>      Number   Major   Minor   RaidDevice State
>         0       8        1        0      active sync   /dev/sdc1
>         1       8       49        1      active sync   /dev/sdd1
> 
> [root@GrosCinq ~]# mdadm --fail /dev/md4 /dev/sdc1
> mdadm: set /dev/sdc1 faulty in /dev/md4
> [root@GrosCinq ~]#
> [root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6
> 
>      Number   Major   Minor   RaidDevice State
>         0       0        0        0      removed
>         1       8       49        1      active sync   /dev/sdd1
> 
>         0       8        1        -      faulty spare   /dev/sdc1
> [root@GrosCinq ~]#
> [root@GrosCinq ~]# mdadm --remove /dev/md4 failed
> [root@GrosCinq ~]#
> [root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6
> 
>      Number   Major   Minor   RaidDevice State
>         0       0        0        0      removed
>         1       8       49        1      active sync   /dev/sdd1
> 
>         0       8        1        -      faulty spare   /dev/sdc1
> [root@GrosCinq ~]#
> 
> This happens with mdadm 3.1.4, 3.1.3 and even 3.2, but not with 3.1.2. I
> did a git bisect to isolate the regression, and it appears the guilty
> commit is:
> 
> b3b4e8a : "Avoid skipping devices where removing all faulty/detached
>             devices."
> 
> As stated in the commit, this is only true with metadata 1.x. With 0.9, 
> there is no problem. I also tested with detached drives as well as 
> raid5/6 and encountered the same issue. Actually, with detached drives, 
> it's even more annoying, since using --remove detached is the only way 
> to remove the device without restarting the array. For a failed drive, 
> there is still the possibility to use the device name.
> 
> Do you have any idea of the reason behind that regression? Should this
> patch apply only in the case of 0.9 metadata?
> 
> Regards,
> 


Thanks for the report - especially for bisecting it down to the erroneous
commit!

This patch should fix the regression.  I'll ensure it is in all future
releases.

Thanks,
NeilBrown


diff --git a/Manage.c b/Manage.c
index 481c165..8c86a53 100644
--- a/Manage.c
+++ b/Manage.c
@@ -421,7 +421,7 @@ int Manage_subdevs(char *devname, int fd,
 				dnprintable = dvname;
 				break;
 			}
-			if (jnext == 0)
+			if (next != dv)
 				continue;
 		} else if (strcmp(dv->devname, "detached") == 0) {
 			if (dv->disposition != 'r' && dv->disposition != 'f') {
@@ -461,7 +461,7 @@ int Manage_subdevs(char *devname, int fd,
 				dnprintable = dvname;
 				break;
 			}
-			if (jnext == 0)
+			if (next != dv)
 				continue;
 		} else if (strcmp(dv->devname, "missing") == 0) {
 			if (dv->disposition != 'a' || dv->re_add == 0) {
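
One possible reading of why the old "jnext == 0" test misfires, sketched
as a simplified model and not as mdadm's actual code. It assumes (the
thread does not say so) that the scan stores the index of the matching
slot in jnext, so a match in slot 0 cannot be told apart from no match at
all. The names struct slot, struct scan_result, scan_for_faulty,
old_check_found and new_check_found are hypothetical and exist only for
this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one array slot; NOT mdadm's real data structure. */
struct slot {
    bool present;   /* a device occupies this slot */
    bool faulty;    /* the device is marked faulty */
};

/* Result of one scan for a faulty device. */
struct scan_result {
    size_t jnext;   /* slot to resume from; the old test also reads 0 as "nothing found" */
    bool matched;   /* explicit flag, playing the role of "next == dv" in the patch */
};

/* Find the first faulty device and remember where it was found. */
struct scan_result scan_for_faulty(const struct slot *slots, size_t n)
{
    struct scan_result r = { 0, false };

    for (size_t j = 0; j < n; j++) {
        if (!slots[j].present || !slots[j].faulty)
            continue;
        r.jnext = j;        /* assumption: resume from the same slot next time */
        r.matched = true;
        break;
    }
    return r;
}

/* Old-style test: a zero jnext is interpreted as "no device found". */
bool old_check_found(const struct slot *slots, size_t n)
{
    return scan_for_faulty(slots, n).jnext != 0;
}

/* New-style test: rely on the explicit flag instead of the slot index. */
bool new_check_found(const struct slot *slots, size_t n)
{
    return scan_for_faulty(slots, n).matched;
}
```

With a faulty device in slot 0, as in the report above where /dev/sdc1 is
listed as Number 0, old_check_found() returns false while
new_check_found() returns true. For a faulty device in any other slot the
two tests agree.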

Thread overview: 4+ messages
2011-02-10 15:28 mdadm: can't removed failed/detached drives when using metadata 1.x Rémi Rérolle
2011-02-14  3:27 ` NeilBrown [this message]
2011-02-14 14:05   ` Rémi Rérolle
2011-02-15  0:05     ` NeilBrown
