* mdadm: can't remove failed/detached drives when using metadata 1.x
@ 2011-02-10 15:28 Rémi Rérolle
2011-02-14 3:27 ` NeilBrown
0 siblings, 1 reply; 4+ messages in thread
From: Rémi Rérolle @ 2011-02-10 15:28 UTC (permalink / raw)
To: neilb; +Cc: linux-raid
Hi Neil,
I recently came across what I believe is a regression in mdadm,
introduced in version 3.1.3.
It seems that, when using 1.x metadata, removing failed/detached drives
with the "failed" and "detached" keywords no longer has any effect.
Here's a quick example:
[root@GrosCinq ~]# mdadm -C /dev/md4 -l1 -n2 --metadata=1.0 /dev/sdc1
/dev/sdd1
mdadm: array /dev/md4 started.
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm --wait /dev/md4
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm -D /dev/md4
/dev/md4:
Version : 1.0
Creation Time : Thu Feb 10 13:56:31 2011
Raid Level : raid1
Array Size : 1953096 (1907.64 MiB 1999.97 MB)
Used Dev Size : 1953096 (1907.64 MiB 1999.97 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Thu Feb 10 13:56:46 2011
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : GrosCinq:4 (local to host GrosCinq)
UUID : bbfef508:252e7ce1:c95d4a03:8beb3cbd
Events : 17
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
[root@GrosCinq ~]# mdadm --fail /dev/md4 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md4
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 49 1 active sync /dev/sdd1
0 8 1 - faulty spare /dev/sdc1
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm --remove /dev/md4 failed
[root@GrosCinq ~]#
[root@GrosCinq ~]# mdadm -D /dev/md4 | tail -n 6
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 49 1 active sync /dev/sdd1
0 8 1 - faulty spare /dev/sdc1
[root@GrosCinq ~]#
This is with mdadm 3.1.4, 3.1.3 or even 3.2, but not 3.1.2. I did a git
bisect to try and isolate the regression and it appears the guilty
commit is:
b3b4e8a: "Avoid skipping devices where removing all faulty/detached
devices."
As stated in the commit, this is only true with metadata 1.x. With 0.9,
there is no problem. I also tested with detached drives as well as
raid5/6 and encountered the same issue. Actually, with detached drives,
it's even more annoying, since using --remove detached is the only way
to remove the device without restarting the array. For a failed drive,
there is still the possibility to use the device name.
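To make the detached case concrete, here is roughly how it can be
reproduced (a sketch; the sysfs path and device names follow the example
above and may differ on other systems):
  # simulate a detached drive by deleting it at the SCSI layer
  echo 1 > /sys/block/sdc/device/delete
  # /dev/sdc1 no longer exists, so it cannot be named on the command
  # line; the keyword form is the only way left to clean up the array
  mdadm --remove /dev/md4 detached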
Do you have any idea of the reason behind this regression? Should this
patch only apply in the case of 0.9 metadata?
Regards,
--
Rémi
* Re: mdadm: can't remove failed/detached drives when using metadata 1.x
  2011-02-10 15:28 mdadm: can't remove failed/detached drives when using metadata 1.x Rémi Rérolle
@ 2011-02-14 3:27 ` NeilBrown
  2011-02-14 14:05 ` Rémi Rérolle
  0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2011-02-14 3:27 UTC (permalink / raw)
To: Rémi Rérolle; +Cc: linux-raid
On Thu, 10 Feb 2011 16:28:12 +0100 Rémi Rérolle <rrerolle@lacie.com> wrote:
> [...]
> This is with mdadm 3.1.4, 3.1.3 or even 3.2, but not 3.1.2. I did a git
> bisect to try and isolate the regression and it appears the guilty
> commit is:
>
> b3b4e8a: "Avoid skipping devices where removing all faulty/detached
> devices."
>
> Do you have any idea of the reason behind this regression? Should this
> patch only apply in the case of 0.9 metadata?
Thanks for the report - especially for bisecting it down to the erroneous
commit!
This patch should fix the regression. I'll ensure it is in all future
releases.
Thanks,
NeilBrown

diff --git a/Manage.c b/Manage.c
index 481c165..8c86a53 100644
--- a/Manage.c
+++ b/Manage.c
@@ -421,7 +421,7 @@ int Manage_subdevs(char *devname, int fd,
 				dnprintable = dvname;
 				break;
 			}
-			if (jnext == 0)
+			if (next != dv)
 				continue;
 		} else if (strcmp(dv->devname, "detached") == 0) {
 			if (dv->disposition != 'r' && dv->disposition != 'f') {
@@ -461,7 +461,7 @@ int Manage_subdevs(char *devname, int fd,
 				dnprintable = dvname;
 				break;
 			}
-			if (jnext == 0)
+			if (next != dv)
 				continue;
 		} else if (strcmp(dv->devname, "missing") == 0) {
 			if (dv->disposition != 'a' || dv->re_add == 0) {
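For anyone who wants to check the fix, a minimal verification sketch
(re-using the device names from the original report; no particular mdadm
output is implied):
  # rebuild mdadm with the patch applied, then repeat the failing sequence
  ./mdadm --fail /dev/md4 /dev/sdc1
  ./mdadm --remove /dev/md4 failed
  # the faulty member should now be gone from the detail output
  ./mdadm -D /dev/md4 | tail -n 6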
* Re: mdadm: can't remove failed/detached drives when using metadata 1.x
  2011-02-14 3:27 ` NeilBrown
@ 2011-02-14 14:05 ` Rémi Rérolle
  2011-02-15 0:05 ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: Rémi Rérolle @ 2011-02-14 14:05 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 14/02/2011 04:27, NeilBrown wrote:
> [...]
> Thanks for the report - especially for bisecting it down to the erroneous
> commit!
>
> This patch should fix the regression. I'll ensure it is in all future
> releases.
Hi Neil,
I've tested your patch with the setup that was causing me trouble. It
did fix the regression. Thanks!
Rémi
* Re: mdadm: can't remove failed/detached drives when using metadata 1.x
  2011-02-14 14:05 ` Rémi Rérolle
@ 2011-02-15 0:05 ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2011-02-15 0:05 UTC (permalink / raw)
To: Rémi Rérolle; +Cc: linux-raid
On Mon, 14 Feb 2011 15:05:25 +0100 Rémi Rérolle <rrerolle@lacie.com> wrote:
> I've tested your patch with the setup that was causing me trouble. It
> did fix the regression.
Great - thanks for the confirmation.
NeilBrown
End of thread.
Thread overview: 4 messages
2011-02-10 15:28 mdadm: can't remove failed/detached drives when using metadata 1.x Rémi Rérolle
2011-02-14  3:27 ` NeilBrown
2011-02-14 14:05   ` Rémi Rérolle
2011-02-15  0:05     ` NeilBrown