* mdadm --wait returns while array under construction?
@ 2012-11-20 17:55 Ross Boylan
From: Ross Boylan @ 2012-11-20 17:55 UTC
To: linux-raid; +Cc: ross
While switching the disks a RAID 1 is based on I used the --wait command
to wait for the rebuild to finish. It returned immediately, but a
subsequent query showed it had not been rebuilt. Have I misunderstood
something, or is this an error?
While doing these commands a much larger rebuild was going on with a
different array, involving some of the same physical disks but different
partitions. The partitions being rebuilt are on different physical
disks for the different arrays.
Here are the logs, with version info at the end (Debian Lenny + more
recent kernel):
markov:~# date; mdadm --detail /dev/md0
Tue Nov 20 09:37:07 PST 2012
/dev/md0:
Version : 00.90
Creation Time : Mon Dec 15 06:49:51 2008
Raid Level : raid1
Array Size : 96256 (94.02 MiB 98.57 MB)
Used Dev Size : 96256 (94.02 MiB 98.57 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Nov 20 07:41:04 2012
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 313d5489:7869305b:5b5da825:51e3856c
Events : 0.1602
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 33 1 active sync /dev/sdc1
markov:~# date; mdadm --fail /dev/md0 /dev/sdc1
Tue Nov 20 09:37:58 PST 2012
mdadm: set /dev/sdc1 faulty in /dev/md0
markov:~# date; mdadm --add /dev/md0 /dev/sdd2
Tue Nov 20 09:39:05 PST 2012
mdadm: added /dev/sdd2
markov:~# date; mdadm --detail /dev/md0
Tue Nov 20 09:39:14 PST 2012
/dev/md0:
Version : 00.90
Creation Time : Mon Dec 15 06:49:51 2008
Raid Level : raid1
Array Size : 96256 (94.02 MiB 98.57 MB)
Used Dev Size : 96256 (94.02 MiB 98.57 MB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Nov 20 09:39:05 2012
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
UUID : 313d5489:7869305b:5b5da825:51e3856c
Events : 0.1606
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
2 8 50 1 spare rebuilding /dev/sdd2
3 8 33 - faulty spare /dev/sdc1
markov:~# time mdadm --wait /dev/md0; date
real 0m0.002s
user 0m0.000s
sys 0m0.004s
Tue Nov 20 09:40:07 PST 2012
markov:~# date; mdadm --detail /dev/md0
Tue Nov 20 09:40:20 PST 2012
/dev/md0:
Version : 00.90
Creation Time : Mon Dec 15 06:49:51 2008
Raid Level : raid1
Array Size : 96256 (94.02 MiB 98.57 MB)
Used Dev Size : 96256 (94.02 MiB 98.57 MB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Nov 20 09:39:15 2012
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
UUID : 313d5489:7869305b:5b5da825:51e3856c
Events : 0.1608
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
2 8 50 1 spare rebuilding /dev/sdd2
3 8 33 - faulty spare /dev/sdc1
markov:~# uname -a
Linux markov 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
markov:~# mdadm --version
mdadm - v2.6.7.2 - 14th November 2008
I notice that in this case, unlike the other array, the message during
the rebuild (the last detail report) does not include a line like
Rebuild Status : 0% complete
I just tried --wait again to see if there was some kind of race, but
once again it returned immediately, though detail says the spare is
rebuilding.
Ross Boylan

* Re: mdadm --wait returns while array under construction?
From: Ross Boylan @ 2012-11-20 18:22 UTC
To: linux-raid; +Cc: ross

On Tue, 2012-11-20 at 09:55 -0800, Ross Boylan wrote:
> While switching the disks a RAID 1 is based on I used the --wait command
> to wait for the rebuild to finish. It returned immediately, but a
> subsequent query showed it had not been rebuilt. Have I misunderstood
> something, or is this an error?

This message from the logs seems relevant:
md: delaying recovery of md0 until md1 has finished (they share one or more physical units)

It's still not the behavior I'd expect from --wait.

md0 and md1 are based on partitions, which are completely distinct;
however they do use the same physical disks.

Ross

> While doing these commands a much larger rebuild was going on with a
> different array, involving some of the same physical disks but different
> partitions. The partitions being rebuilt are on different physical
> disks for the different arrays.
>
> Here are the logs, with version info at the end (Debian Lenny + more
> recent kernel):
....
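The kernel message above suggests that md0's recovery was queued rather than running. A minimal C sketch of how that state might be recognised in /proc/mdstat-style text follows. It assumes the kernel marks a queued array with the word `resync=DELAYED` inside that array's stanza, and that stanzas are separated by blank lines; the helper name `array_is_delayed` is hypothetical and is not part of mdadm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: report whether the stanza for `name` in
 * /proc/mdstat-style `text` contains the word "resync=DELAYED".
 * Assumption: a stanza starts at a line beginning "<name> :" and
 * ends at the first blank line after it. */
bool array_is_delayed(const char *text, const char *name)
{
	size_t nlen = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Match "<name> :" at the start of a line, so that
		 * "md1" does not match the stanza for "md10". */
		if (strncmp(line, name, nlen) == 0 &&
		    strncmp(line + nlen, " :", 2) == 0) {
			const char *end = strstr(line, "\n\n");
			const char *hit = strstr(line, "resync=DELAYED");

			/* The marker only counts if it lies inside
			 * this array's stanza. */
			return hit != NULL && (end == NULL || hit < end);
		}
		/* Advance to the start of the next line. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```

A caller would read /proc/mdstat into a buffer and pass it in together with the array name, for example `array_is_delayed(buf, "md0")`.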

* Re: mdadm --wait returns while array under construction?
From: NeilBrown @ 2012-11-20 21:43 UTC
To: Ross Boylan; +Cc: linux-raid

On Tue, 20 Nov 2012 09:55:41 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

> While switching the disks a RAID 1 is based on I used the --wait command
> to wait for the rebuild to finish. It returned immediately, but a
> subsequent query showed it had not been rebuilt. Have I misunderstood
> something, or is this an error?
>
> While doing these commands a much larger rebuild was going on with a
> different array, involving some of the same physical disks but different
> partitions. The partitions being rebuilt are on different physical
> disks for the different arrays.
>
> Here are the logs, with version info at the end (Debian Lenny + more
> recent kernel):
....
> markov:~# uname -a
> Linux markov 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
> markov:~# mdadm --version
> mdadm - v2.6.7.2 - 14th November 2008
>
> I notice that in this case, unlike the other array, the message during
> the rebuild (the last detail report) does not include a line like
> Rebuild Status : 0% complete
>
> I just tried --wait again to see if there was some kind of race, but
> once again it returned immediately, though detail says the spare is
> rebuilding.

Can you test this patch to see if it fixes the problem?

diff --git a/Monitor.c b/Monitor.c
index c4d57c3..a5e7aaa 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -973,7 +973,7 @@ int Wait(char *dev)
 			if (e->devnum == devnum)
 				break;

-		if (!e || e->percent < 0) {
+		if (!e || e->percent == RESYNC_NONE) {
 			if (e && e->metadata_version &&
 			    strncmp(e->metadata_version, "external:", 9) == 0) {
 				if (is_subarray(&e->metadata_version[9]))

NeilBrown
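One way to read the one-line change: the old test treats every negative `percent` as "nothing to wait for", so any negative sentinel that means "queued" would end the wait at once. The sketch below illustrates this under the assumption that RESYNC_NONE, RESYNC_DELAYED and RESYNC_PENDING are -1, -2 and -3. Those numeric values are an assumption made for illustration (the thread gives only the names), and the two function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed sentinel values; the thread names these constants but
 * does not show their definitions. */
enum {
	RESYNC_NONE    = -1,	/* no resync or recovery shown */
	RESYNC_DELAYED = -2,	/* queued behind another array */
	RESYNC_PENDING = -3	/* needed but not yet started */
};

/* Condition before the patch: any negative value ends the wait. */
bool wait_finished_old(int percent)
{
	return percent < 0;
}

/* Condition after the patch: only "no activity" ends the wait. */
bool wait_finished_new(int percent)
{
	return percent == RESYNC_NONE;
}
```

Under these assumptions a delayed array ends the wait with the old condition but not with the new one, while an idle array ends it in both cases.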

* Re: mdadm --wait returns while array under construction?
From: Ross Boylan @ 2012-11-21 16:43 UTC
To: NeilBrown; +Cc: ross, linux-raid

On Wed, 2012-11-21 at 08:43 +1100, NeilBrown wrote:
> Can you test this patch to see if it fixes the problem?
....

Thanks for the patch. I take it the current behavior is expected, if
undesirable?

I'll try to apply it, but I'm in the middle of several system upgrades
and I may have trouble getting the source for the current system, since
it is out of date.

I spent most of yesterday dealing with various RAID problems, which I
will detail in a separate message.

Thanks.
Ross

* Re: mdadm --wait returns while array under construction?
From: NeilBrown @ 2012-11-22 6:09 UTC
To: Ross Boylan; +Cc: linux-raid

On Wed, 21 Nov 2012 08:43:02 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

....
> Thanks for the patch. I take it the current behavior is expected, if
> undesirable?

Well, I didn't expect it until I looked in the code and saw the bug. But now
I do ;-)
Yes, undesirable.

NeilBrown

> I'll try to apply it, but I'm in the middle of several system upgrades
> and I may have trouble getting the source for the current system, since
> it is out of date.
>
> I spent most of yesterday dealing with various RAID problems, which I
> will detail in a separate message.
> Thanks.
> Ross

* Re: mdadm --wait returns while array under construction? [patch question]
From: Ross Boylan @ 2012-11-27 18:28 UTC
To: NeilBrown; +Cc: ross, linux-raid

On Wed, 2012-11-21 at 08:43 +1100, NeilBrown wrote:
....
> Can you test this patch to see if it fixes the problem?
>
> diff --git a/Monitor.c b/Monitor.c
> index c4d57c3..a5e7aaa 100644
> --- a/Monitor.c
> +++ b/Monitor.c
> @@ -973,7 +973,7 @@ int Wait(char *dev)
>  			if (e->devnum == devnum)
>  				break;
>
> -		if (!e || e->percent < 0) {
> +		if (!e || e->percent == RESYNC_NONE) {
>  			if (e && e->metadata_version &&
>  			    strncmp(e->metadata_version, "external:", 9) == 0) {
>  				if (is_subarray(&e->metadata_version[9]))
>
> NeilBrown

My source for 2.6.7.2 looks somewhat different. It only has 627 lines;
I think this is the relevant code (at the end of the file):

/* Not really Monitor but ... */
int Wait(char *dev)
{
	struct stat stb;
	int devnum;
	int rv = 1;

	if (stat(dev, &stb) != 0) {
		fprintf(stderr, Name ": Cannot find %s: %s\n", dev,
			strerror(errno));
		return 2;
	}
	if (major(stb.st_rdev) == MD_MAJOR)
		devnum = minor(stb.st_rdev);
	else
		devnum = -1-(minor(stb.st_rdev)/64);

	while(1) {
		struct mdstat_ent *ms = mdstat_read(1, 0);
		struct mdstat_ent *e;

		for (e=ms ; e; e=e->next)
			if (e->devnum == devnum)
				break;

		if (!e || e->percent < 0) {
			free_mdstat(ms);
			return rv;
		}
		free(ms);
		rv = 0;
		mdstat_wait(5);
	}
}

The section
	if (!e || e->percent < 0) {
		free_mdstat(ms);
		return rv;
is the only one with e->percent < 0. Is it OK to change that to
	if (!e || e->percent == RESYNC_NONE) {?

Thanks.
Ross
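The return value of the loop above is easy to miss: `rv` starts at 1 and becomes 0 only after at least one poll, so the function returns 1 when there was nothing to wait for and 0 when it actually waited. A self-contained sketch of that control flow follows, with the successive `percent` readings supplied as an array instead of being read from /proc/mdstat. The name `wait_result` and the -1 "readings ran out" return are hypothetical additions for the sketch and do not exist in mdadm.

```c
#include <stddef.h>

/* Simulate the polling loop over `n` successive percent readings.
 * A negative reading stands for "no resync shown", as in the
 * e->percent < 0 test quoted above.
 * Returns 1 if the first reading already shows nothing to wait for,
 * 0 if at least one poll happened before that, and -1 (sketch only)
 * if the readings run out while activity is still shown. */
int wait_result(const int *percent, size_t n)
{
	int rv = 1;

	for (size_t i = 0; i < n; i++) {
		if (percent[i] < 0)
			return rv;
		rv = 0;	/* we had to wait at least once */
	}
	return -1;
}
```

In this model, a first reading that is already negative yields 1 immediately, which would be consistent with the near-zero run time reported at the top of the thread.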

* Re: mdadm --wait returns while array under construction? [patch question]
From: NeilBrown @ 2012-11-27 21:30 UTC
To: Ross Boylan; +Cc: linux-raid

On Tue, 27 Nov 2012 10:28:33 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

....
> The section
> 	if (!e || e->percent < 0) {
> 		free_mdstat(ms);
> 		return rv;
> is the only one with e->percent < 0. Is it OK to change that to
> 	if (!e || e->percent == RESYNC_NONE) {?

That's the right place to make the change, but it won't compile.
RESYNC_NONE isn't defined in that version of mdadm, and you would need to
make some changes in mdstat.c where ent->percent is set.
Current code has

	if (l > 8 && strcmp(w+l-8, "=DELAYED") == 0)
		ent->percent = RESYNC_DELAYED;
	if (l > 8 && strcmp(w+l-8, "=PENDING") == 0)
		ent->percent = RESYNC_PENDING;

which is completely missing from 2.6.7.2. You'd be a lot better off starting
with 3.2.6 and adding the patch to that.

NeilBrown
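The suffix test in the quoted mdstat.c fragment can be tried in isolation. The sketch below wraps it in a hypothetical function, `classify_word`, that maps one whitespace-delimited word to a status. The numeric values given to the RESYNC_* constants are assumptions for illustration, since the thread shows only their names, and the fallback to RESYNC_NONE is a simplification of whatever the real parser does with other words.

```c
#include <string.h>

/* Assumed values, for illustration only. */
#define RESYNC_NONE	(-1)
#define RESYNC_DELAYED	(-2)
#define RESYNC_PENDING	(-3)

/* Map one word from /proc/mdstat to a status, using the same
 * "last 8 characters" comparison as the fragment quoted above.
 * Words without either suffix map to RESYNC_NONE in this sketch. */
int classify_word(const char *w)
{
	size_t l = strlen(w);

	if (l > 8 && strcmp(w + l - 8, "=DELAYED") == 0)
		return RESYNC_DELAYED;
	if (l > 8 && strcmp(w + l - 8, "=PENDING") == 0)
		return RESYNC_PENDING;
	return RESYNC_NONE;
}
```

Note that the `l > 8` guard means a bare `=DELAYED` with nothing in front of it is not matched; only a longer word ending in the suffix is.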

* Re: mdadm --wait returns while array under construction? [patch question]
From: Ross Boylan @ 2012-11-28 2:10 UTC
To: NeilBrown; +Cc: ross, linux-raid

On Wed, 2012-11-28 at 08:30 +1100, NeilBrown wrote:
....
> That's the right place to make the change, bit it won't compile.
> RESYNC_NONE isn't defined in that version of mdadm, and you would need to
> make some changes in mdstat.c where ent->percent is set.
> Current code has
>
> 	if (l > 8 && strcmp(w+l-8, "=DELAYED") == 0)
> 		ent->percent = RESYNC_DELAYED;
> 	if (l > 8 && strcmp(w+l-8, "=PENDING") == 0)
> 		ent->percent = RESYNC_PENDING;
>
> which is completely missing from 2.6.7.2. You'd be a lot better off starting
> with 3.2.6 and adding the patch to that.
>
> NeilBrown

I think I'm going to have to pass on testing for now, as the
alternatives appear too high risk:
1) I got the debianized source for 3.2.5 (for some reason 3.2.6 is not
there yet). It depends on a variety of package versions that post-date
my lenny system. So it will not install unless I override those, or
locate/backport more recent versions of the other packages. Since this
is messing with core areas of the system (grub, udev, initscripts) it
seems unwise to attempt backports.

2) I considered patching 2.6.7.2 in place with the additional info you
provided, but I'm not sure if you're saying the mdstat.c changes alone
are sufficient, or if I need to change Monitor.c in some way.

3) I could just dump your 3.2.6 upstream source over my current 2.6.7.2
Debianized directory. But then I'd need to figure out what Debian
patches I need to reapply, and wonder if it would all work in a Lenny
environment.

I'd like to help, but since this is just a reporting problem for me I
don't want to risk screwing things up further. I might be able to do 2)
with a little more information.

BTW, I reviewed the udev rules for mdadm on my system and in the 2.6.7.2
package, and it does not appear that incremental assembly is being
attempted. That's not relevant to this thread, but does matter for
some of my other ones. Also, the 3.2.5 Debian package's udev rules say
## DISABLED: Incremental udev assembly disabled
## ** this is a Debian-specific change **
GOTO="md_inc_skip"

* Re: mdadm --wait returns while array under construction? [patch question]
From: NeilBrown @ 2012-11-29 1:35 UTC
To: Ross Boylan; +Cc: linux-raid

On Tue, 27 Nov 2012 18:10:20 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

....
> I think I'm going to have to pass on testing for now, as the
> alternatives appear too high risk:
> 1) I got the debianized source for 3.2.5 (for some reason 3.2.6 is not
> there yet). It depends on a variety of package versions that post-date
> my lenny system. So it will not install unless I override those, or
> locate/backport more recent versions of the other packages. Since this
> is messing with core areas of the system (grub, udev, initscripts) it
> seems unwise to attempt backports.
>
> 2) I considered patching 2.6.7.2 in place with the additional info you
> provided, but I'm not sure if you're saying the mdstat.c changes alone
> are sufficient, or if I need to change Monitor.c in some way.

Looks like I communicated quite effectively :-)
I'm not sure. I thought about making a patch for 2.6.7.2 and quickly decided
that just upgrading would be easiest.

You don't need to use the debian version. Just

  git clone git://neil.brown.name/mdadm
  cd mdadm
  git checkout 3.2.5
  make
  make install

Of course you would void your support contract with Debian....

> 3) I could just dump your 3.2.6 upstream source over my current 2.6.7.2
> Debianized directory. But then I'd need to figure out what Debian
> patches I need to reapply, and wonder if it would all work in a Lenny
> environment.

I don't think you need any Debian patches.

> I'd like to help, but since this is just a reporting problem for me I
> don't want to risk screwing things up further. I might be able to do 2)
> with a little more information.
>
> BTW, I reviewed the udev rules for mdadm on my system and in the 2.6.7.2
> package, and it does not appear that incremental assembly is being
> attempted. That's not relevant to this thread, but does matter for
> some of my other ones. Also, the 3.2.5 Debian package's udev rules say
> ## DISABLED: Incremental udev assembly disabled
> ## ** this is a Debian-specific change **
> GOTO="md_inc_skip"

Ahhh.. "make install" will change the udev script. So maybe "make install"
wouldn't quite be such a good idea.

NeilBrown