linux-raid.vger.kernel.org archive mirror
* mdadm --wait returns while array under construction?
@ 2012-11-20 17:55 Ross Boylan
  2012-11-20 18:22 ` Ross Boylan
  2012-11-20 21:43 ` NeilBrown
  0 siblings, 2 replies; 9+ messages in thread
From: Ross Boylan @ 2012-11-20 17:55 UTC (permalink / raw)
  To: linux-raid; +Cc: ross

While switching the disks a RAID 1 is based on, I used the --wait option
to wait for the rebuild to finish.  It returned immediately, but a
subsequent query showed it had not been rebuilt.  Have I misunderstood
something, or is this an error?

While doing these commands a much larger rebuild was going on with a
different array, involving some of the same physical disks but different
partitions.  The partitions being rebuilt are on different physical
disks for the different arrays.

Here are the logs, with version info at the end (Debian Lenny + more
recent kernel):
markov:~# date; mdadm --detail /dev/md0
Tue Nov 20 09:37:07 PST 2012
/dev/md0:
        Version : 00.90
  Creation Time : Mon Dec 15 06:49:51 2008
     Raid Level : raid1
     Array Size : 96256 (94.02 MiB 98.57 MB)
  Used Dev Size : 96256 (94.02 MiB 98.57 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Nov 20 07:41:04 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 313d5489:7869305b:5b5da825:51e3856c
         Events : 0.1602

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       33        1      active sync   /dev/sdc1
markov:~# date; mdadm --fail /dev/md0 /dev/sdc1
Tue Nov 20 09:37:58 PST 2012
mdadm: set /dev/sdc1 faulty in /dev/md0
markov:~# date; mdadm --add /dev/md0 /dev/sdd2
Tue Nov 20 09:39:05 PST 2012
mdadm: added /dev/sdd2
markov:~# date; mdadm --detail /dev/md0
Tue Nov 20 09:39:14 PST 2012
/dev/md0:
        Version : 00.90
  Creation Time : Mon Dec 15 06:49:51 2008
     Raid Level : raid1
     Array Size : 96256 (94.02 MiB 98.57 MB)
  Used Dev Size : 96256 (94.02 MiB 98.57 MB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Nov 20 09:39:05 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1

           UUID : 313d5489:7869305b:5b5da825:51e3856c
         Events : 0.1606

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       2       8       50        1      spare rebuilding   /dev/sdd2

       3       8       33        -      faulty spare   /dev/sdc1
markov:~# time mdadm --wait /dev/md0; date

real    0m0.002s
user    0m0.000s
sys     0m0.004s
Tue Nov 20 09:40:07 PST 2012
markov:~# date; mdadm --detail /dev/md0
Tue Nov 20 09:40:20 PST 2012
/dev/md0:
        Version : 00.90
  Creation Time : Mon Dec 15 06:49:51 2008
     Raid Level : raid1
     Array Size : 96256 (94.02 MiB 98.57 MB)
  Used Dev Size : 96256 (94.02 MiB 98.57 MB)
   Raid Devices : 2
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Tue Nov 20 09:39:15 2012
          State : clean, degraded
 Active Devices : 1
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 1

           UUID : 313d5489:7869305b:5b5da825:51e3856c
         Events : 0.1608

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       2       8       50        1      spare rebuilding   /dev/sdd2

       3       8       33        -      faulty spare   /dev/sdc1
markov:~# uname -a
Linux markov 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
markov:~# mdadm --version
mdadm - v2.6.7.2 - 14th November 2008


I notice that in this case, unlike the other array, the message during
the rebuild (the last detail report) does not include a line like
Rebuild Status : 0% complete

I just tried --wait again to see if there was some kind of race, but
once again it returned immediately, though detail says the spare is
rebuilding.

Ross Boylan



* Re: mdadm --wait returns while array under construction?
  2012-11-20 17:55 mdadm --wait returns while array under construction? Ross Boylan
@ 2012-11-20 18:22 ` Ross Boylan
  2012-11-20 21:43 ` NeilBrown
  1 sibling, 0 replies; 9+ messages in thread
From: Ross Boylan @ 2012-11-20 18:22 UTC (permalink / raw)
  To: linux-raid; +Cc: ross

On Tue, 2012-11-20 at 09:55 -0800, Ross Boylan wrote:
> While switching the disks a RAID 1 is based on I used the --wait command
> to wait for the rebuild to finish.  It returned immediately, but a
> subsequent query showed it had not been rebuilt.  Have I misunderstood
> something, or is this an error?
This message from the logs seems relevant:
md: delaying recovery of md0 until md1 has finished (they share one or more physical units)
It's still not the behavior I'd expect from --wait.
md0 and md1 are based on partitions, which are completely distinct;
however they do use the same physical disks.
Ross
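
The kernel log line above suggests that md0's recovery is queued behind
md1's rather than running.  A rough way to confirm that for a given array
is to look for a "=DELAYED" marker in its /proc/mdstat stanza.  The
following is only a sketch under assumptions: the helper name
array_is_delayed is made up for illustration, and it assumes that each
stanza starts with "<name> :", ends at a blank line, and carries a word
ending in "=DELAYED" while recovery is postponed.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the stanza for `name` in mdstat-style text contains a
 * "=DELAYED" marker.  Assumption: a stanza starts at a line beginning
 * with "<name> :" and ends at the next blank line. */
static bool array_is_delayed(const char *text, const char *name)
{
    size_t nlen = strlen(name);
    const char *p = text;

    while (p && *p) {
        /* Is this line the header of the stanza we want? */
        if (strncmp(p, name, nlen) == 0 && strncmp(p + nlen, " :", 2) == 0) {
            const char *end = strstr(p, "\n\n");    /* blank line ends stanza */
            const char *hit = strstr(p, "=DELAYED");
            return hit != NULL && (end == NULL || hit < end);
        }
        p = strchr(p, '\n');                         /* advance to next line */
        if (p)
            p++;
    }
    return false;
}
```

If those assumptions hold, calling array_is_delayed() with the contents
of /proc/mdstat and "md0" would return true for as long as the recovery
is postponed.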



* Re: mdadm --wait returns while array under construction?
  2012-11-20 17:55 mdadm --wait returns while array under construction? Ross Boylan
  2012-11-20 18:22 ` Ross Boylan
@ 2012-11-20 21:43 ` NeilBrown
  2012-11-21 16:43   ` Ross Boylan
  2012-11-27 18:28   ` mdadm --wait returns while array under construction? [patch question] Ross Boylan
  1 sibling, 2 replies; 9+ messages in thread
From: NeilBrown @ 2012-11-20 21:43 UTC (permalink / raw)
  To: Ross Boylan; +Cc: linux-raid

On Tue, 20 Nov 2012 09:55:41 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

> While switching the disks a RAID 1 is based on I used the --wait command
> to wait for the rebuild to finish.  It returned immediately, but a
> subsequent query showed it had not been rebuilt.  Have I misunderstood
> something, or is this an error?
> 
> While doing these commands a much larger rebuild was going on with a
> different array, involving some of the same physical disks but different
> partitions.  The partitions being rebuilt are on different physical
> disks for the different arrays.
> 
> Here are the logs, with version info at the end (Debian Lenny + more
> recent kernel):
....

> markov:~# uname -a
> Linux markov 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
> markov:~# mdadm --version
> mdadm - v2.6.7.2 - 14th November 2008
> 
> 
> I notice that in this case, unlike the other array, the message during
> the rebuild (the last detail report) does not include a line like
> Rebuild Status : 0% complete
> 
> I just tried --wait again to see if there was some kind of race, but
> once again it returned immediately, though detail says the spare is
> rebuilding.

Can you test this patch to see if it fixes the problem?

diff --git a/Monitor.c b/Monitor.c
index c4d57c3..a5e7aaa 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -973,7 +973,7 @@ int Wait(char *dev)
 			if (e->devnum == devnum)
 				break;
 
-		if (!e || e->percent < 0) {
+		if (!e || e->percent == RESYNC_NONE) {
 			if (e && e->metadata_version &&
 			    strncmp(e->metadata_version, "external:", 9) == 0) {
 				if (is_subarray(&e->metadata_version[9]))


NeilBrown
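
The effect of the one-line change can be seen in isolation.  The sketch
below is not mdadm code: the numeric values given to RESYNC_NONE,
RESYNC_DELAYED and RESYNC_PENDING are assumptions chosen only so that all
three are negative, and wait_done_old / wait_done_new are hypothetical
names for the old and the patched test.

```c
#include <stdbool.h>

/* Assumed sentinel values, borrowing the names used in the patch.
 * The real definitions live in mdadm's headers and may differ. */
enum {
    RESYNC_NONE    = -1,
    RESYNC_DELAYED = -2,
    RESYNC_PENDING = -3,
};

/* Old test: any negative percent counts as "nothing to wait for",
 * so a delayed or pending resync ends the wait immediately. */
static bool wait_done_old(int percent)
{
    return percent < 0;
}

/* Patched test: only an explicit "no resync" ends the wait. */
static bool wait_done_new(int percent)
{
    return percent == RESYNC_NONE;
}
```

With these values, a delayed recovery satisfies the old test but not the
new one, which would match the immediate return reported above.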


* Re: mdadm --wait returns while array under construction?
  2012-11-20 21:43 ` NeilBrown
@ 2012-11-21 16:43   ` Ross Boylan
  2012-11-22  6:09     ` NeilBrown
  2012-11-27 18:28   ` mdadm --wait returns while array under construction? [patch question] Ross Boylan
  1 sibling, 1 reply; 9+ messages in thread
From: Ross Boylan @ 2012-11-21 16:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: ross, linux-raid

On Wed, 2012-11-21 at 08:43 +1100, NeilBrown wrote:
> Can you test this patch to see if it fixes the problem?
Thanks for the patch.  I take it the current behavior is expected, if
undesirable?

I'll try to apply it, but I'm in the middle of several system upgrades
and I may have trouble getting the source for the current system, since
it is out of date.

I spent most of yesterday dealing with various RAID problems, which I
will detail in a separate message.
Thanks.
Ross



* Re: mdadm --wait returns while array under construction?
  2012-11-21 16:43   ` Ross Boylan
@ 2012-11-22  6:09     ` NeilBrown
  0 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2012-11-22  6:09 UTC (permalink / raw)
  To: Ross Boylan; +Cc: linux-raid

On Wed, 21 Nov 2012 08:43:02 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

> Thanks for the patch.  I take it the current behavior is expected, if
> undesirable?

Well, I didn't expect it until I looked in the code and saw the bug.  But now
I do ;-)
Yes, undesirable.

NeilBrown





* Re: mdadm --wait returns while array under construction? [patch question]
  2012-11-20 21:43 ` NeilBrown
  2012-11-21 16:43   ` Ross Boylan
@ 2012-11-27 18:28   ` Ross Boylan
  2012-11-27 21:30     ` NeilBrown
  1 sibling, 1 reply; 9+ messages in thread
From: Ross Boylan @ 2012-11-27 18:28 UTC (permalink / raw)
  To: NeilBrown; +Cc: ross, linux-raid

On Wed, 2012-11-21 at 08:43 +1100, NeilBrown wrote:
> Can you test this patch to see if it fixes the problem?
My source for 2.6.7.2 looks somewhat different.  It only has 627 lines;
I think this is the relevant code (at the end of the file):
/* Not really Monitor but ... */
int Wait(char *dev)
{
        struct stat stb;
        int devnum;
        int rv = 1;

        if (stat(dev, &stb) != 0) {
                fprintf(stderr, Name ": Cannot find %s: %s\n", dev,
                        strerror(errno));
                return 2;
        }
        if (major(stb.st_rdev) == MD_MAJOR)
                devnum = minor(stb.st_rdev);
        else
                devnum = -1-(minor(stb.st_rdev)/64);

        while(1) {
                struct mdstat_ent *ms = mdstat_read(1, 0);
                struct mdstat_ent *e;

                for (e=ms ; e; e=e->next)
                        if (e->devnum == devnum)
                                break;

                if (!e || e->percent < 0) {
                        free_mdstat(ms);
                        return rv;
                }
                free(ms);
                rv = 0;
                mdstat_wait(5);
        }
}


The section
                if (!e || e->percent < 0) {
                        free_mdstat(ms);
                        return rv;
is the only one with e->percent < 0.  Is it OK to change that to
if (!e || e->percent == RESYNC_NONE) {?

Thanks.
Ross
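
As an aside on the exit status of the quoted function: rv starts at 1 and
only becomes 0 after at least one pass through the loop, so a call that
finds nothing to wait for returns 1.  The sketch below replays that
control flow over a fixed list of percent readings instead of
/proc/mdstat; wait_result and the readings array are hypothetical
stand-ins, not mdadm code.

```c
/* Replay the control flow of the quoted Wait() loop over a fixed list
 * of percent readings.  A negative reading stands for "no resync in
 * progress", as in the quoted test.  If the readings run out, the
 * current rv is returned; the real loop would keep polling instead. */
static int wait_result(const int *readings, int n)
{
    int rv = 1;                 /* 1: returned without waiting at all */

    for (int i = 0; i < n; i++) {
        if (readings[i] < 0)
            return rv;
        rv = 0;                 /* 0: waited through at least one poll */
    }
    return rv;
}
```

Under this reading, the immediate return in the original report would
have come back with a non-zero status, which a script could check
before trusting that the rebuild had finished.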



* Re: mdadm --wait returns while array under construction? [patch question]
  2012-11-27 18:28   ` mdadm --wait returns while array under construction? [patch question] Ross Boylan
@ 2012-11-27 21:30     ` NeilBrown
  2012-11-28  2:10       ` Ross Boylan
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2012-11-27 21:30 UTC (permalink / raw)
  To: Ross Boylan; +Cc: linux-raid

On Tue, 27 Nov 2012 10:28:33 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

> The section
>                 if (!e || e->percent < 0) {
>                         free_mdstat(ms);
>                         return rv;
>  is the only one with e->percent < 0.  Is it OK to change that to 
> if (!e || e->percent == RESYNC_NONE) {?
> 
>

That's the right place to make the change, but it won't compile.
RESYNC_NONE isn't defined in that version of mdadm, and you would need to
make some changes in mdstat.c where ent->percent is set.
Current code has


				if (l > 8 && strcmp(w+l-8, "=DELAYED") == 0)
					ent->percent = RESYNC_DELAYED;
				if (l > 8 && strcmp(w+l-8, "=PENDING") == 0)
					ent->percent = RESYNC_PENDING;

which is completely missing from 2.6.7.2.  You'd be a lot better off starting
with 3.2.6 and adding the patch to that.

NeilBrown
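
The suffix test in the snippet above can be tried on its own.  This
sketch wraps the same two comparisons in a hypothetical helper,
classify_word; the numeric values of the RESYNC_* constants are
assumptions, and a real parser would apply this only to words taken from
the array's status line.

```c
#include <string.h>

/* Assumed sentinel values; mdadm's own definitions may differ. */
enum {
    RESYNC_NONE    = -1,
    RESYNC_DELAYED = -2,
    RESYNC_PENDING = -3,
};

/* Classify one whitespace-delimited word by its last eight characters.
 * The "l > 8" guard requires at least one character before the '='. */
static int classify_word(const char *w)
{
    size_t l = strlen(w);

    if (l > 8 && strcmp(w + l - 8, "=DELAYED") == 0)
        return RESYNC_DELAYED;
    if (l > 8 && strcmp(w + l - 8, "=PENDING") == 0)
        return RESYNC_PENDING;
    return RESYNC_NONE;
}
```

Because only the suffix is compared, the same test matches whatever
prefix precedes the '=' sign.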


* Re: mdadm --wait returns while array under construction? [patch question]
  2012-11-27 21:30     ` NeilBrown
@ 2012-11-28  2:10       ` Ross Boylan
  2012-11-29  1:35         ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Ross Boylan @ 2012-11-28  2:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: ross, linux-raid

On Wed, 2012-11-28 at 08:30 +1100, NeilBrown wrote:
> That's the right place to make the change, but it won't compile.
> RESYNC_NONE isn't defined in that version of mdadm, and you would need to
> make some changes in mdstat.c where ent->percent is set.
> Current code has
> 
> 
> 				if (l > 8 && strcmp(w+l-8, "=DELAYED") == 0)
> 					ent->percent = RESYNC_DELAYED;
> 				if (l > 8 && strcmp(w+l-8, "=PENDING") == 0)
> 					ent->percent = RESYNC_PENDING;
> 
> which is completely missing from 2.6.7.2.  You'd be a lot better off starting
> with 3.2.6 and adding the patch to that.
> 
> NeilBrown
I think I'm going to have to pass on testing for now, as the
alternatives appear too high risk:
1) I got the debianized source for 3.2.5 (for some reason 3.2.6 is not
there yet).  It depends on a variety of package versions that post-date
my lenny system.  So it will not install unless I override those, or
locate/backport more recent versions of the other packages.  Since this
is messing with core areas of the system (grub, udev, initscripts) it
seems unwise to attempt backports.

2) I considered patching 2.6.7.2 in place with the additional info you
provided, but I'm not sure if you're saying the mdstat.c changes alone
are sufficient, or if I need to change Monitor.c in some way.

3) I could just dump your 3.2.6 upstream source over my current 2.6.7.2
Debianized directory.  But then I'd need to figure out what Debian
patches I need to reapply, and wonder if it would all work in a Lenny
environment.

I'd like to help, but since this is just a reporting problem for me I
don't want to risk screwing things up further.  I might be able to do 2)
with a little more information.

BTW, I reviewed the udev rules for mdadm on my system and in the 2.6.7.2
package, and it does not appear that incremental assembly is being
attempted.  That's not relevant to this thread, but does matter for
some of my other ones.  Also, the 3.2.5 Debian package's udev rules say
## DISABLED: Incremental udev assembly disabled
## ** this is a Debian-specific change **
GOTO="md_inc_skip"





* Re: mdadm --wait returns while array under construction? [patch question]
  2012-11-28  2:10       ` Ross Boylan
@ 2012-11-29  1:35         ` NeilBrown
  0 siblings, 0 replies; 9+ messages in thread
From: NeilBrown @ 2012-11-29  1:35 UTC (permalink / raw)
  To: Ross Boylan; +Cc: linux-raid

On Tue, 27 Nov 2012 18:10:20 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:

> On Wed, 2012-11-28 at 08:30 +1100, NeilBrown wrote:
> > On Tue, 27 Nov 2012 10:28:33 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > 
> > > On Wed, 2012-11-21 at 08:43 +1100, NeilBrown wrote:
> > > > On Tue, 20 Nov 2012 09:55:41 -0800 Ross Boylan <ross@biostat.ucsf.edu> wrote:
> > > > 
> > > > > While switching the disks a RAID 1 is based on I used the --wait command
> > > > > to wait for the rebuild to finish.  It returned immediately, but a
> > > > > subsequent query showed it had not been rebuilt.  Have I misunderstood
> > > > > something, or is this an error?
> > > > > 
> > > > > While doing these commands a much larger rebuild was going on with a
> > > > > different array, involving some of the same physical disks but different
> > > > > partitions.  The partitions being rebuilt are on different physical
> > > > > disks for the different arrays.
> > > > > 
> > > > > Here are the logs, with version info at the end (Debian Lenny + more
> > > > > recent kernel):
> > > > ....
> > > > 
> > > > > markov:~# uname -a
> > > > > Linux markov 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
> > > > > markov:~# mdadm --version
> > > > > mdadm - v2.6.7.2 - 14th November 2008
> > > > > 
> > > > > 
> > > > > I notice that in this case, unlike the other array, the message during
> > > > > the rebuild (the last detail report) does not include a line like
> > > > > Rebuild Status : 0% complete
> > > > > 
> > > > > I just tried --wait again to see if there was some kind of race, but
> > > > > once again it returned immediately, though detail says the spare is
> > > > > rebuilding.
> > > > 
> > > > Can you test this patch to see if it fixes the problem?
> > > > 
> > > > diff --git a/Monitor.c b/Monitor.c
> > > > index c4d57c3..a5e7aaa 100644
> > > > --- a/Monitor.c
> > > > +++ b/Monitor.c
> > > > @@ -973,7 +973,7 @@ int Wait(char *dev)
> > > >  			if (e->devnum == devnum)
> > > >  				break;
> > > >  
> > > > -		if (!e || e->percent < 0) {
> > > > +		if (!e || e->percent == RESYNC_NONE) {
> > > >  			if (e && e->metadata_version &&
> > > >  			    strncmp(e->metadata_version, "external:", 9) == 0) {
> > > >  				if (is_subarray(&e->metadata_version[9]))
> > > > 
> > > > 
> > > > NeilBrown
> > > My source for 2.6.7.2 looks somewhat different.  It only has 627 lines;
> > > I think this is the relevant code (at the end of the file):
> > > /* Not really Monitor but ... */
> > > int Wait(char *dev)
> > > {
> > >         struct stat stb;
> > >         int devnum;
> > >         int rv = 1;
> > > 
> > >         if (stat(dev, &stb) != 0) {
> > >                 fprintf(stderr, Name ": Cannot find %s: %s\n", dev,
> > >                         strerror(errno));
> > >                 return 2;
> > >         }
> > >         if (major(stb.st_rdev) == MD_MAJOR)
> > >                 devnum = minor(stb.st_rdev);
> > >         else
> > >                 devnum = -1-(minor(stb.st_rdev)/64);
> > > 
> > >         while(1) {
> > >                 struct mdstat_ent *ms = mdstat_read(1, 0);
> > >                 struct mdstat_ent *e;
> > > 
> > >                 for (e=ms ; e; e=e->next)
> > >                         if (e->devnum == devnum)
> > >                                 break;
> > > 
> > >                 if (!e || e->percent < 0) {
> > >                         free_mdstat(ms);
> > >                         return rv;
> > >                 }
> > >                 free(ms);
> > >                 rv = 0;
> > >                 mdstat_wait(5);
> > >         }
> > > }
> > > 
> > > 
> > > The section
> > >                 if (!e || e->percent < 0) {
> > >                         free_mdstat(ms);
> > >                         return rv;
> > >  is the only one with e->percent < 0.  Is it OK to change that to 
> > > if (!e || e->percent == RESYNC_NONE) {?
> > > 
> > >
> > 
> > That's the right place to make the change, but it won't compile.
> > RESYNC_NONE isn't defined in that version of mdadm, and you would need to
> > make some changes in mdstat.c where ent->percent is set.
> > Current code has
> > 
> > 
> > 				if (l > 8 && strcmp(w+l-8, "=DELAYED") == 0)
> > 					ent->percent = RESYNC_DELAYED;
> > 				if (l > 8 && strcmp(w+l-8, "=PENDING") == 0)
> > 					ent->percent = RESYNC_PENDING;
> > 
> > which is completely missing from 2.6.7.2.  You'd be a lot better off starting
> > with 3.2.6 and adding the patch to that.
> > 
> > NeilBrown
> I think I'm going to have to pass on testing for now, as the
> alternatives appear too high risk:
> 1) I got the debianized source for 3.2.5 (for some reason 3.2.6 is not
> there yet).  It depends on a variety of package versions that post-date
> my lenny system.  So it will not install unless I override those, or
> locate/backport more recent versions of the other packages.  Since this
> is messing with core areas of the system (grub, udev, initscripts) it
> seems unwise to attempt backports.
> 
> 2) I considered patching 2.6.7.2 in place with the additional info you
> provided, but I'm not sure if you're saying the mdstat.c changes alone
> are sufficient, or if I need to change Monitor.c in some way.

Looks like I communicated quite effectively :-)  I'm not sure.  I thought
about making a patch for 2.6.7.2 and quickly decided that just upgrading
would be easiest.
You don't need to use the Debian version.  Just
  git clone git://neil.brown.name/mdadm
  cd mdadm
  git checkout 3.2.5
  make
  make install

Of course you would void your support contract with Debian....

> 
> 3) I could just dump your 3.2.6 upstream source over my current 2.6.7.2
> Debianized directory.  But then I'd need to figure out what Debian
> patches I need to reapply, and wonder if it would all work in a Lenny
> environment.

I don't think you need any Debian patches.

> 
> I'd like to help, but since this is just a reporting problem for me I
> don't want to risk screwing things up further.  I might be able to do 2)
> with a little more information.
> 
> BTW, I reviewed the udev rules for mdadm on my system and in the 2.6.7.2
> package, and it does not appear that incremental assembly is being
> attempted.  That's not relevant to this thread, but does matter for
> some of my other ones.  Also, the 3.2.5 Debian package's udev rules say
> ## DISABLED: Incremental udev assembly disabled
> ## ** this is a Debian-specific change **
> GOTO="md_inc_skip"
> 
> 

Ahhh.. "make install" will change the udev script.  So maybe "make install"
wouldn't quite be such a good idea.

NeilBrown
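
For readers following the patch discussion, here is a minimal sketch of
why the test in Wait() matters.  It assumes, as the quoted mdstat.c
fragment suggests, that a delayed or pending resync is recorded in
ent->percent as a distinct negative sentinel.  The numeric values below
and the helper names classify_word(), old_wait_is_done() and
new_wait_is_done() are hypothetical, chosen only for illustration; they
are not taken from the mdadm source.

```c
#include <string.h>

/* Hypothetical sentinel values: the thread names these constants
 * but does not show their definitions. */
enum {
    RESYNC_NONE    = -1,  /* no resync/recovery reported */
    RESYNC_DELAYED = -2,  /* resync queued behind another array */
    RESYNC_PENDING = -3   /* resync not yet started */
};

/* Classify one whitespace-delimited word from an mdstat status line,
 * using the same suffix test as the quoted mdstat.c fragment.
 * Returns `current` unchanged if the word carries no such marker. */
int classify_word(const char *w, int current)
{
    size_t l = strlen(w);

    if (l > 8 && strcmp(w + l - 8, "=DELAYED") == 0)
        return RESYNC_DELAYED;
    if (l > 8 && strcmp(w + l - 8, "=PENDING") == 0)
        return RESYNC_PENDING;
    return current;
}

/* Pre-patch test: any negative value is treated as "nothing to wait for". */
int old_wait_is_done(int percent)
{
    return percent < 0;
}

/* Patched test: only the explicit "no resync" sentinel ends the wait. */
int new_wait_is_done(int percent)
{
    return percent == RESYNC_NONE;
}
```

Under these assumptions, a word such as "resync=DELAYED" yields
RESYNC_DELAYED, which the old test treats as finished (so --wait returns
at once) while the patched test keeps waiting.  That would be consistent
with the report of a rebuild queued behind a larger one on the same
physical disks.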




^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-11-29  1:35 UTC | newest]

Thread overview: 9+ messages
2012-11-20 17:55 mdadm --wait returns while array under construction? Ross Boylan
2012-11-20 18:22 ` Ross Boylan
2012-11-20 21:43 ` NeilBrown
2012-11-21 16:43   ` Ross Boylan
2012-11-22  6:09     ` NeilBrown
2012-11-27 18:28   ` mdadm --wait returns while array under construction? [patch question] Ross Boylan
2012-11-27 21:30     ` NeilBrown
2012-11-28  2:10       ` Ross Boylan
2012-11-29  1:35         ` NeilBrown
