* Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-18 1:26 UTC
To: linux-raid
I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
version 2.6.23-rc3 was the kernel I STARTED on this.
I added the device to the array :
mdadm --add /dev/md0 /dev/sdb1
Then I started to grow the array :
mdadm --grow /dev/md0 --raid-devices=5
At this point the machine locked up. Not good.
I ended up having to hard reboot. Now, I have the following in dmesg :
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
/proc/mdstat is :
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
2441918720 blocks super 0.91
unused devices: <none>
It doesn't look like it actually DID anything besides update the raid
count to 5 from 4. (I think.)
How do I do a manual recovery on this?
Examining the disks:
mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb87b - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 4 8 17 4 active sync /dev/sdb1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb889 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 3 8 33 3 active sync /dev/sdc1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb897 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
/dev/sde1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb8a5 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 1 8 65 1 active sync /dev/sde1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
/dev/sdf1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb8b3 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 0 8 81 0 active sync /dev/sdf1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
* Re: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-18 8:56 UTC
To: Greg Nicholson; +Cc: linux-raid

On Friday August 17, d0gz.net@gmail.com wrote:
> I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
> version 2.6.23-rc3 was the kernel I STARTED on this.
>
> I added the device to the array :
> mdadm --add /dev/md0 /dev/sdb1
>
> Then I started to grow the array :
> mdadm --grow /dev/md0 --raid-devices=5
>
> At this point the machine locked up. Not good.

No, not good. But it shouldn't be fatal.

> I ended up having to hard reboot. Now, I have the following in dmesg :
>
> md: md0: raid array is not clean -- starting background reconstruction
> raid5: reshape_position too early for auto-recovery - aborting.
> md: pers->run() failed ...

Looks like you crashed during the 'critical' period.

> /proc/mdstat is :
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
>       2441918720 blocks super 0.91
>
> unused devices: <none>
>
> It doesn't look like it actually DID anything besides update the raid
> count to 5 from 4. (I think.)
>
> How do I do a manual recovery on this?

Simply use mdadm to assemble the array:

  mdadm -A /dev/md0 /dev/sd[bcdef]1

It should notice that the kernel needs help, and will provide
that help.
Specifically, when you started the 'grow', mdadm copied the first few
stripes into unused space in the new device. When you re-assemble, it
will copy those stripes back into the new layout, then let the kernel
do the rest.

Please let us know how it goes.

NeilBrown
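
A rough illustration of the recovery sequence Neil describes above. These
exact commands are not taken from the thread; the device and array names are
simply the ones used in this report:

  # Stop the half-started array, then reassemble it. mdadm should notice the
  # interrupted reshape and restore the critical section from the backup it
  # wrote onto the new device when --grow was started.
  mdadm --stop /dev/md0
  mdadm --assemble /dev/md0 /dev/sd[bcdef]1
  # If that succeeds, the reshape resumes in the background.
  cat /proc/mdstat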
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-18 15:37 UTC
To: Neil Brown; +Cc: linux-raid

On 8/18/07, Neil Brown <neilb@suse.de> wrote:
> On Friday August 17, d0gz.net@gmail.com wrote:
> > I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
> > version 2.6.23-rc3 was the kernel I STARTED on this.
> >
> > I added the device to the array :
> > mdadm --add /dev/md0 /dev/sdb1
> >
> > Then I started to grow the array :
> > mdadm --grow /dev/md0 --raid-devices=5
> >
> > At this point the machine locked up. Not good.
>
> No, not good. But it shouldn't be fatal.

Well, that was my thought as well.

> > I ended up having to hard reboot. Now, I have the following in dmesg :
> >
> > md: md0: raid array is not clean -- starting background reconstruction
> > raid5: reshape_position too early for auto-recovery - aborting.
> > md: pers->run() failed ...
>
> Looks like you crashed during the 'critical' period.
>
> > /proc/mdstat is :
> >
> > Personalities : [raid6] [raid5] [raid4]
> > md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
> >       2441918720 blocks super 0.91
> >
> > unused devices: <none>
> >
> > It doesn't look like it actually DID anything besides update the raid
> > count to 5 from 4. (I think.)
> >
> > How do I do a manual recovery on this?
>
> Simply use mdadm to assemble the array:
>
>   mdadm -A /dev/md0 /dev/sd[bcdef]1
>
> It should notice that the kernel needs help, and will provide
> that help.
> Specifically, when you started the 'grow', mdadm copied the first few
> stripes into unused space in the new device. When you re-assemble, it
> will copy those stripes back into the new layout, then let the kernel
> do the rest.
>
> Please let us know how it goes.
>
> NeilBrown

I had already tried to assemble it by hand, before I basically said...
WAIT. Ask for help. Don't screw up more. :)

But I tried again:

root@excimer { }$ mdadm -A /dev/md0 /dev/sd[bcdef]1
mdadm: device /dev/md0 already active - cannot assemble it
root@excimer { ~ }$ mdadm -S /dev/md0
mdadm: stopped /dev/md0
root@excimer { ~ }$ mdadm -A /dev/md0 /dev/sd[bcdef]1
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument

Dmesg shows:

md: md0 stopped.
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sde1>
md: export_rdev(sde1)
md: md0 stopped.
md: bind<sde1>
md: bind<sdd1>
md: bind<sdc1>
md: bind<sdb1>
md: bind<sdf1>
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
md: md0 stopped.
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sde1>
md: export_rdev(sde1)
md: md0 stopped.
md: bind<sde1>
md: bind<sdd1>
md: bind<sdc1>
md: bind<sdb1>
md: bind<sdf1>
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...

And the raid stays in an inactive state. Using mdadm v2.6.2 and kernel
2.6.23-rc3, although I can push back to earlier versions easily if it
would help.

I know that sdb1 is the new device. When mdadm ran, it said the critical
section was 3920k (approximately).
When I didn't get a response for five minutes, and there wasn't ANY disk
activity, I booted the box.

Based on your message and the man page, it sounds like mdadm should have
placed something on sdb1. So... trying to be non-destructive, but still
gather information:

dd if=/dev/sdb1 of=/tmp/test bs=1024k count=1000
hexdump /tmp/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000

dd if=/dev/sdb1 of=/tmp/test bs=1024k count=1000 skip=999
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 35.0176 seconds, 29.9 MB/s
root@excimer { ~ }$ hexdump /tmp/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000

That looks to me like the first 2 gig is completely empty on the
drive. I really don't think it actually started to do anything.

Do you have further suggestions on where to go now?

Oh, and thank you very much for your help. Most of the data on this
array I can stand to lose... It's not critical, but there are some of
my photographs on this that my backup is out of date on. I can
destroy it all and start over, but really want to try to recover this
if it's possible. For that matter, if it didn't actually start
rewriting the stripes, is there any way to push it back down to 4 disks
to recover?
* Re: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-19 11:17 UTC
To: Greg Nicholson; +Cc: linux-raid

On Saturday August 18, d0gz.net@gmail.com wrote:
>
> That looks to me like the first 2 gig is completely empty on the
> drive. I really don't think it actually started to do anything.

The backup data is near the end of the device. If you look at the
last 2 gig you should see something.

>
> Do you have further suggestions on where to go now?

Maybe an 'strace' of "mdadm -A ...." might show me something.

If you feel like following the code, Assemble (in Assemble.c) should
call Grow_restart.
This should look in /dev/sdb1 (which is already open in 'fdlist') by
calling 'load_super'. It should then seek to 8 sectors before the
superblock (or close to there) and read a secondary superblock which
describes the backup data.
If this looks good, it seeks to where the backup data is (which is
towards the end of the device) and reads that. It uses this to
restore the 'critical section', and then updates the superblock on all
devices.

As you aren't getting the message 'restoring critical section',
something is going wrong before there. It should fail with:
  /dev/md0: Failed to restore critical section for reshape, sorry.
but I can see that there is a problem with the error return from
'Grow_restart'. I'll get that fixed.

>
> Oh, and thank you very much for your help. Most of the data on this
> array I can stand to lose... It's not critical, but there are some of
> my photographs on this that my backup is out of date on. I can
> destroy it all and start over, but really want to try to recover this
> if it's possible. For that matter, if it didn't actually start
> rewriting the stripes, is there any way to push it back down to 4 disks
> to recover?

You could always just recreate the array:

  mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
      /dev/sdd1 /dev/sdc1

and make sure the data looks good (which it should).

I'd still like to know what the problem is, though....

Thanks,
NeilBrown
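
A hedged illustration of "look at the last 2 gig": the commands below are not
from the thread, and the exact offset of the backup area is an assumption
(with 0.90 metadata the superblock sits near the end of the partition, and
the reshape-backup header sits just below it), but something like this shows
whether anything was written there at all:

  # size of the new member in bytes
  SIZE=$(blockdev --getsize64 /dev/sdb1)
  # dump the last couple of megabytes and look for the backup header;
  # all zeroes here would mean nothing was ever saved
  dd if=/dev/sdb1 bs=1M skip=$(( SIZE / 1048576 - 2 )) 2>/dev/null \
      | hexdump -C | grep -m1 md_backup_data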
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-19 15:45 UTC
To: Neil Brown; +Cc: linux-raid

On 8/19/07, Neil Brown <neilb@suse.de> wrote:
> On Saturday August 18, d0gz.net@gmail.com wrote:
> >
> > That looks to me like the first 2 gig is completely empty on the
> > drive. I really don't think it actually started to do anything.
>
> The backup data is near the end of the device. If you look at the
> last 2 gig you should see something.

I figured something like that after I started thinking about it...
That device is currently offline while I do some DD's to new devices.

> >
> > Do you have further suggestions on where to go now?
>
> Maybe an 'strace' of "mdadm -A ...." might show me something.
>
> If you feel like following the code, Assemble (in Assemble.c) should
> call Grow_restart.
> This should look in /dev/sdb1 (which is already open in 'fdlist') by
> calling 'load_super'. It should then seek to 8 sectors before the
> superblock (or close to there) and read a secondary superblock which
> describes the backup data.
> If this looks good, it seeks to where the backup data is (which is
> towards the end of the device) and reads that. It uses this to
> restore the 'critical section', and then updates the superblock on all
> devices.
>
> As you aren't getting the message 'restoring critical section',
> something is going wrong before there. It should fail with:
>   /dev/md0: Failed to restore critical section for reshape, sorry.
> but I can see that there is a problem with the error return from
> 'Grow_restart'. I'll get that fixed.
>
> >
> > Oh, and thank you very much for your help. Most of the data on this
> > array I can stand to lose... It's not critical, but there are some of
> > my photographs on this that my backup is out of date on. I can
> > destroy it all and start over, but really want to try to recover this
> > if it's possible. For that matter, if it didn't actually start
> > rewriting the stripes, is there any way to push it back down to 4 disks
> > to recover?
>
> You could always just recreate the array:
>
>   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
>       /dev/sdd1 /dev/sdc1
>
> and make sure the data looks good (which it should).
>
> I'd still like to know what the problem is, though....
>
> Thanks,
> NeilBrown

My current plan of attack, which I've been proceeding upon for the
last 24 hours: I'm DDing the original drives to new devices. Once I
have copies of the drives, I'm going to try to recreate the array as a
4-device array. Hopefully, at that point, the raid will come up, LVM
will initialize, and it's time to saturate the GigE offloading
EVERYTHING.

Assuming the above goes well.... which will definitely take some time,
then I'll take the original drives, run the strace and try to get some
additional data for you. I'd love to know what's up with this as
well. If there is additional information I can get you to help, let
me know. I've grown several arrays before without any issue, which
frankly is why I didn't think this would have been an issue.... thus,
my offload of the stuff I actually cared about wasn't up to date.

At the end of the day (or more likely, week) I'll completely destroy the
existing raid, and rebuild the entire thing to make sure I'm starting
from a good base. At least at that point, I'll have additional
drives. Given that I have dual file-servers that will have drives
added, it seems likely that I'll be testing the code again soon. The big
difference being that this time, I won't make the assumption that
everything will be perfect. :)

Thanks again for your help. I'll post my results as well as try to
get you that strace. It's been quite a while since I dove into kernel
internals, or C for that matter, so it's unlikely I'm going to find
anything myself.... But I'll definitely send results back if I can.
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-20 2:44 UTC
To: Neil Brown; +Cc: linux-raid

On 8/19/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> On 8/19/07, Neil Brown <neilb@suse.de> wrote:
> > On Saturday August 18, d0gz.net@gmail.com wrote:
> > >
> > > That looks to me like the first 2 gig is completely empty on the
> > > drive. I really don't think it actually started to do anything.
> >
> > The backup data is near the end of the device. If you look at the
> > last 2 gig you should see something.
>
> I figured something like that after I started thinking about it...
> That device is currently offline while I do some DD's to new devices.
>
> > >
> > > Do you have further suggestions on where to go now?
> >
> > Maybe an 'strace' of "mdadm -A ...." might show me something.
> >
> > If you feel like following the code, Assemble (in Assemble.c) should
> > call Grow_restart.
> > This should look in /dev/sdb1 (which is already open in 'fdlist') by
> > calling 'load_super'. It should then seek to 8 sectors before the
> > superblock (or close to there) and read a secondary superblock which
> > describes the backup data.
> > If this looks good, it seeks to where the backup data is (which is
> > towards the end of the device) and reads that. It uses this to
> > restore the 'critical section', and then updates the superblock on all
> > devices.
> >
> > As you aren't getting the message 'restoring critical section',
> > something is going wrong before there. It should fail with:
> >   /dev/md0: Failed to restore critical section for reshape, sorry.
> > but I can see that there is a problem with the error return from
> > 'Grow_restart'. I'll get that fixed.
> >
> > >
> > > Oh, and thank you very much for your help. Most of the data on this
> > > array I can stand to lose... It's not critical, but there are some of
> > > my photographs on this that my backup is out of date on. I can
> > > destroy it all and start over, but really want to try to recover this
> > > if it's possible. For that matter, if it didn't actually start
> > > rewriting the stripes, is there any way to push it back down to 4 disks
> > > to recover?
> >
> > You could always just recreate the array:
> >
> >   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
> >       /dev/sdd1 /dev/sdc1
> >
> > and make sure the data looks good (which it should).
> >
> > I'd still like to know what the problem is, though....
> >
> > Thanks,
> > NeilBrown
>
> My current plan of attack, which I've been proceeding upon for the
> last 24 hours: I'm DDing the original drives to new devices. Once I
> have copies of the drives, I'm going to try to recreate the array as a
> 4-device array. Hopefully, at that point, the raid will come up, LVM
> will initialize, and it's time to saturate the GigE offloading
> EVERYTHING.
>
> Assuming the above goes well.... which will definitely take some time,
> then I'll take the original drives, run the strace and try to get some
> additional data for you. I'd love to know what's up with this as
> well. If there is additional information I can get you to help, let
> me know. I've grown several arrays before without any issue, which
> frankly is why I didn't think this would have been an issue.... thus,
> my offload of the stuff I actually cared about wasn't up to date.
>
> At the end of the day (or more likely, week) I'll completely destroy the
> existing raid, and rebuild the entire thing to make sure I'm starting
> from a good base. At least at that point, I'll have additional
> drives. Given that I have dual file-servers that will have drives
> added, it seems likely that I'll be testing the code again soon. The big
> difference being that this time, I won't make the assumption that
> everything will be perfect. :)
>
> Thanks again for your help. I'll post my results as well as try to
> get you that strace. It's been quite a while since I dove into kernel
> internals, or C for that matter, so it's unlikely I'm going to find
> anything myself.... But I'll definitely send results back if I can.
>

Ok, as an update: ORDER MATTERS. :)

The above command didn't work. It built, but LVM didn't recognize the
result. So, after despair, I thought: that's not the way I built it. So
I redid it in alphabetical order... and it worked.

I'm in the process of tarring everything up and pulling it off.

Once that is done, I'll put the original drives back in, and try to
understand what went wrong with the original grow/build.
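
A hedged aside on the "order matters" point: the original slot order is
recorded in each member's superblock, so the device list for --create can be
checked rather than guessed. The grep pattern below is only an illustration:

  # each member records its own slot; the "this" line shows its RaidDevice
  for d in /dev/sd[cdef]1; do
      echo "== $d"
      mdadm -E "$d" | grep -E 'this|Raid Devices|Chunk Size'
  done
  # then pass the devices to mdadm --create in ascending RaidDevice order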
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-21 2:09 UTC
To: Neil Brown; +Cc: linux-raid

On 8/19/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> On 8/19/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> > On 8/19/07, Neil Brown <neilb@suse.de> wrote:
> > > On Saturday August 18, d0gz.net@gmail.com wrote:
> > > >
> > > > That looks to me like the first 2 gig is completely empty on the
> > > > drive. I really don't think it actually started to do anything.
> > >
> > > The backup data is near the end of the device. If you look at the
> > > last 2 gig you should see something.
> >
> > I figured something like that after I started thinking about it...
> > That device is currently offline while I do some DD's to new devices.
> >
> > > >
> > > > Do you have further suggestions on where to go now?
> > >
> > > Maybe an 'strace' of "mdadm -A ...." might show me something.
> > >
> > > If you feel like following the code, Assemble (in Assemble.c) should
> > > call Grow_restart.
> > > This should look in /dev/sdb1 (which is already open in 'fdlist') by
> > > calling 'load_super'. It should then seek to 8 sectors before the
> > > superblock (or close to there) and read a secondary superblock which
> > > describes the backup data.
> > > If this looks good, it seeks to where the backup data is (which is
> > > towards the end of the device) and reads that. It uses this to
> > > restore the 'critical section', and then updates the superblock on all
> > > devices.
> > >
> > > As you aren't getting the message 'restoring critical section',
> > > something is going wrong before there. It should fail with:
> > >   /dev/md0: Failed to restore critical section for reshape, sorry.
> > > but I can see that there is a problem with the error return from
> > > 'Grow_restart'. I'll get that fixed.
> > >
> > > >
> > > > Oh, and thank you very much for your help. Most of the data on this
> > > > array I can stand to lose... It's not critical, but there are some of
> > > > my photographs on this that my backup is out of date on. I can
> > > > destroy it all and start over, but really want to try to recover this
> > > > if it's possible. For that matter, if it didn't actually start
> > > > rewriting the stripes, is there any way to push it back down to 4 disks
> > > > to recover?
> > >
> > > You could always just recreate the array:
> > >
> > >   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
> > >       /dev/sdd1 /dev/sdc1
> > >
> > > and make sure the data looks good (which it should).
> > >
> > > I'd still like to know what the problem is, though....
> > >
> > > Thanks,
> > > NeilBrown
> >
> > My current plan of attack, which I've been proceeding upon for the
> > last 24 hours: I'm DDing the original drives to new devices. Once I
> > have copies of the drives, I'm going to try to recreate the array as a
> > 4-device array. Hopefully, at that point, the raid will come up, LVM
> > will initialize, and it's time to saturate the GigE offloading
> > EVERYTHING.
> >
> > Assuming the above goes well.... which will definitely take some time,
> > then I'll take the original drives, run the strace and try to get some
> > additional data for you. I'd love to know what's up with this as
> > well. If there is additional information I can get you to help, let
> > me know. I've grown several arrays before without any issue, which
> > frankly is why I didn't think this would have been an issue.... thus,
> > my offload of the stuff I actually cared about wasn't up to date.
> >
> > At the end of the day (or more likely, week) I'll completely destroy the
> > existing raid, and rebuild the entire thing to make sure I'm starting
> > from a good base. At least at that point, I'll have additional
> > drives. Given that I have dual file-servers that will have drives
> > added, it seems likely that I'll be testing the code again soon. The big
> > difference being that this time, I won't make the assumption that
> > everything will be perfect. :)
> >
> > Thanks again for your help. I'll post my results as well as try to
> > get you that strace. It's been quite a while since I dove into kernel
> > internals, or C for that matter, so it's unlikely I'm going to find
> > anything myself.... But I'll definitely send results back if I can.
> >
>
> Ok, as an update: ORDER MATTERS. :)
>
> The above command didn't work. It built, but LVM didn't recognize the
> result. So, after despair, I thought: that's not the way I built it. So
> I redid it in alphabetical order... and it worked.
>
> I'm in the process of tarring everything up and pulling it off.
>
> Once that is done, I'll put the original drives back in, and try to
> understand what went wrong with the original grow/build.
>

And as a final update... I pulled all the data from the 4-disk array I
built from the copied disks. Everything looks to be intact. That is
definitely a better feeling for me.

I then put the original disks back in, and compiled mdadm 2.6.3 to see
if it did any better on the assemble. It appears that your update about
the missing critical section was successful, as mdadm cheerfully
informed me I was out of luck. :)

I'm attaching the strace, even though I don't think it will be of much
help... It appears that you solved the critical section failure.... at
least it's verbose about telling you. I still don't know what happened
originally... I think I had an older copy of mdadm in my path, and that
could have been the issue. Obviously that's no longer the case.

I'll be using the backup-file flag from now on, and probably won't be
quite as daring about flying without a (current, tested) net. :)

Thanks for your help again.
Attached strace from 2.6.3 root@excimer { ~/mdadm-2.6.3 }$ strace mdadm -A /dev/md0 /dev/sd[bcdef]1 execve("/sbin/mdadm", ["mdadm", "-A", "/dev/md0", "/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1", "/dev/sdf1"], [/* 20 vars */]) = 0 uname({sys="Linux", node="excimer", ...}) = 0 brk(0) = 0x807b000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f78000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=51870, ...}) = 0 mmap2(NULL, 51870, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6b000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/tls/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240O\1"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0644, st_size=1241392, ...}) = 0 mmap2(NULL, 1251484, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e39000 mmap2(0xb7f61000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x127) = 0xb7f61000 mmap2(0xb7f68000, 10396, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f68000 close(3) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e38000 mprotect(0xb7f61000, 20480, PROT_READ) = 0 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e388e0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 munmap(0xb7f6b000, 51870) = 0 time(NULL) = 1187661547 getpid() = 5453 brk(0) = 0x807b000 brk(0x809c000) = 0x809c000 open("/etc/mdadm.conf", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/mdadm/mdadm.conf", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=191, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f77000 read(3, "DEVICE /dev/sda1\nDEVICE /dev/sdb"..., 4096) = 191 read(3, "", 4096) = 0 read(3, "", 4096) = 0 close(3) = 0 munmap(0xb7f77000, 4096) = 0 stat64("/dev/md0", {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 0), ...}) = 0 open("/dev/md0", O_RDWR) = 3 fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 0), ...}) = 0 ioctl(3, 0x800c0910, 0xbf828344) = 0 uname({sys="Linux", node="excimer", ...}) = 0 fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 0), ...}) = 0 ioctl(3, 0x800c0910, 0xbf8280d4) = 0 ioctl(3, 0x80480911, 0xbf8282b8) = -1 ENODEV (No such device) ioctl(3, 0x932, 0) = 0 open("/dev/sdb1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf827fa0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 ioctl(4, BLKGETSIZE64, 0xbf827ee0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105207808, [500105207808], SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 ioctl(4, BLKGETSIZE64, 0xbf827ee0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 ioctl(4, BLKGETSIZE64, 0xbf827ee0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 4096, [4096], SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 ioctl(4, BLKGETSIZE64, 0xbf827fa0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 
read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdc1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdd1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sde1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 65), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdf1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdb1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0 open("/dev/sdc1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) 
ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0 open("/dev/sdd1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0 open("/dev/sde1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sde1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 65), ...}) = 0 open("/dev/sdf1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) 
= -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdf1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0 open("/dev/sdc1", O_RDONLY|O_EXCL) = 4 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdc1", O_RDWR|O_EXCL) = 4 open("/dev/sdd1", O_RDWR|O_EXCL) = 5 open("/dev/sde1", O_RDWR|O_EXCL) = 6 open("/dev/sdf1", O_RDWR|O_EXCL) = 7 open("/dev/sdb1", O_RDWR|O_EXCL) = 8 ioctl(8, BLKGETSIZE64, 0xbf826ed0) = 0 ioctl(8, BLKFLSBUF, 0) = 0 _llseek(8, 500105150464, [500105150464], SEEK_SET) = 0 read(8, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 _llseek(8, 500104949760, [500104949760], SEEK_SET) = 0 read(8, "md_backup_data-1\323r\244\251\2\306\206\225xo\5\177\267"..., 68) = 68 close(8) = 0 close(7) = 0 close(6) = 0 close(5) = 0 close(4) = 0 write(2, "mdadm: Failed to restore critica"..., 62mdadm: Failed to restore critical section for reshape, sorry. ) = 62 exit_group(1) = ? Process 5453 detached ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-23 15:04 UTC
To: Neil Brown; +Cc: linux-raid

<Trimming tons of detail, but keeping the thread>

OK.... I've reproduced the original issue on a separate box.
2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.

mdadm --add /dev/md0 /dev/sda1
mdadm -G --backup-file=/root/backup.raid.file /dev/md0

(Yes, I added the backup-file this time... just to be sure.)

mdadm began the grow, and stopped in the critical section, or right
after creating the backup... Not sure which. Reboot.

Refused to start the array. So...

mdadm -A /dev/md0 /dev/sd[abdefg]1

and we have in /proc/mdstat:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
      1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  reshape =  0.0% (512/488383872) finish=378469.4min speed=0K/sec

unused devices: <none>

And it's sat there without change for the past 2 hours. Now, I have a
backup, so frankly, I'm about to blow away the array and just recreate
it, but I thought you should know.

I've got the stripe_cache_size at 8192... 256 and 1024 don't change anything.
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-23 15:06 UTC
To: Neil Brown; +Cc: linux-raid

On 8/23/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> <Trimming tons of detail, but keeping the thread>
>
> OK.... I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.
>
> mdadm --add /dev/md0 /dev/sda1
> mdadm -G --backup-file=/root/backup.raid.file /dev/md0
>
> (Yes, I added the backup-file this time... just to be sure.)
>
> mdadm began the grow, and stopped in the critical section, or right
> after creating the backup... Not sure which. Reboot.
>
> Refused to start the array. So...
>
> mdadm -A /dev/md0 /dev/sd[abdefg]1
>
> and we have in /proc/mdstat:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
>       1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
>       [>....................]  reshape =  0.0% (512/488383872) finish=378469.4min speed=0K/sec
>
> unused devices: <none>
>
> And it's sat there without change for the past 2 hours. Now, I have a
> backup, so frankly, I'm about to blow away the array and just recreate
> it, but I thought you should know.
>
> I've got the stripe_cache_size at 8192... 256 and 1024 don't change anything.
>

Forgot the DMESG output:

md: bind<sde1>
md: bind<sdb1>
md: bind<sdd1>
md: bind<sdf1>
md: bind<sda1>
md: bind<sdg1>
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape will continue
raid5: device sdg1 operational as raid disk 0
raid5: device sda1 operational as raid disk 5
raid5: device sdf1 operational as raid disk 4
raid5: device sdd1 operational as raid disk 3
raid5: device sdb1 operational as raid disk 2
raid5: device sde1 operational as raid disk 1
raid5: allocated 6293kB for md0
raid5: raid level 5 set md0 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
 --- rd:6 wd:6
 disk 0, o:1, dev:sdg1
 disk 1, o:1, dev:sde1
 disk 2, o:1, dev:sdb1
 disk 3, o:1, dev:sdd1
 disk 4, o:1, dev:sdf1
 disk 5, o:1, dev:sda1
...ok start reshape thread
md: reshape of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for reshape.
md: using 128k window, over a total of 488383872 blocks.

Looks good, but it doesn't actually do anything.
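
For a reshape that sits at 0% like this, a few non-destructive checks are
worth recording. These commands are illustrative rather than taken from the
thread, and assume the md sysfs attributes of kernels of this era:

  # confirm md thinks a reshape is running, and whether it is advancing
  cat /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/sync_completed
  cat /proc/mdstat
  # a too-small stripe cache can also hold up a raid5 reshape
  cat /sys/block/md0/md/stripe_cache_size
  echo 8192 > /sys/block/md0/md/stripe_cache_size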
* Re: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-27 10:57 UTC
To: Greg Nicholson; +Cc: Dan Williams, linux-raid

On Thursday August 23, d0gz.net@gmail.com wrote:
> <Trimming tons of detail, but keeping the thread>
>
> OK.... I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.

No, you are right. It doesn't.

Obviously insufficient testing and review - thanks for finding it for us.

This patch seems to make it work - raid5 and raid6.

Dan: Could you check it for me, particularly the moving of
+			async_tx_ack(tx);
+			dma_wait_for_async_tx(tx);
outside of the loop.

Greg: could you please check it works for you too - it works for me,
but double-testing never hurts.

Thanks again,

NeilBrown

---------------------------------
Fix some bugs with growing raid5/raid6 arrays.

### Diffstat output
 ./drivers/md/raid5.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c    2007-08-24 16:36:22.000000000 +1000
+++ ./drivers/md/raid5.c        2007-08-27 20:50:57.000000000 +1000
@@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
         struct dma_async_tx_descriptor *tx = NULL;
         clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
         for (i = 0; i < sh->disks; i++)
-                if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
+                if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
                         int dd_idx, pd_idx, j;
                         struct stripe_head *sh2;

@@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
                 set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
                 for (j = 0; j < conf->raid_disks; j++)
                         if (j != sh2->pd_idx &&
-                            (r6s && j != r6s->qd_idx) &&
+                            (!r6s || j != raid6_next_disk(sh2->pd_idx,
+                                                          sh2->disks)) &&
                             !test_bit(R5_Expanded, &sh2->dev[j].flags))
                                 break;
                 if (j == conf->raid_disks) {
@@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
                 }
                 release_stripe(sh2);

-                /* done submitting copies, wait for them to complete */
-                if (i + 1 >= sh->disks) {
-                        async_tx_ack(tx);
-                        dma_wait_for_async_tx(tx);
-                }
         }
+        /* done submitting copies, wait for them to complete */
+        if (tx) {
+                async_tx_ack(tx);
+                dma_wait_for_async_tx(tx);
+        }
 }

 /*
@@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
                 sh->disks = conf->raid_disks;
                 sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
                         conf->raid_disks);
-                s.locked += handle_write_operations5(sh, 0, 1);
+                s.locked += handle_write_operations5(sh, 1, 1);
         } else if (s.expanded &&
                 !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
                 clear_bit(STRIPE_EXPAND_READY, &sh->state);
* RE: Raid5 Reshape gone wrong, please help
From: Williams, Dan J @ 2007-08-27 16:42 UTC
To: Neil Brown, Greg Nicholson; +Cc: linux-raid

> From: Neil Brown [mailto:neilb@suse.de]
> On Thursday August 23, d0gz.net@gmail.com wrote:
> > <Trimming tons of detail, but keeping the thread>
> >
> > OK.... I've reproduced the original issue on a separate box.
> > 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.
>
> No, you are right. It doesn't.
>
> Obviously insufficient testing and review - thanks for finding it for us.
>

Agreed - seconded.

> This patch seems to make it work - raid5 and raid6.
>
> Dan: Could you check it for me, particularly the moving of
> +			async_tx_ack(tx);
> +			dma_wait_for_async_tx(tx);
> outside of the loop.
>

Yes, this definitely needs to be outside the loop.

> Greg: could you please check it works for you too - it works for me,
> but double-testing never hurts.
>
> Thanks again,
>
> NeilBrown
>
> ---------------------------------
> Fix some bugs with growing raid5/raid6 arrays.
>
> ### Diffstat output
>  ./drivers/md/raid5.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
> --- .prev/drivers/md/raid5.c    2007-08-24 16:36:22.000000000 +1000
> +++ ./drivers/md/raid5.c        2007-08-27 20:50:57.000000000 +1000
> @@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
>          struct dma_async_tx_descriptor *tx = NULL;
>          clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>          for (i = 0; i < sh->disks; i++)
> -                if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
> +                if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
>                          int dd_idx, pd_idx, j;
>                          struct stripe_head *sh2;
>
> @@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
>                  set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
>                  for (j = 0; j < conf->raid_disks; j++)
>                          if (j != sh2->pd_idx &&
> -                            (r6s && j != r6s->qd_idx) &&
> +                            (!r6s || j != raid6_next_disk(sh2->pd_idx,
> +                                                          sh2->disks)) &&
>                              !test_bit(R5_Expanded, &sh2->dev[j].flags))
>                                  break;
>                  if (j == conf->raid_disks) {
> @@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
>                  }
>                  release_stripe(sh2);
>
> -                /* done submitting copies, wait for them to complete */
> -                if (i + 1 >= sh->disks) {
> -                        async_tx_ack(tx);
> -                        dma_wait_for_async_tx(tx);
> -                }
>          }
> +        /* done submitting copies, wait for them to complete */
> +        if (tx) {
> +                async_tx_ack(tx);
> +                dma_wait_for_async_tx(tx);
> +        }
>  }
>
>  /*
> @@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
>                  sh->disks = conf->raid_disks;
>                  sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
>                          conf->raid_disks);
> -                s.locked += handle_write_operations5(sh, 0, 1);
> +                s.locked += handle_write_operations5(sh, 1, 1);

How about, for clarity:
                 s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);

>          } else if (s.expanded &&
>                  !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
>                  clear_bit(STRIPE_EXPAND_READY, &sh->state);

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* RE: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-30 5:47 UTC
To: Williams, Dan J; +Cc: Greg Nicholson, linux-raid

On Monday August 27, dan.j.williams@intel.com wrote:
> > -                s.locked += handle_write_operations5(sh, 0, 1);
> > +                s.locked += handle_write_operations5(sh, 1, 1);
> How about, for clarity:
>                  s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);
>

Nope. That second argument is a boolean, not an enum.

If it was changed to 'writemode' (or similar) and the code in
handle_write_operations5 were changed to

	switch(writemode) {
	case RECONSTRUCT_WRITE:
		....
	case READ_MODIFY_WRITE:
		....
	}

Then it would make sense to use RECONSTRUCT_WRITE in the call - and the
code would probably be more readable on the whole.

But as it is, either 'true' or '1' should go there.

NeilBrown
* Re: Raid5 Reshape gone wrong, please help
From: Bill Davidsen @ 2007-08-29 13:32 UTC
To: Greg Nicholson; +Cc: Neil Brown, linux-raid

Greg Nicholson wrote:
> <Trimming tons of detail, but keeping the thread>
>
> OK.... I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.
>
I have to say that trying something as critical as a reshape of live
data on an -rc kernel is a great way to have a learning experience.
Good that you found the problem, but also good that *you* found the
problem, not me. Thanks for testing. ;-)

> mdadm --add /dev/md0 /dev/sda1
> mdadm -G --backup-file=/root/backup.raid.file /dev/md0
>
> (Yes, I added the backup-file this time... just to be sure.)
>
> mdadm began the grow, and stopped in the critical section, or right
> after creating the backup... Not sure which. Reboot.
>
> Refused to start the array. So...
>
> mdadm -A /dev/md0 /dev/sd[abdefg]1
>
> and we have in /proc/mdstat:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
>       1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
>       [>....................]  reshape =  0.0% (512/488383872) finish=378469.4min speed=0K/sec
>
> unused devices: <none>
>
> And it's sat there without change for the past 2 hours. Now, I have a
> backup, so frankly, I'm about to blow away the array and just recreate
> it, but I thought you should know.
>
> I've got the stripe_cache_size at 8192... 256 and 1024 don't change anything.
>

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
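
In the spirit of Bill's point about not learning on live data: the same grow
can be rehearsed end-to-end on throwaway loop devices before touching a real
array. The file names, sizes and the /dev/md9 target below are made up for
the example:

  # build a scratch 4-disk raid5 on loop devices, then rehearse the grow
  for i in 0 1 2 3 4; do
      dd if=/dev/zero of=/tmp/md-test$i.img bs=1M count=128
      losetup /dev/loop$i /tmp/md-test$i.img
  done
  mdadm -C /dev/md9 -l5 -n4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
  mdadm --add /dev/md9 /dev/loop4
  mdadm -G /dev/md9 -n5 --backup-file=/tmp/md9-grow.backup
  cat /proc/mdstat    # the reshape should run to completion on a healthy kernel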
Thread overview: 13+ messages
2007-08-18  1:26 Raid5 Reshape gone wrong, please help Greg Nicholson
2007-08-18  8:56 ` Neil Brown
2007-08-18 15:37 ` Greg Nicholson
2007-08-19 11:17 ` Neil Brown
2007-08-19 15:45 ` Greg Nicholson
2007-08-20  2:44 ` Greg Nicholson
2007-08-21  2:09 ` Greg Nicholson
2007-08-23 15:04 ` Greg Nicholson
2007-08-23 15:06 ` Greg Nicholson
2007-08-27 10:57 ` Neil Brown
2007-08-27 16:42 ` Williams, Dan J
2007-08-30  5:47 ` Neil Brown
2007-08-29 13:32 ` Bill Davidsen