* Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-18 1:26 UTC
To: linux-raid
I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
version 2.6.23-rc3 was the kernel I STARTED on this.
I added the device to the array :
mdadm --add /dev/md0 /dev/sdb1
Then I started to grow the array :
mdadm --grow /dev/md0 --raid-devices=5
At this point the machine locked up. Not good.
I ended up having to hard reboot. Now, I have the following in dmesg :
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
/proc/mdstat is :
Personalities : [raid6] [raid5] [raid4]
md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
2441918720 blocks super 0.91
unused devices: <none>
It doesn't look like it actually DID anything besides update the raid
count to 5 from 4. (I think.)
How do I do a manual recovery on this?
Examining the disks:
mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb87b - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 4 8 17 4 active sync /dev/sdb1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
mdadm -E /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb889 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 3 8 33 3 active sync /dev/sdc1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb897 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
/dev/sde1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb8a5 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 1 8 65 1 active sync /dev/sde1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
/dev/sdf1:
Magic : a92b4efc
Version : 00.91.00
UUID : a9a472d3:9586c602:9207b56b:a5185bd3
Creation Time : Thu Dec 21 09:42:27 2006
Raid Level : raid5
Used Dev Size : 488383744 (465.76 GiB 500.10 GB)
Array Size : 1953534976 (1863.04 GiB 2000.42 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0
Reshape pos'n : 0
Delta Devices : 1 (4->5)
Update Time : Fri Aug 17 19:49:43 2007
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : c8ebb8b3 - correct
Events : 0.2795
Layout : left-symmetric
Chunk Size : 256K
Number Major Minor RaidDevice State
this 0 8 81 0 active sync /dev/sdf1
0 0 8 81 0 active sync /dev/sdf1
1 1 8 65 1 active sync /dev/sde1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 8 17 4 active sync /dev/sdb1
* Re: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-18 8:56 UTC
To: Greg Nicholson; +Cc: linux-raid

On Friday August 17, d0gz.net@gmail.com wrote:
> I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
> version 2.6.23-rc3 was the kernel I STARTED on this.
>
> I added the device to the array :
> mdadm --add /dev/md0 /dev/sdb1
>
> Then I started to grow the array :
> mdadm --grow /dev/md0 --raid-devices=5
>
> At this point the machine locked up. Not good.

No, not good. But it shouldn't be fatal.

> I ended up having to hard reboot. Now, I have the following in dmesg :
>
> md: md0: raid array is not clean -- starting background reconstruction
> raid5: reshape_position too early for auto-recovery - aborting.
> md: pers->run() failed ...

Looks like you crashed during the 'critical' period.

> /proc/mdstat is :
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
>       2441918720 blocks super 0.91
>
> unused devices: <none>
>
> It doesn't look like it actually DID anything besides update the raid
> count to 5 from 4. (I think.)
>
> How do I do a manual recovery on this?

Simply use mdadm to assemble the array:

  mdadm -A /dev/md0 /dev/sd[bcdef]1

It should notice that the kernel needs help, and will provide
that help.
Specifically, when you started the 'grow', mdadm copied the first few
stripes into unused space in the new device. When you re-assemble, it
will copy those stripes back into the new layout, then let the kernel
do the rest.

Please let us know how it goes.

NeilBrown
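
A rough illustration of the recovery sequence Neil describes above. These
exact commands are not taken from the thread; the device and array names are
simply the ones used in this report:

  # Stop the half-started array, then reassemble it. mdadm should notice the
  # interrupted reshape and restore the critical section from the backup it
  # wrote onto the new device when --grow was started.
  mdadm --stop /dev/md0
  mdadm --assemble /dev/md0 /dev/sd[bcdef]1
  # If that succeeds, the reshape resumes in the background.
  cat /proc/mdstat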
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-18 15:37 UTC
To: Neil Brown; +Cc: linux-raid

On 8/18/07, Neil Brown <neilb@suse.de> wrote:
> On Friday August 17, d0gz.net@gmail.com wrote:
> > I was trying to resize a Raid 5 array of 4 500G drives to 5. Kernel
> > version 2.6.23-rc3 was the kernel I STARTED on this.
> >
> > I added the device to the array :
> > mdadm --add /dev/md0 /dev/sdb1
> >
> > Then I started to grow the array :
> > mdadm --grow /dev/md0 --raid-devices=5
> >
> > At this point the machine locked up. Not good.
>
> No, not good. But it shouldn't be fatal.

Well, that was my thought as well.

> > I ended up having to hard reboot. Now, I have the following in dmesg :
> >
> > md: md0: raid array is not clean -- starting background reconstruction
> > raid5: reshape_position too early for auto-recovery - aborting.
> > md: pers->run() failed ...
>
> Looks like you crashed during the 'critical' period.
>
> > /proc/mdstat is :
> >
> > Personalities : [raid6] [raid5] [raid4]
> > md0 : inactive sdf1[0] sdb1[4] sdc1[3] sdd1[2] sde1[1]
> >       2441918720 blocks super 0.91
> >
> > unused devices: <none>
> >
> > It doesn't look like it actually DID anything besides update the raid
> > count to 5 from 4. (I think.)
> >
> > How do I do a manual recovery on this?
>
> Simply use mdadm to assemble the array:
>
>   mdadm -A /dev/md0 /dev/sd[bcdef]1
>
> It should notice that the kernel needs help, and will provide
> that help.
> Specifically, when you started the 'grow', mdadm copied the first few
> stripes into unused space in the new device. When you re-assemble, it
> will copy those stripes back into the new layout, then let the kernel
> do the rest.
>
> Please let us know how it goes.
>
> NeilBrown

I had already tried to assemble it by hand, before I basically said...
WAIT. Ask for help. Don't screw up more. :)

But I tried again:

root@excimer { }$ mdadm -A /dev/md0 /dev/sd[bcdef]1
mdadm: device /dev/md0 already active - cannot assemble it
root@excimer { ~ }$ mdadm -S /dev/md0
mdadm: stopped /dev/md0
root@excimer { ~ }$ mdadm -A /dev/md0 /dev/sd[bcdef]1
mdadm: failed to RUN_ARRAY /dev/md0: Invalid argument

Dmesg shows:

md: md0 stopped.
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sde1>
md: export_rdev(sde1)
md: md0 stopped.
md: bind<sde1>
md: bind<sdd1>
md: bind<sdc1>
md: bind<sdb1>
md: bind<sdf1>
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...
md: md0 stopped.
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: unbind<sde1>
md: export_rdev(sde1)
md: md0 stopped.
md: bind<sde1>
md: bind<sdd1>
md: bind<sdc1>
md: bind<sdb1>
md: bind<sdf1>
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape_position too early for auto-recovery - aborting.
md: pers->run() failed ...

And the raid stays in an inactive state. Using mdadm v2.6.2 and kernel
2.6.23-rc3, although I can push back to earlier versions easily if it
would help.

I know that sdb1 is the new device. When mdadm ran, it said the critical
section was 3920k (approximately).
When I didn't get a response for five minutes, and there wasn't ANY disk
activity, I booted the box.

Based on your message and the man page, it sounds like mdadm should have
placed something on sdb1. So... trying to be non-destructive, but still
gather information:

dd if=/dev/sdb1 of=/tmp/test bs=1024k count=1000
hexdump /tmp/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000

dd if=/dev/sdb1 of=/tmp/test bs=1024k count=1000 skip=999
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 35.0176 seconds, 29.9 MB/s
root@excimer { ~ }$ hexdump /tmp/test
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3e800000

That looks to me like the first 2 gig is completely empty on the
drive. I really don't think it actually started to do anything.

Do you have further suggestions on where to go now?

Oh, and thank you very much for your help. Most of the data on this
array I can stand to lose... It's not critical, but there are some of
my photographs on this that my backup is out of date on. I can
destroy it all and start over, but really want to try to recover this
if it's possible. For that matter, if it didn't actually start
rewriting the stripes, is there any way to push it back down to 4 disks
to recover?
* Re: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-19 11:17 UTC
To: Greg Nicholson; +Cc: linux-raid

On Saturday August 18, d0gz.net@gmail.com wrote:
>
> That looks to me like the first 2 gig is completely empty on the
> drive. I really don't think it actually started to do anything.

The backup data is near the end of the device. If you look at the
last 2 gig you should see something.

>
> Do you have further suggestions on where to go now?

Maybe an 'strace' of "mdadm -A ...." might show me something.

If you feel like following the code, Assemble (in Assemble.c) should
call Grow_restart.
This should look in /dev/sdb1 (which is already open in 'fdlist') by
calling 'load_super'. It should then seek to 8 sectors before the
superblock (or close to there) and read a secondary superblock which
describes the backup data.
If this looks good, it seeks to where the backup data is (which is
towards the end of the device) and reads that. It uses this to
restore the 'critical section', and then updates the superblock on all
devices.

As you aren't getting the message 'restoring critical section',
something is going wrong before there. It should fail with:
  /dev/md0: Failed to restore critical section for reshape, sorry.
but I can see that there is a problem with the error return from
'Grow_restart'. I'll get that fixed.

>
> Oh, and thank you very much for your help. Most of the data on this
> array I can stand to lose... It's not critical, but there are some of
> my photographs on this that my backup is out of date on. I can
> destroy it all and start over, but really want to try to recover this
> if it's possible. For that matter, if it didn't actually start
> rewriting the stripes, is there any way to push it back down to 4 disks
> to recover?

You could always just recreate the array:

  mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
      /dev/sdd1 /dev/sdc1

and make sure the data looks good (which it should).

I'd still like to know what the problem is, though....

Thanks,
NeilBrown
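
A hedged illustration of "look at the last 2 gig": the commands below are not
from the thread, and the exact offset of the backup area is an assumption
(with 0.90 metadata the superblock sits near the end of the partition, and
the reshape-backup header sits just below it), but something like this shows
whether anything was written there at all:

  # size of the new member in bytes
  SIZE=$(blockdev --getsize64 /dev/sdb1)
  # dump the last couple of megabytes and look for the backup header;
  # all zeroes here would mean nothing was ever saved
  dd if=/dev/sdb1 bs=1M skip=$(( SIZE / 1048576 - 2 )) 2>/dev/null \
      | hexdump -C | grep -m1 md_backup_data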
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-19 15:45 UTC
To: Neil Brown; +Cc: linux-raid

On 8/19/07, Neil Brown <neilb@suse.de> wrote:
> On Saturday August 18, d0gz.net@gmail.com wrote:
> >
> > That looks to me like the first 2 gig is completely empty on the
> > drive. I really don't think it actually started to do anything.
>
> The backup data is near the end of the device. If you look at the
> last 2 gig you should see something.

I figured something like that after I started thinking about it...
That device is currently offline while I do some DD's to new devices.

> >
> > Do you have further suggestions on where to go now?
>
> Maybe an 'strace' of "mdadm -A ...." might show me something.
>
> If you feel like following the code, Assemble (in Assemble.c) should
> call Grow_restart.
> This should look in /dev/sdb1 (which is already open in 'fdlist') by
> calling 'load_super'. It should then seek to 8 sectors before the
> superblock (or close to there) and read a secondary superblock which
> describes the backup data.
> If this looks good, it seeks to where the backup data is (which is
> towards the end of the device) and reads that. It uses this to
> restore the 'critical section', and then updates the superblock on all
> devices.
>
> As you aren't getting the message 'restoring critical section',
> something is going wrong before there. It should fail with:
>   /dev/md0: Failed to restore critical section for reshape, sorry.
> but I can see that there is a problem with the error return from
> 'Grow_restart'. I'll get that fixed.
>
> >
> > Oh, and thank you very much for your help. Most of the data on this
> > array I can stand to lose... It's not critical, but there are some of
> > my photographs on this that my backup is out of date on. I can
> > destroy it all and start over, but really want to try to recover this
> > if it's possible. For that matter, if it didn't actually start
> > rewriting the stripes, is there any way to push it back down to 4 disks
> > to recover?
>
> You could always just recreate the array:
>
>   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
>       /dev/sdd1 /dev/sdc1
>
> and make sure the data looks good (which it should).
>
> I'd still like to know what the problem is, though....
>
> Thanks,
> NeilBrown

My current plan of attack, which I've been proceeding upon for the
last 24 hours: I'm DDing the original drives to new devices. Once I
have copies of the drives, I'm going to try to recreate the array as a
4-device array. Hopefully, at that point, the raid will come up, LVM
will initialize, and it's time to saturate the GigE offloading
EVERYTHING.

Assuming the above goes well.... which will definitely take some time,
then I'll take the original drives, run the strace and try to get some
additional data for you. I'd love to know what's up with this as
well. If there is additional information I can get you to help, let
me know. I've grown several arrays before without any issue, which
frankly is why I didn't think this would have been an issue.... thus,
my offload of the stuff I actually cared about wasn't up to date.

At the end of the day (or more likely, week) I'll completely destroy the
existing raid, and rebuild the entire thing to make sure I'm starting
from a good base. At least at that point, I'll have additional
drives. Given that I have dual file-servers that will have drives
added, it seems likely that I'll be testing the code again soon. The big
difference being that this time, I won't make the assumption that
everything will be perfect. :)

Thanks again for your help. I'll post my results as well as try to
get you that strace. It's been quite a while since I dove into kernel
internals, or C for that matter, so it's unlikely I'm going to find
anything myself.... But I'll definitely send results back if I can.
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-20 2:44 UTC
To: Neil Brown; +Cc: linux-raid

On 8/19/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> On 8/19/07, Neil Brown <neilb@suse.de> wrote:
> > On Saturday August 18, d0gz.net@gmail.com wrote:
> > >
> > > That looks to me like the first 2 gig is completely empty on the
> > > drive. I really don't think it actually started to do anything.
> >
> > The backup data is near the end of the device. If you look at the
> > last 2 gig you should see something.
>
> I figured something like that after I started thinking about it...
> That device is currently offline while I do some DD's to new devices.
>
> > >
> > > Do you have further suggestions on where to go now?
> >
> > Maybe an 'strace' of "mdadm -A ...." might show me something.
> >
> > If you feel like following the code, Assemble (in Assemble.c) should
> > call Grow_restart.
> > This should look in /dev/sdb1 (which is already open in 'fdlist') by
> > calling 'load_super'. It should then seek to 8 sectors before the
> > superblock (or close to there) and read a secondary superblock which
> > describes the backup data.
> > If this looks good, it seeks to where the backup data is (which is
> > towards the end of the device) and reads that. It uses this to
> > restore the 'critical section', and then updates the superblock on all
> > devices.
> >
> > As you aren't getting the message 'restoring critical section',
> > something is going wrong before there. It should fail with:
> >   /dev/md0: Failed to restore critical section for reshape, sorry.
> > but I can see that there is a problem with the error return from
> > 'Grow_restart'. I'll get that fixed.
> >
> > >
> > > Oh, and thank you very much for your help. Most of the data on this
> > > array I can stand to lose... It's not critical, but there are some of
> > > my photographs on this that my backup is out of date on. I can
> > > destroy it all and start over, but really want to try to recover this
> > > if it's possible. For that matter, if it didn't actually start
> > > rewriting the stripes, is there any way to push it back down to 4 disks
> > > to recover?
> >
> > You could always just recreate the array:
> >
> >   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
> >       /dev/sdd1 /dev/sdc1
> >
> > and make sure the data looks good (which it should).
> >
> > I'd still like to know what the problem is, though....
> >
> > Thanks,
> > NeilBrown
>
> My current plan of attack, which I've been proceeding upon for the
> last 24 hours: I'm DDing the original drives to new devices. Once I
> have copies of the drives, I'm going to try to recreate the array as a
> 4-device array. Hopefully, at that point, the raid will come up, LVM
> will initialize, and it's time to saturate the GigE offloading
> EVERYTHING.
>
> Assuming the above goes well.... which will definitely take some time,
> then I'll take the original drives, run the strace and try to get some
> additional data for you. I'd love to know what's up with this as
> well. If there is additional information I can get you to help, let
> me know. I've grown several arrays before without any issue, which
> frankly is why I didn't think this would have been an issue.... thus,
> my offload of the stuff I actually cared about wasn't up to date.
>
> At the end of the day (or more likely, week) I'll completely destroy the
> existing raid, and rebuild the entire thing to make sure I'm starting
> from a good base. At least at that point, I'll have additional
> drives. Given that I have dual file-servers that will have drives
> added, it seems likely that I'll be testing the code again soon. The big
> difference being that this time, I won't make the assumption that
> everything will be perfect. :)
>
> Thanks again for your help. I'll post my results as well as try to
> get you that strace. It's been quite a while since I dove into kernel
> internals, or C for that matter, so it's unlikely I'm going to find
> anything myself.... But I'll definitely send results back if I can.
>

Ok, as an update: ORDER MATTERS. :)

The above command didn't work. It built, but LVM didn't recognize the
result. So, after despair, I thought: that's not the way I built it. So
I redid it in alphabetical order... and it worked.

I'm in the process of tarring everything up and pulling it off.

Once that is done, I'll put the original drives back in, and try to
understand what went wrong with the original grow/build.
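
A hedged aside on the "order matters" point: the original slot order is
recorded in each member's superblock, so the device list for --create can be
checked rather than guessed. The grep pattern below is only an illustration:

  # each member records its own slot; the "this" line shows its RaidDevice
  for d in /dev/sd[cdef]1; do
      echo "== $d"
      mdadm -E "$d" | grep -E 'this|Raid Devices|Chunk Size'
  done
  # then pass the devices to mdadm --create in ascending RaidDevice order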
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-21 2:09 UTC
To: Neil Brown; +Cc: linux-raid

On 8/19/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> On 8/19/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> > On 8/19/07, Neil Brown <neilb@suse.de> wrote:
> > > On Saturday August 18, d0gz.net@gmail.com wrote:
> > > >
> > > > That looks to me like the first 2 gig is completely empty on the
> > > > drive. I really don't think it actually started to do anything.
> > >
> > > The backup data is near the end of the device. If you look at the
> > > last 2 gig you should see something.
> >
> > I figured something like that after I started thinking about it...
> > That device is currently offline while I do some DD's to new devices.
> >
> > > >
> > > > Do you have further suggestions on where to go now?
> > >
> > > Maybe an 'strace' of "mdadm -A ...." might show me something.
> > >
> > > If you feel like following the code, Assemble (in Assemble.c) should
> > > call Grow_restart.
> > > This should look in /dev/sdb1 (which is already open in 'fdlist') by
> > > calling 'load_super'. It should then seek to 8 sectors before the
> > > superblock (or close to there) and read a secondary superblock which
> > > describes the backup data.
> > > If this looks good, it seeks to where the backup data is (which is
> > > towards the end of the device) and reads that. It uses this to
> > > restore the 'critical section', and then updates the superblock on all
> > > devices.
> > >
> > > As you aren't getting the message 'restoring critical section',
> > > something is going wrong before there. It should fail with:
> > >   /dev/md0: Failed to restore critical section for reshape, sorry.
> > > but I can see that there is a problem with the error return from
> > > 'Grow_restart'. I'll get that fixed.
> > >
> > > >
> > > > Oh, and thank you very much for your help. Most of the data on this
> > > > array I can stand to lose... It's not critical, but there are some of
> > > > my photographs on this that my backup is out of date on. I can
> > > > destroy it all and start over, but really want to try to recover this
> > > > if it's possible. For that matter, if it didn't actually start
> > > > rewriting the stripes, is there any way to push it back down to 4 disks
> > > > to recover?
> > >
> > > You could always just recreate the array:
> > >
> > >   mdadm -C /dev/md0 -l5 -n4 -c256 --assume-clean /dev/sdf1 /dev/sde1 \
> > >       /dev/sdd1 /dev/sdc1
> > >
> > > and make sure the data looks good (which it should).
> > >
> > > I'd still like to know what the problem is, though....
> > >
> > > Thanks,
> > > NeilBrown
> >
> > My current plan of attack, which I've been proceeding upon for the
> > last 24 hours: I'm DDing the original drives to new devices. Once I
> > have copies of the drives, I'm going to try to recreate the array as a
> > 4-device array. Hopefully, at that point, the raid will come up, LVM
> > will initialize, and it's time to saturate the GigE offloading
> > EVERYTHING.
> >
> > Assuming the above goes well.... which will definitely take some time,
> > then I'll take the original drives, run the strace and try to get some
> > additional data for you. I'd love to know what's up with this as
> > well. If there is additional information I can get you to help, let
> > me know. I've grown several arrays before without any issue, which
> > frankly is why I didn't think this would have been an issue.... thus,
> > my offload of the stuff I actually cared about wasn't up to date.
> >
> > At the end of the day (or more likely, week) I'll completely destroy the
> > existing raid, and rebuild the entire thing to make sure I'm starting
> > from a good base. At least at that point, I'll have additional
> > drives. Given that I have dual file-servers that will have drives
> > added, it seems likely that I'll be testing the code again soon. The big
> > difference being that this time, I won't make the assumption that
> > everything will be perfect. :)
> >
> > Thanks again for your help. I'll post my results as well as try to
> > get you that strace. It's been quite a while since I dove into kernel
> > internals, or C for that matter, so it's unlikely I'm going to find
> > anything myself.... But I'll definitely send results back if I can.
> >
>
> Ok, as an update: ORDER MATTERS. :)
>
> The above command didn't work. It built, but LVM didn't recognize the
> result. So, after despair, I thought: that's not the way I built it. So
> I redid it in alphabetical order... and it worked.
>
> I'm in the process of tarring everything up and pulling it off.
>
> Once that is done, I'll put the original drives back in, and try to
> understand what went wrong with the original grow/build.
>

And as a final update... I pulled all the data from the 4-disk array I
built from the copied disks. Everything looks to be intact. That is
definitely a better feeling for me.

I then put the original disks back in, and compiled mdadm 2.6.3 to see
if it did any better on the assemble. It appears that your update about
the missing critical section was successful, as mdadm cheerfully
informed me I was out of luck. :)

I'm attaching the strace, even though I don't think it will be of much
help... It appears that you solved the critical section failure.... at
least it's verbose about telling you. I still don't know what happened
originally... I think I had an older copy of mdadm in my path, and that
could have been the issue. Obviously that's no longer the case.

I'll be using the backup-file flag from now on, and probably won't be
quite as daring about flying without a (current, tested) net. :)

Thanks for your help again.
Attached strace from 2.6.3 root@excimer { ~/mdadm-2.6.3 }$ strace mdadm -A /dev/md0 /dev/sd[bcdef]1 execve("/sbin/mdadm", ["mdadm", "-A", "/dev/md0", "/dev/sdb1", "/dev/sdc1", "/dev/sdd1", "/dev/sde1", "/dev/sdf1"], [/* 20 vars */]) = 0 uname({sys="Linux", node="excimer", ...}) = 0 brk(0) = 0x807b000 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f78000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=51870, ...}) = 0 mmap2(NULL, 51870, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f6b000 close(3) = 0 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/tls/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240O\1"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0644, st_size=1241392, ...}) = 0 mmap2(NULL, 1251484, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e39000 mmap2(0xb7f61000, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x127) = 0xb7f61000 mmap2(0xb7f68000, 10396, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f68000 close(3) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e38000 mprotect(0xb7f61000, 20480, PROT_READ) = 0 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e388e0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 munmap(0xb7f6b000, 51870) = 0 time(NULL) = 1187661547 getpid() = 5453 brk(0) = 0x807b000 brk(0x809c000) = 0x809c000 open("/etc/mdadm.conf", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/mdadm/mdadm.conf", O_RDONLY) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=191, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f77000 read(3, "DEVICE /dev/sda1\nDEVICE /dev/sdb"..., 4096) = 191 read(3, "", 4096) = 0 read(3, "", 4096) = 0 close(3) = 0 munmap(0xb7f77000, 4096) = 0 stat64("/dev/md0", {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 0), ...}) = 0 open("/dev/md0", O_RDWR) = 3 fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 0), ...}) = 0 ioctl(3, 0x800c0910, 0xbf828344) = 0 uname({sys="Linux", node="excimer", ...}) = 0 fstat64(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 0), ...}) = 0 ioctl(3, 0x800c0910, 0xbf8280d4) = 0 ioctl(3, 0x80480911, 0xbf8282b8) = -1 ENODEV (No such device) ioctl(3, 0x932, 0) = 0 open("/dev/sdb1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf827fa0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 ioctl(4, BLKGETSIZE64, 0xbf827ee0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105207808, [500105207808], SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 ioctl(4, BLKGETSIZE64, 0xbf827ee0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 0, [0], SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 ioctl(4, BLKGETSIZE64, 0xbf827ee0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 4096, [4096], SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 ioctl(4, BLKGETSIZE64, 0xbf827fa0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 
read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdc1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdd1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sde1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 65), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdf1", O_RDONLY|O_EXCL) = 4 fstat64(4, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdb1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0 open("/dev/sdc1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) 
ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0 open("/dev/sdd1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0 open("/dev/sde1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sde1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 65), ...}) = 0 open("/dev/sdf1", O_RDWR|O_EXCL) = 4 ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) 
= -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKPG, 0xbf828124) = -1 EINVAL (Invalid argument) ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 stat64("/dev/sdf1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0 open("/dev/sdc1", O_RDONLY|O_EXCL) = 4 ioctl(4, BLKGETSIZE64, 0xbf8280b0) = 0 ioctl(4, BLKFLSBUF, 0) = 0 _llseek(4, 500105150464, [500105150464], SEEK_SET) = 0 read(4, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 close(4) = 0 open("/dev/sdc1", O_RDWR|O_EXCL) = 4 open("/dev/sdd1", O_RDWR|O_EXCL) = 5 open("/dev/sde1", O_RDWR|O_EXCL) = 6 open("/dev/sdf1", O_RDWR|O_EXCL) = 7 open("/dev/sdb1", O_RDWR|O_EXCL) = 8 ioctl(8, BLKGETSIZE64, 0xbf826ed0) = 0 ioctl(8, BLKFLSBUF, 0) = 0 _llseek(8, 500105150464, [500105150464], SEEK_SET) = 0 read(8, "\374N+\251\0\0\0\0[\0\0\0\0\0\0\0\0\0\0\0\323r\244\251"..., 4096) = 4096 _llseek(8, 500104949760, [500104949760], SEEK_SET) = 0 read(8, "md_backup_data-1\323r\244\251\2\306\206\225xo\5\177\267"..., 68) = 68 close(8) = 0 close(7) = 0 close(6) = 0 close(5) = 0 close(4) = 0 write(2, "mdadm: Failed to restore critica"..., 62mdadm: Failed to restore critical section for reshape, sorry. ) = 62 exit_group(1) = ? Process 5453 detached ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-23 15:04 UTC
To: Neil Brown; +Cc: linux-raid

<Trimming tons of detail, but keeping the thread>

OK.... I've reproduced the original issue on a separate box.
2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.

mdadm --add /dev/md0 /dev/sda1
mdadm -G --backup-file=/root/backup.raid.file /dev/md0

(Yes, I added the backup-file this time... just to be sure.)

mdadm began the grow, and stopped in the critical section, or right
after creating the backup... Not sure which. Reboot.

Refused to start the array. So...

mdadm -A /dev/md0 /dev/sd[abdefg]1

and we have in /proc/mdstat:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
      1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  reshape =  0.0% (512/488383872) finish=378469.4min speed=0K/sec

unused devices: <none>

And it's sat there without change for the past 2 hours. Now, I have a
backup, so frankly, I'm about to blow away the array and just recreate
it, but I thought you should know.

I've got the stripe_cache_size at 8192... 256 and 1024 don't change anything.
* Re: Raid5 Reshape gone wrong, please help
From: Greg Nicholson @ 2007-08-23 15:06 UTC
To: Neil Brown; +Cc: linux-raid

On 8/23/07, Greg Nicholson <d0gz.net@gmail.com> wrote:
> <Trimming tons of detail, but keeping the thread>
>
> OK.... I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.
>
> mdadm --add /dev/md0 /dev/sda1
> mdadm -G --backup-file=/root/backup.raid.file /dev/md0
>
> (Yes, I added the backup-file this time... just to be sure.)
>
> mdadm began the grow, and stopped in the critical section, or right
> after creating the backup... Not sure which. Reboot.
>
> Refused to start the array. So...
>
> mdadm -A /dev/md0 /dev/sd[abdefg]1
>
> and we have in /proc/mdstat:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
>       1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
>       [>....................]  reshape =  0.0% (512/488383872) finish=378469.4min speed=0K/sec
>
> unused devices: <none>
>
> And it's sat there without change for the past 2 hours. Now, I have a
> backup, so frankly, I'm about to blow away the array and just recreate
> it, but I thought you should know.
>
> I've got the stripe_cache_size at 8192... 256 and 1024 don't change anything.
>

Forgot the DMESG output:

md: bind<sde1>
md: bind<sdb1>
md: bind<sdd1>
md: bind<sdf1>
md: bind<sda1>
md: bind<sdg1>
md: md0: raid array is not clean -- starting background reconstruction
raid5: reshape will continue
raid5: device sdg1 operational as raid disk 0
raid5: device sda1 operational as raid disk 5
raid5: device sdf1 operational as raid disk 4
raid5: device sdd1 operational as raid disk 3
raid5: device sdb1 operational as raid disk 2
raid5: device sde1 operational as raid disk 1
raid5: allocated 6293kB for md0
raid5: raid level 5 set md0 active with 6 out of 6 devices, algorithm 2
RAID5 conf printout:
 --- rd:6 wd:6
 disk 0, o:1, dev:sdg1
 disk 1, o:1, dev:sde1
 disk 2, o:1, dev:sdb1
 disk 3, o:1, dev:sdd1
 disk 4, o:1, dev:sdf1
 disk 5, o:1, dev:sda1
...ok start reshape thread
md: reshape of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000
KB/sec) for reshape.
md: using 128k window, over a total of 488383872 blocks.

Looks good, but it doesn't actually do anything.
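
For a reshape that sits at 0% like this, a few non-destructive checks are
worth recording. These commands are illustrative rather than taken from the
thread, and assume the md sysfs attributes of kernels of this era:

  # confirm md thinks a reshape is running, and whether it is advancing
  cat /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/sync_completed
  cat /proc/mdstat
  # a too-small stripe cache can also hold up a raid5 reshape
  cat /sys/block/md0/md/stripe_cache_size
  echo 8192 > /sys/block/md0/md/stripe_cache_size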
* Re: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-27 10:57 UTC
To: Greg Nicholson; +Cc: Dan Williams, linux-raid

On Thursday August 23, d0gz.net@gmail.com wrote:
> <Trimming tons of detail, but keeping the thread>
>
> OK.... I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.

No, you are right. It doesn't.

Obviously insufficient testing and review - thanks for finding it for us.

This patch seems to make it work - raid5 and raid6.

Dan: Could you check it for me, particularly the moving of
+			async_tx_ack(tx);
+			dma_wait_for_async_tx(tx);
outside of the loop.

Greg: could you please check it works for you too - it works for me,
but double-testing never hurts.

Thanks again,

NeilBrown

---------------------------------
Fix some bugs with growing raid5/raid6 arrays.

### Diffstat output
 ./drivers/md/raid5.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c    2007-08-24 16:36:22.000000000 +1000
+++ ./drivers/md/raid5.c        2007-08-27 20:50:57.000000000 +1000
@@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
         struct dma_async_tx_descriptor *tx = NULL;
         clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
         for (i = 0; i < sh->disks; i++)
-                if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
+                if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
                         int dd_idx, pd_idx, j;
                         struct stripe_head *sh2;

@@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
                 set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
                 for (j = 0; j < conf->raid_disks; j++)
                         if (j != sh2->pd_idx &&
-                            (r6s && j != r6s->qd_idx) &&
+                            (!r6s || j != raid6_next_disk(sh2->pd_idx,
+                                                          sh2->disks)) &&
                             !test_bit(R5_Expanded, &sh2->dev[j].flags))
                                 break;
                 if (j == conf->raid_disks) {
@@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
                 }
                 release_stripe(sh2);

-                /* done submitting copies, wait for them to complete */
-                if (i + 1 >= sh->disks) {
-                        async_tx_ack(tx);
-                        dma_wait_for_async_tx(tx);
-                }
         }
+        /* done submitting copies, wait for them to complete */
+        if (tx) {
+                async_tx_ack(tx);
+                dma_wait_for_async_tx(tx);
+        }
 }

 /*
@@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
                 sh->disks = conf->raid_disks;
                 sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
                         conf->raid_disks);
-                s.locked += handle_write_operations5(sh, 0, 1);
+                s.locked += handle_write_operations5(sh, 1, 1);
         } else if (s.expanded &&
                 !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
                 clear_bit(STRIPE_EXPAND_READY, &sh->state);
* RE: Raid5 Reshape gone wrong, please help
From: Williams, Dan J @ 2007-08-27 16:42 UTC
To: Neil Brown, Greg Nicholson; +Cc: linux-raid

> From: Neil Brown [mailto:neilb@suse.de]
> On Thursday August 23, d0gz.net@gmail.com wrote:
> > <Trimming tons of detail, but keeping the thread>
> >
> > OK.... I've reproduced the original issue on a separate box.
> > 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.
>
> No, you are right. It doesn't.
>
> Obviously insufficient testing and review - thanks for finding it for us.
>

Agreed - seconded.

> This patch seems to make it work - raid5 and raid6.
>
> Dan: Could you check it for me, particularly the moving of
> +			async_tx_ack(tx);
> +			dma_wait_for_async_tx(tx);
> outside of the loop.
>

Yes, this definitely needs to be outside the loop.

> Greg: could you please check it works for you too - it works for me,
> but double-testing never hurts.
>
> Thanks again,
>
> NeilBrown
>
> ---------------------------------
> Fix some bugs with growing raid5/raid6 arrays.
>
> ### Diffstat output
>  ./drivers/md/raid5.c |   17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
> --- .prev/drivers/md/raid5.c    2007-08-24 16:36:22.000000000 +1000
> +++ ./drivers/md/raid5.c        2007-08-27 20:50:57.000000000 +1000
> @@ -2541,7 +2541,7 @@ static void handle_stripe_expansion(raid
>          struct dma_async_tx_descriptor *tx = NULL;
>          clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
>          for (i = 0; i < sh->disks; i++)
> -                if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
> +                if (i != sh->pd_idx && (!r6s || i != r6s->qd_idx)) {
>                          int dd_idx, pd_idx, j;
>                          struct stripe_head *sh2;
>
> @@ -2574,7 +2574,8 @@ static void handle_stripe_expansion(raid
>                  set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
>                  for (j = 0; j < conf->raid_disks; j++)
>                          if (j != sh2->pd_idx &&
> -                            (r6s && j != r6s->qd_idx) &&
> +                            (!r6s || j != raid6_next_disk(sh2->pd_idx,
> +                                                          sh2->disks)) &&
>                              !test_bit(R5_Expanded, &sh2->dev[j].flags))
>                                  break;
>                  if (j == conf->raid_disks) {
> @@ -2583,12 +2584,12 @@ static void handle_stripe_expansion(raid
>                  }
>                  release_stripe(sh2);
>
> -                /* done submitting copies, wait for them to complete */
> -                if (i + 1 >= sh->disks) {
> -                        async_tx_ack(tx);
> -                        dma_wait_for_async_tx(tx);
> -                }
>          }
> +        /* done submitting copies, wait for them to complete */
> +        if (tx) {
> +                async_tx_ack(tx);
> +                dma_wait_for_async_tx(tx);
> +        }
>  }
>
>  /*
> @@ -2855,7 +2856,7 @@ static void handle_stripe5(struct stripe
>                  sh->disks = conf->raid_disks;
>                  sh->pd_idx = stripe_to_pdidx(sh->sector, conf,
>                          conf->raid_disks);
> -                s.locked += handle_write_operations5(sh, 0, 1);
> +                s.locked += handle_write_operations5(sh, 1, 1);

How about, for clarity:
                 s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);

>          } else if (s.expanded &&
>                  !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
>                  clear_bit(STRIPE_EXPAND_READY, &sh->state);

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* RE: Raid5 Reshape gone wrong, please help
From: Neil Brown @ 2007-08-30 5:47 UTC
To: Williams, Dan J; +Cc: Greg Nicholson, linux-raid

On Monday August 27, dan.j.williams@intel.com wrote:
> > -                s.locked += handle_write_operations5(sh, 0, 1);
> > +                s.locked += handle_write_operations5(sh, 1, 1);
> How about, for clarity:
>                  s.locked += handle_write_operations5(sh, RECONSTRUCT_WRITE, 1);
>

Nope. That second argument is a boolean, not an enum.

If it was changed to 'writemode' (or similar) and the code in
handle_write_operations5 were changed to

	switch(writemode) {
	case RECONSTRUCT_WRITE:
		....
	case READ_MODIFY_WRITE:
		....
	}

Then it would make sense to use RECONSTRUCT_WRITE in the call - and the
code would probably be more readable on the whole.

But as it is, either 'true' or '1' should go there.

NeilBrown
* Re: Raid5 Reshape gone wrong, please help
From: Bill Davidsen @ 2007-08-29 13:32 UTC
To: Greg Nicholson; +Cc: Neil Brown, linux-raid

Greg Nicholson wrote:
> <Trimming tons of detail, but keeping the thread>
>
> OK.... I've reproduced the original issue on a separate box.
> 2.6.23-rc3 does not like to grow Raid 5 arrays. mdadm 2.6.3.
>
I have to say that trying something as critical as a reshape of live
data on an -rc kernel is a great way to have a learning experience.
Good that you found the problem, but also good that *you* found the
problem, not me. Thanks for testing. ;-)

> mdadm --add /dev/md0 /dev/sda1
> mdadm -G --backup-file=/root/backup.raid.file /dev/md0
>
> (Yes, I added the backup-file this time... just to be sure.)
>
> mdadm began the grow, and stopped in the critical section, or right
> after creating the backup... Not sure which. Reboot.
>
> Refused to start the array. So...
>
> mdadm -A /dev/md0 /dev/sd[abdefg]1
>
> and we have in /proc/mdstat:
>
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdg1[0] sda1[5] sdf1[4] sdd1[3] sdb1[2] sde1[1]
>       1953535488 blocks super 0.91 level 5, 128k chunk, algorithm 2 [6/6] [UUUUUU]
>       [>....................]  reshape =  0.0% (512/488383872) finish=378469.4min speed=0K/sec
>
> unused devices: <none>
>
> And it's sat there without change for the past 2 hours. Now, I have a
> backup, so frankly, I'm about to blow away the array and just recreate
> it, but I thought you should know.
>
> I've got the stripe_cache_size at 8192... 256 and 1024 don't change anything.
>

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979
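
In the spirit of Bill's point about not learning on live data: the same grow
can be rehearsed end-to-end on throwaway loop devices before touching a real
array. The file names, sizes and the /dev/md9 target below are made up for
the example:

  # build a scratch 4-disk raid5 on loop devices, then rehearse the grow
  for i in 0 1 2 3 4; do
      dd if=/dev/zero of=/tmp/md-test$i.img bs=1M count=128
      losetup /dev/loop$i /tmp/md-test$i.img
  done
  mdadm -C /dev/md9 -l5 -n4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
  mdadm --add /dev/md9 /dev/loop4
  mdadm -G /dev/md9 -n5 --backup-file=/tmp/md9-grow.backup
  cat /proc/mdstat    # the reshape should run to completion on a healthy kernel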
Thread overview: 13+ messages
2007-08-18  1:26 Raid5 Reshape gone wrong, please help Greg Nicholson
2007-08-18  8:56 ` Neil Brown
2007-08-18 15:37 ` Greg Nicholson
2007-08-19 11:17 ` Neil Brown
2007-08-19 15:45 ` Greg Nicholson
2007-08-20  2:44 ` Greg Nicholson
2007-08-21  2:09 ` Greg Nicholson
2007-08-23 15:04 ` Greg Nicholson
2007-08-23 15:06 ` Greg Nicholson
2007-08-27 10:57 ` Neil Brown
2007-08-27 16:42 ` Williams, Dan J
2007-08-30  5:47 ` Neil Brown
2007-08-29 13:32 ` Bill Davidsen