* Trouble when growing a raid5 array
@ 2006-11-30 7:04 Jacob Schmidt Madsen
2006-12-01 11:18 ` Jacob Schmidt Madsen
0 siblings, 1 reply; 4+ messages in thread
From: Jacob Schmidt Madsen @ 2006-11-30 7:04 UTC (permalink / raw)
To: linux-raid
Hey
I bought 2 new disks to add to a big raid5 array.
I executed:
# mdadm /dev/md5 -a /dev/sdh1
# mdadm /dev/md5 -a /dev/sdi1
# mdadm --grow /dev/md5 --raid-disks=8
After 12 hours it stalled:
# cat /proc/mdstat
md5 : active raid5 sdc1[6] sdb1[7] sdi1[3] sdh1[2] sdg1[1] sdf1[0] sde1[4] sdd1[5]
      1562842880 blocks super 0.91 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
      [===================>.]  reshape = 98.1% (306783360/312568576) finish=668.7min speed=144K/sec
It's been stuck at 306783360/312568576 for hours now.
When I check the kernel log, it is full of "compute_blocknr: map not correct" messages.
I guess something went really wrong? If anyone knows what is going on, or what I can do to fix this, please let me know.
I would really be sad if all the data were gone.
Thanks!
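(An editorial aside, not part of the original message: the mdstat figures above are self-consistent. Each member partition holds 312568576 KB, as confirmed later in the thread, so with a little shell arithmetic:

# echo $(( 5 * 312568576 ))
1562842880
# echo $(( 7 * 312568576 ))
2187980032

The first result matches the "blocks" figure above: during the reshape the array still reports its old 6-disk size, i.e. 5 data disks plus parity. The second is what the grown 8-disk array, with 7 data disks, should reach: 2187980032 KB, about 2.04 TiB, just past the 2 TiB boundary that turns out to matter later in the thread.)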
* Re: Trouble when growing a raid5 array
2006-11-30 7:04 Trouble when growing a raid5 array Jacob Schmidt Madsen
@ 2006-12-01 11:18 ` Jacob Schmidt Madsen
2006-12-08 19:29 ` Jacob Schmidt Madsen
0 siblings, 1 reply; 4+ messages in thread
From: Jacob Schmidt Madsen @ 2006-12-01 11:18 UTC (permalink / raw)
To: linux-raid
Hey again :-)
I'm starting to suspect that it's a bug, since everything I did was straightforward and has worked many times before.
When I try to stop the array by executing "mdadm -S /dev/md5", mdadm stalls (I suspect it hits an error - maybe the same one).
I also tried restarting the computer and made sure the array didn't auto-start. I then started it manually; the reshape is shown when executing "cat /proc/mdstat", but it doesn't proceed (it seems stalled right away). When I try to stop it as shown above, mdadm stalls like before. So I'm able to reproduce the error.
I've tried kernels 2.6.18.3, 2.6.18.4 and 2.6.19, with the same results as described above.
In case it's a bug, I would really like to help out so it gets fixed and no one else runs into it (and I get my array back). What can I do to confirm it's a bug, and if it is, what kind of information will be helpful and where should I submit it?
I've looked at the source code (raid5.c), but there are no comments in the code, so I can't do much myself; my experience with C is very limited when it comes to kernel programming.
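(Another aside, not from the original thread: for a linux-raid report like the one asked about above, the usual minimum is the exact kernel version, the array state, and the surrounding kernel messages, all captured while the problem is occurring, for example:

# uname -r
# cat /proc/mdstat
# mdadm --detail /dev/md5
# dmesg | grep -i raid

together with the slice of the kernel log containing the "compute_blocknr: map not correct" messages.)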
* Re: Trouble when growing a raid5 array
2006-12-01 11:18 ` Jacob Schmidt Madsen
@ 2006-12-08 19:29 ` Jacob Schmidt Madsen
2006-12-08 21:08 ` Jacob Schmidt Madsen
0 siblings, 1 reply; 4+ messages in thread
From: Jacob Schmidt Madsen @ 2006-12-08 19:29 UTC (permalink / raw)
To: linux-raid
I think I've found an overflow.
After thinking about this for a while, I decided to create a new array from all 8 partitions and overwrite the old one. I was counting on almost all the data being intact if the partitions in the new raid5 array were in the same order as in the overwritten array - the reshape got 98.1% done, after all.
So I executed:
# mdadm --create --verbose /dev/md5 --level=5 --raid-devices=8 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sdg1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sdh1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: /dev/sdi1 appears to be part of a raid array:
    level=raid5 devices=8 ctime=Fri Dec  8 18:08:42 2006
mdadm: size set to 312568576K
Continue creating array? y
mdadm: array /dev/md5 started.
From what I could tell all the data was still there, so I guessed right and got the same data structure.
BUT the new array is ONLY 42 GB, and there are 8 partitions of 320 GB each, so it does look like an overflow or something similar.
Here's the detailed information on the newly created array (note the Array Size versus the Device Size):
# mdadm -D /dev/md5
/dev/md5:
        Version : 00.90.03
  Creation Time : Fri Dec  8 19:07:26 2006
     Raid Level : raid5
     Array Size : 40496384 (38.62 GiB 41.47 GB)
    Device Size : 312568576 (298.09 GiB 320.07 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 5
    Persistence : Superblock is persistent

    Update Time : Fri Dec  8 19:07:26 2006
          State : clean, degraded, recovering
 Active Devices : 7
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 0% complete

           UUID : a24c9a1d:6ff2910a:9e2ad3b1:f5e7c6a5
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       81        0      active sync   /dev/sdf1
       1       8       97        1      active sync   /dev/sdg1
       2       8      113        2      active sync   /dev/sdh1
       3       8      129        3      active sync   /dev/sdi1
       4       8       65        4      active sync   /dev/sde1
       5       8       49        5      active sync   /dev/sdd1
       6       8       33        6      active sync   /dev/sdc1
       8       8       17        7      spare rebuilding   /dev/sdb1
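(Aside, arithmetic added for illustration: the 42 GB figure is exactly what a wraparound at 2 TiB would produce. The expected array size is 7 data disks of 312568576 KB each, and 2 TiB is 2147483648 KB, i.e. 2^32 512-byte sectors:

# echo $(( 7 * 312568576 ))
2187980032
# echo $(( 2187980032 - 2147483648 ))
40496384

40496384 KB is precisely the Array Size reported by mdadm above - the true size modulo 2 TiB.)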
* Re: Trouble when growing a raid5 array
2006-12-08 19:29 ` Jacob Schmidt Madsen
@ 2006-12-08 21:08 ` Jacob Schmidt Madsen
0 siblings, 0 replies; 4+ messages in thread
From: Jacob Schmidt Madsen @ 2006-12-08 21:08 UTC (permalink / raw)
To: linux-raid
Okay, it turns out the overflow was in my brain instead. I wasn't aware of the large block device support in the kernel. It's enabled now and everything is working!
Sorry about the spam :-)
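(For anyone hitting the same wall: on 2.6-era kernels the option in question is CONFIG_LBD, "Support for Large Block Devices", which is required for block devices larger than 2 TiB on 32-bit machines. Two common ways to check it, though the exact paths vary by distribution and kernel configuration:

# zgrep CONFIG_LBD /proc/config.gz
# grep CONFIG_LBD /boot/config-$(uname -r)

A result of "# CONFIG_LBD is not set" means the kernel was built without it.)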