* 2 Disks Jumped Out While Reshaping RAID5
From: Majed B. @ 2009-09-05 20:22 UTC
To: linux-raid

Hello all,

I have posted my problem already here:
http://ubuntuforums.org/showthread.php?p=7900571#post7900571
It also has file attachments of the output of mdadm -E /dev/sd[a-h]1

I appreciate any help on this.

--
      Majed B.
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: NeilBrown @ 2009-09-05 21:32 UTC
To: Majed B.; +Cc: linux-raid

On Sun, September 6, 2009 6:22 am, Majed B. wrote:
> Hello all,
>
> I have posted my problem already here:
> http://ubuntuforums.org/showthread.php?p=7900571#post7900571
> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1

It seems that you need to log in to read the attachments... so I haven't.

>
> I appreciate any help on this.

Hopefully you just need to add "--force" to the assemble command
and it would all just work.  However, I haven't tested that on an array
that is in the process of a reshape, so I cannot promise.
I might try to reproduce your situation with some scratch drives
and check that mdadm -Af does the right thing, but it won't be for a day
or so, and as I cannot see the --examine output I might get the
situation a bit wrong... (hint hint: it is always best to post
full information rather than pointers to it, unless said information is
really, really big).

NeilBrown

> --
>       Majed B.
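A minimal sketch of the forced assembly being suggested here — the array name
and member partitions are taken from this thread, not verified on any
particular machine:

  mdadm --stop /dev/md0                        # release any half-assembled array first
  mdadm --assemble --force /dev/md0 /dev/sd[a-h]1
  cat /proc/mdstat                             # check whether the array started and the reshape resumed

--force lets mdadm bring in members whose event counts are slightly behind
the rest, which is the usual situation after disks drop out part-way through
a reshape.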
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: Majed B. @ 2009-09-06 10:00 UTC
To: linux-raid

I forgot to mention that I'm running mdadm 2.6.7.1 (the latest in the
Ubuntu Server repositories).

I tried forcing the assembly, but as mentioned, I just got an error:

root@Adam:/var/www# mdadm -Af /dev/md0
mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted

I know I could've pasted here whatever I wrote there, but it seemed
redundant. I'll keep your hint in mind for the next time, if any
(hopefully not).

This may be of interest to you:

root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap
sda1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdb1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdc1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdd1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sde1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdf1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
sdg1  Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB)
sdh1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)

Note that sdd1 was the spare.

The UUIDs are all the same and the superblocks are all similar except
for the reshape position of sdg1.

I didn't try to recreate the array, as I've never faced this issue
before, so I don't know what kind of repercussions it may have.

What I do know is that, in the worst case, I can recreate the array
out of 7 disks (all but sdg1), but lose about 2.3TB worth of data :(

On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote:
> On Sun, September 6, 2009 6:22 am, Majed B. wrote:
>> Hello all,
>>
>> I have posted my problem already here:
>> http://ubuntuforums.org/showthread.php?p=7900571#post7900571
>> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1
>
> It seems that you need to log in to read the attachments... so I haven't.
>
>> I appreciate any help on this.
>
> Hopefully you just need to add "--force" to the assemble command
> and it would all just work.  However I haven't tested that on an array
> that is in the process of a reshape so I cannot promise.
> I might try to reproduce your situation with some scratch drives
> and check that mdadm -Af does the right thing, but it won't be for a day
> or so, and as I cannot see the --examine output I might get the
> situation a bit wrong... (hint hint: it is always best to post
> full information rather than pointers to it, unless said information is
> really really big).
>
> NeilBrown

--
      Majed B.
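When judging whether a forced assembly is safe, the fields worth comparing
across members are the event counts, update times, and reshape positions.
One possible way to pull just those out of the superblock dumps (same device
names as above, shown only as an illustration):

  mdadm -E /dev/sd[a-h]1 | egrep 'Update Time|Events|Reshape'

A member whose Events value is far behind the rest — as sdg1 turns out to be
later in this thread — is the one mdadm will leave out when it forces the
remaining members together.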
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: Neil Brown @ 2009-09-06 23:52 UTC
To: Majed B.; +Cc: linux-raid

On Sunday September 6, majedb@gmail.com wrote:
> I forgot to mention that I'm running mdadm 2.6.7.1 (latest in Ubuntu
> Server repositories).

You will need at least 2.6.8 to be able to assemble arrays which are
in the middle of a reshape.  I would suggest 2.6.9 or 3.0.

> I tried forcing the assembly, but as mentioned, I just got an error:
> root@Adam:/var/www# mdadm -Af /dev/md0
> mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted

This incorrect message is fixed in 2.6.8 and later.

> I know I could've pasted whatever I wrote here, but it seemed
> redundant. I'll keep your hint in mind for the next time, if any
> (hopefully not).

Redundant?  How is that relevant?
If you want help, your goal should be to make it as easy as possible
for people to help you.  Having all the information in one email message
is easy.  Having to use a browser to get some of it makes it hard.
Having to register on the website to download an attachment makes it
nearly impossible.

> This may be of interest to you:
> root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap
> sda1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdb1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdc1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdd1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sde1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdf1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)
> sdg1  Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB)
> sdh1  Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB)

If you can post that again but without the "grep" I might be able to
be more helpful (i.e. the complete output of "mdadm -E /dev/sd[a-h]1").

NeilBrown

> Note that sdd1 was the spare.
>
> The UUIDs are all the same and the superblocks are all similar except
> for the reshape position of sdg1.
>
> I didn't try to recreate the array, as I've never faced this issue
> before, so I don't know what kind of repercussions it may have.
>
> What I do know is that, in the worst case, I can recreate the array
> out of 7 disks (all but sdg1), but lose about 2.3TB worth of data :(
>
> On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote:
> > On Sun, September 6, 2009 6:22 am, Majed B. wrote:
> >> Hello all,
> >>
> >> I have posted my problem already here:
> >> http://ubuntuforums.org/showthread.php?p=7900571#post7900571
> >> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1
> >
> > It seems that you need to log in to read the attachments... so I haven't.
> >
> >> I appreciate any help on this.
> >
> > Hopefully you just need to add "--force" to the assemble command
> > and it would all just work.  However I haven't tested that on an array
> > that is in the process of a reshape so I cannot promise.
> > I might try to reproduce your situation with some scratch drives
> > and check that mdadm -Af does the right thing, but it won't be for a day
> > or so, and as I cannot see the --examine output I might get the
> > situation a bit wrong... (hint hint: it is always best to post
> > full information rather than pointers to it, unless said information is
> > really really big).
> >
> > NeilBrown
>
> --
>       Majed B.
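For anyone whose distribution only packages an older mdadm, building a newer
release from source is straightforward. A rough sketch — the tarball name is
only illustrative; use whichever 2.6.9 or 3.0 release you actually downloaded:

  tar xzf mdadm-3.0.tar.gz
  cd mdadm-3.0
  make
  sudo make install        # overrides the distro binary on $PATH
  mdadm --version          # confirm the new version is the one being run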
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-06 23:52 ` Neil Brown @ 2009-09-06 23:55 ` Majed B. 2009-09-07 0:01 ` Majed B. 0 siblings, 1 reply; 9+ messages in thread From: Majed B. @ 2009-09-06 23:55 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid Thanks a lot Neil! I didn't know that it requires you to register to download. My bad. Here's the output of examine: /dev/sda1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a1163 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 4 8 1 4 active sync /dev/sda1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdb1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a1177 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 6 8 17 6 active sync /dev/sdb1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdc1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a117f - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 2 8 33 2 active sync /dev/sdc1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdd1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active 
Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a1199 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 7 8 49 7 active sync /dev/sdd1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sde1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a119d - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 1 8 65 1 active sync /dev/sde1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdf1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 06:40:04 2009 State : clean Active Devices : 7 Working Devices : 7 Failed Devices : 1 Spare Devices : 0 Checksum : 5b59b1c1 - correct Events : 949204 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 0 8 81 0 active sync /dev/sdf1 0 0 8 81 0 active sync /dev/sdf1 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdg1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 00:10:39 2009 State : clean Active Devices : 8 Working Devices : 8 Failed Devices : 0 Spare Devices : 0 Checksum : fba3471a - correct Events : 874530 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 3 8 97 3 active sync /dev/sdg1 0 0 8 81 0 active sync /dev/sdf1 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 8 97 3 active sync /dev/sdg1 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 /dev/sdh1: Magic : a92b4efc Version : 00.91.00 UUID : ed1a2670:03308d80:95a69c9f:ccf9605f Creation Time : Sat May 23 00:22:49 2009 Raid Level : raid5 Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) Array Size : 6837318656 (6520.58 GiB 7001.41 GB) Raid Devices : 8 Total Devices : 8 Preferred Minor : 0 Reshape 
pos'n : 3357066496 (3201.55 GiB 3437.64 GB) Delta Devices : 1 (7->8) Update Time : Wed Sep 2 13:28:39 2009 State : clean Active Devices : 6 Working Devices : 6 Failed Devices : 1 Spare Devices : 0 Checksum : 5b5a11d5 - correct Events : 949214 Layout : left-symmetric Chunk Size : 256K Number Major Minor RaidDevice State this 5 8 113 5 active sync /dev/sdh1 0 0 0 0 0 removed 1 1 8 65 1 active sync /dev/sde1 2 2 8 33 2 active sync /dev/sdc1 3 3 0 0 3 faulty removed 4 4 8 1 4 active sync /dev/sda1 5 5 8 113 5 active sync /dev/sdh1 6 6 8 17 6 active sync /dev/sdb1 7 7 8 49 7 active sync /dev/sdd1 I have already downloaded and compiled mdadm 3.0, but didn't install it, awaiting further instructions from you. I'll install it now and run -Af and report back what happens. Thank you again! On Mon, Sep 7, 2009 at 2:52 AM, Neil Brown<neilb@suse.de> wrote: > On Sunday September 6, majedb@gmail.com wrote: >> I forgot to mention that I'm running mdadm 2.6.7.1 (latest in Ubuntu >> Server repositories). > > You will need at least 2.6.8 to be able to assemble arrays which are > in the middle of a reshape. I would suggest 2.6.9 or 3.0. > >> >> I tried forcing the assembly, but as mentioned, I just got an error: >> root@Adam:/var/www# mdadm -Af /dev/md0 >> mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted > > This incorrect message is fixed in 2.6.8 an later. > >> >> I know I could've pasted whatever I wrote here, but it seemed >> redundant. I'll keep your hint in mind for the next time, if any >> (hopefully not). > > Redundant? How is that relevant? > If you want help, your goal should be to make it as easy as possible > for people to help you. Having all information in one email message > is easy. Having to use a browser to get some of it makes it hard. > Having to register on the website to down load an attachment makes it > nearly impossible. > >> >> This may be of an interest to you: >> root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap >> sda1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdb1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdc1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdd1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sde1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdf1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> sdg1 Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) >> sdh1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > > If you can post that again but without the "grep" I might be able to > be more helpful. (i.e. the complete output of "mdadm -E /dev/sd[a-h]1"). > > NeilBrown > > > >> >> Note that sdd1 was the spare. >> >> The UUIDs are all the same and the superblock is all similar except >> for the reshaping position of sdg1. >> >> I didn't try to recreate the array as I've never faced this issue >> before, so I don't know what kind of repercussions it may have. >> >> What I do know, that at the worst case scenario, I can recreate the >> array out of 7 disks (all but sdg1), but lose about 2.3TB worth of >> data :( >> >> On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote: >> > On Sun, September 6, 2009 6:22 am, Majed B. wrote: >> >> Hello all, >> >> >> >> I have posted my problem already here: >> >> http://ubuntuforums.org/showthread.php?p=7900571#post7900571 >> >> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1 >> > >> > It seems that you need to log in to read the attachements... so I haven't. >> > >> >> >> >> I appreciate any help on this. 
>> > >> > Hopefully you just need to add "--force" to the assemble command >> > and it would all just work. However I haven't tested that on an array >> > that is in the process of a reshape so I cannot promise. >> > I might try to reproduce your situation and with some scratch drives >> > and check that mdadm -Af does the right thing, but it won't be a day >> > or so, and as I cannot see the --examine output I might get the >> > situation a bit wrong ... (hint hint: it is always best to post >> > full information rather than pointers to it, unless said information is >> > really really big). >> > >> > NeilBrown >> >> -- >> Majed B. >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
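One shortcut for spotting the odd member out in dumps like the ones above is
to compare just the event counters (an illustrative command, not one from the
original poster):

  mdadm -E /dev/sd[a-h]1 | grep Events

In the output above, sdg1 sits at 874530 events and sdf1 at 949204, while the
remaining members are all at 949214 — which matches the failure sequence Neil
reconstructs below.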
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-06 23:55 ` Majed B. @ 2009-09-07 0:01 ` Majed B. 2009-09-07 0:31 ` NeilBrown 0 siblings, 1 reply; 9+ messages in thread From: Majed B. @ 2009-09-07 0:01 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid I have installed mdadm 3.0 and ran -Af and now it's continuing reshaping!!! root@Adam:~# mdadm --version mdadm - v3.0 - 2nd June 2009 root@Adam:~# mdadm -Af --verbose /dev/md0 mdadm: looking for devices for /dev/md0 mdadm: cannot open device /dev/sdi5: Device or resource busy mdadm: /dev/sdi5 has wrong uuid. mdadm: no recogniseable superblock on /dev/sdi2 mdadm: /dev/sdi2 has wrong uuid. mdadm: cannot open device /dev/sdi1: Device or resource busy mdadm: /dev/sdi1 has wrong uuid. mdadm: cannot open device /dev/sdi: Device or resource busy mdadm: /dev/sdi has wrong uuid. mdadm: no RAID superblock on /dev/sdh mdadm: /dev/sdh has wrong uuid. mdadm: no RAID superblock on /dev/sdg mdadm: /dev/sdg has wrong uuid. mdadm: no RAID superblock on /dev/sdf mdadm: /dev/sdf has wrong uuid. mdadm: no RAID superblock on /dev/sde mdadm: /dev/sde has wrong uuid. mdadm: no RAID superblock on /dev/sdd mdadm: /dev/sdd has wrong uuid. mdadm: no RAID superblock on /dev/sdc mdadm: /dev/sdc has wrong uuid. mdadm: no RAID superblock on /dev/sdb mdadm: /dev/sdb has wrong uuid. mdadm: no RAID superblock on /dev/sda mdadm: /dev/sda has wrong uuid. mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 5. mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 3. mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 0. mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 1. mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 7. mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2. mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 6. mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 4. mdadm: forcing event count in /dev/sdf1(0) from 949204 upto 949214 mdadm: added /dev/sde1 to /dev/md0 as 1 mdadm: added /dev/sdc1 to /dev/md0 as 2 mdadm: added /dev/sdg1 to /dev/md0 as 3 mdadm: added /dev/sda1 to /dev/md0 as 4 mdadm: added /dev/sdh1 to /dev/md0 as 5 mdadm: added /dev/sdb1 to /dev/md0 as 6 mdadm: added /dev/sdd1 to /dev/md0 as 7 mdadm: added /dev/sdf1 to /dev/md0 as 0 mdadm: /dev/md0 has been started with 7 drives (out of 8). root@Adam:~# cat /proc/mdstat Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] md0 : active raid5 sdf1[0] sdd1[7] sdb1[6] sdh1[5] sda1[4] sdc1[2] sde1[1] 5860558848 blocks super 0.91 level 5, 256k chunk, algorithm 2 [8/7] [UUU_UUUU] [=========>...........] reshape = 49.1% (479633152/976759808) finish=950.5min speed=8704K/sec unused devices: <none> sdg1 is not in the list. Is that correct?! sdg1 was one of the array's disks before expanding. So I guess now the array is degraded yet is reshaping as if it had 8 disks, correct? So after the reshaping process is over, I can add sdg1 again and it will resync properly, right? On Mon, Sep 7, 2009 at 2:55 AM, Majed B.<majedb@gmail.com> wrote: > Thanks a lot Neil! > > I didn't know that it requires you to register to download. My bad. 
> Here's the output of examine: > > /dev/sda1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a1163 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 4 8 1 4 active sync /dev/sda1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdb1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a1177 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 6 8 17 6 active sync /dev/sdb1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdc1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a117f - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 2 8 33 2 active sync /dev/sdc1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdd1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 
5b5a1199 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 7 8 49 7 active sync /dev/sdd1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sde1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a119d - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 1 8 65 1 active sync /dev/sde1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdf1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 06:40:04 2009 > State : clean > Active Devices : 7 > Working Devices : 7 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b59b1c1 - correct > Events : 949204 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 0 8 81 0 active sync /dev/sdf1 > > 0 0 8 81 0 active sync /dev/sdf1 > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdg1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 00:10:39 2009 > State : clean > Active Devices : 8 > Working Devices : 8 > Failed Devices : 0 > Spare Devices : 0 > Checksum : fba3471a - correct > Events : 874530 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 3 8 97 3 active sync /dev/sdg1 > > 0 0 8 81 0 active sync /dev/sdf1 > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 8 97 3 active sync /dev/sdg1 > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > /dev/sdh1: > Magic : a92b4efc > Version : 00.91.00 > UUID : ed1a2670:03308d80:95a69c9f:ccf9605f > Creation Time : Sat May 23 00:22:49 
2009 > Raid Level : raid5 > Used Dev Size : 976759808 (931.51 GiB 1000.20 GB) > Array Size : 6837318656 (6520.58 GiB 7001.41 GB) > Raid Devices : 8 > Total Devices : 8 > Preferred Minor : 0 > > Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) > Delta Devices : 1 (7->8) > > Update Time : Wed Sep 2 13:28:39 2009 > State : clean > Active Devices : 6 > Working Devices : 6 > Failed Devices : 1 > Spare Devices : 0 > Checksum : 5b5a11d5 - correct > Events : 949214 > > Layout : left-symmetric > Chunk Size : 256K > > Number Major Minor RaidDevice State > this 5 8 113 5 active sync /dev/sdh1 > > 0 0 0 0 0 removed > 1 1 8 65 1 active sync /dev/sde1 > 2 2 8 33 2 active sync /dev/sdc1 > 3 3 0 0 3 faulty removed > 4 4 8 1 4 active sync /dev/sda1 > 5 5 8 113 5 active sync /dev/sdh1 > 6 6 8 17 6 active sync /dev/sdb1 > 7 7 8 49 7 active sync /dev/sdd1 > > > I have already downloaded and compiled mdadm 3.0, but didn't install > it, awaiting further instructions from you. I'll install it now and > run -Af and report back what happens. > > Thank you again! > > On Mon, Sep 7, 2009 at 2:52 AM, Neil Brown<neilb@suse.de> wrote: >> On Sunday September 6, majedb@gmail.com wrote: >>> I forgot to mention that I'm running mdadm 2.6.7.1 (latest in Ubuntu >>> Server repositories). >> >> You will need at least 2.6.8 to be able to assemble arrays which are >> in the middle of a reshape. I would suggest 2.6.9 or 3.0. >> >>> >>> I tried forcing the assembly, but as mentioned, I just got an error: >>> root@Adam:/var/www# mdadm -Af /dev/md0 >>> mdadm: superblock on /dev/sdh1 doesn't match others - assembly aborted >> >> This incorrect message is fixed in 2.6.8 an later. >> >>> >>> I know I could've pasted whatever I wrote here, but it seemed >>> redundant. I'll keep your hint in mind for the next time, if any >>> (hopefully not). >> >> Redundant? How is that relevant? >> If you want help, your goal should be to make it as easy as possible >> for people to help you. Having all information in one email message >> is easy. Having to use a browser to get some of it makes it hard. >> Having to register on the website to down load an attachment makes it >> nearly impossible. >> >>> >>> This may be of an interest to you: >>> root@Adam:/var/www# mdadm -E /dev/sd[a-h]1 | grep Reshap >>> sda1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdb1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdc1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdd1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sde1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdf1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >>> sdg1 Reshape pos'n : 2554257664 (2435.93 GiB 2615.56 GB) >>> sdh1 Reshape pos'n : 3357066496 (3201.55 GiB 3437.64 GB) >> >> If you can post that again but without the "grep" I might be able to >> be more helpful. (i.e. the complete output of "mdadm -E /dev/sd[a-h]1"). >> >> NeilBrown >> >> >> >>> >>> Note that sdd1 was the spare. >>> >>> The UUIDs are all the same and the superblock is all similar except >>> for the reshaping position of sdg1. >>> >>> I didn't try to recreate the array as I've never faced this issue >>> before, so I don't know what kind of repercussions it may have. >>> >>> What I do know, that at the worst case scenario, I can recreate the >>> array out of 7 disks (all but sdg1), but lose about 2.3TB worth of >>> data :( >>> >>> On Sun, Sep 6, 2009 at 12:32 AM, NeilBrown<neilb@suse.de> wrote: >>> > On Sun, September 6, 2009 6:22 am, Majed B. 
wrote: >>> >> Hello all, >>> >> >>> >> I have posted my problem already here: >>> >> http://ubuntuforums.org/showthread.php?p=7900571#post7900571 >>> >> It also has file attachments of the output of mdadm -E /dev/sd[a-h]1 >>> > >>> > It seems that you need to log in to read the attachements... so I haven't. >>> > >>> >> >>> >> I appreciate any help on this. >>> > >>> > Hopefully you just need to add "--force" to the assemble command >>> > and it would all just work. However I haven't tested that on an array >>> > that is in the process of a reshape so I cannot promise. >>> > I might try to reproduce your situation and with some scratch drives >>> > and check that mdadm -Af does the right thing, but it won't be a day >>> > or so, and as I cannot see the --examine output I might get the >>> > situation a bit wrong ... (hint hint: it is always best to post >>> > full information rather than pointers to it, unless said information is >>> > really really big). >>> > >>> > NeilBrown >>> >>> -- >>> Majed B. >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Majed B. > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
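A sketch of the follow-up steps being asked about here — watching the reshape
finish and then re-adding the dropped member — assuming the device names from
this thread and a disk that has checked out as healthy:

  watch -n 60 cat /proc/mdstat         # reshape progress, speed and ETA
  mdadm --add /dev/md0 /dev/sdg1       # only after the reshape has completed
  cat /proc/mdstat                     # recovery onto sdg1 should start automatically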
* Re: 2 Disks Jumped Out While Reshaping RAID5
From: NeilBrown @ 2009-09-07 0:31 UTC
To: Majed B.; +Cc: linux-raid

On Mon, September 7, 2009 10:01 am, Majed B. wrote:
> I have installed mdadm 3.0 and ran -Af and now it's continuing
> reshaping!!!

Excellent.

Based on the --examine info you provided, it appears that
/dev/sdg1 reported an error at about 00:10:39 on Wednesday morning
and was evicted from the array.  Reshape was up to 2435GB (37%) at
that point.
Reshape continued until 06:40:04 that morning, at which point it
had reached 3201GB (49%).  At that point /dev/sdf1 seems to have
reported an error, so the whole array went offline.

When you reassembled with mdadm-3.0 and --force, it excluded sdg1,
as that was the oldest, marked sdf1 as up-to-date, and continued.

The reshape process will have redone the last few chunks, so all
the data will have been properly relocated.

As all the superblocks report that the array was "State : clean",
you can be quite sure that all your data is safe (if they were
"State : active" there would be a small chance that a block or two
was corrupted, and a fsck etc. would be advised).

It wouldn't hurt to examine your kernel logs to see what sort of
error was triggered at those two times, in case there might be a need
to replace a device.

> sdg1 is not in the list. Is that correct?! sdg1 was one of the
> array's disks before expanding. So I guess now the array is degraded
> yet is reshaping as if it had 8 disks, correct?

Yes, that is correct.
It may be that sdg had a transient error, or it may have a serious
media or other error.  You should convince yourself that it is working
reliably before adding it back into the array.

> So after the reshaping process is over, I can add sdg1 again and it
> will resync properly, right?

Yes it will, provided no write errors occur while writing data to it.

NeilBrown
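A few examples of the kind of checks Neil describes — log paths vary by
distribution, and smartmontools is assumed to be installed:

  grep -i sdg /var/log/kern.log        # look for SATA/ATA errors around the two eviction times
  smartctl -a /dev/sdg                 # overall SMART health, error log, attribute table
  smartctl -t long /dev/sdg            # start an offline surface scan; read the result later with -a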
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-07 0:31 ` NeilBrown @ 2009-09-07 0:44 ` Majed B. 2009-09-07 16:34 ` Majed B. 0 siblings, 1 reply; 9+ messages in thread From: Majed B. @ 2009-09-07 0:44 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid Thanks a lot Neil for your help :) kernel logs showed a SATA link error for sdg. I double checked the cables and they were more than fine and the array was running for weeks before I did the reshaping and no errors were reported before the reshaping process. I'm using an MSI motherboard (MS-7514) and been having random issues with it since reaching 6 disks. I've recently ordered an EVGA motherboard and if things turn to be stable on it, I'll ditch MSI for good. Throughout searching for the past 6 days, I noticed people complaining from acpi and apic causing issues, so I turned them off and will see how things turn out. These are the hard disks I'm using: root@Adam:~# hddtemp /dev/sd[a-h] /dev/sda: WDC WD10EACS-00D6B1: 26°C /dev/sdb: WDC WD10EACS-00D6B1: 28°C /dev/sdc: WDC WD10EACS-00ZJB0: 29°C /dev/sdd: WDC WD10EADS-65L5B1: 27°C /dev/sde: WDC WD10EADS-65L5B1: 28°C /dev/sdf: MAXTOR STM31000340AS: 28°C /dev/sdg: WDC WD10EACS-00ZJB0: 26°C /dev/sdh: WDC WD10EADS-00L5B1: 25°C /dev/sdi: Hitachi HDS721680PLAT80: 32°C (sdi is the OS disk) Neil, do you suggest any certain test/stress-tests to put sdg through? I'll force a couple of short and long smartd tests on it, and have dd read the whole disk a couple of times to make sure all sectors are read properly. Is that sufficient? Thank you again. On Mon, Sep 7, 2009 at 3:31 AM, NeilBrown<neilb@suse.de> wrote: > On Mon, September 7, 2009 10:01 am, Majed B. wrote: >> I have installed mdadm 3.0 and ran -Af and now it's continuing >> reshaping!!! > > Excellent. > > Based on the --examine info you provided it appears that > /dev/sdg1 reported an error at about 00:10:39 on Wednesday morning > and was evicted from the array. Reshape was up to 2435GB (37%) at > that point. > Reshape continued until 06:40:04 that morning at which point it > had reached 3201GB (49%). At that point /dev/sdf1 seems to have > reported an error so the whole array went off line. > > When you reassembled with mdadm-3.0 and --force, it excluded sdg1 > as that was the oldest, and marked sdf1 as up-to-date, and continued. > > The reshape processes will have redone the last few chunks so all > the data will have been properly relocated. > > As all the superblocks report that the array was "State : clean", > you can be quite sure that all your data is safe (if they were > "State : active" there would be a small chance some a block or two > was corrupted and a fsck etc would be advised). > > It wouldn't hurt to examine your kernel logs to see what sort of > error was tiggered at those two times in case there might be a need > to replace a device. > > > > >> sdg1 is not in the list. Is that correct?! sdg1 was one of the >> array's disks before expanding. So I guess now the array is degraded >> yet is reshaping as if it had 8 disks, correct? > > Yes, that is correct. > It may be that sdg has a transient error, or it may have a serious > media or other error. You should convince yourself that it is working > reliably before adding it back in to the array. > > > >> >> So after the reshaping process is over, I can add sdg1 again and it >> will resync properly, right? > > Yes it will, providing no write-errors occur while writing data to it. > > NeilBrown > > -- Majed B. 
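For what it's worth, a full sequential read with dd plus a look at the
reallocated/pending sector counters afterwards is a common way to exercise a
suspect disk. A read-only sketch:

  dd if=/dev/sdg of=/dev/null bs=1M    # reads the whole disk; a media error aborts the run and shows how far it got
  smartctl -A /dev/sdg | egrep -i 'Reallocated|Pending|Uncorrect'

Note that a flaky cable or controller port — the suspicion in this thread —
will show up as link errors in the kernel log rather than in SMART.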
* Re: 2 Disks Jumped Out While Reshaping RAID5 2009-09-07 0:44 ` Majed B. @ 2009-09-07 16:34 ` Majed B. 0 siblings, 0 replies; 9+ messages in thread From: Majed B. @ 2009-09-07 16:34 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid A little update on the situation: After uninstalling mdadm 2.6.7.1 which ships with Ubuntu 9.04, and installing mdadm 3.0, I got this: root@Adam:~# cat /proc/mdstat Personalities : unused devices: <none> I'm guessing that happened because initram tools was removed when uninstalling the old mdadm. No problem, I'll just assemble the array on boot (through a line in /etc/rc.local). I then proceeded to assemble the array, but it refused: root@Adam:~# mdadm -Af --verbose /dev/md0 mdadm: looking for devices for /dev/md0 mdadm: cannot open device /dev/sdi5: Device or resource busy mdadm: /dev/sdi5 has wrong uuid. mdadm: no recogniseable superblock on /dev/sdi2 mdadm: /dev/sdi2 has wrong uuid. mdadm: cannot open device /dev/sdi1: Device or resource busy mdadm: /dev/sdi1 has wrong uuid. mdadm: cannot open device /dev/sdi: Device or resource busy mdadm: /dev/sdi has wrong uuid. mdadm: no RAID superblock on /dev/sdh mdadm: /dev/sdh has wrong uuid. mdadm: superblock on /dev/sdg1 doesn't match others - assembly aborted Since sdg1 has flunked out before, I just zeroed its superblock to add it later, if it wasn't dead: root@Adam:~# mdadm --zero-superblock /dev/sdg mdadm: Unrecognised md component device - /dev/sdg root@Adam:~# mdadm --zero-superblock /dev/sdg1 root@Adam:~# mdadm --zero-superblock /dev/sdg1 mdadm: Unrecognised md component device - /dev/sdg1 The array assembled properly after that (with 7 out 8 disks -- running degraded): root@Adam:~# mdadm -Af --verbose /dev/md0 mdadm: looking for devices for /dev/md0 mdadm: cannot open device /dev/sdi5: Device or resource busy mdadm: /dev/sdi5 has wrong uuid. mdadm: no recogniseable superblock on /dev/sdi2 mdadm: /dev/sdi2 has wrong uuid. mdadm: cannot open device /dev/sdi1: Device or resource busy mdadm: /dev/sdi1 has wrong uuid. mdadm: cannot open device /dev/sdi: Device or resource busy mdadm: /dev/sdi has wrong uuid. mdadm: no RAID superblock on /dev/sdh mdadm: /dev/sdh has wrong uuid. mdadm: no RAID superblock on /dev/sdg1 mdadm: /dev/sdg1 has wrong uuid. mdadm: no RAID superblock on /dev/sdg mdadm: /dev/sdg has wrong uuid. mdadm: no RAID superblock on /dev/sdf mdadm: /dev/sdf has wrong uuid. mdadm: no RAID superblock on /dev/sde mdadm: /dev/sde has wrong uuid. mdadm: no RAID superblock on /dev/sdd mdadm: /dev/sdd has wrong uuid. mdadm: no RAID superblock on /dev/sdc mdadm: /dev/sdc has wrong uuid. mdadm: no RAID superblock on /dev/sdb mdadm: /dev/sdb has wrong uuid. mdadm: no RAID superblock on /dev/sda mdadm: /dev/sda has wrong uuid. mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 5. mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 0. mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 1. mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 7. mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2. mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 6. mdadm: /dev/sda1 is identified as a member of /dev/md0, slot 4. 
mdadm: added /dev/sde1 to /dev/md0 as 1 mdadm: added /dev/sdc1 to /dev/md0 as 2 mdadm: no uptodate device for slot 3 of /dev/md0 mdadm: added /dev/sda1 to /dev/md0 as 4 mdadm: added /dev/sdh1 to /dev/md0 as 5 mdadm: added /dev/sdb1 to /dev/md0 as 6 mdadm: added /dev/sdd1 to /dev/md0 as 7 mdadm: added /dev/sdf1 to /dev/md0 as 0 mdadm: /dev/md0 has been started with 7 drives (out of 8). root@Adam:~# cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sdf1[0] sdd1[7] sdb1[6] sdh1[5] sda1[4] sdc1[2] sde1[1] 6837318656 blocks level 5, 256k chunk, algorithm 2 [8/7] [UUU_UUUU] unused devices: <none> After some poking, I'm suspecting the MSI motherboard itself, since the problems happens to disks that are on ports 7 and 8 on the motherboard, and those two ports have their own controller and they share a single bus. I've ordered an EVGA motherboard that should arrive in a week or so. I'll update later when I move the hard disks to it and add that sdg disk. Thanks again Neil for your help :) On Mon, Sep 7, 2009 at 3:44 AM, Majed B.<majedb@gmail.com> wrote: > Thanks a lot Neil for your help :) > > kernel logs showed a SATA link error for sdg. I double checked the > cables and they were more than fine and the array was running for > weeks before I did the reshaping and no errors were reported before > the reshaping process. > > I'm using an MSI motherboard (MS-7514) and been having random issues > with it since reaching 6 disks. I've recently ordered an EVGA > motherboard and if things turn to be stable on it, I'll ditch MSI for > good. > > Throughout searching for the past 6 days, I noticed people complaining > from acpi and apic causing issues, so I turned them off and will see > how things turn out. > > These are the hard disks I'm using: > > root@Adam:~# hddtemp /dev/sd[a-h] > /dev/sda: WDC WD10EACS-00D6B1: 26°C > /dev/sdb: WDC WD10EACS-00D6B1: 28°C > /dev/sdc: WDC WD10EACS-00ZJB0: 29°C > /dev/sdd: WDC WD10EADS-65L5B1: 27°C > /dev/sde: WDC WD10EADS-65L5B1: 28°C > /dev/sdf: MAXTOR STM31000340AS: 28°C > /dev/sdg: WDC WD10EACS-00ZJB0: 26°C > /dev/sdh: WDC WD10EADS-00L5B1: 25°C > /dev/sdi: Hitachi HDS721680PLAT80: 32°C > > (sdi is the OS disk) > > Neil, do you suggest any certain test/stress-tests to put sdg through? > > I'll force a couple of short and long smartd tests on it, and have dd > read the whole disk a couple of times to make sure all sectors are > read properly. Is that sufficient? > > Thank you again. > > On Mon, Sep 7, 2009 at 3:31 AM, NeilBrown<neilb@suse.de> wrote: >> On Mon, September 7, 2009 10:01 am, Majed B. wrote: >>> I have installed mdadm 3.0 and ran -Af and now it's continuing >>> reshaping!!! >> >> Excellent. >> >> Based on the --examine info you provided it appears that >> /dev/sdg1 reported an error at about 00:10:39 on Wednesday morning >> and was evicted from the array. Reshape was up to 2435GB (37%) at >> that point. >> Reshape continued until 06:40:04 that morning at which point it >> had reached 3201GB (49%). At that point /dev/sdf1 seems to have >> reported an error so the whole array went off line. >> >> When you reassembled with mdadm-3.0 and --force, it excluded sdg1 >> as that was the oldest, and marked sdf1 as up-to-date, and continued. >> >> The reshape processes will have redone the last few chunks so all >> the data will have been properly relocated. 
>> >> As all the superblocks report that the array was "State : clean", >> you can be quite sure that all your data is safe (if they were >> "State : active" there would be a small chance some a block or two >> was corrupted and a fsck etc would be advised). >> >> It wouldn't hurt to examine your kernel logs to see what sort of >> error was tiggered at those two times in case there might be a need >> to replace a device. >> >> >> >> >>> sdg1 is not in the list. Is that correct?! sdg1 was one of the >>> array's disks before expanding. So I guess now the array is degraded >>> yet is reshaping as if it had 8 disks, correct? >> >> Yes, that is correct. >> It may be that sdg has a transient error, or it may have a serious >> media or other error. You should convince yourself that it is working >> reliably before adding it back in to the array. >> >> >> >>> >>> So after the reshaping process is over, I can add sdg1 again and it >>> will resync properly, right? >> >> Yes it will, providing no write-errors occur while writing data to it. >> >> NeilBrown >> >> > > > > -- > Majed B. > -- Majed B. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
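As a side note on the boot-time assembly workaround mentioned above: rather
than assembling from /etc/rc.local, the usual Debian/Ubuntu route is to record
the array in mdadm.conf and rebuild the initramfs — a sketch, assuming the
Ubuntu paths and that the initramfs hook scripts are still installed:

  mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # appends an ARRAY line with the array UUID
  update-initramfs -u                              # so the array is assembled early at boot again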