* RE: my first raid disaster on reboot :o( update
@ 2005-09-08 11:38 Ken Walker
  2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
  0 siblings, 1 reply; 18+ messages in thread
From: Ken Walker @ 2005-09-08 11:38 UTC
  To: linux-raid

I'm getting confused again.

I installed Debian 3.1 onto two SCSI drives set up as raid1.

I also set up the four IDE drives during installation, as

    /dev/md7 using /dev/hda,/dev/hdc
    /dev/md8 using /dev/hdb,/dev/hdd

both ext3.

They started, and sync'd.

On reboot, md7 and md8 didn't auto start, so I created them again with

    mdadm -C /dev/md7 -l1 -n2 /dev/hda /dev/hdc

This started rebuilding. Then I did the same for md8

    mdadm -C /dev/md8 -l1 -n2 /dev/hdb /dev/hdd

then I did

    mkfs.ext3 /dev/md7
    mkfs.ext3 /dev/md8

I checked with fdisk that they were all set as FD.

Then I did (I made a copy of the original mdadm.conf first)

    mdadm --detail --scan > mdadm.conf

And on reboot only md0 would mount. So I copied the original mdadm.conf
back and rebooted, and all the raids apart from md7 and md8 started.

I noticed at the top of the original mdadm.conf I had the following

    DEVICE partitions

so I did

    mdadm --detail --scan > mdadm.conf

again, with md7 and md8 running, added DEVICE partitions back to the
top, and rebooted.

The system booted up but again without md7 or md8; it gave its corrupt
superblock or ext2 file system complaints.

But I'm getting confused, because on

    http://www.linuxdevcenter.com/pub/a/linux/2002/12/05/RAID.html

which is where I got the mdadm --detail --scan > mdadm.conf from, the
example he gives

    DEVICE /dev/sdb1 /dev/sdc1
    ARRAY /dev/md0 level=raid0 num-devices=2 UUID=410a299e:4cdd535e:169d3df4:48b7144a

is the other way round in my mdadm.conf file, where I have

    ARRAY /dev/md0 level=raid0 num-devices=2 UUID=410a299e:4cdd535e:169d3df4:48b7144a
    DEVICE /dev/sdb1 /dev/sdc1

Which way round should it be?

I have also read that a mdadm.conf file isn't really needed, but can be
helpful. If I hide my mdadm.conf file, will the system boot with md7
and md8?

I do have those two raids in my fstab file at the end as

    /dev/md7 /Cad100 ext3 defaults 0 2
    /dev/md8 /Cad200 ext3 defaults 0 2

Can anybody help :o(

Ken

-----Original Message-----
From: Ken Walker [mailto:ken.walker@manchester.ac.uk]
Sent: 06 September 2005 2:26 pm
To: linux-raid@vger.kernel.org
Subject: my first raid disaster on reboot :o(

I've got Debian 3.1, kernel 2.6 installed on a machine with two 9.1g
SCSI and 4 160g IDE's.

The SCSI is split up into / /usr /var /swap /tmp and /home, each set as
a raid1.

The IDE's are set up as raid1 on the ide channels, such that hda is
mirrored with hdc and hdb is mirrored with hdd.

I had to move the system today so powered down with shutdown -h now.

On reboot I just get / mounted (I think) and everything else says mdx
corrupt superblock or such and not a valid ext2 fs.

All the mirrors were set up as ext3 and when it was up and running
/proc/mdstat said all was well.

/etc/fstab has all the raids present.

I'm kinda stuck as to where to start. Could anybody point me in the
right direction please.

many thanks

Ken
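A hedged sketch of the config step described above: `mdadm --detail --scan` prints only ARRAY lines, so redirecting it with `>` replaces the whole file and drops any DEVICE line that was there before. The helper below (the name `make_mdadm_conf` is an illustrative assumption, not part of mdadm) writes a `DEVICE partitions` line first and passes the ARRAY lines through. As far as I know, mdadm does not care about the relative order of DEVICE and ARRAY lines, but that is an assumption worth checking against the mdadm.conf man page for the installed version.

```shell
# make_mdadm_conf: hypothetical helper, not an mdadm feature.
# Reads `mdadm --detail --scan` style output on stdin and writes a
# config to stdout: one DEVICE line, then the ARRAY lines unchanged.
make_mdadm_conf() {
    echo 'DEVICE partitions'
    grep '^ARRAY '
}

# Assumed usage (as root, with the arrays running):
#   mdadm --detail --scan | make_mdadm_conf > mdadm.conf.new
```

Writing to a new file and comparing it with the existing config before swapping it in keeps a bad scan from replacing a working file.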
* Drive fails & raid6 array is not self rebuild .
  2005-09-08 11:38 my first raid disaster on reboot :o( update Ken Walker
@ 2005-09-08 18:54 ` Mr. James W. Laferriere
  2005-09-08 19:34   ` Molle Bestefich
  2005-09-08 21:09   ` Neil Brown
  0 siblings, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-08 18:54 UTC
  To: linux-raid maillist

    Hello All , Is there a documented procedure to follow during
    creation or after that will get a raid6 array to self
    rebuild ?

    Why I am asking . I was getting the errors below at a heavy
    rate , so ...

Sep 7 20:11:49 localhost kernel: scsi2 (2:0): rejecting I/O to dead device
Sep 7 20:11:49 localhost kernel: md: write_disk_sb failed for device sde
Sep 7 20:11:49 localhost kernel: md: excessive errors occurred during superblock update, exiting
Sep 7 20:11:49 localhost kernel: raid5: Disk failure on sde, disabling device. Operation continuing on 35 devices

    I ran the below & the above messages stopped . But the array
    (appears to have) never tried rebuilding .

# mdadm --manage --fail /dev/md_d0 /dev/sde

    The problem arose because the drive died totally . ie:

root@devel-0:/ # fdisk /dev/sde
Unable to open /dev/sde

# cat /proc/mdstat
...snip...
md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32]
sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25]
sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17]
sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8]
sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
       1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
[UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
...snip...

# cat /etc/mdadm.conf
DEV /dev/sd[c-l] /dev/sd[n-w] /dev/sd[yz] /dev/sda[a-h] /dev/sda[j-s]
ARRAY /dev/md_d0 level=raid5 num-devices=36 spares=4 UUID=2006d8c6:71918820:247e00b0:460d5bc1

-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. | Give me Linux   |
| babydr@baby-dragons.com | Billings , MT. 59105 | only  on  AXP   |
+------------------------------------------------------------------+
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
@ 2005-09-08 19:34   ` Molle Bestefich
  2005-09-08 21:09   ` Neil Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Molle Bestefich @ 2005-09-08 19:34 UTC
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

Mr. James W. Laferriere wrote:
> Is there a documented procedure to follow during
> creation or after that will get a raid6 array to self
> rebuild ?

MD will rebuild your array automatically, given that it has a spare
disk to use.

> raid5: Disk failure on sde, disabling device. Operation continuing on 35 devices

Seems like a raid5, not raid6..

> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

No need to do any rebuilding on the remaining devices, since the data
on them are fine.

You've lost redundancy however, so you should add a new disk to the
array ASAP.

With 35 disks, I'd recommend that you at least use raid6 in place of
raid5..
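The `[36/35]` and `[UU_U...]` fields quoted above are what show the lost redundancy: 36 member slots, 35 of them active, with `_` marking the missing one. As a rough sketch, the check can be automated by scanning /proc/mdstat-style text for a `[total/active]` pair where active is lower than total. The function name `degraded_arrays` is an assumption made for illustration, and the parsing relies only on the mdstat layout shown in this thread.

```shell
# degraded_arrays: read /proc/mdstat-style text on stdin and print the
# name of every array whose "[total/active]" counter shows fewer active
# members than total members. Illustrative helper, not an mdadm feature.
degraded_arrays() {
    awk '
        # Remember the array name from lines such as "md_d0 : active raid5 ..."
        /^md[^ ]* :/ { name = $1 }
        # Find a "[N/M]" pair and compare the two numbers.
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    '
}

# Assumed usage on a live system:
#   degraded_arrays < /proc/mdstat
```

An array listed here still serves data, but as noted above it has no redundancy until a spare is added and the rebuild completes.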
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 18:54 ` Drive fails & raid6 array is not self rebuild Mr. James W. Laferriere
  2005-09-08 19:34   ` Molle Bestefich
@ 2005-09-08 21:09   ` Neil Brown
  2005-09-08 21:39     ` Mr. James W. Laferriere
  1 sibling, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-08 21:09 UTC
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Thursday September 8, babydr@baby-dragons.com wrote:
> Hello All , Is there a documented procedure to follow during
> creation or after that will get a raid6 array to self
> rebuild ?

I suspect a kernel upgrade would do the trick, though you don't say
what kernel you are running.

You could probably kick it along by removing and re-adding your spare:

   mdadm /dev/md_d0 --remove /dev/sdao
   mdadm /dev/md_d0 --add /dev/sdao

(And I assume you mean 'raid5' rather than 'raid6', not that it
matters..)

NeilBrown

> # cat /proc/mdstat
> ...snip...
> md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32]
> sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25]
> sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17]
> sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8]
> sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 21:09 ` Neil Brown
@ 2005-09-08 21:39   ` Mr. James W. Laferriere
  2005-09-09  0:50     ` Neil Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-08 21:39 UTC
  To: linux-raid maillist

    Hello Neil , Inline .

On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>> Hello All , Is there a documented procedure to follow during
>> creation or after that will get a raid6 array to self
>> rebuild ?
> I suspect a kernel upgrade would do the trick, though you don't say
> what kernel you are running.
> You could probably kick it along by removing and re-adding your spare:
>   mdadm /dev/md_d0 --remove /dev/sdao
>   mdadm /dev/md_d0 --add /dev/sdao
>
> (And I assume you mean 'raid5' rather than 'raid6', not that it
> matters..)

    Sorry , yes I meant raid5 . My kernel version is .

root@devel-0:/ # uname -a
Linux devel-0 2.6.12.5 #1 SMP Fri Aug 26 20:09:46 UTC 2005 i686 GNU/Linux

    When I try to do the remove I get .

root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
mdadm: hot remove failed for /dev/sdao: Device or resource busy

    I should also have 3 other drives that are spares . I could
    try hot remove on one of them . See at bottom the output of
    mdadm --misc -Q --detail /dev/md_d0
    Which is showing no spare drives ? And I built it with 4
    spares

root@devel-0:~ # cat /etc/mdadm.conf
DEV /dev/sd[c-l] /dev/sd[n-w] /dev/sd[yz] /dev/sda[a-h] /dev/sda[j-s]
ARRAY /dev/md_d0 level=raid5 num-devices=36 spares=4 UUID=2006d8c6:71918820:247e00b0:460d5bc1

c-l   is 10 devices (one is dead 'e' leaves 9) .
n-w   is 10 devices
yz    is  2 devices
aa-h  is  8 devices
aj-s  is 10 devices
----------
40 devices given in mdadm.conf
-1 dead device .
----------
39 devices
36 devices used (per /proc/mdstat)
----------
 3 devices for spares .

>> # cat /proc/mdstat
>> ...snip...
>> md_d0 : active raid5 sdc[0] sdao[40] sdan[34] sdam[33] sdal[32]
>> sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26] sdad[25]
>> sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18] sdu[17]
>> sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9] sdk[8]
>> sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
>> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

/dev/md_d0:
        Version : 01.02.01
  Creation Time : Sun Aug 28 17:46:59 2005
     Raid Level : raid5
     Array Size : 1244826240 (1187.16 GiB 1274.70 GB)
    Device Size : 35566464 (33.92 GiB 36.42 GB)
   Raid Devices : 36
  Total Devices : 36
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Sep 8 06:26:10 2005
          State : clean, degraded
 Active Devices : 35
Working Devices : 35
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name :
           UUID : 2006d8c6:71918820:247e00b0:460d5bc1
         Events : 5308

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       0       0        0        0      removed
       3       8       80        3      active sync   /dev/sdf
       4       8       96        4      active sync   /dev/sdg
       5       8      112        5      active sync   /dev/sdh
       6       8      128        6      active sync   /dev/sdi
       7       8      144        7      active sync   /dev/sdj
       8       8      160        8      active sync   /dev/sdk
       9       8      176        9      active sync   /dev/sdl
      10       8      208       10      active sync   /dev/sdn
      11       8      224       11      active sync   /dev/sdo
      12       8      240       12      active sync   /dev/sdp
      13      65        0       13      active sync   /dev/sdq
      14      65       16       14      active sync   /dev/sdr
      15      65       32       15      active sync   /dev/sds
      16      65       48       16      active sync   /dev/sdt
      17      65       64       17      active sync   /dev/sdu
      18      65       80       18      active sync   /dev/sdv
      19      65       96       19      active sync   /dev/sdw
      20      65      128       20      active sync   /dev/sdy
      21      65      144       21      active sync   /dev/sdz
      22      65      160       22      active sync   /dev/sdaa
      23      65      176       23      active sync   /dev/sdab
      24      65      192       24      active sync   /dev/sdac
      25      65      208       25      active sync   /dev/sdad
      26      65      224       26      active sync   /dev/sdae
      27      65      240       27      active sync   /dev/sdaf
      28      66        0       28      active sync   /dev/sdag
      29      66       16       29      active sync   /dev/sdah
      30      66       48       30      active sync   /dev/sdaj
      31      66       64       31      active sync   /dev/sdak
      32      66       80       32      active sync   /dev/sdal
      33      66       96       33      active sync   /dev/sdam
      34      66      112       34      active sync   /dev/sdan
      40      66      128       35      active sync   /dev/sdao

       2       8       64        -      faulty spare   /dev/sde
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-08 21:39 ` Mr. James W. Laferriere
@ 2005-09-09  0:50   ` Neil Brown
  2005-09-09  2:05     ` Mr. James W. Laferriere
  0 siblings, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-09 0:50 UTC
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Thursday September 8, babydr@baby-dragons.com wrote:
>
> When I try to do the remove I get .
> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>
> I should also have 3 other drives that are spares . I could
> try hot remove on one of them . See at bottom the output of
> mdadm --misc -Q --detail /dev/md_d0
> Which is showing no spare drives ? And I built it with 4
> spares

Yes... /dev/sda[pqrs] are missing. I wonder why..

What does
   mdadm -E /dev/sda[pqrs]
show?

What happens if you then
   mdadm /dev/md_d0 -a /dev/sda[pqrs]
??

NeilBrown
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  0:50 ` Neil Brown
@ 2005-09-09  2:05   ` Mr. James W. Laferriere
  2005-09-09  2:15     ` Mr. James W. Laferriere
  2005-09-09  7:40     ` Neil Brown
  0 siblings, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 2:05 UTC
  To: Neil Brown; +Cc: linux-raid maillist

    Hello Neil ,

On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>> When I try to do the remove I get .
>> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
>> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>>
>> I should also have 3 other drives that are spares . I could
>> try hot remove on one of them . See at bottom the output of
>> mdadm --misc -Q --detail /dev/md_d0
>> Which is showing no spare drives ? And I built it with 4
>> spares
>
> Yes... /dev/sda[pqrs] are missing. I wonder why..
>
> What does
>   mdadm -E /dev/sda[pqrs]
> show?
    See way below .

> What happens if you then
>   mdadm /dev/md_d0 -a /dev/sda[pqrs]
> ??

    Getting stranger & stranger .

root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
mdadm: re-added /dev/sdap

root@devel-0:~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
       1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
[UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

md1 : active raid1 sdb2[0] sda2[1]
       1003968 blocks [2/2] [UU]

md2 : active raid1 sdb3[0] sda3[1]
       34700288 blocks [2/2] [UU]

md0 : active raid1 sdb1[0] sda1[1]
       136448 blocks [2/2] [UU]

unused devices: <none>

    It appears they think they're still part of the array .

root@devel-0:~ # mdadm -E /dev/sda[pqrs]
/dev/sdap:
          Magic : a92b4efc
        Version : 01.00
     Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
           Name :
  Creation Time : Sun Aug 28 17:46:59 2005
     Raid Level : raid5
   Raid Devices : 36

    Device Size : 71132943 (33.92 GiB 36.42 GB)
    Data Offset : 16 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c083f71d:ce15a0aa:24341675:45ec6e3e

    Update Time : Sun Aug 28 20:43:06 2005
       Checksum : dc216e5 - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdaq:
          Magic : a92b4efc
        Version : 01.00
     Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
           Name :
  Creation Time : Sun Aug 28 17:46:59 2005
     Raid Level : raid5
   Raid Devices : 36

    Device Size : 71132943 (33.92 GiB 36.42 GB)
    Data Offset : 16 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 430b9730:4416eb44:2f793e78:a3a92cc1

    Update Time : Sun Aug 28 20:43:06 2005
       Checksum : 4092a148 - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdar:
          Magic : a92b4efc
        Version : 01.00
     Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
           Name :
  Creation Time : Sun Aug 28 17:46:59 2005
     Raid Level : raid5
   Raid Devices : 36

    Device Size : 71132943 (33.92 GiB 36.42 GB)
    Data Offset : 16 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 33ea7f64:976740bb:ff88e4bc:84534774

    Update Time : Sun Aug 28 20:43:06 2005
       Checksum : e2918b3d - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
/dev/sdas:
          Magic : a92b4efc
        Version : 01.00
     Array UUID : 2006d8c6:71918820:247e00b0:460d5bc1
           Name :
  Creation Time : Sun Aug 28 17:46:59 2005
     Raid Level : raid5
   Raid Devices : 36

    Device Size : 71132943 (33.92 GiB 36.42 GB)
    Data Offset : 16 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : acb2ea9d:7c3f3b6e:98d9f85c:c8cb2bae

    Update Time : Sun Aug 28 20:43:06 2005
       Checksum : a8eff479 - correct
         Events : 1

         Layout : left-symmetric
     Chunk Size : 64K

    Array State : uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu 1 failed
root@devel-0:~ #
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  2:05 ` Mr. James W. Laferriere
@ 2005-09-09  2:15   ` Mr. James W. Laferriere
  2005-09-09  7:40   ` Neil Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 2:15 UTC
  To: Neil Brown; +Cc: linux-raid maillist

    Hello Neil ,

On Thu, 8 Sep 2005, Mr. James W. Laferriere wrote:
> On Fri, 9 Sep 2005, Neil Brown wrote:
>> On Thursday September 8, babydr@baby-dragons.com wrote:
>>> When I try to do the remove I get .
>>> root@devel-0:/ # mdadm /dev/md_d0 --remove /dev/sdao
>>> mdadm: hot remove failed for /dev/sdao: Device or resource busy
>>>
>>> I should also have 3 other drives that are spares . I could
>>> try hot remove on one of them . See at bottom the output of
>>> mdadm --misc -Q --detail /dev/md_d0
>>> Which is showing no spare drives ? And I built it with 4
>>> spares
>>
>> Yes... /dev/sda[pqrs] are missing. I wonder why..
>>
>> What does
>>   mdadm -E /dev/sda[pqrs]
>> show?
> See way below .
>
>> What happens if you then
>>   mdadm /dev/md_d0 -a /dev/sda[pqrs]
>> ??
>
> Getting stranger & stranger .
>
> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
> mdadm: re-added /dev/sdap

    Are there any debugging options I can enable from the boot:
    prompt or compile in ? Just some more info ...
    Hth , JimL

# dmesg | tail -n 43
RAID5 conf printout:
 --- rd:36 wd:35 fd:1
 disk 0, o:1, dev:sdc
 disk 1, o:1, dev:sdd
 disk 3, o:1, dev:sdf
 disk 4, o:1, dev:sdg
 disk 5, o:1, dev:sdh
 disk 6, o:1, dev:sdi
 disk 7, o:1, dev:sdj
 disk 8, o:1, dev:sdk
 disk 9, o:1, dev:sdl
 disk 10, o:1, dev:sdn
 disk 11, o:1, dev:sdo
 disk 12, o:1, dev:sdp
 disk 13, o:1, dev:sdq
 disk 14, o:1, dev:sdr
 disk 15, o:1, dev:sds
 disk 16, o:1, dev:sdt
 disk 17, o:1, dev:sdu
 disk 18, o:1, dev:sdv
 disk 19, o:1, dev:sdw
 disk 20, o:1, dev:sdy
 disk 21, o:1, dev:sdz
 disk 22, o:1, dev:sdaa
 disk 23, o:1, dev:sdab
 disk 24, o:1, dev:sdac
 disk 25, o:1, dev:sdad
 disk 26, o:1, dev:sdae
 disk 27, o:1, dev:sdaf
 disk 28, o:1, dev:sdag
 disk 29, o:1, dev:sdah
 disk 30, o:1, dev:sdaj
 disk 31, o:1, dev:sdak
 disk 32, o:1, dev:sdal
 disk 33, o:1, dev:sdam
 disk 34, o:1, dev:sdan
 disk 35, o:1, dev:sdao
md: cannot remove active disk sdao from md_d0 ...
md: cannot remove active disk sdao from md_d0 ...
md: bind<sdap>
md: bind<sdaq>
md: bind<sdar>
md: bind<sdas>
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  2:05 ` Mr. James W. Laferriere
  2005-09-09  2:15   ` Mr. James W. Laferriere
@ 2005-09-09  7:40   ` Neil Brown
  2005-09-09 11:37     ` David M. Strang
  2005-09-09 20:07     ` Mr. James W. Laferriere
  1 sibling, 2 replies; 18+ messages in thread
From: Neil Brown @ 2005-09-09 7:40 UTC
  To: Mr. James W. Laferriere; +Cc: linux-raid maillist

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 1300 bytes --]

On Thursday September 8, babydr@baby-dragons.com wrote:
> > What happens if you then
> >   mdadm /dev/md_d0 -a /dev/sda[pqrs]
> > ??
>
> Getting stranger & stranger .
>
> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
> mdadm: re-added /dev/sdap
>

Hmm.. mdadm bug.

> root@devel-0:~ # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
> md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
> sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
> sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
> sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
> sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]

Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
kernel.

Attached are three patches.

The first two are needed by 2.6.12.5 to make sure resync happens (this
is particularly a problem for version-1 superblocks) or just upgrade
to 2.6.13.

The last fixes mdadm-v2.0 so that when you add /dev/sda[pqrs] it
actually adds all of them, and so that when you --assemble a version-1
array with spares, the spares actually get included.

NeilBrown

[-- Attachment #2: 349MdHotAddFix --]
[-- Type: text/plain, Size: 786 bytes --]

Status: ok

Make sure recovery happens when add_new_disk is used for hot_add

Currently if add_new_disk is used to hot-add a drive to a degraded
array, recovery doesn't start ... because we didn't tell it to.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    2 ++
 1 files changed, 2 insertions(+)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2005-05-31 13:40:35.000000000 +1000
+++ ./drivers/md/md.c	2005-05-31 13:40:34.000000000 +1000
@@ -2232,6 +2232,8 @@ static int add_new_disk(mddev_t * mddev,
 		err = bind_rdev_to_array(rdev, mddev);
 		if (err)
 			export_rdev(rdev);
+
+		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 		if (mddev->thread)
 			md_wakeup_thread(mddev->thread);
 		return err;

[-- Attachment #3: 418MdWakeThread --]
[-- Type: text/plain, Size: 1420 bytes --]

Status: ok

Make sure resync gets started when array starts.

We weren't actually waking up the md thread after setting
MD_RECOVERY_NEEDED when assembling an array, so it is possible to lose
a race and not actually start resync.

So add a call to md_wakeup_thread, and while we are at it, remove all
the "if (mddev->thread)" guards as md_wakeup_thread does its own
checking.

Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>

### Diffstat output
 ./drivers/md/md.c |    7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c	2005-08-26 17:00:30.000000000 +1000
+++ ./drivers/md/md.c~current~	2005-08-26 17:00:39.000000000 +1000
@@ -256,8 +256,7 @@ static inline void mddev_unlock(mddev_t
 {
 	up(&mddev->reconfig_sem);
 
-	if (mddev->thread)
-		md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->thread);
 }
 
 mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
@@ -1726,6 +1725,7 @@ static int do_md_run(mddev_t * mddev)
 	mddev->in_sync = 1;
 
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+	md_wakeup_thread(mddev->thread);
 
 	if (mddev->sb_dirty)
 		md_update_sb(mddev);
@@ -2255,8 +2255,7 @@ static int add_new_disk(mddev_t * mddev,
 			export_rdev(rdev);
 
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-		if (mddev->thread)
-			md_wakeup_thread(mddev->thread);
+		md_wakeup_thread(mddev->thread);
 		return err;
 	}
 

[-- Attachment #4: patch --]
[-- Type: text/plain, Size: 1128 bytes --]

diff ./Assemble.c~current~ ./Assemble.c
--- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
+++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
@@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
 	struct mdinfo info;
 	struct mddev_ident_s ident2;
 	char *avail;
+	int nextspare = 0;
 
 	vers = md_get_version(mdfd);
 	if (vers <= 0) {
@@ -320,6 +321,11 @@ int Assemble(struct supertype *st, char
 			i = devcnt;
 		else
 			i = devices[devcnt].raid_disk;
+		if (i+1 == 0) {
+			if (nextspare < info.array.raid_disks)
+				nextspare = info.array.raid_disks;
+			i = nextspare++;
+		}
 		if (i < 10000) {
 			if (i >= bestcnt) {
 				unsigned int newbestcnt = i+10;
diff ./Manage.c~current~ ./Manage.c
--- ./Manage.c~current~	2005-09-05 10:54:55.000000000 +1000
+++ ./Manage.c	2005-09-09 16:04:12.000000000 +1000
@@ -288,7 +288,7 @@ int Manage_subdevs(char *devname, int fd
 					if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
 						if (verbose >= 0)
 							fprintf(stderr, Name ": re-added %s\n", dv->devname);
-						return 0;
+						continue;
 					}
 					/* fall back on normal-add */
 				}
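The Manage.c hunk in the last patch is the part that accounts for `mdadm: re-added /dev/sdap` being the only output when four devices were given: a `return 0` inside the per-device loop ends the whole operation after the first successful re-add, whereas `continue` moves on to the next device. The shell model below only mimics that control flow. All function names are made up for illustration, `try_re_add` is a stand-in that always succeeds, and nothing here calls mdadm.

```shell
# Toy model of the per-device loop; names are illustrative only.
# try_re_add stands in for the ADD_NEW_DISK ioctl and always succeeds here.
try_re_add() { return 0; }

add_devices_before_patch() {
    for dev in "$@"; do
        if try_re_add "$dev"; then
            echo "re-added $dev"
            return 0        # leaves the function: later devices are never tried
        fi
        echo "added $dev"   # stands in for the normal-add fallback
    done
}

add_devices_after_patch() {
    for dev in "$@"; do
        if try_re_add "$dev"; then
            echo "re-added $dev"
            continue        # skips the fallback, then handles the next device
        fi
        echo "added $dev"
    done
}
```

Calling both with the same four device names shows the difference: the first prints one line, the second prints one line per device.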
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  7:40 ` Neil Brown
@ 2005-09-09 11:37   ` David M. Strang
  2005-09-09 13:52     ` Mr. James W. Laferriere
  2005-09-09 20:07   ` Mr. James W. Laferriere
  1 sibling, 1 reply; 18+ messages in thread
From: David M. Strang @ 2005-09-09 11:37 UTC
  To: Neil Brown, Mr. James W. Laferriere; +Cc: linux-raid maillist

NeilBrown wrote:
> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
> kernel.

I can attest to this; as a workaround I've been using:

    mdadm --readonly /dev/mdX
    mdadm --readwrite /dev/mdX

That will trigger a rebuild.

-- David
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 11:37 ` David M. Strang
@ 2005-09-09 13:52   ` Mr. James W. Laferriere
  2005-09-09 13:59     ` David M. Strang
  0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 13:52 UTC
  To: David M. Strang; +Cc: Neil Brown, linux-raid maillist

    Hello David , Thank you for the idea . But ...

root@devel-0:~ # mdadm --readonly /dev/md_d0
mdadm: failed to set readonly for /dev/md_d0: Device or resource busy

    I think I'll try Neil's upgrade to 2.6.13 & his patch to
    mdadm . I'll report back if that cures my problem .
    Tnx to all , JimL

On Fri, 9 Sep 2005, David M. Strang wrote:
> NeilBrown wrote:
>> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
>> kernel.
>
> I can attest to this; as I workaround I've been using:
>
> mdadm --readonly /dev/mdX
> mdadm --readwrite /dev/mdX
>
> That will trigger a rebuild.
>
>
> -- David
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 13:52 ` Mr. James W. Laferriere
@ 2005-09-09 13:59   ` David M. Strang
  2005-09-09 19:59     ` Mr. James W. Laferriere
  0 siblings, 1 reply; 18+ messages in thread
From: David M. Strang @ 2005-09-09 13:59 UTC
  To: Mr. James W. Laferriere; +Cc: Neil Brown, linux-raid maillist

Mr. James W. Laferriere wrote:
> Hello David , Thank you for the idea . But ...
>
> root@devel-0:~ # mdadm --readonly /dev/md_d0
> mdadm: failed to set readonly for /dev/md_d0: Device or resource busy

James --

umount /dev/md_d0 first; you can remount it right after you re-enable
writes.

That should do the trick =)

-- David
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 13:59 ` David M. Strang
@ 2005-09-09 19:59   ` Mr. James W. Laferriere
  0 siblings, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 19:59 UTC
  To: David M. Strang; +Cc: Neil Brown, linux-raid maillist

    Hello David , That did work . Thank you again . JimL

umount /directory
mdadm --readonly /dev/mdX
mdadm --readwrite /dev/mdX
mount /directory

On Fri, 9 Sep 2005, David M. Strang wrote:
> Mr. James W. Laferriere wrote:
>> Hello David , Thank you for the idea . But ...
>>
>> root@devel-0:~ # mdadm --readonly /dev/md_d0
>> mdadm: failed to set readonly for /dev/md_d0: Device or resource busy
>
> James --
>
> umount /dev/md_d0 first; you can remount it right after you re-enable writes.
>
> That should do the trick =)
>
> -- David
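The four commands that worked here can be wrapped so that each step runs only if the previous one succeeded. This is a sketch under assumptions: `run` and `kick_rebuild` are hypothetical names, the mount point is expected to have an /etc/fstab entry (as the bare `mount /directory` above implies), and `DRY_RUN=1` only prints the commands instead of executing them.

```shell
# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# kick_rebuild MOUNTPOINT ARRAY
# Hypothetical wrapper around the workaround from this thread:
# unmount, flip the array read-only and back to read-write, remount.
# The && chain stops at the first step that fails.
kick_rebuild() {
    run umount "$1" &&
    run mdadm --readonly "$2" &&
    run mdadm --readwrite "$2" &&
    run mount "$1"
}

# Assumed usage (as root):
#   DRY_RUN=1; kick_rebuild /directory /dev/md_d0   # preview only
#   DRY_RUN=0; kick_rebuild /directory /dev/md_d0
```

If one of the mdadm steps fails, the filesystem is left unmounted, so the array state should be checked in /proc/mdstat before remounting by hand.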
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09  7:40 ` Neil Brown
  2005-09-09 11:37   ` David M. Strang
@ 2005-09-09 20:07   ` Mr. James W. Laferriere
  2005-09-09 20:58     ` OT: lilo overwriting partition info ? Mr. James W. Laferriere
  2005-09-09 21:49     ` Drive fails & raid6 array is not self rebuild Neil Brown
  1 sibling, 2 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 20:07 UTC
  To: Neil Brown; +Cc: linux-raid maillist

    Hello Neil , I patched all were successful . But after a
    make clean ; make
    I get ... Tia , JimL

..snip...
gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
Assemble.c: In function `Assemble':
Assemble.c:323: error: `nextspare' undeclared (first use in this function)
Assemble.c:323: error: (Each undeclared identifier is reported only once
Assemble.c:323: error: for each function it appears in.)
make: *** [Assemble.o] Error 1

On Fri, 9 Sep 2005, Neil Brown wrote:
> On Thursday September 8, babydr@baby-dragons.com wrote:
>>> What happens if you then
>>>   mdadm /dev/md_d0 -a /dev/sda[pqrs]
>>> ??
>> Getting stranger & stranger .
>>
>> root@devel-0:~ # mdadm /dev/md_d0 -a /dev/sda[pqrs]
>> mdadm: re-added /dev/sdap
> Hmm.. mdadm bug.
>
>> root@devel-0:~ # cat /proc/mdstat
>> Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [raid6] [raid10]
>> md_d0 : active raid5 sdap[36] sdc[0] sdao[40] sdan[34] sdam[33]
>> sdal[32] sdak[31] sdaj[30] sdah[29] sdag[28] sdaf[27] sdae[26]
>> sdad[25] sdac[24] sdab[23] sdaa[22] sdz[21] sdy[20] sdw[19] sdv[18]
>> sdu[17] sdt[16] sds[15] sdr[14] sdq[13] sdp[12] sdo[11] sdn[10] sdl[9]
>> sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2](F) sdd[1]
>>        1244826240 blocks level 5, 64k chunk, algorithm 2 [36/35]
>> [UU_UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU]
>
> Hmm.. obviously hot-add isn't enough to trigger the rebuild in that
> kernel.
> Attached are three patches.
> The first two are needed by 2.6.12.5 to make sure resync happens (this
> is particularly a problem for version-1 superblocks) or just upgrade
> to 2.6.13.
> The last fixes mdadm-v2.0 so that when you add /dev/sda[pqrs] it
> actually adds all of them, and so that when you --assemble a version-1
> array with spares, the spares actually get included.
> NeilBrown
* OT: lilo overwriting partition info ?
  2005-09-09 20:07 ` Mr. James W. Laferriere
@ 2005-09-09 20:58   ` Mr. James W. Laferriere
  2005-09-09 21:49   ` Drive fails & raid6 array is not self rebuild Neil Brown
  1 sibling, 0 replies; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-09 20:58 UTC
To: linux-raid maillist

	Hello All ,  Off topic I know ...
	I have a question not related to MD .  Have you heard of
	complaints about lilo overwriting partition info on disks after
	the first 2 if those are in a raid1 ?  Or any mentions of lilo
	writing to all 16 disks causing problems ?
		Tia ,  JimL

-- 
+------------------------------------------------------------------+
| James W. Laferriere     | System Techniques    | Give me VMS     |
| Network Engineer        | 3542 Broken Yoke Dr. | Give me Linux   |
| babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP     |
+------------------------------------------------------------------+
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 20:07 ` Mr. James W. Laferriere
  2005-09-09 20:58   ` OT: lilo overwriting partition info ? Mr. James W. Laferriere
@ 2005-09-09 21:49   ` Neil Brown
  2005-09-10  0:54     ` Mr. James W. Laferriere
  1 sibling, 1 reply; 18+ messages in thread
From: Neil Brown @ 2005-09-09 21:49 UTC
To: Mr. James W. Laferriere; +Cc: linux-raid maillist

On Friday September 9, babydr@baby-dragons.com wrote:
>
> 	Hello Neil ,  I patched all were successful .  But after a
> make clean ; make
> I get ...  Tia ,  JimL
> ..snip...
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
> Assemble.c: In function `Assemble':
> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
> Assemble.c:323: error: (Each undeclared identifier is reported only once
> Assemble.c:323: error: for each function it appears in.)
> make: *** [Assemble.o] Error 1

That's odd, as the patch contained:

--- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
+++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
@@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
 	struct mdinfo info;
 	struct mddev_ident_s ident2;
 	char *avail;
+	int nextspare = 0;
 
 	vers = md_get_version(mdfd);
 	if (vers <= 0) {

Maybe add that bit in by hand??

NeilBrown
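For context on what the `nextspare` declaration above is for: in the complete patch posted at the end of this thread, a device whose recorded slot fails the test `i+1 == 0` (for a signed int, a slot of -1) is given the next free index at or above `raid_disks`. Per Neil's earlier description, that is the change meant to get spares included on --assemble. The function below is only a minimal standalone sketch of that bookkeeping; `assign_slot` and its parameters are hypothetical names, not mdadm code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of mdadm): choose the table index for
 * one device.  raid_disk is the slot recorded for the device, with -1
 * meaning "no slot".  *nextspare carries state between calls and should
 * start at 0, like the variable the patch adds to Assemble().
 */
int assign_slot(int raid_disk, int raid_disks, int *nextspare)
{
    int i = raid_disk;

    assert(nextspare != NULL);

    if (i + 1 == 0) {                 /* same test as the patch: i == -1 */
        if (*nextspare < raid_disks)
            *nextspare = raid_disks;  /* first such device goes right after the active slots */
        i = (*nextspare)++;           /* later ones take consecutive indices */
    }
    return i;
}
```

With four active slots, two slotless devices would receive indices 4 and 5, while a device recorded in slot 2 keeps index 2.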
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-09 21:49 ` Drive fails & raid6 array is not self rebuild Neil Brown
@ 2005-09-10  0:54   ` Mr. James W. Laferriere
  2005-09-10 21:58     ` Neil Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Mr. James W. Laferriere @ 2005-09-10 0:54 UTC
To: Neil Brown; +Cc: linux-raid maillist

	Hello Neil ,

On Sat, 10 Sep 2005, Neil Brown wrote:
> On Friday September 9, babydr@baby-dragons.com wrote:
>>
>> 	Hello Neil ,  I patched all were successful .  But after a
>> make clean ; make
>> I get ...  Tia ,  JimL
>> ..snip...
>> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
>> Assemble.c: In function `Assemble':
>> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
>> Assemble.c:323: error: (Each undeclared identifier is reported only once
>> Assemble.c:323: error: for each function it appears in.)
>> make: *** [Assemble.o] Error 1
>
> That's odd, as the patch contained:
>
> --- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
> +++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
> @@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
>  	struct mdinfo info;
>  	struct mddev_ident_s ident2;
>  	char *avail;
> +	int nextspare = 0;
>
>  	vers = md_get_version(mdfd);
>  	if (vers <= 0) {

	What was missing from my 2.0 sources was the 'char *avail;'
	and patching failed on that hunk ,  Which I totally missed .
	So I hand entered as you suggested the above bits .

	Now it fails on a Warning (???) .
	Never heard of failures on warnings before .

gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
Assemble.c: In function `Assemble':
Assemble.c:121: warning: unused variable `avail'
make: *** [Assemble.o] Error 1

	Would you please cut a source set to the kernel site
>	Say as version 2.0a so I can see the diffs against the
	sources I have ?  Tia ,  JimL

-- 
+------------------------------------------------------------------+
| James W. Laferriere     | System Techniques    | Give me VMS     |
| Network Engineer        | 3542 Broken Yoke Dr. | Give me Linux   |
| babydr@baby-dragons.com | Billings , MT. 59105 | only on AXP     |
+------------------------------------------------------------------+
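The build stops on a warning because the gcc command line above includes `-Werror`, which makes GCC treat every warning enabled by `-Wall`, "unused variable" among them, as an error. The fragment below is a minimal sketch of the same situation outside mdadm. `werror_demo` is a hypothetical function, and the `(void)` cast is simply a common C idiom for marking a variable as deliberately unused; deleting the unused declaration works just as well.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hand-patched function: one declaration
 * is used, the other is not.  Build with the flags from the thread:
 *     gcc -Wall -Werror -c werror_demo.c
 */
int werror_demo(void)
{
    char *avail = NULL;   /* without the cast below, -Wall reports an
                           * unused variable and -Werror makes that
                           * diagnostic fatal */
    int nextspare = 0;

    (void)avail;          /* mark the variable as intentionally unused */
    return nextspare;
}
```

Removing the `(void)avail;` line and rebuilding with the same flags should reproduce the kind of failure shown in the message above.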
* Re: Drive fails & raid6 array is not self rebuild .
  2005-09-10  0:54 ` Mr. James W. Laferriere
@ 2005-09-10 21:58   ` Neil Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Neil Brown @ 2005-09-10 21:58 UTC
To: Mr. James W. Laferriere; +Cc: linux-raid maillist

[-- Attachment #1: message body text --]
[-- Type: text/plain, Size: 2287 bytes --]

On Friday September 9, babydr@baby-dragons.com wrote:
> 	Hello Neil ,
>
> On Sat, 10 Sep 2005, Neil Brown wrote:
> > On Friday September 9, babydr@baby-dragons.com wrote:
> >>
> >> 	Hello Neil ,  I patched all were successful .  But after a
> >> make clean ; make
> >> I get ...  Tia ,  JimL
> >> ..snip...
> >> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
> >> Assemble.c: In function `Assemble':
> >> Assemble.c:323: error: `nextspare' undeclared (first use in this function)
> >> Assemble.c:323: error: (Each undeclared identifier is reported only once
> >> Assemble.c:323: error: for each function it appears in.)
> >> make: *** [Assemble.o] Error 1
> >
> > That's odd, as the patch contained:
> >
> > --- ./Assemble.c~current~	2005-09-05 10:55:01.000000000 +1000
> > +++ ./Assemble.c	2005-09-09 16:24:50.000000000 +1000
> > @@ -119,6 +119,7 @@ int Assemble(struct supertype *st, char
> >  	struct mdinfo info;
> >  	struct mddev_ident_s ident2;
> >  	char *avail;
> > +	int nextspare = 0;
> >
> >  	vers = md_get_version(mdfd);
> >  	if (vers <= 0) {
>
> 	What was missing from my 2.0 sources was the 'char *avail;'
> 	and patching failed on that hunk ,  Which I totally missed .

The 'avail' is for a different independent patch which fixes a raid10
issue.  You can ignore it.

> 	So I hand entered as you suggested the above bits .
>
> 	Now it fails on a Warning (???) .

I guess you didn't ignore it.
Just add the 'int nextspare = 0;' to what you had.  Don't worry that
the 'char *avail;' isn't there.

> 	Never heard of failures on warnings before .

That would be because of the '-Werror' I put in there to make sure I
don't get lazy about warnings.

>
> gcc -Wall -Werror -Wstrict-prototypes -DCONFFILE=\"/etc/mdadm.conf\" -ggdb -DSendmail=\""/usr/sbin/sendmail -t"\" -c -o Assemble.o Assemble.c
> Assemble.c: In function `Assemble':
> Assemble.c:121: warning: unused variable `avail'
> make: *** [Assemble.o] Error 1
>
> 	Would you please cut a source set to the kernel site
> >	Say as version 2.0a so I can see the diffs against the
> 	sources I have ?  Tia ,  JimL

I hope to do a 2.1 next week.
Here is the current complete patch against 2.0.

NeilBrown

[-- Attachment #2: mdadm.diff --]
[-- Type: application/octet-stream, Size: 6338 bytes --]

diff -ru /var/tmp/mdadm-old/mdadm-2.0/Assemble.c /var/tmp/mdadm-new/mdadm-2.0/Assemble.c
--- /var/tmp/mdadm-old/mdadm-2.0/Assemble.c	2005-08-15 16:31:57.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/Assemble.c	2005-09-09 16:24:50.000000000 +1000
@@ -118,6 +118,8 @@
 	mddev_dev_t tmpdev;
 	struct mdinfo info;
 	struct mddev_ident_s ident2;
+	char *avail;
+	int nextspare = 0;
 
 	vers = md_get_version(mdfd);
 	if (vers <= 0) {
@@ -319,6 +321,11 @@
 				i = devcnt;
 			else
 				i = devices[devcnt].raid_disk;
+			if (i+1 == 0) {
+				if (nextspare < info.array.raid_disks)
+					nextspare = info.array.raid_disks;
+				i = nextspare++;
+			}
 			if (i < 10000) {
 				if (i >= bestcnt) {
 					unsigned int newbestcnt = i+10;
@@ -359,6 +366,8 @@
 	/* now we have some devices that might be suitable.
 	 * I wonder how many
 	 */
+	avail = malloc(info.array.raid_disks);
+	memset(avail, 0, info.array.raid_disks);
 	okcnt = 0;
 	sparecnt=0;
 	for (i=0; i< bestcnt ;i++) {
@@ -377,13 +386,16 @@
 		if (devices[j].events+event_margin >=
 		    devices[most_recent].events) {
 			devices[j].uptodate = 1;
-			if (i < info.array.raid_disks)
+			if (i < info.array.raid_disks) {
 				okcnt++;
-			else
+				avail[i]=1;
+			} else
 				sparecnt++;
 		}
 	}
-	while (force && !enough(info.array.level, info.array.raid_disks, okcnt)) {
+	while (force && !enough(info.array.level, info.array.raid_disks,
+				info.array.layout,
+				avail, okcnt)) {
 		/* Choose the newest best drive which is
 		 * not up-to-date, update the superblock
 		 * and add it.
@@ -434,6 +446,7 @@
 			close(fd);
 			devices[chosen_drive].events = devices[most_recent].events;
 			devices[chosen_drive].uptodate = 1;
+			avail[chosen_drive] = 1;
 			okcnt++;
 			free(super);
 		}
@@ -599,7 +612,7 @@
 
 	if (runstop == 1 ||
 	    (runstop == 0 &&
-	     ( enough(info.array.level, info.array.raid_disks, okcnt) &&
+	     ( enough(info.array.level, info.array.raid_disks, info.array.layout, avail, okcnt) &&
 	       (okcnt >= req_cnt || start_partial_ok)
 		     ))) {
 		if (ioctl(mdfd, RUN_ARRAY, NULL)==0) {
@@ -627,7 +640,7 @@
 		fprintf(stderr, Name ": %s assembled from %d drive%s", mddev, okcnt, okcnt==1?"":"s");
 		if (sparecnt)
 			fprintf(stderr, " and %d spare%s", sparecnt, sparecnt==1?"":"s");
-		if (!enough(info.array.level, info.array.raid_disks, okcnt))
+		if (!enough(info.array.level, info.array.raid_disks, info.array.layout, avail, okcnt))
 			fprintf(stderr, " - not enough to start the array.\n");
 		else {
 			if (req_cnt == info.array.raid_disks)
diff -ru /var/tmp/mdadm-old/mdadm-2.0/Manage.c /var/tmp/mdadm-new/mdadm-2.0/Manage.c
--- /var/tmp/mdadm-old/mdadm-2.0/Manage.c	2005-08-26 14:49:25.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/Manage.c	2005-09-09 16:04:12.000000000 +1000
@@ -288,7 +288,7 @@
 					if (ioctl(fd, ADD_NEW_DISK, &disc) == 0) {
 						if (verbose >= 0)
 							fprintf(stderr, Name ": re-added %s\n", dv->devname);
-						return 0;
+						continue;
 					}
 					/* fall back on normal-add */
 				}
diff -ru /var/tmp/mdadm-old/mdadm-2.0/mdadm.h /var/tmp/mdadm-new/mdadm-2.0/mdadm.h
--- /var/tmp/mdadm-old/mdadm-2.0/mdadm.h	2005-08-26 14:49:24.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/mdadm.h	2005-09-05 10:55:01.000000000 +1000
@@ -291,7 +291,8 @@
 extern int same_uuid(int a[4], int b[4], int swapuuid);
 /* extern int compare_super(mdp_super_t *first, mdp_super_t *second);*/
 extern unsigned long calc_csum(void *super, int bytes);
-extern int enough(int level, int raid_disks, int avail_disks);
+extern int enough(int level, int raid_disks, int layout,
+		   char *avail, int avail_disks);
 extern int ask(char *mesg);
 
 
diff -ru /var/tmp/mdadm-old/mdadm-2.0/super0.c /var/tmp/mdadm-new/mdadm-2.0/super0.c
--- /var/tmp/mdadm-old/mdadm-2.0/super0.c	2005-08-26 14:49:24.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/super0.c	2005-09-05 10:55:01.000000000 +1000
@@ -131,6 +131,10 @@
 		c = map_num(r5layout, sb->layout);
 		printf(" Layout : %s\n", c?c:"-unknown-");
 	}
+	if (sb->level == 10) {
+		printf(" Layout : near=%d, far=%d\n",
+		       sb->layout&255, (sb->layout>>8)&255);
+	}
 	switch(sb->level) {
 	case 0:
 	case 4:
@@ -234,6 +238,7 @@
 	info->array.patch_version = sb->patch_version;
 	info->array.raid_disks = sb->raid_disks;
 	info->array.level = sb->level;
+	info->array.layout = sb->layout;
 	info->array.md_minor = sb->md_minor;
 	info->array.ctime = sb->ctime;
 
diff -ru /var/tmp/mdadm-old/mdadm-2.0/super1.c /var/tmp/mdadm-new/mdadm-2.0/super1.c
--- /var/tmp/mdadm-old/mdadm-2.0/super1.c	2005-08-26 16:07:33.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/super1.c	2005-09-05 10:55:01.000000000 +1000
@@ -180,6 +180,11 @@
 		c = map_num(r5layout, __le32_to_cpu(sb->layout));
 		printf(" Layout : %s\n", c?c:"-unknown-");
 	}
+	if (__le32_to_cpu(sb->level) == 10) {
+		int lo = __le32_to_cpu(sb->layout);
+		printf(" Layout : near=%d, far=%d\n",
+		       lo&255, (lo>>8)&255);
+	}
 	switch(__le32_to_cpu(sb->level)) {
 	case 0:
 	case 4:
@@ -290,6 +295,7 @@
 	info->array.patch_version = 0;
 	info->array.raid_disks = __le32_to_cpu(sb->raid_disks);
 	info->array.level = __le32_to_cpu(sb->level);
+	info->array.layout = __le32_to_cpu(sb->layout);
 	info->array.md_minor = -1;
 	info->array.ctime = __le64_to_cpu(sb->ctime);
 
diff -ru /var/tmp/mdadm-old/mdadm-2.0/util.c /var/tmp/mdadm-new/mdadm-2.0/util.c
--- /var/tmp/mdadm-old/mdadm-2.0/util.c	2005-08-17 14:28:38.000000000 +1000
+++ /var/tmp/mdadm-new/mdadm-2.0/util.c	2005-09-05 10:55:01.000000000 +1000
@@ -118,10 +118,31 @@
 	return (a*1000000)+(b*1000)+c;
 }
 
-int enough(int level, int raid_disks, int avail_disks)
+int enough(int level, int raid_disks, int layout,
+	   char *avail, int avail_disks)
 {
+	int copies, first;
 	switch (level) {
-	case 10: return 1; /* a lie, but it is hard to tell */
+	case 10:
+		/* This is the tricky one - we need to check
+		 * which actual disks are present.
+		 */
+		copies = (layout&255)* (layout>>8);
+		first=0;
+		do {
+			/* there must be one of the 'copies' form 'first' */
+			int n = copies;
+			int cnt=0;
+			while (n--) {
+				if (avail[first])
+					cnt++;
+				first = (first+1) % raid_disks;
+			}
+			if (cnt == 0)
+				return 0;
+
+		} while (first != 0);
+		return 1;
 
 	case -4:
 		return avail_disks>= 1;
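The raid10 case that the patch adds to `enough()` in util.c can be read as a walk over the slot list: starting at slot 0, it takes runs of `copies` consecutive slots, wrapping at `raid_disks`, and requires at least one available device in every run. The standalone sketch below mirrors that loop so it can be tried outside mdadm. `raid10_enough` is a hypothetical name, and the layout encoding assumed here (near copies in the low byte, far copies above it) is inferred from the `near=%d, far=%d` print statements in the same patch.

```c
#include <assert.h>

/* Hypothetical standalone version of the raid10 branch of the patched
 * enough(): avail[i] is non-zero when the device in slot i is present.
 * Returns 1 if every run of `copies` consecutive slots (wrapping at
 * raid_disks) contains at least one present device, otherwise 0.
 */
int raid10_enough(int raid_disks, int layout, const char *avail)
{
    int copies, first = 0;

    assert(raid_disks > 0);   /* guards the modulo below */
    assert(layout >= 0);      /* keeps the copy count non-negative */

    copies = (layout & 255) * (layout >> 8);  /* near * far, as in the patch */

    do {
        int n = copies;
        int cnt = 0;

        while (n--) {
            if (avail[first])
                cnt++;
            first = (first + 1) % raid_disks;
        }
        if (cnt == 0)
            return 0;         /* no device present in this whole run */
    } while (first != 0);

    return 1;
}
```

For a four-device array with near=2 and far=1 (layout 0x102), losing slots 1 and 2 still leaves one device in each pair, so the check passes; losing slots 0 and 1 empties the first pair, so it fails.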